Core 1&2: Genome-Wide Mapping of Protein Interactions
We have combined Cores 1 and 2 into a single integrated
core that covers most of the methods development and
software engineering aspects of our proposal.
The overall goal is an integrated software system for
genome-wide mapping of the interactions of protein
receptors with drug-like and protein ligands. The four
groups of specific aims are:
Aim I: Create a software pipeline for automated, large-scale
protein structure modeling and docking of small ligands.
The input is a set of protein sequences, a database
of protein structures, and lists of ligands. The output is annotated
3D models of protein-ligand complexes.
Aim II: Create a software pipeline for automated,
large-scale protein-protein docking. The input is a
set of one or more protein sequences, a database of
protein structures, and a set of protein-protein interactions.
The output is models of protein complexes.
Aim III: Create technologies and environments to
facilitate development, testing, and application of
the pipeline including algorithms, databases, interfaces,
software backplane, information navigation, and hardware.
Aim IV: Improve the pipeline by computational applications
to hard problems in biology, including functional
annotation of proteins based on their ligand binding
profiles, annotation of the functions of all Protein
Structure Initiative targets and their homologs,
and predictions of the functional consequences of point
mutations in proteins.

The pipeline requires (i) creation of protein structure models, typically using
comparative modeling, (ii) physics-based refinement of these models, (iii)
prediction of binding sites, (iv) docking ligands to the sites, (v) analysis
of predicted ligands and complexes for identifying substrates, pathways,
and leads for drug discovery, and (vi) databases of proteins, ligands, and
their interactions.
We will define communication protocols and interfaces between the software
modules to facilitate passing of information from one module to another.
The modules will be assembled into a pipeline capable of converting the
input databases of sequences, structures, ligand lists, and protein-protein
interactions into the structure models of proteins, protein-ligand complexes,
and protein-protein complexes.
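
The idea of modules that pass information through a shared interface can be sketched as follows. This is a minimal illustration only, not the Center's actual software: the `Record` type, the stage function names, and the placeholder values are all assumptions made for this example, and each stage merely stands in for the real modeling, refinement, site prediction, or docking code.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


# Hypothetical record passed between modules; the field names are illustrative.
@dataclass
class Record:
    sequence_id: str
    data: Dict[str, Any] = field(default_factory=dict)


# Assumed common interface: every module takes a Record and returns a Record.
Stage = Callable[[Record], Record]


def build_model(rec: Record) -> Record:
    # Placeholder for stage (i), comparative modeling.
    rec.data["model"] = f"model_of_{rec.sequence_id}"
    return rec


def refine_model(rec: Record) -> Record:
    # Placeholder for stage (ii), physics-based refinement.
    rec.data["refined"] = True
    return rec


def predict_sites(rec: Record) -> Record:
    # Placeholder for stage (iii), binding-site prediction.
    rec.data["sites"] = ["site_1"]
    return rec


def dock_ligands(rec: Record) -> Record:
    # Placeholder for stage (iv): pair every predicted site with every ligand.
    ligands = rec.data.get("ligands", [])
    rec.data["complexes"] = [(s, l) for s in rec.data["sites"] for l in ligands]
    return rec


def run_pipeline(rec: Record, stages: List[Stage]) -> Record:
    # Apply the stages in order; each one reads and extends the same record.
    for stage in stages:
        rec = stage(rec)
    return rec


PIPELINE: List[Stage] = [build_model, refine_model, predict_sites, dock_ligands]

result = run_pipeline(Record("seq_001", {"ligands": ["lig_A", "lig_B"]}), PIPELINE)
print(result.data["complexes"])
```

Because every stage shares one signature, a module can be replaced or reordered without touching the others, which is the property a defined inter-module interface is meant to provide.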
The output results from all stages will
be stored in a central database. A sophisticated user interface based
on the best practices from computer science will allow us to analyze
and present the results. The database will also be linked to other
major biological databases and many of these links will be bi-directional.
To facilitate the improvement of the pipeline and to inform the users about
its limitations and strengths, we will develop tools for testing all aspects
of the pipeline.
We will also establish a large computer cluster and a software environment
that are essential for the pipeline development and application; we will
optimize a cluster of processors and its software environment for automated
modeling and docking calculations for millions of proteins and potential
ligands.
We will apply the pipeline to several computational
problems to illustrate the kinds of tasks that are possible, for
the first time, because of the automation of the entire modeling process.
The first of these is the development of general methods for functional
annotation based on ligand binding profiles. The
second is the application of these methods to the functional annotation
of protein structures determined in the Protein Structure Initiative and
their homologs. The third application is the prediction of functional changes
caused by non-synonymous single nucleotide polymorphisms in the protein
coding regions.
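
One simple way to picture annotation from ligand binding profiles is to transfer the label of the annotated protein whose profile overlaps most with the query. The sketch below is an assumption made for illustration, not the method proposed here: the set-overlap (Tanimoto) score, the function names, and the toy profiles and labels are all hypothetical.

```python
from typing import Dict, Set


def tanimoto(a: Set[str], b: Set[str]) -> float:
    """Tanimoto (Jaccard) similarity between two sets of predicted ligands."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def annotate(query: Set[str],
             profiles: Dict[str, Set[str]],
             labels: Dict[str, str]) -> str:
    """Return the label of the annotated protein with the most similar profile."""
    best = max(profiles, key=lambda name: tanimoto(query, profiles[name]))
    return labels[best]


# Toy, made-up profiles: each protein maps to the ligands predicted to bind it.
profiles = {
    "protein_A": {"ATP", "ADP", "GTP"},
    "protein_B": {"glucose", "fructose"},
}
labels = {
    "protein_A": "nucleotide-binding",
    "protein_B": "sugar-binding",
}

query_profile = {"ATP", "ADP"}
print(annotate(query_profile, profiles, labels))
```

A real method would need to account for docking score uncertainty and ligand similarity rather than exact set membership; the sketch only shows the shape of a profile comparison.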
Finally, we will interact with other Cores of the proposal, in particular
with the Driving Biological Projects, Service, Training, and Dissemination
Cores as described below.
The Driving Biological Projects (DBP) have been selected among our existing
collaborations by three criteria: First, the potential of the project to
benefit from the pipeline and result in a significant scientific advance.
Second, the potential of the project to test the pipeline, provide experimental
data for its further improvement, and guide the development of the database
and graphical user interface for dissemination of results to other experimental
groups. Third, the projects were chosen to span a wide range of applications
that exercise all modules of the pipeline and are of interest to the widest
range of chemists, biologists, and biomedical scientists. Together, the DBPs illustrate
our goal of developing a comprehensive description of the interactions between
proteins and their ligands by combining experiment and computation.
The investigators involved in this first
round of DBPs are located primarily at UCSF. The close proximity of the
computational and experimental researchers will help to foster the close
collaborations that we envision. The
breadth and depth of biomedical research at UCSF, and the collegial, collaborative
atmosphere are tremendous assets to the Center, and will help to ensure
that the pipeline development is guided by the needs of diverse biomedical
researchers. Future collaborations, to be funded through the DBP mechanism
and other sources, will be recruited broadly from other universities, as
described in Core 7.
Funding for preliminary results for these DBPs has been provided by NIH
institutes NIAID, NIGMS, NCI, and several other sources.
Kip Guy, Jim McKerrow, Joe DeRisi, Matt Jacobson, Tack Kuntz, Andrej Sali,
Brian Shoichet
This research is aimed at developing new chemotherapy for a group of parasitic
diseases including malaria, leishmaniasis, trypanosomiasis, and schistosomiasis.
Even though they affect hundreds of millions of people worldwide, there
has been little interest by the pharmaceutical industry in developing chemotherapy
for these diseases because they affect primarily poor people in poor regions
of the world. As a scientific matter, the development of new therapeutics
is hindered by the small number of available protein structures from each
genome.
Virtual screening against comparative protein
structure models on a genome-wide scale, although ambitious, has the potential
to greatly increase the number of lead compounds and protein targets for
drug discovery efforts. The computational work will be integrated
with existing experimental drug discovery efforts that exploit functional
genomic techniques based on DNA microarrays and modern parallel synthetic
methods. Existing
NIH support for this Driving Biological Project is provided by
NIAID grants AI053862 (PI, J. DeRisi) and AI35707, a Tropical Disease
Research Unit (PI, J. McKerrow).
David Agard, Tanja Kortemme, Wendell Lim, Christopher Voigt, David Baker,
Andrej Sali, Tack Kuntz
Genome-scale experiments suggest that protein-mediated
interactions in biological systems are organized into larger macromolecular
assemblies and complex cellular signaling networks. Our broad objective
for this DBP is to provide functional insights about the assembly
of proteins into these larger biological units. We will develop and test
combined computational and experimental methods that exploit the protein-protein
interactions generated by the pipeline described in Core 1&2.
First, we aim to calculate accurate high-resolution structural models of
two multi-component protein complexes involved in important biological processes:
the Tub4 assembly, which mediates microtubule nucleation in yeast, and the
Hsp90 molecular chaperone complex. Second, we aim to calculate structural
maps of interaction networks formed by two families of protein-protein interaction
modules, the SH3 (Src-homology 3) and PDZ (PSD-95/disc-large/zonula
occludens 1 homology) domains, and to test the utility of the maps to dissect
network function. Third, we aim to develop a predictive model of the organization
and function of the Bacillus subtilis stress response pathway.
These projects will generate key experimental
data to verify and improve all modules of the protein interaction pipeline.
Kathy Giacomini, Deanna Kroetz, Jasper Rine, Andrej Sali
With the sequencing of the human genome, numerous non-synonymous coding
single nucleotide polymorphisms (cSNPs) have been identified. A key challenge
is to understand and predict their effect on protein structure and function.
We will iterate through experiment and computation to develop computational
methods for accurate prediction of the structure and function of natural
variants of proteins in drug response and human disease, focusing on two
key transporters, ABCB1 (Multidrug Resistance Protein or P-glycoprotein)
and SLC22A1 (Organic Cation Transporter 1).
The experimental aims are to identify and functionally characterize
protein-altering variants of ABCB1 and SLC22A1 in a large sample of several hundred
individuals, and to apply random mutagenesis to create synthetic variants,
followed by their functional characterization in yeast.
The computational aims are to adapt the
methods developed in Core 1&2
to predict the functional consequences of point mutations in membrane
transporters, as well as to apply, validate, and improve these methods
based on the experimentally generated data in this DBP.
The goal of Core 4 is to dramatically increase access to protein structure
modeling and docking and to bring these computational tools to the general
biomedical community. To achieve this goal, the Center will implement a
comprehensive interface that will allow facile access to all of the modules
of the pipeline. This central interface will encompass links to ancillary
interfaces providing specialized tools and data-viewing options for delivering
enriched perspectives to a user. The supporting infrastructure is designed
to enable two major types of access. First, browsing capabilities that will
enable a user to access all of our data in a timely and convenient manner;
these capabilities will be organized to provide results from all of the
different pipeline modules. Second, opportunities for users to analyze their
own data using our pipeline modules and tools. The specific aims of Core
4 are:
1. To provide an interface to our central database, CCPR Central, that
enables users to search and browse data generated by the Center.
2. To provide an interface to the software pipeline to enable outside scientists
to analyze their own data using our tools.
3. To provide an infrastructure that will enable outside scientists to
integrate and test their own software and tools in the context of the pipeline.
Training and user support will be provided to facilitate broad-based user
access that accommodates experimental biologists who are not trained in
sophisticated computational applications, as well as methods developers
and sophisticated users who wish to access data on a large scale.
The CCPR aims to provide urgently needed training to enable biomedical
researchers to harness genomics- and proteomics-scale biological information.
The Center will enhance training of graduate students. New courses in using
computational biology tools will provide hands-on practical training and
studies of principle. A computational research seminar series will be expanded.
The later years of training for five competitively selected students in
the research groups of the CCPR investigators will be supported.
The graduate student training component of the Center will find a natural
home in the UCSF Program in Quantitative Biology (PQB) headed by David Agard
and Ken Dill, who are also investigators on this proposal. The PQB was founded
in 2001 as an umbrella program for the graduate programs in Biophysics,
Chemistry and Chemical Biology, Biological and Medical Informatics, and
Neuroscience. Its goal is to recruit students who are prepared in physics,
mathematics, computer science, and engineering to be trained alongside UCSF's
traditional complement of students having excellent backgrounds in physical
chemistry, biology, and biochemistry.
A hallmark of UCSF training is its collaborative environment. Students
who are developing computational methods for protein structure/function
analysis and prediction can be expected to have strong experimental components
to test the tools they develop. The NIH-sponsored Center would give an overarching
structure and goal to research training in biological computation in a culture
that already embraces collaboration.
The Center will also enhance the training of the postdoctoral fellows associated
with the Center. Postdocs will be offered more opportunities to teach and
study in classes, workshops, and seminar series, and to participate fully in
the annual programmatic research retreats. They will be offered the option
of dual mentorship to provide more formal mentoring in the biological sciences.
We are pleased that all the UCSF investigators named in the proposal are
presently mentoring students and postdocs in their laboratories. Indeed,
their combined training records number in the hundreds of students, with
many of their trainees now in academic tenure-track or senior scientist
positions in industrial settings.
The faculty of the Center are committed to the goal of increasing the numbers
of underrepresented minorities and women in biocomputing sciences. This
proposal seeks funding to support two undergraduate summer research positions
in the proposed center through the University's established Summer Research
Training Program.
Hardware obtained under this proposal and our partnerships with Intel and
IBM will provide students with unprecedented access to new computing technologies.
Finally, we will conduct two annual workshops to connect the Center
with the external community and to facilitate training and technology transfer.
The Center will provide the resources for these workshops at our new campus in
Mission Bay. They may be coordinated with large scientific meetings held here
in San Francisco.
The goals of the Center for Computational
Proteomics Research entail maximizing the dissemination of information,
knowledge, new software tools and techniques, and new discoveries to
the broadest possible range of the biomedical research and education
community. The CCPR will pursue wide publication and distribution in
peer-reviewed journals and at conferences, through this web site, and
through the distribution of our new software tools.
We will also encourage methods developers
outside of our Center to improve our software at the source code level,
and we will incorporate these improvements into our codes and redistribute
them so that other researchers can benefit. We
will create a portal on our web site to our central database for
the purposes of browsing, searching, and downloading all of the scientific
data available at our Center. This portal will also provide the
capability for researchers outside of our Center to input their own data
to our data processing pipeline and to obtain the results from the calculations
performed on this data. Lastly, we will promote technology transfer and
commercialization of our software tools through the assistance of the
UCSF Office of Technology Management.
Core 7: Management and Oversight