
 

The six Center Cores

 



Core 1&2: Genome-Wide Mapping of Protein Interactions


We have combined Cores 1 and 2 into a single integrated core that covers most of the methods development and software engineering aspects of our proposal.
The overall goal is an integrated software system for genome-wide mapping of the interactions of protein receptors with drug-like and protein ligands. The four groups of specific aims are:


Aim I: Create a software pipeline for automated, large-scale protein structure modeling and docking of small ligands. The input is a set of protein sequences, a database of protein structures, and lists of ligands. The output is annotated 3D models of protein-ligand complexes.
Aim II: Create a software pipeline for automated, large-scale protein-protein docking. The input is a set of one or more protein sequences, a database of protein structures, and a set of protein-protein interactions. The output is models of protein complexes.
Aim III: Create technologies and environments to facilitate development, testing, and application of the pipeline including algorithms, databases, interfaces, software backplane, information navigation, and hardware.
Aim IV: Improve the pipeline by applying it to hard problems in biology, including functional annotation of proteins based on their ligand binding profiles, annotation of the functions of all Protein Structure Initiative targets and their homologs, and prediction of the functional consequences of point mutations in proteins.


The pipeline requires (i) creation of protein structure models, typically using comparative modeling, (ii) physics-based refinement of these models, (iii) prediction of binding sites, (iv) docking ligands to the sites, (v) analysis of predicted ligands and complexes for identifying substrates, pathways, and leads for drug discovery, and (vi) databases of proteins, ligands, and their interactions.

We will define communication protocols and interfaces between the software modules to facilitate passing of information from one module to another. The modules will be assembled into a pipeline capable of converting the input databases of sequences, structures, ligand lists, and protein-protein interactions into the structure models of proteins, protein-ligand complexes, and protein-protein complexes.
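The proposal does not specify the communication protocol between modules. As a rough illustration only, the sketch below assumes that every module accepts and returns one shared record type, so modules can be chained in order. The record fields, stage names, and placeholder computations are hypothetical stand-ins for steps (i) through (v) listed above, not the Center's actual software or data schema.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


# Hypothetical record passed from module to module. The field names are
# illustrative assumptions, not the Center's actual data schema.
@dataclass
class PipelineRecord:
    sequence_id: str
    data: Dict[str, object] = field(default_factory=dict)
    log: List[str] = field(default_factory=list)


# Assumed shared interface: every module takes a record and returns it.
Stage = Callable[[PipelineRecord], PipelineRecord]


def make_stage(name: str, compute: Callable[[PipelineRecord], object]) -> Stage:
    """Wrap a computation so it stores its result under `name` and logs the step."""
    def stage(record: PipelineRecord) -> PipelineRecord:
        record.data[name] = compute(record)
        record.log.append(name)
        return record
    return stage


def run_pipeline(record: PipelineRecord, stages: List[Stage]) -> PipelineRecord:
    """Pass the record through each module in order."""
    for stage in stages:
        record = stage(record)
    return record


# Placeholder computations standing in for steps (i)-(v); real modules
# would call modeling, refinement, site-prediction, and docking software.
STAGES: List[Stage] = [
    make_stage("model", lambda r: f"model_of_{r.sequence_id}"),            # (i)
    make_stage("refined_model", lambda r: r.data["model"] + "_refined"),   # (ii)
    make_stage("binding_sites", lambda r: ["site_1"]),                     # (iii)
    make_stage("docked_ligands",
               lambda r: {site: [] for site in r.data["binding_sites"]}),  # (iv)
    make_stage("analysis", lambda r: len(r.data["docked_ligands"])),       # (v)
]

if __name__ == "__main__":
    result = run_pipeline(PipelineRecord("seq_001"), STAGES)
    print(result.log)
```

Running it prints the stage names in order: `['model', 'refined_model', 'binding_sites', 'docked_ligands', 'analysis']`. Keeping a single record type at every module boundary is what would let modules be swapped or reordered independently; a production system would more likely write each stage's output to the central database than pass it along in memory.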

The output from all stages will be stored in a central database. A sophisticated user interface, based on best practices in computer science, will allow us to analyze and present the results. The database will also be linked to other major biological databases, many of them bi-directionally.

To facilitate the improvement of the pipeline and to inform the users about its limitations and strengths, we will develop tools for testing all aspects of the pipeline.

We will also establish a large computer cluster and the software environment essential for developing and applying the pipeline, and we will optimize both for automated modeling and docking calculations on millions of proteins and potential ligands.

We will apply the pipeline to several computational problems to illustrate the kinds of tasks that become possible, for the first time, through automation of the entire modeling process. The first is the development of general methods for functional annotation based on ligand binding profiles. The second is the application of these methods to the functional annotation of protein structures determined in the Protein Structure Initiative and of their homologs. The third is the prediction of functional changes caused by non-synonymous single nucleotide polymorphisms in protein coding regions.

Finally, we will interact with other Cores of the proposal, in particular with the Driving Biological Projects, Service, Training, and Dissemination Cores as described below.

Core 3: Driving Biological Projects

The Driving Biological Projects (DBPs) were selected from among our existing collaborations according to three criteria. First, the project should be able to benefit from the pipeline and lead to a significant scientific advance. Second, the project should test the pipeline, provide experimental data for its further improvement, and guide the development of the database and graphical user interface for disseminating results to other experimental groups. Third, the projects should span a wide range of applications that exercise all modules of the pipeline and are of interest to the widest range of chemists, biologists, and biomedical scientists. Collectively, the DBPs illustrate our goal of developing a comprehensive description of the interactions between proteins and their ligands by combining experiment and computation.

The investigators involved in this first round of DBPs are located primarily at UCSF. The close proximity of the computational and experimental researchers will help foster the close collaborations that we envision. The breadth and depth of biomedical research at UCSF and its collegial, collaborative atmosphere are tremendous assets to the Center, and will help ensure that pipeline development is guided by the needs of diverse biomedical researchers. Future collaborations, to be funded through the DBP mechanism and other sources, will be recruited broadly from other universities, as described in Core 7.

Preliminary results for these DBPs were funded by the NIH institutes NIAID, NIGMS, and NCI, and by several other sources.

DBP 1: Development of Novel Antiparasitic Chemotherapeutics

Kip Guy, Jim McKerrow, Joe DeRisi, Matt Jacobson, Tack Kuntz, Andrej Sali, Brian Shoichet

This research aims to develop new chemotherapy for a group of parasitic diseases including malaria, leishmaniasis, trypanosomiasis, and schistosomiasis. Although these diseases affect hundreds of millions of people worldwide, the pharmaceutical industry has shown little interest in developing chemotherapy for them because they primarily affect poor people in poor regions of the world. Scientifically, the development of new therapeutics is hindered by the small number of protein structures available from each parasite genome.

Virtual screening against comparative protein structure models on a genome-wide scale, although ambitious, has the potential to greatly increase the number of lead compounds and protein targets for drug discovery efforts. The computational work will be integrated with existing experimental drug discovery efforts that exploit functional genomic techniques based on DNA microarrays and modern parallel synthetic methods. Existing NIH support for this Driving Biological Project is provided by NIAID grants AI053862 (PI, J. DeRisi) and AI35707, a Tropical Disease Research Unit (PI, J. McKerrow).

DBP 2: Protein-Protein Interactions: Macromolecular Complexes and Networks

David Agard, Tanja Kortemme, Wendell Lim, Christopher Voigt, David Baker, Andrej Sali, Tack Kuntz

Genome-scale experiments suggest that protein-mediated interactions in biological systems are organized into larger macromolecular assemblies and complex cellular signaling networks. Our broad objective for this DBP is to provide functional insights about the assembly of proteins into these larger biological units. We will develop and test combined computational and experimental methods that exploit the protein-protein interactions generated by the pipeline described in Core 1&2.

First, we aim to calculate accurate high-resolution structural models of two multi-component protein complexes involved in important biological processes: the Tub4 assembly, which mediates microtubule nucleation in yeast, and the Hsp90 molecular chaperone complex. Second, we aim to calculate structural maps of the interaction networks formed by two families of protein-protein interaction modules, the SH3 (Src homology 3) and PDZ (PSD-95/Discs-large/zonula occludens 1 homology) domains, and to test the utility of these maps for dissecting network function. Third, we aim to develop a predictive model of the organization and function of the Bacillus subtilis stress response pathway.

These projects will generate key experimental data to verify and improve all modules of the protein interaction pipeline.

DBP 3: Functional and Computational Analysis of SNPs in Drug Response Genes

Kathy Giacomini, Deanna Kroetz, Jasper Rine, Andrej Sali

With the sequencing of the human genome, numerous non-synonymous coding single nucleotide polymorphisms (cSNPs) have been identified. A key challenge is to understand and predict their effect on protein structure and function. We will iterate between experiment and computation to develop methods for accurately predicting the structure and function of natural protein variants relevant to drug response and human disease, focusing on two key transporters, ABCB1 (Multidrug Resistance Protein, or P-glycoprotein) and SLC22A1 (Organic Cation Transporter 1).

The experimental aims are to identify and functionally characterize protein-altering variants of ABCB1 and SLC22A1 in a large sample of several hundred individuals, and to apply random mutagenesis to create synthetic variants, which will then be functionally characterized in yeast.

The computational aims are to adapt the methods developed in Core 1&2 to predict the functional consequences of point mutations in membrane transporters, as well as to apply, validate, and improve these methods based on the experimentally generated data in this DBP.

Core 4: Service to the Biomedical Community

The goal of Core 4 is to dramatically increase access to protein structure modeling and docking and to bring these computational tools to the general biomedical community. To achieve this goal, the Center will implement a comprehensive interface that allows easy access to all of the modules of the pipeline. This central interface will link to ancillary interfaces that provide specialized tools and data-viewing options. The supporting infrastructure is designed to enable two major types of access. First, browsing capabilities will let users access all of our data in a timely and convenient manner, organized to present results from all of the pipeline modules. Second, users will be able to analyze their own data with our pipeline modules and tools. The specific aims of Core 4 are:

1. To provide an interface to our central database, CCPR Central, that enables users to search and browse data generated by the Center.

2. To provide an interface to the software pipeline to enable outside scientists to analyze their own data using our tools.

3. To provide an infrastructure that will enable outside scientists to integrate and test their own software and tools in the context of the pipeline.

Training and user support will be provided to facilitate broad-based user access, accommodating both experimental biologists who are not trained in sophisticated computational applications and methods developers and sophisticated users who wish to access data on a large scale.

Core 5: Training of the Center's Graduate and Postgraduate Students and the Broad Biomedical Community

The CCPR aims to provide urgently needed training that enables biomedical researchers to harness genomic- and proteomic-scale biological information.

The Center will enhance the training of graduate students. New courses on computational biology tools will provide both hands-on practical training and instruction in underlying principles. A computational research seminar series will be expanded. The Center will also support the later years of training for five competitively selected students in the research groups of the CCPR investigators.

The graduate student training component of the Center will find a natural home in the UCSF Program in Quantitative Biology (PQB) headed by David Agard and Ken Dill, who are also investigators on this proposal. The PQB was founded in 2001 as an umbrella program for the graduate programs in Biophysics, Chemistry and Chemical Biology, Biological and Medical Informatics, and Neuroscience. Its goal is to recruit students who are prepared in physics, mathematics, computer science, and engineering to be trained alongside UCSF's traditional complement of students having excellent backgrounds in physical chemistry, biology, and biochemistry.

A hallmark of UCSF training is its collaborative environment. Students developing computational methods for protein structure and function analysis and prediction can expect their projects to include strong experimental components that test the tools they develop. The NIH-sponsored Center will give an overarching structure and goal to research training in biological computation, in a culture that already embraces collaboration.

The Center will also enhance the training of its associated postdoctoral fellows. Postdocs will be offered more opportunities to teach and study in classes, workshops, and seminar series, and to participate fully in the annual programmatic research retreats. They will also be offered the option of dual mentorship, providing more formal mentoring in the biological sciences.

We are pleased that all the UCSF investigators named in the proposal are presently mentoring students and postdocs in their laboratories. Indeed, their combined training records number in the hundreds of students, with many of their trainees now in academic tenure-track or senior scientist positions in industrial settings.

The faculty of the Center are committed to the goal of increasing the numbers of underrepresented minorities and women in biocomputing sciences. This proposal seeks funding to support two undergraduate summer research positions in the proposed center through the University's established Summer Research Training Program.

Hardware obtained under this proposal and our partnerships with Intel and IBM will provide students with unprecedented access to new computing technologies.

Finally, we will hold two annual workshops to connect the Center with the external community and to facilitate training and technology transfer. The Center will provide the resources for these workshops at our new campus in Mission Bay, and they may be coordinated with large scientific meetings held here in San Francisco.

Core 6: Dissemination of Tools, Data, and Discoveries

The goals of the Center for Computational Proteomics Research include disseminating information, knowledge, new software tools and techniques, and new discoveries as widely as possible across the biomedical research and education community. The CCPR will pursue this through publication in peer-reviewed journals and presentations at conferences, through this web site, and through distribution of our new software tools.

We will also encourage methods developers outside of our Center to improve our software at the source code level, and we will incorporate these improvements into our codes and redistribute them so that other researchers can benefit. We will create a portal on our web site to our central database for the purposes of browsing, searching, and downloading all of the scientific data available at our Center. This portal will also provide the capability for researchers outside of our Center to input their own data to our data processing pipeline and to obtain the results from the calculations performed on this data. Lastly, we will promote technology transfer and commercialization of our software tools through the assistance of the UCSF Office of Technology Management.

Core 7: Management and Oversight


 

The Management Core of the Center for Computational Proteomics Research will provide the necessary administration and oversight to ensure that the Center achieves its research, service, training, and dissemination goals within the proposed 5 to 10 year funding lifetime of the Center.
The central operating body of the Management and Oversight Core is the Executive Committee, chaired by the Principal Investigator, Andrej Sali, and consisting of the leaders of each of the Cores, the Program Manager, and the NIH Program Officer as an ex officio member. The Executive Committee will meet monthly to allocate funds, evaluate progress against milestones, and provide policy direction. In addition, it will be responsible for selecting the new Driving Biological Projects and the R01 and R23 grant proposals to be associated with the Center, as well as for the curriculum issues described in Core 5.
Additional oversight and advice will be provided by Irwin D. Kuntz and by an External Advisory Committee of six distinguished scientists drawn from computer science and biology, theory and experiment, and industry and academia. The external advisers will meet once a year to hear reports from the Executive Committee on progress in the various Cores and on how milestones are being met, and from users on the operation of the facility.
The Management Core is composed of a Core Leader, Marvin Cassman, a Project Manager, and an administrative assistant. The Project Manager, supported by the administrative assistant, will be responsible for much of the day-to-day management of the Center and will also contribute to long-range planning and execution. The Management Core Leader will help ensure coordination between the Cores of the Center and, together with the Project Manager, will work to resolve any issues. They will report to the Principal Investigator and the Executive Committee. Each Core has a well-defined management structure and is headed by a Core Leader. The operation of the Center will be assisted by the staff of the California Institute for Quantitative Biomedical Research (QB3). For example, we will be able to use the videoconferencing facilities at QB3 and rely on the QB3 staff and its other resources to organize our workshops.
The management of the Center will rely on a web site that serves as a management tool, a communication tool for all members of the consortium, a repository of all relevant information, and the public face of the Center. Communication among members of the Center is already supported by this web site (http://www.computationalproteomics.org), which is currently used to archive meeting minutes, slide presentations, papers, proposals, and other information relevant to the Center. In the future, the web site will also provide a discussion forum for management, Center members, and external users, with appropriate levels of access security. All of the activities of the Center will be noted and linked from the Center's home page.

 

 


Copyright 2003-2004 CCPR, webmaster