Predicting the Function of Proteins for Newly Sequenced Organisms

Algorithms and engineering for gene function annotation for Joint Genome Institute genomes

Steven E. Brenner
Lawrence Berkeley National Laboratory

Characterizing, understanding, and modifying many terrestrial environments requires a detailed understanding of microbial organisms and communities. Our knowledge of this microbial world has blossomed, as genome and metagenome projects reveal millions of microbial genes. However, biological interpretation of the encoded proteins requires understanding of their function. One of the most accurate and elegant approaches for predicting proteins’ functions uses a reconciled phylogenetic tree to integrate molecular function information for all proteins in a family. This approach, known as phylogenomics, is dependent upon painstaking manual analyses by domain experts, and it has therefore been largely restricted to small studies.

This project will focus on a previously constructed prototype for automated phylogenomic protein function prediction, called SIFTER (Statistical Inference of Function Through Evolutionary Relationships). Specifically, the team will refine SIFTER's algorithms to make phylogenomics scalable to genomic and metagenomic volumes of sequence; enhance the method's prediction reliability and rigorously assess it using well-studied families; broaden SIFTER’s applicability to a wider range of proteins and incorporate a wide variety of functional evidence; and collaborate with the Department of Energy’s Joint Genome Institute to integrate SIFTER into its automated microbial protein annotation system.

The Department of Energy hopes to harness the potential of microbial organisms and communities for such mission needs as understanding the cycling of carbon and nutrients in the soil, remediating contaminated sites, developing smart sensors, and generating biofuels such as hydrogen and ethanol. As genomic technologies mature, sequence data are accruing at a fierce rate. These data herald unprecedented insight into microbes, offering profound opportunities for understanding the environment and developing methods of bioremediation. Of those microbes with implications for energy and the environment, roughly 200 have had their genomes sequenced. Massive environmental metagenomic sequencing efforts have offered us more potential peptides to study than all prior studies combined. Unlocking this information remains a major challenge. This project will improve computational methods for reliably predicting the protein functions of interest to the Department.

Science Application: Computational Biology

Project Title: Robust and Precise Gene Function Predictions on a Genomic Scale

Principal Investigator: Steven E. Brenner
Affiliation: Lawrence Berkeley National Laboratory

Participating Institutions and Co-Investigators:
Lawrence Berkeley National Laboratory - Steven E. Brenner (PI)
University of California at Berkeley - Michael I. Jordan

Funding Partners: Office of Science, Office of Advanced Scientific Computing Research, and Office of Biological and Environmental Research

Budget and Duration: Approximately $0.3 million per year for three years¹


¹ Subject to acceptable progress review and the availability of appropriated funds.

 

