The genome revolution has produced an abundance of protein sequence data. Traditional homology-based computational methods make it possible to establish evolutionary relationships among large numbers of these proteins. Yet among any set of new protein sequences, say from the complete genome sequence of a new organism, a significant fraction of the proteins cannot be assigned functions by traditional methods. A new sequence may have no recognizable homologs in other organisms, or it may have recognizable homologs whose cellular functions are still unknown. The critical need to make functional inferences for the vast numbers of proteins that could not be functionally annotated by traditional homology methods led, in 1999 and the years that followed, to new ideas for inferring ‘functional linkages’ between proteins not related to each other by homology. These ‘non-homology’ or ‘genomic context’ methods included the ‘Phylogenetic Profile’ method and the ‘Rosetta-Stone’ method (both pioneered principally by Edward Marcotte and Matteo Pellegrini when they were postdocs with Eisenberg and Yeates), among others. Subsequent work has aimed to extend those ideas. One recent extension of Phylogenetic Profiles (developed by Peter Bowers and Shawn Cokus) applies logic analysis to uncover proteins whose presence or absence across organisms is related to the presence or absence of two other proteins, taken in logical combination. These kinds of higher-order relationships are expected to be abundant in the cell, but they are not detected by the original Phylogenetic Profile method, which looks for direct similarity between the profiles of just two proteins at a time.
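The difference between pairwise profile comparison and a logic relationship can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the three profiles are invented toy data, the names `agreement`, `combine`, and `best_logic` are hypothetical, and the simple fraction-of-matching-genomes score stands in for whatever statistical measure the published methods use.

```python
from itertools import combinations

# Hypothetical presence (1) / absence (0) profiles across eight genomes.
# These values are invented for illustration, not taken from real data.
profiles = {
    "A": [1, 1, 0, 0, 1, 0, 1, 0],
    "B": [1, 0, 1, 0, 1, 1, 0, 0],
    "C": [1, 0, 0, 0, 1, 0, 0, 0],  # present only where both A and B are present
}


def agreement(p, q):
    """Fraction of genomes in which two profiles have the same value."""
    return sum(x == y for x, y in zip(p, q)) / len(p)


# Three of the possible two-input logic functions, applied per genome.
LOGIC = {
    "AND": lambda a, b: a & b,
    "OR": lambda a, b: a | b,
    "XOR": lambda a, b: a ^ b,
}


def combine(p, q, op):
    """Combine two profiles genome by genome with the named logic function."""
    return [LOGIC[op](x, y) for x, y in zip(p, q)]


def best_logic(target, p, q):
    """Return the logic function of p and q that best matches the target profile."""
    scores = {op: agreement(target, combine(p, q, op)) for op in LOGIC}
    best = max(scores, key=scores.get)
    return best, scores[best]


# Pairwise comparison: no pair of profiles matches perfectly.
pairwise = {
    (m, n): agreement(profiles[m], profiles[n])
    for m, n in combinations(profiles, 2)
}

# Triplet comparison: C is fully explained by a logical combination of A and B.
op, score = best_logic(profiles["C"], profiles["A"], profiles["B"])

if __name__ == "__main__":
    print(pairwise)
    print(op, score)
```

In this toy case neither A nor B alone matches C in every genome, while the combination "A AND B" reproduces C exactly, which is the kind of relationship a purely pairwise comparison would miss.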
Our computational genomics work has touched on many other subjects as well: disulfide bonding in thermophiles, repetitive protein sequences, genomic encoding of unusual amino acids such as selenocysteine and pyrrolysine, detection of protein targeting sequences, and the function of bacterial microcompartments.
| Jorda J, Yeates TO
Widespread disulfide bonding in proteins from thermophilic archaea.
Archaea. 2011. 2011:409156. 2011 PMID: 21941460
| Fan C, Cheng S, Liu Y, Escobar CM, Crowley CS, Jefferson RE, Yeates TO, Bobik TA
Short N-terminal sequences package proteins into bacterial microcompartments.
Proc. Natl. Acad. Sci. U.S.A. Apr 2010. 107(16):7509-14. 2010 PMID: 20308536
| Beeby M, Bobik TA, Yeates TO
Exploiting genomic patterns to discover new supramolecular protein assemblies.
Protein Sci. Jan 2009. 18(1):69-79. 2009 PMID: 19177352
| Sprinzak E, Cokus SJ, Yeates TO, Eisenberg D, Pellegrini M
Detecting coordinated regulation of multi-protein complexes using logic analysis of gene expression.
BMC Syst Biol. 2009. 3:115. 2009 PMID: 20003439
| Bowers PM, O’Connor BD, Cokus SJ, Sprinzak E, Yeates TO, Eisenberg D
Utilizing logical relationships in genomic data to decipher cellular processes.
FEBS J. Oct 2005. 272(20):5110-8. 2005 PMID: 16218945
| Beeby M, O’Connor BD, Ryttersgaard C, Boutz DR, Perry LJ, Yeates TO
The genomics of disulfide bonding and protein stabilization in thermophiles.
PLoS Biol. Sep 2005. 3(9):e309. 2005 PMID: 16111437
| Chaudhuri BN, Yeates TO
A computational method to predict genetically encoded rare amino acids in proteins.
Genome Biol. 2005. 6(9):R79. 2005 PMID: 16168086
| Bowers PM, Cokus SJ, Eisenberg D, Yeates TO
Use of logic relationships to decipher protein network organization.
Science. Dec 2004. 306(5705):2246-9. 2004 PMID: 15618515
| O’Connor BD, Yeates TO
GDAP: a web tool for genome-wide protein disulfide bond prediction.
Nucleic Acids Res. Jul 2004. 32(Web Server issue):W360-4. 2004 PMID: 15215411
| Bowers PM, Pellegrini M, Thompson MJ, Fierro J, Yeates TO, Eisenberg D
Prolinks: a database of protein functional linkages derived from coevolution.
Genome Biol. 2004. 5(5):R35. 2004 PMID: 15128449
| Strong M, Graeber TG, Beeby M, Pellegrini M, Thompson MJ, Yeates TO, Eisenberg D
Visualization and interpretation of protein networks in Mycobacterium tuberculosis based on hierarchical clustering of genome-wide functional linkage maps.
Nucleic Acids Res. Dec 2003. 31(24):7099-109. 2003 PMID: 14654685
| Mallick P, Boutz DR, Eisenberg D, Yeates TO
Genomic evidence that the intracellular proteins of archaeal microbes contain disulfide bonds.
Proc. Natl. Acad. Sci. U.S.A. Jul 2002. 99(15):9679-84. 2002 PMID: 12107280
| Pellegrini M, Yeates TO
Searching for frameshift evolutionary relationships between protein sequence families.
Proteins. Nov 1999. 37(2):278-83. 1999 PMID: 10584072
| Marcotte EM, Pellegrini M, Thompson MJ, Yeates TO, Eisenberg D
A combined algorithm for genome-wide prediction of protein function.
Nature. Nov 1999. 402(6757):83-6. 1999 PMID: 10573421
| Marcotte EM, Pellegrini M, Yeates TO, Eisenberg D
A census of protein repeats.
J. Mol. Biol. Oct 1999. 293(1):151-60. 1999 PMID: 10512723
| Marcotte EM, Pellegrini M, Ng HL, Rice DW, Yeates TO, Eisenberg D
Detecting protein function and protein-protein interactions from genome sequences.
Science. Jul 1999. 285(5428):751-3. 1999 PMID: 10427000
| Pellegrini M, Marcotte EM, Yeates TO
A fast algorithm for genome-wide analysis of proteins with repeated sequences.
Proteins. Jun 1999. 35(4):440-6. 1999 PMID: 10382671
| Pellegrini M, Marcotte EM, Thompson MJ, Eisenberg D, Yeates TO
Assigning protein functions by comparative genome analysis: protein phylogenetic profiles.
Proc. Natl. Acad. Sci. U.S.A. Apr 1999. 96(8):4285-8. 1999 PMID: 10200254