Computational Genomics / Bioinformatics

A diagram illustrating a method for predicting mechanisms of protein targeting (e.g. to bacterial microcompartments) by special N- or C-terminal sequence extensions. (Adapted from Fan, et al. 2010)

The genome revolution has produced an abundance of protein sequence data. Traditional homology-based computational methods make it possible to establish evolutionary relationships among large numbers of these proteins. Yet among any set of new protein sequences, say from the complete genome sequence of a newly sequenced organism, a significant fraction of the proteins cannot be assigned functions by traditional methods. A new sequence may have no recognizable homologs in other organisms, or it may have recognizable homologs whose cellular functions are still unknown. The pressing need to make functional inferences for the vast numbers of proteins that could not be annotated by traditional homology methods led, in 1999 and the years that followed, to new ideas for inferring ‘functional linkages’ between proteins that are not related to each other by homology. These ‘non-homology’ or ‘genomic context’ methods included the ‘Phylogenetic Profile’ method and the ‘Rosetta-Stone’ method (both pioneered principally by Edward Marcotte and Matteo Pellegrini when they were postdocs with Eisenberg and Yeates), among others.

Subsequent work has aimed to extend those ideas. One recent extension of Phylogenetic Profiles, developed by Peter Bowers and Shawn Cokus, applies logic analysis to uncover proteins whose presence or absence across organisms is related to the presence or absence of two other proteins taken in logical combination; for example, protein C may be present only in those organisms that contain both protein A and protein B. Such higher-order relationships are expected to be abundant in the cell, but they are not detected by the original Phylogenetic Profile method, which looks for direct similarity between the profiles of just two proteins at a time.
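The pairwise comparison behind the original Phylogenetic Profile idea can be sketched as follows. This is a minimal illustration, assuming each protein's profile is a binary vector recording presence (1) or absence (0) of a homolog in the same ordered list of genomes, and using a simple Hamming-distance cutoff as the similarity criterion. The function names `hamming` and `linked_pairs`, the `max_diff` threshold, and the example profiles are all hypothetical; published work has used other scoring schemes.

```python
from itertools import combinations
from typing import Dict, List, Tuple

Profile = List[int]  # 1 = homolog present in a genome, 0 = absent (assumed encoding)


def hamming(p: Profile, q: Profile) -> int:
    """Number of genomes in which two profiles disagree."""
    if len(p) != len(q):
        raise ValueError("profiles must cover the same genomes")
    return sum(x != y for x, y in zip(p, q))


def linked_pairs(profiles: Dict[str, Profile],
                 max_diff: int = 1) -> List[Tuple[str, str, int]]:
    """Pairs of proteins whose profiles differ in at most max_diff genomes.

    Only two proteins are compared at a time, which is the limitation
    the logic-analysis extension addresses.
    """
    pairs = []
    for (n1, p1), (n2, p2) in combinations(sorted(profiles.items()), 2):
        d = hamming(p1, p2)
        if d <= max_diff:
            pairs.append((n1, n2, d))
    return pairs


if __name__ == "__main__":
    # Hypothetical profiles across eight genomes.
    profiles = {
        "protA": [1, 1, 0, 1, 0, 0, 1, 1],
        "protB": [1, 1, 0, 1, 0, 0, 1, 0],
        "protC": [0, 0, 1, 0, 1, 1, 0, 0],
    }
    print(linked_pairs(profiles))  # [('protA', 'protB', 1)]
```

Here protA and protB co-occur in all but one genome and would be proposed as functionally linked, while protC, whose profile is nearly the opposite of protA's, is not reported.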

A diagram illustrating the idea of logic analysis of phylogenetic profiles. (Adapted from Bowers, et al. 2002)
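The idea of logic analysis can be illustrated with a small sketch. For a triplet of proteins A, B and C with binary presence/absence profiles over the same genomes, it asks which two-input logic function f best reproduces C's profile as f(A, B). The set of functions in `LOGIC` is an illustrative subset, and the score is a plain fraction of agreeing genomes; both are assumptions made for clarity, not the scoring used in the published work. The profiles are hypothetical and were constructed so that C = A XOR B.

```python
from typing import Callable, Dict, List, Tuple

Profile = List[int]  # 1 = present, 0 = absent (assumed encoding)

# Illustrative subset of two-input logic functions (hypothetical choice).
LOGIC: Dict[str, Callable[[int, int], int]] = {
    "A AND B": lambda a, b: a & b,
    "A OR B": lambda a, b: a | b,
    "A XOR B": lambda a, b: a ^ b,
    "A AND NOT B": lambda a, b: a & (1 - b),
}


def pairwise_agreement(p: Profile, q: Profile) -> float:
    """Fraction of genomes in which two profiles agree (a pairwise score)."""
    return sum(x == y for x, y in zip(p, q)) / len(p)


def logic_agreement(a: Profile, b: Profile, c: Profile,
                    f: Callable[[int, int], int]) -> float:
    """Fraction of genomes in which c equals f(a, b)."""
    return sum(f(x, y) == z for x, y, z in zip(a, b, c)) / len(c)


def best_logic(a: Profile, b: Profile, c: Profile) -> Tuple[str, float]:
    """Name and score of the logic function that best explains c from a and b."""
    scores = {name: logic_agreement(a, b, c, f) for name, f in LOGIC.items()}
    name = max(scores, key=scores.get)
    return name, scores[name]


if __name__ == "__main__":
    a = [1, 1, 0, 0, 1, 1, 0, 0]
    b = [1, 0, 1, 0, 1, 0, 1, 0]
    c = [0, 1, 1, 0, 0, 1, 1, 0]  # constructed as a XOR b
    print(pairwise_agreement(a, c), pairwise_agreement(b, c))  # 0.5 0.5
    print(best_logic(a, b, c))  # ('A XOR B', 1.0)
```

In this toy case C agrees with A, and with B, in only half of the genomes, which is what chance alone would give, so a comparison of two profiles at a time finds no link. The logical combination of A and B, however, reproduces C's profile exactly.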

Our computational genomics work has touched on many other subjects as well: disulfide bonding in thermophiles, repetitive protein sequences, genomic encoding of unusual amino acids such as selenocysteine and pyrrolysine, detection of protein targeting sequences, and the function of bacterial microcompartments.



Jorda J, Yeates TO
Widespread disulfide bonding in proteins from thermophilic archaea.
Archaea. 2011. 2011:409156. PMID: 21941460
PMC3177088. doi:10.1155/2011/409156


Fan C, Cheng S, Liu Y, Escobar CM, Crowley CS, Jefferson RE, Yeates TO, Bobik TA
Short N-terminal sequences package proteins into bacterial microcompartments.
Proc. Natl. Acad. Sci. U.S.A. Apr 2010. 107(16):7509-14. PMID: 20308536
PMC2867708. doi:10.1073/pnas.0913199107


Beeby M, Bobik TA, Yeates TO
Exploiting genomic patterns to discover new supramolecular protein assemblies.
Protein Sci. Jan 2009. 18(1):69-79. PMID: 19177352
PMC2708037. doi:10.1002/pro.1

Sprinzak E, Cokus SJ, Yeates TO, Eisenberg D, Pellegrini M
Detecting coordinated regulation of multi-protein complexes using logic analysis of gene expression.
BMC Syst. Biol. 2009. 3:115. PMID: 20003439
PMC2804736. doi:10.1186/1752-0509-3-115


Bowers PM, O’Connor BD, Cokus SJ, Sprinzak E, Yeates TO, Eisenberg D
Utilizing logical relationships in genomic data to decipher cellular processes.
FEBS J. Oct 2005. 272(20):5110-8. PMID: 16218945

Beeby M, O’Connor BD, Ryttersgaard C, Boutz DR, Perry LJ, Yeates TO
The genomics of disulfide bonding and protein stabilization in thermophiles.
PLoS Biol. Sep 2005. 3(9):e309. PMID: 16111437
PMC1188242. doi:10.1371/journal.pbio.0030309

Chaudhuri BN, Yeates TO
A computational method to predict genetically encoded rare amino acids in proteins.
Genome Biol. 2005. 6(9):R79. PMID: 16168086
PMC1242214. doi:10.1186/gb-2005-6-9-r79


Bowers PM, Cokus SJ, Eisenberg D, Yeates TO
Use of logic relationships to decipher protein network organization.
Science. Dec 2004. 306(5705):2246-9. PMID: 15618515

O’Connor BD, Yeates TO
GDAP: a web tool for genome-wide protein disulfide bond prediction.
Nucleic Acids Res. Jul 2004. 32(Web Server issue):W360-4. PMID: 15215411
PMC441514. doi:10.1093/nar/gkh376

Bowers PM, Pellegrini M, Thompson MJ, Fierro J, Yeates TO, Eisenberg D
Prolinks: a database of protein functional linkages derived from coevolution.
Genome Biol. 2004. 5(5):R35. PMID: 15128449
PMC416471. doi:10.1186/gb-2004-5-5-r35


Strong M, Graeber TG, Beeby M, Pellegrini M, Thompson MJ, Yeates TO, Eisenberg D
Visualization and interpretation of protein networks in Mycobacterium tuberculosis based on hierarchical clustering of genome-wide functional linkage maps.
Nucleic Acids Res. Dec 2003. 31(24):7099-109. PMID: 14654685


Mallick P, Boutz DR, Eisenberg D, Yeates TO
Genomic evidence that the intracellular proteins of archaeal microbes contain disulfide bonds.
Proc. Natl. Acad. Sci. U.S.A. Jul 2002. 99(15):9679-84. PMID: 12107280
PMC124975. doi:10.1073/pnas.142310499


Eisenberg D, Marcotte EM, Xenarios I, Yeates TO
Protein function in the post-genomic era.
Nature. Jun 2000. 405(6788):823-6. PMID: 10866208


Pellegrini M, Yeates TO
Searching for frameshift evolutionary relationships between protein sequence families.
Proteins. Nov 1999. 37(2):278-83. PMID: 10584072

Marcotte EM, Pellegrini M, Thompson MJ, Yeates TO, Eisenberg D
A combined algorithm for genome-wide prediction of protein function.
Nature. Nov 1999. 402(6757):83-6. PMID: 10573421

Marcotte EM, Pellegrini M, Yeates TO, Eisenberg D
A census of protein repeats.
J. Mol. Biol. Oct 1999. 293(1):151-60. PMID: 10512723

Marcotte EM, Pellegrini M, Ng HL, Rice DW, Yeates TO, Eisenberg D
Detecting protein function and protein-protein interactions from genome sequences.
Science. Jul 1999. 285(5428):751-3. PMID: 10427000

Pellegrini M, Marcotte EM, Yeates TO
A fast algorithm for genome-wide analysis of proteins with repeated sequences.
Proteins. Jun 1999. 35(4):440-6. PMID: 10382671

Pellegrini M, Marcotte EM, Thompson MJ, Eisenberg D, Yeates TO
Assigning protein functions by comparative genome analysis: protein phylogenetic profiles.
Proc. Natl. Acad. Sci. U.S.A. Apr 1999. 96(8):4285-8. PMID: 10200254
