Protein superfamily explained

A protein superfamily is the largest grouping (clade) of proteins for which common ancestry can be inferred (see homology). Usually this common ancestry is inferred from structural alignment[1] and mechanistic similarity, even if no sequence similarity is evident.[2] Homology can then be deduced even when it is not apparent from the sequences themselves because sequence similarity is low. Superfamilies typically contain several protein families, each of which shows sequence similarity among its own members. The term protein clan is commonly used for protease and glycosyl hydrolase superfamilies, based on the MEROPS and CAZy classification systems.[2] [3]

Identification

Superfamilies of proteins are identified using a number of methods. Closely related members can be identified by methods different from those needed to group the most evolutionarily divergent members.

Sequence similarity

See main article: Sequence homology. Historically, the similarity of different amino acid sequences has been the most common method of inferring homology.[4] Sequence similarity is considered a good predictor of relatedness, since similar sequences are more likely to result from gene duplication and divergent evolution than from convergent evolution. Amino acid sequence is typically more conserved than DNA sequence (due to the degenerate genetic code), so comparing amino acid sequences is a more sensitive detection method. Since some amino acids have similar properties (e.g., charge, hydrophobicity, size), conservative mutations that interchange them are often neutral to function. The most conserved sequence regions of a protein often correspond to functionally important regions such as catalytic sites and binding sites, since these regions are less tolerant of sequence changes.
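The difference between identical and merely conservative positions can be made concrete with a minimal sketch. It assumes two sequences that have already been aligned (gaps written as "-") and uses a toy grouping of amino acids by property. The grouping, the function names and the example sequences are illustrative assumptions; real tools score substitutions with matrices such as BLOSUM62.

```python
# Toy grouping of amino acids by physicochemical property (an assumption for
# illustration; real tools use substitution matrices such as BLOSUM62).
CONSERVATIVE_GROUPS = [
    set("ILVM"),  # aliphatic / hydrophobic
    set("FWY"),   # aromatic
    set("KRH"),   # positively charged
    set("DE"),    # negatively charged
    set("STNQ"),  # polar, uncharged
    set("AG"),    # small
]


def is_conservative(a, b):
    """True if a and b are different residues from the same toy group."""
    return a != b and any(a in g and b in g for g in CONSERVATIVE_GROUPS)


def identity_and_similarity(aligned_a, aligned_b, gap="-"):
    """Return (percent identity, percent similarity) over ungapped columns."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only alignment columns where neither sequence has a gap.
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if a != gap and b != gap]
    if not columns:
        return 0.0, 0.0
    identical = sum(a == b for a, b in columns)
    similar = identical + sum(is_conservative(a, b) for a, b in columns)
    n = len(columns)
    return 100.0 * identical / n, 100.0 * similar / n


# Hypothetical pre-aligned sequences, invented for this example.
seq_a = "MKVLD-EFST"
seq_b = "MRILEAEY-T"
identity, similarity = identity_and_similarity(seq_a, seq_b)
print(f"identity {identity:.1f}%, similarity {similarity:.1f}%")
```

In this invented pair, half of the ungapped columns are identical, yet every column is either identical or a conservative substitution. This is why similarity scores can detect relatedness that identity alone would understate.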

Using sequence similarity to infer homology has several limitations. There is no minimum level of sequence similarity guaranteed to produce identical structures. Over long periods of evolution, related proteins may show no detectable sequence similarity to one another. Sequences with many insertions and deletions can also be difficult to align, which makes it hard to identify the homologous sequence regions. In the PA clan of proteases, for example, not a single residue is conserved through the superfamily, not even those in the catalytic triad. Conversely, the individual families that make up a superfamily are defined on the basis of their sequence alignment, for example the C04 protease family within the PA clan.

Nevertheless, sequence similarity is the most commonly used form of evidence to infer relatedness, since the number of known sequences vastly outnumbers the number of known tertiary structures.[5] In the absence of structural information, sequence similarity limits which proteins can be assigned to a superfamily.

Structural similarity

See main article: Structural alignment. Structure is much more evolutionarily conserved than sequence, such that proteins with highly similar structures can have entirely different sequences.[6] Over very long evolutionary timescales, very few residues show detectable amino acid sequence conservation; however, secondary structural elements and tertiary structural motifs are highly conserved. Some protein dynamics[7] and conformational changes of the protein structure may also be conserved, as is seen in the serpin superfamily.[8] Consequently, protein tertiary structure can be used to detect homology between proteins even when no evidence of relatedness remains in their sequences. Structural alignment programs, such as DALI, use the 3D structure of a protein of interest to find proteins with similar folds.[9] However, on rare occasions, related proteins may evolve to be structurally dissimilar,[10] and relatedness can only be inferred by other methods.[11] [12] [13]
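One way to compare folds without first superimposing them is to compare intramolecular distance matrices, the general idea that distance-matrix methods such as DALI build on. The sketch below is a toy illustration, not DALI's algorithm. It assumes the residue-to-residue correspondence is already known, and the coordinates and helper names are invented for the example.

```python
import math


def distance_matrix(coords):
    """Pairwise Euclidean distances between points (e.g. C-alpha atoms)."""
    return [[math.dist(p, q) for q in coords] for p in coords]


def distance_matrix_rmsd(coords_a, coords_b):
    """Root-mean-square difference between two intramolecular distance matrices.

    Assumes both coordinate lists are in residue-to-residue correspondence
    (same length, same order). Finding that correspondence is the hard part
    of real structural alignment and is not attempted here.
    """
    if len(coords_a) != len(coords_b):
        raise ValueError("coordinate lists must have equal length")
    n = len(coords_a)
    if n < 2:
        return 0.0
    da, db = distance_matrix(coords_a), distance_matrix(coords_b)
    squared = [(da[i][j] - db[i][j]) ** 2
               for i in range(n) for j in range(i + 1, n)]
    return math.sqrt(sum(squared) / len(squared))


def rotate_z_and_shift(coords, angle, shift):
    """Rigid-body move: rotate about the z axis, then translate."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y + shift[0], s * x + c * y + shift[1], z + shift[2])
            for x, y, z in coords]


# Hypothetical C-alpha trace (coordinates invented for illustration).
backbone = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (5.0, 3.6, 0.0),
            (8.5, 4.5, 1.2), (9.0, 8.0, 2.5)]
moved = rotate_z_and_shift(backbone, math.radians(70), (10.0, -4.0, 2.0))
stretched = [(2 * x, y, z) for x, y, z in backbone]

same_fold_score = distance_matrix_rmsd(backbone, moved)
different_shape_score = distance_matrix_rmsd(backbone, stretched)
```

Because internal distances do not change under rotation or translation, the rigidly moved copy scores essentially zero, while the distorted copy does not. The comparison uses no sequence information at all.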

Mechanistic similarity

See main article: Enzyme mechanism.

The catalytic mechanism of enzymes within a superfamily is commonly conserved, although substrate specificity may be significantly different. Catalytic residues also tend to occur in the same order in the protein sequence.[14] For the families within the PA clan of proteases, although there has been divergent evolution of the catalytic triad residues used to perform catalysis, all members use a similar mechanism to perform covalent, nucleophilic catalysis on proteins, peptides or amino acids.[15] However, mechanism alone is not sufficient to infer relatedness. Some catalytic mechanisms have evolved convergently multiple times, so the enzymes that use them form separate superfamilies,[16] [17] [18] and some superfamilies display a range of different (though often chemically similar) mechanisms.[19]
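The point about residue order can be illustrated with a small sketch that compares the N- to C-terminal order of catalytic roles rather than residue identities, so a serine and a cysteine nucleophile count as the same role. The positions for chymotrypsin and subtilisin are the commonly quoted textbook numbering. The third entry, the role labels and the function names are hypothetical and introduced only for this example.

```python
def role_order(catalytic_residues):
    """Return the catalytic roles in N- to C-terminal sequence order.

    `catalytic_residues` maps a sequence position to (residue, role).
    """
    return tuple(role for _, (_, role) in sorted(catalytic_residues.items()))


def same_role_order(a, b):
    """True if two enzymes place their catalytic roles in the same order."""
    return role_order(a) == role_order(b)


# Commonly quoted textbook numbering (treated here as illustrative input).
chymotrypsin = {57: ("His", "base"), 102: ("Asp", "acid"),
                195: ("Ser", "nucleophile")}
subtilisin = {32: ("Asp", "acid"), 64: ("His", "base"),
              221: ("Ser", "nucleophile")}

# Hypothetical cysteine protease with invented positions, standing in for a
# member that uses a different nucleophile but keeps the same role order.
hypothetical_cys_protease = {40: ("His", "base"), 85: ("Asp", "acid"),
                             150: ("Cys", "nucleophile")}

print(role_order(chymotrypsin))
print(role_order(subtilisin))
print(same_role_order(chymotrypsin, hypothetical_cys_protease))
print(same_role_order(chymotrypsin, subtilisin))
```

Chymotrypsin and subtilisin use the same three residue types in a different sequence order, which fits the usual interpretation that their triads arose independently. A matching order, as with the hypothetical cysteine protease, is only supporting evidence for relatedness and not proof of it.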

Evolutionary significance

Protein superfamilies represent the current limits of our ability to identify common ancestry.[20] They are the largest evolutionary grouping based on direct evidence that is currently possible. They therefore reflect some of the most ancient evolutionary events currently studied. Some superfamilies have members present in all kingdoms of life, indicating that the last common ancestor of that superfamily was in the last universal common ancestor of all life (LUCA).[21]

Superfamily members may be in different species, with the ancestral protein being the form of the protein that existed in the ancestral species (orthology). Conversely, the proteins may be in the same species, but evolved from a single protein whose gene was duplicated in the genome (paralogy).

Diversification

A majority of proteins contain multiple domains. Between 66% and 80% of eukaryotic proteins have multiple domains, while about 40% to 60% of prokaryotic proteins do.[4] Over time, many of the superfamilies of domains have mixed together. In fact, it is very rare to find “consistently isolated superfamilies”.[4] When domains do combine, the N- to C-terminal domain order (the "domain architecture") is typically well conserved. Additionally, the number of domain combinations seen in nature is small compared to the number of possibilities, suggesting that selection acts on all combinations.[4]
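The gap between observed and possible domain combinations can be sketched by counting ordered pairs of adjacent domains. The example below is a toy calculation under stated assumptions: the protein names, the superfamily labels ("A" to "D") and the architectures are invented, and real analyses draw on annotated genomes, not a hand-written dictionary.

```python
def adjacent_pairs(architecture):
    """Ordered (N-terminal, C-terminal) pairs of neighbouring domains."""
    return list(zip(architecture, architecture[1:]))


def pair_usage(proteins):
    """Return (observed distinct ordered pairs, possible ordered pairs)."""
    superfamilies = {d for arch in proteins.values() for d in arch}
    observed = {pair for arch in proteins.values()
                for pair in adjacent_pairs(arch)}
    # Order matters and a domain may pair with itself, hence n squared.
    possible = len(superfamilies) ** 2
    return len(observed), possible


def multidomain_fraction(proteins):
    """Fraction of proteins with more than one domain."""
    multi = sum(len(arch) > 1 for arch in proteins.values())
    return multi / len(proteins)


# Hypothetical proteome: each protein is a tuple of superfamily labels
# written in N- to C-terminal order.
proteins = {
    "p1": ("A", "B"),
    "p2": ("A", "B"),
    "p3": ("C", "A", "B"),
    "p4": ("D", "D"),
    "p5": ("B",),
    "p6": ("C", "A"),
}

observed, possible = pair_usage(proteins)
print(f"{observed} of {possible} possible ordered pairs observed")
print(f"{multidomain_fraction(proteins):.0%} of proteins are multidomain")
```

In this toy set only 3 of the 16 possible ordered pairs occur, and the pair ("A", "B") is reused in the same order across several proteins. This mirrors, in miniature, the pattern of few combinations and conserved domain order described above.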

Examples

  • α/β hydrolase superfamily: Members share an α/β sheet, containing 8 strands connected by helices, with catalytic triad residues in the same order.[22] Activities include proteases, lipases, peroxidases, esterases, epoxide hydrolases and dehalogenases.[23]
  • Alkaline phosphatase superfamily: Members share an αβα sandwich structure[24] as well as performing common promiscuous reactions by a common mechanism.[25]
  • Globin superfamily: Members share an 8-alpha helix globular globin fold.[26] [27]
  • Immunoglobulin superfamily: Members share a sandwich-like structure of two sheets of antiparallel β strands (Ig-fold), and are involved in recognition, binding, and adhesion.[28] [29]
  • PA clan: Members share a chymotrypsin-like double β-barrel fold and similar proteolysis mechanisms but have sequence identity of <10%. The clan contains both cysteine and serine proteases (different nucleophiles).[2] [30]
  • Ras superfamily: Members share a common catalytic G domain of a 6-strand β sheet surrounded by 5 α-helices.[31]
  • RSH superfamily: Members share the capability to hydrolyze and/or synthesize ppGpp alarmones in the stringent response.[32]
  • Serpin superfamily: Members share a high-energy, stressed fold which can undergo a large conformational change, which is typically used to inhibit serine and cysteine proteases by disrupting their structure.[8]
  • TIM barrel superfamily: Members share a large α8β8 barrel structure. It is one of the most common protein folds, and the monophyly of this superfamily is still contested.[33] [34]

Protein superfamily resources

Several biological databases document protein superfamilies and protein folds.

Similarly, there are algorithms that search the PDB for proteins with structural homology to a target structure.

Notes and References

    1. Holm L, Rosenström P . Dali server: conservation mapping in 3D . Nucleic Acids Research . 38 . Web Server issue . W545–9 . July 2010 . 20457744 . 2896194 . 10.1093/nar/gkq366 .
    2. Rawlings ND, Barrett AJ, Bateman A . MEROPS: the database of proteolytic enzymes, their substrates and inhibitors . Nucleic Acids Research . 40 . Database issue . D343–50 . January 2012 . 22086950 . 3245014 . 10.1093/nar/gkr987 .
    3. Henrissat B, Bairoch A . Updating the sequence-based classification of glycosyl hydrolases . The Biochemical Journal . 316 . Pt 2 . 695–6 . June 1996 . 8687420 . 1217404 . 10.1042/bj3160695 .
    4. Han JH, Batey S, Nickson AA, Teichmann SA, Clarke J . The folding and evolution of multidomain proteins . Nature Reviews Molecular Cell Biology . 8 . 4 . 319–30 . April 2007 . 17356578 . 10.1038/nrm2144 . 13762291 .
    5. Pandit SB, Gosar D, Abhiman S, Sujatha S, Dixit SS, Mhatre NS, Sowdhamini R, Srinivasan N . SUPFAM--a database of potential protein superfamily relationships derived by comparing sequence-based and structure-based families: implications for structural genomics and function annotation in genomes . Nucleic Acids Research . 30 . 1 . 289–93 . January 2002 . 11752317 . 99061 . 10.1093/nar/30.1.289 .
    6. Orengo CA, Thornton JM . Protein families and their evolution-a structural perspective . Annual Review of Biochemistry . 74 . 1 . 867–900 . 2005 . 15954844 . 10.1146/annurev.biochem.74.082803.133029 .
    7. Liu Y, Bahar I . Sequence evolution correlates with structural dynamics . Molecular Biology and Evolution . 29 . 9 . 2253–63 . September 2012 . 22427707 . 3424413 . 10.1093/molbev/mss097 .
    8. Silverman GA, Bird PI, Carrell RW, Church FC, Coughlin PB, Gettins PG, Irving JA, Lomas DA, Luke CJ, Moyer RW, Pemberton PA, Remold-O'Donnell E, Salvesen GS, Travis J, Whisstock JC . The serpins are an expanding superfamily of structurally similar but functionally diverse proteins. Evolution, mechanism of inhibition, novel functions, and a revised nomenclature . The Journal of Biological Chemistry . 276 . 36 . 33293–6 . September 2001 . 11435447 . 10.1074/jbc.R100016200 . free .
    9. Holm L, Laakso LM . Dali server update . Nucleic Acids Research . 44 . W1 . W351–5 . July 2016 . 27131377 . 4987910 . 10.1093/nar/gkw357 .
    10. Pascual-García A, Abia D, Ortiz ÁR, Bastolla U . Cross-Over between Discrete and Continuous Protein Structure Space: Insights into Automatic Classification and Networks of Protein Structures . PLOS Computational Biology . 5 . 3 . 2009 . e1000331 . 10.1371/journal.pcbi.1000331 . 19325884 . 2654728 . 2009PLSCB...5E0331P . free .
    11. Li D, Zhang L, Yin H, Xu H, Satkoski Trask J, Smith DG, Li Y, Yang M, Zhu Q . Evolution of primate α and θ defensins revealed by analysis of genomes . Molecular Biology Reports . 41 . 6 . 3859–66 . June 2014 . 24557891 . 10.1007/s11033-014-3253-z . 14936647 .
    12. Krishna SS, Grishin NV . Structural drift: a possible path to protein fold change . Bioinformatics . 21 . 8 . 1308–10 . April 2005 . 15604105 . 10.1093/bioinformatics/bti227 . free .
    13. Bryan PN, Orban J . Proteins that switch folds . Current Opinion in Structural Biology . 20 . 4 . 482–8 . August 2010 . 20591649 . 2928869 . 10.1016/j.sbi.2010.06.002 .
    14. Echave J, Spielman SJ, Wilke CO . Causes of evolutionary rate variation among protein sites . En . Nature Reviews. Genetics . 17 . 2 . 109–21 . February 2016 . 26781812 . 4724262 . 10.1038/nrg.2015.18 .
    15. Shafee T, Gatti-Lafranconi P, Minter R, Hollfelder F . Handicap-Recover Evolution Leads to a Chemically Versatile, Nucleophile-Permissive Protease . ChemBioChem . 16 . 13 . 1866–1869 . September 2015 . 26097079 . 4576821 . 10.1002/cbic.201500295 .
    16. Buller AR, Townsend CA . Intrinsic evolutionary constraints on protease structure, enzyme acylation, and the identity of the catalytic triad . Proceedings of the National Academy of Sciences of the United States of America . 110 . 8 . E653–61 . February 2013 . 23382230 . 3581919 . 10.1073/pnas.1221050110 . 2013PNAS..110E.653B . free .
    17. Coutinho PM, Deleury E, Davies GJ, Henrissat B . An evolving hierarchical family classification for glycosyltransferases . Journal of Molecular Biology . 328 . 2 . 307–17 . April 2003 . 12691742 . 10.1016/S0022-2836(03)00307-3 .
    18. Zámocký M, Hofbauer S, Schaffner I, Gasselhuber B, Nicolussi A, Soudi M, Pirker KF, Furtmüller PG, Obinger C . Independent evolution of four heme peroxidase superfamilies . Archives of Biochemistry and Biophysics . 574 . 108–19 . May 2015 . 25575902 . 4420034 . 10.1016/j.abb.2014.12.025 .
    19. Akiva E, Brown S, Almonacid DE, Barber AE, Custer AF, Hicks MA, Huang CC, Lauck F, Mashiyama ST . The Structure–Function Linkage Database . Nucleic Acids Research . 42 . D1 . D521–D530 . November 2013 . 24271399 . 3965090 . 10.1093/nar/gkt1130 .
    20. Shakhnovich BE, Deeds E, Delisi C, Shakhnovich E . Protein structure and evolutionary history determine sequence space topology . Genome Research . 15 . 3 . 385–92 . March 2005 . 15741509 . 551565 . 10.1101/gr.3133605 . q-bio/0404040 .
    21. Ranea JA, Sillero A, Thornton JM, Orengo CA . Protein superfamily evolution and the last universal common ancestor (LUCA) . Journal of Molecular Evolution . 63 . 4 . 513–25 . October 2006 . 17021929 . 10.1007/s00239-005-0289-7 . 2006JMolE..63..513R . 10261/78338 . 25258028 .
    22. Carr PD, Ollis DL . Alpha/beta hydrolase fold: an update . Protein and Peptide Letters . 16 . 10 . 1137–48 . 2009 . 19508187 . 10.2174/092986609789071298.
    23. Nardini M, Dijkstra BW . Alpha/beta hydrolase fold enzymes: the family keeps growing . Current Opinion in Structural Biology . 9 . 6 . 732–7 . December 1999 . 10607665 . 10.1016/S0959-440X(99)00037-8 .
    24. Web site: SCOP. 28 May 2014. https://web.archive.org/web/20140729042732/http://scop.mrc-lmb.cam.ac.uk/scop/data/scop.b.d.bah.A.html. 29 July 2014. dead.
    25. Mohamed MF, Hollfelder F . Efficient, crosswise catalytic promiscuity among enzymes that catalyze phosphoryl transfer . Biochimica et Biophysica Acta (BBA) - Proteins and Proteomics . 1834 . 1 . 417–24 . January 2013 . 22885024 . 10.1016/j.bbapap.2012.07.015 .
    26. Branden C, Tooze J . Introduction to protein structure . 2nd . Garland Pub. . New York . 1999 . 978-0815323051 .
    27. Bolognesi M, Onesti S, Gatti G, Coda A, Ascenzi P, Brunori M . Aplysia limacina myoglobin. Crystallographic analysis at 1.6 A resolution . Journal of Molecular Biology . 205 . 3 . 529–44 . February 1989 . 2926816 . 10.1016/0022-2836(89)90224-6 .
    28. Bork P, Holm L, Sander C . The immunoglobulin fold. Structural classification, sequence patterns and common core . Journal of Molecular Biology . 242 . 4 . 309–20 . September 1994 . 7932691 . 10.1006/jmbi.1994.1582 .
    29. Brümmendorf T, Rathjen FG . Cell adhesion molecules 1: immunoglobulin superfamily . Protein Profile . 2 . 9 . 963–1108 . 1995 . 8574878 .
    30. Bazan JF, Fletterick RJ . Viral cysteine proteases are homologous to the trypsin-like family of serine proteases: structural and functional implications . Proceedings of the National Academy of Sciences of the United States of America . 85 . 21 . 7872–6 . November 1988 . 3186696 . 282299 . 10.1073/pnas.85.21.7872 . 1988PNAS...85.7872B . free .
    31. Vetter IR, Wittinghofer A . The guanine nucleotide-binding switch in three dimensions . Science . 294 . 5545 . 1299–304 . November 2001 . 11701921 . 10.1126/science.1062023 . 2001Sci...294.1299V . 6636339 .
    32. Atkinson GC, Tenson T, Hauryliuk V . The RelA/SpoT Homolog (RSH) Superfamily: Distribution and Functional Evolution of ppGpp Synthetases and Hydrolases across the Tree of Life . PLOS ONE . 6 . 8 . e23479 . August 2011 . 21858139 . 3153485 . 10.1371/journal.pone.0023479 . 2011PLoSO...623479A . free .
    33. Nagano N, Orengo CA, Thornton JM . One fold with many functions: the evolutionary relationships between TIM barrel families based on their sequences, structures and functions . Journal of Molecular Biology . 321 . 5 . 741–65 . August 2002 . 12206759 . 10.1016/s0022-2836(02)00649-6 .
    34. Farber G . An α/β-barrel full of evolutionary trouble . Current Opinion in Structural Biology . 1993 . 3 . 3 . 409–412 . 10.1016/S0959-440X(05)80114-9.