Neutral network (evolution)

A neutral network is a set of genes all related by point mutations that have equivalent function or fitness.^[1] Each node represents a gene sequence and each line represents the mutation connecting two sequences. Neutral networks can be thought of as high, flat plateaus in a fitness landscape. During neutral evolution, genes can randomly move through neutral networks and traverse regions of sequence space which may have consequences for robustness and evolvability.

Genetic and molecular causes

See also: Neutral mutation and Robustness (evolution).

Neutral networks exist in fitness landscapes since proteins are robust to mutations. This leads to extended networks of genes of equivalent function, linked by neutral mutations.^[2] ^[3] Proteins are resistant to mutations because many sequences can fold into highly similar structural folds.^[4] A protein adopts a limited ensemble of native conformations because those conformers have lower energy than unfolded and mis-folded states (ΔΔG of folding).^[5] ^[6] This is achieved by a distributed, internal network of cooperative interactions (hydrophobic, polar and covalent).^[7] Protein structural robustness results from few single mutations being sufficiently disruptive to compromise function. Proteins have also evolved to avoid aggregation^[8] as partially folded proteins can combine to form large, repeating, insoluble protein fibrils and masses.^[9] There is evidence that proteins show negative design features to reduce the exposure of aggregation-prone beta-sheet motifs in their structures.^[10] Additionally, there is some evidence that the genetic code itself may be optimised such that most point mutations lead to similar amino acids (conservative).^[11] ^[12] Together these factors create a distribution of fitness effects of mutations that contains a high proportion of neutral and nearly-neutral mutations.^[13]

Evolution

Neutral networks are a subset of the sequences in sequence space that have equivalent function, and so form a wide, flat plateau in a fitness landscape. Neutral evolution can therefore be visualised as a population diffusing from one set of sequence nodes, through the neutral network, to another cluster of sequence nodes. Since the majority of evolution is thought to be neutral,^[14] ^[15] a large proportion of gene change is movement through expansive neutral networks.
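As a rough illustration of this diffusion picture, the sketch below lets a single genotype take a random walk in which only neutral point mutations are accepted. The binary genotype, the `toy_fitness` rule (the first `CORE_SITES` positions tolerate at most one mutation) and every name in the code are assumptions made for this example; none of them comes from the cited models.

```python
import random

CORE_SITES = 4  # hypothetical: sites where mutations can destroy function


def toy_fitness(genotype):
    """Toy plateau: fitness 1.0 if at most one core site is mutated, else 0.0."""
    return 1.0 if sum(genotype[:CORE_SITES]) <= 1 else 0.0


def neutral_walk(start, steps, fitness, rng):
    """Random walk that accepts a point mutation only when fitness is unchanged."""
    current = tuple(start)
    path = [current]
    for _ in range(steps):
        pos = rng.randrange(len(current))
        # Flip one site: a single point mutation of the binary genotype.
        mutant = current[:pos] + (1 - current[pos],) + current[pos + 1:]
        if fitness(mutant) == fitness(current):
            current = mutant  # neutral step: move along the network
        path.append(current)  # rejected mutations leave the walker in place
    return path


if __name__ == "__main__":
    walk = neutral_walk((0,) * 10, 200, toy_fitness, random.Random(0))
    print("distinct genotypes visited:", len(set(walk)))
```

Every genotype on the returned path has the same fitness as the starting point, so the walk never leaves the plateau even though it may end far from where it began in sequence space.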

Robustness

Evolvability

See also: Evolvability.

Interest in the interplay between genetic drift and selection has been around since the 1930s when the shifting-balance theory proposed that in some situations, genetic drift could facilitate later adaptive evolution.^[20] Although the specifics of the theory were largely discredited,^[21] it drew attention to the possibility that drift could generate cryptic variation that, though neutral to current function, may affect selection for new functions (evolvability).^[22]

By definition, all genes in a neutral network have equivalent function; however, some may exhibit promiscuous activities which could serve as starting points for adaptive evolution towards new functions.^[23] ^[24] In terms of sequence space, current theories predict that if the neutral networks for two different activities overlap, a neutrally evolving population may diffuse to regions of the neutral network of the first activity that allow it to access the second.^[25] This would only be the case when the distance between activities is smaller than the distance that a neutrally evolving population can cover. The degree of interpenetration of the two networks will determine how common cryptic variation for the promiscuous activity is in sequence space.^[26]

Mathematical Framework

The idea that neutral mutations were probably widespread was proposed by Freese and Yoshida in 1965.^[27] Motoo Kimura later crystallized a theory of neutral evolution in 1968,^[28] with King and Jukes independently proposing a similar theory in 1969.^[29] Kimura estimated the rate of nucleotide substitution in a population, expressed as the average time for one base-pair replacement to occur within a genome, and found it to be about 1.8 years. According to Haldane's formula, such a high rate would not be tolerated by any mammalian population. He thus concluded that, in mammals, neutral (or nearly neutral) nucleotide substitutions in DNA must dominate. He computed that such mutations were occurring at a rate of roughly 0.5 per year per gamete.

In later years a new paradigm emerged that placed RNA as a precursor molecule to DNA. A primordial molecule principle was put forth as early as 1968 by Crick,^[30] and led to what is now known as the RNA world hypothesis.^[31] DNA is found predominantly as fully base-paired double helices, while biological RNA is single-stranded and often exhibits complex base-pairing interactions. These interactions are due to RNA's increased ability to form hydrogen bonds, which stems from the extra hydroxyl group in the ribose sugar.

In the 1970s, P. R. Stein and M. S. Waterman laid the groundwork for the combinatorics of RNA secondary structures.^[32] Waterman gave the first graph-theoretic description of RNA secondary structures and their associated properties, and used them to produce an efficient minimum free energy (MFE) folding algorithm.^[33] An RNA secondary structure can be viewed as a diagram over N labeled vertices with its Watson-Crick base pairs represented as non-crossing arcs in the upper half plane. A secondary structure is therefore a scaffold with which many sequences are compatible, namely all those that satisfy its implied base-pairing constraints. Later, Smith and Waterman developed an algorithm that performed local sequence alignment. Another prediction algorithm for RNA secondary structure was given by Nussinov et al.^[34] Their algorithm described the folding problem over a two-letter alphabet as a planar graph optimization problem, where the quantity to be maximized is the number of matchings in the sequence string.
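The base-pair maximization idea can be sketched as a short dynamic program. This is a minimal version in the spirit of the matching formulation described above, not a reproduction of the published algorithm: the four-letter alphabet with Watson-Crick pairs only, the `min_loop` parameter (minimum number of unpaired bases enclosed by a pair) and the function name are assumptions made for illustration.

```python
# Assumed pairing rule: Watson-Crick pairs only (no G-U wobble pairs).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def max_base_pairs(seq, min_loop=0):
    """Maximum number of non-crossing base pairs that seq can form."""
    n = len(seq)
    if n == 0:
        return 0
    # best[i][j] holds the optimum for the subsequence seq[i..j].
    best = [[0] * n for _ in range(n)]
    for span in range(1, n):
        for i in range(n - span):
            j = i + span
            score = best[i][j - 1]  # case 1: position j stays unpaired
            # case 2: j pairs with k, leaving at least min_loop bases between
            for k in range(i, j - min_loop):
                if (seq[k], seq[j]) in PAIRS:
                    left = best[i][k - 1] if k > i else 0
                    inner = best[k + 1][j - 1] if k + 1 <= j - 1 else 0
                    score = max(score, left + 1 + inner)
            best[i][j] = score
    return best[0][n - 1]


if __name__ == "__main__":
    print(max_base_pairs("GGGAAACCC", min_loop=3))
```

Because the arcs are required to be non-crossing, pairing position j with position k splits the problem into two independent subproblems (left of k, and between k and j), which is what makes the recursion polynomial in the sequence length.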

In 1980, Howell et al. computed a generating function of all foldings of a sequence,^[35] and D. Sankoff (1985) described algorithms for the alignment of finite sequences, the prediction of RNA secondary structures (folding), and the reconstruction of proto-sequences on a phylogenetic tree.^[36] Waterman and Smith (1986) then produced a polynomial-time dynamic programming (DP) algorithm for predicting general RNA secondary structure,^[37] and in 1990 John McCaskill presented a polynomial-time DP algorithm for computing the full equilibrium partition function of an RNA secondary structure.^[38] This shifted the dominant calculation in RNA folding from a mapping of a sequence to one particular structure to a mapping of a sequence to a whole weighted ensemble of structures. Because RNA fitness depends on sequence through folding, the ensemble view smooths the fitness landscape and allows for more nearly neutral networks.
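The shift from a single optimal structure to a weighted ensemble can be illustrated with a partition-function recursion of the same shape as a base-pair maximization, with the maximum replaced by a sum over Boltzmann weights. The sketch below is a deliberately simplified stand-in, not McCaskill's algorithm with a realistic loop-based energy model: the constant energy per base pair (`pair_energy`), the value of `rt`, the Watson-Crick-only pairing rule and all names are assumptions.

```python
import math

# Assumed pairing rule: Watson-Crick pairs only.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def partition_function(seq, pair_energy=-1.0, rt=0.616, min_loop=0):
    """Sum of Boltzmann weights over all non-crossing structures of seq.

    Toy energy model (an assumption): every base pair contributes
    pair_energy, in the same units as rt, and unpaired bases contribute 0.
    """
    n = len(seq)
    weight = math.exp(-pair_energy / rt)  # Boltzmann factor of one pair
    table = [[1.0] * n for _ in range(n)]

    def q(i, j):
        # An empty interval has exactly one structure: the empty one.
        return table[i][j] if i <= j else 1.0

    for span in range(1, n):
        for i in range(n - span):
            j = i + span
            total = q(i, j - 1)  # structures in which j is unpaired
            for k in range(i, j - min_loop):
                if (seq[k], seq[j]) in PAIRS:
                    # j pairs with k: the two sides fold independently
                    total += q(i, k - 1) * q(k + 1, j - 1) * weight
            table[i][j] = total
    return q(0, n - 1)


if __name__ == "__main__":
    z = partition_function("GGGAAACCC", min_loop=3)
    print("Z =", z, " P(open chain) =", 1.0 / z)
```

Setting `pair_energy=0.0` makes every weight equal to one, so the function then simply counts the structures compatible with the sequence; with a negative pair energy, the probability of any single structure is its Boltzmann weight divided by the returned value.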

M. Zuker implemented algorithms for the computation of MFE RNA secondary structures^[39] based on the work of Nussinov et al., Smith and Waterman^[40] and Studnicka et al.^[41] Later, I. L. Hofacker et al. (1994)^[42] presented the Vienna RNA package, a software package that integrated MFE folding with the computation of the partition function and of base-pairing probabilities.

Peter Schuster and W. Fontana (1994) shifted the focus towards sequence-to-structure (genotype–phenotype) maps. They used an inverse folding algorithm to produce computational evidence that RNA sequences sharing the same structure are distributed randomly in sequence space. They also observed that common structures can be reached from a random sequence by just a few mutations. These two facts led them to conclude that sequence space seemed to be percolated by neutral networks of nearest-neighbor mutants that fold to the same structure.^[43]
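To make the notion of neutral nearest-neighbor mutants concrete, the sketch below folds a sequence with a toy base-pair maximization and then lists the one-point mutants that fold to the same structure. It is a simplified stand-in for a thermodynamic folding program: the Watson-Crick-only pairing rule, the `min_loop` value, the tie-breaking in the traceback and all function names are assumptions, so the numbers it produces are illustrative only.

```python
# Assumed pairing rule: Watson-Crick pairs only.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def fold(seq, min_loop=3):
    """Toy fold: one structure with the maximum number of base pairs."""
    n = len(seq)
    best = [[0] * n for _ in range(n)]
    choice = [[None] * n for _ in range(n)]  # partner of j in seq[i..j], if any
    for span in range(1, n):
        for i in range(n - span):
            j = i + span
            score, pick = best[i][j - 1], None
            for k in range(i, j - min_loop):
                if (seq[k], seq[j]) in PAIRS:
                    left = best[i][k - 1] if k > i else 0
                    inner = best[k + 1][j - 1] if k + 1 <= j - 1 else 0
                    if left + 1 + inner > score:  # ties keep the earlier option
                        score, pick = left + 1 + inner, k
            best[i][j], choice[i][j] = score, pick
    # Trace back the recorded choices into dot-bracket notation.
    structure = ["."] * n
    stack = [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if i >= j:
            continue
        k = choice[i][j]
        if k is None:
            stack.append((i, j - 1))
        else:
            structure[k], structure[j] = "(", ")"
            stack.append((i, k - 1))
            stack.append((k + 1, j - 1))
    return "".join(structure)


def neutral_neighbors(seq, min_loop=3):
    """One-point mutants of seq that fold to the same toy structure."""
    target = fold(seq, min_loop)
    mutants = []
    for pos, old in enumerate(seq):
        for new in "ACGU":
            if new != old:
                mutant = seq[:pos] + new + seq[pos + 1:]
                if fold(mutant, min_loop) == target:
                    mutants.append(mutant)
    return mutants


def neutral_fraction(seq, min_loop=3):
    """Fraction of the 3 * len(seq) one-point mutants that are neutral."""
    total = 3 * len(seq)
    return len(neutral_neighbors(seq, min_loop)) / total if total else 0.0


if __name__ == "__main__":
    print(fold("GGGAAACCC"), neutral_fraction("GGGAAACCC"))
```

Repeating the neighbor search from each neutral mutant in turn would trace out the connected part of the neutral network that contains the starting sequence.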

In 1997, C. Reidys, Stadler and Schuster laid the mathematical foundations for the study and modelling of neutral networks of RNA secondary structures. Using a random graph model, they proved the existence of a threshold value for the connectivity of random sub-graphs in a configuration space, parametrized by λ, the fraction of neutral neighbors. They showed that the networks are connected and percolate sequence space if the fraction of neutral nearest neighbors exceeds a threshold value λ*. Below this threshold the networks are partitioned into one giant component and several smaller ones. Key results of this analysis were concerned with threshold functions for the density and connectivity of neutral networks, as well as with Schuster's shape space conjecture.^[44] ^[45]
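A small simulation can give a feel for this kind of random graph model. The sketch below assumes the simplest reading of it: each sequence of a given length is kept independently with probability `lam`, and two kept sequences are joined when they differ at exactly one position. The function names, the tiny sequence lengths and the sampling scheme are assumptions for illustration, and the code does not reproduce the analytical threshold.

```python
import itertools
import random
from collections import deque


def sample_vertices(length, alphabet, lam, rng):
    """Keep each sequence of the given length independently with probability lam."""
    return {s for s in itertools.product(alphabet, repeat=length)
            if rng.random() < lam}


def component_sizes(vertices, alphabet):
    """Sizes of the connected components of the induced subgraph, largest first.

    Two kept sequences are adjacent when they differ at exactly one position.
    """
    unseen = set(vertices)
    sizes = []
    while unseen:
        start = unseen.pop()
        queue, size = deque([start]), 1
        while queue:  # breadth-first search over point-mutation neighbors
            v = queue.popleft()
            for pos in range(len(v)):
                for a in alphabet:
                    if a != v[pos]:
                        w = v[:pos] + (a,) + v[pos + 1:]
                        if w in unseen:
                            unseen.remove(w)
                            queue.append(w)
                            size += 1
        sizes.append(size)
    return sorted(sizes, reverse=True)


if __name__ == "__main__":
    rng = random.Random(42)
    for lam in (0.1, 0.3, 0.5, 0.7):
        kept = sample_vertices(6, "ACGU", lam, rng)
        sizes = component_sizes(kept, "ACGU")
        print(lam, "components:", len(sizes), "largest:", sizes[0] if sizes else 0)
```

Sweeping `lam` in this way shows how the kept sequences fall into fewer and larger components as the fraction of neutral neighbors grows, which is the qualitative behavior that the threshold result makes precise.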

Notes and References

1. van Nimwegen, E; Crutchfield, JP; Huynen, M (Aug 17, 1999). Neutral evolution of mutational robustness. Proceedings of the National Academy of Sciences of the United States of America. 96 (17): 9716–20. doi:10.1073/pnas.96.17.9716. PMID 10449760. PMC 22276. arXiv:adap-org/9903006. Bibcode:1999PNAS...96.9716V.
2. Taverna, DM; Goldstein, RA (Jan 18, 2002). Why are proteins so robust to site mutations? Journal of Molecular Biology. 315 (3): 479–84. doi:10.1006/jmbi.2001.5226. PMID 11786027.
3. Tokuriki, N; Tawfik, DS (Oct 2009). Stability effects of mutations and protein evolvability. Current Opinion in Structural Biology. 19 (5): 596–604. doi:10.1016/j.sbi.2009.08.003. PMID 19765975.
4. Meyerguz, L; Kleinberg, J; Elber, R (Jul 10, 2007). The network of sequence flow between protein structures. Proceedings of the National Academy of Sciences of the United States of America. 104 (28): 11627–32. doi:10.1073/pnas.0701393104. PMID 17596339. PMC 1913895. Bibcode:2007PNAS..10411627M.
5. Karplus, M (Jun 17, 2011). Behind the folding funnel diagram. Nature Chemical Biology. 7 (7): 401–4. doi:10.1038/nchembio.565. PMID 21685880.
6. Tokuriki, N; Stricher, F; Schymkowitz, J; Serrano, L; Tawfik, DS (Jun 22, 2007). The stability effects of protein mutations appear to be universally distributed. Journal of Molecular Biology. 369 (5): 1318–32. doi:10.1016/j.jmb.2007.03.069. PMID 17482644. S2CID 24638570.
7. Shakhnovich, BE; Deeds, E; Delisi, C; Shakhnovich, E (Mar 2005). Protein structure and evolutionary history determine sequence space topology. Genome Research. 15 (3): 385–92. doi:10.1101/gr.3133605. PMID 15741509. PMC 551565. arXiv:q-bio/0404040.
8. Monsellier, E; Chiti, F (Aug 2007). Prevention of amyloid-like aggregation as a driving force of protein evolution. EMBO Reports. 8 (8): 737–42. doi:10.1038/sj.embor.7401034. PMID 17668004. PMC 1978086.
9. Fink, AL (1998). Protein aggregation: folding aggregates, inclusion bodies and amyloid. Folding & Design. 3 (1): R9–23. doi:10.1016/s1359-0278(98)00002-9. PMID 9502314.
10. Richardson, JS; Richardson, DC (Mar 5, 2002). Natural beta-sheet proteins use negative design to avoid edge-to-edge aggregation. Proceedings of the National Academy of Sciences of the United States of America. 99 (5): 2754–9. doi:10.1073/pnas.052706099. PMID 11880627. PMC 122420. Bibcode:2002PNAS...99.2754R.
11. Müller, MM; Allison, JR; Hongdilokkul, N; Gaillon, L; Kast, P; van Gunsteren, WF; Marlière, P; Hilvert, D (2013). Directed evolution of a model primordial enzyme provides insights into the development of the genetic code. PLOS Genetics. 9 (1): e1003187. doi:10.1371/journal.pgen.1003187. PMID 23300488. PMC 3536711.
12. Firnberg, E; Ostermeier, M (Aug 2013). The genetic code constrains yet facilitates Darwinian evolution. Nucleic Acids Research. 41 (15): 7420–8. doi:10.1093/nar/gkt536. PMID 23754851. PMC 3753648.
13. Hietpas, RT; Jensen, JD; Bolon, DN (May 10, 2011). Experimental illumination of a fitness landscape. Proceedings of the National Academy of Sciences of the United States of America. 108 (19): 7896–901. doi:10.1073/pnas.1016024108. PMID 21464309. PMC 3093508. Bibcode:2011PNAS..108.7896H.
14. Kimura, Motoo (1983). The neutral theory of molecular evolution. Cambridge
15. Kimura, M (1968). Evolutionary Rate at the Molecular Level. Nature. 217 (5129): 624–6. doi:10.1038/217624a0. PMID 5637732. Bibcode:1968Natur.217..624K. S2CID 4161261.
16. Proulx, SR; Adler, FR (Jul 2010). The standard of neutrality: still flapping in the breeze? Journal of Evolutionary Biology. 23 (7): 1339–50. doi:10.1111/j.1420-9101.2010.02006.x. PMID 20492093. S2CID 7774510.
17. van Nimwegen, E; Crutchfield, JP; Huynen, M (1999). Neutral evolution of mutational robustness. PNAS. 96 (17): 9716–9720. doi:10.1073/pnas.96.17.9716. PMID 10449760. PMC 22276. Bibcode:1999PNAS...96.9716V. arXiv:adap-org/9903006.
18. Bloom, JD; Lu, Z; Chen, D; Raval, A; Venturelli, OS; Arnold, FH (Jul 17, 2007). Evolution favors protein mutational robustness in sufficiently large populations. BMC Biology. 5: 29. doi:10.1186/1741-7007-5-29. PMID 17640347. PMC 1995189. arXiv:0704.1885.
19. Bershtein, Shimon; Goldin, Korina; Tawfik, Dan S. (June 2008). Intense Neutral Drifts Yield Robust and Evolvable Consensus Proteins. Journal of Molecular Biology. 379 (5): 1029–1044. doi:10.1016/j.jmb.2008.04.024. PMID 18495157.
20. Wright, Sewall (1932). The roles of mutation, inbreeding, crossbreeding and selection in evolution. Proceedings of the Sixth International Congress of Genetics: 356–366.
21. Coyne, JA; Barton, NH; Turelli, M (1997). Perspective: a critique of Sewall Wright's shifting balance theory of evolution. Evolution. 51 (3): 643–671. doi:10.2307/2411143. PMID 28568586. JSTOR 2411143.
22. Davies, E. K. (10 September 1999). High Frequency of Cryptic Deleterious Mutations in Caenorhabditis elegans. Science. 285 (5434): 1748–1751. doi:10.1126/science.285.5434.1748. PMID 10481013.
23. Masel, J (Mar 2006). Cryptic genetic variation is enriched for potential adaptations. Genetics. 172 (3): 1985–91. doi:10.1534/genetics.105.051649. PMID 16387877. PMC 1456269.
24. Hayden, EJ; Ferrada, E; Wagner, A (Jun 2, 2011). Cryptic genetic variation promotes rapid evolutionary adaptation in an RNA enzyme. Nature. 474 (7349): 92–5. doi:10.1038/nature10083. PMID 21637259. S2CID 4390213.
25. Bornberg-Bauer, E; Huylmans, AK; Sikosek, T (Jun 2010). How do new proteins arise? Current Opinion in Structural Biology. 20 (3): 390–6. doi:10.1016/j.sbi.2010.02.005. PMID 20347587.
26. Wagner, Andreas (2011-07-14). The origins of evolutionary innovations: a theory of transformative change in living systems. Oxford [etc.]: Oxford University Press. ISBN 978-0199692590.
27. Freese, E; Yoshida, A (1965). The role of mutations in evolution. In V Bryson and H J Vogel, eds. Evolving Genes and Proteins, pp. 341–55. Academic, New York.
28. Kimura, M (1968). Evolutionary Rate at the Molecular Level. Nature. 217 (5129): 624–6. doi:10.1038/217624a0. PMID 5637732. Bibcode:1968Natur.217..624K. S2CID 4161261.
29. King, JL; Jukes, TH (1969). Non-Darwinian Evolution. Science. 164 (3881): 788–97. doi:10.1126/science.164.3881.788. PMID 5767777. Bibcode:1969Sci...164..788L.
30. Crick, FH (1968). The origin of the genetic code. Journal of Molecular Biology. 38 (3): 367–79. doi:10.1016/0022-2836(68)90392-6. PMID 4887876.
31. Robertson, MP; Joyce, GF (2012). The origins of the RNA world. Cold Spring Harbor Perspectives in Biology. 4 (5): a003608. doi:10.1101/cshperspect.a003608. PMID 20739415. PMC 3331698.
32. Stein, PR; Waterman, MS (1978). On some new sequences generalizing the Catalan and Motzkin numbers. Discrete Math. 26 (3): 261–272. doi:10.1016/0012-365x(79)90033-5.
33. Waterman, MS (1978). Secondary structure of single-stranded nucleic acids. Adv. Math. I (suppl.). 1: 167–212.
34. Nussinov; et al. (1978). Algorithms for Loop Matchings. SIAM Journal on Applied Mathematics. 35 (1): 68–82. doi:10.1137/0135006. JSTOR 2101031.
35. Howell, JA; Smith, TF; Waterman, MS (1980). Computation of generating functions for biological molecules. SIAM J. Appl. Math. 39: 119–133. doi:10.1137/0139010.
36. Sankoff, David (October 1985). Simultaneous Solution of the RNA Folding, Alignment and Protosequence Problems. SIAM Journal on Applied Mathematics. 45 (5): 810–825. doi:10.1137/0145048.
37. Waterman, MS; Smith, TF (1986). Rapid dynamic programming algorithms for RNA secondary structure. Adv. Appl. Math. 7 (4): 455–464. doi:10.1016/0196-8858(86)90025-4.
38. McCaskill, John (1990). The Equilibrium Partition Function and Base Pair Binding Probabilities for RNA Secondary Structure. Biopolymers. 29 (6–7): 1105–19. doi:10.1002/bip.360290621. PMID 1695107. hdl:11858/00-001M-0000-0013-0DE3-9. S2CID 12629688.
39. Zuker, Michael; Stiegler, Patrick (1981). Optimal Computer Folding of Large RNA Sequences Using Thermodynamics. Nucleic Acids Research. 9 (1): 133–148. doi:10.1093/nar/9.1.133. PMC 326673. PMID 6163133.
40. Smith, Temple F.; Waterman, Michael S. (1981). Identification of common molecular subsequences. Journal of Molecular Biology. 147 (1): 195–197. doi:10.1016/0022-2836(81)90087-5. PMID 7265238.
41. Studnicka, Gary M.; Rahn, Georgia M.; Cummings, Ian W.; Salser, Winston A. (1978-09-01). Computer method for predicting the secondary structure of single-stranded RNA. Nucleic Acids Research. 5 (9): 3365–3388. doi:10.1093/nar/5.9.3365. PMID 100768. PMC 342256. ISSN 0305-1048.
42. Hofacker, IL; Fontana, W; Stadler, PF; et al. (1994). Fast folding and comparison of RNA secondary structures. Monatsh Chem. 125 (2): 167. doi:10.1007/BF00818163. S2CID 19344304.
43. Schuster, Peter; Fontana, Walter; Stadler, Peter F.; Hofacker, Ivo L. (1994-03-22). From Sequences to Shapes and Back: A Case Study in RNA Secondary Structures. Proceedings of the Royal Society of London B: Biological Sciences. 255 (1344): 279–284. doi:10.1098/rspb.1994.0040. ISSN 0962-8452. PMID 7517565. Bibcode:1994RSPSB.255..279S. S2CID 12021473.
44. Web site: Neutral networks of RNA Secondary Structures.
45. Hofacker, Ivo L.; Schuster, Peter; Stadler, Peter F. (1998). Combinatorics of RNA secondary structures. Discrete Applied Mathematics. 88 (1–3): 207–237. doi:10.1016/s0166-218x(98)00073-0.