A neutral network is a set of genes all related by point mutations that have equivalent function or fitness.[1] Each node represents a gene sequence and each line represents the mutation connecting two sequences. Neutral networks can be thought of as high, flat plateaus in a fitness landscape. During neutral evolution, genes can randomly move through neutral networks and traverse regions of sequence space which may have consequences for robustness and evolvability.
See also: Neutral mutation and Robustness (evolution). Neutral networks exist in fitness landscapes since proteins are robust to mutations. This leads to extended networks of genes of equivalent function, linked by neutral mutations.[2] [3] Proteins are resistant to mutations because many sequences can fold into highly similar structural folds.[4] A protein adopts a limited ensemble of native conformations because those conformers have lower energy than unfolded and mis-folded states (ΔΔG of folding).[5] [6] This is achieved by a distributed, internal network of cooperative interactions (hydrophobic, polar and covalent).[7] Protein structural robustness results from few single mutations being sufficiently disruptive to compromise function. Proteins have also evolved to avoid aggregation[8] as partially folded proteins can combine to form large, repeating, insoluble protein fibrils and masses.[9] There is evidence that proteins show negative design features to reduce the exposure of aggregation-prone beta-sheet motifs in their structures.[10] Additionally, there is some evidence that the genetic code itself may be optimised such that most point mutations lead to similar amino acids (conservative).[11] [12] Together these factors create a distribution of fitness effects of mutations that contains a high proportion of neutral and nearly-neutral mutations.[13]
Neutral networks are a subset of the sequences in sequence space that have equivalent function, and so form a wide, flat plateau in a fitness landscape. Neutral evolution can therefore be visualised as a population diffusing from one set of sequence nodes, through the neutral network, to another cluster of sequence nodes. Since the majority of evolution is thought to be neutral,[14] [15] a large proportion of gene change is the movement through expansive neutral networks.
See also: Robustness (evolution).
The more neutral neighbours a sequence has, the more robust to mutations it is since mutations are more likely to simply neutrally convert it into an equally functional sequence.[1] Indeed, if there are large differences between the number of neutral neighbours of different sequences within a neutral network, the population is predicted to evolve towards these robust sequences. This is sometimes called circum-neutrality and represents the movement of populations away from cliffs in the fitness landscape.[16]
In addition to in silico models,[17] these processes are beginning to be confirmed by experimental evolution of cytochrome P450s[18] and β-lactamase.[19]
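The link between neutral neighbours and robustness can be illustrated with a minimal sketch. It assumes a four-letter RNA alphabet and a purely hypothetical phenotype rule (`toy_phenotype`, which calls a sequence functional when at least half of its bases are G or C); neither the rule nor the function names come from the cited studies. Robustness is taken here as the fraction of one-mutant neighbours that keep the same phenotype.

```python
ALPHABET = "ACGU"  # assumed alphabet for this illustration


def neighbours(seq, alphabet=ALPHABET):
    """Yield every sequence that differs from seq by one point mutation."""
    for i, old in enumerate(seq):
        for new in alphabet:
            if new != old:
                yield seq[:i] + new + seq[i + 1:]


def robustness(seq, phenotype, alphabet=ALPHABET):
    """Fraction of one-mutant neighbours sharing the phenotype of seq."""
    target = phenotype(seq)
    mutants = list(neighbours(seq, alphabet))
    neutral = sum(phenotype(m) == target for m in mutants)
    return neutral / len(mutants)


def toy_phenotype(seq):
    """Hypothetical rule: functional if at least half the bases are G or C."""
    gc = sum(base in "GC" for base in seq)
    return 2 * gc >= len(seq)


if __name__ == "__main__":
    # A sequence deep inside the toy network versus one at its edge.
    for s in ("GGGG", "GGAA"):
        print(s, robustness(s, toy_phenotype))
```

Under this toy rule, a sequence far from the boundary of the functional set has only neutral neighbours, while a sequence at the boundary loses function under some mutations, which is the asymmetry the text describes as driving populations away from cliffs.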
See also: Evolvability. Interest in the interplay between genetic drift and selection has been around since the 1930s when the shifting-balance theory proposed that in some situations, genetic drift could facilitate later adaptive evolution.[20] Although the specifics of the theory were largely discredited,[21] it drew attention to the possibility that drift could generate cryptic variation that, though neutral to current function, may affect selection for new functions (evolvability).[22]
By definition, all genes in a neutral network have equivalent function; however, some may exhibit promiscuous activities which could serve as starting points for adaptive evolution towards new functions.[23] [24] In terms of sequence space, current theories predict that if the neutral networks for two different activities overlap, a neutrally evolving population may diffuse to regions of the neutral network of the first activity that allow it to access the second.[25] This would only be the case when the distance between activities is smaller than the distance that a neutrally evolving population can cover. The degree of interpenetration of the two networks will determine how common cryptic variation for the promiscuous activity is in sequence space.[26]
The idea that neutral mutations were probably widespread was proposed by Freese and Yoshida in 1965.[27] Motoo Kimura crystallized a theory of neutral evolution in 1968,[28] with King and Jukes independently proposing a similar theory in 1969.[29] Kimura computed the rate of nucleotide substitutions in a population, expressed as the average time for one base pair replacement to occur within a genome, and found it to be one substitution every ~1.8 years. Such a high rate would not be tolerated by any mammalian population according to Haldane's formula. He thus concluded that, in mammals, neutral (or nearly neutral) nucleotide substitution mutations of DNA must dominate. He computed that such mutations were occurring at the rate of roughly 0.5 per year per gamete.
In later years, a new paradigm emerged that placed RNA as a precursor molecule to DNA. A primordial molecule principle was put forth as early as 1968 by Crick[30] and led to what is now known as the RNA world hypothesis.[31] DNA is found predominantly as fully base-paired double helices, while biological RNA is single-stranded and often exhibits complex base-pairing interactions. These are due to its increased ability to form hydrogen bonds, which stems from the extra hydroxyl group in the ribose sugar.
In the 1970s, Stein and M. Waterman laid the groundwork for the combinatorics of RNA secondary structures.[32] Waterman gave the first graph theoretic description of RNA secondary structures and their associated properties, and used them to produce an efficient minimum free energy (MFE) folding algorithm.[33] An RNA secondary structure can be viewed as a diagram over N labeled vertices with its Watson-Crick base pairs represented as non-crossing arcs in the upper half plane. A secondary structure is therefore a scaffold with which many sequences are compatible, namely all those that satisfy its implied base pairing constraints. Later, Smith and Waterman developed an algorithm that performed local sequence alignment. Another prediction algorithm for RNA secondary structure was given by Nussinov.[34] Nussinov's algorithm described the folding problem over a two letter alphabet as a planar graph optimization problem, where the quantity to be maximized is the number of matchings in the sequence string.
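The base pair maximization idea can be sketched as a short dynamic program. This is an illustrative version in the style of Nussinov's algorithm, not a reproduction of the published method: it assumes the four-letter RNA alphabet with Watson-Crick pairs only, and the function name and the `min_loop` parameter (the minimum number of unpaired bases enclosed by a pair) are choices made for this example.

```python
# Assumed pairing rule for this sketch: Watson-Crick pairs only.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def nussinov_max_pairs(seq, min_loop=0):
    """Return the maximum number of non-crossing base pairs in seq."""
    n = len(seq)
    if n == 0:
        return 0
    # best[i][j] holds the optimum for the subsequence seq[i..j].
    best = [[0] * n for _ in range(n)]
    for span in range(1, n):
        for i in range(n - span):
            j = i + span
            # Case 1: position j is left unpaired.
            score = best[i][j - 1]
            # Case 2: position j pairs with some k, splitting the interval.
            for k in range(i, j - min_loop):
                if (seq[k], seq[j]) in PAIRS:
                    left = best[i][k - 1] if k > i else 0
                    inner = best[k + 1][j - 1] if k + 1 <= j - 1 else 0
                    score = max(score, left + 1 + inner)
            best[i][j] = score
    return best[0][n - 1]


if __name__ == "__main__":
    print(nussinov_max_pairs("GGGAAACCC", min_loop=3))
```

Because every pair (k, j) splits the interval into an enclosed part and a part to its left, the arcs produced are non-crossing by construction, matching the diagram description of secondary structures above.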
In 1980, Howell et al. computed a generating function of all foldings of a sequence,[35] while D. Sankoff (1985) described algorithms for the alignment of finite sequences, the prediction of RNA secondary structures (folding), and the reconstruction of proto-sequences on a phylogenetic tree.[36] Later, Waterman and Temple (1986) produced a polynomial time dynamic programming (DP) algorithm for predicting general RNA secondary structure,[37] and in 1990 John McCaskill presented a polynomial time DP algorithm for computing the full equilibrium partition function of an RNA secondary structure.[38] This shifted the dominant calculation of RNA folding from a mapping of sequence to a particular 3D structure to a mapping of sequence to a whole weighted ensemble of structures. Because RNA fitness depends on sequence via folding, the ensemble view smooths fitness and so facilitates more nearly neutral networks.
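The move from a single optimal structure to a weighted ensemble can be illustrated with a heavily simplified partition function. The sketch below is not McCaskill's algorithm, which uses a full loop-based energy model; it assumes that every Watson-Crick pair contributes the same hypothetical energy `pair_energy` and that all other contributions are zero. The names `partition_function`, `pair_energy`, `kT` and `min_loop` are chosen for this example.

```python
import math

# Assumed pairing rule for this sketch: Watson-Crick pairs only.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def partition_function(seq, pair_energy=-1.0, kT=1.0, min_loop=0):
    """Sum of Boltzmann weights over all non-crossing structures of seq.

    Toy energy model: each base pair contributes pair_energy, so a
    structure with p pairs has weight exp(-p * pair_energy / kT).
    """
    n = len(seq)
    weight = math.exp(-pair_energy / kT)
    Z = [[1.0] * n for _ in range(n)]

    def z(i, j):
        # An empty interval admits exactly one (empty) structure.
        return Z[i][j] if i <= j else 1.0

    for span in range(1, n):
        for i in range(n - span):
            j = i + span
            total = z(i, j - 1)  # position j unpaired
            for k in range(i, j - min_loop):  # position j paired with k
                if (seq[k], seq[j]) in PAIRS:
                    total += z(i, k - 1) * weight * z(k + 1, j - 1)
            Z[i][j] = total
    return z(0, n - 1)


if __name__ == "__main__":
    # With pair_energy = 0 every structure has weight 1, so the
    # partition function simply counts the structures.
    print(partition_function("GCGC", pair_energy=0.0))
    print(partition_function("GCGC", pair_energy=-1.0))
```

The recursion has the same shape as base pair maximization, with the maximum replaced by a sum and the pair count replaced by a Boltzmann factor. Each structure is counted exactly once because the last position is either unpaired or paired with one specific partner.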
M. Zuker implemented algorithms for the computation of MFE RNA secondary structures[39] based on the work of Nussinov et al., Smith and Waterman[40] and Studnicka et al.[41] Later, L. Hofacker et al. (1994)[42] presented the Vienna RNA package, a software package that integrated MFE folding and the computation of the partition function as well as base pairing probabilities.
Peter Schuster and W. Fontana (1994) shifted the focus towards sequence to structure maps (genotype–phenotype). They used an inverse folding algorithm to produce computational evidence that RNA sequences sharing the same structure are distributed randomly in sequence space. They observed that common structures can be reached from a random sequence by just a few mutations. These two facts led them to conclude that sequence space seemed to be percolated by neutral networks of nearest neighbor mutants that fold to the same structure.[43]
In 1997, C. Reidys, Stadler and Schuster laid the mathematical foundations for the study and modelling of neutral networks of RNA secondary structures. Using a random graph model, they proved the existence of a threshold value for the connectivity of random sub-graphs in a configuration space, parametrized by λ, the fraction of neutral neighbors. They showed that the networks are connected and percolate sequence space if the fraction of neutral nearest neighbors exceeds a threshold value λ*. Below this threshold the networks are partitioned into one giant component and several smaller ones. Key results of this analysis were concerned with threshold functions for density and connectivity of neutral networks, as well as with Schuster's shape space conjecture.[44] [45]
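The effect of λ on connectivity can be explored numerically with a small simulation. The sketch below is only loosely modelled on the random graph construction described above: it assumes a two-letter alphabet, keeps each sequence of a given length independently with probability `lam`, and joins kept sequences that differ by one point mutation. The function names, the alphabet and the seeding are assumptions made for this example, and it does not reproduce the proofs or the exact model of the cited work.

```python
import random
from itertools import product


def random_neutral_set(length, lam, alphabet="AU", seed=0):
    """Keep each sequence of the given length with probability lam."""
    rng = random.Random(seed)
    return {
        "".join(s)
        for s in product(alphabet, repeat=length)
        if rng.random() < lam
    }


def component_sizes(nodes, alphabet="AU"):
    """Sizes of connected components under one-point-mutation adjacency."""
    unseen = set(nodes)
    sizes = []
    while unseen:
        stack = [unseen.pop()]
        size = 0
        while stack:
            seq = stack.pop()
            size += 1
            for i, old in enumerate(seq):
                for new in alphabet:
                    if new == old:
                        continue
                    nb = seq[:i] + new + seq[i + 1:]
                    if nb in unseen:
                        unseen.remove(nb)
                        stack.append(nb)
        sizes.append(size)
    return sorted(sizes, reverse=True)


if __name__ == "__main__":
    # Compare how the kept sequences split into components as lam grows.
    for lam in (0.2, 0.5, 0.8):
        sizes = component_sizes(random_neutral_set(10, lam))
        print(lam, sizes[:3], "components:", len(sizes))
```

Running the loop for several values of `lam` shows how the sampled set fragments at low density and merges as the density rises. Short sequences and a single random seed give only a rough impression of the threshold behaviour established analytically.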