Infinite sites model

The infinite sites model (ISM) is a mathematical model of molecular evolution first proposed by Motoo Kimura in 1969. Like other mutation models, the ISM provides a basis for understanding how mutation generates new alleles in DNA sequences. Using allele frequencies, it allows heterozygosity (genetic diversity) to be calculated for a finite population and genetic distances between populations of interest to be estimated.
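As a generic illustration of calculating heterozygosity from allele frequencies (the standard formula, not Kimura's ISM-specific derivation), expected heterozygosity at one locus is H = 1 − Σ p_i², the probability that two gene copies drawn at random with replacement carry different alleles. The function name and the example frequencies below are hypothetical.

```python
def expected_heterozygosity(freqs):
    """Return H = 1 - sum(p_i^2) for the allele frequencies at a single locus."""
    # Frequencies of all alleles at a locus must add up to 1.
    if abs(sum(freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    return 1.0 - sum(p * p for p in freqs)

# Hypothetical locus with three alleles at frequencies 0.5, 0.3 and 0.2.
print(expected_heterozygosity([0.5, 0.3, 0.2]))  # approximately 0.62
```

A locus fixed for a single allele gives H = 0, and H approaches 1 as the number of equally common alleles grows.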

The assumptions of the ISM are that (1) there is an infinite number of sites at which mutations can occur, (2) every new mutation occurs at a novel site, and (3) there is no recombination.[1] [2] The term ‘site’ refers to a single nucleotide base pair.[3] Because every new mutation must occur at a novel site, there can be no homoplasy, such as back-mutation to an allele that previously existed; all identical alleles are therefore identical by descent. The four gamete rule can be applied to the data to check that they do not violate the model's assumption of no recombination.[4]
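The four gamete rule can be checked mechanically. For two biallelic sites, if all four combinations of states (00, 01, 10, 11) occur among the sampled sequences, no single tree without recombination or repeat mutation can explain the pair, so under the ISM the pair implies at least one recombination event. The Python sketch below assumes the data have already been reduced to biallelic sites coded as '0' and '1'; the function name is hypothetical.

```python
from itertools import combinations

def four_gamete_violations(haplotypes):
    """Return the (i, j) site pairs at which all four gametes 00, 01, 10, 11 occur.

    haplotypes: equal-length strings of '0'/'1', one per sampled sequence,
    with each position representing one biallelic site.
    """
    if set("".join(haplotypes)) - {"0", "1"}:
        raise ValueError("sites must be coded as '0'/'1'")
    n_sites = len(haplotypes[0])
    violations = []
    for i, j in combinations(range(n_sites), 2):
        # Collect the distinct two-site combinations seen across all sequences.
        gametes = {(h[i], h[j]) for h in haplotypes}
        if len(gametes) == 4:
            violations.append((i, j))
    return violations

print(four_gamete_violations(["000", "100", "110", "001"]))  # []
print(four_gamete_violations(["00", "01", "10", "11"]))      # [(0, 1)]
```

An empty result means the sample is compatible with the no-recombination assumption; any returned pair flags data that violate it.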

The population mutation rate θ can be estimated as follows, where μ* is the number of mutations arising in a randomly selected DNA sequence per generation and N_e is the effective population size:[5]

\theta = 4N_e\mu^*

The coefficient is twice the number of gene copies that each individual carries. For diploid, biparentally inherited genes the appropriate coefficient is therefore 4, whereas for uniparentally inherited haploid genes, such as mitochondrial genes, the coefficient is 2 and is applied to the female effective population size, which for most species is roughly half of N_e.
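The relationship θ = 4N_eμ* and its ploidy-dependent coefficient can be sketched in Python as follows. The function name, the `inheritance` parameter and the example values are hypothetical, and the uniparental branch hard-codes the simplifying assumption that the female effective size is half of N_e.

```python
def theta(ne, mu_star, inheritance="diploid"):
    """Population mutation rate for one locus.

    ne: effective population size N_e.
    mu_star: mutations per sequence per generation (mu*).
    inheritance: 'diploid' uses coefficient 4 on N_e; 'uniparental' (e.g. mtDNA)
        uses coefficient 2 on the female effective size, assumed here to be N_e / 2.
    """
    if inheritance == "diploid":
        return 4 * ne * mu_star
    if inheritance == "uniparental":
        female_ne = ne / 2  # simplifying assumption: roughly half of N_e
        return 2 * female_ne * mu_star
    raise ValueError(f"unknown inheritance mode: {inheritance!r}")

# Hypothetical values: N_e = 10,000 and mu* = 1e-5 per sequence per generation.
print(theta(10_000, 1e-5))                 # approximately 0.4
print(theta(10_000, 1e-5, "uniparental"))  # approximately 0.1
```

With the same N_e and μ*, the uniparental value is a quarter of the diploid one, reflecting both the smaller coefficient and the smaller female effective size.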

When the length of a DNA sequence is taken into account, the expected number of mutations per sequence is

\mu^* = k\mu

where k is the length of the sequence in sites and μ is the probability that a mutation occurs at a single site per generation.
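Substituting μ* = kμ into θ = 4N_eμ* shows that θ grows linearly with sequence length, so dividing by k recovers a per-site value. The numbers in this worked example are hypothetical, chosen only to keep the arithmetic easy to follow.

```latex
\theta = 4N_e\mu^* = 4N_e k\mu
% Hypothetical values: k = 1000 sites, \mu = 10^{-8} per site per generation, N_e = 10^{4}
\mu^* = 1000 \times 10^{-8} = 10^{-5}
\theta = 4 \times 10^{4} \times 10^{-5} = 0.4
\theta / k = 4N_e\mu = 4 \times 10^{4} \times 10^{-8} = 4 \times 10^{-4} \text{ per site}
```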

Watterson developed an estimator of the population mutation rate θ that is based on the number of segregating sites in a sample (Watterson's estimator).[6]
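Watterson's estimator divides the number of segregating sites S in a sample of n aligned sequences by the harmonic number a_n = Σ_{i=1}^{n-1} 1/i, giving θ_W = S / a_n. Under the ISM each segregating site records exactly one mutation, which is what makes the count informative. This minimal Python sketch assumes gap-free, equal-length aligned sequences; the function names and the four example sequences are hypothetical.

```python
def segregating_sites(seqs):
    """Count alignment columns at which the sampled sequences are not all identical."""
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for column in zip(*seqs) if len(set(column)) > 1)

def watterson_theta(seqs):
    """Watterson's estimator: S divided by a_n = sum_{i=1}^{n-1} 1/i."""
    n = len(seqs)
    if n < 2:
        raise ValueError("need at least two sequences")
    a_n = sum(1.0 / i for i in range(1, n))
    return segregating_sites(seqs) / a_n

# Four hypothetical aligned sequences; columns 2, 4 and 7 are segregating.
sample = ["ACGTACGT", "ACGTACGA", "ACCTACGA", "ACGTTCGT"]
print(segregating_sites(sample))  # 3
print(watterson_theta(sample))    # about 1.636 (= 18/11)
```

The value applies to the whole aligned region; dividing it by the number of sites gives a per-site estimate comparable to 4N_eμ.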

The ISM can also be applied to genome evolution. In this setting the model is framed in terms of chromosomes, which are made up of sites, each a nucleotide represented by A, C, G, or T. Although individual chromosomes are not infinite, they are treated as continuous intervals or continuous circles.[7]
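To make the continuous-interval idea concrete, the sketch below models one chromosome as the unit interval [0, 1) and places each new mutation at a uniformly random real position; a circular chromosome can be handled the same way by identifying positions 0 and 1. In the idealised continuous model two mutations coincide with probability zero, which is one way to read the assumption that every mutation occurs at a novel site. The function name and parameters are hypothetical, and because Python floats have finite precision, coincident positions are vanishingly unlikely rather than strictly impossible.

```python
import random

def place_mutations(n_mutations, seed=None):
    """Return sorted, uniformly random real-valued mutation positions on [0, 1).

    The chromosome is treated as a continuous interval rather than a finite list
    of sites, so every draw is (with overwhelming probability) a new position.
    """
    rng = random.Random(seed)  # independent generator, reproducible via seed
    return sorted(rng.random() for _ in range(n_mutations))

positions = place_mutations(1000, seed=1)
print(len(positions), len(set(positions)))  # counts of all and of distinct positions
```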

Several assumptions are made when applying the ISM to genome evolution:


Notes and References

  1. Tajima, F. (1996). Infinite-allele model and infinite-site model in population genetics. Journal of Genetics 75: 27–31. doi:10.1007/bf02931749. S2CID 1330336.
  2. Watterson, G. A. (1975). On the number of segregating sites in genetical models without recombination. Theoretical Population Biology 7(2): 256–276. doi:10.1016/0040-5809(75)90020-9. PMID 1145509.
  3. Kimura, Motoo (1969-04-01). The Number of Heterozygous Nucleotide Sites Maintained in a Finite Population Due to Steady Flux of Mutations. Genetics 61(4): 893–903. doi:10.1093/genetics/61.4.893. ISSN 0016-6731. PMC 1212250. PMID 5364968.
  4. Hudson, Richard R.; Kaplan, Norman L. (1985-09-01). Statistical Properties of the Number of Recombination Events in the History of a Sample of DNA Sequences. Genetics 111(1): 147–164. doi:10.1093/genetics/111.1.147. ISSN 0016-6731. PMC 1202594. PMID 4029609.
  5. Futschik, A.; Gach, F. (2008). On the inadmissibility of Watterson's estimator. Theoretical Population Biology 73(2): 212–221. doi:10.1016/j.tpb.2007.11.009. PMID 18215409.
  6. Ramirez-Soriano, A.; Nielsen, R. (2009). Correcting Estimators of Θ and Tajima's D for Ascertainment Biases Caused by the Single-Nucleotide Polymorphism Discovery Process. Genetics 181(2): 701–710. doi:10.1534/genetics.108.094060. PMC 2644958. PMID 19087964.
  7. Ma, Jian; Ratan, Aakrosh; Raney, Brian J.; Suh, Bernard B.; Miller, Webb; Haussler, David (2008-09-23). The infinite sites model of genome evolution. Proceedings of the National Academy of Sciences 105(38): 14254–14261. doi:10.1073/pnas.0805217105. ISSN 0027-8424. PMC 2533685. PMID 18787111.