Incomplete lineage sorting,[1] [2] [3] also termed hemiplasy, deep coalescence, retention of ancestral polymorphism, or trans-species polymorphism, describes a phenomenon in population genetics in which ancestral gene copies fail to coalesce (looking backwards in time) into a common ancestral copy until a time deeper than previous speciation events. It is caused by lineage sorting of genetic polymorphisms that were retained across successive nodes in the species tree.[4] In other words, the tree produced by a single gene differs from the population or species level tree, producing a discordant tree. Whatever the mechanism, the result is that an inferred species level tree may differ depending on which genes are selected for assessment.[5] [6] This is in contrast to complete lineage sorting, where the tree produced by the gene is the same as the population or species level tree. Both are common results in phylogenetic analysis, although how often each occurs depends on the gene, organism, and sampling technique.
The concept of incomplete lineage sorting has some important implications for phylogenetic techniques. The persistence of polymorphisms across different speciation events can cause incomplete lineage sorting. Suppose two successive speciation events occur in which an ancestral species gives rise first to species A, and then to species B and C. A single gene can have multiple versions (alleles), which cause different characters to appear (polymorphisms). In the example shown in Figure 1, the gene G has two alleles, G0 and G1. The ancestor of A, B and C originally had only one version of gene G, G0. At some point, a mutation occurred and the ancestral population became polymorphic, with some individuals having G0 and others G1. When species A split off, it retained only G1, while the ancestor of B and C remained polymorphic. When B and C diverged, B retained only G1 and C only G0; neither was now polymorphic in G. The tree for gene G shows A and B as sisters, whereas the species tree shows B and C as sisters. If the phylogeny of these species is based on gene G, it will not represent the actual relationships between the species. In other words, the most closely related species will not necessarily inherit the most closely related genes. This is a simplified example of incomplete lineage sorting; in real research the situation is usually more complex, involving more genes and species.[7]
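The example above can be written out as a small check. This is only an illustrative sketch: the dictionary holds the final allele of gene G in each species as described in the text, and the function name `gene_sisters` is a hypothetical helper, not part of any phylogenetics library.

```python
from itertools import combinations

# Final allele of gene G in each species, as in the Figure 1 example.
alleles = {"A": "G1", "B": "G1", "C": "G0"}

# Sister pair in the species tree described in the text.
species_sisters = frozenset({"B", "C"})


def gene_sisters(states):
    """Return the pairs of species that carry the same allele."""
    return [
        frozenset(pair)
        for pair in combinations(sorted(states), 2)
        if states[pair[0]] == states[pair[1]]
    ]


pairs = gene_sisters(alleles)

# The gene tree is discordant if the species-tree sisters do not share an allele.
discordant = species_sisters not in pairs
print(sorted(sorted(p) for p in pairs), discordant)
```

Grouping species by shared allele is a deliberately crude stand-in for building a gene tree; it is enough to show that gene G pairs A with B even though B and C are the true sisters.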
However, other mechanisms can lead to the same apparent discordance. For example, alleles can move across species boundaries via hybridization, and DNA can be transferred between species by viruses. This is illustrated in Figure 2. Here the ancestor of A, B and C, and the ancestor of B and C, had only the G0 version of gene G. A mutation occurred at the divergence of B and C, and B acquired a mutated version, G1. Some time later, as the arrow shows, G1 was transferred from B to A by some means (e.g. hybridization or horizontal gene transfer). Studying only the final states of G in the three species makes it appear that A and B are sisters rather than B and C, as in Figure 1, but in Figure 2 this is not caused by incomplete lineage sorting.
Incomplete lineage sorting has important implications for phylogenetic research. Because of it, a phylogenetic tree built from a given gene may not reflect the actual relationships between species. However, gene flow between lineages by hybridization or horizontal gene transfer may produce the same conflicting phylogenetic tree. Distinguishing these processes may seem difficult, but a large body of research and a range of statistical approaches have been, and are still being, developed to gain greater insight into these evolutionary dynamics.[8] One way to reduce the effect of incomplete lineage sorting is to use multiple genes when building species or population phylogenies. The more genes used, the more reliable the phylogeny becomes.
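The benefit of using many genes can be sketched as a simple tally. The snippet below assumes three species, with each gene tree summarised only by its sister pair; the function name `majority_topology` and the counts are hypothetical and purely illustrative, not real data. Real analyses use dedicated species-tree methods rather than a plain vote.

```python
from collections import Counter


def majority_topology(gene_trees):
    """Return the most frequent sister pair and its share of the gene trees.

    Each gene tree for three species is summarised by its sister pair,
    e.g. frozenset({"B", "C"}) for the topology (A,(B,C)).
    """
    counts = Counter(gene_trees)
    pair, n = counts.most_common(1)[0]
    return pair, n / len(gene_trees)


# Hypothetical sister pairs for ten genes (illustrative values only).
trees = (
    [frozenset({"B", "C"})] * 6
    + [frozenset({"A", "B"})] * 2
    + [frozenset({"A", "C"})] * 2
)

pair, share = majority_topology(trees)
print(sorted(pair), share)
```

A single gene drawn from this set would give the wrong sister pair four times out of ten, whereas the tally over all ten genes recovers B and C.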
Incomplete lineage sorting commonly happens with sexual reproduction because a species cannot be traced back to a single individual or breeding pair. When populations are large (i.e. thousands of individuals), each gene has some diversity, and the gene tree includes lineages that pre-date the species. The larger the population, the longer these ancestral lineages persist. When large ancestral populations are combined with closely timed speciation events, different pieces of DNA retain conflicting affiliations. This makes it hard to determine a common ancestor or the points of branching.[5]
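The dependence on population size and on the time between speciation events can be made concrete with a standard result from coalescent theory for three species. The sketch below assumes a neutral locus, no gene flow, and a diploid ancestral population of constant effective size; under those assumptions two lineages fail to coalesce between the two splits with probability exp(-t / 2N), and the gene tree is discordant with probability two thirds of that. The function names and the example values are hypothetical.

```python
import math


def ils_probability(generations, pop_size):
    """Probability that two lineages fail to coalesce between two splits.

    Assumes a diploid population of constant effective size `pop_size`,
    so time is measured in units of 2 * pop_size generations.
    """
    return math.exp(-generations / (2 * pop_size))


def discordance_probability(generations, pop_size):
    """Probability that a three-species gene tree disagrees with the species tree.

    If the lineages have not coalesced, the three possible topologies are
    equally likely, and two of them are discordant.
    """
    return (2 / 3) * ils_probability(generations, pop_size)


# Illustrative values: 20,000 generations between the two speciation events.
for n in (1_000, 10_000, 100_000):
    print(n, round(discordance_probability(20_000, n), 3))
```

Under these assumptions the discordance probability rises with population size and falls as the interval between splits lengthens, approaching but never exceeding two thirds.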
Among primates, chimpanzees and bonobos are more closely related to each other than to any other taxon and are thus sister taxa. Still, for 1.6% of the bonobo genome, sequences are more closely related to their human homologues than to those of chimpanzees, which is probably a result of incomplete lineage sorting.[5] A study of more than 23,000 DNA sequence alignments in the family Hominidae (great apes, including humans) showed that about 23% did not support the known sister relationship of chimpanzees and humans.
In human evolution, incomplete lineage sorting is used to describe hominin lineages that may have failed to sort at the time that speciation occurred in prehistory.[9] With the advent of genetic testing and genome sequencing, researchers found that the genetic relationships between hominin lineages might disagree with previous understandings of their relatedness based on physical characteristics. Moreover, divergence from the last common ancestor (LCA) does not necessarily occur at the same time as speciation.[10] Accounting for lineage sorting allows paleoanthropologists to explore genetic relationships and divergences that may not fit their previous speciation models based on phylogeny alone.
Incomplete lineage sorting of the human family tree is an area of great interest. There are a number of unknowns when considering both the transition from archaic humans to modern humans and divergence of the other great apes from the hominin lineage.[11]
Incomplete lineage sorting means that the average divergence time between genes may differ from the divergence time between species. Models suggest that the average divergence time between the genes in the human and chimpanzee genomes is older than the split between humans and gorillas. This means that the common ancestor of humans and chimpanzees carried genetic material that was already present in the common ancestor of humans, chimpanzees, and gorillas.[10] However, the gene tree differs slightly from the species tree.[12] In the species tree relating humans, bonobos, chimpanzees, and gorillas, the results show that the split between bonobos and chimpanzees occurred close in time to the common ancestor of the bonobo-chimpanzee lineage and humans,[10] indicating that humans and chimpanzees shared a common ancestor for several million years after their separation from gorillas. This is the setting in which incomplete lineage sorting arises. Researchers now rely on DNA fragments to study the evolutionary relationships among humans and their relatives, in the hope that genomes from different types of humans will provide information about speciation and ancestral processes.[13]
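The gap between gene divergence and species divergence can be stated as a simple expectation. The relation below is a sketch under the standard neutral coalescent, assuming a diploid ancestral population of constant effective size and no gene flow after the split; the symbols are introduced here for illustration and do not come from the cited studies.

```latex
% T_species : time since the two species split, in generations
% N_e       : effective size of the ancestral (diploid) population
% T_gene    : time since two sampled gene copies, one per species,
%             last shared a common ancestral copy
\mathbb{E}\left[T_{\text{gene}}\right] = T_{\text{species}} + 2N_e
```

Under these assumptions gene copies are expected to diverge before the species do, and the excess grows with the size of the ancestral population.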
Incomplete lineage sorting is a common feature of viral phylodynamics. The phylogeny represented by the transmission of a disease from one person to the next, that is, the population level tree, often does not correspond to the tree produced by genetic analysis, because of the population bottlenecks that are an inherent feature of viral transmission. Figure 3 illustrates how this can occur. This is relevant to the criminal transmission of HIV: in some criminal cases, a phylogenetic analysis of one or two genes from the strains of the accused and the victim has been used to infer transmission; however, because incomplete lineage sorting is so common, transmission cannot be inferred solely on the basis of such a basic analysis.[14]
Jacques and List (2019)[15] show that the concept of incomplete lineage sorting can be applied to account for non-treelike phenomena in language evolution. Kalyan and François (2019), proponents of the method of historical glottometry, a model challenging the applicability of the tree model in historical linguistics, concur that "Historical Glottometry does not challenge the family tree model once incomplete lineage sorting has been taken into account."[16]