Gene redundancy is the existence of multiple genes in the genome of an organism that perform the same function. Gene redundancy can result from gene duplication.[1] Such duplication events are responsible for many sets of paralogous genes. When an individual gene in such a set is disrupted by mutation or targeted knockout, there can be little effect on phenotype as a result of gene redundancy, whereas the effect is large for the knockout of a gene with only one copy.[2] Gene knockout is a method used in some studies that aim to characterize the maintenance and fitness effects of functional overlap.[3]
Classical models of maintenance propose that duplicated genes may be conserved to various extents in genomes due to their ability to compensate for deleterious loss of function mutations.[4] [5] These classical models do not take into account the potential impact of positive selection. Beyond these classical models, researchers continue to explore the mechanisms by which redundant genes are maintained and evolve.[6] [7] [8] Gene redundancy has long been appreciated as a source of novel gene origination; that is, new genes may arise when selective pressure exists on the duplicate, while the original gene is maintained to perform the original function, as proposed by newer models.
Gene redundancy most often results from gene duplication.[9] Three of the more common mechanisms of gene duplication are retroposition, unequal crossing over, and non-homologous segmental duplication. Retroposition occurs when the mRNA transcript of a gene is reverse transcribed back into DNA and inserted into the genome at a different location. During unequal crossing over, homologous chromosomes exchange uneven portions of their DNA. This can transfer one chromosome's gene to the other chromosome, leaving two copies of the gene on one chromosome and no copies on the other. Non-homologous duplications result from replication errors that shift the gene of interest into a new position. A tandem duplication then occurs, creating a chromosome with two copies of the same gene. Figure 1 provides a visualization of these three mechanisms.[10] When a gene is duplicated within a genome, the two copies are initially functionally redundant. These redundant genes are considered paralogs as they accumulate changes over time, until they functionally diverge.[11]
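The outcome of unequal crossing over described above can be pictured with a toy model. The sketch below is only an illustration: chromosomes are represented as Python lists of labels, and the function name `unequal_crossover`, the labels `x`, `G`, `y`, and the breakpoint positions are all hypothetical choices, not part of any real analysis tool.

```python
def unequal_crossover(chrom_a, chrom_b, break_a, break_b):
    """Swap the tails of two homologous chromosomes at (possibly misaligned) breakpoints."""
    recombinant_1 = chrom_a[:break_a] + chrom_b[break_b:]
    recombinant_2 = chrom_b[:break_b] + chrom_a[break_a:]
    return recombinant_1, recombinant_2


# Hypothetical homolog: gene "G" flanked by neighbouring regions "x" and "y".
homolog = ["x", "G", "y"]

# Misaligned exchange: the break falls after G on one homolog and before G on the other.
duplicated, deleted = unequal_crossover(homolog, homolog, break_a=2, break_b=1)

print(duplicated)  # one product carries two tandem copies of G
print(deleted)     # the reciprocal product carries none
```

When the two breakpoints are equal, the exchange is aligned and both products keep exactly one copy of the gene, which is why the misalignment is the essential ingredient.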
Much research is centered on the question of how redundant genes persist.[12] Three models have been proposed to explain the preservation of redundant genes: adaptive radiation, divergence, and escape from adaptive conflict. Notably, retention following a duplication event is influenced by the type of duplication event and by the gene class. That is, some gene classes are better suited for redundancy following a small-scale duplication, and others following a whole genome duplication event.[13] Redundant genes are more likely to survive when they are involved in complex pathways and are the product of whole genome duplication or multifamily duplication.
The currently accepted outcomes for single gene duplicates include gene loss (non-functionalization), functional divergence, and conservation for increased genetic robustness. Multigene families, by contrast, may undergo concerted evolution or birth-and-death evolution. Concerted evolution is the idea that genes in a group, such as a gene family, evolve in parallel. Birth-and-death evolution is the idea that the gene family undergoes strong purifying selection.
As the genome replicates over many generations, the redundant gene's function will most likely evolve due to genetic drift. Genetic drift influences genetic redundancy by either eliminating variants or fixing variants in the population. In the event that genetic drift maintains the variants, the gene may accumulate mutations that change the overall function.[14] However, many redundant genes may diverge but retain the original function by mechanisms such as subfunctionalization, which preserves original gene function, albeit by complementary action of the duplicates. The three mechanisms of functional divergence in genes are nonfunctionalization (or gene loss), neofunctionalization, and subfunctionalization.
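The way drift alone can eliminate or fix a variant can be illustrated with a minimal Wright-Fisher style simulation. This is a sketch under simplifying assumptions that are not taken from the cited work: a diploid population of constant size, no selection, no new mutation, and non-overlapping generations. The function name `wright_fisher` and all parameter values are hypothetical.

```python
import random


def wright_fisher(pop_size, start_freq, max_generations, seed=0):
    """Track the frequency of a neutral variant under pure drift.

    Each generation, all 2 * pop_size gene copies are resampled at random
    from the previous generation. The run stops early once the variant is
    lost (frequency 0) or fixed (frequency 1).
    """
    rng = random.Random(seed)
    copies = 2 * pop_size
    count = round(start_freq * copies)
    trajectory = [count / copies]
    for _ in range(max_generations):
        if count == 0 or count == copies:
            break  # absorbing state: the variant is lost or fixed
        freq = count / copies
        count = sum(1 for _ in range(copies) if rng.random() < freq)
        trajectory.append(count / copies)
    return trajectory


trajectory = wright_fisher(pop_size=20, start_freq=0.5, max_generations=10_000)
outcome = {0.0: "lost", 1.0: "fixed"}.get(trajectory[-1], "still segregating")
print(f"after {len(trajectory) - 1} generations the variant is {outcome}")
```

Smaller values of `pop_size` reach loss or fixation in fewer generations on average, which reflects the stronger effect of drift in small populations.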
During nonfunctionalization, or degeneration/gene loss, one copy of the duplicated gene acquires mutations that render it inactive or silent. Nonfunctionalization is often the result of single gene duplications. At this point, the gene has no function and is called a pseudogene. Pseudogenes can be lost over time due to genetic mutations. Neofunctionalization occurs when one copy of the gene accumulates mutations that give the gene a new, beneficial function that is different from the original function. Subfunctionalization occurs when both copies of the redundant gene acquire mutations. Each copy becomes only partially active; two of these partial copies then act as one normal copy of the original gene. Figure 2 provides a visualization of this concept.
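The three fates above can be made concrete with a toy classifier. In this sketch each gene copy is reduced to a set of subfunction labels, which is a deliberate oversimplification; the function name `classify_duplicate_fate` and the labels `root`, `leaf`, and `flower` are hypothetical and chosen only for illustration.

```python
def classify_duplicate_fate(ancestral, copy_a, copy_b):
    """Classify a duplicate pair from the subfunctions each copy still performs."""
    ancestral, copy_a, copy_b = set(ancestral), set(copy_a), set(copy_b)
    if not copy_a or not copy_b:
        # One copy has lost all activity and is effectively a pseudogene.
        return "nonfunctionalization"
    if (copy_a - ancestral) or (copy_b - ancestral):
        # A copy performs a function the ancestral gene did not have.
        return "neofunctionalization"
    if copy_a < ancestral and copy_b < ancestral and copy_a | copy_b == ancestral:
        # Each copy is partial, but together they cover the ancestral function.
        return "subfunctionalization"
    if copy_a == ancestral and copy_b == ancestral:
        return "redundant"
    return "unclassified"


ancestral = {"root", "leaf"}
print(classify_duplicate_fate(ancestral, {"root", "leaf"}, set()))
print(classify_duplicate_fate(ancestral, {"root", "leaf"}, {"root", "leaf", "flower"}))
print(classify_duplicate_fate(ancestral, {"root"}, {"leaf"}))
```

Real duplicate pairs rarely fall cleanly into one category, so the `unclassified` branch is kept for combinations that this simple scheme does not cover.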
Transposable elements play various roles in functional differentiation. By enacting recombination, transposable elements can move redundant sequences in the genome.[15] This change in sequence structure and location is a source of functional divergence. Transposable elements can also affect gene expression, given that they contain a sizeable number of microRNAs.
The evolution and origin of redundant genes remain unknown, largely because evolution happens over such a long period of time. Theoretically, a gene cannot be maintained without mutation unless it has a selective pressure acting on it. Gene redundancy, therefore, would allow either copy of the gene to accumulate mutations as long as the other was still able to perform its function. This means that all redundant genes should theoretically become pseudogenes and eventually be lost. Scientists have devised two hypotheses as to why redundant genes can remain in the genome: the backup hypothesis and the piggyback hypothesis.[16]
The backup hypothesis proposes that redundant genes remain in the genome as a sort of "back-up plan". If the original gene loses its function, the redundant gene is there to take over and keep the cell alive. The piggyback hypothesis states that two paralogs in the genome have some kind of non-overlapping function as well as the redundant function. In this case, the redundant part of the gene remains in the genome due to its proximity to the region that codes for the unique function.[17] Why redundant genes remain in the genome is an open question and an active area of research. There are many hypotheses in addition to the backup and piggyback models. For example, a study from the University of Michigan proposes that redundant genes are maintained in the genome by reduced expression.
Researchers often use the history of redundant genes in the form of gene families to learn about the phylogeny of a species. It takes time for redundant genes to undergo functional diversification; the degree of diversification between orthologs tells us how closely related the two genomes are. Gene duplication events can also be detected by looking at increases in gene duplicates.
A good example of using gene redundancy in evolutionary studies is a paper on the evolution of the KCS gene family in plants. The paper studies how one KCS gene evolved into an entire gene family via duplication events. The number of redundant genes in a species allows researchers to determine when duplication events took place and how closely related species are.
Currently, there are three ways to detect paralogs in a known genomic sequence: simple homology (FASTA), gene family evolution (TreeFam), and orthology (eggNOG v3). Researchers often construct phylogenies and use microarrays to compare the structures of genomes to identify redundancy.[18] Methods such as creating syntenic alignments and analyzing orthologous regions are used to compare multiple genomes. Single genomes can be scanned for redundant genes using exhaustive pairwise comparisons. Before performing more laborious analyses of redundant genes, researchers typically test for functionality by comparing open reading frame length and the rates of silent and non-silent mutations. Since the completion of the Human Genome Project, researchers have been able to annotate the human genome much more easily. Using online databases such as the Genome Browser at UCSC, researchers can look for homology in the sequence of their gene of interest.
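The idea of scanning a single genome with exhaustive pairwise comparisons can be sketched as follows. This is a simplified stand-in, not the method used by FASTA, TreeFam, or eggNOG: it scores each pair of sequences by the Jaccard similarity of their k-mer sets instead of computing an alignment. The gene names, the toy sequences, the k-mer size, and the 0.5 threshold are all hypothetical.

```python
from itertools import combinations


def kmers(seq, k=3):
    """Return the set of overlapping k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(seq_a, seq_b, k=3):
    """Jaccard similarity between the k-mer sets of two sequences."""
    ka, kb = kmers(seq_a, k), kmers(seq_b, k)
    if not ka and not kb:
        return 0.0
    return len(ka & kb) / len(ka | kb)


def candidate_paralogs(genes, threshold=0.5, k=3):
    """Compare every pair of genes and keep the pairs scoring at or above the threshold."""
    hits = []
    for (name_a, seq_a), (name_b, seq_b) in combinations(sorted(genes.items()), 2):
        score = jaccard(seq_a, seq_b, k)
        if score >= threshold:
            hits.append((name_a, name_b, round(score, 3)))
    return hits


# Hypothetical toy sequences: geneA_dup differs from geneA at one position.
toy_genes = {
    "geneA": "ATGGCCAAGTTGACCTGA",
    "geneA_dup": "ATGGCCAAGTTGACGTGA",
    "geneB": "ATGTTTCCCGGGAAATAG",
}

hits = candidate_paralogs(toy_genes)
print(hits)
```

The number of comparisons grows quadratically with the number of genes, which is one reason real pipelines rely on indexed or heuristic search instead of a plain double loop.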
The mode of duplication by which redundancy occurs has been found to affect variant classification in breast cancer predisposition genes.[19] Gross duplications complicate clinical interpretation because it is difficult to discern whether they occur in tandem. Recent methods, such as the DNA breakpoint assay, have been used to determine tandem status. In turn, these tandem gross duplications can be screened more accurately for pathogenic status. This research has important implications for evaluating the risk of breast cancer.
Researchers have also identified redundant genes that confer selective advantage on the organismal level. The partial ARM1 gene, a redundant gene resulting from a partial duplication, has been found to confer resistance to Blumeria graminis, a mildew fungus.[20] This gene exists in members of the Triticeae tribe, including wheat, rye, and barley.
The human olfactory receptor (OR) gene family contains 339 intact genes and 297 pseudogenes. These genes are found in different locations throughout the genome, but only about 13% are on different chromosomes or on distantly spaced loci. 172 subfamilies of OR genes have been found in humans, each at its own locus. Because the genes in each of these subfamilies are structurally and functionally similar, and in close proximity to each other, it is hypothesized that each subfamily evolved from a single gene undergoing duplication events. The high number of subfamilies in humans may explain why we are able to recognize so many odors.
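As a quick check on the proportions implied by these counts, the short calculation below derives the family size and the pseudogene fraction. It uses only the two figures quoted above; the variable names are arbitrary.

```python
intact_genes = 339
pseudogenes = 297

total_genes = intact_genes + pseudogenes
pseudogene_fraction = pseudogenes / total_genes

print(f"{total_genes} OR genes in total, {pseudogene_fraction:.1%} pseudogenes")
# prints "636 OR genes in total, 46.7% pseudogenes"
```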
Human OR genes have homologues in other mammals, such as mice, that demonstrate the evolution of olfactory receptor genes. One particular family that is involved in the initial event of odor perception has been found to be highly conserved throughout vertebrate evolution.[21]
Duplication events and redundant genes have often been thought to play a role in some human diseases. Large-scale whole genome duplication events that occurred early in vertebrate evolution may be the reason that human monogenic disease genes often have a high number of redundant paralogs. Chen et al. hypothesize that the functionally redundant paralogs of human monogenic disease genes mask the effects of dominant deleterious mutations, thereby maintaining the disease gene in the human genome.[22]
Whole genome duplications may be a leading cause of the retention of some tumor-causing genes in the human genome.[23] For example, Strout et al.[24] have shown that tandem duplication events, likely via homologous recombination, are linked to acute myeloid leukemia. The partial duplication of the ALL1 (MLL) gene is a genetic defect that has been found in patients with acute myeloid leukemia.