G-value paradox
The G-value paradox arises from the lack of correlation between the number of protein-coding genes among eukaryotes and their relative biological complexity. The microscopic nematode Caenorhabditis elegans, for example, is composed of only about a thousand cells but has about the same number of genes as a human.[1][2] Researchers suggest that the resolution of the paradox may lie in mechanisms such as alternative splicing and complex gene regulation, which make the genes of humans and other complex eukaryotes relatively more productive.
DNA and biological complexity
The lack of correlation between the morphological complexity of eukaryotes and the amount of genetic information they carry has long puzzled researchers.[3] The sheer amount of DNA in an organism, measured by the mass of DNA present in the nucleus or the number of constituent nucleotide pairs, varies by several orders of magnitude among eukaryotes and is often unrelated to an organism's size or developmental complexity.[4] One amoeba has 200 times more DNA per cell than humans,[5] and even insects and plants within the same genus can vary dramatically in their quantity of DNA.[6] This mismatch, known as the C-value paradox, troubled genome scientists for many years.
Eventually, researchers recognized that not all DNA contributes directly to the production of proteins and other biological functions.[7] Susumu Ohno coined the phrase "junk DNA" to describe these nonfunctional swaths of DNA.[8] They include introns, genetic sequences that are removed after transcription into mRNA and thus are not translated into proteins;[9] transposable elements, mobile fragments of DNA, most of which are nonfunctional in humans;[10] and pseudogenes, nonfunctional DNA sequences that originated from functional genes.[11] The share of the human genome that may be considered "junk" remains controversial. Estimates of the functional share, the complement of the "junk" share, reach as low as 8%[12] and as high as 80%,[13] with one researcher arguing that the genome's genetic load imposes a fixed ceiling of 15% on the functional fraction.[14] (Prokaryotes, which have little "junk" DNA by comparison, exhibit a fairly close relationship between genome size and biological functionality.)[15]
In any case, the assumption was that once the C-value paradox was swept away and the focus shifted to the number of protein-coding genes, the anticipated correlation between genetic information and biological complexity in eukaryotes would emerge.[16] Instead, the G-value paradox simply picked up where the C-value paradox left off, because the discrepancy persisted when comparisons were narrowed to just protein-coding genes.[17]
G-value paradox
Estimates of the number of coding genes in the human genome reached upwards of 100,000 prior to the Human Genome Project,[18] but have since fallen to as low as 19,000 following completion of that massive sequencing effort and subsequent refinements. By comparison, the microscopic water flea Daphnia pulex has about 31,000 genes;[19] the nematode C. elegans, about 19,700; the fruit fly (Drosophila melanogaster), about 14,000;[20] the zebrafish (Danio rerio), 26,000;[21] and the small flowering plant Arabidopsis thaliana, 27,000.[22] Plants in general tend to have more genes than other eukaryotes.[23] One explanation is their higher incidence of gene and whole-genome duplication and their retention of those additional genes, due in part to their development of a large collection of defensive secondary metabolites.
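The comparison can be made concrete with a little arithmetic. The following Python sketch is illustrative only: it takes the approximate gene counts quoted above, assumes the low-end human estimate of 19,000 as the reference, and expresses each species' count as a multiple of the human figure. The names GENE_COUNTS and relative_to_human are hypothetical and chosen for this example.

```python
# Approximate protein-coding gene counts as quoted in the text.
# The human figure uses the low-end estimate of 19,000 (an assumption
# made for this illustration; estimates vary by annotation).
GENE_COUNTS = {
    "Homo sapiens": 19_000,
    "Daphnia pulex": 31_000,
    "Caenorhabditis elegans": 19_700,
    "Drosophila melanogaster": 14_000,
    "Danio rerio": 26_000,
    "Arabidopsis thaliana": 27_000,
}


def relative_to_human(counts, reference="Homo sapiens"):
    """Return each species' gene count as a multiple of the reference count."""
    ref = counts[reference]
    return {species: n / ref for species, n in counts.items()}


ratios = relative_to_human(GENE_COUNTS)

# Print species from the largest gene count to the smallest.
for species, ratio in sorted(ratios.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{species:<26} {GENE_COUNTS[species]:>7,} genes  {ratio:.2f}x human")
```

On these figures the water flea carries roughly 1.6 times as many genes as a human and the nematode about the same number, while only the fruit fly falls below the human count. Gene number alone therefore ranks these organisms in an order that bears little resemblance to their apparent complexity.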
The apparent disconnect between the number of genes in a species and its biological complexity was dubbed the G-value paradox. While the C-value paradox unraveled with the discovery of massive sequences of noncoding DNA, resolution of the G-value paradox appears to rest on differences in genome productivity. Humans and other complex eukaryotes simply may be able to do more with what they have, genetically speaking.
Among the mechanisms cited for this greater productivity are more sophisticated transcriptional controls, multifunctional proteins, more interaction between protein products, alternative splicing[24] and post-translational modifications, which may produce several protein products from the same genetic raw material.[25] In addition, thousands of non-coding RNAs that are transcribed from DNA but not translated into protein have emerged as important regulators of gene expression and development in humans and other eukaryotes.[26] They include short RNA sequences, such as microRNAs (miRNAs), small interfering RNAs (siRNAs) and Piwi-interacting RNAs (piRNAs), and long non-coding RNAs (lncRNAs) that may regulate gene expression at different stages of development.[27] Some researchers suggest that the focus should now shift from the number of genes to gene interactions and the network of genetic regulatory mechanisms that allow genes to support a variety of biological activities.[28] These transitions have taken analysis of genetic complexity from the C-value to the G-value to what some refer to as the I-value, a measure of the total information contained in a genome.
Defining complexity
One of the challenges in the long debate over the mismatch between genome size and biological complexity has been ambiguity in defining complexity. Is it the number of cell types in an organism, the sophistication of its nervous system or the number of different proteins it produces? By some definitions, the greater complexity of humans compared to other organisms may be illusory.[29] Even once complexity is defined, some researchers argue that complexity in function does not necessarily require the same complexity in process. Evolution is not a paragon of efficiency but travels a crooked path that, in some species, leads to a more cumbersome genome than is necessary.[30]
Notes and References
- Ezkurdia I, Juan D, Rodriguez JM, Frankish A, Diekhans M, Harrow J, Vazquez J, Valencia A, Tress ML. Multiple evidence strands suggest that there may be as few as 19,000 human protein-coding genes. Human Molecular Genetics. 23 (22): 5866–78. November 2014. PMID 24939910. PMC 4204768. doi:10.1093/hmg/ddu309.
- Hillier LW, Coulson A, Murray JI, Bao Z, Sulston JE, Waterston RH. Genomics in C. elegans: so many genes, such a little worm. Genome Research. 15 (12): 1651–60. December 2005. PMID 16339362. doi:10.1101/gr.3729105.
- Gall JG. Chromosome structure and the C-value paradox. The Journal of Cell Biology. 91 (3 Pt 2): 3s–14s. December 1981. PMID 7033242. PMC 2112778. doi:10.1083/jcb.91.3.3s.
- Cavalier-Smith T. Nuclear volume control by nucleoskeletal DNA, selection for cell volume and cell growth rate, and the solution of the DNA C-value paradox. Journal of Cell Science. 34: 247–78. December 1978. PMID 372199. doi:10.1242/jcs.34.1.247.
- Holm-Hansen O. Algae: amounts of DNA and organic carbon in single cells. Science. 163 (3862): 87–8. January 1969. PMID 5812598. doi:10.1126/science.163.3862.87. Bibcode 1969Sci...163...87H. S2CID 44975843.
- Thomas CA. The genetic organization of chromosomes. Annual Review of Genetics. 5 (1): 237–56. 1971. PMID 16097657. doi:10.1146/annurev.ge.05.120171.001321.
- Gregory TR. Synergy between sequence and size in large-scale genomics. Nature Reviews Genetics. 6 (9): 699–708. September 2005. PMID 16151375. doi:10.1038/nrg1674. S2CID 24237594.
- Ohno S. So much "junk" DNA in our genome. Brookhaven Symp. Biol. 23: 366–370. 1972. PMID 5065367.
- Gilbert W. Genes-in-pieces revisited. Science. 228 (4701): 823–4. May 1985. PMID 4001923. doi:10.1126/science.4001923. Bibcode 1985Sci...228..823G.
- Orgel LE, Crick FH. Selfish DNA: the ultimate parasite. Nature. 284 (5757): 604–7. April 1980. PMID 7366731. doi:10.1038/284604a0. Bibcode 1980Natur.284..604O. S2CID 4233826.
- Balakirev ES, Ayala FJ. Pseudogenes: are they "junk" or functional DNA? Annual Review of Genetics. 37 (1): 123–51. 2003. PMID 14616058. doi:10.1146/annurev.genet.37.040103.103949. S2CID 24683075.
- Rands CM, Meader S, Ponting CP, Lunter G. Schierup MH (ed.). 8.2% of the Human genome is constrained: variation in rates of turnover across functional element classes in the human lineage. PLOS Genetics. 10 (7): e1004525. July 2014. PMID 25057982. PMC 4109858. doi:10.1371/journal.pgen.1004525.
- Dunham I, Kundaje A, Aldred SF, Collins PJ, Davis CA, Doyle F, et al. (ENCODE Project Consortium). An integrated encyclopedia of DNA elements in the human genome. Nature. 489 (7414): 57–74. September 2012. PMID 22955616. PMC 3439153. doi:10.1038/nature11247. Bibcode 2012Natur.489...57T.
- Graur D. Martin B (ed.). An Upper Limit on the Functional Fraction of the Human Genome. Genome Biology and Evolution. 9 (7): 1880–1885. July 2017. PMID 28854598. PMC 5570035. doi:10.1093/gbe/evx121.
- Taft RJ, Pheasant M, Mattick JS. The relationship between non-protein-coding DNA and eukaryotic complexity. BioEssays. 29 (3): 288–99. March 2007. PMID 17295292. doi:10.1002/bies.20544. S2CID 16226307.
- Hahn MW, Wray GA. The g-value paradox. Evolution & Development. 4 (2): 73–5. 2002. PMID 12004964. doi:10.1046/j.1525-142X.2002.01069.x. S2CID 2810069.
- Claverie JM. Gene number. What if there are only 30,000 human genes? Science. 291 (5507): 1255–7. February 2001. PMID 11233450. doi:10.1126/science.1058969. S2CID 11444318.
- Fields C, Adams MD, White O, Venter JC. How many genes in the human genome? Nature Genetics. 7 (3): 345–6. July 1994. PMID 7920649. doi:10.1038/ng0794-345. S2CID 26164550.
- Colbourne JK, Pfrender ME, Gilbert D, Thomas WK, Tucker A, Oakley TH, et al. The ecoresponsive genome of Daphnia pulex. Science. 331 (6017): 555–61. February 2011. PMID 21292972. PMC 3529199. doi:10.1126/science.1197761. Bibcode 2011Sci...331..555C.
- Hales KG, Korey CA, Larracuente AM, Roberts DM. Genetics on the Fly: A Primer on the Drosophila Model System. Genetics. 201 (3): 815–42. November 2015. PMID 26564900. PMC 4649653. doi:10.1534/genetics.115.183392.
- Howe K, Clark MD, Torroja CF, Torrance J, Berthelot C, Muffato M, et al. The zebrafish reference genome sequence and its relationship to the human genome. Nature. 496 (7446): 498–503. April 2013. PMID 23594743. PMC 3703927. doi:10.1038/nature12111. Bibcode 2013Natur.496..498H.
- Swarbreck D, Wilks C, Lamesch P, Berardini TZ, Garcia-Hernandez M, Foerster H, Li D, Meyer T, Muller R, Ploetz L, Radenbaugh A, Singh S, Swing V, Tissier C, Zhang P, Huala E. The Arabidopsis Information Resource (TAIR): gene structure and function annotation. Nucleic Acids Research. 36 (Database issue): D1009–14. January 2008. PMID 17986450. PMC 2238962. doi:10.1093/nar/gkm965.
- Sterck L, Rombauts S, Vandepoele K, Rouzé P, Van de Peer Y. How many genes are there in plants (... and why are they there)? Current Opinion in Plant Biology. 10 (2): 199–203. April 2007. PMID 17289424. doi:10.1016/j.pbi.2007.01.004.
- Nagasaki H, Arita M, Nishizawa T, Suwa M, Gotoh O. Species-specific variation of alternative splicing and transcriptional initiation in six eukaryotes. Gene. 364: 53–62. December 2005. PMID 16219431. doi:10.1016/j.gene.2005.07.027.
- Wray GA, Hahn MW, Abouheif E, Balhoff JP, Pizer M, Rockman MV, Romano LA. The evolution of transcriptional regulation in eukaryotes. Molecular Biology and Evolution. 20 (9): 1377–419. September 2003. PMID 12777501. doi:10.1093/molbev/msg140.
- Gaiti F, Calcino AD, Tanurdžić M, Degnan BM. Origin and evolution of the metazoan non-coding regulatory genome. Developmental Biology. 427 (2): 193–202. July 2017. PMID 27880868. doi:10.1016/j.ydbio.2016.11.013.
- Leone S, Santoro R. Challenges in the analysis of long noncoding RNA functionality. FEBS Letters. 590 (15): 2342–53. August 2016. PMID 27417130. doi:10.1002/1873-3468.12308. S2CID 19766152.
- Szathmáry E, Jordán F, Pál C. Molecular biology and evolution. Can genes explain biological complexity? Science. 292 (5520): 1315–6. May 2001. PMID 11360989. doi:10.1126/science.1060852. S2CID 86104866.
- McShea DW. Perspective: Metazoan Complexity and Evolution: Is There a Trend? Evolution; International Journal of Organic Evolution. 50 (2): 477–492. April 1996. PMID 28568940. doi:10.1111/j.1558-5646.1996.tb03861.x. S2CID 29590466.
- Jacob F. Evolution and tinkering. Science. 196 (4295): 1161–6. June 1977. PMID 860134. doi:10.1126/science.860134. Bibcode 1977Sci...196.1161J. S2CID 29756896.