Candidate gene explained

The candidate gene approach to conducting genetic association studies focuses on associations between genetic variation within pre-specified genes of interest and phenotypes or disease states. This is in contrast to genome-wide association studies (GWAS), a hypothesis-free approach that scans the entire genome for associations between common genetic variants (typically SNPs) and traits of interest. Candidate genes are most often selected for study based on a priori knowledge of the gene's biological functional impact on the trait or disease in question.^[1] ^[2] The rationale behind focusing on allelic variation in specific, biologically relevant regions of the genome is that certain alleles within a gene may directly impact the function of the gene in question and lead to variation in the phenotype or disease state being investigated. This approach often uses the case-control study design to try to answer the question, "Is one allele of a candidate gene more frequently seen in subjects with the disease than in subjects without the disease?"^[1]

Associations between candidate genes and complex traits have generally not been replicated by subsequent GWASs^[3] ^[4] ^[5] ^[6] or highly powered replication attempts.^[7] ^[8] The failure of candidate gene studies to shed light on the specific genes underlying such traits has been ascribed to insufficient statistical power, the low prior probability that scientists can correctly guess which specific allele within a specific gene is related to a trait, poor methodological practices, and data dredging.^[9] ^[10]
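
The case-control question quoted above can be made concrete with a simple allelic association test. The following is a minimal sketch, not a method taken from the cited studies: it assumes allele counts have already been tallied into a 2x2 table, uses an uncorrected Pearson chi-square test with one degree of freedom, and the function name and example counts are hypothetical.

```python
import math


def allelic_association(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Compare allele frequencies between cases and controls.

    Arguments are allele counts (two alleles per diploid subject).
    Returns (odds_ratio, chi_square, p_value) for a 2x2 table,
    using Pearson's chi-square with 1 degree of freedom and no
    continuity correction. Assumes every cell count is non-zero.
    """
    n = case_alt + case_ref + ctrl_alt + ctrl_ref
    cases = case_alt + case_ref
    controls = ctrl_alt + ctrl_ref
    alt = case_alt + ctrl_alt
    ref = case_ref + ctrl_ref

    # Shortcut formula for the Pearson chi-square of a 2x2 table.
    cross_diff = case_alt * ctrl_ref - case_ref * ctrl_alt
    chi_square = n * cross_diff ** 2 / (cases * controls * alt * ref)

    # Odds of carrying the alternate allele in cases versus controls.
    odds_ratio = (case_alt * ctrl_ref) / (case_ref * ctrl_alt)

    # Upper tail of a chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(chi_square / 2))
    return odds_ratio, chi_square, p_value


# Hypothetical data: 1,000 cases and 1,000 controls (2,000 alleles each).
odds_ratio, chi_square, p_value = allelic_association(600, 1400, 500, 1500)
print(f"OR = {odds_ratio:.3f}, chi2 = {chi_square:.2f}, p = {p_value:.2g}")
```

A small p-value from a single test of this kind is not strong evidence on its own; as noted above, many such associations have failed to replicate in larger samples.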

Selection

Suitable candidate genes are generally selected based on known biological, physiological, or functional relevance to the disease in question. This approach is limited by its reliance on existing knowledge about known or theoretical biology of disease. However, molecular tools are allowing insight into disease mechanisms and pinpointing potential regions of interest in the genome. Genome-wide association studies (GWAS) and quantitative trait locus (QTL) mapping examine common variation across the entire genome, and as such can detect a new region of interest that is in or near a potential candidate gene. Microarray data allow researchers to examine differential gene expression between cases and controls, and can help pinpoint new potential genes of interest.^[11]

The great variability between organisms can sometimes make it difficult to distinguish normal variation in single-nucleotide polymorphisms (SNPs) from disease-associated variation in a candidate gene.^[12] When analyzing large amounts of data, several other factors can help identify the most probable variant. These include the prioritization of SNPs, the relative risk of functional change in genes, and linkage disequilibrium among SNPs.
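
Linkage disequilibrium between two SNPs is commonly summarized by the coefficient D and the squared correlation r². The sketch below is illustrative only: it assumes phased haplotype counts for two biallelic SNPs are available (in practice haplotypes usually have to be estimated from genotypes), and the function name and counts are hypothetical.

```python
def linkage_disequilibrium(n_AB, n_Ab, n_aB, n_ab):
    """Return (D, r_squared) for two biallelic SNPs.

    Arguments are counts of the four phased haplotypes, where A/a are
    the alleles at the first SNP and B/b the alleles at the second.
    """
    n = n_AB + n_Ab + n_aB + n_ab
    p_AB = n_AB / n            # frequency of the A-B haplotype
    p_A = (n_AB + n_Ab) / n    # frequency of allele A
    p_B = (n_AB + n_aB) / n    # frequency of allele B

    # D measures the departure from independent assortment of alleles.
    d = p_AB - p_A * p_B

    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    if denom == 0:
        raise ValueError("r^2 is undefined when either SNP is monomorphic")
    return d, d * d / denom


# Hypothetical haplotype counts from 100 chromosomes.
d, r_squared = linkage_disequilibrium(40, 10, 10, 40)
print(f"D = {d:.3f}, r^2 = {r_squared:.3f}")
```

When r² between a genotyped SNP and a neighboring variant is high, an association signal at one cannot easily be attributed to either of them without further evidence.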

In addition, the availability of genetic information through online databases enables researchers to mine existing data and web-based resources for new candidate gene targets.^[13] Many online databases are available to research genes across species.

Gene is one such database; it provides access to information about phenotypes, pathways, and variations of many genes across species.
When examining functional relationships between genes in pathways, the Gene Ontology Consortium can help map these relationships. The GO Project describes gene products in a species-independent manner along three aspects: biological processes, cellular components, and molecular functions. This information can extend a priori knowledge of a pathway and thus help in choosing the most likely candidate gene involved.
ToppGene is another useful database that allows users to prioritize candidate genes using functional annotations or network analysis.^[14] It aids researchers in selecting a subset of likely candidate genes from larger sets of candidate genes, such as those discovered through high-throughput genome technologies.
Lynx is an integrated systems biology platform that allows users to prioritize candidate genes using both functional annotations and gene pairwise association networks.^[15] Lynx provides two prioritization tools, Cheetoh^[16] and PINTA,^[17] to help users select candidate genes from the whole genome based on their relevance to an input gene list, which can be a list of known genes contributing to a certain disease or phenotype, or a list of differentially expressed genes from next-generation RNA sequencing.

Prior to the candidate-gene approach

Before the candidate-gene approach was fully developed, various other methods were used to identify genes linked to disease states. These methods relied on genetic linkage analysis and positional cloning through the use of a genetic screen, and were effective at identifying risk genes in Mendelian diseases.^[18] ^[19] However, these methods are less useful when studying complex diseases, for several reasons:^[18]

Complex diseases tend to vary in both age of onset and severity. This can be due to variation in penetrance and expressivity.^[20] For most human diseases, variable expressivity of the disease phenotype is the norm. This makes it more difficult to select one specific age group or phenotypic marker for study.^[18]
The origins of complex disease involve many biological pathways, some of which may differ between disease phenotypes.^[18]
Most importantly, complex diseases often exhibit genetic heterogeneity: multiple genes can interact to produce one disease state. Often, each single gene is only partially responsible for the phenotype produced and for the overall risk of the disorder.^[18] ^[21]

Criticisms

A candidate gene study seeks to make full use of the available data while minimizing the chance of false-positive or false-negative results.^[18] Because this balance can be difficult to strike, there are several criticisms of the candidate gene approach that are important to understand before beginning such a study. For instance, the candidate-gene approach has been shown to produce a high rate of false positives,^[22] which requires that the findings of single genetic associations be treated with great caution.^[23]
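
The link between low statistical power, low prior probability, and a high false-positive rate can be illustrated with the standard positive predictive value calculation. This is a sketch under simplifying assumptions (a single test, no bias or multiple testing), and the prior, power, and significance values below are hypothetical rather than estimates from the cited literature.

```python
def positive_predictive_value(prior, power, alpha):
    """Probability that a statistically significant association is true.

    prior: probability, before the study, that the tested variant is
           truly associated with the trait
    power: probability of detecting a true association
    alpha: significance threshold (false-positive rate under the null)
    """
    true_positives = power * prior
    false_positives = alpha * (1 - prior)
    return true_positives / (true_positives + false_positives)


# Hypothetical scenarios, for illustration only.
weak_study = positive_predictive_value(prior=0.001, power=0.2, alpha=0.05)
strong_study = positive_predictive_value(prior=0.5, power=0.8, alpha=0.05)
print(f"low prior, low power:   PPV = {weak_study:.4f}")
print(f"high prior, high power: PPV = {strong_study:.4f}")
```

Under these assumed numbers, a significant result from an underpowered test of an unlikely hypothesis is far more likely to be a false positive than a true finding, which is consistent with the caution urged above.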

One critique is that findings of association within candidate-gene studies have not been easily replicated in follow-up studies.^[24] For instance, a 2019 investigation of 18 well-studied candidate genes for depression (10 publications or more each) failed to identify any significant association with depression, despite using samples orders of magnitude larger than those from the original publications.^[25] In addition to statistical issues (e.g. underpowered studies), population stratification has often been blamed for this inconsistency; caution must therefore also be taken with regard to the criteria that define a given phenotype, as well as other variations in study design.^[18]

Additionally, because these studies incorporate a priori knowledge, some critics argue that current knowledge is not sufficient to make valid predictions. The results of these "hypothesis-driven" approaches therefore depend on the ability to select plausible candidates from the genome, whereas a hypothesis-free approach does not.

Use in research studies

One of the earliest successes using the candidate gene approach was finding a single base mutation in the non-coding region of APOC3 (the apolipoprotein C3 gene) that was associated with higher risks of hypertriglyceridemia and atherosclerosis.^[26] In a study by Kim et al., genes linked to the obesity trait in both pigs and humans were discovered using comparative genomics and chromosomal heritability.^[27] By using these two methods, the researchers were able to overcome the criticism that candidate gene studies are solely focused on prior knowledge. Comparative genomics was carried out by examining both human and pig quantitative trait loci through a method known as genome-wide complex trait analysis (GCTA), which allowed the researchers to map genetic variance to specific chromosomes. Heritability estimates could then indicate which chromosomal regions harbored phenotypic variation, narrowing the search to candidate markers and genes within those regions. Other studies also use computational methods to find candidate genes in a widespread, complementary way, such as a study by Tiffin et al. of genes linked to type 2 diabetes.^[12]

Many studies have similarly used candidate genes as part of a multi-disciplinary approach to examining a trait or phenotype. One example of manipulating candidate genes can be seen in a study by Martin E. Feder on heat-shock proteins and their function in Drosophila melanogaster.^[28] Feder designed a holistic approach to study Hsp70, a candidate gene that was hypothesized to play a role in how an organism adapts to stress. D. melanogaster is a highly useful model organism for studying this trait because it supports a diverse range of genetic approaches for studying a candidate gene. The approaches taken in this study included both genetically modifying the candidate gene (using site-specific homologous recombination and the expression of various proteins) and examining the natural variation of Hsp70. Feder concluded that the results of these studies gave a multi-faceted view of Hsp70.

The manipulation of candidate genes is also seen in Caspar C. Chater's study of the origin and function of stomata in Physcomitrella patens, a moss. PpSMF1, PpSMF2 and PpSCRM1 were the three candidate genes that were knocked down by homologous recombination to reveal any changes in the development of stomata. From the knockdown experiment, Chater observed that PpSMF1 and PpSCRM1 were responsible for stomata development in P. patens.^[29] By engineering and modifying these candidate genes, the researchers were able to confirm the ways in which the genes were linked to a change in phenotype. This was complemented by examining the natural genome structure to understand the natural and historical context in which these phenotypes operate.

Notes and References

Kwon JM, Goate AM . The candidate gene approach . Alcohol Research & Health . 24 . 3 . 164–168 . 2000 . 11199286 . 6709736 .
Zhu M, Zhao S . Candidate gene identification approach: progress and challenges . International Journal of Biological Sciences . 3 . 7 . 420–427 . October 2007 . 17998950 . 2043166 . 10.7150/ijbs.3.420 .
Johnson EC, Border R, Melroy-Greif WE, de Leeuw CA, Ehringer MA, Keller MC . No Evidence That Schizophrenia Candidate Genes Are More Associated With Schizophrenia Than Noncandidate Genes . Biological Psychiatry . 82 . 10 . 702–708 . November 2017 . 28823710 . 5643230 . 10.1016/j.biopsych.2017.06.033 .
Chabris CF, Hebert BM, Benjamin DJ, Beauchamp J, Cesarini D, van der Loos M, Johannesson M, Magnusson PK, Lichtenstein P, Atwood CS, Freese J, Hauser TS, Hauser RM, Christakis N, Laibson D . Most reported genetic associations with general intelligence are probably false positives . Psychological Science . 23 . 11 . 1314–1323 . September 2012 . 23012269 . 3498585 . 10.1177/0956797611435528 .
Bosker FJ, Hartman CA, Nolte IM, Prins BP, Terpstra P, Posthuma D, van Veen T, Willemsen G, DeRijk RH, de Geus EJ, Hoogendijk WJ, Sullivan PF, Penninx BW, Boomsma DI, Snieder H, Nolen WA . Poor replication of candidate genes for major depressive disorder using genome-wide association data . Molecular Psychiatry . 16 . 5 . 516–532 . May 2011 . 20351714 . 10.1038/mp.2010.38 .
Border R, Johnson EC, Evans LM, Smolen A, Berley N, Sullivan PF, Keller MC . No Support for Historical Candidate Gene or Candidate Gene-by-Interaction Hypotheses for Major Depression Across Multiple Large Samples . The American Journal of Psychiatry . 176 . 5 . 376–387 . May 2019 . 30845820 . 6548317 . 10.1176/appi.ajp.2018.18070881 .
Duncan LE, Keller MC . A critical review of the first 10 years of candidate gene-by-environment interaction research in psychiatry . The American Journal of Psychiatry . 168 . 10 . 1041–1049 . October 2011 . 21890791 . 3222234 . 10.1176/appi.ajp.2011.11020191 .
Culverhouse RC, Saccone NL, Horton AC, Ma Y, Anstey KJ, Banaschewski T, Burmeister M, Cohen-Woods S, Etain B, Fisher HL, Goldman N, Guillaume S, Horwood J, Juhasz G, Lester KJ, Mandelli L, Middeldorp CM, Olié E, Villafuerte S, Air TM, Araya R, Bowes L, Burns R, Byrne EM, Coffey C, Coventry WL, Gawronski KA, Glei D, Hatzimanolis A, Hottenga JJ, Jaussent I, Jawahar C, Jennen-Steinmetz C, Kramer JR, Lajnef M, Little K, Zu Schwabedissen HM, Nauck M, Nederhof E, Petschner P, Peyrot WJ, Schwahn C, Sinnamon G, Stacey D, Tian Y, Toben C, Van der Auwera S, Wainwright N, Wang JC, Willemsen G, Anderson IM, Arolt V, Åslund C, Bagdy G, Baune BT, Bellivier F, Boomsma DI, Courtet P, Dannlowski U, de Geus EJ, Deakin JF, Easteal S, Eley T, Fergusson DM, Goate AM, Gonda X, Grabe HJ, Holzman C, Johnson EO, Kennedy M, Laucht M, Martin NG, Munafò MR, Nilsson KW, Oldehinkel AJ, Olsson CA, Ormel J, Otte C, Patton GC, Penninx BW, Ritchie K, Sarchiapone M, Scheid JM, Serretti A, Smit JH, Stefanis NC, Surtees PG, Völzke H, Weinstein M, Whooley M, Nurnberger JI, Breslau N, Bierut LJ . Collaborative meta-analysis finds no evidence of a strong interaction between stress and 5-HTTLPR genotype contributing to the development of depression . Molecular Psychiatry . 23 . 1 . 133–142 . January 2018 . 28373689 . 5628077 . 10.1038/mp.2017.44 .
Farrell MS, Werge T, Sklar P, Owen MJ, Ophoff RA, O'Donovan MC, Corvin A, Cichon S, Sullivan PF . Evaluating historical candidate genes for schizophrenia . Molecular Psychiatry . 20 . 5 . 555–562 . May 2015 . 25754081 . 4414705 . 10.1038/mp.2015.16 .
Risch N, Herrell R, Lehner T, Liang KY, Eaves L, Hoh J, Griem A, Kovacs M, Ott J, Merikangas KR . Interaction between the serotonin transporter gene (5-HTTLPR), stressful life events, and risk of depression: a meta-analysis . JAMA . 301 . 23 . 2462–71 . June 2009 . 19531786 . 2938776 . 10.1001/jama.2009.878 .
Wayne ML, McIntyre LM . Combining mapping and arraying: An approach to candidate gene identification . Proceedings of the National Academy of Sciences of the United States of America . 99 . 23 . 14903–14906 . November 2002 . 12415114 . 137517 . 10.1073/pnas.222549199 . 2002PNAS...9914903W .
Tiffin N, Adie E, Turner F, Brunner HG, van Driel MA, Oti M, Lopez-Bigas N, Ouzounis C, Perez-Iratxeta C, Andrade-Navarro MA, Adeyemo A, Patti ME, Semple CA, Hide W . Computational disease gene identification: a concert of methods prioritizes type 2 diabetes and obesity candidate genes . Nucleic Acids Research . 34 . 10 . 3067–3081 . 2006 . 16757574 . 1475747 . 10.1093/nar/gkl381 .
Zhu M, Zhao S . Candidate gene identification approach: progress and challenges . International Journal of Biological Sciences . 3 . 7 . 420–427 . October 2007 . 17998950 . 2043166 . 10.7150/ijbs.3.420 .
Chen J, Bardes EE, Aronow BJ, Jegga AG . ToppGene Suite for gene list enrichment analysis and candidate gene prioritization . Nucleic Acids Research . 37 . Web Server issue . W305–W311 . July 2009 . 19465376 . 2703978 . 10.1093/nar/gkp427 .
Sulakhe D, Balasubramanian S, Xie B, Feng B, Taylor A, Wang S, Berrocal E, Dave U, Xu J, Börnigen D, Gilliam TC, Maltsev N . Lynx: a database and knowledge extraction engine for integrative medicine . Nucleic Acids Research . 42 . Database issue . D1007–D1012 . January 2014 . 24270788 . 3965040 . 10.1093/nar/gkt1166 .
Xie B, Agam G, Balasubramanian S, Xu J, Gilliam TC, Maltsev N, Börnigen D . Disease gene prioritization using network and feature . Journal of Computational Biology . 22 . 4 . 313–323 . April 2015 . 25844670 . 4808289 . 10.1089/cmb.2015.0001 .
Nitsch D, Gonçalves JP, Ojeda F, de Moor B, Moreau Y . Candidate gene prioritization by network analysis of differential expression using machine learning approaches . BMC Bioinformatics . 11 . 1 . 460 . September 2010 . 20840752 . 2945940 . 10.1186/1471-2105-11-460 .
Tabor HK, Risch NJ, Myers RM . Candidate-gene approaches for studying complex genetic traits: practical considerations . Nature Reviews. Genetics . 3 . 5 . 391–397 . May 2002 . 11988764 . 10.1038/nrg796 . 23314997 .
Teixeira LV, Lezirovitz K, Mandelbaum KL, Pereira LV, Perez AB . Candidate gene linkage analysis indicates genetic heterogeneity in Marfan syndrome . Brazilian Journal of Medical and Biological Research . 44 . 8 . 793–800 . August 2011 . 21789464 . 10.1590/s0100-879x2011007500095 .
Lobo I . Same genetic mutation, different genetic disease phenotype . Nature Education . 1 . 1 . 64 . 2008 .
Gizer IR, Ficks C, Waldman ID . Candidate gene studies of ADHD: a meta-analytic review . Human Genetics . 126 . 1 . 51–90 . July 2009 . 19506906 . 10.1007/s00439-009-0694-x . 166017 .
Border R, Johnson EC, Evans LM, Smolen A, Berley N, Sullivan PF, Keller MC . No Support for Historical Candidate Gene or Candidate Gene-by-Interaction Hypotheses for Major Depression Across Multiple Large Samples . The American Journal of Psychiatry . 176 . 5 . 376–387 . May 2019 . 30845820 . 6548317 . 10.1176/appi.ajp.2018.18070881 . 0002-953X .
Sullivan PF . Spurious genetic associations . Biological Psychiatry . 61 . 10 . 1121–1126 . May 2007 . 17346679 . 10.1016/j.biopsych.2006.11.010 . 35033987 .
Hutchison KE, Stallings M, McGeary J, Bryan A . Population stratification in the candidate gene study: fatal threat or red herring? . Psychological Bulletin . 130 . 1 . 66–79 . January 2004 . 14717650 . 10.1037/0033-2909.130.1.66 .
Border R, Johnson EC, Evans LM, Smolen A, Berley N, Sullivan PF, Keller MC . No Support for Historical Candidate Gene or Candidate Gene-by-Interaction Hypotheses for Major Depression Across Multiple Large Samples . The American Journal of Psychiatry . 176 . 5 . 376–387 . May 2019 . 30845820 . 6548317 . 10.1176/appi.ajp.2018.18070881 .
Rees A, Shoulders CC, Stocks J, Galton DJ, Baralle FE . DNA polymorphism adjacent to human apoprotein A-1 gene: relation to hypertriglyceridaemia . Lancet . 1 . 8322 . 444–446 . February 1983 . 6131168 . 10.1016/s0140-6736(83)91440-x . 29511911 .
Kim J, Lee T, Kim TH, Lee KT, Kim H . An integrated approach of comparative genomics and heritability analysis of pig and human on obesity trait: evidence for candidate genes on human chromosome 2 . BMC Genomics . 13 . 711 . December 2012 . 23253381 . 3562524 . 10.1186/1471-2164-13-711 .
Feder ME . Engineering Candidate Genes in Studies of Adaptation: The Heat-Shock Protein Hsp70 in Drosophila melanogaster . The American Naturalist . 154 . S1 . S55–S66 . July 1999 . 29586709 . 10.1086/303283 . 4394996 .
Chater CC, Caine RS, Tomek M, Wallace S, Kamisugi Y, Cuming AC, Lang D, MacAlister CA, Casson S, Bergmann DC, Decker EL, Frank W, Gray JE, Fleming A, Reski R, Beerling DJ . Origin and function of stomata in the moss Physcomitrella patens . Nature Plants . 2 . 12 . 16179 . November 2016 . 27892923 . 5131878 . 10.1038/nplants.2016.179 .