Phenome-wide association study
In genetics and genetic epidemiology, a phenome-wide association study, abbreviated PheWAS, is a study design in which single-nucleotide polymorphisms (SNPs) or other types of DNA variants are tested for association with a large number of different phenotypes.[1] The aim of PheWASs is to examine the causal linkage between known sequence differences and any type of trait, including molecular, biochemical, cellular, and especially clinical diagnoses and outcomes.[2] [3] [4] It is a complementary approach to the genome-wide association study, or GWAS, methodology.[5] A fundamental difference between GWAS and PheWAS designs is the direction of inference: in a PheWAS it is from exposure (the DNA variant) to many possible outcomes, that is, from SNPs to differences in phenotypes and disease risk. In a GWAS, the direction of analysis is from one or a few phenotypes to many possible DNA variants. The approach has proven useful in rediscovering previously reported genotype-phenotype associations, as well as in identifying new ones.[6]
The PheWAS approach was originally developed to take advantage of the widespread availability of anonymized human clinical electronic health record (EHR) data with matched genotype data, using phenotypes defined by groupings of International Classification of Diseases (ICD) codes called phecodes.[7] Massive genome and phenome data sets assembled for model organisms have also proved effective for PheWAS.[8] PheWASs have also been conducted using data from existing epidemiological studies. In 2010, a proof-of-concept PheWAS was published based on EHR billing codes from a single study site.[9] Though this study was generally underpowered, its results suggested the potential existence of new associations between multiple phenotypes, possibly due to a common underlying cause. This paper also coined the abbreviation "PheWAS".[10] As of 2019, PheWAS in the EHR has been conducted using ICD-9-CM,[11] ICD-10, and ICD-10-CM[12] diagnosis codes.
Methods
PheWAS grew out of the increasing use of electronic medical records (EMRs) in clinical practice and patient care. One of the main components of an EMR system is the set of International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9-CM, here ICD9) codes, used as a tool for medical billing records. This system includes information on 14,000 diseases binned into hierarchical codes. This phenotypic information is the basis of a PheWAS, which associates a genetic variant (or a combination of variants) with a wide range of phenotypes.
Most PheWASs divide the cohort into two groups for each ICD9 code: individuals who have the code are considered “cases”, while individuals who do not have it are treated as “controls”.[13] Starting from a given genetic variant (typically a SNP), a PheWAS systematically tests how the genotype is associated with each phenotype. From the variant data, the genotype distribution among cases and controls is calculated and compared with a chi-squared test, followed by Fisher's exact test to calculate the P-value, which indicates how strongly a genotype is associated with a given phenotype from the EMR.[14] A Bonferroni correction is then often applied to account for the multiple comparisons made when calculating the P-values.
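The test for a single variant and a single code can be sketched as follows. This is a minimal illustration, not the procedure of any particular study: the carrier/non-carrier coding, the 2×2 counts, and the function names are hypothetical, and only the Fisher's exact step and the Bonferroni threshold are shown, implemented with the Python standard library.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact P-value for the 2x2 table [[a, b], [c, d]].

    Rows (assumed coding): carriers / non-carriers of the variant.
    Columns: cases (have the code) / controls (do not have it).
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of seeing x carriers among the cases.
        return comb(row1, x) * comb(row2, col1 - x) / total

    observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Add up every table that is no more probable than the observed one.
    p_value = sum(
        p for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= observed * (1 + 1e-9)
    )
    return min(1.0, p_value)


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold after Bonferroni correction."""
    return alpha / n_tests


# Hypothetical counts: 8 carrier cases, 2 carrier controls,
# 1 non-carrier case, 5 non-carrier controls; 1,000 codes tested.
p_value = fisher_exact_two_sided(8, 2, 1, 5)
is_significant = p_value < bonferroni_threshold(0.05, 1000)
```

In a full scan the same test would be repeated for every phenotype code, and `n_tests` would be the number of codes actually tested.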
Proof of Concept
The first PheWAS was performed on 6,000 European-American individuals, with 5 SNPs of interest picked for validation, among them rs1333049, rs2200733, rs3135388, and rs6457620. Quality control was done by examining marker and sample genotyping efficiency, allele frequency calculations, and Hardy-Weinberg equilibrium tests.
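Two of the quality-control steps named above, allele frequency calculation and a Hardy-Weinberg equilibrium test, can be sketched for one biallelic marker. This is an assumed textbook formulation (a one-degree-of-freedom chi-squared goodness-of-fit test), not necessarily the exact procedure used in the study; the function names and genotype counts are hypothetical.

```python
from math import erfc, sqrt
from typing import Tuple


def allele_frequency(n_aa: int, n_ab: int, n_bb: int) -> float:
    """Frequency of allele A from counts of the AA, AB and BB genotypes."""
    n = n_aa + n_ab + n_bb
    return (2 * n_aa + n_ab) / (2 * n)


def hwe_chi_square(n_aa: int, n_ab: int, n_bb: int) -> Tuple[float, float]:
    """Return (chi-squared statistic, P-value) for deviation from HWE."""
    n = n_aa + n_ab + n_bb
    p = allele_frequency(n_aa, n_ab, n_bb)
    q = 1.0 - p
    if p == 0.0 or q == 0.0:
        # A monomorphic marker cannot deviate from the expected proportions.
        return 0.0, 1.0
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-squared distribution with 1 degree of freedom.
    return chi2, erfc(sqrt(chi2 / 2.0))


# Hypothetical genotype counts for one marker.
chi2, p_value = hwe_chi_square(30, 40, 30)
```

A marker whose P-value falls below a chosen cut-off would typically be flagged for review or excluded; the cut-off is a study-specific choice.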
This initial PheWAS aimed to examine the impact of genetic variants across various phenotypes. Since the ICD9 was not specifically designed for research purposes, the authors devised a new way to simplify the codes for genetic studies. Specifically, three modifications were made to the ICD9:
- First, three-digit codes for diseases that arise from the same or a similar origin were combined. For example, tuberculosis has three subtypes, and all three were merged into one case group, 010.
- Second, a fourth-digit identifier was added for phenotypes that are clinically distinct but share the same code. An example is Type I and Type II diabetes, two clinically distinct phenotypes that both fall under ICD9 code ‘250’; an additional fourth digit differentiates the two.
- Lastly, codes deemed uninformative for genotype-phenotype analysis were ignored. Codes for foreign object contamination, non-specific symptoms, or non-specific laboratory results fall under this category.
As one example of its successes, this PheWAS showed evidence of a strong association between rs3135388 and multiple sclerosis (MS), an association that had been reported previously. Twenty-two other diseases also demonstrated significant associations with P < 0.05.
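The three modifications listed above can be sketched as a small mapping function. The rules are illustrative only: the merged and excluded prefixes are a tiny hypothetical subset, the group labels `250.1` and `250.2` are assumed names, and falling back to the three-digit prefix for unlisted codes is an assumption of this sketch rather than the mapping used in the study.

```python
from typing import Optional

# Illustrative subset: three tuberculosis codes merged into one case group.
MERGED_PREFIXES = {"010": "010", "011": "010", "012": "010"}
# Illustrative subset: non-specific symptom and laboratory-finding codes.
EXCLUDED_PREFIXES = {"780", "790"}


def phenotype_group(icd9_code: str) -> Optional[str]:
    """Map a raw ICD9 code to a simplified group, or None if it is ignored."""
    prefix, _, suffix = icd9_code.partition(".")
    if prefix in EXCLUDED_PREFIXES:
        return None  # third rule: uninformative codes are dropped
    if prefix == "250" and len(suffix) == 2:
        # Second rule: split diabetes by type. In ICD-9-CM the fifth digit
        # is 1 or 3 for type I and 0 or 2 for type II.
        if suffix[1] in "13":
            return "250.1"  # assumed label for type I
        if suffix[1] in "02":
            return "250.2"  # assumed label for type II
    # First rule: merge related three-digit codes; otherwise keep the prefix.
    return MERGED_PREFIXES.get(prefix, prefix)


# Hypothetical billing codes from one patient record.
groups = {phenotype_group(code) for code in ["011.90", "250.01", "780.6"]}
groups.discard(None)
```

A patient would then count as a case for every group left in `groups` and as a control for the others.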
Applications
Pleiotropy Study
One of the main advantages of a PheWAS is its potential to identify genomic variants with pleiotropic properties. Understanding cross-phenotype (CP) associations, where one genetic variant affects two or more independent phenotypes, is the key to understanding the pleiotropic effect. One study of pleiotropic effects began by obtaining summary genotype and phenotype data from the Population Architecture using Genomics and Epidemiology (PAGE) study sites. After several quality control and data organization steps, standard logistic or linear regression was performed depending on the type of phenotype (logistic for binary phenotypes, linear for continuous ones). Continuous phenotypes were log-transformed before their association with the SNPs was calculated.
Generally, there are two types of results from a PheWAS:
- A result can be considered 'expected' if the genotype-phenotype association has previously been observed and reported. These cases can validate the legitimacy of a PheWAS and act as a positive control. For instance, diabetes-related phenotypes such as hyperinsulinemia, obesity, and fasting glucose level have been shown by PheWAS to be associated with type II diabetes.
- A 'novel' result is of more interest, since it demonstrates the power of PheWAS to discover associations that have not been reported before. If two disease phenotypes were not previously known to be related and a single SNP is shown to be associated with both, it is a good indication that a pleiotropic effect is present. An example of such a discovery is a SNP already known to be associated with diabetes that turns out to also be significantly associated with arthritis.
Even when novel associations between phenotypes are discovered, further biological studies are necessary to determine whether they actually reflect the underlying biological system.[15]
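For a continuous phenotype, the log-transform and linear regression step can be sketched as an ordinary least-squares fit. The additive genotype coding (0, 1, or 2 copies of the tested allele), the function name, and the example data are assumptions made for illustration; a real analysis would also adjust for covariates and report a standard error and P-value.

```python
from math import exp, log
from typing import Sequence, Tuple


def fit_log_phenotype(
    dosages: Sequence[int], values: Sequence[float]
) -> Tuple[float, float]:
    """Least-squares fit of log(phenotype) on allele dosage.

    Returns (slope, intercept); the slope is the estimated change in the
    log-transformed phenotype per additional copy of the allele.
    """
    y = [log(v) for v in values]  # log-transform the continuous phenotype
    n = len(y)
    mean_x = sum(dosages) / n
    mean_y = sum(y) / n
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    if sxx == 0:
        raise ValueError("all individuals have the same genotype")
    sxy = sum((x - mean_x) * (yi - mean_y) for x, yi in zip(dosages, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical data generated so that log(value) = 1.0 + 0.5 * dosage.
example_dosages = [0, 1, 2, 0, 1, 2]
example_values = [exp(1.0 + 0.5 * g) for g in example_dosages]
slope, intercept = fit_log_phenotype(example_dosages, example_values)
```

Because the example data contain no noise, the fit recovers a slope of about 0.5 and an intercept of about 1.0.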
Drug Response Variability
PheWAS has also successfully highlighted differences in drug response among individuals. A quantitative PheWAS was done to identify variation in thiopurine response.[16] The EMR stored the quantitative value of thiopurine S-methyltransferase (TPMT) activity (TPMTa) for patients with inflammatory bowel disease (IBD), which allowed researchers to split the patients into three categories: low TPMTa, normal TPMTa, and very high TPMTa. Very high TPMTa was found to be associated with diabetes mellitus and iron-deficiency anemia, and thiopurine therapy was three times more likely to fail in patients with very high TPMTa. Giving thiopurine therapy to patients with very high TPMTa may increase the frequency of anemia episodes. This PheWAS finding may advance personalized treatment based on a patient's own measurements: instead of receiving conventional thiopurine treatment, such IBD patients may benefit more from more intensive therapy or other approaches.
Clinical Significance
PheWAS has also been applied to clinical data from HIV patients, obtained from AIDS Clinical Trials Group (ACTG) datasets from 27 different laboratories. Assessing how well PheWAS results agree with clinical trial data is important before PheWAS is used to inform clinical decisions. Forty-seven percent of the previously reported associations were successfully reproduced in this study, demonstrating the capability of this PheWAS to work with clinical data. Additionally, several pleiotropic effects were discovered using these clinical data. Specifically, a block of SNPs on chromosome 7 was associated with both LDL-C phenotypes and total cholesterol level. For clinical relevance, more research needs to be done to validate the pleiotropic effects obtained from PheWAS.
Limitations
Despite its promise, PheWAS has some limitations:
- Statistical limitation: Bonferroni correction may be prohibitively conservative, since it does not take into account correlation among the tested phenotypes in the dataset.
- ICD9-notation limitation: not every phenotype can be represented by an ICD9 code. A single ICD9 code can cover highly variable conditions, making it impossible to assess the validity of the phenotypes coded to ICD9 for all patients.
- Association limitation: when a regression analysis is performed for a variant-phenotype association, covariates such as age and sex may contribute to the resulting phenotypes. A simple regression analysis fails to take these covariates into account. Therefore, a follow-up phenotype-specific validation needs to be done, which ideally would include information about the patients' covariates.
- Validation limitation: every novel pleiotropy discovery needs further biological validation to ensure that the data-driven association is not a mere statistical coincidence.
Notes and References
- Pendergrass SA, Brown-Gentry K, Dudek SM, Torstenson ES, Ambite JL, Avery CL, Buyske S, Cai C, Fesinmeyer MD, Haiman C, Heiss G, Hindorff LA, Hsu CN, Jackson RD, Kooperberg C, Le Marchand L, Lin Y, Matise TC, Moreland L, Monroe K, Reiner AP, Wallace R, Wilkens LR, Crawford DC, Ritchie MD. The use of phenome-wide association studies (PheWAS) for exploration of novel genotype-phenotype relationships and pleiotropy discovery. Genetic Epidemiology. 35 (5): 410–422. July 2011. PMID 21594894. PMC 3116446. doi:10.1002/gepi.20589.
- Denny JC, Bastarache L, Roden DM. Phenome-Wide Association Studies as a Tool to Advance Precision Medicine. Annual Review of Genomics and Human Genetics. 17: 353–373. August 2016. PMID 27147087. PMC 5480096. doi:10.1146/annurev-genom-090314-024956.
- Bush WS, Oetjens MT, Crawford DC. Unravelling the human genome-phenome relationship using phenome-wide association studies. Nature Reviews. Genetics. 17 (3): 129–145. March 2016. PMID 26875678. doi:10.1038/nrg.2015.36. S2CID 32967414.
- Wang X, Pandey AK, Mulligan MK, Williams EG, Mozhui K, Li Z, Jovaisaite V, Quarles LD, Xiao Z, Huang J, Capra JA, Chen Z, Taylor WL, Bastarache L, Niu X, Pollard KS, Ciobanu DC, Reznik AO, Tishkov AV, Zhulin IB, Peng J, Nelson SF, Denny JC, Auwerx J, Lu L, Williams RW. Joint mouse-human phenome-wide association to test gene function and disease risk. Nature Communications. 7: 10464. February 2016. PMID 26833085. PMC 4740880. doi:10.1038/ncomms10464. Bibcode:2016NatCo...710464W.
- Hebbring SJ. The challenges, advantages and future of phenome-wide association studies. Immunology. 141 (2): 157–165. February 2014. PMID 24147732. PMC 3904236. doi:10.1111/imm.12195.
- Cronin RM, Field JR, Bradford Y, Shaffer CM, Carroll RJ, Mosley JD, Bastarache L, Edwards TL, Hebbring SJ, Lin S, Hindorff LA, Crane PK, Pendergrass SA, Ritchie MD, Crawford DC, Pathak J, Bielinski SJ, Carrell DS, Crosslin DR, Ledbetter DH, Carey DJ, Tromp G, Williams MS, Larson EB, Jarvik GP, Peissig PL, Brilliant MH, McCarty CA, Chute CG, Kullo IJ, Bottinger E, Chisholm R, Smith ME, Roden DM, Denny JC. Phenome-wide association studies demonstrating pleiotropy of genetic variants within FTO with and without adjustment for body mass index. Frontiers in Genetics. 5: 250. 2014. PMID 25177340. PMC 4134007. doi:10.3389/fgene.2014.00250.
- Bastarache L. Using Phecodes for Research with the Electronic Health Record: From PheWAS to PheRS. Annual Review of Biomedical Data Science. 4: 1–19. July 2021. PMID 34465180. PMC 9307256. doi:10.1146/annurev-biodatasci-122320-112352.
- Li H, Wang X, Rukina D, Huang Q, Lin T, Sorrentino V, Zhang H, Bou Sleiman M, Arends D, McDaid A, Luan P, Ziari N, Velázquez-Villegas LA, Gariani K, Kutalik Z, Schoonjans K, Radcliffe RA, Prins P, Morgenthaler S, Williams RW, Auwerx J. An Integrated Systems Genetics and Omics Toolkit to Probe Gene Function. Cell Systems. 6 (1): 90–102.e4. January 2018. PMID 29199021. doi:10.1016/j.cels.2017.10.016.
- Denny JC, Ritchie MD, Basford MA, Pulley JM, Bastarache L, Brown-Gentry K, Wang D, Masys DR, Roden DM, Crawford DC. PheWAS: demonstrating the feasibility of a phenome-wide scan to discover gene-disease associations. Bioinformatics. 26 (9): 1205–1210. May 2010. PMID 20335276. PMC 2859132. doi:10.1093/bioinformatics/btq126.
- Roden DM. Phenome-wide association studies: a new method for functional genomics in humans. The Journal of Physiology. 595 (12): 4109–4115. June 2017. PMID 28229460. PMC 5471509. doi:10.1113/jp273122.
- Wei WQ, Bastarache LA, Carroll RJ, Marlo JE, Osterman TJ, Gamazon ER, Cox NJ, Roden DM, Denny JC. Evaluating phecodes, clinical classification software, and ICD-9-CM codes for phenome-wide association studies in the electronic health record. PLOS ONE. 12 (7): e0175508. 2017. PMID 28686612. PMC 5501393. doi:10.1371/journal.pone.0175508. Bibcode:2017PLoSO..1275508W.
- Wu P, Gifford A, Meng X, Li X, Campbell H, Varley T, Zhao J, Carroll R, Bastarache L, Denny JC, Theodoratou E, Wei WQ. Mapping ICD-10 and ICD-10-CM Codes to Phecodes: Workflow Development and Initial Evaluation. JMIR Medical Informatics. 7 (4): e14325. November 2019. PMID 31553307. PMC 6911227. doi:10.2196/14325.
- Pendergrass SA, Ritchie MD. Phenome-Wide Association Studies: Leveraging Comprehensive Phenotypic and Genotypic Data for Discovery. Current Genetic Medicine Reports. 3 (2): 92–100. June 2015. PMID 26146598. PMC 4489156. doi:10.1007/s40142-015-0067-9.
- Robinson JR, Denny JC, Roden DM, Van Driest SL. Genome-wide and Phenome-wide Approaches to Understand Variable Drug Actions in Electronic Health Records. Clinical and Translational Science. 11 (2): 112–122. March 2018. PMID 29148204. PMC 5866959. doi:10.1111/cts.12522.
- Moore CB, Verma A, Pendergrass S, Verma SS, Johnson DH, Daar ES, Gulick RM, Haubrich R, Robbins GK, Ritchie MD, Haas DW. Phenome-wide Association Study Relating Pretreatment Laboratory Parameters With Human Genetic Variants in AIDS Clinical Trials Group Protocols. Open Forum Infectious Diseases. 2 (1): ofu113. January 2015. PMID 25884002. PMC 4396430. doi:10.1093/ofid/ofu113.
- Neuraz A, Chouchana L, Malamut G, Le Beller C, Roche D, Beaune P, Degoulet P, Burgun A, Loriot MA, Avillach P. Phenome-wide association studies on a quantitative trait: application to TPMT enzyme activity and thiopurine therapy in pharmacogenomics. PLOS Computational Biology. 9 (12): e1003405. December 2013. PMID 24385893. PMC 3873228. doi:10.1371/journal.pcbi.1003405. Bibcode:2013PLSCB...9E3405N.