Weighted correlation network analysis, also known as weighted gene co-expression network analysis (WGCNA), is a widely used data mining method, especially for studying biological networks, that is based on pairwise correlations between variables. While it can be applied to most high-dimensional data sets, it has been most widely used in genomic applications. It allows one to define modules (clusters), intramodular hubs, and network nodes with regard to module membership; to study the relationships between co-expression modules; and to compare the network topology of different networks (differential network analysis). WGCNA can be used as a data reduction technique (related to oblique factor analysis), as a clustering method (fuzzy clustering), as a feature selection method (e.g. as a gene screening method), as a framework for integrating complementary (genomic) data (based on weighted correlations between quantitative variables), and as a data exploratory technique. Although WGCNA incorporates traditional data exploratory techniques, its intuitive network language and analysis framework go beyond any single standard analysis technique. Since it uses network methodology and is well suited for integrating complementary genomic data sets, it can be interpreted as a systems biologic or systems genetic data analysis method. By selecting intramodular hubs in consensus modules, WGCNA also gives rise to network-based meta-analysis techniques.
The WGCNA method was developed by Steve Horvath, a professor of human genetics at the David Geffen School of Medicine at UCLA and of biostatistics at the UCLA Fielding School of Public Health, together with his colleagues at UCLA and (former) lab members (in particular Peter Langfelder, Bin Zhang, and Jun Dong). Much of the work arose from collaborations with applied researchers. In particular, weighted correlation networks were developed in joint discussions with cancer researchers Paul Mischel and Stanley F. Nelson and neuroscientists Daniel H. Geschwind and Michael C. Oldham, according to the acknowledgement section of the corresponding publication.
A weighted correlation network can be interpreted as a special case of a weighted network, dependency network or correlation network. Weighted correlation network analysis can be attractive for the following reasons:
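First, one defines a gene co-expression similarity measure, which is used to define the network. We denote the gene co-expression similarity measure of a pair of genes i and j by $s_{ij}$. In an unsigned network, the similarity is the absolute value of the correlation:

$$s_{ij}^{\mathrm{unsigned}} = |\mathrm{cor}(x_i, x_j)|$$

where the gene expression profiles $x_i$ and $x_j$ hold the expression values of genes i and j across the samples. A signed similarity between $x_i$ and $x_j$, which retains the sign of the correlation, can be defined as

$$s_{ij}^{\mathrm{signed}} = 0.5 + 0.5\,\mathrm{cor}(x_i, x_j)$$

Like the unsigned measure $s_{ij}^{\mathrm{unsigned}}$, the signed measure $s_{ij}^{\mathrm{signed}}$ takes values between 0 and 1. The two differ for negatively correlated genes: when $\mathrm{cor}(x_i, x_j) = -1$, the unsigned similarity equals 1, whereas the signed similarity equals 0.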
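The unsigned and signed similarity measures can be illustrated with a minimal Python sketch. It assumes the Pearson correlation as the correlation measure; the function names and the two toy expression profiles are hypothetical and chosen only for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def unsigned_similarity(x, y):
    """s_ij (unsigned) = |cor(x_i, x_j)|."""
    return abs(pearson(x, y))


def signed_similarity(x, y):
    """s_ij (signed) = 0.5 + 0.5 * cor(x_i, x_j)."""
    return 0.5 + 0.5 * pearson(x, y)


# Hypothetical expression profiles measured across four samples.
gene_a = [1.0, 2.0, 3.0, 4.0]
gene_b = [8.0, 6.0, 4.0, 2.0]  # perfectly anti-correlated with gene_a

# Up to floating-point rounding, the unsigned similarity is 1
# and the signed similarity is 0 for this pair.
print(unsigned_similarity(gene_a, gene_b))
print(signed_similarity(gene_a, gene_b))
```

The example shows why the choice matters: the unsigned measure treats strongly anti-correlated genes as highly similar, while the signed measure places them at the bottom of the scale.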
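Next, an adjacency matrix (network), $A = [a_{ij}]$, is used to quantify how strongly genes are connected to one another. $A$ is defined by thresholding the co-expression similarity matrix $S = [s_{ij}]$. Hard thresholding of $S$ yields an unweighted network in which $a_{ij} = 1$ if $s_{ij} > \tau$ and $a_{ij} = 0$ otherwise. Soft thresholding instead yields a weighted network through the power function

$$a_{ij} = (s_{ij})^{\beta},$$

where the power $\beta$ is the soft-thresholding parameter. The default values $\beta = 6$ and $\beta = 12$ are used for unsigned and signed networks, respectively. Alternatively, $\beta$ can be chosen with the scale-free topology criterion, i.e. as the smallest $\beta$ for which the network is approximately scale-free.

Since $\log(a_{ij}) = \beta \log(s_{ij})$, the weighted network adjacency is linearly related to the co-expression similarity on a logarithmic scale. A high power $\beta$ keeps high similarities as high adjacencies while pushing low similarities toward 0.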
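The difference between hard and soft thresholding can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the WGCNA package's implementation: the function names and the small similarity matrix are hypothetical, and the transformation is applied element-wise to every entry, including the diagonal.

```python
import math


def hard_threshold(similarity, tau):
    """Unweighted adjacency: 1 if s_ij > tau, else 0 (applied element-wise)."""
    return [[1 if s > tau else 0 for s in row] for row in similarity]


def soft_threshold(similarity, beta):
    """Weighted adjacency: a_ij = s_ij ** beta (applied element-wise)."""
    return [[s ** beta for s in row] for row in similarity]


# Hypothetical 3 x 3 similarity matrix with entries in [0, 1].
S = [
    [1.0, 0.9, 0.5],
    [0.9, 1.0, 0.2],
    [0.5, 0.2, 1.0],
]

hard = hard_threshold(S, tau=0.8)   # keeps only the 0.9 link off the diagonal
soft = soft_threshold(S, beta=6)    # 0.9 stays sizeable, 0.5 and 0.2 shrink sharply

# On a logarithmic scale the soft-thresholded adjacency is linear in the similarity.
assert math.isclose(math.log(soft[0][2]), 6 * math.log(S[0][2]))

for row in soft:
    print([round(a, 6) for a in row])
```

Hard thresholding discards the distinction between similarities of 0.5 and 0.2, whereas soft thresholding preserves their ordering while strongly down-weighting both.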
A major step in the module-centric analysis is to cluster genes into network modules using a network proximity measure. Roughly speaking, a pair of genes has a high proximity if it is closely interconnected. By convention, the maximal proximity between two genes is 1 and the minimum proximity is 0. Typically, WGCNA uses the topological overlap measure (TOM) as proximity, which can also be defined for weighted networks. The TOM combines the adjacency of two genes and the connection strengths these two genes share with other "third party" genes, which makes it a robust measure of network interconnectedness (proximity). This proximity is used as input for average linkage hierarchical clustering. Modules are defined as branches of the resulting cluster tree using the dynamic branch cutting approach.

Next, the genes inside a given module are summarized with the module eigengene, which can be considered the best summary of the standardized module expression data. The module eigengene of a given module is defined as the first principal component of the standardized expression profiles. Eigengenes define robust biomarkers and can be used as features in more complex predictive models, including decision trees and Bayesian networks.[1] To find modules that relate to a clinical trait of interest, module eigengenes are correlated with that trait, which gives rise to an eigengene significance measure. One can also construct co-expression networks between module eigengenes (eigengene networks), i.e. networks whose nodes are modules.

To identify intramodular hub genes inside a given module, one can use two types of connectivity measures. The first, referred to as $kME_i = \mathrm{cor}(x_i, ME)$, is defined by correlating each gene with the respective module eigengene $ME$. The second, the intramodular connectivity, is defined as the sum of a gene's adjacencies with the other genes in the module. To test whether a module is preserved in another data set, one can use network statistics such as $Z_{summary}$.
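The article describes the topological overlap measure only in words. A minimal Python sketch follows, assuming the commonly cited unsigned form of the measure, $TOM_{ij} = (\sum_{u \neq i,j} a_{iu} a_{uj} + a_{ij}) / (\min(k_i, k_j) + 1 - a_{ij})$ with connectivity $k_i = \sum_{u \neq i} a_{iu}$. The function name and the toy adjacency matrix are hypothetical, and the diagonal of the result is set to 1 by convention in this sketch.

```python
def topological_overlap(adjacency):
    """Topological overlap matrix for a symmetric adjacency with entries in [0, 1].

    Diagonal entries of the input are ignored; the output diagonal is set to 1.
    """
    n = len(adjacency)
    # Connectivity k_i: sum of a node's adjacencies to all other nodes.
    k = [sum(adjacency[i][u] for u in range(n) if u != i) for i in range(n)]
    tom = [[1.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            # Connection strength shared through "third party" nodes u.
            shared = sum(
                adjacency[i][u] * adjacency[u][j]
                for u in range(n)
                if u != i and u != j
            )
            a_ij = adjacency[i][j]
            tom[i][j] = (shared + a_ij) / (min(k[i], k[j]) + 1.0 - a_ij)
    return tom


# Hypothetical weighted network: three nodes, every pair connected with weight 0.5.
A = [
    [1.0, 0.5, 0.5],
    [0.5, 1.0, 0.5],
    [0.5, 0.5, 1.0],
]

tom = topological_overlap(A)
# For nodes 0 and 1: shared = 0.5 * 0.5 = 0.25, k = 1.0 for both nodes,
# so the overlap is (0.25 + 0.5) / (1.0 + 1.0 - 0.5) = 0.5.
print(tom[0][1])
```

Because the measure takes values between 0 and 1, one minus the overlap can serve as the dissimilarity passed to a hierarchical clustering routine.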
WGCNA has been widely used for analyzing gene expression data (i.e. transcriptional data), e.g. to find intramodular hub genes. For example, a WGCNA study revealed novel transcription factors associated with the bisphenol A (BPA) dose-response.[2]
It is often used as a data reduction step in systems genetic applications, where modules are represented by "module eigengenes". Module eigengenes can be used to correlate modules with clinical traits. Eigengene networks are co-expression networks between module eigengenes (i.e. networks whose nodes are modules). WGCNA is widely used in neuroscientific applications and for analyzing genomic data, including microarray data,[3] single-cell RNA-Seq data,[4] DNA methylation data, miRNA data, peptide counts and microbiota data (16S rRNA gene sequencing). Other applications include brain imaging data, e.g. functional MRI data.
The WGCNA R software package provides functions for carrying out all aspects of weighted network analysis (module construction, hub gene selection, module preservation statistics, differential network analysis, network statistics). The WGCNA package is available from the Comprehensive R Archive Network (CRAN), the standard repository for R add-on packages.