Author: | Michael Love Constantin Ahlmann-Eltze Kwame Forbes Simon Anders Wolfgang Huber |
Latest Release Version: | 1.40.2 |
Repo: | DESeq2 on GitHub |
Operating System: | Linux, macOS, Windows |
Platform: | R programming language |
Genre: | Bioinformatics |
License: | GNU Lesser General Public License |
Website: | DESeq2 on Bioconductor |
DESeq2 is a software package for the statistical programming language R, used in bioinformatics and computational biology. It is primarily employed to analyze high-throughput RNA sequencing (RNA-seq) count data and to identify genes that are differentially expressed between experimental conditions. DESeq2 provides statistical methods to normalize and model RNA-seq data, which makes it useful for researchers studying gene expression patterns and regulation. It is available through the Bioconductor repository.
It was first presented in 2014.[1] As of September 2023, it had been cited over 30,000 times.[2]
One of the key steps in the analysis of RNA-seq data is normalization.[3] DESeq2 uses "size factor" normalization, estimated with the median-of-ratios method, which adjusts for differences in sequencing depth between samples so that the expression values of genes are comparable across samples. DESeq2 also provides a variance-stabilizing transformation, which makes the variance approximately independent of the mean expression level.[4] The transformed values are intended for exploratory analyses such as clustering and visualization; the differential expression test itself operates on the raw counts, with the size factors entering the statistical model.
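The size factor calculation can be illustrated with a minimal sketch of the median-of-ratios idea. This is plain Python rather than DESeq2's R interface, and the function name `size_factors` and the toy count matrix are assumptions made for illustration only. Each gene's geometric mean across samples serves as a reference, and a sample's size factor is the median ratio of its counts to that reference.

```python
import math
from statistics import median

def size_factors(counts):
    """Median-of-ratios size factors (illustrative sketch, not DESeq2's code).

    counts: list of genes, each a list of raw counts, one per sample.
    """
    n_samples = len(counts[0])
    # Genes with a zero count in any sample have a geometric mean of zero
    # and are left out of the reference.
    kept = [row for row in counts if all(c > 0 for c in row)]
    # Log of each kept gene's geometric mean across samples.
    log_geo_means = [sum(math.log(c) for c in row) / n_samples for row in kept]
    factors = []
    for j in range(n_samples):
        log_ratios = [math.log(row[j]) - lgm
                      for row, lgm in zip(kept, log_geo_means)]
        factors.append(math.exp(median(log_ratios)))
    return factors

# Hypothetical toy data: sample 2 was sequenced twice as deeply as sample 1.
counts = [
    [10, 20],
    [100, 200],
    [50, 100],
    [0, 7],  # skipped when building the reference
]
sf = size_factors(counts)
normalized = [[c / s for c, s in zip(row, sf)] for row in counts]
```

Because the second sample has exactly twice the counts of the first for every gene used in the reference, its size factor comes out twice as large, and dividing by the size factors puts both samples on the same scale.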
DESeq2 models read counts with a negative binomial distribution to account for the over-dispersion commonly observed in RNA-seq data.[5] This captures variability beyond what a simple Poisson distribution, whose variance equals its mean, can explain. By estimating a dispersion parameter for each gene, DESeq2 models the variability of gene expression counts and provides more reliable estimates of differential expression.
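The difference between the two distributions can be seen from their mean-variance relationships. The sketch below uses the common parameterization of the negative binomial with mean mu and dispersion alpha, where the variance is mu + alpha * mu^2. The function names and the dispersion value 0.1 are assumptions chosen for illustration, not values taken from DESeq2.

```python
def poisson_variance(mu):
    # For a Poisson distribution the variance equals the mean.
    return mu

def nb_variance(mu, alpha):
    # Negative binomial with mean mu and dispersion alpha:
    # the extra alpha * mu^2 term is the over-dispersion.
    return mu + alpha * mu ** 2

alpha = 0.1  # hypothetical dispersion value
rows = []
for mu in (10, 100, 1000):
    rows.append((mu, poisson_variance(mu), nb_variance(mu, alpha)))

for mu, v_pois, v_nb in rows:
    print(f"mean={mu:>5}  Poisson var={v_pois:>7.1f}  NB var={v_nb:>10.1f}")
```

With a fixed dispersion the over-dispersion term grows quadratically with the mean, so the gap between the two models is largest for highly expressed genes. Setting alpha to zero recovers the Poisson variance.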
DESeq2 also supports adaptive shrinkage of effect sizes, including the "apeglm" method, which is particularly useful when dealing with small sample sizes.[6] Shrinkage pulls noisy log-fold change estimates toward zero, reducing the impact of extreme values and improving the stability of results. This is especially valuable for researchers working with few biological replicates, where estimates for genes with low counts are highly variable.
Further, DESeq2 allows users to incorporate relevant covariates into their analyses. This feature enables researchers to account for potential confounding factors, such as batch effects or experimental conditions, that can influence gene expression. By including covariates in the analysis, DESeq2 offers a more accurate assessment of the true differential expression patterns in the data.
DESeq2 is used from within R and is distributed through the Bioconductor repository.[7] The repository provides comprehensive documentation and tutorials, making the package accessible to a wide range of researchers.