DESeq2 explained

Author:Michael Love, Constantin Ahlmann-Eltze, Kwame Forbes, Simon Anders, Wolfgang Huber
Latest Release Version:1.40.2
Repo:DESeq2 on GitHub
Operating System:Linux, macOS, Windows
Platform:R programming language
Genre:Bioinformatics
License:GNU Lesser General Public License
Website:DESeq2 on Bioconductor

DESeq2 is a bioinformatics software package for the statistical programming language R. It is primarily used to analyze high-throughput RNA sequencing (RNA-seq) count data and to identify genes that are differentially expressed between experimental conditions. DESeq2 normalizes the counts and fits a statistical model to them, which makes it a widely used tool for researchers studying gene expression patterns and regulation. It is available through the Bioconductor repository.

It was first presented in 2014.[1] As of September 2023, the paper describing it has been cited over 30,000 times.[2]

Features

One of the key steps in the analysis of RNA-seq data is normalization.[3] DESeq2 estimates a "size factor" for each sample using the median-of-ratios method, which adjusts for differences in sequencing depth between samples. Dividing the raw counts by these size factors makes the expression values of genes comparable across samples, which is a prerequisite for identifying differentially expressed genes. DESeq2 also provides a variance-stabilizing transformation, which makes the variance roughly independent of the mean expression level.[4] The transformed values are intended for exploratory analyses such as clustering and principal component analysis; the differential expression test itself is run on the raw counts, with the size factors entering the model.
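The size factor normalization described above can be illustrated with a minimal sketch of the median-of-ratios calculation. It is written in plain Python for readability and is not DESeq2's own code (DESeq2 is an R package); the function name `size_factors` and the toy count matrix are assumptions made for this example.

```python
import math
from statistics import median

def size_factors(counts):
    """Median-of-ratios size factors (illustrative sketch).

    counts: list of genes, each a list of raw counts, one per sample.
    """
    n_samples = len(counts[0])
    # Use only genes with non-zero counts in every sample so the log is defined.
    log_rows = [[math.log(c) for c in row]
                for row in counts if all(c > 0 for c in row)]
    # Log of the per-gene geometric mean acts as a pseudo-reference sample.
    log_ref = [sum(row) / n_samples for row in log_rows]
    factors = []
    for j in range(n_samples):
        # Ratio of each gene in sample j to the reference, on the log scale.
        log_ratios = [row[j] - ref for row, ref in zip(log_rows, log_ref)]
        factors.append(math.exp(median(log_ratios)))
    return factors

# Toy data: sample 2 was sequenced at twice the depth of sample 1.
counts = [
    [10, 20],
    [100, 200],
    [50, 100],
    [0, 3],      # skipped when estimating size factors (contains a zero)
]
sf = size_factors(counts)
normalized = [[c / s for c, s in zip(row, sf)] for row in counts]
```

In this toy example the second size factor is twice the first, so after division the first three genes have equal normalized counts in both samples. Taking the median rather than the mean of the ratios keeps a few strongly changing genes from dominating the estimate.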

DESeq2 models read counts with the negative binomial distribution to account for the over-dispersion commonly observed in RNA-seq data.[5] A Poisson distribution forces the variance to equal the mean, which does not adequately describe the variability between biological replicates. The negative binomial distribution adds a gene-specific dispersion parameter that captures this extra variability, which leads to more reliable estimates of differential expression.
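The difference between the two distributions can be made concrete with the negative binomial mean-variance relationship, variance = mean + dispersion × mean². The sketch below is plain Python for illustration only. The replicate counts are made up, and the method-of-moments dispersion estimate is a simple stand-in chosen for clarity, not the estimation procedure DESeq2 uses.

```python
from statistics import mean, variance

def nb_variance(mu, dispersion):
    """Variance of a negative binomial with mean mu and dispersion alpha."""
    return mu + dispersion * mu ** 2

def moment_dispersion(counts):
    """Crude method-of-moments dispersion estimate (illustration only)."""
    mu = mean(counts)
    excess = variance(counts) - mu  # variance beyond what Poisson predicts
    return max(excess, 0.0) / mu ** 2

# Hypothetical counts for one gene across five replicates.
replicates = [80, 120, 95, 140, 65]

mu = mean(replicates)               # 100
observed_var = variance(replicates) # sample variance, well above the mean
alpha = moment_dispersion(replicates)

poisson_var = mu                    # Poisson: variance equals the mean
nb_var = nb_variance(mu, alpha)     # NB: matches the observed variance here
```

Here the observed variance is roughly nine times the mean, which a Poisson model cannot represent, while a dispersion of about 0.08 reproduces it.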

DESeq2 also offers shrinkage of log-fold change estimates, including an adaptive procedure known as the "apeglm" method, which is particularly useful when sample sizes are small.[6] Shrinkage pulls noisy log-fold change estimates toward zero, reducing the impact of extreme values and improving the stability of the results. This is especially valuable for researchers working with few biological replicates, where individual estimates are imprecise.
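The general idea of shrinking log-fold changes can be sketched with the simplest possible case, a zero-centered normal prior. This is explicitly not the apeglm method, which uses a heavier-tailed prior and is fitted adaptively; the function name `shrink_lfc`, the prior standard deviation, and the example numbers are all assumptions chosen for illustration.

```python
def shrink_lfc(lfc, se, prior_sd):
    """Posterior mean of a log-fold change under a N(0, prior_sd^2) prior.

    lfc: unshrunken estimate, se: its standard error.
    The estimate is scaled by a weight between 0 and 1 that is
    smaller when the standard error is larger.
    """
    weight = prior_sd ** 2 / (prior_sd ** 2 + se ** 2)
    return weight * lfc

# Two hypothetical genes with the same raw estimate but different precision.
noisy = shrink_lfc(lfc=3.0, se=1.5, prior_sd=1.0)    # few reads, large SE
precise = shrink_lfc(lfc=3.0, se=0.2, prior_sd=1.0)  # many reads, small SE
```

The noisy estimate is pulled most of the way toward zero, while the precise one is barely changed, which is the behavior the paragraph above describes.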

Further, DESeq2 allows users to incorporate covariates into the analysis through the design formula of the model. This enables researchers to account for potential confounding factors, such as batch effects or additional experimental variables, that can influence gene expression. Including such covariates helps separate the effect of the condition of interest from unrelated sources of variation.

Use

DESeq2 is used from within R and is installed from the Bioconductor repository.[7] The repository provides comprehensive documentation and tutorials, making the package accessible to a wide range of researchers.

Notes and References

  1. Love, Michael I.; Huber, Wolfgang; Anders, Simon (December 2014). "Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2". Genome Biology. 15 (12): 550. doi:10.1186/s13059-014-0550-8. PMID 25516281. PMC 4302049.
  2. Love, M. I.; Huber, W.; Anders, S. (2014). "Citation Metrics". University of Otago. Genome Biology. 15 (12): 550. doi:10.1186/s13059-014-0550-8. PMID 25516281. PMC 4302049.
  3. Evans, Ciaran; Hardin, Johanna; Stoebel, Daniel M. (28 September 2018). "Selecting between-sample RNA-Seq normalization methods from the perspective of their assumptions". Briefings in Bioinformatics. 19 (5): 776–792. doi:10.1093/bib/bbx008. PMID 28334202. PMC 6171491.
  4. "varianceStabilizingTransformation: Apply a variance stabilizing transformation (VST) to the...". rdrr.io. Retrieved 28 September 2023. Archived at https://web.archive.org/web/20230928130633/https://rdrr.io/bioc/DESeq2/man/varianceStabilizingTransformation.html on 28 September 2023.
  5. HBC Training (15 May 2020). "Gene-level differential expression analysis". Github.io. Retrieved 28 September 2023. Archived at https://web.archive.org/web/20230928131519/https://hbctraining.github.io/DGE_workshop_salmon_online/lessons/01c_RNAseq_count_distribution.html on 28 September 2023.
  6. Chipman, Hugh A.; Kolaczyk, Eric D.; McCulloch, Robert E. (December 1997). "Adaptive Bayesian Wavelet Shrinkage". Journal of the American Statistical Association. 92 (440): 1413. doi:10.2307/2965411. JSTOR 2965411.
  7. "DESeq2: An Overview of a Popular RNA-Seq Analysis Package". pluto.bio. 18 October 2021. Retrieved 27 September 2023. Archived at https://web.archive.org/web/20230927154600/https://pluto.bio/blog/deseq2-an-overview-of-popular-rna-seq-analysis-package on 27 September 2023.