SnpEff | |
Author: | Pablo Cingolani |
Released: | 2012 |
Latest Release Version: | 5.2c |
License: | MIT |
Programming Language: | Java |
SnpEff is an open-source tool that annotates variants and predicts their effects on genes using an interval forest approach. It takes predetermined variants listed in a data file that gives each nucleotide change and its position, and predicts whether the variants are deleterious. It was first developed to predict the effects of single nucleotide polymorphisms (SNPs) in Drosophila.[1] As of July 2024, the SnpEff paper had been cited 10,076 times. SnpEff has been used for various applications,[2] [3] [4] from personalized medicine[5] to profiling bacteria.[6] It is comparable to ANNOVAR and Variant Effect Predictor, although each tool uses a different nomenclature.[7] [8]
SnpEff runs on Windows, Unix and Mac systems, although the installation steps differ. On all systems, SnpEff is first downloaded as a ZIP file and decompressed;[9] it is then copied to the desired location (Windows) or installed with an additional command-line step (Unix and Mac). Once the software is installed, the user supplies a VCF or TXT file containing the following tab-separated columns: chromosome name, position, variant ID, reference allele, alternative allele, quality score, quality filter and information (INFO).
The chromosome name and position columns describe where the variant is located: the chromosome and the nucleotide position. If the variant has a previously assigned identifier (for example, rs34567), it goes in the ID column. The reference column gives the nucleotide found in the reference genome at that position; differences from the reference are recorded in the alternative column. The quality column gives a score for how reliable the variant call is, and the filter column records the result of any quality filters applied. Any other genomic information goes in the INFO column, which SnpEff modifies to hold its output.
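The column layout described above can be illustrated with a short Python sketch that splits one tab-separated data line into its eight fields. This is not part of SnpEff; the names `VcfRecord` and `parse_vcf_line` are hypothetical, and the example line is made up for illustration.

```python
from typing import NamedTuple


class VcfRecord(NamedTuple):
    """One data line of a VCF-style file (first eight columns only)."""
    chrom: str          # chromosome name
    pos: int            # nucleotide position
    variant_id: str     # previously assigned identifier, e.g. an rs number
    ref: str            # nucleotide(s) in the reference genome
    alt: str            # alternative nucleotide(s)
    qual: str           # quality score (kept as text; may be ".")
    filter_status: str  # result of quality filters, e.g. PASS
    info: str           # additional information


def parse_vcf_line(line: str) -> VcfRecord:
    """Split a tab-separated data line into its eight leading columns."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 8:
        raise ValueError(f"expected at least 8 columns, got {len(fields)}")
    chrom, pos, variant_id, ref, alt, qual, filter_status, info = fields[:8]
    return VcfRecord(chrom, int(pos), variant_id, ref, alt,
                     qual, filter_status, info)


# Made-up example line, not taken from a real dataset.
example_line = "1\t69134\trs34567\tA\tG\t50.0\tPASS\tDP=20"
record = parse_vcf_line(example_line)
print(record.chrom, record.pos, record.ref, record.alt)
```

The quality score is left as text here because a file may use a placeholder such as "." when no score is available.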
The output in the INFO column includes: the effect of the variant (stop loss, stop gain, etc.), the impact of the effect on the gene (High, Moderate, Low or Modifier), the functional class of the variant (nonsense, missense, frameshift, etc.), codon change, amino acid change, amino acid length, gene name, gene biotype (protein coding, pseudogene, rRNA, etc.[10]), coding information, transcript information, exon information and any errors or warnings detected. The effect impact is how SnpEff expresses how deleterious it predicts the variant to be. For example, a HIGH impact means that SnpEff predicts the variant has a deleterious effect on the gene.
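As a rough illustration of how such an annotation might be read back, the Python sketch below parses an INFO string and keeps only HIGH-impact effects. It assumes an `EFF=Effect(field|field|...)` layout with the fields in the order listed above; the exact layout depends on the SnpEff version and options, so the field names in `EFF_FIELDS`, the function `parse_eff` and the example string are all illustrative assumptions, not the tool's definitive format.

```python
import re
from typing import Dict, List

# Assumed field order inside the parentheses, following the list in the text.
# Real SnpEff output may differ by version; check the header of the output file.
EFF_FIELDS = [
    "impact", "functional_class", "codon_change", "aa_change", "aa_length",
    "gene_name", "biotype", "coding", "transcript", "exon",
]


def parse_eff(info: str) -> List[Dict[str, object]]:
    """Extract effect records from an INFO string with an EFF=... entry."""
    effects: List[Dict[str, object]] = []
    for entry in info.split(";"):
        if not entry.startswith("EFF="):
            continue
        # Several effects may be listed, separated by commas.
        for item in entry[len("EFF="):].split(","):
            match = re.fullmatch(r"([^()]+)\((.*)\)", item)
            if match is None:
                continue  # skip anything that is not Effect(...)
            values = match.group(2).split("|")
            effect: Dict[str, object] = {"effect": match.group(1)}
            effect.update(zip(EFF_FIELDS, values))
            # Anything beyond the assumed fields (e.g. errors, warnings).
            effect["extra"] = values[len(EFF_FIELDS):]
            effects.append(effect)
    return effects


# Made-up INFO value; gene and transcript names are placeholders.
info = ("DP=20;EFF=STOP_GAINED(HIGH|NONSENSE|Cag/Tag|Q141*|305|GENE1|"
        "protein_coding|CODING|TRANSCRIPT1|4|1)")
effects = parse_eff(info)
high_impact = [e for e in effects if e["impact"] == "HIGH"]
print(len(high_impact), high_impact[0]["effect"], high_impact[0]["gene_name"])
```

Filtering on the impact field in this way mirrors how the HIGH, MODERATE, LOW and MODIFIER categories are typically used to prioritize variants for follow-up.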
SnpEff is typically used for research and academic purposes at institutions and companies and, in some instances, for personalized medicine. However, Pablo Cingolani now recommends that ClinEff (a combination of SnpEff and SnpSift) be used for medical purposes.
SnpEff has both advantages and limitations. It can analyze all variants from the 1000 Genomes Project in less than 15 minutes and can be integrated into other tools such as Galaxy, GATK and GKNO. It can also be combined with other toolkits to narrow variant prediction parameters (for example, whitelist[11]).
SnpEff Limitations: