DRIP-seq (DRIP-sequencing) is a technology for genome-wide profiling of a type of DNA-RNA hybrid called an "R-loop".[1] DRIP-seq utilizes a sequence-independent but structure-specific antibody for DNA-RNA immunoprecipitation (DRIP) to capture R-loops for massively parallel DNA sequencing.[1]
An R-loop is a three-stranded nucleic acid structure that consists of a DNA-RNA hybrid duplex and a displaced single-stranded DNA (ssDNA).[2] R-loops are predominantly formed in cytosine-rich genomic regions during transcription[2] and are known to be involved in gene expression and immunoglobulin class switching.[1] [3] [4] They have been found in a variety of species, ranging from bacteria to mammals.[2] They are preferentially localized at CpG island promoters in human cells and at highly transcribed regions in yeast.[1] [3]
Under abnormal conditions, namely elevated production of DNA-RNA hybrids, R-loops can cause genome instability by exposing single-stranded DNA to endogenous damage caused by enzymes such as AID and APOBEC, or to chemically reactive species.[4] Understanding where and under what circumstances R-loops form across the genome is therefore crucial for understanding genome instability. R-loop characterization was initially limited to locus-specific approaches.[5] With the arrival of massively parallel sequencing technologies and derivative methods such as DRIP-seq, it became possible to survey entire genomes for R-loops.
DRIP-seq relies on the high specificity and affinity of the S9.6 monoclonal antibody (mAb) for DNA-RNA hybrids of various lengths. The S9.6 mAb was first created and characterized in 1986 and is currently used for the selective immunoprecipitation of R-loops.[6] Since then, it has been used in diverse immunoprecipitation methods for R-loop characterization.[1] [3] [7] [8] The concept behind DRIP-seq is similar to that of ChIP-sequencing, with R-loop fragments as the main immunoprecipitated material.
DRIP-seq is mainly used for genome-wide mapping of R-loops. Identifying R-loop formation sites allows the study of diverse cellular events, such as the function of R-loop formation at specific regions, the characterization of these regions, and the impact on gene expression. It can also be used to study the influence of R-loops on other processes, such as DNA replication and synthesis. DRIP-seq can also be performed on mutant cell lines deficient in genes involved in R-loop resolution.[3] Such studies provide information about the role of the mutated gene in suppressing DNA-RNA hybrid formation and, potentially, about the significance of R-loops in genome instability.
DRIP-seq was first used for genome-wide profiling of R-loops in humans, which showed widespread R-loop formation at CpG island promoters.[1] In particular, the researchers found that R-loop formation is associated with the unmethylated state of CpG islands.
DRIP-seq was later used to profile R-loop formation at transcription start and termination sites in human pluripotent Ntera2 cells.[7] This study revealed that R-loops at the 3' ends of genes may be correlated with transcription termination.
First, genomic DNA (gDNA) is extracted from the cells of interest by proteinase K treatment followed by phenol-chloroform extraction and ethanol precipitation. For yeast cells, an additional zymolyase digestion is needed to remove the cell wall before proteinase K treatment. gDNA can also be extracted by a variety of other methods, such as column-based methods.
gDNA is treated with S1 nuclease to remove undesired ssDNA and RNA, followed by ethanol precipitation to remove the S1 nuclease. The gDNA is then fragmented with restriction endonucleases, yielding double-stranded DNA (dsDNA) fragments of different sizes. Alternatively, gDNA fragments can be generated by sonication.
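The effect of restriction fragmentation on fragment sizes can be illustrated with a minimal in silico sketch. The recognition site and cut position used here (EcoRI, GAATTC, cutting after the first base) are chosen purely for illustration; the protocol described above does not specify which enzymes are used. The function name `digest` is hypothetical, and only the top strand of a linear sequence is modelled.

```python
from typing import List


def digest(sequence: str, site: str, cut_offset: int) -> List[str]:
    """Cut a linear sequence at every occurrence of a recognition site.

    `cut_offset` is the number of bases into the site at which the
    top strand is cut (an illustrative simplification: sticky ends
    and the bottom strand are ignored).
    """
    fragments = []
    start = 0
    pos = sequence.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(sequence[start:cut])
        start = cut
        pos = sequence.find(site, pos + 1)
    fragments.append(sequence[start:])  # remainder after the last cut
    return fragments


# Toy sequence with two EcoRI sites (assumed example data).
toy_gdna = "TTGAATTCAAAAGAATTCGG"
toy_fragments = digest(toy_gdna, site="GAATTC", cut_offset=1)
fragment_lengths = [len(f) for f in toy_fragments]
```

Because the positions of recognition sites are fixed by the genome sequence, fragment sizes vary from locus to locus, which is one reason different enzyme cocktails can yield different fragment populations from the same sample.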
Fragmented gDNA is incubated with the DNA-RNA structure-specific S9.6 mAb. This step is unique to the DRIP-seq protocol and relies entirely on the high specificity and affinity of the S9.6 mAb for DNA-RNA hybrids. The antibody recognizes and binds hybrid regions dispersed across the genome, allowing them to be immunoprecipitated. The S9.6 antibodies are bound to magnetic beads through specific ligands (such as protein A or protein G) on the surface of the beads. The DNA-RNA-containing fragments are thus attached to the beads by means of the antibody.
The magnetic beads are put through a series of washes to remove any gDNA not bound to them, and the DNA-RNA hybrids are then recovered by elution. To remove the antibody bound to the nucleic acid hybrids, proteinase K treatment is performed, followed by phenol-chloroform extraction and ethanol precipitation. This results in the isolation of purified DNA-RNA hybrids of different sizes.
For massively parallel sequencing of these fragments, the immunoprecipitated material is sonicated, size-selected, and ligated to barcoded oligonucleotide adaptors for cluster enrichment and sequencing.
To detect sites of R-loop formation, the hundreds of millions of sequencing reads from DRIP-seq are first aligned to a reference genome with a short-read sequence aligner. Peak-calling methods designed for ChIP-seq can then be used to evaluate DRIP signals.[1] If different cocktails of restriction enzymes were used for different DRIP-seq experiments on the same sample, consensus DRIP-seq peaks are called.[7] Typically, peaks are compared against those from a corresponding RNase H1-treated sample, which serves as a negative control because RNase H1 degrades the RNA strand of DNA-RNA hybrids.[1] [7]
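The consensus and control-comparison steps can be sketched as simple interval operations. This is a minimal illustration under stated assumptions, not the procedure used in the cited studies: peaks are assumed to be already called and represented as half-open (chromosome, start, end) tuples, consensus is taken to mean the overlapping portion of peaks found in both experiments, and any peak overlapping a peak from the RNase H1-treated sample is discarded. All function names and example coordinates are hypothetical.

```python
from typing import List, Tuple

# (chromosome, start, end) with half-open coordinates, as in BED files.
Peak = Tuple[str, int, int]


def overlaps(a: Peak, b: Peak) -> bool:
    """True if two intervals on the same chromosome share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def consensus_peaks(set_a: List[Peak], set_b: List[Peak]) -> List[Peak]:
    """Return the overlapping portions of peaks present in both peak sets."""
    shared = []
    for a in set_a:
        for b in set_b:
            if overlaps(a, b):
                shared.append((a[0], max(a[1], b[1]), min(a[2], b[2])))
    return sorted(shared)


def remove_control_peaks(peaks: List[Peak], control: List[Peak]) -> List[Peak]:
    """Drop every peak that overlaps a peak called in the control sample."""
    return [p for p in peaks if not any(overlaps(p, c) for c in control)]


# Assumed example data: two digests of one sample plus an RNase H1 control.
digest_1 = [("chr1", 100, 500), ("chr1", 900, 1200), ("chr2", 50, 300)]
digest_2 = [("chr1", 300, 700), ("chr2", 100, 250), ("chr2", 800, 900)]
rnase_h1 = [("chr2", 120, 200)]

consensus = consensus_peaks(digest_1, digest_2)
retained = remove_control_peaks(consensus, rnase_h1)
```

The nested loop is quadratic in the number of peaks, which is acceptable for a toy example; real peak sets are usually handled with sorted-interval tools instead.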
Because no other antibody-based method for R-loop immunoprecipitation exists, validating DRIP-seq results is difficult. However, results from other R-loop profiling methods, such as DRIVE-seq, may be used to assess agreement.
In addition, DRIP-seq relies on existing short-read sequencing platforms for the sequencing of R-loops, so all inherent limitations of these platforms also apply to DRIP-seq. In particular, typical short-read sequencing platforms produce uneven read coverage in GC-rich regions. Sequencing long R-loops might therefore pose a challenge, because R-loops are predominantly formed in cytosine-rich DNA regions. Moreover, GC-rich regions tend to have low sequence complexity, which makes it difficult for short-read aligners to produce unique alignments.
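One simple way to see which regions are likely to be affected is to compute GC content in windows along a sequence. The sketch below is illustrative only; the function names, window size, and toy sequence are assumptions, and no particular GC threshold for coverage bias is implied.

```python
from typing import List, Tuple


def gc_fraction(seq: str) -> float:
    """Fraction of bases in `seq` that are G or C (0.0 for an empty string)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def gc_windows(seq: str, window: int, step: int) -> List[Tuple[int, float]]:
    """GC fraction of each full-length window, as (start, fraction) pairs."""
    return [
        (start, gc_fraction(seq[start:start + window]))
        for start in range(0, len(seq) - window + 1, step)
    ]


# Toy sequence (assumed): an AT-only stretch followed by a GC-only stretch.
toy_sequence = "AAAATTTTGGGGCCCC"
profile = gc_windows(toy_sequence, window=8, step=8)
```

Windows with a high GC fraction are the ones where uneven coverage and ambiguous alignments would be expected to matter most when interpreting DRIP-seq signal.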
Although there are several other methods for the analysis and profiling of R-loop formation,[5] [9] [10] [11] [12] [13] [14] [15] [16] only a few provide coverage and robustness at the genome-wide scale.[1] [3]