Pore-C is a genomic technique that combines chromosome conformation capture (3C) with Oxford Nanopore Technologies' (ONT) long-read sequencing to characterize three-dimensional (3D) chromatin structure. To analyze the resulting concatemers, the originators of Pore-C developed an algorithm that assigns each alignment within a read to a restriction fragment; concatemers associated with more than two fragments are deemed high-order. Pore-C aims to improve on earlier chromatin conformation methods, such as Hi-C and SPRITE, by not requiring DNA amplification prior to sequencing. The technology was developed as a simpler, more easily scalable method of capturing higher-order chromatin structure and mapping regions of chromatin contact. In addition, because ONT long-read sequencing can detect DNA methylation, Pore-C can link chromatin contacts to epigenomic information. Applications of this technology include analysis of combinatorial chromatin interactions, the generation of de novo chromosome-scale assemblies, visualization of the multi-locus regions associated with histone locus bodies, and detection and resolution of structural variants.
Although the DNA within eukaryotic cells is linear, it is intricately folded and packaged to fit within each cell's nucleus. Thus, parts of the genome that are far apart in the linear DNA sequence may be close together in physical space. The 3D genome refers to how DNA is spatially organized within cells. The 3D structures found in the genome include active and inactive chromatin, chromatin loops, and topologically associating domains (TADs), all of which help regulate gene expression. In genomic and epigenomic research, chromatin structure is most often studied with 3C techniques, which quantify interactions between loci to construct a 3D map. The original 3C technique quantifies interactions between pairs of genomic loci. Derived methods, such as 4C, 5C, and Hi-C, quantify pairwise interactions across many loci. Other variations, such as ChIP-loop and ChIA-PET, combine 3C with immunoprecipitation to detect interactions mediated by a protein of interest. These techniques all involve an amplification step, most often using polymerase chain reaction (PCR). A limitation of most current 3D chromatin assays is that they are poorly suited to characterizing interactions among more than two loci, and Pore-C was developed to fill this gap. Additionally, because it does not require PCR amplification, Pore-C has a simpler workflow and is intended to be more easily scalable than previous techniques. Pore-C can also be applied to populations of cells to characterize topology polymorphisms at specific loci.
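To make the pairwise limitation concrete, the following minimal Python sketch decomposes a single three-way contact into the pairwise contacts a pairwise assay would report. The locus labels and the `pairwise_contacts` helper are hypothetical. The same set of pairs could equally arise from three separate two-way contacts, so the information that all three loci met at once is lost.

```python
from itertools import combinations

def pairwise_contacts(concatemer):
    """Decompose one multi-way contact (a list of locus labels) into pairwise contacts."""
    # Deduplicate and sort so each pair appears once, in a canonical order
    loci = sorted(set(concatemer))
    return list(combinations(loci, 2))

# One hypothetical concatemer linking three loci at once
multiway = ["A", "B", "C"]
print(pairwise_contacts(multiway))  # [('A', 'B'), ('A', 'C'), ('B', 'C')]

# Three separate two-way contacts yield exactly the same set of pairs
separate = [["A", "B"], ["B", "C"], ["A", "C"]]
pairs_from_separate = sorted(p for c in separate for p in pairwise_contacts(c))
print(pairs_from_separate == pairwise_contacts(multiway))  # True
```

A concatemer of n distinct loci produces n(n-1)/2 pairs, so the loss of information grows quickly with interaction order.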
Many methods to characterize the 3D genome are variations on 3C technology. Like other 3C-based technologies, Pore-C seeks to characterize the architecture of the 3D genome by determining which genomic loci are in close spatial proximity (within ~200 nm). Pore-C relies on the same core steps as earlier 3C-based methods: cross-linking, restriction enzyme digestion, proximity ligation, cross-link reversal, and protein degradation. However, Pore-C is distinct from many previous methods in its subsequent use of ONT long-read sequencing, which facilitates the resolution of multi-way chromatin contacts and the simultaneous detection of DNA methylation.
First, to preserve the 3D structure of the genome through subsequent steps, DNA is cross-linked to DNA-associated proteins, such as histones. Formaldehyde is used for cross-linking because it joins DNA to proteins with covalent bonds, temporarily locking the 3D genome in place. Specifically, after a series of washes with phosphate-buffered saline (PBS), cells are pelleted by centrifugation and resuspended in a formaldehyde and PBS solution. Following a short incubation, glycine is added to stop the cross-linking reaction. By quenching the excess formaldehyde, glycine prevents over-cross-linking, which helps maintain the efficiency of later steps and ensures that the cross-links can be reversed.
Cross-linking generates loops of DNA, with each loop arising from a separate locus. To capture long-range interactions between distant loci, potentially on different chromosomes, these loops are first cut and then re-joined based on proximity. Although fragments from the same loop may religate to each other, fragments from separate loops sometimes ligate together, creating chimeric sequences. The cutting and rejoining of DNA are achieved by in situ restriction enzyme digestion and proximity ligation, respectively: a restriction endonuclease cuts the DNA to create free ends, and T4 DNA ligase joins the fragments together. Ultimately, these steps link genomic loci that are close together in physical space onto contiguous DNA molecules referred to as concatemers.
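Downstream analysis needs the set of restriction fragments expected in the reference, which can be computed by an in silico digest. The Python sketch below assumes a four-cutter that recognizes GATC and cuts immediately before the G (as DpnII does); the enzyme choice, the `digest` and `ligate` helpers, and the toy sequence are illustrative assumptions, not part of any published Pore-C pipeline.

```python
import re

def digest(seq, site="GATC", cut_offset=0):
    """Return (start, end) coordinates of the restriction fragments of seq.

    cut_offset is the cut position within the recognition site
    (0 means the enzyme cuts immediately before the site, as assumed here).
    """
    # Lookahead so that overlapping recognition sites are all found
    cuts = [m.start() + cut_offset for m in re.finditer(f"(?={re.escape(site)})", seq)]
    bounds = [0] + cuts + [len(seq)]
    # Consecutive cut positions delimit fragments; drop zero-length pieces
    return [(s, e) for s, e in zip(bounds, bounds[1:]) if e > s]

def ligate(seq, fragments, order):
    """Join selected fragments end to end, mimicking a toy concatemer."""
    return "".join(seq[s:e] for s, e in (fragments[i] for i in order))

seq = "AAGATCTTTTGATCCC"  # toy reference sequence
frags = digest(seq)
print(frags)                       # [(0, 2), (2, 10), (10, 16)]
print(ligate(seq, frags, [2, 0]))  # GATCCCAA
```

The toy `ligate` ignores strand orientation and the sticky ends left by a real enzyme; it only shows how proximity ligation places non-adjacent fragments next to each other on one molecule.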
Next, to isolate DNA for sequencing, proteins bound to the DNA must be detached and degraded. Proteinase K, sodium dodecyl sulfate (SDS; a detergent), Tween-20, and nuclease-free water are added, and the reaction is heated to 56 °C in a thermocycler for optimal reaction kinetics. Proteinase K degrades proteins, and SDS acts as a denaturing agent that disrupts protein structure. This treatment breaks the covalent bonds between DNA and protein and removes potential protein contamination. DNA is then isolated and purified, typically using phenol-chloroform extraction followed by ethanol precipitation.
Pore-C concatemers undergo size selection prior to library preparation and ONT long-read sequencing. Size selection enriches for high-order interactions, which are defined as concatemers containing more than two DNA fragments. Specifically, Pore-C size selection retains DNA molecules longer than 1.5 kb, filtering out shorter concatemers that are unlikely to contain more than two fragments. Many size selection methods have been developed for ONT long-read sequencing; for example, solid-phase reversible immobilization (SPRI) size selection has been used in the Pore-C literature. Following size selection, library preparation for ONT sequencing is performed, usually with a ligation sequencing kit provided by ONT. Key steps include DNA repair and adapter ligation. The library is then loaded onto flow cells for sequencing, where a motor protein feeds each concatemer through a nanopore. DNA bases are identified by the characteristic disruptions they cause in the electric current through the pore.
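The length threshold and the definition of a high-order concatemer can be illustrated in silico. This is a minimal Python sketch with hypothetical read records; real size selection happens at the bench (for example with SPRI beads), and the `Concatemer` record and its field names are assumptions made for illustration.

```python
from dataclasses import dataclass

@dataclass
class Concatemer:
    read_id: str
    length_bp: int
    n_fragments: int  # distinct restriction fragments the read aligns to

MIN_LENGTH_BP = 1500  # mirrors the ~1.5 kb size-selection cutoff

def size_select(reads, min_length=MIN_LENGTH_BP):
    """Keep only molecules longer than min_length (in silico stand-in for bead selection)."""
    return [r for r in reads if r.length_bp > min_length]

def is_high_order(read):
    """A concatemer is high-order if it links more than two restriction fragments."""
    return read.n_fragments > 2

reads = [
    Concatemer("r1", 800, 2),   # short: removed by size selection
    Concatemer("r2", 4000, 5),  # long, five fragments: high-order
    Concatemer("r3", 2000, 2),  # long but only pairwise
]
kept = size_select(reads)
print([r.read_id for r in kept])                      # ['r2', 'r3']
print([r.read_id for r in kept if is_high_order(r)])  # ['r2']
```

As the `r3` record shows, a long molecule is not guaranteed to be high-order; size selection only raises the odds.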
Overall, bioinformatic approaches applied to Pore-C data allow the inference of pairwise and multi-way contacts between loci. Because each concatemer contains DNA sequences from different regions of the genome, a single read must be split into several separate alignments to the reference genome, which makes alignment challenging. One solution to this problem involves a bioinformatic pipeline using a greedy piece-wise algorithm. Further analysis of Pore-C results depends on the study and on what other data types are available.
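The following Python sketch shows one simple form a greedy piece-wise approach could take; it is an illustrative simplification, not the published Pore-C pipeline. It assumes an aligner has already produced candidate alignments for segments of one read, keeps the best-scoring candidates whose read intervals do not overlap, and then assigns each kept segment to a restriction fragment by binary search. The `Alignment` record, the single-chromosome reference, and all coordinates and scores are hypothetical.

```python
from bisect import bisect_right
from typing import NamedTuple

class Alignment(NamedTuple):
    q_start: int  # start of the aligned segment on the read
    q_end: int    # end of the segment on the read (exclusive)
    ref_pos: int  # reference coordinate of the segment (one chromosome assumed)
    score: float  # alignment score; higher is better

def greedy_segments(candidates, max_overlap=0):
    """Greedily keep the best-scoring alignments whose read intervals do not overlap."""
    chosen = []
    for aln in sorted(candidates, key=lambda a: a.score, reverse=True):
        clashes = any(
            min(aln.q_end, c.q_end) - max(aln.q_start, c.q_start) > max_overlap
            for c in chosen
        )
        if not clashes:
            chosen.append(aln)
    # Report the kept segments in their order along the read
    return sorted(chosen, key=lambda a: a.q_start)

def assign_fragment(ref_pos, fragment_starts):
    """Index of the restriction fragment (sorted start coordinates) containing ref_pos."""
    return bisect_right(fragment_starts, ref_pos) - 1

candidates = [
    Alignment(0, 850, 1200, 850.0),
    Alignment(850, 2100, 5000, 1100.0),
    Alignment(100, 800, 9000, 300.0),  # weaker hit overlapping the first segment
    Alignment(2100, 3000, 20000, 700.0),
]
fragment_starts = [0, 1000, 4500, 12000, 18000]  # toy in silico digest

segments = greedy_segments(candidates)
fragments = [assign_fragment(s.ref_pos, fragment_starts) for s in segments]
print(fragments)                # [1, 2, 4]
print(len(set(fragments)) > 2)  # True: a high-order concatemer
```

A greedy choice is fast but can discard a combination of lower-scoring segments that would explain the read better overall; the `max_overlap` parameter is an assumed tolerance for small overlaps at segment junctions.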
Pore-C is a relatively new method, so its applications have not yet been fully explored. A strength of Pore-C over previous methods is its ability to detect interactions among more than two genomic loci. Such high-order interactions enable the study of cellular processes, such as gene expression regulation, at a more system-level scale. With statistical methods, Pore-C data can be used to identify cooperative interactions, in which high-order interactions are observed more frequently than would be expected from their constituent pairwise contacts. In addition, because ONT long reads can detect DNA methylation, Pore-C provides an additional layer of epigenetic information to analyze. In the future, Pore-C may be applied to study how the 3D genome changes during developmental processes, such as cellular differentiation. Pore-C may also be applied to the study of cancer, in which the 3D genome is often structurally rearranged, potentially resulting in aberrant gene transcription through processes such as enhancer hijacking.
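The idea of comparing observed multi-way contacts with a pairwise expectation can be sketched in Python. This example swaps in a deliberately simple null model chosen for illustration, not the statistical method used in the Pore-C literature: given contact with locus B, loci A and C are assumed to join independently, so the expected number of A-B-C concatemers is n(AB) × n(BC) / n(B). The `cooperativity_ratio` helper and the toy data are hypothetical, and the ratio is not a significance test.

```python
def cooperativity_ratio(concatemers, a, b, c):
    """Observed / expected number of concatemers containing loci a, b and c.

    Null model (an assumption): given contact with b, loci a and c join
    independently, so E[n(abc)] = n(ab) * n(bc) / n(b).
    """
    sets = [set(x) for x in concatemers]
    n_b = sum(b in s for s in sets)
    n_ab = sum({a, b} <= s for s in sets)
    n_bc = sum({b, c} <= s for s in sets)
    n_abc = sum({a, b, c} <= s for s in sets)
    expected = n_ab * n_bc / n_b if n_b else 0.0
    return n_abc / expected if expected else float("nan")

# Toy concatemers, each listed as the loci it contains
data = [["A", "B", "C"], ["A", "B", "C"], ["A", "B"], ["B", "C"], ["B", "D"], ["B", "E"]]
print(round(cooperativity_ratio(data, "A", "B", "C"), 2))  # 1.33
```

A ratio above 1 means the three loci co-occur more often than this null predicts; deciding whether such an excess is meaningful would require a proper statistical test and far more data than this toy set.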