ScGET-seq explained
Single-cell genome and epigenome by transposases sequencing (scGET-seq) is a DNA sequencing method for profiling open and closed chromatin. In contrast to single-cell assay for transposase-accessible chromatin with sequencing (scATAC-seq), which only targets active euchromatin,[1] scGET-seq is also capable of probing inactive heterochromatin.[2]
This is achieved through the use of TnH, which is created by linking the chromodomain (CD) of heterochromatin protein-1-alpha (HP-1α) to the Tn5 transposase. TnH is then able to target histone 3 lysine 9 trimethylation (H3K9me3), a marker for heterochromatin.[3]
Akin to RNA velocity, which uses the ratio of spliced to unspliced RNA to infer the kinetics of changes in gene expression over the course of cellular development,[4] the ratio of TnH to Tn5 signals obtained from scGET-seq can be used to calculate chromatin velocity, which measures the dynamics of chromatin accessibility over the course of cellular developmental pathways.
History
Transcriptional regulation is tightly linked to chromatin states. Chromatin that is open, or permissive to transcription, makes up only 2-3% of the genome but encompasses 94.4% of transcription factor binding sites.[5] [6] Conversely, more tightly packed DNA, or heterochromatin, is responsible for genome organization and stability.[7] Chromatin density also changes over the course of cellular differentiation,[8] but there is a lack of high-throughput sequencing methods for directly assaying heterochromatin.
Many genome-related diseases, such as cancer, are closely linked to changes in the epigenome. Cancers in particular are characterized by single-cell heterogeneity, which can drive metastasis and treatment resistance.[9] [10] The mechanisms that underlie these processes are still largely unknown, although the advent of single-cell technologies, including single-cell epigenomics, has contributed greatly to their elucidation.[11]
In 2015, ATAC-seq, which uses the Tn5 transposase to fragment and tag accessible chromatin, or euchromatin, for sequencing, became feasible at single-cell resolution.[12] scGET-seq builds upon this technology by also capturing information on heterochromatin, giving a more comprehensive view of chromatin structure and dynamics within each cell.[13]
Methods
Sample preparation
Sample preparation for scGET-seq starts with obtaining a suspension of nuclei from cells using a method appropriate for the starting material.[14]
The next step is to produce the TnH transposase. Tn5 is a transposase that cuts genomic regions unbound by nucleosomes (open chromatin) and ligates adapters to them.[15] HP-1α is a member of the HP1 family and is able to recognize and specifically bind to H3K9me3.[16] [17] Its chromodomain uses an induced-fit mechanism for recognizing this chromatin modification.[18] Linking the first 112 amino acids of HP-1α, which contain the chromodomain, to Tn5 through a three poly-tyrosine-glycine-serine (TGS) linker creates the TnH transposase, which is capable of targeting heterochromatin marked by H3K9me3.
Library preparation is done using a modified protocol for single-cell ATAC-seq,[19] in which the nuclei suspension is incubated first with the Tn5 transposase and then with TnH.
Data analysis
The goals of the data analysis are:
- To identify and characterize distinct cell populations using clustering
- To profile chromatin accessibility across the genome
- To predict copy-number variants and single-nucleotide variants
Pre-processing
- Post-sequencing, reads need to be demultiplexed and mapped to the appropriate reference genome. Duplicated reads are identified and removed.
- "Peaks", or regions in the DNA enriched in the number of reads mapped, are identified.[20]
- Quality control is performed, and cells with low numbers of reads or few detected features are filtered out.
- Four count matrices (matrices where each column is a cell and each row is a feature) are generated: Tn5-dhs, Tn5-complement, TnH-dhs and TnH-complement, representing signal from accessible and compacted chromatin.
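The last two pre-processing steps can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the published pipeline: the input record layout `(cell, transposase, region, in_dhs)`, the function names, the toy data and the `min_reads` threshold are all hypothetical, and "dhs" is assumed to mark regions treated as accessible while "complement" marks the rest of the genome.

```python
from collections import defaultdict

# Hypothetical input: one record per deduplicated, mapped fragment,
# as (cell_barcode, transposase, region_id, in_dhs).
fragments = [
    ("cellA", "Tn5", "chr1:0-5000", True),
    ("cellA", "Tn5", "chr1:0-5000", True),
    ("cellA", "TnH", "chr1:5000-10000", False),
    ("cellB", "TnH", "chr1:5000-10000", False),
    ("cellB", "Tn5", "chr1:5000-10000", False),
    ("cellC", "Tn5", "chr1:0-5000", True),
]


def reads_per_cell(fragments):
    """Count fragments per cell barcode."""
    totals = defaultdict(int)
    for cell, *_ in fragments:
        totals[cell] += 1
    return dict(totals)


def filter_cells(fragments, min_reads):
    """Quality control: drop every cell with fewer than min_reads fragments."""
    totals = reads_per_cell(fragments)
    keep = {cell for cell, n in totals.items() if n >= min_reads}
    return [f for f in fragments if f[0] in keep]


def build_count_matrices(fragments):
    """Return {matrix_name: {region: {cell: count}}}.

    Rows are features (regions) and columns are cells, stored sparsely.
    """
    names = ("Tn5-dhs", "Tn5-complement", "TnH-dhs", "TnH-complement")
    matrices = {n: defaultdict(lambda: defaultdict(int)) for n in names}
    for cell, enzyme, region, in_dhs in fragments:
        name = f"{enzyme}-{'dhs' if in_dhs else 'complement'}"
        matrices[name][region][cell] += 1
    return matrices


kept = filter_cells(fragments, min_reads=2)  # cellC has one read and is dropped
matrices = build_count_matrices(kept)
print(matrices["Tn5-dhs"]["chr1:0-5000"]["cellA"])  # 2
```

Real data sets would use a sparse matrix library rather than nested dictionaries; the nested form is used here only to keep the example dependency-free.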
Analysis
Dimension reduction, visualization and clustering
Each of the matrices is filtered of shared regions and then normalized and log2 transformed. Linear dimension reduction is done using principal component analysis (PCA). Groups of cells are identified using a k-NN algorithm[21] and the Leiden algorithm.[22] Finally, the four matrices are combined using matrix factorization[23] and UMAP reduction.[24]
Cell identity annotation
There are two approaches to cell identity annotation: annotation based on the feature annotation of ATAC peaks, and annotation based on integration with reference scRNA-seq data.[25]
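The first steps of this workflow can be illustrated with a small standard-library sketch. It is an assumption-laden toy rather than the published implementation: the text does not say which normalization is used, so scaling each cell to 10,000 counts is a guess; the function names and the count table are hypothetical; and PCA, Leiden clustering, matrix factorization and UMAP are left out because they need dedicated libraries.

```python
import math


def normalize_log2(counts, scale=10_000):
    """Scale each cell (column) to `scale` total counts, then apply log2(x + 1).

    `counts` is a list of rows (features), each a list with one value per cell.
    The per-cell scaling is an assumed normalization, chosen for illustration.
    """
    n_cells = len(counts[0])
    totals = [sum(row[j] for row in counts) for j in range(n_cells)]
    return [
        [
            math.log2(row[j] * scale / totals[j] + 1) if totals[j] else 0.0
            for j in range(n_cells)
        ]
        for row in counts
    ]


def knn(matrix, k):
    """For each cell, return the indices of its k nearest other cells
    by Euclidean distance in feature space."""
    n_cells = len(matrix[0])
    cells = [[row[j] for row in matrix] for j in range(n_cells)]
    neighbours = []
    for i, a in enumerate(cells):
        dists = sorted((math.dist(a, b), j) for j, b in enumerate(cells) if j != i)
        neighbours.append([j for _, j in dists[:k]])
    return neighbours


# Hypothetical counts: 3 features (rows) x 4 cells (columns).
counts = [
    [10, 9, 1, 2],
    [1, 2, 9, 10],
    [5, 5, 5, 5],
]
norm = normalize_log2(counts)
print(knn(norm, k=1))  # [[1], [0], [3], [2]]
```

In this toy table cells 0 and 1 pair up, as do cells 2 and 3, which is the kind of neighbourhood structure that a community detection method such as Leiden would then group into clusters.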
Applications
Current
By using the ratio of Tn5 to TnH signals, quantitative values describing how quickly and in what direction chromatin remodelling is taking place can be calculated (chromatin velocity). By isolating regions that are most dynamic and identifying which transcription factors bind there, chromatin velocity can be used to infer the dynamic epigenetic processes happening within a given cell and the contributions of various transcription factors to those processes.
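The ratio idea can be made concrete with a toy calculation. This is not the published chromatin velocity model; it only shows how a Tn5-to-TnH ratio and its change between cells could be computed. The counts, the pseudocount, the function name and the assumption that the cells are already ordered along a differentiation path are all hypothetical.

```python
import math


def log_ratio(tn5, tnh, pseudocount=1.0):
    """log2((Tn5 + p) / (TnH + p)).

    Positive values mean the region is relatively accessible,
    negative values mean it is relatively compacted.
    """
    return math.log2((tn5 + pseudocount) / (tnh + pseudocount))


# Hypothetical counts for one region in four cells, assumed to be
# ordered along a differentiation path.
tn5_counts = [1, 3, 7, 15]
tnh_counts = [15, 7, 3, 1]

ratios = [log_ratio(a, b) for a, b in zip(tn5_counts, tnh_counts)]

# Finite difference between consecutive cells: a crude stand-in for
# the speed (magnitude) and direction (sign) of remodelling.
deltas = [later - earlier for earlier, later in zip(ratios, ratios[1:])]

print(ratios)  # [-3.0, -1.0, 1.0, 3.0]
print(deltas)  # [2.0, 2.0, 2.0]
```

Here the region moves steadily from compacted to accessible, so every difference is positive; a region being silenced would show negative differences instead.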
Future
Chromatin remodelling precedes changes in gene expression, so measuring it can improve the understanding of the trajectories and mechanisms of cellular change.[26] [27] Platforms and tools for integrating multimodal data are therefore areas of active research.[28] [29] [30] Incorporating temporal and directional information by integrating chromatin velocity with RNA velocity has been proposed as a way to reveal even more about differentiation pathways.[31] [32]
Limitations
scGET-seq shares some of the limitations of scATAC-seq. Both require nuclei from samples with high cellular viability.[13] Low cellular viability leads to high background DNA contamination that does not represent authentic biological signal. Additionally, the sparse and noisy nature of scATAC-seq and scGET-seq data makes analysis challenging, and there is no consensus yet on how best to handle these data.[33]
Another limitation is that SNV calls from scGET-seq still need to be validated by bulk genome sequencing. Although mutations detected by scGET-seq correlate highly with those from bulk exome sequencing, scGET-seq fails to capture all exome SNVs.
Notes and References
- Yan F, Powell DR, Curtis DJ, Wong NC . From reads to insight: a hitchhiker's guide to ATAC-seq data analysis . Genome Biology . 21 . 1 . 22 . February 2020 . 32014034 . 6996192 . 10.1186/s13059-020-1929-3 . free .
- Tedesco M, Giannese F, Lazarević D, Giansanti V, Rosano D, Monzani S, Catalano I, Grassi E, Zanella ER, Botrugno OA, Morelli L, Panina Bordignon P, Caravagna G, Bertotti A, Martino G, Aldrighetti L, Pasqualato S, Trusolino L, Cittaro D, Tonon G . Chromatin Velocity reveals epigenetic dynamics by single-cell profiling of heterochromatin and euchromatin . Nature Biotechnology . 40 . 2 . 235–244 . February 2022 . 34635836 . 10.1038/s41587-021-01031-1 . 238637962 . 11368/3007419 . free .
- Kouzarides T . Chromatin modifications and their function . Cell . 128 . 4 . 693–705 . February 2007 . 17320507 . 10.1016/j.cell.2007.02.005 . 11691263 . free .
- La Manno G, Soldatov R, Zeisel A, Braun E, Hochgerner H, Petukhov V, Lidschreiber K, Kastriti ME, Lönnerberg P, Furlan A, Fan J, Borm LE, Liu Z, van Bruggen D, Guo J, He X, Barker R, Sundström E, Castelo-Branco G, Cramer P, Adameyko I, Linnarsson S, Kharchenko PV . RNA velocity of single cells . Nature . 560 . 7719 . 494–498 . August 2018 . 30089906 . 6130801 . 10.1038/s41586-018-0414-6 . 2018Natur.560..494L .
- Klemm SL, Shipony Z, Greenleaf WJ . Chromatin accessibility and the regulatory epigenome . Nature Reviews. Genetics . 20 . 4 . 207–220 . April 2019 . 30675018 . 10.1038/s41576-018-0089-8 . 59159906 .
- Thurman RE, Rynes E, Humbert R, Vierstra J, Maurano MT, Haugen E, Sheffield NC, Stergachis AB, Wang H, Vernot B, Garg K, John S, Sandstrom R, Bates D, Boatman L, Canfield TK, Diegel M, Dunn D, Ebersol AK, Frum T, Giste E, Johnson AK, Johnson EM, Kutyavin T, Lajoie B, Lee BK, Lee K, London D, Lotakis D, Neph S, Neri F, Nguyen ED, Qu H, Reynolds AP, Roach V, Safi A, Sanchez ME, Sanyal A, Shafer A, Simon JM, Song L, Vong S, Weaver M, Yan Y, Zhang Z, Zhang Z, Lenhard B, Tewari M, Dorschner MO, Hansen RS, Navas PA, Stamatoyannopoulos G, Iyer VR, Lieb JD, Sunyaev SR, Akey JM, Sabo PJ, Kaul R, Furey TS, Dekker J, Crawford GE, Stamatoyannopoulos JA . The accessible chromatin landscape of the human genome . Nature . 489 . 7414 . 75–82 . September 2012 . 22955617 . 10.1038/nature11232 . 3721348 . 4304439 . 2012Natur.489...75T .
- Penagos-Puig A, Furlan-Magaril M . Heterochromatin as an Important Driver of Genome Organization . Frontiers in Cell and Developmental Biology . 8 . 579137 . 2020 . 33072761 . 7530337 . 10.3389/fcell.2020.579137 . free .
- Golkaram M, Jang J, Hellander S, Kosik KS, Petzold LR . The Role of Chromatin Density in Cell Population Heterogeneity during Stem Cell Differentiation . Scientific Reports . 7 . 1 . 13307 . October 2017 . 29042584 . 5645312 . 10.1038/s41598-017-13731-3 . 2017NatSR...713307G .
- Dagogo-Jack I, Shaw AT . Tumour heterogeneity and resistance to cancer therapies . Nature Reviews. Clinical Oncology . 15 . 2 . 81–94 . February 2018 . 29115304 . 10.1038/nrclinonc.2017.166 . 2194691 .
- Lawson DA, Kessenbrock K, Davis RT, Pervolarakis N, Werb Z . Tumour heterogeneity and metastasis at single-cell resolution . Nature Cell Biology . 20 . 12 . 1349–1360 . December 2018 . 30482943 . 6477686 . 10.1038/s41556-018-0236-7 .
- Dai Z, Gu XY, Xiang SY, Gong DD, Man CF, Fan Y . Research and application of single-cell sequencing in tumor heterogeneity and drug resistance of circulating tumor cells . Biomarker Research . 8 . 1 . 60 . November 2020 . 33292625 . 7653877 . 10.1186/s40364-020-00240-1 . free .
- Pott S, Lieb JD . Single-cell ATAC-seq: strength in numbers . Genome Biology . 16 . 1 . 172 . August 2015 . 26294014 . 4546161 . 10.1186/s13059-015-0737-7 . free .
- Tang L . Sketching open and closed chromatin . Nature Methods . 18 . 12 . 1448 . December 2021 . 34862496 . 10.1038/s41592-021-01351-9 . 244871731 .
- Web site: Isolation of Nuclei for Single Cell RNA Sequencing & Tissues for Single Cell RNA Sequencing -Demonstrated Protocol -Sample Prep -Single Cell Gene Expression -Official 10x Genomics Support . 2022-03-02 . support.10xgenomics.com.
- Book: Hsu FM, Gohain M, Chang P, Lu JH, Chen PY . Chapter 4 - Bioinformatics of Epigenomic Data Generated From Next-Generation Sequencing . January 2018 . Epigenetics in Human Disease . 6 . 65–106 . Tollefsbol TO . Translational Epigenetics . Academic Press . en . 10.1016/B978-0-12-812215-0.00004-2 . 978-0-12-812215-0 . Second .
- Bannister AJ, Zegerman P, Partridge JF, Miska EA, Thomas JO, Allshire RC, Kouzarides T . Selective recognition of methylated lysine 9 on histone H3 by the HP1 chromo domain . Nature . 410 . 6824 . 120–124 . March 2001 . 11242054 . 10.1038/35065138 . 2001Natur.410..120B . 4334447 .
- Watanabe S, Mishima Y, Shimizu M, Suetake I, Takada S . Interactions of HP1 Bound to H3K9me3 Dinucleosome by Molecular Simulations and Biochemical Assays . Biophysical Journal . 114 . 10 . 2336–2351 . May 2018 . 29685391 . 6129468 . 10.1016/j.bpj.2018.03.025 . 2018BpJ...114.2336W .
- Nielsen PR, Nietlispach D, Mott HR, Callaghan J, Bannister A, Kouzarides T, Murzin AG, Murzina NV, Laue ED . Structure of the HP1 chromodomain bound to histone H3 methylated at lysine 9 . Nature . 416 . 6876 . 103–107 . March 2002 . 11882902 . 10.1038/nature722 . 4423019 . 2002Natur.416..103N .
- Web site: Chromium Single Cell ATAC Reagent Kits User Guide (v1.1 Chemistry) -User Guide -Official 10x Genomics Support . 2022-03-02 . support.10xgenomics.com.
- Baek S, Lee I . Single-cell ATAC sequencing analysis: From data preprocessing to hypothesis generation . Computational and Structural Biotechnology Journal . 18 . 1429–1439 . 2020-01-01 . 32637041 . 10.1016/j.csbj.2020.06.012 . 7327298 .
- Polański K, Young MD, Miao Z, Meyer KB, Teichmann SA, Park JE . BBKNN: fast batch alignment of single cell transcriptomes . Bioinformatics . 36 . 3 . 964–965 . February 2020 . 31400197 . 10.1093/bioinformatics/btz625 . 9883685 .
- Traag VA, Waltman L, van Eck NJ . From Louvain to Leiden: guaranteeing well-connected communities . Scientific Reports . 9 . 1 . 5233 . March 2019 . 30914743 . 10.1038/s41598-019-41695-z . 6435756 . 1810.08473 . 2019NatSR...9.5233T .
- Žitnik M, Zupan B . Data Fusion by Matrix Factorization . IEEE Transactions on Pattern Analysis and Machine Intelligence . 37 . 1 . 41–53 . January 2015 . 26353207 . 10.1109/TPAMI.2014.2343973 . 1307.0803 . 362295 .
- Web site: UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction — umap 0.5 documentation . 2022-03-04 . umap-learn.readthedocs.io.
- Abdelaal T, Michielsen L, Cats D, Hoogduin D, Mei H, Reinders MJ, Mahfouz A . A comparison of automatic cell identification methods for single-cell RNA sequencing data . Genome Biology . 20 . 1 . 194 . September 2019 . 31500660 . 6734286 . 10.1186/s13059-019-1795-z . free .
- Stadhouders R, Vidal E, Serra F, Di Stefano B, Le Dily F, Quilez J, Gomez A, Collombet S, Berenguer C, Cuartero Y, Hecht J, Filion GJ, Beato M, Marti-Renom MA, Graf T . Transcription factors orchestrate dynamic interplay between genome topology and gene regulation during cell reprogramming . Nature Genetics . 50 . 2 . 238–249 . February 2018 . 29335546 . 5810905 . 10.1038/s41588-017-0030-7 .
- Ranzoni AM, Tangherloni A, Berest I, Riva SG, Myers B, Strzelecka PM, Xu J, Panada E, Mohorianu I, Zaugg JB, Cvejic A . Integrative Single-Cell RNA-Seq and ATAC-Seq Analysis of Human Developmental Hematopoiesis . Cell Stem Cell . 28 . 3 . 472–487.e7 . March 2021 . 33352111 . 7939551 . 10.1016/j.stem.2020.11.015 .
- Lin Y, Wu TY, Wan S, Yang JY, Wong WH, Wang YX . scJoint integrates atlas-scale single-cell RNA-seq and ATAC-seq data with transfer learning . Nature Biotechnology . 40 . 5 . 703–710 . January 2022 . 35058621 . 10.1038/s41587-021-01161-6 . 9186323 . 246150572 .
- Stuart T, Butler A, Hoffman P, Hafemeister C, Papalexi E, Mauck WM, Hao Y, Stoeckius M, Smibert P, Satija R . Comprehensive Integration of Single-Cell Data . Cell . 177 . 7 . 1888–1902.e21 . June 2019 . 31178118 . 6687398 . 10.1016/j.cell.2019.05.031 .
- Wang C, Sun D, Huang X, Wan C, Li Z, Han Y, Qin Q, Fan J, Qiu X, Xie Y, Meyer CA, Brown M, Tang M, Long H, Liu T, Liu XS . Integrative analyses of single-cell transcriptome and regulome using MAESTRO . Genome Biology . 21 . 1 . 198 . August 2020 . 32767996 . 7412809 . 10.1186/s13059-020-02116-x . free .
- Xu Y, Begoli E, McCord RP . 2021-12-01 . sciCAN: Single-cell chromatin accessibility and gene expression data integration via Cycle-consistent Adversarial Network . bioRxiv . en . 2021.11.30.470677 . 10.1101/2021.11.30.470677. 244821695 .
- Chen Z, King WC, Gerstein M, Zhang J . 2022-02-23 . scDVF: Single-cell Transcriptomic Deep Velocity Field Learning with Neural Ordinary Differential Equations . bioRxiv . en . 2022.02.15.480564 . 10.1101/2022.02.15.480564. 247000437 .
- Baek S, Lee I . Single-cell ATAC sequencing analysis: From data preprocessing to hypothesis generation . Computational and Structural Biotechnology Journal . 18 . 1429–1439 . January 2020 . 32637041 . 10.1016/j.csbj.2020.06.012 . 7327298 .