Linked-read sequencing is a type of DNA sequencing technology that tags DNA molecules with unique barcodes before fragmenting them. In traditional short-read sequencing, DNA is broken into small fragments that are sequenced individually, and the resulting short reads make it difficult to reconstruct the original DNA sequence accurately. In linked-read sequencing, the unique barcodes allow scientists to link together DNA fragments that come from the same DNA molecule. A key benefit of the technology is that it needs only small quantities of DNA to produce a large amount of genomic information, effectively combining the advantages of long-read and short-read technologies.[1]
This sequencing method was originally developed by 10x Genomics in 2015, and was launched under the name 'GemCode' or 'Chromium'. GemCode used gel bead-based barcoding to amalgamate short DNA fragments.[2] The longer fragments produced in this way could then be sequenced using established technology such as Illumina next-generation sequencing.[3] The same company introduced an updated version of linked-read sequencing, termed 'Linked-Reads V2', in 2018. While GemCode uses a single barcode to tag both the gel bead and the DNA fragment, Linked-Reads V2 uses separate barcodes to improve the detection of genetic variants.
The group that developed linked-read sequencing published its first paper on the technology in 2016. The authors initially developed it to sequence the genomes of both healthy individuals and cancer patients in order to determine somatic mutations, copy number variations, and structural variations in cancer genomes. Later that year, another research group combined linked-read sequencing with long-read sequencing to assemble a human genome. Both studies demonstrated the utility of linked-read sequencing in comprehensive genome analysis and in understanding genetic diseases. However, in 2019, a lawsuit relating to patent infringement resulted in 10x Genomics discontinuing its line of linked-read products.
Linked-read sequencing is microfluidic-based and needs only nanograms of input DNA. One nanogram of DNA can be distributed across more than 100,000 droplet partitions, where DNA fragments are barcoded and subjected to polymerase chain reaction (PCR). As a result, DNA fragments (or reads) that share the same barcode can be grouped as coming from a single long input DNA molecule, so long-range information can be assembled from short reads.
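The grouping described above is reliable only if two molecules covering the same locus rarely end up in the same partition. The following back-of-the-envelope sketch estimates how often that happens. It rests on assumptions that are not stated in the text: a human-sized haploid genome of 3.2 billion base pairs, an average mass of about 650 g/mol per base pair, and uniform random distribution of molecules across partitions. The function names are illustrative only.

```python
AVOGADRO = 6.022e23  # molecules per mole


def genome_copies(mass_ng, genome_bp, grams_per_mol_bp=650.0):
    """Estimate how many haploid genome copies a DNA mass contains.

    genome_bp and grams_per_mol_bp are assumptions for illustration.
    """
    grams_per_genome = genome_bp * grams_per_mol_bp / AVOGADRO
    return (mass_ng * 1e-9) / grams_per_genome


def same_partition_probability(copies, partitions):
    """Chance that a molecule covering a locus shares its partition with
    at least one other copy of that locus, assuming each copy lands in a
    uniformly random partition."""
    return 1.0 - (1.0 - 1.0 / partitions) ** (copies - 1)


copies = genome_copies(mass_ng=1.0, genome_bp=3.2e9)
collision = same_partition_probability(copies, partitions=100_000)
print(f"about {copies:.0f} genome copies in 1 ng")
print(f"collision probability per locus: {collision:.4f}")
```

Under these assumptions, only a fraction of a percent of molecules share a barcode with another molecule from the same locus, which is why reads with the same barcode and a nearby position can usually be treated as coming from one molecule.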
Steps of Linked-read sequencing:
In the barcoding step, high molecular weight DNA samples that contain the targeted DNA sequence, ranging from fifty to several hundred kilobases in size, are combined with gel beads containing unique barcodes, enzymes, and sequencing reagents. A microfluidic device partitions the input DNA molecules into individual nanoliter-sized droplets of water-in-oil emulsion, called GEMs. Each GEM contains gel beads coated with the same barcode and primers, and a small amount of DNA. The primers are complementary to specific regions of the DNA molecule, allowing the DNA in the droplets to be amplified through PCR. The barcodes make it possible to identify and group the sequencing reads that originate from the same long fragment, which is crucial for downstream analysis.
The barcoded DNA fragments are amplified using PCR to create a library in which all the fragments derived from a given DNA molecule are tagged with the same barcode.[4] This step increases the quantity of DNA available for sequencing and reduces the chance of losing unique DNA fragments during sequencing. The droplets (GEMs) are then collected in a tube and the emulsion is broken, releasing the amplified, barcoded DNA sequences.
Standard Illumina next-generation sequencing technology can be used to sequence the libraries.[5] During sequencing, the barcodes are read along with the DNA sequences, allowing researchers to group together DNA fragments that originate from the same DNA molecule. Even though each DNA fragment is typically not fully sequenced, the information from many overlapping fragments in the same genomic region can be combined to reconstruct long stretches of the genome. This makes it possible to assemble a genome from scratch without any prior reference.
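As a minimal sketch of how reads might be grouped once the barcode has been read along with the DNA sequence, the example below assumes that each read starts with a fixed-length barcode followed by the genomic sequence. The barcode length, the read layout, and the function names are assumptions made for illustration; the text does not specify them, and the toy barcodes are shortened to four bases for readability.

```python
from collections import defaultdict


def split_barcode(read, barcode_length):
    """Split a read into (barcode, genomic sequence).

    Assumes the barcode occupies the first barcode_length bases.
    """
    return read[:barcode_length], read[barcode_length:]


def group_by_barcode(reads, barcode_length):
    """Collect the genomic part of each read under its barcode."""
    groups = defaultdict(list)
    for read in reads:
        barcode, sequence = split_barcode(read, barcode_length)
        groups[barcode].append(sequence)
    return dict(groups)


# Toy reads: a 4-base barcode followed by genomic sequence.
reads = [
    "ACGTTTGACCA",
    "GGCACCATGAT",
    "ACGTGATTACA",
]
groups = group_by_barcode(reads, barcode_length=4)
for barcode, sequences in groups.items():
    print(barcode, sequences)
```

Reads that share the barcode `ACGT` end up in the same group and can be treated as candidates for having come from the same long input molecule.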
The raw sequencing data is then processed with bioinformatics software (e.g., the GemCode analysis software developed by 10x Genomics) to remove low-quality reads and to assign reads to their respective barcodes. Reads can be aligned to a reference genome or assembled de novo to generate long-range contigs. The read alignment step is important for determining the order and orientation of the long DNA fragments, and for identifying genomic variations such as insertions or deletions.
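Once reads have been assigned to barcodes and aligned, the original long molecules can be approximated by clustering reads that share a barcode and lie close together on the reference. The sketch below is one simple way to do this and is not the method used by any particular software package. The input format (barcode, chromosome, position) and the `max_gap` threshold of 50,000 bases are assumptions chosen for illustration.

```python
from collections import defaultdict


def infer_molecules(alignments, max_gap=50_000):
    """Cluster aligned reads into putative source molecules.

    alignments: iterable of (barcode, chromosome, position) tuples.
    Reads with the same barcode on the same chromosome are merged into
    one molecule unless consecutive reads are more than max_gap apart.
    Returns (barcode, chromosome, start, end, read_count) tuples.
    """
    positions_by_key = defaultdict(list)
    for barcode, chrom, pos in alignments:
        positions_by_key[(barcode, chrom)].append(pos)

    molecules = []
    for (barcode, chrom), positions in sorted(positions_by_key.items()):
        positions.sort()
        start = prev = positions[0]
        count = 1
        for pos in positions[1:]:
            if pos - prev > max_gap:
                # Gap too large: close this molecule and start a new one.
                molecules.append((barcode, chrom, start, prev, count))
                start = pos
                count = 0
            prev = pos
            count += 1
        molecules.append((barcode, chrom, start, prev, count))
    return molecules


alignments = [
    ("BC1", "chr1", 1_000),
    ("BC1", "chr1", 45_000),
    ("BC2", "chr1", 30_000),
    ("BC1", "chr1", 20_000),
    ("BC1", "chr1", 900_000),
    ("BC1", "chr1", 910_000),
]
molecules = infer_molecules(alignments)
for molecule in molecules:
    print(molecule)
```

In this toy input, barcode `BC1` yields two separate molecules because its reads fall into two clusters that are far apart, which is what would be expected if two unrelated molecules happened to share a partition.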
Linked-read sequencing can facilitate de novo genome assembly, which involves reconstructing a genome from scratch without any prior reference. It enables the assembly of large genomic regions and helps improve the completeness and contiguity of the resulting genome. This can be particularly useful for studying organisms that lack a high-quality reference genome, such as non-model organisms or organisms with complex genomes.[6] Linked-read sequencing has been used for de novo genome assembly in a variety of organisms, including humans, plants, and animals.[7] [8] For example, Dr. Evan Eichler and his research group used linked-read sequencing to assemble the genome of the orangutan, which had previously been difficult to study due to its complex genome. The resulting genome assembly gave scientists new insights into the evolutionary history of primates and the genetic basis of human diseases. The aligned or assembled reads can also be used for other genetic investigations or downstream analyses, such as haplotype phasing.
A haplotype is a group of genetic variants that are inherited together on a chromosome from one parent due to their genetic linkage. Haplotype phasing (also called haplotype estimation) is the process of reconstructing individual haplotypes, which is important for determining the genetic basis of diseases.[9] Linked-read sequencing provides consistent coverage of genes related to different diseases, helping scientists to capture all the regions of a targeted gene that carry mutations.[10] For example, in 2018, a group of researchers used linked-read sequencing to sequence genetic information from a pregnant woman who was a carrier of a Duchenne muscular dystrophy (DMD) mutation. Linked-read sequencing allowed them to identify the maternal haplotypes and determine the presence of the mutant alleles in the foetal DNA. This non-invasive prenatal diagnosis of DMD demonstrates the clinical applicability of linked-read sequencing.
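The reason barcodes help with phasing is that every read carrying a given barcode comes from one molecule, and therefore from one haplotype, so alleles seen under the same barcode are in phase with each other. The sketch below is a deliberately simplified greedy approach and not the algorithm of any specific phasing tool. It assumes that each barcode corresponds to a single molecule, that alleles at heterozygous sites are coded as 0 or 1, and that there are no sequencing errors. The data layout and function name are hypothetical.

```python
def phase_sites(molecules):
    """Greedily phase heterozygous sites using barcoded molecules.

    molecules: dict mapping barcode -> {site: allele}, alleles 0 or 1.
    Returns {site: allele on haplotype A}; haplotype B carries the
    opposite allele at each site. Molecules that never overlap the
    growing phase block, or overlap it ambiguously, are left unplaced.
    """
    pending = dict(molecules)
    if not pending:
        return {}
    # Seed haplotype A with the first molecule.
    first = next(iter(pending))
    phase = dict(pending.pop(first))

    progress = True
    while progress and pending:
        progress = False
        for barcode in list(pending):
            alleles = pending[barcode]
            shared = [site for site in alleles if site in phase]
            agree = sum(alleles[site] == phase[site] for site in shared)
            disagree = len(shared) - agree
            if agree == disagree:
                continue  # no overlap yet, or evidence is tied
            # A molecule that mostly disagrees belongs to haplotype B,
            # so its alleles are flipped before extending haplotype A.
            flip = disagree > agree
            for site, allele in alleles.items():
                phase.setdefault(site, 1 - allele if flip else allele)
            del pending[barcode]
            progress = True
    return phase


molecules = {
    "BC1": {100: 0, 200: 1},
    "BC2": {200: 0, 300: 0},  # opposite haplotype to BC1
    "BC3": {300: 1, 400: 0},  # same haplotype as BC1
}
phase = phase_sites(molecules)
print(phase)
```

Each molecule extends the phase block through the sites it shares with molecules already placed, which is how short reads linked by barcodes can yield haplotypes spanning long distances.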
Structural variations, such as deletions, duplications, inversions, translocations, and other rearrangements, are common in human genomes. These variations can have significant impacts on genome function and have been implicated in many diseases. Because linked-read sequencing labels all reads that originate from the same long DNA fragment with the same barcode, it enables the detection of a large number of structural variants. It can also resolve complex structural variants and so provide a more complete picture of the genomic landscape. Linked-read sequencing has been used to identify and characterise structural variants in diverse populations, including people with genetic disorders or cancers.[11]
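One way shared barcodes can point to a structural variant is that two regions which are far apart on the reference, but joined in the sample, will be covered by the same molecules and therefore share more barcodes than expected. The sketch below illustrates that idea only. The window representation, the thresholds, and the function name are assumptions for illustration, and real tools apply statistical models rather than a fixed cut-off.

```python
from itertools import combinations


def shared_barcode_candidates(window_barcodes, min_distance, min_shared):
    """Find distant window pairs that share many barcodes.

    window_barcodes: dict mapping (chromosome, start) -> set of barcodes
    seen in that genomic window.
    Pairs on the same chromosome closer than min_distance are skipped,
    because nearby windows share barcodes simply by lying on the same
    molecule. Returns (window_a, window_b, shared_count) tuples.
    """
    candidates = []
    for window_a, window_b in combinations(sorted(window_barcodes), 2):
        same_chromosome = window_a[0] == window_b[0]
        if same_chromosome and abs(window_b[1] - window_a[1]) < min_distance:
            continue
        shared = len(window_barcodes[window_a] & window_barcodes[window_b])
        if shared >= min_shared:
            candidates.append((window_a, window_b, shared))
    return candidates


windows = {
    ("chr1", 0): {"A", "B", "C", "D"},
    ("chr1", 10_000): {"A", "B", "C", "E"},
    ("chr1", 5_000_000): {"C", "D", "F"},
    ("chr2", 0): {"X", "Y"},
}
candidates = shared_barcode_candidates(
    windows, min_distance=100_000, min_shared=2
)
print(candidates)
```

Here the windows at the start of chr1 and at 5 Mb share two barcodes despite being far apart, so the pair is reported as a candidate junction for closer inspection.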
Transcriptome analysis is the study of all the RNA transcripts produced by the genome of an organism. Researchers have used linked-read sequencing to assemble transcript isoforms and identify alternative splicing events.[12] Information about alternative splicing events can provide insights into the regulation of gene expression in the human transcriptome.
Epigenetics is the study of heritable changes in genetic activity that are distinct from changes in the DNA sequence. Epigenetic analysis involves studying DNA-protein interactions, histone modifications, and DNA methylation. Linked-read sequencing has been used in several studies to examine DNA methylation patterns.[13] [14] For example, a 2021 study investigated DNA methylation differences in peripheral blood cells between twins, in which one twin had Alzheimer’s Disease and the other was cognitively normal. Linked-read sequencing allowed the researchers to identify more than 3000 differentially methylated regions between these twins discordant for Alzheimer’s Disease, and investigation of these regions led to the identification of genes enriched in neurodevelopmental processes, neuronal signalling, and immune system functions.
In 2018, Bio-Rad Laboratories filed a lawsuit against 10x Genomics stating that their linked-read technology infringed on three patents which Bio-Rad had licensed from the University of Chicago. Bio-Rad was awarded a sum of $23,930,716 by a jury. 10x Genomics filed a motion for judgement as a matter of law (JMOL), which was denied in 2019, and the court proceedings concluded in 2020. Following this lawsuit, 10x Genomics discontinued their linked-read assay.[15] An exception was made for linked-read products that the company had already sold prior to the lawsuit, allowing 10x Genomics to continue to provide those researchers with services such as support and warranty maintenance for this technology.