Triple-stranded DNA (also known as H-DNA or Triplex-DNA) is a DNA structure in which three oligonucleotides wind around each other and form a triple helix. In triple-stranded DNA, the third strand binds to a B-form DNA double helix (held together by Watson–Crick base-pairing) by forming Hoogsteen or reversed Hoogsteen hydrogen bonds.
Triple-stranded DNA with the necessary combination of base composition and structural elements has been described in natural sources, for example in satellite DNA.[1]
A thymine (T) nucleobase can bind to a Watson–Crick T-A base pair by forming Hoogsteen hydrogen bonds. The thymine hydrogen-bonds with the adenine (A) of the original double-stranded DNA to create a T-A*T base triplet.[2]
There are two classes of triplex DNA: intermolecular and intramolecular formations. An intermolecular triplex forms between a duplex and a different (third) strand of DNA. The third strand can come either from a neighboring chromosome or from a triplex-forming oligonucleotide (TFO). Intramolecular triplex DNA is formed from a duplex with homopurine and homopyrimidine strands with mirror repeat symmetry.[4] The degree of supercoiling in DNA influences the amount of intramolecular triplex formation that occurs.[5] There are two different types of intramolecular triplex DNA: H-DNA and H*-DNA. Formation of H-DNA is stabilized under acidic conditions and in the presence of divalent cations such as Mg2+. In this conformation, the homopyrimidine strand in the duplex bends back to bind to the purine strand in a parallel fashion. The base triads used to stabilize this conformation are T-A*T and C-G*C+. The cytosine in the third strand of the latter triad needs to be protonated in order to form this intramolecular triple helix, which is why this conformation is stabilized under acidic conditions.[6] H*-DNA forms favorably at neutral pH and in the presence of divalent cations. This intramolecular conformation is formed when part of the homopurine strand folds back and binds to the purine strand of the duplex in an antiparallel fashion. It is stabilized by T-A*A and C-G*G base triplets.
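The sequence requirement for intramolecular triplexes described above (a homopurine or homopyrimidine tract with mirror repeat symmetry) can be illustrated with a short search routine. This is a minimal sketch rather than a published algorithm: the function names, the arm length, and the maximum spacer are assumptions chosen for illustration, and the search covers only the strand it is given.

```python
PURINES = set("AG")
PYRIMIDINES = set("CT")


def is_homopurine(seq: str) -> bool:
    """True if the sequence is non-empty and contains only A and G."""
    return len(seq) > 0 and set(seq.upper()) <= PURINES


def is_homopyrimidine(seq: str) -> bool:
    """True if the sequence is non-empty and contains only C and T."""
    return len(seq) > 0 and set(seq.upper()) <= PYRIMIDINES


def find_mirror_repeats(seq: str, arm: int = 6, max_spacer: int = 4):
    """Return (start, spacer) pairs for homopurine/homopyrimidine mirror repeats.

    A mirror repeat is a left arm followed (after an optional spacer) by the
    same arm read backwards on the same strand. This is not a reverse
    complement. `arm` and `max_spacer` are illustrative defaults only.
    """
    seq = seq.upper()
    hits = []
    for start in range(len(seq) - 2 * arm + 1):
        left = seq[start:start + arm]
        if not (is_homopurine(left) or is_homopyrimidine(left)):
            continue
        for spacer in range(max_spacer + 1):
            right_start = start + arm + spacer
            right = seq[right_start:right_start + arm]
            if len(right) < arm:
                break  # ran off the end of the sequence
            if right == left[::-1]:
                hits.append((start, spacer))
    return hits


if __name__ == "__main__":
    # Hypothetical test sequence: AAGGGA / spacer TT / AGGGAA
    print(find_mirror_repeats("CCAAGGGATTAGGGAACC"))
```

A hit only indicates that the sequence symmetry is present; whether a triplex actually forms also depends on the conditions named above, such as supercoiling, pH, and divalent cations.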
TFOs are short (≈15-25 nt) nucleic acid strands that bind in the major groove of double-stranded DNA to form intermolecular triplex DNA structures. There is some evidence that they are also able to modulate gene activity in vivo. In peptide nucleic acid (PNA), the sugar-phosphate backbone of DNA is replaced with a protein-like backbone. PNAs form P-loops while interacting with duplex DNA, forming a triplex with one strand of DNA while displacing the other. Very unusual recombination or parallel triplexes, or R-DNA, have been proposed to form under the action of RecA protein in the course of homologous recombination.[7]
TFOs bind specifically to homopurine-homopyrimidine regions, which are common in the promoter and intron sequences of genes, influencing cell signaling.[8] TFOs can inhibit transcription by binding with high specificity to the DNA helix, thereby blocking the binding and function of transcription factors for particular sequences. By introducing TFOs into a cell (through transfection or other means), the expression of certain genes can be controlled.[9] This application has implications for site-specific mutagenesis and gene therapy. In human prostate cancer cells, the transcription factor Ets2 is over-expressed, and this excess is thought to drive the growth and survival of the cells. Carbone et al. designed a sequence-specific TFO against the Ets2 promoter sequence that down-regulated expression of the gene and led to slower cell growth and to cell death.[10] Changxian et al. have also presented a TFO targeting the promoter sequence of bcl-2, a gene that inhibits apoptosis.[11]
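The sequence specificity of TFO targeting can be sketched in code. The example below scans one strand for homopurine tracts and writes a pyrimidine-motif third strand for a tract, pairing T with A and protonated C with G in the parallel orientation described earlier. It is a simplified illustration, not a design tool: the function names are hypothetical, the 15 nt minimum is taken from the lower end of the TFO length range quoted above, and real TFO design must also consider pH, backbone chemistry, and target accessibility.

```python
import re

# Third-strand base for each purine in the duplex target:
# T-A*T and C-G*C+ triads (the third-strand cytosine must be protonated).
THIRD_STRAND = {"A": "T", "G": "C"}


def homopurine_tracts(seq: str, min_len: int = 15):
    """Return (start, tract) for each run of at least min_len purines.

    Only the given strand is scanned; a purine tract on the opposite
    strand appears here as a pyrimidine tract and is not reported.
    """
    pattern = re.compile(r"[AG]{%d,}" % min_len)
    return [(m.start(), m.group()) for m in pattern.finditer(seq.upper())]


def pyrimidine_tfo(purine_tract: str) -> str:
    """Return a pyrimidine-motif third strand for a homopurine tract.

    The result is written 5'->3' in the same direction as the purine
    tract, because this motif binds parallel to the purine strand.
    """
    tract = purine_tract.upper()
    if not tract or not set(tract) <= set("AG"):
        raise ValueError("target must be a non-empty homopurine tract")
    return "".join(THIRD_STRAND[base] for base in tract)


if __name__ == "__main__":
    # Hypothetical promoter fragment containing a 16 nt purine tract.
    fragment = "CCTAGGAGAAGGGAGAAGATCC"
    for start, tract in homopurine_tracts(fragment):
        print(start, tract, pyrimidine_tfo(tract))
```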
The inhibition of transcription caused by triplex formation can also have negative health effects, as in the autosomal recessive disease Friedreich's ataxia.[12] In Friedreich's ataxia, triplex DNA formation in intron 1 of the FXN gene impairs expression of the gene. This results in degeneration of the nervous system and spinal cord, impairing movement of the limbs.[13] To combat this triplex instability, nucleotide excision repair (NER) proteins have been shown to recognize and repair triple-stranded DNA structures, reinstating full availability of the previously inhibited and unstable gene.[14]
Peptide nucleic acids are synthetic oligonucleotide analogues that resist protease degradation and are used to induce repair at site-specific triplex formation regions in genomic DNA. PNAs are able to bind with high affinity and sequence specificity to a complementary DNA sequence through Watson-Crick base pairing, and are able to form triple helices through parallel-orientation Hoogsteen bonds with the PNA facing the 5’-end of the DNA strand.[15] PNA-DNA triplexes are stable because PNAs have a neutrally charged pseudopeptide backbone, which binds to the double-stranded DNA (dsDNA) sequence.[16] Like homopyrimidine TFOs, homopyrimidine PNAs are able to bind the complementary homopurine in the target sequence of the dsDNA. These DNA analogues are able to bind to dsDNA by exploiting ambient DNA conditions and several distinct modes of recognition. This differs from TFOs, which bind through major groove recognition of the dsDNA.
One of these modes of recognition is duplex invasion. A mixed A–T/G–C dsDNA sequence can be targeted by a pair of pseudo-complementary (pc) PNAs, which bind to dsDNA via double invasion. The pc PNAs contain diaminopurine (D) and thiouracil (Us), which substitute for adenine and thymine, respectively. The pc PNA pair forms D-T, Us-A, and G-C or C-G Watson-Crick paired PNA-DNA helices with each of the complementary DNA strands. Another form of duplex invasion can occur at target sequences in dsDNA containing mixed T–C sequences.[17] This form of duplex invasion is achieved through a complementary sequence of homopurine PNA oligomers. The triplex is formed from a PNA-DNA hybrid that binds antiparallel to the complementary DNA sequence and results in a displaced non-complementary DNA strand.
Additionally, PNA can be modified to form “clamp” triplex structures at the target site. One type of “clamp” is a bis-PNA structure, in which two PNA molecules are held together by a flexible linker such as 8-amino-3,6-dioxaoctanoic acid (O).[18] The bis-PNA structure forms a PNA-DNA-PNA triplex at the target site, where one strand forms Watson-Crick base pairs with DNA in an antiparallel orientation and the other strand forms Hoogsteen base pairs with the homopurine DNA strand in the DNA-PNA duplex. A tail-clamp PNA (tcPNA) is another form of triplex clamp. TcPNAs contain an extended 5-10 bp tail that forms a PNA/DNA duplex in addition to the PNA-DNA-PNA “clamp”. This allows for more specific PNA binding without the need for a homopurine/homopyrimidine stretch. These clamp structures have been shown to have high affinity and specificity. Lysine residues can be added to either or both ends of PNAs to increase cellular uptake and binding.
Triple-stranded DNA has been implicated in the regulation of several genes. For instance, the c-myc gene has been extensively mutated to examine the role that triplex DNA, versus the linear sequence, plays in gene regulation. A c-myc promoter element, termed the nuclease-sensitive element or NSE, can form tandem intramolecular triplexes of the H-DNA type and has a repetitive sequence motif (ACCCTCCCC)4. The mutated NSE was examined for transcriptional activity and for its intra- and intermolecular triplex-forming ability. The transcriptional activity of mutant NSEs can be predicted by the element's ability to form H-DNA and not by repeat number, position, or the number of mutant base pairs. DNA may therefore be a dynamic participant in the transcription of the c-myc gene.[19]
According to several published articles, H-DNA has the ability to regulate gene expression depending on factors such as location and sequences in proximity. Although intergenic regions of the prokaryotic genome show only low traces of naturally occurring H-DNA or triplex motifs, H-DNA structures have been shown to be more prevalent in the eukaryotic genome. H-DNA has been shown to be especially abundant in mammalian cells, including human cells (1 in every 50,000 bp). Genetic sequences involved in gene regulation are typically found in the promoter regions of the eukaryotic genome.
Consequently, the promoter region has displayed the ability to form H-DNA with a higher frequency. A bioinformatic analysis of the S. cerevisiae genome examined the occurrence of H-DNA and other triplex DNA motifs in four organizational regions: introns, exons, promoter regions, and miscellaneous regions. The analysis identified a total of 148 possible H-DNA or triplex DNA structures. The promoter regions accounted for the highest frequency, with 71 triplex structures, while the exons accounted for 57 and the introns and miscellaneous regions accounted for 2 and 18 structures, respectively.[20]
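The regional counts quoted above can be checked against the stated total and converted to proportions with a few lines of arithmetic. The sketch below uses only the four numbers given in the text; the dictionary keys are labels chosen here for illustration.

```python
# Counts of possible H-DNA/triplex structures per region, as quoted above.
counts = {
    "promoter": 71,
    "exon": 57,
    "intron": 2,
    "miscellaneous": 18,
}

# The four regional counts should add up to the reported total of 148.
total = sum(counts.values())

# Fraction of all predicted structures that fall in each region.
shares = {region: n / total for region, n in counts.items()}

if __name__ == "__main__":
    print(f"total structures: {total}")
    for region, n in sorted(counts.items(), key=lambda item: -item[1]):
        print(f"{region:>13}: {n:3d} ({shares[region]:.1%})")
```

On these figures the promoter regions hold just under half of the predicted structures, which is the basis for the statement that promoters show the highest frequency.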
In vitro and in vivo studies of eukaryotic genome expression in the presence of H-DNA motifs have reported one of three results: upregulation, downregulation, or no change. Kato et al. reported upregulated expression of lacZ when H-DNA was introduced into the β-lactamase promoter.[21] On the other hand, a similar study (Brahmachari et al.) reported no statistically significant inhibition of the lacZ reporter gene when H-DNA was inserted into the genome of mammalian COS cells. Although studies suggest that H-DNA has a regulatory role, the mechanism is still under investigation. Potaman et al. attribute the mechanism of gene regulation to interactions between H-DNA and the TATA box found in the promoter region of Na,K-ATPase. When H-DNA forms adjacent to a TATA box, the H-DNA structure destabilizes the T-A bonds essential for transcription. This interference with the TATA box inhibits the transcriptional machinery and transcription initiation, which interferes with gene expression.[22] Other mechanisms by which H-DNA affects the expression of a genetic sequence involve TFOs. In vitro studies have shown a decrease in gene expression in the presence of TFOs in mammalian cells.[23] Another possible mechanism, presented by Valentina et al., is that a 13-mer AG motif oligonucleotide triplex complex (TFO complex) downregulates the transcription of mRNA through competitive inhibition.[24] Direct inhibition of gene expression by H-DNA is key to mutagenesis, replication inhibition, and even DNA recombination in the genome.
H-DNA motifs have been shown to stimulate homologous recombination through different mechanisms. Initial indications of a role for H-DNA in recombination came in the early 1990s from observations of RecA, a bacterial DNA recombination protein whose action involves triple-helix DNA. RecA exhibits enzymatic activity essential for recombination.[25] Homologous recombination involving H-DNA motifs has also been found in eukaryotes. RadA, a protein homologous to RecA, has been shown to have the same enzymatic activity in recombination as RecA.[26] The protein is able to promote the exchange of homologous strands through parallel triple-stranded helices.[27] [28] The single-stranded DNA (ssDNA) and complementary double-stranded DNA (dsDNA) form a D-loop structure.[29] In another possible mechanism for RecA, the ssDNA from two separate H-DNA structures forms Watson-Crick base pairs. The new structure is known as a Holliday junction, an intermediate in homologous recombination. H-DNA is also found in other forms of recombination. In mammalian cells, H-DNA sequences displayed a high frequency of recombination. For example, a study conducted on a mouse myeloma cell line found H-DNA structures in Cγ2a and Cγ2b, which participate in sister chromatid exchange.
Considerable research has been funneled into the biological implications relating to the presence of H-DNA in the major breakpoint regions (Mbr) and double-strand-breakpoints of certain genes. Recent work has linked the presence of non-B-DNA structures with cases of genetic instability.[30]
Polypurine mirror-repeat H-DNA-forming sequences were found neighboring the P1 promoter of the c-MYC gene and are associated with the major breakpoint hotspots of this region. Cases of genetic instability were also observed in the F1 offspring of transgenic mice after incorporation of human H-DNA-forming sequences paired with Z-DNA sequences into their genomes, where no instability had previously been reported.[31] Additionally, formation of R.R.Y. H-DNA conformations has been observed at the Mbr of the bcl-2 gene. Formation of these structures has been posited to cause the t(14;18) translocation observed in many cancers and most follicular lymphomas. This observation has led to research indicating that translocation events decrease substantially when the formation of H-DNA is blocked by altering the sequence of this region slightly.[32] Long tracts of GAA·TTC have also been observed to form very stable H-DNA structures. Interactions between two of these H-DNA structures, termed sticky DNA, have been shown to interrupt transcription of the X25, or frataxin, gene. As decreased levels of the protein frataxin are associated with Friedreich's ataxia, formation of this structure has been suggested to be the basis for this genetic disease.[33] [34] Triple-stranded DNA has been observed in supercoiled satellite DNA in regions where microsatellite copy numbers are highly variable, along with inverted-repeat Z-DNA structures within a larger 2.1 kb satellite DNA repeat unit.[35]
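Because the stability of these structures is tied to long GAA·TTC tracts, a first step in examining a sequence is often simply to measure its longest uninterrupted repeat run. The following is a minimal sketch under stated assumptions: the function name and the example sequence are hypothetical, only the given strand is scanned, and no threshold for H-DNA formation or disease is implied.

```python
import re


def longest_repeat_run(seq: str, unit: str = "GAA"):
    """Return (start, copies) for the longest uninterrupted tandem run of unit.

    Returns (-1, 0) if the unit does not occur. When two runs tie, the
    first one found is kept.
    """
    pattern = re.compile(r"(?:%s)+" % re.escape(unit.upper()))
    best = (-1, 0)
    for match in pattern.finditer(seq.upper()):
        copies = len(match.group()) // len(unit)
        if copies > best[1]:
            best = (match.start(), copies)
    return best


if __name__ == "__main__":
    # Hypothetical fragment: three GAA copies, an interruption, then two more.
    print(longest_repeat_run("CCGAAGAAGAATTGAAGAACC"))
```

Passing `unit="TTC"` scans for the same tract as it appears on the complementary strand.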
Additionally, H-DNA has been shown to cause mutations related to critical cellular processes like DNA replication and transcription. The importance of these processes for survival has led to the development of complex DNA repair mechanisms that allow cells to recognize and fix DNA damage. Non-canonical DNA structures can be perceived as damage by the cell, and recent work has shown an increased prevalence of mutations near non-B-DNA-forming sequences. Some of these mutations are due to the interactions between H-DNA and the enzymes involved in DNA replication and transcription, where H-DNA interferes with these processes and triggers various DNA repair mechanisms. This can cause genetic instability and implicates H-DNA in cancer formation.
DNA replication has been shown to affect the function of various DNA repair enzymes. H-DNA formation involves the formation of single-stranded DNA (ssDNA), which is more susceptible to attack by nucleases.[36] Various nucleases have been shown to interact with H-DNA in a replication-dependent or replication-independent manner.
A study using human cells found that the nucleotide excision repair (NER) nucleases ERCC1-XPF and XPG induced genetic instability.[37] These enzymes cleave H-DNA at the loop formed by the two Hoogsteen hydrogen-bonded strands and at the 5' end of the other Watson-Crick hydrogen-bonded strand, respectively. This cleavage has been shown to induce large deletions that cause double-strand breaks (DSBs) in DNA that can lead to genetic instability. In cells deficient in ERCC1-XPF and XPG, these deletions were less prevalent near H-DNA-forming sequences. Additionally, more mutations were found in ERCC1-XPF- and XPG-deficient cells in the absence of DNA replication, which suggests they process H-DNA in a replication-independent manner.
Alternatively, the DNA replication-repair nuclease FEN1 was found to suppress genetic instability. Similar to XPG, FEN1 cleaves H-DNA at the 5' end of the strand not involved in Hoogsteen hydrogen-bonding. HeLa cells deficient in FEN1 showed a higher prevalence of deletions near H-DNA-forming sequences, and H-DNA-induced mutagenesis was more pronounced in FEN1-deficient cells in the presence of DNA replication. This suggests FEN1 suppresses H-DNA-induced mutagenesis in a replication-dependent manner.
H-DNA has been implicated in human cancer etiology because of the prevalence of H-DNA-forming sequences near translocation breakpoints in cancer genomes. Replication-mediated nuclease activity on H-DNA highlights another way in which H-DNA-induced mutagenesis can lead to cancer growth.
H-DNA forming sequences can also cause genetic instability by interfering with and stopping transcription prematurely. The DNA unwinding involved in transcription makes it more susceptible to damage. In transcription-coupled repair (TCR), a lesion on the template strand of DNA stops the function of RNA polymerase and signals TCR factors to resolve the damage by excising it.[38] H-DNA can be perceived as one of these lesions.
A study observing transcription by T7 RNA polymerase on a stable H-DNA-forming sequence analog found transcription blockage at the duplex-to-triplex junction. Here, the template strand was the central strand of the H-DNA, and the difficulty of disrupting its Watson-Crick and Hoogsteen hydrogen bonds stopped transcription from progressing.[39]
When transcription by T7 was observed on the P0 promoter of the c-MYC gene, the shortened transcription products that were found indicated that transcription was stopped in close proximity to the H-DNA-forming sequence downstream of the promoter. Formation of H-DNA in this region prevents T7 from traveling down the template strand because of the steric hindrance it causes. This stops transcription and signals TCR factors to resolve the H-DNA, which results in DNA excision that can cause genetic instability. The mirror symmetry and prevalence of guanine residues in the c-MYC gene give it a high propensity for non-canonical DNA structure formation.[40] This, coupled with the activity of TCR factors during transcription, makes the gene highly mutagenic, and it plays a role in the development of Burkitt lymphoma and leukemia.
Triple-stranded DNA regions can be generated through the association of triplex-forming oligonucleotides (TFOs) and peptide nucleic acids (PNAs). Historically, TFO binding has been shown to inhibit transcription, replication, and protein binding to DNA.[41] TFOs tethered to mutagens have also been shown to promote DNA damage and induce mutagenesis. Although TFOs are known to hinder transcription and replication of DNA, recent studies have shown that TFOs can be used to mediate site-specific gene modifications both in vitro and in vivo. Another recent study has also shown that TFOs can be used to suppress oncogenes and proto-oncogenes to reduce cancer cell growth. For example, a recent study used TFOs to induce cell death in hepatoma cells by decreasing the expression of MET.
PNA TFOs have the ability to enhance recombination frequencies, leading to targeted, specific editing of genes. The PNA-DNA-PNA triple helix can be recognized by the cell's own DNA repair mechanism, which sensitizes the surrounding DNA for homologous recombination. For a site-specific PNA structure to mediate recombination within a DNA sequence, a bis-PNA structure can be coupled with a 40 nt DNA fragment that is homologous to an adjacent region on the target gene. Linking a TFO to a donor DNA strand has been shown to induce recombination of the targeted gene and the adjacent gene target region. The mechanism for this form of recombination and repair has been linked to the nucleotide excision repair (NER) pathway, which plays a role in recognizing and repairing triplex structures. Multiple investigations suggest that xeroderma pigmentosum group A (XPA) and replication protein A (RPA), which are NER factors, are able to bind specifically as a complex to cross-linked triplex structures. This mechanism, alongside others, plays a role in recognizing and repairing triplex structures.
The in vivo delivery of TFOs has been a major barrier to using TFOs for gene modification.[42] One study on in vivo targeting of hematopoietic stem cells proposed a novel technique of conjugating PNA molecules with cell-penetrating peptides (CPPs) alongside poly(lactic-co-glycolic acid) (PLGA) nanoparticles to enable 6 bp modifications in the CCR5 gene.[43] Editing of the CCR5 gene has been linked to HIV-1 resistance.[44] CPPs are peptides that are able to carry “cargo” such as small proteins or molecules into cells. PLGA is a biodegradable material that encapsulates PNA molecules as nanoparticles for site-specific genome modifications. The study found that the PNA-DNA PLGA nanoparticles were able to edit the hematopoietic stem cells effectively, with lower toxicity and without the use of viruses, and that conjugation with CPP offered direct targeting of the genes for site-specific mutagenesis in the stem cells.
In a study of cystic fibrosis (CF) gene therapy, three tail-clamp peptide nucleic acids (PNAs) alongside a donor DNA molecule were engineered to be delivered by nanoparticles to correct the F508 del mutation in the cystic fibrosis transmembrane conductance regulator (CFTR) in vitro and in vivo.[45] The F508 del mutation is the most commonly occurring mutation that causes CF.[46] It leads to a loss of function of CFTR, a plasma membrane chloride channel that is regulated by cyclic adenosine monophosphate (cAMP). The study used nanoparticles to correct the F508 del CFTR mutation both in vitro in human bronchial epithelial (HBE) cells and in vivo in a CF mouse model, which resulted in the appearance of CFTR-dependent chloride transport.
Triple-stranded DNA structures were common hypotheses in the 1950s when scientists were struggling to discover DNA's true structural form. Watson and Crick (who later won the Nobel Prize for their double-helix model) originally considered a triple-helix model, as did Pauling and Corey, who published a proposal for their triple-helix model in 1953,[47] [48] as well as fellow scientist Fraser.[49] However, Watson and Crick soon identified several problems with these models.
Fraser's model differed from Pauling and Corey's in that in his model the phosphates are on the outside and the bases are on the inside, linked together by hydrogen bonds. However, Watson and Crick found Fraser's model to be too ill-defined to comment specifically on its inadequacies.
An alternative triple-stranded DNA structure was described in 1957.[50] Felsenfeld, Davies, and Rich predicted that if one strand contained only purines and the other strand only pyrimidines, the strands would undergo a conformational change to form a triple-stranded DNA helix. The triple-stranded DNA (H-DNA) was predicted to be composed of one polypurine and two polypyrimidine strands. It was thought to occur in only one in vivo biological process: as an intermediate product during the action of the E. coli recombination enzyme RecA. Early models in the 1960s predicted the formation of complexes between polycytidylic acid and guanine oligonucleotides. The models suggested interactions known as Hoogsteen pairing (non-Watson-Crick interactions) located in the major groove. Shortly after, triple helices composed of one pyrimidine and two purine strands were predicted. The discovery of H-DNA stretches in supercoiled plasmids piqued modern interest in the potential function of triplex structures in living cells.[51] Additionally, it was soon found that homopyrimidine and some purine-rich oligonucleotides are able to form a stable H-DNA structure by binding in a sequence-specific manner to homopurine-homopyrimidine sequences on DNA duplexes.[52]