Ribonuclease H (abbreviated RNase H or RNH) is a family of non-sequence-specific endonuclease enzymes that catalyze the cleavage of RNA in an RNA/DNA substrate via a hydrolytic mechanism. Members of the RNase H family can be found in nearly all organisms, from bacteria to archaea to eukaryotes.
The family is divided into evolutionarily related groups with slightly different substrate preferences, broadly designated ribonuclease H1 and H2.[1] The human genome encodes both H1 and H2. Human ribonuclease H2 is a heterotrimeric complex composed of three subunits, mutations in any of which are among the genetic causes of a rare disease known as Aicardi–Goutières syndrome. A third type, closely related to H2, is found only in a few prokaryotes,[2] whereas H1 and H2 occur in all domains of life. Additionally, RNase H1-like retroviral ribonuclease H domains occur in multidomain reverse transcriptase proteins, which are encoded by retroviruses such as HIV and are required for viral replication.[3] [4]
In eukaryotes, ribonuclease H1 is involved in DNA replication of the mitochondrial genome. Both H1 and H2 are involved in genome maintenance tasks such as processing of R-loop structures.
Ribonuclease H is a family of endonuclease enzymes with a shared substrate specificity for the RNA strand of RNA-DNA duplexes. By definition, RNases H cleave RNA backbone phosphodiester bonds to leave a 3' hydroxyl and a 5' phosphate group.[5] RNases H have been proposed as members of an evolutionarily related superfamily encompassing other nucleases and nucleic acid processing enzymes such as retroviral integrases, DNA transposases, Holliday junction resolvases, Piwi and Argonaute proteins, various exonucleases, and the spliceosomal protein Prp8.[6] [7]
RNases H can be broadly divided into two subtypes, H1 and H2, which for historical reasons are given Arabic numeral designations in eukaryotes and Roman numeral designations in prokaryotes. Thus the Escherichia coli RNase HI is a homolog of the Homo sapiens RNase H1. In E. coli and many other prokaryotes, the rnhA gene encodes HI and the rnhB gene encodes HII. A third related class, called HIII, occurs in a few bacteria and archaea; it is closely related to prokaryotic HII enzymes.
The structure of RNase H commonly consists of a 5-stranded β-sheet surrounded by a distribution of α-helices.[8] All RNases H have an active site centered on a conserved sequence motif composed of aspartate and glutamate residues, often referred to as the DEDD motif. These residues interact with catalytically required magnesium ions.[3]
RNases H2 are larger than H1 and usually have additional helices. The domain organization of the enzymes varies; some prokaryotic and most eukaryotic members of the H1 group have an additional small domain at the N-terminus known as the "hybrid binding domain", which facilitates binding to RNA:DNA hybrid duplexes and sometimes confers increased processivity.[9] While all members of the H1 group and the prokaryotic members of the H2 group function as monomers, eukaryotic H2 enzymes are obligate heterotrimers. Prokaryotic HIII enzymes are members of the broader H2 group and share most structural features with H2, with the addition of an N-terminal TATA box binding domain. Retroviral RNase H domains occurring in multidomain reverse transcriptase proteins have structures closely resembling the H1 group.
RNases H1 have been extensively studied to explore the relationships between structure and enzymatic activity. They are also used, especially the E. coli homolog, as model systems to study protein folding.[10] [11] [12] Within the H1 group, a relationship has been identified between higher substrate-binding affinity and the presence of a structural element consisting of a helix and flexible loop, which provides a larger and more basic substrate-binding surface. This helix, known as the C-helix, has a scattered taxonomic distribution; it is present in the E. coli and human RNase H1 homologs and absent in the HIV RNase H domain, but examples of retroviral domains with C-helices do exist.[13] [14]
Ribonuclease H enzymes cleave the phosphodiester bonds of RNA in a double-stranded RNA:DNA hybrid, leaving a 3' hydroxyl and a 5' phosphate group on either end of the cut site. Cleavage proceeds by a two-metal-ion catalysis mechanism, in which two divalent cations, such as Mg2+ or Mn2+, directly participate in the catalytic function. On the basis of differences in their amino acid sequences, RNases H are classified into type 1 and type 2.[5] [15] Type 1 RNases H comprise prokaryotic and eukaryotic RNases H1 and retroviral RNase H. Type 2 RNases H comprise prokaryotic and eukaryotic RNases H2 and bacterial RNase H3. All of these enzymes exist in a monomeric form, except for eukaryotic RNases H2, which exist in a heterotrimeric form.[16] [17] RNase H1 and H2 have distinct substrate preferences and distinct but overlapping functions in the cell. In prokaryotes and lower eukaryotes, neither enzyme is essential, whereas both are believed to be essential in higher eukaryotes. The combined activity of both H1 and H2 enzymes is associated with maintenance of genome stability due to the enzymes' degradation of the RNA component of R-loops.[18] [19]
Symbol: RNase H
Pfam: PF00075
Pfam clan: CL0219
InterPro: IPR002156
PROSITE: PS50879
Ribonuclease H1 enzymes require at least four ribonucleotide-containing base pairs in a substrate and cannot remove a single ribonucleotide from a strand that is otherwise composed of deoxyribonucleotides. For this reason, it is considered unlikely that RNase H1 enzymes are involved in the processing of RNA primers from Okazaki fragments during DNA replication. RNase H1 is not essential in unicellular organisms where it has been investigated; in E. coli, RNase H1 knockouts confer a temperature-sensitive phenotype, and in S. cerevisiae, they produce defects in stress response.[20]
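The substrate requirement described above can be expressed as a simple check. The following is a minimal sketch, not a biochemical model: it assumes a hypothetical encoding in which a strand is written as a string with uppercase letters for deoxyribonucleotides and lowercase letters for ribonucleotides, and it assumes the strand is fully base-paired so that each ribonucleotide counts as one ribonucleotide-containing base pair. The function names and the `MIN_H1_RUN` constant are illustrative only.

```python
from itertools import groupby

# From the text: RNase H1 needs at least four ribonucleotide-containing base pairs.
MIN_H1_RUN = 4


def longest_ribo_run(strand: str) -> int:
    """Return the length of the longest run of consecutive ribonucleotides.

    Assumed encoding (hypothetical): lowercase = ribonucleotide,
    uppercase = deoxyribonucleotide.
    """
    runs = (
        sum(1 for _ in group)
        for is_ribo, group in groupby(strand, key=str.islower)
        if is_ribo
    )
    return max(runs, default=0)


def is_h1_substrate(strand: str) -> bool:
    """True if the strand contains a ribonucleotide run long enough for RNase H1."""
    return longest_ribo_run(strand) >= MIN_H1_RUN


if __name__ == "__main__":
    print(is_h1_substrate("ACGTaCGT"))  # single embedded ribonucleotide
    print(is_h1_substrate("ACguacGT"))  # run of four ribonucleotides
```

Under this encoding, a strand with a single embedded ribonucleotide fails the check, which mirrors the statement that RNase H1 cannot remove a lone ribonucleotide from an otherwise DNA strand.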
In many eukaryotes, including mammals, RNase H1 genes include a mitochondrial targeting sequence (MTS), leading to expression of isoforms with and without the MTS present. As a result, RNase H1 is localized to both mitochondria and the nucleus. In knockout mouse models, RNase H1-null mutants are lethal during embryogenesis due to defects in replicating mitochondrial DNA.[21] [22] The defects in mitochondrial DNA replication induced by loss of RNase H1 are likely due to defects in R-loop processing.
Symbol: RNase HII
Pfam: PF01351
Pfam clan: CL0219
InterPro: IPR024567
In prokaryotes, RNase H2 is enzymatically active as a monomeric protein. In eukaryotes, it is an obligate heterotrimer composed of a catalytic subunit A and structural subunits B and C. While the A subunit is closely homologous to the prokaryotic RNase H2, the B and C subunits have no apparent homologs in prokaryotes and are poorly conserved at the sequence level even among eukaryotes.[23] [24] The B subunit mediates protein-protein interactions between the H2 complex and PCNA, which localizes H2 to replication foci.
Both prokaryotic and eukaryotic H2 enzymes can cleave single ribonucleotides in a strand. However, they have slightly different cleavage patterns and substrate preferences: prokaryotic enzymes have lower processivity and hydrolyze successive ribonucleotides more efficiently than ribonucleotides with a 5' deoxyribonucleotide, while eukaryotic enzymes are more processive and hydrolyze both types of substrate with similar efficiency.[24] The substrate specificity of RNase H2 gives it a role in ribonucleotide excision repair, removing misincorporated ribonucleotides from DNA, in addition to R-loop processing.[25] [26] [27] Although both H1 and H2 are present in the mammalian cell nucleus, H2 is the dominant source of RNase H activity there and is important for maintaining genome stability.
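The two substrate contexts distinguished above, a ribonucleotide that follows another ribonucleotide and a ribonucleotide with a 5' deoxyribonucleotide, can be labeled programmatically. This is a minimal sketch under stated assumptions: the strand is written 5' to 3' as a string, with lowercase letters for ribonucleotides and uppercase letters for deoxyribonucleotides (a hypothetical encoding), and the function name and labels are illustrative. It only classifies positions; it does not model cleavage rates.

```python
from __future__ import annotations


def classify_ribonucleotides(strand: str) -> list[tuple[int, str]]:
    """Label each ribonucleotide in a 5'->3' strand by its 5' neighbor.

    Assumed encoding (hypothetical): lowercase = ribonucleotide,
    uppercase = deoxyribonucleotide.
    """
    labels = []
    for i, base in enumerate(strand):
        if not base.islower():
            continue  # deoxyribonucleotides are not classified
        if i == 0:
            kind = "5'-terminal"  # no 5' neighbor to inspect
        elif strand[i - 1].islower():
            kind = "successive"  # preceded by another ribonucleotide
        else:
            kind = "5'-deoxy junction"  # preceded by a deoxyribonucleotide
        labels.append((i, kind))
    return labels


if __name__ == "__main__":
    for position, kind in classify_ribonucleotides("ACgTTaucG"):
        print(position, kind)
```

A single misincorporated ribonucleotide, the substrate relevant to ribonucleotide excision repair, always falls in the "5'-deoxy junction" class in this scheme.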
Some prokaryotes possess an additional H2-type gene designated RNase HIII in the Roman-numeral nomenclature used for the prokaryotic genes. HIII proteins are more closely related to the H2 group by sequence identity and structural similarity, but have substrate preferences that more closely resemble H1.[28] Unlike HI and HII, which are both widely distributed among prokaryotes, HIII is found in only a few organisms with a scattered taxonomic distribution; it is somewhat more common in archaea and is rarely or never found in the same prokaryotic genome as HI.[29]
The active site of nearly all RNases H contains four negatively charged amino acid residues, known as the DEDD motif; a histidine is often also present, for example in the HIV-1, human, and E. coli enzymes.
The charged residues bind two metal ions that are required for catalysis; under physiological conditions these are magnesium ions, but manganese also usually supports enzymatic activity, while calcium or a high concentration of Mg2+ inhibits activity.[30] [31]
Based on experimental evidence and computer simulations, the enzyme activates a water molecule bound to one of the metal ions with the conserved histidine.[32] The transition state is associative in nature[33] and forms an intermediate with a protonated phosphate and a deprotonated alkoxide leaving group. The leaving group is protonated via the glutamate, which has an elevated pKa and is likely to be protonated. The mechanism is similar to those of RNase T and the RuvC domain of the Cas9 enzyme, both of which also use a histidine and a two-metal-ion mechanism.
The mechanism of the release of the cleaved product is still unresolved. Experimental evidence from time-resolved crystallography and from similar nucleases points to a role for a third ion that is recruited to the active site during the reaction.[34] [35]
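The human genome contains four genes encoding RNase H: RNASEH1, which encodes RNase H1, and RNASEH2A, RNASEH2B, and RNASEH2C, which encode the catalytic A subunit and the structural B and C subunits of RNase H2, respectively.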
In addition, genetic material of retroviral origin appears frequently in the genome, reflecting integration of the genomes of human endogenous retroviruses. Such integration events result in the presence of genes encoding retroviral reverse transcriptase, which includes an RNase H domain. An example is ERVK6.[36] Long terminal repeat (LTR) and non-long terminal repeat (non-LTR) retrotransposons are also common in the genome and often include their own RNase H domains, with a complex evolutionary history.[37] [38] [39]
In small studies, mutations in human RNase H1 have been associated with chronic progressive external ophthalmoplegia, a common feature of mitochondrial disease.
Mutations in any of the three RNase H2 subunits are well-established as causes of a rare genetic disorder known as Aicardi–Goutières syndrome (AGS), which manifests as neurological and dermatological symptoms at an early age.[40] The symptoms of AGS closely resemble those of congenital viral infection and are associated with inappropriate upregulation of type I interferon. AGS can also be caused by mutations in other genes: TREX1, SAMHD1, ADAR, and MDA5/IFIH1, all of which are involved in nucleic acid processing.[41] Characterization of mutational distribution in an AGS patient population found 5% of all AGS mutations in RNASEH2A, 36% in 2B, and 12% in 2C. Mutations in 2B have been associated with somewhat milder neurological impairment and with an absence of interferon-induced gene upregulation that can be detected in patients with other AGS-associated genotypes.
Two groups of viruses use reverse transcription as part of their life cycles: retroviruses, which encode their genomes in single-stranded RNA and replicate through a double-stranded DNA intermediate; and dsDNA-RT viruses, which replicate their double-stranded DNA genomes through an RNA "pregenome" intermediate. Pathogenic examples include human immunodeficiency virus and hepatitis B virus, respectively. Both encode large multifunctional reverse transcriptase (RT) proteins containing RNase H domains.[42] [43]
Retroviral RT proteins from HIV-1 and murine leukemia virus are the best-studied members of the family.[44] [45] Retroviral RT is responsible for converting the virus' single-stranded RNA genome into double-stranded DNA. This process requires three steps: first, RNA-dependent DNA polymerase activity produces minus-strand DNA from the plus-strand RNA template, generating an RNA:DNA hybrid intermediate; second, the RNA strand is destroyed; and third, DNA-dependent DNA polymerase activity synthesizes plus-strand DNA, generating double-stranded DNA as the final product. The second step of this process is carried out by an RNase H domain located at the C-terminus of the RT protein.[46] [47]
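The three steps above can be sketched as string operations. This is a deliberately simplified illustration, not a model of the actual process: it ignores primers, the polypurine tract, and strand transfers, treats each step as complete and error-free, and writes all sequences 5' to 3'. The function names are hypothetical.

```python
from __future__ import annotations

# Base-pairing tables: DNA base synthesized opposite each template base.
DNA_OPPOSITE_RNA = {"A": "T", "U": "A", "G": "C", "C": "G"}
DNA_OPPOSITE_DNA = {"A": "T", "T": "A", "G": "C", "C": "G"}


def synthesize_minus_strand(plus_rna: str) -> str:
    """Step 1: RNA-dependent DNA polymerase copies plus-strand RNA into minus-strand DNA."""
    return "".join(DNA_OPPOSITE_RNA[base] for base in reversed(plus_rna))


def degrade_rna(hybrid: tuple[str, str]) -> str:
    """Step 2: RNase H destroys the RNA strand of the RNA:DNA hybrid."""
    _plus_rna, minus_dna = hybrid
    return minus_dna  # only the single-stranded minus DNA remains


def synthesize_plus_strand(minus_dna: str) -> str:
    """Step 3: DNA-dependent DNA polymerase copies minus-strand DNA into plus-strand DNA."""
    return "".join(DNA_OPPOSITE_DNA[base] for base in reversed(minus_dna))


def reverse_transcribe(plus_rna: str) -> tuple[str, str]:
    """Run the three steps and return (plus-strand DNA, minus-strand DNA)."""
    minus_dna = synthesize_minus_strand(plus_rna)
    remaining = degrade_rna((plus_rna, minus_dna))
    plus_dna = synthesize_plus_strand(remaining)
    return plus_dna, minus_dna


if __name__ == "__main__":
    plus_dna, minus_dna = reverse_transcribe("AUGGC")
    print(plus_dna, minus_dna)
```

In this sketch the plus-strand DNA has the same sequence as the input RNA with uracil replaced by thymine, which is the expected relationship between the genome and the final double-stranded product.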
RNase H performs three types of cleaving actions: non-specific degradation of the plus-strand RNA genome, specific removal of the minus-strand tRNA primer, and removal of the plus-strand polypurine tract (PPT) primer.[48] RNase H plays a role in the priming of the plus strand, but not by the conventional method of synthesizing a new primer sequence. Rather, RNase H creates a "primer" from the PPT, which is resistant to RNase H cleavage. By removing all bases except the PPT, RNase H leaves the PPT as a marker for the end of the U3 region of the long terminal repeat.
Because RNase H activity is required for viral proliferation, this domain has been considered a drug target for the development of antiretroviral drugs used in the treatment of HIV/AIDS and other conditions caused by retroviruses. Inhibitors of retroviral RNase H of several different chemotypes have been identified, many of which have a mechanism of action based on chelation of the active-site cations.[49] Reverse-transcriptase inhibitors that specifically inhibit the polymerase function of RT are in widespread clinical use, but not inhibitors of the RNase H function; it is the only enzymatic function encoded by HIV that is not yet targeted by drugs in clinical use.[50]
RNases H are widely distributed and occur in all domains of life. The family belongs to a larger superfamily of nuclease enzymes[6] [7] and is considered to be evolutionarily ancient.[51] In prokaryotic genomes, multiple RNase H genes are often present, but there is little correlation between occurrence of HI, HII, and HIII genes and overall phylogenetic relationships, suggesting that horizontal gene transfer may have played a role in establishing the distribution of these enzymes. RNase HI and HIII rarely or never appear in the same prokaryotic genome. When an organism's genome contains more than one RNase H gene, they sometimes have significant differences in activity level. These observations have been suggested to reflect an evolutionary pattern that minimizes functional redundancy among RNase H genes. RNase HIII, which is unique to prokaryotes, has a scattered taxonomic distribution and is found in both bacteria and archaea; it is believed to have diverged from HII fairly early.[52]
The evolutionary trajectory of RNase H2 in eukaryotes, especially the mechanism by which eukaryotic homologs became obligate heterotrimers, is unclear; the B and C subunits have no apparent homologs in prokaryotes.[24]
Because RNase H specifically degrades only the RNA in double-stranded RNA:DNA hybrids, it is commonly used as a laboratory reagent in molecular biology. Purified preparations of E. coli RNase HI and HII are commercially available. RNase HI is often used to destroy the RNA template after first-strand complementary DNA (cDNA) synthesis by reverse transcription. It can also be used to cleave specific RNA sequences in the presence of short complementary segments of DNA.[53] Highly sensitive techniques such as surface plasmon resonance can be used for detection.[54] [55] RNase HII can be used to degrade the RNA primer component of an Okazaki fragment or to introduce single-stranded nicks at positions containing a ribonucleotide. A variant of hot start PCR, known as RNase H-dependent PCR or rhPCR, has been described using a thermostable RNase HII from the hyperthermophilic archaeon Pyrococcus abyssi.[56] Of note, the ribonuclease inhibitor protein commonly used as a reagent is not effective at inhibiting the activity of either HI or HII.
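The use of a short complementary DNA segment to direct cleavage to a specific RNA sequence can be illustrated by locating where an oligonucleotide would form a hybrid. This is a minimal sketch under simplifying assumptions: it requires perfect complementarity over the full oligonucleotide, whereas real hybridization depends on temperature, salt, and mismatch tolerance; both sequences are written 5' to 3'; and the function name is hypothetical.

```python
from __future__ import annotations

# RNA base that pairs with each DNA base in the oligonucleotide.
RNA_PARTNER_OF_DNA = {"A": "U", "T": "A", "G": "C", "C": "G"}


def hybrid_windows(rna: str, dna_oligo: str) -> list[tuple[int, int]]:
    """Return half-open (start, end) intervals of `rna` that pair perfectly with `dna_oligo`.

    These intervals are the RNA:DNA hybrid regions, and therefore the only
    parts of the RNA expected to be susceptible to RNase H in this model.
    """
    # The RNA site is the reverse complement of the DNA oligonucleotide.
    target = "".join(RNA_PARTNER_OF_DNA[base] for base in reversed(dna_oligo))
    windows = []
    start = rna.find(target)
    while start != -1:
        windows.append((start, start + len(target)))
        start = rna.find(target, start + 1)  # allow overlapping sites
    return windows


if __name__ == "__main__":
    print(hybrid_windows("GGAUGCCAUUAG", "TGGCAT"))
```

RNA outside the returned intervals remains single-stranded in this model and is left intact, which is the basis for the sequence-specific cleavage described above.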
Ribonucleases H were first discovered in the laboratory of Peter Hausen when researchers found RNA:DNA hybrid endonuclease activity in calf thymus in 1969 and gave it the name "ribonuclease H" to designate its hybrid specificity.[57] [58] RNase H activity was subsequently discovered in E. coli[59] and in a sample of oncoviruses with RNA genomes during early studies of viral reverse transcription.[60] [61] It later became clear that calf thymus extract contained more than one protein with RNase H activity[62] and that E. coli contained two RNase H genes.[63] [64] Originally, the enzyme now known as RNase H2 in eukaryotes was designated H1 and vice versa, but the names of the eukaryotic enzymes were switched to match those in E. coli to facilitate comparative analysis, yielding the modern nomenclature in which the prokaryotic enzymes are designated with Roman numerals and the eukaryotic enzymes with Arabic numerals.[28] [65] The prokaryotic RNase HIII, reported in 1999, was the last RNase H subtype to be identified.
Characterizing eukaryotic RNase H2 was historically a challenge, in part due to its low abundance. Careful efforts at purification of the enzyme suggested that, unlike the E. coli RNase H2, the eukaryotic enzyme had multiple subunits.[66] The S. cerevisiae homolog of the E. coli protein (that is, the H2A subunit) was easily identifiable by bioinformatics when the yeast genome was sequenced,[67] but the corresponding protein was found not to have enzymatic activity in isolation. Eventually, the yeast B and C subunits were isolated by co-purification and found to be required for enzymatic activity.[68] However, the yeast B and C subunits have very low sequence identity to their homologs in other organisms, and the corresponding human proteins were conclusively identified only after mutations in all three were found to cause Aicardi–Goutières syndrome.