5-Methylcytosine is a methylated form of the DNA base cytosine (C) that regulates gene transcription and has several other biological roles.[1] When cytosine is methylated, the DNA sequence stays the same, but the expression of methylated genes can be altered (the study of such changes is part of the field of epigenetics). 5-Methylcytosine is incorporated in the nucleoside 5-methylcytidine.
In 5-methylcytosine, a methyl group is attached to the 5th atom in the 6-atom ring, counting counterclockwise from the NH-bonded nitrogen at the six o'clock position. This methyl group distinguishes 5-methylcytosine from cytosine.
While trying to isolate the bacterial toxin responsible for tuberculosis, W.G. Ruppel isolated a novel nucleic acid named tuberculinic acid in 1898 from Tubercle bacillus.[2] The nucleic acid was found to be unusual in that it contained, in addition to thymine, guanine and cytosine, a methylated nucleotide. In 1925, Johnson and Coghill detected a minor amount of a methylated cytosine derivative as a product of hydrolysis of tuberculinic acid with sulfuric acid.[3][4] This report was severely criticized because the identification was based solely on the optical properties of the crystalline picrate, and other scientists failed to reproduce the result.[5] Its existence was ultimately proven in 1948, when Hotchkiss separated the nucleic acids of DNA from calf thymus using paper chromatography and detected a unique methylated cytosine, quite distinct from conventional cytosine and uracil.[6] Seven decades later, it was found to be a common feature of various RNA molecules as well, although its precise role there is uncertain.[7]
The function of this chemical varies significantly among species:[8]
While spontaneous deamination of cytosine forms uracil, which is recognized and removed by DNA repair enzymes, deamination of 5-methylcytosine forms thymine. This conversion of a DNA base from cytosine (C) to thymine (T) can result in a transition mutation.[11] In addition, active enzymatic deamination of cytosine or 5-methylcytosine by the APOBEC family of cytosine deaminases could have beneficial implications for various cellular processes as well as for organismal evolution.[12] The implications of deamination of 5-hydroxymethylcytosine, on the other hand, remain less well understood.
The NH2 group can be removed (deamination) from 5-methylcytosine to form thymine with use of reagents such as nitrous acid; cytosine deaminates to uracil (U) under similar conditions.
5-Methylcytosine is resistant to deamination by bisulfite treatment, which deaminates unmethylated cytosine residues. This property is often exploited to analyze DNA cytosine methylation patterns with bisulfite sequencing.[13]
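The logic behind bisulfite sequencing can be illustrated with a minimal sketch. It assumes an idealized experiment: conversion is complete, only one strand is considered, and there are no sequencing errors. The function names and the example sequence are hypothetical and chosen only for illustration.

```python
def bisulfite_convert(sequence: str, methylated_positions: set[int]) -> str:
    """Return the read expected after bisulfite treatment and amplification.

    Assumption: unmethylated C is fully deaminated to U and read as T,
    while 5mC is fully protected and still read as C.
    """
    converted = []
    for index, base in enumerate(sequence.upper()):
        if base == "C" and index not in methylated_positions:
            converted.append("T")
        else:
            converted.append(base)
    return "".join(converted)


def call_methylation(reference: str, converted_read: str) -> list[int]:
    """Infer methylated positions: a reference C that is still C in the read."""
    return [
        index
        for index, (ref_base, read_base) in enumerate(
            zip(reference.upper(), converted_read.upper())
        )
        if ref_base == "C" and read_base == "C"
    ]


reference = "ACGTCCGA"
methylated = {1}  # hypothetical: only the C at index 1 carries a methyl group
read = bisulfite_convert(reference, methylated)
print(read)                               # ACGTTTGA
print(call_methylation(reference, read))  # [1]
```

Comparing the converted read with the untreated reference recovers the methylated position, because only the protected cytosine still reads as C.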
5mC marks are placed on genomic DNA via DNA methyltransferases (DNMTs). There are five DNMTs in humans: DNMT1, DNMT2, DNMT3A, DNMT3B, and DNMT3L, and three more are present in algae and fungi (DNMT4, DNMT5, and DNMT6).[14] DNMT1 contains the replication foci targeting sequence (RFTS) and the CXXC domain, which catalyze the addition of 5mC marks. RFTS directs DNMT1 to loci of DNA replication to assist in the maintenance of 5mC on daughter strands during DNA replication, whereas CXXC contains a zinc finger domain for de novo addition of methylation to the DNA.[15] DNMT1 was found to be the predominant DNA methyltransferase in all human tissue.[16] Primarily, DNMT3A and DNMT3B are responsible for de novo methylation, and DNMT1 maintains the 5mC mark after replication. DNMTs can interact with each other to increase methylating capability. For example, two DNMT3L molecules can form a complex with two DNMT3A molecules to improve interactions with the DNA, facilitating the methylation.[17] Changes in the expression of DNMTs result in aberrant methylation. Overexpression produces increased methylation, whereas disruption of the enzyme decreases levels of methylation.
The mechanism of the addition is as follows: first, a cysteine residue in the DNMT's PCQ motif performs a nucleophilic attack at carbon 6 of the cytosine nucleotide that is to be methylated. S-Adenosylmethionine then donates a methyl group to carbon 5. A base in the DNMT enzyme removes the remaining proton from carbon 5, restoring the double bond between carbons 5 and 6 in the ring and producing the 5-methylcytosine base.
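The division of labor between maintenance and de novo methylation described above can be sketched as a toy model. This is a deliberate simplification, not a description of enzyme kinetics: each CpG site is a boolean, maintenance is assumed to copy the parental pattern perfectly, and the function names are hypothetical.

```python
def replicate(parent: list[bool], maintenance: bool) -> list[bool]:
    """Return the methylation state of a newly synthesized daughter strand.

    parent[i] is True when CpG site i on the template strand carries 5mC.
    The new strand starts unmethylated. With maintenance active (the role
    attributed to DNMT1), every site opposite a methylated parental CpG
    is methylated, so the parental pattern is copied.
    """
    if maintenance:
        return list(parent)
    return [False] * len(parent)


def de_novo(strand: list[bool], targets: set[int]) -> list[bool]:
    """Methylate the target sites regardless of their previous state
    (the role attributed mainly to DNMT3A and DNMT3B)."""
    return [state or (index in targets) for index, state in enumerate(strand)]


parent = [True, False, True, True]
print(replicate(parent, maintenance=True))   # [True, False, True, True]
print(replicate(parent, maintenance=False))  # [False, False, False, False]
print(de_novo([False, False, False, False], {1}))  # [False, True, False, False]
```

In this model, a pattern survives replication only when maintenance is active, whereas de novo methylation can create marks at sites that were previously unmethylated.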
After a cytosine is methylated to 5mC, it can be converted back to unmodified cytosine via multiple mechanisms. Passive DNA demethylation by dilution eliminates the mark gradually over successive rounds of replication when it is not maintained by DNMT. In active DNA demethylation, a series of oxidations converts 5mC to 5-hydroxymethylcytosine (5hmC), 5-formylcytosine (5fC), and 5-carboxylcytosine (5caC); the latter two are eventually excised by thymine DNA glycosylase (TDG), followed by base excision repair (BER) to restore the cytosine. TDG knockout produced a 2-fold increase of 5fC without any statistically significant change in levels of 5hmC, indicating that 5mC must be iteratively oxidized at least twice before its full demethylation.[18] The oxidation is carried out by the TET (ten-eleven translocation) family of dioxygenases (TET enzymes), which can convert 5mC, 5hmC, and 5fC to their oxidized forms. However, 5mC is the preferred substrate, and the initial reaction rates for 5hmC and 5fC conversion by TET2 are 4.9- to 7.6-fold slower.[19] TET requires Fe(II) as a cofactor, and oxygen and α-ketoglutarate (α-KG) as substrates; the latter is generated from isocitrate by the enzyme isocitrate dehydrogenase (IDH).[20] Cancer, however, can produce 2-hydroxyglutarate (2HG), which competes with α-KG, reducing TET activity and in turn reducing the conversion of 5mC to 5hmC.[21]
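The active demethylation pathway described above can be traced with a small state-machine sketch. It is a simplification under stated assumptions: each TET step always succeeds, TDG with BER acts as soon as a substrate (5fC or 5caC) is present, and relative reaction rates are ignored. All names are hypothetical.

```python
# TET-mediated oxidation steps named in the text: 5mC -> 5hmC -> 5fC -> 5caC
OXIDATION_STEP = {"5mC": "5hmC", "5hmC": "5fC", "5fC": "5caC"}
# Bases that TDG excises, after which BER restores unmodified cytosine
TDG_SUBSTRATES = {"5fC", "5caC"}


def tet_oxidize(base: str) -> str:
    """Apply one TET oxidation step; bases that are not substrates are unchanged."""
    return OXIDATION_STEP.get(base, base)


def tdg_ber(base: str) -> str:
    """Excise 5fC or 5caC and restore cytosine; other bases are unchanged."""
    return "C" if base in TDG_SUBSTRATES else base


def demethylate(base: str, max_rounds: int = 3) -> list[str]:
    """Trace a base through alternating TDG/BER attempts and TET oxidations."""
    history = [base]
    for _ in range(max_rounds):
        repaired = tdg_ber(history[-1])
        if repaired != history[-1]:
            history.append(repaired)  # excision and repair end the pathway
            break
        oxidized = tet_oxidize(history[-1])
        if oxidized == history[-1]:
            break  # neither enzyme acts on this base
        history.append(oxidized)
    return history


print(demethylate("5mC"))  # ['5mC', '5hmC', '5fC', 'C']
```

The trace shows why 5mC has to be oxidized at least twice in this model: TDG does not act on 5mC or 5hmC, so excision becomes possible only once 5fC has formed.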
In cancer, DNA can become both overly methylated, termed hypermethylation, and under-methylated, termed hypomethylation.[22] CpG islands overlapping gene promoters are de novo methylated, resulting in aberrant inactivation of genes normally associated with growth inhibition of tumors (an example of hypermethylation).[23] Compared with normal tissue, tumor tissue had elevated levels of the methyltransferases DNMT1, DNMT3A, and especially DNMT3B, all of which are associated with the abnormal levels of 5mC in cancer. Repeat sequences in the genome, including satellite DNA, Alu, and long interspersed elements (LINE), are often hypomethylated in cancer, resulting in expression of these normally silenced genes, and the degree of hypomethylation is often a significant marker of tumor progression. It has been hypothesized that there is a connection between the hypermethylation and the hypomethylation: overactivity of the DNA methyltransferases that produce the abnormal de novo 5mC methylation may be compensated by the removal of methylation, a type of epigenetic repair. However, the removal of methylation is inefficient, resulting in an overshoot of genome-wide hypomethylation. The converse may also be possible: overexpression caused by hypomethylation may be silenced by genome-wide hypermethylation. Cancer hallmark capabilities are likely acquired through epigenetic changes that alter the 5mC in both the cancer cells and the surrounding tumor-associated stroma within the tumor microenvironment.[24] The anticancer drug cisplatin has been reported to react with 5mC.[25]
"Epigenetic age" refers to the connection between chronological age and levels of DNA methylation in the genome.[26] Coupling the levels of DNA methylation, in specific sets of CpGs called "clock CpGs", with algorithms that regress the typical levels of collective genome-wide methylation at a given chronological age, allow for epigenetic age prediction. During youth (0–20 years old), changes in DNA methylation occur at a faster rate as development and growth progresses, and the changes begin to slow down at older ages. Multiple epigenetic age estimators exist. Horvath's clock measures a multi-tissue set of 353 CpGs, half of which positively correlate with age, and the other half negatively, to estimate the epigenetic age.[27] Hannum's clock utilizes adult blood samples to calculate age based on an orthogonal basis of 71 CpGs.[28] Levine's clock, known as DNAm PhenoAge, depends on 513 CpGs and surpasses the other age estimators in predicting mortality and lifespan, yet displays bias with non-blood tissues.[29] There are reports of age estimators with the methylation state of only one CpG in the gene ELOVL2.[30] Estimation of age allows for prediction lifespan through expectations of age related conditions that individuals may be subject to based on their 5mC methylation markers.