Epigenetics of human development is the study of how epigenetics (heritable characteristics that do not involve changes in DNA sequence) affects human development.
Development before birth, including gametogenesis, embryogenesis, and fetal development, is the process of body development from the formation of the gametes, through their combination into a zygote, to the point at which the fully developed organism exits the uterus. Epigenetic processes are vital to fetal development because a single cell must differentiate into a variety of cell types that are arranged so as to produce cohesive tissues, organs, and systems.
Epigenetic modifications such as methylation of CpGs (a dinucleotide composed of a 2'-deoxycytidine and a 2'-deoxyguanosine) and histone tail modifications allow activation or repression of certain genes within a cell, in order to create cell memory either in favor of using a gene or not using a gene. These modifications can either originate from the parental DNA, or can be added to the gene by various proteins and can contribute to differentiation. Processes that alter the epigenetic profile of a gene include production of activating or repressing protein complexes, usage of non-coding RNAs to guide proteins capable of modification, and the propagation of a signal by having protein complexes attract either another protein complex or more DNA in order to modify other locations in the gene.
Gene expression refers to the transcription of a gene, but the RNA produced does not necessarily have to encode a protein product. Transcription may produce so-called non-coding RNA products such as tRNA and regulatory RNA. Repression may refer to a decrease in transcription of a gene or to inhibition of a protein. Proteins are often inhibited by binding of the active site or by a conformational change that prevents the active site from binding. As a result of these alterations, proteins such as transcription factors may bind DNA less strongly, or an inhibited protein may block a signaling cascade so that certain genes are not induced. Repression can occur pre- or post-transcriptionally. Methylating the DNA or modifying the histones around which the DNA wraps is one pre-transcriptional mechanism that commonly leads to repression. Pre-transcriptional repression can also occur by altering the proteins that allow transcription to occur, namely the polymerase complex. Proteins can sit on the DNA strand and block polymerase proteins, halting transcription. Post-transcriptional repression generally refers to the degradation of the RNA product or to the binding of the RNA by proteins so that it cannot be translated or carry out its function.
DNA methylation in humans and most other mammals refers to the methylation of the cytosine in a CpG. Methylation of these cytosines is common in DNA, and in sufficient numbers can prevent proteins from attaching to the DNA by obscuring the sequence that the protein's binding domain recognizes. Regions in which cytosines followed by guanines are clustered and largely unmethylated are called CpG islands, and often serve as promoters, or transcription start sites.
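The idea of a CpG-dense region can be made concrete with a small sequence-based sketch. The thresholds used here (length of at least 200 bp, GC content above 50%, observed/expected CpG ratio above 0.6) are commonly used screening criteria and are an assumption of this example, not something stated in the text above. The function names are hypothetical. Note that sequence alone says nothing about methylation status, which has to be measured experimentally.

```python
def cpg_stats(seq):
    """Return (GC fraction, observed/expected CpG ratio) for a DNA sequence."""
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        return 0.0, 0.0
    c = seq.count("C")
    g = seq.count("G")
    # "CG" occurrences cannot overlap one another, so str.count is exact here.
    cpg = seq.count("CG")
    gc_fraction = (c + g) / n
    # Expected CpG count under independence is (C count * G count) / length.
    obs_exp = (cpg * n) / (c * g) if c and g else 0.0
    return gc_fraction, obs_exp


def looks_like_cpg_island(seq, min_len=200, min_gc=0.5, min_obs_exp=0.6):
    """Apply the assumed screening thresholds to a candidate region."""
    gc_fraction, obs_exp = cpg_stats(seq)
    return len(seq) >= min_len and gc_fraction > min_gc and obs_exp > min_obs_exp
```

In practice such a test would be applied over sliding windows along a chromosome; a single-window check is shown only to keep the sketch short.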
Histone modifications are changes made to the amino acid residues in the tails of the histones that either restrict or boost the histone's ability to bind to DNA. Histone modifications also act as sites for proteins to attach, which then further alter the gene's expression. Two common histone modifications are acetylation and methylation. Acetylation is when a protein adds an acetyl group to a lysine in a histone tail, restricting the ability of the histone to bind to DNA. This acetylation is commonly found on lysine 9 of histone 3, notated as H3K9ac. This results in the DNA being more open to transcription, due to the decreased binding to the histone. Methylation, meanwhile, is when a protein adds a methyl group to a lysine in a histone tail, although up to three methyl groups can be added to the same lysine. Two sites for histone methylation are common in current studies: trimethylation of lysine 4 on histone 3 (H3K4me3), which causes activation, and trimethylation of lysine 27 on histone 3 (H3K27me3), which causes repression.
Cis-acting elements are mechanisms that act on the same chromosome they come from, usually either in the same region from which they were produced or in a region very close to it. For example, a long non-coding RNA that is produced at one location may silence the same or a different location on the same chromosome. Trans-acting elements, however, are gene products from one location that act on a different chromosome, either the other member of a chromosomal pair or a chromosome from a separate pair. An example of this is a long non-coding RNA from the HoxC cluster that silences genes in the HoxD cluster, which lies on a different chromosome from a different chromosomal pair.
Hox genes are genes in humans that regulate body plan development. Humans have four sets of Hox genes, numbering 39 genes altogether, all of which aid in the differentiation of cells by location. Hox genes are activated early in the development of the embryo, in order to plan the development of the differing structures of the body. They also show colinearity with the body plan, meaning that the order of the Hox genes on the chromosome corresponds to the order of their expression domains along the anterior-posterior axis. This colinearity allows for a spatial and temporal activation of genes in order to produce a proper body structure.[1] [2]
Hox genes are regulated using a variety of epigenetic mechanisms, including the use of lncRNAs such as HOTAIR, the Trithorax (TrxG) group of proteins, and the Polycomb (PcG) group of proteins.
In Hox genes, long non-coding RNAs allow for communication between different Hox genes and different sets of Hox genes in order to coordinate body plan in the cell. One example of a long non-coding RNA that coordinates between Hox gene sets is HOTAIR, which is an RNA transcript produced in the HoxC cassette that represses transcription of a large number of genes in the HoxD cassette. Thus, HOTAIR regulates the HoxD genes from the HoxC genes in order to coordinate transcription of the Hox genes.
The PcG and TrxG genes produce protein complexes responsible for continuing the activation and repression patterns in the Hox genes that are initially established by maternal factors. PcG genes are responsible for repressing chromatin in Hox clusters meant to be inactivated in the differentiated cell. PcG proteins repress genes by forming polycomb repressive complexes, such as PRC1 and PRC2. PRC2 complexes repress by trimethylating histone 3 at lysine 27 through the histone methyltransferases Ezh2 and Ezh1. PRC2 is recruited by many elements, including CpG islands.[1] PRC1, meanwhile, ubiquitinates H2AK119 using the E3 ligase activity of Ring1A/B, causing stalling of RNA polymerase II. Furthermore, Ring1B, a member of the PRC1 complex, also represses Hox genes together with Mel18, Mph2, and RYBP by compacting the chromatin into higher-order structures.[3] TrxG genes, meanwhile, are responsible for activating genes by trimethylating lysine 4 of the histone H3 tail. Genes with similar transcriptional marks tend to cluster together in distinct structures. In bivalent domains, both H3K27me3 and H3K4me3 are present, indicating genes that are silenced but can be rapidly activated when necessary.[1] [4]
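The relationship between these two marks and the resulting chromatin state can be summarized as a simple lookup. The following is a minimal sketch: the function name and the "unmarked" fallback category are assumptions made for illustration, and real chromatin-state calls depend on many more marks and on signal thresholds.

```python
ACTIVATING = "H3K4me3"   # deposited by TrxG complexes
REPRESSIVE = "H3K27me3"  # deposited by PRC2


def chromatin_state(marks):
    """Classify a locus from the set of histone marks observed on it."""
    marks = set(marks)
    has_active = ACTIVATING in marks
    has_repressive = REPRESSIVE in marks
    if has_active and has_repressive:
        # Both marks together: silenced but poised for rapid activation.
        return "bivalent"
    if has_active:
        return "active"
    if has_repressive:
        return "repressed"
    return "unmarked"  # hypothetical fallback, not a category from the text
```

For example, `chromatin_state(["H3K4me3", "H3K27me3"])` returns `"bivalent"`.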
231 ncRNAs are present in the four Hox gene cassettes. Similarly to the Hox protein-coding genes, the ncRNAs show differential expression according to the cell's location on the anterior-posterior and proximal-distal axes. These lncRNAs can act either on the set of genes which they are present in, or can act on a separate gene set within the Hox genes.[5] [6] [7]
HOTTIP is a long non-coding RNA that assists in regulating the HoxA genes. It is produced from the 5' end of the HoxA gene cassette, and activates HoxA genes. Loops within the chromosome bring HOTTIP closer to its targets; this allows HOTTIP to bind to WDR5/MLL protein complexes to aid in trimethylation of lysine 4 of histone 3.[5]
HOTAIR is a long non-coding RNA that assists in regulating the HoxD genes. It is produced in the HoxC cassette, near the divide between expressed and unexpressed genes, and represses HoxD genes. HOTAIR acts by attaching to Suz12 in the PRC2 complex, and then guides this complex to the genes to be repressed. PRC2 then trimethylates the lysine 27 of histone 3, repressing the gene of interest.[1]
In female humans, Barr bodies are defined as the condensed and inactivated X-chromosome that is found in every cell of the adult. Because females have two nearly identical X chromosomes, one of them must be silenced so that the expression levels of the genes on the X-chromosome are of the proper dosage. Thus, males and females have the same level of X-chromosome expression, despite being born with one X for males and two for females. This is also why individuals with Klinefelter syndrome, a disease in which more than two sex chromosomes are present in the body, have fewer symptoms than individuals with other types of aneuploidy, which are often fatal before birth.[8]
Inactivation of one of the X chromosomes is initiated by a long non-coding RNA called Xist. This lncRNA is expressed on the same chromosome it represses, which is known as working in cis. Recent research has shown that a repeat element in the Xist RNA causes PRC2 to bind to the RNA. Another part of the RNA binds to the X-chromosome, positioning PRC2 such that it can methylate various regions on the X-chromosome. This methylation causes other factors, like histone deacetylases (HDACs), to bind to the chromosome and propagate heterochromatin formation, even into active gene regions. This heterochromatin greatly reduces, if not completely silences, gene expression of the Barr body. Xist is produced continuously to maintain a condensed and silenced Barr body.[9] [10] [11]
In human cells with more than one X chromosome, two long non-coding RNAs are produced: Tsix is produced by one X chromosome, and Xist is produced by all of the other X chromosomes. Tsix is a long non-coding RNA that prevents repression of an X chromosome, while Xist is a long non-coding RNA that acts to repress and condense an entire X chromosome. The actions of Xist serve to create a Barr body in the cell.
In embryonic development, when the embryo is still composed of just a few cells, each cell will randomly choose an X-chromosome to condense and silence. From then on, the daughter cells of that cell will always silence the same X-chromosome as the parent cell from which they descend. This creates what is known as the “mosaic effect,” in which differential X-chromosome expression creates differing patterns of allele expression throughout a single organism. This may or may not be evident in females, depending on how the genes of the X-chromosomes affect phenotype. If the alleles for a gene are identical on both X-chromosomes, then no difference is seen between the cells that chose one X over the other. If the alleles are different for, say, fur color, then patches of one color and patches of the other color may be seen. In calico cats the mosaic pattern of X inactivation is easily seen because a gene affecting coat color is carried on the X, resulting in patches of color on the coat. The mosaic pattern of X inactivation may also determine how penetrant a disease is, if the disease allele is present on one X-chromosome and not the other. The organism may have few cells in which the X-chromosome carrying the disease allele has not been condensed, leading to little expression of the disease allele. This is referred to as skewed X-chromosome inactivation.
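The random choice followed by clonal inheritance described above can be illustrated with a toy simulation. This is a sketch under simplifying assumptions: each founder cell chooses with equal probability, every cell divides the same number of times, and no cells die. The function names and parameters are hypothetical.

```python
import random


def simulate_x_inactivation(n_founders=8, divisions=3, seed=None):
    """Return one list per founder cell, naming the silenced X in each descendant."""
    rng = random.Random(seed)
    clones = []
    for _ in range(n_founders):
        # Each founder makes one random, independent choice.
        silenced = rng.choice(["maternal", "paternal"])
        # Every descendant inherits the founder's choice unchanged.
        clones.append([silenced] * (2 ** divisions))
    return clones


def fraction_silencing(clones, which="maternal"):
    """Fraction of all cells that have silenced the given X chromosome."""
    cells = [cell for clone in clones for cell in clone]
    return cells.count(which) / len(cells)
```

With few founder cells, the fraction returned by `fraction_silencing` can drift far from 0.5 by chance alone, which is one simple way to picture skewed inactivation in this model.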
Imprinting is defined as the differential expression of paternal and maternal alleles of a gene, due to epigenetic marks introduced onto the chromosome during the production of egg and sperm. These marks usually lead to differential expression of the specific sets of genes from the maternal and paternal chromosomes. Imprinting is carried out through many epigenetic mechanisms like methylation, histone modifications, rearrangement of higher order chromatin structure, non-coding RNAs, and interfering RNAs.[12] [13]
A single evolutionary purpose of imprinting is still unknown, since the mechanisms and effects seem to be so diverse. One hypothesis states that imprinting occurs in order to carry out the evolutionary goal of the parent, that being the differential partition of resources. The male seeks to provide maximum resources for his offspring so that his genes may be passed on successfully to the next generation, whereas the female must partition resources between all her offspring, and so must limit resources given.[14] [15]
Another hypothesis states that imprinting may help protect the female from ovarian trophoblastic disease and parthenogenesis. Trophoblastic disease occurs when a sperm fertilizes an egg with no nucleus and a cancer-like mass forms in the placenta.[16] Parthenogenesis occurs when an unfertilized egg develops into a fully functional organism that is genetically identical to the parent, which is female in animals and may be of either sex in plants.[17] This does not occur naturally in mammals. In most animals, especially mammals, uniparental inheritance of chromosomes is often lethal or results in developmental abnormalities, sometimes physical but often cognitive. Other hypotheses point to the function of imprinting as a way of establishing the proper amount of expression or functional haploidy, much like silencing the extra X-chromosome in females (see section on Barr bodies). Imprinting may help in the differentiation of cells by silencing pluripotency genes or other developmental genes. Supporting this hypothesis, imprinted genes have been shown to differ in their expression between tissue types in the same organism, pointing to divergent outcomes as a result of developmental events during embryogenesis. Regardless of whether there is a single purpose for imprinting, numerous studies have shown that a normal and functional organism cannot be made without the various imprinting mechanisms.
In mammals, imprinted genes are often clustered in the genome, probably because they share transcriptional regulators or regulatory regions that impact the expression of multiple genes. It is easier for a lncRNA to silence multiple genes if they are closer together, making silencing more efficient. In some cases, when a gene is transcribed it overlaps another region nearby or opposite (antisense) to it, often silencing it. In the case of the Igf2 and H19 genes, CTCF, a transcriptional repressor protein, is involved. CTCF binds to the unmethylated maternal imprinting control region (ICR) but not to the methylated paternal ICR. The ICR is a shared control region of Igf2 and H19 that, when deleted, results in the loss of imprinting of these genes. CTCF then binds another region of the chromosome, creating a loop in which Igf2 is blocked from transcription but H19 is not, resulting in the maternal chromosome expressing H19 but not Igf2. CTCF has been shown to directly interact with Suz12, a subunit of PRC2, in order to silence the Igf2 promoter region through hypermethylation. Conversely, the paternal H19 promoter is highly methylated during embryogenesis so that Igf2 will not be silenced. Should CTCF fail to bind, H19 on the maternal chromosome has reduced expression and Igf2 is not silenced properly, resulting in biallelic expression. Mice have homologues of these genes, but silence them in a different way, where biallelic expression occurs and then antisense RNA is used to silence one of the genes.[18] [19] [20]
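The CTCF-dependent logic at the Igf2/H19 locus can be written out as a small boolean model. This is a deliberately simplified sketch: expression is treated as on or off, the reduced H19 expression seen when CTCF fails to bind is approximated as off, and the function names are hypothetical.

```python
def allele_expression(icr_methylated, ctcf_functional=True):
    """Return which genes are expressed from a single parental allele."""
    # CTCF binds only an unmethylated ICR, and only if CTCF itself works.
    ctcf_bound = ctcf_functional and not icr_methylated
    if ctcf_bound:
        # The loop formed by CTCF blocks Igf2 but leaves H19 active.
        return {"H19": True, "Igf2": False}
    # Without CTCF binding, Igf2 is not silenced (H19 simplified to off).
    return {"H19": False, "Igf2": True}


def locus_expression(ctcf_functional=True):
    """Model both alleles: maternal ICR unmethylated, paternal ICR methylated."""
    return {
        "maternal": allele_expression(False, ctcf_functional),
        "paternal": allele_expression(True, ctcf_functional),
    }
```

Calling `locus_expression(ctcf_functional=False)` shows Igf2 expressed from both alleles, matching the biallelic expression described above.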
Airn is an lncRNA used to silence Igf2r and other surrounding genes. In the mechanism to silence Igf2r, it is the act of transcribing the lncRNA Airn that silences the expression of Igf2r, as opposed to an active repression mechanism. Airn is the antisense gene of Igf2r, so if Airn is being transcribed, the transcriptional machinery may cover part or all of the promoter region of Igf2r, so RNA polymerase cannot bind to the promoter region of Igf2r in order to initiate transcription. This mechanism is very efficient in that Igf2r is silenced by transcription of Airn, while the RNA product silences other genes near Igf2r. The imprinting mechanisms described above work on the chromosome from which the Airn lncRNA is produced, but there are many other imprinted genes that work to silence genes on other chromosomes or to silence the corresponding allele(s) on the opposing chromosome of the same pair. Some imprinted genes code for regulatory RNA elements such as lncRNA, small nucleolar RNA, and micro RNA, so the expression of these genes results in the silencing of some other gene.[21]
From these examples, researchers have seen similar patterns in developmental genetics. It is imperative that many genes are silenced at the right time so that cells can maintain their identity and expressional integrity. Failure to do so often leads to symptoms such as cognitive abnormalities, if not fatality.[22]
Airn is an lncRNA that regulates Igf2r expression. Igf2r is a gene which expresses a receptor for insulin-like growth factor 2, and assists in lysosomal enzyme transport, activation of growth factors, and degradation of insulin-like growth factor 2.[23] Airn is subject to imprinting, leading to Airn expression from the paternal allele, but not from the maternal allele. Airn silences the Igf2r region in cis: its antisense transcript overlaps the Igf2r gene. Airn is silenced in the maternal allele through Igf2r transcription. In the brain, however, Igf2r alleles are both expressed due to Airn mediation being repressed in neuronal cells.[24]
PRC2 (Polycomb repressive complex 2) is a complex of proteins that represses chromatin by histone methylation and by recruiting other proteins that help further the repression of chromatin. The structure of this complex and the group of mechanisms used by this complex are highly conserved across various eukaryotic species. Very few species have duplicates of these complexes in the genome beyond PRC1 and PRC2.[25] [26]
Long non-coding RNAs, or lncRNAs, are RNA transcripts produced by RNA polymerase II that are not translated but participate in the regulation of gene expression. Long non-coding RNAs are used in various epigenetic processes in development, including the regulation of Hox genes, as well as in the creation of Barr bodies.[27] [28]
Although PRC2 seems to have a very simple mechanism and works on many genes and chromosomes across the genome, it often has very specific binding regions and has been observed to localize to specific genes to cause their repression. Recent research shows that it probably does this through the binding of long non-coding RNAs (lncRNAs). Xist and Hox genes have both been studied extensively and display this mechanism very well. The lncRNA that the complex binds does not necessarily need to hybridize to the target region in order to silence it, as evidenced by the PRC2-lncRNA complex working on regions other than the region from which this complex was produced. However, the three-dimensional configuration of the RNA often directs the complex to the specific regions that the RNA binds.
PRC2 is a multi-protein complex composed of four major subunits (EZH1/2, SUZ12, EED, and RbAp46/48) and three variable subunits (AEBP2, JARID2, and PCLs). The three variable subunits are used for catalysis of enzymatic reactions or binding to specific regions, not for repression of genes or chromatin. Similar to a zinc finger, AEBP2 docks onto the major grooves of DNA to assist in binding.[29] PRC2 is usually recruited by other proteins or lncRNA and then catalyzes the trimethylation of lysine 27 of histone 3 tails (H3K27me3). This methylation is thought to cause repression by steric hindrance of RNA polymerase II. Even though the polymerase is not prevented from binding, the polymerase, after beginning transcription, will pause at H3K27me3 marks. The short transcript produced by the pausing of the polymerase often recruits regulatory complexes, like PRC2. Thus, PRC2 represses by two mechanisms: by directly altering the structure of the chromatin through methylation or by binding of transcripts.[18] [30]
PRC2 has been shown in many experiments to be necessary for the proper formation of organs, starting with the maintenance of cellular differentiation and silencing of pluripotency genes. The exact mechanism in early embryogenesis that induces cells to differentiate is still unclear, but this mechanism has been closely linked to protein kinase A (PKA). Since the PRC2 complex has sites that can be phosphorylated and behaves differently depending on its level of phosphorylation, it can be hypothesized that PKA affects PRC2 behavior and may phosphorylate PRC2, activating the complex and starting the methylation cascade that silences genes.[31]
Experimentally, PRC2 has been shown to be highly enriched at the Hox genes and near developmental gene regulators, resulting in their methylation. Some time after the second or third cleavage event, PRC2 begins to bind to these developmental genes, even though they have the markers for highly active genes like H3K4me3. This has been described as the “leaking” of PRC2 binding. Variable binding will cause some genes to be silenced before others, causing differentiation, but this is likely regulated by the organism. What causes the specificity of cell differentiation is still unknown, but some hypotheses say it largely has to do with the cell environment and the “awareness” of the cells to each other, considering that all cells at this stage contain identical genomes. The maintained cell lines after this differentiation event are largely dependent on PRC2. Without it, pluripotency genes will still be active, causing the cells to be unstable and to revert to a stem cell-like state from which they would have to undergo differentiation again in order to return to their normal state. Properly differentiated cells have silenced pluripotency genes.[32]
PRC2 is also highly associated with intergenic regions, subtelomeric regions, and long-terminal repeat transposons. PRC2 acts to create heterochromatin in these regions through mechanisms similar to those used to repress genes. Heterochromatin formation is imperative in these regions in order to regulate gene expression, maintain chromatin shape, prevent degradation of the chromosome, and reduce the occurrence of transposon “hopping” or spontaneous recombination.[32]
Thus, PRC2 is not only essential to the initiation of differentiation in development, but also for maintaining heterochromatin in all cell stages and for silencing genes and chromosome regions that would undo the cell differentiation that had already occurred or negatively affect the survival of the cell or the organism as a whole.
Neat1 is an lncRNA which assists in forming paraspeckles, nuclear bodies which contain RNA-binding proteins.[33] Paraspeckles control gene expression by retaining in the nucleus RNA that would otherwise alter gene expression.[34] Paraspeckles form a significant portion of the corpus luteum of the ovary; in Neat1-impaired mice, corpus luteum formation is highly dysfunctional, causing ovarian defects and lowered progesterone levels, which result in a lack of pregnancy in Neat1-deficient mice. Neat1 assists in regulation of luteal genes by preventing the protein Sfpq from inhibiting Nr5a1 and Sp1, allowing luteal genes to be regularly transcribed. Neat1 is regulated by histone deacetylases.[33] [35] [36]
Evf2 is a lncRNA that acts in forebrain neuronal differentiation during embryonic development. Evf2 is transcribed from an ultraconserved region, or a region that is very highly conserved among most vertebrate species, within the region from Dlx5 to Dlx6. This region is a target for SHH, a highly important regulator of central nervous system development. Evf2, when transcribed, recruits Dlx and Mecp2 through cis and trans-acting mechanisms to the Dlx5/6 region in the ventral forebrain, causing GABAergic interneurons in the hippocampus to be formed.[37] Evf2 acts by forming a complex with Dlx4 that increases Dlx4 transcription activation ability and stability.[38]
Malat1, another neurological lncRNA, causes increased synaptic function and greater amounts of dendrite development. Increases of Malat1 increase neuronal density, while decreases of Malat1 decrease neuronal density. Malat1 acts by regulating the expression levels of Nlgn1 and SynCAM1 which are important genes in synapse formation.[38]
Bromodomain protein 4, or BRD4, is a protein which binds to acetylated tails of histones H3 and H4 to aid active gene transcription by decompaction using the bromodomain with the assistance of the acetylated K5 on H4. BRD4 is a member of the BET protein family, which includes other bromodomain-containing proteins and their homologues in other species. BRD4 is a protein which functions in both gene activation and repression in cell cycle control and DNA replication. BRD4 functions by binding to the acetylated tails and then attaching to other proteins, allowing those proteins to either activate or repress the histones next to BRD4.[39]
BRD4 aids in early cell development by activating pluripotency genes through interacting with Oct4 and recruiting P-TEFb (positive transcription elongation factor). By occupying the regulatory regions of pluripotency genes and X-chromosome inactivation lncRNAs, BRD4 enhances activation of these DNA regions. BRD4 enhances this activation by recruiting P-TEFb; if either BRD4 or P-TEFb is not functional, pluripotency gene transcription is blocked, and the cell differentiates into a neuroectodermal cell.[40]
BRD4 can act as an epigenetic bookmark throughout the cell cycle, including after transcription, due to its association with P-TEFb, which allows BRD4 to enhance RNA polymerase II (RNAPII) activity.[40]
BRD4 also assists in the hyperacetylation of histones in the sperm nucleus. Histone hyperacetylation, the addition of acetyl groups to lysines on the amino-terminal tails of histones in an amount much larger than normal, is believed to assist in histone removal from the sperm nucleus.[41] [42]
Examples of diseases caused by epigenetic dysfunction in development include: