A primary transcript is the single-stranded ribonucleic acid (RNA) product synthesized by transcription of DNA, and processed to yield various mature RNA products such as mRNAs, tRNAs, and rRNAs. The primary transcripts designated to be mRNAs are modified in preparation for translation. For example, a precursor mRNA (pre-mRNA) is a type of primary transcript that becomes a messenger RNA (mRNA) after processing.
Pre-mRNA is synthesized from a DNA template in the cell nucleus by transcription. Pre-mRNA comprises the bulk of heterogeneous nuclear RNA (hnRNA). Once pre-mRNA has been completely processed, it is termed "mature messenger RNA", or simply "messenger RNA". The term hnRNA is often used as a synonym for pre-mRNA, although, in the strict sense, hnRNA may include nuclear RNA transcripts that do not end up as cytoplasmic mRNA.
Several steps contribute to the production of primary transcripts. These steps involve a series of interactions that initiate and complete the transcription of DNA in the nucleus of eukaryotes. Certain factors regulate primary transcript production by activating or inhibiting transcription. Transcription produces primary transcripts that are further modified by several processes, including 5' capping, 3' polyadenylation, and alternative splicing. In particular, alternative splicing directly contributes to the diversity of mRNA found in cells. These modifications have been studied further to understand the role and significance of primary transcripts. Experimental studies of molecular changes to primary transcripts, and of the processes before and after transcription, have led to greater understanding of diseases involving primary transcripts.
See main article: Transcription (genetics).
The steps contributing to the production of primary transcripts involve a series of molecular interactions that initiate transcription of DNA within a cell's nucleus. Based on the needs of a given cell, certain DNA sequences are transcribed to produce a variety of RNA products, some of which are translated into functional proteins for cellular use. To initiate the transcription process in a cell's nucleus, the DNA double helix is unwound and the hydrogen bonds connecting complementary bases are broken to produce two unpaired single DNA strands.[1] One strand of the DNA serves as the template for transcription of the single-stranded primary transcript. This DNA strand is bound by an RNA polymerase at the promoter region of the DNA.[2] In eukaryotes, three kinds of RNA (rRNA, tRNA, and mRNA) are produced by three distinct RNA polymerases, whereas in prokaryotes a single RNA polymerase creates all kinds of RNA molecules.[3] RNA polymerase II of eukaryotes transcribes the primary transcript destined to be processed into mRNA from the antisense DNA template, synthesizing it in the 5' to 3' direction; the newly synthesized primary transcript is complementary to the antisense strand of DNA. RNA polymerase II constructs the primary transcript from four ribonucleoside monophosphate residues (adenosine monophosphate (AMP), cytidine monophosphate (CMP), guanosine monophosphate (GMP), and uridine monophosphate (UMP)) that are added one after another to the 3' hydroxyl group at the 3' end of the growing RNA.
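The base-pairing rule behind this step can be illustrated with a minimal Python sketch. It is a toy model only: the function names and the nine-base template are hypothetical, and promoter recognition, unwinding, and the polymerase itself are not modeled. The template (antisense) strand is written 3' to 5', so the RNA comes out 5' to 3', with each new nucleotide joining the 3' end.

```python
# Toy model of transcription: the template (antisense) strand is read 3'->5'
# and the RNA grows 5'->3'. Names and sequence are illustrative assumptions.
TEMPLATE_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}
DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def transcribe(template_3_to_5: str) -> str:
    """Return the RNA (written 5'->3') for a template strand written 3'->5'."""
    rna = []
    for base in template_3_to_5.upper():
        rna.append(TEMPLATE_TO_RNA[base])  # new nucleotide joins the 3' end
    return "".join(rna)


def sense_strand(template_3_to_5: str) -> str:
    """Return the sense (coding) DNA strand, written 5'->3'."""
    return "".join(DNA_COMPLEMENT[base] for base in template_3_to_5.upper())


template = "TACGGCATT"  # hypothetical template strand, written 3'->5'
rna = transcribe(template)
print(rna)  # AUGCCGUAA

# The transcript matches the sense strand, with U in place of T.
print(rna == sense_strand(template).replace("T", "U"))  # True
```

Because the transcript is complementary to the antisense strand, it carries the same sequence as the sense strand, with uracil in place of thymine.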
Studies of primary transcripts produced by RNA polymerase II reveal that an average primary transcript is 7,000 nucleotides in length, with some growing as long as 20,000 nucleotides in length.[2] The inclusion of both exon and intron sequences within primary transcripts explains the size difference between larger primary transcripts and smaller, mature mRNA ready for translation into protein.
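The size difference can be made concrete with a small arithmetic sketch in Python. The exon and intron lengths below are hypothetical, chosen only so that the primary transcript totals the 7,000-nucleotide average mentioned above; the cap and poly-A tail are ignored.

```python
def primary_transcript_length(exons, introns):
    """Length of the unspliced transcript: exons and introns are both transcribed."""
    return sum(exons) + sum(introns)


def mature_mrna_length(exons):
    """Length after splicing: only exons remain (cap and poly-A tail ignored)."""
    return sum(exons)


exons = [300, 150, 450, 600]   # hypothetical exon lengths in nucleotides
introns = [2000, 1500, 2000]   # hypothetical intron lengths in nucleotides

pre = primary_transcript_length(exons, introns)
mature = mature_mrna_length(exons)
print(f"primary transcript: {pre} nt, mature mRNA: {mature} nt")
print(f"fraction retained after splicing: {mature / pre:.0%}")
```

With these assumed lengths, only about a fifth of the primary transcript survives into the mature mRNA; real genes vary widely in this ratio.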
A number of factors contribute to the activation and inhibition of transcription and therefore regulate the production of primary transcripts from a given DNA template.
Activation of RNA polymerase activity to produce primary transcripts is often controlled by sequences of DNA called enhancers. Transcription factors, proteins that bind to DNA elements to either activate or repress transcription, bind to enhancers and recruit enzymes that alter nucleosome components, making DNA either more or less accessible to RNA polymerase. The particular combination of activating and inhibiting transcription factors bound to an enhancer determines whether the gene that the enhancer interacts with is activated for transcription.[4] Activation of transcription depends on whether the transcription elongation complex, itself consisting of a variety of transcription factors, can induce RNA polymerase to dissociate from the Mediator complex that connects an enhancer region to the promoter.
Inhibition of RNA polymerase activity can also be regulated by DNA sequences called silencers. Like enhancers, silencers may be located far upstream or downstream of the genes they regulate. These DNA sequences bind factors that destabilize the initiation complex required to activate RNA polymerase, and therefore inhibit transcription.[5]
Histone modification by transcription factors is another key regulatory factor for transcription by RNA polymerase. In general, factors that lead to histone acetylation activate transcription, while factors that lead to histone deacetylation inhibit transcription.[6] Acetylation neutralizes the positive charge on histone lysine residues, weakening their attraction to the negatively charged DNA and loosening the nucleosome so that RNA polymerase can gain access. Deacetylation of histones stabilizes tightly coiled nucleosomes, inhibiting RNA polymerase access. In addition to acetylation patterns of histones, methylation patterns at promoter regions of DNA can regulate RNA polymerase access to a given template. RNA polymerase is often incapable of synthesizing a primary transcript if the targeted gene's promoter region contains specific methylated cytosines. These residues hinder binding of transcription-activating factors and recruit other enzymes that stabilize a tightly bound nucleosome structure, excluding RNA polymerase and preventing the production of primary transcripts.
R-loops are formed during transcription. An R-loop is a three-stranded nucleic acid structure containing a DNA-RNA hybrid region and an associated non-template single-stranded DNA. Actively transcribed regions of DNA often form R-loops that are vulnerable to DNA damage. Introns reduce R-loop formation and DNA damage in highly expressed yeast genes.[7]
DNA damage arises in every cell each day, with the number of lesions per cell reaching tens to hundreds of thousands, and such damage can impede transcription.[8] The process of gene expression is itself a source of endogenous DNA damage, because single-stranded DNA is susceptible to damage.[8] Other sources of DNA damage are conflicts between the transcription machinery and the DNA replication machinery, and the activity of certain enzymes such as topoisomerases and base excision repair enzymes. Even though these processes are tightly regulated and usually accurate, they occasionally make mistakes and leave behind DNA breaks that drive chromosomal rearrangements or cell death.[8]
See main article: Post-transcriptional modification. Transcription, a highly regulated phase in gene expression, produces primary transcripts. However, transcription is only the first step, and it is followed by many modifications that yield functional forms of RNA.[9] In other words, the newly synthesized primary transcripts are modified in several ways to convert them into their mature, functional forms, such as mRNA, tRNA, and rRNA.
The basic primary transcript modification process is similar for tRNA and rRNA in both eukaryotic and prokaryotic cells. Processing of mRNA primary transcripts, on the other hand, differs between prokaryotic and eukaryotic cells.[9] For example, some bacterial mRNAs serve as templates for protein synthesis while they are still being transcribed. In contrast, pre-mRNAs of eukaryotic cells undergo a wide range of modifications before they are transported from the nucleus to the cytoplasm, where their mature forms are translated.[9] These modifications are responsible for the different types of encoded messages that lead to translation of various types of products. Furthermore, primary transcript processing provides a means of controlling gene expression as well as a regulatory mechanism for the degradation rates of mRNAs. The processing of pre-mRNA in eukaryotic cells includes 5' capping, 3' polyadenylation, and alternative splicing.
See main article: Five-prime cap. Shortly after transcription is initiated in eukaryotes, a pre-mRNA's 5' end is modified by the addition of a 7-methylguanosine cap, also known as a 5' cap.[9] The 5' capping modification is initiated by the addition of a GTP to the 5' terminal nucleotide of the pre-mRNA in reverse orientation followed by the addition of methyl groups to the G residue.[9] 5' capping is essential for the production of functional mRNAs since the 5' cap is responsible for aligning the mRNA with the ribosome during translation.[9]
See main article: Polyadenylation. In eukaryotes, pre-mRNAs are further modified by polyadenylation, in which a structure called the poly-A tail is added.[9] Signals for polyadenylation, which include several RNA sequence elements, are detected by a group of proteins that direct the addition of the poly-A tail (approximately 200 nucleotides in length). The polyadenylation reaction provides a signal for the end of transcription, which terminates a few hundred nucleotides downstream of the poly-A tail location.[9]
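The two end modifications, 5' capping and 3' polyadenylation, can be sketched as simple string operations in Python. This is a toy model under stated assumptions, not a description of the enzymology: the cap is a text label; `AAUAAA` is the commonly cited polyadenylation signal, whereas the text above says only that several sequence elements are involved; the 20-nucleotide cleavage offset and the example transcript are hypothetical; and the tail length uses the approximate figure of 200.

```python
CAP_LABEL = "m7G"          # text placeholder for the 7-methylguanosine cap
POLY_A_SIGNAL = "AAUAAA"   # commonly cited signal; assumed here for illustration
CLEAVAGE_OFFSET = 20       # hypothetical distance from signal to cleavage site
TAIL_LENGTH = 200          # approximate poly-A tail length from the text


def cap_and_polyadenylate(pre_mrna: str) -> str:
    """Cleave downstream of the poly-A signal, add a tail, and mark the 5' cap."""
    signal_start = pre_mrna.find(POLY_A_SIGNAL)
    if signal_start == -1:
        raise ValueError("no polyadenylation signal found")
    cleavage_site = min(
        signal_start + len(POLY_A_SIGNAL) + CLEAVAGE_OFFSET, len(pre_mrna)
    )
    body = pre_mrna[:cleavage_site]  # sequence past the cleavage site is discarded
    return f"{CAP_LABEL}-{body}{'A' * TAIL_LENGTH}"


# Hypothetical transcript: coding stub, spacer, signal, then downstream sequence.
pre_mrna = "AUGGCCUAA" + "GC" * 5 + POLY_A_SIGNAL + "C" * 50
processed = cap_and_polyadenylate(pre_mrna)
print(processed[:16])        # capped 5' end
print(len(processed))        # label + separator + 45 nt body + 200 nt tail
```

In this sketch the downstream sequence beyond the cleavage site is simply dropped, which loosely mirrors the fact that transcription continues past the site where the tail is added.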
See main article: Alternative splicing. Eukaryotic pre-mRNAs have their introns spliced out by spliceosomes made up of small nuclear ribonucleoproteins.[10] [11]
In complex eukaryotic cells, one primary transcript can give rise to many different mature mRNAs through alternative splicing. Alternative splicing is regulated so that a single gene may encode multiple proteins. The effect of alternative splicing on gene expression can be seen in complex eukaryotes, which have a fixed number of genes in their genome yet produce a much larger number of different gene products.[9] Most eukaryotic pre-mRNA transcripts contain multiple introns and exons. The various possible combinations of 5' and 3' splice sites in a pre-mRNA can lead to different exons being excised or joined, while the introns are eliminated from the mature mRNA. Thus, various kinds of mature mRNAs are generated.[9] Alternative splicing is carried out by a large RNA-protein complex called the spliceosome. Alternative splicing is crucial for tissue-specific and developmental regulation of gene expression.[9] Alternative splicing can be affected by various factors, including mutations such as chromosomal translocations.
In prokaryotes, splicing is done by autocatalytic cleavage or by endonucleolytic cleavage. Autocatalytic cleavage, in which no proteins are involved, is usually reserved for sections that code for rRNA, whereas endonucleolytic cleavage applies to tRNA precursors.
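How a single pre-mRNA can yield several mature mRNAs can be sketched by enumerating exon-skipping outcomes in Python. This is a simplified illustration: it assumes the first and last exons are always kept and that every internal exon can be skipped independently, whereas real splicing is regulated and also includes other modes, such as alternative 5' and 3' splice sites. The exon sequences are hypothetical.

```python
from itertools import combinations


def exon_skipping_isoforms(exons):
    """Yield each mature mRNA obtainable by skipping internal exons.

    Introns are assumed to be already removed; exon order is preserved.
    """
    first, *internal, last = exons
    for kept_count in range(len(internal) + 1):
        for kept in combinations(internal, kept_count):
            yield "".join((first, *kept, last))


exons = ["AUGGC", "CCA", "GGU", "UAA"]  # hypothetical exon sequences
isoforms = list(exon_skipping_isoforms(exons))
for isoform in isoforms:
    print(isoform)
print(len(isoforms))  # 4 isoforms from 2 skippable exons
```

Under these assumptions, n skippable exons give 2^n possible isoforms, which hints at how a fixed number of genes can produce a much larger number of products; in cells only a regulated subset of such combinations is actually made.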
A study by Cindy L. Wills and Bruce J. Dolnick, from the Department of Experimental Therapeutics at Roswell Park Comprehensive Cancer Center (then known as the Roswell Park Memorial Institute) in Buffalo, New York and the Cell and Molecular Biology Program at the University of Wisconsin in Madison, Wisconsin, examined cellular processes involving primary transcripts. The researchers asked whether 5-fluorouracil (FUra), a drug used in cancer treatment, inhibits dihydrofolate reductase (DHFR) pre-mRNA processing and/or alters nuclear mRNA stability in methotrexate-resistant KB cells. Long-term exposure to FUra had no effect on the level of DHFR pre-mRNA containing certain introns, the sections of pre-mRNA that are usually removed during processing. However, levels of total DHFR mRNA decreased two-fold in cells exposed to 1.0 μM FUra. No significant change was observed in the half-life (the time it takes for 50% of the mRNA to decay) of total DHFR mRNA or pre-mRNA in cells exposed to FUra. Nuclear/cytoplasmic RNA labeling experiments demonstrated that the rate at which nuclear DHFR RNA is converted to cytoplasmic DHFR mRNA decreased in cells treated with FUra. These results provide further evidence that FUra may inhibit the processing of mRNA precursors and/or affect the stability of nuclear DHFR mRNA.[12]
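The reasoning that links an unchanged half-life to a lower mRNA level can be sketched with a simple first-order decay model in Python. The model and all numbers are assumptions for illustration (the 8-hour half-life and the production rates do not come from the study): if mRNA decays exponentially, its steady-state level equals the production rate divided by the decay constant, so a two-fold drop in level at constant half-life points to a two-fold drop in the supply of mature mRNA.

```python
import math


def decay_constant(half_life):
    """First-order decay constant k for a given half-life (k = ln 2 / t_half)."""
    return math.log(2) / half_life


def fraction_remaining(elapsed, half_life):
    """Fraction of the original mRNA left after `elapsed` time units."""
    return 0.5 ** (elapsed / half_life)


def steady_state_level(production_rate, half_life):
    """Level at which production balances first-order decay."""
    return production_rate / decay_constant(half_life)


half_life = 8.0  # hypothetical half-life in hours, same in both conditions
control = steady_state_level(production_rate=100.0, half_life=half_life)
treated = steady_state_level(production_rate=50.0, half_life=half_life)

print(fraction_remaining(8.0, half_life))  # 0.5 after one half-life
print(control / treated)                   # two-fold difference in level
```

This toy calculation does not distinguish between slower processing and reduced stability of nuclear precursors; it only shows why an unchanged half-life shifts attention to steps upstream of mature mRNA decay.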
Judith Lengyel and Sheldon Penman from the Department of Biology at the Massachusetts Institute of Technology (MIT) in Cambridge, Massachusetts wrote an article about one type of primary transcript in two dipterans (two-winged insects): Drosophila and Aedes. The article describes how the researchers examined hnRNA, which consists largely of pre-mRNA, in the two kinds of insects. They compared the size of hnRNA transcripts and the fraction of hnRNA that is converted to mRNA in cell lines (populations of cells derived from a single cell) of Drosophila melanogaster and Aedes albopictus. Both insects are dipterans, but Aedes has a larger genome than Drosophila, that is, more DNA per cell; a larger genome does not by itself imply more genes. The Aedes line made larger hnRNA than the Drosophila line, even though the two cell lines grew under similar conditions and produced mature, processed mRNA of the same size and sequence complexity. These data suggest that the size of hnRNA increases with increasing genome size.[13]
Ivo Melcak, Stepanka Melcakova, Vojtech Kopsky, Jaromíra Vecerova and Ivan Raska from the Department of Cell Biology at the Institute of Experimental Medicine, Academy of Sciences of the Czech Republic in Prague, studied the influence of nuclear speckles on pre-mRNA. Nuclear speckles are structures within cell nuclei that are enriched in splicing factors involved in mRNA processing. They have been shown to serve neighboring active genes as storage sites for these splicing factors. In this study, the researchers showed that in HeLa cells, a line derived from a cervical cancer patient that has proven useful for experiments, the first spliceosomal complexes to assemble on pre-mRNA come from these speckles. The researchers microinjected spliceosome-accepting and mutant adenovirus pre-mRNAs that differ in splicing factor binding, and then followed the sites where they accumulated. Spliceosome-accepting pre-mRNAs were rapidly targeted into the speckles, but the targeting was found to be temperature-dependent. The polypyrimidine tract sequence in the pre-mRNA promotes spliceosome assembly and is required for targeting but, by itself, is not sufficient. The downstream flanking sequences were particularly important for the targeting of the mutant pre-mRNAs into the speckles. In supporting experiments, the behavior of the speckles was followed after the microinjection of antisense oligodeoxyribonucleotides (short DNA sequences complementary to a specific target sequence), in this case complementary to specific sequences of snRNAs, which are also known to take part in pre-mRNA processing. Under these conditions, spliceosome groups formed on endogenous pre-mRNAs. The researchers concluded that the spliceosomal complexes on microinjected pre-mRNA form inside the speckles. Pre-mRNA targeting and buildup in the speckles is a result of the loading of splicing factors onto the pre-mRNA, and the resulting complexes give rise to the speckled pattern observed.[14]
Research has also led to greater knowledge about certain diseases related to changes within primary transcripts. One study involved estrogen receptors and differential splicing. The article "Alternative splicing of the human estrogen receptor alpha primary transcript: mechanisms of exon skipping" by Paola Ferro, Alessandra Forlani, Marco Muselli and Ulrich Pfeffer, from the Laboratory of Molecular Oncology at the National Cancer Research Institute in Genoa, Italy, explains that the 1785 nucleotides that code for the estrogen receptor alpha (ER-alpha) are spread over a region of more than 300,000 nucleotides in the primary transcript. Splicing of this pre-mRNA frequently produces variant mRNAs lacking one or more exons, the regions necessary for coding proteins. These variants have been associated with breast cancer progression.[15]
In the life cycle of retroviruses, proviral DNA is integrated into the DNA of the infected cell and transcribed along with it. Because retroviruses must reverse-transcribe their RNA genome into DNA before that DNA can be integrated into the host genome, formation of this DNA template is a vital step in retrovirus replication. Cell type, the differentiation state of the cell, and the physiological state of the cell significantly change the availability and activity of certain factors necessary for transcription. These variables create a wide range of viral gene expression. For example, tissue culture cells actively producing infectious virions of avian or murine leukemia viruses (ASLV or MLV) contain such high levels of viral RNA that 5–10% of the mRNA in a cell can be of viral origin. This shows that the primary transcripts produced by these retroviruses do not always follow the normal path to protein production; some are instead converted back into DNA so that the virus can multiply and spread.[16]