5′ untranslated region
The 5′ untranslated region (also known as 5′ UTR, leader sequence, transcript leader, or leader RNA) is the region of a messenger RNA (mRNA) that is directly upstream from the initiation codon. This region is important for the regulation of translation of a transcript by differing mechanisms in viruses, prokaryotes and eukaryotes. While called untranslated, the 5′ UTR or a portion of it is sometimes translated into a protein product. This product can then regulate the translation of the main coding sequence of the mRNA. In many organisms, however, the 5′ UTR is completely untranslated, instead forming a complex secondary structure to regulate translation.
The 5′ UTR has been found to interact with proteins relating to metabolism. In addition, this region has been implicated in transcription regulation, as with the sex-lethal gene in Drosophila. Regulatory elements within 5′ UTRs have also been linked to mRNA export.[1]
The 5′ UTR begins at the transcription start site and ends one nucleotide (nt) before the initiation sequence (usually AUG) of the coding region. In prokaryotes, the 5′ UTR tends to be 3–10 nucleotides long, while in eukaryotes it tends to be anywhere from 100 to several thousand nucleotides long.[2] For example, the ste11 transcript in Schizosaccharomyces pombe has a 2273-nucleotide 5′ UTR,[3] while the lac operon in Escherichia coli has only seven nucleotides in its 5′ UTR.[4] The differing sizes are likely due to the greater complexity of the eukaryotic regulation that the 5′ UTR carries, as well as the larger pre-initiation complex that must form to begin translation.
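The boundary definition above can be sketched in a few lines of Python. This is only an illustration: the function name `five_prime_utr`, the 0-based indexing convention, and the toy transcript are assumptions made for the example and do not correspond to any real gene.

```python
def five_prime_utr(mrna: str, start_codon_index: int) -> str:
    """Return the nucleotides upstream of the initiation codon.

    Assumptions for this sketch: position 0 of `mrna` is the transcription
    start site, and `start_codon_index` is the 0-based position of the first
    nucleotide of the initiation codon (usually the A of AUG).
    """
    if not 0 <= start_codon_index <= len(mrna) - 3:
        raise ValueError("start codon index is outside the transcript")
    # The 5' UTR ends one nucleotide before the initiation codon.
    return mrna[:start_codon_index]


# Made-up transcript: a seven-nucleotide leader followed by AUG.
toy_mrna = "GGAAACA" + "AUGACCAUGUAA"
leader = five_prime_utr(toy_mrna, 7)

# A leaderless mRNA starts directly with the initiation codon,
# so its 5' UTR is the empty string.
leaderless = five_prime_utr("AUGAAAUAA", 0)
```

Here `leader` holds the seven upstream nucleotides and `leaderless` is empty, matching the two extremes described in the text.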
The 5′ UTR can also be completely missing, in the case of leaderless mRNAs. Ribosomes of all three domains of life accept and translate such mRNAs,[5] and such sequences occur naturally in all three domains. Humans have many pressure-related genes under a 2–3 nucleotide leader. Mammals also have other types of ultra-short leaders, such as the TISU sequence.[6]
The elements of eukaryotic and prokaryotic 5′ UTRs differ greatly. The prokaryotic 5′ UTR contains a ribosome binding site (RBS), also known as the Shine–Dalgarno sequence (AGGAGGU), which is usually 3–10 nucleotides upstream from the initiation codon.[4] In contrast, the eukaryotic 5′ UTR contains the Kozak consensus sequence (ACCAUGG), which contains the initiation codon.[4] The eukaryotic 5′ UTR also contains cis-acting regulatory elements such as upstream open reading frames (uORFs), upstream AUGs (uAUGs), and termination codons, which have a great impact on the regulation of translation (see below). Unlike in prokaryotes, 5′ UTRs in eukaryotes can harbor introns. In humans, ~35% of all genes harbor introns within the 5′ UTR.[7]
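As a rough illustration of the two sequence elements just described, the following Python sketch looks for an exact Shine–Dalgarno match within a spacing window upstream of the initiation codon, and for an exact Kozak match around it. Several things here are assumptions made for the example: the function names, the interpretation of "3–10 upstream" as the number of nucleotides between the end of the motif and the start codon, and the use of exact string matching (real sites frequently match the consensus only partially). The test sequences are made up.

```python
from typing import Optional

SD_MOTIF = "AGGAGGU"   # Shine-Dalgarno consensus as given in the text
KOZAK = "ACCAUGG"      # Kozak consensus as given in the text; AUG starts at offset 3


def find_shine_dalgarno(mrna: str, start_index: int,
                        min_gap: int = 3, max_gap: int = 10,
                        motif: str = SD_MOTIF) -> Optional[int]:
    """Return the index where an exact motif match begins, or None.

    `gap` is the number of nucleotides between the last motif nucleotide
    and the first nucleotide of the initiation codon (an assumption).
    """
    for gap in range(min_gap, max_gap + 1):
        motif_start = start_index - gap - len(motif)
        if motif_start >= 0 and mrna[motif_start:motif_start + len(motif)] == motif:
            return motif_start
    return None


def has_kozak_context(mrna: str, start_index: int,
                      consensus: str = KOZAK, aug_offset: int = 3) -> bool:
    """True if the start codon sits inside an exact copy of the consensus."""
    window_start = start_index - aug_offset
    if window_start < 0:
        return False
    return mrna[window_start:window_start + len(consensus)] == consensus


# Made-up sequences for illustration only.
bacterial_like = "UU" + "AGGAGGU" + "AACAG" + "AUGGCU"   # AUG at index 14
eukaryotic_like = "GG" + "ACCAUGG" + "CU"                # AUG at index 5

sd_position = find_shine_dalgarno(bacterial_like, 14)
kozak_found = has_kozak_context(eukaryotic_like, 5)
```

A more realistic scan would tolerate mismatches or score each position against a weight matrix; exact matching is used here only to keep the idea visible.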
As the 5′ UTR has high GC content, secondary structures often occur within it. Hairpin loops are one such secondary structure that can be located within the 5′ UTR. These secondary structures also impact the regulation of translation.[8]
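The link between GC content and secondary structure can be made concrete with a small Python sketch. It computes the GC fraction of a sequence and checks whether two segments could pair as the arms of a hairpin stem through Watson–Crick complementarity. The function names and the toy hairpin are assumptions for illustration; the check ignores G–U wobble pairs and any thermodynamic consideration, so it is not a structure predictor.

```python
def gc_content(rna: str) -> float:
    """Fraction of nucleotides that are G or C."""
    rna = rna.upper()
    if not rna:
        raise ValueError("empty sequence")
    return (rna.count("G") + rna.count("C")) / len(rna)


_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna: str) -> str:
    """Watson-Crick reverse complement of an RNA sequence."""
    return rna.upper()[::-1].translate(_COMPLEMENT)


def can_form_stem(five_prime_arm: str, three_prime_arm: str) -> bool:
    """True if the two arms are perfectly complementary when antiparallel."""
    return reverse_complement(five_prime_arm) == three_prime_arm.upper()


# Made-up hairpin: stem arm, four-nucleotide loop, complementary arm.
stem_5p, loop, stem_3p = "GGGC", "UUCG", "GCCC"
hairpin = stem_5p + loop + stem_3p

hairpin_gc = gc_content(hairpin)
pairs = can_form_stem(stem_5p, stem_3p)
```

In this toy case the stem arms are fully complementary and the sequence is GC-rich, which is the kind of situation the paragraph above associates with hairpin formation.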
In bacteria, the initiation of translation occurs when IF-3, along with the 30S ribosomal subunit, binds to the Shine–Dalgarno (SD) sequence of the 5′ UTR.[4] This then recruits many other proteins, as well as the 50S ribosomal subunit, which allows translation to begin. Each of these steps regulates the initiation of translation.
Initiation in Archaea is less well understood. SD sequences are much rarer, and the initiation factors have more in common with eukaryotic ones. There is no homolog of bacterial IF-3.[9] Some mRNAs are leaderless.[10]
In both domains, genes without Shine–Dalgarno sequences are also translated in a less understood manner. A requirement seems to be a lack of secondary structure near the initiation codon.[11]
The regulation of translation in eukaryotes is more complex than in prokaryotes. Initially, the eIF4F complex is recruited to the 5′ cap, which in turn recruits the ribosomal complex to the 5′ UTR. Both eIF4E and eIF4G bind the 5′ UTR, which limits the rate at which translational initiation can occur. However, this is not the only regulatory step of translation that involves the 5′ UTR.
RNA-binding proteins sometimes serve to prevent the pre-initiation complex from forming. An example is regulation of the msl-2 gene. The protein SXL attaches to an intron segment located within the 5′ UTR segment of the primary transcript, which leads to the inclusion of the intron after processing.[12] This sequence allows the recruitment of proteins that bind simultaneously to both the 5′ and 3′ UTR, preventing translation proteins from assembling. However, it has also been noted that SXL can repress translation of RNAs that do not contain a poly(A) tail or, more generally, a 3′ UTR.
Another important regulator of translation is the interaction between the 3′ UTR and the 5′ UTR. The closed-loop structure inhibits translation. This has been observed in Xenopus laevis, in which eIF4E bound to the 5′ cap interacts with Maskin bound to CPEB on the 3′ UTR, creating translationally inactive transcripts. This translational inhibition is lifted once CPEB is phosphorylated, displacing the Maskin binding site and allowing polymerization of the poly(A) tail, which can recruit the translational machinery by means of PABP.[13] However, this mechanism has been under great scrutiny.[14]
See main article: Iron response element. Iron levels in cells are maintained by translation regulation of many proteins involved in iron storage and metabolism. The 5′ UTR has the ability to form a hairpin loop secondary structure (known as the iron response element or IRE) that is recognized by iron-regulatory proteins (IRP1 and IRP2). When iron levels are low, the ORF of the target mRNA is blocked as a result of steric hindrance from the binding of IRP1 and IRP2 to the IRE. When iron levels are high, the two iron-regulatory proteins do not bind as strongly, allowing expression of proteins that have a role in controlling iron concentration. This function has gained some interest after it was revealed that the translation of amyloid precursor protein may be disrupted by a single-nucleotide polymorphism in the IRE found in the 5′ UTR of its mRNA, leading to a spontaneous increase in the risk of Alzheimer's disease.[15]
uORFs and reinitiation
See main article: Upstream open reading frame. Another form of translational regulation in eukaryotes comes from unique elements on the 5′ UTR called upstream open reading frames (uORFs). These elements are fairly common, occurring in 35–49% of all human genes.[16] A uORF is a coding sequence located in the 5′ UTR, upstream of the main coding sequence's initiation site. Each uORF contains its own initiation codon, known as an upstream AUG (uAUG). This codon can be scanned for by ribosomes and then translated to create a product,[17] which can regulate the translation of the main protein coding sequence or other uORFs that may exist on the same transcript.
The translation of the protein within the main ORF after a uORF sequence has been translated is known as reinitiation.[18] Reinitiation is known to reduce translation of the main ORF protein. The extent of this regulation depends on the distance between the uORF and the first codon of the main ORF. Reinitiation has been found to increase with the distance between the uAUG and the start codon of the main ORF, which indicates that the ribosome needs to reacquire translation factors before it can carry out translation of the main protein. For example, ATF4 regulation is performed by two uORFs further upstream, named uORF1 and uORF2, which contain three amino acids and fifty-nine amino acids, respectively. The location of uORF2 overlaps with the ATF4 ORF. During normal conditions, uORF1 is translated, and translation of uORF2 occurs only after eIF2-TC has been reacquired. Translation of uORF2 requires that the ribosomes pass by the ATF4 ORF, whose start codon is located within uORF2. This leads to repression of ATF4. However, during stress conditions, the 40S ribosome will bypass uORF2 because of a decrease in the concentration of eIF2-TC, which means the ribosome does not acquire one in time to translate uORF2. Instead, ATF4 is translated.
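The uORF layout described above (a uAUG in the 5′ UTR, an in-frame termination codon, and in some cases an overlap with the main ORF, as with uORF2 of ATF4) can be sketched as a simple scan. The function name `find_uorfs`, the tuple layout, and both test sequences are assumptions for illustration; the sequences are made up and are not the ATF4 transcript. The sketch only locates candidate uORFs and says nothing about whether a ribosome would actually initiate or reinitiate at them.

```python
from typing import List, Optional, Tuple

STOP_CODONS = {"UAA", "UAG", "UGA"}


def find_uorfs(mrna: str, main_start: int) -> List[Tuple[int, Optional[int], bool]]:
    """List candidate uORFs as (uaug_index, stop_index, overlaps_main_orf).

    `main_start` is the 0-based index of the main ORF's start codon.
    `stop_index` is the index of the first in-frame stop codon, or None
    if no in-frame stop is found before the end of the transcript.
    """
    uorfs = []
    for i in range(main_start):              # only look upstream of the main AUG
        if mrna[i:i + 3] != "AUG":
            continue
        stop = None
        for j in range(i + 3, len(mrna) - 2, 3):   # walk codon by codon in frame
            if mrna[j:j + 3] in STOP_CODONS:
                stop = j
                break
        # The uORF overlaps the main ORF if it has not terminated
        # before the main start codon begins.
        overlaps = stop is None or stop + 3 > main_start
        uorfs.append((i, stop, overlaps))
    return uorfs


# Made-up transcript with a short uORF that ends before the main AUG (index 15).
separate = "GG" + "AUGCCCUGA" + "CCCC" + "AUGGCCUAA"
# Made-up transcript whose uORF (frame 0) runs past the main AUG at index 4.
overlapping = "AUGCAUGCCUAAGUAA"

separate_uorfs = find_uorfs(separate, 15)
overlapping_uorfs = find_uorfs(overlapping, 4)
```

In the second sequence the main start codon lies inside the uORF, which is the arrangement the text describes for uORF2 and the ATF4 ORF.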
In addition to reinitiation, uORFs contribute to translation initiation based on:
See main article: Internal ribosome entry site. Viral (as well as some eukaryotic) 5′ UTRs contain internal ribosome entry sites (IRESs), which provide a cap-independent method of translational activation. Instead of building up a complex at the 5′ cap, the IRES allows for direct binding of the ribosomal complexes to the transcript to begin translation.[19] The IRES enables the viral transcript to be translated more efficiently because no preinitiation complex is needed, allowing the virus to replicate quickly.[4]
Transcription of msl-2 is regulated by multiple binding sites for fly Sxl at the 5′ UTR.[20] In particular, these poly-uracil sites are located close to a small intron that is spliced out in males but retained in females through splicing inhibition. This splicing inhibition is maintained by Sxl. When present, Sxl will repress the translation of msl-2 by increasing translation from a start codon located in a uORF in the 5′ UTR (see above for more information on uORFs). Also, Sxl outcompetes TIA-1 for a poly(U) region and prevents recruitment of snRNP (a step in alternative splicing) to the 5′ splice site.