Αr35 RNA explained

αr35 is a family of bacterial small non-coding RNAs (sRNAs) with representatives in a small group of Alphaproteobacteria from the order Hyphomicrobiales. The first member of this family (Smr35B) was found in a Sinorhizobium meliloti 1021 locus on the symbiotic plasmid B (pSymB). Further homology and structure conservation analyses identified full-length Smr35B homologs in other legume symbionts (Rhizobium leguminosarum bv. viciae, R. leguminosarum bv. trifolii and R. etli), as well as in the human and plant pathogens Brucella anthropi and Agrobacterium tumefaciens, respectively. αr35 RNA species are 138-142 nt long (Table 1) and share a common secondary structure consisting of two stem-loops and a well-conserved Rho-independent terminator (Figures 1-3). Most αr35 transcripts can be classified as trans-acting sRNAs expressed from well-defined promoters of independent transcription units within intergenic regions of alphaproteobacterial genomes (Figure 5).
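The shared architecture (two stem-loops followed by a terminator hairpin) is conventionally written in dot-bracket notation. The Python sketch below is one way to count hairpins in such a string: brackets are paired with a stack, and a hairpin is any base pair whose enclosed positions are all unpaired. The structure used here is a hypothetical toy string, not the published αr35 consensus.

```python
from typing import Dict, List, Tuple


def pair_table(structure: str) -> Dict[int, int]:
    """Map each paired position (0-based) to its partner, using a stack."""
    stack: List[int] = []
    pairs: Dict[int, int] = {}
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {i}")
            j = stack.pop()
            pairs[i], pairs[j] = j, i
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r} at position {i}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return pairs


def hairpins(structure: str) -> List[Tuple[int, int]]:
    """Return closing pairs (i, j) of hairpin loops: pairs whose interior is all unpaired."""
    pairs = pair_table(structure)
    return [
        (i, j)
        for i, j in sorted(pairs.items())
        if i < j and all(k not in pairs for k in range(i + 1, j))
    ]


# Hypothetical toy structure: stem-loop 1, stem-loop 2, terminator hairpin.
toy = "((((...))))..((((....))))...(((((....))))).."
print(len(hairpins(toy)))  # 3
```

Counting closing pairs rather than bracket runs keeps the check robust to bulges and interior loops inside a stem.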

Discovery and Structure

Smr35B sRNA was first described by del Val et al.[1] as a result of a computational comparative genomics approach applied to the intergenic regions (IGRs) of the reference strain S. meliloti 1021. Northern hybridization experiments confirmed that the predicted smr35B locus expresses a single transcript of the expected size, which accumulates differentially in free-living and endosymbiotic bacteria. TAP-based 5'-RACE experiments mapped the transcription start site (TSS) of the full-length Smr35B transcript to position 577,730 of the S. meliloti 1021 genome (http://iant.toulouse.inra.fr/bacteria/annotation/cgi/rhime.cgi), whereas the 3' end was assumed to be at position 577,868, matching the last residue of the stretch of consecutive Us of a bona fide Rho-independent terminator (Figure 5). A subsequent deep-sequencing-based characterization of the small RNA fraction (50-350 nt) of S. meliloti further confirmed the expression of Smr35B (there referred to as SmelB053) and mapped the 5' and 3' ends of the molecule to the positions proposed earlier.[2]
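Placing the 3' end at the last U of the terminator's U-tract can be expressed as a small scan. The Python sketch below assumes the caller supplies a region read 5' to 3' on the transcribed strand that starts at the terminator hairpin, together with the coordinate of its first base; the minimum run length (`min_run`), the function name and the example sequence are all hypothetical.

```python
from typing import Optional


def u_tract_end(region: str, region_start: int, min_run: int = 4) -> Optional[int]:
    """Return the 1-based coordinate of the last U/T in the first run of >= min_run U/Ts.

    Assumes `region` is read 5'->3' on the transcribed strand and begins at the
    terminator hairpin; `region_start` is the coordinate of its first base.
    """
    run = 0
    for offset, base in enumerate(region.upper()):
        if base in "UT":
            run += 1
            continue
        if run >= min_run:
            return region_start + offset - 1  # the previous base closed the run
        run = 0
    if run >= min_run:  # the run extends to the end of the region
        return region_start + len(region) - 1
    return None


# Hypothetical sequence and coordinate, for illustration only.
print(u_tract_end("GCCCGCAUUUUUUGA", 1000))  # 1012
```

Accepting both U and T lets the same scan run on genomic DNA or on RNA sequence.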

The nucleotide sequence of Smr35B was first used as a query to search the Rfam database; this homology search returned no matches to known bacterial sRNAs. Smr35B was then searched with BLAST (default parameters) against all bacterial genomes available at the time (1,615 sequences as of 20 April 2011; https://www.ncbi.nlm.nih.gov). The regions with significant similarity to the query (78-89%) were extracted into a seed alignment, from which a covariance model (CM) was built using Infernal (version 1.0)[3] (Figure 2).
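For nucleotide alignments, the reported percentages are usually identities. The Python sketch below computes percent identity from a pair of already-aligned sequences as identical columns divided by alignment length, with gap columns counted in the length, which is similar to how BLAST reports Identities. The function name and the toy alignment are hypothetical; the sketch does not perform the alignment itself.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Identical, non-gap columns divided by alignment length, as a percentage."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_a:
        raise ValueError("empty alignment")
    matches = sum(
        1
        for x, y in zip(aligned_a.upper(), aligned_b.upper())
        if x == y and x != "-"  # a gap facing a gap is not counted as a match
    )
    return 100.0 * matches / len(aligned_a)


# Hypothetical aligned pair: 7 identical columns out of 9.
print(round(percent_identity("ACGU-ACGU", "ACGUUACGA"), 1))  # 77.8
```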

This CM was used in a further search for new members of the αr35 family in the existing bacterial genomic databases.

Table 1: Smr35B homologs in other symbionts and pathogens
CM model | Name | GI / RefSeq accession | Begin | End | Strand | %GC | Length (nt) | Organism
αr35 | Smr35B | 16263748 / NC_003078.1 | 577730 | 577868 | + | 52 | 139 | Sinorhizobium meliloti 1021 plasmid pSymB
αr35 | Atr35C | 159185562 / NC_003063.2 | 132595 | 132733 | + | 48 | 139 | Agrobacterium tumefaciens str. C58 chromosome linear
αr35 | Rlvr35C | 116249766 / NC_008380.1 | 2256716 | 2256853 | + | 55 | 138 | Rhizobium leguminosarum bv. viciae 3841
αr35 | Rlt1325r35p04 | 241258599 / NC_012852.1 | 114247 | 114385 | - | 56 | 139 | Rhizobium leguminosarum bv. trifolii WSM1325 plasmid pR132504
αr35 | Rlt1325r35p02 | 241666492 / NC_012858.1 | 466255 | 466394 | - | ? | 140 | Rhizobium leguminosarum bv. trifolii WSM1325 plasmid pR132502
αr35 | ReCFNr35f | 86360734 / NC_007766.1 | 136368 | 136508 | + | 57 | 141 | Rhizobium etli CFN 42 plasmid p42f
αr35 | Oar35CII | 153010078 / NC_009668.1 | 1587138 | 1587279 | - | 52 | 142 | Brucella anthropi ATCC 49188 chromosome 2
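The Length column of Table 1 follows directly from the Begin and End coordinates if both are treated as 1-based and inclusive (end - begin + 1, with begin below end on either strand). The Python sketch below checks that assumption against the values transcribed from Table 1 and reports the resulting length range.

```python
# (begin, end, reported length) transcribed from Table 1.
# Assumption: coordinates are 1-based and inclusive, with begin < end on both strands.
TABLE_1 = {
    "Smr35B": (577730, 577868, 139),
    "Atr35C": (132595, 132733, 139),
    "Rlvr35C": (2256716, 2256853, 138),
    "Rlt1325r35p04": (114247, 114385, 139),
    "Rlt1325r35p02": (466255, 466394, 140),
    "ReCFNr35f": (136368, 136508, 141),
    "Oar35CII": (1587138, 1587279, 142),
}


def span_length(begin: int, end: int) -> int:
    """Length of an inclusive interval."""
    return end - begin + 1


mismatches = {
    name: (span_length(b, e), reported)
    for name, (b, e, reported) in TABLE_1.items()
    if span_length(b, e) != reported
}
lengths = [span_length(b, e) for b, e, _ in TABLE_1.values()]
print(mismatches)  # {}
print(min(lengths), max(lengths))  # 138 142
```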

The results were manually inspected to deduce a consensus secondary structure for the family (Figures 1 and 2). The consensus structure was also predicted independently with the program locARNATE,[4] giving very similar results. Manual inspection of the 84 sequences found by the CM with Infernal identified seven true homologs: two copies in Rhizobium leguminosarum bv. viciae (chromosome and plasmid pRL11), two copies in Rhizobium leguminosarum bv. trifolii WSM1325 (plasmids pR132504 and pR132502), one in Rhizobium etli CFN 42 plasmid p42f, and one each in the chromosomes of Agrobacterium tumefaciens and Brucella anthropi. All these sequences had significant Infernal E-values (1.38e-33 to 1.05e-11) and bit scores. In S. meliloti, a second copy was identified on the symbiotic plasmid pSymB (574630-574766) with a significant E-value (3.73e-07), but no expression has been detected under any of the conditions tested (unpublished data). The remaining sequences found with the model had E-values between 8.76e-12 and 1e-3 but very low bit scores, which is usually a sign of a remote homologue. However, manual inspection of these cases showed that only the Rho-independent terminator and the second stem were conserved, while the first stem was missing. This two-stem arrangement is widespread across the Alphaproteobacteria and is especially conserved in Brucella species.
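Judging whether a stem is conserved amounts to asking, for each base pair of the consensus structure, how many aligned sequences can actually form it. The Python sketch below is a simplified stand-in for that manual inspection, not the procedure used by the authors: for each consensus pair it reports the fraction of sequences forming a Watson-Crick or G-U pair and the number of distinct pair types observed (more than one type suggests compensatory covariation, the signal a CM rewards). The toy alignment and structure are hypothetical; gaps simply count as non-supporting.

```python
from typing import Dict, List, Tuple

# Watson-Crick and G-U wobble pairs.
CANONICAL = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def base_pairs(structure: str) -> List[Tuple[int, int]]:
    """Return sorted (i, j) pairs (0-based) from a dot-bracket string."""
    stack: List[int] = []
    pairs: List[Tuple[int, int]] = []
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {i}")
            pairs.append((stack.pop(), i))
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return sorted(pairs)


def pair_support(alignment: List[str], structure: str) -> Dict[Tuple[int, int], Tuple[float, int]]:
    """Map each consensus pair to (fraction of sequences pairing, distinct pair types)."""
    if any(len(seq) != len(structure) for seq in alignment):
        raise ValueError("every aligned sequence must match the structure length")
    result: Dict[Tuple[int, int], Tuple[float, int]] = {}
    for i, j in base_pairs(structure):
        observed = [
            (seq[i].upper().replace("T", "U"), seq[j].upper().replace("T", "U"))
            for seq in alignment
        ]
        valid = [p for p in observed if p in CANONICAL]
        result[(i, j)] = (len(valid) / len(alignment), len(set(valid)))
    return result


# Hypothetical three-sequence alignment with one consensus hairpin.
support = pair_support(["GGGAAACCC", "GAGAAACUC", "GGAAAACCC"], "(((...)))")
for pair, (fraction, n_types) in support.items():
    print(pair, round(fraction, 2), n_types)
# (0, 8) 1.0 1
# (1, 7) 1.0 2
# (2, 6) 0.67 1
```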

Expression information

Smr35B expression was first assessed by del Val et al.[1] in S. meliloti 1021 under different biological conditions: free-living bacteria grown in TY, minimal medium (MM) and luteolin-supplemented MM broth, and endosymbiotic bacteria in mature alfalfa nodules. In free-living bacteria, Smr35B expression was growth-phase-dependent, with the gene down-regulated when bacteria entered stationary phase. Supplementing MM with luteolin, the plant flavone that specifically induces transcription of the S. meliloti nodulation genes, increased Smr35B expression about 4-fold. In contrast, the Smr35B transcript was not detected in mature nodule tissues. Schlüter et al.[2] further reported up-regulation of Smr35B upon an osmotic upshift.
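A fold change such as the ~4-fold luteolin induction is typically computed from Northern blot band intensities, each normalized to a loading control. The Python sketch below shows only that arithmetic; the use of a loading-control RNA, the parameter names and all signal values are hypothetical and are not data from del Val et al.

```python
def fold_change(target_treated: float, control_treated: float,
                target_reference: float, control_reference: float) -> float:
    """Ratio of loading-control-normalized signals: treated condition / reference condition."""
    if min(control_treated, control_reference, target_reference) <= 0:
        raise ValueError("normalizing signals must be positive")
    return (target_treated / control_treated) / (target_reference / control_reference)


# Hypothetical densitometry values (arbitrary units).
print(fold_change(target_treated=5200, control_treated=1300,
                  target_reference=1000, control_reference=1000))  # 4.0
```

Normalizing each lane before taking the ratio keeps unequal RNA loading from being mistaken for induction.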

Promoter Analysis

All αr35 loci have recognizable σ70-dependent promoters with the -35/-10 consensus motif CTTAGAC-n17-CTATAT, previously shown to be widely conserved among several other genera of the Alphaproteobacteria.[5] To identify binding sites for other known transcription factors, we used the FASTA sequences provided by RegPredict[6] (http://regpredict.lbl.gov/regpredict/help.html) and the position-specific weight matrices (PSWMs) provided by RegulonDB[7] (http://regulondb.ccg.unam.mx). We built a PSWM for each transcription factor from the RegPredict sequences using the Consensus/Patser program, choosing the best final matrix for motif lengths between 14 and 30 when the corresponding length had not been previously specified, and established a threshold (average E-value < 10E-10) for each matrix (see "Consensus matrices" and "Thresholded consensus" in http://gps-tools2.its.yale.edu). We also searched the promoters of all Smr35B homologs for unknown conserved motifs using MEME[8] (http://meme.sdsc.edu/meme4_6_1/intro.html) and relaxed regular expressions (pattern matching). The only motif found was an inverted repeat built around the motif T-N11-A, located 55 nt upstream of the Smr35B transcription start site in S. meliloti; it is a degenerate form of the conserved nod box (Figure 4). This sequence has been proposed as the specific binding site for LysR-type proteins.[9] The motif is also present in the promoter regions of all seed Smr35B homologs.
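The PSWM scanning step can be illustrated with a small log-odds matrix. The Python sketch below is in the spirit of Consensus/Patser but is not that program: it builds a log2-odds matrix from equal-length aligned sites with a pseudocount and a uniform background (both assumptions), then scores every window of one strand. The training sites are hypothetical sequences resembling the -35 element, not real αr35 promoters; a full search would also scan the reverse complement and calibrate the threshold statistically.

```python
import math
from typing import Dict, List, Optional, Tuple

BASES = "ACGT"


def build_pswm(sites: List[str], pseudocount: float = 0.5,
               background: Optional[Dict[str, float]] = None) -> List[Dict[str, float]]:
    """Log2-odds position-specific weight matrix from equal-length aligned sites."""
    if not sites:
        raise ValueError("at least one site is required")
    bg = background or {b: 0.25 for b in BASES}
    width = len(sites[0])
    if any(len(s) != width for s in sites):
        raise ValueError("all sites must have the same length")
    denom = len(sites) + pseudocount * len(BASES)
    matrix = []
    for i in range(width):
        column = [s[i].upper() for s in sites]
        # Smoothed base frequency at this position, relative to the background.
        matrix.append({b: math.log2((column.count(b) + pseudocount) / denom / bg[b])
                       for b in BASES})
    return matrix


def scan(sequence: str, matrix: List[Dict[str, float]],
         threshold: float) -> List[Tuple[int, float]]:
    """Score every window on one strand; keep (0-based start, score) at or above threshold."""
    width = len(matrix)
    seq = sequence.upper()
    hits = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        if any(b not in BASES for b in window):
            continue  # skip windows containing N or other ambiguity codes
        score = sum(matrix[i][b] for i, b in enumerate(window))
        if score >= threshold:
            hits.append((start, score))
    return hits


# Hypothetical training sites resembling the -35 element.
pswm = build_pswm(["CTTAGAC", "CTTAGAC", "CTTGGAC", "TTTAGAC"])
best = max(scan("GGGGGCTTAGACGGGGG", pswm, threshold=0.0), key=lambda hit: hit[1])
print(best[0])  # 5
```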

Genomic Context

Most members of the αr35 family are trans-encoded sRNAs transcribed from independent promoters in the IGRs of rhizobial megaplasmids. The exceptions are the Smr35B homologs of R. leguminosarum bv. viciae (Rlvr35C) and R. etli CFN 42 plasmid p42f (ReCFNr35f), which are encoded on the strand opposite to annotated genes and partially overlap their ORFs. The predicted protein products of these overlapping ORFs could not be assigned to any functional category on the basis of amino acid sequence homology.[10][11][12] These αr35 members are therefore putative cis-encoded antisense sRNAs. The genomic regions of the trans-encoded αr35 sRNAs show partial conservation, mainly limited to the sRNA-coding sequence and one flanking gene. Most of the genes flanking the αr35 loci encode transcription factors and proteins related to nitrogen and glutamine metabolism.
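The cis/trans distinction above rests on a simple interval test: does the sRNA overlap an annotated gene, and if so, on which strand? The Python sketch below applies that test to coordinates taken from Table 2, assuming 1-based inclusive coordinates and treating the Strand values D and R as the two opposite strands. The category labels and the `Feature` type are hypothetical, and an annotation overlap alone only suggests, rather than establishes, an antisense mechanism.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Feature:
    name: str
    strand: str  # "D" or "R", as in the Strand column of Table 2
    begin: int   # 1-based, inclusive
    end: int     # 1-based, inclusive


def overlaps(a: Feature, b: Feature) -> bool:
    """True if the two inclusive intervals share at least one position."""
    return a.begin <= b.end and b.begin <= a.end


def classify_srna(srna: Feature, genes: List[Feature]) -> Tuple[str, List[str]]:
    """Label an sRNA by its overlap with annotated genes."""
    hits = [g for g in genes if overlaps(srna, g)]
    antisense = [g.name for g in hits if g.strand != srna.strand]
    if antisense:
        return "cis-antisense", antisense
    if hits:
        return "sense-overlapping", [g.name for g in hits]
    return "intergenic", []


rlv_genes = [
    Feature("RL2133", "D", 2256297, 2256500),
    Feature("RL2134", "R", 2256617, 2256982),
    Feature("RL2135", "D", 2256994, 2257383),
]
print(classify_srna(Feature("Rlvr35C", "D", 2256716, 2256853), rlv_genes))
# ('cis-antisense', ['RL2134'])

sme_genes = [
    Feature("SM_b20551", "R", 576952, 577398),
    Feature("SM_b20552", "D", 578150, 578881),
]
print(classify_srna(Feature("Smr35B", "D", 577730, 577868), sme_genes))
# ('intergenic', [])
```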

Table 2: Detailed genomic context of the αr35 sRNA seed members
Family | Feature | Name | Strand | Begin | End | Protein name | Annotation | Organism
αr35 | gene | SM_b20551 | R | 576952 | 577398 | NP_437070.1 | proteolysis | Sinorhizobium meliloti 1021 plasmid pSymB (NC_003078)
αr35 | sRNA | Smr35B | D | 577730 | 577868 | | | Sinorhizobium meliloti 1021 plasmid pSymB (NC_003078)
αr35 | gene | SM_b20552 | D | 578150 | 578881 | NP_437071.1 | nitrogen compound metabolic process | Sinorhizobium meliloti 1021 plasmid pSymB (NC_003078)
αr35 | gene | Oant_4157 | D | 1586007 | 1587065 | YP_001372686.1 | nitrogen compound metabolic process | Brucella anthropi ATCC 49188 chromosome 2 (NC_009668)
αr35 | sRNA | Oar35CII | R | 1587138 | 1587279 | | | Brucella anthropi ATCC 49188 chromosome 2 (NC_009668)
αr35 | gene | Oant_4158 | R | 1587338 | 1587724 | YP_001372687.1 | proteolysis | Brucella anthropi ATCC 49188 chromosome 2 (NC_009668)
αr35 | gene | RHE_PF00127 | R | 133963 | 134406 | YP_472745.1 | hypothetical protein | Rhizobium etli CFN 42 plasmid p42f (NC_007766)
αr35 | gene | RHE_PF00128 | D | 136269 | 136700 | YP_472746.1 | hypothetical protein | Rhizobium etli CFN 42 plasmid p42f (NC_007766)
αr35 | sRNA | ReCFNr35f | D | 136368 | 136508 | | | Rhizobium etli CFN 42 plasmid p42f (NC_007766)
αr35 | gene | RHE_PF00129 | D | 137962 | 138264 | YP_472747.1 | membrane protein | Rhizobium etli CFN 42 plasmid p42f (NC_007766)
αr35 | gene | Atu3124 | D | 132103 | 132318 | NP_357476.1 | | Agrobacterium tumefaciens str. C58 chromosome linear (NC_003063)
αr35 | sRNA | Atr35C | D | 132595 | 132733 | | | Agrobacterium tumefaciens str. C58 chromosome linear (NC_003063)
αr35 | gene | Atu3126 | D | 133057 | 133344 | NP_357475.1 | nitrogen compound metabolic process | Agrobacterium tumefaciens str. C58 chromosome linear (NC_003063)
αr35 | gene | RL2133 | D | 2256297 | 2256500 | YP_767731.1 | hypothetical protein | Rhizobium leguminosarum bv. viciae 3841 (NC_008380)
αr35 | gene | RL2134 | R | 2256617 | 2256982 | YP_767732.1 | hypothetical protein | Rhizobium leguminosarum bv. viciae 3841 (NC_008380)
αr35 | sRNA | Rlvr35C | D | 2256716 | 2256853 | | | Rhizobium leguminosarum bv. viciae 3841 (NC_008380)
αr35 | gene | RL2135 | D | 2256994 | 2257383 | YP_767733.1 | transposase-related protein | Rhizobium leguminosarum bv. viciae 3841 (NC_008380)
αr35 | gene | Rleg_6079 | D | 113829 | 114197 | YP_002978585.1 | membrane protein | Rhizobium leguminosarum bv. trifolii WSM1325 plasmid pR132502 (NC_012852)
αr35 | sRNA | Rlt132504r35p04 | R | 114247 | 114385 | | | Rhizobium leguminosarum bv. trifolii WSM1325 plasmid pR132502 (NC_012852)
αr35 | gene | Rleg_6080 | R | 114489 | 115121 | YP_002978586.1 | endonuclease | Rhizobium leguminosarum bv. trifolii WSM1325 plasmid pR132502 (NC_012852)
αr35 | gene | Rleg_7049 | D | 465959 | 466222 | YP_002985022.1 | | Rhizobium leguminosarum bv. trifolii WSM1325 plasmid pR132504 (NC_012858)
αr35 | sRNA | Rlt132502r35p02 | R | 466255 | 466394 | | | Rhizobium leguminosarum bv. trifolii WSM1325 plasmid pR132504 (NC_012858)
αr35 | gene | Rleg_7050 | R | 466934 | 467824 | YP_002985023.1 | transcription regulator | Rhizobium leguminosarum bv. trifolii WSM1325 plasmid pR132504 (NC_012858)
αr35 | gene | pRL110105 | D | 122566 | 123456 | YP_771137.1 | transcription regulator | Rhizobium leguminosarum bv. viciae 3841 plasmid pRL11 (NC_008384)
αr35 | sRNA | Rlvr35p11 | D | 124030 | 124162 | | | Rhizobium leguminosarum bv. viciae 3841 plasmid pRL11 (NC_008384)
αr35 | gene | pRL110106 | R | 124229 | 124447 | YP_771138.1 | hypothetical protein | Rhizobium leguminosarum bv. viciae 3841 plasmid pRL11 (NC_008384)

Notes and References

  1. del Val C, Rivas E, Torres-Quesada O, Toro N, Jiménez-Zurdo JI. Identification of differentially expressed small non-coding RNAs in the legume endosymbiont Sinorhizobium meliloti by comparative genomics. Mol Microbiol. 2007;66(5):1080-1091. doi:10.1111/j.1365-2958.2007.05978.x. PMID 17971083. PMC 2780559.
  2. Schlüter JP, Reinkensmeier J, Daschkey S, Evguenieva-Hackenberg E, Janssen S, Jänicke S, Becker JD, Giegerich R, Becker A. A genome-wide survey of sRNAs in the symbiotic nitrogen-fixing alpha-proteobacterium Sinorhizobium meliloti. BMC Genomics. 2010;11:245. doi:10.1186/1471-2164-11-436. PMID 20637113. PMC 3091635.
  3. Nawrocki EP, Kolbe DL, Eddy SR. Infernal 1.0: inference of RNA alignments. Bioinformatics. 2009;25(10):1335-1337. doi:10.1093/bioinformatics/btp157. PMID 19307242. PMC 2732312.
  4. Will S, Reiche K, Hofacker IL, Stadler PF, Backofen R. Inferring noncoding RNA families and classes by means of genome-scale structure-based clustering. PLoS Comput Biol. 2007;3(4):e65. doi:10.1371/journal.pcbi.0030065. PMID 17432929. PMC 1851984. Bibcode 2007PLSCB...3...65W.
  5. MacLellan SR, MacLean AM, Finan TM. Promoter prediction in the rhizobia. Microbiology. 2006;152(6):1751-1763. doi:10.1099/mic.0.28743-0. PMID 16735738.
  6. Novichkov PS, Rodionov DA, Stavrovskaya ED, Novichkova ES, Kazakov AE, Gelfand MS, Arkin AP, Mironov AA, Dubchak I. RegPredict: an integrated system for regulon inference in prokaryotes by comparative genomics approach. Nucleic Acids Res. 2010;38(Web Server issue):W299-W307. doi:10.1093/nar/gkq531. PMID 20542910. PMC 2896116.
  7. Gama-Castro S, Salgado H, Peralta-Gil M, Santos-Zavaleta A, Muniz-Rascado L, Solano-Lira H, Jimenez-Jacinto V, Weiss V, Garcia-Sotelo JS, Lopez-Fuentes A, Porron-Sotelo L, Alquicira-Hernandez S, Medina-Rivera A, Martinez-Flores I, Alquicira-Hernandez K, Martinez-Adame R, Bonavides-Martinez C, Miranda-Rios J, Huerta AM, Mendoza-Vargas A, Collado-Torres L, Taboada B, Vega-Alvarado L, Olvera M, Olvera L, Grande R, Morett E, Collado-Vides J. RegulonDB version 7.0: transcriptional regulation of Escherichia coli K-12 integrated within genetic sensory response units (Gensor Units). Nucleic Acids Res. 2010;39(Database issue):D98-D105. doi:10.1093/nar/gkq1110. PMID 21051347. PMC 3013702.
  8. Bailey TL, Elkan C. Fitting a mixture model by expectation maximization to discover motifs in biopolymers. Proc Int Conf Intell Syst Mol Biol. 1994;2:28-36. AAAI Press, Menlo Park, California. PMID 7584402.
  9. Goethals K, Van Montagu M, Holsters M. Conserved motifs in a divergent nod box of Azorhizobium caulinodans ORS571 reveal a common structure in promoters regulated by LysR-type proteins. Proc Natl Acad Sci U S A. 1992;89(5):1646-1650. doi:10.1073/pnas.89.5.1646. PMID 1542656. PMC 48509. Bibcode 1992PNAS...89.1646G.
  10. Vinayagam A, del Val C, Schubert F, Eils R, Glatting KH, Suhai S, König R. GOPET: a tool for automated predictions of Gene Ontology terms. BMC Bioinformatics. 2006;7:161. doi:10.1186/1471-2105-7-161. PMID 16549020. PMC 1434778.
  11. Conesa A, Götz S, García-Gómez JM, Terol J, Talón M, Robles M. Blast2GO: a universal tool for annotation, visualization and analysis in functional genomics research. Bioinformatics. 2005;21(18):3674-3676. doi:10.1093/bioinformatics/bti610. PMID 16081474.
  12. del Val C, Ernst P, Falkenhahn M, Fladerer C, Glatting KH, Suhai S, Hotz-Wagenblatt A. ProtSweep, 2Dsweep and DomainSweep: protein analysis suite at DKFZ. Nucleic Acids Res. 2007;35(Web Server issue):W444-W450. doi:10.1093/nar/gkm364. PMID 17526514. PMC 1933246.