Small integral membrane protein 14 explained

Small integral membrane protein 14, also known as SMIM14 or C4orf34, is a protein encoded on chromosome 4 of the human genome by the SMIM14 gene.[1] SMIM14 has at least 298 orthologs, mainly found in jawed vertebrates, and no paralogs.[2] SMIM14 is classified as a type I transmembrane protein. Although the function of this protein is not well understood, the transmembrane domain of SMIM14 may be involved in ER retention.[3]

Gene

The SMIM14 gene is located on the minus strand at cytogenetic band 4p14 and is 92,567 base pairs in length.[4] The gene has five exons, four of which constitute the open reading frame for SMIM14.[5]

The Kozak sequence, which functions as the protein translation initiation site in most eukaryotic mRNA transcripts, is considered a strong motif in SMIM14.[6] SMIM14 has no predicted signal peptide; the encoded transmembrane domain acts as the signal sequence instead. One disulfide bridge is predicted in the encoded protein; such bridges stabilize the tertiary (and sometimes quaternary) structure of proteins. The 3′ UTR of the SMIM14 gene contains at least ten polyadenylation sequences, which mark where the transcript can be cleaved and polyadenylated.

SMIM14 is expressed at four times the level of an average gene.[7]

Gene regulation

Promoter

SMIM14 has seven predicted promoter regions. The promoter with the greatest number of transcripts and CAGE tags is 1,420 base pairs long. It lies on the minus strand and spans positions 39,638,806 to 39,640,225. This promoter is associated with five coding transcripts, one of which is supported by up to 105,458 CAGE tags.[8]

| Promoter ID | Start Position | End Position | Length (bp) | Coding Transcripts |
|---|---|---|---|---|
| GXP_150112 | 39,549,547 | 39,550,812 | 1,266 | 0 |
| GXP_3198013 | 39,583,919 | 39,584,958 | 1,040 | 0 |
| GXP_9520406 | 39,605,105 | 39,606,144 | 1,040 | N/A |
| GXP_9520407 | 39,626,490 | 39,627,529 | 1,040 | N/A |
| GXP_6750876 | 39,627,082 | 39,628,121 | 1,040 | 1 |
| GXP_3198015 | 39,638,191 | 39,639,230 | 1,040 | 0 |
| GXP_6750877 | 39,638,806 | 39,640,225 | 1,420 | 5 |

The CpG sites associated with the SMIM14 gene lie in CpG island 76. Additional transcription factors can bind this promoter to drive SMIM14 expression.[9]

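The lengths in the promoter table follow from the coordinates if both the start and end positions are counted as part of the promoter. The sketch below checks this under that assumption and picks the promoter with the most coding transcripts. The names `PROMOTERS`, `promoter_length`, and `best_supported` are illustrative, and treating "N/A" as zero transcripts is an assumption made only for the comparison.

```python
# Promoter rows copied from the table above: (ID, start, end, coding transcripts).
# None stands for the "N/A" entries in the table.
PROMOTERS = [
    ("GXP_150112", 39_549_547, 39_550_812, 0),
    ("GXP_3198013", 39_583_919, 39_584_958, 0),
    ("GXP_9520406", 39_605_105, 39_606_144, None),
    ("GXP_9520407", 39_626_490, 39_627_529, None),
    ("GXP_6750876", 39_627_082, 39_628_121, 1),
    ("GXP_3198015", 39_638_191, 39_639_230, 0),
    ("GXP_6750877", 39_638_806, 39_640_225, 5),
]


def promoter_length(start: int, end: int) -> int:
    """Length in bp of an interval whose start and end are both inclusive."""
    return end - start + 1


def best_supported(promoters) -> str:
    """Return the ID of the promoter with the most coding transcripts.

    Assumption: an "N/A" count (None) is treated as 0 for the comparison.
    """
    return max(promoters, key=lambda row: row[3] or 0)[0]


if __name__ == "__main__":
    for promoter_id, start, end, _ in PROMOTERS:
        print(promoter_id, promoter_length(start, end))
    print("Most coding transcripts:", best_supported(PROMOTERS))
```

With inclusive coordinates, every computed length matches the "Length (bp)" column, and the selected promoter is the 1,420 bp region described in the text.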
Literature-curated Transcription Factors

(via ORegAnno)

SMARCA4
STAT1
RBL2
TRIM28
EGR1
TFAP2C

RNA and expression

SMIM14 has three mRNA transcript variants. Transcript variant 1 is the longest variant, with 6,397 base pairs.

| Transcript | Length (bp) | Accession Number |
|---|---|---|
| Transcript variant 1 | 6,397 | NM_001317896.2 |
| Transcript variant 2 | 6,252 | NM_174921 |
| Transcript variant 3 | 6,263 | NM_001317897 |

SMIM14 is highly expressed in the liver, adrenal gland, colon, and prostate, and is expressed at low levels in peripheral blood lymphocytes, skeletal muscle, and the heart.[10]

Protein

Transcript variant 1 of SMIM14 encodes a protein of 99 amino acids.[11]

Primary structure

The predicted molecular weight (Mw) of the SMIM14 protein is 10,710.34 Da. The protein carries no net electrical charge at pH 5.10, its isoelectric point (pI).[12] The abundance of every amino acid is within the normal range for human proteins.
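As a rough illustration of how a predicted molecular weight is obtained from a primary sequence, the sketch below sums average residue masses and adds one water molecule for the free termini. It is not the method of the tool cited above, whose mass table may differ slightly. The function name `molecular_weight` is illustrative, and the example peptides are arbitrary, not the SMIM14 sequence.

```python
# Average residue masses in daltons (amino acid mass minus one water).
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01528  # added once for the free N- and C-termini


def molecular_weight(sequence: str) -> float:
    """Average molecular weight (Da) of an unmodified peptide sequence."""
    seq = sequence.upper()
    unknown = set(seq) - set(RESIDUE_MASS)
    if unknown:
        raise ValueError(f"unsupported residue codes: {sorted(unknown)}")
    if not seq:
        return 0.0
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER


if __name__ == "__main__":
    # Arbitrary example peptides (hypothetical, not taken from SMIM14).
    for peptide in ("GG", "PLSP"):
        print(peptide, round(molecular_weight(peptide), 2))
```

Post-translational modifications such as palmitoylation or phosphorylation change the real mass, so a sequence-based value like this is only a starting estimate.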

Transmembrane domain and motifs

The Kozak sequence of the SMIM14 transcript is considered a strong motif.

SMIM14 has one transmembrane domain, so it is classified as a single-pass membrane protein.[13] The transmembrane domain spans residues 51–70.[14] A dileucine motif is predicted within the domain; such motifs play a role in sorting transmembrane proteins to endosomes and lysosomes.[15] The N-terminus is positioned on the extracellular (or luminal) side of the membrane, while the C-terminus faces the cytosol, further classifying SMIM14 as a type I transmembrane protein.
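The residue ranges implied by this topology can be worked out from the two figures given in the text: a 99-residue protein and a transmembrane domain at residues 51–70. The sketch below is a minimal illustration that assumes 1-based, inclusive residue numbering; the names `topology_regions` and `region_length` are illustrative.

```python
PROTEIN_LENGTH = 99        # residues, from transcript variant 1
TM_START, TM_END = 51, 70  # transmembrane domain (assumed 1-based, inclusive)


def topology_regions(length: int, tm_start: int, tm_end: int) -> dict:
    """Split a single-pass membrane protein into three residue ranges."""
    if not 1 <= tm_start <= tm_end <= length:
        raise ValueError("transmembrane domain must lie within the protein")
    return {
        "N-terminal": (1, tm_start - 1),
        "transmembrane": (tm_start, tm_end),
        "C-terminal": (tm_end + 1, length),
    }


def region_length(region: tuple) -> int:
    """Number of residues in an inclusive (start, end) range."""
    start, end = region
    return max(0, end - start + 1)


if __name__ == "__main__":
    regions = topology_regions(PROTEIN_LENGTH, TM_START, TM_END)
    for name, span in regions.items():
        print(f"{name}: residues {span[0]}-{span[1]} ({region_length(span)} aa)")
```

Under these assumptions the N-terminal region is 50 residues, the transmembrane domain 20, and the C-terminal region 29, which is consistent with the 29-residue C-terminus used in the proline count in the Orthologs section.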

Secondary structure

An α-helix is predicted within the transmembrane domain.[16] SMIM14 is also predicted to form a random coil near the C-terminus.[17] A random coil lacks regular secondary structure, so this region adopts a flexible conformation that is not held in place by a regular pattern of stabilizing interactions. Extended strands (E-strands) are predicted throughout the protein. E-strands are another common secondary structure and are typically stabilized by hydrogen bonds between the backbones of neighboring strands.

Within the N-terminus, SMIM14 is predicted to have three palmitoylation sites,[18] which facilitate the clustering of proteins, and one disulfide bridge, which stabilizes the structure of the protein. There is also a predicted glycosaminoglycan attachment site spanning residues 45–48, proximal to the transmembrane domain.[19] The C-terminus is predicted to have two phosphorylation sites for unspecified kinases and one PKA phosphorylation site.[20]

Subcellular location

SMIM14, a transmembrane protein, typically localizes to the ER membrane. Although the SMIM14 sequence contains no conventional ER retention signal, it has been suggested that the transmembrane domain mediates ER retention.

Homology

SMIM14 has no known paralogs and at least 298 orthologs.

Paralogs

BLAST searches have found no paralogs of the SMIM14 gene in Homo sapiens.

Orthologs

SMIM14 is conserved in most vertebrates, excluding hagfish, lampreys, lobe-finned fish, and lungfish.[21] Among invertebrates, it is conserved in flatworms, roundworms, mollusks, and arthropods. It is also relatively conserved in more distant relatives, such as sea anemones and corals.

| Species | Common Name | Taxon | Date of Divergence (mya) | % Identity | % Similarity | Corrected % Divergence (m) | Accession Number |
|---|---|---|---|---|---|---|---|
| Mastomys coucha | southern multimammate mouse | Rodentia | 90 | 87.9 | 98.0 | 12.9 | XP_031198284.1 |
| Phyllostomus discolor | pale spear-nosed bat | Mammalia | 96 | 93.4 | 99.0 | 6.70 | XP_028361411.1 |
| Manacus vitellinus | golden-collared manakin | Aves | 312 | 85.1 | 91.1 | 16.1 | XP_017923893.1 |
| Python bivittatus | Burmese python | Reptilia | 312 | 80.2 | 89.1 | 22.1 | XP_007426519 |
| Nanorana parkeri | high Himalaya frog | Amphibia | 352 | 69.2 | 79.8 | 36.8 | XP_018420132.1 |
| Danio rerio | zebrafish | Actinopterygii | 435 | 68.0 | 82.5 | 38.6 | NP_991165.1 |
| Rhincodon typus | whale shark | Chondrichthyes | 473 | 71.8 | 84.5 | 33.1 | XP_020383770.1 |
| Ciona intestinalis | sea vase | Ascidiacea | 676 | 42.7 | 55.3 | 85.1 | XP_026690156.1 |
| Strongylocentrotus purpuratus | Pacific purple sea urchin | Echinodermata | 684 | 50.5 | 68.0 | 68.3 | XP_787363.2 |
| Lingula anatina | lamp shell | Brachiopoda | 797 | 59.0 | 74.3 | 52.8 | XP_013382479.1 |
| Limulus polyphemus | Atlantic horseshoe crab | Arthropoda | 797 | 49.5 | 65.0 | 70.3 | XP_013782563.1 |
| Agrilus planipennis | emerald ash borer | Insecta | 797 | 39.8 | 57.3 | 92.1 | XP_018319678.1 |
| Octopus vulgaris | octopus | Mollusca | 797 | 51.0 | 64.4 | 67.3 | XP_029637526.1 |
| Strongyloides ratti | threadworm | Nematoda | 797 | 33.3 | 48.1 | 110 | XP_024504825.1 |
| Exaiptasia pallida | sea anemone | Anthozoa | 824 | 58.2 | 65.5 | 54.1 | XP_020902189.1 |
| Schistosoma haematobium | urinary blood fluke | Platyhelminthes | 824 | 37.4 | 53.3 | 98.3 | XP_012793134.1 |

The SMIM14 protein sequence is highly conserved among orthologs near the N-terminus, whereas the C-terminus is more variable. In the human protein, the C-terminus contains a disproportionate number of proline residues (9 of 29; 31%), including several proline-rich sequences (PXXP). Proline-rich domains are usually associated with protein-protein interactions, so the C-terminus is likely to interact with other proteins.
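The ortholog table does not state how the corrected divergence (m) was calculated. For nearly every row, the listed values agree with m = −100 × ln(identity / 100), a common correction for multiple substitutions at the same site, so the sketch below assumes that formula. The function name `corrected_divergence` is illustrative, and the three sample rows are copied from the table.

```python
import math


def corrected_divergence(percent_identity: float) -> float:
    """Corrected % divergence, assuming m = -100 * ln(identity fraction)."""
    if not 0 < percent_identity <= 100:
        raise ValueError("percent identity must be in the range (0, 100]")
    return -100.0 * math.log(percent_identity / 100.0)


# (species, % identity, m as listed in the table)
SAMPLE_ROWS = [
    ("Mastomys coucha", 87.9, 12.9),
    ("Danio rerio", 68.0, 38.6),
    ("Strongyloides ratti", 33.3, 110.0),
]

if __name__ == "__main__":
    for species, identity, listed in SAMPLE_ROWS:
        computed = corrected_divergence(identity)
        print(f"{species}: computed {computed:.1f}, listed {listed}")
```

One row does not match exactly: for Phyllostomus discolor (93.4% identity) this formula gives about 6.8, whereas the table lists 6.70, so the formula should be read as a plausible reconstruction rather than the confirmed method.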

Protein interactions

SMIM14 is predicted to interact with the FATE1 protein, which is involved in Ca²⁺ transfer from the ER to mitochondria, a regulatory mechanism for apoptosis.[22][23] SMIM14 is also predicted to interact with LSM4, a glycine-rich protein that plays a role in pre-mRNA splicing.[24][25]

References

  1. Homo sapiens small integral membrane protein 14 (SMIM14), transcript variant 1, mRNA. 2019-07-07.
  2. SMIM14 orthologs. NCBI. 2020-02-07.
  3. Jun, Mi-Hee; Jun, Young-Wu; Kim, Kun-Hyung; Lee, Jin-A; Jang, Deok-Jin (31 October 2014). Characterization of the cellular localization of C4orf34 as a novel endoplasmic reticulum resident protein. BMB Reports. 47 (10): 563–568. doi:10.5483/bmbrep.2014.47.10.252. PMID 24499674. PMC 4261514.
  4. Chalifa-Caspi, V.; Shmueli, O.; Benjamin-Rodrig, H.; Rosen, N.; Shmoish, M.; Yanai, I.; Ophir, R.; Kats, P.; Safran, M.; Lancet, D. (1 January 2003). GeneAnnot: Interfacing GeneCards with high-throughput gene expression compendia. Briefings in Bioinformatics. 4 (4): 349–360. doi:10.1093/bib/4.4.349. PMID 14725348.
  5. SMIM14 Gene - GeneCards SIM14 Protein SIM14 Antibody. www.genecards.org. 2020-02-25.
  6. Hernández, Greco; Osnaya, Vincent G.; Pérez-Martínez, Xochitl (1 December 2019). Conservation and Variability of the AUG Initiation Codon Context in Eukaryotes. Trends in Biochemical Sciences. 44 (12): 1009–1021. doi:10.1016/j.tibs.2019.07.001. PMID 31353284. S2CID 198966937.
  7. AceView: Gene:C4orf34, a comprehensive annotation of human, mouse and worm genes with mRNAs or ESTs. AceView. www.ncbi.nlm.nih.gov. 2020-04-30.
  8. Cartharius, K.; Frech, K.; Grote, K.; Klocke, B.; Haltmeier, M.; Klingenhoff, A.; Frisch, M.; Bayerlein, M.; Werner, T. (1 July 2005). MatInspector and beyond: promoter analysis based on transcription factor binding sites. Bioinformatics. 21 (13): 2933–2942. doi:10.1093/bioinformatics/bti473. PMID 15860560.
  9. Kent, W. J.; Sugnet, C. W.; Furey, T. S.; Roskin, K. M.; Pringle, T. H.; Zahler, A. M.; Haussler, D. (16 May 2002). The Human Genome Browser at UCSC. Genome Research. 12 (6): 996–1006. doi:10.1101/gr.229102. PMID 12045153. PMC 186604.
  10. 49002542 - GEO Profiles - NCBI. www.ncbi.nlm.nih.gov. 2020-04-30.
  11. small integral membrane protein 14 [Homo sapiens] - Protein - NCBI. www.ncbi.nlm.nih.gov. 2020-04-30.
  12. Brendel, V.; Bucher, P.; Nourbakhsh, I. R.; Blaisdell, B. E.; Karlin, S. (15 March 1992). Methods and algorithms for statistical analysis of protein sequences. Proceedings of the National Academy of Sciences. 89 (6): 2002–2006. doi:10.1073/pnas.89.6.2002. PMID 1549558. PMC 48584. Bibcode 1992PNAS...89.2002B.
  13. Kall, L.; Krogh, A.; Sonnhammer, E. L. L. (8 May 2007). Advantages of combined transmembrane topology and signal peptide prediction--the Phobius web server. Nucleic Acids Research. 35 (Web Server): W429–W432. doi:10.1093/nar/gkm256. PMID 17483518. PMC 1933244.
  14. Gouw, Marc; Michael, Sushama; Sámano-Sánchez, Hugo; Kumar, Manjeet; Zeke, András; Lang, Benjamin; Bely, Benoit; Chemes, Lucía B; Davey, Norman E; Deng, Ziqi; Diella, Francesca; Gürth, Clara-Marie; Huber, Ann-Kathrin; Kleinsorg, Stefan; Schlegel, Lara S; Palopoli, Nicolás; Roey, Kim V; Altenberg, Brigitte; Reményi, Attila; Dinkel, Holger; Gibson, Toby J (4 January 2018). The eukaryotic linear motif resource – 2018 update. Nucleic Acids Research. 46 (D1): D428–D434. doi:10.1093/nar/gkx1077. PMID 29136216. PMC 5753338.
  15. Bonifacino, Juan S.; Traub, Linton M. (June 2003). Signals for Sorting of Transmembrane Proteins to Endosomes and Lysosomes. Annual Review of Biochemistry. 72 (1): 395–447. doi:10.1146/annurev.biochem.72.121801.161800. PMID 12651740.
  16. Combet, C.; Blanchet, C.; Geourjon, C.; Deléage, G. (March 2000). NPS@: Network Protein Sequence Analysis. Trends in Biochemical Sciences. 25 (3): 147–150. doi:10.1016/s0968-0004(99)01540-6. PMID 10694887.
  17. Ashok Kumar, T. (1 April 2013). CFSSP: Chou and Fasman Secondary Structure Prediction server. Wide Spectrum. 1 (9): 15–19. doi:10.5281/ZENODO.50733.
  18. Ren, J.; Wen, L.; Gao, X.; Jin, C.; Xue, Y.; Yao, X. (27 August 2008). CSS-Palm 2.0: an updated software for palmitoylation sites prediction. Protein Engineering Design and Selection. 21 (11): 639–644. doi:10.1093/protein/gzn039. PMID 18753194. PMC 2569006.
  19. Gouw, Marc; Michael, Sushama; Sámano-Sánchez, Hugo; Kumar, Manjeet; Zeke, András; Lang, Benjamin; Bely, Benoit; Chemes, Lucía B; Davey, Norman E; Deng, Ziqi; Diella, Francesca; Gürth, Clara-Marie; Huber, Ann-Kathrin; Kleinsorg, Stefan; Schlegel, Lara S; Palopoli, Nicolás; Roey, Kim V; Altenberg, Brigitte; Reményi, Attila; Dinkel, Holger; Gibson, Toby J (4 January 2018). The eukaryotic linear motif resource – 2018 update. Nucleic Acids Research. 46 (D1): D428–D434. doi:10.1093/nar/gkx1077. PMID 29136216. PMC 5753338.
  20. Blom, Nikolaj; Sicheritz-Pontén, Thomas; Gupta, Ramneek; Gammeltoft, Steen; Brunak, Søren (June 2004). Prediction of post-translational glycosylation and phosphorylation of proteins from the amino acid sequence. Proteomics. 4 (6): 1633–1649. doi:10.1002/pmic.200300771. PMID 15174133. S2CID 18810164.
  21. Altschul, Stephen F.; Gish, Warren; Miller, Webb; Myers, Eugene W.; Lipman, David J. (October 1990). Basic local alignment search tool. Journal of Molecular Biology. 215 (3): 403–410. doi:10.1016/S0022-2836(05)80360-2. PMID 2231712. S2CID 14441902.
  22. FATE1 - Fetal and adult testis-expressed transcript protein - Homo sapiens (Human) - FATE1 gene & protein. www.uniprot.org. 2020-04-30.
  23. Doghman-Bouguerra, Mabrouka; Granatiero, Veronica; Sbiera, Silviu; Sbiera, Iuliu; Lacas-Gervais, Sandra; Brau, Frédéric; Fassnacht, Martin; Rizzuto, Rosario; Lalli, Enzo (September 2016). FATE 1 antagonizes calcium- and drug-induced apoptosis by uncoupling ER and mitochondria. EMBO Reports. 17 (9): 1264–1280. doi:10.15252/embr.201541504. PMID 27402544. PMC 5007562.
  24. LSM4 - U6 snRNA-associated Sm-like protein LSm4 - Homo sapiens (Human) - LSM4 gene & protein. www.uniprot.org. 2020-04-30.
  25. Bertram, Karl; Agafonov, Dmitry E.; Dybkov, Olexandr; Haselbach, David; Leelaram, Majety N.; Will, Cindy L.; Urlaub, Henning; Kastner, Berthold; Lührmann, Reinhard; Stark, Holger (August 2017). Cryo-EM Structure of a Pre-catalytic Human Spliceosome Primed for Activation. Cell. 170 (4): 701–713.e11. doi:10.1016/j.cell.2017.07.011. PMID 28781166. S2CID 12185819.