Chromosome 1 open reading frame 141, or C1orf141, is a protein that, in humans, is encoded by the gene C1orf141.[1] It is a precursor protein that becomes active after cleavage.[2] Its function is not yet well understood, but it is suggested to be active during development.[3]
This gene is located on chromosome 1 at position 1p31.3. It is encoded on the antisense strand of DNA, spanning from 67,092,176 to 67,141,646, and has 10 exons in total. It overlaps slightly with the gene IL23R, which is encoded on the sense strand.
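The coordinate arithmetic implied by this location can be sketched as follows. This is only an illustration: it assumes the stated coordinates are 1-based and inclusive, and that, for a gene on the antisense strand, the start of transcription sits at the higher coordinate. The `GeneSpan` class and its method names are hypothetical helpers introduced here, not part of any annotation tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GeneSpan:
    """A gene interval on a chromosome (assumed 1-based, inclusive coordinates)."""
    start: int   # lower genomic coordinate
    end: int     # higher genomic coordinate
    strand: str  # "+" for sense, "-" for antisense

    def length(self) -> int:
        # Inclusive coordinates: both endpoints count toward the length.
        return self.end - self.start + 1

    def upstream_window(self, size: int) -> tuple[int, int]:
        """Return the (low, high) interval of `size` bp upstream of transcription."""
        if self.strand == "-":
            # Antisense genes are read toward lower coordinates, so
            # "upstream" lies just beyond the higher coordinate.
            return (self.end + 1, self.end + size)
        return (self.start - size, self.start - 1)


# Coordinates as given in the text; the genome assembly is not stated there.
c1orf141 = GeneSpan(start=67_092_176, end=67_141_646, strand="-")
span_bp = c1orf141.length()
upstream_1kb = c1orf141.upstream_window(1000)
print(span_bp, upstream_1kb)
```

Under these assumptions the gene spans 49,471 base pairs, and a 1000 base pair upstream window would cover 67,141,647 to 67,142,646.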
A specific promoter region has not been predicted for C1orf141, so the 1000 base pairs upstream of the start of transcription were analyzed for transcription factor binding sites.[4] The transcription factors below represent a subset of the binding sites found within this region and give an idea of the kinds of factors that could bind to the promoter.
The C1orf141 gene appears to have two common isoforms and seven less common transcript variants.
Isoform | Transcript length (bp) | Protein length (aa)
C1orf141 Isoform 1 | 2177 | 400
C1orf141 Isoform 2 | 2203 | 217
C1orf141 Isoform X1 | 2348 | 471
C1orf141 Isoform X2 | 2265 | 458
C1orf141 Isoform X3 | 1875 | 333
C1orf141 Isoform X4 | 920 | 243
C1orf141 Isoform X5 | 612 | 154
C1orf141 Isoform X6 | 639 | 146
C1orf141 Isoform X7 | 514 | 138
The primary encoded precursor protein (C1orf141 Isoform 1) consists of 400 amino acid residues and is encoded by a transcript 2177 base pairs long. The transcript consists of 7 exons, and the protein contains a domain of unknown function, DUF4545.[5] Its predicted molecular mass is 54.4 kDa and its predicted isoelectric point is 9.63.[6]
The C1orf141 precursor protein has more lysine residues and fewer glycine residues than expected when compared with other human proteins. The sequence is 11.7% lysine and only 2.1% glycine.
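Residue percentages of this kind come from a simple count over the protein sequence. The sketch below shows one way to compute them; the function name `composition_percent` is an illustrative choice, and the short peptide is a made-up toy sequence, not the C1orf141 sequence.

```python
from collections import Counter


def composition_percent(sequence: str) -> dict[str, float]:
    """Return the percentage of each residue in a one-letter protein sequence."""
    # Drop whitespace and line breaks (e.g. from a pasted FASTA body).
    seq = "".join(sequence.split()).upper()
    if not seq:
        raise ValueError("sequence is empty")
    counts = Counter(seq)
    return {residue: 100.0 * n / len(seq) for residue, n in counts.items()}


# Hypothetical toy peptide used only to demonstrate the calculation.
toy_peptide = "MKKGAKLK"
comp = composition_percent(toy_peptide)
print(f"K: {comp['K']:.1f}%  G: {comp['G']:.1f}%")
```

Applying the same function to the full precursor sequence and comparing the result with average human proteome frequencies would show which residues are over- or under-represented.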
C1orf141 is modified post-translationally to form a mature protein product. It undergoes O-linked glycosylation, sumoylation, glycation, and phosphorylation.[7] [8] [9] [10] One N-terminal cleavage occurs, followed by acetylation. Propeptide cleavage occurs at the start site of the final exon.
The secondary structure for uncleaved C1orf141 consists primarily of alpha helices with a few small segments of beta sheets. These helices can be seen in the model of the tertiary structure predicted by the I-TASSER program.[11] The program Phyre2 also predicts the protein to be made up primarily of alpha helices.[12] After propeptide cleavage of C1orf141, I-TASSER predicts that only alpha helices remain.
There are currently no experimentally confirmed interactions for C1orf141. The STRING database for protein interactions identified ten proteins that potentially interact with C1orf141 through text mining.[13] These are SALT1, C8orf74, SHCBP1L, ACTL9, RBM44, CCDC116, ADO, WDR78, ZNF365, and SPATA45.[14] [15] [16] [17] On inspection of the papers from which these predictions were drawn, no clear link to C1orf141 could be established for any of the identified proteins.
C1orf141 is expressed in 30 different tissues but primarily in the testes. Other tissues where expression is above baseline levels are the brain, lungs, and ovaries.
C1orf141 is predicted to localize to the nucleus. There are two nuclear localization signals within the protein sequence, one of which remains after propeptide cleavage.[18]
The function of C1orf141 is not yet fully understood and has not been experimentally confirmed. However, expression data suggest that the gene is active in some developmental stages. RNA-Seq data taken at different stages of development show expression at varying levels throughout. In the gene's EST profile, expression is higher in the fetal developmental stage than in the adult.[19] Microarray data for cumulus cells during natural and stimulated in vitro fertilization show relatively high levels of expression.[20] There is no significant change in expression in adult tissue disease states.
C1orf141 has no known paralogs.[21]
Orthologous sequences are seen primarily in other mammalian species. The most distant ortholog identified through an NCBI BLAST search is from a reptilian species, which is the only non-mammalian species found. The list below contains a subset of the species identified as having orthologs, chosen to display the diversity of species in which orthologs can be found. Each species was compared with the human C1orf141 isoform that includes every coding exon, isoform X1.
Species | Common name | Taxonomic group | Accession number | Divergence from humans (MYA) | Sequence length (aa) | Identity | Similarity
Homo sapiens | Human | Primate | XP_011539768.1 | 0 | 471 | 100% | 100%
Gorilla gorilla gorilla | Western Lowland Gorilla | Primate | XP_018892062.1 | 8.61 | 469 | 97% | 98%
Otolemur garnettii | Northern Greater Galago | Primate | XP_023365656.1 | 84 | 457 | 59% | 70%
Tupaia chinensis | Northern Treeshrew | Scandentia | XP_006171456.1 | 88 | 468 | 62% | 74%
Oryctolagus cuniculus | European Rabbit | Lagomorpha | XP_017201685.1 | 88 | 470 | 56% | 68%
Fukomys damarensis | Damaraland Mole Rat | Rodentia | XP_010603404.1 | 88 | 479 | 54% | 66%
Chinchilla lanigera | Long-tailed Chinchilla | Rodentia | XP_013369940.1 | 94 | 476 | 50% | 65%
Ochotona princeps | American Pika | Lagomorpha | XP_012783463.1 | 94 | 450 | 50% | 67%
Miniopterus natalensis | Natal Long-fingered Bat | Chiroptera | XP_016064273.1 | 94 | 390 | 63% | 72%
Panthera pardus | Leopard | Carnivora | XP_019304485.1 | 94 | 450 | 62% | 74%
Enhydra lutris kenyoni | Sea Otter | Carnivora | XP_022351992.1 | 94 | 451 | 62% | 74%
Balaenoptera acutorostrata scammoni | Minke Whale | Cetacea | XP_007164359.1 | 94 | 432 | 60% | 60%
Delphinapterus leucas | Beluga Whale | Cetacea | XP_022436606.1 | 94 | 432 | 59% | 72%
Sus scrofa | Wild Boar | Cetartiodactyla | XP_005656203.1 | 94 | 442 | 56% | 70%
Pteropus vampyrus | Large Flying Fox | Chiroptera | XP_011367916.1 | 94 | 470 | 56% | 68%
Ovis aries | Sheep | Cetartiodactyla | XP_012026840.1 | 94 | 431 | 55% | 69%
Bos taurus | Cattle | Cetartiodactyla | NP_001070559.1 | 94 | 430 | 54% | 69%
Condylura cristata | Star-nosed Mole | Eulipotyphla | XP_012577585.1 | 94 | 432 | 52% | 64%
Desmodus rotundus | Common Vampire Bat | Chiroptera | XP_024421106.1 | 94 | 398 | 48% | 59%
Sarcophilus harrisii | Tasmanian Devil | Marsupialia | XP_012405605.1 | 160 | 356 | 43% | 63%
Phascolarctos cinereus | Koala | Marsupialia | XP_020848724.1 | 160 | 204 | 29% | 50%
Monodelphis domestica | Gray Short-tailed Opossum | Marsupialia | XP_007480481.1 | 160 | 524 | 25% | 48%
Pogona vitticeps | Central Bearded Dragon | Reptilia | XP_020661721.1 | 320 | 501 | 28% | 54%
Using the molecular clock hypothesis, the m value (the number of corrected amino acid changes per 100 residues) was calculated for C1orf141 and plotted against the divergence time of each species. Compared with the corresponding m value plots for hemoglobin, fibrinogen alpha chain, and cytochrome c, C1orf141 is evolving at a faster rate than all three.
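The text does not state how m was derived, so the sketch below assumes the commonly used correction for multiple substitutions, m = -100 ln(1 - n/100), where n is the observed percentage of differing residues. It also assumes that the fifth and seventh columns of the ortholog table are divergence time in millions of years (MYA) and percent identity to human isoform X1. The function name `corrected_changes_per_100` is an illustrative choice.

```python
import math


def corrected_changes_per_100(identity_percent: float) -> float:
    """Estimate m from percent identity using m = -100 * ln(1 - n/100).

    n is the observed percent difference (100 - identity). The formula is an
    assumption here; the source only defines m verbally.
    """
    n = 100.0 - identity_percent
    if not 0.0 <= n < 100.0:
        raise ValueError("identity must be greater than 0% and at most 100%")
    return -100.0 * math.log(1.0 - n / 100.0)


# (assumed divergence in MYA, assumed percent identity) read from the ortholog table.
examples = {
    "Gorilla gorilla gorilla": (8.61, 97.0),
    "Bos taurus": (94.0, 54.0),
    "Pogona vitticeps": (320.0, 28.0),
}

m_values = {}
for species, (divergence_mya, identity) in examples.items():
    m_values[species] = corrected_changes_per_100(identity)
    print(f"{species}: {divergence_mya} MYA, m = {m_values[species]:.1f}")
```

Plotting m against divergence time for each ortholog, and fitting a line through the points, gives a slope that can be compared with the slopes for slower- or faster-evolving reference proteins.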