Chromosome 1 open reading frame 167 (C1orf167) is a protein that in humans is encoded by the C1orf167 gene.[1] Its NCBI accession number is NP_001010881. The protein is 1468 amino acids in length with a molecular weight of 162.42 kDa. The mRNA sequence is 4689 base pairs in length.[2] [3]
The gene is located on the plus strand of chromosome 1 at position 1p36.22 and spans positions 11,824,457 to 11,849,503.[4]
C1orf167 has one known alias, Chromosome 1 Open Reading Frame 167.[5]
The gene contains 26 exons.
A splice region that is conserved in primate orthologs of the C1orf167 mRNA was located between exon 1 and exon 2.[6]
The mRNA sequence has 8 known splice isoforms as determined by the conserved domains.[7] The isoforms span the regions 426-863, 981-1418, 954-1391, 999-1329, 999-1400, 999-1436, 999-1404, and 999-1463 of the mRNA sequence.[8]
Alternative splicing produces two known isoforms of the human protein: XP_006711141.1, which is 1489 aa in length, and XP_003307860.2, which is 713 aa in length.[9] [10]
The protein has an isoelectric point (pI) of 11. The predicted molecular weight (MW) is 160 kDa for the human protein, but ranges from 140 to 180 kDa for more distant orthologs.[11] Compositional analysis revealed the most abundant amino acid to be alanine (A), at 12.4% of the total protein. The analysis also revealed the C1orf167 protein to be rich in tryptophan (W) and deficient in tyrosine (Y) and isoleucine (I).[12]
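A compositional analysis of this kind amounts to counting residues and dividing by the sequence length. The following is a minimal sketch of that calculation, not the tool used for the cited analysis; the function name `composition` and the toy sequence are assumptions made for illustration, and the toy sequence is not the C1orf167 sequence.

```python
from collections import Counter


def composition(seq: str) -> dict[str, float]:
    """Return the percent composition of each residue in a one-letter protein sequence.

    Residues are returned from most to least abundant.
    """
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    counts = Counter(seq)
    total = len(seq)
    # most_common() orders residues by descending count
    return {aa: 100.0 * n / total for aa, n in counts.most_common()}


# Hypothetical toy sequence for illustration only (NOT the C1orf167 sequence)
toy = "MAAWKAALRA"
comp = composition(toy)
most_abundant = next(iter(comp))  # first key is the most abundant residue
```

Applied to the full protein sequence, the same calculation would give the percentage reported for each residue, such as the alanine value quoted above.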
C1orf167 is predicted to be localized to the cell nucleus.[13]
C1orf167 is predicted to undergo phosphorylation, O-glycosylation, SUMOylation, glycation, and cleavage by staphylococcal peptidase I (Q105, Q321) and glutamyl endopeptidase (Q1101).[14] [15] [16] [17] [18]
Predicted feature | H. sapiens | T. manatus latirostris | U. parryii | D. novaehollandiae | P. vitticeps | C. milii |
---|---|---|---|---|---|---|
SUMOylation | K22 | IVTLE 447-451, K604, K605, VRVVP 684-688 | VAVVD 502-506 | K434 | K57, K128, K578, K993, K1388 | ISILH 121-125, K264, K477, K497, K522, IVSIC 621-625, LCLVY 703-707, VVVLR 975-979, VLQLR 1027-1031, K1199, K1208 |
O-GlcNAcylation | Many* | Similar distribution (but more sites) | Similar distribution (but fewer sites) | Similar distribution (but fewer sites) | Similar distribution | Similar distribution (but fewer sites) |
Glycation of ε amino groups of lysines | K: 22, 114, 323, 399, 433, 505, 701, 710, 720, 832, 975, 1138, 1279, 1306, 1394, 1418 | K: 335, 516, 534, 605, 747, 757, 1080, 1125, 1189, 1382 | K: 114, 123, 333, 462, 651, 660, 661, 938, 1111, 1149 | K: 72, 103, 128, 133, 183, 240, 241, 248, 290, 398, 437, 466, 483, 494, 505, 552, 589, 718, 767, 772, 820, 974, 1106 | K: 14, 57, 60, 89, 96, 128, 133, 157, 275, 423, 488, 578, 619, 647, 890, 900, 952, 983, 993, 1208, 1279, 1288 | K: 4, 56, 106, 131, 163, 169, 177, 235, 291, 480, 566, 660, 666, 717, 780, 814, 827, 853, 857, 936, 954, 964, 974, 986, 1015, 1079, 1208 |
Nuclear Export Signal | L84 | L808 | L84 | L589 | V869, L874 | L186, L188, L1117 |
Phosphorylation | Many* | Similar distribution | Similar distribution | Similar distribution | Similar distribution | Similar distribution |
Proteinase Cleavage Sites | Q105, Q321, Q1101 | Q441, Q1030 | Q72 | Q60 | Q90, Q155, Q498 | Q520, Q809, Q908 |
One domain of unknown function, located from 954aa-1418aa, is 465 amino acids in length.
C1orf167 was determined to be rich in alpha helices. No notable regions of beta pleated sheets or coils were predicted.[19] In particular, high confidence was indicated for 42 alpha helices with the longest alpha helix region spanning from residues 450aa to 1182aa. This long alpha helix region includes a significant portion of the conserved DUF which spans 954aa-1418aa.[20] [21] [22] [23] [24]
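How much of the DUF falls inside the long helical region follows from the residue ranges quoted above. The sketch below works through that arithmetic using inclusive residue coordinates; the helper names `span_length` and `overlap` are assumptions introduced for illustration.

```python
def span_length(start: int, end: int) -> int:
    """Number of residues in an inclusive range [start, end]."""
    return end - start + 1


def overlap(a: tuple[int, int], b: tuple[int, int]) -> int:
    """Number of residues shared by two inclusive ranges (0 if they do not overlap)."""
    start = max(a[0], b[0])
    end = min(a[1], b[1])
    return max(0, end - start + 1)


helix = (450, 1182)  # longest predicted alpha helix region, from the text
duf = (954, 1418)    # domain of unknown function, from the text

duf_len = span_length(*duf)   # 465 residues, matching the stated domain length
shared = overlap(helix, duf)  # residues 954-1182 lie in both regions
fraction = shared / duf_len   # share of the DUF covered by the helix region
```

With these ranges, 229 of the 465 DUF residues, roughly 49%, fall within the long helical region.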
The best-aligned structural analog of C1orf167 generated by I-TASSER had a confidence score (C-score) of -0.68 on a range of [-5, 2], where higher values indicate higher confidence. Per SWISS-MODEL, two monomers are predicted to form an alpha helix.[25] Both of the helices are aligned facing outwards, with residues such as glutamic acid (E) on the interior and asparagine (N), serine (S), and lysine (K) on the exterior. Asparagine residues may serve as an important oligosaccharide binding site.[26]
C1orf167 has high expression in the larynx, blood, placenta, testis, and prostate, with the highest expression found in the testis.[27] The promoter GXP_5109290 spans 1507 base pairs on chromosome 1.[28] GXP_5109290 was found to be conserved in the bonobo (Pan paniscus), gorilla (Gorilla gorilla gorilla), mouse (Mus musculus), chimpanzee (Pan troglodytes), and rhesus monkey (Macaca mulatta).[29] [30]
There were 10 interactions identified by STRING.[31]
No known paralogs or paralogous domains were identified for C1orf167.
Orthologs of C1orf167 were identified using NCBI BLAST. No orthologs could be found in single-celled organisms or in fungi whose genomes have been sequenced. Among multicellular organisms, orthologs were found in mammals, birds, reptiles, and cartilaginous fishes. The table below shows a representative sample of 20 of the orthologs of C1orf167. The table is organized by the date of divergence from humans, in millions of years ago (MYA), and then by sequence similarity.
Genus and Species | Common Name | Taxonomic Group | Date of Divergence (MYA) | Accession # | Sequence Length | Sequence Identity | Sequence Similarity | |
---|---|---|---|---|---|---|---|---|
Homo sapiens | Humans | Mammalia | 0 | NP_001010881.1 | 1449aa | 100% | 100% | |
Pan troglodytes | Chimpanzee | Mammalia (primate) | 6.6 | XP_024212133.1 | 1442aa | 97% | 97% | |
Piliocolobus tephrosceles | Ugandan Red Colobus | Mammalia (primate) | 29 | XP_026303745.1 | 1453aa | 87% | 90% | |
Macaca fascicularis | Crab-eating Macaque | Mammalia (primate) | 29.4 | XP_015298104.1 | 1444aa | 87% | 90% | |
Trichechus manatus latirostris | American Manatee | Mammalia (sirenia) | 76 | XP_023587965.1 | 1631aa | 49% | 56% | |
Marmota flaviventris | Yellow-bellied Marmot | Mammalia (rodentia) | 90 | XP_027803235.1 | 1284aa | 49.16% | 57% | |
Galeopterus variegatus | Sunda Flying Lemur | Mammalia (dermoptera) | 90 | XP_008588133.1 | 1439aa | 54% | 60% | |
Camelus ferus | Bactrian Camel | Mammalia (artiodactyla) | 90 | XP_014421294.1 | 1442aa | 53% | 62% | |
Miniopterus natalensis | Natal Clinging Bat | Mammalia (chiroptera) | 96 | XP_016061116.1 | 1644aa | 48.64% | 56% | |
Desmodus rotundus | Common Vampire Bat | Mammalia (chiroptera) | 96 | XP_024410696.1 | 1548aa | 47.97% | 56% | |
Ictidomys tridecemlineatus | Thirteen-lined Ground Squirrel | Mammalia (rodentia) | 96 | XP_021576066.1 | 1349aa | 47.59% | 56% | |
Urocitellus parryii | Arctic Ground Squirrel | Mammalia (rodentia) | 96 | XP_026253666.1 | 1299aa | 46.47% | 55% | |
Myotis brandtii | Brandt's Bat | Mammalia (chiroptera) | 105 | XP_014400940.1 | 1390aa | 50.19% | 59% | |
Dromaius novaehollandiae | Emu | Aves | 312 | XP_025951247.1 | 1154aa | 31.56% | 47% | |
Pseudopodoces humilis | Ground Tit | Aves | 312 | XP_014112713.1 | 1415aa | 30.34% | 47% | |
Columba livia | Rock Dove | Aves | 312 | XP_021137589.1 | 1430aa | 30.45% | 46% | |
Anser cygnoides domesticus | Swan Goose | Aves | 312 | XP_013043263.1 | 1126aa | 27% | 40% | |
Alligator sinensis | Chinese Alligator | Reptilia | 312 | XP_025067177.1 | 1626aa | 34% | 45% | |
Pogona vitticeps | Central Bearded Dragon | Reptilia | 312 | XP_020637641.1 | 1388aa | 27.76% | 38% | |
Callorhinchus milii | Australian Ghostshark | Chondrichthyes | 473 | XP_007896104.1 | 1210aa | 29% | 43% |
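
The identity and similarity columns distinguish exact residue matches from matches that also include conservative substitutions. The sketch below illustrates the difference on a toy pairwise alignment. It is a simplified illustration, not the BLAST calculation: the residue groups in `GROUPS` are an assumption, whereas BLAST counts as "positives" the substitutions that score above zero in its scoring matrix (BLOSUM62 by default for protein searches). The function name and the toy sequences are likewise hypothetical.

```python
# Simplified conservative-substitution groups (an assumption for illustration;
# BLAST uses positive scores from a substitution matrix instead).
GROUPS = ["ILVM", "FWY", "KRH", "DE", "ST", "NQ", "AG"]
GROUP_OF = {aa: i for i, group in enumerate(GROUPS) for aa in group}


def identity_and_similarity(a: str, b: str) -> tuple[float, float]:
    """Percent identity and percent similarity for two aligned sequences.

    Both strings must have the same length, with '-' marking gaps.
    Percentages are taken over the full alignment length, gaps included.
    """
    if len(a) != len(b) or not a:
        raise ValueError("aligned sequences must be non-empty and equal in length")
    same = similar = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue  # gap columns never count as matches
        if x == y:
            same += 1
            similar += 1  # identical residues also count as similar
        elif x in GROUP_OF and GROUP_OF[x] == GROUP_OF.get(y):
            similar += 1  # conservative substitution within one group
    n = len(a)
    return 100.0 * same / n, 100.0 * similar / n


# Hypothetical toy alignment (NOT taken from the C1orf167 orthologs)
identity, similarity = identity_and_similarity("MKVLDE-A", "MRVIDQSA")
```

Because every identical column also counts as similar, similarity is always at least as high as identity, which is the pattern seen throughout the table.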
At this time the function of C1orf167 is uncharacterized.
According to the EST profile broken down by health state, expression of C1orf167 was higher in leukemia, head and neck cancer, and lung cancer than in healthy cells. Based on results from NCBI GEO Profiles, C1orf167 showed increased expression in dendritic cells of patients with Chlamydia pneumoniae infections. Increased expression of C1orf167 was also indicated in human pulmonary tuberculosis tissue containing caseous tuberculosis granulomas, compared with normal lung tissue.[32]