Chromosome 4 open reading frame 50 (C4orf50) is a protein that in humans is encoded by the C4orf50 gene.[1] The protein localizes to the nucleus. C4orf50 has orthologs in vertebrates but not in invertebrates.[2]
The C4orf50 gene is located on the minus strand of chromosome 4, at cytogenetic band 4p16.2.[3] Its longest transcript isoform consists of 11 exons and a coding sequence of 6370 nucleotides, and it contains an upstream in-frame stop codon.[4] Neighboring genes include CRMP1 and JAKMIP1.
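An upstream in-frame stop codon is a stop codon that lies 5' of the annotated start codon in the same reading frame; it indicates that the open reading frame cannot extend further upstream, which supports the annotated ATG as the true start. The minimal sketch below scans for such a codon. Because the gene is on the minus strand, its transcript corresponds to the reverse complement of the plus-strand reference, so a reverse-complement helper is included. The function names and example sequences are hypothetical and purely illustrative.

```python
from typing import Optional

STOP_CODONS = {"TAA", "TAG", "TGA"}
# Translation table for complementing DNA bases (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(dna: str) -> str:
    """Minus-strand sequence read 5'->3' from a plus-strand sequence."""
    return dna.translate(_COMPLEMENT)[::-1]


def upstream_in_frame_stop(mrna: str, start: int) -> Optional[int]:
    """Return the 0-based index of the nearest in-frame stop codon 5' of the start codon.

    `start` is the 0-based index of the first base of the annotated ATG.
    Codons are read backwards in the same frame. None means no in-frame stop
    exists upstream, so the ORF could in principle extend further 5'.
    """
    mrna = mrna.upper()
    pos = start - 3
    while pos >= 0:
        if mrna[pos:pos + 3] in STOP_CODONS:
            return pos
        pos -= 3
    return None


if __name__ == "__main__":
    # Hypothetical toy transcript: TAG | CCC | ATG AAA TGA
    print(upstream_in_frame_stop("TAGCCCATGAAATGA", start=6))
```

Note that a stop codon that is out of frame with the start codon (for example, beginning one base after a codon boundary) does not count, which is why the scan steps back in whole codons.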
C4orf50 is 1508 amino acids long and has a calculated molecular weight of 30 kDa. Its predicted isoelectric point is approximately pH 5.6.[5] The protein also contains higher-than-average proportions of glutamic acid and arginine, and lower-than-average proportions of phenylalanine and tyrosine.[6]
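Calculated molecular weight, isoelectric point, and amino acid composition are all derived directly from the protein sequence. The sketch below illustrates the usual approach: molecular weight as the sum of average residue masses plus one water, and isoelectric point as the pH at which the Henderson-Hasselbalch net charge is zero, found by bisection. The pKa set is an assumption (close to the values used by EMBOSS); other tools, such as ExPASy ProtParam, use different pKa values, so results will not match any particular tool exactly. The function names and the toy peptide are hypothetical.

```python
"""Rough physicochemical estimates for a protein sequence (illustrative sketch)."""

# Average residue masses in daltons (free amino acid mass minus one water).
RESIDUE_MASS = {
    "A": 71.0788, "R": 156.1875, "N": 114.1038, "D": 115.0886, "C": 103.1388,
    "E": 129.1155, "Q": 128.1307, "G": 57.0519, "H": 137.1411, "I": 113.1594,
    "L": 113.1594, "K": 128.1741, "M": 131.1926, "F": 147.1766, "P": 97.1167,
    "S": 87.0782, "T": 101.1051, "W": 186.2132, "Y": 163.1760, "V": 99.1326,
}
WATER = 18.01524  # added once for the free N- and C-termini

# Assumed pKa set (roughly the EMBOSS values); other tools differ slightly.
PKA_POSITIVE = {"Nterm": 8.6, "K": 10.8, "R": 12.5, "H": 6.5}
PKA_NEGATIVE = {"Cterm": 3.6, "D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}


def molecular_weight(seq: str) -> float:
    """Average molecular weight (Da) of an unmodified linear chain."""
    return sum(RESIDUE_MASS[aa] for aa in seq.upper()) + WATER


def net_charge(seq: str, ph: float) -> float:
    """Net charge of the chain at a given pH (Henderson-Hasselbalch)."""
    seq = seq.upper()
    counts = {aa: seq.count(aa) for aa in "KRHDECY"}
    counts["Nterm"] = counts["Cterm"] = 1
    positive = sum(counts[g] / (1 + 10 ** (ph - pka)) for g, pka in PKA_POSITIVE.items())
    negative = sum(counts[g] / (1 + 10 ** (pka - ph)) for g, pka in PKA_NEGATIVE.items())
    return positive - negative


def isoelectric_point(seq: str, tol: float = 1e-4) -> float:
    """Bisect for the pH of zero net charge (net charge falls as pH rises)."""
    lo, hi = 0.0, 14.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if net_charge(seq, mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def composition(seq: str) -> dict:
    """Percentage of each standard residue in the sequence."""
    seq = seq.upper()
    return {aa: 100 * seq.count(aa) / len(seq) for aa in sorted(RESIDUE_MASS)}


if __name__ == "__main__":
    # Hypothetical toy peptide, NOT the C4orf50 sequence.
    toy = "MEERRLEEKRAESEEQRRLE"
    print(f"MW ~ {molecular_weight(toy):.1f} Da")
    print(f"pI ~ {isoelectric_point(toy):.2f}")
    print({aa: round(pct, 1) for aa, pct in composition(toy).items() if pct > 0})
```

Comparing the output of `composition()` with average residue frequencies across the proteome is what underlies statements that a protein is enriched or depleted in particular amino acids. As a rule of thumb, the average residue mass is about 110 Da, so a chain of 1508 residues would be expected to weigh on the order of 1508 × 110 ≈ 166,000 Da.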
I-TASSER and Phyre2 predict that C4orf50 has a tertiary structure rich in alpha helices, concentrated near the N- and C-termini.[7][8]
C4orf50 RNA is expressed at low levels in most tissues, with much higher expression in the brain, testis, adrenal gland, and prostate. Within the brain, expression is particularly high in the hippocampus and striatum, with moderate expression in the frontal lobe, parietal lobe, and amygdala. All available RNA-sequencing datasets detect C4orf50 in the brain.
C4orf50 is predicted to have 21 phosphorylation sites, one sulfation site, one N-glycosylation site, and several O-glycosylation sites.[9]
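N-glycosylation predictions start from the consensus sequon N-X-S/T, where X is any residue except proline. The sketch below only finds candidate sequons; predictors such as NetNGlyc score these candidates further, and a sequon alone does not mean the site is actually glycosylated. The function name and example peptides are hypothetical.

```python
import re

# N, then any residue except proline, then S or T. The zero-width lookahead
# lets overlapping sequons (e.g. "NNST") all be reported.
SEQUON = re.compile(r"(?=N[^P][ST])")


def n_glyc_sequons(protein: str) -> list:
    """1-based positions of asparagines that begin an N-X-S/T sequon."""
    return [m.start() + 1 for m in SEQUON.finditer(protein.upper())]


if __name__ == "__main__":
    # Hypothetical peptide: "NGS" at position 2 qualifies; "NPT" at 6 does not.
    print(n_glyc_sequons("MNGSANPT"))
```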
Its primary subcellular location is the nucleus: immunofluorescent staining with antibodies against C4orf50 shows the protein in the nucleus, although the reason for this localization remains unknown.[10] C4orf50 is less abundant than most human proteins.[10]
Orthologs
C4orf50 is poorly conserved across species. It is found in vertebrates but not invertebrates, with orthologs in mammals, reptiles, birds, amphibians, and fish.[11] Table 1 below lists selected orthologs from each of these groups. C4orf50 is evolving relatively quickly compared with the reference proteins cytochrome c and fibrinogen alpha, as shown in the figure to the right comparing the divergence rates of the three proteins.
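Divergence-rate comparisons of this kind are commonly made by converting the fraction of differing residues between the human protein and each ortholog into a corrected distance, then plotting that distance against the estimated divergence date; a steeper slope means faster sequence evolution. The sketch below assumes the simple Poisson correction d = -ln(1 - p) and a least-squares line forced through the origin; the original comparison may have used a different correction. The function names and input values are hypothetical.

```python
import math


def corrected_distance(p: float) -> float:
    """Poisson-corrected substitutions per site from the fraction p of differing sites."""
    if not 0 <= p < 1:
        raise ValueError("p must be in [0, 1)")
    return -math.log(1 - p)


def rate_through_origin(times_mya: list, distances: list) -> float:
    """Least-squares slope of distance vs. divergence time, forced through the origin.

    The slope is in substitutions per site per million years between the two
    species; divide by 2 for the rate along a single lineage.
    """
    num = sum(t * d for t, d in zip(times_mya, distances))
    den = sum(t * t for t in times_mya)
    return num / den


if __name__ == "__main__":
    # Hypothetical (divergence time in MYA, percent identity) pairs for one protein.
    data = [(87, 90.0), (319, 60.0), (464, 45.0)]
    times = [t for t, _ in data]
    dists = [corrected_distance(1 - ident / 100) for _, ident in data]
    print(f"slope ~ {rate_through_origin(times, dists):.4f} substitutions/site/MY")
```

Running the same calculation for each protein and comparing slopes gives the kind of rate comparison described above; the correction matters mainly for distant orthologs, where multiple substitutions at the same site are common.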
Genus and Species | Common Name | Taxonomic Group | Median Date of Divergence (MYA) | Accession Number | Sequence Length (aa) | Sequence Identity to Human Protein (%) | Sequence Similarity to Human Protein (%)
---|---|---|---|---|---|---|---
Homo sapiens | Human | Primates | 0 | XP_047271622 | 1508 | 100 | 100
Tupaia chinensis | Chinese Tree Shrew | Tupaiidae | 85 | XP_027622007 | 1448 | 93 | 53.2
Mus musculus | House Mouse | Rodentia | 87 | XP_006504299 | 1238 | 90 | 41.9
Talpa occidentalis | Iberian Mole | Talpidae | 94 | XP_037386436 | 1364 | 79 | 44.3
Halichoerus grypus | Grey Seal | Phocidae | 94 | XP_035960566 | 1536 | 85 | 51
Mauremys mutica | Yellow Pond Turtle | Testudines | 319 | XP_044874448 | 1954 | 62 | 30.5
Alligator mississippiensis | American Alligator | Crocodilia | 319 | XP_019333198 | 1893 | 37 | 28.3
Apteryx rowi | Okarito Kiwi | Apterygiformes | 319 | XP_025910622 | 1459 | 8 | 47.2
Aquila chrysaetos chrysaetos | Golden Eagle | Accipitriformes | 319 | XP_040979081 | 1611 | 10 | 38.3
Gallus gallus | Chicken | Galliformes | 319 | XP_046772670 | 1627 | 7 | 44.6
Anser cygnoides | Swan Goose | Anseriformes | 319 | XP_047902118 | 1596 | 18 | 31.7
Falco cherrug | Saker Falcon | Falconiformes | 319 | XP_027669980 | 1518 | 8 | 50.4
Strigops habroptila | Kakapo | Psittaciformes | 319 | XP_030347251 | 1497 | 8 | 50.4
Geotrypetes seraphini | Gaboon Caecilian | Dermophiidae | 353 | XP_033815404 | 1897 | 11 | 37.8
Amblyraja radiata | Thorny Skate | Rajiformes | 464 | XP_032876992 | 2434 | 74 | 50.8
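Percent identity counts identical residues at aligned positions, while percent similarity also counts conservative substitutions, so identity can never exceed similarity for the same alignment. Tools define both quantities differently (EMBOSS Needle, for instance, counts aligned pairs with a positive substitution-matrix score as similar), so the sketch below makes two stated assumptions: Clustal's "strong" conservation groups decide which substitutions count as similar, and the alignment length (ignoring gap-gap columns) is the denominator. The function names and example alignments are hypothetical.

```python
# Clustal "strong" conservation groups, used here (as an assumption) to decide
# whether two different residues count as similar.
SIMILAR_GROUPS = ["STA", "NEQK", "NHQK", "NDEQ", "QHRK", "MILV", "MILF", "HY", "FYW"]


def is_similar(a: str, b: str) -> bool:
    """True if two residues are identical or share a conservation group."""
    return a == b or any(a in g and b in g for g in SIMILAR_GROUPS)


def identity_similarity(aln1: str, aln2: str) -> tuple:
    """Percent identity and similarity over the columns of a pairwise alignment.

    Both strings are rows of the same alignment, with '-' for gaps. Gap-gap
    columns are ignored; a gap opposite a residue is neither identical nor similar.
    """
    if len(aln1) != len(aln2):
        raise ValueError("aligned sequences must have equal length")
    columns = [(a, b) for a, b in zip(aln1.upper(), aln2.upper())
               if not (a == "-" and b == "-")]
    identical = sum(a == b and a != "-" for a, b in columns)
    similar = sum(a != "-" and b != "-" and is_similar(a, b) for a, b in columns)
    return 100 * identical / len(columns), 100 * similar / len(columns)


if __name__ == "__main__":
    # Hypothetical alignment: M/L and K/R are conservative, V/V is identical.
    print(identity_similarity("MKV", "LRV"))
```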