c7orf26 (Chromosome 7, Open Reading Frame 26) is a human gene that encodes the protein c7orf26 (uncharacterized protein c7orf26). Based on the properties of the protein and its conservation over a long period of evolutionary time, it is predicted to localize to the cytoplasm and to play a role in regulating transcription.
Chromosome 7 is one of the 23 pairs of chromosomes in the human body; it spans about 159 million base pairs and represents about 5-5.5% of the total DNA in cells.[1] Changes to the structure of chromosome 7 can result in a number of genetic abnormalities, including Williams syndrome, which causes structural and cosmetic changes to the human body, ultimately resulting in a shorter lifespan.[2] Hundreds of open reading frames (ORFs) are known along chromosome 7, but little is known about the 26th, which is of considerable interest.
Currently, two isoforms of c7orf26 are known in Homo sapiens, referred to as isoforms 1 and 2.[3]
c7orf26 (accession: NM_024067 / NP_076972; alias: MGC-2178) is located on the short arm of chromosome 7 (7p22.1), starting at position 6590021 and ending at position 6608726. The c7orf26 gene spans 2178 base pairs and is oriented on the + strand. The coding region encodes a protein 449 amino acids long. The gene gives rise to 6 transcripts containing a total of 24 exons on the forward strand and has 5952 unique single nucleotide polymorphisms (SNPs).[4]
The genes ZDHHC4, ZNF853 and ZNF316 neighbor c7orf26 on chromosome 7.[5] ZDHHC4 encodes a zinc-finger protein involved in cytochrome-c oxidase activity and protein-cysteine S-palmitoyltransferase activity, and it has regions that overlap c7orf26.[6] GRID2IP lies more than 2000 bp upstream of c7orf26 and is heavily involved in synaptogenesis and synaptic plasticity.[7]
c7orf26 is highly expressed in lymphatic, reproductive, and nervous tissue. These include the brain (frontal and occipital cortex), thymus glands, salivary glands, endometrium, cervix, and prostate. It is intermediately expressed in the lungs.[8]
No paralogs of c7orf26 have been found in the human genome; however, six unique isoforms have been identified: c7orf26 isoforms X1, X2, X3 and X4, and isoform 2 (for which two sub-isoforms have been identified).[9]
Below is a table of orthologs of the human c7orf26. The table includes closely, intermediately and distantly related orthologs,[10] listed in descending order of the date of divergence. c7orf26 is highly conserved throughout all orthologs, as demonstrated by a 65% identity in the least similar ortholog. c7orf26 has evolved slowly and evenly over time.
| Lingula anatina | Lingulata | Invertebrata | 916 | 406 | 72% | 66% |
| Cryptotermes secundus | West Indian Drywood Termite | Invertebrata | 797 | 403 | 77% | 70% |
| | Acorn Worm | Invertebrata | 794 | 407 | 86% | 81% |
| | Burmese Python | Reptilia | 286 | 329 | 72% | 66% |
| | Speckled Mousebird | Aves | 273 | 309 | 87% | 83% |
| Callorhinchus milii | Australian Ghostshark | Chondrichthyes | 177 | 441 | 70% | 57% |
| Cynoglossus semilaevis | Tonguefish | Osteichthyes | 128 | 307 | 67% | 52% |
| | Clownfish | Osteichthyes | 117 | 481 | 65% | 50% |
| | Cape Elephant Shrew | Mammalia | 105 | 445 | 91% | 88% |
| | Ugandan Red Colobus | Mammalia | 102 | 449 | 88% | 84% |
| | Bleeding Heart Monkey | Primate | 43.6 | 617 | 91% | 88% |
| | Black Capped Squirrel Monkey | Primate | 43.2 | 585 | 92% | 90% |
| Homo sapiens | Human | Primate | 0 | 449 | 100% | 100% |
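Percentages such as those in the ortholog table are typically derived from a pairwise alignment. The following is a minimal sketch of one common way to compute percent identity, assuming two already-aligned sequences with `-` marking gaps; the fragments `human` and `ortholog` are hypothetical toy strings, not real c7orf26 sequence, and alignment tools differ in how they treat gapped columns.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of identical residues over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            continue  # skip gapped columns (an assumption; some tools count them)
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


# Hypothetical aligned fragments for illustration only.
human = "MLLEAL-KQLLS"
ortholog = "MLIEALGKQ-LT"
print(f"{percent_identity(human, ortholog):.1f}% identity")
```

Here 10 ungapped columns are compared and 8 of them match, so the toy pair scores 80% identity.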
Below is a phylogenetic tree showing the evolutionary history of c7orf26 and its nearest orthologs.
The molecular weight of c7orf26 is 50 kilodaltons and its isoelectric point is 7.61. The protein sequence is notably rich in leucine, which makes up 15.8% of its composition; this may indicate a leucine zipper. Further analysis with PSORT indicates a leucine-zipper region starting at amino acid 318 and extending to position 340 (22 amino acids long). There are no extremes with regard to acidity or alkalinity. c7orf26 has a positive charge cluster from amino acid 245 to 275 and does not have any negative or mixed charge clusters.[11]
The amino acid composition of c7orf26 is fairly even, and the percent composition of each amino acid is fairly consistent throughout the orthologs of the protein. The most distant ortholog displays the most variance in amino acid composition. There is a higher percent composition of tyrosine, histidine and leucine, and a lower composition of valine and alanine.
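A leucine zipper is often screened for with the PROSITE-style pattern L-x(6)-L-x(6)-L-x(6)-L, which covers 22 residues. The sketch below scans a sequence for that pattern and reports leucine content; it is an illustration only, not the PSORT procedure, and `toy` is a hypothetical sequence rather than c7orf26.

```python
import re

# L-x(6)-L-x(6)-L-x(6)-L: four leucines spaced seven residues apart (22 residues).
# The lookahead lets overlapping candidate matches be reported.
ZIPPER = re.compile(r"(?=(L.{6}L.{6}L.{6}L))")


def find_leucine_zippers(seq: str) -> list[tuple[int, int]]:
    """Return 1-based inclusive (start, end) positions of candidate zipper motifs."""
    return [(m.start() + 1, m.start() + 22) for m in ZIPPER.finditer(seq)]


def leucine_percent(seq: str) -> float:
    """Percentage of residues in the sequence that are leucine."""
    return 100.0 * seq.count("L") / len(seq)


# Hypothetical 26-residue sequence containing one zipper-like repeat.
toy = "MK" + "LAAAAAA" * 3 + "L" + "GS"
print(find_leucine_zippers(toy))
print(f"{leucine_percent(toy):.1f}% leucine")
```

A pattern match only flags a candidate region; the motif is short enough to occur by chance, so hits are usually checked against predicted helical structure.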
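A composition comparison of this kind can be sketched as follows, assuming plain one-letter protein sequences. The two strings are hypothetical placeholders chosen for round numbers, not the human protein or any real ortholog.

```python
from collections import Counter


def composition(seq: str) -> dict[str, float]:
    """Percent composition of each amino acid present in the sequence."""
    total = len(seq)
    return {aa: 100.0 * n / total for aa, n in Counter(seq).items()}


def composition_difference(reference: str, other: str) -> dict[str, float]:
    """Percentage-point change per residue (other minus reference)."""
    ref, oth = composition(reference), composition(other)
    residues = sorted(set(ref) | set(oth))
    return {aa: oth.get(aa, 0.0) - ref.get(aa, 0.0) for aa in residues}


# Hypothetical 10-residue sequences for illustration only.
reference_seq = "LLLAAAVVVK"
distant_seq = "LLLLYHAVKK"
for aa, delta in composition_difference(reference_seq, distant_seq).items():
    print(f"{aa}: {delta:+.1f} percentage points")
```

Positive values mark residues that are enriched in the second sequence and negative values mark residues that are depleted.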
c7orf26 is predicted to be heavily phosphorylated as a post-translational modification. There are 66 predicted phosphorylation sites according to the NetPhos predictor of phosphorylation sites.[12] There are 4 unique sumoylation sites according to the SUMOplot/SUMOsp programs.[13] Sumoylation is involved in a number of cellular processes, including nuclear-cytosolic transport, transcriptional regulation and protein stability.
According to the DAS-TMFilter server,[14] c7orf26 has zero predicted transmembrane sites or transmembrane protein coding regions; it is therefore unlikely to be a transmembrane protein.
Using the GOR (Garnier-Osguthorpe-Robson)[15] method, c7orf26 is predicted to have a secondary structure composed of alpha helices, random coil regions and extended strands. Random coil regions are the most common, constituting 53.23% of the protein, while alpha helices constitute 34.30% and extended strands 12.47%.
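As a quick consistency check, the GOR percentages can be converted back into residue counts, assuming they refer to the 449-residue human protein. The helper name below is illustrative and not part of any GOR tool.

```python
def residues_from_percentages(length: int, percentages: dict[str, float]) -> dict[str, int]:
    """Convert per-state percentages into the nearest whole number of residues."""
    return {state: round(length * pct / 100.0) for state, pct in percentages.items()}


# Percentages as reported for c7orf26; the length of 449 aa is taken from the text.
gor_percentages = {"random coil": 53.23, "alpha helix": 34.30, "extended strand": 12.47}
counts = residues_from_percentages(449, gor_percentages)

for state, n in counts.items():
    print(f"{state}: {n} residues ({100.0 * n / 449:.2f}%)")
print("total:", sum(counts.values()))
```

Under that assumption the three states correspond to 239, 154 and 56 residues, which sum to the full 449 and reproduce the reported percentages to two decimal places.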
According to PSORT, c7orf26 is predicted to be localized in the cytoplasm with 70.6% confidence.[16]
c7orf26 interacts with 11 different proteins, according to the Mentha interactome browser.[17] In particular, c7orf26 interacts with the 'INTS' family (Integrator Complex Subunits 1-7). The Integrator complex associates with the C-terminal domain of the large subunit of RNA polymerase II and is involved in the transcription of small nuclear RNAs and the processing of their transcripts. The Integrator complex also mediates recruitment of cytoplasmic dynein to the nuclear envelope.
Outside of the INTS gene family, c7orf26 interacts with AK5,[18] HDGF, and ASUN.[19]
According to Guirato et al. (2018), there is some evidence that regions on chromosome 7 may be directly linked to a nuclear estrogen receptor (ESR2) that modulates cancer cell proliferation and tumor growth.[20] Fu et al. (2014) give a further indication that regions along chromosome 7, located between open reading frames 20-30, directly correlate with cellular functions of hepatoma-derived growth factor (HDGF), suggesting another possible role in tumorigenesis.[21]