CCDC130 explained

Coiled-coil domain containing 130 is a protein that in humans is encoded by the CCDC130 gene. It is part of the U5 portion of the U4/U5/U6 tri-snRNP. This tri-snRNP comes together with other proteins to form complex B of the spliceosome. The mature protein is approximately 45 kilodaltons (kDa) and is extremely hydrophilic due to its unusually high number of charged and polar amino acids.^[1] CCDC130 is a highly conserved protein; orthologous genes in some yeasts and plants were found using nucleotide and protein versions of the Basic Local Alignment Search Tool (BLAST) from the National Center for Biotechnology Information.^[2] GEO profiles for CCDC130 show that this protein is ubiquitously expressed, with the highest levels of expression found in T-lymphocytes.

Function

While the specific function of CCDC130 is still unknown, several studies have identified it as a component of the U5 portion of the U4/U5/U6 tri-snRNP, which joins Complex A to form Complex B of the human spliceosome. Complex B then undergoes further modifications and conformational changes before becoming a mature spliceosome. One study that compared the human spliceosome with that of yeast to assess the conservation of spliceosomal components categorized CCDC130 as a known splicing factor whose yeast homolog is Yju2.^[3] This yeast protein is a splicing factor that helps form the complete, active spliceosome and promotes the first step of splicing, which involves cleavage at the 5' splice site of the first exon. Based on this information, CCDC130 likely plays a similar role in the human spliceosome, although, given the higher complexity of the human spliceosome, it may perform additional functions or a completely different function. Because of its high number of predicted phosphorylation sites, the protein may be activated and recruited to the spliceosomal complex through phosphorylation or dephosphorylation (see Post-translational modifications). Because this gene is ubiquitously expressed, at a level 2.9 times higher than the average gene, the protein probably plays an important part in the proper function of the spliceosome.

Gene

Aliases

Coiled-coil domain containing 130 has several aliases, including CCDC130, SB115, LOC81576, and MGC10471.

Locus

CCDC130 is located on the short arm of chromosome 19 in humans, at locus 19p13.2. The entire gene spans positions 13858753 to 13874106 on the + strand of chromosome 19. CCDC130 is bordered upstream by CACNA1A on the - strand and by glatobu, smagly, and socho on the + strand, and downstream by MGC3207, C19orf53, and ZSWIM4 on the + strand and by joypaw, smeygly, floytobu, smawgly, and wycho on the - strand. Glatobu, smagly, socho, joypaw, smeygly, floytobu, smawgly, and wycho have only been verified by cDNA sequences in GenBank, and no information is available about their function. Several small genes are also found within the CCDC130 sequence: snugly, glytobu, stygly, and glartobu on the + strand, and chacho, zoycho, spogly, glotobu, glutobu, and sneygly on the - strand. All of these small genes have extremely low levels of expression (under 3% of the expression of the average gene), with stygly having the highest expression at 2.8% of the average.

Promoter

Several promoters were predicted for CCDC130 using ElDorado from Genomatix, but the promoter that corresponds most closely to the protein sequence is 760 bases long and spans positions 13858094 to 13858853 on chromosome 19.^[4]
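The coordinates quoted for the gene and the predicted promoter can be checked with simple interval arithmetic. The sketch below assumes closed, 1-based coordinates (an assumption; the text does not state the convention, although the quoted promoter length of 760 bases is consistent with it). The Interval class and overlap function are illustrative helpers, not part of any tool named in the article.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    """Closed, 1-based genomic interval on a single chromosome (assumed convention)."""
    start: int
    end: int

    def length(self) -> int:
        # Both endpoints are included, hence the +1.
        return self.end - self.start + 1


def overlap(a: Interval, b: Interval) -> int:
    """Number of bases shared by two closed intervals (0 if they are disjoint)."""
    return max(0, min(a.end, b.end) - max(a.start, b.start) + 1)


# Coordinates quoted in the text (chromosome 19, + strand).
gene = Interval(13858753, 13874106)
promoter = Interval(13858094, 13858853)

gene_length = gene.length()              # bases covered by the gene
promoter_length = promoter.length()      # bases covered by the promoter
shared = overlap(gene, promoter)         # promoter bases inside the gene
upstream = gene.start - promoter.start   # promoter bases before the gene start

print(gene_length, promoter_length, shared, upstream)
```

Under these assumptions the gene covers 15,354 bases, and the 760-base promoter extends 659 bases upstream of the annotated gene start and 101 bases into it.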

Homology and evolution

Paralogs

There is only one paralog identified for CCDC130, which is CCDC94, the only other known human member in the CWC16 family of proteins. The two have about 27% identity, most of which is located in the COG5134 domain and at the C-terminus. CCDC94 has three predicted serine phosphorylation sites at positions 213, 220, and 306 that line up with serines in CCDC130 in the multiple sequence alignment and a threonine phosphorylation site that lines up with a phosphorylated serine in CCDC130.^[5]

Orthologs and homologs

CCDC130 is a highly conserved protein, with true orthologs present in primates, other mammals, amphibians, reptiles, fish, and even invertebrates, such as insects and marine invertebrates. Bird orthologs have not been found in nucleotide or protein BLAST searches. Homologous genes have been documented in yeasts and other fungi, as well as in plants. It is unclear when the most distant homolog of CCDC130 arose, but it was well before the divergence of autotrophs and heterotrophs.

Sequence	Genus	Species	Common Name	Date of Divergence (mya)	Accession #	Sequence Length (aa)	% Identity
1	Homo	sapiens	Human	N/A	NP_110445	396	100
2	Saimiri	boliviensis	Squirrel monkey	42.6	XP_003941759	396	94
3	Ailuropoda	melanoleuca	Giant panda	94.2	XP_002921062	392	88
4	Canis	lupus familiaris	Dog	94.2	XP_542031	397	87
5	Bos	taurus	Cow	94.2	NP_001069812	400	86
6	Sus	scrofa	Wild boar	94.2	XP_003123393	398	86
7	Cricetulus	griseus	Chinese hamster	92.3	XP_003501975	383	79
8	Mus	musculus	Mouse	92.3	NP_080626	385	78
9	Sarcophilus	harrisii	Tasmanian devil	162.6	XP_003760711	367	77
10	Anolis	carolinensis	Anole lizard	296	XP_003216443	373	62
11	Xenopus	laevis	African clawed frog	371.2	NP_001086365	384	61
12	Danio	rerio	Zebrafish	400.1	NP_991158	390	56
13	Takifugu	rubripes	Pufferfish	400.1	XP_003972319	379	65
14	Amphimedon	queenslandica	Sponge	716.5	XP_003388671	299	46
15	Culex	quinquefasciatus	Mosquito	782.7	XP_001846118	329	53
16	Bombus	impatiens	Bumblebee	782.7	XP_003485202	314	55
17	Caenorhabditis	remanei	Nematode	937.5	XP_003094402	365	44
18	Schizosaccharomyces	pombe	Yeast	1215.8	NP_595734.2	294	27
19	Cucumis	sativus	Cucumber	1369	XP_004135117	313	47
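The "% Identity" column reports how many aligned positions carry the same residue in the human protein and in each ortholog. As a rough illustration of the idea, the following sketch computes percent identity from two already-aligned sequences. It is an assumption that gapped columns are excluded from the denominator; some tools instead divide by the full alignment length, and the article does not say which definition was used. The two short sequences are hypothetical and are not taken from CCDC130.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of ungapped alignment columns in which both residues match.

    Both inputs must come from the same pairwise alignment, with '-' as gap.
    Columns containing a gap are skipped (an assumed convention).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matches = 0
    compared = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" or y == "-":
            continue  # gapped column: not counted
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


# Hypothetical toy alignment, for illustration only.
toy_a = "MKR-LEEA"
toy_b = "MKRQLDEA"
identity = percent_identity(toy_a, toy_b)
print(round(identity, 1))
```

In the toy alignment, seven columns are ungapped and six of them match, so the function returns 6/7 expressed as a percentage.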

Conserved regions

CCDC130 has two conserved domains and a coiled-coil region. The first is the COG5134 domain, which is conserved even in cucumber and likely plays a role in the function of the protein, because it is consistently the most highly conserved region in multiple sequence alignments. It spans approximately the first 170 amino acids of the protein. The other domain is the DUF572 domain, a eukaryotic domain of unknown function that is shared by all of the orthologs and a majority of the more distant homologs. This domain does not have a defined range, as different sources have reported different lengths, with some stating that it covers the entire protein. The coiled-coil region spans residues 182-214 in the human protein and is rich in charged amino acids. The modified residues are also very well conserved.

Protein

The most abundant variant of CCDC130 is encoded by the second longest open reading frame (ORF), corresponding to a 396 amino acid protein with a molecular weight of 44.8 kDa and an isoelectric point of 8.252. The CCDC130 protein is rich in charged amino acids and deficient in uncharged, non-polar amino acids. Mobyle @ Pasteur predicted CCDC130 to be extremely hydrophilic due to the large numbers of charged and polar amino acids, with no site scoring above zero on the hydrophobicity graph and some sites reaching as low as -6 (F180). Within the coiled-coil region (182-214) there is a stretch in which 14 of 18 amino acids are charged. SAPS analysis predicted that this protein would be unstable. Given its high hydrophilicity, the protein is not expected to contain transmembrane segments.
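A statement such as "14 of 18 amino acids are charged" comes from counting charged residues in a window of the sequence. The sketch below shows one way to locate the most densely charged window. It assumes that D, E, K, and R count as charged and that histidine does not (an assumption; the article does not give its definition). The example sequence is hypothetical and is not the CCDC130 sequence.

```python
from typing import Tuple

# Residues treated as charged; leaving out histidine is an assumption.
CHARGED = frozenset("DEKR")


def charged_count(seq: str) -> int:
    """Number of charged residues in a protein sequence."""
    return sum(1 for aa in seq.upper() if aa in CHARGED)


def densest_window(seq: str, width: int) -> Tuple[int, int]:
    """Return (1-based start, charged count) of the most charged window.

    If several windows tie, the first one is returned.
    """
    seq = seq.upper()
    if width <= 0 or width > len(seq):
        raise ValueError("width must be between 1 and the sequence length")
    count = charged_count(seq[:width])
    best_start, best_count = 1, count
    for i in range(1, len(seq) - width + 1):
        # Slide the window by one: drop the residue leaving, add the one entering.
        count -= seq[i - 1] in CHARGED
        count += seq[i + width - 1] in CHARGED
        if count > best_count:
            best_start, best_count = i + 1, count
    return best_start, best_count


# Hypothetical toy sequence, for illustration only.
toy = "GASLEEKRKDELAG"
total_charged = charged_count(toy)
window = densest_window(toy, 6)
print(total_charged, window)
```

Applied to the real protein sequence with a width of 18, the same function would report the start and charged count of the densest 18-residue window, which could then be compared with the figure quoted above.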

Variation

The CCDC130 gene produces 17 different mRNAs. Of these, 13 come from alternative splicing and the other four are unspliced. Four alternative promoters, five alternative polyadenylation sites, four alternative last exons, and two instances of intron retention have been described. Fourteen different proteins have been identified from the CCDC130 gene, all of which contain the DUF572 domain, but only five appear to contain the coiled-coil stretch. The other three mRNAs were of very low quality and were not translated. It was also noted that this gene has the potential to encode several non-overlapping proteins. NCBI documents 45 SNPs for CCDC130: 29 missense mutations and 16 synonymous mutations, which do not change the amino acid.

Post-translational modifications

CCDC130 is a heavily phosphorylated protein, with 31 different phosphorylation sites predicted by NetPhos, 26 of which are located in the C-terminal half of the protein.^[5] Of the predicted sites, 17 of 22 serines, 4 of 6 threonines, and 2 of 3 tyrosines had probability scores over 0.800, indicating a high likelihood that they are true phosphorylation sites.^[5] Six sumoylation sites were predicted, but only one of them (K177) had a probability score higher than 0.500, at 0.640.^[6] The physiological function of sumoylation is still poorly understood, but this modification can add a substantial amount of molecular weight to a protein (11 kDa). Thirteen glycation sites with probability scores over 0.500 were predicted, and 10 of the 13 glycated lysines occur in the N-terminal half of the protein.^[7] NetOGlyc predicted 11 possible O-glycosylation sites with probability scores over 0.500, all of which occur in a 64 amino acid span running from T313 to T376.^[8] Several of these sites were predicted as both phosphorylation sites and O-glycosylation sites. CCDC130 was not predicted to be sulfated,^[9] acetylated,^[10] myristoylated,^[11] N-glycosylated,^[12] or C-mannosylated,^[13] or to undergo any GPI modification.^[14]
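The counts in this paragraph can be cross-checked with a few lines of arithmetic. The sketch below uses only the numbers quoted above; the variable names are illustrative.

```python
# Counts quoted in the text: residue -> (predicted sites, sites scoring above 0.800).
predicted = {
    "serine": (22, 17),
    "threonine": (6, 4),
    "tyrosine": (3, 2),
}

total_sites = sum(n for n, _ in predicted.values())
high_confidence = sum(h for _, h in predicted.values())

# 26 of the predicted sites lie in the C-terminal half of the protein.
c_terminal_fraction = 26 / total_sites

# Length of the span from T313 to T376, counting both end residues.
o_glyc_span = 376 - 313 + 1

print(total_sites, high_confidence, round(c_terminal_fraction, 2), o_glyc_span)
```

The per-residue counts add up to the 31 predicted sites, 23 of which score above 0.800, and the T313 to T376 window is 64 residues long, in agreement with the figures in the text.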

Secondary structure

YASPIN predicted a long alpha helix in CCDC130 spanning R121 to A211. Other programs for secondary structure analysis, such as PELE, CHOFAS, and SABLE, also predicted alpha helices of varying lengths in this region.^[15] There were no consistent predictions for beta sheets in CCDC130.

Interaction information

Several proteins are listed as interacting with CCDC130, including EEF1A1, NINL, TRAF2, ZBTB16, ZNF165, and ZNF24. EEF1A1 is a eukaryotic elongation factor that is involved in the binding of aminoacyl-tRNA to the A-site of ribosomes during translation.^[16] NINL is a ninein-like protein that is involved in microtubule organization and has calcium ion binding activity. TRAF2, tumor necrosis factor (TNF) receptor associated factor 2, is part of some E3 ubiquitin ligase complexes and is involved in ubiquitinating proteins so that they can be degraded by the proteasome. ZBTB16, zinc finger and BTB domain-containing protein 16, is also part of an E3 ubiquitin ligase complex and is most likely involved in substrate recognition. ZNF165 and ZNF24 are both zinc finger proteins, which bind DNA and other proteins to regulate transcription. There is also an alternate form of CCDC130 in which only 803 bases are transcribed instead of 1433 bases, but no additional information is provided.^[17]

GeneCards has assembled a table of the proteins that interact with CCDC130. According to STRING, the interactions of CCDC130 with NINL, ZNF24, TRAF2, JUP, and GATA5 have been verified by a two-hybrid screen, which gives experimental support for these interactions. JUP is a plaque protein. GATA5 is a transcription factor that helps activate the promoter for lactase-phlorizin hydrolase. Interactions with CDA, DERA, CDC40, NAA25, DGCR14, NAA20, and PRPF19 have not been verified experimentally, but STRING documents interactions between the homologous genes in other species, so these interactions could potentially occur. The interactions with ZBTB16, EEF1A1, and ZNF165 have each been verified by at least one two-hybrid screen according to MINT. NAT9 is described as a known interactant on I2D.

A study at the University of the District of Columbia that set out to characterize CCDC130 found that it is induced through insulin signaling, is targeted by three different kinases (GSK3, CK1, and CK2), and is a mitochondrial protein.^[5] The study also suggests that CCDC130 could potentially be used as a biomarker for certain types of cancer because of its differential expression in cancer cells. It specifically reports that CCDC130 is downregulated in some types of colon cancer, which allowed more cancer cells to escape the apoptosis pathway.

Expression

CCDC130 is a ubiquitously expressed protein: the level of expression varies greatly between tissues, but at least some expression is seen in every tissue and cell sample analyzed. The AceView profile for CCDC130 shows expression levels 2.9 times higher than the average gene. According to NCBI GEO profiles and BioGPS data, the fetal thyroid, adrenal cortex, uterus, prostate, testes, seminiferous tubule, heart, PB-CD4+ T cells, PB-CD8+ T cells, lymph node, lung, thymus, thyroid, leukemia chronic myelogenous K562, and leukemia lymphoblastic molt4 samples all had expression levels above the 75th percentile for gene expression in at least one of two samples. Gene expression was lower than the 25th percentile in at least one of two samples for cerebellum peduncles, occipital lobe, pons, trigeminal ganglion, subthalamic nucleus, superior cervical ganglion (whose two samples showed drastically different expression levels), dorsal root ganglion, fetal liver, uterus corpus, atrioventricular node, appendix, skeletal muscle, cardiac myocytes, tongue, and salivary gland. PB-CD8+ T cells had the highest relative CCDC130 expression and the tongue had the lowest. For more information about CCDC130 expression, see the mouse brain expression data or human brain microarray data from the Allen Brain Atlas, or the differential expression data in GEO profiles from NCBI.
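The "at least one of two samples" criterion used above can be written as a small classification rule. The sketch below is a minimal illustration: the percentile cut-offs and all expression values are hypothetical, and the function name classify is not taken from GEO or BioGPS. Because each replicate is tested independently, a tissue whose two samples differ sharply can receive both labels.

```python
from typing import Dict, Sequence, Set


def classify(replicates: Sequence[float], p25: float, p75: float) -> Set[str]:
    """Label a tissue from its replicate expression values.

    'high' if any replicate exceeds the 75th-percentile cut-off,
    'low' if any replicate falls below the 25th-percentile cut-off.
    """
    labels: Set[str] = set()
    if any(value > p75 for value in replicates):
        labels.add("high")
    if any(value < p25 for value in replicates):
        labels.add("low")
    return labels


# Hypothetical cut-offs and expression values, for illustration only.
P25, P75 = 10.0, 50.0
samples: Dict[str, Sequence[float]] = {
    "sample_a": (62.0, 48.0),  # one replicate above P75
    "sample_b": (7.5, 12.0),   # one replicate below P25
    "sample_c": (55.0, 6.0),   # replicates disagree strongly
    "sample_d": (20.0, 30.0),  # both replicates in the middle range
}

labels = {name: classify(values, P25, P75) for name, values in samples.items()}
for name in sorted(labels):
    print(name, sorted(labels[name]))
```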

Medical information

Microarray studies of cancer cells have shown CCDC130 to be differentially expressed in several cancers, including breast, colon, and pancreatic cancer.^[18] It was shown to be down-regulated in colon cancers, suggesting that it could serve as a biomarker for cancer. Research to confirm its usefulness as a cancer identifier is ongoing. Several online sources also state that it is involved in the cell's response to viral infection, but they give no specific information or elaboration.

Notes and References

1. CCDC130 Analysis. Biology Workbench. San Diego Supercomputer Center, University of California San Diego. Retrieved 7 May 2013.
2. NCBI. National Library of Medicine. Retrieved 2 April 2013.
3. Fabrizio P, Dannenberg J, Dube P, Kastner B, Stark H, Urlaub H, Lührmann R. The evolutionarily conserved core design of the catalytic activation step of the yeast spliceosome. Molecular Cell. 36 (4): 593–608. November 2009. PMID 19941820. doi:10.1016/j.molcel.2009.09.040. hdl:11858/00-001M-0000-0010-9378-C.
4. El Dorado. Genomatix Software GmbH. Retrieved 11 April 2013. Archived from the original on 2 December 2021: https://web.archive.org/web/20211202010908/https://www.genomatix.de/.
5. Blom N, Gammeltoft S, Brunak S. Sequence and structure-based prediction of eukaryotic protein phosphorylation sites. Journal of Molecular Biology. 294 (5): 1351–62. 1999. PMID 10600390. doi:10.1006/jmbi.1999.3310.
6. SUMOplot Analysis Program. Abgent, a WuXi AppTec Company. Retrieved 14 May 2013.
7. Johansen MB, Kiemer L, Brunak S. Analysis and prediction of mammalian protein glycation. Glycobiology. 16 (9): 844–53. 2006. PMID 16762979. doi:10.1093/glycob/cwl009. CiteSeerX 10.1.1.128.831.
8. Julenius K, Mølgaard A, Gupta R, Brunak S. Prediction, conservation analysis, and structural characterization of mammalian mucin-type O-glycosylation sites. Glycobiology. 15 (2): 153–64. 2005. PMID 15385431. doi:10.1093/glycob/cwh151.
9. Monigatti F, Gasteiger E, Bairoch A, Jung E. The Sulfinator: predicting tyrosine sulfation sites in protein sequences. Bioinformatics. 18 (5): 769–70. 2002. PMID 12050077. doi:10.1093/bioinformatics/18.5.769.
10. Kiemer L, Bendtsen JD, Blom N. NetAcet: prediction of N-terminal acetylation sites. Bioinformatics. 21 (7): 1269–70. 2005. PMID 15539450. doi:10.1093/bioinformatics/bti130.
11. Bologna G, Yvon C, Duvaud S, Veuthey AL. N-Terminal myristoylation predictions by ensembles of neural networks. Proteomics. 4 (6): 1626–32. 2004. PMID 15174132. doi:10.1002/pmic.200300783. S2CID 20289352.
12. Gupta R, Jung E, Brunak S. Prediction of N-glycosylation sites in human proteins. NetNGlyc 1.0. Center for Biological Sequence Analysis, Technical University of Denmark. 2004. Retrieved 13 May 2013.
13. Julenius K. NetCGlyc 1.0: prediction of mammalian C-mannosylation sites. Glycobiology. 17 (8): 868–76. 2007. PMID 17494086. doi:10.1093/glycob/cwm050.
14. big-PI Predictor. GPI Lipid Anchor Project, I.M.P. Bioinformatics. Retrieved 14 May 2013. Archived from the original on 21 July 2020: https://web.archive.org/web/20200721214740/http://mendel.imp.ac.at/sat/gpi/gpi_server.html.
15. SABLE Secondary Structure Prediction. Cincinnati Children's Hospital Medical Center. Retrieved 14 May 2013.
16. NextProt. CCDC130 interacting proteins. Swiss Institute of Bioinformatics. Retrieved 14 May 2013.
17. GeneCards. Weizmann Institute of Science. Retrieved 14 May 2013.
18. Wang Y, Sun G, Ji Z, Xing C, Liang Y. Weighted change-point method for detecting differential gene expression in breast cancer microarray data. PLOS ONE. 7 (1): e29860. 20 January 2012. PMID 22276133. PMC 3262809. doi:10.1371/journal.pone.0029860. Bibcode 2012PLoSO...729860W.