ZNF337, also known as zinc finger protein 337, is a protein that in humans is encoded by the ZNF337 gene. The ZNF337 gene is located on human chromosome 20 (20p11.21), contains 6 exons in total, and produces a 4,237 base pair mRNA; the encoded protein contains 751 amino acids.[1] In addition, alternative splicing results in multiple transcript variants.[2] The ZNF337 gene encodes a protein containing zinc finger domains; however, the gene and its protein are not yet well understood by the scientific community. The gene has been proposed to participate in processes such as DNA-dependent regulation of transcription, and its protein is expected to have molecular functions such as DNA binding, metal ion binding, and zinc ion binding, and to localize to various subcellular locations.[3] [4] While there are no commonly associated or known aliases, an important paralog of this gene is ZNF875.[5]
There are no commonly associated or known aliases beyond zinc finger protein 337, although LOC26152 may be one.[6] The gene is located on the short arm of chromosome 20 at band 11.21 (20p11.21), and its base coordinates are on the negative (minus) strand. There are 6 exons in total. From the start of transcription to the polyA site, the mRNA spans 4,237 base pairs.
The ZNF337 gene has two transcript variants, both encoding the same 751-aa protein: variant 1 is the longer transcript, while variant 2 differs in the 5’ UTR. There are also three isoforms (X1, X2, and X3), each the product of a different splice variant of the gene.
ZNF337 has a predicted molecular weight of about 86.9 kDa and a predicted isoelectric point (pI) of 9.74.[7] These values are only predictions, since post-translational modifications could change both. As suggested by the protein's name, it contains several zinc fingers. It has no high-scoring hydrophobic or transmembrane segments and no positive or negative charge clusters.[8]
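A predicted pI is the pH at which the expected charges of all ionizable groups cancel. A minimal sketch of that idea, assuming approximate textbook pKa values (prediction tools use their own, often position-dependent, pKa sets, so their results differ), finds that pH by bisection on the Henderson-Hasselbalch net charge. The full 751-residue sequence is not reproduced in this article, so the sketch runs on the 76-residue antibody immunogen fragment quoted later in the article; its estimate is not expected to match the 9.74 predicted for the whole protein. The names `net_charge`, `isoelectric_point`, and `FRAGMENT` are illustrative.

```python
# Sketch: isoelectric point (pI) by bisection on the Henderson-Hasselbalch net charge.
# ASSUMPTION: the pKa values below are approximate textbook values, not those
# of any particular prediction tool, so the result is only an estimate.

PKA_BASIC = {"K": 10.5, "R": 12.5, "H": 6.0}            # +1 when protonated
PKA_ACIDIC = {"D": 3.9, "E": 4.1, "C": 8.3, "Y": 10.1}  # -1 when deprotonated
PKA_N_TERM = 9.0
PKA_C_TERM = 2.0

# 76-residue immunogen fragment of ZNF337 quoted in this article (not the full protein).
FRAGMENT = "ESSQGQRENPTEIDKVLKGIENSRWGAFKCAERGQDFSRKMMVIIHKKAHSRQKLFTCRECHQGFRDESALLLHQN"


def net_charge(seq: str, ph: float) -> float:
    """Expected net charge of a peptide at the given pH."""
    # Fraction protonated for a basic group: 1 / (1 + 10**(pH - pKa)).
    positive = 1.0 / (1.0 + 10 ** (ph - PKA_N_TERM))
    for aa, pka in PKA_BASIC.items():
        positive += seq.count(aa) / (1.0 + 10 ** (ph - pka))
    # Fraction deprotonated for an acidic group: 1 / (1 + 10**(pKa - pH)).
    negative = 1.0 / (1.0 + 10 ** (PKA_C_TERM - ph))
    for aa, pka in PKA_ACIDIC.items():
        negative += seq.count(aa) / (1.0 + 10 ** (pka - ph))
    return positive - negative


def isoelectric_point(seq: str, tol: float = 1e-4) -> float:
    """Find the pH where net charge is zero; net charge falls steadily as pH rises."""
    low, high = 0.0, 14.0
    while high - low > tol:
        mid = (low + high) / 2
        if net_charge(seq, mid) > 0:
            low = mid   # still positive: the pI lies at a higher pH
        else:
            high = mid  # zero or negative: the pI lies at this pH or lower
    return (low + high) / 2


if __name__ == "__main__":
    print(f"Estimated pI of the fragment: {isoelectric_point(FRAGMENT):.2f}")
```

Bisection suits this problem because every term in the net charge decreases as pH rises, so there is exactly one zero crossing between pH 0 and 14.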
Some amino acids occur in ZNF337 in unusual amounts. Glutamic acid (E), methionine (M), and alanine (A) are underrepresented, while cysteine (C) and histidine (H) are overrepresented. A high cysteine content is particularly rare in protein sequences; here it reflects the many zinc fingers, in which cysteine and histidine residues coordinate zinc ions. ZNF337 is also an unusually basic protein, and because of these basic properties it is expected to bind negatively charged DNA or RNA fairly easily.
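Composition statements like these come from counting residues. A minimal sketch, applied for illustration to the 76-residue antibody immunogen fragment quoted later in this article (a fragment cannot establish the composition of the full protein), reports per-residue percentages and contrasts basic (K, R) with acidic (D, E) residues. The function names are hypothetical.

```python
from collections import Counter

# 76-residue immunogen fragment of ZNF337 quoted in this article (not the full protein).
FRAGMENT = "ESSQGQRENPTEIDKVLKGIENSRWGAFKCAERGQDFSRKMMVIIHKKAHSRQKLFTCRECHQGFRDESALLLHQN"


def composition(seq: str) -> dict[str, float]:
    """Percentage of each amino acid (one-letter code) in the sequence."""
    counts = Counter(seq)
    return {aa: 100.0 * n / len(seq) for aa, n in sorted(counts.items())}


def charge_balance(seq: str) -> tuple[int, int]:
    """Count basic (K, R) and acidic (D, E) residues.

    Histidine is left out because it is mostly uncharged at neutral pH."""
    counts = Counter(seq)
    return counts["K"] + counts["R"], counts["D"] + counts["E"]


if __name__ == "__main__":
    pct = composition(FRAGMENT)
    for aa in "CHEMA":
        print(f"{aa}: {pct.get(aa, 0.0):.1f}%")
    basic, acidic = charge_balance(FRAGMENT)
    print(f"basic K+R = {basic}, acidic D+E = {acidic}")
```

Running the same functions on the full RefSeq sequence (NP_056470, listed in Table 6) would give whole-protein values.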
As found with the MyHits program (on ExPASy), there are six different motifs (Pfam families) present in ZNF337; their positions and E-values are listed below.[9]
Motif (Pfam family) | Position (aa) | E-value
---|---|---
KRAB (KRAB box) | 12-52 | 6.6e-26
PHD (PHD-finger) | 349-412 | 0.0032
Rpr2 (RNase P Rpr2/Rpr21/SNM1 domain) | 472-551 | 0.00088
Zf-C2H2 (Zinc finger, C2H2 type) | 208-230 | 4.3e-06
Zf-C2H2 | 236-258 | 3.8e-09
Zf-C2H2 | 264-286 | 6e-07
Zf-C2H2 | 292-314 | 2.4e-08
Zf-C2H2 | 320-342 | 4.6e-07
Zf-C2H2 | 348-370 | 1.9e-09
Zf-C2H2 | 376-398 | 2.8e-07
Zf-C2H2 | 404-426 | 5.9e-09
Zf-C2H2 | 432-454 | 1.2e-07
Zf-C2H2 | 460-482 | 1.8e-08
Zf-C2H2 | 488-510 | 3.1e-07
Zf-C2H2 | 516-538 | 3.8e-07
Zf-C2H2 | 544-566 | 2.1e-06
Zf-C2H2 | 572-594 | 2.2e-06
Zf-C2H2 | 600-622 | 5e-07
Zf-C2H2 | 628-650 | 1.3e-08
Zf-C2H2 | 656-679 | 0.00014
Zf-C2H2 | 685-707 | 2.4e-07
Zf-C2H2 | 713-735 | 1.2e-07
Zf-C3HC4 (Zinc finger, C3HC4 type (RING finger)) | 210-269 | 0.00083
Zf-FCS (MYM-type zinc finger with FCS sequence motif) | 342-385 | 0.02
The secondary structure of ZNF337 is predicted to contain helices, sheets, turns, and coils, with random coil the most common, as shown below.[10] [11]
Secondary structure | Residues | Percentage of 751 aa
---|---|---
Alpha helix | 169 | 22.50%
Extended strand | 154 | 20.51%
Random coil | 428 | 56.99%
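Each percentage is that state's residue count divided by the 751 residues of the protein. A minimal sketch, assuming the predictor reports one state letter per residue (h, e, and c here; real tools may use other letters and add states such as turns), tallies such a string. The demonstration uses a synthetic string with the same counts as the table, not the actual per-residue prediction.

```python
from collections import Counter

# ASSUMPTION: one state letter per residue, h = alpha helix,
# e = extended strand, c = random coil.
STATE_NAMES = {"h": "Alpha helix", "e": "Extended strand", "c": "Random coil"}


def summarize(states: str) -> dict[str, tuple[int, float]]:
    """Map each state to (residue count, percentage of all residues, 2 d.p.).

    Raises KeyError for letters not listed in STATE_NAMES."""
    counts = Counter(states)
    total = len(states)
    return {
        STATE_NAMES[s]: (n, round(100 * n / total, 2))
        for s, n in counts.items()
    }


if __name__ == "__main__":
    # Synthetic string with the table's counts; the ORDER of states is not real.
    synthetic = "h" * 169 + "e" * 154 + "c" * 428
    for name, (n, pct) in summarize(synthetic).items():
        print(f"{name}: {n} residues, {pct:.2f}%")
```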
The predicted secondary structures of H. sapiens and P. troglodytes are extremely similar. S. dumerili is notably different: sheets and coils are more prominent between amino acid positions 200-300 and 400-500, where the human and chimpanzee proteins have sheets and helices instead. Additionally, comparing the first 14 positions of all orthologs shows that coils and turns make up most of this N-terminal stretch, though less so in some species such as S. dumerili, which has more helices and sheets there.
Several tertiary structure modeling programs were unable to construct a model for ZNF337. SWISS-MODEL did construct some models, but only using ZNF568 as the template. The ZNF568 protein sequence is 45.20% identical to that of ZNF337, with a sequence similarity of 0.44 and a coverage of 0.37, spanning amino acids 345-623 of the ZNF337 protein sequence.[12] The predicted tertiary structure is shown in Figure 1; the model contains several zinc ion ligands.
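Assuming coverage means the fraction of ZNF337's 751 residues spanned by the template alignment (SWISS-MODEL may compute it slightly differently, for example by excluding alignment gaps), the reported 0.37 follows directly from the range 345-623. The helper name below is hypothetical.

```python
def template_coverage(start: int, end: int, target_length: int) -> float:
    """Fraction of the target sequence spanned by a 1-based inclusive range."""
    if not 1 <= start <= end <= target_length:
        raise ValueError("range must lie within the target sequence")
    return (end - start + 1) / target_length


if __name__ == "__main__":
    # Residues 345-623 of the 751-residue ZNF337 protein, as reported above.
    print(f"coverage = {template_coverage(345, 623, 751):.2f}")
```

This checks only the span; reproducing the 45.20% identity would also need the residue-by-residue alignment, which is not given here.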
ZNF568 is a protein-coding gene associated with diseases such as transient neonatal diabetes mellitus. It represses transcription, partly by recruiting the co-repressor TRIM28 but also independently of this interaction. It is particularly important during embryonic development, where it acts as a direct repressor of a placental-specific transcript of IGF2 and regulates the convergent extension movements required for axis elongation and tissue morphogenesis in all germ layers. It is also crucial for normal morphogenesis of extraembryonic tissues, including the yolk sac, extraembryonic mesoderm, and placenta. It may also enhance the proliferation or maintenance of neural stem cells.[13]
The promoter region was chosen using ElDorado at Genomatix, which assessed the ZNF337 gene locus for possible promoter regions. Of the six candidate promoter sets, promoter set 6 (GXP_8991829) was chosen because it is the best supported by transcripts (six transcript IDs). It starts at position 25696627, ends at position 25697904, and is 1278 base pairs long. Within GXP_8991829 (-), coding transcript GXT_26235925 was chosen because it has 5 exons and 37,403 CAGE tags and corresponds to accession number XM_006723558 in NCBI (see Figure 2).
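The stated length matches the coordinates only if both endpoints are counted, that is, if the positions are 1-based and inclusive. A minimal sketch under that assumption; formats that use 0-based, half-open coordinates (such as BED files) store the same region with the start shifted down by one, which leaves the length unchanged. The helper names are hypothetical.

```python
def closed_interval_length(start: int, end: int) -> int:
    """Length of a 1-based interval that includes both endpoints."""
    if end < start:
        raise ValueError("end must not precede start")
    return end - start + 1


def to_half_open(start: int, end: int) -> tuple[int, int]:
    """Convert 1-based closed coordinates to 0-based half-open (BED-style)."""
    return start - 1, end


PROMOTER = (25696627, 25697904)  # GXP_8991829 start and end as given above

if __name__ == "__main__":
    bed_start, bed_end = to_half_open(*PROMOTER)
    print(closed_interval_length(*PROMOTER), bed_end - bed_start)
```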
The promoter sequence contains a CpG island with a CpG count of 138, as well as a DNase cluster (score = 1000).
Possible transcription factors for the ZNF337 promoter region were also determined using ElDorado at Genomatix. These are listed below in Table 2.
Transcription factor | Detailed matrix information | Anchor base position | Matrix similarity | Sequence
---|---|---|---|---
TF2B | Transcription factor II B (TFIIB) recognition element | 984 | 1.0 | ccgCGCC
VTBP | Avian C-type LTR TATA box | 21 | 0.814 | ctatagtTAAGaacaat
VTBP | Avian C-type LTR TATA box | 743 | 0.825 | ttttattTAGGtagccc
VTBP | Lentivirus LTR TATA box | 314 | 0.83 | gtgTATAatatgctgat
VTBP | Cellular and viral TATA box elements | 177 | 0.961 | ccctaTAAAtatgtaca
VTBP | Cellular and viral TATA box elements | 275 | 0.911 | aaataTAAAgtctacgt
CAAT | Cellular and viral CCAAT box | 553 | 0.909 | taaaCCATtgagaga
CAAT | Nuclear factor Y (Y-box binding factor) | 114 | 0.939 | taccCCAAtcaccct
CEBP | CCAAT/enhancer binding protein (C/EBP), epsilon | 289 | 0.974 | gtggtttgGCAAgcc
The ENCODE 3 Transcription Factor ChIP-seq Clusters track covers 340 factors in 129 cell types.[14] Only the strong signals (shown in black or dark grey) that also have peaks within the promoter or enhancer regions are listed in Table 3.
Location | Transcription factor (ChIP-seq) | Cell type(s)
---|---|---
Promoter | CTCF | GM12878 (human lymphoblastoid), H1-hESC (human embryonic stem cells), K562 (myelogenous leukemia cells)
Promoter | RFX5 | GM12878 (human lymphoblastoid)
Promoter | STAT1 | GM12878 (human lymphoblastoid)
Promoter | TAF1 | GM12878 (human lymphoblastoid)
Promoter | TRIM22 | GM12878 (human lymphoblastoid)
Promoter | REST | H1-hESC (human embryonic stem cells)
Promoter | GABPA | HeLa-S3 (cervical cancer cell line)
Promoter | MAFK | HeLa-S3 (cervical cancer cell line)
Promoter | TBP | HeLa-S3 (cervical cancer cell line)
Promoter | FOXA1 | HepG2 (human liver cancer cell line)
Promoter | SIN3A | HepG2 (human liver cancer cell line)
Promoter | SP1 | HepG2 (human liver cancer cell line)
Promoter | GATA2 | K562 (myelogenous leukemia cells)
Promoter | MYC | K562 (myelogenous leukemia cells)
Promoter | POLR2A | Body of pancreas
Promoter | FOS | Endothelial cell of umbilical vein
According to ORegAnno (a literature-curated database of transcription factor binding sites), there is no overlap with TF ChIP signals within the promoter/enhancer regions. Most ORegAnno citations in this region relate to the NANP gene, although the transcription factors CTCF and CEBPA are confirmed in the enhancer region of the ZNF337 gene.
RNA-seq data from NCBI Gene records and immunohistochemical staining data from the Human Protein Atlas[15] both show that ZNF337 is expressed in many tissues. ZNF337 mRNA has low tissue specificity and is detected in all tissues, with notable expression in the cerebellum. Across tissues, mRNA expression is higher than protein expression, especially in female tissues.
An antibody was developed against a recombinant protein corresponding to amino acids: ESSQGQRENPTEIDKVLKGIENSRWGAFKCAERGQDFSRKMMVIIHKKAHSRQKLFTCRECHQGFRDESALLLHQN. Its specificity was verified on a protein array containing the target protein plus 383 other, non-specific proteins. The antibody is a rabbit polyclonal IgG, purified by immunogen affinity. Staining of human cerebellum with this antibody shows cytoplasmic positivity in Purkinje cells, which regulate and coordinate motor movements through their inhibitory output.[16]
Although ZNF337 is expressed at low to moderate levels in a wide range of tissues, together these results indicate that expression is highest in the brain, particularly the cerebellum. Some experiments also indicate expression in female (and some male) reproductive tissues.
Multiple sequence alignments were created to assess conservation between species. A multiple sequence alignment (MSA) of the ZNF337 promoter region in primates and a marsupial (chimpanzee, human, rhesus monkey, and opossum) shows little to no conservation at the beginning of the sequences.
There are highly conserved regions at the beginning of both the 5’ UTR and 3’ UTR multiple sequence alignments. These regions could be functionally important, for example through stem-loop formation, miRNA binding, or binding by RNA-binding proteins.
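The source does not say how conservation was scored. A minimal sketch, assuming a simple per-column identity score (the fraction of sequences sharing the most common residue, with gaps counting against conservation), shows how conserved stretches can be located in an MSA. The three-sequence alignment in the test is a hypothetical toy example, not ZNF337 data, and the function names are illustrative.

```python
from collections import Counter


def column_conservation(alignment: list[str]) -> list[float]:
    """Per column: fraction of sequences carrying the most common non-gap character.

    Gaps ('-') are excluded from the vote but still count in the denominator."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    scores = []
    for column in zip(*alignment):
        residues = [c.upper() for c in column if c != "-"]
        if not residues:
            scores.append(0.0)  # an all-gap column is treated as unconserved
            continue
        _, top_count = Counter(residues).most_common(1)[0]
        scores.append(top_count / len(column))
    return scores


def conserved_blocks(scores: list[float], threshold: float = 0.9,
                     min_len: int = 3) -> list[tuple[int, int]]:
    """1-based (start, end) runs of at least min_len columns scoring >= threshold."""
    if not 0 < threshold <= 1:
        raise ValueError("threshold must be in (0, 1]")
    blocks: list[tuple[int, int]] = []
    start = None
    # A trailing 0.0 sentinel closes any run that reaches the last column.
    for i, score in enumerate(scores + [0.0], start=1):
        if score >= threshold and start is None:
            start = i
        elif score < threshold and start is not None:
            if i - start >= min_len:
                blocks.append((start, i - 1))
            start = None
    return blocks
```

A real analysis would load the aligned UTR or promoter sequences produced by the alignment tool and pass them to `column_conservation`.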
ZNF337 is predicted to localize mainly to the nucleus (95.7%), followed by the mitochondria (4.3%).
ZNF337 contains many predicted post-translational modification sites, including phosphorylation sites (for serine and tyrosine kinases),[17] a PEST motif,[18] O-GlcNAc sites,[19] a SUMOylation site,[20] and glycation sites,[21] as listed below:
Modification | Predicted site(s) (residue position)
---|---
Phosphorylation | 46, 109, 127, 155, 287, 446, 474, 483, 672, 695, 708, 743, 745, 751
PEST motif | 598-612
O-GlcNAc sites | 109, 142, 231, 384, 750, 751
SUMOylation | 633
Glycation | 94, 123, 125, 199, 206, 234, 248, 309, 339, 374, 388, 407, 430, 449, 486, 547, 556, 617, 668, 730
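To see which predicted modification sites fall inside the motifs found by MyHits, each site can be checked against the motif ranges. A minimal sketch using the ranges and sites from the tables in this article, treating every range as inclusive; the helper names are hypothetical. A site inside a predicted motif is only a hint of possible regulation, not evidence of it.

```python
# Motif ranges (1-based, inclusive) from the MyHits table in this article.
DOMAINS = {
    "KRAB": [(12, 52)],
    "PHD": [(349, 412)],
    "Rpr2": [(472, 551)],
    "Zf-C3HC4": [(210, 269)],
    "Zf-FCS": [(342, 385)],
    "Zf-C2H2": [(208, 230), (236, 258), (264, 286), (292, 314), (320, 342),
                (348, 370), (376, 398), (404, 426), (432, 454), (460, 482),
                (488, 510), (516, 538), (544, 566), (572, 594), (600, 622),
                (628, 650), (656, 679), (685, 707), (713, 735)],
}

# Predicted modification sites from the post-translational modification table.
SITES = {
    "Phosphorylation": [46, 109, 127, 155, 287, 446, 474, 483, 672, 695, 708,
                        743, 745, 751],
    "O-GlcNAc": [109, 142, 231, 384, 750, 751],
    "SUMOylation": [633],
    "Glycation": [94, 123, 125, 199, 206, 234, 248, 309, 339, 374, 388, 407,
                  430, 449, 486, 547, 556, 617, 668, 730],
}


def domains_at(position: int) -> list[str]:
    """Sorted names of all motifs whose range contains the residue position."""
    return sorted(
        name
        for name, ranges in DOMAINS.items()
        if any(start <= position <= end for start, end in ranges)
    )


if __name__ == "__main__":
    for modification, positions in SITES.items():
        for pos in positions:
            hits = domains_at(pos)
            print(f"{modification:16s} {pos:4d}  {', '.join(hits) or '(no listed motif)'}")
```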
No transmembrane domains were predicted by SOSUI.[22] The signal peptide prediction score is very low (-3.83), and the GvH score is also strongly negative (-8.69, with a possible cleavage site between amino acids 56 and 57), indicating a low probability of a cleavable signal sequence. ZNF337 is therefore predicted to have no N-terminal signal peptide. In addition, Reinhardt's method for cytoplasmic/nuclear discrimination gives a cytoplasmic prediction for ZNF337, with a reliability score of 94.1.[23]
The nuclear localization signal score is somewhat low (0.75). Orthologs from P. troglodytes, S. dumerili, and C. asiatica were analyzed to check whether these predictions hold across species. Likewise, none of them was predicted to have an N-terminal signal peptide or transmembrane domains, and all of these orthologous proteins were also predicted to be nuclear (95.7%).
An important paralog of the ZNF337 gene is ZNF875.
ZNF337 has orthologs in a wide variety of species, both vertebrates and invertebrates, including primates, bony fishes, rodents, and even some plants, as seen in Table 6 below. No orthologs were found in organisms more distantly related to humans than plants. The most highly conserved residues and regions lie in the middle to C-terminal part of the protein, whereas the N-terminal part is less conserved between species, suggesting that its function may differ among species.
Phylogenetic trees highlight the evolution of species, specifically in relation to the evolution of the ZNF337 gene. Primates cluster together closest to humans, while other species branch off separately: the megabat and mouse diverge from the cape golden mole, and the zig-zag eel and flier cichlid diverge from the greater amberjack. Species that diverged from the human lineage longer ago (measured in millions of years ago, MYA) show lower sequence similarity and identity, which is also reflected in their distances on the phylogenetic trees.
Genus and species | Common name | Taxonomic group | Divergence from human lineage (MYA) | Accession number | Sequence length | Sequence identity | E-value
---|---|---|---|---|---|---|---
Homo sapiens | Human | Primates | 0 | NP_056470 | 751 aa | 100% | 0
Gorilla gorilla gorilla | Western gorilla | Primates | 8.6 | XP_004061979.1 | 751 aa | 99.5% | 0
Pongo pygmaeus | Bornean orangutan | Primates | 15.2 | XP_009231663.1 | 753 aa | 98.3% | 0
Colobus angolensis palliatus | Angola colobus | Primates | 15.2 | XP_011807556.1 | 758 aa | 96.7% | 0
Aotus nancymaae | Nancy Ma's night monkey | Primates | 42.9 | XP_012324051.1 | 751 aa | 94.4% | 0
Pan troglodytes | Chimpanzee | Primates | 6.4 | XP_009435254.1 | 751 aa | 95.6% | 0
Macaca mulatta | Rhesus macaque | Primates | 28.81 | XP_028683917.1 | 751 aa | 81.4% | 0
Macaca fascicularis | Crab-eating macaque | Primates | 28.81 | XP_015313198.1 | 751 aa | 81.2% | 0
Cebus capucinus imitator | Panamanian white-faced capuchin | Primates | 42.9 | XP_017376089.1 | 751 aa | 80.4% | 0
Pan paniscus | Bonobo | Primates | 6.4 | XP_014198483.1 | 827 aa | 76.8% | 0
Tupaia chinensis | Chinese tree shrew | Scandentia | 85 | XP_006163813.1 | 876 aa | 50.7% | 0
Carlito syrichta | Philippine tarsier | Primates | 69 | XP_021573536.1 | 807 aa | 52.5% | 0
Chrysochloris asiatica | Cape golden mole | Afrosoricida | 102 | XP_006877795.1 | 764 aa | 45.0% | 0
Echinops telfairi | Lesser hedgehog tenrec | Afrosoricida | 102 | XP_030742187.1 | 1487 aa | 26.1% | 0
Seriola dumerili | Greater amberjack | Carangidae ("Bony fishes") | 433 | XP_022604330.1 | 763 aa | 30.9% | 0
Oreochromis niloticus | Nile tilapia | Cichlidae ("Bony fishes") | 433 | XP_019222635.1 | 1033 aa | 10.2% | 0
Archocentrus centrarchus | Flier cichlid | Cichlidae ("Bony fishes") | 433 | XP_030603298.1 | 794 aa | 28.6% | 0
Mastacembelus armatus | Zig-zag eel | Synbranchiformes | 433 | XP_026164592.1 | 760 aa | 28.7% | 0
Pteropodidae | Megabat | Chiroptera | 94 |  | 751 aa | 35.4% | 0
Mus musculus | Mouse | Rodentia | 89 |  | 751 aa | 26.5% | 3.00e-153
Ciona intestinalis | Sea squirt | Enterogona | 603 |  | 1278 aa | 15.6% | 3.00e-96
Petromyzontiformes | Sea lamprey | Lamprey | 599 |  | 751 aa | 7.1% | 4.00e-35
Drosophila sechellia | Fruit fly | Fly | 736 |  | 751 aa | 6.9% | 4.00e-35
Pristionchus pacificus | Roundworm | Rhabditida | 736 |  | 751 aa | 2.9% | 6.00e-15
Caenorhabditis briggsae | Nematode | Rhabditida | 736 |  | 751 aa | 1.9% | 2.00e-08
Camellia japonica | Japanese camellia | Plants | 1275 |  | 751 aa | 1.9% | 2.00e-08
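The trend described above (earlier divergence, lower identity) can be quantified with a rank correlation between the divergence times and identities in Table 6. The source reports no statistic, so this is a sketch under an assumed choice: Spearman's coefficient, which needs only a monotonic trend rather than a linear one, with tied values given average ranks. The function names are illustrative.

```python
# Divergence from the human lineage (MYA) and percent identity, copied from Table 6.
ORTHOLOGS = {
    "Human": (0, 100.0),
    "Western gorilla": (8.6, 99.5),
    "Bornean orangutan": (15.2, 98.3),
    "Angola colobus": (15.2, 96.7),
    "Nancy Ma's night monkey": (42.9, 94.4),
    "Chimpanzee": (6.4, 95.6),
    "Rhesus macaque": (28.81, 81.4),
    "Crab-eating macaque": (28.81, 81.2),
    "Panamanian white-faced capuchin": (42.9, 80.4),
    "Bonobo": (6.4, 76.8),
    "Chinese tree shrew": (85, 50.7),
    "Philippine tarsier": (69, 52.5),
    "Cape golden mole": (102, 45.0),
    "Lesser hedgehog tenrec": (102, 26.1),
    "Greater amberjack": (433, 30.9),
    "Nile tilapia": (433, 10.2),
    "Flier cichlid": (433, 28.6),
    "Zig-zag eel": (433, 28.7),
    "Megabat": (94, 35.4),
    "Mouse": (89, 26.5),
    "Sea squirt": (603, 15.6),
    "Sea lamprey": (599, 7.1),
    "Fruit fly": (736, 6.9),
    "Roundworm": (736, 2.9),
    "Nematode": (736, 1.9),
    "Japanese camellia": (1275, 1.9),
}


def average_ranks(values: list[float]) -> list[float]:
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: list[float], y: list[float]) -> float:
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x: list[float], y: list[float]) -> float:
    """Spearman's rho: the Pearson correlation of the tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    times = [t for t, _ in ORTHOLOGS.values()]
    identities = [p for _, p in ORTHOLOGS.values()]
    print(f"Spearman rho (divergence vs identity): {spearman(times, identities):.2f}")
```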
ZNF337 is evolving rapidly at the molecular level. It appears to accumulate amino acid changes at about the same rate as fibrinogen, and faster than both hemoglobin and cytochrome c, which is known to evolve slowly.
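Rate comparisons like this are often made by plotting a corrected divergence against divergence time for each protein and comparing the slopes. The source does not state its method, so the following is a sketch under that assumption: it applies the Poisson correction m = -100 ln(1 - p/100), where p = 100 - percent identity, to a few orthologs from Table 6. Values for fibrinogen, hemoglobin, and cytochrome c are not given in this article and are not included; the names `SELECTED` and `corrected_divergence` are illustrative.

```python
import math

# (divergence time in MYA, percent identity to human ZNF337), copied from Table 6.
SELECTED = {
    "Western gorilla": (8.6, 99.5),
    "Rhesus macaque": (28.81, 81.4),
    "Philippine tarsier": (69, 52.5),
    "Mouse": (89, 26.5),
    "Cape golden mole": (102, 45.0),
    "Greater amberjack": (433, 30.9),
    "Sea squirt": (603, 15.6),
}


def corrected_divergence(identity_pct: float) -> float:
    """Poisson-corrected divergence m = -100 * ln(identity / 100).

    Raw percent difference undercounts change because a site can be
    substituted more than once; the log correction accounts for that."""
    if not 0 < identity_pct <= 100:
        raise ValueError("identity must be in the interval (0, 100]")
    return -100.0 * math.log(identity_pct / 100.0)


if __name__ == "__main__":
    for name, (mya, identity) in SELECTED.items():
        m = corrected_divergence(identity)
        # m / mya is a slope-like ratio; comparing such slopes between proteins
        # is one way to compare their rates of evolution.
        print(f"{name:18s} {mya:7.2f} MYA  m = {m:6.1f}  m/MYA = {m / mya:.3f}")
```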
Because ZNF337 has several predicted post-translational modification sites, it may exist in alternative modified states that give it different forms.
ZNF337 also has a variety of interactions with other proteins, as discussed above, suggesting it may have a broad range of action. The different transcription factors associated with it point to roles in transcriptional regulation. The KRAB box near the N-terminus may play an important role in cell differentiation and development, as well as in regulating viral replication and transcription.[24] PHD fingers are found in nuclear proteins involved in epigenetics and chromatin-mediated transcriptional regulation.[25] C2H2 zinc finger transcription factors are sequence-specific DNA-binding proteins that regulate transcription; their DNA-binding domains are formed from repeated Cys2His2 zinc finger motifs.[26] In addition, many proteins containing a RING finger play a key role in the ubiquitination pathway.
Of the transcription factors within the strongest DNase HS cluster, only CEBPA was also detected by Genomatix, which identified TF2B, VTBP, CAAT, and CEBP as potential transcription factors. The association of CEBPA with the ZNF337 gene is confirmed by the ENCODE TF ChIP data and ORegAnno, and its cluster score is 1000. Transcription factors that might bind to regulatory sequences, specifically the enhancer region, include CEBPA (chr20:25670005-25670302) and CTCF (chr20:25670168-25670507).
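Whether the CEBPA and CTCF enhancer sites share sequence can be checked from their coordinates. A minimal sketch, assuming the chr20 positions above are 1-based and inclusive, as genome-browser position strings usually are; the helper names are hypothetical.

```python
def closed_length(interval: tuple[int, int]) -> int:
    """Length of a 1-based closed interval."""
    start, end = interval
    return end - start + 1


def overlap(a: tuple[int, int], b: tuple[int, int]) -> int:
    """Bases shared by two 1-based closed intervals on the same chromosome (0 if disjoint)."""
    start = max(a[0], b[0])
    end = min(a[1], b[1])
    return max(0, end - start + 1)


# chr20 coordinates of the enhancer sites given above (assumed 1-based, inclusive).
CEBPA_SITE = (25670005, 25670302)
CTCF_SITE = (25670168, 25670507)

if __name__ == "__main__":
    print(f"CEBPA site: {closed_length(CEBPA_SITE)} bp, CTCF site: {closed_length(CTCF_SITE)} bp")
    print(f"shared bases: {overlap(CEBPA_SITE, CTCF_SITE)}")
```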
Diseases associated with the ZNF337 gene include the development of adult astrocytic tumors,[27] which are the most common glial tumors of the brain and spinal cord.[28] This association is consistent with the high expression of ZNF337 in the brain, particularly the cerebellum.
There are several notable SNPs in the coding sequence of ZNF337, most of which are missense or nonsense mutations.[29] [30]
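The individual SNPs are not listed here, but the categories follow from translating the reference and variant codons with the standard genetic code. A minimal sketch of that classification; the codons in the test are hypothetical examples rather than actual ZNF337 variants, and `classify_snv` is an illustrative name.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, TTG, TCT, ...).
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"  # first base T
    "LLLLPPPPHHQQRRRR"  # first base C
    "IIIMTTTTNNKKSSRR"  # first base A
    "VVVVAAAADDEEGGGG"  # first base G
)
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def classify_snv(ref_codon: str, alt_codon: str) -> str:
    """Classify a change within one codon by its effect on the encoded protein."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    if ref_aa == alt_aa:
        return "synonymous"  # same amino acid, protein unchanged
    if alt_aa == "*":
        return "nonsense"    # premature stop codon truncates the protein
    if ref_aa == "*":
        return "stop-loss"   # the stop codon now encodes an amino acid
    return "missense"        # one amino acid replaced by another
```

For example, a hypothetical TGC to TGA change at a zinc-finger cysteine codon would be classed as nonsense, while TGC to TGT would be synonymous.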