Z curve explained

The Z curve (or Z-curve) method is a bioinformatics algorithm for genome analysis. The Z curve is a three-dimensional curve that uniquely represents a DNA sequence: the curve and the sequence can each be reconstructed from the other.[1] The resulting curve has a zigzag shape, hence the name Z curve.

Background

The Z curve method was introduced in 1994 as a way to visually map a DNA or RNA sequence. Properties of the Z curve, such as its symmetry and periodicity, can give unique information about the DNA sequence.[2] The Z curve is generated from a series of nodes, P0, P1, ..., PN, with the coordinates xn, yn, and zn (n = 0, 1, 2, ..., N, where N is the length of the DNA sequence). The curve is created by connecting the nodes sequentially.[3]

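xn = (An + Gn) - (Cn + Tn)

yn = (An + Cn) - (Gn + Tn)

zn = (An + Tn) - (Cn + Gn)

n = 0, 1, 2, ..., N

Here An, Cn, Gn, and Tn are the cumulative numbers of A, C, G, and T bases among the first n bases of the sequence, so that A0 = C0 = G0 = T0 = 0 and the curve starts at the origin.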
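
The coordinate formulas above can be sketched in a few lines of Python. This is an illustrative sketch, not the reference implementation: the function names `z_curve` and `sequence_from_z_curve` are hypothetical, and only the four unambiguous DNA letters are accepted. The second function shows why the representation is unique: each base produces a distinct step between consecutive nodes, so the sequence can be read back from the curve.

```python
def z_curve(seq):
    """Return the nodes P_0..P_N of the Z curve as (x, y, z) tuples."""
    counts = {"A": 0, "C": 0, "G": 0, "T": 0}  # cumulative base counts
    nodes = [(0, 0, 0)]                        # P_0 is the origin
    for base in seq.upper():
        if base not in counts:
            raise ValueError(f"unsupported base: {base!r}")
        counts[base] += 1
        a, c, g, t = (counts[b] for b in "ACGT")
        nodes.append(((a + g) - (c + t),   # x_n
                      (a + c) - (g + t),   # y_n
                      (a + t) - (c + g)))  # z_n
    return nodes


# Each base moves the curve by a distinct step, so the mapping is invertible.
STEP_TO_BASE = {
    (1, 1, 1): "A",
    (-1, 1, -1): "C",
    (1, -1, -1): "G",
    (-1, -1, 1): "T",
}


def sequence_from_z_curve(nodes):
    """Rebuild the DNA sequence from consecutive node differences."""
    bases = []
    for (x0, y0, z0), (x1, y1, z1) in zip(nodes, nodes[1:]):
        bases.append(STEP_TO_BASE[(x1 - x0, y1 - y0, z1 - z0)])
    return "".join(bases)


if __name__ == "__main__":
    nodes = z_curve("ACGT")
    print(nodes)
    print(sequence_from_z_curve(nodes))
```

For a sequence of length N the curve has N + 1 nodes, because P0 is always the origin.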

Applications

Information on the distribution of nucleotides in a DNA sequence can be determined from the Z curve. The four nucleotides are grouped into six categories, each defined by a shared chemical property and designated by a letter.[4]

Category                 Letter   Bases
Purine                   R        A, G
Pyrimidine               Y        C, T
Amino                    M        A, C
Keto                     K        G, T
Weak hydrogen bonds      W        A, T
Strong hydrogen bonds    S        G, C

The x, y, and z components of the Z curve display the distribution of each of these categories of bases along the DNA sequence being studied. The x-component represents the distribution of purine and pyrimidine bases (R/Y), the y-component the distribution of amino and keto bases (M/K), and the z-component the distribution of weak and strong hydrogen-bond bases (W/S).[5]
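
The link between the six categories and the three components can be illustrated with a short Python sketch. It assumes the sign convention of the coordinate formulas (purine, amino, and weak hydrogen-bond bases count as +1; their counterparts as -1); the names `step` and `component_totals` are hypothetical.

```python
PURINES = set("AG")  # R; the pyrimidines (Y) are C and T
AMINO = set("AC")    # M; the keto bases (K) are G and T
WEAK = set("AT")     # W; the strong hydrogen-bond bases (S) are G and C


def step(base):
    """Return the (dx, dy, dz) increment contributed by a single base."""
    if base not in "ACGT" or len(base) != 1:
        raise ValueError(f"unsupported base: {base!r}")
    dx = 1 if base in PURINES else -1  # R versus Y
    dy = 1 if base in AMINO else -1    # M versus K
    dz = 1 if base in WEAK else -1     # W versus S
    return dx, dy, dz


def component_totals(seq):
    """Return the final node (x_N, y_N, z_N): the R-Y, M-K and W-S excesses."""
    x = y = z = 0
    for base in seq.upper():
        dx, dy, dz = step(base)
        x, y, z = x + dx, y + dy, z + dz
    return x, y, z


if __name__ == "__main__":
    # "AAGT": 3 purines vs 1 pyrimidine, 2 amino vs 2 keto, 3 weak vs 1 strong
    print(component_totals("AAGT"))
```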

The Z-curve method has been used in many different areas of genome research, such as replication origin identification,[6][7][8][9] ab initio gene prediction,[10] isochore identification,[11] genomic island identification,[12] and comparative genomics.[13] Analysis of the Z curve has also been shown to be able to predict whether a gene contains introns.[14]

Research

Experiments have shown that the Z curve can be used to identify the replication origin in various organisms. One study analyzed the Z curve for multiple species of Archaea and found that the oriC is located at a sharp peak on the curve followed by a broad base. This region was rich in AT bases and had multiple repeats, which is expected for replication origin sites.[15] This and other similar studies were used to develop a program that predicts origins of replication from the Z curve.
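
As a rough illustration of how a peak on the curve can be located, the sketch below computes one component of the Z curve and reports the positions of its maximum and minimum. This is not the published origin-prediction procedure: the choice of component, the helper names `component_values` and `extremes`, and the use of a plain global extremum are all assumptions made for the example.

```python
# Sign of each base in each component, following the coordinate formulas.
SIGNS = {
    "x": {"A": 1, "G": 1, "C": -1, "T": -1},  # purine vs pyrimidine
    "y": {"A": 1, "C": 1, "G": -1, "T": -1},  # amino vs keto
    "z": {"A": 1, "T": 1, "C": -1, "G": -1},  # weak vs strong H-bond
}


def component_values(seq, component):
    """Return the chosen component at nodes n = 0..N."""
    signs = SIGNS[component]
    values = [0]
    for base in seq.upper():
        values.append(values[-1] + signs[base])
    return values


def extremes(values):
    """Return (n_max, n_min): positions of the first maximum and first minimum."""
    positions = range(len(values))
    n_max = max(positions, key=values.__getitem__)
    n_min = min(positions, key=values.__getitem__)
    return n_max, n_min


if __name__ == "__main__":
    x = component_values("TTTTAAAA", "x")
    print(x)
    print(extremes(x))
```

On a real genome a candidate region found this way would still need to be checked against other evidence, such as AT richness and repeats, as described above.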

The Z curve has also been used experimentally to determine phylogenetic relationships. In one study, a novel coronavirus in China was analyzed using sequence analysis and the Z curve method to determine its phylogenetic relationship to other coronaviruses. It was determined that similarities and differences between related species can quickly be determined by visually examining their Z curves. An algorithm was created to identify the geometric center and other trends in the Z curves of 24 species of coronaviruses. The data was used to create a phylogenetic tree, and the results matched the tree that was generated using sequence analysis. The Z curve method proved superior because, while sequence analysis creates a phylogenetic tree based solely on coding sequences in the genome, the Z curve method analyzed the entire genome.[16]
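
The idea of comparing genomes through the geometric center of their Z curves can be sketched as follows. The exact algorithm of the cited study is not described here, so this example makes its own assumptions: the geometric center is taken as the mean of all node coordinates (including P0), and two sequences are compared by the Euclidean distance between their centers. The names `z_nodes`, `geometric_center`, and `center_distance` are hypothetical.

```python
import math

# Step contributed by each base, following the coordinate formulas.
STEPS = {
    "A": (1, 1, 1),
    "C": (-1, 1, -1),
    "G": (1, -1, -1),
    "T": (-1, -1, 1),
}


def z_nodes(seq):
    """Return the Z-curve nodes P_0..P_N as (x, y, z) tuples."""
    x = y = z = 0
    nodes = [(0, 0, 0)]
    for base in seq.upper():
        dx, dy, dz = STEPS[base]
        x, y, z = x + dx, y + dy, z + dz
        nodes.append((x, y, z))
    return nodes


def geometric_center(nodes):
    """Mean of the node coordinates (an assumed definition of the center)."""
    n = len(nodes)
    return tuple(sum(p[i] for p in nodes) / n for i in range(3))


def center_distance(seq_a, seq_b):
    """Euclidean distance between the geometric centers of two Z curves."""
    return math.dist(geometric_center(z_nodes(seq_a)),
                     geometric_center(z_nodes(seq_b)))


if __name__ == "__main__":
    print(geometric_center(z_nodes("AC")))
    print(center_distance("AC", "AT"))
```

A matrix of such pairwise distances could then be passed to a standard tree-building method; that step is outside the scope of this sketch.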

Notes and References

  1. Zhang CT, Zhang R, Ou HY (2003). The Z curve database: a graphic representation of genome sequences. Bioinformatics. 19 (5): 593–99. doi:10.1093/bioinformatics/btg041. PMID 12651717.
  2. Zhang, Ren; Zhang, Chun-Ting (February 1994). Z Curves, An Intuitive Tool for Visualizing and Analyzing the DNA Sequences. Journal of Biomolecular Structure and Dynamics. 11 (4): 767–782. doi:10.1080/07391102.1994.10508031. PMID 8204213.
  3. Yu, Chenglong; Deng, Mo; Zheng, Lu; He, Rong Lucy; Yang, Jie; Yau, Stephen S.-T. (2014-07-18). DFA7, a New Method to Distinguish between Intron-Containing and Intronless Genes. PLOS ONE. 9 (7): e101363. doi:10.1371/journal.pone.0101363. PMID 25036549. PMC 4103774. Bibcode 2014PLoSO...9j1363Y.
  4. Zhang, Ren; Zhang, Chun-Ting (2014-04-01). A Brief Review: The Z-curve Theory and its Application in Genome Analysis. Current Genomics. 15 (2): 78–94. doi:10.2174/1389202915999140328162433. ISSN 1389-2029. PMID 24822026. PMC 4009844.
  5. Zhang, C. T. (1997-08-07). A symmetrical theory of DNA sequences and its applications. Journal of Theoretical Biology. 187 (3): 297–306. doi:10.1006/jtbi.1997.0401. ISSN 0022-5193. PMID 9245572. Bibcode 1997JThBi.187..297Z.
  6. Zhang R, Zhang CT (2005). Identification of replication origins in archaeal genomes based on the Z-curve method. Archaea. 1 (5): 335–46. doi:10.1155/2005/509646. PMID 15876567. PMC 2685548.
  7. Worning P, Jensen LJ, Hallin PF, Staerfeldt HH, Ussery DW (February 2006). Origin of replication in circular prokaryotic chromosomes. Environ. Microbiol. 8 (2): 353–61. doi:10.1111/j.1462-2920.2005.00917.x. PMID 16423021. S2CID 3135023. Bibcode 2006EnvMi...8..353W.
  8. Zhang, Ren; Zhang, Chun-Ting (2002-09-20). Single replication origin of the archaeon Methanosarcina mazei revealed by the Z curve method. Biochemical and Biophysical Research Communications. 297 (2): 396–400. doi:10.1016/s0006-291x(02)02214-3. ISSN 0006-291X. PMID 12237132.
  9. Worning, Peder; Jensen, Lars J.; Hallin, Peter F.; Staerfeldt, Hans-Henrik; Ussery, David W. (2006-02-01). Origin of replication in circular prokaryotic chromosomes. Environmental Microbiology. 8 (2): 353–361. doi:10.1111/j.1462-2920.2005.00917.x. ISSN 1462-2912. PMID 16423021. S2CID 3135023. Bibcode 2006EnvMi...8..353W.
  10. Guo FB, Ou HY, Zhang CT (2003). ZCURVE: a new system for recognizing protein-coding genes in bacterial and archaeal genomes. Nucleic Acids Research. 31 (6): 1780–89. doi:10.1093/nar/gkg254. PMID 12626720. PMC 152858.
  11. Zhang CT, Zhang R (2004). Isochore structures in the mouse genome. Genomics. 83 (3): 384–94. doi:10.1016/j.ygeno.2003.09.011. PMID 14962664.
  12. Zhang R, Zhang CT (2004). A systematic method to identify genomic islands and its applications in analyzing the genomes of Corynebacterium glutamicum and Vibrio vulnificus CMCP6 chromosome I. Bioinformatics. 20 (5): 612–22. doi:10.1093/bioinformatics/btg453. PMID 15033867.
  13. Zhang R, Zhang CT (2003). Identification of genomic islands in the genome of Bacillus cereus by comparative analysis with Bacillus anthracis. Physiological Genomics. 16 (1): 19–23. doi:10.1152/physiolgenomics.00170.2003. PMID 14600214.
  14. Zhang, C. T.; Lin, Z. S.; Yan, M.; Zhang, R. (1998-06-21). A novel approach to distinguish between intron-containing and intronless genes based on the format of Z curves. Journal of Theoretical Biology. 192 (4): 467–473. doi:10.1006/jtbi.1998.0671. ISSN 0022-5193. PMID 9680720. Bibcode 1998JThBi.192..467Z.
  15. Zhang, Ren; Zhang, Chun-Ting (2002-09-20). Single replication origin of the archaeon Methanosarcina mazei revealed by the Z curve method. Biochemical and Biophysical Research Communications. 297 (2): 396–400. doi:10.1016/s0006-291x(02)02214-3. ISSN 0006-291X. PMID 12237132.
  16. Zheng, Wen-Xin; Chen, Ling-Ling; Ou, Hong-Yu; Gao, Feng; Zhang, Chun-Ting (2005-08-01). Coronavirus phylogeny based on a geometric approach. Molecular Phylogenetics and Evolution. 36 (2): 224–232. doi:10.1016/j.ympev.2005.03.030. ISSN 1055-7903. PMID 15890535. PMC 7111192. Bibcode 2005MolPE..36..224Z.