Metabolic gene cluster explained

Metabolic gene clusters or biosynthetic gene clusters are tightly linked sets of mostly non-homologous genes that participate in a common, discrete metabolic pathway. The genes lie physically close to each other on the genome, and their expression is often coregulated.[1] [2] [3] Metabolic gene clusters are common features of bacterial[4] and most fungal[5] genomes and are found less often in other[6] organisms. They are best known for producing secondary metabolites, which are the source or basis of most pharmaceutical compounds and natural toxins and which mediate chemical communication and chemical warfare between organisms. Metabolic gene clusters are also involved in nutrient acquisition, toxin degradation,[7] antimicrobial resistance, and vitamin biosynthesis. Because of these properties, metabolic gene clusters play a key role in shaping microbial ecosystems, including microbiome-host interactions, and several computational genomics tools have been developed to predict them.

Databases

MIBiG, BiG-FAM

Bioinformatic tools

Tools based on rules

Bioinformatic tools have been developed to predict metabolic gene clusters and to determine their abundance and expression in microbiome samples from metagenomic data.[8] Because metagenomic datasets are large, filtering and clustering the data are important steps in these tools. These steps can rely on dimensionality-reduction techniques such as MinHash[9] and on clustering algorithms such as k-medoids and affinity propagation. Several distance metrics and similarity measures have also been developed to compare gene clusters.
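As a rough illustration of how MinHash reduces dimensionality, the sketch below compresses a set of tokens into a fixed-length signature and estimates the Jaccard similarity of two sets from their signatures. It is a minimal sketch and not the implementation used by Mash or BiG-MAP. The domain labels, the function names, and the choice of BLAKE2b as a seeded hash are all assumptions made for the example.

```python
import hashlib


def _seeded_hash(token: str, seed: int) -> int:
    """Deterministic 64-bit hash of a token under a given seed."""
    data = f"{seed}:{token}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(tokens, num_hashes=256):
    """Keep the minimum hash value per seed: one number per hash function."""
    tokens = set(tokens)
    if not tokens:
        raise ValueError("cannot sketch an empty set")
    return [min(_seeded_hash(t, seed) for t in tokens) for seed in range(num_hashes)]


def estimate_jaccard(sig_a, sig_b):
    """Fraction of matching positions approximates the Jaccard similarity."""
    if len(sig_a) != len(sig_b):
        raise ValueError("signatures must have the same length")
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


def exact_jaccard(a, b):
    """Exact Jaccard similarity, for comparison with the estimate."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)


# Hypothetical protein-domain labels standing in for the content of two clusters.
cluster_a = {"PKS_KS", "PKS_AT", "ACP", "TE", "P450", "MFS"}
cluster_b = {"PKS_KS", "PKS_AT", "ACP", "TE", "methyltransferase", "ABC"}

sig_a = minhash_signature(cluster_a)
sig_b = minhash_signature(cluster_b)
print("exact:", exact_jaccard(cluster_a, cluster_b))
print("estimated:", estimate_jaccard(sig_a, sig_b))
```

Each signature has the same length regardless of how large the input set is, which is what makes comparisons cheap on large datasets. The estimate becomes more accurate as `num_hashes` grows.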

Genome mining for biosynthetic gene clusters (BGCs) has become an integral part of natural product discovery. The more than 200,000 microbial genomes now publicly available hold information on abundant novel chemistry. One way to navigate this vast genomic diversity is through comparative analysis of homologous BGCs, which allows identification of cross-species patterns that can be matched to the presence of metabolites or biological activities. However, current tools are hindered by a bottleneck caused by the expensive network-based approach used to group these BGCs into gene cluster families (GCFs). BiG-SLiCE (Biosynthetic Genes Super-Linear Clustering Engine) is a tool designed to cluster massive numbers of BGCs. By representing them in Euclidean space, BiG-SLiCE can group BGCs into GCFs in a non-pairwise, near-linear fashion.
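The idea of grouping clusters without comparing every pair can be sketched in a few lines. The example below is not BiG-SLiCE's algorithm, which the text above does not detail. It swaps in a simple single-pass "leader" clustering to show why a Euclidean representation helps: each BGC becomes a vector and is compared only against the current family centroids, never against every other BGC. The domain vocabulary, the count-based features, and the distance threshold are hypothetical.

```python
import math


def to_vector(domains, vocabulary):
    """Count-based feature vector over a fixed domain vocabulary."""
    return [float(domains.count(d)) for d in vocabulary]


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def leader_clustering(vectors, threshold):
    """Assign each vector to the nearest centroid within `threshold`,
    or start a new family. Cost grows with n * k, not n * n."""
    centroids, sizes, labels = [], [], []
    for vec in vectors:
        best, best_dist = None, float("inf")
        for idx, centroid in enumerate(centroids):
            dist = euclidean(vec, centroid)
            if dist < best_dist:
                best, best_dist = idx, dist
        if best is not None and best_dist <= threshold:
            # Update the running mean of the chosen family.
            n = sizes[best]
            centroids[best] = [(c * n + x) / (n + 1) for c, x in zip(centroids[best], vec)]
            sizes[best] = n + 1
            labels.append(best)
        else:
            centroids.append(list(vec))
            sizes.append(1)
            labels.append(len(centroids) - 1)
    return labels, centroids


# Hypothetical domain vocabulary and four toy BGCs.
vocabulary = ["PKS_KS", "PKS_AT", "ACP", "NRPS_C", "NRPS_A", "PCP"]
bgcs = [
    ["PKS_KS", "PKS_AT", "ACP"],
    ["PKS_KS", "PKS_AT", "ACP", "ACP"],
    ["NRPS_C", "NRPS_A", "PCP"],
    ["NRPS_C", "NRPS_A", "NRPS_A", "PCP"],
]
vectors = [to_vector(b, vocabulary) for b in bgcs]
labels, centroids = leader_clustering(vectors, threshold=1.5)
print(labels)  # family index assigned to each BGC
```

With n clusters and k families, the loop performs on the order of n × k distance computations, whereas a pairwise network approach needs on the order of n² comparisons. The result of a single-pass scheme like this depends on input order, which is one reason production tools use more careful methods.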

Kautsar et al. (2021)[10] used BiG-SLiCE to demonstrate the utility of such analyses by reconstructing a global map of secondary metabolic diversity across taxonomy to identify uncharted biosynthetic potential. This approach opens up new possibilities to accelerate natural product discovery and offers a first step towards constructing a global, searchable, interconnected network of BGCs. As more genomes are sequenced from understudied taxa, more information can be mined to highlight their potentially novel chemistry.[10]

Tools based on machine learning

Evolution

The origin and evolution of metabolic gene clusters have been debated since the 1990s.[11] [12] It has since been demonstrated that metabolic gene clusters can arise in a genome by genome rearrangement, gene duplication, or horizontal gene transfer,[13] and some metabolic clusters have evolved convergently in multiple species.[14] Horizontal gene cluster transfer has been linked to ecological niches in which the encoded pathways are thought to provide a benefit.[15] It has been argued that clustering of genes for ecological functions results from reproductive trends among organisms, and goes on to contribute to accelerated adaptation by increasing refinement of complex functions in the pangenome of a population.[16]

Notes and References

  1. Schläpfer P, Zhang P, Wang C, Kim T, Banf M, Chae L, Dreher K, Chavali AK, Nilo-Poyanco R, Bernard T, Kahn D, Rhee SY . Genome-Wide Prediction of Metabolic Enzymes, Pathways, and Gene Clusters in Plants . Plant Physiology . 173 . 4 . 2041–2059 . April 2017 . 28228535 . 5373064 . 10.1104/pp.16.01942 .
  2. Miller BL, Miller KY, Roberti KA, Timberlake WE . Position-dependent and -independent mechanisms regulate cell-specific expression of the SpoC1 gene cluster of Aspergillus nidulans . Molecular and Cellular Biology . 7 . 1 . 427–34 . January 1987 . 3550422 . 365085 . 10.1128/MCB.7.1.427 .
  3. Banf M, Zhao K, Rhee SY . METACLUSTER-an R package for context-specific expression analysis of metabolic gene clusters . Bioinformatics . 35 . 17 . 3178–3180 . September 2019 . 30657869 . 6735823 . 10.1093/bioinformatics/btz021 .
  4. Cimermancic P, Medema MH, Claesen J, Kurita K, Wieland Brown LC, Mavrommatis K, Pati A, Godfrey PA, Koehrsen M, Clardy J, Birren BW, Takano E, Sali A, Linington RG, Fischbach MA . Insights into secondary metabolism from a global analysis of prokaryotic biosynthetic gene clusters . Cell . 158 . 2 . 412–421 . July 2014 . 25036635 . 4123684 . 10.1016/j.cell.2014.06.034 .
  5. Book: Slot JC . Fungal Phylogenetics and Phylogenomics . Fungal Gene Cluster Diversity and Evolution . Advances in Genetics . 100 . 141–178 . 2017 . 29153399 . 10.1016/bs.adgen.2017.09.005 . 978-0-12-813261-6 .
  6. Wisecaver JH, Borowsky AT, Tzin V, Jander G, Kliebenstein DJ, Rokas A . A Global Coexpression Network Approach for Connecting Genes to Specialized Metabolic Pathways in Plants . The Plant Cell . 29 . 5 . 944–959 . May 2017 . 28408660 . 5466033 . 10.1105/tpc.17.00009 .
  7. Gluck-Thaler E, Slot JC . Specialized plant biochemistry drives gene clustering in fungi . The ISME Journal . 12 . 7 . 1694–1705 . June 2018 . 29463891 . 6018750 . 10.1038/s41396-018-0075-3 . 2018ISMEJ..12.1694G .
  8. Pascal-Andreu V, Augustijn H, van den Berg K, van der Hooft J, Fischbach M, Medema M . BiG-MAP: an automated pipeline to profile metabolic gene cluster abundance and expression in microbiomes . bioRxiv . 6 . 5 . e00937-21 . 2020 . 34581602 . 8547482 . 10.1101/2020.12.14.422671 .
  9. Ondov B, Treangen T, Melsted P, Mallonee A, Bergman N, Koren S, Phillippy A . Mash: fast genome and metagenome distance estimation using MinHash . Genome Biology . 17 . 32 . 14 . 2016 . 27323842 . 4915045 . 10.1186/s13059-016-0997-x .
  10. Kautsar SA, van der Hooft JJ, de Ridder D, Medema MH . BiG-SLiCE: A highly scalable tool maps the diversity of 1.2 million biosynthetic gene clusters . GigaScience . 10 . 1 . giaa154 . January 2021 . 33438731 . 7804863 . 10.1093/gigascience/giaa154 .
  11. Lawrence JG, Roth JR . Selfish Operons: Horizontal Transfer May Drive the Evolution of Gene Clusters . Genetics . 143 . 4 . 1843–1860 . August 1996 . 8844169 . 1207444 . 10.1093/genetics/143.4.1843 . 0016-6731 .
  12. Pál C, Hurst LD . Evidence against the selfish operon theory . Trends in Genetics . 20 . 6 . 232–234 . June 2004 . 15145575 . 10.1016/j.tig.2004.04.001 .
  13. Reynolds HT, Vijayakumar V, Gluck-Thaler E, Korotkin HB, Matheny PB, Slot JC . Horizontal gene cluster transfer increased hallucinogenic mushroom diversity . Evolution Letters . 2 . 2 . 88–101 . 2018 . 30283667 . 6121855 . 10.1002/evl3.42 . 2056-3744 .
  14. Slot JC, Rokas A . Multiple GAL pathway gene clusters evolved independently and by different mechanisms in fungi . Proceedings of the National Academy of Sciences . 107 . 22 . 10136–10141 . June 2010 . 20479238 . 2890473 . 10.1073/pnas.0914418107 . 2010PNAS..10710136S .
  15. Greene GH, McGary KL, Rokas A, Slot JC . Ecology drives the distribution of specialized tyrosine metabolism modules in fungi . Genome Biology and Evolution . 6 . 1 . 121–132 . January 2014 . 24391152 . 3914699 . 10.1093/gbe/evt208 . 1759-6653 .
  16. Slot JC, Gluck-Thaler E . Metabolic gene clusters, fungal diversity, and the generation of accessory functions . Current Opinion in Genetics & Development . 58-59 . 17–24 . October 2019 . 31466036 . 10.1016/j.gde.2019.07.006 . 0959-437X . 201674539 .