Bioinformatics workflow management system

A bioinformatics workflow management system is a workflow management system designed specifically to compose and execute a series of computational or data-manipulation steps (a workflow) in bioinformatics.

Many workflow systems exist. Some were developed as general-purpose scientific workflow systems for use across disciplines such as astronomy and earth science. All of them are based on an abstract representation of a computation as a directed graph, in which each node represents a task to be executed and each edge represents either a data flow or an execution dependency between tasks. Such systems typically provide a visual front-end that lets users build and modify complex applications with little or no programming expertise.[1] [2] [3]
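To make the directed-graph abstraction concrete, here is a minimal Python sketch, not taken from any particular workflow system. Each node is a task function, each edge is a dependency along which an upstream result flows to a downstream task, and tasks run in a topological order so that no task starts before its inputs exist. The task names, the helper `gc_fraction`, and the toy read data are hypothetical placeholders; real systems typically layer scheduling, provenance tracking and distributed execution on top of this core idea. The sketch assumes Python 3.9 or later, which provides the standard-library `graphlib` module.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List, Tuple

# A task pairs a function with the names of the tasks it depends on.
# The function receives its dependencies' outputs, in the listed order.
Task = Tuple[Callable[..., object], List[str]]


def run_workflow(tasks: Dict[str, Task]) -> Dict[str, object]:
    """Run each task after all of its dependencies and return every task's output."""
    # Graph in the form graphlib expects: node -> set of predecessor nodes.
    graph = {name: set(deps) for name, (_, deps) in tasks.items()}
    results: Dict[str, object] = {}
    # static_order() yields each node after all of its predecessors and
    # raises graphlib.CycleError if the dependencies contain a cycle.
    for name in TopologicalSorter(graph).static_order():
        func, deps = tasks[name]
        # Data flows along the edges: upstream results become this task's inputs.
        results[name] = func(*(results[d] for d in deps))
    return results


def gc_fraction(reads: List[str]) -> float:
    """Hypothetical helper: fraction of G and C bases across all reads."""
    gc = sum(r.count("G") + r.count("C") for r in reads)
    return gc / sum(len(r) for r in reads)


# Hypothetical toy pipeline; the task names and reads are placeholders.
pipeline: Dict[str, Task] = {
    "load_reads": (lambda: ["ACGT", "ACGA", "TTGA"], []),
    "filter_reads": (lambda reads: [r for r in reads if r.startswith("AC")], ["load_reads"]),
    "count_reads": (len, ["filter_reads"]),
    "gc_content": (gc_fraction, ["filter_reads"]),
}

results = run_workflow(pipeline)
print(results["count_reads"], results["gc_content"])  # prints: 2 0.5
```

Because `count_reads` and `gc_content` depend only on `filter_reads`, a scheduler could run them in parallel. `TopologicalSorter` supports that incremental style through its `prepare()`, `get_ready()` and `done()` methods, whereas the `static_order()` call used here simply runs the tasks one at a time.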

Examples

In alphabetical order, some examples of bioinformatics workflow management systems include:

Cuneiform: a functional workflow language for large-scale data analysis[7]

Galaxy: initially targeted at genomics[8]

Comparisons between workflow systems

With so many bioinformatics workflow systems to choose from,[13] it is difficult to understand and compare their features. Little work has evaluated and compared these systems from a bioinformatician's perspective, particularly with respect to the data types they can handle, the built-in functionality they offer users, and their performance and usability. Examples of existing comparisons include:

Notes and References

  1. Oinn, T.; Greenwood, M.; Addis, M.; Alpdemir, M. N.; Ferris, J.; Glover, K.; Goble, C.; Goderis, A.; Hull, D.; Marvin, D.; Li, P.; Lord, P.; Pocock, M. R.; Senger, M.; Stevens, R.; Wipat, A.; Wroe, C. (2006). "Taverna: Lessons in creating a workflow environment for the life sciences". Concurrency and Computation: Practice and Experience. 18 (10): 1067–1100. doi:10.1002/cpe.993. S2CID 10219281.
  2. Yu, J.; Buyya, R. (2005). "A taxonomy of scientific workflow systems for grid computing". ACM SIGMOD Record. 34 (3): 44. doi:10.1145/1084805.1084814. CiteSeerX 10.1.1.63.3176. S2CID 538714.
  3. Curcin, V.; Ghanem, M. (2008). "Scientific workflow systems - can one size fit all?". 2008 Cairo International Biomedical Engineering Conference. pp. 1–9. doi:10.1109/CIBEC.2008.4786077. ISBN 978-1-4244-2694-2. S2CID 1885579.
  4. "Anduril workflow website". Web site.
  5. Ovaska, Kristian; Laakso, Marko; Haapa-Paananen, Saija; Louhimo, Riku; Chen, Ping; Aittomäki, Viljami; Valo, Erkka; Núñez-Fontarnau, Javier; Rantanen, Ville (2010-09-07). "Large-scale data integration framework provides a comprehensive view on glioblastoma multiforme". Genome Medicine. 2 (9): 65. doi:10.1186/gm186. ISSN 1756-994X. PMC 3092116. PMID 20822536. Open access.
  6. Elhai, J.; Taton, A.; Massar, J.; Myers, J. K.; Travers, M.; Casey, J.; Slupesky, M.; Shrager, J. (2009). "BioBIKE: A Web-based, programmable, integrated biological knowledge base". Nucleic Acids Research. 37 (Web Server issue): W28–W32. doi:10.1093/nar/gkp354. PMID 19433511. PMC 2703918.
  7. Brandt, Jörgen; Bux, Marc N.; Leser, Ulf (2015). "Cuneiform: A functional language for large scale scientific data analysis". Proceedings of the Workshops of the EDBT/ICDT. 1330: 17–26.
  8. Goecks, J.; Nekrutenko, A.; Taylor, J.; The Galaxy Team (2010). "Galaxy: A comprehensive approach for supporting accessible, reproducible, and transparent computational research in the life sciences". Genome Biology. 11 (8): R86. doi:10.1186/gb-2010-11-8-r86. PMID 20738864. PMC 2945788. Open access.
  9. Reich, Michael; et al. (2006). "GenePattern 2.0". Nature Genetics. 38 (5): 500–501. doi:10.1038/ng0506-500. PMID 16642009. S2CID 5503897.
  10. Tiwari, Abhishek; Sekhar, Arvind K. T. (2007). "Workflow based framework for life science informatics". Computational Biology and Chemistry. 31 (5–6): 305–319. doi:10.1016/j.compbiolchem.2007.08.009. PMID 17931570.
  11. Okonechnikov, K.; Golosova, O.; Fursov, M.; UGENE Team (2012). "Unipro UGENE: A unified bioinformatics toolkit". Bioinformatics. 28 (8): 1166–1167. doi:10.1093/bioinformatics/bts091. PMID 22368248. Open access.
  12. Bavoil, L.; Callahan, S. P.; Crossno, P. J.; Freire, J.; Scheidegger, C. E.; Silva, C. T.; Vo, H. T. (2005). "VisTrails: Enabling Interactive Multiple-View Visualizations". VIS 05. IEEE Visualization, 2005. pp. 135–142. doi:10.1109/VISUAL.2005.1532788. ISBN 978-0-7803-9462-9.
  13. "Existing Workflow systems". Common Workflow Language wiki. Archived at https://web.archive.org/web/20191017094453/https://github.com/common-workflow-language/common-workflow-language/wiki/Existing-Workflow-systems on 2019-10-17. Retrieved 2019-10-17.
  14. Abouelhoda, M.; Alaa, S.; Ghanem, M. (2010). "Meta-workflows". Proceedings of the 1st International Workshop on Workflow Approaches to New Data-centric Science - Wands '10. p. 1. doi:10.1145/1833398.1833400. ISBN 9781450301886. S2CID 17343728.
  15. Kallio, M. A.; Tuimala, J. T.; Hupponen, T.; Klemelä, P.; Gentile, M.; Scheinin, I.; Koski, M.; Käki, J.; Korpelainen, E. I. (2011). "Chipster: User-friendly analysis software for microarray and other high-throughput data". BMC Genomics. 12: 507. doi:10.1186/1471-2164-12-507. PMID 21999641. PMC 3215701. Open access.
  16. Leipzig, J. (2016). "A review of bioinformatic pipeline frameworks". Briefings in Bioinformatics. 18 (3): 530–536. doi:10.1093/bib/bbw020. PMID 27013646. PMC 5429012.