High-performance Integrated Virtual Environment

The High-performance Integrated Virtual Environment (HIVE) is a distributed computing environment used for healthcare-IT and biological research, including analysis of Next Generation Sequencing (NGS) data, preclinical, clinical and post-market data, adverse events, and metagenomic data.[1] It is currently supported and continuously developed by the US Food and Drug Administration (government domain), George Washington University (academic domain), and DNA-HIVE, WHISE-Global and Embleema (commercial domain). HIVE is fully operational within the US FDA, where it supports more than 60 regulatory research and regulatory review projects as well as MDEpiNet medical device postmarket registries. Academic deployments of HIVE are used for research and publications in NGS analytics, cancer research and microbiome research, and in educational programs for GWU students. Commercial enterprises use HIVE in oncology, microbiology, vaccine manufacturing, gene editing, healthcare-IT, harmonization of real-world data, preclinical research and clinical studies.

Infrastructure

HIVE is a massively parallel distributed computing environment in which the distributed storage library and the distributed computational engine are tightly integrated.[2] Keeping both the storage and the metadata database on the same network makes the system robust and flexible.[3] The distributed storage layer is the key component for file and archive management and is the backbone of the deposition pipeline. The data deposition back-end allows automatic upload and download of external datasets into HIVE data repositories. The metadata database maintains information both about extremely large files ingested into the system (big data) and about the computations run on it. Because the metadata is associated with each computation and records its parameters, the details of a computational pipeline can later be retrieved to validate or replicate an experiment, without manual record keeping.
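The following is a minimal sketch of how storing a computation's parameters as metadata makes a run reproducible. It is not HIVE's code or API: the `ComputationRecord` and `MetadataStore` classes, their fields, and the tool name and parameters in the usage note are hypothetical, and an in-memory dictionary stands in for the metadata database.

```python
import hashlib
import json
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class ComputationRecord:
    """Hypothetical metadata record kept for one computation."""
    tool: str
    version: str
    parameters: dict
    input_ids: list
    # Wall-clock start time; informational, excluded from the fingerprint.
    started_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

    def fingerprint(self) -> str:
        """SHA-256 over everything needed to rerun the job."""
        payload = {
            "tool": self.tool,
            "version": self.version,
            "parameters": self.parameters,
            "input_ids": self.input_ids,
        }
        # sort_keys makes the hash independent of dict key order.
        canonical = json.dumps(payload, sort_keys=True)
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class MetadataStore:
    """In-memory stand-in for a metadata database (hypothetical)."""

    def __init__(self):
        self._records = {}
        self._next_id = 1

    def register(self, record: ComputationRecord) -> int:
        """Store a record and return its job ID."""
        job_id = self._next_id
        self._next_id += 1
        self._records[job_id] = record
        return job_id

    def replay_spec(self, job_id: int) -> dict:
        """Return copies of the settings needed to rerun a stored job."""
        rec = self._records[job_id]
        return {
            "tool": rec.tool,
            "version": rec.version,
            "parameters": dict(rec.parameters),
            "input_ids": list(rec.input_ids),
        }
```

For example, registering `ComputationRecord("aligner", "1.0", {"seed_length": 11}, ["obj-1"])` and then calling `replay_spec` on the returned ID yields the tool, version, parameters and input identifiers needed to rerun that job. Hashing a canonical JSON form, as `fingerprint()` does, gives two runs with identical settings the same fingerprint regardless of the order in which parameters were supplied, which is one way to check that a replicated run matches the original.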

What differentiates HIVE from other object-oriented databases is its set of unified APIs to search, view, and manipulate data of all types. The system also provides a highly secure hierarchical access control and permission system, which allows data access privileges to be set at a fine granularity without creating a multiplicity of rules in the security subsystem. The security model, designed for sensitive data, provides comprehensive control and auditing functionality in compliance with HIVE's designation as a FISMA Moderate system.[4]
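A minimal sketch of how hierarchical permissions can keep the rule set small, assuming a tree of containers and objects in which a grant on a node applies to everything beneath it unless a closer node carries its own grant for the same user. The `Perm`, `Node`, `effective_permissions` and `can` names and the nearest-grant-wins policy are illustrative assumptions, not HIVE's documented security model.

```python
from enum import Flag


class Perm(Flag):
    """Permission bits (illustrative)."""
    NONE = 0
    READ = 1
    WRITE = 2
    EXECUTE = 4
    FULL = READ | WRITE | EXECUTE


class Node:
    """A container or data object in a hypothetical hierarchy."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.grants = {}  # user -> Perm granted explicitly on this node

    def grant(self, user, perm):
        self.grants[user] = perm


def effective_permissions(node, user):
    """Walk toward the root; the nearest explicit grant for `user` wins."""
    current = node
    while current is not None:
        if user in current.grants:
            return current.grants[user]
        current = current.parent
    return Perm.NONE


def can(node, user, perm):
    """True if every bit in `perm` is allowed for `user` on `node`."""
    return perm in effective_permissions(node, user)
```

Under this policy, one grant on a project folder covers every file beneath it, and a single extra grant on a subfolder narrows or widens access for just that branch, so the number of rules grows with the number of exceptions rather than with the number of objects.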

HIVE technological capabilities

HIVE open source

FDA launched HIVE Open Source as a platform to support end-to-end needs for NGS analytics. The source code is available at https://github.com/FDA/fda-hive.

The HIVE biocompute harmonization platform is at the core of the High-throughput Sequencing Computational Standards for Regulatory Sciences (HTS-CSRS) project, whose mission is to provide the scientific community with a framework to harmonize biocomputing, promote interoperability, and verify bioinformatics protocols (https://hive.biochemistry.gwu.edu/htscsrs). For more information, see the project description on the FDA Extramural Research page (https://www.fda.gov/ScienceResearch/SpecialTopics/RegulatoryScience/ucm491893.htm).

HIVE architecture

Sub-clusters of scalable, high-performance, high-density compute cores serve as the computational engine for extra-large distributed, parallelized NGS computations. The system is highly scalable, with deployments ranging from a single "HIVE in a box" appliance to enterprise-level systems of thousands of compute units.
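The split, compute and merge pattern behind distributed, parallelized NGS computations can be sketched as follows, using per-base counts over a list of reads as a stand-in workload. This is a generic illustration, not HIVE code: the function names are hypothetical, and a local `ThreadPoolExecutor` stands in for the cluster's compute units. In CPython, threads show the structure but do not speed up this CPU-bound work; a real deployment would dispatch chunks to separate processes or nodes.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(reads, n_chunks):
    """Split `reads` into at most `n_chunks` contiguous slices."""
    size = max(1, -(-len(reads) // n_chunks))  # ceiling division
    return [reads[i:i + size] for i in range(0, len(reads), size)]


def count_bases(chunk):
    """Per-chunk work: count nucleotide characters in each read."""
    counts = Counter()
    for read in chunk:
        counts.update(read)
    return counts


def parallel_base_counts(reads, n_workers=4):
    """Scatter chunks to workers, then merge the partial counts."""
    chunks = split_into_chunks(reads, n_workers)
    total = Counter()
    # The thread pool stands in for a cluster's compute units.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        for partial in pool.map(count_bases, chunks):
            total.update(partial)
    return total
```

Merging is simple here because counts add, so combining partial results gives the same total as one serial pass; other workloads need their own merge step.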

Public presentations

External links

Notes and references

  1. Simonyan V, Mazumder R (2014). "High-Performance Integrated Virtual Environment (HIVE) Tools and Applications for Big Data Analysis". Genes. 5 (4): 957–981. doi:10.3390/genes5040957. PMID 25271953. PMC 4276921. Free to read.
  2. https://hive.biochemistry.gwu.edu/help/HIVEWhitePaper_12_16_2014.pdf
  3. https://hive.biochemistry.gwu.edu/help/HIVEInfrastructuresUK.pdf
  4. Wilson CA, Simonyan V (2014). "FDA's Activities Supporting Regulatory Application of 'Next Gen' Sequencing Technologies". PDA Journal of Pharmaceutical Science and Technology. 68 (6): 626–630. doi:10.5731/pdajpst.2014.01024. PMID 25475637. S2CID 37583755.
  5. "NIH Login User Name and Password or PIV Card Authentication". Archived 1 January 2016 at https://web.archive.org/web/20160101054032/https://datascience.nih.gov/community/datascience-at-nih/frontiers (original link dead). Retrieved 1 February 2016.
  6. "NIH VideoCast - High-Performance Integrated Virtual Environment (HIVE): A regulatory NGS data analysis platform". 29 January 2016.
  7. "NIH Login User Name and Password or PIV Card Authentication". Archived 1 January 2016 at https://web.archive.org/web/20160101054032/https://datascience.nih.gov/community/datascience-at-nih/frontiers#title4 (original link dead). Retrieved 1 February 2016.
  8. Staff (2014). "2014-BIT-Brochure". 2014 Bio-IT World Expo. Cambridge Healthtech Institute. p. 6 (col. 2). Presentation: "High-Performance Integrated Virtual Environment (HIVE) Infrastructure for Big-Data Analysis: Applications to Next-Gen Sequencing Informatics". Registration required. Retrieved 15 June 2016.
  9. http://fedscoop.com/fdas-examines-nextgen-sequencing-tool
  10. "Bio-IT World".