Renaissance Computing Institute Explained

Renaissance Computing Institute (RENCI)
Established: 2004
Research Field: data science and cyberinfrastructure; environmental sciences; biomedical and health sciences
Director: Stanley C. Ahalt, PhD
City: Chapel Hill
State: NC
Affiliations: University of North Carolina at Chapel Hill
Website: renci.org

Renaissance Computing Institute (RENCI) was launched in 2004 as a collaboration involving the State of North Carolina, University of North Carolina at Chapel Hill (UNC-CH), Duke University, and North Carolina State University. RENCI is organizationally structured as a research institute within UNC-CH, and its main campus is located in Chapel Hill, NC, a few miles from the UNC-CH campus. RENCI has engagement centers at UNC-CH, Duke University (Durham), and North Carolina State University (Raleigh).

RENCI's founding director was Daniel A. Reed; Stanley C. Ahalt is the current director. RENCI employs over 80 staff members.

Mission statement

RENCI's current mission is: "to develop and deploy advanced technologies to enable research discoveries and practical innovations."[1] RENCI achieves its mission by partnering with academic researchers, governmental policy makers, and industry leaders to engage in research and development aimed at solving critical challenges in several focus areas: data science and cyberinfrastructure; environmental sciences; and biomedical and health sciences.

History

RENCI was founded in January 2004 by Daniel A. Reed, PhD, with funding from the State of North Carolina, UNC-CH, North Carolina State University, and Duke University.[2] [3] Dr. Reed formerly served as director of the National Center for Supercomputing Applications (NCSA), Chief Architect for the National Science Foundation (NSF) TeraGrid initiative, and Member of the President's Information Technology Advisory Committee. In May 2004, Alan Blatecky joined RENCI as deputy director. Mr. Blatecky formerly served as executive director of the San Diego Supercomputer Center and head of the NSF Middleware initiative.

RENCI's initial mission statement was:

to serve as a multidisciplinary institute bridging academe, commerce and society to enrich and empower human potential, create multi-institutional partnerships, and develop and deploy world-leading computational infrastructure.

In December 2005, RENCI received $5.9M in funding from the State of North Carolina for FY2005-2006 and $11.8M in recurring funds for "staff support, computer operations and equipment." This funding was critical as RENCI built a statewide infrastructure to support a virtual organization and then used that infrastructure, together with the expertise of its staff, to pursue federally funded projects of interest to the State. RENCI's initial focus was on applying cyber technologies and advanced analytics to coastal disaster planning, mitigation, and response. RENCI has since engaged in diverse partnerships throughout North Carolina and across the nation. Those partnerships have yielded numerous federal grant awards, thus providing the organization with an additional revenue stream.

RENCI underwent a change in leadership in 2007, with the departure of Dr. Reed and the appointment of Mr. Blatecky as interim director. RENCI implemented its first-ever strategic planning process during this time. The process led to a revised mission statement:

The Renaissance Computing Institute, a multi-institutional organization, brings together multidisciplinary experts and advanced technological capabilities to address pressing research issues and to find solutions to complex problems that affect the quality of life in North Carolina, our nation and the world.

In 2009, Stanley C. Ahalt, PhD, was appointed to the position of director. Dr. Ahalt previously served as executive director of the Ohio Supercomputer Center (OSC) and was a professor in the Department of Electrical and Computer Engineering at Ohio State University (OSU). Upon arriving at RENCI, Dr. Ahalt received a joint appointment as professor in the department of computer science at UNC-CH.

Ashok Krishnamurthy, PhD, was appointed as deputy director in February 2013. Dr. Krishnamurthy was previously the director of research and scientific development at OSC and associate professor in the Department of Electrical and Computer Engineering at OSU.

Under the leadership of Drs. Ahalt and Krishnamurthy, RENCI expanded its staff numbers, external partners, and breadth of activities. Several key partnerships and initiatives have been launched. The first is a partnership with the School of Medicine at UNC-CH on a National Institutes of Health (NIH) Clinical and Translational Science Award, which led to the establishment of the North Carolina Translational and Clinical Sciences Institute (NC TraCS) in 2008. Drs. Ahalt and Krishnamurthy serve as director and co-director, respectively, of the Biomedical Informatics Service within NC TraCS. A second key activity was the founding of the Water Science Software Institute (WSSI), which was co-founded by RENCI and the National Socio-Environmental Synthesis Center (SESYNC) in September 2012. A third key activity was the creation of the National Consortium for Data Science (NCDS) in February 2013. The NCDS is headquartered at RENCI and includes members drawn from academia, industry, and government. A fourth key activity was the establishment of the iRODS Consortium in March 2013. The iRODS Consortium is also headquartered at RENCI and includes a diverse international membership.

Key research and development focus areas and technologies

Data science and cyberinfrastructure

RENCI has a number of active research programs that are aimed at developing and deploying advanced computing and networking capabilities. Many of the resultant technologies are open source. For example, the open source ExoGENI (Exo-Global Environment for Network Innovation) is being developed as part of the NSF-funded GENI initiative.[4] [5] ExoGENI functions as a federated, cloud-based Networked Infrastructure-as-a-Service (NIaaS) platform for dynamic provisioning of networking, storage, and compute resources. ADAMANT (Adaptive Data-Aware Multi-domain Application Network Topologies), also funded by the NSF, builds upon ExoGENI. ADAMANT integrates the Pegasus and HTCondor scientific workflow systems with the ExoGENI NIaaS platform to orchestrate the execution of large-scale scientific workflows over distributed cloud or traditional high-performance computing resources.

iRODS (integrated Rule-Oriented Data System) was developed by the Data Intensive Cyber Environments (DICE) Centers at UNC-CH and the University of California, San Diego and is currently maintained by RENCI. iRODS is an open source middleware technology designed to provide policy-based control over data access, movement, use, and archiving across geographical sites, disparate storage technologies, and multiple user groups, each with varying policies regarding data access and use.[6] [7] [8] [9]

RADII (Resource Aware Data-centric collaborative Infrastructure) integrates GENI's ORCA (Open Resource Control Architecture) with iRODS to dynamically provision a distributed cloud-based infrastructure for multi-institutional, data-driven research collaborations. RADII accomplishes this through software designed to model research data and map data elements, computations, and storage onto the underlying physical infrastructure of iRODS.

DataBridge aims to provide a multi-dimensional sociometric network system for sharing long-tail data collections.[10] [11] [12] DataBridge is an open source collaboration tool that allows scientists to explore available data sets and their relevant algorithms and define semantic bridges to link to and access diverse data sets within the sociometric network.

Environmental sciences

Many of RENCI's projects in the Environmental Sciences focus on hydrology, coastal storm surges, and advanced modeling to assist in disaster preparedness. ADCIRC is an open source software model that applies advanced analytics to multiple data sources and types (e.g., hydrology data sets, atmospheric data sets, tropical storm forecasting data, and Geographic Information System data) to enable real-time, high-resolution prediction of the impact of coastal storm surges and flooding after hurricanes and related events.[13] [14] In collaboration with researchers at the UNC Coastal Resilience Center and the National Hurricane Center, ADCIRC is being developed as a coastal forecasting system to assist with state and federal disaster planning and decision support.

EarthCube is an NSF-funded initiative that aims "to develop a framework over the next decade to assist researchers in understanding and predicting the Earth system from the Sun to the center of the Earth."[15] [16] EarthCube is being designed as an open dynamic cyberinfrastructure to enable community-governed data sharing across the geosciences, including ocean science, polar studies, atmospheric science, geospace, computer science, and other fields.

HydroShare is supported by the NSF-funded CUAHSI (Consortium of Universities for the Advancement of Hydrologic Science Inc.) and is under development as an open collaboration cyberinfrastructure for hydrology.[17] [18] [19] HydroShare allows water scientists to identify and retrieve water-related data sets and associated algorithms and models and then analyze and compute on the data using a distributed computing environment that includes grid-based cloud and high-performance computing and storage capabilities.

Biomedical and health sciences

A major focus of RENCI's work in the Biomedical and Health Sciences is clinical genomics. RENCI works with NC TraCS, the Lineberger Comprehensive Cancer Center at UNC-CH, and UNC's Information Technology Services Research Computing Division to develop and implement technologies to support next-generation genomic sequencing technologies, such as Whole Genome Sequencing (WGS) and Whole Exome Sequencing (WES). These technologies include the GMW (Genetic Medical Workflow) Engine, which was funded in part by the NIH and provides end-to-end capture, analysis, validation, and reporting of WGS and WES data. The GMW Engine is designed as an open source architecture that coordinates workflows, sub-workflows, samples, data, and people to support all aspects of genomics research and clinical application, from the initial patient visit to the physician-guided reporting of genomic findings.[20]

MaPSeq (Massively Parallel Sequencing) is an open source plugin-based Service-Oriented Architecture (SOA) that provides secure management and execution of the complex downstream computational and analytical steps involved in high-throughput genomic sequencing and other data-intensive applications.[21] MaPSeq and its homegrown sister technology, GATE (Grid Access Triage Engine), are built on top of Apache Karaf and together provide extensible capabilities for downstream analysis of genomic data and other large data sets, including workflow pipeline execution and management, meta-scheduling of workflow jobs, opportunistic use of compute resources, secure data transfer, and web-based client access.

CANVAS (CAroliNa Variant Annotation Store) and AnnoBot (Annotation Bot) work together to provide version-controlled annotation and metadata for genomic variant data in order to support up-to-date clinical interpretation of genomic variants and thereby guide clinical decision making.[22] CANVAS is designed as an open source PostgreSQL relational database that stores genomic variant data with associated annotation and metadata. AnnoBot consists of Python modules and software driver code configured to provide automated monitoring and retrieval of external data sources for annotation updates.

CHAT (Convergent Haplotype Association Tagging) is a software algorithm that allows for the identification of moderately penetrant genomic variants using cross-population genetic structures. CHAT invokes a graph theory–based algorithm to determine the haplotype phase of a population of unrelated individuals by first identifying subsets of individuals that share a region of the genome through descent, and then generating a consensus haplotype for the shared region.[23]

The SMW (Secure Medical Workspace) provisions a secure environment for access to sensitive patient data for clinical care or Institutional Review Board–approved clinical research.[24] [25] The open source SMW architecture uses virtualization technology (i.e., VMware) and Data Leakage Protection (DLP) technology (i.e., Websense) to create a secure virtual workspace coupled with the ability to prevent (or allow with a challenge and auditing by Information Technology staff) the physical removal of data from a central, secure storage environment.
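
The two steps attributed to CHAT above (grouping individuals who share a genomic region, then deriving a consensus haplotype for that region) can be illustrated with a minimal Python sketch. This is not the CHAT implementation. It assumes, purely for illustration, that pairwise sharing has already been detected and is supplied as a list of pairs, that genotypes are coded as alternate-allele counts (0, 1, or 2) per site, and that a group's shared allele at a site can be read from any homozygous member. The function names `sharing_groups` and `consensus_haplotype` are hypothetical.

```python
from collections import defaultdict


def sharing_groups(individuals, shared_pairs):
    """Group individuals into connected components of the sharing graph.

    `shared_pairs` is an assumed input: pairs of individuals already judged
    to share the region through descent.
    """
    adjacency = defaultdict(set)
    for a, b in shared_pairs:
        adjacency[a].add(b)
        adjacency[b].add(a)

    seen = set()
    groups = []
    for start in individuals:
        if start in seen:
            continue
        # Depth-first traversal collects one connected component.
        stack = [start]
        seen.add(start)
        component = []
        while stack:
            node = stack.pop()
            component.append(node)
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        groups.append(sorted(component))
    return groups


def consensus_haplotype(genotypes, group):
    """Infer the shared allele at each site from homozygous group members.

    Genotypes are alternate-allele counts (0, 1, 2). A site is left as None
    when every member is heterozygous or when homozygous members conflict.
    """
    n_sites = len(genotypes[group[0]])
    consensus = []
    for site in range(n_sites):
        calls = {genotypes[ind][site] for ind in group}
        has_ref_hom = 0 in calls
        has_alt_hom = 2 in calls
        if has_ref_hom and not has_alt_hom:
            consensus.append(0)  # shared haplotype carries the reference allele
        elif has_alt_hom and not has_ref_hom:
            consensus.append(1)  # shared haplotype carries the alternate allele
        else:
            consensus.append(None)  # unresolved at this site
    return consensus


if __name__ == "__main__":
    genotypes = {
        "A": [0, 1, 2, 1],
        "B": [1, 1, 2, 0],
        "C": [0, 2, 1, 1],
        "D": [2, 2, 2, 2],
    }
    groups = sharing_groups(["A", "B", "C", "D"], [("A", "B"), ("B", "C")])
    for group in groups:
        print(group, consensus_haplotype(genotypes, group))
```

Treating sharing as a graph and taking connected components is only one simple way to form the subsets; a real method would also need to detect the shared regions themselves and handle genotyping error.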

Institutes and consortiums

RENCI pioneered the establishment of a national institute, the WSSI, and two major consortiums, the iRODS Consortium and the NCDS.

WSSI

The NSF-funded WSSI was established in September 2012 as a collaboration between RENCI and SESYNC. The mission of the WSSI is to "enable and accelerate new transformative water science by concurrently transforming both the software culture and the research culture of the water science community."[26] [27] When it is fully operational, the WSSI aims to operate under the Open Community Engagement Model, which will integrate multiple NSF-funded initiatives (Synthesis Centers, Environmental Observatories, Software Sustainability Institutes, etc.) to distill data, ideas, theories, and methods and thereby provide synthetic information to address water science challenges that cannot be addressed using traditional disciplinary methods. The activities of the WSSI focus on the development of an open community and the promotion of open source and agile software development in order to accelerate transformative water science research. In addition to RENCI and SESYNC, current members include the Institute for the Environment at UNC-CH, University of Illinois Urbana-Champaign, University of Michigan, University of Maryland, NCSA, Red Hat, National Oceanic and Atmospheric Administration, and IBM.

NCDS

The NCDS was established by RENCI in February 2013 as a public/private partnership of leading universities, governmental and non-profit agencies, and businesses devoted to advancing data science, which the NCDS defines as "the systematic study of the organization and use of digital data in order to accelerate discovery, improve critical decision-making processes, and enable a data-driven economy."[28] The mission of the NCDS is "to provide the foundation needed to advance data science research, education, and economic opportunity." The NCDS works toward this mission by providing intellectual leadership and hosting numerous workshops, an academic-industry faculty fellowship, a Data Matters Summer Short Course series, student career events, invited talks, and summit meetings. In addition, the NCDS sponsors a Data Observatory, which provides a shared federated infrastructure for data sharing and computing. The NCDS also partners with numerous regional efforts in data science, including Datapalooza, Triangle Open Data Day, Pearl Hacks, Data4Decisions, Analytics Forward UnConference, and others. As of June 2015, the NCDS comprises 15 member organizations, with 8 based in North Carolina and 4 multinational companies with a strong presence in the Research Triangle Park, NC area.

iRODS Consortium

The iRODS Consortium was founded by RENCI in March 2013 and is headquartered at RENCI, as is the main iRODS development team. The mission of the consortium is "to ensure the sustainability of the integrated Rule-Oriented Data System (iRODS) and to further its adoption and continued evolution."[29] To achieve its mission, the consortium works to develop standards for the open source iRODS technology and its future development, promote advancements for the technology, and expand the user base. The consortium also supports the development of a mission-critical, production-level version of iRODS (currently v4.1). The iRODS Consortium includes a diverse membership of iRODS user organizations from around the world. Current consortium members include RENCI, the DICE Centers at UNC-CH and the University of California, San Diego, DataDirect Networks, Seagate Technology, Wellcome Trust Sanger Institute, EMC Corporation (EMC2), IBM, and NASA's Atmospheric Science Data Center.

Coordinates: 35.9396°N, 79.0188°W

Notes and References

  1. RENCI Website, renci.org.
  2. The University of North Carolina at Chapel Hill. (2004-2010). Annual Financial Reports. Chapel Hill, North Carolina: University of North Carolina at Chapel Hill. Available at: http://research.unc.edu/offices/vice-chancellor/about/annual-reports.
  3. The University of North Carolina at Chapel Hill. (2011-2014). Comprehensive Annual Financial Reports. Chapel Hill, North Carolina: University of North Carolina at Chapel Hill. Available at: http://finance.unc.edu/reports-and-data/financial-statements-archive.
  4. Baldin, I., Ruth, P., Xin, Y., Mandal, A., Chase, J., Tilson, J., & Prasad, S. (2013). Visions of a Future Internet: The ExoGENI Example. RENCI/NCDS, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA.
  5. Baldin, I., Xin, Y., Mandal, A., Ruth, P., Heerman, C., & Chase, J. (2012). ExoGENI: A Multi-Domain Infrastructure-as-a-Service Testbed. Proceedings of the 8th International ICST Conference on Testbeds and Research Infrastructures for the Development of Networks and Communities (TridentCom). Available at: https://www.cs.duke.edu/~chase/exogeni.pdf.
  6. Rajasekar, A., Moore, R., Hou, C., Lee, C. A., Marciano, R., de Torcy, A., Wan, M., Schroeder, W., Chen, S., Gilbert, L., Tooby, P., & Zhu, B. (2010a). iRODS Primer: Integrated Rule-Oriented Data System. Synthesis Lectures on Information Concepts, Retrieval, and Services, 2(1), 1–143. doi: 10.2200/s00233ed1v01y200912icr012.
  7. Rajasekar, A., Moore, R., Wan, M., Schroeder, W., & Hasan, A. (2010b). Applying Rules as Policies for Large-Scale Data Sharing. Intelligent Systems, Modelling and Simulation (ISMS), 2010 International Conference on Intelligent Systems, Modelling and Simulation. Available at: https://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=5416072.
  8. Schmitt, C., Wilhelmsen, K., Krishnamurthy, A., Ahalt, S. & Fecho, K. (2013b). Security and Privacy in the Era of Big Data: iRODS, a Technological Solution to the Challenge of Implementing Security and Privacy Policies and Procedures. RENCI, University of North Carolina at Chapel Hill. Available at: http://www.renci.org/wp-content/uploads/2014/02/0313WhitePaper-iRODS.pdf.
  9. Fortner, B., Ahalt, S., Coposky, J., Fecho, K., Krishnamurthy, A., Moore, R., Rajasekar, A., Schmitt, C., & Schroeder, W. (2014). Control Your Data: iRODS, the integrated Rule-Oriented Data System. RENCI, University of North Carolina at Chapel Hill. Available at: http://renci.org/wp-content/uploads/2014/07/0214WhitePaper-iRODS-2-FINAL-v6.pdf.
  10. Rajasekar, A., Kum, H., Crosas, M., Crabtree, J., Sankaran, S., Lander, H., Carsey, T., King, G., & Zhan, J. (2013a). The DataBridge. Science Journal, ASE. Available at: http://databridge.web.unc.edu/files/2013/01/DataBridge_Journal_Final.pdf.
  11. Rajasekar, A., Sankaran, S., Lander, H., Carsey, T., Crabtree, J., Kum, H., Crosas, M., King, G., & Zhan, J. (2013b). Sociometric Methods for Relevancy Analysis of Long Tail Science Data. ASE/IEEE International Conference on Big Data. Available at: http://databridge.web.unc.edu/files/2013/01/Databridge-ConferenceVersion-final.pdf.
  12. Crabtree, J. (2013). DataBridge: Building an e-Science Collaboration Environment Tool for Linking Diverse Datasets into a Sociometric Network. IASSIST 2013. Available at: http://databridge.web.unc.edu/files/2013/07/IASSIST2013DataBridgeFinal.ppt.
  13. Luettich, R. A., Westerink, J. J., & Scheffner, N. W. (1992). ADCIRC: an Advanced Three-dimensional Circulation Model for Shelves Coasts and Estuaries, Report 1: Theory and Methodology of ADCIRC-2DDI and ADCIRC-3DL. Dredging Research Program Technical Report DRP-92-6. U.S. Army Waterways Experiment Station, Vicksburg, MS. Available at: http://www.dtic.mil/dtic/tr/fulltext/u2/a261608.pdf.
  14. Westerink, J., Luettich, R., Feyen, J., Atkinson, J., Dawson, C., Roberts, H., Powell, M., Dunion, J., Kubatko, E., & Pourtaheri, H. (2008). A Basin- to Channel-scale Unstructured Grid Hurricane Storm Surge Model Applied to Southern Louisiana. Monthly Weather Review, 136(3), 833–864. doi: 10.1175/2007MWR1946.1. Bibcode: 2008MWRv..136..833W.
  15. Caron, B. (2011). EarthCube Governance Whitepaper: Realizing Expectable Returns on EarthCube Investments in Community Building and Democratic Governance. New Media Research Institute, Santa Barbara, CA. Available at: http://semanticommunity.info/@api/deki/files/13792/=004_Caron.pdf.
  16. Gil, Y.
  17. Tarboton, D. G., Idaszak, R., Horsburgh, J. S., Heard, J., Ames, D., Goodall, J. L., Band, L. E., Merwade, V., Couch, A., Arrigo, J., Hooper, R., Valentine, D., & Maidment, D. (2014a). A Resource Centric Approach for Advancing Collaboration Through Hydrologic Data and Model Sharing. 11th International Conference on Hydroinformatics, HIC 2014, New York City, USA. Available at: http://www.hic2014.org/proceedings/bitstream/handle/123456789/1539/1566.pdf?sequence=1&isAllowed=y.
  18. Tarboton, D. G., Idaszak, R., Horsburgh, J. S., Heard, J., Ames, D., Goodall, J. L., Band, L. E., Merwade, V., Couch, A., Arrigo, J., Hooper, R., Valentine, D., & Maidment, D. (2014b). HydroShare: Advancing Collaboration through Hydrologic Data and Model Sharing. In D. P. Ames, N. W. T. Quinn, and A. E. Rizzoli (eds), Proceedings of the 7th International Congress on Environmental Modelling and Software. San Diego, California: International Environmental Modelling and Software Society (iEMSs). Available at: http://www.iemss.org/sites/iemss2014/papers/iemss2014_submission_243.pdf.
  19. Heard, J., Tarboton, D., Idaszak, R., Horsburgh, J., Ames, D., Bedig, A., Castronova, A. M., Couch, A., Dash, P., Frisby, C., Gan, T., Goodall, J., Jackson, S., Livingston, S., Maidment, D., Martin, N., Miles, B., Mills, S., Sadler, J., Valentine, D., & Zhao, L. (2014). An Architectural Overview of Hydroshare, A Next-Generation Hydrologic Information System. 11th International Conference on Hydroinformatics, HIC 2014, New York City, USA. Available at: http://www.hic2014.org/proceedings/bitstream/handle/123456789/1536/1562.pdf?sequence=1&isAllowed=y.
  20. Owen, P., Ahalt, S., Berg, J., Coyle, J., Evans, J., Fecho, K., Gillis, D., Schmitt, C., Young, D., & Wilhelmsen, K. (2014). Technologies for Genomic Medicine: The GMW, A Genetic Medical Workflow Engine. RENCI, University of North Carolina at Chapel Hill. Available at: http://renci.org/technical-reports/tr-14-02-the-gmw-a-genetic-medical-workflow-engine.
  21. Reilly, J., Ahalt, S., Fecho, K., Jones, C., McGee, J., Roach, J., Schmitt, C., & Wilhelmsen, K. (2014). Technologies for Genomic Medicine: MaPSeq, A Computational and Analytical Workflow Manager for Downstream Genomic Sequencing. RENCI, University of North Carolina at Chapel Hill. Available at: http://renci.org/technical-reports/mapseq-computational-and-analytical-workflow-manager.
  22. Bizon, C., Ahalt, S., Fecho, K., Nassar, N., Schmitt, C., Scott, E., & Wilhelmsen, K. (2014). Technologies for Genomic Medicine: CANVAS and AnnoBot, Solutions for Genomic Variant Annotation. RENCI, University of North Carolina at Chapel Hill. Available at: http://renci.org/technical-reports/tr-14-04-canvas-and-annobot-solutions-for-genomic-variant-annotation.
  23. Webb, A. E. (2011). Linkage, Association, And Haplotype Analysis: A Spectrum Of Approaches To Elucidate The Genetic Influences Of Complex Human Traits (Doctoral dissertation). Retrieved from UNC electronic theses and dissertations collection. (cdm 3992)
  24. Schmitt, C., Shoffner, M., Owen P., Wang, X., Lamm, B., Mostafa, J., Barker, M., Krishnamurthy, A., Wilhelmsen, K., Ahalt, S., & Fecho, K. (2013a). Security and Privacy in the Era of Big Data: The SMW, a Technological Solution to the Challenge of Data Leakage. RENCI, University of North Carolina at Chapel Hill. Available at: http://www.renci.org/wp-content/uploads/2014/02/0213WhitePaper-SMW.pdf.
  25. Shoffner, M., Owens, P., Mostafa, J., Lamm, B., Wang, X., Schmitt, C. P., & Ahalt, S. P. (2014). The Secure Medical Research Workspace: An IT Infrastructure to Enable Secure Research on Clinical Data. Clinical and Translational Science, 6(3), 222–225. doi: 10.1111/cts.12060. PMC 3682797. PMID 23751029.
  26. Ahalt, S., Minsker, B., Tiemann, M., Band, L., Palmer, M., Idaszak, R., Lenhardt, C., & Whitton, M. (2013). Water Science Software Institute: An Open Source Engagement Process. Proceedings of the 5th International Workshop on Software Engineering for Computational Science and Engineering, 40-47. Available at: http://waters2i2.org/documents/2013/05/water-science-software-institute-an-open-source-engagement-approach.pdf.
  27. Lenhardt, W.C., Ahalt, S., Jones, M., Aukema, J., Hampton, S., Idaszak, R., Rebich-Hespanh, S., & Schildhauer, M. (2014). ISEES-WSSI Lessons for Sustainable Science Software from an Early Career Training Institute on Open Science Synthesis. figshare. Available at: http://files.figshare.com/1796332/ISEES_WSSI_TrainingInst_REV_20141115.pdf.
  28. Ahalt, C. S., Bizon, C., Evans, J., Erlich, Y., Ginsburgh, G. S., Krishnamurthy, A., Lange, L., Maltbie, D., Masys, D., Schmitt, C., & Wilhelmsen, K. (2014). Data to Discovery: Genomes to Health. A White Paper from the National Consortium for Data Science. RENCI, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA. doi: 10.7921.G03X84K4. Available at: http://data2discovery.org/dev/wp-content/uploads/2014/02/NCDS-Summit-2013.pdf.
  29. iRODS Consortium. By Laws. December 01, 2014. Available at: http://irods.org/wp-content/uploads/2014/12/iRODS_ByLaws_V1120114.pdf.