Data-centric computing explained

Data-centric computing is an emerging concept that has relevance in information architecture and data center design. It describes an information system where data is stored independently of the applications, so that applications can be upgraded without costly and complicated data migration. This is a radical shift in information systems that will be needed to address organizational needs for storing, retrieving, moving and processing exponentially growing data sets.[1]

Background

Traditional information system architectures are based on an application-centric mindset. Applications were installed, kept relatively static, updated infrequently, and used a fixed set of compute, storage, and networking elements to cope with a relatively small set of structured data.[2]

This approach functioned well for decades, but more recently data growth, particularly unstructured data growth, has put new pressures on organizations, information architectures and data center infrastructure. 90% of new data is unstructured and, according to a 2018 report, 59% of organizations manage over 10 billion files and objects[3] spread over large numbers of servers and storage nodes. Organizations are struggling to cope with exponential data growth while seeking better approaches to extracting insights from that data using services including Big Data analytics and machine learning. However, existing architectures are not built to address service requirements at petabyte scale and beyond without significant performance limits.[4]

Traditional architectures fail to fully store, retrieve, move and utilize that data because of limitations in hardware infrastructure as well as in application-centric systems design, development, and management.[5]

Data-centric workloads

There are two problems data-centric computing aims to address.

  1. Organizations need to utilize all available data, but traditional applications are not sufficiently agile or flexible. A shift toward constant service innovation, supported by emerging approaches to service delivery (including microservices and containers), opens new possibilities that step away from the traditional application-centric mindset.
  2. Existing limits of data center hardware also restrict complete movement, management and utilization of unstructured data sets. Conventional CPUs impede performance because they do not include the specialized capabilities needed for storage, networking, and analysis. Slow storage, including hard drives and SAS/SATA solid state drives accessed over the network, can reduce performance and limit data accessibility.[6] New hardware capabilities are needed.
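
To give a rough sense of why data movement becomes a limit at petabyte scale, the following sketch estimates how long it takes to move a data set over a network link. It is a back-of-the-envelope illustration only: the function name, the link speeds and the efficiency parameter are assumptions chosen for the example and do not come from the cited sources.

```python
def transfer_time_hours(data_bytes: float,
                        link_gbit_per_s: float,
                        efficiency: float = 1.0) -> float:
    """Estimate the hours needed to move data_bytes over one network link.

    link_gbit_per_s is the nominal link speed in gigabits per second.
    efficiency (0 < efficiency <= 1) is an assumed fraction of the nominal
    speed that is actually achieved; real values depend on the protocol,
    the storage devices and contention.
    """
    if data_bytes < 0:
        raise ValueError("data_bytes must be non-negative")
    if link_gbit_per_s <= 0:
        raise ValueError("link_gbit_per_s must be positive")
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in the range (0, 1]")

    # Convert gigabits per second to bytes per second.
    bytes_per_second = link_gbit_per_s * 1e9 / 8 * efficiency
    seconds = data_bytes / bytes_per_second
    return seconds / 3600


PETABYTE = 1e15  # decimal petabyte, in bytes

if __name__ == "__main__":
    # Illustrative link speeds only.
    for speed in (10, 100):
        hours = transfer_time_hours(PETABYTE, speed)
        print(f"1 PB over {speed} Gbit/s: {hours:.1f} hours")
```

Under these assumptions, even an ideal 10 Gbit/s link needs more than nine days to move a single petabyte, which suggests why moving computation to the data can be more practical than moving the data to the application.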

Data-centric computing

Data-centric computing is an approach that merges innovative hardware and software to treat data, not applications, as the permanent source of value.[7] Data-centric computing aims to rethink both hardware and software to extract as much value as possible from existing and new data sources. It increases agility by prioritizing data transfer and data computation over static application performance and resilience.

Data-centric hardware and software

To meet the goals of data-centric computing, data center hardware infrastructure will evolve to address massive scale, rapid growth, the need for very high performance data movement, and extensive calculation requirements.

On the software side, data-centric computing accelerates the disappearance of traditional static applications.[11] Applications become short-lived and are constantly added, updated, or removed as algorithms come and go. Software is redesigned to conduct analysis on all available data instead of subsets. Microservices visit data, conduct calculations and report their results at speeds beyond conventional approaches.
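
The idea of long-lived data and short-lived services can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference design: the `DataStore` class, the `run_service` helper and the three example services are hypothetical names invented for this example.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]
Service = Callable[[List[Record]], Any]


@dataclass
class DataStore:
    """Hypothetical long-lived store that belongs to no single application."""
    _records: List[Record] = field(default_factory=list)

    def add(self, record: Record) -> None:
        # Store a copy so later changes by the caller do not leak in.
        self._records.append(dict(record))

    def scan(self) -> List[Record]:
        # Hand out copies so a service cannot alter the shared data.
        return [dict(r) for r in self._records]


def run_service(store: DataStore, service: Service) -> Any:
    """Let a short-lived service visit the data and return its result."""
    return service(store.scan())


# Services are plain functions: they can be added, replaced or removed
# without touching or migrating the stored data.
def count_records(records: List[Record]) -> int:
    return len(records)


def total_size(records: List[Record]) -> int:
    return sum(r.get("size", 0) for r in records)


def size_by_extension(records: List[Record]) -> Dict[str, int]:
    # A service added later; it reads the same records as the others.
    totals: Dict[str, int] = {}
    for r in records:
        ext = str(r["name"]).rsplit(".", 1)[-1]
        totals[ext] = totals.get(ext, 0) + r.get("size", 0)
    return totals


store = DataStore()
store.add({"name": "a.log", "size": 100})
store.add({"name": "b.img", "size": 250})
store.add({"name": "c.tmp"})

if __name__ == "__main__":
    for service in (count_records, total_size, size_by_extension):
        print(service.__name__, run_service(store, service))
```

In this sketch the store outlives every service, and introducing `size_by_extension` requires no change to the records already held, which mirrors the claim that applications can come and go while the data remains the permanent asset.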

Notes and References

  1. Web site: The Data-Centric Revolution. TDAN.com. September 2015. en-US. 2019-12-07.
  2. Web site: The Emergence Of Data-Centric Computing. Bhageshpur. Kiran. 2016-10-06. The Next Platform. en-US. 2019-12-07.
  3. Web site: 2018 State of Unstructured Data Management. Bhageshpur. Kiran. Igneous. dead. https://web.archive.org/web/20200718190301/https://www.igneous.io/hubfs/State%20of%20Unstructured%20Data%20Management%20Report.pdf?hsCtaTracking=b4b69840-c05a-47af-83a9-2ff709e58fee%7Ca2058246-b80b-4493-8c3a-7a86ed971b8c. 2020-07-18. 2019-12-07.
  4. Web site: Requirements for Unstructured Data at Petabyte Scale. 2019-10-14. StorageSwiss.com - The Home of Storage Switzerland. en. 2019-12-07.
  5. Web site: Data-Centric Computing with the Netezza Architecture. George S. Davidson, Kevin W. Boyack, Ron A. Zacharski, Stephen C. Helmreich, and Jim R. Cowie. April 2006. sandia.gov. 2019-12-07.
  6. Web site: The Network is the New Storage Bottleneck. 2016-11-10. Datanami. 2019-12-07.
  7. Web site: Data-Centric Manifesto. datacentricmanifesto.org. 2019-12-07.
  8. Web site: The foundation of the computing industry's innovation is faltering. What can replace it?. Simonite. Tom. MIT Technology Review. en-US. 2019-12-07.
  9. Web site: DPU: Data Processing Unit Programmable Processor. Fungible. en-US. 2019-12-07. 2020-08-05. https://web.archive.org/web/20200805141038/https://www.fungible.com/tech/dpu-a-new-category-of-programmable-processor/. dead.
  10. Web site: When You're Implementing NVMe Over Fabrics, the Fabric Really Matters. Kieran. Mike. 2019-03-21. NetApp Blog. en-US. 2019-12-07.
  11. Web site: Microservices Momentum Accelerates. 2018-05-10. DevOps.com. en-US. 2019-12-07.