In-memory processing explained

The term is used for two different things:

  1. In computer science, in-memory processing, also called processing-in-memory (PIM), is a computer architecture in which operations are performed directly in the memory that holds the data, rather than requiring the data to be transferred to CPU registers first.[1] This may reduce the power and performance cost of moving data between the processor and the main memory.
  2. In software engineering, in-memory processing is a software architecture where a database is kept entirely in random-access memory (RAM) or flash memory so that usual accesses, in particular read or query operations, do not require access to disk storage.[2] This may allow faster data operations such as "joins", and faster reporting and decision-making in business.[3]

Extremely large datasets may be divided between co-operating systems as in-memory data grids.
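
As a rough illustration of how a dataset might be divided between co-operating systems, the following minimal sketch assigns each key to one of several nodes by hashing it. The node names, the `Grid` class and the use of CRC32 as the hash are assumptions made for this example only; real data grids also handle replication, rebalancing and network transport, none of which is shown here.

```python
from collections import defaultdict
import zlib

NODES = ["node-a", "node-b", "node-c"]  # hypothetical grid members


def owner(key: str) -> str:
    """Pick the node that holds a key, using a stable hash of the key."""
    return NODES[zlib.crc32(key.encode("utf-8")) % len(NODES)]


class Grid:
    """Toy in-memory data grid: one dict per node, all held in RAM."""

    def __init__(self) -> None:
        self.partitions = defaultdict(dict)

    def put(self, key: str, value: int) -> None:
        self.partitions[owner(key)][key] = value

    def get(self, key: str) -> int:
        return self.partitions[owner(key)][key]

    def total(self) -> int:
        # Each partition is summed separately; only the partial sums are combined.
        return sum(sum(part.values()) for part in self.partitions.values())


grid = Grid()
for i in range(1000):
    grid.put(f"order-{i}", i)
```

Because the owner of a key is computed from the key itself, a lookup goes straight to one partition, while an aggregate such as `total()` can be computed per partition and then combined.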

Hardware (PIM)

PIM could be implemented by:[4]

Application of in-memory technology in everyday life

In-memory processing techniques are frequently used by modern smartphones and tablets to improve application performance. This can result in speedier app loading times and more enjoyable user experiences.

Software

Disk-based data access

Data structures

With disk-based technology, data is loaded onto the computer's hard disk in the form of multiple tables and multi-dimensional structures against which queries are run. Disk-based technologies are often relational database management systems (RDBMS) based on the structured query language (SQL), such as SQL Server, MySQL, Oracle and many others. RDBMS are designed for the requirements of transactional processing. In a database that must support insertions and updates as well as aggregations, the joins typical of business intelligence (BI) solutions are usually very slow. Another drawback is that SQL is designed to fetch whole rows of data efficiently, while BI queries usually fetch only parts of rows and involve heavy calculations.

To improve query performance, multidimensional databases or OLAP cubes - also called multidimensional online analytical processing (MOLAP) - may be constructed. Designing a cube may be an elaborate and lengthy process, and changing the cube's structure to adapt to dynamically changing business needs may be cumbersome. Cubes are pre-populated with data to answer specific queries and although they increase performance, they are still not optimal for answering all ad-hoc queries.[9]
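
The trade-off described above can be sketched in a few lines. In this hypothetical example, the fact rows, the dimension names and the helper functions are all invented for illustration: a "cube" is pre-populated with sales totals by region and year, so that anticipated query is a single lookup, while a question about a dimension the cube does not contain has to fall back to scanning the detail rows.

```python
from collections import defaultdict

# Hypothetical fact rows: (region, product, year, sales)
FACTS = [
    ("EU", "laptop", 2022, 100),
    ("EU", "phone", 2022, 50),
    ("US", "laptop", 2022, 200),
    ("US", "laptop", 2023, 300),
    ("EU", "phone", 2023, 70),
]


def build_cube(facts):
    """Pre-aggregate sales by (region, year); the product dimension is dropped."""
    cube = defaultdict(int)
    for region, _product, year, sales in facts:
        cube[(region, year)] += sales
    return dict(cube)


CUBE = build_cube(FACTS)


def sales_by_region_year(region, year):
    # Anticipated query: answered by a single dictionary lookup.
    return CUBE.get((region, year), 0)


def sales_by_product(product):
    # Ad-hoc query: the cube has no product dimension, so scan the facts.
    return sum(s for _r, p, _y, s in FACTS if p == product)
```

Adding the product dimension to the cube would make the second query fast as well, but only at the cost of redesigning and rebuilding the cube, which is the maintenance burden the paragraph above refers to.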

Information technology (IT) staff may spend substantial development time on optimizing databases, constructing indexes and aggregates, designing cubes and star schemas, data modeling, and query analysis.[10]

Processing speed

Reading data from the hard disk is much slower (possibly hundreds of times slower) than reading the same data from RAM, and performance degrades severely when large volumes of data are analyzed. Though SQL is a very powerful tool, arbitrarily complex queries against a disk-based implementation take a relatively long time to execute and often degrade the performance of transactional processing. To obtain results within an acceptable response time, many data warehouses have been designed to pre-calculate summaries and to answer specific queries only. Optimized aggregation algorithms are needed to increase performance.

In-memory data access

With both in-memory databases and data grids, all information is initially loaded into RAM or flash memory instead of hard disks. With a data grid, processing can be three orders of magnitude faster than with relational databases, whose advanced functionality, such as ACID guarantees, comes at the expense of performance. The arrival of column-centric databases, which store similar information together, allows data to be stored more efficiently and with greater compression ratios. This allows huge amounts of data to be stored in the same physical space, reducing the amount of memory needed to perform a query and increasing processing speed. Many users and software vendors have integrated flash memory into their systems so that they can scale to larger data sets more economically.

Users query the data loaded into the system's memory, thereby avoiding slower database access and performance bottlenecks. This differs from caching, a very widely used method of speeding up query performance, in that caches hold only subsets of very specific, pre-defined, organized data. With in-memory tools, the data available for analysis can be as large as a data mart or a small data warehouse held entirely in memory. It can be accessed quickly by multiple concurrent users or applications at a detailed level, and it offers the potential for enhanced analytics and for scaling and speeding up an application.

Theoretically, the improvement in data access speed is 10,000 to 1,000,000 times compared to the disk. In-memory access also minimizes the need for performance tuning by IT staff and provides faster service for end users.
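
To make the point about column-centric storage and compression more concrete, here is a minimal sketch that turns a small row-oriented table into one sequence per column and run-length encodes the columns with repeated values. The table contents and the `rle_encode` and `rle_decode` helpers are hypothetical, and run-length encoding is used only as one simple example of a compression scheme that benefits from similar values being stored together; it is not claimed to be what any particular product uses.

```python
from itertools import groupby

# Hypothetical table, stored row by row.
ROWS = [
    ("DE", "open", 10),
    ("DE", "open", 20),
    ("DE", "closed", 30),
    ("FR", "closed", 40),
    ("FR", "closed", 50),
]

# Column-oriented layout: one sequence per column.
country, status, amount = (list(col) for col in zip(*ROWS))


def rle_encode(values):
    """Run-length encode a column as (value, run_length) pairs."""
    return [(v, len(list(run))) for v, run in groupby(values)]


def rle_decode(pairs):
    """Expand (value, run_length) pairs back into the original column."""
    return [v for v, n in pairs for _ in range(n)]


country_rle = rle_encode(country)  # [("DE", 3), ("FR", 2)]
status_rle = rle_encode(status)    # [("open", 2), ("closed", 3)]

# An aggregate over one column touches only that column's values.
total_amount = sum(amount)
```

Five country values collapse into two pairs here because equal values sit next to each other in the column; in a row-oriented layout the same values would be interleaved with the other fields of each row.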

Advantages of in-memory processing technology

Certain developments in computer technology and business needs have tended to increase the relative advantages of in-memory technology.[11]

Application in business

A range of in-memory products provide the ability to connect to existing data sources and give access to visually rich interactive dashboards. This allows business analysts and end users to create custom reports and queries without much training or expertise. Easy navigation and the ability to modify queries on the fly benefit many users. Since these dashboards can be populated with fresh data, users have access to real-time data and can create reports within minutes. In-memory processing may be of particular benefit in call centers and warehouse management.

With in-memory processing, the source database is queried only once instead of being accessed every time a query is run, thereby eliminating repetitive processing and reducing the burden on database servers. If the in-memory database is scheduled to be populated overnight, the database servers can be used for operational purposes during peak hours.
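
The load-once pattern can be sketched with Python's standard `sqlite3` module, which supports in-memory databases through the special `":memory:"` name and can copy one database into another with `Connection.backup`. The `calls` table and its contents are hypothetical, and for the sake of a self-contained example the "source" here is itself an in-memory SQLite database; in practice it would be a disk-based or remote operational system.

```python
import sqlite3

# Stand-in for the operational source database (hypothetical schema and data).
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE calls (agent TEXT, seconds INTEGER)")
source.executemany(
    "INSERT INTO calls VALUES (?, ?)",
    [("ana", 120), ("ana", 60), ("ben", 300)],
)
source.commit()

# Scheduled load: copy the source once into a separate in-memory database.
analytics = sqlite3.connect(":memory:")
source.backup(analytics)
source.close()  # later queries no longer touch the source

# Repeated reporting queries run against the in-memory copy only.
per_agent = dict(
    analytics.execute("SELECT agent, SUM(seconds) FROM calls GROUP BY agent")
)
longest = analytics.execute("SELECT MAX(seconds) FROM calls").fetchone()[0]
```

After the copy, the source connection is closed and both reporting queries are still answered, which shows that only the initial load placed any work on the source.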

Adoption of in-memory technology

With a large number of users, a large amount of RAM is needed for an in-memory configuration, which in turn affects the hardware costs. The investment is more likely to be suitable in situations where speed of query response is a high priority, and where there is significant growth in data volume and increase in demand for reporting facilities; it may still not be cost-effective where information is not subject to rapid change. Security is another consideration, as in-memory tools expose huge amounts of data to end users. Makers advise ensuring that only authorized users are given access to the data.

Notes and References

  1. Ghose, S. (November 2019). "Processing-in-memory: A workload-driven perspective". IBM Journal of Research and Development. 63 (6): 3:1–19. doi:10.1147/JRD.2019.2934048. S2CID 202025511.
  2. Zhang, Hao; Chen, Gang; Ooi, Beng Chin; Tan, Kian-Lee; Zhang, Meihui (July 2015). "In-Memory Big Data Management and Processing: A Survey". IEEE Transactions on Knowledge and Data Engineering. 27 (7): 1920–1948. doi:10.1109/TKDE.2015.2427795.
  3. Plattner, Hasso; Zeier, Alexander (2012). In-Memory Data Management: Technology and Applications. Springer Science & Business Media. ISBN 9783642295744.
  4. "Processing-in-Memory Course: Lecture 1: Exploring the PIM Paradigm for Future Systems - Spring 2022". 10 March 2022.
  5. Park, Kate (2023-07-27). "Samsung extends cut in memory chip production, will focus on high-end AI chips instead". TechCrunch. Retrieved 2023-12-05.
  6. Tan, Kian-Lee; Cai, Qingchao; Ooi, Beng Chin; Wong, Weng-Fai; Yao, Chang; Zhang, Hao (2015-08-12). "In-memory Databases: Challenges and Opportunities From Software and Hardware Perspectives". ACM SIGMOD Record. 44 (2): 35–40. doi:10.1145/2814710.2814717. S2CID 14238437. ISSN 0163-5808.
  7. Fatemieh, Seyed Erfan; Reshadinezhad, Mohammad Reza; Taherinejad, Nima (2022). "Approximate In-Memory Computing using Memristive IMPLY Logic and its Application to Image Processing". 2022 IEEE International Symposium on Circuits and Systems (ISCAS). pp. 3115–3119. doi:10.1109/ISCAS48785.2022.9937475. ISBN 978-1-6654-8485-5. S2CID 253462291. https://ieeexplore.ieee.org/document/9937475. Retrieved 2023-12-05.
  8. "What is processing in memory (PIM) and how does it work?". Business Analytics. Retrieved 2023-12-05.
  9. Gill, John (2007). "Shifting the BI Paradigm with In-Memory Database Technologies". Business Intelligence Journal. 12 (2): 58–62. Archived from the original on 2015-09-24: https://web.archive.org/web/20150924203158/http://www.highbeam.com/doc/1P3-1636785121.html.
  10. Earls, A. (2011). Tips on evaluating, deploying and managing in-memory analytics tools. Tableau. Archived on 2012-04-25: https://web.archive.org/web/20120425232535/http://www.analyticsearches.com/site/files/776/66977/259607/579091/In-Memory_Analytics_11.10.11.pdf.
  11. "In_memory Analytics". yellowfin. p. 6.
  12. Kote, Sparjan. "In-memory computing in Business Intelligence". Archived from the original on April 24, 2011: https://web.archive.org/web/20110424013629/http://www.infosysblogs.com/oracle/2011/03/in-memory_computing_in_busines.html.
  13. "Survey Analysis: Why BI and Analytics Adoption Remains Low and How to Expand Its Reach". Gartner. Retrieved 2023-12-05.
  14. Upchurch, E.; Sterling, T.; Brockman, J. (2004). "Analysis and Modeling of Advanced PIM Architecture Design Tradeoffs". Proceedings of the ACM/IEEE SC2004 Conference. Pittsburgh, PA, USA: IEEE. p. 12. doi:10.1109/SC.2004.11. ISBN 978-0-7695-2153-4. S2CID 9089044. https://ieeexplore.ieee.org/document/1392942.