Computational RAM explained

Computational RAM (C-RAM) is random-access memory with processing elements integrated on the same chip. This enables C-RAM to be used as a SIMD computer. It can also be used to make more efficient use of the memory bandwidth available within a memory chip. The general technique of doing computations in memory is called Processing-In-Memory (PIM).
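The SIMD idea can be illustrated with a toy software model. The sketch below is not a description of any real C-RAM chip: the class name `ToyCRAM`, its methods, and the one-processing-element-per-column layout are assumptions made only for illustration. It shows a single instruction being applied to every column of a memory row at once, without the data leaving the "memory array".

```python
from typing import List


class ToyCRAM:
    """Hypothetical toy model: a memory array with one processing element per column."""

    def __init__(self, rows: int, cols: int) -> None:
        self.cols = cols
        self.mem: List[List[int]] = [[0] * cols for _ in range(rows)]

    def write_row(self, row: int, values: List[int]) -> None:
        """Store one value per column in the given row."""
        if len(values) != self.cols:
            raise ValueError("a row needs exactly one value per column")
        self.mem[row] = list(values)

    def simd_add(self, src_a: int, src_b: int, dst: int) -> None:
        """Every column's processing element runs the same add on its own operands."""
        self.mem[dst] = [a + b for a, b in zip(self.mem[src_a], self.mem[src_b])]


cram = ToyCRAM(rows=3, cols=4)
cram.write_row(0, [1, 2, 3, 4])
cram.write_row(1, [10, 20, 30, 40])
cram.simd_add(0, 1, 2)   # one instruction, four column-local additions
print(cram.mem[2])       # [11, 22, 33, 44]
```

In this model the width of a row, not the width of an external bus, sets how many operands are processed per instruction.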

Overview

The most influential implementations of computational RAM came from The Berkeley IRAM Project. Vector IRAM (V-IRAM) combines DRAM with a vector processor integrated on the same chip.[1]

Reconfigurable Architecture DRAM (RADram) is DRAM with reconfigurable computing FPGA logic elements integrated on the same chip.[2] SimpleScalar simulations show that RADram (in a system with a conventional processor) can give orders of magnitude better performance on some problems than traditional DRAM (in a system with the same processor).

Some embarrassingly parallel computational problems are already limited by the von Neumann bottleneck between the CPU and the DRAM. Some researchers expect that, for the same total cost, a machine built from computational RAM will run orders of magnitude faster than a traditional general-purpose computer on these kinds of problems.[3]
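For a problem that is purely bandwidth-bound, the reasoning can be sketched with a simple model in which run time is the data volume divided by the available bandwidth. The function names and all numbers below are hypothetical placeholders chosen for illustration, not measurements of any real system.

```python
def transfer_time_s(num_bytes: float, bandwidth_bytes_per_s: float) -> float:
    """Time to stream num_bytes once through a link of the given bandwidth."""
    if bandwidth_bytes_per_s <= 0:
        raise ValueError("bandwidth must be positive")
    return num_bytes / bandwidth_bytes_per_s


def bandwidth_bound_speedup(offchip_bw: float, internal_bw: float) -> float:
    """Speedup of a purely bandwidth-bound kernel equals the bandwidth ratio."""
    if offchip_bw <= 0 or internal_bw <= 0:
        raise ValueError("bandwidths must be positive")
    return internal_bw / offchip_bw


# Illustrative placeholder values only (assumptions, not data):
DATA_BYTES = 1e9          # working set streamed once
OFFCHIP_BW = 10e9         # CPU <-> DRAM bus, bytes per second
INTERNAL_BW = 1e12        # aggregate bandwidth inside the memory chips

t_cpu = transfer_time_s(DATA_BYTES, OFFCHIP_BW)
t_pim = transfer_time_s(DATA_BYTES, INTERNAL_BW)
ratio = bandwidth_bound_speedup(OFFCHIP_BW, INTERNAL_BW)
print(f"off-chip: {t_cpu:.3f} s, in-memory: {t_pim:.3f} s, speedup: {ratio:.0f}x")
```

The model ignores compute time, latency, and communication between memory chips, so it bounds the benefit only for workloads where each processing element works on data stored next to it.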

As of 2011, the "DRAM process" (few layers; optimized for high capacitance) and the "CPU process" (optimized for high frequency; typically twice as many BEOL layers as DRAM) are distinct enough that there are three approaches to computational RAM. Because each additional layer reduces yield and increases manufacturing cost, chips built on a CPU process are relatively expensive per square millimeter compared to DRAM. The three approaches are:

Some CPUs designed to be built on a DRAM process technology (rather than a "CPU" or "logic" process technology specifically optimized for CPUs) include The Berkeley IRAM Project, TOMI Technology[4][5] and the AT&T DSP1.

Because a memory bus to off-chip memory has many times the capacitance of an on-chip memory bus, a system with separate DRAM and CPU chips can have several times the energy consumption of an IRAM system with the same computer performance.
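The link between capacitance and energy follows from the standard expression for the energy dissipated when a capacitive load is switched. As a rough sketch, assuming both buses use the same voltage swing V and carry the same number of signal transitions (assumptions not stated in the text above), the energy ratio reduces to the capacitance ratio:

```latex
\[
E_{\text{transition}} \approx \tfrac{1}{2}\, C V^{2},
\qquad
\frac{E_{\text{off-chip}}}{E_{\text{on-chip}}}
\approx
\frac{C_{\text{off-chip}}}{C_{\text{on-chip}}}
\quad \text{(same } V \text{, same number of transitions).}
\]
```

Bus energy is only part of total system energy, which is consistent with a capacitance ratio of "many times" yielding a system-level difference of "several times".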

Because computational DRAM is expected to run hotter than traditional DRAM, and increased chip temperatures result in faster charge leakage from the DRAM storage cells, computational DRAM is expected to require more frequent DRAM refresh.

Processor-in-/near-memory

A processor-in-/near-memory (PINM) is a computer processor (CPU) tightly coupled to memory, generally on the same silicon chip.

The chief goal of merging the processing and memory components in this way is to reduce memory latency and increase bandwidth. In addition, reducing the distance that data needs to be moved reduces the power requirements of a system.[6] Much of the complexity (and hence power consumption) in current processors stems from strategies for avoiding memory stalls.

Examples

In the 1980s, a tiny CPU that executed FORTH was fabricated into a DRAM chip to improve PUSH and POP. Because FORTH is a stack-oriented programming language, this improved its efficiency.

The transputer also had a large on-chip memory for a device made in the early 1980s, making it essentially a processor-in-memory.

Notable PIM projects include the Berkeley IRAM project (IRAM) at the University of California, Berkeley[7] and the University of Notre Dame PIM[8] effort.

DRAM-based PIM Taxonomy

DRAM-based near-memory and in-memory designs can be categorized into four groups:

Notes and References

  1. Christoforos E. Kozyrakis, Stylianos Perissakis, David Patterson, Thomas Anderson, et al. "Scalable Processors in the Billion-Transistor Era: IRAM". IEEE Computer (magazine). 1997. Says "Vector IRAM ... can operate as a parallel built-in self-test engine for the memory array, significantly reducing the DRAM testing time and the associated cost."
  2. Mark Oskin, Frederic T. Chong, and Timothy Sherwood. "Active Pages: A Computation Model for Intelligent Memory". 1998.
  3. [Daniel J. Bernstein]
  4. https://www.venraytechnology.com/home.htm "TOMI the milliwatt microprocessor"
  5. Yong-Bin Kim and Tom W. Chen. "Assessing Merged DRAM/Logic Technology". 1998. Archived copy (original link dead): https://web.archive.org/web/20110725012704/http://www.ece.neu.edu/groups/hpvlsi/publication/ASSESSING_MERDRAM_ELSEVIER.pdf (archived 2011-07-25; 2011-11-27). https://ieeexplore.ieee.org/document/541917/;jsessionid=92FEF4332BEA3A9443059A1E2D895BBA?arnumber=541917
  6. "GYRFALCON STARTS SHIPPING AI CHIP". electronics-lab. 5 December 2018. 2018-10-10.
  7. http://iram.cs.berkeley.edu/ IRAM
  8. "PIM". 2015-05-26. Archived copy (original link dead): https://web.archive.org/web/20151109094746/http://www.cse.nd.edu/%7Epim/ (archived 2015-11-09).
  9. Hadi Asghari-Moghaddam, et al., "Chameleon: Versatile and practical near-DRAM acceleration architecture for large memory systems".
  10. Liu Ke, et al., "RecNMP: Accelerating Personalized Recommendation with Near-Memory Processing".
  11. Dongping Zhang, et al., "TOP-PIM: Throughput-oriented programmable processing in memory".
  12. Sukhan Lee, et al., "Hardware Architecture and Software Stack for PIM Based on Commercial DRAM Technology : Industrial Product".
  13. Shuangchen Li, et al., "DRISA: A dram-based reconfigurable in-situ accelerator".
  14. Marzieh Lenjani, et al., "Fulcrum: a Simplified Control and Access Mechanism toward Flexible and Practical In-situ Accelerators".