Five-minute rule explained

In computer science, the five-minute rule is a rule of thumb for deciding whether a data item should be kept in memory or stored on disk and read back into memory when required. It was first formulated by Jim Gray and Gianfranco Putzolu in 1985, and was revised in 1997 and 2007^[1] to reflect changes in the relative cost and performance of memory and persistent storage.

The rule is as follows:

The 5-minute random rule: cache randomly accessed disk pages that are re-used every 5 minutes or less.

Gray also issued a counterpart one-minute rule for sequential access:^[2]

The 1-minute rule: cache sequentially accessed disk pages that are re-used every 1 minute or less.

Although the 5-minute rule was invented in the realm of databases, it has also been applied elsewhere, for example, in Network File System cache capacity planning.^[3]

The original 5-minute rule was derived from the following cost-benefit computation:^[1]

BreakEvenIntervalinSeconds = (PagesPerMBofRAM / AccessesPerSecondPerDisk) × (PricePerDiskDrive / PricePerMBofRAM)

Applying this formula to 2007 data yields a break-even interval of approximately 90 minutes for magnetic-disk-to-DRAM caching, 15 minutes for SSD-to-DRAM caching and 2 hours for disk-to-SSD caching. The disk-to-DRAM interval thus fell somewhat short of the "five-hour rule" that Gray and Putzolu had anticipated in 1987 for RAM and disks in 2007.^[1]
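As a minimal sketch, the break-even formula above can be evaluated directly. The function name and the input values below are illustrative assumptions, not figures taken from this article: they describe a hypothetical system with 8 KB pages (128 pages per MB), a $2,000 disk drive delivering 64 random accesses per second, and DRAM at $15 per MB.

```python
def break_even_interval_seconds(pages_per_mb_of_ram,
                                accesses_per_second_per_disk,
                                price_per_disk_drive,
                                price_per_mb_of_ram):
    """Evaluate the break-even formula term by term.

    A page re-used more often than the returned interval is cheaper to
    keep in RAM; a page re-used less often is cheaper to re-read from disk.
    """
    if accesses_per_second_per_disk <= 0 or price_per_mb_of_ram <= 0:
        raise ValueError("access rate and RAM price must be positive")
    technology_ratio = pages_per_mb_of_ram / accesses_per_second_per_disk
    economic_ratio = price_per_disk_drive / price_per_mb_of_ram
    return technology_ratio * economic_ratio


# Hypothetical inputs chosen for illustration only (assumptions).
interval = break_even_interval_seconds(
    pages_per_mb_of_ram=128,          # 8 KB pages
    accesses_per_second_per_disk=64,
    price_per_disk_drive=2000.0,      # dollars
    price_per_mb_of_ram=15.0,         # dollars
)
print(f"{interval:.0f} s (~{interval / 60:.1f} min)")  # 267 s (~4.4 min)
```

With these assumed inputs the interval comes out near five minutes. Changing any one input shows how the break-even point moves as hardware prices and access rates change.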

According to calculations by NetApp engineer David Dale, as reported in The Register, the figures for disk-to-DRAM caching in 2008 were as follows: "The 50KB page break-even was five minutes, the 4KB one was one hour and the 1KB one was five hours. There needed to be a 50-fold increase in page size to cache for break-even at five minutes." Regarding disk-to-SSD caching in 2010, the same source reported that "A 250KB page break even with SLC was five minutes, but five hours with a 4KB page size. It was five minutes with a 625KB page size with MLC flash and 13 hours with a 4KB MLC page size."^[4]
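The dependence on page size follows from the PagesPerMBofRAM term of the break-even formula given earlier: larger pages mean fewer pages per megabyte, and so a shorter break-even interval. The sketch below illustrates this under a simplifying assumption, namely that the number of accesses per second a device can serve does not depend on page size. The function name and hardware values are hypothetical and are not the inputs behind the quoted figures.

```python
def break_even_minutes(page_size_kb, accesses_per_second,
                       price_per_drive, price_per_mb_of_ram):
    """Break-even interval in minutes for a given page size in KB."""
    pages_per_mb_of_ram = 1024 / page_size_kb
    seconds = (pages_per_mb_of_ram / accesses_per_second) * (
        price_per_drive / price_per_mb_of_ram
    )
    return seconds / 60


# Hypothetical hardware parameters (assumptions for illustration only).
ACCESSES_PER_SECOND = 100
PRICE_PER_DRIVE = 100.0       # dollars
PRICE_PER_MB_OF_RAM = 0.05    # dollars

# Compute the interval for several page sizes, holding the hardware fixed.
intervals = {
    page_kb: break_even_minutes(page_kb, ACCESSES_PER_SECOND,
                                PRICE_PER_DRIVE, PRICE_PER_MB_OF_RAM)
    for page_kb in (1, 4, 50)
}

for page_kb, minutes in intervals.items():
    print(f"{page_kb:>3} KB page: break-even after {minutes:.1f} min")
```

Under this assumption the interval is inversely proportional to page size, so a 50-fold larger page gives a 50-fold shorter break-even interval. Real devices take longer to transfer larger pages, so measured figures need not follow this proportionality exactly.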

In 2000, Gray and Shenoy applied a similar calculation for web page caching and concluded that a browser should "cache web pages if there is any chance they will be re-referenced within their lifetime."^[5]

Notes and References

[1] Free version in ACM Queue, September 2008.
[2] René J. Chevance. Server Architectures: Multiprocessors, Clusters, Parallel Systems, Web Servers, Storage Solutions. Digital Press, 2004. ISBN 978-0-08-049229-2. p. 542.
[3] Gian-Paolo D. Musumeci, Mike Loukides. System Performance Tuning. O'Reilly Media, Inc., 2002. ISBN 978-0-596-55204-6. p. 263.
[4] Flash and the five-minute rule. The Register.
[5] Jim Gray, Prashant Shenoy. "Rules of Thumb in Data Engineering". MS-TR-99-100.