CAP theorem explained

In database theory, the CAP theorem, also named Brewer's theorem after computer scientist Eric Brewer, states that any distributed data store can provide at most two of the following three guarantees:^[1] ^[2] ^[3]

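Consistency: Every read receives the most recent write or an error. Note that consistency as defined in the CAP theorem is quite different from the consistency guaranteed in ACID database transactions.^[4]
Availability: Every request received by a non-failing node results in a response, without a guarantee that the response reflects the most recent write.
Partition tolerance: The system continues to operate even when the network drops or delays an arbitrary number of messages between nodes.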

When a network partition occurs, the system must do one of the following:

cancel the operation, which decreases availability but ensures consistency
proceed with the operation, which provides availability but risks inconsistency. Note that this does not necessarily mean that the system is highly available to its users.^[6]

Thus, if there is a network partition, one has to choose between consistency and availability.

Explanation

No distributed system is safe from network failures, so network partitioning generally has to be tolerated.^[7] ^[8] In the presence of a partition, one is then left with two options: consistency or availability. When choosing consistency over availability, the system returns an error or a timeout if it cannot guarantee that particular information is up to date because of the partition. When choosing availability over consistency, the system always processes the query and tries to return the most recent available version of the information, even if it cannot guarantee that this version is up to date.
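The two options can be illustrated with a small sketch. This is a toy model rather than any real database's API: the names `Replica`, `Mode`, and `Unavailable` are hypothetical, and the partition is simulated with a boolean flag instead of real network failure detection.

```python
from dataclasses import dataclass, field
from enum import Enum


class Mode(Enum):
    CP = "consistency over availability"
    AP = "availability over consistency"


class Unavailable(Exception):
    """Raised when a CP replica cannot guarantee an up-to-date answer."""


@dataclass
class Replica:
    mode: Mode
    partitioned: bool = False  # assumption: set by some external failure detector
    data: dict = field(default_factory=dict)

    def read(self, key):
        if self.partitioned and self.mode is Mode.CP:
            # The replica cannot confirm it holds the most recent write: refuse.
            raise Unavailable(f"cannot confirm latest value of {key!r}")
        # AP mode (or no partition): answer from the local, possibly stale, copy.
        return self.data.get(key)

    def write(self, key, value):
        if self.partitioned and self.mode is Mode.CP:
            # The write cannot be propagated to the other side: cancel it.
            raise Unavailable(f"cannot replicate write to {key!r}")
        self.data[key] = value


def demo():
    """Read the same key from a CP and an AP replica during a partition."""
    cp = Replica(Mode.CP, data={"x": 1})
    ap = Replica(Mode.AP, data={"x": 1})
    cp.partitioned = ap.partitioned = True
    try:
        cp_result = cp.read("x")
    except Unavailable:
        cp_result = "error"
    ap_result = ap.read("x")  # may be stale, but a response is returned
    return cp_result, ap_result
```

Calling `demo()` shows the CP replica refusing the read while the AP replica answers from its local copy. With `partitioned` left at `False`, both replicas behave identically.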

In the absence of a partition, both availability and consistency can be satisfied.^[9]

Database systems designed with traditional ACID guarantees in mind, such as RDBMSs, choose consistency over availability, whereas systems designed around the BASE philosophy, common in the NoSQL movement for example, choose availability over consistency.

Some cloud services choose strong consistency but use worldwide private fiber networks and GPS clock synchronization to minimize the frequency of network partitions. Finally, consistent shared-nothing architectures may use techniques such as geographic sharding to keep data owned by the queried node available, while remaining unavailable for arbitrary requests during a network partition.
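The geographic sharding case can be sketched as follows. This is a simplified illustration under stated assumptions: `ShardNode` and `Unreachable` are hypothetical names, each region owns its own records, and the partition is again simulated with a flag.

```python
class Unreachable(Exception):
    """Raised when the shard that owns the data cannot be contacted."""


class ShardNode:
    def __init__(self, region, records):
        self.region = region
        self.records = dict(records)  # data owned by this node
        self.peers = {}               # region name -> ShardNode
        self.partitioned = False      # assumption: set by a failure detector

    def get(self, region, key):
        if region == self.region:
            # Data owned by the queried node stays available and consistent.
            return self.records[key]
        if self.partitioned:
            # Data owned elsewhere cannot be served during a partition.
            raise Unreachable(f"shard {region!r} is not reachable")
        return self.peers[region].get(region, key)


def demo():
    eu = ShardNode("eu", {"alice": "Berlin"})
    us = ShardNode("us", {"bob": "Boston"})
    eu.peers["us"] = us
    us.peers["eu"] = eu

    before = eu.get("us", "bob")     # no partition: forwarded to the owner
    eu.partitioned = True
    local = eu.get("eu", "alice")    # owned locally: still served
    try:
        remote = eu.get("us", "bob")
    except Unreachable:
        remote = "unreachable"
    return before, local, remote
```

The node never answers with data it does not own, so consistency is preserved, and availability is lost only for requests that cross the partition.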

History

According to computer scientist Eric Brewer of the University of California, Berkeley, the theorem first appeared in autumn 1998.^[10] It was published as the CAP principle in 1999^[11] and presented as a conjecture by Brewer at the 2000 Symposium on Principles of Distributed Computing (PODC).^[12] In 2002, Seth Gilbert and Nancy Lynch of MIT published a formal proof of Brewer's conjecture, rendering it a theorem.^[1]

In 2012, Brewer clarified some of his positions, including why the often-used "two out of three" concept can be somewhat misleading: system designers only need to sacrifice consistency or availability in the presence of partitions, and partition management and recovery techniques exist. Brewer also noted the different definition of consistency used in the CAP theorem relative to the definition used in ACID.^[13]

A similar theorem stating the trade-off between consistency and availability in distributed systems was published by Birman and Friedman in 1996.^[14] Birman and Friedman's result restricted this lower bound to non-commuting operations.

The PACELC theorem, introduced in 2010, builds on CAP by stating that even in the absence of partitioning, there is another trade-off between latency and consistency. PACELC reads as follows: if a partition (P) happens, the trade-off is between availability (A) and consistency (C); else (E), the trade-off is between latency (L) and consistency (C).

Some experts, such as Marc Brooker, argue that the CAP theorem is particularly relevant in intermittently connected environments, such as those related to the Internet of Things (IoT) and mobile applications. In these contexts, devices may become partitioned due to challenging physical conditions, such as power outages or entering confined spaces like elevators. For distributed systems such as cloud applications, it is more appropriate to use the PACELC theorem, which is more comprehensive and considers the trade-off between latency and consistency even in the absence of network partitions.^[15]
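The PACELC reading can be written down as a small decision function. This is a minimal sketch for illustration: the names `pacelc_tradeoff` and `PacelcProfile` are hypothetical, and the "PA/EL"-style label is simply shorthand for the two choices a system makes.

```python
from dataclasses import dataclass
from typing import Tuple


def pacelc_tradeoff(partitioned: bool) -> Tuple[str, str]:
    """Return the two properties that compete in the given network state."""
    if partitioned:
        # P: choose between availability (A) and consistency (C).
        return ("availability", "consistency")
    # E: choose between latency (L) and consistency (C).
    return ("latency", "consistency")


@dataclass(frozen=True)
class PacelcProfile:
    on_partition: str  # "A" or "C"
    otherwise: str     # "L" or "C"

    def __post_init__(self):
        if self.on_partition not in ("A", "C"):
            raise ValueError("on_partition must be 'A' or 'C'")
        if self.otherwise not in ("L", "C"):
            raise ValueError("otherwise must be 'L' or 'C'")

    def label(self) -> str:
        return f"P{self.on_partition}/E{self.otherwise}"
```

For example, `PacelcProfile("A", "L").label()` describes a system that favors availability during a partition and low latency otherwise, while `PacelcProfile("C", "C")` describes one that favors consistency in both cases.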

Notes and References

[1] Gilbert, Seth; Lynch, Nancy (2002). "Brewer's conjecture and the feasibility of consistent, available, partition-tolerant web services". ACM SIGACT News. Association for Computing Machinery (ACM). 33 (2): 51–59. ISSN 0163-5700. doi:10.1145/564585.564601. S2CID 15892169.
[2] "Brewer's CAP Theorem". julianbrowne.com. 2009-01-11.
[3] "Brewers CAP Theorem on distributed systems". royans.net. 2010-02-14.
[4] Liochon, Nicolas. "The confusing CAP and ACID wording". This long run. 1 February 2019.
[5] Fowler, Adam (2015). NoSQL For Dummies. For Dummies. ISBN 978-8126554904.
[6] Fowler, Adam (2015). NoSQL For Dummies. For Dummies. ISBN 978-8126554904.
[7] Kleppmann, Martin (2015-09-18). "A Critique of the CAP Theorem". Apollo - University of Cambridge Repository. doi:10.17863/CAM.13083. arXiv:1509.05393. Bibcode:2015arXiv150905393K. S2CID 1991487. Retrieved 24 November 2019.
[8] Kleppmann, Martin. "Please stop calling databases CP or AP". Martin Kleppmann's Blog. 24 November 2019.
[9] Abadi, Daniel (2010-04-23). "DBMS Musings: Problems with CAP, and Yahoo's little known NoSQL system". DBMS Musings. Retrieved 2018-01-23.
[10] Brewer, Eric (2012). "CAP twelve years later: How the "rules" have changed". Computer. Institute of Electrical and Electronics Engineers (IEEE). 45 (2): 23–29. ISSN 0018-9162. doi:10.1109/mc.2012.37. S2CID 890105.
[11] Fox, Armando; Brewer, Eric (1999). "Harvest, Yield and Scalable Tolerant Systems". Proc. 7th Workshop Hot Topics in Operating Systems (HotOS 99). IEEE CS. pp. 174–178. doi:10.1109/HOTOS.1999.798396.
[12] Brewer, Eric. "Towards Robust Distributed Systems".
[13] Carpenter, Jeff; Hewitt, Eben (July 2016). Cassandra: The Definitive Guide (2nd ed.). O'Reilly Media. ISBN 9781491933657. "In February 2012, Eric Brewer provided an updated perspective on his CAP theorem... Brewer now describes the “2 out of 3” axiom as somewhat misleading. He notes that designers only need sacrifice consistency or availability in the presence of partitions, and that advances in partition recovery techniques have made it possible for designers to achieve high levels of both consistency and availability."
[14] Birman, Ken; Friedman, Roy (April 1996). "Trading Consistency for Availability in Distributed Systems". hdl:1813/7235.
[15] Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems. O'Reilly Media. ISBN 978-1449373320.
