Concurrent hash table

A concurrent hash table or concurrent hash map is an implementation of a hash table that allows multiple threads to access it concurrently, with entries located by means of a hash function.[1] [2]

Concurrent hash tables are thus a key concurrent data structure in concurrent computing, allowing multiple threads to cooperate more efficiently on a computation over shared data.

Because of the problems inherent in concurrent access, chiefly contention, the manner and extent to which a table can be accessed concurrently differ between implementations. Furthermore, the resulting speedup may not be linear in the number of threads, because resolving contention produces processing overhead. Multiple approaches exist for mitigating the effects of contention, each of which preserves the correctness of operations on the table.[3] [4]

As with their sequential counterparts, concurrent hash tables can be generalized and extended to fit broader applications, such as allowing more complex data types to be used for keys and values. These generalizations can, however, reduce performance and should therefore be chosen in accordance with the requirements of the application.

Concurrent hashing

When creating a concurrent hash table, the functions that access the table with the chosen hashing algorithm need to be adapted for concurrency by adding a conflict resolution strategy. Such a strategy must manage accesses so that conflicts between them do not corrupt data, while ideally making the accesses more efficient when run in parallel. Herlihy and Shavit[5] describe how accesses to a hash table that lacks such a strategy, in their example a basic implementation of the cuckoo hashing algorithm, can be adapted for concurrent use. Fan et al.[6] further describe a table access scheme based on cuckoo hashing that is not only concurrent but also retains the space efficiency of its hashing function while improving cache locality and insertion throughput.

When a hash table is not bounded in size and is thus allowed to grow or shrink as necessary, the hashing algorithm needs to be adapted to allow this operation. This entails modifying the hash function in use to reflect the new key space of the resized table. A concurrent growing algorithm is described by Maier et al.

Mega-KV[7] is a high-performance key-value store that uses cuckoo hashing and massively parallelizes key-value indexing on a GPU in batch mode. With further GPU acceleration optimizations by Nvidia and Oak Ridge National Laboratory, Mega-KV reached a new throughput record in 2018 (up to 888 million key-value operations per second).[8]
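
The effect of resizing on the hash function can be illustrated with a small sequential sketch. It assumes a chained table that maps a key to a bucket with hash(key) % capacity; the class name GrowableTable, the load threshold of 0.75 and the doubling policy are illustrative assumptions. It is not the concurrent growing algorithm of Maier et al., which additionally has to coordinate the migration between threads.

```python
class GrowableTable:
    """Sequential chained hash table that doubles its capacity when it fills up."""

    def __init__(self, capacity=4, max_load=0.75):
        self.capacity = capacity
        self.max_load = max_load  # assumed growth threshold
        self.size = 0
        self.buckets = [[] for _ in range(capacity)]

    def _index(self, key, capacity=None):
        # The bucket index depends on the capacity, so it changes on resize.
        return hash(key) % (capacity or self.capacity)

    def insert(self, key, value):
        bucket = self.buckets[self._index(key)]
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)  # overwrite an existing key
                return
        bucket.append((key, value))
        self.size += 1
        if self.size > self.max_load * self.capacity:
            self._grow()

    def find(self, key):
        for k, v in self.buckets[self._index(key)]:
            if k == key:
                return v
        return None

    def _grow(self):
        # Rehash every element with the index function of the larger table.
        new_capacity = 2 * self.capacity
        new_buckets = [[] for _ in range(new_capacity)]
        for bucket in self.buckets:
            for k, v in bucket:
                new_buckets[self._index(k, new_capacity)].append((k, v))
        self.capacity = new_capacity
        self.buckets = new_buckets


table = GrowableTable()
for i in range(100):
    table.insert(i, i * i)
```

Every element is rehashed during growth because its bucket index is derived from the capacity; this full migration is the step that a concurrent table has to perform without blocking or corrupting ongoing accesses.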

Contention handling

As with any concurrent data structure, concurrent hash tables suffer from a variety of problems known in the field of concurrent computing as a result of contention. Examples include the ABA problem, race conditions, and deadlocks. The extent to which these problems manifest, or whether they occur at all, depends on the implementation of the concurrent hash table: specifically, which operations the table allows to run concurrently and which strategies it uses to mitigate the problems associated with contention. When handling contention, the main goal is the same as with any other concurrent data structure, namely ensuring correctness for every operation on the table. At the same time, the table should remain more efficient under concurrent use than a sequential solution. Managing concurrent accesses in this way is also known as concurrency control.

Atomic instructions

Using atomic instructions such as compare-and-swap or fetch-and-add, problems caused by contention can be reduced by ensuring that an access is completed before another access has the chance to interfere. Such instructions are usually limited in the size of data they can operate on, so the key and value types of a table have to be chosen or converted accordingly.

With hardware transactional memory (HTM), table operations can be treated much like database transactions, which ensures atomicity. An example of HTM in practice is Intel's Transactional Synchronization Extensions (TSX).
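
A minimal sketch of a compare-and-swap based insert into an open-addressing table follows. Python exposes no hardware compare-and-swap to user code, so the compare_and_swap helper below emulates one with a small lock; on real hardware it would be a single atomic instruction. The names AtomicSlot, CasTable and EMPTY are hypothetical, and the table is assumed to be fixed in size, to store each key at most once and to support no deletion.

```python
import threading

EMPTY = object()  # sentinel marking an unused slot


class AtomicSlot:
    """One table cell with an emulated compare-and-swap."""

    def __init__(self):
        self._value = EMPTY
        self._guard = threading.Lock()  # stands in for hardware atomicity

    def load(self):
        return self._value

    def compare_and_swap(self, expected, new):
        # Atomically replace the value only if it is still `expected`.
        with self._guard:
            if self._value is expected:
                self._value = new
                return True
            return False


class CasTable:
    """Fixed-size open-addressing table with linear probing."""

    def __init__(self, capacity):
        self.slots = [AtomicSlot() for _ in range(capacity)]

    def insert(self, key, value):
        """Insert key if absent; return False if present or the table is full."""
        n = len(self.slots)
        start = hash(key) % n
        for step in range(n):
            slot = self.slots[(start + step) % n]
            while True:
                current = slot.load()
                if current is EMPTY:
                    if slot.compare_and_swap(EMPTY, (key, value)):
                        return True
                    continue  # lost the race; re-read this slot
                if current[0] == key:
                    return False  # another thread already inserted the key
                break  # occupied by a different key; probe the next slot
        return False

    def find(self, key):
        n = len(self.slots)
        start = hash(key) % n
        for step in range(n):
            current = self.slots[(start + step) % n].load()
            if current is EMPTY:
                return None
            if current[0] == key:
                return current[1]
        return None


table = CasTable(64)
wins = [0] * 4  # each thread writes only its own counter


def worker(tid):
    for key in range(32):
        if table.insert(key, tid):
            wins[tid] += 1


threads = [threading.Thread(target=worker, args=(t,)) for t in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Because a slot only ever changes from empty to occupied, a thread whose compare-and-swap fails can simply re-read the slot and continue; exactly one of the competing threads wins each key. Packing key and value into one swapped unit mirrors the size limitation mentioned above.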

Locking

With the help of locks, operations that try to access the table or values within it concurrently can be handled in a way that ensures correct behavior. Locking can, however, hurt performance, in particular when the locks used are too restrictive and block accesses that would not otherwise contend and could execute without causing any problems. Further care is needed to avoid even more critical problems that threaten correctness, such as livelocks, deadlocks or starvation.
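
One common way to make locks less restrictive is lock striping, in which each lock guards only a subset of the buckets. The following is a minimal sketch under assumed parameters; the class name StripedHashMap, the bucket count and the stripe count are illustrative and not taken from any particular library.

```python
import threading


class StripedHashMap:
    """Hash map in which each lock protects only a fraction of the buckets."""

    def __init__(self, num_buckets=64, num_stripes=8):
        self.buckets = [dict() for _ in range(num_buckets)]
        self.locks = [threading.Lock() for _ in range(num_stripes)]

    def _locate(self, key):
        index = hash(key) % len(self.buckets)
        # Buckets are assigned to stripes round-robin.
        return self.buckets[index], self.locks[index % len(self.locks)]

    def put(self, key, value):
        bucket, lock = self._locate(key)
        with lock:
            bucket[key] = value

    def get(self, key, default=None):
        bucket, lock = self._locate(key)
        with lock:
            return bucket.get(key, default)

    def update(self, key, fn, initial):
        """Apply fn to the current value as one atomic read-modify-write."""
        bucket, lock = self._locate(key)
        with lock:
            bucket[key] = fn(bucket.get(key, initial))
            return bucket[key]


counts = StripedHashMap()


def worker():
    for _ in range(500):
        for key in range(8):
            counts.update(key, lambda v: v + 1, 0)


threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Operations on keys that fall into different stripes do not block one another, whereas a single table-wide lock would serialize all of them. Each operation acquires exactly one lock, so this sketch cannot deadlock; operations that span several stripes, such as a resize, would need a fixed lock acquisition order.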

Phase concurrency

A phase-concurrent hash table groups accesses into phases in which only one type of operation is allowed (e.g. a pure write phase), each followed by a synchronization of the table state across all threads. A formally proven algorithm for this is given by Shun and Blelloch.
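
The phase structure itself can be sketched with a barrier that separates an insert-only phase from a find-only phase. This is only an illustration of the grouping, not the algorithm of Shun and Blelloch; the lock around inserts is a placeholder for a real concurrent insert, and the constants and names are assumptions.

```python
import threading

NUM_THREADS = 4
KEYS_PER_THREAD = 100

table = {}
table_lock = threading.Lock()  # placeholder for a real concurrent insert
barrier = threading.Barrier(NUM_THREADS)
found = [0] * NUM_THREADS  # each thread writes only its own counter


def worker(tid):
    # Phase 1: inserts only.
    for i in range(KEYS_PER_THREAD):
        key = tid * KEYS_PER_THREAD + i
        with table_lock:
            table[key] = key * 2
    # Synchronization point: no thread reads before all inserts are done.
    barrier.wait()
    # Phase 2: finds only. The table is no longer modified, so reads take no lock.
    for key in range(NUM_THREADS * KEYS_PER_THREAD):
        if table.get(key) == key * 2:
            found[tid] += 1


threads = [threading.Thread(target=worker, args=(t,)) for t in range(NUM_THREADS)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Since every thread passes the barrier only after all inserts have completed, each thread finds all keys in the second phase, including those inserted by other threads.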

Read-copy-update

Widely used within the Linux kernel, read-copy-update (RCU) is especially useful in cases where the number of reads far exceeds the number of writes.
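
The read, copy and update steps can be sketched as follows. This is a simplified copy-on-write illustration and not the kernel's RCU API: it assumes that storing a reference is atomic, relies on garbage collection instead of grace periods to reclaim old versions, and copies the whole table on every write, whereas practical RCU hash tables copy only the affected nodes. The class name RcuStyleMap is hypothetical.

```python
import threading


class RcuStyleMap:
    """Readers see an immutable snapshot; writers copy, update and publish."""

    def __init__(self):
        self._snapshot = {}  # never mutated after publication
        self._write_lock = threading.Lock()  # serializes writers only

    def get(self, key, default=None):
        # Read: use whichever snapshot is current; no lock is taken.
        return self._snapshot.get(key, default)

    def put(self, key, value):
        with self._write_lock:
            new_snapshot = dict(self._snapshot)  # copy
            new_snapshot[key] = value  # update the private copy
            self._snapshot = new_snapshot  # publish with one reference store

    def remove(self, key):
        with self._write_lock:
            new_snapshot = dict(self._snapshot)
            new_snapshot.pop(key, None)
            self._snapshot = new_snapshot


m = RcuStyleMap()
m.put("a", 1)
old_snapshot = m._snapshot  # what a reader that started earlier would hold
m.put("b", 2)
```

A reader that obtained old_snapshot before the second write continues to see a consistent table without the new key, while later readers see both entries. Reads never wait for writers, which is why the approach suits read-dominated workloads.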

Applications

Concurrent hash tables find application wherever sequential hash tables are useful. The advantage of concurrency lies in the potential speedup of these use cases, as well as in increased scalability. As hardware such as multi-core processors becomes increasingly capable of concurrent computation, the importance of concurrent data structures within these applications grows steadily.

Performance analysis

Maier et al. perform a thorough analysis of a variety of concurrent hash table implementations, giving insight into the effectiveness of each in different situations that are likely to occur in real use cases. The most important findings can be summarized as follows:

Findings by operation, under low and high contention:
find: Very high speedups for both successful and unsuccessful unique finds, even with very high contention.
insert: High speedups reached; high contention becomes problematic when keys can hold more than one value (otherwise inserts are simply discarded if the key already exists).
update: Both overwrites and modifications of existing values reach high speedups when contention is kept low; otherwise performs worse than sequential.
delete: Phase concurrency reached the highest scalability; fully concurrent implementations where delete uses update with dummy elements were close behind.

As expected, low contention leads to positive behavior across every operation, whereas high contention becomes problematic for writes. The latter, however, is a problem of high contention in general, where the benefit of concurrent computation is negated because concurrency control necessarily restricts contending accesses. The resulting overhead causes worse performance than that of the ideal sequential version. In spite of this, concurrent hash tables still prove invaluable even in such high-contention scenarios, since a well-designed implementation can still achieve very high speedups by reading data concurrently.

However, real use cases of concurrent hash tables are often not simply sequences of the same operation, but rather a mixture of multiple types. When a mixture of insert and find operations is used, the speedup and resulting usefulness of concurrent hash tables become more obvious, especially for find-heavy workloads.

Ultimately, the resulting performance of a concurrent hash table depends on a variety of factors determined by its intended application. When choosing an implementation, it is important to determine the necessary degree of generality, the contention handling strategies, and whether the size of the table can be determined in advance or a growing approach must be used instead.

Implementations

Notes and References

  1. Maier, Tobias; Sanders, Peter; Dementiev, Roman (March 2019). "Concurrent Hash Tables: Fast and General(?)!". ACM Transactions on Parallel Computing. 5 (4). Article 16. New York, NY, USA: ACM. doi:10.1145/3309206. ISSN 2329-4949. S2CID 67870641.
  2. Shun, Julian; Blelloch, Guy E. (2014). "Phase-concurrent Hash Tables for Determinism". SPAA '14: Proceedings of the 26th ACM symposium on Parallelism in algorithms and architectures. New York: ACM. pp. 96–107. doi:10.1145/2612669.2612687. ISBN 978-1-4503-2821-0.
  3. Li, Xiaozhou; Andersen, David G.; Kaminsky, Michael; Freedman, Michael J. (2014). "Algorithmic Improvements for Fast Concurrent Cuckoo Hashing". Proceedings of the Ninth European Conference on Computer Systems. EuroSys '14. Article No. 27. New York: ACM. doi:10.1145/2592798.2592820. ISBN 978-1-4503-2704-6.
  4. Triplett, Josh; McKenney, Paul E.; Walpole, Jonathan (2011). "Resizable, Scalable, Concurrent Hash Tables via Relativistic Programming". USENIXATC'11: Proceedings of the 2011 USENIX conference on USENIX annual technical conference. Berkeley, CA: USENIX Association. p. 11.
  5. Herlihy, Maurice; Shavit, Nir (2008). "Chapter 13: Concurrent Hashing and Natural Parallelism". The Art of Multiprocessor Programming. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc. pp. 316–325. ISBN 978-0-12-370591-4.
  6. Fan, Bin; Andersen, David G.; Kaminsky, Michael (2013). "MemC3: Compact and Concurrent MemCache with Dumber Caching and Smarter Hashing". nsdi'13: Proceedings of the 10th USENIX conference on Networked Systems Design and Implementation. Berkeley, CA: USENIX Association. pp. 371–384.
  7. Zhang, Kai; Wang, Kaibo; Yuan, Yuan; Guo, Lei; Lee, Rubao; Zhang, Xiaodong (2015). "Mega-KV: a case for GPUs to maximize the throughput of in-memory key-value stores". Proceedings of the VLDB Endowment. 8 (11).
  8. Chu, Ching-Hsing; Potluri, Sreeram; Goswami, Anshuman; Venkata, Manjunath Gorentla; Imam, Neena; Newburn, Chris J. (2018). "Designing High-performance in-memory key-value operations with persistent GPU kernels and OPENSHMEM".
  9. Java ConcurrentHashMap documentation. https://docs.oracle.com/javase/8/docs/api/java/util/concurrent/ConcurrentHashMap.html
  10. GitHub repository for libcuckoo. https://github.com/efficient/libcuckoo
  11. Threading Building Blocks concurrent_unordered_map and concurrent_unordered_multimap documentation. https://software.intel.com/en-us/node/506171
  12. Threading Building Blocks concurrent_hash_map documentation. https://software.intel.com/en-us/node/506191
  13. GitHub repository for growt. https://github.com/TooBiased/growt
  14. GitHub page for implementation of concurrent hash maps in folly. https://github.com/facebook/folly/blob/master/folly/concurrency/ConcurrentHashMap.h
  15. GitHub repository for folly. https://github.com/facebook/folly
  16. GitHub repository for Junction. https://github.com/preshing/junction