Concurrent data structure

In computer science, a concurrent data structure is a particular way of storing and organizing data for access by multiple computing threads (or processes) on a computer.

Historically, such data structures were used on uniprocessor machines with operating systems that supported multiple computing threads (or processes). The term concurrency captured the multiplexing/interleaving of the threads' operations on the data by the operating system, even though the processors never issued two operations that accessed the data simultaneously.

Today, as multiprocessor computer architectures that provide parallelism become the dominant computing platform (through the proliferation of multi-core processors), the term has come to stand mainly for data structures that can be accessed by multiple threads which may actually access the data simultaneously because they run on different processors that communicate with one another. The concurrent data structure (sometimes also called a shared data structure) is usually considered to reside in an abstract storage environment called shared memory, though this memory may be physically implemented as either a "tightly coupled" or a distributed collection of storage modules.

Basic principles

Concurrent data structures, intended for use in parallel or distributed computing environments, differ from "sequential" data structures, intended for use on a uniprocessor machine, in several ways.[1] Most notably, in a sequential environment one specifies the data structure's properties and checks that they are implemented correctly, by providing safety properties. In a concurrent environment, the specification must also describe liveness properties which an implementation must provide. Safety properties usually state that something bad never happens, while liveness properties state that something good keeps happening. These properties can be expressed, for example, using Linear Temporal Logic.
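For instance (an illustrative sketch, not drawn from the source), writing \square for "always" and \lozenge for "eventually" in standard LTL notation:

    \text{Safety: } \square\, \lnot \mathit{bad} \qquad \text{Liveness: } \square \lozenge\, \mathit{good}

The safety formula says no reachable state ever satisfies bad; the liveness formula says that states satisfying good keep occurring.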

The type of liveness requirements tends to define the data structure. Method calls can be blocking or non-blocking. Data structures are not restricted to one type or the other, and can allow combinations where some method calls are blocking and others are non-blocking (examples can be found in the Java concurrency software library).
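As an illustration, java.util.concurrent.BlockingQueue mixes both styles on a single structure: put and take block until the operation can complete, while offer and poll return immediately. A minimal sketch (the class name BlockingVsNonBlocking is hypothetical):

    import java.util.concurrent.ArrayBlockingQueue;
    import java.util.concurrent.BlockingQueue;

    public class BlockingVsNonBlocking {
        public static void main(String[] args) throws InterruptedException {
            BlockingQueue<String> queue = new ArrayBlockingQueue<>(1); // capacity 1

            queue.put("first");                       // blocking: waits for free space if full
            boolean accepted = queue.offer("second"); // non-blocking: returns false, queue is full

            String head = queue.take();               // blocking: waits until an element exists
            String next = queue.poll();               // non-blocking: returns null, queue is empty

            System.out.println(accepted + " " + head + " " + next); // prints: false first null
        }
    }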

The safety properties of concurrent data structures must capture their behavior given the many possible interleavings of methods called by different threads. It is quite intuitive to specify how abstract data structures behave in a sequential setting in which there are no interleavings. Therefore, many mainstream approaches for arguing the safety properties of a concurrent data structure (such as serializability, linearizability, sequential consistency, and quiescent consistency) specify the structure's properties sequentially, and map its concurrent executions to a collection of sequential ones.
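To see why interleavings complicate the specification, consider a counter that is correct sequentially but breaks when shared (a hypothetical RacyCounter, a sketch rather than an example from the source): two threads can both read the same value inside increment(), so one update is lost and no sequential ordering of the calls explains the result.

    // Hypothetical example: sequentially correct, concurrently broken.
    public class RacyCounter {
        private int value = 0;

        public void increment() {
            value = value + 1; // read, add, write: another thread may interleave between steps
        }

        public static void main(String[] args) throws InterruptedException {
            RacyCounter c = new RacyCounter();
            Runnable work = () -> { for (int i = 0; i < 100_000; i++) c.increment(); };
            Thread t1 = new Thread(work), t2 = new Thread(work);
            t1.start(); t2.start();
            t1.join(); t2.join();
            // Any sequential ordering of the 200,000 calls yields 200000;
            // concurrent interleavings typically print less.
            System.out.println(c.value);
        }
    }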

To guarantee the safety and liveness properties, concurrent data structures must typically (though not always) allow threads to reach consensus as to the results of their simultaneous data access and modification requests. To support such agreement, concurrent data structures are implemented using special primitive synchronization operations (see synchronization primitives) available on modern multiprocessor machines that allow multiple threads to reach consensus. This consensus can be achieved in a blocking manner by using locks, or without locks, in which case it is non-blocking. There is a wide body of theory on the design of concurrent data structures (see bibliographical references).
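As a sketch of the two approaches (both counter classes are hypothetical): a lock serializes the threads so they trivially agree on each result, whereas compare-and-set lets a thread detect that another thread changed the value first and retry.

    import java.util.concurrent.atomic.AtomicInteger;

    // Blocking agreement: the lock admits one thread at a time.
    class LockedCounter {
        private int value = 0;
        public synchronized void increment() { value++; }
        public synchronized int get() { return value; }
    }

    // Non-blocking agreement: compare-and-set (CAS) succeeds only if the
    // value is still the one this thread read; otherwise the thread retries.
    class CasCounter {
        private final AtomicInteger value = new AtomicInteger(0);
        public void increment() {
            int current;
            do {
                current = value.get();
            } while (!value.compareAndSet(current, current + 1));
        }
        public int get() { return value.get(); }
    }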

Design and implementation

Concurrent data structures are significantly more difficult to designand to verify as being correct than their sequential counterparts.

The primary source of this additional difficulty is concurrency, exacerbated by the fact that threads must be thought of as being completely asynchronous: they are subject to operating system preemption, page faults, interrupts, and so on.

On today's machines, the layout of processors and memory, the layout of data in memory, and the communication load on the various elements of the multiprocessor architecture all influence performance. Furthermore, there is a tension between correctness and performance: algorithmic enhancements that seek to improve performance often make it more difficult to design and verify a correct data structure implementation.[2]

A key measure for performance is scalability, captured by the speedup of the implementation. Speedup is a measure of how effectively the application is using the machine it is running on. On a machine with P processors, the speedup is the ratio of the structure's execution time on a single processor to its execution time on P processors. Ideally, we want linear speedup: we would like to achieve a speedup of P when using P processors. Data structures whose speedup grows with P are called scalable. The extent to which one can scale the performance of a concurrent data structure is captured by a formula known as Amdahl's law and more refined versions of it such as Gustafson's law.
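Concretely (a worked sketch; the symbol f, the parallelizable fraction of the execution, is not used in the source):

    \text{speedup } S(P) = \frac{T_1}{T_P}, \qquad
    \text{Amdahl's law: } S(P) \le \frac{1}{(1 - f) + f/P}

For example, if 90% of the work parallelizes (f = 0.9), then on P = 16 processors the speedup is at most 1 / (0.1 + 0.9/16) = 1 / 0.15625 = 6.4, far from the ideal linear speedup of 16.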

A key issue with the performance of concurrent data structures is the level of memory contention: the overhead in traffic to and from memory as a result of multiple threads concurrently attempting to access the same locations in memory. This issue is most acute with blocking implementations, in which locks control access to memory. In order to acquire a lock, a thread must repeatedly attempt to modify the memory location that holds it. On a cache-coherent multiprocessor (one in which processors have local caches that are updated by hardware to keep them consistent with the latest values stored) this results in long waiting times for each attempt to modify the location, and is exacerbated by the additional memory traffic associated with unsuccessful attempts to acquire the lock.
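A classic way to reduce this traffic is the test-and-test-and-set lock: spin reading the lock word, a read served from the local cache, and attempt the expensive atomic write only when the lock appears free. A minimal sketch (the class name TTASLock is hypothetical; practical spinlocks add backoff and other refinements):

    import java.util.concurrent.atomic.AtomicBoolean;

    public class TTASLock {
        private final AtomicBoolean locked = new AtomicBoolean(false);

        public void lock() {
            while (true) {
                // Test: spin on a plain read, which stays in the local cache
                // and generates no coherence traffic while the lock is held.
                while (locked.get()) { }
                // Test-and-set: issue one atomic write once the lock looks free.
                if (locked.compareAndSet(false, true)) {
                    return; // acquired; otherwise another thread won, spin again
                }
            }
        }

        public void unlock() {
            locked.set(false);
        }
    }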

Notes and References

  1. Mark Moir and Nir Shavit, "Concurrent Data Structures", in Dinesh Mehta and Sartaj Sahni (eds.), Handbook of Data Structures and Applications, Chapman and Hall/CRC Press, 2007, pp. 47-14–47-30. http://www.cs.tau.ac.il/~shanir/concurrent-data-structures.pdf (archived 2011-04-01: https://web.archive.org/web/20110401070433/http://www.cs.tau.ac.il/~shanir/concurrent-data-structures.pdf)
  2. V. Gramoli, "More than you ever wanted to know about synchronization: Synchrobench, measuring the impact of the synchronization on concurrent algorithms", Proceedings of the 20th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, ACM, 2015, pp. 1–10. Archived 10 April 2015: https://web.archive.org/web/20150410030004/http://sydney.edu.au/engineering/it/~gramoli/doc/pubs/gramoli-synchrobench.pdf