Red–black tree | |
Type: | Tree |
Invented By: | Leonidas J. Guibas and Robert Sedgewick |
Invented Year: | 1978 |
Space Avg: | O(n) |
Search Avg: | O(logn) |
Search Worst: | O(logn) |
Insert Avg: | O(logn) |
Insert Worst: | O(logn) |
Delete Avg: | O(logn) |
Delete Worst: | O(logn) |
In computer science, a red–black tree is a self-balancing binary search tree data structure noted for fast storage and retrieval of ordered information. The nodes in a red-black tree hold an extra "color" bit, often drawn as red and black, which help ensure that the tree is always approximately balanced.[1]
When the tree is modified, the new tree is rearranged and "repainted" to restore the coloring properties that constrain how unbalanced the tree can become in the worst case. The properties are designed such that this rearranging and recoloring can be performed efficiently.
The (re-)balancing is not perfect, but guarantees searching in
O(logn)
n
O(logn)
Tracking the color of each node requires only one bit of information per node because there are only two colors (due to memory alignment present in some programming languages, the real memory consumption may differ). The tree does not contain any other data specific to it being a red–black tree, so its memory footprint is almost identical to that of a classic (uncolored) binary search tree. In some cases, the added bit of information can be stored at no added memory cost.
In 1972, Rudolf Bayer[4] invented a data structure that was a special order-4 case of a B-tree. These trees maintained all paths from root to leaf with the same number of nodes, creating perfectly balanced trees. However, they were not binary search trees. Bayer called them a "symmetric binary B-tree" in his paper and later they became popular as 2–3–4 trees or even 2–3 trees.[5]
In a 1978 paper, "A Dichromatic Framework for Balanced Trees",[6] Leonidas J. Guibas and Robert Sedgewick derived the red–black tree from the symmetric binary B-tree.[7] The color "red" was chosen because it was the best-looking color produced by the color laser printer available to the authors while working at Xerox PARC.[8] Another response from Guibas states that it was because of the red and black pens available to them to draw the trees.[9]
In 1993, Arne Andersson introduced the idea of a right leaning tree to simplify insert and delete operations.[10]
In 1999, Chris Okasaki showed how to make the insert operation purely functional. Its balance function needed to take care of only 4 unbalanced cases and one default balanced case.[11]
The original algorithm used 8 unbalanced cases, but reduced that to 6 unbalanced cases.[1] Sedgewick showed that the insert operation can be implemented in just 46 lines of Java code.[12] [13] In 2008, Sedgewick proposed the left-leaning red–black tree, leveraging Andersson’s idea that simplified the insert and delete operations. Sedgewick originally allowed nodes whose two children are red, making his trees more like 2–3–4 trees, but later this restriction was added, making new trees more like 2–3 trees. Sedgewick implemented the insert algorithm in just 33 lines, significantly shortening his original 46 lines of code.[14] [15]
A red–black tree is a special type of binary search tree, used in computer science to organize pieces of comparable data, such as text fragments or numbers (as e.g. the numbers in figures 1 and 2). The nodes carrying keys and/or data are frequently called "internal nodes", but to make this very specific they are also called non-NIL nodes in this article.
The leaf nodes of red–black trees (
 NIL  in figure 1) do not contain keys or data. These "leaves" need not be explicit individuals in computer memory: a NULL pointer can—as in all binary tree data structures— encode the fact that there is no child node at this position in the (parent) node. Nevertheless, by their position in the tree, these objects are in relation to other nodes that is relevant to the RB-structure, it may have parent, sibling (i.e., the other child of the parent), uncle, even nephew node; and may be child—but never parent—of another node.It is not really necessary to attribute a "color" to these end-of-path objects, because the condition "isNIL
or BLACK
" is implied by the condition "is NIL
" (see also this remark).Figure 2 shows the conceptually same red–black tree without these NIL leaves. To arrive at the same notion of a path, one must notice that e.g., 3 paths run through the node 1, namely a path through 1left plus 2 added paths through 1right, namely the paths through 6left and 6right. This way, these ends of the paths are also docking points for new nodes to be inserted, fully equivalent to the NIL leaves of figure 1.
Instead, to save a marginal amount of execution time, these (possibly many) NIL leaves may be implemented as pointers to one unique (and black) sentinel node (instead of pointers of value NULL).
As a conclusion, the fact that a child does not exist (is not a true node, does not contain data) can in all occurrences be specified by the very same NULL pointer or as the very same pointer to a sentinel node. Throughout this article, either choice is called NIL node and has the constant value NIL
.
The black depth of a node is defined as the number of black nodes from the root to that node (i.e. the number of black ancestors). The black height of a red–black tree is the number of black nodes in any path from the root to the leaves, which, by requirement 4, is constant (alternatively, it could be defined as the black depth of any leaf node).[16] The black height of a node is the black height of the subtree rooted by it. In this article, the black height of a NIL node shall be set to 0, because its subtree is empty as suggested by figure 2, and its tree height is also 0.
In addition to the requirements imposed on a binary search tree the following must be satisfied by a
Some authors, e.g. Cormen & al., claim "the root is black" as fifth requirement; but not Mehlhorn & Sanders[16] or Sedgewick & Wayne.[15] Since the root can always be changed from red to black, this rule has little effect on analysis.This article also omits it, because it slightly disturbs the recursive algorithms and proofs.
As an example, every perfect binary tree that consists only of black nodes is a red–black tree.
The read-only operations, such as search or tree traversal, do not affect any of the requirements. In contrast, the modifying operations insert and delete easily maintain requirements 1 and 2, but with respect to the other requirements some extra effort must be made, to avoid introducing a violation of requirement 3, called a red-violation, or of requirement 4, called a black-violation.
The requirements enforce a critical property of red–black trees: the path from the root to the farthest leaf is no more than twice as long as the path from the root to the nearest leaf. The result is that the tree is height-balanced. Since operations such as inserting, deleting, and finding values require worst-case time proportional to the height
h
n
Red–black trees, like all binary search trees, allow quite efficient sequential access (e.g. in-order traversal, that is: in the order Left–Root–Right) of their elements. But they support also asymptotically optimal direct access via a traversal from root to leaf, resulting in
O(logn)
Red–black trees are similar in structure to 2–3–4 trees, which are B-trees of order 4.[17] In 2–3–4 trees, each node can contain between 1 and 3 values and have between 2 and 4 children. These 2–3–4 nodes correspond to black node – red children groups in red-black trees, as shown in figure 3. It is not a 1-to-1 correspondence, because 3-nodes have two equivalent representations: the red child may lie either to the left or right. The left-leaning red-black tree variant makes this relationship exactly 1-to-1, by only allowing the left child representation. Since every 2–3–4 node has a corresponding black node, invariant 4 of red-black trees is equivalent to saying that the leaves of a 2–3–4 tree all lie at the same level.
Despite structural similarities, operations on red–black trees are more economical than B-trees. B-trees require management of vectors of variable length, whereas red-black trees are simply binary trees.[18]
Red–black trees offer worst-case guarantees for insertion time, deletion time, and search time. Not only does this make them valuable in time-sensitive applications such as real-time applications, but it makes them valuable building blocks in other data structures that provide worst-case guarantees. For example, many data structures used in computational geometry are based on red–black trees, and the Completely Fair Scheduler and epoll system call of the Linux kernel use red–black trees.[19] [20] The AVL tree is another structure supporting
O(logn)
Red–black trees are also particularly valuable in functional programming, where they are one of the most common persistent data structures, used to construct associative arrays and sets that can retain previous versions after mutations. The persistent version of red–black trees requires
O(logn)
For every 2–3–4 tree, there are corresponding red–black trees with data elements in the same order. The insertion and deletion operations on 2–3–4 trees are also equivalent to color-flipping and rotations in red–black trees. This makes 2–3–4 trees an important tool for understanding the logic behind red–black trees, and this is why many introductory algorithm texts introduce 2–3–4 trees just before red–black trees, even though 2–3–4 trees are not often used in practice.
In 2008, Sedgewick introduced a simpler version of the red–black tree called the left-leaning red–black tree[22] by eliminating a previously unspecified degree of freedom in the implementation. The LLRB maintains an additional invariant that all red links must lean left except during inserts and deletes. Red–black trees can be made isometric to either 2–3 trees,[23] or 2–3–4 trees,[22] for any sequence of operations. The 2–3–4 tree isometry was described in 1978 by Sedgewick.[6] With 2–3–4 trees, the isometry is resolved by a "color flip," corresponding to a split, in which the red color of two children nodes leaves the children and moves to the parent node.
The original description of the tango tree, a type of tree optimised for fast searches, specifically uses red–black trees as part of its data structure.[24]
As of Java 8, the HashMap has been modified such that instead of using a LinkedList to store different elements with colliding hashcodes, a red–black tree is used. This results in the improvement of time complexity of searching such an element from
O(m)
O(logm)
m
The read-only operations, such as search or tree traversal, on a red–black tree require no modification from those used for binary search trees, because every red–black tree is a special case of a simple binary search tree. However, the immediate result of an insertion or removal may violate the properties of a red–black tree, the restoration of which is called rebalancing so that red–black trees become self-balancing.It requires in the worst case a small number,
O(logn)
n
O(1)
If the example implementation below is not suitable, other implementations with explanations may be found in Ben Pfaff’s[28] annotated C library GNU libavl (v2.0.3 as of June 2019).
The details of the insert and removal operations will be demonstrated with example C++ code, which uses the type definitions, macros below, and the helper function for rotations:
enum color_t ;
struct RBnode ;
struct RBtree ;
// Get the child direction (∈)// of the non-root non-NIL RBnode* N:
The proposal breaks down both, insertion and removal (not mentioning some very simple cases), into six constellations of nodes, edges and colors, which are called cases. The proposal contains for both, insertion and removal, exactly one case that advances one black level closer to the root and loops, the other five cases rebalance the tree of their own. The more complicated cases are pictured in a diagram.
U == NIL || U->color == BLACK ''// considered black''
U != NIL && U->color == RED ''// not considered black''
U == NIL
. Then in both cases U->color
is not touched (see Short-circuit evaluation).considered black
is in accordance with requirement 2.)
if
-statements have to occur far less frequently if the proposal[29] is realised.Insertion begins by placing the new (non-NIL) node, say N, at the position in the binary search tree of a NIL node whose in-order predecessor’s key compares less than the new node’s key, which in turn compares less than the key of its in-order successor.(Frequently, this positioning is the result of a search within the tree immediately preceding the insert operation and consists of a node P
together with a direction dir
with The newly inserted node is temporarily colored red so that all paths contain the same number of black nodes as before.But if its parent, say P, is also red then this action introduces a red-violation.
Because the algorithm transforms the input without using an auxiliary data structure and using only a small amount of extra storage space for auxiliary variables it is in-place.
- When the deleted node has 2 children (non-NIL), then we can swap its value with its in-order successor (the leftmost child of the right subtree), and then delete the successor instead. Since the successor is leftmost, it can only have a right child (non-NIL) or no child at all.
- When the deleted node has only 1 child (non-NIL). In this case, just replace the node with its child, and color it black.
The single child (non-NIL) must be red according to conclusion 5, and the deleted node must be black according to requirement 3.
- When the deleted node has no children (both NIL) and is the root, replace it with NIL. The tree is empty.
- When the deleted node has no children (both NIL), and is red, simply remove the leaf node.
- When the deleted node has no children (both NIL), and is black, deleting it will create an imbalance, and requires a fixup, as covered in the next section.
The complex case is when N is not the root, colored black and has no proper child (⇔ only NIL children).In the first iteration, N is replaced by NIL.
Because the algorithm transforms the input without using an auxiliary data structure and using only a small amount of extra storage space for auxiliary variables it is in-place.
For
h\in\N
h
mh | =2\lfloor(h+1)/2\rfloor+2\lfloor-2 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
=l\{ 2 ⋅ 2\tfrac{h2}-2=2\tfrac{h2+1}-2 h 3 ⋅ 2\tfrac{h-12}-2 h \lfloor\rfloor Its black height is   \lceilh/2\rceil h (h-1)/2~.
The RB tree of height h=1 m1=2\lfloor\ | +\!2^ \ | \!-\ | \!2 = 2^1\ | +\!2^0\ | \!-\ | \!2 = 1~. A minimal RB tree (RBh in figure 4) of height h>1 mh-1 \lfloor(h\ | \!-\ | \!1)/2\rfloor =: s . The other subtree is a perfect binary tree of (black) height s 2s\ | \!-\ | \!1=2^\ | \!-\ | \!1 black nodes—and no red node. Then the number of nodes is by induction
The graph of the function mh (h=2k | \;m_=2 \cdot 2^k\!-\ | 2) where k\in\N. mh= h\geq1
h 9>8=23 3>23/2 h mh=3 ⋅ 2(h-1)/2-2=l(3 ⋅ 2-3/2r) ⋅ 2(h+2)/2-2>2 ⋅ 2h/2-2 h
n
n h\inO(logn). Set operations and bulk operationsIn addition to the single-element insert, delete and lookup operations, several set operations have been defined on union, intersection and set difference. Then fast bulk operations on insertions or deletions can be implemented based on these set functions. These set operations rely on two helper operations, Split and Join. With the new operations, the implementation of red–black trees can be more efficient and highly-parallelizable.[32] In order to achieve its time complexities this implementation requires that the root is allowed to be either red or black, and that every node stores its own black height.
If the two trees have the same black height, Join simply creates a new node with left subtree, root and right subtree . If both and have black root, set to be red. Otherwise is set black. If the black heights are unequal, suppose that has larger black height than (the other case is symmetric). Join follows the right spine of until a black node, which is balanced with . At this point a new node with left child, root (set to be red) and right child is created to replace c. The new node may invalidate the red–black invariant because at most three red nodes can appear in a row. This can be fixed with a double rotation. If double red issue propagates to the root, the root is then set to be black, restoring the properties. The cost of this function is the difference of the black heights between the two input trees.
For some applications, Split also returns a boolean value denoting if appears in the tree. The cost of Split is O(logn), The join algorithm is as follows: function joinRightRB(TL, k, TR): if (TL.color=black) and (TL.blackHeight=TR.blackHeight): return Node(TL,⟨k,red⟩,TR) T'=Node(TL.left,⟨TL.key,TL.color⟩,joinRightRB(TL.right,k,TR)) if (TL.color=black) and (T'.right.color=T'.right.right.color=red): T'.right.right.color=black; return rotateLeft(T') return T' /* T[recte T'] */ function joinLeftRB(TL, k, TR): /* symmetric to joinRightRB */ function join(TL, k, TR): if TL.blackHeight>TR.blackHeight: T'=joinRightRB(TL,k,TR) if (T'.color=red) and (T'.right.color=red): T'.color=black return T' if TR.blackHeight>TL.blackHeight: /* symmetric */ if (TL.color=black) and (TR.color=black): return Node(TL,⟨k,red⟩,TR) return Node(TL,⟨k,black⟩,TR) The split algorithm is as follows: function split(T, k): if (T = nil) return (nil, false, nil) if (k = T.key) return (T.left, true, T.right) if (k < T.key): (L',b,R') = split(T.left, k) return (L',b,join(R',T.key,T.right)) (L',b,R') = split(T.right, k) return (join(T.left,T.key,L'),b,T.right) The union of two red–black trees and representing sets and, is a red–black tree that represents . The following recursive function computes this union: function union(t1, t2): if t1 = nil return t2 if t2 = nil return t1 (L1,b,R1)=split(t1,t2.key) proc1=start: TL=union(L1,t2.left) proc2=start: TR=union(R1,t2.right) wait all proc1,proc2 return join(TL, t2.key, TR) Here, split is presumed to return two trees: one holding the keys less its input key, one holding the greater keys. (The algorithm is non-destructive, but an in-place destructive version exists also.) The algorithm for intersection or difference is similar, but requires the Join2 helper routine that is the same as Join but without the middle key. Based on the new functions for union, intersection or difference, either one key or multiple keys can be inserted to or deleted from the red–black tree. Since Split calls Join but does not deal with the balancing criteria of red–black trees directly, such an implementation is usually called the "join-based" implementation. The complexity of each of union, intersection and difference is O\left(mlog\left({n\overm}+1\right)\right) m n(\gem) O(logmlogn) m=1 Parallel algorithmsParallel algorithms for constructing red–black trees from sorted lists of items can run in constant time or O(loglogn) n n\toinfty The join-based algorithms for red–black trees are parallel for bulk operations, including union, intersection, construction, filter, map-reduce, and so on. Parallel bulk operationsBasic operations like insertion, removal or update can be parallelised by defining operations that process bulks of multiple elements. It is also possible to process bulks with several basic operations, for example bulks may contain elements to insert and also elements to remove from the tree. The algorithms for bulk operations aren't just applicable to the red–black tree, but can be adapted to other sorted sequence data structures also, like the 2–3 tree, 2–3–4 tree and (a,b)-tree. In the following different algorithms for bulk insert will be explained, but the same algorithms can also be applied to removal and update. Bulk insert is an operation that inserts each element of a sequence I T Join-basedThis approach can be applied to every sorted sequence data structure that supports efficient join- and split-operations.[34] The general idea is to split I T
I
I k\inN+ \langleI1, … ,Ik\rangle
T k \langleT1, … ,Tk\rangle j\inN+ | \, 1 \leq j < k following constraints hold: last(Ij)<first(Tj) last(Tj)<first(Ij)
Ij Tj j k
Note that in Step 3 the constraints for splitting I | initial treeBulkInsert JoinBased SplitTree.svg | split I and TBulkInsert JoinBased SplitTreeInserted.svg | insert into the split TBulkInsert JoinBased JoinedTree.svg | join T The pseudo code shows a simple divide-and-conquer implementation of the join-based algorithm for bulk-insert.Both recursive calls can be executed in parallel.The join operation used here differs from the version explained in this article, instead join2 is used, which misses the second parameter k. bulkInsert(T, I, k): I.sort bulklInsertRec(T, I, k) bulkInsertRec(T, I, k): if k = 1: forall e in I: T.insert(e) else m := ⌊size(I) / 2⌋ (T1, _, T2) := split(T, I[m]) bulkInsertRec(T1, I[0 .. m], ⌈k / 2⌉) | bulkInsertRec(T2, I[m + 1 .. size(I) - 1], ⌊k / 2⌋) T ← join2(T1, T2)Execution timeSorting I
\inO\left(log | T | + \frac | T | \right).[35] Work
PipeliningAnother method of parallelizing bulk operations is to use a pipelining approach.[36] This can be done by breaking the task of processing a basic operation up into a sequence of subtasks.For multiple basic operations the subtasks can be processed in parallel by assigning each subtask to a separate processor.
I
I T \inI T I S sn, I n
mn, sn, T n' mn, mn, sn, mn, S sn', sn',
T \inS If two nodes have different nearest black ancestors, they can be repaired in parallel. Since at most four nodes can have the same nearest black ancestor, the nodes at the lowest level can be repaired in a constant number of parallel steps. This step will be applied successively to the black levels above until T
S \inI S \inO( | I | ) and in every stage the subsequences are being cut in half, the number of stages is \inO(log | I | ). Since all stages move up the black levels of the tree, they can be parallelised in a pipeline. Once a stage has finished processing one black level, the next stage is able to move up and continue at that level. | Initial treeBulkInsert Pipelining InsertPositions.svg | Find insert positionsBulkInsert Pipelining Stage1Insert.svg | Stage 1 inserts elementsBulkInsert Pipelining Stage1Repair.svg | Stage 1 begins to repair nodesBulkInsert Pipelining Stage2Insert.svg | Stage 2 inserts elementsBulkInsert Pipelining Stage2Repair.svg | Stage 2 begins to repair nodesBulkInsert Pipelining Stage3Insert.svg | Stage 3 inserts elementsBulkInsert Pipelining Stage3Repair1.svg | Stage 3 begins to repair nodesBulkInsert Pipelining Stage3Repair2.svg | Stage 3 continues to repair nodesExecution timeSorting I | I | is assumed to be smaller than | T | , otherwise it would be more efficient to construct the resulting tree from scratch.
Work
See also
Further reading
External links
|
---|
2 ⋅ k
n=2 ⋅ 2k-2
h<2log2(n+1),
n
O(1)
n
O(loglogn)
n/loglogn