In computer science, weight-balanced binary trees (WBTs) are a type of self-balancing binary search trees that can be used to implement dynamic sets, dictionaries (maps) and sequences.[1] These trees were introduced by Nievergelt and Reingold in the 1970s as trees of bounded balance, or BB[α] trees.[2] Their more common name is due to Knuth.[3]
A well known example is a Huffman coding of a corpus.
Like other self-balancing trees, WBTs store bookkeeping information pertaining to balance in their nodes and perform rotations to restore balance when it is disturbed by insertion or deletion operations. Specifically, each node stores the size of the subtree rooted at the node, and the sizes of left and right subtrees are kept within some factor of each other. Unlike the balance information in AVL trees (using information about the height of subtrees) and red–black trees (which store a fictional "color" bit), the bookkeeping information in a WBT is an actually useful property for applications: the number of elements in a tree is equal to the size of its root, and the size information is exactly the information needed to implement the operations of an order statistic tree, viz., getting the 'th largest element in a set or determining an element's index in sorted order.[4]
Weight-balanced trees are popular in the functional programming community and are used to implement sets and maps in MIT Scheme, SLIB, SML-NJ, and implementations of Haskell.[5] [3]
A weight-balanced tree is a binary search tree that stores the sizes of subtrees in the nodes. That is, a node has fields
By definition, the size of a leaf (typically represented by a pointer) is zero. The size of an internal node is the sum of sizes of its two children, plus one: . Based on the size, one defines the weight to be . Weight has the advantage that the weight of a node is simply the sum of the weights of its left and right children.
Operations that modify the tree must make sure that the weight of the left and right subtrees of every node remain within some factor of each other, using the same rebalancing operations used in AVL trees: rotations and double rotations. Formally, node balance is defined as follows:
A node is -weight-balanced if and .[6]
Here, is a numerical parameter to be determined when implementing weight balanced trees. Larger values of produce "more balanced" trees, but not all values of are appropriate; Nievergelt and Reingold proved that
\alpha<1-
\sqrt{2 | |
is a necessary condition for the balancing algorithm to work. Later work showed a lower bound of for, although it can be made arbitrarily small if a custom (and more complicated) rebalancing algorithm is used.
Applying balancing correctly guarantees a tree of elements will have height
h\le
log | ||||
|
n=
log2n | |||||||||
|
=O(logn)
If is given its maximum allowed value, the worst-case height of a weight-balanced tree is the same as that of a red–black tree at
2log2n
The number of balancing operations required in a sequence of insertions and deletions is linear in, i.e., balancing takes a constant amount of overhead in an amortized sense.[7]
While maintaining a tree with the minimum search cost requires four kinds of double rotations (LL, LR, RL, RR as in an AVL tree) in insert/delete operations, if we desire only logarithmic performance, LR and RL are the only rotations required in a single top-down pass.[8]
Several set operations have been defined on weight-balanced trees: union, intersection and set difference. Then fast bulk operations on insertions or deletions can be implemented based on these set functions. These set operations rely on two helper operations, Split and Join. With the new operations, the implementation of weight-balanced trees can be more efficient and highly-parallelizable.[9] [10]
\alpha<1-
1 | |
\sqrt{2 |
O(logn)
The join algorithm is as follows:
function joinRightWB(TL, k, TR) (l, k', c) = expose(TL) if balance(|TL|, |TR|) return Node(TL, k, TR) else T' = joinRightWB(c, k, TR) (l', k', r') = expose(T') if (balance(|l|,|T'|)) return Node(l, k', T') else if (balance(|l|,|l'|) and balance(|l|+|l'|,|r'|)) return rotateLeft(Node(l, k', T')) else return rotateLeft(Node(l, k', rotateRight(T')) function joinLeftWB(TL, k, TR) /* symmetric to joinRightWB */ function join(TL, k, TR) if (heavy(TL, TR)) return joinRightWB(TL, k, TR) if (heavy(TR, TL)) return joinLeftWB(TL, k, TR) Node(TL, k, TR)
Here balance
(x,y)
x
y
v
l
k
r
l
k
r
The split algorithm is as follows:
function split(T, k) if (T = nil) return (nil, false, nil) (L, (m, c), R) = expose(T) if (k = m) return (L, true, R) if (k < m) (L', b, R') = split(L, k) return (L', b, join(R', m, R)) if (k > m) (L', b, R') = split(R, k) return (join(L, m, L'), b, R))
The union of two weight-balanced trees and representing sets and, is a weight-balanced tree that represents . The following recursive function computes this union:
function union(t1, t2): if t1 = nil: return t2 if t2 = nil: return t1 t<, t> ← split t2 on t1.root return join(union(left(t1), t<), t1.root, union(right(t1), t>))
Here, Split is presumed to return two trees: one holding the keys less than its input key, the other holding the greater keys. (The algorithm is non-destructive, but an in-place destructive version exists as well.)
The algorithm for intersection or difference is similar, but requires the Join2 helper routine that is the same as Join but without the middle key. Based on the new functions for union, intersection or difference, either one key or multiple keys can be inserted to or deleted from the weight-balanced tree. Since Split and Union call Join but do not deal with the balancing criteria of weight-balanced trees directly, such an implementation is usually called the join-based algorithms.
The complexity of each of union, intersection and difference is
O\left(mlog\left({n\overm}+1\right)\right)
m
n(\gem)
O(logmlogn)
m=1