Order-maintenance problem explained

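In computer science, the order-maintenance problem involves maintaining a totally ordered set supporting the following operations: insert(X, Y), which inserts X immediately after Y in the total order; order(X, Y), which determines whether X precedes Y in the total order; and delete(X), which removes X from the set.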

Paul Dietz first introduced a data structure to solve this problem in 1982.[1] This data structure supports insert(X, Y) in $O(\log n)$ (in Big O notation) amortized time and order(X, Y) in constant time but does not support deletion. Athanasios Tsakalidis used BB[α] trees with the same performance bounds that also support deletion in $O(\log n)$, and improved insertion and deletion performance to $O(1)$ amortized time with indirection.[2] Dietz and Daniel Sleator published an improvement to worst-case constant time in 1987.[3] Michael Bender, Richard Cole and Jack Zito published significantly simplified alternatives in 2002. Bender, Fineman, Gilbert, Kopelowitz and Montes also published a deamortized solution in 2017.

Efficient data structures for order-maintenance have applications in many areas, including data structure persistence,[4] graph algorithms[5][6] and fault-tolerant data structures.[7]

List labeling

See main article: List-labeling problem.

A problem related to the order-maintenance problem is the list-labeling problem, in which, instead of the order(X, Y) operation, the solution must maintain an assignment of labels from a universe of integers $\{1, 2, \ldots, m\}$ to the elements of the set such that X precedes Y in the total order if and only if X is assigned a lesser label than Y. It must also support an operation label(X) returning the label of any node X. Note that order(X, Y) can be implemented simply by comparing label(X) and label(Y), so any solution to the list-labeling problem immediately gives one to the order-maintenance problem. In fact, most solutions to the order-maintenance problem are solutions to the list-labeling problem augmented with a level of data structure indirection to improve performance. We will see an example of this below.
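The reduction from order maintenance to list labeling can be illustrated with a small Python sketch. It is deliberately naive and is not one of the published algorithms: the names `Node`, `LabeledList` and `_relabel_all` are hypothetical, new labels are taken as midpoints between neighbours, and the whole list is relabeled when a gap runs out, which costs linear time per relabeling.

```python
class Node:
    """An element of the ordered set; `label` is its integer tag."""

    def __init__(self, value):
        self.value = value
        self.label = 0
        self.next = None


class LabeledList:
    """Naive list labeling (illustrative only, not an efficient scheme)."""

    def __init__(self, first_value, universe=1 << 16):
        self.universe = universe            # labels come from {1, ..., universe}
        self.head = Node(first_value)
        self.head.label = universe // 2

    def label(self, x):
        return x.label

    def order(self, x, y):
        # X precedes Y if and only if label(X) < label(Y).
        return self.label(x) < self.label(y)

    def insert(self, x_value, y):
        """Insert a new node immediately after y and return it."""
        x = Node(x_value)
        x.next, y.next = y.next, x
        upper = x.next.label if x.next else self.universe + 1
        if upper - y.label < 2:             # no free label between the neighbours
            self._relabel_all()
        else:
            x.label = (y.label + upper) // 2
        return x

    def _relabel_all(self):
        # Spread labels evenly over the universe; O(n) per call.
        nodes = []
        node = self.head
        while node:
            nodes.append(node)
            node = node.next
        step = self.universe // (len(nodes) + 1)
        assert step >= 1, "label universe exhausted"
        for i, node in enumerate(nodes, start=1):
            node.label = i * step
```

Here order(X, Y) is a single integer comparison; all of the difficulty of the problem is pushed into keeping the labels consistent under insertions.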

For a list-labeling problem on sets of size up to $n$, the cost of list labeling depends on how large $m$ is as a function of $n$. The relevant parameter ranges for order maintenance are $m = n^{1+\Theta(1)}$, for which an $O(\log n)$ amortized cost solution is known, and $m = 2^{\Omega(n)}$, for which a constant time amortized solution is known.[8]

O(1) amortized insertion via indirection

Indirection is a technique used in data structures in which a problem is split into multiple levels of a data structure in order to improve efficiency. Typically, a problem of size $n$ is split into $n/\log n$ problems of size $\log n$. For example, this technique is used in y-fast tries. This strategy also works to improve the insertion and deletion performance of the data structure described above to constant amortized time. In fact, this strategy works for any solution of the list-labeling problem with $O(\log n)$ amortized insertion and deletion time.

The new data structure is completely rebuilt whenever it grows too large or too small. Let $N$ be the number of elements of the total order when it was last rebuilt, and let $n$ be the current number of elements. The data structure is rebuilt whenever the invariant $\tfrac{N}{3} \le n \le 2N$ is violated by an insertion or deletion. Since rebuilding can be done in linear time, and a linear number of updates must occur between consecutive rebuilds, this does not affect the amortized performance of insertions and deletions.
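As a minimal sketch of the rebuild rule, the check below tests the invariant using integer arithmetic only, and a small helper counts how many single-element updates fit between two rebuilds. The function names `needs_rebuild` and `updates_until_rebuild` are hypothetical and chosen for illustration.

```python
def needs_rebuild(n, last_rebuild_size):
    """Return True when N/3 <= n <= 2N is violated.

    n is the current number of elements and N (last_rebuild_size) is the
    number of elements at the time of the last rebuild.
    """
    N = last_rebuild_size
    return not (N <= 3 * n and n <= 2 * N)


def updates_until_rebuild(N, delta):
    """Count updates of size delta (+1 insert, -1 delete) until a rebuild."""
    n = N
    steps = 0
    while not needs_rebuild(n, N):
        n += delta
        steps += 1
    return steps
```

Starting from a structure rebuilt at size N, the count returned by `updates_until_rebuild` grows linearly in N for both insertions and deletions, which is what lets the linear rebuild cost be charged to the preceding updates.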

During the rebuilding operation, the $N$ elements of the total order are split into $O(N/\log N)$ contiguous sublists, each of size $\Omega(\log N)$. The list-labeling problem is solved on the set of nodes representing each of the sublists in their original list order. The labels for this subproblem are taken from a polynomial range, say $m = N^2$, so that they can be compared in constant time and updated in amortized $O(\log N)$ time.

For each sublist, a doubly linked list of its elements is built, storing with each element a pointer to its representative in the list of sublists as well as a local integer label. The local integer labels are also taken from a range of size $m = N^2$, so that they can be compared in constant time, but because each local problem involves only $\Theta(\log N)$ items, the label range $m$ is exponential in the number of items being labeled. Thus, they can be updated in $O(1)$ amortized time.
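The claim that the local label range is exponential can be checked with a one-line calculation. Assume, for illustration, that a sublist holds $k \le c \log_2 N$ items for some constant $c > 0$ (the constant $c$ is an assumption introduced here, not a value from the sources):

```latex
k \le c \log_2 N
\quad\Longrightarrow\quad
m = N^2 = 2^{2 \log_2 N} \ge 2^{(2/c)\,k} = 2^{\Omega(k)}
```

So the same range $m = N^2$ that is only polynomial for the $O(N/\log N)$ sublist representatives falls into the $2^{\Omega(k)}$ regime for the $k$ items inside a single sublist.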

See the list-labeling problem for details of both solutions.
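The rebuilding step described above might be sketched as follows. This is an illustration under stated assumptions rather than the published construction: the function names `rebuild` and `evenly_spaced` are hypothetical, logarithms are taken base 2, a short trailing sublist is merged into its predecessor so that every sublist keeps at least about $\log N$ items, and initial labels are simply spread evenly over the range.

```python
import math


def evenly_spaced(count, universe):
    """Return `count` increasing labels spread across {1, ..., universe}."""
    step = universe // (count + 1)
    return [i * step for i in range(1, count + 1)]


def rebuild(elements):
    """Split elements into contiguous sublists of roughly log2(N) items.

    Returns a list of (top_label, [(local_label, element), ...]) pairs,
    in the original list order.
    """
    N = len(elements)
    if N == 0:
        return []
    size = max(1, math.ceil(math.log2(N)))    # target sublist length
    universe = max(N * N, 2)                  # label range m = N^2 (at least 2)
    chunks = [list(elements[i:i + size]) for i in range(0, N, size)]
    if len(chunks) > 1 and len(chunks[-1]) < size:
        tail = chunks.pop()                   # avoid an undersized last sublist
        chunks[-1].extend(tail)
    top_labels = evenly_spaced(len(chunks), universe)
    return [
        (top, list(zip(evenly_spaced(len(chunk), universe), chunk)))
        for top, chunk in zip(top_labels, chunks)
    ]
```

The whole pass touches each element a constant number of times, so it runs in time linear in $N$.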

Order

Given the sublist nodes X and Y, order(X, Y) can be answered by first checking if the two nodes are in the same sublist. If so, their order can be determined by comparing their local labels. Otherwise, the labels of their representatives in the first list-labeling problem are compared. These comparisons take constant time.
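A minimal sketch of this two-level comparison, assuming each item stores a reference to its sublist representative and a local label (the class and field names `Sublist`, `Item`, `label` and `local_label` are hypothetical):

```python
class Sublist:
    """Representative of one sublist in the top-level labeling."""

    def __init__(self, label):
        self.label = label


class Item:
    """An element, with a pointer to its sublist and a local label."""

    def __init__(self, sublist, local_label):
        self.sublist = sublist
        self.local_label = local_label


def order(x, y):
    """Return True if x precedes y in the total order."""
    if x.sublist is y.sublist:
        # Same sublist: the local labels decide.
        return x.local_label < y.local_label
    # Different sublists: the representatives' labels decide.
    return x.sublist.label < y.sublist.label
```

Both branches perform one pointer comparison and one integer comparison, independent of the size of the set.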

Insert

Given a new sublist node for X and a pointer to the sublist node Y, insert(X, Y) inserts X immediately after Y in the sublist of Y if there is room for X in the list, that is, if the length of the list is no greater than $2 \log N$ after the insertion. Its local label is given by the local list-labeling algorithm for exponential labels. This case takes $O(1)$ amortized time.

If the local list overflows, it is split evenly into two lists of size $\log N$, and the items in each list are given new labels from their (independent) ranges. This creates a new sublist, which is inserted into the list of sublists, and the new sublist node is given a label in the list of sublists by the list-labeling algorithm. Finally, X is inserted into the appropriate list.

This sequence of operations takes $O(\log N)$ time, but there have been $\Omega(\log N)$ insertions since the list was created or last split. Thus the amortized time per insertion is $O(1)$.
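The insertion procedure, including the split on overflow, can be sketched in Python as below. This is a functional illustration only and does not achieve the stated time bounds: Python lists stand in for the doubly linked lists, both levels use a naive even relabeling (`_spread`) in place of the real list-labeling algorithms, X is placed before the split rather than after it, and the rebuild invariant is not enforced. All class and method names are hypothetical.

```python
import math


class Sublist:
    def __init__(self):
        self.label = 0      # label among the sublists (top level)
        self.items = []     # stand-in for the doubly linked list


class Item:
    def __init__(self, value):
        self.value = value
        self.sublist = None
        self.local_label = 0


class TwoLevelList:
    def __init__(self, first_value, N):
        self.universe = max(N * N, 16)       # labels in {1, ..., N^2}
        self.capacity = 2 * max(1, math.ceil(math.log2(N)))
        self.head = Item(first_value)
        sub = Sublist()
        sub.items.append(self.head)
        self.head.sublist = sub
        self.sublists = [sub]
        self._spread(self.sublists, "label")
        self._spread(sub.items, "local_label")

    def _spread(self, objs, attr):
        # Naive even relabeling; real list-labeling schemes are cleverer.
        step = self.universe // (len(objs) + 1)
        assert step >= 1, "label universe exhausted; a full rebuild is due"
        for i, obj in enumerate(objs, start=1):
            setattr(obj, attr, i * step)

    def order(self, x, y):
        if x.sublist is y.sublist:
            return x.local_label < y.local_label
        return x.sublist.label < y.sublist.label

    def insert(self, value, y):
        """Insert a new item immediately after y and return it."""
        x = Item(value)
        sub = y.sublist
        pos = sub.items.index(y) + 1
        sub.items.insert(pos, x)
        x.sublist = sub
        if len(sub.items) > self.capacity:
            self._split(sub)                 # overflow: split and relabel
            return x
        upper = (sub.items[pos + 1].local_label
                 if pos + 1 < len(sub.items) else self.universe + 1)
        if upper - y.local_label >= 2:       # a free label exists between neighbours
            x.local_label = (y.local_label + upper) // 2
        else:
            self._spread(sub.items, "local_label")
        return x

    def _split(self, sub):
        half = len(sub.items) // 2
        new = Sublist()
        new.items, sub.items = sub.items[half:], sub.items[:half]
        for item in new.items:
            item.sublist = new
        self._spread(sub.items, "local_label")
        self._spread(new.items, "local_label")
        self.sublists.insert(self.sublists.index(sub) + 1, new)
        self._spread(self.sublists, "label")
```

After a split, each half holds about half of the capacity, so roughly $\log N$ further insertions are needed before the same sublist can overflow again; this is the slack that the amortized argument spends.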

Delete

Given a sublist node X to be deleted, delete(X) simply removes X from its sublist in constant time. If this leaves the sublist empty, then its representative must be removed from the list of sublists. Since at least $\Omega(\log N)$ elements were deleted from the sublist since it was first built, we can afford to spend the $O(\log N)$ time, so the amortized cost of a deletion is $O(1)$.
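A minimal sketch of deletion, with the same caveats as any simplified model: Python lists stand in for the linked structures, so the removals here are not constant time, and the names `Sublist`, `Item` and `delete` are hypothetical. No relabeling is needed, because removing an element never disturbs the relative order of the remaining labels.

```python
class Sublist:
    def __init__(self, label):
        self.label = label
        self.items = []     # stand-in for the doubly linked list


class Item:
    def __init__(self, value, sublist):
        self.value = value
        self.sublist = sublist
        sublist.items.append(self)


def delete(x, sublists):
    """Remove x; drop its sublist's representative if the sublist empties."""
    sub = x.sublist
    sub.items.remove(x)
    x.sublist = None
    if not sub.items:
        sublists.remove(sub)
```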

Notes and References

  1. .
  2. .
  3. . Full version, Tech. Rep. CMU-CS-88-113, Carnegie Mellon University, 1988.
  4. .
  5. .
  6. .
  7. .
  8. .