Splaysort Explained

In computer science, splaysort is an adaptive comparison sorting algorithm based on the splay tree data structure.

Algorithm

The steps of the algorithm are:

  1. Initialize an empty splay tree
  2. For each data item in the input order, insert it into the splay tree
  3. Traverse the splay tree in inorder to find the sorted order of the data

Thus, the algorithm may be seen as a form of insertion sort or tree sort, using a splay tree to speed up each insertion.

Analysis

Based on the amortized analysis of splay trees, the worst case running time of splaysort, on an input with n data items, is O(n log n), matching the time bounds for efficient non-adaptive algorithms such as quicksort, heap sort, and merge sort.

For an input sequence in which most items are placed close to their predecessor in the sorted order, or are out of order with only a small number of other items, splaysort can be faster than O(n log n), showing that it is an adaptive sort. To quantify this, let dx be the number of positions in the input that separate x from its predecessor, and let ix be the number of items that appear on one side of x in the input and on the other side of x in the output (the number of inversions that involve x). Then it follows from the dynamic finger theorem for splay trees that the total time for splaysort is bounded by

\sumxlogdx

and by

\sumxlogix

.[1]

Splaysort can also be shown to be adaptive to the entropy of the input sequence.[2]

Experimental results

In experiments by, splaysort was slower than quicksort on tables of random numbers by a factor of 1.5 to 2, and slower than mergesort by smaller factors. For data consisting of larger records, again in a random order, the additional amount of data movement performed by quicksort significantly slowed it down compared to pointer-based algorithms, and the times for splaysort and mergesort were very close to each other. However, for nearly presorted input sequences (measured in terms of the number of contiguous monotone subsequences in the data, the number of inversions, the number of items that must be removed to make a sorted subsequence, or the number of non-contiguous monotone subsequences into which the input can be partitioned) splaysort became significantly more efficient than the other algorithms.

compared splaysort to several other algorithms that are adaptive to the total number of inversions in the input, as well as to quicksort. They found that, on the inputs that had few enough inversions to make an adaptive algorithm faster than quicksort, splaysort was the fastest algorithm.[3]

Variations

modify splaysort to be more strongly adaptive to the number of contiguous monotone subsequences in the input, and report on experiments showing that the resulting algorithm is faster on inputs that are nearly presorted according to this measure.[4]

Notes and References

  1. .
  2. .
  3. .
  4. .