Funnelsort Explained

Funnelsort is a comparison-based sorting algorithm. It is similar to mergesort, but it is a cache-oblivious algorithm, designed for a setting where the number of elements to sort is too large to fit in a cache where operations are done. It was introduced by Matteo Frigo, Charles Leiserson, Harald Prokop, and Sridhar Ramachandran in 1999 in the context of the cache oblivious model.^[1] ^[2]

Mathematical properties

In the external memory model, the number of memory transfers it needs to perform a sort of

items on a machine with cache of size

and cache lines of length

O\left(\tfrac{N}{L}log_ZN\right)

, under the tall cache assumption that

Z=\Omega(L²⁾

. This number of memory transfers has been shown to be asymptotically optimal for comparison sorts. Funnelsort also achieves the asymptotically optimal runtime complexity of

\Theta(NlogN)

Algorithm

Basic overview

Funnelsort operates on a contiguous array of

elements. To sort the elements, it performs the following:

Split the input into

N^1/3

arrays of size

N^2/3

, and sort the arrays recursively.

Merge the

N^1/3

sorted sequences using a

N^1/3

-merger. (This process will be described in more detail.)

Funnelsort is similar to merge sort in that some number of subarrays are recursively sorted, after which a merging step combines the subarrays into one sorted array. Merging is performed by a device called a k-merger, which is described in the section below.

k-mergers

A k-merger takes

sorted sequences. Upon one invocation of a k-merger, it outputs the first

k³

elements of the sorted sequence obtained by merging the k input sequences.

At the top level, funnelsort uses a

N^1/3

-merger on

N^1/3

sequences of length

N^2/3

, and invokes this merger once.

The k-merger is built recursively out of

\sqrt{k}

-mergers. It consists of

\sqrt{k}

input

\sqrt{k}

-mergers

I_1,I_2,\ldots,I_\sqrt{k}

, and a single output

\sqrt{k}

-merger

.The k inputs are separated into

\sqrt{k}

sets of

\sqrt{k}

inputs each. Each of these sets is an input to one of the input mergers. The output of each input merger is connected to a buffer, a FIFO queue that can hold

2k^3/2

elements. The buffers are implemented as circular queues.The outputs of the

\sqrt{k}

buffers are connected to the inputs of the output merger

. Finally, the output of

is the output of the entire k-merger.

In this construction, any input merger only outputs

k^3/2

items at once, but the buffer it outputs to has double the space. This is done so that an input merger can be called only when its buffer does not have enough items, but that when it is called, it outputs a lot of items at once (namely,

k^3/2

of them).

A k-merger works recursively in the following way. To output

k³

elements, it recursively invokes its output merger

k^3/2

times. However, before it makes a call to

, it checks all of its buffers, filling each of them that are less than half full. To fill the i-th buffer, it recursively invokes the corresponding input merger

I_i

once. If this cannot be done (due to the merger running out of inputs), this step is skipped. Since this call outputs

k^3/2

elements, the buffer contains at least

k^3/2

elements. At the end of all these operations, the k-merger has output the first

k³

of its input elements, in sorted order.

Analysis

Most of the analysis of this algorithm revolves around analyzing the space and cache miss complexity of the k-merger.

The first important bound is that a k-merger can be fit in

O(k²⁾

space. To see this, we let

S(k)

denote the space needed for a k-merger. To fit the

k^1/2

buffers of size

2k^3/2

takes

O(k²⁾

space. To fit the

\sqrt{k}+1

smaller buffers takes

(\sqrt{k}+1)S(\sqrt{k})

space. Thus, the space satisfies the recurrence

S(k)=(\sqrt{k}+1)S(\sqrt{k})+O(k²⁾

. This recurrence has solution

S(k)=O(k²⁾

It follows that there is a positive constant

\alpha

such that a problem of size at most

\alpha\sqrt{Z}

fits entirely in cache, meaning that it incurs no additional cache misses.

Letting

Q_M(k)

denote the number of cache misses incurred by a call to a k-merger, one can show that

Q_M(k)=O((k³log_Zk)/L).

This is done by an induction argument. It has

k\le\alpha\sqrt{Z}

as a base case. For larger k, we can bound the number of times a

\sqrt{k}

-merger is called. The output merger is called exactly

k^3/2

times. The total number of calls on input mergers is at most

k^3/2+2\sqrt{k}

. This gives a total bound of

2k^3/2+2\sqrt{k}

recursive calls. In addition, the algorithm checks every buffer to see if needs to be filled. This is done on

\sqrt{k}

buffers every step for

k^3/2

steps, leading to a max of

k²

cache misses for all the checks.

This leads to the recurrence

Q_M(k)\le(2k^3/2+2\sqrt{k})Q_M(\sqrt{k})+k²

, which can be shown to have the solution given above.

Finally, the total cache misses

Q(N)

for the entire sort can be analyzed. It satisfies the recurrence

Q(N)=N^1/3Q(N^2/3)+

	1/3
Q
	M(N

This can be shown to have solution

Q(N)=O((N/L)log_ZN).

Lazy funnelsort

Lazy funnelsort is a modification of the funnelsort, introduced by Gerth Stølting Brodal and Rolf Fagerberg in 2002.^[3] The modification is that when a merger is invoked, it does not have to fill each of its buffers. Instead, it lazily fills a buffer only when it is empty. This modification has the same asymptotic runtime and memory transfers as the original funnelsort, but has applications in cache-oblivious algorithms for problems in computational geometry in a method known as distribution sweeping.

Notes and References

M. Frigo, C.E. Leiserson, H. Prokop, and S. Ramachandran. Cache-oblivious algorithms. In Proceedings of the 40th IEEE Symposium on Foundations of Computer Science (FOCS 99), pp. 285-297. 1999. Extended abstract at IEEE, at Citeseer.
Harald Prokop. Cache-Oblivious Algorithms. Masters thesis, MIT. 1999.
Book: Gerth Stølting. Automata, Languages and Programming,Proceedings of the International Conference on Automata, Languages and Programming -->. Brodal. Gerth Stølting Brodal. Rolf. Fagerberg. Cache Oblivious Distribution Sweeping. Lecture Notes in Computer Science. Springer. 2380. 426–438. 10.1007/3-540-45465-9_37. 25 June 2002. 978-3-540-43864-9. 10.1.1.117.6837. . See also the longer technical report.

Funnelsort Explained

Mathematical properties

Algorithm

Basic overview

k-mergers

Analysis

Lazy funnelsort

See also

Notes and References