Cuckoo filter explained

A cuckoo filter is a space-efficient probabilistic data structure that is used to test whether an element is a member of a set, like a Bloom filter does. False positive matches are possible, but false negatives are not – in other words, a query returns either "possibly in set" or "definitely not in set". A cuckoo filter can also delete existing items, which isnot supported by Bloom filters.In addition, for applications that store many items andtarget moderately low false positive rates, cuckoo filters can achievelower space overhead than space-optimized Bloom filters.[1]

Cuckoo filters were first described in 2014.[2]

Algorithm description

A cuckoo filter uses a hash table based on cuckoo hashing to store the fingerprints of items. The data structure is broken into buckets of some size

b

. To insert the fingerprint of an item

x

, one first computes two potential buckets

h1(x)

and

h2(x)

where

x

could go. These buckets are calculated using the formula

h1(x)=hash(x)

h2(x)=h1(x)hash(fingerprint(x))

Note that, due to the symmetry of the XOR operation, one can compute

h2(x)

from

h1(x)

, and

h1(x)

from

h2(x)

. As defined above,

h2(x)=h1(x)hash(fingerprint(x))

; it follows that

h1(x)=h2(x)hash(fingerprint(x))

. These properties are what make it possible to store the fingerprints with cuckoo hashing.

The fingerprint of

x

is placed into one of buckets

h1(x)

and

h2(x)

. If the buckets are full, then one of the fingerprints in the bucket is evicted using cuckoo hashing, and placed into the other bucket where it can go. If that bucket, in turn, is also full, then that may trigger another eviction, etc.

The hash table can achieve both high utilization (thanks to cuckoo hashing), and compactness because only fingerprints are stored. Lookup and delete operations of a cuckoo filter are straightforward.

There are a maximum of two buckets to check by

h1(x)

and

h2(x)

. If found, the appropriate lookup or delete operation can be performed in

O(b)

time. Often, in practice,

b

is a constant.

In order for the hash table to offer theoretical guarantees, the fingerprint size

f

must be at least

\Omega((logn)/b)

bits.[3] [4] Subject to this constraint, cuckoo filters guarantee a false-positive rate of at most

\epsilon\leb/2f

.

Comparison to Bloom filters

A cuckoo filter is similar to a Bloom filter in that they both are fast and compact, and they may both return false positives as answers to set-membership queries:

1.44log2(1/\epsilon)

bits of space per inserted key, where

\epsilon

is the false positive rate. A cuckoo filter requires

(log2(1/\epsilon)+1+log2b)/\alpha

space per key where

\alpha

is the hash table load factor, which can be

95.5\%

based on the cuckoo filter's setting. Note that the information theoretical lower bound requires

log2(1/\epsilon)

bits for each item. Both bloom filters and cuckoo filters with low load can be compressed when not in use.

log2(1/\epsilon)

memory accesses into the bit array, whereas a cuckoo filter requires at most

2b

memory accesses, which can be a constant in practice.

Limitations

O(1)

.[5]

f

of at least

\Omega((logn)/b)

bits. This means that the space per key must be at least

\Omega((logn)/b)

bits, even if

\epsilon

is large. In practice,

b

is chosen to be large enough that this is not a major issue.

External links

Notes and References

  1. Web site: Bloom Filters, Cuckoo Hashing, Cuckoo Filters, Adaptive Cuckoo Filters, and Learned Bloom Filters. Michael D. Mitzenmacher . Michael Mitzenmacher.
  2. Fan . Bin . Andersen . Dave G. . Kaminsky . Michael . Mitzenmacher . Michael D. . Michael Mitzenmacher . Cuckoo filter: Practically better than Bloom . 10.1145/2674005.2674994 . 75–88 . Proc. 10th ACM International on Conference on Emerging Networking Experiments and Technologies (CoNEXT '14) . Sydney, Australia . 2014. 9781450332798 . free .
  3. David . Eppstein . David Eppstein. Cuckoo filter: Simplification and analysis. Proc. 15th Scandinavian Symposium and Workshops on Algorithm Theory (SWAT 2016). 22 June 2016. Reykjavik, Iceland. 8:1–8:12. Leibniz International Proceedings in Informatics (LIPIcs). 53. 10.4230/LIPIcs.SWAT.2016.8. free . 1604.06067.
  4. Noah . Fleming. Cuckoo Hashing and Cuckoo Filters. 17 May 2018. University of Toronto.
  5. Pagh . Rasmus . Rasmus Pagh . Rodler . Flemming Friche. Cuckoo hashing. 10.1007/3-540-44676-1_10 . Proc. 9th Annual European Symposium on Algorithms (ESA 2001) . Lecture Notes in Computer Science . 2161 . 121–133. 2001 . 978-3-540-42493-2 . Århus, Denmark.