Count sketch explained
Count sketch is a type of dimensionality reduction that is particularly efficient in statistics, machine learning and algorithms.[1] [2] It was invented by Moses Charikar, Kevin Chen and Martin Farach-Colton in an effort to speed up the AMS Sketch by Alon, Matias and Szegedy for approximating the frequency moments of streams[3] (these calculations require counting of the number of occurrences for the distinct elements of the stream).
The sketch is nearly identical to the Feature hashing algorithm by John Moody,[4] but differs in its use of hash functions with low dependence, which makes it more practical.In order to still have a high probability of success, the median trick is used to aggregate multiple count sketches, rather than the mean.
These properties allow use for explicit kernel methods, bilinear pooling in neural networks and is a cornerstone in many numerical linear algebra algorithms.[5]
Intuitive explanation
The inventors of this data structure offer the following iterative explanation of its operation:
- at the simplest level, the output of a single hash function mapping stream elements into is feeding a single up/down counter . After a single pass over the data, the frequency
of a stream element can be approximated, although extremely poorly, by the
expected value
;
- a straightforward way to improve the variance of the previous estimate is to use an array of different hash functions
, each connected to its own counter
. For each element, the
still holds, so averaging across the range will tighten the approximation;
- the previous construct still has a major deficiency: if a lower-frequency-but-still-important output element exhibits a hash collision with a high-frequency element,
estimate can be significantly affected. Avoiding this requires reducing the frequency of collision counter updates between any two distinct elements. This is achieved by replacing each
in the previous construct with an array of counters (making the counter set into a two-dimensional matrix
), with index of a particular counter to be incremented/decremented selected via another set of hash functions
that map element into the range . Since
, averaging across all values of will work.
Mathematical definition
1. For constants
and
(to be defined later) independently choose
random hash functions
and
such that
and
.It is necessary that the hash families from which
and
are chosen be pairwise independent.
2. For each item
in the stream, add
to the
th bucket of the
th hash.
At the end of this process, one has
sums
where
To estimate the count of
s one computes the following value:
The values
are unbiased estimates of how many times
has appeared in the stream.
The estimate
has variance
, where
is the length of the stream and
is
.
[6] Furthermore,
is guaranteed to never be more than
off from the true value, with probability
.
Vector formulation
Alternatively Count-Sketch can be seen as a linear mapping with a non-linear reconstruction function.Let
, be a collection of
matrices, defined by
for
and 0 everywhere else.
Then a vector
is sketched by
.To reconstruct
we take
.This gives the same guarantees as stated above, if we take
and
.
Relation to Tensor sketch
The count sketch projection of the outer product of two vectors is equivalent to the convolution of two component count sketches.
, where
and
are independent count sketch matrices.
Pham and Pagh[7] show that this equals
– a count sketch
of the
outer product of vectors, where
denotes
Kronecker product.
The fast Fourier transform can be used to do fast convolution of count sketches.By using the face-splitting product[8] [9] [10] such structures can be computed much faster than normal matrices.
See also
Further reading
Notes and References
- Faisal M. Algashaam; Kien Nguyen; Mohamed Alkanhal; Vinod Chandran; Wageeh Boles. "Multispectral Periocular Classification WithMultimodal Compact Multi-Linear Pooling" [1]. IEEE Access, Vol. 5. 2017.
- Web site: Ahle . Thomas . Knudsen . Jakob . 2019-09-03 . Almost Optimal Tensor Sketch . 2020-07-11 . ResearchGate.
- Alon, Noga, Yossi Matias, and Mario Szegedy. "The space complexity of approximating the frequency moments." Journal of Computer and system sciences 58.1 (1999): 137-147.
- Moody, John. "Fast learning in multi-resolution hierarchies." Advances in neural information processing systems. 1989.
- Woodruff, David P. "Sketching as a Tool for Numerical Linear Algebra." Theoretical Computer Science 10.1-2 (2014): 1–157.
- Larsen, Kasper Green, Rasmus Pagh, and Jakub Tětek. "CountSketches, Feature Hashing and the Median of Three." International Conference on Machine Learning. PMLR, 2021.
- Fast and scalable polynomial kernels via explicit feature maps. Ninh. Pham. Rasmus. Pagh . Rasmus Pagh. 2013. Association for Computing Machinery. SIGKDD international conference on Knowledge discovery and data mining. 10.1145/2487575.2487591.
- Slyusar. V. I. . End products in matrices in radar applications . Radioelectronics and Communications Systems . 1998 . 41 . 3. 50–53.
- Slyusar. V. I.. 1997-05-20. Analytical model of the digital antenna array on a basis of face-splitting matrix products. . Proc. ICATT-97, Kyiv. 108–109.
- Slyusar. V. I.. March 13, 1998. A Family of Face Products of Matrices and its Properties. Cybernetics and Systems Analysis C/C of Kibernetika I Sistemnyi Analiz.- 1999.. 35. 3. 379–384. 10.1007/BF02733426. 119661450 .