Concentration inequality explained

In probability theory, concentration inequalities provide mathematical bounds on the probability of a random variable deviating from some value (typically, its expected value).

The law of large numbers of classical probability theory states that sums of independent random variables, under mild conditions, concentrate around their expectation with a high probability. Such sums are the most basic examples of random variables concentrated around their mean.

Concentration inequalities can be sorted according to how much information about the random variable is needed in order to use them.

Markov's inequality

See main article: Markov's inequality. Let

be a random variable that is non-negative (almost surely). Then, for every constant

a>0

\Pr(X\geqa)\leq

	\operatorname{E
	(X)}{a}.

Note the following extension to Markov's inequality: if

\Phi

is a strictly increasing and non-negative function, then

\Pr(X\geqa)=\Pr(\Phi(X)\geq\Phi(a))\leq

	\operatorname{E
	(\Phi(X))}{\Phi

(a)}.

Chebyshev's inequality

See main article: Chebyshev's inequality. Chebyshev's inequality requires the following information on a random variable

The expected value

\operatorname{E}[X]

is finite.

\operatorname{Var}[X]=\operatorname{E}[(X-\operatorname{E}[X])^2]

is finite.

Then, for every constant

a>0

\Pr(|X-\operatorname{E}[X]|\geqa)\leq

	\operatorname{Var
	[X]}{a

^2},

or equivalently,

\Pr(|X-\operatorname{E}[X]|\geqa ⋅ \operatorname{Std}[X])\leq

	1
	a²

where

\operatorname{Std}[X]

is the standard deviation of

Chebyshev's inequality can be seen as a special case of the generalized Markov's inequality applied to the random variable

|X-\operatorname{E}[X]|

with

\Phi(x)=x²

Vysochanskij–Petunin inequality

See main article: Vysochanskij–Petunin inequality. Let X be a random variable with unimodal distribution, mean μ and finite, non-zero variance σ². Then, for any $\lambda > \sqrt = 1.63299\ldots,$

Pr(\left|X-\mu\right|\geqλ\sigma)\leq

	4
	9λ²

(For a relatively elementary proof see e.g.^[1]).

One-sided Vysochanskij–Petunin inequality

For a unimodal random variable

and

r\geq0

, the one-sided Vysochanskij-Petunin inequality^[2] holds as follows:

Pr(X-E[X]\geqr)\leq \begin{cases} \dfrac{4}{9}\dfrac{\operatorname{Var}(X)}{r²+\operatorname{Var}(X)}&forr²\geq\dfrac{5}{3}\operatorname{Var}(X),\\[5pt] \dfrac{4}{3}\dfrac{\operatorname{Var}(X)}{r²+\operatorname{Var}(X)}-\dfrac{1}{3}&otherwise. \end{cases}

Paley–Zygmund inequality

See main article: Paley–Zygmund inequality. In contrast to most commonly used concentration inequalities, the Paley-Zygmund inequality provides a lower bound on the deviation probability.

Cantelli's inequality

See main article: Cantelli's inequality.

Gauss's inequality

See main article: Gauss's inequality.

Chernoff bounds

See main article: Chernoff bound. The generic Chernoff bound^[3] requires the moment generating function of

, defined as

	tX
M
	X(t):=\operatorname{E}\left[e

\right].

It always exists, but may be infinite. From Markov's inequality, for every

t>0

\Pr(X\geqa)\leq

	\operatorname{E
	[e

^tX]}{e^ta

and for every

t<0

\Pr(X\leqa)\leq

	\operatorname{E
	[e

^tX]}{e^ta

There are various Chernoff bounds for different distributions and different values of the parameter

. See ^[4] for a compilation of more concentration inequalities.

Mill's inequality

See main article: article and Mill's Inequality. Let

Z\simN(0,1)

. Then

\operatorname (|Z| > t) \le \sqrt \frac

Bounds on sums of independent bounded variables

See main article: Hoeffding's inequality, Azuma's inequality, McDiarmid's inequality, Bennett's inequality and Bernstein inequalities (probability theory). Let

X_1,X_2,...,X_n

be independent random variables such that, for all i:

a_i\leqX_i\leqb_i

almost surely.

c_i:=b_i-a_i

\foralli:c_i\leqC

Let

S_n

be their sum,

E_n

its expected value and

V_n

its variance:

S_n:=

	n
\sum
	i=1

X_i

E_n:=\operatorname{E}[S_n]=

	n
\sum
	i=1

\operatorname{E}[X_i]

V_n:=\operatorname{Var}[S_n]=

	n
\sum
	i=1

\operatorname{Var}[X_i]

It is often interesting to bound the difference between the sum and its expected value. Several inequalities can be used.

1. Hoeffding's inequality says that:

\Pr\left[|S_n-E_n|>t\right]\le2\exp\left(-

2t²

\sum

	2
c
	i

i=1

\right)\le2\exp\left(-

	2t²
	nC²

\right)

2. The random variable

S_n-E_n

is a special case of a martingale, and

S_0-E₀₌₀

. Hence, the general form of Azuma's inequality can also be used and it yields a similar bound:

\Pr\left[|S_n-E_n|>t\right]<2\exp\left(-

2t²

\sum

	2
c
	i

i=1

\right)<2\exp\left(-

	2t²
	nC²

\right)

This is a generalization of Hoeffding's since it can handle other types of martingales, as well as supermartingales and submartingales. See Fan et al. (2015).^[5] Note that if the simpler form of Azuma's inequality is used, the exponent in the bound is worse by a factor of 4.

3. The sum function,

S_n=f(X_1,...,X_n)

, is a special case of a function of n variables. This function changes in a bounded way: if variable i is changed, the value of f changes by at most

b_i-a_i<C

. Hence, McDiarmid's inequality can also be used and it yields a similar bound:

\Pr\left[|S_n-E_n|>t\right]<2\exp\left(-

2t²

\sum

	2
c
	i

i=1

\right)<2\exp\left(-

	2t²
	nC²

\right)

This is a different generalization of Hoeffding's since it can handle other functions besides the sum function, as long as they change in a bounded way.

4. Bennett's inequality offers some improvement over Hoeffding's when the variances of the summands are small compared to their almost-sure bounds C. It says that:

\Pr\left[|S_n-E_n|>t\right]\leq 2\exp\left[-

	V_n	h\left(
	C²

	Ct
	V_n

\right)\right],

where

h(u)=(1+u)log(1+u)-u

5. The first of Bernstein's inequalities says that:

\Pr\left[|S_n-E_n|>t\right]<2\exp\left(-

	t^2/2
	V_n+C ⋅ t/3

\right)

This is a generalization of Hoeffding's since it can handle random variables with not only almost-sure bound but both almost-sure bound and variance bound.

6. Chernoff bounds have a particularly simple form in the case of sum of independent variables, since

	t ⋅ S_n
\operatorname{E}[e

	n
\prod
	i=1

	t ⋅ X_i
{\operatorname{E}[e

]}

For example,^[6] suppose the variables

X_i

satisfy

X_i\geqE(X_i)-a_i-M

, for

1\leqi\leqn

. Then we have lower tail inequality:

\Pr[S_n-E_n<-λ]\leq\exp\left(-

λ²

2(V

	n

	i=1

	2+Mλ/3)
a
	i

n+\sum

\right)

X_i

satisfies

X_i\leqE(X_i)+a_i+M

, we have upper tail inequality:

\Pr[S_n-E_n>λ]\leq\exp\left(-

λ²

2(V

	n
\sum
	i=1

	2+Mλ/3)
a
	i

\right)

X_i

are i.i.d.,

|X_i|\leq1

and

\sigma²

is the variance of

X_i

, a typical version of Chernoff inequality is:

\Pr[|S_n|\geqk\sigma]\leq

	-k^2/4n
2e

for0\leqk\leq2\sigma.

7. Similar bounds can be found in: Rademacher distribution#Bounds on sums

Efron–Stein inequality

The Efron–Stein inequality (or influence inequality, or MG bound on variance) bounds the variance of a general function.

Suppose that

X₁...X_n

X_1'...X_n'

are independent with

X_i'

and

X_i

having the same distribution for all

Let

X=(X_1,...,X_n),X⁽ⁱ⁾=(X_1,...,X_i-1,X_i',X_i+1,...,X_n).

Then

Var(f(X))\leq

	1
	2

	n
\sum
	i=1

E[(f(X)-f(X⁽ⁱ⁾))^2].

A proof may be found in e.g.,.^[7]

Bretagnolle–Huber–Carol inequality

Bretagnolle–Huber–Carol Inequality bounds the difference between a vector of multinomially distributed random variables and a vector of expected values.^[8] ^[9] A simple proof appears in ^[10] (Appendix Section).

If a random vector

(Z_1,Z_2,Z_3,\ldots,Z_n)

is multinomially distributed with parameters

(p_1,p_2,\ldots,p_n)

and satisfies

Z₁+Z₂+...+Z_n=M,

then

\Pr\left(

	n
\sum
	i=1

|Z_i-Mp_i|\geq2M\varepsilon\right) \leq2ⁿ

	-2M\varepsilon²
e

This inequality is used to bound the total variation distance.

Mason and van Zwet inequality

The Mason and van Zwet inequality^[11] for multinomial random vectors concerns a slight modification of the classical chi-square statistic.

Let the random vector

(N_1,\ldots,N_k)

be multinomially distributed with parameters

and

(p_1,\ldots,p_k)

such that

p_i>0

for

i<k.

Then for every

C>0

and

\delta>0

there exist constants

a,b,c>0,

such that for all

n\geq1

and

λ,p_1,\ldots,p_k-1

satisfying

λ>Cnmin\{p_i|1\leqi\leqk-1\}

and

	k-1
\sum
	i=1

p_i\leq1-\delta,

we have

\Pr\left(

	k-1
\sum
	i=1

	2

	i)

i-np

np_i

>λ\right) \leqae^bk-cλ.

Dvoretzky–Kiefer–Wolfowitz inequality

See main article: Dvoretzky–Kiefer–Wolfowitz inequality.

The Dvoretzky–Kiefer–Wolfowitz inequality bounds the difference between the real and the empirical cumulative distribution function.

Given a natural number

, let

X_1,X_2,...,X_n

be real-valued independent and identically distributed random variables with cumulative distribution function F(·). Let

F_n

denote the associated empirical distribution function defined by

F_n(x)=

	1n
	\sum

	n

	i=1

1
	\{X_i\leqx\

},\qquad x\in\mathbb.So

F(x)

is the probability that a single random variable

is smaller than

, and

F_n(x)

is the average number of random variables that are smaller than

Then

\Pr\left(\sup_x\inRl(F_n(x)-F(x)r)>\varepsilon\right)\le

	-2n\varepsilon²
e

forevery\varepsilon\geq\sqrt{\tfrac1{2n}ln2}.

Anti-concentration inequalities

Anti-concentration inequalities, on the other hand, provide an upper bound on how much a random variable can concentrate, either on a specific value or range of values. A concrete example is that if you flip a fair coin

times, the probability that any given number of heads appears will be less than

	1
	\sqrt{n

}. This idea can be greatly generalized. For example, a result of Rao and Yehudayoff^[12] implies that for any

\beta,\delta>0

there exists some

C>0

such that, for any

, the following is true for at least

2ⁿ⁽¹

values of

x\in\{\pm1\}ⁿ

\Pr\left(\langlex,Y\rangle=k\right)\le

	C
	\sqrt{n

},where

is drawn uniformly from

\{\pm1\}ⁿ

Such inequalities are of importance in several fields, including communication complexity (e.g., in proofs of the gap Hamming problem^[13]) and graph theory.^[14]

An interesting anti-concentration inequality for weighted sums of independent Rademacher random variables can be obtained using the Paley–Zygmund and the Khintchine inequalities.^[15]

External links

Karthik Sridharan, "A Gentle Introduction to Concentration Inequalities" —Cornell University

Notes and References

https://www.jstor.org/stable/pdf/2684253.pdf Pukelsheim, F., 1994. The Three Sigma Rule. The American Statistician, 48(2), pp. 88–91
Mercadier . Mathieu . Strobel . Frank . 2021-11-16 . A one-sided Vysochanskii-Petunin inequality with financial applications . European Journal of Operational Research . en . 295 . 1 . 374–377 . 10.1016/j.ejor.2021.02.041 . 0377-2217.
Book: Probability and Computing: Randomized Algorithms and Probabilistic Analysis . Cambridge University Press . Mitzenmacher, Michael . Upfal, Eli . 2005 . 0-521-83540-2.
News: One Hundred Statistics and Probability Inequalities . Slagle, N.P. . 2012. 2102.07234 .
Exponential inequalities for martingales with applications. X. . Fan. I. . Grama . Q. . Liu . Electronic Journal of Probability . Electron. J. Probab. 20. 2015. 20 . 1–22. 10.1214/EJP.v20-3496. 1311.6273.
Web site: Chung. Fan. Fan Chung . Linyuan . Lu . Old and new concentration inequalities. Complex Graphs and Networks . . 2010 . August 14, 2018.
Boucheron . Stphane . Lugosi . Gbor . Bousquet . Olivier . Concentration inequalities . Advanced Lectures on Machine Learning: ML Summer Schools 2003, Canberra, Australia, February 2–14, 2003, Tbingen, Germany, August 4–16, 2003, Revised Lectures . 2004 . 208–240 . Springer.
Book: Bretagnolle, Jean . Lois empiriques et distance de Prokhorov . Huber-Carol, Catherine . 1978 . 978-3-540-08761-8 . Lecture Notes in Mathematics . 649 . 332–341 . 10.1007/BFb0064609.
Book: van der Vaart, A.W.. Wellner, J.A.. Weak convergence and empirical processes: With applications to statistics. Springer Science & Business Media. 1996.
Yuto Ushioda. Masato Tanaka. Tomomi Matsui. Monte Carlo Methods for the Shapley–Shubik Power Index. Games . 13 . 3 . 44 . 2022. 10.3390/g13030044 . free . 2101.02841 .
Mason, David M.. Willem R. Van Zwet . A Refinement of the KMT Inequality for the Uniform Empirical Process . The Annals of Probability . 15 . 3 . 1987 . 871–884 . 10.1214/aop/1176992070 . free .
News: Anti-concentration in most directions . Electronic Colloquium on Computational Complexity . Rao, Anup . Yehudayoff, Amir . 2018.
News: The Communication Complexity of Gap Hamming Distance . . Sherstov, Alexander A. . 2012.
Anticoncentration for subgraph statistics . Journal of the London Mathematical Society . 99 . 3 . 757–777 . Matthew Kwan . Benny Sudakov . Tuan Tran . 2018. 10.1112/jlms.12192 . 2018arXiv180705202K . 1807.05202 . 54065186 .
0909.2586v1. Veraar. Mark. On Khintchine inequalities with a weight. math.PR. 2009.