In probability theory and theoretical computer science, McDiarmid's inequality (named after Colin McDiarmid [1]) is a concentration inequality which bounds the deviation between the sampled value and the expected value of certain functions when they are evaluated on independent random variables. McDiarmid's inequality applies to functions that satisfy a bounded differences property, meaning that replacing a single argument to the function while leaving all other arguments unchanged cannot cause too large of a change in the value of the function.
A function f : \mathcal{X}_1 \times \mathcal{X}_2 \times \cdots \times \mathcal{X}_n \to \mathbb{R} satisfies the bounded differences property if substituting the value of the i-th coordinate x_i changes the value of f by at most c_i. More formally, there exist constants c_1, c_2, \ldots, c_n such that for all i \in [n] and all x_1 \in \mathcal{X}_1, x_2 \in \mathcal{X}_2, \ldots, x_n \in \mathcal{X}_n,

\sup_{x_i' \in \mathcal{X}_i} \left| f(x_1, \ldots, x_{i-1}, x_i, x_{i+1}, \ldots, x_n) - f(x_1, \ldots, x_{i-1}, x_i', x_{i+1}, \ldots, x_n) \right| \leq c_i.
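As a simple illustration (not part of the formal definition above), take \mathcal{X}_i = [a_i, b_i] \subset \mathbb{R} and let f(x_1, \ldots, x_n) = \frac{1}{n}\sum_{i=1}^n x_i be the empirical mean. Replacing a single argument x_i by x_i' changes f by

\left| f(x_1, \ldots, x_i, \ldots, x_n) - f(x_1, \ldots, x_i', \ldots, x_n) \right| = \frac{|x_i - x_i'|}{n} \leq \frac{b_i - a_i}{n},

so the bounded differences property holds with c_i = (b_i - a_i)/n. Applied to this f, McDiarmid's inequality recovers Hoeffding's inequality for sums of bounded independent random variables.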
A stronger bound may be given when the arguments to the function are sampled from unbalanced distributions, such that resampling a single argument rarely causes a large change to the function value.
This may be used to characterize, for example, the value of a function on graphs when evaluated on sparse random graphs and hypergraphs, since in a sparse random graph, it is much more likely for any particular edge to be missing than to be present.
McDiarmid's inequality may be extended to the case where the function being analyzed does not strictly satisfy the bounded differences property, but large differences remain very rare.
There exist stronger refinements to this analysis in some distribution-dependent scenarios,[2] such as those that arise in learning theory.
Let the k-th centered conditional version of a function f be

f_k(X)(x) := f(x_1, \ldots, x_{k-1}, X_k, x_{k+1}, \ldots, x_n) - \operatorname{E}_{X_k'} f(x_1, \ldots, x_{k-1}, X_k', x_{k+1}, \ldots, x_n),

so that f_k(X) is a random function of the remaining arguments x_1, \ldots, x_{k-1}, x_{k+1}, \ldots, x_n.
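For instance, if f(x_1, \ldots, x_n) = \frac{1}{n}\sum_{j=1}^n x_j is the empirical mean of real-valued arguments, the k-th centered conditional version reduces to

f_k(X)(x) = \frac{X_k - \operatorname{E}[X_k']}{n},

which here does not depend on the other arguments at all. In general, f_k(X) measures how much the randomness of the k-th argument alone can move the function value around its conditional mean.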
Refinements to McDiarmid's inequality in the style of Bennett's inequality and Bernstein inequalities are made possible by defining a variance term for each function argument. Let
\begin{align}
B &:= \max_k \sup_{x_1, \ldots, x_{k-1}, x_{k+1}, \ldots, x_n} \left| f(x_1, \ldots, x_{k-1}, X_k, x_{k+1}, \ldots, x_n) - \operatorname{E}_{X_k} f(x_1, \ldots, x_{k-1}, X_k, x_{k+1}, \ldots, x_n) \right|, \\
V_k &:= \sup_{x_1, \ldots, x_{k-1}, x_{k+1}, \ldots, x_n} \operatorname{E}_{X_k} \left( f(x_1, \ldots, x_{k-1}, X_k, x_{k+1}, \ldots, x_n) - \operatorname{E}_{X_k} f(x_1, \ldots, x_{k-1}, X_k, x_{k+1}, \ldots, x_n) \right)^2, \\
\tilde\sigma^2 &:= \sum_{k=1}^{n} V_k.
\end{align}
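To see why these quantities can improve on the worst-case constants c_k, consider (as an illustration) the empirical mean f(x) = \frac{1}{n}\sum_{k=1}^n x_k of independent Bernoulli(p) variables. Then

B \leq \frac{\max(p, 1-p)}{n}, \qquad V_k = \frac{p(1-p)}{n^2}, \qquad \tilde\sigma^2 = \frac{p(1-p)}{n},

so for unbalanced distributions (small p) the variance proxy \tilde\sigma^2 is much smaller than the quantity \frac{1}{4}\sum_k c_k^2 = \frac{1}{4n} governing the basic bounded differences bound, which is where the Bennett- and Bernstein-style refinements gain their advantage.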
The following proof of McDiarmid's inequality constructs the Doob martingale tracking the conditional expected value of the function as more and more of its arguments are sampled and conditioned on, and then applies a martingale concentration inequality (Azuma's inequality). An alternate argument avoiding the use of martingales also exists, taking advantage of the independence of the function arguments to provide a Chernoff-bound-like argument.
For better readability, we will introduce a notational shorthand: z_{i \frown j} will denote z_i, \ldots, z_j for any z \in \mathcal{X}^n and integers 1 \leq i \leq j \leq n, so that, for example,

f(X_{1 \frown (i-1)}, y, x_{(i+1) \frown n}) := f(X_1, \ldots, X_{i-1}, y, x_{i+1}, \ldots, x_n).
Pick any x_1', x_2', \ldots, x_n'. Then, for any x_1, x_2, \ldots, x_n, the bounded differences property and the triangle inequality give

\begin{align}
&|f(x_{1 \frown n}) - f(x'_{1 \frown n})| \\[6pt]
\leq{}& |f(x_{1 \frown n}) - f(x'_{1 \frown (n-1)}, x_n)| + c_n \\
\leq{}& |f(x_{1 \frown n}) - f(x'_{1 \frown (n-2)}, x_{(n-1) \frown n})| + c_{n-1} + c_n \\
\leq{}& \ldots \\
\leq{}& \sum_{i=1}^{n} c_i,
\end{align}

and thus f is bounded.
Since f is bounded, define the Doob martingale \{Z_i\}, where each Z_i is a random variable depending on the random values of X_1, \ldots, X_i, by

Z_i := \operatorname{E}[f(X_{1 \frown n}) \mid X_{1 \frown i}]

for all i \geq 1 and Z_0 := \operatorname{E}[f(X_{1 \frown n})], so that Z_n = f(X_{1 \frown n}).
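As a concrete illustration (assuming additionally that the X_j are identically distributed), for the empirical mean f(x) = \frac{1}{n}\sum_{j=1}^n x_j this martingale takes the explicit form

Z_i = \frac{1}{n}\sum_{j=1}^{i} X_j + \frac{n-i}{n}\operatorname{E}[X_1],

interpolating between Z_0 = \operatorname{E}[X_1] and Z_n = f(X_{1 \frown n}): each newly revealed argument replaces one expected value by its sampled value.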
Now define the random variables for each i,

\begin{align}
U_i &:= \sup_{x \in \mathcal{X}_i} \operatorname{E}[f(X_{1 \frown (i-1)}, x, X_{(i+1) \frown n}) \mid X_{1 \frown (i-1)}, X_i = x] - \operatorname{E}[f(X_{1 \frown (i-1)}, X_{i \frown n}) \mid X_{1 \frown (i-1)}], \\
L_i &:= \inf_{x \in \mathcal{X}_i} \operatorname{E}[f(X_{1 \frown (i-1)}, x, X_{(i+1) \frown n}) \mid X_{1 \frown (i-1)}, X_i = x] - \operatorname{E}[f(X_{1 \frown (i-1)}, X_{i \frown n}) \mid X_{1 \frown (i-1)}].
\end{align}

Since X_i, \ldots, X_n are independent of X_1, \ldots, X_{i-1}, conditioning on X_i = x does not affect the distributions of the other variables, so these are equal to the expressions

\begin{align}
U_i &= \sup_{x \in \mathcal{X}_i} \operatorname{E}[f(X_{1 \frown (i-1)}, x, X_{(i+1) \frown n}) - f(X_{1 \frown (i-1)}, X_{i \frown n}) \mid X_{1 \frown (i-1)}], \\
L_i &= \inf_{x \in \mathcal{X}_i} \operatorname{E}[f(X_{1 \frown (i-1)}, x, X_{(i+1) \frown n}) - f(X_{1 \frown (i-1)}, X_{i \frown n}) \mid X_{1 \frown (i-1)}].
\end{align}
Note that L_i \leq Z_i - Z_{i-1} \leq U_i. In addition,

\begin{align}
U_i - L_i &= \sup_{u \in \mathcal{X}_i,\, \ell \in \mathcal{X}_i} \operatorname{E}[f(X_{1 \frown (i-1)}, u, X_{(i+1) \frown n}) \mid X_{1 \frown (i-1)}] - \operatorname{E}[f(X_{1 \frown (i-1)}, \ell, X_{(i+1) \frown n}) \mid X_{1 \frown (i-1)}] \\[6pt]
&= \sup_{u \in \mathcal{X}_i,\, \ell \in \mathcal{X}_i} \operatorname{E}[f(X_{1 \frown (i-1)}, u, X_{(i+1) \frown n}) - f(X_{1 \frown (i-1)}, \ell, X_{(i+1) \frown n}) \mid X_{1 \frown (i-1)}] \\
&\leq \sup_{u \in \mathcal{X}_i,\, \ell \in \mathcal{X}_i} \operatorname{E}[c_i \mid X_{1 \frown (i-1)}] \\[6pt]
&\leq c_i.
\end{align}
Then, applying the general form of Azuma's inequality to \left\{Z_i\right\}, we have

\operatorname{P}(f(X_1, \ldots, X_n) - \operatorname{E}[f(X_1, \ldots, X_n)] \geq \varepsilon) = \operatorname{P}(Z_n - Z_0 \geq \varepsilon) \leq \exp\left(-\frac{2\varepsilon^2}{\sum_{i=1}^{n} c_i^2}\right).
The one-sided bound in the other direction is obtained by applying Azuma's inequality to \left\{-Z_i\right\}, and the two-sided bound follows from a union bound. \square
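As a minimal numerical sketch (not part of the original argument), the following Python snippet estimates the left-hand side of the bound by Monte Carlo for the empirical mean of Bernoulli variables and compares it with the right-hand side \exp(-2\varepsilon^2 / \sum_i c_i^2); the parameter choices are illustrative only.

```python
import numpy as np

def mcdiarmid_bound(eps, c):
    # Right-hand side of McDiarmid's inequality: exp(-2 eps^2 / sum_i c_i^2).
    return np.exp(-2.0 * eps**2 / np.sum(np.square(c)))

rng = np.random.default_rng(0)
n, p, eps, trials = 100, 0.3, 0.1, 200_000

# f(x_1, ..., x_n) = mean(x): changing one coordinate moves f by at most 1/n,
# so the bounded differences property holds with c_i = 1/n.
c = np.full(n, 1.0 / n)

means = rng.binomial(1, p, size=(trials, n)).mean(axis=1)
empirical = np.mean(means - p >= eps)  # Monte Carlo estimate of P(f - E[f] >= eps)

print(f"empirical P(f - E[f] >= {eps}): {empirical:.5f}")
print(f"McDiarmid bound:                {mcdiarmid_bound(eps, c):.5f}")
```

The empirical deviation probability comes out well below the McDiarmid bound, as expected, since the bound uses only the worst-case constants c_i and not the actual variance of the arguments.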