In probability theory, the Azuma–Hoeffding inequality (named after Kazuoki Azuma and Wassily Hoeffding) gives a concentration result for the values of martingales that have bounded differences.
Suppose \{X_k : k = 0, 1, 2, 3, \dots\} is a martingale (or super-martingale) and
|X_k - X_{k-1}| \leq c_k
almost surely. Then for all positive integers N and all positive reals \epsilon,
P(X_N - X_0 \geq \epsilon) \leq \exp\left(\frac{-\epsilon^2}{2\sum_{k=1}^{N} c_k^2}\right).
And symmetrically (when X_k is a sub-martingale):
P(X_N - X_0 \leq -\epsilon) \leq \exp\left(\frac{-\epsilon^2}{2\sum_{k=1}^{N} c_k^2}\right).
If X is a martingale, using both inequalities above and applying the union bound allows one to obtain a two-sided bound:
P(|X_N - X_0| \geq \epsilon) \leq 2\exp\left(\frac{-\epsilon^2}{2\sum_{k=1}^{N} c_k^2}\right).
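The right-hand side is easy to evaluate numerically. The following Python sketch (the helper name azuma_bound and the example values are illustrative choices, not taken from any library) computes the two-sided bound for a given \epsilon and increment bounds c_1, \dots, c_N:

import math

def azuma_bound(eps, c, two_sided=True):
    # Evaluate exp(-eps^2 / (2 * sum of c_k^2)), doubled for the two-sided version.
    denom = 2.0 * sum(ck ** 2 for ck in c)
    bound = math.exp(-eps ** 2 / denom)
    return 2.0 * bound if two_sided else bound

# A martingale with N = 100 steps and |X_k - X_{k-1}| <= 1:
print(azuma_bound(eps=30.0, c=[1.0] * 100))  # about 0.022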
The proof shares a similar idea with the proof of the general form of Azuma's inequality given below; in fact, the statement above can be viewed as a direct corollary of that general form.
Note that the vanilla Azuma's inequality requires symmetric bounds on the martingale increments, i.e.
-c_t \leq X_t - X_{t-1} \leq c_t.
So if the available bound is asymmetric, i.e.
a_t \leq X_t - X_{t-1} \leq b_t,
then to use the vanilla form one has to take
c_t = \max(|a_t|, |b_t|),
which may waste information on the boundedness of X_t - X_{t-1}. This issue can be overcome by the following general form of Azuma's inequality.
Let
\left\{X_0, X_1, \dots\right\}
be a martingale (or super-martingale) with respect to a filtration
\left\{\mathcal{F}_0, \mathcal{F}_1, \dots\right\}.
Assume there are predictable processes
\left\{A_0, A_1, \dots\right\}
and
\left\{B_0, B_1, \dots\right\}
with respect to
\left\{\mathcal{F}_0, \mathcal{F}_1, \dots\right\},
i.e. for all t, A_t and B_t are \mathcal{F}_{t-1}-measurable, and constants
0 < c_1, c_2, \dots < \infty
such that
A_t \leq X_t - X_{t-1} \leq B_t \quad \text{and} \quad B_t - A_t \leq c_t
almost surely. Then for all \epsilon > 0,
P(X_n - X_0 \geq \epsilon) \leq \exp\left(-\frac{2\epsilon^2}{\sum_{t=1}^{n} c_t^2}\right).
And symmetrically, if
\left\{X_0, X_1, \dots\right\}
is a sub-martingale:
P(X_n - X_0 \leq -\epsilon) \leq \exp\left(-\frac{2\epsilon^2}{\sum_{t=1}^{n} c_t^2}\right).
If
\left\{X_0, X_1, \dots\right\}
is a martingale, then it is both a super-martingale and a sub-martingale, so applying the union bound to the two inequalities above gives the two-sided bound:
P(|X_n - X_0| \geq \epsilon) \leq 2\exp\left(-\frac{2\epsilon^2}{\sum_{t=1}^{n} c_t^2}\right).
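To see how much can be gained when the increments are asymmetric, here is a minimal Python sketch (the function names and the numerical values are illustrative assumptions) comparing the vanilla bound, which must use c_t = \max(|a_t|, |b_t|), with the general-form bound, which uses the range c_t = b_t - a_t:

import math

def vanilla_azuma(eps, a, b):
    # Vanilla form: c_t = max(|a_t|, |b_t|), bound exp(-eps^2 / (2 * sum c_t^2)).
    c = [max(abs(at), abs(bt)) for at, bt in zip(a, b)]
    return math.exp(-eps ** 2 / (2.0 * sum(ct ** 2 for ct in c)))

def general_azuma(eps, a, b):
    # General form: c_t = b_t - a_t, bound exp(-2 * eps^2 / sum c_t^2).
    c = [bt - at for at, bt in zip(a, b)]
    return math.exp(-2.0 * eps ** 2 / sum(ct ** 2 for ct in c))

# Increments confined to [-0.1, 1] for n = 100 steps:
n, eps = 100, 20.0
a, b = [-0.1] * n, [1.0] * n
print(vanilla_azuma(eps, a, b))  # exp(-2)    ~ 0.135
print(general_azuma(eps, a, b))  # exp(-6.61) ~ 0.0013

In this (assumed) setting the general form improves the bound by roughly two orders of magnitude, because it exploits the narrow range of the increments rather than only their largest absolute value.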
We will prove the super-martingale case only, as the rest are self-evident. By the Doob decomposition, we can decompose the super-martingale
\left\{X_t\right\}
as
X_t = Y_t + Z_t,
where
\left\{Y_t, \mathcal{F}_t\right\}
is a martingale and
\left\{Z_t, \mathcal{F}_t\right\}
is a non-increasing predictable sequence (note that if
\left\{X_t\right\}
is itself a martingale, then
Z_t = 0).
From
A_t \leq X_t - X_{t-1} \leq B_t,
it follows that
-(Z_t - Z_{t-1}) + A_t \leq Y_t - Y_{t-1} \leq -(Z_t - Z_{t-1}) + B_t.
Applying the Chernoff bound to
Y_n - Y_0,
we have, for
\epsilon > 0,
\begin{align}
P(Y_n - Y_0 \geq \epsilon) &\leq \underset{s>0}{\min}\, e^{-s\epsilon} \mathbb{E}\left[e^{s(Y_n - Y_0)}\right] \\
&= \underset{s>0}{\min}\, e^{-s\epsilon} \mathbb{E}\left[\exp\left(s \sum_{t=1}^{n} (Y_t - Y_{t-1})\right)\right] \\
&= \underset{s>0}{\min}\, e^{-s\epsilon} \mathbb{E}\left[\exp\left(s \sum_{t=1}^{n-1} (Y_t - Y_{t-1})\right) \mathbb{E}\left[\exp\left(s(Y_n - Y_{n-1})\right) \mid \mathcal{F}_{n-1}\right]\right]
\end{align}
For the inner expectation term, since
(i)
\mathbb{E}[Y_t - Y_{t-1} \mid \mathcal{F}_{t-1}] = 0,
as
\left\{Y_t\right\}
is a martingale;
(ii)
-(Z_t - Z_{t-1}) + A_t \leq Y_t - Y_{t-1} \leq -(Z_t - Z_{t-1}) + B_t;
(iii)
-(Z_t - Z_{t-1}) + A_t
and
-(Z_t - Z_{t-1}) + B_t
are both \mathcal{F}_{t-1}-measurable, as
\left\{Z_t\right\}
is a predictable process;
(iv)
B_t - A_t \leq c_t;
by applying Hoeffding's lemma, we have
\mathbb{E}\left[\exp\left(s(Y_t - Y_{t-1})\right) \mid \mathcal{F}_{t-1}\right] \leq \exp\left(\frac{s^2 (B_t - A_t)^2}{8}\right) \leq \exp\left(\frac{s^2 c_t^2}{8}\right).
Repeating this step for each t, we obtain
P(Y_n - Y_0 \geq \epsilon) \leq \underset{s>0}{\min}\, e^{-s\epsilon} \exp\left(\frac{s^2 \sum_{t=1}^{n} c_t^2}{8}\right).
The exponent
-s\epsilon + \frac{s^2}{8} \sum_{t=1}^{n} c_t^2
is minimized at
s = \frac{4\epsilon}{\sum_{t=1}^{n} c_t^2}.
Plugging in this value of s, we get
P(Y_n - Y_0 \geq \epsilon) \leq \exp\left(-\frac{2\epsilon^2}{\sum_{t=1}^{n} c_t^2}\right).
Finally, since
X_n - X_0 = (Y_n - Y_0) + (Z_n - Z_0)
and
Z_n - Z_0 \leq 0
because
\left\{Z_n\right\}
is non-increasing, the event
\left\{X_n - X_0 \geq \epsilon\right\}
implies the event
\left\{Y_n - Y_0 \geq \epsilon\right\},
and therefore
P(X_n - X_0 \geq \epsilon) \leq P(Y_n - Y_0 \geq \epsilon) \leq \exp\left(-\frac{2\epsilon^2}{\sum_{t=1}^{n} c_t^2}\right). \square
Note that by setting
A_t = -c_t, \quad B_t = c_t
(so that B_t - A_t = 2c_t), the general form recovers the vanilla Azuma's inequality stated above.
Note that for either submartingale or supermartingale, only one side of Azuma's inequality holds. We can't say much about how fast a submartingale with bounded increments rises (or a supermartingale falls).
This general form of Azuma's inequality applied to the Doob martingale gives McDiarmid's inequality which is common in the analysis of randomized algorithms.
Let F_i be a sequence of independent and identically distributed random coin flips (i.e., let F_i be equally likely to be −1 or +1, independent of the other values of F_i). Defining
X_i = \sum_{j=1}^{i} F_j
yields a martingale with |X_k - X_{k-1}| \leq 1, allowing us to apply Azuma's inequality. Specifically, we get
P(X_n > t) \leq \exp\left(\frac{-t^2}{2n}\right).
For example, if we set t proportional to n, then this tells us that although the maximum possible value of Xn scales linearly with n, the probability that the sum scales linearly with n decreases exponentially fast with n.
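A quick Monte Carlo check makes this concrete. The Python sketch below (the choices n = 100, t = 30, and the number of trials are arbitrary illustrative assumptions) compares the empirical frequency of the event X_n > t with the bound \exp(-t^2/(2n)):

import math
import random

def empirical_tail(n, t, trials=100_000, seed=0):
    # Estimate P(X_n > t) for X_n, the sum of n fair +/-1 coin flips.
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        x = sum(rng.choice((-1, 1)) for _ in range(n))
        if x > t:
            hits += 1
    return hits / trials

n, t = 100, 30
print(empirical_tail(n, t))          # empirical frequency, typically around 1e-3
print(math.exp(-t ** 2 / (2 * n)))   # Azuma bound: exp(-4.5) ~ 0.011

As expected, the empirical tail frequency sits below the Azuma bound; the bound is not tight here, since by the central limit theorem the true tail behaves roughly like a Gaussian tail.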
If we set
t = \sqrt{2n \ln n},
we get
P\left(X_n > \sqrt{2n \ln n}\right) \leq \exp\left(\frac{-2n \ln n}{2n}\right) = \frac{1}{n},
which means that the probability of deviating by more than
\sqrt{2n \ln n}
approaches 0 as n tends to infinity.
A similar inequality was proved under weaker assumptions by Sergei Bernstein in 1937.
Hoeffding proved this result for independent variables rather than martingale differences, and also observed that slight modifications of his argument establish the result for martingale differences (see page 9 of his 1963 paper).