In probability theory, the Azuma–Hoeffding inequality (named after Kazuoki Azuma and Wassily Hoeffding) gives a concentration result for the values of martingales that have bounded differences.
Suppose \{X_k : k = 0, 1, 2, 3, \dots\} is a martingale (or super-martingale) and
|X_k - X_{k-1}| \leq c_k
almost surely. Then for all positive integers N and all positive reals \epsilon,
P(X_N - X_0 \geq \epsilon) \leq \exp\left(\frac{-\epsilon^2}{2\sum_{k=1}^{N} c_k^2}\right).
And symmetrically (when X_k is a sub-martingale):
P(X_N - X_0 \leq -\epsilon) \leq \exp\left(\frac{-\epsilon^2}{2\sum_{k=1}^{N} c_k^2}\right).
If X is a martingale, using both inequalities above and applying the union bound allows one to obtain a two-sided bound:
P(|X_N - X_0| \geq \epsilon) \leq 2\exp\left(\frac{-\epsilon^2}{2\sum_{k=1}^{N} c_k^2}\right).
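The right-hand side is easy to evaluate numerically. The following Python sketch (the helper name azuma_bound and the example values are illustrative choices, not taken from any library) computes the two-sided bound for a given \epsilon and increment bounds c_1, \dots, c_N:

import math

def azuma_bound(eps, c, two_sided=True):
    # Evaluate exp(-eps^2 / (2 * sum of c_k^2)), doubled for the two-sided version.
    denom = 2.0 * sum(ck ** 2 for ck in c)
    bound = math.exp(-eps ** 2 / denom)
    return 2.0 * bound if two_sided else bound

# A martingale with N = 100 steps and |X_k - X_{k-1}| <= 1:
print(azuma_bound(eps=30.0, c=[1.0] * 100))  # about 0.022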
The proof shares a similar idea with the proof of the general form of Azuma's inequality given below; in fact, the statement above can be viewed as a direct corollary of that general form.
Note that the vanilla Azuma's inequality requires symmetric bounds on the martingale increments, i.e.
-c_t \leq X_t - X_{t-1} \leq c_t.
So if the available bound is asymmetric, i.e.
a_t \leq X_t - X_{t-1} \leq b_t,
then to use the vanilla form one has to take
c_t = \max(|a_t|, |b_t|),
which may waste information on the boundedness of X_t - X_{t-1}. This issue can be overcome by the following general form of Azuma's inequality.
Let
\left\{X_0, X_1, \dots\right\}
be a martingale (or super-martingale) with respect to a filtration
\left\{\mathcal{F}_0, \mathcal{F}_1, \dots\right\}.
Assume there are predictable processes
\left\{A_0, A_1, \dots\right\}
and
\left\{B_0, B_1, \dots\right\}
with respect to
\left\{\mathcal{F}_0, \mathcal{F}_1, \dots\right\},
i.e. for all t, A_t and B_t are \mathcal{F}_{t-1}-measurable, and constants
0 < c_1, c_2, \dots < \infty
such that
A_t \leq X_t - X_{t-1} \leq B_t \quad \text{and} \quad B_t - A_t \leq c_t
almost surely. Then for all \epsilon > 0,
P(X_n - X_0 \geq \epsilon) \leq \exp\left(-\frac{2\epsilon^2}{\sum_{t=1}^{n} c_t^2}\right).
And symmetrically, if
\left\{X_0, X_1, \dots\right\}
is a sub-martingale:
P(X_n - X_0 \leq -\epsilon) \leq \exp\left(-\frac{2\epsilon^2}{\sum_{t=1}^{n} c_t^2}\right).
If
\left\{X_0, X_1, \dots\right\}
is a martingale, then it is both a super-martingale and a sub-martingale, so applying the union bound to the two inequalities above gives the two-sided bound:
P(|X_n - X_0| \geq \epsilon) \leq 2\exp\left(-\frac{2\epsilon^2}{\sum_{t=1}^{n} c_t^2}\right).
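To see how much can be gained when the increments are asymmetric, here is a minimal Python sketch (the function names and the numerical values are illustrative assumptions) comparing the vanilla bound, which must use c_t = \max(|a_t|, |b_t|), with the general-form bound, which uses the range c_t = b_t - a_t:

import math

def vanilla_azuma(eps, a, b):
    # Vanilla form: c_t = max(|a_t|, |b_t|), bound exp(-eps^2 / (2 * sum c_t^2)).
    c = [max(abs(at), abs(bt)) for at, bt in zip(a, b)]
    return math.exp(-eps ** 2 / (2.0 * sum(ct ** 2 for ct in c)))

def general_azuma(eps, a, b):
    # General form: c_t = b_t - a_t, bound exp(-2 * eps^2 / sum c_t^2).
    c = [bt - at for at, bt in zip(a, b)]
    return math.exp(-2.0 * eps ** 2 / sum(ct ** 2 for ct in c))

# Increments confined to [-0.1, 1] for n = 100 steps:
n, eps = 100, 20.0
a, b = [-0.1] * n, [1.0] * n
print(vanilla_azuma(eps, a, b))  # exp(-2)    ~ 0.135
print(general_azuma(eps, a, b))  # exp(-6.61) ~ 0.0013

In this (assumed) setting the general form improves the bound by roughly two orders of magnitude, because it exploits the narrow range of the increments rather than only their largest absolute value.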
We will prove the super-martingale case only, as the rest are self-evident. By the Doob decomposition, we can decompose the super-martingale
\left\{X_t\right\}
as
X_t = Y_t + Z_t,
where
\left\{Y_t, \mathcal{F}_t\right\}
is a martingale and
\left\{Z_t, \mathcal{F}_t\right\}
is a non-increasing predictable sequence (note that if
\left\{X_t\right\}
is itself a martingale, then
Z_t = 0).
From
A_t \leq X_t - X_{t-1} \leq B_t,
it follows that
-(Z_t - Z_{t-1}) + A_t \leq Y_t - Y_{t-1} \leq -(Z_t - Z_{t-1}) + B_t.
Applying the Chernoff bound to
Y_n - Y_0,
we have, for
\epsilon > 0,
\begin{align}
P(Y_n - Y_0 \geq \epsilon) &\leq \underset{s>0}{\min}\, e^{-s\epsilon} \mathbb{E}\left[e^{s(Y_n - Y_0)}\right] \\
&= \underset{s>0}{\min}\, e^{-s\epsilon} \mathbb{E}\left[\exp\left(s \sum_{t=1}^{n} (Y_t - Y_{t-1})\right)\right] \\
&= \underset{s>0}{\min}\, e^{-s\epsilon} \mathbb{E}\left[\exp\left(s \sum_{t=1}^{n-1} (Y_t - Y_{t-1})\right) \mathbb{E}\left[\exp\left(s(Y_n - Y_{n-1})\right) \mid \mathcal{F}_{n-1}\right]\right]
\end{align}
For the inner expectation term, since
(i)
\mathbb{E}[Y_t - Y_{t-1} \mid \mathcal{F}_{t-1}] = 0,
as
\left\{Y_t\right\}
is a martingale;
(ii)
-(Z_t - Z_{t-1}) + A_t \leq Y_t - Y_{t-1} \leq -(Z_t - Z_{t-1}) + B_t;
(iii)
-(Z_t - Z_{t-1}) + A_t
and
-(Z_t - Z_{t-1}) + B_t
are both \mathcal{F}_{t-1}-measurable, as
\left\{Z_t\right\}
is a predictable process;
(iv)
B_t - A_t \leq c_t;
by applying Hoeffding's lemma, we have
\mathbb{E}\left[\exp\left(s(Y_t - Y_{t-1})\right) \mid \mathcal{F}_{t-1}\right] \leq \exp\left(\frac{s^2 (B_t - A_t)^2}{8}\right) \leq \exp\left(\frac{s^2 c_t^2}{8}\right).
Repeating this step for each t, we obtain
P(Y_n - Y_0 \geq \epsilon) \leq \underset{s>0}{\min}\, e^{-s\epsilon} \exp\left(\frac{s^2 \sum_{t=1}^{n} c_t^2}{8}\right).
The exponent
-s\epsilon + \frac{s^2}{8} \sum_{t=1}^{n} c_t^2
is minimized at
s = \frac{4\epsilon}{\sum_{t=1}^{n} c_t^2}.
Plugging in this value of s, we get
P(Y_n - Y_0 \geq \epsilon) \leq \exp\left(-\frac{2\epsilon^2}{\sum_{t=1}^{n} c_t^2}\right).
Finally, since
X_n - X_0 = (Y_n - Y_0) + (Z_n - Z_0)
and
Z_n - Z_0 \leq 0
because
\left\{Z_n\right\}
is non-increasing, the event
\left\{X_n - X_0 \geq \epsilon\right\}
implies the event
\left\{Y_n - Y_0 \geq \epsilon\right\},
and therefore
P(X_n - X_0 \geq \epsilon) \leq P(Y_n - Y_0 \geq \epsilon) \leq \exp\left(-\frac{2\epsilon^2}{\sum_{t=1}^{n} c_t^2}\right). \square
Note that by setting
A_t = -c_t, \quad B_t = c_t
(so that B_t - A_t = 2c_t), the general form recovers the vanilla Azuma's inequality stated above.
Note that for either submartingale or supermartingale, only one side of Azuma's inequality holds. We can't say much about how fast a submartingale with bounded increments rises (or a supermartingale falls).
This general form of Azuma's inequality applied to the Doob martingale gives McDiarmid's inequality which is common in the analysis of randomized algorithms.
Let F_i be a sequence of independent and identically distributed random coin flips (i.e., let F_i be equally likely to be −1 or +1, independent of the other values of F_i). Defining
X_i = \sum_{j=1}^{i} F_j
yields a martingale with |X_k - X_{k-1}| \leq 1, allowing us to apply Azuma's inequality. Specifically, we get
P(X_n > t) \leq \exp\left(\frac{-t^2}{2n}\right).
For example, if we set t proportional to n, then this tells us that although the maximum possible value of Xn scales linearly with n, the probability that the sum scales linearly with n decreases exponentially fast with n.
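A quick Monte Carlo check makes this concrete. The Python sketch below (the choices n = 100, t = 30, and the number of trials are arbitrary illustrative assumptions) compares the empirical frequency of the event X_n > t with the bound \exp(-t^2/(2n)):

import math
import random

def empirical_tail(n, t, trials=100_000, seed=0):
    # Estimate P(X_n > t) for X_n, the sum of n fair +/-1 coin flips.
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        x = sum(rng.choice((-1, 1)) for _ in range(n))
        if x > t:
            hits += 1
    return hits / trials

n, t = 100, 30
print(empirical_tail(n, t))          # empirical frequency, typically around 1e-3
print(math.exp(-t ** 2 / (2 * n)))   # Azuma bound: exp(-4.5) ~ 0.011

As expected, the empirical tail frequency sits below the Azuma bound; the bound is not tight here, since by the central limit theorem the true tail behaves roughly like a Gaussian tail.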
If we set
t = \sqrt{2n \ln n},
we get
P\left(X_n > \sqrt{2n \ln n}\right) \leq \exp\left(\frac{-2n \ln n}{2n}\right) = \frac{1}{n},
which means that the probability of deviating by more than
\sqrt{2n \ln n}
approaches 0 as n tends to infinity.
A similar inequality was proved under weaker assumptions by Sergei Bernstein in 1937.
Hoeffding proved this result for independent variables rather than martingale differences, and also observed that slight modifications of his argument establish the result for martingale differences (see page 9 of his 1963 paper).