In information theory, Pinsker's inequality, named after its inventor Mark Semenovich Pinsker, is an inequality that bounds the total variation distance (or statistical distance) in terms of the Kullback–Leibler divergence. The inequality is tight up to constant factors.[1]
Pinsker's inequality states that, if P and Q are two probability distributions on a measurable space (X, \Sigma), then

\delta(P, Q) \le \sqrt{\frac{1}{2} D_{\mathrm{KL}}(P \parallel Q)},
where

\delta(P, Q) = \sup\bigl\{\, |P(A) - Q(A)| \mid A \in \Sigma \text{ is a measurable event} \,\bigr\}

is the total variation distance (or statistical distance) between P and Q, and

D_{\mathrm{KL}}(P \parallel Q) = \operatorname{E}_P\left(\log \frac{dP}{dQ}\right) = \int_X \left(\log \frac{dP}{dQ}\right) dP

is the Kullback–Leibler divergence in nats. When the sample space X is a finite set, the Kullback–Leibler divergence is given by

D_{\mathrm{KL}}(P \parallel Q) = \sum_i \left(\log \frac{P(i)}{Q(i)}\right) P(i).
Note that, in terms of the total variation norm \|P - Q\| of the signed measure P - Q, Pinsker's inequality differs from the one given above by a factor of two:

\|P - Q\| \le \sqrt{2 D_{\mathrm{KL}}(P \parallel Q)}.
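As a numerical illustration (a minimal sketch, not part of the statement, assuming NumPy and arbitrary example distributions p and q), both forms of the inequality can be checked for discrete distributions:

    import numpy as np

    def total_variation(p, q):
        # delta(P, Q) = sup_A |P(A) - Q(A)| = (1/2) * sum_i |p_i - q_i| for discrete distributions
        return 0.5 * np.abs(p - q).sum()

    def kl_divergence(p, q):
        # D_KL(P || Q) in nats; assumes Q(i) > 0 wherever P(i) > 0
        mask = p > 0
        return np.sum(p[mask] * np.log(p[mask] / q[mask]))

    p = np.array([0.1, 0.4, 0.5])    # example distribution P (assumed values)
    q = np.array([0.25, 0.25, 0.5])  # example distribution Q (assumed values)

    delta = total_variation(p, q)
    kl = kl_divergence(p, q)

    print(delta <= np.sqrt(0.5 * kl))    # Pinsker's inequality: delta <= sqrt(D_KL / 2)
    print(2 * delta <= np.sqrt(2 * kl))  # total variation norm form: ||P - Q|| = 2 * delta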
A proof of Pinsker's inequality uses the partition inequality for f-divergences.
Note that the expression of Pinsker's inequality depends on which base of logarithm is used in the definition of the KL-divergence: D_{\mathrm{KL}} above is defined using the natural logarithm \ln (base e), whereas D is defined using \log_2, so that

D(P \parallel Q) = \frac{D_{\mathrm{KL}}(P \parallel Q)}{\ln 2}.
Given the above comments, there is an alternative statement of Pinsker's inequality in some literature that relates information divergence to variation distance:
D(P \parallel Q) = \frac{D_{\mathrm{KL}}(P \parallel Q)}{\ln 2} \ge \frac{1}{2 \ln 2} V^2(p, q),

i.e.,

\sqrt{\frac{D_{\mathrm{KL}}(P \parallel Q)}{2}} \ge \frac{V(p, q)}{2},

in which

V(p, q) = \sum_{x \in \mathcal{X}} |p(x) - q(x)|

is the (non-normalized) variation distance between two probability distributions p and q on the same alphabet \mathcal{X}, so that V(p, q) = 2\delta(P, Q).
This form of Pinsker's inequality shows that "convergence in divergence" is a stronger notion than "convergence in variation distance".
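The base-2 form can be checked in the same way (a minimal sketch with the same assumed example distributions as above):

    import numpy as np

    p = np.array([0.1, 0.4, 0.5])
    q = np.array([0.25, 0.25, 0.5])

    V = np.abs(p - q).sum()             # V(p, q) = sum_x |p(x) - q(x)| = 2 * delta(P, Q)
    D = np.sum(p * np.log2(p / q))      # divergence in bits: D(P || Q) = D_KL(P || Q) / ln 2

    print(D >= V**2 / (2 * np.log(2)))  # alternative statement: D >= V^2 / (2 ln 2)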
A simple proof by John Pollard is shown by letting r(x) = P(x)/Q(x) - 1 \ge -1 (so that \operatorname{E}_Q[r(x)] = 0) and using the elementary inequality (1 + r)\log(1 + r) - r \ge \frac{r^2}{2(1 + r/3)} for r \ge -1:

\begin{align}
D_{\mathrm{KL}}(P \parallel Q) &= \operatorname{E}_Q[(1 + r(x))\log(1 + r(x)) - r(x)] \\
&\ge \frac{1}{2} \operatorname{E}_Q\left[\frac{r(x)^2}{1 + r(x)/3}\right] \\
&\ge \frac{1}{2} \frac{\operatorname{E}_Q[|r(x)|]^2}{\operatorname{E}_Q[1 + r(x)/3]} && \text{(from Titu's lemma)} \\
&= \frac{1}{2} \operatorname{E}_Q[|r(x)|]^2 && \text{(since } \operatorname{E}_Q[r(x)] = 0 \text{, so } \operatorname{E}_Q[1 + r(x)/3] = 1\text{)} \\
&= \frac{1}{2} V(p, q)^2.
\end{align}
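The second line of the display uses the pointwise bound (1 + r)\log(1 + r) - r \ge r^2 / (2(1 + r/3)) for r \ge -1 mentioned above; a quick numerical sanity check (an illustration assuming NumPy, not a proof) is:

    import numpy as np

    # check (1 + r) * log(1 + r) - r >= r^2 / (2 * (1 + r / 3)) on a grid of r > -1
    r = np.linspace(-0.999, 10.0, 100001)
    lhs = (1 + r) * np.log1p(r) - r
    rhs = r**2 / (2 * (1 + r / 3))
    print(np.all(lhs >= rhs - 1e-12))   # True up to floating-point tolerance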
Note that the lower bound from Pinsker's inequality is vacuous for any distributions where D_{\mathrm{KL}}(P \parallel Q) > 2, since the total variation distance is at most 1. For such distributions, an alternative bound can be used, due to Bretagnolle and Huber:

\delta(P, Q) \le \sqrt{1 - e^{-D_{\mathrm{KL}}(P \parallel Q)}}.
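For instance (a minimal sketch with assumed example distributions), when D_{\mathrm{KL}}(P \parallel Q) > 2 the Pinsker bound exceeds 1 and is therefore uninformative, while the bound above remains below 1:

    import numpy as np

    p = np.array([0.99, 0.01])  # example P (assumed values)
    q = np.array([0.01, 0.99])  # example Q (assumed values)

    delta = 0.5 * np.abs(p - q).sum()  # total variation distance, here 0.98
    kl = np.sum(p * np.log(p / q))     # D_KL(P || Q) in nats, here about 4.5 > 2

    print(np.sqrt(0.5 * kl))           # Pinsker bound, about 1.5: exceeds 1, hence vacuous
    print(np.sqrt(1 - np.exp(-kl)))    # alternative bound, about 0.994, still valid for delta = 0.98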
Pinsker first proved the inequality with a greater constant. The inequality in the above form was proved independently by Kullback, Csiszár, and Kemperman.[5]
A precise inverse of the inequality cannot hold: for every \varepsilon > 0, there are distributions P_\varepsilon, Q with \delta(P_\varepsilon, Q) \le \varepsilon but D_{\mathrm{KL}}(P_\varepsilon \parallel Q) = \infty. An easy example is given by the two-point space \{0, 1\} with Q(0) = 0, Q(1) = 1 and P_\varepsilon(0) = \varepsilon, P_\varepsilon(1) = 1 - \varepsilon.
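Written out, this example gives \delta(P_\varepsilon, Q) = |P_\varepsilon(\{0\}) - Q(\{0\})| = \varepsilon, while P_\varepsilon puts mass \varepsilon on the point 0 where Q puts none, so P_\varepsilon is not absolutely continuous with respect to Q and D_{\mathrm{KL}}(P_\varepsilon \parallel Q) = \infty.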
However, an inverse inequality holds on finite spaces X with a constant depending on Q. More specifically, it can be shown that with the definition \alpha_Q := \min_{x \in X} Q(x), we have, for any measure P which is absolutely continuous with respect to Q,

\frac{1}{2} D_{\mathrm{KL}}(P \parallel Q) \le \frac{1}{\alpha_Q} \delta(P, Q)^2.
As a consequence, if Q has full support (i.e. Q(x) > 0 for all x \in X), then

\delta(P, Q)^2 \le \frac{1}{2} D_{\mathrm{KL}}(P \parallel Q) \le \frac{1}{\alpha_Q} \delta(P, Q)^2.
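Both directions can be checked together on a small example (a minimal sketch assuming NumPy and example distributions, with Q having full support):

    import numpy as np

    p = np.array([0.2, 0.3, 0.5])  # example P (assumed values)
    q = np.array([0.1, 0.6, 0.3])  # example Q with full support (assumed values)

    delta = 0.5 * np.abs(p - q).sum()
    kl = np.sum(p * np.log(p / q))
    alpha_q = q.min()                  # alpha_Q = min_x Q(x)

    # chained comparison checks delta^2 <= D_KL / 2 and D_KL / 2 <= delta^2 / alpha_Q
    print(delta**2 <= 0.5 * kl <= delta**2 / alpha_q)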