Kullback's inequality explained
In information theory and statistics, Kullback's inequality is a lower bound on the Kullback–Leibler divergence expressed in terms of the large deviations rate function.[1] If P and Q are probability distributions on the real line, such that P is absolutely continuous with respect to Q, i.e. P << Q, and whose first moments exist, then

D_{KL}(P \parallel Q) \ge \Psi_Q^*(\mu'_1(P)),

where \Psi_Q^* is the rate function, i.e. the convex conjugate of the cumulant-generating function, of Q, and \mu'_1(P) is the first moment of P.
The Cramér–Rao bound is a corollary of this result.
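The inequality can be checked numerically. The following sketch uses the pair P = Normal(1, variance 2) and Q = Normal(0, 1) as an arbitrary illustrative choice (not part of the source): it computes D_{KL}(P \parallel Q) by quadrature and the rate function \Psi_Q^*(\mu'_1(P)) by maximizing \mu\theta - \Psi_Q(\theta).

```python
import numpy as np
from scipy import integrate, optimize, stats

# Illustrative (assumed) distributions: P = N(1, var 2), Q = N(0, 1), so P << Q
# and both first moments exist.
P = stats.norm(loc=1.0, scale=np.sqrt(2.0))
Q = stats.norm(loc=0.0, scale=1.0)

def kl_divergence(p, q, lo=-20.0, hi=20.0):
    """D_KL(p || q) by numerical integration of p(x) log(p(x)/q(x))."""
    val, _ = integrate.quad(lambda x: p.pdf(x) * (p.logpdf(x) - q.logpdf(x)), lo, hi)
    return val

def cgf_Q(theta, lo=-20.0, hi=20.0):
    """Cumulant-generating function Psi_Q(theta) = log E_Q[exp(theta X)]."""
    val, _ = integrate.quad(lambda x: np.exp(theta * x) * Q.pdf(x), lo, hi)
    return np.log(val)

def rate_function(mu):
    """Convex conjugate Psi_Q*(mu) = sup_theta { mu*theta - Psi_Q(theta) }."""
    res = optimize.minimize_scalar(lambda t: cgf_Q(t) - mu * t,
                                   bounds=(-5.0, 5.0), method="bounded")
    return -res.fun

mu1_P = P.mean()             # first moment of P (= 1)
lhs = kl_divergence(P, Q)    # D_KL(P || Q) ~ 0.653
rhs = rate_function(mu1_P)   # Psi_Q*(1)    = 0.5 for Q = N(0, 1)
print(f"D_KL(P||Q) = {lhs:.4f} >= Psi_Q*(mu_1(P)) = {rhs:.4f}")
```

For this pair the bound is strict (about 0.653 ≥ 0.5); if P were also a unit-variance Gaussian, both sides would equal (\mu'_1(P))^2/2 and the inequality would hold with equality.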
Proof
Let P and Q be probability distributions (measures) on the real line, whose first moments exist, and such that P << Q. Consider the natural exponential family of Q given by

Q_\theta(A) = \frac{\int_A e^{\theta x}\,Q(dx)}{\int_{-\infty}^{\infty} e^{\theta x}\,Q(dx)} = \frac{1}{M_Q(\theta)} \int_A e^{\theta x}\,Q(dx)

for every measurable set A, where M_Q is the moment-generating function of Q. (Note that Q_0 = Q.) Then

D_{KL}(P \parallel Q) = D_{KL}(P \parallel Q_\theta) + \int_{\operatorname{supp} P} \left( \log \frac{dQ_\theta}{dQ} \right) dP.
By Gibbs' inequality we have D_{KL}(P \parallel Q_\theta) \ge 0, so that

D_{KL}(P \parallel Q) \ge \int_{\operatorname{supp} P} \left( \log \frac{dQ_\theta}{dQ} \right) dP = \int_{\operatorname{supp} P} \left( \log \frac{e^{\theta x}}{M_Q(\theta)} \right) P(dx).

Simplifying the right side, we have, for every real θ where M_Q(\theta) < \infty,

D_{KL}(P \parallel Q) \ge \mu'_1(P)\,\theta - \Psi_Q(\theta),

where \mu'_1(P) is the first moment, or mean, of P, and \Psi_Q = \log M_Q is called the cumulant-generating function. Taking the supremum over θ completes the process of convex conjugation and yields the rate function:

D_{KL}(P \parallel Q) \ge \sup_\theta \left\{ \mu'_1(P)\,\theta - \Psi_Q(\theta) \right\} = \Psi_Q^*(\mu'_1(P)).
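The decomposition used at the start of the proof can also be checked numerically. The sketch below reuses the illustrative pair P = Normal(1, variance 2), Q = Normal(0, 1) with an arbitrary tilt θ = 0.7 (none of these choices come from the source) and verifies both the identity D_{KL}(P \parallel Q) = D_{KL}(P \parallel Q_\theta) + \int \log(dQ_\theta/dQ)\,dP and the simplification of the second term to \theta\,\mu'_1(P) - \Psi_Q(\theta).

```python
import numpy as np
from scipy import integrate, stats

# Illustrative (assumed) choices: P = N(1, var 2), Q = N(0, 1), tilt theta = 0.7.
P = stats.norm(loc=1.0, scale=np.sqrt(2.0))
Q = stats.norm(loc=0.0, scale=1.0)
theta = 0.7

def quad(f, lo=-20.0, hi=20.0):
    val, _ = integrate.quad(f, lo, hi)
    return val

M_Q = quad(lambda x: np.exp(theta * x) * Q.pdf(x))   # moment-generating function at theta

def q_theta(x):
    # Density of the tilted measure Q_theta from the natural exponential family of Q.
    return np.exp(theta * x) * Q.pdf(x) / M_Q

kl_P_Q      = quad(lambda x: P.pdf(x) * (P.logpdf(x) - Q.logpdf(x)))
kl_P_Qtheta = quad(lambda x: P.pdf(x) * (P.logpdf(x) - np.log(q_theta(x))))
cross_term  = quad(lambda x: P.pdf(x) * np.log(q_theta(x) / Q.pdf(x)))

# Since log(dQ_theta/dQ)(x) = theta*x - log M_Q(theta), the cross term equals
# theta * mu'_1(P) - Psi_Q(theta).
print(kl_P_Q, kl_P_Qtheta + cross_term)              # both ~ 0.6534
print(cross_term, theta * P.mean() - np.log(M_Q))    # both ~ 0.455
```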
Corollary: the Cramér–Rao bound
See main article: Cramér–Rao bound.
Start with Kullback's inequality
Let X_\theta be a family of probability distributions on the real line indexed by the real parameter θ, and satisfying certain regularity conditions. Then

\lim_{h \to 0} \frac{D_{KL}(X_{\theta+h} \parallel X_\theta)}{h^2} \ge \lim_{h \to 0} \frac{\Psi_\theta^*(\mu_{\theta+h})}{h^2},

where \Psi_\theta^* is the convex conjugate of the cumulant-generating function of X_\theta, and \mu_{\theta+h} is the first moment of X_{\theta+h}.
Left side
The left side of this inequality can be simplified as follows:

\lim_{h \to 0} \frac{D_{KL}(X_{\theta+h} \parallel X_\theta)}{h^2} = -\lim_{h \to 0} \frac{1}{h^2} \int_{-\infty}^{\infty} \log\!\left( \frac{dX_\theta}{dX_{\theta+h}} \right) dX_{\theta+h} = \lim_{h \to 0} \frac{1}{2h^2} \int_{-\infty}^{\infty} \left( 1 - \frac{dX_\theta}{dX_{\theta+h}} \right)^{\!2} dX_{\theta+h} = \frac{1}{2} \mathcal I(\theta),

where the second equality uses the expansion \log(1-u) = -u - \tfrac{1}{2}u^2 - \cdots together with the fact that \int (1 - dX_\theta/dX_{\theta+h})\, dX_{\theta+h} = 0. The left side is therefore half the Fisher information \mathcal I(\theta) of the parameter θ.
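This limit can be checked numerically for a family where the Fisher information is known. The family X_\theta = Normal(θ, 1), for which \mathcal I(\theta) = 1, is an illustrative assumption, not from the source:

```python
import numpy as np
from scipy import integrate, stats

# Assumed illustrative family X_theta = N(theta, 1), with Fisher information I(theta) = 1.
theta, h = 0.3, 1e-2
X  = stats.norm(loc=theta, scale=1.0)
Xh = stats.norm(loc=theta + h, scale=1.0)

# D_KL(X_{theta+h} || X_theta) by quadrature; for Gaussians it equals h^2 / 2 exactly.
kl, _ = integrate.quad(lambda x: Xh.pdf(x) * (Xh.logpdf(x) - X.logpdf(x)), -20.0, 20.0)
print(kl / h**2)   # ~ 0.5 = I(theta)/2 for this family
```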
Right side
The right side of the inequality can be developed as follows:

\lim_{h \to 0} \frac{\Psi_\theta^*(\mu_{\theta+h})}{h^2} = \lim_{h \to 0} \frac{1}{h^2} \sup_t \left\{ \mu_{\theta+h}\,t - \Psi_\theta(t) \right\}.

This supremum is attained at a value of t = τ where the first derivative of the cumulant-generating function satisfies \Psi'_\theta(\tau) = \mu_{\theta+h}, but we have \Psi'_\theta(0) = \mu_\theta, so that

\Psi''_\theta(0) = \frac{d\mu_\theta}{d\theta} \lim_{h \to 0} \frac{h}{\tau}.

Moreover,

\lim_{h \to 0} \frac{\Psi_\theta^*(\mu_{\theta+h})}{h^2} = \frac{1}{2 \Psi''_\theta(0)} \left( \frac{d\mu_\theta}{d\theta} \right)^2 = \frac{1}{2 \operatorname{Var}(X_\theta)} \left( \frac{d\mu_\theta}{d\theta} \right)^2.
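A quick numerical sketch of this limit, using the family of exponential distributions with mean θ as an illustrative assumption (not a family used in the source); here \Psi_\theta(t) = -\log(1 - \theta t) for t < 1/θ, \operatorname{Var}(X_\theta) = \theta^2 and d\mu_\theta/d\theta = 1, so the limit should equal 1/(2\theta^2):

```python
import numpy as np
from scipy import optimize

# Assumed illustrative family: X_theta = Exponential with mean theta.
theta, h = 2.0, 1e-3

def cgf(t):
    # Cumulant-generating function Psi_theta(t) = -log(1 - theta*t), valid for t < 1/theta.
    return -np.log(1.0 - theta * t)

# Convex conjugate Psi_theta*(mu_{theta+h}) = sup_t { mu_{theta+h}*t - Psi_theta(t) },
# found by bounded maximization below the singularity at t = 1/theta.
mu_h = theta + h
res = optimize.minimize_scalar(lambda t: cgf(t) - mu_h * t,
                               bounds=(-10.0, 1.0 / theta - 1e-9),
                               method="bounded", options={"xatol": 1e-12})
conjugate = -res.fun

print(conjugate / h**2)        # ~ 0.125 as h -> 0
print(1.0 / (2.0 * theta**2))  # 0.125 = (d mu_theta / d theta)^2 / (2 Var(X_theta))
```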
Putting both sides back together
We have

\frac{1}{2} \mathcal I(\theta) \ge \frac{1}{2 \operatorname{Var}(X_\theta)} \left( \frac{d\mu_\theta}{d\theta} \right)^2,

which can be rearranged as

\operatorname{Var}(X_\theta) \ge \frac{(d\mu_\theta / d\theta)^2}{\mathcal I(\theta)},

which is precisely the Cramér–Rao bound.
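As a final numerical illustration of the rearranged bound, the logistic location family below is an arbitrary assumption (not from the source); for it \mu_\theta = \theta, so d\mu_\theta/d\theta = 1, and the inequality is strict:

```python
import numpy as np
from scipy import integrate, stats

# Assumed illustrative location family X_theta = Logistic(theta, 1).
theta, eps = 0.0, 1e-5
X = stats.logistic(loc=theta, scale=1.0)

def score(x):
    # d/dtheta of log f_theta(x), approximated by a central difference in theta.
    return (stats.logistic.logpdf(x, loc=theta + eps, scale=1.0)
            - stats.logistic.logpdf(x, loc=theta - eps, scale=1.0)) / (2.0 * eps)

# Fisher information I(theta) = E[score(X)^2]; for this family it equals 1/3.
fisher, _ = integrate.quad(lambda x: score(x)**2 * X.pdf(x), -40.0, 40.0)

print(X.var())        # pi^2 / 3 ~ 3.29
print(1.0 / fisher)   # (d mu_theta / d theta)^2 / I(theta) ~ 3.0, so the bound holds strictly
```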
Notes and references
- Fuchs, Aimé; Letta, Giorgio (1970). "L'inégalité de Kullback. Application à la théorie de l'estimation". Séminaire de Probabilités de Strasbourg. 4: 108–131.