Information projection explained

In information theory, the information projection or I-projection of a probability distribution q onto a set of distributions P is

p^* = \underset{p \in P}{\arg\min} \operatorname{D}_{\mathrm{KL}}(p \| q),

where D_KL is the Kullback–Leibler divergence from q to p. Viewing the Kullback–Leibler divergence as a measure of distance, the I-projection p* is the "closest" distribution to q among all the distributions in P.
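As an illustration, the following minimal Python sketch computes an I-projection on a finite alphabet. It assumes that P is the set of distributions with a prescribed mean of a function f, and it relies on the standard result that the minimizer for such a linear constraint is an exponential tilt of q, with p*(x) proportional to q(x) exp(λ f(x)). The names `kl`, `tilt`, `mean` and `i_projection`, the die example and the bisection bounds are all choices made for this sketch, not part of the definition.

```python
import math

def kl(p, q):
    """D_KL(p || q) for discrete distributions given as equal-length lists."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0.0:
            if qi == 0.0:
                return math.inf  # p puts mass where q has none
            total += pi * math.log(pi / qi)
    return total

def tilt(q, f, lam):
    """Tilted distribution with p(x) proportional to q(x) * exp(lam * f(x))."""
    weights = [qi * math.exp(lam * fi) for qi, fi in zip(q, f)]
    z = sum(weights)
    return [w / z for w in weights]

def mean(p, f):
    """Expectation of f under p."""
    return sum(pi * fi for pi, fi in zip(p, f))

def i_projection(q, f, c, lo=-20.0, hi=20.0, iters=100):
    """I-projection of q onto {p : E_p[f] = c}, by bisection on lam.

    Assumes c lies strictly between min(f) and max(f) on the support of q,
    and that the solution lam lies in [lo, hi] (illustrative bounds).
    The tilted mean is increasing in lam, so bisection applies.
    """
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mean(tilt(q, f, mid), f) < c:
            lo = mid
        else:
            hi = mid
    return tilt(q, f, 0.5 * (lo + hi))

# Example: project a fair die onto the set of distributions with mean 4.5.
faces = [1, 2, 3, 4, 5, 6]
q = [1 / 6] * 6
p_star = i_projection(q, faces, 4.5)

# Another member of P (its mean is also 4.5), used only for comparison.
other = [0.1, 0.1, 0.1, 0.1, 0.1, 0.5]

print("p* =", [round(x, 4) for x in p_star])
print("D_KL(p* || q)    =", kl(p_star, q))
print("D_KL(other || q) =", kl(other, q))
```

In this example `other` also satisfies the mean constraint, but its divergence from q is larger than that of `p_star`, consistent with `p_star` being the minimizer over P.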

The I-projection is useful in setting up information geometry, notably because of the following inequality, which holds for every p in P when P is convex:[1]

\operatorname{D}_{\mathrm{KL}}(p \| q) \geq \operatorname{D}_{\mathrm{KL}}(p \| p^*) + \operatorname{D}_{\mathrm{KL}}(p^* \| q).

This inequality can be interpreted as an information-geometric version of the Pythagorean theorem, in which KL divergence plays the role of squared Euclidean distance.
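The inequality can be checked numerically. The sketch below assumes a specific convex set, P = {p : p(A) ≥ c} for a subset A of a five-letter alphabet, chosen because its I-projection has a simple closed form when q(A) < c: q is rescaled separately on A and on its complement so that A receives mass exactly c. The set A, the threshold c, the distribution q and the sampling scheme are all illustrative assumptions.

```python
import math
import random

def kl(p, q):
    """D_KL(p || q) for discrete distributions given as equal-length lists."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0.0:
            if qi == 0.0:
                return math.inf
            total += pi * math.log(pi / qi)
    return total

# Illustrative setup: q gives A = {0, 1} mass 0.2, and P requires at least 0.5.
q = [0.1, 0.1, 0.3, 0.3, 0.2]
A = {0, 1}
c = 0.5
q_A = sum(q[i] for i in A)

# Closed-form I-projection for this set (valid because q_A < c):
# scale q by c / q_A on A and by (1 - c) / (1 - q_A) off A.
p_star = [
    q[i] * (c / q_A if i in A else (1 - c) / (1 - q_A))
    for i in range(len(q))
]

# Draw random distributions, keep those in P, and record the smallest
# value of D(p||q) - D(p||p*) - D(p*||q) that is observed.
random.seed(0)
worst_slack = math.inf
checked = 0
for _trial in range(2000):
    raw = [random.expovariate(1.0) for _symbol in q]
    total = sum(raw)
    p = [r / total for r in raw]
    if sum(p[i] for i in A) < c:
        continue  # p is not in P
    slack = kl(p, q) - kl(p, p_star) - kl(p_star, q)
    worst_slack = min(worst_slack, slack)
    checked += 1

print("distributions checked:", checked)
print("smallest slack:", worst_slack)
```

The smallest slack should be non-negative up to floating-point rounding. It approaches zero for sampled distributions whose mass on A is close to c, where the inequality becomes an equality.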

It is worth noting that since D_KL(p || q) ≥ 0 and is continuous in p, if P is closed and non-empty, then the optimization problem above has at least one minimizer. Furthermore, if P is convex, then the optimal distribution is unique.

The reverse I-projection, also known as the moment projection or M-projection, is

p^* = \underset{p \in P}{\arg\min} \operatorname{D}_{\mathrm{KL}}(q \| p).

Since the KL divergence is not symmetric in its arguments, the I-projection and the M-projection exhibit different behavior. For the I-projection, p(x) will typically under-estimate the support of q(x) and lock onto one of its modes. This is because p(x) = 0 is required wherever q(x) = 0 for the KL divergence to stay finite. For the M-projection, p(x) will typically over-estimate the support of q(x). This is because p(x) > 0 is required wherever q(x) > 0 for the KL divergence to stay finite.
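The contrast between the two projections can be seen in a small numerical experiment. The sketch below assumes a bimodal q (a 0.6/0.4 mixture of unit-width Gaussians centered at -3 and +3, discretized on a grid) and takes P to be a coarse grid of single discretized Gaussians, searched by brute force. The grid ranges, the mixture weights and the helper names are illustrative assumptions. Brute-force search stands in for a proper optimizer.

```python
import math

def kl(p, q):
    """D_KL(p || q) for discrete distributions given as equal-length lists."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0.0:
            if qi == 0.0:
                return math.inf
            total += pi * math.log(pi / qi)
    return total

def gaussian_on_grid(xs, mu, sigma):
    """Gaussian shape evaluated on the grid xs and normalized to sum to 1."""
    weights = [math.exp(-0.5 * ((x - mu) / sigma) ** 2) for x in xs]
    z = sum(weights)
    return [w / z for w in weights]

# Grid on [-10, 10] and a bimodal target q.
xs = [-10.0 + 0.1 * i for i in range(201)]
left = gaussian_on_grid(xs, -3.0, 1.0)
right = gaussian_on_grid(xs, 3.0, 1.0)
q = [0.6 * a + 0.4 * b for a, b in zip(left, right)]

# Candidate family P: single Gaussians over a coarse (mu, sigma) grid.
candidates = [
    (-5.0 + 0.25 * i, 0.5 + 0.25 * j)
    for i in range(41)   # mu from -5 to 5
    for j in range(15)   # sigma from 0.5 to 4
]

def project(objective):
    """Return the (mu, sigma) in the candidate grid minimizing the objective."""
    return min(candidates, key=lambda ms: objective(gaussian_on_grid(xs, *ms)))

mu_i, sigma_i = project(lambda p: kl(p, q))  # I-projection: min D_KL(p || q)
mu_m, sigma_m = project(lambda p: kl(q, p))  # M-projection: min D_KL(q || p)

print("I-projection (mu, sigma):", (mu_i, sigma_i))
print("M-projection (mu, sigma):", (mu_m, sigma_m))
```

Under these assumptions, the I-projection should settle on a narrow Gaussian near the heavier mode at -3, while the M-projection should pick a wide Gaussian centered between the modes that covers both of them.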

The reverse I-projection plays a fundamental role in the construction of optimal e-variables.

The concept of information projection can be extended to arbitrary f-divergences and other divergences.[2]

Notes and References

  1. Cover, Thomas M.; Thomas, Joy A. (2006). Elements of Information Theory (2nd ed.). Hoboken, New Jersey: Wiley Interscience. p. 367 (Theorem 11.6.1).
  2. Nielsen, Frank (2018). "What is... an information projection?". Notices of the American Mathematical Society. 65 (3): 321–324. doi:10.1090/noti1647.