In information theory, the information projection or I-projection of a probability distribution q onto a set of distributions P is
p^{*} = \underset{p \in P}{\arg\min}\ \operatorname{D}_{\mathrm{KL}}(p \| q)
where \operatorname{D}_{\mathrm{KL}}(p \| q) is the Kullback–Leibler divergence from p to q. Viewing the Kullback–Leibler divergence as a measure of distance, the I-projection p^{*} is the "closest" distribution to q among all the distributions in P.
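As a concrete illustration (a minimal sketch of my own, not taken from the source), the following Python snippet computes the I-projection of a uniform distribution q onto the convex set of distributions on {0, ..., 5} with a prescribed mean, using a generic constrained optimizer from SciPy; the names i_projection and target_mean are hypothetical choices for this example.

import numpy as np
from scipy.optimize import minimize

def kl(p, q):
    # D_KL(p || q) for discrete distributions, with 0 * log 0 treated as 0
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

def i_projection(q, x, target_mean):
    # Minimize D_KL(p || q) over the simplex subject to E_p[X] = target_mean
    constraints = [
        {"type": "eq", "fun": lambda p: p.sum() - 1.0},        # normalization
        {"type": "eq", "fun": lambda p: p @ x - target_mean},  # mean constraint
    ]
    bounds = [(0.0, 1.0)] * len(q)
    result = minimize(kl, x0=q.copy(), args=(q,), bounds=bounds,
                      constraints=constraints)
    return result.x

x = np.arange(6)              # support {0, ..., 5}
q = np.full(6, 1.0 / 6.0)     # uniform reference distribution
p_star = i_projection(q, x, target_mean=3.5)
print(p_star, kl(p_star, q))  # p* tilts mass toward larger x to meet the mean

For a linear constraint set such as this one, the minimizer is an exponential tilting of q, so the numerical solution above should agree with that closed form up to solver tolerance.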
The I-projection is useful in setting up information geometry, notably because of the following inequality, valid when P is convex:[1]
\operatorname{D}_{\mathrm{KL}}(p \| q) \geq \operatorname{D}_{\mathrm{KL}}(p \| p^{*}) + \operatorname{D}_{\mathrm{KL}}(p^{*} \| q) \quad \text{for all } p \in P
This inequality can be interpreted as an information-geometric analogue of the Pythagorean theorem, with the KL divergence playing the role of squared distance in a Euclidean space.
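As a quick numerical check (a sketch of my own, not part of the source), one can take P to be the segment of mixtures between two fixed distributions, which is a convex set, compute the I-projection by a one-dimensional search, and compare the two sides of the inequality for several members of P:

import numpy as np
from scipy.optimize import minimize_scalar

def kl(p, q):
    # D_KL(p || q) for discrete distributions, with 0 * log 0 treated as 0
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

rng = np.random.default_rng(0)
q = rng.dirichlet(np.ones(5))              # reference distribution
a, b = rng.dirichlet(np.ones(5), size=2)   # endpoints of the convex set P

mix = lambda t: (1.0 - t) * a + t * b      # members of P
res = minimize_scalar(lambda t: kl(mix(t), q), bounds=(0.0, 1.0),
                      method="bounded", options={"xatol": 1e-12})
p_star = mix(res.x)                        # I-projection of q onto P

for t in np.linspace(0.0, 1.0, 6):         # compare both sides for a few p in P
    p = mix(t)
    lhs = kl(p, q)
    rhs = kl(p, p_star) + kl(p_star, q)
    print(f"t={t:.1f}:  {lhs:.6f} >= {rhs:.6f}")

For mixture (linear) families such as this segment the inequality in fact holds with equality, so the two printed values essentially coincide; for general convex P the left-hand side can be strictly larger.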
It is worthwhile to note that since \operatorname{D}_{\mathrm{KL}}(p \| q) \geq 0 and is continuous in p, if P is closed and non-empty, then at least one minimizer of the optimization problem above exists. Furthermore, if P is convex, the minimizer is unique.
The reverse I-projection, also known as the moment projection or M-projection, is
p^{*} = \underset{p \in P}{\arg\min}\ \operatorname{D}_{\mathrm{KL}}(q \| p)
Since the KL divergence is not symmetric in its arguments, the I-projection and the M-projection exhibit different behavior. For the I-projection, p(x) typically under-estimates the support of q(x) and locks onto one of its modes; this is because p(x) must be 0 wherever q(x) = 0 in order to keep the KL divergence finite. For the M-projection, p(x) typically over-estimates the support of q(x); this is because p(x) must be positive wherever q(x) > 0 in order to keep the KL divergence finite.
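To make this difference concrete, here is a small numerical sketch (my own illustration, using an arbitrary grid and a hypothetical helper discretized_gaussian, not anything from the source) that fits a single Gaussian-shaped distribution to a bimodal target q by minimizing the KL divergence in each direction:

import numpy as np
from scipy.optimize import minimize

xs = np.linspace(-6.0, 6.0, 601)

def discretized_gaussian(mu, sigma):
    # Gaussian density evaluated on the grid xs, normalized to sum to 1
    w = np.exp(-0.5 * ((xs - mu) / sigma) ** 2)
    return w / w.sum()

def kl(p, q, eps=1e-300):
    # D_KL(p || q); the tiny eps keeps terms with q ~ 0 large but finite
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / np.maximum(q[mask], eps))))

# Bimodal target distribution: an even mixture of two well-separated bumps
q = 0.5 * discretized_gaussian(-3.0, 0.5) + 0.5 * discretized_gaussian(3.0, 0.5)

def project(direction):
    # 'I' minimizes D_KL(p || q); 'M' minimizes D_KL(q || p) over (mu, sigma)
    if direction == "I":
        obj = lambda th: kl(discretized_gaussian(*th), q)
    else:
        obj = lambda th: kl(q, discretized_gaussian(*th))
    res = minimize(obj, x0=np.array([1.0, 1.0]),
                   bounds=[(-6.0, 6.0), (0.1, 10.0)])
    return res.x

print("I-projection (mu, sigma):", project("I"))  # typically narrow, on one mode
print("M-projection (mu, sigma):", project("M"))  # typically broad, covering both

The I-projection typically returns a narrow fit concentrated on one of the two modes, whereas the M-projection returns a broad fit centered between them, roughly matching the mean and variance of q.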
The reverse I-projection plays a fundamental role in the construction of optimal e-variables.
The concept of information projection can be extended to arbitrary f-divergences and other divergences.[2]