In information theory, the conditional entropy quantifies the amount of information needed to describe the outcome of a random variable Y given that the value of another random variable X is known. The entropy of Y conditioned on X is written as Η(Y|X).
The conditional entropy of Y given X is defined as

Η(Y|X) = -\sum_{x\in\mathcal{X},\,y\in\mathcal{Y}} p(x,y)\log\frac{p(x,y)}{p(x)},

where \mathcal{X} and \mathcal{Y} denote the support sets of X and Y.
Note: Here, the convention is that the expression 0\log 0 should be treated as being equal to zero, because \lim_{\theta\to 0^{+}} \theta\log\theta = 0.
Intuitively, notice that by definition of expected value and of conditional probability, Η(Y|X) can be written as Η(Y|X) = E[f(X,Y)], where f is defined as

f(x,y) := -\log\left(\frac{p(x,y)}{p(x)}\right) = -\log(p(y|x)).

One can think of f as associating to each pair (x,y) the information content of the event (Y=y) given the event (X=x). This quantity is directly related to the amount of information needed to describe the event (Y=y) given (X=x). Hence, by computing the expected value of f over all pairs of values (x,y)\in\mathcal{X}\times\mathcal{Y}, the conditional entropy Η(Y|X) measures how much information, on average, the variable X encodes about Y.
Let Η(Y|X=x) be the entropy of the discrete random variable Y conditioned on the discrete random variable X taking a certain value x. Denote the support sets of X and Y by \mathcal{X} and \mathcal{Y}. Let Y have probability mass function p_Y(y). The unconditional entropy of Y is calculated as Η(Y) := E[\operatorname{I}(Y)], i.e.

Η(Y) = \sum_{y\in\mathcal{Y}} \Pr(Y=y)\,\operatorname{I}(y) = -\sum_{y\in\mathcal{Y}} p_Y(y)\log_2 p_Y(y),

where \operatorname{I}(y_i) is the information content of the outcome of Y taking the value y_i. The entropy of Y conditioned on X taking the value x is defined analogously by conditional expectation:

Η(Y|X=x) = -\sum_{y\in\mathcal{Y}} \Pr(Y=y|X=x)\log_2 \Pr(Y=y|X=x).
Note that Η(Y|X) is the result of averaging Η(Y|X=x) over all possible values x that X may take. Also, if the above sum is taken over a sample y_1,\dots,y_n, the expected value E_X[Η(y_1,\dots,y_n \mid X=x)] is known in some domains as equivocation.
Given discrete random variables X with image \mathcal{X} and Y with image \mathcal{Y}, the conditional entropy of Y given X is defined as the weighted sum of Η(Y|X=x) for each possible value of x, using p(x) as the weights:
\begin{align}
Η(Y|X) &\equiv \sum_{x\in\mathcal{X}} p(x)\,Η(Y|X=x)\\
&= -\sum_{x\in\mathcal{X}} p(x) \sum_{y\in\mathcal{Y}} p(y|x)\log_2 p(y|x)\\
&= -\sum_{x\in\mathcal{X},\,y\in\mathcal{Y}} p(x)\,p(y|x)\log_2 p(y|x)\\
&= -\sum_{x\in\mathcal{X},\,y\in\mathcal{Y}} p(x,y)\log_2 \frac{p(x,y)}{p(x)}.
\end{align}
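As a minimal numerical sketch (the joint distribution below is invented for illustration and is not part of the article), the weighted-sum form and the joint-probability form above give the same value, using the convention 0 log 0 = 0:

```python
# Compute Η(Y|X) for a small invented joint pmf in two equivalent ways, in bits.
import numpy as np

p_xy = np.array([[0.25, 0.25],   # rows index x, columns index y (made-up values)
                 [0.40, 0.10]])
p_x = p_xy.sum(axis=1)           # marginal p(x)

def h(q):
    """Shannon entropy in bits, using the convention 0*log(0) = 0."""
    q = q[q > 0]
    return -(q * np.log2(q)).sum()

# Weighted-sum form: sum_x p(x) * Η(Y|X=x)
h_weighted = sum(p_x[i] * h(p_xy[i] / p_x[i]) for i in range(len(p_x)))

# Joint form: -sum_{x,y} p(x,y) * log2(p(x,y)/p(x)); since p(x,y)/p(x) = p(y|x),
# this is also the expectation E[-log2 p(Y|X)] from the intuition above.
mask = p_xy > 0
h_joint = -(p_xy[mask] * np.log2((p_xy / p_x[:, None])[mask])).sum()

print(h_weighted, h_joint)       # both ≈ 0.861 bits
```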
Η(Y|X) = 0 if and only if the value of Y is completely determined by the value of X. Conversely, Η(Y|X) = Η(Y) if and only if Y and X are independent random variables.
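A minimal sketch of the two extreme cases, again with invented toy distributions:

```python
# Y fully determined by X gives Η(Y|X) = 0; independence gives Η(Y|X) = Η(Y).
import numpy as np

def h(q):
    q = q[q > 0]
    return -(q * np.log2(q)).sum()

def h_cond(p_xy):                               # Η(Y|X) as a weighted sum
    p_x = p_xy.sum(axis=1)
    return sum(p_x[i] * h(p_xy[i] / p_x[i]) for i in range(p_xy.shape[0]))

determined = np.array([[0.5, 0.0],              # Y = X with probability 1
                       [0.0, 0.5]])
independent = np.outer([0.5, 0.5], [0.3, 0.7])  # p(x,y) = p(x) p(y)

print(h_cond(determined))                             # 0.0
print(h_cond(independent), h(np.array([0.3, 0.7])))   # both ≈ 0.881 bits
```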
Assume that the combined system determined by two random variables X and Y has joint entropy Η(X,Y), that is, we need Η(X,Y) bits of information on average to describe its exact state. Now if we first learn the value of X, we have gained Η(X) bits of information. Once X is known, we only need Η(X,Y) - Η(X) bits to describe the state of the whole system. This quantity is exactly Η(Y|X), which gives the chain rule of conditional entropy:

Η(Y|X) = Η(X,Y) - Η(X).
The chain rule follows from the above definition of conditional entropy:
\begin{align}
Η(Y|X) &= \sum_{x\in\mathcal{X},\,y\in\mathcal{Y}} p(x,y)\log\left(\frac{p(x)}{p(x,y)}\right)\\[4pt]
&= \sum_{x\in\mathcal{X},\,y\in\mathcal{Y}} p(x,y)\bigl(\log(p(x)) - \log(p(x,y))\bigr)\\[4pt]
&= -\sum_{x\in\mathcal{X},\,y\in\mathcal{Y}} p(x,y)\log(p(x,y)) + \sum_{x\in\mathcal{X},\,y\in\mathcal{Y}} p(x,y)\log(p(x))\\[4pt]
&= Η(X,Y) + \sum_{x\in\mathcal{X}} p(x)\log(p(x))\\[4pt]
&= Η(X,Y) - Η(X).
\end{align}
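A minimal numerical check of the chain rule on an invented joint distribution:

```python
# Verify Η(Y|X) = Η(X,Y) - Η(X) numerically, in bits.
import numpy as np

p_xy = np.array([[0.25, 0.25],    # toy joint pmf, rows index x, columns index y
                 [0.40, 0.10]])

def h(q):
    q = q[q > 0]
    return -(q * np.log2(q)).sum()

p_x = p_xy.sum(axis=1)
h_y_given_x = sum(p_x[i] * h(p_xy[i] / p_x[i]) for i in range(2))  # Η(Y|X)
h_joint = h(p_xy)                                                  # Η(X,Y)
h_x = h(p_x)                                                       # Η(X)

print(np.isclose(h_y_given_x, h_joint - h_x))    # True
```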
In general, a chain rule for multiple random variables holds:
Η(X_1,X_2,\ldots,X_n) = \sum_{i=1}^{n} Η(X_i \mid X_1,\ldots,X_{i-1})
It has a similar form to the chain rule in probability theory, except that addition instead of multiplication is used.
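A minimal check of the general chain rule for three binary variables with a randomly drawn (invented) joint distribution:

```python
# Check Η(X1,X2,X3) = Η(X1) + Η(X2|X1) + Η(X3|X1,X2) numerically, in bits.
import numpy as np

rng = np.random.default_rng(0)
p = rng.random((2, 2, 2))
p /= p.sum()                                   # joint pmf p(x1, x2, x3)

def h(q):
    q = q[q > 0]
    return -(q * np.log2(q)).sum()

p1 = p.sum(axis=(1, 2))                        # p(x1)
p12 = p.sum(axis=2)                            # p(x1, x2)

# Each conditional entropy computed directly from its weighted-sum definition
h2_given_1 = sum(p1[a] * h(p12[a] / p1[a]) for a in range(2))
h3_given_12 = sum(p12[a, b] * h(p[a, b] / p12[a, b])
                  for a in range(2) for b in range(2))

print(np.isclose(h(p), h(p1) + h2_given_1 + h3_given_12))   # True
```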
Bayes' rule for conditional entropy states
Η(Y|X)=Η(X|Y)-Η(X)+Η(Y).
Proof. Η(Y|X) = Η(X,Y) - Η(X) and Η(X|Y) = Η(Y,X) - Η(Y). Symmetry of joint entropy entails Η(X,Y) = Η(Y,X). Subtracting the two equations implies Bayes' rule.
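A minimal numerical check of Bayes' rule for conditional entropy, with both conditional entropies computed directly from their definitions on an invented joint distribution:

```python
# Verify Η(Y|X) = Η(X|Y) - Η(X) + Η(Y) numerically, in bits.
import numpy as np

p_xy = np.array([[0.25, 0.25],    # toy joint pmf, rows index x, columns index y
                 [0.40, 0.10]])

def h(q):
    q = q[q > 0]
    return -(q * np.log2(q)).sum()

p_x, p_y = p_xy.sum(axis=1), p_xy.sum(axis=0)
h_x, h_y = h(p_x), h(p_y)
h_y_given_x = sum(p_x[i] * h(p_xy[i] / p_x[i]) for i in range(2))
h_x_given_y = sum(p_y[j] * h(p_xy[:, j] / p_y[j]) for j in range(2))

print(np.isclose(h_y_given_x, h_x_given_y - h_x + h_y))   # True
```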
If Y is conditionally independent of Z given X, we have:

Η(Y|X,Z) = Η(Y|X).
For any X and Y:

\begin{align}
Η(Y|X) &\le Η(Y)\\
Η(X,Y) &= Η(X|Y) + Η(Y|X) + \operatorname{I}(X;Y),\\
Η(X,Y) &= Η(X) + Η(Y) - \operatorname{I}(X;Y),\\
\operatorname{I}(X;Y) &\le Η(X),
\end{align}
where \operatorname{I}(X;Y) is the mutual information between X and Y.
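A minimal numerical check of these relations on an invented joint distribution:

```python
# Check the four relations above, with I(X;Y) computed as Η(Y) - Η(Y|X), in bits.
import numpy as np

p_xy = np.array([[0.25, 0.25],     # rows index x, columns index y
                 [0.40, 0.10]])

def h(q):
    q = q[q > 0]
    return -(q * np.log2(q)).sum()

p_x, p_y = p_xy.sum(axis=1), p_xy.sum(axis=0)
h_x, h_y, h_xy = h(p_x), h(p_y), h(p_xy)
h_y_given_x = sum(p_x[i] * h(p_xy[i] / p_x[i]) for i in range(2))
h_x_given_y = sum(p_y[j] * h(p_xy[:, j] / p_y[j]) for j in range(2))
i_xy = h_y - h_y_given_x

print(h_y_given_x <= h_y)                                   # True
print(np.isclose(h_xy, h_x_given_y + h_y_given_x + i_xy))   # True
print(np.isclose(h_xy, h_x + h_y - i_xy))                   # True
print(i_xy <= h_x)                                          # True
```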
For independent X and Y:

Η(Y|X) = Η(Y) and Η(X|Y) = Η(X).
Although the specific conditional entropy Η(X|Y=y) can be either less or greater than Η(X) for a given random variate y of Y, Η(X|Y) can never exceed Η(X).
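A minimal sketch of this distinction, using an invented joint distribution in which one outcome of Y is rare and uninformative:

```python
# The rare outcome y = 1 makes X maximally uncertain, so Η(X|Y=1) > Η(X),
# yet the average Η(X|Y) still does not exceed Η(X). Entropies in bits.
import numpy as np

p_xy = np.array([[0.85, 0.05],     # rows index x, columns index y
                 [0.05, 0.05]])

def h(q):
    q = q[q > 0]
    return -(q * np.log2(q)).sum()

p_x, p_y = p_xy.sum(axis=1), p_xy.sum(axis=0)
h_x = h(p_x)                                               # ≈ 0.469 bits
h_x_given_y1 = h(p_xy[:, 1] / p_y[1])                      # = 1.0 bit > Η(X)
h_x_given_y = sum(p_y[j] * h(p_xy[:, j] / p_y[j]) for j in range(2))

print(h_x_given_y1 > h_x)   # True: this particular observation adds uncertainty
print(h_x_given_y <= h_x)   # True: on average, conditioning never increases entropy
```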
The above definition is for discrete random variables. The continuous version of discrete conditional entropy is called conditional differential (or continuous) entropy. Let X and Y be continuous random variables with a joint probability density function f(x,y). The differential conditional entropy h(X|Y) is defined as

h(X|Y) = -\int_{\mathcal{X},\mathcal{Y}} f(x,y)\log f(x|y)\,dx\,dy.
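As a sketch of this definition (the standard bivariate Gaussian below is an assumed example, not one given in the article), h(X|Y) can be estimated by Monte Carlo as -E[log f(X|Y)] and compared with the known Gaussian closed form ½ log(2πe(1-ρ²)), in nats:

```python
# Monte Carlo sketch for an assumed standard bivariate Gaussian with correlation rho.
import numpy as np

rng = np.random.default_rng(0)
rho, n = 0.8, 1_000_000
cov = np.array([[1.0, rho],
                [rho, 1.0]])
x, y = rng.multivariate_normal([0.0, 0.0], cov, size=n).T

# For this model, X | Y = y is N(rho * y, 1 - rho**2)
cond_var = 1.0 - rho**2
log_f_x_given_y = (-0.5 * np.log(2 * np.pi * cond_var)
                   - (x - rho * y)**2 / (2 * cond_var))

h_mc = -log_f_x_given_y.mean()                       # Monte Carlo estimate, in nats
h_exact = 0.5 * np.log(2 * np.pi * np.e * cond_var)  # closed form
print(h_mc, h_exact)                                 # both ≈ 0.908 nats
```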
In contrast to the conditional entropy for discrete random variables, the conditional differential entropy may be negative.
As in the discrete case there is a chain rule for differential entropy:
h(Y|X)=h(X,Y)-h(X)
Conditional differential entropy is also used in the definition of the mutual information between continuous random variables:
\operatorname{I}(X,Y)=h(X)-h(X|Y)=h(Y)-h(Y|X)
h(X|Y) \le h(X), with equality if and only if X and Y are independent.
The conditional differential entropy yields a lower bound on the expected squared error of an estimator. For any random variable X, observation Y and estimator \widehat{X} the following holds:

E\left[\left(X - \widehat{X}(Y)\right)^2\right] \ge \frac{1}{2\pi e} e^{2h(X|Y)}.
This is related to the uncertainty principle from quantum mechanics.
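In the jointly Gaussian case the bound is attained with equality by the conditional-mean estimator; a minimal sketch under that assumption (values invented for illustration):

```python
# For an assumed standard bivariate Gaussian with correlation rho, the estimator
# X_hat(Y) = rho*Y has error E[(X - X_hat)^2] = 1 - rho**2, which equals the
# lower bound exp(2*h(X|Y)) / (2*pi*e).
import numpy as np

rho = 0.8
mse = 1.0 - rho**2                                    # error of X_hat(Y) = rho*Y
h_x_given_y = 0.5 * np.log(2 * np.pi * np.e * mse)    # h(X|Y) in nats
bound = np.exp(2 * h_x_given_y) / (2 * np.pi * np.e)

print(np.isclose(mse, bound))                         # True: the bound is tight here
```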
In quantum information theory, the conditional entropy is generalized to the conditional quantum entropy. The latter can take negative values, unlike its classical counterpart.