In information theory, the binary entropy function, denoted \operatorname{H}(p) or \operatorname{H}_b(p), is defined as the entropy of a Bernoulli process with probability p of one of two values, and is given by the formula:

\operatorname{H}(p) = -p \log p - (1 - p) \log(1 - p).
The base of the logarithm corresponds to the choice of units of information; base e corresponds to nats and is mathematically convenient, while base 2 (binary logarithm) corresponds to shannons and is conventional; explicitly:
\operatorname{H}(p) = -p \log_2 p - (1 - p) \log_2(1 - p).
Note that the values at 0 and 1 are given by the limit

0 \log 0 := \lim_{x \to 0^+} x \log x = 0.
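The definition above translates directly into code. The following is a minimal Python sketch (the name binary_entropy is our own); the endpoint cases implement the 0 log 0 := 0 convention explicitly:

```python
import math

def binary_entropy(p: float) -> float:
    """Binary entropy H_b(p) in shannons (log base 2), with 0 log 0 := 0."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    if p in (0.0, 1.0):
        return 0.0  # by the limit x log x -> 0 as x -> 0+
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)
```

For example, binary_entropy(0.5) evaluates to exactly 1.0 shannon, the maximum of the function.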
When p = 1/2, the binary entropy function attains its maximum value, 1 shannon (1 binary unit of information); this is the case of an unbiased coin flip. When p = 0 or p = 1, the binary entropy is 0 (in any units), as there is no uncertainty.
Binary entropy \operatorname{H}_b(p) is a special case of \operatorname{H}(X), the entropy function: the former takes a single real number as a parameter, whereas the latter takes a distribution or random variable as a parameter. The binary entropy of p is the entropy of the Bernoulli distribution \operatorname{Ber}(p), so that \operatorname{H}_b(p) = \operatorname{H}(\operatorname{Ber}(p)).
Writing the probability of each of the two values as p and q, so that p + q = 1 and q = 1 - p, the entropy of a Bernoulli-distributed random variable X is

\operatorname{H}(X) = -p \log p - (1 - p) \log(1 - p) = -p \log p - q \log q = -\sum_x \operatorname{Pr}(X = x) \cdot \log \operatorname{Pr}(X = x) = \operatorname{H}(\operatorname{Ber}(p)).
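The chain of equalities above can be checked numerically: applying the general entropy formula to the two-point distribution {p, q} reproduces the closed form. A small Python sketch (shannon_entropy is an illustrative helper name):

```python
import math

def shannon_entropy(dist):
    """General entropy H(X) = -sum_x Pr(X=x) log2 Pr(X=x) over the support."""
    return -sum(q * math.log2(q) for q in dist if q > 0.0)

p = 0.3
q = 1.0 - p
# Entropy of Ber(p) via the general formula equals the closed form H_b(p).
closed_form = -p * math.log2(p) - q * math.log2(q)
assert math.isclose(shannon_entropy([p, q]), closed_form)
```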
Sometimes the binary entropy function is also written as \operatorname{H}_2(p) or \operatorname{H}_2(X).
In terms of information theory, entropy is considered to be a measure of the uncertainty in a message. To put it intuitively, suppose p = 0. At this probability, the event is certain never to occur, so there is no uncertainty at all, and the entropy is 0. If p = 1, the result is again certain, so the entropy is 0 here as well. When p = 1/2, the uncertainty is maximal: a fair coin flip yields one full shannon of information per toss. When p = 1/4, the outcome is less uncertain than a fair coin flip, and the entropy is correspondingly less than 1 shannon.
The derivative of the binary entropy function may be expressed as the negative of the logit function:

\frac{d}{dp} \operatorname{H}_b(p) = -\operatorname{logit}_2(p) = -\log_2\left(\frac{p}{1-p}\right).
The second derivative is

\frac{d^2}{dp^2} \operatorname{H}_b(p) = -\frac{1}{p(1-p)\ln 2}.
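Both derivative formulas can be verified against central finite differences; a minimal Python sketch (the step size h and test point p = 0.3 are arbitrary choices):

```python
import math

def hb(p):
    """Binary entropy in shannons."""
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

p, h = 0.3, 1e-5
# First derivative via central difference, compared with -logit_2(p).
d1 = (hb(p + h) - hb(p - h)) / (2 * h)
assert math.isclose(d1, -math.log2(p / (1 - p)), rel_tol=1e-6)
# Second derivative via central difference, compared with -1/(p(1-p) ln 2).
d2 = (hb(p + h) - 2 * hb(p) + hb(p - h)) / (h * h)
assert math.isclose(d2, -1 / (p * (1 - p) * math.log(2)), rel_tol=1e-3)
```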
The convex conjugate (specifically, the Legendre transform) of the binary entropy (with base e) is the negative softplus function. This is because (following the definition of the Legendre transform: the derivatives are inverse functions) the derivative of negative binary entropy is the logit, whose inverse function is the logistic function, which is the derivative of softplus.
Softplus can be interpreted as logistic loss, so by duality, minimizing logistic loss corresponds to maximizing entropy. This justifies the principle of maximum entropy as loss minimization.
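The conjugacy claim can be checked numerically by brute force: approximate the Legendre transform f*(x) = sup_p [xp - f(p)] of f(p) = -H_b(p) (in nats) on a grid and compare with softplus. A Python sketch (the grid resolution n is an arbitrary choice):

```python
import math

def f(p):
    """Negative binary entropy in nats: f(p) = p ln p + (1-p) ln(1-p)."""
    if p in (0.0, 1.0):
        return 0.0
    return p * math.log(p) + (1 - p) * math.log(1 - p)

def conjugate(x, n=20000):
    """Grid approximation of the Legendre transform f*(x) = sup_p [x p - f(p)]."""
    return max(x * (k / n) - f(k / n) for k in range(n + 1))

# The conjugate should match softplus(x) = ln(1 + e^x) at every x.
for x in (-2.0, 0.0, 1.5):
    assert math.isclose(conjugate(x), math.log1p(math.exp(x)), abs_tol=1e-6)
```

At x = 0 this reduces to the familiar fact that the maximum of the binary entropy in nats is ln 2.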
The Taylor series of the binary entropy function at 1/2 is
\operatorname{H}_b(p) = 1 - \frac{1}{2 \ln 2} \sum_{n=1}^{\infty} \frac{(1-2p)^{2n}}{n(2n-1)},

which converges for 0 \le p \le 1.
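Partial sums of this series can be compared against the closed form; a Python sketch (the number of terms is an arbitrary truncation, and convergence is fastest near p = 1/2 where |1 - 2p| is small):

```python
import math

def hb(p):
    """Binary entropy in shannons, with H_b(0) = H_b(1) = 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def hb_series(p, terms=200):
    """Truncated Taylor expansion of H_b(p) around p = 1/2."""
    s = sum((1 - 2 * p) ** (2 * n) / (n * (2 * n - 1))
            for n in range(1, terms + 1))
    return 1 - s / (2 * math.log(2))

for p in (0.5, 0.3, 0.1):
    assert math.isclose(hb(p), hb_series(p), abs_tol=1e-9)
```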
The following bounds hold for 0 < p < 1:

\ln(2) \cdot \log_2(p) \cdot \log_2(1-p) \le \operatorname{H}_b(p) \le \log_2(p) \cdot \log_2(1-p)

and

4p(1-p) \le \operatorname{H}_b(p) \le (4p(1-p))^{1/\ln 4}.
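Both pairs of bounds can be verified on a grid over (0, 1); a Python sketch (the grid step of 0.001 is an arbitrary choice, and both upper bounds become equalities at p = 1/2):

```python
import math

def hb(p):
    """Binary entropy in shannons."""
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

for k in range(1, 1000):
    p = k / 1000
    h = hb(p)
    # ln(2) * log2(p) * log2(1-p) <= H_b(p) <= log2(p) * log2(1-p)
    prod = math.log2(p) * math.log2(1 - p)
    assert math.log(2) * prod <= h <= prod
    # 4p(1-p) <= H_b(p) <= (4p(1-p))^(1/ln 4)
    base = 4 * p * (1 - p)
    assert base <= h <= base ** (1 / math.log(4))
```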