Limiting density of discrete points
In information theory, the limiting density of discrete points is an adjustment to the formula of Claude Shannon for differential entropy.
It was formulated by Edwin Thompson Jaynes to address defects in the initial definition of differential entropy.
Definition
Shannon originally wrote down the following formula for the entropy of a continuous distribution, known as differential entropy:

h(X) = -\int p(x)\log p(x)\,dx.
Unlike Shannon's formula for the discrete entropy, however, this is not the result of any derivation (Shannon simply replaced the summation symbol in the discrete version with an integral), and it lacks many of the properties that make the discrete entropy a useful measure of uncertainty. In particular, it is not invariant under a
change of variables and can become negative. In addition, it is not even dimensionally correct. Since p(x) dx would be dimensionless, p(x) must have units of 1/dx, which means that the argument of the logarithm is not dimensionless as required.
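A concrete illustration of these defects (a standard example, not part of the original text): for a uniform density on an interval of length L,

h(X) = -\int_0^L \frac{1}{L}\log\frac{1}{L}\,dx = \log(L),

which is negative whenever L < 1, and under the rescaling y = cx it becomes \log(cL), so the value depends on the units in which x is measured.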
Jaynes argued that the formula for the continuous entropy should be derived by taking the limit of increasingly dense discrete distributions.[1] [2] Suppose that we have a set of N discrete points {x_i}, such that in the limit N → ∞ their density approaches a function m(x) called the "invariant measure":
\lim_{N \to \infty} \frac{1}{N}\,(\text{number of points in } a < x < b) = \int_a^b m(x)\,dx.
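For example (an illustration, not taken from the source), if the N points are evenly spaced on the interval (0, 1), the fraction of them lying in a < x < b tends to b - a, so the invariant measure is simply the uniform density:

\int_a^b m(x)\,dx = b - a \quad\Longrightarrow\quad m(x) = 1 \text{ for } 0 < x < 1.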
Jaynes derived from this the following formula for the continuous entropy, which he argued should be taken as the correct formula:
\lim_{N \to \infty} H_N(X) = \log(N) - \int p(x)\log\frac{p(x)}{m(x)}\,dx.
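In outline, the limiting argument runs as follows (a sketch, not a verbatim reproduction of Jaynes' derivation): each discrete point x_i carries probability mass roughly equal to p(x_i) times the local spacing, and by the definition of m the spacing near x_i is approximately 1/(N m(x_i)), so

p_i \approx \frac{p(x_i)}{N\,m(x_i)}, \qquad H_N(X) = -\sum_i p_i \log p_i \approx -\int p(x)\log\frac{p(x)}{N\,m(x)}\,dx = \log(N) - \int p(x)\log\frac{p(x)}{m(x)}\,dx.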
Typically, when this is written, the term log(N) is omitted, as it is not finite in the limit. So the actual common definition is

H(X) = -\int p(x)\log\frac{p(x)}{m(x)}\,dx.

Where it is unclear whether or not the log(N) term should be omitted, one could write

H_N(X) \approx \log(N) - \int p(x)\log\frac{p(x)}{m(x)}\,dx.
Notice that in Jaynes' formula, m(x) is a probability density. For any finite N, m(x) corresponds to a uniform density over the quantization of the continuous space that is used in the Riemann sum. In the limit, m(x) is the continuous limiting density of points in the quantization used to represent the continuous variable x.
Suppose one had a number format that took on N possible values, distributed as per m(x). Then H_N(X) (if N is large enough that the continuous approximation is valid) is the discrete entropy of the variable X in this encoding. This is equal to the average number of bits required to transmit this information, and is no more than log(N). Therefore, log(N) - H_N(X) may be thought of as the amount of information gained by knowing that the variable X follows the distribution p(x) rather than being uniformly distributed over the possible quantized values, as it would be if it followed m(x). Equivalently, H_N(X) - log(N) is the (negative) Kullback–Leibler divergence from m(x) to p(x), which can be thought of as the information gained by learning that a variable previously believed to be distributed as m(x) is actually distributed as p(x).
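The relationship H_N(X) ≈ log(N) - D_KL(p || m) can be checked numerically. The following Python sketch is illustrative only: the particular densities p and m, the interval, and the variable names are choices made here, not part of the source. It places N quantization points with density m(x), weights each point in proportion to p(x)/m(x), and compares the resulting discrete entropy with log(N) minus the Kullback–Leibler divergence.

import numpy as np

# Illustrative check of H_N(X) ≈ log(N) - D_KL(p || m).
# Assumed densities on (0, 1), chosen only for this sketch:
#   m(x) = 2x        (density of the quantization points, the "invariant measure")
#   p(x) = 6x(1 - x) (the distribution the variable actually follows)

N = 100_000  # number of quantization points

# Place N points with density m(x) = 2x via the inverse CDF, M^{-1}(u) = sqrt(u).
u = (np.arange(N) + 0.5) / N
x = np.sqrt(u)

m = 2.0 * x              # m(x) at the quantization points
p = 6.0 * x * (1.0 - x)  # p(x) at the quantization points

# Discrete probability of each point: p(x_i) times the local spacing 1/(N m(x_i)),
# renormalized so the weights sum to one.
w = p / (N * m)
w /= w.sum()

H_N = -np.sum(w * np.log(w))      # discrete entropy of the quantized variable
D_kl = np.sum(w * np.log(p / m))  # ≈ ∫ p(x) log(p(x)/m(x)) dx

print(H_N)               # discrete entropy
print(np.log(N) - D_kl)  # agrees with H_N up to a small discretization error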
Jaynes' continuous entropy formula has the property of being invariant under a change of variables, provided that m(x) and p(x) are transformed in the same way. (This motivates the name "invariant measure" for m.) This solves many of the difficulties that come from applying Shannon's continuous entropy formula. Jaynes himself dropped the log(N) term, as it was not relevant to his work (maximum entropy distributions), and it is somewhat awkward to have an infinite term in the calculation. Unfortunately, this cannot be helped if the quantization is made arbitrarily fine, as would be the case in the continuous limit. Note that H(X) as defined here (without the log(N) term) would always be non-positive, because a KL divergence is always non-negative.
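The invariance is straightforward to verify (a standard change-of-variables check, spelled out here for convenience): under an invertible transformation y = f(x), both densities pick up the same Jacobian factor, so the ratio inside the logarithm is unchanged:

p_Y(y) = \frac{p(x)}{|f'(x)|}, \qquad m_Y(y) = \frac{m(x)}{|f'(x)|}, \qquad \int p_Y(y)\log\frac{p_Y(y)}{m_Y(y)}\,dy = \int p(x)\log\frac{p(x)}{m(x)}\,dx.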
If it is the case that m(x) is constant over some interval of size r, and p(x) is essentially zero outside that interval, then the limiting density of discrete points (LDDP) is closely related to the differential entropy h(X):

H_N(X) \approx \log(N) - \log(r) + h(X).
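To see this (a short verification under the stated assumptions, with the constant taken to be 1/r, i.e. m essentially confined to the interval):

-\int p(x)\log\frac{p(x)}{m(x)}\,dx = -\int p(x)\log p(x)\,dx - \log(r) = h(X) - \log(r),

and substituting into Jaynes' formula gives the stated relation.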
Further reading
- Jaynes, E. T. (2003). Probability Theory: The Logic of Science. Cambridge University Press. ISBN 978-0521592710.
Notes and References
- Jaynes, E. T. (1963). "Information Theory and Statistical Mechanics". In K. Ford (ed.), Statistical Physics. New York: Benjamin. p. 181.
- Jaynes, E. T. (1968). "Prior Probabilities". IEEE Transactions on Systems Science and Cybernetics. SSC-4 (3): 227–241. doi:10.1109/TSSC.1968.300117.