Entropy rate explained

In the mathematical theory of probability, the entropy rate or source information rate is a function assigning an entropy to a stochastic process.

For a strongly stationary process, the conditional entropy of the latest random variable, given all the preceding ones, tends toward this rate value as the number of variables grows.

Definition

A process X with a countable index gives rise to the sequence of its joint entropies H_n(X_1, X_2, \dots, X_n). If the limit exists, the entropy rate is defined as

H(X) := \lim_{n\to\infty} \tfrac{1}{n} H_n.
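As a rough numerical illustration of this definition, the sketch below computes H_n exactly, in bits, by enumerating every length-n state sequence of a small stationary Markov chain, and prints H_n / n. The transition matrix P, its stationary distribution MU and the function name joint_entropy are hypothetical choices made for illustration. Enumeration grows as (number of states)^n, so this only works for tiny examples.

```python
import itertools
import math

# Hypothetical two-state Markov chain, chosen only for illustration.
P = [[0.9, 0.1],
     [0.4, 0.6]]
# Stationary distribution of P: solves mu = mu P (0.8 * 0.1 == 0.2 * 0.4).
MU = [0.8, 0.2]


def joint_entropy(n):
    """H_n = H(X_1, ..., X_n) in bits, for the chain started in MU."""
    h = 0.0
    # Enumerate every state sequence of length n and accumulate -p log2 p.
    for seq in itertools.product(range(len(MU)), repeat=n):
        p = MU[seq[0]]
        for a, b in zip(seq, seq[1:]):
            p *= P[a][b]
        if p > 0:
            h -= p * math.log2(p)
    return h


for n in range(1, 11):
    print(n, joint_entropy(n) / n)
```

For a stationary Markov chain H_n = H(X_1) + (n - 1)(H_2 - H_1), so for this example the printed ratios decrease toward the limit at a rate of order 1/n.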

Note that, given any sequence (a_n)_n with a_0 = 0 and letting \Delta a_k := a_k - a_{k-1}, telescoping gives

a_n = \sum_{k=1}^{n} \Delta a_k.

Taking a_n = H_n (with H_0 = 0), the entropy rate therefore computes the mean of the first n such entropy changes, with n going to infinity. The behaviour of joint entropies from one index to the next is also the explicit subject of some characterizations of entropy.
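Applied to the joint entropies, with a_n = H_n and H_0 := 0 for the empty collection, the telescoping identity can be written out as follows. The last equality is the chain rule for entropy, a standard fact added here for context, which identifies each entropy change with a conditional entropy.

```latex
\frac{H_n}{n} = \frac{1}{n}\sum_{k=1}^{n} \Delta H_k,
\qquad
\Delta H_k = H_k - H_{k-1} = H(X_k \mid X_{k-1}, \dots, X_1),
\qquad
H_0 := 0.
```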

Discussion

While X may be understood as a sequence of random variables, the entropy rate H(X) represents the average entropy change per random variable in the long term.

It can be thought of as a general property of stochastic sources; this is the subject of the asymptotic equipartition property.

For strongly stationary processes

A stochastic process also gives rise to a sequence of conditional entropies, in which each new variable is conditioned on more and more of its predecessors. For strongly stationary stochastic processes, the entropy rate equals the limit of that sequence:

H(X) = \lim_{n\to\infty} H(X_n \mid X_{n-1}, X_{n-2}, \dots, X_1)

The limit on the right is also denoted H'(X). The notation is motivated by the fact that this limit is again a rate associated with the process, in the sense above.
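A sketch of the standard argument for why the two limits coincide, for discrete-valued processes (as found, for example, in Cover and Thomas): conditioning cannot increase entropy, and strong stationarity lets the conditioning window be shifted by one index.

```latex
% Conditioning on fewer variables cannot decrease entropy; stationarity shifts indices by one:
H(X_{n+1} \mid X_n, \dots, X_1) \le H(X_{n+1} \mid X_n, \dots, X_2) = H(X_n \mid X_{n-1}, \dots, X_1)
% The conditional entropies are therefore non-increasing and bounded below by 0,
% so they converge to H'(X). By the chain rule,
\frac{1}{n} H_n = \frac{1}{n} \sum_{k=1}^{n} H(X_k \mid X_{k-1}, \dots, X_1) \longrightarrow H'(X),
% since the Cesaro means of a convergent sequence converge to the same limit.
```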

For Markov chains

Since a stochastic process defined by a Markov chain that is irreducible, aperiodic and positive recurrent has a stationary distribution, the entropy rate is independent of the initial distribution.

Given the chain's transition matrix P_{ij} and an entropy

h_i := -\sum_j P_{ij} \log P_{ij}

associated with each state i, one finds

H(X) = \sum_i \mu_i h_i,

where \mu_i is the asymptotic (stationary) distribution of the chain.
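A minimal sketch of this formula for a finite state space, assuming an irreducible, aperiodic chain so that power iteration from the uniform distribution converges to \mu. Logarithms are taken base 2, giving bits per step; the function names and the example matrix are hypothetical.

```python
import math


def stationary_distribution(P, max_iter=10_000, tol=1e-13):
    """Approximate mu with mu = mu P by power iteration from the uniform distribution.

    Assumes P is irreducible and aperiodic, so the iteration converges."""
    n = len(P)
    mu = [1.0 / n] * n
    for _ in range(max_iter):
        nxt = [sum(mu[i] * P[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(nxt, mu)) < tol:
            return nxt
        mu = nxt
    return mu


def row_entropy(row):
    """h_i = -sum_j P_ij log2 P_ij, treating 0 log 0 as 0."""
    return -sum(p * math.log2(p) for p in row if p > 0)


def markov_entropy_rate(P):
    """H(X) = sum_i mu_i h_i, in bits per step."""
    mu = stationary_distribution(P)
    return sum(m * row_entropy(row) for m, row in zip(mu, P))


# Hypothetical example chain.
P = [[0.9, 0.1],
     [0.4, 0.6]]
print(markov_entropy_rate(P))
```

Power iteration only keeps the sketch dependency-free; for larger chains, solving the linear system \mu = \mu P with \sum_i \mu_i = 1 directly is the more usual choice.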

In particular, it follows that the entropy rate of an i.i.d. stochastic process is the same as the entropy of any individual member in the process.
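For a countable alphabet, this follows from the formula above by viewing an i.i.d. process with marginal distribution p as a Markov chain whose transition matrix has identical rows, P_{ij} = p_j:

```latex
h_i = -\sum_j p_j \log p_j = H(X_1) \quad \text{for every } i,
\qquad
H(X) = \sum_i \mu_i h_i = H(X_1) \sum_i \mu_i = H(X_1).
```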

For hidden Markov models

The entropy rate of hidden Markov models (HMMs) has no known closed-form solution. However, it has known upper and lower bounds. Let the underlying Markov chain X_{1:\infty} be stationary, and let Y_{1:\infty} be the observable states. Then

H(Y_n \mid X_1, Y_{1:n-1}) \leq H(Y) \leq H(Y_n \mid Y_{1:n-1}),

and in the limit n \to \infty both bounds converge to the middle term.^[1]
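The bounds can be checked numerically for a tiny HMM by brute force. The sketch below, with a hypothetical hidden transition matrix P, stationary distribution MU and emission matrix B, computes both conditional entropies as differences of block entropies, H(Y_n | Y_{1:n-1}) = H(Y_{1:n}) - H(Y_{1:n-1}) and H(Y_n | X_1, Y_{1:n-1}) = H(X_1, Y_{1:n}) - H(X_1, Y_{1:n-1}), using the forward recursion for the joint probabilities. The cost grows exponentially in n, so it is only meant for small n.

```python
import itertools
import math

# Hypothetical stationary HMM, chosen only for illustration.
P = [[0.9, 0.1],
     [0.2, 0.8]]      # hidden transitions: P[i][j] = Pr(X_{k+1} = j | X_k = i)
MU = [2 / 3, 1 / 3]   # stationary distribution of P (2/3 * 0.1 == 1/3 * 0.2)
B = [[0.8, 0.2],
     [0.3, 0.7]]      # emissions: B[x][y] = Pr(Y_k = y | X_k = x)
STATES = range(len(MU))
SYMBOLS = range(len(B[0]))


def prob_x1_and_ys(x1, ys):
    """Pr(X_1 = x1, Y_1..Y_n = ys), via the forward recursion started in state x1."""
    if not ys:
        return MU[x1]
    alpha = [MU[x1] * B[x1][ys[0]] if x == x1 else 0.0 for x in STATES]
    for y in ys[1:]:
        alpha = [sum(alpha[i] * P[i][j] for i in STATES) * B[j][y] for j in STATES]
    return sum(alpha)


def entropy(probs):
    """Shannon entropy in bits of a collection of probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def block_entropies(n):
    """Return (H(Y_1..Y_n), H(X_1, Y_1..Y_n)) in bits, by brute-force enumeration."""
    joint = {(x1, ys): prob_x1_and_ys(x1, ys)
             for ys in itertools.product(SYMBOLS, repeat=n) for x1 in STATES}
    marginal = {}
    for (_, ys), p in joint.items():
        marginal[ys] = marginal.get(ys, 0.0) + p
    return entropy(marginal.values()), entropy(joint.values())


def hmm_bounds(n):
    """(lower, upper) = (H(Y_n | X_1, Y_1..Y_{n-1}), H(Y_n | Y_1..Y_{n-1})) for n >= 1."""
    hy, hxy = block_entropies(n)
    hy_prev, hxy_prev = block_entropies(n - 1)
    return hxy - hxy_prev, hy - hy_prev


for n in range(1, 9):
    lower, upper = hmm_bounds(n)
    print(n, round(lower, 4), round(upper, 4))
```

For a stationary HMM the printed lower bounds should not decrease and the upper bounds should not increase as n grows, so the gap between them narrows.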

Applications

The entropy rate may be used to estimate the complexity of stochastic processes. It is used in diverse applications ranging from characterizing the complexity of languages, blind source separation, through to optimizing quantizers and data compression algorithms. For example, a maximum entropy rate criterion may be used for feature selection in machine learning.^[2]
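As one concrete illustration of estimating the rate from data, the sketch below uses a simple plug-in estimator: it replaces H_k and H_{k-1} by the entropies of the empirical block frequencies observed in a single sequence and takes their difference, an estimate of the conditional entropy whose limit gives the rate for stationary processes. This estimator is an assumption chosen for illustration, not the method of the cited feature-selection work, and it is biased downward when k is large relative to the sequence length. The function names are hypothetical.

```python
import math
import random
from collections import Counter


def block_entropy(seq, k):
    """Plug-in entropy, in bits, of the empirical distribution of length-k blocks in seq."""
    if k == 0:
        return 0.0
    counts = Counter(tuple(seq[i:i + k]) for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values())


def entropy_rate_estimate(seq, k):
    """Estimate H(X_k | X_{k-1}, ..., X_1) by the difference of empirical block entropies."""
    return block_entropy(seq, k) - block_entropy(seq, k - 1)


rng = random.Random(0)
coin = [rng.randint(0, 1) for _ in range(100_000)]  # i.i.d. fair bits
periodic = [0, 1] * 50_000                           # deterministic alternation
print(entropy_rate_estimate(coin, 3))      # close to 1 bit per symbol
print(entropy_rate_estimate(periodic, 3))  # close to 0 bits per symbol
```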

References

Cover, T. and Thomas, J. (1991) Elements of Information Theory, John Wiley and Sons, Inc., https://archive.today/20121216133431/http://www3.interscience.wiley.com/cgi-bin/bookhome/110438582?CRETRY=1&SRETRY=0

Notes and References

1. Cover, T. M. and Thomas, J. A. (2006) Elements of Information Theory, 2nd ed., Wiley-Interscience, Hoboken, N.J., ISBN 978-0-471-24195-9. Section 4.5: Functions of Markov chains.
2. Einicke, G. A. (2018) Maximum-Entropy Rate Selection of Features for Classifying Changes in Knee and Ankle Dynamics During Running, IEEE Journal of Biomedical and Health Informatics, 28(4), 1097–1103. doi:10.1109/JBHI.2017.2711487, PMID 29969403, S2CID 49555941, hdl:10810/68978.