Shannon–Fano–Elias coding explained

In information theory, Shannon–Fano–Elias coding is a precursor to arithmetic coding, in which probabilities are used to determine codewords.^[1] It is named for Claude Shannon, Robert Fano, and Peter Elias.

Algorithm description

Given a discrete random variable X of ordered values to be encoded, let $p(x)$ be the probability for any x in X. Define a function

\barF(x)=

\sum
	x_i<x

p(x_i)+

	12
	p(x)

Algorithm:

For each x in X,

Let Z be the binary expansion of

\barF(x)

Choose the length of the encoding of x,

L(x)

, to be the integer

\left\lceillog₂

	1
	p(x)

\right\rceil+1

Choose the encoding of x,

code(x)

, be the first

L(x)

most significant bits after the decimal point of Z.

Example

Let X =, with probabilities p = .

For A

\barF(A)=

	12
	p(A)

	12
	⋅

	13
	=

0.1666\ldots

In binary, Z(A) = 0.0010101010...

L(A)=\left\lceillog₂

	1
	3

\right\rceil+1=3

code(A) is 001

For B

\barF(B)=p(A)+

	12
	p(B)

	13
	+

	12
	⋅

	14
	=

0.4583333\ldots

In binary, Z(B) = 0.01110101010101...

L(B)=\left\lceillog₂

	1
	4

\right\rceil+1=3

code(B) is 011

For C

\barF(C)=p(A)+p(B)+

	12
	p(C)

	13
	+

	14
	+

	12
	⋅

	16
	=

0.66666\ldots

In binary, Z(C) = 0.101010101010...

L(C)=\left\lceillog₂

	1
	6

\right\rceil+1=4

code(C) is 1010

For D

\barF(D)=p(A)+p(B)+p(C)+

	12
	p(D)

	13
	+

	14
	+

	16
	+

	12
	⋅

	14
	=

0.875

In binary, Z(D) = 0.111

L(D)=\left\lceillog₂

	1
	4

\right\rceil+1=3

code(D) is 111

Algorithm analysis

Prefix code

Shannon–Fano–Elias coding produces a binary prefix code, allowing for direct decoding.

Let bcode(x) be the rational number formed by adding a decimal point before a binary code. For example, if code(C) = 1010 then bcode(C) = 0.1010. For all x, if no y exists such that

\operatorname{bcode}(x)\le\operatorname{bcode}(y)<\operatorname{bcode}(x)+2^-L(x)

then all the codes form a prefix code.

By comparing F to the CDF of X, this property may be demonstrated graphically for Shannon–Fano–Elias coding.

By definition of L it follows that

2^-L(x)\le

	12
	p(x)

And because the bits after L(y) are truncated from F(y) to form code(y), it follows that

\barF(y)-\operatorname{bcode}(y)\le2^-L(y)

thus bcode(y) must be no less than CDF(x).

So the above graph demonstrates that the

\operatorname{bcode}(y)-\operatorname{bcode}(x)>p(x)\ge2^-L(x)

, therefore the prefix property holds.

Code length

The average code length is

LC(X)=\sum_xp(x)L(x)=\sum_xp(x)\left(\left\lceillog₂

	1
	p(x)

\right\rceil+1\right)

.
Thus for H(X), the entropy of the random variable X,

H(X)+1\leLC(X)<H(X)+2

Shannon Fano Elias codes from 1 to 2 extra bits per symbol from X than entropy, so the code is not used in practice.

Notes and References

Book: Elements of information theory . T. M. Cover and Joy A. Thomas . 2nd . John Wiley and Sons . 2006 . 127–128 . 978-0-471-24195-9 .

Shannon–Fano–Elias coding explained

Algorithm description

Example

Algorithm analysis

Prefix code

Code length

See also

Notes and References