Shannon–Fano–Elias coding explained

In information theory, Shannon–Fano–Elias coding is a precursor to arithmetic coding, in which probabilities are used to determine codewords.[1] It is named for Claude Shannon, Robert Fano, and Peter Elias.

Algorithm description

Given a discrete random variable X of ordered values to be encoded, let p(x) be the probability for any x in X. Define a function

\bar{F}(x) = \sum_{x_i < x} p(x_i) + \frac{1}{2} p(x)

Algorithm:

For each x in X,

1. Let Z be the binary expansion of \bar{F}(x).
2. Choose the length of the encoding of x, L(x), to be the integer \left\lceil \log_2 \frac{1}{p(x)} \right\rceil + 1.
3. Choose the encoding of x, \operatorname{code}(x), to be the first L(x) most significant bits after the decimal point of Z.
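The steps above can be sketched in Python. The function name `sfe_codes` and the use of exact `Fraction` arithmetic are choices of this sketch, not part of the original description; symbols are assumed to be supplied in their fixed order together with their probabilities.

```python
from fractions import Fraction
from math import ceil, log2

def sfe_codes(probs):
    """Sketch of Shannon-Fano-Elias coding.

    probs: list of (symbol, probability) pairs in the fixed symbol order.
    Returns a dict mapping each symbol to its binary codeword string.
    """
    codes = {}
    cdf = Fraction(0)                          # sum of p(x_i) for x_i < x
    for symbol, p in probs:
        p = Fraction(p)
        fbar = cdf + p / 2                     # modified CDF: F-bar(x)
        length = ceil(log2(float(1 / p))) + 1  # L(x) = ceil(log2 1/p(x)) + 1
        # Take the first L(x) bits of the binary expansion of F-bar(x).
        bits, frac = [], fbar
        for _ in range(length):
            frac *= 2
            bit = int(frac)                    # integer part is the next bit
            bits.append(str(bit))
            frac -= bit
        codes[symbol] = "".join(bits)
        cdf += p
    return codes
```

For the distribution used in the example below, `sfe_codes([("A", Fraction(1, 3)), ("B", Fraction(1, 4)), ("C", Fraction(1, 6)), ("D", Fraction(1, 4))])` yields the codewords 001, 011, 1010, and 111.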

Example

Let X = {A, B, C, D}, with probabilities p = {1/3, 1/4, 1/6, 1/4}.

For A

\bar{F}(A) = \frac{1}{2} p(A) = \frac{1}{2} \cdot \frac{1}{3} = 0.1666\ldots

In binary, Z(A) = 0.0010101010...

L(A) = \left\lceil \log_2 \frac{1}{1/3} \right\rceil + 1 = 3

code(A) is 001

For B

\bar{F}(B) = p(A) + \frac{1}{2} p(B) = \frac{1}{3} + \frac{1}{2} \cdot \frac{1}{4} = 0.4583333\ldots

In binary, Z(B) = 0.01110101010101...

L(B) = \left\lceil \log_2 \frac{1}{1/4} \right\rceil + 1 = 3

code(B) is 011

For C

\bar{F}(C) = p(A) + p(B) + \frac{1}{2} p(C) = \frac{1}{3} + \frac{1}{4} + \frac{1}{2} \cdot \frac{1}{6} = 0.66666\ldots

In binary, Z(C) = 0.101010101010...

L(C) = \left\lceil \log_2 \frac{1}{1/6} \right\rceil + 1 = 4

code(C) is 1010

For D

\bar{F}(D) = p(A) + p(B) + p(C) + \frac{1}{2} p(D) = \frac{1}{3} + \frac{1}{4} + \frac{1}{6} + \frac{1}{2} \cdot \frac{1}{4} = 0.875

In binary, Z(D) = 0.111

L(D) = \left\lceil \log_2 \frac{1}{1/4} \right\rceil + 1 = 3

code(D) is 111
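As a sanity check on the arithmetic above, the four values of \bar{F} can be recomputed with exact fractions. This is a standalone verification script, not part of the algorithm itself:

```python
from fractions import Fraction as F

# Probabilities from the example: p(A), p(B), p(C), p(D)
p = {"A": F(1, 3), "B": F(1, 4), "C": F(1, 6), "D": F(1, 4)}

# F-bar(x) = sum of probabilities of earlier symbols + half of p(x)
fbar_A = p["A"] / 2                             # 1/6   = 0.1666...
fbar_B = p["A"] + p["B"] / 2                    # 11/24 = 0.458333...
fbar_C = p["A"] + p["B"] + p["C"] / 2           # 2/3   = 0.6666...
fbar_D = p["A"] + p["B"] + p["C"] + p["D"] / 2  # 7/8   = 0.875
```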

Algorithm analysis

Prefix code

Shannon–Fano–Elias coding produces a binary prefix code, allowing for direct decoding.

Let bcode(x) be the rational number formed by adding a decimal point before a binary code. For example, if code(C) = 1010 then bcode(C) = 0.1010. For all x, if no y ≠ x exists such that

\operatorname{bcode}(x) \le \operatorname{bcode}(y) < \operatorname{bcode}(x) + 2^{-L(x)}

then all the codes form a prefix code.

By comparing F to the CDF of X, this property may be demonstrated graphically for Shannon–Fano–Elias coding.

By definition of L it follows that

2^{-L(x)} \le \frac{1}{2} p(x)

And because the bits after L(y) are truncated from \bar{F}(y) to form code(y), it follows that

\bar{F}(y) - \operatorname{bcode}(y) \le 2^{-L(y)}

For y > x we have \bar{F}(y) \ge \operatorname{CDF}(x) + \frac{1}{2} p(y) and 2^{-L(y)} \le \frac{1}{2} p(y), so \operatorname{bcode}(y) \ge \bar{F}(y) - 2^{-L(y)} \ge \operatorname{CDF}(x); thus bcode(y) must be no less than CDF(x).

Since \operatorname{bcode}(x) \le \bar{F}(x) = \operatorname{CDF}(x) - \frac{1}{2} p(x), combining these bounds gives, for any y > x,

\operatorname{bcode}(y) - \operatorname{bcode}(x) \ge \frac{1}{2} p(x) \ge 2^{-L(x)}

therefore the prefix property holds.
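The prefix property can also be checked directly on the codewords from the worked example. The helper `is_prefix_free` below is an illustrative name, not a standard API:

```python
def is_prefix_free(codewords):
    """Return True if no codeword is a proper prefix of another."""
    return not any(
        a != b and b.startswith(a)
        for a in codewords
        for b in codewords
    )

# Codewords from the worked example: A, B, C, D
example = ["001", "011", "1010", "111"]
```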

Code length

The average code length is

L_C(X) = \sum_x p(x) L(x) = \sum_x p(x) \left( \left\lceil \log_2 \frac{1}{p(x)} \right\rceil + 1 \right)
Thus for H(X), the entropy of the random variable X,

H(X) + 1 \le L_C(X) < H(X) + 2

Shannon–Fano–Elias coding thus uses from 1 to 2 extra bits per symbol from X beyond the entropy, so the code is not used in practice.
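For the worked example, these bounds can be confirmed numerically. This is a quick check on that particular distribution, not a proof:

```python
from math import ceil, log2

# Probabilities from the example: p(A), p(B), p(C), p(D)
probs = [1/3, 1/4, 1/6, 1/4]

lengths = [ceil(log2(1/p)) + 1 for p in probs]        # L(x) per symbol
avg_len = sum(p * L for p, L in zip(probs, lengths))  # L_C(X), about 3.1667
entropy = sum(p * log2(1/p) for p in probs)           # H(X), about 1.9591
```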

Notes and References

  1. T. M. Cover and J. A. Thomas, Elements of Information Theory, 2nd ed., John Wiley and Sons, 2006, pp. 127–128. ISBN 978-0-471-24195-9.