Truncated binary encoding explained

Truncated binary encoding is an entropy encoding typically used for uniform probability distributions with a finite alphabet. It is parameterized by an alphabet with total size of number n. It is a slightly more general form of binary encoding when n is not a power of two.

If n is a power of two, then the coded value for 0 ≤ x < n is the simple binary code for x of length log₂(n).Otherwise let k = floor(log₂(n)), such that 2^k < n < 2^k+1and let u = 2^k+1 − n.

Truncated binary encoding assigns the first u symbols codewords of length k and then assigns the remaining n − u symbols the last n − u codewords of length k + 1. Because all the codewords of length k + 1 consist of an unassigned codeword of length k with a "0" or "1" appended, the resulting code is a prefix code.

History

Used since at least 1984, phase-in codes, also known as economy codes,^[1] are also known as truncated binary encoding.

Example with n = 5

For example, for the alphabet, n = 5 and 2² ≤ n < 2³, hence k = 2 and u = 2³ − 5 = 3. Truncated binary encoding assigns the first u symbols the codewords 00, 01, and 10, all of length 2, then assigns the last n − u symbols the codewords 110 and 111, the last two codewords of length 3.

For example, if n is 5, plain binary encoding and truncated binary encoding allocates the following codewords. Digits shown ~~struck~~ are not transmitted in truncated binary.

Truncated binary	Encoding			Standard binary
0	bgcolor=silver	0	0	0	0
1	bgcolor=silver	0	0	1	1
2	bgcolor=silver	0	1	0	2
UNUSED	bgcolor=silver	0	bgcolor=silver	1	bgcolor=silver	1	3
UNUSED	bgcolor=silver	1	bgcolor=silver	0	bgcolor=silver	0	4
UNUSED	bgcolor=silver	1	bgcolor=silver	0	bgcolor=silver	1	5/UNUSED
3	1	1	0	6/UNUSED
4	1	1	1	7/UNUSED

It takes 3 bits to encode n using straightforward binary encoding, hence 2³ − n = 8 − 5 = 3 are unused.

In numerical terms, to send a value x, where 0 ≤ x < n, and where there are 2^k ≤ n < 2^k+1 symbols, there are u = 2^k+1 − n unused entries when the alphabet size is rounded up to the nearest power of two. The process to encode the number x in truncated binary is: if x is less than u, encode it in k binary bits; if x is greater than or equal to u, encode the value x + u in k + 1 binary bits.

Example with n = 10

Another example, encoding an alphabet of size 10 (between 0 and 9) requires 4 bits, but there are 2⁴ − 10 = 6 unused codes, so input values less than 6 have the first bit discarded, while input values greater than or equal to 6 are offset by 6 to the end of the binary space. (Unused patterns are not shown in this table.)

Input value	Offset	Offset value	Standard binary	Truncated binary
0	0	0	0000	000
1	0	1	0001	001
2	0	2	0010	010
3	0	3	0011	011
4	0	4	0100	100
5	0	5	0101	101

6	6	12	0110	1100
7	6	13	0111	1101
8	6	14	1000	1110
9	6	15	1001	1111

To decode, read the first k bits. If they encode a value less than u, decoding is complete. Otherwise, read an additional bit and subtract u from the result.

Example with n = 7

Here is a more extreme case: with n = 7 the next power of 2 is 8, so k = 2 and u = 2³ − 7 = 1:

Input value	Offset	Offset value	Standard binary	Truncated binary
0	0	0	000	00

1	1	2	001	010
2	1	3	010	011
3	1	4	011	100
4	1	5	100	101
5	1	6	101	110
6	1	7	110	111

This last example demonstrates that a leading zero bit does not always indicate a short code; if u < 2^k, some long codes will begin with a zero bit.

Simple algorithm

Generate the truncated binary encoding for a value x, 0 ≤ x < n, where n > 0 is the size of the alphabet containing x. n need not be a power of two. string TruncatedBinary (int x, int n)The routine Binary is expository; usually just the rightmost len bits of the variable x are desired.Here we simply output the binary code for x using len bits, padding with high-order 0s if necessary.string Binary (int x, int len)

On efficiency

If n is not a power of two, and k-bit symbols are observed with probability p, then (k + 1)-bit symbols are observed with probability 1 − p. We can calculate the expected number of bits per symbol

b_e

b_e=pk+(1-p)(k+1).

Raw encoding of the symbol has

b_u=k+1

bits. Then relative space saving s (see Data compression ratio) of the encoding can be defined as

s=1-

	b_e
	b_u

=1-

	pk+(1-p)(k+1)
	k+1

When simplified, this expression leads to

	p
	k+1

	p
	b_u

This indicates that relative efficiency of truncated binary encoding increases as probability p of k-bit symbols increases, and the raw-encoding symbol bit-length

b_u

decreases.

Notes and References

Job van der Zwan."Phase-in Codes".