Truncated binary encoding is an entropy encoding typically used for uniform probability distributions with a finite alphabet. It is parameterized by an alphabet with total size of number n. It is a slightly more general form of binary encoding when n is not a power of two.
If n is a power of two, then the coded value for 0 ≤ x < n is the simple binary code for x of length log2(n).Otherwise let k = floor(log2(n)), such that 2k < n < 2k+1and let u = 2k+1 − n.
Truncated binary encoding assigns the first u symbols codewords of length k and then assigns the remaining n − u symbols the last n − u codewords of length k + 1. Because all the codewords of length k + 1 consist of an unassigned codeword of length k with a "0" or "1" appended, the resulting code is a prefix code.
Used since at least 1984, phase-in codes, also known as economy codes,[1] are also known as truncated binary encoding.
For example, for the alphabet, n = 5 and 22 ≤ n < 23, hence k = 2 and u = 23 − 5 = 3. Truncated binary encoding assigns the first u symbols the codewords 00, 01, and 10, all of length 2, then assigns the last n − u symbols the codewords 110 and 111, the last two codewords of length 3.
For example, if n is 5, plain binary encoding and truncated binary encoding allocates the following codewords. Digits shown struck are not transmitted in truncated binary.
Truncated binary | Encoding | Standard binary | |||||
---|---|---|---|---|---|---|---|
0 | bgcolor=silver | 0 | 0 | 0 | |||
1 | bgcolor=silver | 0 | 1 | 1 | |||
2 | bgcolor=silver | 1 | 0 | 2 | |||
UNUSED | bgcolor=silver | bgcolor=silver | bgcolor=silver | 3 | |||
UNUSED | bgcolor=silver | bgcolor=silver | bgcolor=silver | 4 | |||
UNUSED | bgcolor=silver | bgcolor=silver | bgcolor=silver | 5/UNUSED | |||
3 | 1 | 1 | 0 | 6/UNUSED | |||
4 | 1 | 1 | 1 | 7/UNUSED |
It takes 3 bits to encode n using straightforward binary encoding, hence 23 − n = 8 − 5 = 3 are unused.
In numerical terms, to send a value x, where 0 ≤ x < n, and where there are 2k ≤ n < 2k+1 symbols, there are u = 2k+1 − n unused entries when the alphabet size is rounded up to the nearest power of two. The process to encode the number x in truncated binary is: if x is less than u, encode it in k binary bits; if x is greater than or equal to u, encode the value x + u in k + 1 binary bits.
Another example, encoding an alphabet of size 10 (between 0 and 9) requires 4 bits, but there are 24 − 10 = 6 unused codes, so input values less than 6 have the first bit discarded, while input values greater than or equal to 6 are offset by 6 to the end of the binary space. (Unused patterns are not shown in this table.)
Input value | Offset | Offset value | Standard binary | Truncated binary | |
---|---|---|---|---|---|
0 | 0 | 0 | 000 | ||
1 | 0 | 1 | 001 | ||
2 | 0 | 2 | 010 | ||
3 | 0 | 3 | 011 | ||
4 | 0 | 4 | 100 | ||
5 | 0 | 5 | 101 | ||
6 | 6 | 12 | 0110 | 1100 | |
7 | 6 | 13 | 0111 | 1101 | |
8 | 6 | 14 | 1000 | 1110 | |
9 | 6 | 15 | 1001 | 1111 |
To decode, read the first k bits. If they encode a value less than u, decoding is complete. Otherwise, read an additional bit and subtract u from the result.
Here is a more extreme case: with n = 7 the next power of 2 is 8, so k = 2 and u = 23 − 7 = 1:
Input value | Offset | Offset value | Standard binary | Truncated binary | |
---|---|---|---|---|---|
0 | 0 | 0 | 00 | ||
1 | 1 | 2 | 001 | 010 | |
2 | 1 | 3 | 010 | 011 | |
3 | 1 | 4 | 011 | 100 | |
4 | 1 | 5 | 100 | 101 | |
5 | 1 | 6 | 101 | 110 | |
6 | 1 | 7 | 110 | 111 |
This last example demonstrates that a leading zero bit does not always indicate a short code; if u < 2k, some long codes will begin with a zero bit.
Generate the truncated binary encoding for a value x, 0 ≤ x < n, where n > 0 is the size of the alphabet containing x. n need not be a power of two. Binary
is expository; usually just the rightmost len
bits of the variable x are desired.Here we simply output the binary code for x using len
bits, padding with high-order 0s if necessary.
If n is not a power of two, and k-bit symbols are observed with probability p, then (k + 1)-bit symbols are observed with probability 1 − p. We can calculate the expected number of bits per symbol
be
be=pk+(1-p)(k+1).
Raw encoding of the symbol has
bu=k+1
s=1-
be | |
bu |
=1-
pk+(1-p)(k+1) | |
k+1 |
.
When simplified, this expression leads to
s=
p | |
k+1 |
=
p | |
bu |
.
This indicates that relative efficiency of truncated binary encoding increases as probability p of k-bit symbols increases, and the raw-encoding symbol bit-length
bu