Comma code explained

A comma code is a type of prefix-free code in which a comma, a particular symbol or sequence of symbols, occurs at the end of a code word and never occurs otherwise.[1] This is an intuitive way to express arrays.

For example, Fibonacci coding is a comma code in which the comma is 11. 11 and 1011 are valid Fibonacci code words, but 101, 0111, and 11011 are not.

Examples

SymbolCodeComma Code
Comma- (N/A)0
000100
101101
210110
311111

The definition of word being a number of symbols ending in a comma, the equivalent of a space character.

All scrambled data or suitably curated same-length data exhibits so called implied probability.

Such data that can be termed 'generic data' can be analysed using any interleaving unary code as headers where additional bijective bits (equal to the length of the unary code just read) are read as data while the unary code serves as an introduction or header for the data. This header serves as a comma. The data can be read in an interleaving fashion between each bit of the header or in post read fashion when the data is only read after the entire unary header code is read like Chen-Ho encoding.

It can be seen by random walk techniques and by statistical summation that all generic data has a header or comma of an average of 2 bits and data of an additional 2 bits (minimum 1).

This also allows for an inexpensive base increase algorithm before transmission in non binary communication channels, like base-3 or base-5 communication channels.

nRL codeNext codeBijective data (non-NULL)Commas
11''?''0''?''? (1=1,2=2),
21''?''1''?''0''?''0''?''?? (3,4,5,6=11,12,21,22),,
31''?''1''?''1''?''0''?''0''?''0''?''???,,,
41''?''1''?''1''?''1''?''0''?''0''?''0''?''0''?''????,,,,
51''?''1''?''1''?''1''?''1''?''0''?''0''?''0''?''0''?''0''?''?????,,,,,
61''?''1''?''1''?''1''?''1''?''1''?''0''?''0''?''0''?''0''?''0''?''0''?''??????,,,,,,
71''?''1''?''1''?''1''?''1''?''1''?''1''?''0''?''0''?''0''?''0''?''0''?''0''?''0''?''???????,,,,,,,
81''?''1''?''1''?''1''?''1''?''1''?''1''?''1''?''0''?''0''?''0''?''0''?''0''?''0''?''0''?''0''?''????????,,,,,,,,
91''?''1''?''1''?''1''?''1''?''1''?''1''?''1''?''1''?''0''?''0''?''0''?''0''?''0''?''0''?''0''?''0''?''0''?''?????????,,,,,,,,,
101''?''1''?''1''?''1''?''1''?''1''?''1''?''1''?''1''?''1''?''0''?''0''?''0''?''0''?''0''?''0''?''0''?''0''?''0''?''0''?''??????????,,,,,,,,,,
...

Where '?' is '1' or '2' for the value of the bijective digit that requires no further processing.

Of course we use a single comma to separate each field of data, therefore showing that all the data consists of 50% of commas. This is quite visible from an implied probability of 50% for the 0 code in Huffman base 3 codes: 0,10,11 (net 2/3 or 66.66% commas) or the base-5 comma code shown above. The cost-per-character quotient of higher base communication has to maintain near logarithmic values \fracfor the data and less than 2-bits for the comma character to maintain cost effectiveness.

This method has an assurance of a '1' or '2' after every '0' (comma) and this property can be useful when designing around timing concerns in transmission. It can be somewhat expensive to convert a known binary value to ternary unless ternary bit costs are reduced to similar to binary bit costs, so this bit can be multiplexed in a separate binary channel if costs agree (this may require a read of an additional 'tail'/trailing portion of 2-bits pure data for the binary channel (from after the first bit of the first change as this is not an instantly-decodable code, simply read if using an instantly decodable unary code) to be similar to the 2 average ternary bits remaining on the primary channel equivalent to 2*\frac=3.17 bits before cost comparisons are factored in).

Not considering multiplexing, this method has a read efficiency of 3 ternary digits for a read of 4 binary bits or 1.33 bits. Or \frac\frac=84.12%

nRL codeNext codeBijective data (has NULL)Commas
110NULL (or 0),
21''?''10''?''0? (1=1,2=2),,
31''?''1''?''10''?''0''?''0?? (3,4,5,6=11,12,21,22),,,
41''?''1''?''1''?''10''?''0''?''0''?''0???,,,,
51''?''1''?''1''?''1''?''10''?''0''?''0''?''0''?''0????,,,,,
61''?''1''?''1''?''1''?''1''?''10''?''0''?''0''?''0''?''0''?''0?????,,,,,,
71''?''1''?''1''?''1''?''1''?''1''?''10''?''0''?''0''?''0''?''0''?''0''?''0??????,,,,,,,
81''?''1''?''1''?''1''?''1''?''1''?''1''?''10''?''0''?''0''?''0''?''0''?''0''?''0''?''0???????,,,,,,,,
91''?''1''?''1''?''1''?''1''?''1''?''1''?''1''?''10''?''0''?''0''?''0''?''0''?''0''?''0''?''0''?''0????????,,,,,,,,,
101''?''1''?''1''?''1''?''1''?''1''?''1''?''1''?''1''?''10''?''0''?''0''?''0''?''0''?''0''?''0''?''0''?''0''?''0?????????,,,,,,,,,,
...

Where '?' is '1' or '2' for the value of the bijective digit that requires no further processing. This method results in statistical similarity to a simple 'implied read' of Huffman base 3 codes: 0,10,11 (net 2/3 or 66.66% commas).

It can be seen by random walk techniques and by statistical summation that all generic data has a header or comma of an average of 2 bits and data of an additional 1 bit (minimum 0).

This has no assurance of a '1' or '2' after every '0' (comma) a property that can be useful when designing around timing concerns in transmission.

This method has a read efficiency of 2 ternary digits for a read of 3 binary bits or 1.5 binary bits/ternary digit. Or \frac\frac=94.64%

The main advantage to this technique apart from higher efficiency is that there is no base conversion required which would require the entire stream to be read first and then converted. The disadvantage is that the average number length becomes higher and similar to random number generation and timing concerns that govern ternary transmission come to the fore. With m=2 and n=2, we get, not forgetting that a value of '(2)' is essentially 0-bits:

Binary encodingTernary digits
READS - Code space (128 states)b3b2b1b0Values encodedDescriptionWRITES - Occurrences (100 states)
50% (64 states)0ab(0–1) (0–1)Two lower digits44.44% (45 states)
25% (32 states)10a(2) (0–1)One lower digit,one higher digit22.22% (22 states)
12.5% (16 states)110b(0–1) (2)22.22% (22 states)
12.5% (16 states)111(2) (2)Two higher digits11.11% (11 states)

This method therefore has a read efficiency of 2 ternary digits for a read of 50*3+25*3+12.5*4+12.5*3=3.125 binary bits or 1.5625 binary bits/ternary digit. Or \frac\frac=98.58%.

A write efficiency of 2 ternary digits for a write of \frac9*3+\frac9*3+\frac9*4+\frac9*3=3.22 bits or 1.61 binary bits/ternary digit, or \frac=98.38%

See also

References

  1. Book: Wade, Graham . Signal Coding and Processing . 8 September 1994 . Cambridge University Press . 978-0-521-42336-6 . 56.