Online codes explained

In computer science, online codes are an example of rateless erasure codes. These codes can encode a message into a number of symbols such that knowledge of any fraction of them allows one to recover the original message (with high probability). Rateless codes produce an arbitrarily large number of symbols which can be broadcast until the receivers have enough symbols.

The online encoding algorithm consists of several phases. First the message is split into n fixed size message blocks. Then the outer encoding is an erasure code which produces auxiliary blocks that are appended to the message blocks to form a composite message.

From this the inner encoding generates check blocks. Upon receiving a certain number of check blocks some fraction of the composite message can be recovered. Once enough has been recovered the outer decoding can be used to recover the original message.

Detailed discussion

Online codes are parameterised by the block size and two scalars, q and ε. The authors suggest q=3 and ε=0.01. These parameters set the balance between the complexity and performance of the encoding. A message of n blocks can be recovered, with high probability, from (1+3ε)n check blocks. The probability of failure is (ε/2)^q+1.

Outer encoding

Any erasure code may be used as the outer encoding, but the author of online codes suggest the following.

For each message block, pseudo-randomly choose q auxiliary blocks (from a total of 0.55qεn auxiliary blocks) to attach it to. Each auxiliary block is then the XOR of all the message blocks which have been attached to it.

Inner encoding

The inner encoding takes the composite message and generates a stream of check blocks. A check block is the XOR of all the blocks from the composite message that it is attached to.

The degree of a check block is the number of blocks that it is attached to. The degree is determined by sampling a random distribution, p, which is defined as:

F=\left\lceil	ln(\epsilon^2/4)
	ln(1-\epsilon/2)

\right\rceil

1=1-	1+1/F
	1+\epsilon

i=	(1-p_1)F
	(F-1)i(i-1)

for

2\lei\leF

Once the degree of the check block is known, the blocks from the composite message which it is attached to are chosen uniformly.

Decoding

Obviously the decoder of the inner stage must hold check blocks which it cannot currently decode. A check block can only be decoded when all but one of the blocks which it is attached to are known. The graph to the left shows the progress of an inner decoder. The x-axis plots the number of check blocks received and the dashed line shows the number of check blocks which cannot currently be used. This climbs almost linearly at first as many check blocks with degree > 1 are received but unusable. At a certain point, some of the check blocks are suddenly usable, resolving more blocks which then causes more check blocks to be usable. Very quickly the whole file can be decoded.

As the graph also shows the inner decoder falls just shy of decoding everything for a little while after having received n check blocks. The outer encoding ensures that a few elusive blocks from the inner decoder are not an issue, as the file can be recovered without them.

External links

Original paper
Rateless Codes and Big Downloads (A more accessible paper by the same author)
Papers by Petar Maymounkov
A Ruby project hosted at RubyForge containing a Ruby library for Online Coding