In information theory, the noisy-channel coding theorem (sometimes Shannon's theorem or Shannon's limit) establishes that for any given degree of noise contamination of a communication channel, it is possible (in theory) to communicate discrete data (digital information) nearly error-free up to a computable maximum rate through the channel. This result was presented by Claude Shannon in 1948 and was based in part on earlier work and ideas of Harry Nyquist and Ralph Hartley.
The Shannon limit or Shannon capacity of a communication channel refers to the maximum rate of error-free data that can theoretically be transferred over the channel if the link is subject to random data transmission errors, for a particular noise level. It was first described by Shannon (1948), and shortly after published in a book by Shannon and Warren Weaver entitled The Mathematical Theory of Communication (1949). This founded the modern discipline of information theory.
Stated by Claude Shannon in 1948, the theorem describes the maximum possible efficiency of error-correcting methods versus levels of noise interference and data corruption. Shannon's theorem has wide-ranging applications in both communications and data storage. This theorem is of foundational importance to the modern field of information theory. Shannon gave only an outline of the proof; the first rigorous proof for the discrete case is due to Amiel Feinstein in 1954.
The Shannon theorem states that given a noisy channel with channel capacity C and information transmitted at a rate R, then if R < C there exist codes that allow the probability of error at the receiver to be made arbitrarily small. This means that, theoretically, it is possible to transmit information nearly without error at any rate below a limiting rate, C.
The converse is also important. If R > C, an arbitrarily small probability of error is not achievable. All codes will have a probability of error greater than a certain positive minimal level, and this level increases as the rate increases. So, information cannot be guaranteed to be transmitted reliably across a channel at rates beyond the channel capacity. The theorem does not address the rare situation in which rate and capacity are equal.
The channel capacity C can be calculated from the physical properties of a channel; for a band-limited channel with Gaussian noise, it is given by the Shannon–Hartley theorem.
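For a concrete sense of what the capacity C is, the sketch below (an illustration not drawn from the article, assuming a binary symmetric channel with crossover probability p) evaluates the closed-form capacity C = 1 − H_2(p); the theorem then guarantees reliable communication at any rate R below this value.

```python
import math

def binary_entropy(p: float) -> float:
    """Binary entropy H2(p) in bits, with H2(0) = H2(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def bsc_capacity(p: float) -> float:
    """Capacity of a binary symmetric channel with crossover probability p:
    C = 1 - H2(p) bits per channel use."""
    return 1.0 - binary_entropy(p)

# A channel that flips each bit with probability 0.11 has capacity ~0.5 bits
# per use, so any rate R < 0.5 is achievable with arbitrarily small error.
print(bsc_capacity(0.11))
```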
Simple schemes such as "send the message 3 times and use a best 2 out of 3 voting scheme if the copies differ" are inefficient error-correction methods, unable to asymptotically guarantee that a block of data can be communicated free of error. Advanced techniques such as Reed–Solomon codes and, more recently, low-density parity-check (LDPC) codes and turbo codes, come much closer to reaching the theoretical Shannon limit, but at a cost of high computational complexity. Using these highly efficient codes and with the computing power in today's digital signal processors, it is now possible to reach very close to the Shannon limit. In fact, it was shown that LDPC codes can reach within 0.0045 dB of the Shannon limit (for binary additive white Gaussian noise (AWGN) channels, with very long block lengths).[1]
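As a rough illustration of the gap between such repetition schemes and the Shannon limit (a sketch with assumed parameters, not a result from the article), the following Monte-Carlo estimate shows that triple repetition with majority voting still leaves a fixed residual error rate while cutting the data rate to 1/3.

```python
import random

def repetition_error_rate(p: float, trials: int = 100_000) -> float:
    """Monte-Carlo estimate of the residual bit error rate of 'send each bit
    3 times, take a majority vote' over a binary symmetric channel with
    crossover probability p."""
    errors = 0
    for _ in range(trials):
        flips = sum(random.random() < p for _ in range(3))
        if flips >= 2:           # a majority of the three copies was corrupted
            errors += 1
    return errors / trials

# At p = 0.1 the scheme still fails on about 2.8% of bits
# (3*p^2*(1-p) + p^3 = 0.028), and it only ever uses 1/3 of the raw rate.
print(repetition_error_rate(0.1))
```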
The basic mathematical model for a communication system is the following:
Message W → (Encoder f_n) → Encoded sequence → (Channel p(y|x)) → Received sequence → (Decoder g_n) → Estimated message \hat{W}
A message W is transmitted through a noisy channel by using encoding and decoding functions. An encoder maps W into a pre-defined sequence of channel symbols of length n. In its most basic model, the channel distorts each of these symbols independently of the others. The output of the channel (the received sequence) is fed into a decoder which maps the sequence into an estimate of the message. In this setting, the probability of error is defined as:
P_e = \Pr\{\hat{W} \neq W\}.
Theorem (Shannon, 1948):
1. For every discrete memoryless channel, the channel capacity, defined in terms of the mutual information I(X;Y) as

C = \sup_{p_X} I(X;Y),

has the following property. For any ε > 0 and R < C, for large enough N, there exists a code of length N and rate ≥ R and a decoding algorithm, such that the maximal probability of block error is ≤ ε.

2. If a probability of bit error p_b is acceptable, rates up to R(p_b) are achievable, where

R(p_b) = \frac{C}{1 - H_2(p_b)}

and H_2(p_b) is the binary entropy function

H_2(p_b) = -\left[ p_b \log_2 p_b + (1 - p_b) \log_2 (1 - p_b) \right].

3. For any p_b, rates greater than R(p_b) are not achievable.
(MacKay (2003), p. 162; cf Gallager (1968), ch.5; Cover and Thomas (1991), p. 198; Shannon (1948) thm. 11)
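The following sketch (illustrative only; the capacity value C = 0.5 is an assumption, not taken from the article) evaluates part 2 of the theorem, showing how tolerating a bit error probability p_b raises the achievable rate R(p_b) = C / (1 − H_2(p_b)).

```python
import math

def binary_entropy(p: float) -> float:
    """Binary entropy H2(p) in bits."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def achievable_rate(capacity: float, p_b: float) -> float:
    """R(p_b) = C / (1 - H2(p_b)): the largest rate achievable when an average
    bit error probability of p_b is tolerated (part 2 of the theorem)."""
    return capacity / (1.0 - binary_entropy(p_b))

# Assumed channel capacity C = 0.5 bits per channel use (illustration only).
for p_b in (0.0, 0.01, 0.05, 0.1):
    print(p_b, round(achievable_rate(0.5, p_b), 3))
# Tolerating p_b = 0.1 lifts the achievable rate from 0.5 to about 0.94.
```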
As with several other major results in information theory, the proof of the noisy channel coding theorem includes an achievability result and a matching converse result. These two components serve to bound the set of possible rates at which one can communicate over a noisy channel, and the matching converse shows that these bounds are tight.
The following outlines represent only one of many different proof styles found in information theory texts.
This particular proof of achievability follows the style of proofs that make use of the asymptotic equipartition property (AEP). Another style can be found in information theory texts using error exponents.
Both types of proofs make use of a random coding argument where the codebook used across a channel is randomly constructed; this serves to make the analysis simpler while still proving the existence of a code satisfying a desired low probability of error at any data rate below the channel capacity.
By an AEP-related argument, given a channel, length n strings of source symbols X_1^n, and length n strings of channel outputs Y_1^n, we can define a jointly typical set by the following:

A_\varepsilon^{(n)} = \{ (x^n, y^n) \in \mathcal{X}^n \times \mathcal{Y}^n :

2^{-n(H(X)+\varepsilon)} \le p(X_1^n) \le 2^{-n(H(X)-\varepsilon)},

2^{-n(H(Y)+\varepsilon)} \le p(Y_1^n) \le 2^{-n(H(Y)-\varepsilon)},

2^{-n(H(X,Y)+\varepsilon)} \le p(X_1^n, Y_1^n) \le 2^{-n(H(X,Y)-\varepsilon)} \}.

We say that two sequences X_1^n and Y_1^n are jointly typical if they lie in the jointly typical set defined above.
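A minimal sketch of this definition (illustrative; the joint distribution and the sequences below are assumptions, not taken from the article) checks the three conditions for a given pair of sequences over finite alphabets:

```python
import math

def is_jointly_typical(xs, ys, pxy, eps):
    """Check whether (xs, ys) lies in the jointly typical set A_eps^(n) for the
    joint distribution pxy[(x, y)] (a dict over finite alphabets): the three
    empirical per-symbol log-probabilities must be within eps of H(X), H(Y),
    and H(X,Y) respectively."""
    n = len(xs)
    px, py = {}, {}
    for (x, y), p in pxy.items():
        px[x] = px.get(x, 0.0) + p
        py[y] = py.get(y, 0.0) + p
    hx  = -sum(p * math.log2(p) for p in px.values() if p > 0)
    hy  = -sum(p * math.log2(p) for p in py.values() if p > 0)
    hxy = -sum(p * math.log2(p) for p in pxy.values() if p > 0)
    # log2 of p(x^n), p(y^n), p(x^n, y^n) under the i.i.d. model
    lx  = sum(math.log2(px[x]) for x in xs)
    ly  = sum(math.log2(py[y]) for y in ys)
    lxy = sum(math.log2(pxy[(x, y)]) for x, y in zip(xs, ys))
    return (abs(-lx / n - hx) <= eps and
            abs(-ly / n - hy) <= eps and
            abs(-lxy / n - hxy) <= eps)

# Example: a binary symmetric channel with crossover 0.1 and uniform input.
pxy = {(0, 0): 0.45, (0, 1): 0.05, (1, 0): 0.05, (1, 1): 0.45}
xs = [0, 1, 0, 0, 1, 1, 0, 1, 0, 0]
ys = [0, 1, 0, 0, 1, 1, 0, 1, 1, 0]   # one flip out of ten symbols
print(is_jointly_typical(xs, ys, pxy, eps=0.2))   # True for this pair
```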
Steps
1. In the style of the random coding argument, we randomly generate 2^{nR} codewords of length n from a probability distribution Q.
2. This code is revealed to the sender and receiver. It is also assumed that both know the transition matrix p(y|x) of the channel being used.
3. A message W is chosen according to the uniform distribution on the set of codewords. That is, Pr(W = w) = 2^{-nR}, w = 1, 2, \ldots, 2^{nR}.
4. The message W is sent across the channel.
5. The receiver receives a sequence according to P(y^n \mid x^n(w)) = \prod_{i=1}^{n} p(y_i \mid x_i(w)).
6. Sending these codewords across the channel, we receive Y_1^n, and decode to some source sequence if there exists exactly one codeword that is jointly typical with Y. If there are no jointly typical codewords, or if there is more than one, an error is declared. An error also occurs if a decoded codeword does not match the original codeword. This is called typical set decoding.
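The sketch below (a toy with assumed parameters n = 50, R = 0.2, p = 0.05, not the general construction in the proof) carries out these steps for a binary symmetric channel with uniform inputs, for which joint typicality essentially amounts to the fraction of disagreements between a codeword and the received word lying in a small band around p:

```python
import random

def random_coding_experiment(n=50, R=0.2, p=0.05, eps=0.1, trials=50):
    """Toy random-coding experiment over a binary symmetric channel with
    crossover probability p, using i.i.d. Bernoulli(1/2) codewords.  For this
    channel and input distribution, typical set decoding looks for the unique
    codeword whose fraction of disagreements with y lies within eps of p."""
    num_codewords = int(round(2 ** (n * R)))          # 2^{nR} codewords
    errors = 0
    for _ in range(trials):
        codebook = [[random.randint(0, 1) for _ in range(n)]
                    for _ in range(num_codewords)]
        x = codebook[0]                               # send the first message (WLOG)
        y = [xi ^ (random.random() < p) for xi in x]  # memoryless BSC output
        typical = [w for w, cw in enumerate(codebook)
                   if abs(sum(a != b for a, b in zip(cw, y)) / n - p) <= eps]
        if typical != [0]:                            # not decoded uniquely to message 0
            errors += 1
    return errors / trials

# With R = 0.2 well below C = 1 - H2(0.05) ~ 0.71, the estimated block error
# rate is already small at n = 50 and shrinks further as n grows.
print(random_coding_experiment())
```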
The probability of error of this scheme is divided into two parts:
1. First, an error can occur if no jointly typical X sequence is found for the received Y sequence.
2. Second, an error can occur if an incorrect X sequence is jointly typical with the received Y sequence.
By the randomness of the code construction, we can assume that the average probability of error does not depend on which message was sent, so without loss of generality we can assume W = 1.
From the joint AEP, we know that the probability that no jointly typical X exists goes to 0 as n grows large. We can bound this error probability by \varepsilon.
Also from the joint AEP, we know the probability that a particular X_1^n(i) and the Y_1^n resulting from W = 1 are jointly typical is at most 2^{-n(I(X;Y) - 3\varepsilon)}.
Define:

E_i = \{ (X_1^n(i), Y_1^n) \in A_\varepsilon^{(n)} \}, \quad i = 1, 2, \ldots, 2^{nR}
as the event that message i is jointly typical with the sequence received when message 1 is sent.
\begin{align}
P(\text{error}) &= P(\text{error} \mid W = 1) \le P(E_1^c) + \sum_{i=2}^{2^{nR}} P(E_i) \\
&\le P(E_1^c) + (2^{nR} - 1) \, 2^{-n(I(X;Y) - 3\varepsilon)} \\
&\le \varepsilon + 2^{-n(I(X;Y) - R - 3\varepsilon)}.
\end{align}
We can observe that as n goes to infinity, if R < I(X;Y) for the channel, the probability of error will go to 0.
Finally, given that the average codebook is shown to be "good," we know that there exists a codebook whose performance is better than the average, and so satisfies our need for arbitrarily low error probability communicating across the noisy channel.
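A small numeric check of the final bound above (with assumed illustrative values I(X;Y) = 0.5, R = 0.3, ε = 0.01, not values from the article) shows the exponential term vanishing as n grows:

```python
# Assumed illustrative values: I(X;Y) = 0.5 bits, R = 0.3, eps = 0.01.
I_xy, R, eps = 0.5, 0.3, 0.01
for n in (10, 50, 100, 500, 1000):
    bound = eps + 2 ** (-n * (I_xy - R - 3 * eps))
    print(n, bound)
# At n = 1000 the second term is 2**(-170), so the bound is essentially eps,
# and eps itself can be chosen as small as desired.
```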
Suppose a code of 2^{nR} codewords. Let W be drawn uniformly over this set as an index. Let X^n and Y^n be the transmitted codewords and received codewords, respectively.

nR = H(W) = H(W \mid Y^n) + I(W; Y^n), using identities concerning entropy and mutual information

\le H(W \mid Y^n) + I(X^n(W); Y^n), since X is a function of W

\le 1 + P_e^{(n)} nR + I(X^n(W); Y^n), using Fano's inequality

\le 1 + P_e^{(n)} nR + nC, using the fact that capacity is the maximized mutual information.
The result of these steps is that

P_e^{(n)} \ge 1 - \frac{1}{nR} - \frac{C}{R}.

As the block length n goes to infinity, P_e^{(n)} is bounded away from 0 if R is greater than C; arbitrarily low probabilities of error are achievable only if R is less than C.
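An illustrative evaluation of this bound (assuming, for the sake of example, C = 0.5 and an attempted rate R = 0.75 > C) shows the error probability staying bounded away from zero:

```python
# Assumed illustrative values: capacity C = 0.5 bits/use, rate R = 0.75 > C.
C, R = 0.5, 0.75
for n in (10, 100, 1000, 10_000):
    print(n, round(1 - 1 / (n * R) - C / R, 4))
# The lower bound rises toward 1 - C/R = 1/3: above capacity, the block error
# probability stays at least about one third no matter how long the code is.
```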
A strong converse theorem, proven by Wolfowitz in 1957,[3] states that,
P_e \geq 1 - \frac{4A}{n(R - C)^2} - e^{-\frac{n(R - C)}{2}}

for some finite positive constant A. While the weak converse states that the error probability is bounded away from zero as n goes to infinity, the strong converse states that the error goes to 1. Thus, C is a sharp threshold between perfectly reliable and completely unreliable communication.
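For comparison with the weak converse, the sketch below evaluates this bound with assumed values (the unspecified constant A is set to 1 purely for illustration, with C = 0.5 and R = 0.75 as in the previous sketch):

```python
import math

# Wolfowitz's bound with assumed values: A = 1 (illustration only), C = 0.5, R = 0.75.
A, C, R = 1.0, 0.5, 0.75
for n in (100, 1_000, 10_000, 100_000):
    bound = 1 - 4 * A / (n * (R - C) ** 2) - math.exp(-n * (R - C) / 2)
    print(n, round(bound, 4))
# Unlike the weak converse, this lower bound tends to 1: above capacity the
# error probability of any code approaches 1 as the block length grows.
```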
We assume that the channel is memoryless, but its transition probabilities change with time, in a fashion known at the transmitter as well as the receiver.
Then the channel capacity is given by
C = \liminf \max_{p_{X_1}, p_{X_2}, \ldots} \frac{1}{n} \sum_{i=1}^{n} I(X_i; Y_i).
The maximum is attained at the capacity achieving distributions for each respective channel. That is,
C = \liminf \frac{1}{n} \sum_{i=1}^{n} C_i

where C_i is the capacity of the i-th channel.
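A minimal sketch of this formula (assuming, for illustration, a channel that alternates between two binary symmetric channels; none of the numbers come from the article) averages the per-use capacities C_i:

```python
import math

def binary_entropy(p: float) -> float:
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

# Hypothetical time-varying memoryless channel: at even times it is a BSC with
# crossover 0.05, at odd times a BSC with crossover 0.2.  Each per-use capacity
# is C_i = 1 - H2(p_i), and the overall capacity is the limiting average.
crossovers = [0.05 if i % 2 == 0 else 0.2 for i in range(10_000)]
caps = [1.0 - binary_entropy(p) for p in crossovers]
print(round(sum(caps) / len(caps), 4))   # ~(0.7136 + 0.2781)/2 = 0.4958 bits/use
```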
The proof runs through in almost the same way as that of the channel coding theorem. Achievability follows from random coding with each symbol chosen randomly from the capacity achieving distribution for that particular channel. Typicality arguments use the definition of typical sets for non-stationary sources defined in the asymptotic equipartition property article.
The technicality of lim inf comes into play when \frac{1}{n} \sum_{i=1}^{n} C_i does not converge.
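To see why lim inf is needed, the following sketch (a hypothetical arrangement of per-use capacities, not taken from the article) constructs a sequence whose running average oscillates forever, so only its lim inf is well defined:

```python
# Hypothetical per-use capacities C_i for which (1/n) * sum_{i<=n} C_i does not
# converge: the channel stays "good" (C_i = 0.9) or "bad" (C_i = 0.1) for
# exponentially growing stretches, so the running average keeps oscillating.
caps = []
good, block = True, 1
while len(caps) < 300_000:
    caps.extend([0.9 if good else 0.1] * block)
    good, block = not good, block * 2

running_sum, marks = 0.0, {2 ** k for k in range(1, 19)}
for n, c in enumerate(caps, start=1):
    running_sum += c
    if n in marks:
        print(n, round(running_sum / n, 3))
# The averages oscillate between about 0.37 and 0.63 in the limit; only the
# lim inf (about 0.37 here) enters the capacity formula above.
```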