The Nyquist–Shannon sampling theorem is an essential principle for digital signal processing linking the frequency range of a signal and the sample rate required to avoid a type of distortion called aliasing. The theorem states that the sample rate must be at least twice the bandwidth of the signal to avoid aliasing. In practice, it is used to select band-limiting filters to keep aliasing below an acceptable amount when an analog signal is sampled or when sample rates are changed within a digital signal processing function.
The Nyquist–Shannon sampling theorem is a theorem in the field of signal processing which serves as a fundamental bridge between continuous-time signals and discrete-time signals. It establishes a sufficient condition for a sample rate that permits a discrete sequence of samples to capture all the information from a continuous-time signal of finite bandwidth.
Strictly speaking, the theorem only applies to a class of mathematical functions having a Fourier transform that is zero outside of a finite region of frequencies. Intuitively we expect that when one reduces a continuous function to a discrete sequence and interpolates back to a continuous function, the fidelity of the result depends on the density (or sample rate) of the original samples. The sampling theorem introduces the concept of a sample rate that is sufficient for perfect fidelity for the class of functions that are band-limited to a given bandwidth, such that no actual information is lost in the sampling process. It expresses the sufficient sample rate in terms of the bandwidth for the class of functions. The theorem also leads to a formula for perfectly reconstructing the original continuous-time function from the samples.
Perfect reconstruction may still be possible when the sample-rate criterion is not satisfied, provided other constraints on the signal are known (see below and compressed sensing). In some cases (when the sample-rate criterion is not satisfied), utilizing additional constraints allows for approximate reconstructions. The fidelity of these reconstructions can be verified and quantified utilizing Bochner's theorem.[1]
The name Nyquist–Shannon sampling theorem honours Harry Nyquist and Claude Shannon, but the theorem was also previously discovered by E. T. Whittaker (published in 1915), and Shannon cited Whittaker's paper in his work. The theorem is thus also known by the names Whittaker–Shannon sampling theorem, Whittaker–Shannon, and Whittaker–Nyquist–Shannon, and may also be referred to as the cardinal theorem of interpolation.
Sampling is a process of converting a signal (for example, a function of continuous time or space) into a sequence of values (a function of discrete time or space). Shannon's version of the theorem states:[2]
A sufficient sample rate is therefore anything larger than $2B$ samples per second. Equivalently, for a given sample rate $f_s$, perfect reconstruction is guaranteed possible for a bandlimit $B < f_s/2$.

When the bandlimit is too high (or there is no bandlimit), the reconstruction exhibits imperfections known as aliasing. Modern statements of the theorem are sometimes careful to explicitly state that $x(t)$ must contain no sinusoidal component at exactly frequency $B$, or that $B$ must be strictly less than half the sample rate. The threshold $2B$ is called the Nyquist rate and is an attribute of the continuous-time input $x(t)$ to be sampled. The sample rate must exceed the Nyquist rate for the samples to suffice to represent $x(t)$. The threshold $f_s/2$ is called the Nyquist frequency and is an attribute of the sampling equipment. All meaningful frequency components of the properly sampled $x(t)$ exist below the Nyquist frequency. The theorem is also applicable to functions of other domains, such as space in the case of a digitized image; the only change is in the units of measure attributed to $t$, $f_s$, and $B$.
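The ambiguity caused by violating the criterion can be checked numerically. The following minimal sketch (the 7 Hz and 3 Hz tones and the 10 Hz sample rate are illustrative assumptions, not values from the text) shows that a tone above half the sample rate produces exactly the same samples as a lower-frequency alias:

```python
# A 7 Hz cosine sampled at fs = 10 Hz (below its 14 Hz Nyquist rate) yields
# the same samples as a 3 Hz cosine: the 7 Hz component aliases to 10 - 7 = 3 Hz.
import numpy as np

fs = 10.0                            # sample rate in Hz (assumed)
n = np.arange(50)                    # sample indices
t = n / fs                           # sample instants

x_high = np.cos(2 * np.pi * 7 * t)   # bandlimit violated: 7 Hz > fs/2
x_low = np.cos(2 * np.pi * 3 * t)    # alias at fs - 7 = 3 Hz < fs/2

print(np.allclose(x_high, x_low))    # True: indistinguishable from the samples alone
```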
The symbol $T \triangleq 1/f_s$ is customarily used to represent the interval between samples and is called the sample period or sampling interval. The samples of function $x(t)$ are commonly denoted by $x[n] \triangleq T\cdot x(nT)$ (alternatively $x_n$ in older signal-processing literature), for all integer values of $n$. The multiplier $T$ is a result of the transition from continuous time to discrete time, and it preserves the energy of the signal as $T$ varies.
A mathematically ideal way to interpolate the sequence involves the use of sinc functions. Each sample in the sequence is replaced by a sinc function, centered on the time axis at the original location of the sample, $nT$, with the amplitude of the sinc function scaled to the sample value, $x(nT)$. Subsequently, the sinc functions are summed into a continuous function. A mathematically equivalent method is to convolve one sinc function with a series of Dirac delta pulses, weighted by the sample values. Neither method is numerically practical; instead, some finite-length approximation of the sinc functions is used, and the imperfections attributable to the approximation are known as interpolation error.
Practical digital-to-analog converters produce neither scaled and delayed sinc functions, nor ideal Dirac pulses. Instead they produce a piecewise-constant sequence of scaled and delayed rectangular pulses (the zero-order hold), usually followed by a lowpass filter (called an "anti-imaging filter") to remove spurious high-frequency replicas (images) of the original baseband signal.
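As a rough illustration of that signal chain, the sketch below builds a zero-order-hold output by repeating each sample and then applies a lowpass anti-imaging filter. The 8x oversampling factor, the Butterworth filter, and the test tone are assumptions chosen for the example, not a model of any particular converter:

```python
# Zero-order hold followed by an anti-imaging lowpass (a hedged sketch).
import numpy as np
from scipy.signal import butter, filtfilt

fs = 1000.0                          # original sample rate, Hz (assumed)
t = np.arange(0, 0.05, 1 / fs)
x = np.sin(2 * np.pi * 50 * t)       # 50 Hz test tone, well below fs/2

L = 8                                # oversampling factor of the "DAC" (assumed)
zoh = np.repeat(x, L)                # piecewise-constant (zero-order hold) output at rate L*fs

# Anti-imaging filter: lowpass at the original Nyquist frequency fs/2,
# expressed relative to the new Nyquist frequency L*fs/2.
b, a = butter(6, (fs / 2) / (L * fs / 2))
smoothed = filtfilt(b, a, zoh)       # suppresses the spectral images introduced by the hold
```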
See main article: Aliasing.
When $x(t)$ is a function with a Fourier transform $X(f)$:

$$X(f) \triangleq \int_{-\infty}^{\infty} x(t)\, e^{-i 2\pi f t}\, {\rm d}t,$$

then the samples, $x[n]$, of $x(t)$ are sufficient to create a periodic summation of $X(f)$. The result is:

$$X_{1/T}(f)\ \triangleq\ \sum_{k=-\infty}^{\infty} X\!\left(f - k f_s\right) \;=\; \sum_{n=-\infty}^{\infty} x[n]\, e^{-i 2\pi n T f},$$

which is a periodic function and its equivalent representation as a Fourier series, whose coefficients are $x[n]$.

As depicted, copies of $X(f)$ are shifted by multiples of the sampling rate $f_s = 1/T$ and combined by addition. For a band-limited function $(X(f) = 0 \text{ for all } |f| \ge B)$ and sufficiently large $f_s$, it is possible for the copies to remain distinct from each other. But if the Nyquist criterion is not satisfied, adjacent copies overlap, and it is not possible in general to discern an unambiguous $X(f)$. Any frequency component above $f_s/2$ is indistinguishable from a lower-frequency component, called an alias, associated with one of the copies. In such cases, the customary interpolation techniques produce the alias rather than the original component. When the sample rate is pre-determined by other considerations (such as an industry standard), $x(t)$ is usually filtered to reduce its high frequencies to acceptable levels before it is sampled. The type of filter required is a lowpass filter, and in this application it is called an anti-aliasing filter.
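The alias produced by overlapping spectral copies can be observed directly in the spectrum of a sampled tone. In the sketch below, the 300 Hz tone and the 400 Hz sample rate are assumed for illustration; the sampled spectrum peaks at the image frequency $f_s - 300 = 100$ Hz rather than at 300 Hz:

```python
# The periodic-spectrum picture: a tone above fs/2 shows up at its alias frequency.
import numpy as np

fs = 400.0
t = np.arange(2048) / fs
x = np.cos(2 * np.pi * 300 * t)          # 300 Hz exceeds fs/2 = 200 Hz

spectrum = np.abs(np.fft.rfft(x * np.hanning(len(x))))
freqs = np.fft.rfftfreq(len(x), d=1 / fs)
print(freqs[np.argmax(spectrum)])        # ~100 Hz: the aliased copy, not 300 Hz
```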
When there is no overlap of the copies (also known as "images") of $X(f)$, the $k = 0$ term of $X_{1/T}(f)$ can be recovered by the product:

$$X(f) = H(f) \cdot X_{1/T}(f),$$

where:

$$H(f)\ \triangleq\ \begin{cases} 1, & |f| < B \\ 0, & |f| > f_s - B. \end{cases}$$
The sampling theorem is proved since $X(f)$ uniquely determines $x(t)$.
All that remains is to derive the formula for reconstruction.
$H(f)$ need not be precisely defined in the region $[B,\ f_s - B]$ because $X_{1/T}(f)$ is zero in that region. However, the worst case is when $B = f_s/2$, the Nyquist frequency. A function that is sufficient for that and all less severe cases is:

$$H(f) = \mathrm{rect}\!\left(\frac{f}{f_s}\right) = \begin{cases} 1, & |f| < \frac{f_s}{2} \\ 0, & |f| > \frac{f_s}{2}, \end{cases}$$

where $\mathrm{rect}(\cdot)$ is the rectangular function. Therefore:

$$X(f) = \mathrm{rect}\!\left(\frac{f}{f_s}\right) \cdot X_{1/T}(f)
= \mathrm{rect}(Tf) \cdot \sum_{n=-\infty}^{\infty} T\cdot x(nT)\, e^{-i 2\pi n T f}
= \sum_{n=-\infty}^{\infty} x(nT) \cdot \underbrace{T\cdot \mathrm{rect}(Tf)\cdot e^{-i 2\pi n T f}}_{\mathcal{F}\left\{\mathrm{sinc}\left(\frac{t - nT}{T}\right)\right\}}.$$
The inverse transform of both sides produces the Whittaker–Shannon interpolation formula:
$$x(t) = \sum_{n=-\infty}^{\infty} x(nT)\cdot \mathrm{sinc}\!\left(\frac{t - nT}{T}\right),$$
which shows how the samples, $x(nT)$, can be combined to reconstruct $x(t)$.

Larger-than-necessary values of $f_s$ (smaller values of $T$), called oversampling, have no effect on the outcome of the reconstruction and have the benefit of leaving room for a transition band in which $H(f)$ is free to take intermediate values. Undersampling, which causes aliasing, is not in general a reversible operation.

Theoretically, the interpolation formula can be implemented as a lowpass filter whose impulse response is $\mathrm{sinc}(t/T)$ and whose input is

$$\sum_{n=-\infty}^{\infty} x(nT)\cdot \delta(t - nT),$$

which is a Dirac comb function modulated by the signal samples.
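A direct numerical rendering of this interpolation formula, truncated to a finite block of samples, might look like the following sketch. The 8 Hz sample rate and the 1.3 Hz test tone are assumptions chosen for illustration, and the truncation of the infinite sum introduces a small error:

```python
# Whittaker–Shannon interpolation evaluated on a finite window of samples.
import numpy as np

fs = 8.0                                      # sample rate, Hz; Nyquist frequency is 4 Hz
T = 1 / fs
n = np.arange(-64, 65)                        # finite block of sample indices
samples = np.cos(2 * np.pi * 1.3 * n * T)     # 1.3 Hz tone, well below fs/2

def reconstruct(t):
    """x(t) = sum_n x(nT) * sinc((t - nT)/T); np.sinc is the normalized sinc."""
    return np.sum(samples * np.sinc((t - n * T) / T))

t_dense = np.linspace(-2, 2, 401)             # evaluate between the sample instants
x_hat = np.array([reconstruct(t) for t in t_dense])
x_true = np.cos(2 * np.pi * 1.3 * t_dense)
print(np.max(np.abs(x_hat - x_true)))         # modest error from truncating the sum
```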
Poisson shows that the Fourier series in the equation above produces the periodic summation of $X(f)$, regardless of $f_s$ and $B$. Shannon, however, only derives the series coefficients for the case $f_s = 2B$. Virtually quoting Shannon's original paper:

Let $X(\omega)$ be the spectrum of $x(t)$. Then
$$x(t) = \frac{1}{2\pi} \int_{-\infty}^{\infty} X(\omega)\, e^{i\omega t}\, {\rm d}\omega = \frac{1}{2\pi} \int_{-2\pi B}^{2\pi B} X(\omega)\, e^{i\omega t}\, {\rm d}\omega,$$

because $X(\omega)$ is assumed to be zero outside the band $\left|\tfrac{\omega}{2\pi}\right| < B$. If we let $t = \tfrac{n}{2B}$, where $n$ is any positive or negative integer, we obtain:

$$x\!\left(\tfrac{n}{2B}\right) = \frac{1}{2\pi} \int_{-2\pi B}^{2\pi B} X(\omega)\, e^{i\omega \frac{n}{2B}}\, {\rm d}\omega.$$
On the left are values of $x(t)$ at the sampling points. The integral on the right will be recognized as essentially the $n$th coefficient in a Fourier-series expansion of the function $X(\omega)$, taking the interval $-B$ to $B$ as a fundamental period. This means that the values of the samples $x(n/2B)$ determine the Fourier coefficients in the series expansion of $X(\omega)$. Thus they determine $X(\omega)$, since $X(\omega)$ is zero for frequencies greater than $B$, and for lower frequencies $X(\omega)$ is determined if its Fourier coefficients are determined. But $X(\omega)$ determines the original function $x(t)$ completely, since a function is determined if its spectrum is known. Therefore the original samples determine the function $x(t)$ completely.
Shannon's proof of the theorem is complete at that point, but he goes on to discuss reconstruction via sinc functions, what we now call the Whittaker–Shannon interpolation formula as discussed above. He does not derive or prove the properties of the sinc function, as the Fourier pair relationship between the rect (the rectangular function) and sinc functions was well known by that time.[4]
As in the other proof, the existence of the Fourier transform of the original signal is assumed, so the proof does not say whether the sampling theorem extends to bandlimited stationary random processes.
See main article: Multidimensional sampling.
The sampling theorem is usually formulated for functions of a single variable. Consequently, the theorem is directly applicable to time-dependent signals and is normally formulated in that context. However, the sampling theorem can be extended in a straightforward way to functions of arbitrarily many variables. Grayscale images, for example, are often represented as two-dimensional arrays (or matrices) of real numbers representing the relative intensities of pixels (picture elements) located at the intersections of row and column sample locations. As a result, images require two independent variables, or indices, to specify each pixel uniquely—one for the row, and one for the column.
Color images typically consist of a composite of three separate grayscale images, one to represent each of the three primary colors—red, green, and blue, or RGB for short. Other colorspaces using 3-vectors for colors include HSV, CIELAB, XYZ, etc. Some colorspaces such as cyan, magenta, yellow, and black (CMYK) may represent color by four dimensions. All of these are treated as vector-valued functions over a two-dimensional sampled domain.
Similar to one-dimensional discrete-time signals, images can also suffer from aliasing if the sampling resolution, or pixel density, is inadequate. For example, a digital photograph of a striped shirt with high frequencies (in other words, the distance between the stripes is small), can cause aliasing of the shirt when it is sampled by the camera's image sensor. The aliasing appears as a moiré pattern. The "solution" to higher sampling in the spatial domain for this case would be to move closer to the shirt, use a higher resolution sensor, or to optically blur the image before acquiring it with the sensor using an optical low-pass filter.
Another example is shown here in the brick patterns. The top image shows the effects when the sampling theorem's condition is not satisfied. When software rescales an image (the same process that creates the thumbnail shown in the lower image) it, in effect, runs the image through a low-pass filter first and then downsamples the image to result in a smaller image that does not exhibit the moiré pattern. The top image is what happens when the image is downsampled without low-pass filtering: aliasing results.
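The effect of prefiltering before decimation can be reproduced with a synthetic striped image. In the sketch below, the 3-pixel stripe period, the 4x decimation factor, and the Gaussian blur are all illustrative assumptions; decimating without the blur leaves a strong spurious low-frequency pattern, while blurring first removes it:

```python
# Moiré/aliasing in image downsampling, shown on a synthetic striped pattern.
import numpy as np
from scipy.ndimage import gaussian_filter

xx = np.tile(np.arange(512), (512, 1))               # column index of each pixel
stripes = 0.5 + 0.5 * np.sin(2 * np.pi * xx / 3.0)   # fine vertical stripes, ~3-pixel period

naive = stripes[::4, ::4]                            # decimate without prefiltering: aliasing
prefiltered = gaussian_filter(stripes, sigma=2.0)[::4, ::4]   # blur first, then decimate

# The naive result retains strong spurious variation; the prefiltered one is
# nearly uniform, because the stripes exceed the new Nyquist limit and were removed.
print(naive.std(), prefiltered.std())
```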
The sampling theorem applies to camera systems, where the scene and lens constitute an analog spatial signal source, and the image sensor is a spatial sampling device. Each of these components is characterized by a modulation transfer function (MTF), representing the precise resolution (spatial bandwidth) available in that component. Effects of aliasing or blurring can occur when the lens MTF and sensor MTF are mismatched. When the optical image which is sampled by the sensor device contains higher spatial frequencies than the sensor can resolve, the spatial averaging over the area of each pixel (the sampling spot) acts as a low-pass filter that reduces, but does not necessarily eliminate, aliasing. When that area is not large enough to provide sufficient spatial anti-aliasing, a separate anti-aliasing filter (optical low-pass filter) may be included in a camera system to reduce the MTF of the optical image. Instead of requiring an optical filter, the graphics processing unit of smartphone cameras performs digital signal processing to suppress residual aliasing with a digital filter. Digital filters also apply sharpening to amplify the contrast from the lens at high spatial frequencies, which otherwise falls off rapidly at diffraction limits.
The sampling theorem also applies to post-processing digital images, such as to up or down sampling. Effects of aliasing, blurring, and sharpening may be adjusted with digital filtering implemented in software, which necessarily follows the theoretical principles.
To illustrate the necessity of $f_s > 2B$, consider the family of sinusoids generated by different values of $\theta$ in this formula:

$$x(t) = \frac{\cos(2\pi B t + \theta)}{\cos(\theta)} = \cos(2\pi B t) - \sin(2\pi B t)\tan(\theta), \qquad -\pi/2 < \theta < \pi/2.$$

With $f_s = 2B$ or equivalently $T = 1/(2B)$, the samples are given by:

$$x(nT) = \cos(\pi n) - \underbrace{\sin(\pi n)}_{0}\tan(\theta) = (-1)^n,$$

regardless of $\theta$.
That sort of ambiguity is the reason for the strict inequality of the sampling theorem's condition.
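This ambiguity is easy to verify numerically. The sketch below (the value $B = 5$ Hz and the test values of $\theta$ are arbitrary choices for illustration) confirms that sampling at exactly $f_s = 2B$ yields the samples $(-1)^n$ for every $\theta$:

```python
# Sampling cos(2*pi*B*t + theta)/cos(theta) at exactly fs = 2B gives (-1)^n
# for every theta in (-pi/2, pi/2), so the samples cannot distinguish them.
import numpy as np

B = 5.0
T = 1 / (2 * B)                          # critical sampling interval
n = np.arange(20)

for theta in (0.0, 0.4, 1.2):            # assumed test values of theta
    x_n = np.cos(2 * np.pi * B * n * T + theta) / np.cos(theta)
    print(np.allclose(x_n, (-1.0) ** n)) # True for each theta
```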
As discussed by Shannon:[2]
That is, for signals that do not have baseband components, there is a sufficient no-loss condition that involves the width of the non-zero frequency interval rather than its highest frequency component. See sampling for more details and examples.
For example, in order to sample FM radio signals in the frequency range of 100–102 MHz, it is not necessary to sample at 204 MHz (twice the upper frequency), but rather it is sufficient to sample at 4 MHz (twice the width of the frequency interval).
A bandpass condition is that $X(f) = 0$ for all nonnegative $f$ outside the open band of frequencies:

$$\left(\frac{N}{2} f_s,\ \frac{N+1}{2} f_s\right),$$

for some nonnegative integer $N$. This formulation includes the normal baseband condition as the case $N = 0$.

The corresponding interpolation function is the impulse response of an ideal brick-wall bandpass filter (as opposed to the ideal brick-wall lowpass filter used above) with cutoffs at the upper and lower edges of the specified band, which is the difference between a pair of lowpass impulse responses:

$$(N+1)\,\mathrm{sinc}\!\left(\frac{(N+1)t}{T}\right) - N\,\mathrm{sinc}\!\left(\frac{N t}{T}\right).$$
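A numerical illustration of bandpass sampling along the lines of the FM example above: a tone at an assumed 100.7 MHz, sampled at only 4 MHz, appears at a predictable 0.7 MHz in the sampled spectrum, so the band is shifted rather than destroyed:

```python
# Bandpass (undersampling) case: the 100-102 MHz band sampled at fs = 4 MHz.
import numpy as np

fs = 4e6                                  # 4 MHz sample rate (twice the 2 MHz bandwidth)
f0 = 100.7e6                              # assumed tone inside the 100-102 MHz band
n = np.arange(4096)
x = np.cos(2 * np.pi * f0 * n / fs)

spectrum = np.abs(np.fft.rfft(x * np.hanning(len(x))))
freqs = np.fft.rfftfreq(len(x), d=1 / fs)
print(freqs[np.argmax(spectrum)] / 1e6)   # ~0.7 (MHz): 100.7 - 25*4, an invertible shift of the band
```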
Other generalizations, for example to signals occupying multiple non-contiguous bands, are possible as well. Even the most generalized form of the sampling theorem does not have a provably true converse. That is, one cannot conclude that information is necessarily lost just because the conditions of the sampling theorem are not satisfied; from an engineering perspective, however, it is generally safe to assume that if the sampling theorem is not satisfied then information will most likely be lost.
The sampling theory of Shannon can be generalized for the case of nonuniform sampling, that is, samples not taken equally spaced in time. The Shannon sampling theory for non-uniform sampling states that a band-limited signal can be perfectly reconstructed from its samples if the average sampling rate satisfies the Nyquist condition.[5] Therefore, although uniformly spaced samples may result in easier reconstruction algorithms, uniform spacing is not a necessary condition for perfect reconstruction.
The general theory for non-baseband and nonuniform samples was developed in 1967 by Henry Landau.[6] He proved that the average sampling rate (uniform or otherwise) must be twice the occupied bandwidth of the signal, assuming it is a priori known what portion of the spectrum was occupied.
In the late 1990s, this work was partially extended to cover signals for which the amount of occupied bandwidth is known but the actual occupied portion of the spectrum is unknown.[7] In the 2000s, a complete theory was developed (see the section Sampling below the Nyquist rate under additional restrictions below) using compressed sensing. In particular, the theory, using signal-processing language, is described in a 2009 paper by Mishali and Eldar.[8] They show, among other things, that if the frequency locations are unknown, then it is necessary to sample at least at twice the rate demanded by the Nyquist criterion; in other words, you must pay at least a factor of 2 for not knowing the location of the spectrum. Note that minimum sampling requirements do not necessarily guarantee stability.
See main article: Undersampling. The Nyquist–Shannon sampling theorem provides a sufficient condition for the sampling and reconstruction of a band-limited signal. When reconstruction is done via the Whittaker–Shannon interpolation formula, the Nyquist criterion is also a necessary condition to avoid aliasing, in the sense that if samples are taken at a slower rate than twice the band limit, then there are some signals that will not be correctly reconstructed. However, if further restrictions are imposed on the signal, then the Nyquist criterion may no longer be a necessary condition.
A non-trivial example of exploiting extra assumptions about the signal is given by the recent field of compressed sensing, which allows for full reconstruction with a sub-Nyquist sampling rate. Specifically, this applies to signals that are sparse (or compressible) in some domain. As an example, compressed sensing deals with signals that may have a low overall bandwidth (say, the effective bandwidth $EB$) but whose frequency locations are unknown, rather than all together in a single band, so that the passband technique does not apply; in other words, the frequency spectrum is sparse. Traditionally, the necessary sampling rate is thus $2B$. Using compressed sensing techniques, the signal could be perfectly reconstructed if it is sampled at a rate slightly lower than $2EB$. With this approach, reconstruction is no longer given by a formula, but instead by the solution to an optimization program.
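As a toy illustration of this idea (not the specific construction of reference [8]), the sketch below recovers a sparse vector from far fewer random linear measurements than its length, using orthogonal matching pursuit; the dimensions, sparsity level, and Gaussian measurement matrix are all assumptions made for the example:

```python
# Compressed-sensing sketch: recover a 5-sparse length-256 vector from 64
# random linear measurements with orthogonal matching pursuit (OMP).
import numpy as np

rng = np.random.default_rng(0)
n, m, k = 256, 64, 5                            # ambient dimension, measurements, sparsity (assumed)

x = np.zeros(n)
support = rng.choice(n, size=k, replace=False)  # unknown sparse support
x[support] = rng.standard_normal(k)

A = rng.standard_normal((m, n)) / np.sqrt(m)    # random "sub-Nyquist" measurement matrix
y = A @ x                                       # m << n measurements of the signal

# Greedy OMP: repeatedly pick the column most correlated with the residual,
# then re-fit the measurements on the selected columns by least squares.
residual, chosen = y.copy(), []
for _ in range(k):
    chosen.append(int(np.argmax(np.abs(A.T @ residual))))
    coef, *_ = np.linalg.lstsq(A[:, chosen], y, rcond=None)
    residual = y - A[:, chosen] @ coef

x_hat = np.zeros(n)
x_hat[chosen] = coef
# Typically both True for these sizes: the support and values are recovered.
print(np.allclose(x_hat, x), sorted(chosen) == sorted(support.tolist()))
```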
Another example where sub-Nyquist sampling is optimal arises under the additional constraint that the samples are quantized in an optimal manner, as in a combined system of sampling and optimal lossy compression.[9] This setting is relevant in cases where the joint effect of sampling and quantization is to be considered, and can provide a lower bound for the minimal reconstruction error that can be attained in sampling and quantizing a random signal. For stationary Gaussian random signals, this lower bound is usually attained at a sub-Nyquist sampling rate, indicating that sub-Nyquist sampling is optimal for this signal model under optimal quantization.[10]
The sampling theorem was implied by the work of Harry Nyquist in 1928,[11] in which he showed that up to $2B$ independent pulse samples could be sent through a system of bandwidth $B$; but he did not explicitly consider the problem of sampling and reconstruction of continuous signals.
The sampling theorem, essentially a dual of Nyquist's result, was proved by Claude E. Shannon.[2] The mathematician E. T. Whittaker published similar results in 1915,[13] J. M. Whittaker in 1935,[14] and Gabor in 1946 ("Theory of communication").
In 1948 and 1949, Claude E. Shannon published the two revolutionary articles in which he founded information theory.[15] [16] [2] In Shannon 1948 the sampling theorem is formulated as "Theorem 13": Let $f(t)$ contain no frequencies over $W$. Then

$$f(t) = \sum_{n=-\infty}^{\infty} X_n \frac{\sin \pi(2Wt - n)}{\pi(2Wt - n)},$$

where $X_n = f\!\left(\frac{n}{2W}\right)$.
It was not until these articles were published that the theorem known as "Shannon's sampling theorem" became common property among communication engineers, although Shannon himself writes that this is a fact which is common knowledge in the communication art. A few lines further on, however, he adds: "but in spite of its evident importance, [it] seems not to have appeared explicitly in the literature of communication theory". Despite his sampling theorem being published at the end of the 1940s, Shannon had derived his sampling theorem as early as 1940.[17]
Others who have independently discovered or played roles in the development of the sampling theorem have been discussed in several historical articles, for example, by Jerri[18] and by Lüke.[19] For example, Lüke points out that H. Raabe, an assistant to Küpfmüller, proved the theorem in his 1939 Ph.D. dissertation; the term Raabe condition came to be associated with the criterion for unambiguous representation (sampling rate greater than twice the bandwidth). Meijering[20] mentions several other discoverers and names in a paragraph and pair of footnotes:
In Russian literature it is known as Kotelnikov's theorem, named after Vladimir Kotelnikov, who discovered it in 1933.[21]
Exactly how, when, or why Harry Nyquist had his name attached to the sampling theorem remains obscure. The term Nyquist Sampling Theorem (capitalized thus) appeared as early as 1959 in a book from his former employer, Bell Labs,[22] and appeared again in 1963,[23] and not capitalized in 1965.[24] It had been called the Shannon Sampling Theorem as early as 1954,[25] but also just the sampling theorem by several other books in the early 1950s.
In 1958, Blackman and Tukey cited Nyquist's 1928 article as a reference for the sampling theorem of information theory,[26] even though that article does not treat sampling and reconstruction of continuous signals as others did. Their glossary of terms includes these entries:
Exactly what "Nyquist's result" they are referring to remains mysterious.
When Shannon stated and proved the sampling theorem in his 1949 article, according to Meijering, "he referred to the critical sampling interval $T = \frac{1}{2W}$ as the Nyquist interval corresponding to the band $W$, in recognition of Nyquist's discovery of the fundamental importance of this interval in connection with telegraphy".
Similarly, Nyquist's name was attached to Nyquist rate in 1953 by Harold S. Black:
According to the Oxford English Dictionary, this may be the origin of the term Nyquist rate. In Black's usage, it is not a sampling rate, but a signaling rate.