In the theory of stochastic processes, the Karhunen–Loève theorem (named after Kari Karhunen and Michel Loève), also known as the Kosambi–Karhunen–Loève theorem[1] (after Damodar Dharmananda Kosambi), states that a stochastic process can be represented as an infinite linear combination of orthogonal functions, analogous to a Fourier series representation of a function on a bounded interval. The transformation is also known as the Hotelling transform and the eigenvector transform, and is closely related to the principal component analysis (PCA) technique widely used in image processing and in data analysis in many fields.[2]
There exist many such expansions of a stochastic process: if the process is indexed over [a, b], any orthonormal basis of L^2([a, b]) yields an expansion thereof in that form. The importance of the Karhunen–Loève theorem is that it yields the best such basis in the sense that it minimizes the total mean squared error.
In contrast to a Fourier series where the coefficients are fixed numbers and the expansion basis consists of sinusoidal functions (that is, sine and cosine functions), the coefficients in the Karhunen–Loève theorem are random variables and the expansion basis depends on the process. In fact, the orthogonal basis functions used in this representation are determined by the covariance function of the process. One can think that the Karhunen–Loève transform adapts to the process in order to produce the best possible basis for its expansion.
In the case of a centered stochastic process {X_t}_{t∈[a,b]} (centered means E[X_t] = 0 for all t ∈ [a, b]) satisfying a technical continuity condition, X_t admits a decomposition

X_t=\sum_{k=1}^\infty Z_k e_k(t)

where Z_k are pairwise uncorrelated random variables and the functions e_k are continuous real-valued functions on [a, b] that are pairwise orthogonal in L^2([a, b]). It is therefore sometimes said that the expansion is bi-orthogonal since the random coefficients Z_k are orthogonal in the probability space while the deterministic functions e_k are orthogonal in the time domain. The general case of a process X_t that is not centered can be brought back to the case of a centered process by considering X_t − E[X_t], which is a centered process.
Moreover, if the process is Gaussian, then the random variables Z_k are Gaussian and stochastically independent. This result generalizes the Karhunen–Loève transform. An important example of a centered real stochastic process on [0, 1] is the Wiener process; the Karhunen–Loève theorem can be used to provide a canonical orthogonal representation for it. In this case the expansion consists of sinusoidal functions.
The above expansion into uncorrelated random variables is also known as the Karhunen–Loève expansion or Karhunen–Loève decomposition. The empirical version (i.e., with the coefficients computed from a sample) is known as the Karhunen–Loève transform (KLT), principal component analysis, proper orthogonal decomposition (POD), empirical orthogonal functions (a term used in meteorology and geophysics), or the Hotelling transform.
Throughout, assume that the process satisfies:

\forall t\in[a,b]:\quad X_t\in L^2(\Omega,F,P),\text{ i.e. }\mathbf{E}[X_t^2]<\infty,

\forall t\in[a,b]:\quad \mathbf{E}[X_t]=0,

\forall t,s\in[a,b]:\quad K_X(s,t)=\mathbf{E}[X_s X_t].

The square-integrable condition \mathbf{E}[X_t^2]<\infty is logically equivalent to K_X(s,t) being finite for all s,t\in[a,b].
To the kernel K_X we associate a linear (Hilbert–Schmidt integral) operator T_{K_X} defined by

\begin{aligned}
T_{K_X}&:L^2([a,b])\to L^2([a,b])\\
(T_{K_X}f)(t)&=\int_a^b K_X(s,t)\,f(s)\,ds
\end{aligned}

Since T_{K_X} is a linear operator, it makes sense to talk about its eigenvalues λ_k and eigenfunctions e_k, which are found by solving the homogeneous Fredholm integral equation of the second kind
\int_a^b K_X(s,t)\,e_k(s)\,ds=\lambda_k\,e_k(t)
Theorem. Let X_t be a zero-mean square-integrable stochastic process defined over a probability space (\Omega,F,P) and indexed over a closed and bounded interval [a, b], with continuous covariance function K_X(s,t).
Then K_X(s,t) is a Mercer kernel and, letting e_k be an orthonormal basis of L^2([a,b]) formed by the eigenfunctions of T_{K_X} with respective eigenvalues λ_k, X_t admits the following representation
X_t=\sum_{k=1}^\infty Z_k e_k(t)

where the convergence is in L^2, uniform in t, and

Z_k=\int_a^b X_t\,e_k(t)\,dt

Furthermore, the random variables Z_k have zero mean, are uncorrelated and have variance λ_k:

\mathbf{E}[Z_k]=0,~\forall k\in\mathbb{N}\quad\text{and}\quad\mathbf{E}[Z_iZ_j]=\delta_{ij}\lambda_j,~\forall i,j\in\mathbb{N}
Note that by generalizations of Mercer's theorem we can replace the interval [a, b] with other compact spaces C and the Lebesgue measure on [a, b] with a Borel measure whose support is C.
By Mercer's theorem, the covariance admits the expansion

K_X(s,t)=\sum_{k=1}^\infty\lambda_k e_k(s)e_k(t)

and the process X_t can be expanded in terms of the eigenfunctions e_k as

X_t=\sum_{k=1}^\infty Z_k e_k(t)

where the coefficients (random variables) Z_k are given by the projection of X_t on the respective eigenfunctions

Z_k=\int_a^b X_t\,e_k(t)\,dt
\begin{align}
\mathbf{E}[Z_k]&=\mathbf{E}\left[\int_a^b X_t\,e_k(t)\,dt\right]=\int_a^b\mathbf{E}[X_t]\,e_k(t)\,dt=0\\[8pt]
\mathbf{E}[Z_iZ_j]&=\mathbf{E}\left[\int_a^b\int_a^b X_tX_s\,e_j(t)e_i(s)\,dt\,ds\right]\\
&=\int_a^b\int_a^b\mathbf{E}\left[X_tX_s\right]e_j(t)e_i(s)\,dt\,ds\\
&=\int_a^b\int_a^b K_X(s,t)\,e_j(t)e_i(s)\,dt\,ds\\
&=\int_a^b e_i(s)\left(\int_a^b K_X(s,t)\,e_j(t)\,dt\right)ds\\
&=\lambda_j\int_a^b e_i(s)\,e_j(s)\,ds\\
&=\delta_{ij}\lambda_j
\end{align}
where we have used the fact that the e_k are eigenfunctions of T_{K_X} and are orthonormal. To show convergence in L^2, let

S_N=\sum_{k=1}^N Z_k e_k(t).

Then:
\begin{align}
\mathbf{E}\left[\left|X_t-S_N\right|^2\right]&=\mathbf{E}\left[X_t^2\right]+\mathbf{E}\left[S_N^2\right]-2\mathbf{E}\left[X_tS_N\right]\\
&=K_X(t,t)+\mathbf{E}\left[\sum_{k=1}^N\sum_{\ell=1}^N Z_kZ_\ell\,e_k(t)e_\ell(t)\right]-2\mathbf{E}\left[X_t\sum_{k=1}^N Z_k e_k(t)\right]\\
&=K_X(t,t)+\sum_{k=1}^N\lambda_k e_k(t)^2-2\mathbf{E}\left[\sum_{k=1}^N\int_a^b X_tX_s\,e_k(s)e_k(t)\,ds\right]\\
&=K_X(t,t)-\sum_{k=1}^N\lambda_k e_k(t)^2
\end{align}
which goes to 0 by Mercer's theorem.
Since the limit in the mean of jointly Gaussian random variables is jointly Gaussian, and jointly Gaussian random (centered) variables are independent if and only if they are orthogonal, we can also conclude:
Theorem. The variables Z_i have a joint Gaussian distribution and are stochastically independent if the original process X_t is Gaussian.
In the Gaussian case, since the variables Z_i are independent, we can say more:

\lim_{N\to\infty}\sum_{i=1}^N e_i(t)Z_i(\omega)=X_t(\omega)

almost surely. This is a consequence of the independence of the Z_i.
In the introduction, we mentioned that the truncated Karhunen–Loève expansion was the best approximation of the original process in the sense that it minimizes the total mean-square error resulting from its truncation. Because of this property, it is often said that the KL transform optimally compacts the energy.
More specifically, given any orthonormal basis {f_k} of L^2([a,b]), we may decompose the process as:
X_t(\omega)=\sum_{k=1}^\infty A_k(\omega)f_k(t)

where

A_k(\omega)=\int_a^b X_t(\omega)f_k(t)\,dt

and we may approximate X_t by the finite sum

\hat{X}_t(\omega)=\sum_{k=1}^N A_k(\omega)f_k(t)
for some integer N.
Claim. Of all such approximations, the KL approximation is the one that minimizes the total mean square error (provided we have arranged the eigenvalues in decreasing order).
An important observation is that since the random coefficients Zk of the KL expansion are uncorrelated, the Bienaymé formula asserts that the variance of Xt is simply the sum of the variances of the individual components of the sum:
\operatorname{var}[X_t]=\sum_{k=1}^\infty e_k(t)^2\operatorname{var}[Z_k]=\sum_{k=1}^\infty\lambda_k e_k(t)^2
Integrating over [a, b] and using the orthonormality of the e_k, we obtain that the total variance of the process is:
\int_a^b\operatorname{var}[X_t]\,dt=\sum_{k=1}^\infty\lambda_k
In particular, the total variance of the N-truncated approximation is
\sum_{k=1}^N\lambda_k.
As a result, the N-truncated expansion explains

\frac{\sum_{k=1}^N\lambda_k}{\sum_{k=1}^\infty\lambda_k}

of the variance; and if we are content with an approximation that explains, say, 95% of the variance, then we just have to determine an N\in\mathbb{N} such that

\frac{\sum_{k=1}^N\lambda_k}{\sum_{k=1}^\infty\lambda_k}\geq0.95.
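As a rough numerical illustration (not part of the original text), the following Python sketch picks the smallest N reaching a given explained-variance threshold from a list of eigenvalues; the eigenvalues used here are those of the Wiener-process kernel discussed later, truncated to a finite list, and are purely illustrative.

import numpy as np

def smallest_N(eigenvalues, alpha=0.95):
    # sort eigenvalues in decreasing order and return the first N whose
    # cumulative share of the (truncated) total variance reaches alpha
    lam = np.sort(np.asarray(eigenvalues))[::-1]
    explained = np.cumsum(lam) / lam.sum()
    return int(np.searchsorted(explained, alpha) + 1)

k = np.arange(1, 2000)
lam = 1.0 / ((k - 0.5) * np.pi) ** 2     # illustrative spectrum, lambda_k = ((k - 1/2) pi)^-2
print(smallest_N(lam, alpha=0.95))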
Given a representation of

X_t=\sum_{k=1}^\infty W_k\varphi_k(t)

for some orthonormal basis \varphi_k(t) and random variables W_k, we let

p_k=\mathbf{E}\left[|W_k|^2\right]/\mathbf{E}\left[|X_t|_{L^2}^2\right],

so that

\sum_{k=1}^\infty p_k=1.

We then define the representation entropy to be

H(\{\varphi_k\})=-\sum_k p_k\log(p_k).

Then we have

H(\{\varphi_k\})\ge H(\{e_k\}),

for all choices of orthonormal basis \varphi_k(t).
Proof:
Denote the coefficients obtained for the basis e_k(t) as p_k, and for \varphi_k(t) as q_k.
Choose N\ge1. Note that since the e_k minimize the mean squared error, we have

\mathbf{E}\left|\sum_{k=1}^N Z_k e_k(t)-X_t\right|_{L^2}^2\le\mathbf{E}\left|\sum_{k=1}^N W_k\varphi_k(t)-X_t\right|_{L^2}^2
Expanding the right-hand side, we get:

\mathbf{E}\left|\sum_{k=1}^N W_k\varphi_k(t)-X_t\right|_{L^2}^2=\mathbf{E}|X_t|_{L^2}^2+\sum_{k=1}^N\sum_{\ell=1}^N\mathbf{E}\left[W_\ell\varphi_\ell(t)W_k^*\varphi_k^*(t)\right]_{L^2}-\sum_{k=1}^N\mathbf{E}\left[W_k\varphi_k X_t^*\right]_{L^2}-\sum_{k=1}^N\mathbf{E}\left[X_tW_k^*\varphi_k^*(t)\right]_{L^2}

Using the orthonormality of \varphi_k(t), and expanding X_t in the \varphi_k(t) basis, we get that the above quantity is equal to

\mathbf{E}|X_t|_{L^2}^2-\sum_{k=1}^N\mathbf{E}\left[|W_k|^2\right].
We may perform an identical analysis for the e_k(t), and so rewrite the above inequality as

\mathbf{E}|X_t|_{L^2}^2-\sum_{k=1}^N\mathbf{E}\left[|Z_k|^2\right]\le\mathbf{E}|X_t|_{L^2}^2-\sum_{k=1}^N\mathbf{E}\left[|W_k|^2\right].

Subtracting the common first term, and dividing by \mathbf{E}\left[|X_t|_{L^2}^2\right], we obtain that

\sum_{k=1}^N p_k\ge\sum_{k=1}^N q_k.
This implies that:
-\sum_{k=1}^\infty p_k\log(p_k)\le-\sum_{k=1}^\infty q_k\log(q_k)
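As an informal numerical check (my own illustration, not from the source), the sketch below compares the representation entropy of the eigenvector (KL) coefficients with that of the canonical-basis coefficients for a synthetic covariance matrix; the specific matrix is arbitrary.

import numpy as np

def entropy(p):
    # representation entropy H = -sum p_k log p_k, ignoring zero weights
    p = p[p > 0]
    return -np.sum(p * np.log(p))

rng = np.random.default_rng(0)
A = rng.standard_normal((8, 8))
Sigma = A @ A.T                                   # a synthetic covariance matrix

lam = np.linalg.eigvalsh(Sigma)                   # coefficient energies in the KL basis
p_kl = lam / lam.sum()
p_canonical = np.diag(Sigma) / np.trace(Sigma)    # coefficient energies in the canonical basis

print(entropy(p_kl), "<=", entropy(p_canonical))  # consistent with the inequality above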
Consider a whole class of signals we want to approximate over the first M vectors of a basis. These signals are modeled as realizations of a random vector Y[n] of size N. To optimize the approximation we design a basis that minimizes the average approximation error. This section proves that optimal bases are Karhunen–Loève bases that diagonalize the covariance matrix of Y. The random vector Y can be decomposed in an orthogonal basis

\left\{g_m\right\}_{0\le m<N}

as follows:

Y=\sum_{m=0}^{N-1}\left\langle Y,g_m\right\rangle g_m,
where each

\left\langle Y,g_m\right\rangle=\sum_{n=0}^{N-1}Y[n]\,g_m^*[n]

is a random variable. The approximation from the first M vectors of the basis is
Y_M=\sum_{m=0}^{M-1}\left\langle Y,g_m\right\rangle g_m
The energy conservation in an orthogonal basis implies
\varepsilon[M]=\mathbf{E}\left\{\left\|Y-Y_M\right\|^2\right\}=\sum_{m=M}^{N-1}\mathbf{E}\left\{\left|\left\langle Y,g_m\right\rangle\right|^2\right\}
This error is related to the covariance of Y defined by

R[n,m]=\mathbf{E}\left\{Y[n]\,Y^*[m]\right\}
For any vector x[n] we denote by K the covariance operator represented by this matrix; thus

\mathbf{E}\left\{\left|\langle Y,x\rangle\right|^2\right\}=\langle Kx,x\rangle=\sum_{n=0}^{N-1}\sum_{m=0}^{N-1}R[n,m]\,x[n]\,x^*[m]
The error \varepsilon[M] is therefore a sum of the last N-M coefficients of the covariance operator:

\varepsilon[M]=\sum_{m=M}^{N-1}\left\langle Kg_m,g_m\right\rangle
The covariance operator K is Hermitian and positive and is thus diagonalized in an orthogonal basis called a Karhunen–Loève basis. The following theorem states that a Karhunen–Loève basis is optimal for linear approximations.
Theorem (Optimality of Karhunen–Loève basis). Let K be a covariance operator. For all M\ge1, the approximation error
\varepsilon[M]=\sum_{m=M}^{N-1}\left\langle Kg_m,g_m\right\rangle
is minimum if and only if
\left\{g_m\right\}_{0\le m<N}
is a Karhunen–Loève basis ordered by decreasing eigenvalues:

\left\langle Kg_m,g_m\right\rangle\ge\left\langle Kg_{m+1},g_{m+1}\right\rangle,\qquad 0\le m<N-1.
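The following small numerical check (my own illustration, with an arbitrary synthetic covariance) is consistent with this optimality: the eigenvector basis ordered by decreasing eigenvalue gives a linear approximation error no larger than an arbitrary orthonormal basis.

import numpy as np

rng = np.random.default_rng(1)
N, M = 16, 4
A = rng.standard_normal((N, N))
K = A @ A.T                                         # synthetic covariance operator (matrix)

def approx_error(K, basis, M):
    # columns of `basis` are g_0 .. g_{N-1}; sum the energy on the discarded vectors
    return sum(basis[:, m] @ K @ basis[:, m] for m in range(M, K.shape[0]))

eigvals, eigvecs = np.linalg.eigh(K)                # ascending eigenvalues
kl_basis = eigvecs[:, ::-1]                         # reorder by decreasing eigenvalue
other_basis, _ = np.linalg.qr(rng.standard_normal((N, N)))  # some other orthonormal basis

print(approx_error(K, kl_basis, M), "<=", approx_error(K, other_basis, M))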
Linear approximations project the signal on M vectors a priori. The approximation can be made more precise by choosing the M orthogonal vectors depending on the signal properties. This section analyzes the general performance of these non-linear approximations. A signal f\in H is approximated with M vectors selected adaptively in an orthonormal basis for H,

B=\left\{g_m\right\}_{m\in\mathbb{N}}.

Let f_M be the approximation of f from the M vectors whose indices are in I_M:

f_M=\sum_{m\in I_M}\left\langle f,g_m\right\rangle g_m
The approximation error is the sum of the remaining coefficients
\varepsilon[M]=\left\|f-f_M\right\|^2=\sum_{m\notin I_M}\left|\left\langle f,g_m\right\rangle\right|^2
To minimize this error, the indices in I_M must correspond to the M vectors having the largest inner product amplitude

\left|\left\langle f,g_m\right\rangle\right|.

These are the vectors that best correlate with f. They can thus be interpreted as the main features of f. The resulting error is necessarily smaller than the error of a linear approximation which selects the M approximation vectors independently of f. Let us sort
\left\{\left|\left\langle f,g_m\right\rangle\right|\right\}_{m\in\mathbb{N}}

in decreasing order:

\left|\left\langle f,g_{m_k}\right\rangle\right|\ge\left|\left\langle f,g_{m_{k+1}}\right\rangle\right|.
The best non-linear approximation is
f_M=\sum_{k=1}^M\left\langle f,g_{m_k}\right\rangle g_{m_k}
It can also be written as inner product thresholding:
f_M=\sum_{m=0}^\infty\theta_T\left(\left\langle f,g_m\right\rangle\right)g_m

with

T=\left|\left\langle f,g_{m_M}\right\rangle\right|,\qquad\theta_T(x)=\begin{cases}x&|x|\ge T\\0&|x|<T\end{cases}
The non-linear error is
\varepsilon[M]=\left\|f-f_M\right\|^2=\sum_{k=M+1}^\infty\left|\left\langle f,g_{m_k}\right\rangle\right|^2
This error goes quickly to zero as M increases if the sorted values of \left|\left\langle f,g_{m_k}\right\rangle\right| have a fast decay as k increases. This decay is quantified by computing the \ell^p norm of the signal inner products in B:

\|f\|_{B,p}=\left(\sum_{m=0}^\infty\left|\left\langle f,g_m\right\rangle\right|^p\right)^{1/p}

The following theorem relates the decay of \varepsilon[M] to \|f\|_{B,p}.
Theorem (decay of error). If \|f\|_{B,p}<\infty with p<2, then

\varepsilon[M]\le\frac{\|f\|_{B,p}^2}{\frac{2}{p}-1}M^{1-\frac{2}{p}}

and

\varepsilon[M]=o\left(M^{1-\frac{2}{p}}\right).

Conversely, if \varepsilon[M]=o\left(M^{1-\frac{2}{p}}\right), then \|f\|_{B,q}<\infty for q>p.
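A minimal sketch of M-term non-linear approximation by keeping the largest coefficients (my own illustration; the coefficient vector and the choice of basis are arbitrary):

import numpy as np

def nonlinear_approx(coeffs, M):
    # keep the M coefficients of largest amplitude, zero out the rest
    idx = np.argsort(np.abs(coeffs))[::-1][:M]
    kept = np.zeros_like(coeffs)
    kept[idx] = coeffs[idx]
    return kept

f = np.array([0.0, 3.0, -0.1, 0.05, -2.5, 0.01, 0.0, 0.2])   # coefficients <f, g_m>
fM = nonlinear_approx(f, M=2)
error = np.sum((f - fM) ** 2)            # sum of the discarded |<f, g_m>|^2
print(fM, error)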
To further illustrate the differences between linear and non-linear approximations, we study the decomposition of a simple non-Gaussian random vector in a Karhunen–Loève basis. Processes whose realizations have a random translation are stationary. The Karhunen–Loève basis is then a Fourier basis and we study its performance. To simplify the analysis, consider a random vector Y[n] of size N that is a random shift modulo N of a deterministic signal f[n] of zero mean,

\sum_{n=0}^{N-1}f[n]=0:

Y[n]=f[(n-P)\bmod N]
The random shift P is uniformly distributed on [0, N − 1]:

\Pr(P=p)=\frac{1}{N},\qquad 0\le p<N
Clearly
\mathbf{E}\{Y[n]\}=\frac{1}{N}\sum_{p=0}^{N-1}f[(n-p)\bmod N]=0
and
R[n,k]=\mathbf{E}\{Y[n]Y[k]\}=\frac{1}{N}\sum_{p=0}^{N-1}f[(n-p)\bmod N]\,f[(k-p)\bmod N]=\frac{1}{N}(f\circledast\bar{f})[n-k],\qquad\bar{f}[n]=f[-n],

where \circledast denotes circular convolution.
Hence
R[n,k]=R_Y[n-k],\qquad R_Y[k]=\frac{1}{N}(f\circledast\bar{f})[k]
Since RY is N periodic, Y is a circular stationary random vector. The covariance operator is a circular convolution with RY and is therefore diagonalized in the discrete Fourier Karhunen–Loève basis
\left\{\frac{1}{\sqrt{N}}\exp\left(\frac{i2\pi mn}{N}\right)\right\}_{0\le m<N}.

The power spectrum is the Fourier transform of R_Y:

P_Y[m]=\hat{R}_Y[m]=\frac{1}{N}\left|\hat{f}[m]\right|^2
Example: Consider an extreme case where f[n]=\delta[n]-\delta[n-1]. The optimality theorem above guarantees that the Fourier Karhunen–Loève basis produces a smaller expected linear approximation error than the canonical basis of Diracs \left\{g_m[n]=\delta[n-m]\right\}_{0\le m<N}. Indeed, the expected energy of the Fourier coefficients is

\mathbf{E}\left\{\left|\left\langle Y[n],\tfrac{1}{\sqrt{N}}e^{i2\pi mn/N}\right\rangle\right|^2\right\}=P_Y[m]=\frac{4}{N}\sin^2\left(\frac{\pi m}{N}\right).

Selecting higher-frequency Fourier coefficients thus yields a better mean-square approximation than choosing a priori a few Dirac vectors to perform the approximation. The situation is totally different for non-linear approximations: if f[n]=\delta[n]-\delta[n-1], then each realization of Y has only two non-zero coefficients in the Dirac basis, so a non-linear approximation that selects them gives zero error with M=2 vectors.
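A short Python sketch (my own illustration, with illustrative sizes) confirming that the covariance of such a randomly shifted signal is circulant and is diagonalized by the discrete Fourier vectors:

import numpy as np

N = 8
f = np.zeros(N); f[0], f[1] = 1.0, -1.0          # f[n] = delta[n] - delta[n-1]

# Covariance R[n,k] = (1/N) sum_p f[(n-p) mod N] f[(k-p) mod N]
R = np.zeros((N, N))
for n in range(N):
    for k in range(N):
        R[n, k] = np.mean([f[(n - p) % N] * f[(k - p) % N] for p in range(N)])

# Columns of F are the vectors (1/sqrt(N)) exp(i 2 pi m n / N)
F = np.exp(2j * np.pi * np.outer(np.arange(N), np.arange(N)) / N) / np.sqrt(N)

D = F.conj().T @ R @ F                            # should be numerically diagonal
print(np.allclose(D, np.diag(np.diag(D))))        # True: the Fourier basis diagonalizes R
print(np.real(np.diag(D)))                        # the eigenvalues, i.e. the power spectrum P_Y[m]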
See main article: Principal component analysis.
We have established the Karhunen–Loève theorem and derived a few properties thereof. We also noted that one hurdle in its application was the numerical cost of determining the eigenvalues and eigenfunctions of its covariance operator through the Fredholm integral equation of the second kind
\int_a^b K_X(s,t)\,e_k(s)\,ds=\lambda_k\,e_k(t).
However, when applied to a discrete and finite process \left(X_n\right)_{n\in\{1,\ldots,N\}}, the problem takes a much simpler form and standard algebra can be used to carry out the calculations.
Note that a continuous process can also be sampled at N points in time in order to reduce the problem to a finite version.
We henceforth consider a random N-dimensional vector

X=\left(X_1~X_2~\ldots~X_N\right)^T.

As in the continuous version, we assume that X is centered, otherwise we can let X:=X-\mu_X (where \mu_X is the mean vector of X), which is centered.
Let us adapt the procedure to the discrete case.
Recall that the main implication and difficulty of the KL transformation is computing the eigenvectors of the linear operator associated to the covariance function, which are given by the solutions to the integral equation written above.
Define Σ, the covariance matrix of X, as an N × N matrix whose elements are given by:
\Sigma_{ij}=\mathbf{E}[X_iX_j],\qquad\forall i,j\in\{1,\ldots,N\}
Rewriting the above integral equation to suit the discrete case, we observe that it turns into:
\sum_{j=1}^N\Sigma_{ij}e_j=\lambda e_i\quad\Leftrightarrow\quad\Sigma e=\lambda e

where

e=\left(e_1~e_2~\ldots~e_N\right)^T.
The integral equation thus reduces to a simple matrix eigenvalue problem, which explains why the PCA has such a broad domain of applications.
Since Σ is a positive definite symmetric matrix, it possesses a set of orthonormal eigenvectors forming a basis of \R^N, and we write \{\lambda_i,\varphi_i\}_{i\in\{1,\ldots,N\}} for this set of eigenvalues and corresponding eigenvectors, listed in decreasing values of \lambda_i. Let Φ be the orthonormal matrix whose columns are these eigenvectors:

\begin{align}\Phi&:=\left(\varphi_1~\varphi_2~\ldots~\varphi_N\right)\\ \Phi^T\Phi&=I\end{align}
It remains to perform the actual KL transformation, called the principal component transform in this case. Recall that the transform was found by expanding the process with respect to the basis spanned by the eigenvectors of the covariance function. In this case, we hence have:
X=\sum_{i=1}^N\langle\varphi_i,X\rangle\varphi_i=\sum_{i=1}^N\left(\varphi_i^TX\right)\varphi_i
In a more compact form, the principal component transform of X is defined by:
\begin{cases}Y=\Phi^TX\\ X=\Phi Y\end{cases}
The i-th component of Y is Y_i=\varphi_i^TX, the projection of X on \varphi_i, and the inverse transform X=\Phi Y yields the expansion of X on the space spanned by the \varphi_i:

X=\sum_{i=1}^N Y_i\varphi_i=\sum_{i=1}^N\langle\varphi_i,X\rangle\varphi_i
As in the continuous case, we may reduce the dimensionality of the problem by truncating the sum at some K\in\{1,\ldots,N\} such that

\frac{\sum_{i=1}^K\lambda_i}{\sum_{i=1}^N\lambda_i}\geq\alpha

where α is the explained variance threshold we wish to set.
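A compact Python sketch of the discrete transform as described above (my own illustration; the synthetic data, the diagonal covariance used to generate it, and the threshold α are arbitrary choices):

import numpy as np

rng = np.random.default_rng(3)
samples = rng.multivariate_normal(mean=np.zeros(5),
                                  cov=np.diag([5.0, 2.0, 1.0, 0.2, 0.05]),
                                  size=2000)              # rows are realizations of X
X = samples - samples.mean(axis=0)                        # center the data

Sigma = X.T @ X / (X.shape[0] - 1)                        # sample covariance matrix
lam, Phi = np.linalg.eigh(Sigma)                          # ascending eigenvalues
lam, Phi = lam[::-1], Phi[:, ::-1]                        # reorder: decreasing eigenvalues

alpha = 0.95
K = int(np.searchsorted(np.cumsum(lam) / lam.sum(), alpha) + 1)

Y = X @ Phi                                               # principal component (KL) transform
X_hat = Y[:, :K] @ Phi[:, :K].T                           # K-term reconstruction
print(K, np.mean(np.sum((X - X_hat) ** 2, axis=1)))       # mean squared truncation error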
We can also reduce the dimensionality through the use of multilevel dominant eigenvector estimation (MDEE).[5]
There are numerous equivalent characterizations of the Wiener process which is a mathematical formalization of Brownian motion. Here we regard it as the centered standard Gaussian process Wt with covariance function
K_W(t,s)=\operatorname{cov}(W_t,W_s)=\min(s,t).
We restrict the time domain to [a, b] = [0, 1] without loss of generality.
The eigenvectors of the covariance kernel are easily determined. These are
e_k(t)=\sqrt{2}\sin\left(\left(k-\tfrac{1}{2}\right)\pi t\right),\qquad\lambda_k=\frac{1}{\left(k-\tfrac{1}{2}\right)^2\pi^2}.
Theorem. There is a sequence (Z_i)_i of independent Gaussian random variables with mean zero and variance 1 such that

W_t=\sqrt{2}\sum_{k=1}^\infty Z_k\frac{\sin\left(\left(k-\tfrac{1}{2}\right)\pi t\right)}{\left(k-\tfrac{1}{2}\right)\pi}.

Note that this representation is only valid for t\in[0,1].
Similarly the Brownian bridge B_t=W_t-tW_1, which is a stochastic process with covariance function K_B(t,s)=\min(t,s)-ts, can be represented as the series

B_t=\sum_{k=1}^\infty Z_k\frac{\sqrt{2}\sin(k\pi t)}{k\pi}
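As a sanity check (my own illustration, with arbitrary grid, truncation level and sample size), one can simulate Wiener paths through the truncated expansion above and verify that the empirical covariance approaches min(s,t):

import numpy as np

rng = np.random.default_rng(4)
t = np.linspace(0.0, 1.0, 101)
K, n_paths = 300, 5000
k = np.arange(1, K + 1)[:, None]                              # mode indices as a column

# rows are sqrt(lambda_k) * e_k(t) = sqrt(2) sin((k - 1/2) pi t) / ((k - 1/2) pi)
modes = np.sqrt(2.0) * np.sin((k - 0.5) * np.pi * t) / ((k - 0.5) * np.pi)
Z = rng.standard_normal((n_paths, K))
W = Z @ modes                                                 # each row is a truncated K-L path

emp_cov = W.T @ W / n_paths                                   # empirical E[W_s W_t]
true_cov = np.minimum(t[:, None], t[None, :])                 # min(s, t)
print(np.max(np.abs(emp_cov - true_cov)))                     # small (Monte Carlo + truncation error)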
Adaptive optics systems sometimes use K–L functions to reconstruct wave-front phase information (Dai 1996, JOSA A). The Karhunen–Loève expansion is closely related to the singular value decomposition (SVD). The latter has myriad applications in image processing, radar, seismology, and the like. If one has independent vector observations from a vector-valued stochastic process, then the left singular vectors are maximum likelihood estimates of the ensemble KL expansion.
In communication, we usually have to decide whether a signal from a noisy channel contains valuable information. The following hypothesis test is used to detect a continuous signal s(t) from the channel output X(t), where N(t) is the channel noise, usually assumed to be a zero-mean Gaussian process with correlation function
RN(t,s)=E[N(t)N(s)]
H:X(t)=N(t),
K:X(t)=N(t)+s(t), t\in(0,T)
When the channel noise is white, its correlation function is
RN(t)=\tfrac{1}{2}N0\delta(t),
and it has a constant power spectral density. In a physically practical channel, the noise power is finite, so:

S_N(f)=\begin{cases}\frac{N_0}{2}&|f|<w\\0&|f|>w\end{cases}
Then the noise correlation function is a sinc function with zeros at \frac{n}{2\omega},\ n\in\mathbb{Z}. Since uncorrelated Gaussian samples are independent, we can take samples from X(t) with time spacing

\Delta t=\frac{n}{2\omega}\text{ within }(0,T).
Let X_i=X(i\,\Delta t). We have

n=\frac{T}{\Delta t}=T(2\omega)=2\omega T

observations \{X_1,X_2,\ldots,X_n\} with independent noise samples, which we use to develop the likelihood-ratio test. Define the signal samples S_i=S(i\,\Delta t), so the problem becomes:

H:X_i=N_i,

K:X_i=N_i+S_i,\qquad i=1,2,\ldots,n.
The log-likelihood ratio
\mathcal{L}(\underline{x})=\frac{\sum_{i=1}^n\left(2S_ix_i-S_i^2\right)}{2\sigma^2}\quad\Leftrightarrow\quad\Delta t\sum_{i=1}^n S_ix_i=\sum_{i=1}^n S(i\,\Delta t)\,x(i\,\Delta t)\,\Delta t\gtrless\lambda_2
As \Delta t\to0, let:

G=\int_0^T S(t)x(t)\,dt.
Then G is the test statistic and the Neyman–Pearson optimum detector is

G(\underline{x})>G_0\Rightarrow K;\qquad G(\underline{x})<G_0\Rightarrow H.

As G is Gaussian, we can characterize it by finding its mean and variance. Then we get
H:G\simN\left(0,\tfrac{1}{2}N0E\right)
K:G\simN\left(E,\tfrac{1}{2}N0E\right)
where
E=\int_0^T S^2(t)\,dt
is the signal energy.
The false alarm error is

\alpha=\int_{G_0}^\infty N\left(0,\tfrac{1}{2}N_0E\right)dG\quad\Rightarrow\quad G_0=\sqrt{\tfrac{1}{2}N_0E}\,\Phi^{-1}(1-\alpha)
And the probability of detection:
\beta=\int_{G_0}^\infty N\left(E,\tfrac{1}{2}N_0E\right)dG=1-\Phi\left(\frac{G_0-E}{\sqrt{\tfrac{1}{2}N_0E}}\right)=\Phi\left(\sqrt{\frac{2E}{N_0}}-\Phi^{-1}(1-\alpha)\right)
where Φ is the cdf of standard normal, or Gaussian, variable.
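A minimal Python sketch of this correlator detector (my own illustration; the signal s(t), the noise level N_0, the grid and α are all arbitrary choices, and scipy is assumed to be available for the normal cdf/quantile):

import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(5)
T, n = 1.0, 1000
t = np.linspace(0.0, T, n, endpoint=False)
dt = T / n

S = np.sin(2 * np.pi * 5 * t)                            # known signal s(t)
E = np.sum(S ** 2) * dt                                  # signal energy, E = int s(t)^2 dt
N0, alpha = 0.1, 0.01

G0 = np.sqrt(0.5 * N0 * E) * norm.ppf(1 - alpha)         # threshold from the false-alarm level
beta = norm.cdf(np.sqrt(2 * E / N0) - norm.ppf(1 - alpha))   # predicted detection probability

# one simulated channel output under K: X(t) = N(t) + s(t)
noise = rng.standard_normal(n) * np.sqrt(N0 / (2 * dt))  # discrete white noise of level N0/2
x = noise + S
G = np.sum(S * x) * dt                                   # test statistic G = int s(t) x(t) dt
print(G > G0, beta)                                      # decide K if G > G0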
When N(t) is colored (correlated in time) Gaussian noise with zero mean and covariance function R_N(t,s)=E[N(t)N(s)], it can be expanded as

N(t)=\sum_{i=1}^\infty N_i\Phi_i(t),\qquad 0<t<T,
where

N_i=\int N(t)\Phi_i(t)\,dt

and \{\Phi_i(t)\} are the orthonormal eigenfunctions of R_N(t,s), i.e. solutions of

\int_0^T R_N(t,s)\Phi_i(s)\,ds=\lambda_i\Phi_i(t),\qquad\operatorname{var}[N_i]=\lambda_i.
Expand the signal S(t) in the same basis:

S(t)=\sum_{i=1}^\infty S_i\Phi_i(t),
where
S_i=\int_0^T S(t)\Phi_i(t)\,dt
X_i=\int_0^T X(t)\Phi_i(t)\,dt=N_i

under H and X_i=N_i+S_i under K.
Write \overline{X}=\{X_1,X_2,\ldots\}; the noise coefficients N_i are independent Gaussian random variables with variance \lambda_i.

Under H, the \{X_i\} are independent zero-mean Gaussian variables, so

f_H[x(t)\mid0<t<T]=f_H(\underline{x})=\prod_{i=1}^\infty\frac{1}{\sqrt{2\pi\lambda_i}}\exp\left(-\frac{x_i^2}{2\lambda_i}\right)
Under K, the \{X_i-S_i\} are independent zero-mean Gaussian variables, so

f_K[x(t)\mid0<t<T]=f_K(\underline{x})=\prod_{i=1}^\infty\frac{1}{\sqrt{2\pi\lambda_i}}\exp\left(-\frac{(x_i-S_i)^2}{2\lambda_i}\right)
Hence, the log-LR is given by
\mathcal{L}(\underline{x})=\sum_{i=1}^\infty\frac{2S_ix_i-S_i^2}{2\lambda_i}
and the optimum detector is
G=\sum_{i=1}^\infty\frac{S_ix_i}{\lambda_i}>G_0\Rightarrow K,\qquad <G_0\Rightarrow H.
Define
k(t)=\sum_{i=1}^\infty\frac{S_i}{\lambda_i}\Phi_i(t),\qquad 0<t<T,
then
G=\int_0^T k(t)x(t)\,dt.
Since
\int_0^T R_N(t,s)k(s)\,ds=\sum_{i=1}^\infty\frac{S_i}{\lambda_i}\int_0^T R_N(t,s)\Phi_i(s)\,ds=\sum_{i=1}^\infty S_i\Phi_i(t)=S(t),
k(t) is the solution to
\int_0^T R_N(t,s)k(s)\,ds=S(t).
If N(t) is wide-sense stationary,

\int_0^T R_N(t-s)k(s)\,ds=S(t),
which is known as the Wiener–Hopf equation. The equation can be solved by taking the Fourier transform, but this is not practically realizable since an infinite spectrum needs spectral factorization. A special case in which k(t) is easy to calculate is white Gaussian noise:
\int_0^T\frac{N_0}{2}\delta(t-s)k(s)\,ds=S(t)\quad\Rightarrow\quad k(t)=CS(t),\qquad 0<t<T.
The corresponding impulse response is h(t) = k(T − t) = CS(T − t). Letting C = 1, this is just the result we arrived at in the previous section for detecting a signal in white noise.
Since X(t) is a Gaussian process,
G=\int_0^T k(t)x(t)\,dt,
is a Gaussian random variable that can be characterized by its mean and variance.
\begin{align}
\mathbf{E}[G\mid H]&=\int_0^T k(t)\mathbf{E}[x(t)\mid H]\,dt=0\\
\mathbf{E}[G\mid K]&=\int_0^T k(t)\mathbf{E}[x(t)\mid K]\,dt=\int_0^T k(t)S(t)\,dt\equiv\rho\\
\mathbf{E}[G^2\mid H]&=\int_0^T\int_0^T k(t)k(s)R_N(t,s)\,dt\,ds=\int_0^T k(t)\left(\int_0^T k(s)R_N(t,s)\,ds\right)dt=\int_0^T k(t)S(t)\,dt=\rho\\
\operatorname{var}[G\mid H]&=\mathbf{E}[G^2\mid H]-(\mathbf{E}[G\mid H])^2=\rho\\
\mathbf{E}[G^2\mid K]&=\int_0^T\int_0^T k(t)k(s)\mathbf{E}[x(t)x(s)]\,dt\,ds=\int_0^T\int_0^T k(t)k(s)\left(R_N(t,s)+S(t)S(s)\right)dt\,ds=\rho+\rho^2\\
\operatorname{var}[G\mid K]&=\mathbf{E}[G^2\mid K]-(\mathbf{E}[G\mid K])^2=\rho+\rho^2-\rho^2=\rho
\end{align}
Hence, we obtain the distributions of G under H and K:
H:G\simN(0,\rho)
K:G\simN(\rho,\rho)
The false alarm error is
\alpha=\int_{G_0}^\infty N(0,\rho)\,dG=1-\Phi\left(\frac{G_0}{\sqrt{\rho}}\right)
So the test threshold for the Neyman–Pearson optimum detector is
G_0=\sqrt{\rho}\,\Phi^{-1}(1-\alpha).
Its power of detection is
\beta=\int_{G_0}^\infty N(\rho,\rho)\,dG=\Phi\left(\sqrt{\rho}-\Phi^{-1}(1-\alpha)\right)
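A hedged Python sketch of this colored-noise detector (my own illustration): the integral equation \int_0^T R_N(t,s)k(s)\,ds=S(t) is discretized on a grid and solved as a linear system, which is only a crude approximation; the covariance model, signal and parameters are arbitrary, and scipy is assumed for the normal cdf/quantile.

import numpy as np
from scipy.stats import norm

T, n = 1.0, 400
t = np.linspace(0.0, T, n, endpoint=False)
dt = T / n

B, N0 = 20.0, 0.1
R = (B * N0 / 4) * np.exp(-B * np.abs(t[:, None] - t[None, :]))   # R_N(t, s)
S = np.sin(2 * np.pi * 3 * t)                                      # known signal S(t)

k = np.linalg.solve(R * dt, S)         # discretized integral equation: (R dt) k = S
rho = np.sum(k * S) * dt               # rho = int k(t) S(t) dt

alpha = 0.01
G0 = np.sqrt(rho) * norm.ppf(1 - alpha)                  # Neyman-Pearson threshold
beta = norm.cdf(np.sqrt(rho) - norm.ppf(1 - alpha))      # power of detection
print(rho, G0, beta)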
When the noise is a white Gaussian process, the signal power is

\rho=\int_0^T k(t)S(t)\,dt=\int_0^T S(t)^2\,dt=E.
For some types of colored noise, a typical practice is to add a prewhitening filter before the matched filter to transform the colored noise into white noise. For example, suppose N(t) is a wide-sense stationary colored noise with correlation function

R_N(\tau)=\frac{BN_0}{4}e^{-B|\tau|},

whose power spectral density is

S_N(f)=\frac{N_0/2}{1+(w/B)^2}.
The transfer function of prewhitening filter is
H(f)=1+j\frac{w}{B}.
When the signal we want to detect from the noisy channel is also random, for example a white Gaussian process X(t), we can still implement the K–L expansion to get an independent sequence of observations. In this case, the detection problem is described as follows:
H0:Y(t)=N(t)
H1:Y(t)=N(t)+X(t), 0<t<T.
X(t) is a random process with correlation function
RX(t,s)=E\{X(t)X(s)\}
The K–L expansion of X(t) is
X(t)=\sum_{i=1}^\infty X_i\Phi_i(t),
where
X_i=\int_0^T X(t)\Phi_i(t)\,dt
and \Phi_i(t) are solutions to

\int_0^T R_X(t,s)\Phi_i(s)\,ds=\lambda_i\Phi_i(t).
So the X_i are independent zero-mean Gaussian random variables with variance \lambda_i. Expanding Y(t) and N(t) by \Phi_i(t), we get
Y_i=\int_0^T Y(t)\Phi_i(t)\,dt=\int_0^T[N(t)+X(t)]\Phi_i(t)\,dt=N_i+X_i,
where
N_i=\int_0^T N(t)\Phi_i(t)\,dt.
As N(t) is Gaussian white noise, the N_i are i.i.d. Gaussian random variables with zero mean and variance \tfrac{1}{2}N_0, so the problem simplifies to:
H0:Yi=Ni
H1:Yi=Ni+Xi
The Neyman–Pearson optimal test:
\Lambda=\frac{f_{Y\mid H_1}}{f_{Y\mid H_0}}=Ce^{\sum_{i=1}^\infty\frac{\tfrac{1}{2}y_i^2\lambda_i}{\tfrac{1}{2}N_0\left(\tfrac{1}{2}N_0+\lambda_i\right)}},
so the log-likelihood ratio is
\mathcal{L}=\ln(\Lambda)=K+\sum_{i=1}^\infty\frac{\tfrac{1}{2}y_i^2\lambda_i}{\tfrac{1}{2}N_0\left(\tfrac{1}{2}N_0+\lambda_i\right)}.
Since

\widehat{X}_i=\frac{\lambda_i}{\tfrac{1}{2}N_0+\lambda_i}Y_i

is just the minimum mean-square estimate of X_i given Y_i, the log-likelihood ratio can be written as
\mathcal{L}=K+\frac{1}{N_0}\sum_{i=1}^\infty Y_i\widehat{X}_i.
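A rough Python sketch of this estimator-correlator structure (my own illustration): the eigenproblem for R_X is discretized on a grid, the observation is projected on the eigenfunctions, and the data-dependent part of the log-likelihood ratio is accumulated as (1/N_0)\sum_i Y_i\widehat{X}_i. The covariance R_X = min(t,s), the grid and N_0 are arbitrary choices.

import numpy as np

rng = np.random.default_rng(7)
T, n = 1.0, 300
t = np.linspace(0.0, T, n, endpoint=False)
dt = T / n
N0 = 0.2

RX = np.minimum(t[:, None], t[None, :])            # signal covariance R_X(t, s)
lam, Phi = np.linalg.eigh(RX * dt)                 # discretized eigenproblem (R_X dt) v = lam v
lam, Phi = lam[::-1], Phi[:, ::-1] / np.sqrt(dt)   # decreasing eigenvalues, L2-normalized modes

# simulate an observation under H1: Y(t) = N(t) + X(t)
X = Phi @ (np.sqrt(np.maximum(lam, 0)) * rng.standard_normal(n))
noise = rng.standard_normal(n) * np.sqrt(N0 / (2 * dt))
Yt = noise + X

Y = Phi.T @ Yt * dt                                # coefficients Y_i = int Y(t) Phi_i(t) dt
X_hat = lam / (lam + N0 / 2) * Y                   # MMSE estimates of the X_i
L_data = np.sum(Y * X_hat) / N0                    # data-dependent part of the log-LR
print(L_data)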
K–L expansion has the following property: If
f(t)=\sum f_i\Phi_i(t),\qquad g(t)=\sum g_i\Phi_i(t),
where
fi=
T | |
\int | |
0 |
f(t)\Phii(t)dt, gi=
T | |
\int | |
0 |
g(t)\Phii(t)dt.
then
\sum_{i=1}^\infty f_ig_i=\int_0^T g(t)f(t)\,dt.
So let
\widehat{X}(t\mid T)=\sum_{i=1}^\infty\widehat{X}_i\Phi_i(t),\qquad\mathcal{L}=K+\frac{1}{N_0}\int_0^T Y(t)\widehat{X}(t\mid T)\,dt.
A noncausal filter Q(t,s) can be used to obtain the estimate through

\widehat{X}(t\mid T)=\int_0^T Q(t,s)Y(s)\,ds.
By the orthogonality principle, Q(t,s) satisfies

\int_0^T Q(t,s)R_X(s,\lambda)\,ds+\tfrac{N_0}{2}Q(t,\lambda)=R_X(t,\lambda),\qquad 0<\lambda<T,\ 0<t<T.
However, for practical reasons, it is necessary to further derive the causal filter h(t,s), where h(t,s)=0 for s>t, to get the estimate \widehat{X}(t\mid t). Specifically,

Q(t,s)=h(t,s)+h(s,t)-\int_0^T h(\lambda,t)h(s,\lambda)\,d\lambda