In statistics and signal processing, the orthogonality principle is a necessary and sufficient condition for the optimality of a Bayesian estimator. Loosely stated, the orthogonality principle says that the error vector of the optimal estimator (in a mean square error sense) is orthogonal to any possible estimator. The orthogonality principle is most commonly stated for linear estimators, but more general formulations are possible. Since the principle is a necessary and sufficient condition for optimality, it can be used to find the minimum mean square error estimator.
The orthogonality principle is most commonly used in the setting of linear estimation.[1] In this context, let x be an unknown random vector which is to be estimated based on the observation vector y. One wishes to construct a linear estimator
\hat{x}=Hy+c
for some matrix H and vector c. The orthogonality principle then states that the estimator \hat{x} achieves minimum mean square error if and only if
\operatorname{E}\{(\hat{x}-x)y^T\}=0,
and
\operatorname{E}\{\hat{x}-x\}=0.
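These two conditions can be checked numerically. The following is a minimal sketch (not part of the article): it draws samples from a made-up linear-Gaussian model, forms the standard closed-form linear MMSE solution H = C_{xy} C_{yy}^{-1}, c = E\{x\} - H E\{y\} (stated here without derivation), and confirms that both orthogonality conditions hold for it.

```python
# Monte Carlo sketch: the linear MMSE estimator x_hat = H y + c satisfies
# both orthogonality conditions. Model, dimensions, and parameter values
# below are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Jointly distributed x (2-dim) and y (3-dim): a made-up linear-Gaussian model.
x = rng.normal(loc=[1.0, -2.0], scale=[1.0, 0.5], size=(n, 2))
A = np.array([[1.0, 0.3], [0.0, 1.0], [0.5, -0.2]])
y = x @ A.T + 0.4 * rng.normal(size=(n, 3))

# Standard closed-form linear MMSE solution, estimated from the sample.
mx, my = x.mean(axis=0), y.mean(axis=0)
C_xy = (x - mx).T @ (y - my) / n     # cross-covariance of x and y (2x3)
C_yy = (y - my).T @ (y - my) / n     # covariance of y (3x3)
H = C_xy @ np.linalg.inv(C_yy)
c = mx - H @ my

x_hat = y @ H.T + c
err = x_hat - x

print("E{(x_hat - x) y^T} ~\n", err.T @ y / n)  # ~ 0 (2x3 matrix)
print("E{x_hat - x} ~", err.mean(axis=0))       # ~ 0
```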
Suppose x is a Gaussian random variable with mean m and variance \sigma_x^2. Also suppose we observe a value
y=x+w,
where w is Gaussian noise which is independent of x and has mean 0 and variance \sigma_w^2. We wish to find a linear estimator \hat{x}=hy+c minimizing the MSE. Substituting the expression \hat{x}=hy+c into the two requirements of the orthogonality principle, we obtain
0=\operatorname{E}\{(\hat{x}-x)y\}
0=\operatorname{E}\{(hx+hw+c-x)(x+w)\}
0=h(\sigma_x^2+\sigma_w^2)+hm^2+cm-\sigma_x^2-m^2
and
0=\operatorname{E}\{\hat{x}-x\}
0=\operatorname{E}\{hx+hw+c-x\}
0=(h-1)m+c.
Solving these two linear equations for h and c results in
h=\frac{\sigma_x^2}{\sigma_x^2+\sigma_w^2}, \quad c=\frac{\sigma_w^2}{\sigma_x^2+\sigma_w^2}m,
so that the minimum mean square error estimator is given by
\hat{x}=\frac{\sigma_x^2}{\sigma_x^2+\sigma_w^2}y+\frac{\sigma_w^2}{\sigma_x^2+\sigma_w^2}m.
This estimator can be interpreted as a weighted average between the noisy measurements y and the prior expected value m. If the noise variance \sigma_w^2 is low compared with the variance of the prior \sigma_x^2 (corresponding to a high SNR), then most of the weight is given to the measurements y, which are deemed more reliable than the prior information. Conversely, if the noise variance is relatively high, then the estimate will be close to m, as the measurements are not reliable enough to outweigh the prior information.
Finally, note that because the variables x and y are jointly Gaussian, the minimum MSE estimator is linear.[2] Therefore, in this case, the estimator above minimizes the MSE among all estimators, not only linear estimators.
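The scalar example lends itself to a quick numerical check. The sketch below uses illustrative parameter values (m = 2, \sigma_x = 1, \sigma_w = 0.5, chosen only for the demonstration), simulates the model y = x + w, and verifies that the estimator with the h and c derived above satisfies the orthogonality conditions and attains a lower MSE than the raw measurement y.

```python
# Numerical check of the scalar Gaussian example; parameter values are
# illustrative assumptions, not from the article.
import numpy as np

rng = np.random.default_rng(1)
m, sigma_x, sigma_w = 2.0, 1.0, 0.5
n = 1_000_000

x = rng.normal(m, sigma_x, n)          # x ~ N(m, sigma_x^2)
w = rng.normal(0.0, sigma_w, n)        # independent noise, w ~ N(0, sigma_w^2)
y = x + w

# Coefficients derived above from the orthogonality principle.
h = sigma_x**2 / (sigma_x**2 + sigma_w**2)
c = sigma_w**2 / (sigma_x**2 + sigma_w**2) * m
x_hat = h * y + c

print("E{(x_hat - x) y}:", np.mean((x_hat - x) * y))   # ~ 0
print("E{x_hat - x}   :", np.mean(x_hat - x))          # ~ 0
print("MSE of x_hat   :", np.mean((x_hat - x) ** 2))   # ~ 0.2 = sigma_x^2 sigma_w^2 / (sigma_x^2 + sigma_w^2)
print("MSE of y alone :", np.mean((y - x) ** 2))       # ~ 0.25 = sigma_w^2
```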
Let V be a Hilbert space of random variables with an inner product defined by
\langle x,y\rangle=\operatorname{E}\{x^H y\}.
Suppose W is a closed subspace of V, representing the space of all possible estimators. One wishes to find a vector \hat{x}\in W which will approximate a vector x\in V. More accurately, one would like to minimize the mean squared error (MSE) \operatorname{E}\|x-\hat{x}\|^2 between \hat{x} and x.
In the special case of linear estimators described above, the space V is the set of all functions of x and y, while W is the set of linear estimators, i.e., linear functions of y only.
Geometrically, we can see this problem in the following simple case where W is a one-dimensional subspace: we want to find the closest approximation to the vector x by a vector \hat{x} in the space W. From the geometric interpretation, it is intuitive that the best approximation, or smallest error, occurs when the error vector, e, is orthogonal to vectors in the space W.
More accurately, the general orthogonality principle states the following: Given a closed subspace W of estimators within a Hilbert space V and an element x in V, an element \hat{x}\in W achieves minimum MSE among all elements in W if and only if
\operatorname{E}\{(x-\hat{x})y^T\}=0
for all y\in W.
Stated in such a manner, this principle is simply a statement of the Hilbert projection theorem. Nevertheless, the extensive use of this result in signal processing has resulted in the name "orthogonality principle."
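As an illustration of the general (not necessarily linear) formulation, the sketch below uses a made-up model in which the projection onto W, the subspace of all functions of y, is genuinely nonlinear: for a ±1 signal x observed as y = x + w with w ~ N(0, \sigma^2), the conditional mean is E\{x\mid y\} = \tanh(y/\sigma^2), a standard result for this model. The code checks empirically that the error of this projection is orthogonal to several functions of y and that it beats the best linear estimator.

```python
# Sketch of the general orthogonality principle beyond linear estimators.
# The binary-signal model and test functions are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(2)
n, sigma = 1_000_000, 0.7

x = rng.choice([-1.0, 1.0], size=n)    # equiprobable +/-1 signal
y = x + sigma * rng.normal(size=n)     # noisy observation

x_hat = np.tanh(y / sigma**2)          # conditional mean E{x|y}: projection onto all functions of y
err = x - x_hat

# The error is (empirically) orthogonal to test functions g(y) in the estimator subspace W.
for name, g in [("y", y), ("y^2", y**2), ("sin(y)", np.sin(y))]:
    print(f"E{{(x - x_hat) * {name}}} ~ {np.mean(err * g):+.5f}")

# The nonlinear projection cannot do worse than the best linear estimator of x from y.
h = np.mean(x * y) / np.mean(y * y)    # LMMSE coefficient (both means are zero here)
print("MSE, conditional mean :", np.mean(err ** 2))
print("MSE, best linear h * y:", np.mean((x - h * y) ** 2))
```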
The following is one way to find the minimum mean square error estimator by using the orthogonality principle.
We want to be able to approximate a vector x by
x=\hat{x}+e
where
\hat{x}=\sum_i c_i p_i
is the approximation of x as a linear combination of vectors in the subspace W spanned by p_1, p_2, \ldots. Therefore, we want to solve for the coefficients c_i so that we may write our approximation in known terms.
By the orthogonality theorem, the square norm of the error vector, \left\Vert e\right\Vert^2, is minimized when, for all j,
\left\langle x-\sum_i c_i p_i, p_j\right\rangle=0.
Developing this equation, we obtain
\left\langle x,p_j\right\rangle=\left\langle\sum_i c_i p_i,p_j\right\rangle=\sum_i c_i\left\langle p_i,p_j\right\rangle.
If there is a finite number n of vectors p_i, one can write this equation in matrix form as
\begin{bmatrix} \left\langle x,p_1\right\rangle\\ \left\langle x,p_2\right\rangle\\ \vdots\\ \left\langle x,p_n\right\rangle \end{bmatrix} = \begin{bmatrix} \left\langle p_1,p_1\right\rangle & \left\langle p_2,p_1\right\rangle & \cdots & \left\langle p_n,p_1\right\rangle\\ \left\langle p_1,p_2\right\rangle & \left\langle p_2,p_2\right\rangle & \cdots & \left\langle p_n,p_2\right\rangle\\ \vdots & \vdots & \ddots & \vdots\\ \left\langle p_1,p_n\right\rangle & \left\langle p_2,p_n\right\rangle & \cdots & \left\langle p_n,p_n\right\rangle \end{bmatrix} \begin{bmatrix} c_1\\ c_2\\ \vdots\\ c_n \end{bmatrix}.
Assuming the p_i are linearly independent, the Gram matrix can be inverted to obtain
\begin{bmatrix} c_1\\ c_2\\ \vdots\\ c_n \end{bmatrix} = \begin{bmatrix} \left\langle p_1,p_1\right\rangle & \left\langle p_2,p_1\right\rangle & \cdots & \left\langle p_n,p_1\right\rangle\\ \left\langle p_1,p_2\right\rangle & \left\langle p_2,p_2\right\rangle & \cdots & \left\langle p_n,p_2\right\rangle\\ \vdots & \vdots & \ddots & \vdots\\ \left\langle p_1,p_n\right\rangle & \left\langle p_2,p_n\right\rangle & \cdots & \left\langle p_n,p_n\right\rangle \end{bmatrix}^{-1} \begin{bmatrix} \left\langle x,p_1\right\rangle\\ \left\langle x,p_2\right\rangle\\ \vdots\\ \left\langle x,p_n\right\rangle \end{bmatrix},
thus providing an expression for the coefficients c_i of the minimum mean square error estimator.
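As a concrete sketch of this recipe, the code below uses arbitrary example vectors in \mathbb{R}^m with the Euclidean inner product (an assumption made only for the demonstration), builds the Gram matrix, solves for the coefficients, and confirms that the resulting error is orthogonal to every p_j and that the answer agrees with an ordinary least-squares solve.

```python
# Sketch of the Gram-matrix recipe above; the vectors are arbitrary examples
# in R^m with the Euclidean inner product <a, b> = a.T @ b.
import numpy as np

rng = np.random.default_rng(3)
m_dim, n = 6, 3                        # ambient dimension and number of basis vectors (arbitrary)
P = rng.normal(size=(m_dim, n))        # columns are p_1, ..., p_n
x = rng.normal(size=m_dim)

G = P.T @ P                            # Gram matrix, G[i, j] = <p_i, p_j>
b = P.T @ x                            # right-hand side, b[j] = <x, p_j>
c = np.linalg.solve(G, b)              # coefficients c_1, ..., c_n

x_hat = P @ c                          # approximation of x in W = span{p_1, ..., p_n}
e = x - x_hat

print("coefficients c:", c)
print("<e, p_j> for each j:", P.T @ e)  # ~ 0: the error is orthogonal to W
print("agrees with least squares:",
      np.allclose(c, np.linalg.lstsq(P, x, rcond=None)[0]))
```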