In statistics, the theory of minimum norm quadratic unbiased estimation (MINQUE)[1][2][3] was developed by C. R. Rao. MINQUE stands alongside other methods in estimation theory, such as the method of moments or maximum likelihood estimation. Similar to the theory of best linear unbiased estimation, MINQUE is specifically concerned with linear regression models. The method was originally conceived to estimate heteroscedastic error variance in multiple linear regression. MINQUE estimators also provide an alternative to maximum likelihood estimators or restricted maximum likelihood estimators for variance components in mixed effects models. MINQUE estimators are quadratic forms of the response variable and are used to estimate a linear function of the variances.
We are concerned with a mixed effects model for the random vector Y \in \mathbb{R}^n with the following linear structure:

Y = X\boldsymbol\beta + U_1\boldsymbol\xi_1 + \cdots + U_k\boldsymbol\xi_k
Here, X \in \mathbb{R}^{n \times m} is a design matrix for the fixed effects, \boldsymbol\beta \in \mathbb{R}^m is the vector of unknown fixed-effect parameters, U_i \in \mathbb{R}^{n \times c_i} is a design matrix for the i-th random-effect component, and \boldsymbol\xi_i \in \mathbb{R}^{c_i} is the random vector for the i-th component. The random effects are assumed to have zero mean, E[\boldsymbol\xi_i] = 0, constant variance within a component, V[\boldsymbol\xi_i] = \sigma_i^2 I_{c_i}, and no correlation between components, V[\boldsymbol\xi_i, \boldsymbol\xi_j] = 0 \ \forall i \neq j. The variances \sigma_1^2, \dots, \sigma_k^2 are known as variance components.
This is a general model that captures commonly used linear regression models.

1. Gauss-Markov model: if we consider the one-component model in which U_1 = I_n, the model reduces to the Gauss-Markov model Y = X\boldsymbol\beta + \boldsymbol\epsilon with E[\boldsymbol\epsilon] = 0 and V[\boldsymbol\epsilon] = \sigma_1^2 I_n.
2. Heteroscedastic model: each group of observations in Y that shares a common variance can be modeled as its own variance component with an appropriate choice of U_i.
A compact representation for the model is the following, where U = \left[\begin{array}{c|c|c} U_1 & \cdots & U_k \end{array}\right] and \boldsymbol\xi^\top = \left[\begin{array}{c|c|c} \boldsymbol\xi_1^\top & \cdots & \boldsymbol\xi_k^\top \end{array}\right]:

Y = X\boldsymbol\beta + U\boldsymbol\xi
Note that this model makes no distributional assumptions about Y beyond its first two moments:

E[Y] = X\boldsymbol\beta

V[Y] = \sigma_1^2 U_1 U_1^\top + \cdots + \sigma_k^2 U_k U_k^\top \equiv \sigma_1^2 V_1 + \cdots + \sigma_k^2 V_k
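As a concrete illustration, the following sketch, assuming NumPy and an invented two-component design (a grouping factor plus residual error; the dimensions and values are arbitrary), simulates the model and checks the two stated moments empirically.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 200, 2
X = np.column_stack([np.ones(n), rng.normal(size=n)])  # fixed-effect design
beta = np.array([1.0, -0.5])

# Two random-effect components: a 4-level grouping factor and residual error.
c1 = 4
U1 = np.kron(np.eye(c1), np.ones((n // c1, 1)))        # n x c1 membership matrix
U2 = np.eye(n)                                         # residual component
sigma2 = np.array([2.0, 0.5])                          # true variance components

def draw_Y():
    xi1 = rng.normal(scale=np.sqrt(sigma2[0]), size=c1)
    xi2 = rng.normal(scale=np.sqrt(sigma2[1]), size=n)
    return X @ beta + U1 @ xi1 + U2 @ xi2

# Empirical moments versus E[Y] = X beta and V[Y] = sigma_1^2 V_1 + sigma_2^2 V_2.
draws = np.stack([draw_Y() for _ in range(5000)])
V_theory = sigma2[0] * (U1 @ U1.T) + sigma2[1] * (U2 @ U2.T)
print(np.abs(draws.mean(axis=0) - X @ beta).max())     # small
print(np.abs(np.cov(draws.T) - V_theory).max())        # small relative to V_theory
```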
The goal in MINQUE is to estimate a linear combination of the variance components, \theta = \sum_{i=1}^k p_i \sigma_i^2, using a quadratic form \hat{\theta} = Y^\top A Y. MINQUE estimators are derived by identifying a matrix A such that the estimator has the desirable properties described below.
Consider a new fixed-effect parameter \boldsymbol\gamma = \boldsymbol\beta - \boldsymbol\beta_0, which represents a translation of the original fixed effect. The new, equivalent model is

Y - X\boldsymbol\beta_0 = X\boldsymbol\gamma + U\boldsymbol\xi

Under this equivalent model, the MINQUE estimator becomes (Y - X\boldsymbol\beta_0)^\top A (Y - X\boldsymbol\beta_0). Rao argued that, since the underlying models are equivalent, this estimator should equal Y^\top A Y. This can be achieved by constraining A such that AX = 0, which ensures that all terms of the expanded quadratic form other than Y^\top A Y vanish.
Suppose that we constrain AX = 0, as argued above. Then the estimator no longer depends on the fixed effects:

\begin{align} \hat{\theta} &= Y^\top A Y \\ &= (X\boldsymbol\beta + U\boldsymbol\xi)^\top A (X\boldsymbol\beta + U\boldsymbol\xi) \\ &= \boldsymbol\xi^\top U^\top A U \boldsymbol\xi \end{align}
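To make the role of the constraint concrete, one can take A = I - X(X^\top X)^{-1} X^\top, which annihilates the columns of X (so AX = 0), and check that the resulting quadratic form cannot see the fixed effects. A small check, continuing the simulated example above (this particular A is an illustrative assumption, not Rao's construction):

```python
# Continuing the example above: A = I - X (X^T X)^{-1} X^T satisfies A X = 0.
H = X @ np.linalg.solve(X.T @ X, X.T)
A = np.eye(n) - H

xi1 = rng.normal(scale=np.sqrt(sigma2[0]), size=c1)
xi2 = rng.normal(scale=np.sqrt(sigma2[1]), size=n)
noise = U1 @ xi1 + U2 @ xi2
Y1 = X @ beta + noise             # original fixed effects
Y2 = X @ (beta + 10.0) + noise    # translated fixed effects, same random effects

U = np.hstack([U1, U2])
xi = np.concatenate([xi1, xi2])
print(np.isclose(Y1 @ A @ Y1, Y2 @ A @ Y2))              # True: invariance
print(np.isclose(Y1 @ A @ Y1, xi @ (U.T @ A @ U) @ xi))  # True: xi-only reduction
```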
To ensure that this estimator is unbiased, the expectation of the estimator, E[\hat{\theta}], must equal the parameter of interest, \theta. Below, the expectation of the estimator is decomposed over the components, which is possible because the components are mutually uncorrelated, and the cyclic property of the trace is used to evaluate the expectation with respect to each \boldsymbol\xi_i:

\begin{align} E[\hat{\theta}] &= E[\boldsymbol\xi^\top U^\top A U \boldsymbol\xi] \\ &= \sum_{i=1}^k E[\boldsymbol\xi_i^\top U_i^\top A U_i \boldsymbol\xi_i] \\ &= \sum_{i=1}^k \sigma_i^2 Tr[U_i^\top A U_i] \end{align}
To achieve unbiasedness, Rao suggested setting \sum_{i=1}^k \sigma_i^2 Tr[U_i^\top A U_i] = \sum_{i=1}^k p_i \sigma_i^2, which can be accomplished by constraining A such that Tr[U_i^\top A U_i] = Tr[A V_i] = p_i.
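The identity E[\hat{\theta}] = \sum_i \sigma_i^2 Tr[A V_i] is easy to verify by Monte Carlo. Continuing the same running example (the A used here satisfies AX = 0 but is otherwise an arbitrary illustrative choice):

```python
# Monte Carlo check that E[Y^T A Y] = sigma_1^2 Tr[A V_1] + sigma_2^2 Tr[A V_2].
V1, V2 = U1 @ U1.T, U2 @ U2.T
expected = sigma2[0] * np.trace(A @ V1) + sigma2[1] * np.trace(A @ V2)
samples = [(y := draw_Y()) @ A @ y for _ in range(20000)]
print(np.mean(samples), expected)   # the two numbers agree closely
```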
Rao argues that if the random vector \boldsymbol\xi were observed, a "natural" estimator of \theta would be the following, since E[\boldsymbol\xi_i^\top \boldsymbol\xi_i] = c_i \sigma_i^2. Here, \boldsymbol\Delta is the diagonal matrix collecting the weights p_i / c_i:

\frac{p_1}{c_1} \boldsymbol\xi_1^\top \boldsymbol\xi_1 + \cdots + \frac{p_k}{c_k} \boldsymbol\xi_k^\top \boldsymbol\xi_k = \boldsymbol\xi^\top \left[\operatorname{diag}\left(\frac{p_1}{c_1}, \dots, \frac{p_k}{c_k}\right)\right] \boldsymbol\xi \equiv \boldsymbol\xi^\top \boldsymbol\Delta \boldsymbol\xi
The difference between the proposed estimator and the natural estimator is \boldsymbol\xi^\top (U^\top A U - \boldsymbol\Delta) \boldsymbol\xi. This difference can be made small by minimizing the matrix norm \lVert U^\top A U - \boldsymbol\Delta \rVert.
Given the constraints and optimization strategy derived from the optimal properties above, the MINQUE estimator \hat{\theta} of \theta = \sum_{i=1}^k p_i \sigma_i^2 is obtained by choosing a matrix A that minimizes \lVert U^\top A U - \boldsymbol\Delta \rVert subject to the constraints AX = 0 and Tr[A V_i] = p_i.
In the Gauss-Markov model, the error variance \sigma^2 is estimated using

s^2 = \frac{1}{n-m} (Y - X\hat{\boldsymbol\beta})^\top (Y - X\hat{\boldsymbol\beta})

This estimator is unbiased and can be shown to minimize the Euclidean norm of the form \lVert U^\top A U - \boldsymbol\Delta \rVert. Thus, the standard estimator for the error variance in the Gauss-Markov model is a MINQUE estimator.
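In this one-component case the MINQUE matrix can be written explicitly as A = (I - H)/(n - m) with H = X(X^\top X)^{-1} X^\top, which satisfies both constraints: AX = 0 and Tr[A V_1] = Tr[A] = 1 (taking p_1 = 1 and V_1 = I_n). A self-contained sketch, assuming NumPy and simulated data:

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 100, 3
X = rng.normal(size=(n, m))
Y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(scale=1.5, size=n)

beta_hat = np.linalg.lstsq(X, Y, rcond=None)[0]
resid = Y - X @ beta_hat
s2 = resid @ resid / (n - m)                  # classical unbiased estimator

# The same value as a quadratic form Y^T A Y with A = (I - H) / (n - m):
H = X @ np.linalg.solve(X.T @ X, X.T)
A = (np.eye(n) - H) / (n - m)
print(np.isclose(s2, Y @ A @ Y))                            # True
print(np.allclose(A @ X, 0.0), np.isclose(np.trace(A), 1))  # both constraints hold
```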
For random variables Y_1, \dots, Y_n with a common mean and different variances \sigma_1^2, \dots, \sigma_n^2, the MINQUE estimator of \sigma_i^2 is \frac{n}{n-2} (Y_i - \overline{Y})^2 - \frac{s^2}{n-2}, where \overline{Y} = \frac{1}{n} \sum_{i=1}^n Y_i and s^2 = \frac{1}{n-1} \sum_{i=1}^n (Y_i - \overline{Y})^2.
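A short simulation with invented variances confirms that this estimator is unbiased for each \sigma_i^2 (note that, as with MINQUE estimators generally, individual estimates can be negative):

```python
import numpy as np

rng = np.random.default_rng(2)
sigma2_true = np.array([0.5, 1.0, 2.0, 4.0, 8.0])  # invented heteroscedastic variances
n, mu = sigma2_true.size, 3.0

def minque_hetero(Y):
    """MINQUE of per-observation variances under a common mean."""
    n = Y.size
    s2 = ((Y - Y.mean()) ** 2).sum() / (n - 1)
    return n / (n - 2) * (Y - Y.mean()) ** 2 - s2 / (n - 2)

# Monte Carlo mean of the estimates should approach sigma2_true.
est = np.mean(
    [minque_hetero(mu + np.sqrt(sigma2_true) * rng.normal(size=n))
     for _ in range(200000)], axis=0)
print(np.round(est, 2))   # approximately [0.5, 1.0, 2.0, 4.0, 8.0]
```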
Rao proposed a MINQUE estimator for the variance components model based on minimizing the Euclidean norm. The Euclidean norm \lVert \cdot \rVert_2 is the square root of the sum of squares of all elements in the matrix. When evaluating this norm below, V = V_1 + \cdots + V_k = U U^\top. Furthermore, by the cyclic property of the trace and the unbiasedness constraint Tr[A V_i] = p_i,

Tr[U^\top A U \boldsymbol\Delta] = Tr[A U \boldsymbol\Delta U^\top] = Tr\left[\sum_{i=1}^k \frac{p_i}{c_i} A V_i\right] = Tr[\boldsymbol\Delta \boldsymbol\Delta]
\begin{align} \lVert U^\top A U - \boldsymbol\Delta \rVert_2^2 &= Tr\left[(U^\top A U - \boldsymbol\Delta)^\top (U^\top A U - \boldsymbol\Delta)\right] \\ &= Tr[U^\top A U U^\top A U] - 2\,Tr[U^\top A U \boldsymbol\Delta] + Tr[\boldsymbol\Delta \boldsymbol\Delta] \\ &= Tr[A V A V] - Tr[\boldsymbol\Delta \boldsymbol\Delta] \end{align}
Note that since Tr[\boldsymbol\Delta \boldsymbol\Delta] does not depend on A, minimizing the Euclidean norm with respect to A is equivalent to minimizing Tr[A V A V].
Rao showed that the matrix A that solves this optimization problem is

A^\star = \sum_{i=1}^k \lambda_i R V_i R,

where R = V^{-1}(I - P), P = X(X^\top V^{-1} X)^{-} X^\top V^{-1} is the projection matrix onto the column space of X, and (\cdot)^{-} denotes a generalized inverse.
Therefore, the MINQUE estimator is the following, where the vectors \boldsymbol\lambda and Q are defined by the sum:

\begin{align} \hat{\theta} &= Y^\top A^\star Y \\ &= \sum_{i=1}^k \lambda_i Y^\top R V_i R Y \\ &\equiv \sum_{i=1}^k \lambda_i Q_i \\ &\equiv \boldsymbol\lambda^\top Q \end{align}
The vector \boldsymbol\lambda is obtained by applying the constraint Tr[A^\star V_i] = p_i. That is, \boldsymbol\lambda solves the following system of equations \forall j \in \{1, \dots, k\}:

\begin{align} Tr[A^\star V_j] &= p_j \\ Tr\left[\sum_{i=1}^k \lambda_i R V_i R V_j\right] &= p_j \\ \sum_{i=1}^k \lambda_i Tr[R V_i R V_j] &= p_j \end{align}
This can be written as a matrix product S\boldsymbol\lambda = p, where p = [p_1 \cdots p_k]^\top and

S = \begin{bmatrix} Tr[R V_1 R V_1] & \cdots & Tr[R V_k R V_1] \\ \vdots & \ddots & \vdots \\ Tr[R V_1 R V_k] & \cdots & Tr[R V_k R V_k] \end{bmatrix}
Then, \boldsymbol\lambda = S^{-} p, and the MINQUE estimator is

\hat{\theta} = \boldsymbol\lambda^\top Q = p^\top (S^{-})^\top Q = p^\top S^{-} Q

Note that \theta = \sum_{i=1}^k p_i \sigma_i^2 = p^\top \boldsymbol\sigma, where \boldsymbol\sigma = [\sigma_1^2 \cdots \sigma_k^2]^\top. Therefore, the estimator of the variance components themselves is \hat{\boldsymbol\sigma} = S^{-} Q.
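Collecting the steps above, a compact implementation might look like the following. This is a minimal sketch, not a reference implementation: it assumes NumPy, recovers the full vector \hat{\boldsymbol\sigma} = S^{-} Q rather than a single linear combination, uses pseudo-inverses for the generalized inverses, and requires V = \sum_i V_i to be invertible (which holds whenever one component is the residual error, U_i = I_n).

```python
import numpy as np

def minque(Y, X, U_list):
    """Variance-component estimates sigma_hat = S^- Q for the model
    Y = X beta + U_1 xi_1 + ... + U_k xi_k (sketch; assumes V is invertible)."""
    n = Y.size
    V_list = [U @ U.T for U in U_list]
    V_inv = np.linalg.inv(sum(V_list))                     # V = V_1 + ... + V_k
    P = X @ np.linalg.pinv(X.T @ V_inv @ X) @ X.T @ V_inv  # projection onto col(X)
    R = V_inv @ (np.eye(n) - P)
    Q = np.array([Y @ R @ Vi @ R @ Y for Vi in V_list])
    S = np.array([[np.trace(R @ Vi @ R @ Vj) for Vi in V_list]
                  for Vj in V_list])
    return np.linalg.pinv(S) @ Q                           # sigma_hat = S^- Q

# Example: the two-component model used earlier (grouping factor + residual).
rng = np.random.default_rng(3)
n, c1 = 200, 4
X = np.column_stack([np.ones(n), rng.normal(size=n)])
U1 = np.kron(np.eye(c1), np.ones((n // c1, 1)))
U2 = np.eye(n)
Y = (X @ np.array([1.0, -0.5])
     + U1 @ rng.normal(scale=np.sqrt(2.0), size=c1)
     + U2 @ rng.normal(scale=np.sqrt(0.5), size=n))
print(minque(Y, X, [U1, U2]))   # close to [2.0, 0.5] on average across datasets
```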
MINQUE estimators can be obtained without the invariance criteria, in which case the estimator is only unbiased and minimizes the norm. Such estimators have slightly different constraints on the minimization problem.
The model can be extended to estimate covariance components. In such a model, the random effects of a component are assumed to have a common covariance structure V[\boldsymbol\xi_i] = \boldsymbol\Sigma. A MINQUE estimator for a mixture of variance and covariance components has also been proposed, in which V[\boldsymbol\xi_i] = \boldsymbol\Sigma for i \in \{1, \dots, s\} and V[\boldsymbol\xi_i] = \sigma^2 I_{c_i} for i \in \{s+1, \dots, k\}.