In statistics, particularly regression analysis, the Working–Hotelling procedure, named after Holbrook Working and Harold Hotelling, is a method of simultaneous estimation in linear regression models. One of the first developments in simultaneous inference, it was devised by Working and Hotelling for the simple linear regression model in 1929.[1] It provides a confidence region for multiple mean responses, that is, it gives the upper and lower bounds of more than one value of a dependent variable at several levels of the independent variables at a certain confidence level. The resulting confidence bands are known as the Working–Hotelling–Scheffé confidence bands.
Like the closely related Scheffé's method in the analysis of variance, which considers all possible contrasts, the Working–Hotelling procedure considers all possible values of the independent variables; that is, in a particular regression model, the probability that all the Working–Hotelling confidence intervals cover the true value of the mean response is the confidence coefficient. As such, when only a small subset of the possible values of the independent variable is considered, it is more conservative and yields wider intervals than competitors like the Bonferroni correction at the same level of confidence. It outperforms the Bonferroni correction as more values are considered.
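Consider a simple linear regression model $Y=\beta_0+\beta_1X+\varepsilon$, where $Y$ is the response variable and $X$ the explanatory variable, and let $b_0$ and $b_1$ be the least-squares estimates of $\beta_0$ and $\beta_1$ respectively. The least-squares estimate of the mean response $E(Y_i)$ at the level $X=x_i$ is then $\hat{Y}_i=b_0+b_1x_i$. Assuming that the errors are independent and identically normally distributed, a $1-\alpha$ confidence interval for the mean response at a single level of $X$ is

$$\hat{y}_i\in\left[b_0+b_1x_i\pm t_{\alpha/2,\,df=n-2}\sqrt{\left(\frac{1}{n-2}\sum_{j=1}^{n}e_j^2\right)\cdot\left(\frac{1}{n}+\frac{(x_i-\bar{x})^2}{\sum_{j=1}^{n}(x_j-\bar{x})^2}\right)}\right],$$

where $\left(\frac{1}{n-2}\sum_{j=1}^{n}e_j^2\right)$ is the mean squared error and $t_{\alpha/2,\,df=n-2}$ denotes the upper $\frac{\alpha}{2}$th percentile of Student's t-distribution with $n-2$ degrees of freedom.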
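However, when several mean responses are estimated with such intervals, the joint confidence level declines rapidly. To fix the confidence coefficient at $1-\alpha$, the Working–Hotelling approach replaces the t critical value with one based on the F-distribution:

$$\hat{y}_i\in\left[b_0+b_1x_i\pm W\sqrt{\left(\frac{1}{n-2}\sum_{j=1}^{n}e_j^2\right)\cdot\left(\frac{1}{n}+\frac{(x_i-\bar{x})^2}{\sum_{j=1}^{n}(x_j-\bar{x})^2}\right)}\right],$$

where $W^2=2F_{\alpha,\,df=(2,n-2)}$ and $F_{\alpha,\,df=(2,n-2)}$ denotes the upper $\alpha$th percentile of the F-distribution with $(2,n-2)$ degrees of freedom. The confidence level $1-\alpha$ holds simultaneously over all values of $X$, i.e. for all $x_i\in\mathbb{R}$.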
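The critical value and band half-width above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names `wh_critical_value` and `wh_half_width` are hypothetical, and the code relies on the fact that an F-distribution with 2 numerator degrees of freedom has the closed-form survival function $(1+2f/\nu)^{-\nu/2}$, so its quantile can be inverted without a statistics library. That shortcut applies only to the simple linear regression case.

```python
import math


def wh_critical_value(n, alpha=0.05):
    """Return W, where W^2 = 2 * F(upper alpha; 2, n - 2).

    Assumption: simple linear regression, so the numerator degrees of
    freedom are exactly 2 and the F quantile has a closed form.
    """
    v = n - 2  # error degrees of freedom
    if v <= 0:
        raise ValueError("need more than two observations")
    # Solve (1 + 2f/v)^(-v/2) = alpha for f.
    f_upper = (v / 2.0) * (alpha ** (-2.0 / v) - 1.0)
    return math.sqrt(2.0 * f_upper)


def wh_half_width(x0, n, mse, x_bar, sxx, alpha=0.05):
    """Half-width of the Working-Hotelling band at the level x0.

    mse is the mean squared error, x_bar the sample mean of X and
    sxx the sum of squared deviations of X about x_bar.
    """
    w = wh_critical_value(n, alpha)
    return w * math.sqrt(mse * (1.0 / n + (x0 - x_bar) ** 2 / sxx))
```

For instance, `wh_critical_value(15, 0.05)` gives roughly 2.7588. The half-width is smallest at `x0 == x_bar` and grows as `x0` moves away from the mean, which is what produces the curved band.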
The Working–Hotelling confidence bands can be easily generalised to multiple linear regression. Consider a general linear model as defined in the linear regression article, that is,

$$Y=X\boldsymbol\beta+\boldsymbol\varepsilon,$$

where

$$Y=\begin{pmatrix}Y_1\\ Y_2\\ \vdots\\ Y_n\end{pmatrix},\quad X=\begin{pmatrix}x_1^{\mathrm T}\\ x_2^{\mathrm T}\\ \vdots\\ x_n^{\mathrm T}\end{pmatrix}=\begin{pmatrix}x_{11}&\cdots&x_{1p}\\ x_{21}&\cdots&x_{2p}\\ \vdots&\ddots&\vdots\\ x_{n1}&\cdots&x_{np}\end{pmatrix},\quad\boldsymbol\beta=\begin{pmatrix}\beta_1\\ \beta_2\\ \vdots\\ \beta_p\end{pmatrix},\quad\boldsymbol\varepsilon=\begin{pmatrix}\varepsilon_1\\ \varepsilon_2\\ \vdots\\ \varepsilon_n\end{pmatrix}.$$
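Again, it can be shown that the least-squares estimate of the mean response $E(Y_i)=x_i^{\mathrm T}\boldsymbol\beta$ is $\hat{Y}_i=x_i^{\mathrm T}b$, where $b$ is the least-squares estimate of $\boldsymbol\beta$, that is, $b=(X^{\mathrm T}X)^{-1}X^{\mathrm T}Y$. Likewise, a $1-\alpha$ confidence interval for a single mean response is

$$\hat{y}_i\in\left[x_i^{\mathrm T}b\pm t_{\alpha/2,\,df=n-p}\sqrt{\operatorname{MSE}\left(x_i^{\mathrm T}(X^{\mathrm T}X)^{-1}x_i\right)}\right],$$

where $\operatorname{MSE}=(Y^{\mathrm T}Y-b^{\mathrm T}X^{\mathrm T}Y)/(n-p)$ is the observed value of the mean squared error.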
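The Working–Hotelling approach to multiple estimations is similar to that of simple linear regression, with the number of parameters $p$ taking the place of 2 in both the multiplier and the degrees of freedom:

$$\hat{y}_i\in\left[x_i^{\mathrm T}b\pm W\sqrt{\operatorname{MSE}\left(x_i^{\mathrm T}(X^{\mathrm T}X)^{-1}x_i\right)}\right],$$

where $W^2=pF_{\alpha,\,df=(p,n-p)}$. For $p=2$ this reduces to the simple linear regression case.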
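As a consistency check, the quadratic form $x_i^{\mathrm T}(X^{\mathrm T}X)^{-1}x_i$ can be worked out for $p=2$. The short derivation below is a sketch under one assumption not stated in the general model: the design matrix has a column of ones followed by the single regressor, so that each row is $x_j^{\mathrm T}=(1,\;x_j)$. The shorthand $S_{xx}$ is introduced here only for brevity.

```latex
\begin{aligned}
X^{\mathrm T}X &= \begin{pmatrix} n & \sum_j x_j \\ \sum_j x_j & \sum_j x_j^2 \end{pmatrix},
\qquad
\det(X^{\mathrm T}X) = n\sum_j x_j^2 - \Big(\sum_j x_j\Big)^2 = n\,S_{xx},
\qquad
S_{xx} = \sum_j (x_j-\bar{x})^2, \\[4pt]
(X^{\mathrm T}X)^{-1} &= \frac{1}{n\,S_{xx}}
\begin{pmatrix} \sum_j x_j^2 & -\sum_j x_j \\ -\sum_j x_j & n \end{pmatrix}, \\[4pt]
x_i^{\mathrm T}(X^{\mathrm T}X)^{-1}x_i
&= \frac{\sum_j x_j^2 - 2n\bar{x}\,x_i + n\,x_i^2}{n\,S_{xx}}
 = \frac{S_{xx} + n\,(x_i-\bar{x})^2}{n\,S_{xx}}
 = \frac{1}{n} + \frac{(x_i-\bar{x})^2}{S_{xx}} .
\end{aligned}
```

With $p=2$ the standard-error term of the general interval therefore coincides with the one in the simple linear regression interval, and the degrees of freedom $(p,n-p)$ become $(2,n-2)$.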
In the simple linear regression case, Working–Hotelling–Scheffé confidence bands, drawn by connecting the upper and lower limits of the mean response at every level, take the shape of hyperbolas. In drawing, they are sometimes approximated by the Graybill–Bowden confidence bands, which are linear and hence easier to graph:

$$\beta_0+\beta_1(x_i-\bar{x})\in\left[b_0+b_1(x_i-\bar{x})\pm m_{\alpha,2,n-2}\sqrt{\operatorname{MSE}}\cdot\left(\frac{1}{\sqrt{n}}+\frac{|x_i-\bar{x}|}{\sqrt{\sum_{j=1}^{n}(x_j-\bar{x})^2}}\right)\right],$$

where $m_{\alpha,2,n-2}$ denotes the upper $\alpha$th percentile of the Studentized maximum modulus distribution with two means and $n-2$ degrees of freedom.
This example uses the same data as the example in the ordinary least squares article:
Height (m) | 1.47 | 1.50 | 1.52 | 1.55 | 1.57 | 1.60 | 1.63 | 1.65 | 1.68 | 1.70 | 1.73 | 1.75 | 1.78 | 1.80 | 1.83 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Weight (kg) | 52.21 | 53.12 | 54.48 | 55.84 | 57.20 | 58.57 | 59.93 | 61.29 | 63.11 | 64.47 | 66.28 | 68.10 | 69.92 | 72.19 | 74.46 |
A simple linear regression model is fit to these data. The values of $b_0$ and $b_1$ are found to be −39.06 and 61.27 respectively. At the 95% confidence level, $W^2=2F_{0.05,\,df=(2,13)}=7.611132$, so that $W=2.758828$. Furthermore, $\bar{x}=1.651$, $\sum_{j=1}^{n}e_j^2=7.490558$, $\operatorname{MSE}=0.5761968$ and $\sum_{j=1}^{n}(x_j-\bar{x})^2=0.1827$. The confidence band is therefore

$$\hat{y}_i\in\left[-39.06+61.27x_i\pm 2.758828\sqrt{0.5761968\cdot\left(\frac{1}{15}+\frac{(x_i-1.651)^2}{0.1827}\right)}\right],$$

which results in the graph on the left.
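The quantities in this example can be recomputed directly from the table. The following Python sketch uses only the standard library; the variable names and the helper `band` are illustrative choices, and the critical value uses the closed-form F quantile that is available when the numerator degrees of freedom equal 2 (an implementation shortcut, not part of the procedure itself).

```python
import math

# Data from the table above.
heights = [1.47, 1.50, 1.52, 1.55, 1.57, 1.60, 1.63, 1.65,
           1.68, 1.70, 1.73, 1.75, 1.78, 1.80, 1.83]
weights = [52.21, 53.12, 54.48, 55.84, 57.20, 58.57, 59.93, 61.29,
           63.11, 64.47, 66.28, 68.10, 69.92, 72.19, 74.46]

n = len(heights)
x_bar = sum(heights) / n
y_bar = sum(weights) / n

# Least-squares estimates for simple linear regression.
sxx = sum((x - x_bar) ** 2 for x in heights)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(heights, weights))
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

# Residual sum of squares and mean squared error on n - 2 df.
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(heights, weights))
mse = sse / (n - 2)

# W^2 = 2 * F(upper alpha; 2, n - 2), via the closed form for 2 numerator df.
alpha = 0.05
v = n - 2
w = math.sqrt(v * (alpha ** (-2.0 / v) - 1.0))


def band(x0):
    """Lower and upper Working-Hotelling limits for the mean response at x0."""
    centre = b0 + b1 * x0
    half = w * math.sqrt(mse * (1.0 / n + (x0 - x_bar) ** 2 / sxx))
    return centre - half, centre + half


print(f"b0={b0:.2f}  b1={b1:.2f}  SSE={sse:.4f}  MSE={mse:.4f}  W={w:.4f}")
for x0 in (1.50, 1.65, 1.80):
    lo, hi = band(x0)
    print(f"x={x0:.2f}: [{lo:.2f}, {hi:.2f}]")
```

Evaluating `band` on a fine grid of heights and joining the limits traces out the two branches of the band.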
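The Working–Hotelling approach may give tighter or looser confidence limits compared to the Bonferroni correction. In general, for small families of statements, the Bonferroni bounds may be tighter, but when the number of estimated values increases, the Working–Hotelling procedure will yield narrower limits. This is because the confidence level of Working–Hotelling–Scheffé bounds is exactly $1-\alpha$ when all values of the independent variables, i.e. $x_i\in\mathbb{R}$, are considered. Algebraically, the Working–Hotelling critical value $\pm W$ does not depend on how many mean responses are estimated, whereas the corresponding Bonferroni critical value $\pm t_{1-\alpha/(2g),\,df=n-p}$ grows with the number $g$ of estimates.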
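The trade-off between the two multipliers can be explored numerically. The sketch below is illustrative only: it treats the simple linear regression case ($p=2$), uses the two-sided Bonferroni multiplier $t_{1-\alpha/(2g),\,n-2}$, and, to stay within the Python standard library, computes the Student-t quantile by Simpson integration of the density and bisection. The helper names (`t_cdf`, `t_quantile`, `wh_multiplier`, `bonferroni_multiplier`) are hypothetical, and a statistics library would normally replace the hand-rolled quantile.

```python
import math


def t_cdf(t, v, steps=2000):
    """Student-t CDF at t >= 0 with v df, by Simpson's rule (steps must be even)."""
    log_c = (math.lgamma((v + 1) / 2) - math.lgamma(v / 2)
             - 0.5 * math.log(v * math.pi))

    def pdf(u):
        return math.exp(log_c - (v + 1) / 2 * math.log1p(u * u / v))

    h = t / steps
    total = pdf(0.0) + pdf(t)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * pdf(k * h)
    # The density is symmetric, so half the mass lies below zero.
    return 0.5 + total * h / 3


def t_quantile(prob, v):
    """Invert t_cdf by bisection; assumes 0.5 < prob < 1."""
    lo, hi = 0.0, 1.0
    while t_cdf(hi, v) < prob:
        hi *= 2
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, v) < prob:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def wh_multiplier(n, alpha):
    """W for simple linear regression (closed form for 2 numerator df)."""
    v = n - 2
    return math.sqrt(v * (alpha ** (-2.0 / v) - 1.0))


def bonferroni_multiplier(n, alpha, g):
    """Two-sided Bonferroni multiplier for g simultaneous intervals."""
    return t_quantile(1 - alpha / (2 * g), n - 2)


n, alpha = 15, 0.05
w = wh_multiplier(n, alpha)
for g in (1, 2, 5, 10):
    b = bonferroni_multiplier(n, alpha, g)
    tighter = "Bonferroni" if b < w else "Working-Hotelling"
    print(f"g={g:2d}  Bonferroni={b:.3f}  W={w:.3f}  tighter: {tighter}")
```

With $n=15$ and $\alpha=0.05$, the Bonferroni multiplier is smaller than $W$ for $g=2$ and larger for $g=5$, in line with the qualitative statement above.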
Another alternative to the Working–Hotelling–Scheffé band is the Gafarian band, which is used when a confidence band of equal width at all levels is needed.[5]
The Working–Hotelling procedure is based on the same principles as Scheffé's method, which gives family confidence intervals for all possible contrasts.[6] Their proofs are almost identical. This is because both methods estimate linear combinations of mean response at all factor levels. However, the Working–Hotelling procedure does not deal with contrasts but with different levels of the independent variable, so there is no requirement that the coefficients of the parameters sum up to zero. Therefore, it has one more degree of freedom.