Projection matrix
In statistics, the projection matrix (P),[1] sometimes also called the influence matrix[2] or hat matrix (H), maps the vector of response values (dependent variable values) to the vector of fitted values (or predicted values). It describes the influence each response value has on each fitted value.[3][4] The diagonal elements of the projection matrix are the leverages, which describe the influence each response value has on the fitted value for that same observation.
Definition
If the vector of response values is denoted by y and the vector of fitted values by ŷ, then
\hat{\mathbf{y}} = \mathbf{P} \mathbf{y}.
As ŷ is usually pronounced "y-hat", the projection matrix P is also named the hat matrix, as it "puts a hat on y".
Application for residuals
The formula for the vector of residuals r can also be expressed compactly using the projection matrix:
\mathbf{r} = \mathbf{y} - \hat{\mathbf{y}} = \mathbf{y} - \mathbf{P} \mathbf{y} = \left(\mathbf{I} - \mathbf{P}\right) \mathbf{y},
where I is the identity matrix. The matrix M := I − P is sometimes referred to as the residual maker matrix or the annihilator matrix.
The covariance matrix of the residuals r, by error propagation, equals
\Sigma_r = \left(I-P\right)^\textsf{T} \Sigma \left(I-P\right),
where Σ is the covariance matrix of the error vector (and by extension, the response vector as well). For the case of linear models with independent and identically distributed errors in which Σ = σ²I, this reduces to:
\Sigma_r = \left(I-P\right) \sigma^2.
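The residual formulas above can be checked numerically. The following is a minimal sketch, assuming the ordinary least squares form of P given later in the article; the design matrix, the responses, and the error variance `sigma2` are made-up illustrative values, not data from any source.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6
# Hypothetical design matrix: an intercept column plus one random regressor.
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = rng.normal(size=n)                     # hypothetical response values

P = X @ np.linalg.solve(X.T @ X, X.T)      # projection (hat) matrix, OLS form
M = np.eye(n) - P                          # residual maker (annihilator) matrix
r = M @ y                                  # residuals r = (I - P) y

sigma2 = 2.0                               # assumed i.i.d. error variance
Sigma = sigma2 * np.eye(n)                 # error covariance Sigma = sigma^2 I
Sigma_r = M.T @ Sigma @ M                  # general error-propagation formula

print(np.allclose(r, y - P @ y))           # residuals agree with y - P y
print(np.allclose(Sigma_r, sigma2 * M))    # reduces to (I - P) sigma^2
```

The reduction in the last line relies on M being symmetric and idempotent, so that MᵀM = M.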
Intuition
From the figure, it is clear that the point in the column space of A closest to the vector b is Ax, and that it is the point at which we can draw a line from b orthogonal to the column space of A. A vector that is orthogonal to the column space of a matrix is in the null space of the matrix transpose, so
A^\textsf{T}\left(b - Ax\right) = 0.
From there, one rearranges, so
\begin{align}
&& A^\textsf{T}b &- A^\textsf{T}Ax = 0 \\
\Rightarrow && A^\textsf{T}b &= A^\textsf{T}Ax \\
\Rightarrow && x &= \left(A^\textsf{T}A\right)^{-1}A^\textsf{T}b
\end{align}
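Therefore, since Ax lies in the column space of A, the projection matrix, which maps b onto Ax, follows from substituting this x into Ax. This gives Ax = A(AᵀA)⁻¹Aᵀb, so the projection matrix is
A\left(A^\textsf{T}A\right)^{-1}A^\textsf{T}.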
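A small numerical instance of this derivation may help. The sketch below uses a made-up 3×2 matrix `A` and vector `b` (illustrative values only), solves the normal equations for x, and checks that b − Ax is orthogonal to the column space of A.

```python
import numpy as np

# Illustrative values, chosen so the arithmetic is easy to follow by hand.
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
b = np.array([6.0, 0.0, 0.0])

# Normal equations: A^T A x = A^T b
x = np.linalg.solve(A.T @ A, A.T @ b)      # x = [5, -3]
proj = A @ x                               # closest point Ax = [5, 2, -1]

# The same point obtained by applying the projection matrix to b.
P = A @ np.linalg.inv(A.T @ A) @ A.T

print(x, proj)
print(A.T @ (b - proj))                    # approximately [0, 0]
print(np.allclose(P @ b, proj))
```

Here AᵀA = [[3, 3], [3, 5]] and Aᵀb = [6, 0], so the solution x = (5, −3) can be confirmed by hand.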
Linear model
Suppose that we wish to estimate a linear model using linear least squares. The model can be written as
y=X\boldsymbol\beta+\boldsymbol\varepsilon,
where X is a matrix of explanatory variables (the design matrix), β is a vector of unknown parameters to be estimated, and ε is the error vector.
Many types of models and techniques are subject to this formulation. A few examples are linear least squares, smoothing splines, regression splines, local regression, kernel regression, and linear filtering.
Ordinary least squares
When the weights for each observation are identical and the errors are uncorrelated, the estimated parameters are
\hat{\boldsymbol\beta} = \left(X^\textsf{T}X\right)^{-1}X^\textsf{T}y,
so the fitted values are
\hat{\mathbf{y}} = \mathbf{X} \hat{\boldsymbol\beta} = \mathbf{X} \left(\mathbf{X}^\textsf{T} \mathbf{X}\right)^{-1} \mathbf{X}^\textsf{T} \mathbf{y}.
Therefore, the projection matrix (and hat matrix) is given by
P := X\left(X^\textsf{T}X\right)^{-1}X^\textsf{T}.
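As an illustration, the ordinary least squares hat matrix and its diagonal (the leverages) can be computed directly for a small data set. This is a sketch with made-up data; forming the explicit inverse is done here only to mirror the formula and is not generally recommended for large or ill-conditioned problems.

```python
import numpy as np

# Made-up data: four clustered x values and one far from the rest.
x = np.array([1.0, 2.0, 3.0, 4.0, 10.0])
y = np.array([1.1, 1.9, 3.2, 3.9, 10.5])
X = np.column_stack([np.ones_like(x), x])  # design matrix with intercept

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y               # estimated parameters
P = X @ XtX_inv @ X.T                      # projection (hat) matrix
y_hat = P @ y                              # fitted values
leverages = np.diag(P)                     # diagonal elements of P

print(np.allclose(y_hat, X @ beta_hat))    # both routes give the same fit
print(leverages)                           # last observation has leverage 0.92
print(leverages.sum())                     # equals the number of parameters, 2
```

For simple regression with an intercept the leverage is 1/n + (xᵢ − x̄)²/Σ(xⱼ − x̄)², which for the last point is 1/5 + 36/50 = 0.92.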
Weighted and generalized least squares
The above may be generalized to the cases where the weights are not identical and/or the errors are correlated. Suppose that the covariance matrix of the errors is Σ. Then since
\hat{\boldsymbol\beta}_\text{GLS} = \left(X^\textsf{T}\Sigma^{-1}X\right)^{-1}X^\textsf{T}\Sigma^{-1}y,
the hat matrix is thus
H = X\left(X^\textsf{T}\Sigma^{-1}X\right)^{-1}X^\textsf{T}\Sigma^{-1}
and again it may be seen that
H^2 = H \cdot H = H,
though now it is no longer symmetric.
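The two claims about the generalized hat matrix (idempotent, but in general not symmetric) can be checked on a small example. The sketch below assumes a diagonal, heteroscedastic Σ with made-up variances; all values are illustrative.

```python
import numpy as np

n = 5
X = np.column_stack([np.ones(n), np.arange(n, dtype=float)])
# Assumed error covariance: uncorrelated errors with unequal variances.
Sigma = np.diag([1.0, 2.0, 3.0, 4.0, 5.0])
Sigma_inv = np.linalg.inv(Sigma)

# H = X (X^T Sigma^{-1} X)^{-1} X^T Sigma^{-1}
H = X @ np.linalg.solve(X.T @ Sigma_inv @ X, X.T @ Sigma_inv)

print(np.allclose(H @ H, H))               # idempotent
print(np.allclose(H, H.T))                 # False: not symmetric
print(np.allclose(H @ X, X))               # still leaves X unchanged
```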
Properties
The projection matrix has a number of useful algebraic properties.[5][6] In the language of linear algebra, the projection matrix is the orthogonal projection onto the column space of the design matrix X. (Note that
\left(X^\textsf{T}X\right)^{-1}X^\textsf{T}
is the pseudoinverse of X.) Some facts of the projection matrix in this setting are summarized as follows:
- The residual vector (I − P)y = y − Py is orthogonal to the column space of X.
- P is symmetric, and so is M := I − P.
- P is idempotent: P² = P, and so is M.
- If X is an n × r matrix with rank(X) = r, then rank(P) = r.
- The eigenvalues of P consist of r ones and n − r zeros, while the eigenvalues of M consist of n − r ones and r zeros.[7]
- X is invariant under P: PX = X, hence (I − P)X = 0.
- \left(I-P\right)P = P\left(I-P\right) = 0.
- P is unique for certain subspaces.
The projection matrix corresponding to a linear model is symmetric and idempotent, that is, P² = P. However, this is not always the case; in locally weighted scatterplot smoothing (LOESS), for example, the hat matrix is in general neither symmetric nor idempotent.
For linear models, the trace of the projection matrix is equal to the rank of X, which is the number of independent parameters of the linear model.[8] For other models such as LOESS that are still linear in the observations y, the projection matrix can be used to define the effective degrees of freedom of the model.
Practical applications of the projection matrix in regression analysis include leverage and Cook's distance, which are concerned with identifying influential observations, i.e. observations which have a large effect on the results of a regression.
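Several of the algebraic properties listed in this section can be verified numerically. The following sketch uses a random 8×3 design matrix (an assumption made purely for illustration; a random Gaussian matrix has full column rank with probability one) and collects each check in a dictionary.

```python
import numpy as np

rng = np.random.default_rng(42)
n, r = 8, 3
X = rng.normal(size=(n, r))                # hypothetical full-column-rank design matrix
P = X @ np.linalg.solve(X.T @ X, X.T)      # orthogonal projection onto col(X)
M = np.eye(n) - P                          # residual maker matrix

checks = {
    "P symmetric": np.allclose(P, P.T),
    "P idempotent": np.allclose(P @ P, P),
    "M idempotent": np.allclose(M @ M, M),
    "PX = X": np.allclose(P @ X, X),
    "(I - P)P = 0": np.allclose(M @ P, 0),
    "trace(P) = rank(X)": bool(np.isclose(np.trace(P), np.linalg.matrix_rank(X))),
}

# P is symmetric, so eigvalsh applies; eigenvalues come back in ascending order.
eigvals = np.linalg.eigvalsh(P)

for name, ok in checks.items():
    print(f"{name}: {ok}")
print(np.round(eigvals, 6))                # n - r zeros followed by r ones
```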
Blockwise formula
Suppose the design matrix X can be decomposed by columns as
X = \begin{bmatrix} A & B \end{bmatrix}.
Define the hat or projection operator as
P[X] := X\left(X^\textsf{T}X\right)^{-1}X^\textsf{T}.
Similarly, define the residual operator as
M[X] := I - P[X].
Then the projection matrix can be decomposed as follows:[9]
P[X] = P[A] + P\big[M[A]B\big],
where, e.g.,
P[A] = A\left(A^\textsf{T}A\right)^{-1}A^\textsf{T}
and
M[A] = I - P[A].
There are a number of applications of such a decomposition. In the classical application A is a column of all ones, which allows one to analyze the effects of adding an intercept term to a regression. Another use is in the fixed effects model, where A is a large sparse matrix of the dummy variables for the fixed effect terms. One can use this partition to compute the hat matrix of X without explicitly forming the matrix X, which might be too large to fit into computer memory.
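The blockwise decomposition can be checked on a small example. This sketch takes A to be a column of ones (the classical intercept case) and B to be random regressors; the helper name `proj` and the data are assumptions introduced only for illustration.

```python
import numpy as np

def proj(Z):
    """Return P[Z] = Z (Z^T Z)^{-1} Z^T, assuming Z has full column rank."""
    return Z @ np.linalg.solve(Z.T @ Z, Z.T)

rng = np.random.default_rng(7)
n = 10
A = np.ones((n, 1))                        # intercept column
B = rng.normal(size=(n, 2))                # hypothetical regressors
X = np.hstack([A, B])                      # X = [A  B]

M_A = np.eye(n) - proj(A)                  # residual operator M[A]
P_blockwise = proj(A) + proj(M_A @ B)      # P[A] + P[M[A] B]

print(np.allclose(P_blockwise, proj(X)))   # matches the direct computation
# With A a column of ones, M[A] B is B with its column means removed.
print(np.allclose(M_A @ B, B - B.mean(axis=0)))
```

In the intercept case this shows that adding a constant term is equivalent to centering the remaining regressors before projecting.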
History
The hat matrix was introduced by John Wilder Tukey in 1972. An article by Hoaglin and Welsch (1978) gives the properties of the matrix and also many examples of its application.
Notes and References
- Basilevsky, Alexander (2005). Applied Matrix Algebra in the Statistical Sciences. Dover. pp. 160–176. ISBN 0-486-44538-0.
- "Data Assimilation: Observation influence diagnostic of a data assimilation system". Archived 2014-09-03. https://web.archive.org/web/20140903115021/http://old.ecmwf.int/newsevents/training/lecture_notes/pdf_files/ASSIM/ObservationInfluence.pdf
- Hoaglin, David C.; Welsch, Roy E. (February 1978). "The Hat Matrix in Regression and ANOVA". Vol. 32, no. 1, pp. 17–22. doi:10.2307/2683469. JSTOR 2683469. hdl:1721.1/1920.
- Freedman, David A. (2009). Statistical Models: Theory and Practice. Cambridge University Press.
- Gans, P. (1992). Data Fitting in the Chemical Sciences. Wiley. ISBN 0-471-93412-7.
- Draper, N. R.; Smith, H. (1998). Applied Regression Analysis. Wiley. ISBN 0-471-17082-8.
- Amemiya, Takeshi (1985). Advanced Econometrics. Cambridge: Harvard University Press. pp. 460–461. ISBN 0-674-00560-0.
- "Proof that trace of 'hat' matrix in linear regression is rank of X". Stack Exchange. April 13, 2017.
- Rao, C. Radhakrishna; Toutenburg, Helge; Shalabh; Heumann, Christian (2008). Linear Models and Generalizations (3rd ed.). Berlin: Springer. p. 323. ISBN 978-3-540-74226-5.