In statistics, the residual sum of squares (RSS), also known as the sum of squared residuals (SSR) or the sum of squared estimates of errors (SSE), is the sum of the squares of residuals (deviations of the predicted values from the actual empirical values of the data). It is a measure of the discrepancy between the data and an estimation model, such as a linear regression. A small RSS indicates a tight fit of the model to the data. It is used as an optimality criterion in parameter selection and model selection.
In general, total sum of squares = explained sum of squares + residual sum of squares. For a proof of this in the multivariate ordinary least squares (OLS) case, see partitioning in the general OLS model.
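As a quick numerical illustration of this decomposition (a minimal sketch, not from the source; the data and variable names are made up), the following Python snippet fits an ordinary least squares line with an intercept and checks that the total sum of squares equals the explained plus the residual sum of squares:

```python
import numpy as np

# Toy data (made up for illustration).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 2.9, 3.7, 4.2, 5.1])

# OLS fit with an intercept: design matrix [1, x].
X = np.column_stack([np.ones_like(x), x])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ beta_hat

tss = np.sum((y - y.mean()) ** 2)       # total sum of squares
ess = np.sum((y_hat - y.mean()) ** 2)   # explained sum of squares
rss = np.sum((y - y_hat) ** 2)          # residual sum of squares

print(tss, ess + rss)  # the two agree up to floating-point error
```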
In a model with a single explanatory variable, RSS is given by:[1]
\operatorname{RSS} = \sum_{i=1}^{n} \left(y_i - f(x_i)\right)^2
where y_i is the ith value of the variable to be predicted, x_i is the ith value of the explanatory variable, and f(x_i) is the predicted value of y_i (also termed \hat{y}_i). In a standard linear simple regression model, y_i = \alpha + \beta x_i + \varepsilon_i, where \alpha and \beta are coefficients, y and x are the regressand and regressor, respectively, and \varepsilon is the error term. The sum of squares of residuals is the sum of squares of the estimates \widehat{\varepsilon}_i; that is
\operatorname{RSS} = \sum_{i=1}^{n} \left(\widehat{\varepsilon}_i\right)^2 = \sum_{i=1}^{n} \left(y_i - (\widehat{\alpha} + \widehat{\beta} x_i)\right)^2
where \widehat{\alpha} is the estimated value of the constant term \alpha and \widehat{\beta} is the estimated value of the slope coefficient \beta.
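To make the formula concrete, here is a short Python sketch (the data and names are illustrative, not from the source) that estimates \widehat{\alpha} and \widehat{\beta} by ordinary least squares and evaluates the RSS as the sum of squared residuals:

```python
import numpy as np

# Illustrative data (made up).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.1, 5.9, 8.2, 9.8])

# OLS estimates for the simple regression y_i = alpha + beta * x_i + eps_i.
beta_hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
alpha_hat = y.mean() - beta_hat * x.mean()

# RSS = sum of squared residuals.
residuals = y - (alpha_hat + beta_hat * x)
rss = np.sum(residuals ** 2)
print(alpha_hat, beta_hat, rss)
```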
The general regression model with n observations and k explanators, the first of which is a constant unit vector whose coefficient is the regression intercept, is
y = X\beta + e
where y is an n × 1 vector of dependent variable observations, each column of the n × k matrix X is a vector of observations on one of the k explanators, \beta is a k × 1 vector of true coefficients, and e is an n × 1 vector of the true underlying errors.
The ordinary least squares estimator \widehat{\beta} is the least-squares solution of the overdetermined system X\widehat{\beta} = y; it satisfies the normal equations
X^{\operatorname{T}}X\widehat{\beta} = X^{\operatorname{T}}y,
so that
\widehat{\beta} = (X^{\operatorname{T}}X)^{-1}X^{\operatorname{T}}y.
The residual vector is
\widehat{e} = y - X\widehat{\beta} = y - X(X^{\operatorname{T}}X)^{-1}X^{\operatorname{T}}y,
so the residual sum of squares is
\operatorname{RSS} = \widehat{e}^{\operatorname{T}}\widehat{e} = \|\widehat{e}\|^2
(equivalent to the square of the norm of residuals). In full:
\operatorname{RSS} = y^{\operatorname{T}}y - y^{\operatorname{T}}X(X^{\operatorname{T}}X)^{-1}X^{\operatorname{T}}y = y^{\operatorname{T}}[I - X(X^{\operatorname{T}}X)^{-1}X^{\operatorname{T}}]y = y^{\operatorname{T}}[I - H]y
where H is the hat matrix, or the projection matrix in linear regression.
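The matrix identities above can be checked numerically; the sketch below (illustrative data, variable names are my own) builds the hat matrix H and confirms that \widehat{e}^{\operatorname{T}}\widehat{e} and y^{\operatorname{T}}[I - H]y give the same RSS:

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 20, 3

# Design matrix whose first column is the constant unit vector.
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
y = rng.normal(size=n)

# OLS estimate and residual vector.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
e_hat = y - X @ beta_hat

# Hat (projection) matrix H = X (X^T X)^{-1} X^T.
H = X @ np.linalg.inv(X.T @ X) @ X.T
I = np.eye(n)

rss_residuals = e_hat @ e_hat        # e_hat^T e_hat
rss_projection = y @ (I - H) @ y     # y^T (I - H) y
print(rss_residuals, rss_projection)  # equal up to floating-point error
```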
The least-squares regression line is given by
y = ax + b
where
b = \bar{y} - a\bar{x} \quad\text{and}\quad a = \frac{S_{xy}}{S_{xx}},
with
S_{xy} = \sum_{i=1}^{n} (\bar{x} - x_i)(\bar{y} - y_i) \quad\text{and}\quad S_{xx} = \sum_{i=1}^{n} (\bar{x} - x_i)^2.
Therefore,
\begin{align}
\operatorname{RSS} &= \sum_{i=1}^{n} (y_i - f(x_i))^2 = \sum_{i=1}^{n} (y_i - (ax_i + b))^2 = \sum_{i=1}^{n} (y_i - ax_i - \bar{y} + a\bar{x})^2 \\[5pt]
&= \sum_{i=1}^{n} \left(a(\bar{x} - x_i) - (\bar{y} - y_i)\right)^2 = a^2 S_{xx} - 2aS_{xy} + S_{yy} = S_{yy} - aS_{xy} = S_{yy}\left(1 - \frac{S_{xy}^2}{S_{xx}S_{yy}}\right)
\end{align}
where
S_{yy} = \sum_{i=1}^{n} (\bar{y} - y_i)^2.
The Pearson product-moment correlation is given by
r = \frac{S_{xy}}{\sqrt{S_{xx}\,S_{yy}}},
and therefore
\operatorname{RSS} = S_{yy}\left(1 - r^2\right).
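Finally, the relation \operatorname{RSS} = S_{yy}(1 - r^2) can be verified numerically; in the sketch below (illustrative data, names are my own), r is computed with np.corrcoef and the RSS from the fitted least-squares line:

```python
import numpy as np

# Illustrative data (made up).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([1.2, 2.3, 2.9, 4.1, 4.8, 6.2])

Sxx = np.sum((x.mean() - x) ** 2)
Syy = np.sum((y.mean() - y) ** 2)
Sxy = np.sum((x.mean() - x) * (y.mean() - y))

# Least-squares line y = a x + b.
a = Sxy / Sxx
b = y.mean() - a * x.mean()
rss = np.sum((y - (a * x + b)) ** 2)

# Pearson correlation and the identity RSS = Syy * (1 - r^2).
r = np.corrcoef(x, y)[0, 1]
print(rss, Syy * (1 - r ** 2))  # equal up to floating-point error
```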