Weighted least squares (WLS), also known as weighted linear regression,[1][2] is a generalization of ordinary least squares and linear regression in which knowledge of the unequal variance of observations (heteroscedasticity) is incorporated into the regression. WLS is also a specialization of generalized least squares, in which all the off-diagonal entries of the covariance matrix of the errors are null.
The fit of a model to a data point is measured by its residual, r_i, the difference between the measured value of the dependent variable, y_i, and the value predicted by the model:

r_i(\boldsymbol\beta) = y_i - f(\mathbf{x}_i, \boldsymbol\beta).
If the errors are uncorrelated and have equal variance, then the function

S(\boldsymbol\beta) = \sum_i r_i(\boldsymbol\beta)^2

is minimised at \hat{\boldsymbol\beta}, where

\frac{\partial S}{\partial \beta_j}(\hat{\boldsymbol\beta}) = 0.

The Gauss–Markov theorem shows that, when this is so, \hat{\boldsymbol\beta} is a best linear unbiased estimator (BLUE). If, however, the measurements are uncorrelated but have different uncertainties, a modified approach might be adopted: a weighted sum of squared residuals is minimized, and \hat{\boldsymbol\beta} is the BLUE if each weight is equal to the reciprocal of the variance of the measurement,

S(\boldsymbol\beta) = \sum_{i=1}^n w_i\, r_i(\boldsymbol\beta)^2, \qquad w_i = \frac{1}{\sigma_i^2}.
The gradient equations for this sum of squares are

-2 \sum_i w_i\, r_i\, \frac{\partial f(\mathbf{x}_i, \boldsymbol\beta)}{\partial \beta_j} = 0, \qquad j = 1, \ldots, m,

which, in a linear least squares system, give the modified normal equations

\sum_{i=1}^n \sum_{k=1}^m X_{ij}\, w_i\, X_{ik}\, \hat\beta_k = \sum_{i=1}^n X_{ij}\, w_i\, y_i, \qquad j = 1, \ldots, m.

The matrix X is the design matrix, with elements X_{ij} = \partial f(\mathbf{x}_i, \boldsymbol\beta) / \partial \beta_j. These are a set of m simultaneous linear equations that can be solved for \hat{\boldsymbol\beta}.
When the observational errors are uncorrelated and the weight matrix, W = Ω−1, is diagonal, these may be written in matrix form as

\mathbf{X}^{\mathsf T} W \mathbf{X}\, \hat{\boldsymbol\beta} = \mathbf{X}^{\mathsf T} W \mathbf{y}.
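As a minimal numerical sketch of these normal equations, the following fits a straight line to a small made-up data set with known, unequal error variances (the data and variances are invented for illustration):

```python
import numpy as np

# Hypothetical data: straight-line model y = b0 + b1*x with
# heteroscedastic noise; the variances sigma2 are assumed known.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.1])
sigma2 = np.array([0.1, 0.1, 0.5, 0.5, 1.0])   # unequal error variances

X = np.column_stack([np.ones_like(x), x])       # design matrix
W = np.diag(1.0 / sigma2)                       # W = diag(1/sigma_i^2)

# Modified normal equations: (X^T W X) beta_hat = X^T W y
beta_hat = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)
```

For a quick cross-check, numpy's polyfit accepts weights of the form 1/sigma and minimizes the same weighted objective, so it should reproduce the estimate.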
If the errors are correlated, the resulting estimator is the BLUE if the weight matrix is equal to the inverse of the variance-covariance matrix of the observations.
When the errors are uncorrelated, it is convenient to simplify the calculations by factoring the weight matrix as

w_{ii} = \sqrt{W_{ii}}.

The normal equations can then be written in the same form as ordinary least squares,

\left(\mathbf{X}'^{\mathsf T} \mathbf{X}'\right) \hat{\boldsymbol\beta} = \mathbf{X}'^{\mathsf T} \mathbf{y}',

where we define the following scaled matrix and vector:

\mathbf{X}' = \operatorname{diag}(\mathbf{w})\, \mathbf{X}, \qquad \mathbf{y}' = \operatorname{diag}(\mathbf{w})\, \mathbf{y} = \mathbf{y} \oslash \boldsymbol\sigma.

This is a type of whitening transformation; the last expression involves an entrywise division.
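The equivalence between the scaled ordinary least squares problem and the weighted normal equations can be checked directly; a sketch with invented data and assumed-known standard deviations:

```python
import numpy as np

# Whitening trick: scale each row by w_i = 1/sigma_i, then solve an
# ordinary least squares problem. Data here are hypothetical.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 5.0, 8)
sigma = np.array([0.2, 0.2, 0.5, 0.5, 1.0, 1.0, 2.0, 2.0])
y = 2.0 + 3.0 * x + rng.normal(scale=sigma)

X = np.column_stack([np.ones_like(x), x])
w = 1.0 / sigma                      # w_ii = sqrt(W_ii) = 1/sigma_i

Xp = w[:, None] * X                  # X' = diag(w) X
yp = y / sigma                       # y' = entrywise division by sigma

# Ordinary least squares on the whitened system:
beta_whitened, *_ = np.linalg.lstsq(Xp, yp, rcond=None)

# Same estimate from the weighted normal equations directly:
W = np.diag(1.0 / sigma**2)
beta_direct = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)
```

The two routes are algebraically identical, so the estimates agree to machine precision regardless of the data.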
For non-linear least squares systems a similar argument shows that the normal equations should be modified as follows:

\left(\mathbf{J}^{\mathsf T} W \mathbf{J}\right) \boldsymbol{\Delta\beta} = \mathbf{J}^{\mathsf T} W \boldsymbol{\Delta y},

where J is the Jacobian of the model function with respect to the parameters, and the equations are solved iteratively for the parameter shifts \boldsymbol{\Delta\beta}.
Note that for empirical tests the appropriate W is not known for sure and must be estimated. For this, feasible generalized least squares (FGLS) techniques may be used; in this case the FGLS procedure is specialized for a diagonal covariance matrix, thus yielding a feasible weighted least squares solution.
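As a sketch of what such a feasible procedure can look like in the diagonal case, the following two-step fit first runs ordinary least squares, then estimates per-regime variances from the squared residuals, and refits with the estimated weights. The data, the two-regime variance structure, and all names are invented for illustration:

```python
import numpy as np

# Feasible WLS sketch: W is unknown, so (1) fit OLS, (2) estimate a
# variance per regime from squared residuals, (3) refit with 1/var_hat.
rng = np.random.default_rng(1)
n = 200
x = np.linspace(0.0, 1.0, n)
sigma_true = np.where(x < 0.5, 0.2, 1.0)          # two variance regimes
y = 1.0 + 2.0 * x + rng.normal(scale=sigma_true)

X = np.column_stack([np.ones(n), x])

# Step 1: ordinary least squares.
beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta_ols

# Step 2: estimated variance in each regime.
var_hat = np.where(x < 0.5,
                   np.mean(resid[x < 0.5] ** 2),
                   np.mean(resid[x >= 0.5] ** 2))

# Step 3: weighted fit with the estimated weights.
W = np.diag(1.0 / var_hat)
beta_fwls = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)
```

Because the model contains a constant term, the weighted residuals of the refit sum to zero exactly, which serves as an internal consistency check.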
If the uncertainty of the observations is not known from external sources, then the weights could be estimated from the given observations. This can be useful, for example, to identify outliers. After the outliers have been removed from the data set, the weights should be reset to one.[3]
In some cases the observations may be weighted, for example when they are not equally reliable. In this case, one can minimize the weighted sum of squares

\underset{\boldsymbol\beta}{\operatorname{arg\,min}}\; \sum_{i=1}^n w_i \left| y_i - f(\mathbf{x}_i, \boldsymbol\beta) \right|^2,

where w_i > 0 is the weight of the ith observation, and W is the diagonal matrix of such weights.

The weights should, ideally, be equal to the reciprocal of the variance of the measurement. (This implies that the observations are uncorrelated. If the observations are correlated, the expression

S = \sum_k \sum_j r_k\, W_{kj}\, r_j

applies. In this case the weight matrix should ideally be equal to the inverse of the variance-covariance matrix of the observations.)[3] The normal equations are then:

\mathbf{X}^{\mathsf T} W \mathbf{X}\, \hat{\boldsymbol\beta} = \mathbf{X}^{\mathsf T} W \mathbf{y}.
This method is used in iteratively reweighted least squares.
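To illustrate that connection, here is a minimal iteratively reweighted least squares sketch in which the weights at each step are taken as 1/|r_i| from the previous residuals, which drives the fit toward a least-absolute-deviations solution. The data set, including the deliberate outlier, is made up:

```python
import numpy as np

# IRLS sketch: repeatedly solve a weighted least squares problem, with
# weights recomputed from the current residuals (here w_i = 1/|r_i|).
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([0.1, 1.0, 2.1, 2.9, 4.2, 12.0])   # last point is an outlier
X = np.column_stack([np.ones_like(x), x])

beta = np.linalg.lstsq(X, y, rcond=None)[0]      # start from OLS
for _ in range(50):
    r = y - X @ beta
    w = 1.0 / np.maximum(np.abs(r), 1e-8)        # guard tiny residuals
    W = np.diag(w)
    beta = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)
```

The inliers lie roughly on y = x, so the reweighting progressively discounts the outlier and the slope settles near 1, whereas the plain OLS start is pulled toward 2 by the outlier.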
The estimated parameter values are linear combinations of the observed values:

\hat{\boldsymbol\beta} = \left(\mathbf{X}^{\mathsf T} W \mathbf{X}\right)^{-1} \mathbf{X}^{\mathsf T} W \mathbf{y}.

Therefore, an expression for the estimated variance-covariance matrix of the parameter estimates can be obtained by error propagation from the errors in the observations. Let the variance-covariance matrix for the observations be denoted by M and that of the estimated parameters by M^\beta. Then

M^\beta = \left(\mathbf{X}^{\mathsf T} W \mathbf{X}\right)^{-1} \mathbf{X}^{\mathsf T} W\, M\, W^{\mathsf T} \mathbf{X} \left(\mathbf{X}^{\mathsf T} W \mathbf{X}\right)^{-1}.
When W = M^{-1}, this simplifies to

M^\beta = \left(\mathbf{X}^{\mathsf T} W \mathbf{X}\right)^{-1}.

In this case \sigma^2 is approximated by the reduced chi-squared \chi^2_\nu:

\chi^2_\nu = \frac{S}{\nu}, \qquad M^\beta = \chi^2_\nu \left(\mathbf{X}^{\mathsf T} W \mathbf{X}\right)^{-1},

where S is the minimum value of the weighted objective function:

S = \mathbf{r}^{\mathsf T} W \mathbf{r} = \min_{\boldsymbol\beta} \sum_i w_i\, r_i(\boldsymbol\beta)^2.

The denominator, \nu = n - m, is the number of degrees of freedom.
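The quantities S, ν, and the scaled covariance matrix can be assembled numerically; a sketch with invented data and assumed-known variances, using the linear model and diagonal weights as above:

```python
import numpy as np

# Parameter covariance sketch: M_beta = chi2_red * (X^T W X)^{-1},
# with chi2_red = S / nu. Data and variances are hypothetical.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.2, 2.8, 5.1, 7.2, 8.9, 11.1])
sigma2 = np.array([0.2, 0.2, 0.2, 0.5, 0.5, 0.5])

X = np.column_stack([np.ones_like(x), x])
W = np.diag(1.0 / sigma2)
beta_hat = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)

r = y - X @ beta_hat
S = r @ W @ r                      # minimum weighted objective function
nu = len(y) - X.shape[1]           # degrees of freedom, nu = n - m
chi2_red = S / nu                  # reduced chi-squared

M_beta = chi2_red * np.linalg.inv(X.T @ W @ X)   # parameter covariance
```

A covariance matrix produced this way is symmetric with positive diagonal entries, which makes for an easy sanity check.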
In all cases, the variance of the parameter estimate \hat\beta_i is given by M^\beta_{ii} and the covariance between the estimates \hat\beta_i and \hat\beta_j is given by M^\beta_{ij}. The standard deviation is the square root of the variance,

\sigma_i = \sqrt{M^\beta_{ii}},

and the correlation coefficient is given by

\rho_{ij} = M^\beta_{ij} / (\sigma_i \sigma_j).
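These definitions translate directly into code; a sketch using a made-up 2×2 parameter covariance matrix:

```python
import numpy as np

# Standard deviations and correlation coefficients from a parameter
# covariance matrix M_beta (the values here are invented).
M_beta = np.array([[0.04, -0.01],
                   [-0.01, 0.09]])

sigma = np.sqrt(np.diag(M_beta))               # sigma_i = sqrt(M_ii)
rho = M_beta / np.outer(sigma, sigma)          # rho_ij = M_ij/(sigma_i sigma_j)
```

By construction the diagonal of rho is exactly 1, and the off-diagonal entries lie in [-1, 1].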
See main article: Confidence interval. It is often assumed, for want of any concrete evidence but often appealing to the central limit theorem (see Normal distribution#Occurrence and applications) that the error on each observation belongs to a normal distribution with a mean of zero and standard deviation \sigma. Under that assumption, the following probabilities can be derived for a single scalar parameter estimate in terms of its estimated standard error se_\beta:

68% that the interval \hat\beta \pm se_\beta encompasses the true coefficient value

95% that the interval \hat\beta \pm 2\,se_\beta encompasses the true coefficient value

99% that the interval \hat\beta \pm 2.5\,se_\beta encompasses the true coefficient value
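The quoted coverage levels follow from the normal distribution: the probability that the estimate lies within k standard errors of the true value is erf(k/√2). A quick numerical check (nothing here is specific to any data set):

```python
import math

# Normal-theory coverage: P(|beta_hat - beta| < k * se) = erf(k / sqrt(2)).
for k in (1.0, 2.0, 2.5):
    p = math.erf(k / math.sqrt(2.0))
    print(f"k = {k}: coverage = {p:.4f}")
```

The exact values are 0.6827, 0.9545, and 0.9876, matching the rounded 68%, 95%, and 99% figures in the text.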
The assumption is not unreasonable when n ≫ m. If the experimental errors are normally distributed the parameters will belong to a Student's t-distribution with n − m degrees of freedom. When n ≫ m, Student's t-distribution approximates a normal distribution. Note, however, that these confidence limits cannot take systematic error into account. Also, parameter errors should be quoted to one significant figure only, as they are subject to sampling error.[4]
When the number of observations is relatively small, Chebyshev's inequality can be used for an upper bound on probabilities, regardless of any assumptions about the distribution of experimental errors: the maximum probabilities that a parameter will be more than 1, 2, or 3 standard deviations away from its expectation value are 100%, 25% and 11% respectively.
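These distribution-free bounds come directly from Chebyshev's inequality, P(|X − μ| ≥ kσ) ≤ 1/k²; a one-line check:

```python
# Chebyshev's inequality: P(|X - mu| >= k*sigma) <= 1/k**2, with no
# distributional assumption. Reproduces the bounds quoted in the text.
bounds = {k: min(1.0, 1.0 / k**2) for k in (1, 2, 3)}
```

For k = 1, 2, 3 this gives 1, 0.25, and 1/9 ≈ 0.111, i.e. the 100%, 25%, and 11% figures.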
The residuals are related to the observations by

\hat{\mathbf r} = \mathbf{y} - \mathbf{X} \hat{\boldsymbol\beta} = (I - H)\, \mathbf{y},

where H is the idempotent matrix known as the hat matrix,

H = \mathbf{X} \left(\mathbf{X}^{\mathsf T} W \mathbf{X}\right)^{-1} \mathbf{X}^{\mathsf T} W,

and I is the identity matrix. The variance-covariance matrix of the residuals, M^r, is given by

M^r = (I - H)\, M\, (I - H)^{\mathsf T}.

Thus the residuals are correlated, even if the observations are not.
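The hat matrix and the residual covariance can be computed for a small example; the design and observation variances below are invented for illustration:

```python
import numpy as np

# Hat matrix H = X (X^T W X)^{-1} X^T W and residual covariance
# M_r = (I - H) M (I - H)^T, with W taken as the inverse of M.
x = np.array([0.0, 1.0, 2.0, 3.0])
X = np.column_stack([np.ones_like(x), x])
sigma2 = np.array([0.5, 1.0, 1.0, 2.0])
M = np.diag(sigma2)                  # observation covariance (diagonal)
W = np.linalg.inv(M)                 # weights = inverse covariance

H = X @ np.linalg.solve(X.T @ W @ X, X.T @ W)
I = np.eye(len(x))
M_r = (I - H) @ M @ (I - H).T
```

Even though M is diagonal (uncorrelated observations), M_r has nonzero off-diagonal entries, illustrating that the residuals are correlated; H is also idempotent, as the text states.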
When W = M^{-1}, this simplifies to

M^r = (I - H)\, M.
The sum of weighted residual values is equal to zero whenever the model function contains a constant term. Left-multiply the expression for the residuals by X^{\mathsf T} W:

\mathbf{X}^{\mathsf T} W \hat{\mathbf r} = \mathbf{X}^{\mathsf T} W \left(\mathbf{y} - \mathbf{X} \hat{\boldsymbol\beta}\right) = \mathbf{X}^{\mathsf T} W \mathbf{y} - \mathbf{X}^{\mathsf T} W \mathbf{X}\, \hat{\boldsymbol\beta} = \mathbf{0}.

Say, for example, that the first term of the model is a constant, so that X_{i1} = 1 for all i. In that case it follows that

\sum_i w_i\, \hat r_i = 0.
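This identity is easy to verify numerically; a sketch with arbitrary made-up data and weights:

```python
import numpy as np

# With a constant term in the model, the weighted residuals sum to zero,
# since X^T W r_hat = 0 and the first column of X is all ones.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([0.7, 2.1, 2.9, 4.4, 5.1])
w = np.array([1.0, 2.0, 0.5, 1.0, 3.0])

X = np.column_stack([np.ones_like(x), x])     # first column: X_i1 = 1
W = np.diag(w)
beta_hat = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)
r_hat = y - X @ beta_hat

weighted_sum = float(w @ r_hat)               # first row of X^T W r_hat
```

The value is zero up to floating-point round-off, for any choice of data and positive weights.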
Thus, in the motivational example, above, the fact that the sum of residual values is equal to zero is not accidental, but is a consequence of the presence of the constant term, α, in the model.
If experimental error follows a normal distribution, then, because of the linear relationship between residuals and observations, so should the residuals,[5] but since the observations are only a sample of the population of all possible observations, the residuals should belong to a Student's t-distribution. Studentized residuals are useful in making a statistical test for an outlier when a particular residual appears to be excessively large.