In applied statistics and geostatistics, regression-kriging (RK) is a spatial prediction technique that combines a regression of the dependent variable on auxiliary variables (such as parameters derived from digital elevation modelling, remote sensing/imagery, and thematic maps) with interpolation (kriging) of the regression residuals. It is mathematically equivalent to the interpolation method variously called universal kriging and kriging with external drift, where auxiliary predictors are used directly to solve the kriging weights.[1]
Regression-kriging is an implementation of the best linear unbiased predictor (BLUP) for spatial data, i.e. the best linear interpolator assuming the universal model of spatial variation. Matheron (1969) proposed that a value of a target variable at some location can be modeled as a sum of the deterministic and stochastic components:[2]
Z(s)=m(s)+\varepsilon'(s)+\varepsilon''
which he termed universal model of spatial variation. Both deterministic and stochastic components of spatial variation can be modeled separately. By combining the two approaches, we obtain:
\hatz(s0)=\hatm(s0)+\hate(s0)=
p | |
\sum\limits | |
k=0 |
{\hat\betak ⋅ qk(s0)}+
n | |
\sum\limits | |
i=1 |
λi ⋅ e(si)
where
\hatm(s0)
\hate(s0)
\hat\betak
\hat\beta0
λi
e(si)
{s
\hat\betak
\hat\betaGLS=\left(qT ⋅ C ⋅ q\right) ⋅ qT ⋅ C ⋅ z
where
\hat\betaGLS
C
{q
z
Once the deterministic part of variation has been estimated (regression-part), the residual can be interpolated with kriging and added to the estimated trend. The estimation of the residuals is an iterative process: first the deterministic part of variation is estimated using OLS, then the covariance function of the residuals is used to obtain the GLS coefficients. Next, these are used to re-compute the residuals, from which an updated covariance function is computed, and so on. Although this is by many geostatisticians recommended as the proper procedure, Kitanidis (1994) showed that use of the covariance function derived from the OLS residuals (i.e. a single iteration) is often satisfactory, because it is not different enough from the function derived after several iterations; i.e. it does not affect much the final predictions. Minasny and McBratney (2007) report similar results—it seems that using more higher quality data is more important than to use more sophisticated statistical methods.[4]
In matrix notation, regression-kriging is commonly written as:[5]
\hatzRK(s0)=
T | |
q | |
0 |
⋅ \hat\betaGLS+
T | |
λ | |
0 |
⋅ (z -q ⋅ \hat\betaGLS)
where
\hatz({s
{s
{q
p+1
λ0
n
\hat
2 | |
\sigma | |
RK |
(s0) =(C0+C1)-
T | |
c | |
0 |
⋅ C ⋅ c0+\left(q0 -qT ⋅ C ⋅ c0\right)T ⋅ \left(qT ⋅ C ⋅ q\right)-1 ⋅ \left(q0-qT ⋅ C ⋅ c0\right)
where
C0+C1
{c
Many (geo)statisticians believe that there is only one Best Linear Unbiased Prediction model for spatial data (e.g. regression-kriging), all other techniques such as ordinary kriging, environmental correlation, averaging of values per polygons or inverse distance interpolation can be seen as its special cases. If the residuals show no spatial auto-correlation (pure nugget effect), the regression-kriging converges to pure multiple linear regression, because the covariance matrix (
C
The geostatistical literature uses many different terms for what are essentially the same or at least very similar techniques. This confuses the users and distracts them from using the right technique for their mapping projects. In fact, both universal kriging, kriging with external drift, and regression-kriging are basically the same technique.
Matheron (1969) originally termed the technique Le krigeage universel, however, the technique was intended as a generalized case of kriging where the trend is modelled as a function of coordinates. Thus, many authors reserve the term universal kriging (UK) for the case when only the coordinates are used as predictors. If the deterministic part of variation (drift) is defined externally as a linear function of some auxiliary variables, rather than the coordinates, the term kriging with external drift (KED) is preferred (according to Hengl 2007, "About regression-kriging: From equations to case studies"). In the case of UK or KED, the predictions are made as with kriging, with the difference that the covariance matrix of residuals is extended with the auxiliary predictors. However, the drift and residuals can also be estimated separately and then summed. This procedure was suggested by Ahmed et al. (1987) and Odeh et al. (1995) later named it regression-kriging, while Goovaerts (1997) uses the term kriging with a trend model to refer to a family of interpolators, and refers to RK as simple kriging with varying local means. Minasny and McBratney (2007) simply call this technique Empirical Best Linear Unbiased Predictor i.e. E-BLUP.[7] [8] [9]
In the case of KED, predictions at new locations are made by:
\hat{z}KED(s0)=
KED | |
\sum\limits | |
i |
(s0) ⋅ z(si)
for
n | |
\sum\limits | |
i=1 |
KED | |
w | |
i |
(s0) ⋅ qk(si)=qk(s0)
for
k=1,\ldots,p
\hatzKED(s0)=
T | |
\delta | |
0 |
⋅ z
where
z
qk
({s
{\delta
KED | |
w | |
i |
p
z
n
KED | |
λ | |
0 |
=\left\{
KED | |
w | |
1 |
(s0),\ldots
KED | |
,w | |
n |
(s0),\varphi0(s0),\ldots,\varphip(s0)\right\}T=CKED ⋅
KED | |
c | |
0 |
where
{λ
\varphip
{C
{c
In the case of KED, the extended covariance matrix of residuals looks like this (Webster and Oliver, 2007; p. 183):[10]
CKED=\left[ \begin{array}{ccccccc} C(s1,s1)& … &C(s1,sn)&1&q1(s1)& … &qp(s1)\\ \vdots&&\vdots&\vdots&\vdots&&\vdots\\ C(sn,s1)& … &C(sn,sn)&1&q1(sn)& … &qp(sn)\\ 1& … &1&0&0& … &0\\ q1(s1)& … &q1(sn)&0&0& … &0\\ \vdots&&\vdots&\vdots&\vdots&&\vdots\\ qp(s1)& … &qp(sn)&0&0& … &0\end{array} \right]
and
KED | |
c | |
0 |
KED | |
c | |
0 |
=\left\{C(s0,s1 ),\ldots,C(s0,sn),q0(s0),q1(s0),\ldots,qp(s0) \right\}T;q0(s0)=1
Hence, KED looks exactly as ordinary kriging, except the covariance matrix/vector are extended with values of auxiliary predictors.
Although the KED seems, at first glance, to be computationally more straightforward than RK, the parameters of the variogram for KED must also be estimated from regression residuals, thus requiring a separate regression modelling step. This regression should be GLS because of the likely spatial correlation between residuals. Note that many analyst use instead the OLS residuals, which may not be too different from the GLS residuals. However, they are not optimal if there is any spatial correlation, and indeed they may be quite different for clustered sample points or if the number of samples is relatively small (
\ll200
A limitation of KED is the instability of the extended matrix in the case that the covariate does not vary smoothly in space. RK has the advantage that it explicitly separates trend estimation from spatial prediction of residuals, allowing the use of arbitrarily-complex forms of regression, rather than the simple linear techniques that can be used with KED. In addition, it allows the separate interpretation of the two interpolated components. The emphasis on regression is important also because fitting of the deterministic part of variation (regression) is often more beneficial for the quality of final maps than fitting of the stochastic part (residuals).
Regression-kriging can be automated e.g. in R statistical computing environment, by using gstat and/or geoR package. Typical inputs/outputs include:
INPUTS:
z(si)
i=1,\ldots,n
\Deltaz
q(s)
z*(sj)
j=1,\ldots,l
OUTPUTS:
C0
C1
R
Regression-kriging is used in various applied fields, from meteorology, climatology, soil mapping, geological mapping, species distribution modeling and similar. The only requirement for using regression-kriging versus e.g. ordinary kriging is that one or more covariate layers exist, and which are significantly correlated with the feature of interest. Some general applications of regression-kriging are:
Regression-kriging-based algorithms play more and more important role in geostatistics because the number of possible covariates is increasing every day. For example, DEMs are now available from a number of sources. Detailed and accurate images of topography can now be ordered from remote sensing systems such as SPOT and ASTER; SPOT5 offers the High Resolution Stereoscopic (HRS) scanner, which can be used to produce DEMs at resolutions of up to 5 m.[12] Finer differences in elevation can also be obtained with airborne laser-scanners. The cost of data is either free or dropping in price as technology advances. NASA recorded most of the world's topography in the Shuttle Radar Topographic Mission in 2000.[13] From summer of 2004, these data has been available (e.g. via USGS ftp) for almost whole globe at resolution of about 90 m (for the North American continent at resolution of about 30 m). Likewise, MODIS multispectral images are freely available for download at resolutions of 250 m. A large free repository of Landsat images is also available for download via the Global Land Cover Facility (GLCF).