An error correction model (ECM) belongs to a category of multiple time series models most commonly used for data in which the underlying variables share a long-run common stochastic trend, also known as cointegration. ECMs are a theoretically driven approach useful for estimating both the short-term and long-term effects of one time series on another. The term error correction relates to the fact that last period's deviation from the long-run equilibrium, the error, influences the short-run dynamics of the dependent variable. Thus ECMs directly estimate the speed at which a dependent variable returns to equilibrium after a change in other variables.
Yule (1926) and Granger and Newbold (1974) were the first to draw attention to the problem of spurious correlation and to propose solutions for addressing it in time series analysis.[1][2] Given two completely unrelated but integrated (non-stationary) time series, a regression of one on the other will tend to produce an apparently statistically significant relationship, and a researcher might therefore falsely conclude that they have found evidence of a true relationship between the variables. Ordinary least squares will no longer be consistent and commonly used test statistics will be invalid. In particular, Monte Carlo simulations show that one will get a very high R squared, highly significant individual t-statistics and a low Durbin–Watson statistic. Technically speaking, Phillips (1986) proved that the parameter estimates will not converge in probability, the intercept will diverge and the slope will have a non-degenerate distribution as the sample size increases.[3] However, there might be a common stochastic trend to both series that a researcher is genuinely interested in because it reflects a long-run relationship between the variables.
Because of the stochastic nature of the trend, it is not possible to break up integrated series into a deterministic (predictable) trend and a stationary series containing deviations from trend. Even in deterministically detrended random walks, spurious correlations will eventually emerge. Thus detrending does not solve the estimation problem.
To continue using the Box–Jenkins approach, one could difference the series and then estimate models such as an ARIMA model, given that many commonly used time series (e.g. in economics) appear to be stationary in first differences. Forecasts from such a model will still reflect cycles and seasonality present in the data. However, any information about long-run adjustments that the data in levels may contain is omitted, and longer-term forecasts will be unreliable.
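To make this concrete, here is a minimal sketch of the difference-and-fit approach in Python, using the statsmodels library on a simulated random walk (the data and the model order are illustrative assumptions, not taken from the text above):

```python
# Minimal sketch: fit a Box-Jenkins ARIMA model to a series that is
# stationary in first differences. The series is a simulated random walk.
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(0)
y = np.cumsum(rng.normal(size=200))  # an I(1) series (random walk)

# order=(1, 1, 0): one autoregressive lag, differenced once.
# Differencing discards the long-run level information discussed above.
fit = ARIMA(y, order=(1, 1, 0)).fit()
print(fit.summary())
```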
This led Sargan (1964) to develop the ECM methodology, which retains the level information.[4] [5]
Several methods are known in the literature for estimating a refined dynamic model as described above. Among these are the Engle and Granger two-step approach, estimating their ECM in one step, and the vector-based VECM using Johansen's method.[6]
The first step of this method is to pretest the individual time series one uses in order to confirm that they are non-stationary in the first place. This can be done by standard unit root testing such as the Dickey–Fuller (DF) test, or the augmented Dickey–Fuller (ADF) test to resolve the problem of serially correlated errors. Take the case of two different series $x_t$ and $y_t$. If both are integrated of the same order (commonly I(1)), we can estimate an ECM of the form

$A(L)\,\Delta y_t = \gamma + B(L)\,\Delta x_t + \alpha (y_{t-1} - \beta_0 - \beta_1 x_{t-1}) + \nu_t.$
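As an illustration of the pretesting step, the following Python sketch applies the ADF test from statsmodels to a simulated pair of series; the series, their length, and the parameters are hypothetical:

```python
# Minimal sketch: ADF pretest of two simulated series. Failing to reject
# the unit-root null in levels while rejecting it in first differences
# is consistent with the series being I(1).
import numpy as np
from statsmodels.tsa.stattools import adfuller

rng = np.random.default_rng(42)
x = np.cumsum(rng.normal(size=500))       # simulated I(1) series
y = 2.0 + 0.8 * x + rng.normal(size=500)  # cointegrated with x by design

for name, series in [("x", x), ("y", y)]:
    for label, s in [("levels", series), ("first diffs", np.diff(series))]:
        stat, pvalue, *_ = adfuller(s)
        print(f"{name} {label}: ADF = {stat:.2f}, p = {pvalue:.3f}")
```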
If both variables are integrated and this ECM exists, they are cointegrated by the Engle–Granger representation theorem.
The second step is then to estimate the long-run (levels) relationship using ordinary least squares,

$y_t = \beta_0 + \beta_1 x_t + \varepsilon_t,$

save the residuals

$\hat{\varepsilon}_t = y_t - \hat{\beta}_0 - \hat{\beta}_1 x_t,$

and use the lagged residual as the error-correction term in the dynamic regression

$A(L)\,\Delta y_t = \gamma + B(L)\,\Delta x_t + \alpha \hat{\varepsilon}_{t-1} + \nu_t.$
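A minimal sketch of the two steps, using statsmodels and the same simulated data as in the pretesting sketch; for brevity the lag polynomials are reduced to the simplest case $A(L) = 1$, $B(L) = B$:

```python
# Minimal sketch of the Engle-Granger two-step estimator.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(42)
x = np.cumsum(rng.normal(size=500))
y = 2.0 + 0.8 * x + rng.normal(size=500)

# Step 1: long-run (levels) regression y_t = b0 + b1 * x_t + e_t
levels = sm.OLS(y, sm.add_constant(x)).fit()
resid = levels.resid  # estimated equilibrium errors

# Step 2: regress dy_t on dx_t and the lagged residual (the
# error-correction term); simplest case with no further lags.
dy, dx = np.diff(y), np.diff(x)
X = sm.add_constant(np.column_stack([dx, resid[:-1]]))
ecm = sm.OLS(dy, X).fit()
print(ecm.params)  # [gamma, B, alpha]; alpha is expected to be negative
```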
One can then test for cointegration using a standard $t$-statistic on $\alpha$. Two caveats apply, however: the procedure requires $x_t$ to be weakly exogenous for the parameters of interest, and the test statistic on $\alpha$ does not follow a standard distribution, so tabulated critical values must be used.
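In practice one can use a packaged Engle–Granger cointegration test; for instance, statsmodels provides coint(), which applies the appropriate MacKinnon critical values. A minimal sketch, again on the simulated series from the sketches above:

```python
# Minimal sketch: packaged Engle-Granger cointegration test.
import numpy as np
from statsmodels.tsa.stattools import coint

rng = np.random.default_rng(42)
x = np.cumsum(rng.normal(size=500))
y = 2.0 + 0.8 * x + rng.normal(size=500)

t_stat, pvalue, crit_values = coint(y, x)
print(f"EG t-stat = {t_stat:.2f}, p = {pvalue:.3f}")
```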
The Engle–Granger approach as described above suffers from a number of weaknesses. Namely, it is restricted to a single equation with one variable designated as the dependent variable, explained by another variable that is assumed to be weakly exogenous for the parameters of interest. It also relies on pretesting the time series to find out whether variables are I(0) or I(1). These weaknesses can be addressed through the use of Johansen's procedure. Its advantages include that pretesting is not necessary, there can be numerous cointegrating relationships, all variables are treated as endogenous, and tests relating to the long-run parameters are possible. The resulting model is known as a vector error correction model (VECM), as it adds error correction features to a multi-factor model known as vector autoregression (VAR).
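A minimal sketch of Johansen's procedure and a VECM fit with statsmodels; the two-column data matrix is the simulated pair from above, and the choice of one lagged difference and a constant restricted to the cointegration relation are illustrative assumptions:

```python
# Minimal sketch: Johansen test and VECM estimation.
import numpy as np
from statsmodels.tsa.vector_ar.vecm import VECM, coint_johansen

rng = np.random.default_rng(42)
x = np.cumsum(rng.normal(size=500))
y = 2.0 + 0.8 * x + rng.normal(size=500)
data = np.column_stack([y, x])

# Johansen trace test: how many cointegrating relations are there?
jres = coint_johansen(data, det_order=0, k_ar_diff=1)
print("trace statistics:", jres.lr1)  # compare with jres.cvt

# VECM with one cointegrating relation; "ci" restricts the constant
# to the cointegration relation.
res = VECM(data, k_ar_diff=1, coint_rank=1, deterministic="ci").fit()
print(res.alpha)  # speed-of-adjustment (loading) coefficients
```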
The idea of cointegration may be demonstrated in a simple macroeconomic setting. Suppose consumption $C_t$ and disposable income $Y_t$ are macroeconomic time series that are related in the long run (see permanent income hypothesis). Specifically, let the average propensity to consume be 90%, that is, in the long run $C_t = 0.9 Y_t$. From the econometrician's point of view, this long-run relationship (aka cointegration) exists if errors from the regression $C_t = \beta Y_t + \varepsilon_t$ are stationary, even though $Y_t$ and $C_t$ individually are non-stationary. Suppose also that if $Y_t$ suddenly changes by $\Delta Y_t$, then $C_t$ changes by $\Delta C_t = 0.5\,\Delta Y_t$; that is, the marginal propensity to consume equals 50%. Our final assumption is that the gap between current and equilibrium consumption decreases each period by 20%.

In this setting a change $\Delta C_t = C_t - C_{t-1}$ in the consumption level can be modelled as

$\Delta C_t = 0.5\,\Delta Y_t - 0.2\,(C_{t-1} - 0.9 Y_{t-1}) + \varepsilon_t.$

The first term on the right-hand side describes the short-run impact of a change in $Y_t$ on $C_t$, the second term explains the long-run gravitation towards the equilibrium relationship between the variables, and the third term reflects random shocks that the system receives (e.g. shocks of consumer confidence that affect consumption). To see how the model works, consider two kinds of shocks: permanent and transitory (temporary). For simplicity, let $\varepsilon_t$ be zero for all $t$. Suppose in period $t-1$ the system is in equilibrium, i.e. $C_{t-1} = 0.9 Y_{t-1}$. Suppose that in period $t$, $Y_t$ increases by 10 and then returns to its previous level. Then $C_t$ first (in period $t$) increases by 5 (half of 10), but from the second period onward $C_t$ begins to decrease and converges back to its initial level. In contrast, if the shock to $Y_t$ is permanent, then $C_t$ slowly converges to a value that exceeds the initial $C_{t-1}$ by 9.
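The two shock experiments can be reproduced with a short simulation; the following Python sketch iterates the difference equation above (the 30-period horizon and the baseline income of 100 are illustrative choices):

```python
# Minimal sketch: simulate dC_t = 0.5*dY_t - 0.2*(C_{t-1} - 0.9*Y_{t-1})
# with epsilon_t = 0, starting from equilibrium C_0 = 0.9 * Y_0.
import numpy as np

def simulate(income):
    C = np.empty(len(income))
    C[0] = 0.9 * income[0]  # start in equilibrium
    for t in range(1, len(income)):
        dY = income[t] - income[t - 1]
        C[t] = C[t - 1] + 0.5 * dY - 0.2 * (C[t - 1] - 0.9 * income[t - 1])
    return C

Y_trans = np.full(30, 100.0); Y_trans[1] = 110.0  # one-period spike
Y_perm = np.full(30, 100.0); Y_perm[1:] = 110.0   # permanent increase

print(simulate(Y_trans)[:4])  # [90, 95, 90.8, 90.64]: decays back to 90
print(simulate(Y_perm)[-1])   # approaches 99 = 90 + 9
```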
This structure is common to all ECMs. In practice, econometricians often first estimate the cointegration relationship (the equation in levels), and then insert it into the main model (the equation in differences).