In statistics and econometrics, a distributed lag model is a model for time series data in which a regression equation is used to predict current values of a dependent variable based on both the current values of an explanatory variable and the lagged (past period) values of this explanatory variable.[1] [2]
The starting point for a distributed lag model is an assumed structure of the form
$$y_t = a + w_0 x_t + w_1 x_{t-1} + w_2 x_{t-2} + \dots + \text{error term}$$
or the form
$$y_t = a + w_0 x_t + w_1 x_{t-1} + w_2 x_{t-2} + \dots + w_n x_{t-n} + \text{error term},$$
where $y_t$ is the value at time period $t$ of the dependent variable $y$, $a$ is the intercept term to be estimated, and $w_i$ is the lag weight (also to be estimated) placed on the value of the explanatory variable $x$ from $i$ periods earlier. In the first equation, the dependent variable is assumed to be affected by values of the independent variable arbitrarily far in the past, so the number of lag weights is infinite and the model is called an infinite distributed lag model. In the second equation, there are only a finite number of lag weights, reflecting an assumption that there is a maximum lag beyond which values of the independent variable do not affect the dependent variable; a model based on this assumption is called a finite distributed lag model.
In an infinite distributed lag model, an infinite number of lag weights need to be estimated; clearly this can be done only if some structure is assumed for the relation between the various lag weights, with the entire infinitude of them expressible in terms of a finite number of assumed underlying parameters. In a finite distributed lag model, the parameters could be directly estimated by ordinary least squares (assuming the number of data points sufficiently exceeds the number of lag weights); nevertheless, such estimation may give very imprecise results due to extreme multicollinearity among the various lagged values of the independent variable, so again it may be necessary to assume some structure for the relation between the various lag weights.
The concept of distributed lag models easily generalizes to the context of more than one right-side explanatory variable.
The simplest way to estimate parameters associated with distributed lags is by ordinary least squares, assuming a fixed maximum lag $p$.
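As a minimal sketch of this unrestricted approach, the following Python code (assuming NumPy is available) builds a matrix of current and lagged values of the explanatory variable and fits the intercept and lag weights by least squares. The function names `lagged_design` and `fit_finite_dl`, the synthetic data, and the chosen "true" weights are illustrative assumptions only.

```python
import numpy as np

def lagged_design(x, p):
    """Return the (T - p) x (p + 2) matrix [1, x_t, x_{t-1}, ..., x_{t-p}]."""
    x = np.asarray(x, dtype=float)
    T = len(x)
    cols = [np.ones(T - p)]
    for i in range(p + 1):
        # Column for lag i holds x_{t-i} for t = p, ..., T-1.
        cols.append(x[p - i : T - i])
    return np.column_stack(cols)

def fit_finite_dl(y, x, p):
    """Estimate the intercept and w_0..w_p by OLS, dropping the first p rows."""
    X = lagged_design(x, p)
    y = np.asarray(y, dtype=float)[p:]
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[0], coef[1:]

# Synthetic illustration (all numbers below are made up for the example).
rng = np.random.default_rng(0)
T, p = 500, 3
true_a = 1.0
true_w = np.array([0.5, 0.3, 0.15, 0.05])
x = rng.normal(size=T)
y = np.zeros(T)
for t in range(p, T):
    y[t] = true_a + sum(true_w[i] * x[t - i] for i in range(p + 1))
y += 0.1 * rng.normal(size=T)  # the first p entries of y are never used

a_hat, w_hat = fit_finite_dl(y, x, p)
print("intercept:", a_hat)
print("lag weights:", w_hat)
```

The first $p$ observations are dropped because their lagged values are unavailable. Here $x$ is serially uncorrelated, so the weights are estimated precisely; with a strongly autocorrelated $x$ the lag columns become nearly collinear and the individual estimates correspondingly less reliable.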
Structured distributed lag models come in two types: finite and infinite. Infinite distributed lags allow the value of the independent variable at a particular time to influence the dependent variable infinitely far into the future, or to put it another way, they allow the current value of the dependent variable to be influenced by values of the independent variable that occurred infinitely long ago; but beyond some lag length the effects taper off toward zero. Finite distributed lags allow for the independent variable at a particular time to influence the dependent variable for only a finite number of periods.
The most important structured finite distributed lag model is the Almon lag model.[3] This model allows the data to determine the shape of the lag structure, but the researcher must specify the maximum lag length; an incorrectly specified maximum lag length can distort the shape of the estimated lag structure as well as the cumulative effect of the independent variable. The Almon lag assumes that the lag weights, for a maximum lag length $k$, are related to $n+1$ linearly estimable underlying parameters $a_j$ (with $n < k$) according to
$$w_i = \sum_{j=0}^{n} a_j i^j$$
for $i = 0, \dots, k$.
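Substituting the polynomial into the finite distributed lag equation gives $\sum_i w_i x_{t-i} = \sum_j a_j z_{jt}$ with $z_{jt} = \sum_{i=0}^{k} i^j x_{t-i}$, so the $a_j$ can be estimated by ordinary least squares on $n+1$ constructed regressors and the weights recovered afterwards. The sketch below illustrates this under stated assumptions: the function name `almon_fit`, the synthetic data, and the chosen polynomial are hypothetical, and NumPy is assumed to be available.

```python
import numpy as np

def almon_fit(y, x, k, n):
    """Fit an Almon lag with maximum lag k and polynomial degree n.

    Returns (intercept, a_poly, w), where w_i = sum_j a_poly[j] * i**j.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    T = len(x)
    # lags[t, i] = x_{t-i} for t = k..T-1 and i = 0..k.
    lags = np.column_stack([x[k - i : T - i] for i in range(k + 1)])
    # basis[i, j] = i**j (with 0**0 taken as 1).
    basis = np.vander(np.arange(k + 1), n + 1, increasing=True).astype(float)
    Z = lags @ basis  # constructed regressors z_{jt}
    X = np.column_stack([np.ones(T - k), Z])
    coef, *_ = np.linalg.lstsq(X, y[k:], rcond=None)
    intercept, a_poly = coef[0], coef[1:]
    return intercept, a_poly, basis @ a_poly

# Synthetic illustration: an inverted-U lag shape from a quadratic polynomial.
rng = np.random.default_rng(1)
T, k, n = 400, 6, 2
true_a_poly = np.array([0.1, 0.3, -0.05])
true_w = np.vander(np.arange(k + 1), n + 1, increasing=True) @ true_a_poly
x = rng.normal(size=T)
y = np.zeros(T)
for t in range(k, T):
    y[t] = 2.0 + sum(true_w[i] * x[t - i] for i in range(k + 1))
y += 0.1 * rng.normal(size=T)  # the first k entries of y are never used

intercept_hat, a_poly_hat, w_hat = almon_fit(y, x, k, n)
print("estimated lag weights:", w_hat)
```

Only $n+1 = 3$ slope parameters are estimated here instead of $k+1 = 7$ free weights, which is the source of the efficiency gain when the polynomial restriction is approximately correct.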
The most common type of structured infinite distributed lag model is the geometric lag, also known as the Koyck lag. In this lag structure, the weights (magnitudes of influence) of the lagged independent variable values decline exponentially with the length of the lag; while the shape of the lag structure is thus fully imposed by the choice of this technique, the rate of decline as well as the overall magnitude of effect are determined by the data. Specification of the regression equation is very straightforward: one includes as explanators (right-hand side variables in the regression) the one-period-lagged value of the dependent variable and the current value of the independent variable:
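$$y_t = a + \lambda y_{t-1} + b x_t + \text{error term},$$
where $0 \le \lambda < 1$. In this model, the short-run (same-period) effect of a unit change in the independent variable is $b$, while the long-run (cumulative) effect of a sustained unit change in the independent variable is
$$b + \lambda b + \lambda^2 b + \dots = b/(1-\lambda).$$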
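Repeatedly substituting for the lagged dependent variable shows that the implied weight on the value of $x$ from $i$ periods earlier is $b\lambda^i$. The following sketch simulates data from the regression equation above with independent errors, estimates $a$, $\lambda$ and $b$ by least squares, and reports the implied weights and long-run effect. The function names, parameter values and simulated data are assumptions made for illustration; NumPy is assumed to be available.

```python
import numpy as np

def fit_koyck(y, x):
    """Regress y_t on a constant, y_{t-1} and x_t; return (a, lam, b)."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    X = np.column_stack([np.ones(len(y) - 1), y[:-1], x[1:]])
    coef, *_ = np.linalg.lstsq(X, y[1:], rcond=None)
    return coef[0], coef[1], coef[2]

def implied_weights(b, lam, num_lags):
    """Implied lag weights w_i = b * lam**i for i = 0..num_lags-1."""
    return b * lam ** np.arange(num_lags)

# Synthetic illustration (parameter values are arbitrary).
rng = np.random.default_rng(2)
T = 2000
a, lam, b = 0.5, 0.6, 2.0
x = rng.normal(size=T)
e = 0.1 * rng.normal(size=T)
y = np.zeros(T)
y[0] = a / (1 - lam)  # arbitrary starting value
for t in range(1, T):
    y[t] = a + lam * y[t - 1] + b * x[t] + e[t]

a_hat, lam_hat, b_hat = fit_koyck(y, x)
long_run = b_hat / (1 - lam_hat)
print("first weights:", implied_weights(b_hat, lam_hat, 5))
print("long-run effect:", long_run)
```

The least-squares step here relies on the simulated errors being independent over time; if the errors are serially correlated, the lagged dependent variable is correlated with the error and other estimators are typically needed.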
Other infinite distributed lag models have been proposed to allow the data to determine the shape of the lag structure. The polynomial inverse lag[4] [5] assumes that the lag weights are related to underlying, linearly estimable parameters $a_j$ according to
$$w_i = \sum_{j=2}^{n} \frac{a_j}{(i+1)^j},$$
for $i = 0, \dots, \infty$.
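To make the formula concrete, the short sketch below evaluates polynomial inverse lag weights for a given set of parameters. The function name `polynomial_inverse_weights` and the parameter values are hypothetical; the code only evaluates the weights and does not estimate them.

```python
def polynomial_inverse_weights(a, num_lags):
    """Evaluate w_i = sum_j a[j] / (i + 1)**j for i = 0..num_lags-1.

    `a` maps each power j (j >= 2) to its parameter a_j.
    """
    return [
        sum(a_j / (i + 1) ** j for j, a_j in a.items())
        for i in range(num_lags)
    ]

# Hypothetical parameters: a_2 = 1.0, a_3 = -0.5.
a = {2: 1.0, 3: -0.5}
w = polynomial_inverse_weights(a, 5)
print(w)
```

Because each term decays like a power of $1/(i+1)$ rather than geometrically, the weights taper off more slowly than in the Koyck lag; starting the sum at $j = 2$ keeps the infinite sum of weights finite.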
The geometric combination lag[6] assumes that the lag weights are related to underlying, linearly estimable parameters $a_j$ according to either
$$w_i = \sum_{j=2}^{n} a_j (1/j)^i,$$
for $i = 0, \dots, \infty$, or
$$w_i = \sum_{j=1}^{n} a_j [j/(n+1)]^i,$$
for $i = 0, \dots, \infty$.
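Both forms are weighted sums of geometric sequences with fixed decay rates, so the cumulative effect has a closed form: each term $a_j r_j^i$ sums over $i$ to $a_j/(1-r_j)$. The sketch below evaluates the weights for both forms and compares a truncated sum with that closed form. The function names and parameter values are hypothetical.

```python
def weights_reciprocal(a, num_lags):
    """First form: a = [a_2, ..., a_n]; w_i = sum_j a_j * (1/j)**i."""
    return [
        sum(a_j * (1.0 / j) ** i for j, a_j in enumerate(a, start=2))
        for i in range(num_lags)
    ]

def weights_fractional(a, num_lags):
    """Second form: a = [a_1, ..., a_n]; w_i = sum_j a_j * (j/(n+1))**i."""
    n = len(a)
    return [
        sum(a_j * (j / (n + 1)) ** i for j, a_j in enumerate(a, start=1))
        for i in range(num_lags)
    ]

# Hypothetical parameters, used for both forms.
a = [1.0, 0.5]

# First form: decay rates 1/2 and 1/3, so the long-run effect is
# 1.0 / (1 - 1/2) + 0.5 / (1 - 1/3) = 2.75.
total_reciprocal = sum(weights_reciprocal(a, 200))

# Second form with n = 2: decay rates 1/3 and 2/3, so the long-run effect is
# 1.0 / (1 - 1/3) + 0.5 / (1 - 2/3) = 3.0.
total_fractional = sum(weights_fractional(a, 200))

print(total_reciprocal, total_fractional)
```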
The gamma lag[7] and the rational lag[8] are other infinite distributed lag structures.
Distributed lag models were introduced into health-related studies in 2002 by Zanobetti and Schwartz.[9] The Bayesian version of the model was suggested by Welty in 2007.[10] Gasparrini introduced more flexible statistical models in 2010[11] that are capable of describing additional time dimensions of the exposure-response relationship, and developed a family of distributed lag non-linear models (DLNM), a modeling framework that can simultaneously represent non-linear exposure-response dependencies and delayed effects.[12]
The distributed lag model concept was first applied to longitudinal cohort research by Hsu in 2015,[13] in a study of the relationship between PM2.5 and child asthma. More complex distributed lag methods designed for longitudinal cohort analysis, such as the Bayesian Distributed Lag Interaction Model[14] by Wilson, have subsequently been developed to answer similar research questions.