In least squares estimation problems, sometimes one or more regressors specified in the model are not observable. One way to circumvent this issue is to estimate or generate regressors from observable data.[1] This generated regressor method is also applicable to unobserved instrumental variables. Under some regularity conditions, consistency and asymptotic normality of the least squares estimator are preserved, but the asymptotic variance has a different form in general.
Suppose the model of interest is the following:
y_i = g(x_{1i}, x_{2i}, \beta) + u_i

where g is a conditional mean function whose form is known up to the finite-dimensional parameter \beta. Here the regressor x_{2i} is not observable, but it has the known representation

x_{2i} = h(w_i, \gamma)

where w_i is observable, h is a known function, and \gamma is an unknown finite-dimensional parameter. If \gamma can be consistently estimated by some estimator \hat\gamma, then x_{2i} can be replaced by the generated regressor \hat{x}_{2i} = h(w_i, \hat\gamma), and \beta can be estimated by least squares in the model

y_i = g(x_{1i}, \hat{x}_{2i}, \beta) + u_i
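A small simulation can illustrate the two-step procedure. The sketch below assumes a linear h(w, \gamma) = \gamma w and an auxiliary observable v used to estimate \gamma in the first step; the variable names and data-generating process are illustrative assumptions, not from the original text:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Illustrative DGP: the unobserved regressor is x2 = h(w, gamma) = gamma * w,
# and gamma is estimable from an auxiliary observable v = x2 + noise.
gamma_true = 1.5
beta_true = np.array([2.0, -1.0, 0.5])   # intercept, coef on x1, coef on x2

w = rng.normal(size=n)
x1 = rng.normal(size=n)
x2 = gamma_true * w                      # unobserved regressor
v = x2 + rng.normal(scale=0.5, size=n)   # observable used in the first step
y = beta_true[0] + beta_true[1] * x1 + beta_true[2] * x2 + rng.normal(size=n)

# Step 1: estimate gamma by OLS of v on w (no intercept, for simplicity).
gamma_hat = (w @ v) / (w @ w)

# Generate the regressor: x2_hat = h(w, gamma_hat).
x2_hat = gamma_hat * w

# Step 2: least squares of y on (1, x1, x2_hat).
X = np.column_stack([np.ones(n), x1, x2_hat])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

print(gamma_hat, beta_hat)
```

With n = 5000 the second-step estimates are close to the true coefficients, consistent with the general result that the generated regressor preserves consistency (its asymptotic variance, however, generally differs from the observed-regressor case).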
This problem falls into the framework of two-step M-estimators, and thus consistency and asymptotic normality of the estimator can be verified using the general theory of two-step M-estimation.[4] As in general two-step M-estimation problems, the asymptotic variance of a generated-regressor estimator is usually different from that of the estimator with all regressors observed. Yet, in some special cases, the asymptotic variances of the two estimators are identical. To give one such example, consider the setting in which the regression function is linear in the parameters and the unobserved regressor is a scalar. Denoting the coefficient of the unobserved regressor by \delta, if \delta = 0 and E[\nabla_\gamma h(W, \gamma)\, U] = 0, then the first-step estimation of \gamma has no effect on the asymptotic variance, and the asymptotic variances of the two estimators coincide.
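One way to see why these two conditions suffice is the standard two-step M-estimator expansion; the following is a sketch for the linear specification above, with the symbols x(\gamma), s_i, A, F, and e introduced here for illustration (they are not from the original text):

```latex
% Stack the regressors as x_i(\gamma) = (x_{1i}',\, h(w_i,\gamma))' and write
% the second-step least squares score as
s_i(\theta;\gamma) = x_i(\gamma)\,\bigl(y_i - x_i(\gamma)'\theta\bigr),
\qquad \theta = (\beta',\delta)'.
% The usual two-step expansion, with A = E[x(\gamma)x(\gamma)'], gives
\sqrt{n}\,(\hat\theta-\theta)
  = A^{-1}\Bigl(n^{-1/2}\textstyle\sum_i s_i(\theta;\gamma)
      + F\,\sqrt{n}\,(\hat\gamma-\gamma)\Bigr) + o_p(1),
% where the correction matrix is
F = E\bigl[\nabla_{\gamma'} s_i(\theta;\gamma)\bigr]
  = E\bigl[e\,\nabla_\gamma h(W,\gamma)'\,U\bigr]
    - \delta\,E\bigl[x(\gamma)\,\nabla_\gamma h(W,\gamma)'\bigr],
% with e the unit vector selecting the h-component of x(\gamma).
```

If \delta = 0 and E[\nabla_\gamma h(W, \gamma)\, U] = 0, both terms of F vanish, so the first-step estimation error in \hat\gamma drops out of the expansion and the asymptotic variance is unaffected.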
With minor modifications to the model, the above formulation is also applicable to instrumental variable estimation. Suppose the model of interest is linear in the parameters, the error term is correlated with some of the regressors, and the model specifies instrumental variables that are not observable but have the representation

z_i = h(w_i, \gamma)

for a known function h, an observable w_i, and an unknown finite-dimensional parameter \gamma. If \gamma can be consistently estimated by \hat\gamma, the generated instruments \hat{z}_i = h(w_i, \hat\gamma) can be used in place of z_i. The resulting IV estimator then has the same asymptotic variance as the estimator based on the true instruments z_i, provided that E[\nabla_\gamma h(W, \gamma)\, U] = 0.[4]
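The generated-instrument case can be sketched in the same simulation style. Here the unobserved instrument is taken to be z = h(w, \gamma) = \gamma w, with \gamma estimated by a first-stage regression of the endogenous regressor on w; the DGP and names are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000

# Illustrative DGP: x is endogenous because eps and u share a common shock.
# The "true" instrument z = h(w, gamma) = gamma * w is unobserved; gamma is
# estimated in a first stage by regressing x on w.
gamma_true, beta_true = 1.0, 2.0
w = rng.normal(size=n)
common = rng.normal(size=n)                 # source of endogeneity
eps = common + rng.normal(scale=0.5, size=n)
u = common + rng.normal(scale=0.5, size=n)
x = gamma_true * w + eps
y = beta_true * x + u

# OLS is inconsistent because Cov(x, u) != 0.
beta_ols = (x @ y) / (x @ x)

# Step 1: estimate gamma, then generate the instrument z_hat = h(w, gamma_hat).
gamma_hat = (w @ x) / (w @ w)
z_hat = gamma_hat * w

# Step 2: IV estimator using the generated instrument.
beta_iv = (z_hat @ y) / (z_hat @ x)

print(beta_ols, beta_iv)
```

In this run the OLS estimate is biased upward while the generated-instrument IV estimate is close to the true coefficient. Note that with a linear h the scale \hat\gamma cancels from the IV formula, so using \hat{z}_i here is numerically equivalent to using w_i directly; a nonlinear h would not have this property.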