Panel (data) analysis is a statistical method, widely used in social science, epidemiology, and econometrics, to analyze two-dimensional (typically cross-sectional and longitudinal) panel data.[1] The data are usually collected over time for the same individuals, and a regression is then run over these two dimensions. Multidimensional analysis is an econometric method in which data are collected over more than two dimensions (typically time, individuals, and some third dimension).[2]
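A common panel data regression model looks like
$$y_{it} = a + b x_{it} + \varepsilon_{it},$$
where $y$ is the dependent variable, $x$ is the independent variable, $a$ and $b$ are coefficients, $i$ and $t$ are indices for individuals and time, and $\varepsilon_{it}$ is the error term. The assumptions made about how $\varepsilon_{it}$ varies over $i$ and $t$ determine which estimation approach is appropriate.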
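As a small illustration of the two-dimensional indexing, the following sketch stores a balanced panel in "long" format, one record per individual $i$ and period $t$, and computes the error term implied by given coefficients. The data values and the names `Obs` and `residuals` are hypothetical, chosen only for illustration.

```python
from typing import NamedTuple

class Obs(NamedTuple):
    i: int      # individual index
    t: int      # time index
    x: float    # independent variable x_it
    y: float    # dependent variable y_it

# Hypothetical balanced panel: 2 individuals observed over 3 periods.
panel = [
    Obs(1, 1, 0.0, 1.5), Obs(1, 2, 1.0, 3.0), Obs(1, 3, 2.0, 4.5),
    Obs(2, 1, 1.0, 2.5), Obs(2, 2, 2.0, 5.0), Obs(2, 3, 3.0, 7.5),
]

def residuals(panel, a, b):
    """Return {(i, t): eps_it}, where eps_it = y_it - a - b * x_it."""
    return {(o.i, o.t): o.y - a - b * o.x for o in panel}

# Coefficients are assumed known here purely to show the indexing.
eps = residuals(panel, a=1.0, b=2.0)
```

Each residual is keyed by the pair `(i, t)`, which mirrors the double subscript in the model.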
Panel data analysis has three more-or-less independent approaches: independently pooled panels (pooled OLS), fixed effects models, and random effects models.
The choice among these methods depends on the objective of the analysis and on whether the explanatory variables can be treated as exogenous.
See also: Partial likelihood methods for panel data. Key assumption of independently pooled panels:
There are no unique attributes of individuals within the measurement set, and no universal effects across time.
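Under this assumption the panel structure can be ignored and all observations stacked into a single regression. The sketch below is a minimal illustration with simulated data and one regressor; the function name `pooled_ols`, the sample sizes, and the true coefficients are assumptions made for the example.

```python
import random

def pooled_ols(rows):
    """OLS of y on x with an intercept, ignoring the panel structure.

    rows is a list of (i, t, x, y) tuples; i and t are not used.
    """
    xs = [r[2] for r in rows]
    ys = [r[3] for r in rows]
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b

# Simulated data with no individual-specific or time-specific effects.
rng = random.Random(0)
A_TRUE, B_TRUE = 1.0, 2.0
rows = []
for i in range(200):          # individuals
    for t in range(5):        # periods
        x = rng.gauss(0, 1)
        y = A_TRUE + B_TRUE * x + rng.gauss(0, 1)
        rows.append((i, t, x, y))

a_hat, b_hat = pooled_ols(rows)
```

With data generated this way, the estimates should land near the true coefficients, since nothing individual-specific is left in the error term.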
Key assumption of fixed effects models:
There are unique attributes of individuals that do not vary over time. That is, the unique attributes for a given individual $i$ do not vary with time $t$.
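One common way to exploit time-invariant attributes is the within transformation: subtracting each individual's time average removes anything that is constant over time. The sketch below is a simulation under assumed parameters, in which the time-invariant attribute `c_i` is deliberately correlated with the regressor; the names `ols_slope` and `within_transform` are hypothetical.

```python
import random
from collections import defaultdict

def ols_slope(xs, ys):
    """Slope from a simple OLS regression of ys on xs (with intercept)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx

def within_transform(rows):
    """Subtract each individual's time average from x and y.

    rows holds (i, t, x, y) tuples; returns the demeaned xs and ys.
    """
    groups = defaultdict(list)
    for i, _t, x, y in rows:
        groups[i].append((x, y))
    xs, ys = [], []
    for obs in groups.values():
        x_bar = sum(x for x, _ in obs) / len(obs)
        y_bar = sum(y for _, y in obs) / len(obs)
        for x, y in obs:
            xs.append(x - x_bar)
            ys.append(y - y_bar)
    return xs, ys

rng = random.Random(1)
B_TRUE = 2.0
rows = []
for i in range(300):
    c_i = rng.gauss(0, 1)                  # time-invariant attribute of individual i
    for t in range(5):
        x = c_i + rng.gauss(0, 1)          # regressor correlated with c_i
        y = B_TRUE * x + c_i + rng.gauss(0, 1)
        rows.append((i, t, x, y))

b_pooled = ols_slope([r[2] for r in rows], [r[3] for r in rows])
b_within = ols_slope(*within_transform(rows))
```

In this setup the pooled slope absorbs part of `c_i` and overstates the coefficient, while the within slope is unaffected because `c_i` drops out of the demeaned data.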
See main article: Random effects model. Key assumption:
There are unique, time-constant attributes of individuals that are not correlated with the individual regressors. Pooled OLS can be used to derive unbiased and consistent estimates of parameters even when time-constant attributes are present, but random effects will be more efficient.
The random effects model is a feasible generalised least squares technique that is asymptotically more efficient than pooled OLS when time-constant attributes are present. Random effects adjusts for the serial correlation induced by unobserved time-constant attributes.
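For a balanced panel with one regressor, the feasible GLS step can be sketched as quasi-demeaning: a fraction `theta` of each individual's time average is subtracted before running OLS. The variance-component estimates used here (within residuals for the idiosyncratic variance, residuals of a regression on individual means for the between variance) are one simple choice assumed for this illustration, and all names and parameters are hypothetical.

```python
import random
from collections import defaultdict

def ols(xs, ys):
    """Return (intercept, slope) from OLS of ys on xs."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    return y_bar - b * x_bar, b

def random_effects_slope(rows, T):
    """Quasi-demeaning RE estimate for a balanced panel with T periods."""
    groups = defaultdict(list)
    for i, _t, x, y in rows:
        groups[i].append((x, y))
    N = len(groups)
    means = {i: (sum(x for x, _ in g) / T, sum(y for _, y in g) / T)
             for i, g in groups.items()}

    # Idiosyncratic variance from within (fully demeaned) residuals.
    xw = [x - means[i][0] for i, g in groups.items() for x, _ in g]
    yw = [y - means[i][1] for i, g in groups.items() for _, y in g]
    _, b_w = ols(xw, yw)
    sigma_u2 = sum((y - b_w * x) ** 2 for x, y in zip(xw, yw)) / (N * (T - 1) - 1)

    # Between regression on individual means: its residual variance
    # estimates sigma_c^2 + sigma_u^2 / T.
    xb = [m[0] for m in means.values()]
    yb = [m[1] for m in means.values()]
    a_b, b_b = ols(xb, yb)
    var_between = sum((y - a_b - b_b * x) ** 2 for x, y in zip(xb, yb)) / (N - 2)

    # theta = 1 - sqrt(sigma_u^2 / (sigma_u^2 + T * sigma_c^2)), floored at 0.
    theta = max(0.0, 1.0 - (sigma_u2 / (T * var_between)) ** 0.5)

    # Quasi-demean and run OLS; the intercept absorbs the constant (1 - theta) * a.
    xq = [x - theta * means[i][0] for i, g in groups.items() for x, _ in g]
    yq = [y - theta * means[i][1] for i, g in groups.items() for _, y in g]
    _, b_re = ols(xq, yq)
    return b_re, theta

rng = random.Random(3)
rows = []
for i in range(300):
    c_i = rng.gauss(0, 1)              # time-constant attribute, independent of x
    for t in range(5):
        x = rng.gauss(0, 1)
        rows.append((i, t, x, 1.0 + 2.0 * x + c_i + rng.gauss(0, 1)))

b_re, theta = random_effects_slope(rows, T=5)
```

A `theta` of 0 reproduces pooled OLS and a `theta` of 1 reproduces the within transformation, so the estimate sits between the two depending on how much of the error variance is time-constant.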
In the standard random effects (RE) and fixed effects (FE) models, independent variables are assumed to be uncorrelated with error terms. Provided that valid instruments are available, RE and FE methods extend to the case where some of the explanatory variables are allowed to be endogenous. As in the exogenous setting, the RE model with instrumental variables (REIV) requires more stringent assumptions than the FE model with instrumental variables (FEIV), but it tends to be more efficient under appropriate conditions.[4]
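To fix ideas, consider the following model:
$$y_{it} = x_{it}\beta + c_i + u_{it},$$
where $c_i$ is the unobserved, time-invariant individual effect, $x_{it}$ may be correlated with $u_{is}$ for $s$ possibly different from $t$, and $z_i = (z_{i1}, \ldots, z_{iT})$ is a set of instruments.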
In the REIV setting, key assumptions include that $z_i$ is uncorrelated with $c_i$ as well as with $u_{it}$ for $t = 1, \ldots, T$.
On the other hand, the FEIV estimator only requires that the instruments be exogenous with respect to the error terms after conditioning on the unobserved effect, i.e.
$$E[u_{it} \mid z_i, c_i] = 0.$$[1]
The above discussion parallels the exogenous case of the RE and FE models. In the exogenous case, RE assumes that the explanatory variables are uncorrelated with the unobserved effect, whereas FE allows for arbitrary correlation between the two. As in the standard case, REIV tends to be more efficient than FEIV provided that the appropriate assumptions hold.
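The FEIV idea can be sketched for the simplest case of one endogenous regressor and one instrument: apply the within transformation to all variables, then use the instrumental-variables ratio on the demeaned data. The simulation design, coefficient values, and the name `demean_by_individual` are assumptions made for this illustration. The instrument is deliberately correlated with the unobserved effect, which FEIV tolerates.

```python
import random
from collections import defaultdict

def demean_by_individual(rows):
    """Within-transform z, x and y; rows holds (i, t, z, x, y) tuples."""
    groups = defaultdict(list)
    for i, _t, z, x, y in rows:
        groups[i].append((z, x, y))
    zs, xs, ys = [], [], []
    for obs in groups.values():
        n = len(obs)
        z_bar = sum(o[0] for o in obs) / n
        x_bar = sum(o[1] for o in obs) / n
        y_bar = sum(o[2] for o in obs) / n
        for z, x, y in obs:
            zs.append(z - z_bar)
            xs.append(x - x_bar)
            ys.append(y - y_bar)
    return zs, xs, ys

rng = random.Random(2)
BETA = 2.0
rows = []
for i in range(500):
    c_i = rng.gauss(0, 1)                          # unobserved effect
    for t in range(5):
        u = rng.gauss(0, 1)                        # idiosyncratic error
        z = 0.5 * c_i + rng.gauss(0, 1)            # instrument: related to c_i, not to u
        x = z + c_i + 0.8 * u + rng.gauss(0, 1)    # endogenous: depends on u
        y = BETA * x + c_i + u
        rows.append((i, t, z, x, y))

zs, xs, ys = demean_by_individual(rows)

# FE (within OLS) ignores the endogeneity of x with respect to u.
beta_fe = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

# FEIV: simple IV ratio on demeaned data (equals 2SLS with one instrument).
beta_feiv = sum(z * y for z, y in zip(zs, ys)) / sum(z * x for z, x in zip(zs, xs))
```

In this design the within OLS estimate is pulled away from the true coefficient by the correlation between `x` and `u`, while the FEIV ratio uses only the variation in `x` that comes through the instrument.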
See also: Dynamic unobserved effects model. In contrast to the standard panel data model, a dynamic panel model also includes lagged values of the dependent variable as regressors. For example, including one lag of the dependent variable generates:
$$y_{it} = a + b x_{it} + \rho y_{i,t-1} + \varepsilon_{it}$$
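A practical detail of the dynamic specification is that the lag must be taken within each individual, so the first observed period of every individual has no lagged value. The sketch below generates a noise-free version of the model (the error term is set to zero purely so the result can be checked exactly) and attaches the lag. The names `simulate` and `with_lag`, the starting value `y0`, and the data are hypothetical.

```python
def simulate(x_by_individual, a, b, rho, y0=0.0):
    """Generate y_it = a + b * x_it + rho * y_{i,t-1} with the error set to zero.

    x_by_individual maps an individual index to its list of x values;
    y0 is an assumed pre-sample value of y.
    """
    rows = []
    for i, xs in x_by_individual.items():
        y_prev = y0
        for t, x in enumerate(xs, start=1):
            y = a + b * x + rho * y_prev
            rows.append((i, t, x, y))
            y_prev = y
    return rows

def with_lag(rows):
    """Attach y_{i,t-1} to each (i, t, x, y) row.

    Rows whose lag is unobserved (the first period of each individual)
    are dropped, so lags never cross from one individual to another.
    """
    y_at = {(i, t): y for i, t, _x, y in rows}
    out = []
    for i, t, x, y in rows:
        if (i, t - 1) in y_at:
            out.append((i, t, x, y, y_at[(i, t - 1)]))
    return out

x_data = {1: [1.0, 2.0, 3.0], 2: [0.0, 4.0, 2.0]}
rows = simulate(x_data, a=1.0, b=2.0, rho=0.5)
lagged = with_lag(rows)
```

Each individual contributes one fewer usable observation than it has periods, which is why dynamic panels with few periods lose a noticeable share of the sample.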