Fractional model explained

In applied statistics, fractional models are, to some extent, related to binary response models. However, instead of estimating the probability of falling into one category of a dichotomous variable, a fractional model describes a response variable that can take any value in the unit interval [0, 1], including the boundary values 0 and 1. The model generalizes to a response bounded on any other interval [a, b] by first rescaling that response to the unit interval.^[1] Examples range from participation rates in 401(k) plans^[2] to television ratings of NBA games.^[3]
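As an illustration of the rescaling step, the short Python sketch below assumes the common affine map u = (y − a)/(b − a) for a response known to lie in [a, b]; the function names are hypothetical, and other monotone transformations could serve the same purpose. Because the map is affine, a fitted conditional mean for the rescaled variable converts back to a conditional mean on the original scale.

```python
def to_unit_interval(values, a, b):
    """Rescale observations known to lie in [a, b] to [0, 1] via (v - a) / (b - a)."""
    if not b > a:
        raise ValueError("the interval must satisfy b > a")
    return [(v - a) / (b - a) for v in values]


def from_unit_interval(values, a, b):
    """Map values in [0, 1] (e.g. fitted means) back to the original [a, b] scale."""
    return [a + (b - a) * v for v in values]


# A score bounded on [0, 100]; a fitted mean of 0.5 corresponds to 50 on the original scale.
print(to_unit_interval([0.0, 25.0, 100.0], 0.0, 100.0))   # [0.0, 0.25, 1.0]
print(from_unit_interval([0.5], 0.0, 100.0))              # [50.0]
```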

Description

There have been two approaches to modeling this problem. Both rely on an index that is linear in x combined with a link function,^[4] although this is not strictly necessary. The first approach models the log-odds transformation of y as a linear function of x, i.e.,

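$$\operatorname{logit}(y) = \log\left(\frac{y}{1-y}\right) = x\beta.$$

This approach is problematic for two reasons. First, the transformation is undefined when y takes the boundary values 0 or 1. Second, the coefficients are hard to interpret: β describes how x shifts E[logit(y) | x], and E[y | x] cannot be recovered from it without further assumptions about the distribution of y given x. The second approach avoids these issues by using the logistic function as the link function. More specifically,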

$$\operatorname{E}[y \mid x] = \frac{\exp(x\beta)}{1 + \exp(x\beta)}.$$
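The contrast between the two approaches can be sketched in a few lines of NumPy. The helper names below are hypothetical, and the Bernoulli quasi-log-likelihood is assumed as the estimation objective (it is the criterion behind the quasi-MLE mentioned next). The log-odds transform is infinite at y = 0 and y = 1, whereas the quasi-log-likelihood built on the logistic mean stays finite there; the last lines show that applying the logistic function to an average log-odds does not reproduce the average of y, which is the interpretation problem of the first approach.

```python
import numpy as np


def logit(y):
    """Log-odds transform log(y / (1 - y)) of the first approach; finite only for 0 < y < 1."""
    y = np.asarray(y, dtype=float)
    with np.errstate(divide="ignore"):
        return np.log(y) - np.log1p(-y)   # -inf at y = 0, +inf at y = 1


def fractional_logit_mean(X, beta):
    """Second approach: E[y | x] = exp(x beta) / (1 + exp(x beta)) = 1 / (1 + exp(-x beta))."""
    return 1.0 / (1.0 + np.exp(-(X @ beta)))


def bernoulli_quasi_loglik(beta, X, y):
    """Sum of y_i log G_i + (1 - y_i) log(1 - G_i) with G_i = E[y_i | x_i].

    Finite for every y_i in [0, 1], including the boundary values 0 and 1."""
    eta = X @ beta
    log_g = -np.logaddexp(0.0, -eta)      # log G      = -log(1 + exp(-eta))
    log_1mg = -np.logaddexp(0.0, eta)     # log(1 - G) = -log(1 + exp(eta))
    return float(np.sum(y * log_g + (1.0 - y) * log_1mg))


if __name__ == "__main__":
    X = np.array([[1.0, -1.0], [1.0, 0.0], [1.0, 1.0]])
    y = np.array([0.0, 0.4, 1.0])             # includes both boundary values
    beta = np.array([0.1, 1.5])
    print(logit(y))                           # first approach is infinite at 0 and 1
    print(fractional_logit_mean(X, beta))     # fitted means lie in (0, 1)
    print(bernoulli_quasi_loglik(beta, X, y)) # a finite number
    # Interpretation problem: logistic(mean log-odds) differs from the mean of y.
    s = np.array([0.5, 0.99])
    print(1.0 / (1.0 + np.exp(-logit(s).mean())), s.mean())
```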

This setup is very similar to the binary logit model, with the difference that y can take any value in the unit interval rather than only 0 or 1. Many estimation techniques for the binary logit model, such as nonlinear least squares and Bernoulli quasi-maximum likelihood (quasi-MLE), carry over naturally, as do heteroskedasticity-robust standard errors and the calculation of partial effects.^[5]
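To make the quasi-MLE route concrete, the following NumPy sketch maximizes the Bernoulli quasi-log-likelihood by Newton-Raphson, computes sandwich (heteroskedasticity-robust) standard errors, and averages the partial effect G'(xβ)β_j over the sample. It is an illustrative sketch, not the estimator of any particular package; the function name and the simulated design are hypothetical.

```python
import numpy as np


def fit_fractional_logit(X, y, tol=1e-10, max_iter=100):
    """Bernoulli quasi-MLE of E[y | x] = logistic(x beta) via Newton-Raphson (sketch).

    Returns (beta, robust standard errors, average partial effects).
    The APE entry for an intercept column has no meaningful interpretation."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        g = 1.0 / (1.0 + np.exp(-(X @ beta)))   # fitted means G(x_i beta)
        w = g * (1.0 - g)                       # G'(x_i beta) for the logistic G
        score = X.T @ (y - g)                   # gradient of the quasi-log-likelihood
        info = X.T @ (X * w[:, None])           # minus the Hessian
        step = np.linalg.solve(info, score)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    g = 1.0 / (1.0 + np.exp(-(X @ beta)))
    w = g * (1.0 - g)
    a_inv = np.linalg.inv(X.T @ (X * w[:, None]))
    scores = (y - g)[:, None] * X               # per-observation score vectors
    cov = a_inv @ (scores.T @ scores) @ a_inv   # sandwich A^{-1} B A^{-1}
    se = np.sqrt(np.diag(cov))
    ape = w.mean() * beta                       # average of dE[y|x]/dx_j = G'(x beta) beta_j
    return beta, se, ape


if __name__ == "__main__":
    # Simulated fractional response: share of successes out of 10 trials,
    # so E[y | x] = logistic(0.5 + 1.0 x) and y can equal 0 or 1 exactly.
    rng = np.random.default_rng(0)
    x = rng.normal(size=5000)
    X = np.column_stack([np.ones_like(x), x])
    mu = 1.0 / (1.0 + np.exp(-(0.5 + 1.0 * x)))
    y = rng.binomial(10, mu) / 10.0
    beta, se, ape = fit_fractional_logit(X, y)
    print("beta:", beta, "robust se:", se, "APE of x:", ape[1])
```

In the simulated data, Var(y | x) = G(1 − G)/10 rather than G(1 − G). The quasi-MLE point estimates remain consistent because only the mean is assumed correct, which is why the sandwich form, rather than the inverse of the information matrix alone, is used for the standard errors.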

Extensions of this cross-sectional model address important econometric issues such as endogenous explanatory variables and unobserved heterogeneity. Under strict exogeneity assumptions, panel data techniques can be used to account for these unobserved effects, although consistent estimators can also be obtained under weaker exogeneity assumptions.^[6] Control function techniques for dealing with endogeneity have also been proposed.^[7]
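A control-function approach can be sketched in two steps: regress the endogenous variable y2 on all exogenous variables by OLS, then include the first-stage residual as an extra regressor in the fractional model. The NumPy sketch below assumes a linear reduced form for y2 and, for continuity with the logistic mean above, assumes E[y | z, y2, v] = logistic(z1 δ + α y2 + ρ v). Control-function derivations in this literature are often built on a probit-type mean with normal errors, so this logit version is an illustrative variant rather than a published estimator. All names and the simulated design are hypothetical, and second-step standard errors would need to account for the estimated first stage, for example by bootstrapping both steps.

```python
import numpy as np


def qmle_logit(X, y, tol=1e-10, max_iter=100):
    """Bernoulli quasi-MLE of a fractional logit by Newton-Raphson (point estimates only)."""
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        g = 1.0 / (1.0 + np.exp(-(X @ beta)))
        info = X.T @ (X * (g * (1.0 - g))[:, None])
        step = np.linalg.solve(info, X.T @ (y - g))
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


def control_function_fit(y, y2, z1, z):
    """Two-step control-function estimator (hypothetical helper, sketch only).

    y  : fractional response in [0, 1]
    y2 : continuous endogenous regressor
    z1 : exogenous regressors in the equation for y, including a constant column
    z  : all exogenous variables: the columns of z1 plus excluded instruments
    Returns coefficients ordered as (z1 columns, y2, v_hat)."""
    # Step 1: linear reduced form for y2 by OLS; its residual is the control function.
    pi, *_ = np.linalg.lstsq(z, y2, rcond=None)
    v_hat = y2 - z @ pi
    # Step 2: fractional logit of y on z1, y2 and v_hat.
    X = np.column_stack([z1, y2, v_hat])
    return qmle_logit(X, y)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 20000
    x1 = rng.normal(size=n)                  # exogenous regressor
    w = rng.normal(size=n)                   # excluded instrument
    v = rng.normal(size=n)                   # first-stage error, source of endogeneity
    y2 = 0.5 * x1 + 1.0 * w + v
    mu = 1.0 / (1.0 + np.exp(-(0.2 + 0.5 * x1 + 1.0 * y2 + 0.8 * v)))
    y = rng.binomial(10, mu) / 10.0
    z1 = np.column_stack([np.ones(n), x1])
    z = np.column_stack([z1, w])
    print(control_function_fit(y, y2, z1, z))   # compare with (0.2, 0.5, 1.0, 0.8)
```

Under the null hypothesis that y2 is exogenous (ρ = 0), a robust t-test on the coefficient of v_hat gives a simple check for endogeneity, since the first-stage estimation does not affect that test's distribution under the null.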

Notes and References

Wooldridge, J. M. (2002): Econometric Analysis of Cross Section and Panel Data, MIT Press, Cambridge, Mass.
Papke, L. E. and J. M. Wooldridge (1996): "Econometric Methods for Fractional Response Variables with an Application to 401(k) Plan Participation Rates." Journal of Applied Econometrics (11), pp. 619–632
Hausman, J. A. and G. K. Leonard (1997): "Superstars in the National Basketball Association: Economic Value and Policy." Journal of Labor Economics (15), pp. 586–624
McCullagh, P. and J. A. Nelder (1989): Generalized Linear Models, CRC Monographs on Statistics and Applied Probability (Book 37), 2nd Edition, Chapman and Hall, London.
Wooldridge, J. M. (2002): Econometric Analysis of Cross Section and Panel Data, MIT Press, Cambridge, Mass.
Papke, L. E. and J. M. Wooldridge (2008): "Panel Data Methods for Fractional Response Variables with an Application to Test Pass Rates." Journal of Econometrics (145), pp. 121–133
Wooldridge, J. M. (2005): "Unobserved Heterogeneity and Estimation of Average Partial Effects." Identification and Inference for Econometric Models: Essays in Honor of Thomas Rothenberg, ed. by D. W. K. Andrews and J. H. Stock, Cambridge University Press, Cambridge, pp. 27–55