Hurdle model explained

A hurdle model is a class of statistical models in which a random variable is modelled in two parts: the first models the probability of attaining the value 0, and the second models the probability of the non-zero values. The use of hurdle models is often motivated by an excess of zeroes in the data that is not sufficiently accounted for by more standard statistical models.

In a hurdle model, a random variable x is modelled as

\Pr(x=0)=\theta

\Pr(x\ne 0)=p_{x\ne 0}(x)

where p_{x\ne 0}(x) is a truncated probability distribution function, truncated at 0.
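
To make the definition concrete, the following is a minimal sketch of a hurdle model for count data, assuming a Poisson distribution truncated at 0 for the non-zero part. The non-zero part is weighted by 1 - theta so that all probabilities sum to one. The function names and parameter values are illustrative assumptions, not taken from any cited source.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """Ordinary (untruncated) Poisson probability mass at k."""
    return math.exp(-lam) * lam**k / math.factorial(k)


def zero_truncated_poisson_pmf(k: int, lam: float) -> float:
    """Poisson pmf conditioned on k >= 1, i.e. truncated at 0."""
    if k < 1:
        return 0.0
    return poisson_pmf(k, lam) / (1.0 - math.exp(-lam))


def hurdle_poisson_pmf(k: int, theta: float, lam: float) -> float:
    """Hurdle pmf: mass theta at zero, remaining 1 - theta spread over k >= 1."""
    if k == 0:
        return theta
    return (1.0 - theta) * zero_truncated_poisson_pmf(k, lam)


# Illustrative (made-up) parameter values.
theta, lam = 0.6, 2.0

# The two parts together form a proper distribution: the total mass is 1
# (up to floating-point error) when summed over a long enough range.
total = sum(hurdle_poisson_pmf(k, theta, lam) for k in range(100))
```

In this sketch the zero probability theta is a free parameter: it does not depend on the rate lam of the count distribution, which is what separates the "hurdle" part from the part describing the non-zero values.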

Hurdle models were introduced by John G. Cragg in 1971,[1] where the non-zero values of x were modelled using a normal model, and a probit model was used to model the zeros. The probit part of the model was said to model the presence of "hurdles" that must be overcome for the values of x to attain non-zero values, hence the designation hurdle model. Hurdle models were later developed for count data, with Poisson, geometric,[2] and negative binomial[3] models for the non-zero counts.

Relationship with zero-inflated models

Hurdle models differ from zero-inflated models in that zero-inflated models model the zeros using a two-component mixture model. With a mixture model, the probability of the variable being zero is determined by both the main distribution function p(x=0) and the mixture weight \pi. Specifically, a zero-inflated model for a random variable x is

\Pr(x=0)=\pi+(1-\pi)\times p(x=0)

\Pr(x=h_i)=(1-\pi)\times p(x=h_i)

where \pi is the mixture weight that determines the amount of zero-inflation. A zero-inflated model can only increase \Pr(x=0), but this is not a restriction in hurdle models.[4]
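
The restriction can be illustrated numerically. The sketch below assumes a Poisson main distribution with an arbitrarily chosen rate and a mixture weight restricted to the interval [0, 1]; the function name and all values are illustrative assumptions.

```python
import math


def zip_zero_prob(pi: float, lam: float) -> float:
    """Pr(x=0) under a zero-inflated Poisson: pi + (1 - pi) * p(x=0)."""
    return pi + (1.0 - pi) * math.exp(-lam)


lam = 0.5                    # assumed Poisson rate
base_zero = math.exp(-lam)   # p(x=0) of the main distribution

# For any mixture weight in [0, 1], the zero probability is at least base_zero.
zip_zeros = [zip_zero_prob(pi, lam) for pi in (0.0, 0.25, 0.5, 1.0)]

# A hurdle model sets Pr(x=0) = theta directly, so theta may also lie
# below base_zero, giving fewer zeros than the main distribution implies.
hurdle_theta = 0.3
```

With a mixture weight of 0 the zero-inflated model reduces to the main distribution, and larger weights only add zeros. The hurdle model has no such floor because its zero probability is specified separately from the distribution of the non-zero values.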

Notes and References

  1. Cragg, John G. (1971). "Some Statistical Models for Limited Dependent Variables with Application to the Demand for Durable Goods". Econometrica. 39 (5): 829–844. doi:10.2307/1909582. JSTOR 1909582.
  2. Mullahy, John (1986). "Specification and testing of some modified count data models". Journal of Econometrics. 33 (3): 341–365. doi:10.1016/0304-4076(86)90002-3.
  3. Welsh, A. H.; Cunningham, R. B.; Donnelly, C. F.; Lindenmayer, D. B. (1996). "Modelling the abundance of rare species: statistical models for counts with extra zeros". Ecological Modelling. 88 (1–3): 297–308. doi:10.1016/0304-3800(95)00113-1.
  4. Min, Yongyi; Agresti, Alan (2005). "Random effect models for repeated measures of zero-inflated count data". Statistical Modelling. 5 (1): 1–19. doi:10.1191/1471082X05st084oa. CiteSeerX 10.1.1.296.3503. 2400918.