A hurdle model is a class of statistical models in which a random variable is modelled in two parts: the first part models the probability of attaining the value 0, and the second part models the probability of the non-zero values. The use of hurdle models is often motivated by an excess of zeros in the data that is not sufficiently accounted for by more standard statistical models.
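In a hurdle model, a random variable x is modelled as
\Pr(x=0)=\theta
\Pr(x\ne 0)=p_{x\ne 0}(x)
where p_{x\ne 0}(x) is a probability distribution truncated at 0 that describes the non-zero values of x, which together carry the remaining probability 1-\theta.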
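As a rough illustration of the two-part definition, the sketch below builds a hurdle probability mass function for count data. It assumes, purely for the example, that the non-zero part is a Poisson distribution truncated at zero and rescaled to carry total probability 1 - theta; the function names and the parameter values theta = 0.6 and lam = 2.0 are hypothetical.

```python
import math


def poisson_pmf(k, lam):
    """Untruncated Poisson probability of count k with rate lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)


def hurdle_poisson_pmf(k, theta, lam):
    """Hurdle pmf: mass theta at zero, zero-truncated Poisson elsewhere.

    Assumption for illustration: the non-zero counts follow a Poisson
    distribution truncated at 0, rescaled to total mass 1 - theta.
    """
    if k == 0:
        return theta
    # Renormalise the Poisson pmf over k >= 1 (truncation at zero).
    truncated = poisson_pmf(k, lam) / (1.0 - poisson_pmf(0, lam))
    return (1.0 - theta) * truncated


theta, lam = 0.6, 2.0  # hypothetical parameter values
probs = [hurdle_poisson_pmf(k, theta, lam) for k in range(60)]
total = sum(probs)  # close to 1; the tail beyond k = 59 is negligible
```

Because the zero probability is set by theta alone, it can be chosen larger or smaller than the zero probability of the untruncated Poisson distribution.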
Hurdle models were introduced by John G. Cragg in 1971,[1] where the non-zero values of x were modelled using a normal model, and a probit model was used to model the zeros. The probit part of the model was said to model the presence of "hurdles" that must be overcome for the values of x to attain non-zero values, hence the designation hurdle model. Hurdle models were later developed for count data, with Poisson, geometric,[2] and negative binomial[3] models for the non-zero counts.
Hurdle models differ from zero-inflated models in that zero-inflated models describe the zeros using a two-component mixture model. With a mixture model, the probability of the variable being zero is determined by both the main distribution function p(x=0) and the mixture weight \pi. Specifically, a zero-inflated model for a random variable x is
\Pr(x=0)=\pi+(1-\pi)\times p(x=0)
\Pr(x=h_i)=(1-\pi)\times p(x=h_i)
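where \pi is the mixture weight that determines the amount of zero-inflation. Since \pi\ge 0, a zero-inflated model can only increase \Pr(x=0) relative to the main distribution, whereas a hurdle model has no such restriction.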
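The contrast between the two model classes can be checked numerically. The following is a minimal sketch that assumes a Poisson main distribution for both models; the function names, the rate lam = 1.0, and the weights tried are all hypothetical choices made for the example.

```python
import math


def poisson_pmf(k, lam):
    """Untruncated Poisson probability of count k with rate lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)


def zero_inflated_pmf(k, pi, lam):
    """Mixture: point mass at zero (weight pi) plus Poisson (weight 1 - pi)."""
    base = (1.0 - pi) * poisson_pmf(k, lam)
    return pi + base if k == 0 else base


def hurdle_pmf(k, theta, lam):
    """Hurdle: mass theta at zero, zero-truncated Poisson on k >= 1."""
    if k == 0:
        return theta
    return (1.0 - theta) * poisson_pmf(k, lam) / (1.0 - poisson_pmf(0, lam))


lam = 1.0  # hypothetical rate
p0 = poisson_pmf(0, lam)  # zero probability of the main distribution

# Zero-inflation: Pr(x=0) never drops below p0 for any weight in [0, 1].
zi_zero_probs = [zero_inflated_pmf(0, pi, lam) for pi in (0.0, 0.25, 0.5, 1.0)]

# Hurdle: theta may be set below p0, giving fewer zeros than the Poisson.
deflated_zero_prob = hurdle_pmf(0, 0.1, lam)

zi_total = sum(zero_inflated_pmf(k, 0.25, lam) for k in range(60))
hurdle_total = sum(hurdle_pmf(k, 0.1, lam) for k in range(60))
```

In this sketch every entry of zi_zero_probs is at least p0, while deflated_zero_prob lies below it, which is the practical difference between the two approaches when the data contain fewer zeros than the main distribution predicts.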