Horvitz–Thompson estimator
In statistics, the Horvitz–Thompson estimator, named after Daniel G. Horvitz and Donovan J. Thompson,[1] is a method for estimating the total[2] and mean of a pseudo-population in a stratified sample by applying inverse probability weighting to account for the difference in the sampling distribution between the collected data and a target population. The Horvitz–Thompson estimator is frequently applied in survey analyses and can be used to account for missing data, as well as many sources of unequal selection probabilities.
The method
Formally, let $Y_i, i = 1, 2, \ldots, n$ be an independent sample from $n$ of $N \ge n$ distinct strata with a common mean $\mu$. Suppose further that $\pi_i$ is the inclusion probability that a randomly sampled individual in a superpopulation belongs to the $i$th stratum. The Horvitz–Thompson estimator of the total is given by:

$$\hat{Y}_{HT} = \sum_{i=1}^{n} \pi_i^{-1} Y_i, \quad [3]$$

and the Horvitz–Thompson estimate of the mean is given by:

$$\hat{\mu}_{HT} = N^{-1}\hat{Y}_{HT} = N^{-1}\sum_{i=1}^{n} \pi_i^{-1} Y_i.$$
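The two formulas above translate directly into code. The following minimal Python sketch computes $\hat{Y}_{HT}$ and $\hat{\mu}_{HT}$ from the observed values $Y_i$ and their inclusion probabilities $\pi_i$. The function names and the toy numbers are hypothetical, chosen only for illustration, and $N$ is passed in explicitly because it cannot be recovered from the sample itself.

```python
from typing import Sequence


def horvitz_thompson_total(y: Sequence[float], pi: Sequence[float]) -> float:
    """Estimate the total as sum_i Y_i / pi_i over the sampled units."""
    if len(y) != len(pi):
        raise ValueError("y and pi must have the same length")
    if any(not 0 < p <= 1 for p in pi):
        raise ValueError("inclusion probabilities must lie in (0, 1]")
    # Inverse probability weighting: a unit sampled with probability pi_i
    # stands in for 1 / pi_i units of the population.
    return sum(yi / p for yi, p in zip(y, pi))


def horvitz_thompson_mean(y: Sequence[float], pi: Sequence[float], N: int) -> float:
    """Estimate the mean as N^{-1} times the Horvitz–Thompson total."""
    return horvitz_thompson_total(y, pi) / N


# Hypothetical sample: three observed values with their inclusion probabilities.
y = [10.0, 4.0, 6.0]
pi = [0.5, 0.25, 0.125]
print(horvitz_thompson_total(y, pi))     # 20 + 16 + 48 = 84.0
print(horvitz_thompson_mean(y, pi, 28))  # 84 / 28 = 3.0
```

A unit sampled with probability 1/8 receives weight 8, so the third observation contributes 48 of the estimated total of 84. The input checks (matching lengths, probabilities in (0, 1]) are a design choice of this sketch, not part of the estimator's definition.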
In a Bayesian probabilistic framework, $\pi_i$ is considered the proportion of individuals in a target population belonging to the $i$th stratum. Hence, $\pi_i^{-1} Y_i$ could be thought of as an estimate of the complete sample of persons within the $i$th stratum. The Horvitz–Thompson estimator can also be expressed as the limit of a weighted bootstrap resampling estimate of the mean. It can also be viewed as a special case of multiple imputation approaches.[4]

For post-stratified study designs, $\pi$ and $\mu$ are estimated in distinct steps. In such cases, computing the variance of $\hat{\mu}_{HT}$ is not straightforward. Resampling techniques such as the bootstrap or the jackknife can be applied to obtain consistent estimates of the variance of the Horvitz–Thompson estimator.[5] The "survey" package for R conducts analyses for post-stratified data using the Horvitz–Thompson estimator.[6]

Proof of Horvitz–Thompson unbiased estimation of the mean
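The Horvitz–Thompson estimator can be shown to be unbiased by evaluating its expectation, $\operatorname{E}(\hat{\mu}_{HT})$, as follows:

\begin{align}
& \operatorname{E}\left(\hat{\mu}_{HT}\right) = \operatorname{E}\left(\frac{1}{N}\sum_{i=1}^{n}\frac{Y_i}{\pi_i}\right) \\[6pt]
={}& \operatorname{E}\left(\frac{1}{N}\sum_{i=1}^{N}\frac{Y_i}{\pi_i}\,\mathbb{1}_{D_n}(i)\right) \\[6pt]
={}& \frac{1}{N}\sum_{i=1}^{N}\frac{Y_i}{\pi_i}\operatorname{E}\left[\mathbb{1}_{D_n}(i)\right] \\[6pt]
={}& \frac{1}{N}\sum_{i=1}^{N}\frac{Y_i}{\pi_i}\Pr\left(i \in D_n\right) \\[6pt]
={}& \frac{1}{N}\sum_{i=1}^{N}\left(\frac{Y_i}{\pi_i}\right)\pi_i \\[6pt]
={}& \frac{1}{N}\sum_{i=1}^{N} Y_i \\[6pt]
& \text{where } D_n \text{ is the set of the } n \text{ sampled units}
\end{align}

The expectation is taken over the sampling design with the population values $Y_1, \ldots, Y_N$ held fixed, so only the inclusion indicators are random and $\operatorname{E}[\mathbb{1}_{D_n}(i)] = \Pr(i \in D_n) = \pi_i$. The final expression is the population mean.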
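The key step in the derivation is $\operatorname{E}[\mathbb{1}_{D_n}(i)] = \pi_i$. The minimal sketch below checks the result exactly for a small hypothetical population by enumerating all $2^N$ possible samples and averaging $\hat{\mu}_{HT}$ over them. It assumes Poisson sampling (each unit enters the sample independently with probability $\pi_i$); that design is an assumption of the sketch, since the derivation only needs the inclusion probabilities. Exact rational arithmetic (`fractions.Fraction`) avoids floating-point noise.

```python
from fractions import Fraction
from itertools import product
from math import prod

# Hypothetical finite population values and inclusion probabilities (illustration only).
y = [2, 5, 7, 1, 9, 4]
pi = [Fraction(1, 5), Fraction(1, 2), Fraction(4, 5),
      Fraction(3, 10), Fraction(3, 5), Fraction(2, 5)]


def expected_ht_mean(y, pi):
    """Exact E[mu_HT] under Poisson sampling, by enumerating every possible sample."""
    N = len(y)
    expectation = Fraction(0)
    for indicators in product((0, 1), repeat=N):
        # Probability of this realization of the inclusion indicators.
        p_sample = prod(p if included else 1 - p for included, p in zip(indicators, pi))
        # Horvitz–Thompson mean from the sampled units only (empty sample gives 0).
        ht_total = sum((Fraction(yk) / p
                        for included, yk, p in zip(indicators, y, pi) if included),
                       Fraction(0))
        expectation += p_sample * (ht_total / N)
    return expectation


print(expected_ht_mean(y, pi), Fraction(sum(y), len(y)))  # 14/3 14/3
```

Both printed values equal the population mean 28/6 = 14/3, mirroring the last line of the derivation. Enumeration grows as $2^N$, so this is only practical for toy populations; a Monte Carlo average over simulated samples would be the scalable alternative.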
The Hansen–Hurwitz (1943) strategy is known to be inferior to the Horvitz–Thompson (1952) strategy when the latter is combined with a number of inclusion probabilities proportional to size (IPPS) sampling procedures.[7]
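As one small, hand-checkable illustration of this comparison (not a proof), the sketch below computes exact variances for a hypothetical population of $N = 3$ units with size measures $x$ and values $y$, using sample size $n = 2$. The Hansen–Hurwitz estimator is taken in its usual with-replacement form, $\hat{Y}_{HH} = n^{-1}\sum_{k=1}^{n} y_k / p_k$ with single-draw probabilities $p_i = x_i / \sum_j x_j$. For Horvitz–Thompson, the IPPS inclusion probabilities $\pi_i = n p_i$ are paired with the only fixed-size without-replacement design that has them when $N = 3$ and $n = 2$ (the pair omitting unit $k$ is drawn with probability $1 - \pi_k$). All numbers are made up for illustration.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical toy population: size measures x and study values y.
x = [2, 3, 4]
y = [3, 5, 9]
n = 2  # number of draws (HH) / number of distinct units (HT)

X = sum(x)
Y = sum(y)  # true population total
p = [Fraction(xi, X) for xi in x]  # single-draw probabilities, proportional to size
pi = [n * pk for pk in p]          # IPPS inclusion probabilities (all <= 1 here)

# Hansen–Hurwitz with n independent PPS draws with replacement:
# Var(Y_HH) = (1/n) * sum_i p_i * (y_i / p_i - Y)^2
var_hh = sum(pk * (Fraction(yk) / pk - Y) ** 2 for pk, yk in zip(p, y)) / n

# Horvitz–Thompson: with N = 3 and n = 2, the pair that omits unit k
# must be drawn with probability 1 - pi_k to reproduce the pi_i above.
design = {}
for pair in combinations(range(3), 2):
    (omitted,) = set(range(3)) - set(pair)
    design[pair] = 1 - pi[omitted]

ht = {pair: sum(Fraction(y[i]) / pi[i] for i in pair) for pair in design}
mean_ht = sum(prob * ht[pair] for pair, prob in design.items())
var_ht = sum(prob * (ht[pair] - Y) ** 2 for pair, prob in design.items())

print(f"E[Y_HT] = {mean_ht}, Var(Y_HT) = {var_ht}, Var(Y_HH) = {var_hh}")
```

Running it prints `E[Y_HT] = 17, Var(Y_HT) = 17/16, Var(Y_HH) = 35/8`: both estimators are unbiased for the total of 17 here, and the Horvitz–Thompson variance is the smaller one. A different population or IPPS design will give different numbers, so this only illustrates the kind of comparison the cited result is about.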
Notes and References
- Horvitz, D. G.; Thompson, D. J. (1952). "A generalization of sampling without replacement from a finite universe". Journal of the American Statistical Association, 47, 663–685.
- Cochran, William G. (1977). Sampling Techniques, 3rd edition. Wiley.
- Särndal, Carl-Erik; Swensson, Bengt; Wretman, Jan Håkan (1992). Model Assisted Survey Sampling. ISBN 9780387975283.
- Little, Roderick J. A.; Rubin, Donald B. (2002). Statistical Analysis with Missing Data, 2nd edition. Wiley.
- Quatember, A. (2014). "The Finite Population Bootstrap – from the Maximum Likelihood to the Horvitz-Thompson Approach". Austrian Journal of Statistics, 43(2), 93–102. doi:10.17713/ajs.v43i2.10.
- "CRAN – Package survey". Website, accessed 19 July 2021.
- Prabhu-Ajgaonkar, S. G. (1987). "Comparison of the Horvitz–Thompson Strategy with the Hansen–Hurwitz Strategy". Survey Methodology, 221.