In probability theory and statistics, the negative hypergeometric distribution describes the probabilities that arise when sampling without replacement from a finite population whose elements can be classified into two mutually exclusive categories, such as Pass/Fail or Employed/Unemployed. As random selections are made from the population, each subsequent draw decreases the population, causing the probability of success to change with each draw. Unlike the standard hypergeometric distribution, which describes the number of successes in a sample of fixed size, in the negative hypergeometric distribution samples are drawn until $r$ failures have been found, and the distribution describes the probability of finding $k$ successes in such a sample. In other words, the negative hypergeometric distribution describes the likelihood of $k$ successes in a sample with exactly $r$ failures.
There are $N$ elements, of which $K$ are defined as "successes" and the rest are "failures". Elements are drawn one after the other, without replacement, until $r$ failures are encountered. Then the drawing stops and the number $k$ of successes is counted. The negative hypergeometric distribution, $\mathrm{NHG}_{N,K,r}(k)$, describes the probability distribution of $k$ in this experiment.
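This experiment is straightforward to simulate. The sketch below is a minimal Monte Carlo check in Python: it uses an illustrative, hypothetical parameter set ($N=20$, $K=12$, $r=3$, chosen only for the example), draws without replacement until the $r$-th failure, and tallies the number of successes; the empirical frequencies should approximate the closed-form probability mass function derived below.

```python
import random
from collections import Counter

# Hypothetical example parameters: N elements, K successes, stop at the r-th failure.
N, K, r = 20, 12, 3
trials = 100_000

counts = Counter()
for _ in range(trials):
    population = [1] * K + [0] * (N - K)   # 1 = success, 0 = failure
    random.shuffle(population)
    failures = successes = 0
    for item in population:
        if item == 1:
            successes += 1
        else:
            failures += 1
            if failures == r:              # drawing stops at the r-th failure
                break
    counts[successes] += 1

for k in sorted(counts):
    print(f"k = {k:2d}: empirical P(X = k) ~ {counts[k] / trials:.4f}")
```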
The negative hypergeometric distribution is a special case of the beta-binomial distribution[2] with parameters $\alpha=r$ and $\beta=N-K-r+1$, both being integers (and $n=K$).
The outcome requires that we observe $k$ successes in the first $(k+r-1)$ draws and that the $(k+r)$-th draw is a failure. The probability of the former can be found by direct application of the hypergeometric distribution, $\mathrm{HG}_{N,K,k+r-1}(k)$, and the probability of the latter is simply the number of remaining failures $(=N-K-(r-1))$ divided by the size of the remaining population $(=N-(k+r-1))$. The probability of having exactly $k$ successes up to the $r$-th failure (i.e., the drawing stops as soon as the sample includes the predefined number of $r$ failures) is then the product of these two probabilities:

$$\frac{\binom{K}{k}\binom{N-K}{k+r-1-k}}{\binom{N}{k+r-1}}\cdot\frac{N-K-(r-1)}{N-(k+r-1)}=\frac{\binom{k+r-1}{k}\binom{N-r-k}{K-k}}{\binom{N}{K}}.$$
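As a quick numerical sanity check of this algebra, the sketch below (using the same hypothetical parameters as before) compares the product of the hypergeometric term and the conditional failure probability against the simplified closed form for every $k$.

```python
from math import comb, isclose

# Hypothetical example parameters.
N, K, r = 20, 12, 3

for k in range(K + 1):
    # Hypergeometric probability of k successes in the first k+r-1 draws ...
    hyper = comb(K, k) * comb(N - K, r - 1) / comb(N, k + r - 1)
    # ... times the chance that the (k+r)-th draw is one of the remaining failures.
    product = hyper * (N - K - (r - 1)) / (N - (k + r - 1))
    # Simplified closed form.
    closed = comb(k + r - 1, k) * comb(N - r - k, K - k) / comb(N, K)
    assert isclose(product, closed), (k, product, closed)

print("product form and closed form agree for all k")
```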
Therefore, a random variable $X$ following the negative hypergeometric distribution has probability mass function

$$f(k;N,K,r)\equiv\Pr(X=k)=\frac{\binom{k+r-1}{k}\binom{N-r-k}{K-k}}{\binom{N}{K}}\qquad\text{for }k=0,1,2,\ldots,K,$$

where $N$ is the population size, $K$ is the number of success elements in the population, $r$ is the number of failures at which the drawing stops, $k$ is the number of observed successes, and $\binom{a}{b}$ is a binomial coefficient.
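A direct implementation of this probability mass function is short. The sketch below is one possible version; it also cross-checks the values, for one hypothetical parameter set, against the standard beta-binomial probability mass function with $\alpha=r$, $\beta=N-K-r+1$ and $n=K$, which is the special-case relationship mentioned above.

```python
from math import comb, lgamma, exp, isclose

def nhg_pmf(k, N, K, r):
    """Negative hypergeometric pmf: k successes before the r-th failure."""
    return comb(k + r - 1, k) * comb(N - r - k, K - k) / comb(N, K)

def betabinom_pmf(k, n, a, b):
    """Standard beta-binomial pmf, written with log-gamma for numerical stability."""
    def lbeta(x, y):
        return lgamma(x) + lgamma(y) - lgamma(x + y)
    return comb(n, k) * exp(lbeta(k + a, n - k + b) - lbeta(a, b))

# Hypothetical example parameters.
N, K, r = 20, 12, 3

for k in range(K + 1):
    assert isclose(nhg_pmf(k, N, K, r),
                   betabinom_pmf(k, n=K, a=r, b=N - K - r + 1))
print("matches the beta-binomial special case")
```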
The probabilities indeed sum to one:

$$\sum_{k=0}^{K}\Pr(X=k)=\sum_{k=0}^{K}\frac{\binom{k+r-1}{k}\binom{N-r-k}{K-k}}{\binom{N}{K}}=\frac{1}{\binom{N}{K}}\sum_{k=0}^{K}\binom{k+r-1}{k}\binom{N-r-k}{K-k}=\frac{1}{\binom{N}{K}}\binom{N}{K}=1,$$
where we have used that

$$\begin{align}
\sum_{j=0}^{k}\binom{j+m}{j}\binom{n-m-j}{k-j}
&=\sum_{j=0}^{k}(-1)^{j}\binom{-m-1}{j}(-1)^{k-j}\binom{m+1+k-n-2}{k-j}\\
&=(-1)^{k}\sum_{j=0}^{k}\binom{-m-1}{j}\binom{k-n-2-(-m-1)}{k-j}\\
&=(-1)^{k}\binom{k-n-2}{k}\\
&=(-1)^{k}\binom{k-(n+1)-1}{k}\\
&=\binom{n+1}{k},
\end{align}$$

which can be derived using the binomial identity

$$\binom{n}{k}=(-1)^{k}\binom{k-n-1}{k}$$

and the Chu–Vandermonde identity

$$\sum_{j=0}^{k}\binom{m}{j}\binom{n-m}{k-j}=\binom{n}{k},$$

which holds for any complex values $m$ and $n$ and any non-negative integer $k$.
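A brute-force check of the combinatorial identity used above, over a small grid of hypothetical nonnegative integer values of $m$, $n$ and $k$, might look like this.

```python
from math import comb

def lhs(m, n, k):
    # sum_{j=0}^{k} C(j+m, j) * C(n-m-j, k-j)
    return sum(comb(j + m, j) * comb(n - m - j, k - j) for j in range(k + 1))

# Check lhs(m, n, k) == C(n+1, k) on a small grid of nonnegative integers.
for m in range(0, 5):
    for k in range(0, 6):
        for n in range(m + k, m + k + 5):   # keep n - m - j nonnegative
            assert lhs(m, n, k) == comb(n + 1, k), (m, n, k)
print("identity verified on the test grid")
```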
When counting the number $k$ of successes before $r$ failures, the expected number of successes is $\frac{rK}{N-K+1}$, which can be derived as follows.
$$\begin{align}
E[X]&=\sum_{k=0}^{K}k\Pr(X=k)=\sum_{k=0}^{K}k\,\frac{\binom{k+r-1}{k}\binom{N-r-k}{K-k}}{\binom{N}{K}}=\frac{r}{\binom{N}{K}}\left[\sum_{k=0}^{K}\frac{(k+r)}{r}\binom{k+r-1}{r-1}\binom{N-r-k}{K-k}\right]-r\\
&=\frac{r}{\binom{N}{K}}\left[\sum_{k=0}^{K}\binom{k+r}{r}\binom{N-r-k}{K-k}\right]-r=\frac{r}{\binom{N}{K}}\left[\sum_{k=0}^{K}\binom{k+r}{k}\binom{N-r-k}{K-k}\right]-r\\
&=\frac{r}{\binom{N}{K}}\binom{N+1}{K}-r=\frac{rK}{N-K+1},
\end{align}$$
where we have used the relationship

$$\sum_{j=0}^{k}\binom{j+m}{j}\binom{n-m-j}{k-j}=\binom{n+1}{k},$$

which was derived above.
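A short numerical check of the expectation formula, again with the hypothetical parameters used earlier:

```python
from math import comb, isclose

def nhg_pmf(k, N, K, r):
    return comb(k + r - 1, k) * comb(N - r - k, K - k) / comb(N, K)

# Hypothetical example parameters.
N, K, r = 20, 12, 3

mean = sum(k * nhg_pmf(k, N, K, r) for k in range(K + 1))
assert isclose(mean, r * K / (N - K + 1))
print(f"E[X] = {mean:.6f} = rK/(N-K+1)")
```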
The variance can be derived by the following calculation.
$$\begin{align}
E[X^2]&=\sum_{k=0}^{K}k^2\Pr(X=k)=\left[\sum_{k=0}^{K}(k+r)(k+r+1)\Pr(X=k)\right]-(2r+1)E[X]-r^2-r\\
&=\frac{r(r+1)}{\binom{N}{K}}\left[\sum_{k=0}^{K}\binom{k+r+1}{r+1}\binom{N+1-(r+1)-k}{K-k}\right]-(2r+1)E[X]-r^2-r\\
&=\frac{r(r+1)}{\binom{N}{K}}\binom{N+2}{K}-(2r+1)E[X]-r^2-r=\frac{rK(N-r+Kr+1)}{(N-K+1)(N-K+2)}.
\end{align}$$
Then the variance is

$$\mathrm{Var}[X]=E[X^2]-\left(E[X]\right)^2=\frac{rK(N+1)(N-K-r+1)}{(N-K+1)^2(N-K+2)}.$$
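A matching numerical check of the second moment and variance formulas, under the same hypothetical parameters:

```python
from math import comb, isclose

def nhg_pmf(k, N, K, r):
    return comb(k + r - 1, k) * comb(N - r - k, K - k) / comb(N, K)

# Hypothetical example parameters.
N, K, r = 20, 12, 3

mean = sum(k * nhg_pmf(k, N, K, r) for k in range(K + 1))
second_moment = sum(k * k * nhg_pmf(k, N, K, r) for k in range(K + 1))
variance = second_moment - mean ** 2

assert isclose(second_moment,
               r * K * (N - r + K * r + 1) / ((N - K + 1) * (N - K + 2)))
assert isclose(variance,
               r * K * (N + 1) * (N - K - r + 1) / ((N - K + 1) ** 2 * (N - K + 2)))
print(f"Var[X] = {variance:.6f}")
```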
If the drawing stops after a constant number $n$ of draws (regardless of the number of failures), then the number of successes has the hypergeometric distribution $\mathrm{HG}_{N,K,n}(k)$. The two functions are related in the following way:

$$\mathrm{NHG}_{N,K,r}(k)=1-\mathrm{HG}_{N,N-K,k+r}(r-1).$$
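Reading both $\mathrm{NHG}_{N,K,r}$ and $\mathrm{HG}_{N,N-K,k+r}$ as cumulative distribution functions (an assumption about the notation, but the reading under which the identity holds), the relation can be verified numerically with the hypothetical parameters from the earlier examples:

```python
from math import comb, isclose

def nhg_pmf(k, N, K, r):
    return comb(k + r - 1, k) * comb(N - r - k, K - k) / comb(N, K)

def hg_pmf(w, N, S, n):
    """Hypergeometric pmf: w marked items among n draws from N items, S of them marked."""
    return comb(S, w) * comb(N - S, n - w) / comb(N, n)

# Hypothetical example parameters.
N, K, r = 20, 12, 3

for k in range(K + 1):
    nhg_cdf = sum(nhg_pmf(j, N, K, r) for j in range(k + 1))
    # CDF of the number of failures among the first k+r draws, evaluated at r-1.
    hg_cdf = sum(hg_pmf(w, N, N - K, k + r) for w in range(r))
    assert isclose(nhg_cdf, 1 - hg_cdf), k
print("NHG_{N,K,r}(k) = 1 - HG_{N,N-K,k+r}(r-1) for all k")
```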
The negative hypergeometric distribution (like the hypergeometric distribution) deals with draws without replacement, so that the probability of success is different in each draw. In contrast, the negative binomial distribution (like the binomial distribution) deals with draws with replacement, so that the probability of success is the same in each draw and the trials are independent. The following table summarizes the four distributions related to drawing items:
| | With replacements | No replacements |
|---|---|---|
| Number of successes in a constant number of draws | binomial distribution | hypergeometric distribution |
| Number of successes in a constant number of failures | negative binomial distribution | negative hypergeometric distribution |
Some authors[3][4] define the negative hypergeometric distribution to be the number of draws required to obtain the $r$-th failure. If we let $Y$ denote this random variable, then $Y=X+r$, where $X$ is defined as above, and

$$\Pr(Y=y)=\frac{\binom{y-1}{r-1}\binom{N-y}{N-K-r}}{\binom{N}{N-K}}.$$
If we let the number of failures $N-K$ be denoted by $M$, this becomes

$$\Pr(Y=y)=\frac{\binom{y-1}{r-1}\binom{N-y}{M-r}}{\binom{N}{M}}.$$
The support of $Y$ is $\{r,r+1,\ldots,N-M+r\}$. Its expectation is

$$E[Y]=E[X]+r=\frac{r(N+1)}{M+1},$$

and, since $Y$ differs from $X$ only by the constant $r$,

$$\mathrm{Var}[X]=\mathrm{Var}[Y].$$
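The alternative formulation can be checked the same way. The sketch below (hypothetical parameters, with $M=N-K$) confirms that the probability mass function of $Y$ sums to one over its support and that $E[Y]=r(N+1)/(M+1)$.

```python
from math import comb, isclose

# Hypothetical example parameters; M is the number of failures in the population.
N, K, r = 20, 12, 3
M = N - K

def y_pmf(y, N, M, r):
    """Pmf of the draw on which the r-th failure occurs."""
    return comb(y - 1, r - 1) * comb(N - y, M - r) / comb(N, M)

support = range(r, N - M + r + 1)
total = sum(y_pmf(y, N, M, r) for y in support)
mean = sum(y * y_pmf(y, N, M, r) for y in support)

assert isclose(total, 1.0)
assert isclose(mean, r * (N + 1) / (M + 1))
print(f"sum of pmf = {total:.6f}, E[Y] = {mean:.6f}")
```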