In probability theory and statistics, the beta-binomial distribution is a family of discrete probability distributions on a finite support of non-negative integers, arising when the probability of success in each of a fixed or known number of Bernoulli trials is either unknown or random. The beta-binomial distribution is the binomial distribution in which the probability of success at each of n trials is not fixed but randomly drawn from a beta distribution. It is frequently used in Bayesian statistics, empirical Bayes methods, and classical statistics to capture overdispersion in binomial-type data.
The beta-binomial is a one-dimensional version of the Dirichlet-multinomial distribution, just as the binomial and beta distributions are univariate versions of the multinomial and Dirichlet distributions, respectively. The special case where α and β are integers is also known as the negative hypergeometric distribution.
The Beta distribution is a conjugate distribution of the binomial distribution. This fact leads to an analytically tractable compound distribution: one can think of the success parameter p in the binomial as being drawn from a beta distribution rather than fixed, so that the probability of x successes in n trials is obtained by marginalizing over p:

\begin{align} f(x \mid n,\alpha,\beta) &= \int_0^1 \operatorname{Bin}(x \mid n,p)\,\operatorname{Beta}(p \mid \alpha,\beta)\,dp \\[6pt] &= {n \choose x} \frac{1}{B(\alpha,\beta)} \int_0^1 p^{x+\alpha-1} (1-p)^{n-x+\beta-1}\,dp \\[6pt] &= {n \choose x} \frac{B(x+\alpha,\,n-x+\beta)}{B(\alpha,\beta)}. \end{align}

Writing the beta functions in terms of gamma functions gives the equivalent form

f(x \mid n,\alpha,\beta) = \frac{\Gamma(n+1)\,\Gamma(x+\alpha)\,\Gamma(n-x+\beta)}{\Gamma(n+\alpha+\beta)\,\Gamma(x+1)\,\Gamma(n-x+1)} \cdot \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\,\Gamma(\beta)}.
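As a quick numerical check of the compound pmf, the following sketch (Python standard library only; the function names are our own) evaluates both the beta-function and gamma-function forms, verifies that they agree term by term, and that the pmf sums to one.

```python
from math import lgamma, exp, comb

def lbeta(a, b):
    # log of the beta function B(a, b) = Gamma(a) Gamma(b) / Gamma(a + b)
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def betabinom_pmf(x, n, a, b):
    # f(x | n, a, b) = C(n, x) * B(x + a, n - x + b) / B(a, b)
    return comb(n, x) * exp(lbeta(x + a, n - x + b) - lbeta(a, b))

def betabinom_pmf_gamma(x, n, a, b):
    # Equivalent form written purely with gamma functions
    log_pmf = (lgamma(n + 1) + lgamma(x + a) + lgamma(n - x + b)
               - lgamma(n + a + b) - lgamma(x + 1) - lgamma(n - x + 1)
               + lgamma(a + b) - lgamma(a) - lgamma(b))
    return exp(log_pmf)

n, a, b = 10, 2.0, 3.0
total = sum(betabinom_pmf(x, n, a, b) for x in range(n + 1))
assert abs(total - 1.0) < 1e-12
# The two algebraic forms agree at every support point
for x in range(n + 1):
    assert abs(betabinom_pmf(x, n, a, b) - betabinom_pmf_gamma(x, n, a, b)) < 1e-12
```

Working on the log scale with `lgamma` avoids overflow for large n, α, or β.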
The beta-binomial distribution can also be motivated via an urn model for positive integer values of α and β, known as the Pólya urn model. Specifically, imagine an urn containing α red balls and β black balls, where random draws are made. If a red ball is observed, then two red balls are returned to the urn. Likewise, if a black ball is drawn, then two black balls are returned to the urn. If this is repeated n times, then the probability of observing x red balls follows a beta-binomial distribution with parameters n, α and β.
By contrast, if the random draws are with simple replacement (no balls over and above the observed ball are added to the urn), then the distribution follows a binomial distribution and if the random draws are made without replacement, the distribution follows a hypergeometric distribution.
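The urn scheme is easy to simulate. The sketch below (illustrative names, standard library only) draws from the Pólya urn n times and checks that the empirical mean of the red-ball count is close to the beta-binomial mean nα/(α+β).

```python
import random

def polya_urn_draw(n, alpha, beta, rng):
    # Start with alpha red and beta black balls; each drawn ball is
    # returned to the urn together with one extra ball of its color.
    red, black = alpha, beta
    x = 0  # number of red balls observed
    for _ in range(n):
        if rng.random() < red / (red + black):
            x += 1
            red += 1
        else:
            black += 1
    return x

rng = random.Random(42)
n, alpha, beta = 5, 2, 3
draws = [polya_urn_draw(n, alpha, beta, rng) for _ in range(20000)]
mean = sum(draws) / len(draws)
# Beta-binomial mean is n * alpha / (alpha + beta) = 2.0 here
assert abs(mean - 2.0) < 0.1
```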
The first three raw moments are

\begin{align}
\mu_1 &= \frac{n\alpha}{\alpha+\beta} \\[8pt]
\mu_2 &= \frac{n\alpha[n(1+\alpha)+\beta]}{(\alpha+\beta)(1+\alpha+\beta)} \\[8pt]
\mu_3 &= \frac{n\alpha[n^2(1+\alpha)(2+\alpha) + 3n(1+\alpha)\beta + \beta(\beta-\alpha)]}{(\alpha+\beta)(1+\alpha+\beta)(2+\alpha+\beta)}
\end{align}
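These closed forms can be checked directly against the pmf; the following sketch (helper names are our own) compares μ₁, μ₂, μ₃ with raw moments summed over the distribution.

```python
from math import lgamma, exp, comb

def betabinom_pmf(x, n, a, b):
    lbeta = lambda p, q: lgamma(p) + lgamma(q) - lgamma(p + q)
    return comb(n, x) * exp(lbeta(x + a, n - x + b) - lbeta(a, b))

n, a, b = 8, 2.5, 4.0
pmf = [betabinom_pmf(x, n, a, b) for x in range(n + 1)]

# Closed-form raw moments from the text
mu1 = n * a / (a + b)
mu2 = n * a * (n * (1 + a) + b) / ((a + b) * (1 + a + b))
mu3 = (n * a * (n**2 * (1 + a) * (2 + a) + 3 * n * (1 + a) * b + b * (b - a))
       / ((a + b) * (1 + a + b) * (2 + a + b)))

for r, closed_form in [(1, mu1), (2, mu2), (3, mu3)]:
    raw = sum(x**r * w for x, w in enumerate(pmf))
    assert abs(raw - closed_form) < 1e-9
```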
and the kurtosis is
\beta_2 = \frac{(\alpha+\beta)^2 (1+\alpha+\beta)}{n\alpha\beta(\alpha+\beta+2)(\alpha+\beta+3)(\alpha+\beta+n)} \left[ (\alpha+\beta)(\alpha+\beta-1+6n) + 3\alpha\beta(n-2) + 6n^2 - \frac{3\alpha\beta n(6-n)}{\alpha+\beta} - \frac{18\alpha\beta n^2}{(\alpha+\beta)^2} \right].
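The kurtosis expression can likewise be verified numerically; the sketch below (helper names are illustrative) compares the formula with the standardized fourth central moment μ₄/σ⁴ computed from the pmf.

```python
from math import lgamma, exp, comb

def betabinom_pmf(x, n, a, b):
    lbeta = lambda p, q: lgamma(p) + lgamma(q) - lgamma(p + q)
    return comb(n, x) * exp(lbeta(x + a, n - x + b) - lbeta(a, b))

n, a, b = 10, 3.0, 2.0
pmf = [betabinom_pmf(x, n, a, b) for x in range(n + 1)]
mean = sum(x * w for x, w in enumerate(pmf))
var = sum((x - mean) ** 2 * w for x, w in enumerate(pmf))
mu4 = sum((x - mean) ** 4 * w for x, w in enumerate(pmf))
kurtosis = mu4 / var**2   # standardized fourth central moment

s = a + b
beta2 = ((s**2 * (1 + s)) / (n * a * b * (s + 2) * (s + 3) * (s + n))
         * (s * (s - 1 + 6 * n) + 3 * a * b * (n - 2) + 6 * n**2
            - 3 * a * b * n * (6 - n) / s
            - 18 * a * b * n**2 / s**2))
assert abs(kurtosis - beta2) < 1e-9
```

As a sanity check of the formula itself: for n = 1, α = β = 1 it evaluates to 1, the kurtosis of a Bernoulli(1/2) variable.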
Letting

p = \frac{\alpha}{\alpha+\beta},

the mean can be written as

\mu = \frac{n\alpha}{\alpha+\beta} = np

and the variance as

\sigma^2 = \frac{n\alpha\beta(\alpha+\beta+n)}{(\alpha+\beta)^2(\alpha+\beta+1)} = np(1-p)\,\frac{\alpha+\beta+n}{\alpha+\beta+1} = np(1-p)[1+(n-1)\rho],

where \rho = \tfrac{1}{\alpha+\beta+1} is the pairwise correlation between the n Bernoulli draws, often called the over-dispersion parameter. Since ρ > 0, the variance exceeds the binomial variance np(1−p) whenever n > 1; for n = 1 the distribution reduces to a Bernoulli and no over-dispersion is possible.
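The over-dispersion identity is easy to confirm against the pmf; a minimal sketch (helper names are our own):

```python
from math import lgamma, exp, comb

def betabinom_pmf(x, n, a, b):
    lbeta = lambda p, q: lgamma(p) + lgamma(q) - lgamma(p + q)
    return comb(n, x) * exp(lbeta(x + a, n - x + b) - lbeta(a, b))

n, a, b = 12, 5.0, 7.0
pmf = [betabinom_pmf(x, n, a, b) for x in range(n + 1)]
mean = sum(x * w for x, w in enumerate(pmf))
var = sum((x - mean) ** 2 * w for x, w in enumerate(pmf))

p = a / (a + b)          # success probability
rho = 1 / (a + b + 1)    # over-dispersion parameter
assert abs(mean - n * p) < 1e-10
assert abs(var - n * p * (1 - p) * (1 + (n - 1) * rho)) < 1e-10
assert var > n * p * (1 - p)   # over-dispersed relative to the binomial
```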
The r-th factorial moment of a beta-binomial random variable X is

\operatorname{E}\bigl[(X)_r\bigr] = \frac{n!}{(n-r)!} \cdot \frac{B(\alpha+r,\beta)}{B(\alpha,\beta)} = (n)_r \frac{B(\alpha+r,\beta)}{B(\alpha,\beta)},

where (n)_r = n!/(n-r)! denotes the falling factorial.
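A numerical check of the factorial-moment formula (helper names are our own):

```python
from math import lgamma, exp, comb

def lbeta(p, q):
    # log of the beta function
    return lgamma(p) + lgamma(q) - lgamma(p + q)

def betabinom_pmf(x, n, a, b):
    return comb(n, x) * exp(lbeta(x + a, n - x + b) - lbeta(a, b))

def falling(x, r):
    # falling factorial (x)_r = x (x - 1) ... (x - r + 1)
    out = 1
    for k in range(r):
        out *= x - k
    return out

n, a, b = 9, 2.0, 5.0
pmf = [betabinom_pmf(x, n, a, b) for x in range(n + 1)]
for r in (1, 2, 3):
    empirical = sum(falling(x, r) * w for x, w in enumerate(pmf))
    closed = falling(n, r) * exp(lbeta(a + r, b) - lbeta(a, b))
    assert abs(empirical - closed) < 1e-10
# r = 1 recovers the mean n * alpha / (alpha + beta)
```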
The method-of-moments estimates can be obtained by noting the first and second moments of the beta-binomial, setting them equal to the sample moments m_1 and m_2, and solving for α and β:

\begin{align}
\widehat{\alpha} &= \frac{n m_1 - m_2}{n\left(\frac{m_2}{m_1} - m_1 - 1\right) + m_1} \\[5pt]
\widehat{\beta} &= \frac{(n - m_1)\left(n - \frac{m_2}{m_1}\right)}{n\left(\frac{m_2}{m_1} - m_1 - 1\right) + m_1}.
\end{align}
Closed-form maximum likelihood estimates are impractical, but since the pdf consists of common functions (gamma and/or beta functions), the estimates can easily be found via direct numerical optimization. Maximum likelihood estimates from empirical data can be computed using general methods for fitting multinomial Pólya distributions, as described in (Minka 2003). The R package VGAM, through the function vglm, facilitates the fitting of GLM-type models with responses distributed according to the beta-binomial distribution by maximum likelihood. There is no requirement that n be fixed throughout the observations.
The following data gives the number of male children among the first 12 children of family size 13 in 6115 families taken from hospital records in 19th century Saxony (Sokal and Rohlf, p. 59 from Lindsey). The 13th child is ignored to blunt the effect of families non-randomly stopping when a desired gender is reached.
| Males | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Families | 3 | 24 | 104 | 286 | 670 | 1033 | 1343 | 1112 | 829 | 478 | 181 | 45 | 7 |
The first two sample moments are

\begin{align} m_1 &= 6.23 \\ m_2 &= 42.31 \\ n &= 12 \end{align}

and therefore the method-of-moments estimates are

\begin{align} \widehat{\alpha} &= 34.1350 \\ \widehat{\beta} &= 31.6085. \end{align}
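These estimates are easy to reproduce from the raw table; the sketch below (variable names are our own) recomputes m₁, m₂, α̂, and β̂ from the family counts.

```python
# Observed number of families with x = 0..12 male children (Saxony data)
families = [3, 24, 104, 286, 670, 1033, 1343, 1112, 829, 478, 181, 45, 7]
n = 12
N = sum(families)                                       # 6115 families
m1 = sum(x * f for x, f in enumerate(families)) / N     # first sample moment
m2 = sum(x**2 * f for x, f in enumerate(families)) / N  # second sample moment

# Method-of-moments estimators from the formulas above
denom = n * (m2 / m1 - m1 - 1) + m1
alpha_hat = (n * m1 - m2) / denom
beta_hat = (n - m1) * (n - m2 / m1) / denom

assert abs(m1 - 6.23) < 0.01 and abs(m2 - 42.31) < 0.01
assert abs(alpha_hat - 34.1350) < 0.01
assert abs(beta_hat - 31.6085) < 0.01
```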
The maximum likelihood estimates can be found numerically:

\begin{align} \widehat{\alpha}_\mathrm{mle} &= 34.09558 \\ \widehat{\beta}_\mathrm{mle} &= 31.5715, \end{align}

and the maximized log-likelihood is

\log \mathcal{L} = -12492.9,

from which we find the AIC:

\mathrm{AIC} = 24989.74.
The AIC for the competing binomial model is AIC = 25070.34, so the beta-binomial model provides a superior fit to the data; i.e., there is evidence of overdispersion. Trivers and Willard postulate a theoretical justification for heterogeneity in gender-proneness among mammalian offspring.
The superior fit is especially evident in the tails:

| Males | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Observed Families | 3 | 24 | 104 | 286 | 670 | 1033 | 1343 | 1112 | 829 | 478 | 181 | 45 | 7 |
| Fitted Expected (Beta-Binomial) | 2.3 | 22.6 | 104.8 | 310.9 | 655.7 | 1036.2 | 1257.9 | 1182.1 | 853.6 | 461.9 | 177.9 | 43.8 | 5.2 |
| Fitted Expected (Binomial, p = 0.519215) | 0.9 | 12.1 | 71.8 | 258.5 | 628.1 | 1085.2 | 1367.3 | 1265.6 | 854.2 | 410.0 | 132.8 | 26.1 | 2.3 |
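The fitted columns can be regenerated from the estimates quoted above; the sketch below (pmf helpers are our own) computes expected counts under both models and confirms that the beta-binomial tracks the left tail more closely.

```python
from math import lgamma, exp, comb

def betabinom_pmf(x, n, a, b):
    lbeta = lambda p, q: lgamma(p) + lgamma(q) - lgamma(p + q)
    return comb(n, x) * exp(lbeta(x + a, n - x + b) - lbeta(a, b))

def binom_pmf(x, n, p):
    return comb(n, x) * p**x * (1 - p) ** (n - x)

observed = [3, 24, 104, 286, 670, 1033, 1343, 1112, 829, 478, 181, 45, 7]
n, N = 12, sum(observed)
a_mle, b_mle = 34.09558, 31.5715   # maximum likelihood values from the text
p_hat = 0.519215                   # binomial estimate, m1 / n

expected_bb = [N * betabinom_pmf(x, n, a_mle, b_mle) for x in range(n + 1)]
expected_bin = [N * binom_pmf(x, n, p_hat) for x in range(n + 1)]

# Both sets of expected counts total N = 6115
assert abs(sum(expected_bb) - N) < 1e-6
# In the left tail, the beta-binomial expectation at x = 0 (about 2.3
# families) is closer to the observed 3 than the binomial one (about 0.9)
assert abs(expected_bb[0] - observed[0]) < abs(expected_bin[0] - observed[0])
```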
The beta-binomial distribution plays a prominent role in the Bayesian estimation of a Bernoulli success probability p. Suppose we observe a sample X = \{X_1, X_2, \ldots, X_{n_1}\} of n_1 trials with X_i \sim \operatorname{Bernoulli}(p), where p itself is given a beta prior, p \sim \operatorname{Beta}(\alpha,\beta). Then the number of successes Y_1 = \sum_{i=1}^{n_1} X_i has, marginally, a beta-binomial distribution: Y_1 \sim \operatorname{BetaBin}(n_1,\alpha,\beta).

After observing Y_1 = y_1, the posterior distribution of p is

\begin{align} f(p \mid X,\alpha,\beta) &\propto \left( \prod_{i=1}^{n_1} p^{x_i} (1-p)^{1-x_i} \right) p^{\alpha-1} (1-p)^{\beta-1} \\ &= C\,p^{\sum x_i + \alpha - 1} (1-p)^{n_1 - \sum x_i + \beta - 1} \\ &= C\,p^{y_1 + \alpha - 1} (1-p)^{n_1 - y_1 + \beta - 1}, \end{align}

where C is a normalizing constant; that is, the posterior is a \operatorname{Beta}(y_1+\alpha,\, n_1-y_1+\beta) distribution. Thus, again through compounding, we find that the posterior predictive distribution of the sum Y_2 of a future sample of n_2 \operatorname{Bernoulli}(p) trials is

Y_2 \sim \operatorname{BetaBin}(n_2,\, y_1+\alpha,\, n_1-y_1+\beta).
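The update amounts to adding the observed successes to α and the failures to β; a minimal sketch with arbitrary illustrative numbers (helper names are our own):

```python
from math import lgamma, exp, comb

def betabinom_pmf(x, n, a, b):
    lbeta = lambda p, q: lgamma(p) + lgamma(q) - lgamma(p + q)
    return comb(n, x) * exp(lbeta(x + a, n - x + b) - lbeta(a, b))

# Prior Beta(2, 2); observe y1 = 7 successes in n1 = 10 trials
alpha, beta = 2.0, 2.0
n1, y1 = 10, 7
post_a, post_b = alpha + y1, beta + n1 - y1   # posterior is Beta(9, 5)

# Posterior predictive for the successes y2 in n2 = 5 future trials
n2 = 5
predictive = [betabinom_pmf(y2, n2, post_a, post_b) for y2 in range(n2 + 1)]
assert abs(sum(predictive) - 1.0) < 1e-12
# The predictive mean is n2 * post_a / (post_a + post_b)
pred_mean = sum(y * w for y, w in enumerate(predictive))
assert abs(pred_mean - n2 * post_a / (post_a + post_b)) < 1e-10
```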
To draw a beta-binomial random variate X \sim \operatorname{BetaBin}(n,\alpha,\beta), first draw p \sim \operatorname{Beta}(\alpha,\beta) and then draw X \sim B(n,p).
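This two-step recipe is straightforward with the standard library's `random` module (a sketch; seed and parameters are arbitrary):

```python
import random

def betabinom_variate(n, alpha, beta, rng):
    # Step 1: draw p from Beta(alpha, beta)
    p = rng.betavariate(alpha, beta)
    # Step 2: draw X from Binomial(n, p) as a sum of n Bernoulli(p) trials
    return sum(rng.random() < p for _ in range(n))

rng = random.Random(0)
n, alpha, beta = 10, 2.0, 3.0
draws = [betabinom_variate(n, alpha, beta, rng) for _ in range(20000)]
mean = sum(draws) / len(draws)
# Should be close to n * alpha / (alpha + beta) = 4.0
assert abs(mean - 4.0) < 0.2
```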
The beta-binomial is related to several other distributions:

- \operatorname{BetaBin}(1,\alpha,\beta) \sim \operatorname{Bernoulli}(p) with p = \frac{\alpha}{\alpha+\beta}.
- \operatorname{BetaBin}(n,1,1) \sim U(0,n), where U(a,b) denotes the discrete uniform distribution on the integers from a to b.
- \lim_{s\to\infty} \operatorname{BetaBin}(n, ps, (1-p)s) \sim B(n,p), where p = \frac{\alpha}{\alpha+\beta} and s = \alpha+\beta; that is, holding the mean fixed while α + β grows, the beta-binomial converges to the binomial B(n,p).
- \lim_{n\to\infty} \operatorname{BetaBin}\left(n, \alpha, \frac{np}{1-p}\right) \sim NB(\alpha,p), where NB(\alpha,p) denotes the negative binomial distribution.
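The limiting relations can be checked numerically; the sketch below (helper names are our own) verifies that the maximum pmf gap between BetaBin(n, ps, (1−p)s) and B(n, p) shrinks as s grows, and that BetaBin(n, 1, 1) is discrete uniform.

```python
from math import lgamma, exp, comb

def betabinom_pmf(x, n, a, b):
    lbeta = lambda p, q: lgamma(p) + lgamma(q) - lgamma(p + q)
    return comb(n, x) * exp(lbeta(x + a, n - x + b) - lbeta(a, b))

def binom_pmf(x, n, p):
    return comb(n, x) * p**x * (1 - p) ** (n - x)

n, p = 10, 0.3
# Maximum pointwise pmf gap between BetaBin(n, ps, (1-p)s) and B(n, p)
gaps = [max(abs(betabinom_pmf(x, n, p * s, (1 - p) * s) - binom_pmf(x, n, p))
            for x in range(n + 1))
        for s in (10, 100, 10000)]
assert gaps[0] > gaps[1] > gaps[2]   # the gap shrinks as s = alpha + beta grows
assert gaps[2] < 1e-3

# BetaBin(n, 1, 1) is the discrete uniform on {0, ..., n}
assert all(abs(betabinom_pmf(x, n, 1.0, 1.0) - 1 / (n + 1)) < 1e-12
           for x in range(n + 1))
```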