In probability theory and statistics, the Poisson binomial distribution is the discrete probability distribution of a sum of independent Bernoulli trials that are not necessarily identically distributed. The concept is named after Siméon Denis Poisson.
p1,p2,...,pn
p1=p2= … =pn
The probability of having k successful trials out of a total of n can be written as the sum[1]
\Pr(K=k)=
\sum\limits | |
A\inFk |
\prod\limitsi\inpi
\prod\limits | |
j\inAc |
(1-pj)
where
Fk
\{1,2,3,...,n\}
F2=\left\{\{1,2\},\{1,3\},\{2,3\}\right\}
Ac
A
Ac=\{1,2,3,...,n\}\setminusA
Fk
n!/((n-k)!k!)
F15
\Pr(K=k)
As long as none of the success probabilities are equal to one, one can calculate the probability of k successes using the recursive formula [2] [3]
\Pr(K=k)=
n | |
\begin{cases} \prod\limits | |
i=1 |
(1-pi)&k=0\
1 | |
k |
k | |
\sum\limits | |
i=1 |
(-1)i-1\Pr(K=k-i)T(i)&k>0\ \end{cases}
where
n | |
T(i)=\sum\limits | |
j=1 |
\left(
pj | |
1-pj |
\right)i.
The recursive formula is not numerically stable, and should be avoided if
n
An alternative is to use a divide-and-conquer algorithm: if we assume
n=2b
f(pi:j)
pi,...,pj
*
f(p | |
1:2b |
)=
f(p | |
1:2b-1 |
)*f(p | |
2b-1+1:2b |
)
More generally, the probability mass function of a Poisson binomial can be expressed as the convolution of the vectors
P0,...,Pn
Pi=[1-pi,pi]
\Pr(K=0)
\Pr(K=n)
p1,...,pn
n
pi
pi
pi
pi
\Pr(K=k)
n\leq2000
n
pi
Another possibility is using the discrete Fourier transform.[4]
\Pr(K=k)=
1 | |
n+1 |
n | |
\sum\limits | |
l=0 |
C-lk
n | |
\prod\limits | |
m=1 |
\left(1+(Cl-1)pm\right)
where
C=\exp\left(
2i\pi | |
n+1 |
\right)
i=\sqrt{-1}
Still other methods are described in "Statistical Applications of the Poisson-Binomial and conditional Bernoulli distributions" by Chen and Liu[5] and in "A simple and fast method for computing the Poisson binomial distribution function" by Biscarri et al.[6]
The cumulative distribution function (CDF) can be expressed as:
\Pr(K\leqk)=
k | |
\sum | |
l=0 |
\sum\limits | |
A\inFl |
\prod\limitsi\inpi
\prod\limits | |
j\inAc |
(1-pj)
where
Fl
l
\{1,2,3,...,n\}
It can be computed by invoking the DC function above, and then adding elements
0
k
Since a Poisson binomial distributed variable is a sum of n independent Bernoulli distributed variables, its mean and variance will simply be sums of the mean and variance of the n Bernoulli distributions:
\mu=
n | |
\sum\limits | |
i=1 |
pi
\sigma2
n | |
=\sum\limits | |
i=1 |
(1-pi)pi
There is no simple formula for the entropy of a Poisson binomial distribution, but the entropy is bounded above by the entropy of a binomial distribution with the same number parameter and the same mean. Therefore, the entropy is also bounded above by the entropy of a Poisson distribution with the same mean.[7]
The Shepp–Olkin concavity conjecture, due to Lawrence Shepp and Ingram Olkin in 1981, states that the entropy of a Poisson binomial distribution is a concave function of the success probabilities
p1,p2,...,pn
pi
pi\leq1/2
The probability that a Poisson binomial distribution gets large, can be bounded using its moment generating function as follows (valid when
s\geq\mu
t>0
\begin{align} \Pr[S\ges] &\le\exp(-st)\operatornameE\left[\exp\left[t\sumiXi\right]\right] \\&=\exp(-st)\prodi
t | |
(1-p | |
i+e |
pi) \\&=\exp\left(-st+\sumilog\left(pi(et-1)+1\right)\right) \\&\le\exp\left(-st+\sumilog\left(
t-1))\right)\right) \\&= | |
\exp(p | |
i(e |
\exp\left(-st+\sumi
t-1)\right) \\&= | ||
p | \exp\left(s-\mu-slog | |
i(e |
s | |
\mu |
\right), \end{align}
where we took . This is similar to the tail bounds of a binomial distribution.
A Poisson binomial distribution
PB
B
\mu
pi
B
PB
B
n | |
Var(PB)=Var(B)-style\sum | |
i=1 |
2 | |
\displaystyle(p | |
i-\mu) |
As can be seen, the closer the
pi
\mu
pi
PB
pi
\mu
PB
B
Var(PB)=Var(B)
Ehm has determined bounds for the total variation distance of
PB
B
PB
B
\nu=1-\mu
d(PB,B)
PB
B
d(PB,B)\le(1-\mun+1-\nun+1)
| ||||||||||||||||
((n+1)\mu\nu) |
d(PB,B)\geCmin(1,
1 | |
n\mu\nu |
n | |
)style\sum | |
i=1 |
2 | |
\displaystyle(p | |
i-\mu) |
where
C\ge | 1 |
124 |
d(PB,B)
Var(PB)/Var(B)
A Poisson binomial distribution
PB
Po
n | |
λ=\sum | |
i=1 |
\displaystylepi
1 | min( | |
32 |
1 | |
λ |
n | |
,1)style\sum | |
i=1 |
\displaystyle
2\le | |
p | |
i |
d(PB,Po)\le
1-\epsilon-λ | |
λ |
n | |
\sum | |
i=1 |
\displaystyle
2 | |
p | |
i |
where
d(PB,B)
PB
Po
pi
Po
PB
As
n | |
Var(Po)=λ=\sum | |
i=1 |
\displaystylepi
Var(PB)
n | |
=\sum\limits | |
i=1 |
pi-\sum\limits
n | |
i=1 |
2 | |
p | |
i |
Var(Po)>Var(PB)
n | |
λ=\sum | |
i=1 |
\displaystylepi
pi
Var(Po)
Var(PB)
The reference [13] discusses techniques of evaluating the probability mass function of the Poisson binomial distribution. The following software implementations are based on it: