Bernoulli distribution explained
In probability theory and statistics, the Bernoulli distribution, named after Swiss mathematician Jacob Bernoulli,[1] is the discrete probability distribution of a random variable which takes the value 1 with probability p and the value 0 with probability q = 1 - p. Less formally, it can be thought of as a model for the set of possible outcomes of any single experiment that asks a yes–no question. Such questions lead to outcomes that are Boolean-valued: a single bit whose value is success/yes/true/one with probability p and failure/no/false/zero with probability q. It can be used to represent a (possibly biased) coin toss where 1 and 0 would represent "heads" and "tails", respectively, and p would be the probability of the coin landing on heads (or vice versa where 1 would represent tails and p would be the probability of tails). In particular, unfair coins would have p \ne 1/2.
The Bernoulli distribution is a special case of the binomial distribution where a single trial is conducted (so n would be 1 for such a binomial distribution). It is also a special case of the two-point distribution, for which the possible outcomes need not be 0 and 1.[2]
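The biased coin toss described above can be sketched in a few lines of Python (an illustrative simulation; the bias p = 0.3 is an arbitrary choice):

```python
import random

def bernoulli_trial(p):
    """Return 1 with probability p and 0 with probability q = 1 - p."""
    return 1 if random.random() < p else 0

# Simulate 10,000 tosses of a coin biased towards tails (p = 0.3).
p = 0.3
tosses = [bernoulli_trial(p) for _ in range(10_000)]
print(sum(tosses) / len(tosses))  # empirical frequency of heads, close to 0.3
```

With many trials the empirical frequency of 1s converges to p, by the law of large numbers.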
Properties
If X is a random variable with a Bernoulli distribution, then:

\Pr(X=1) = p = 1 - \Pr(X=0) = 1 - q.
The probability mass function f of this distribution, over possible outcomes k, is

f(k;p) = \begin{cases} p & \text{if } k = 1, \\ q = 1 - p & \text{if } k = 0. \end{cases}

[3] This can also be expressed as

f(k;p) = p^k (1-p)^{1-k} \quad \text{for } k \in \{0,1\}

or as

f(k;p) = pk + (1-p)(1-k) \quad \text{for } k \in \{0,1\}.
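All three expressions for the probability mass function agree on k ∈ {0, 1}, as a short check confirms (illustrative code, not from the source; p = 0.7 is an arbitrary choice):

```python
def pmf_case(k, p):
    """Piecewise definition: p if k == 1, else 1 - p."""
    return p if k == 1 else 1 - p

def pmf_power(k, p):
    """Exponential form: p**k * (1 - p)**(1 - k) for k in {0, 1}."""
    return p ** k * (1 - p) ** (1 - k)

def pmf_linear(k, p):
    """Linear form: p*k + (1 - p)*(1 - k) for k in {0, 1}."""
    return p * k + (1 - p) * (1 - k)

p = 0.7
for k in (0, 1):
    assert pmf_case(k, p) == pmf_power(k, p) == pmf_linear(k, p)
```

The exponential form is the one used when writing the Bernoulli distribution as a member of the exponential family.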
The Bernoulli distribution is a special case of the binomial distribution with n = 1.[4] The kurtosis goes to infinity for high and low values of p, but for p = 1/2 the two-point distributions including the Bernoulli distribution have a lower excess kurtosis, namely −2, than any other probability distribution.

The Bernoulli distributions for 0 \le p \le 1 form an exponential family.
The maximum likelihood estimator of p based on a random sample is the sample mean.
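A quick numerical sketch of this estimator (the sample below is hypothetical):

```python
# Ten hypothetical Bernoulli trials: 7 successes, 3 failures.
sample = [1, 0, 1, 1, 0, 1, 0, 1, 1, 1]

# The maximum likelihood estimate of p is the sample mean.
p_hat = sum(sample) / len(sample)
print(p_hat)  # 0.7
```

Intuitively, the fraction of observed successes is the value of p that makes the observed sequence most likely.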
Mean
The expected value of a Bernoulli random variable X is

\operatorname{E}[X] = p.

This is due to the fact that for a Bernoulli distributed random variable X with \Pr(X=1) = p and \Pr(X=0) = q we find

\operatorname{E}[X] = \Pr(X=1) \cdot 1 + \Pr(X=0) \cdot 0 = p \cdot 1 + q \cdot 0 = p.
Variance
The variance of a Bernoulli distributed X is

\operatorname{Var}[X] = pq = p(1-p).

We first find

\operatorname{E}[X^2] = \Pr(X=1) \cdot 1^2 + \Pr(X=0) \cdot 0^2 = p \cdot 1^2 + q \cdot 0^2 = p = \operatorname{E}[X].

From this follows

\operatorname{Var}[X] = \operatorname{E}[X^2] - \operatorname{E}[X]^2 = \operatorname{E}[X] - \operatorname{E}[X]^2 = p - p^2 = p(1-p) = pq.

With this result it is easy to prove that, for any Bernoulli distribution, its variance will have a value inside [0, 1/4].
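The identity Var[X] = p(1 − p) and the bound on the variance are easy to check numerically (an illustrative sketch scanning p over a grid):

```python
def bernoulli_variance(p):
    """Var[X] = E[X^2] - E[X]^2 = p - p^2 for a Bernoulli(p) variable."""
    e_x = p   # E[X] = p
    e_x2 = p  # E[X^2] = p, since 1^2 = 1 and 0^2 = 0
    return e_x2 - e_x ** 2

# Scan p from 0 to 1: the variance stays in [0, 1/4],
# with the maximum attained at p = 1/2.
variances = [bernoulli_variance(p / 100) for p in range(101)]
assert all(0 <= v <= 0.25 for v in variances)
assert max(variances) == bernoulli_variance(0.5) == 0.25
```

The maximum at p = 1/2 matches the intuition that a fair coin is the "most unpredictable" Bernoulli variable.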
Skewness
The skewness is

\gamma_1 = \frac{q-p}{\sqrt{pq}} = \frac{1-2p}{\sqrt{pq}}.

When we take the standardized Bernoulli distributed random variable

\frac{X - \operatorname{E}[X]}{\sqrt{\operatorname{Var}[X]}}

we find that this random variable attains \frac{q}{\sqrt{pq}} with probability p and attains -\frac{p}{\sqrt{pq}} with probability q. Thus we get

\begin{align}
\gamma_1 &= \operatorname{E}\left[\left(\frac{X - \operatorname{E}[X]}{\sqrt{\operatorname{Var}[X]}}\right)^3\right] \\
&= p \cdot \left(\frac{q}{\sqrt{pq}}\right)^3 + q \cdot \left(-\frac{p}{\sqrt{pq}}\right)^3 \\
&= \frac{1}{\sqrt{pq}^3}\left(pq^3 - qp^3\right) \\
&= \frac{pq}{\sqrt{pq}^3}(q-p) \\
&= \frac{q-p}{\sqrt{pq}}.
\end{align}
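The closed form can be checked against the defining third moment of the standardized variable (illustrative code; the test values of p are arbitrary):

```python
from math import sqrt

def skewness_formula(p):
    """Closed form: (q - p) / sqrt(p*q)."""
    q = 1 - p
    return (q - p) / sqrt(p * q)

def skewness_from_moments(p):
    """Third moment of the standardized variable, computed directly."""
    q = 1 - p
    sd = sqrt(p * q)
    # Standardized outcomes: (1 - p)/sd with prob. p, (0 - p)/sd with prob. q.
    return p * (q / sd) ** 3 + q * (-p / sd) ** 3

for p in (0.1, 0.3, 0.5, 0.9):
    assert abs(skewness_formula(p) - skewness_from_moments(p)) < 1e-12
```

At p = 1/2 the skewness is zero, as expected from the symmetry of a fair coin.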
Higher moments and cumulants
The raw moments are all equal due to the fact that 1^k = 1 and 0^k = 0:

\operatorname{E}[X^k] = \Pr(X=1) \cdot 1^k + \Pr(X=0) \cdot 0^k = p \cdot 1 + q \cdot 0 = p = \operatorname{E}[X].
The central moment of order k is given by

\mu_k = q(-p)^k + pq^k.

The first six central moments are

\begin{align}
\mu_1 &= 0, \\
\mu_2 &= p(1-p), \\
\mu_3 &= p(1-p)(1-2p), \\
\mu_4 &= p(1-p)(1-3p(1-p)), \\
\mu_5 &= p(1-p)(1-2p)(1-2p(1-p)), \\
\mu_6 &= p(1-p)(1-5p(1-p)(1-p(1-p))).
\end{align}
The higher central moments can be expressed more compactly in terms of \mu_2 and \mu_3:

\begin{align}
\mu_4 &= \mu_2(1-3\mu_2), \\
\mu_5 &= \mu_3(1-2\mu_2), \\
\mu_6 &= \mu_2(1-5\mu_2(1-\mu_2)).
\end{align}
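The general central-moment expression can be checked against the listed closed forms (an illustrative check at an arbitrary p):

```python
def central_moment(k, p):
    """mu_k = E[(X - p)^k] = q*(-p)**k + p*q**k for a Bernoulli(p) variable."""
    q = 1 - p
    return q * (-p) ** k + p * q ** k

p = 0.3
q = 1 - p
mu2 = central_moment(2, p)
mu3 = central_moment(3, p)

# Compare with the listed expressions.
assert abs(mu2 - p * q) < 1e-12
assert abs(mu3 - p * q * (1 - 2 * p)) < 1e-12
assert abs(central_moment(4, p) - mu2 * (1 - 3 * mu2)) < 1e-12
assert abs(central_moment(5, p) - mu3 * (1 - 2 * mu2)) < 1e-12
assert abs(central_moment(6, p) - mu2 * (1 - 5 * mu2 * (1 - mu2))) < 1e-12
```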
The first six cumulants are

\begin{align}
\kappa_1 &= p, \\
\kappa_2 &= \mu_2, \\
\kappa_3 &= \mu_3, \\
\kappa_4 &= \mu_2(1-6\mu_2), \\
\kappa_5 &= \mu_3(1-12\mu_2), \\
\kappa_6 &= \mu_2(1-30\mu_2(1-4\mu_2)).
\end{align}
Related distributions
If X_1, \dots, X_n are independent, identically distributed (i.i.d.) random variables, all Bernoulli trials with success probability p, then their sum is distributed according to a binomial distribution with parameters n and p:

\sum_{k=1}^{n} X_k \sim \operatorname{B}(n, p)

(binomial distribution). The Bernoulli distribution is simply \operatorname{B}(1, p), also written as \operatorname{Bernoulli}(p).
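The sum-of-Bernoullis relationship above is easy to verify by simulation (a sketch using only the Python standard library; n, p, and the trial count are arbitrary choices):

```python
import random
from math import comb

random.seed(0)
n, p = 5, 0.4
trials = 100_000

# Draw many sums of n i.i.d. Bernoulli(p) variables.
sums = [sum(1 if random.random() < p else 0 for _ in range(n))
        for _ in range(trials)]

# Compare empirical frequencies with the binomial pmf B(n, p).
for k in range(n + 1):
    empirical = sums.count(k) / trials
    theoretical = comb(n, k) * p ** k * (1 - p) ** (n - k)
    print(k, round(empirical, 3), round(theoretical, 3))
```

The printed empirical frequencies track the binomial probabilities closely, and the agreement tightens as the number of trials grows.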
- The categorical distribution is the generalization of the Bernoulli distribution for variables with any constant number of discrete values.
- The Beta distribution is the conjugate prior of the Bernoulli distribution.[5]
- The geometric distribution models the number of independent and identical Bernoulli trials needed to get one success.
- If X \sim \operatorname{Bernoulli}(1/2), then 2X - 1 has a Rademacher distribution.
Further reading
- Johnson, N. L.; Kotz, S.; Kemp, A. (1993). Univariate Discrete Distributions (2nd ed.). Wiley. ISBN 0-471-54897-9.
- Peatman, John G. (1963). Introduction to Applied Statistics. New York: Harper & Row. pp. 162–171.
Notes and References
- Uspensky, James Victor (1937). Introduction to Mathematical Probability. New York: McGraw-Hill. p. 45.
- Dekking, Frederik; Kraaikamp, Cornelis; Lopuhaä, Hendrik; Meester, Ludolf (2010). A Modern Introduction to Probability and Statistics. Springer London. pp. 43–48. ISBN 9781849969529.
- Bertsekas, Dimitri P.; Tsitsiklis, John N. (2002). Introduction to Probability. Belmont, Mass.: Athena Scientific. ISBN 188652940X.
- McCullagh, Peter; Nelder, John (1989). Generalized Linear Models (2nd ed.). Boca Raton: Chapman and Hall/CRC. Section 4.2.2. ISBN 0-412-31760-5.
- Orloff, Jeremy; Bloom, Jonathan. "Conjugate priors: Beta and normal". math.mit.edu. Retrieved October 20, 2023.