Bernoulli distribution explained

In probability theory and statistics, the Bernoulli distribution, named after Swiss mathematician Jacob Bernoulli,[1] is the discrete probability distribution of a random variable which takes the value 1 with probability p and the value 0 with probability q = 1 - p. Less formally, it can be thought of as a model for the set of possible outcomes of any single experiment that asks a yes–no question. Such questions lead to outcomes that are Boolean-valued: a single bit whose value is success/yes/true/one with probability p and failure/no/false/zero with probability q. It can be used to represent a (possibly biased) coin toss where 1 and 0 would represent "heads" and "tails", respectively, and p would be the probability of the coin landing on heads (or vice versa where 1 would represent tails and p would be the probability of tails). In particular, unfair coins would have p ≠ 1/2.

The Bernoulli distribution is a special case of the binomial distribution where a single trial is conducted (so n would be 1 for such a binomial distribution). It is also a special case of the two-point distribution, for which the possible outcomes need not be 0 and 1.[2]

Properties

If X is a random variable with a Bernoulli distribution, then:

\Pr(X=1)=p=1-\Pr(X=0)=1-q.

The probability mass function f of this distribution, over possible outcomes k, is

f(k;p)=\begin{cases} p & \text{if } k=1,\\ q=1-p & \text{if } k=0. \end{cases}

[3]

This can also be expressed as

f(k;p)=p^k(1-p)^{1-k}\quad\text{for } k\in\{0,1\}

or as

f(k;p)=pk+(1-p)(1-k)\quad\text{for } k\in\{0,1\}.
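As a quick sanity check, the piecewise, product, and linear forms of the PMF agree on {0, 1}; here is a minimal Python sketch (the helper name bernoulli_pmf is just for illustration):

```python
def bernoulli_pmf(k: int, p: float) -> float:
    """Bernoulli(p) probability mass at k, using the closed form p^k (1-p)^(1-k)."""
    if k not in (0, 1):
        raise ValueError("k must be 0 or 1")
    return p**k * (1 - p)**(1 - k)

# All three expressions of the PMF coincide on {0, 1}:
p = 0.3
for k in (0, 1):
    piecewise = p if k == 1 else 1 - p
    linear = p * k + (1 - p) * (1 - k)
    assert bernoulli_pmf(k, p) == piecewise == linear
```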

The Bernoulli distribution is a special case of the binomial distribution with n = 1.[4]

The kurtosis goes to infinity for high and low values of p, but for p = 1/2 the two-point distributions including the Bernoulli distribution have a lower excess kurtosis, namely −2, than any other probability distribution.
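A quick numerical check of this boundary case, assuming SciPy is available (scipy.stats.bernoulli reports Fisher's excess kurtosis):

```python
from scipy.stats import bernoulli

# Excess kurtosis of Bernoulli(p) is (1 - 6p(1-p)) / (p(1-p)).
for p in (0.5, 0.1, 0.01):
    excess_kurtosis = float(bernoulli(p).stats(moments='k'))
    print(p, excess_kurtosis)
# p = 0.5 yields -2; as p approaches 0 or 1 the value diverges to +infinity.
```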

The Bernoulli distributions for 0 ≤ p ≤ 1 form an exponential family.

The maximum likelihood estimator of p based on a random sample is the sample mean.
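A simulation sketch of this estimator, using only the standard library (the parameter values are arbitrary):

```python
import random

random.seed(0)
p, n = 0.7, 100_000
sample = [1 if random.random() < p else 0 for _ in range(n)]

# The maximum likelihood estimate of p is simply the sample mean.
p_hat = sum(sample) / n
print(p_hat)  # close to 0.7 for large n
```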

Mean

The expected value of a Bernoulli random variable X is

\operatorname{E}[X]=p.

This is due to the fact that for a Bernoulli distributed random variable X with \Pr(X=1)=p and \Pr(X=0)=q we find

\operatorname{E}[X]=\Pr(X=1)\cdot 1+\Pr(X=0)\cdot 0=p\cdot 1+q\cdot 0=p.

Variance

The variance of a Bernoulli distributed X is

\operatorname{Var}[X]=pq=p(1-p).

We first find

\operatorname{E}[X^2]=\Pr(X=1)\cdot 1^2+\Pr(X=0)\cdot 0^2=p\cdot 1^2+q\cdot 0^2=p=\operatorname{E}[X].

From this follows

\operatorname{Var}[X]=\operatorname{E}[X^2]-\operatorname{E}[X]^2=\operatorname{E}[X]-\operatorname{E}[X]^2=p-p^2=p(1-p)=pq.

With this result it is easy to see that, for any Bernoulli distribution, the variance lies in [0, 1/4]: the function p(1 - p) vanishes at p = 0 and p = 1 and attains its maximum value 1/4 at p = 1/2.
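A minimal numeric check of this bound (pure Python; the grid resolution is arbitrary):

```python
# Evaluate the variance p(1-p) on a grid of p values in [0, 1].
grid = [i / 100 for i in range(101)]
variances = [p * (1 - p) for p in grid]

# The variance never leaves [0, 1/4] and peaks exactly at p = 1/2.
assert all(0.0 <= v <= 0.25 for v in variances)
assert max(variances) == 0.25
assert grid[variances.index(0.25)] == 0.5
```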

Skewness

The skewness is

\frac{q-p}{\sqrt{pq}}.

When we take the standardized Bernoulli distributed random variable

\frac{X-\operatorname{E}[X]}{\sqrt{\operatorname{Var}[X]}}

we find that this random variable attains

\frac{q}{\sqrt{pq}}

with probability p and attains

-\frac{p}{\sqrt{pq}}

with probability q. Thus we get

\begin{align}
\gamma_1 &= \operatorname{E}\left[\left(\frac{X-\operatorname{E}[X]}{\sqrt{\operatorname{Var}[X]}}\right)^3\right]\\
&= p\cdot\left(\frac{q}{\sqrt{pq}}\right)^3 + q\cdot\left(-\frac{p}{\sqrt{pq}}\right)^3\\
&= \frac{1}{\sqrt{pq}^3}\left(pq^3-qp^3\right)\\
&= \frac{pq}{\sqrt{pq}^3}(q-p)\\
&= \frac{q-p}{\sqrt{pq}}.
\end{align}
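The closed form (q - p)/√(pq) can be checked against SciPy's reported skewness, assuming scipy.stats.bernoulli is available:

```python
import math
from scipy.stats import bernoulli

for p in (0.2, 0.5, 0.8):
    q = 1 - p
    formula = (q - p) / math.sqrt(p * q)
    reported = float(bernoulli(p).stats(moments='s'))
    assert math.isclose(formula, reported)
# Skewness is positive for p < 1/2, zero at p = 1/2, and negative for p > 1/2.
```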

Higher moments and cumulants

The raw moments are all equal due to the fact that 1^k = 1 and 0^k = 0 for k ≥ 1:

\operatorname{E}[X^k]=\Pr(X=1)\cdot 1^k+\Pr(X=0)\cdot 0^k=p\cdot 1+q\cdot 0=p=\operatorname{E}[X].

The central moment of order k is given by

\mu_k=(1-p)(-p)^k+p(1-p)^k.

The first six central moments are

\begin{align} \mu_1&=0,\\ \mu_2&=p(1-p),\\ \mu_3&=p(1-p)(1-2p),\\ \mu_4&=p(1-p)(1-3p(1-p)),\\ \mu_5&=p(1-p)(1-2p)(1-2p(1-p)),\\ \mu_6&=p(1-p)(1-5p(1-p)(1-p(1-p))). \end{align}

The higher central moments can be expressed more compactly in terms of \mu_2 and \mu_3:

\begin{align} \mu_4&=\mu_2(1-3\mu_2),\\ \mu_5&=\mu_3(1-2\mu_2),\\ \mu_6&=\mu_2(1-5\mu_2(1-\mu_2)). \end{align}

The first six cumulants are

\begin{align} \kappa_1&=p,\\ \kappa_2&=\mu_2,\\ \kappa_3&=\mu_3,\\ \kappa_4&=\mu_2(1-6\mu_2),\\ \kappa_5&=\mu_3(1-12\mu_2),\\ \kappa_6&=\mu_2(1-30\mu_2(1-4\mu_2)). \end{align}
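As a sketch, the general central-moment formula can be checked against the listed closed forms for a sample parameter (pure Python; the choice p = 0.3 is arbitrary):

```python
import math

def central_moment(k: int, p: float) -> float:
    # mu_k = (1-p)(-p)^k + p(1-p)^k from the general formula above.
    return (1 - p) * (-p)**k + p * (1 - p)**k

p = 0.3
mu2 = p * (1 - p)
mu3 = p * (1 - p) * (1 - 2 * p)
assert math.isclose(central_moment(2, p), mu2)
assert math.isclose(central_moment(3, p), mu3)
assert math.isclose(central_moment(4, p), mu2 * (1 - 3 * mu2))
assert math.isclose(central_moment(5, p), mu3 * (1 - 2 * mu2))
assert math.isclose(central_moment(6, p), mu2 * (1 - 5 * mu2 * (1 - mu2)))
```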

Related distributions

If X_1,\ldots,X_n are independent, identically distributed (i.i.d.) random variables, all Bernoulli trials with success probability p, then their sum is distributed according to a binomial distribution with parameters n and p:

\sum_{k=1}^{n}X_k\sim\operatorname{B}(n,p)

(binomial distribution).

The Bernoulli distribution is simply \operatorname{B}(1,p), also written as \mathrm{Bernoulli}(p).
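A simulation sketch of this relationship, using only the standard library (the values of n, p, and the number of trials are arbitrary):

```python
import random
from collections import Counter
from math import comb

random.seed(1)
n, p, trials = 10, 0.4, 200_000

# Sum n i.i.d. Bernoulli(p) draws, repeated many times.
sums = Counter(
    sum(1 if random.random() < p else 0 for _ in range(n))
    for _ in range(trials)
)

# Empirical frequencies track the Binomial(n, p) PMF.
for k in range(n + 1):
    empirical = sums[k] / trials
    exact = comb(n, k) * p**k * (1 - p)**(n - k)
    print(k, round(empirical, 4), round(exact, 4))
```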


Notes and References

  1. Uspensky, James Victor (1937). Introduction to Mathematical Probability. New York: McGraw-Hill. p. 45.
  2. Dekking, Frederik; Kraaikamp, Cornelis; Lopuhaä, Hendrik; Meester, Ludolf (2010). A Modern Introduction to Probability and Statistics. Springer London. pp. 43–48. ISBN 9781849969529.
  3. Bertsekas, Dimitri P.; Tsitsiklis, John N. (2002). Introduction to Probability. Belmont, Mass.: Athena Scientific. ISBN 188652940X.
  4. McCullagh, Peter; Nelder, John (1989). Generalized Linear Models, Second Edition. Boca Raton: Chapman and Hall/CRC. Section 4.2.2. ISBN 0-412-31760-5.
  5. Orloff, Jeremy; Bloom, Jonathan. "Conjugate priors: Beta and normal". math.mit.edu. Retrieved October 20, 2023.