In probability and statistics, a Bernoulli process (named after Jacob Bernoulli) is a finite or infinite sequence of binary random variables, so it is a discrete-time stochastic process that takes only two values, canonically 0 and 1. The component Bernoulli variables Xi are identically distributed and independent. Prosaically, a Bernoulli process is repeated coin flipping, possibly with an unfair coin (but with consistent unfairness). Every variable Xi in the sequence is associated with a Bernoulli trial or experiment. They all have the same Bernoulli distribution. Much of what can be said about the Bernoulli process can also be generalized to more than two outcomes (such as the process for a six-sided die); this generalization is known as the Bernoulli scheme.
The problem of determining the process, given only a limited sample of Bernoulli trials, may be called the problem of checking whether a coin is fair.
A Bernoulli process is a finite or infinite sequence of independent random variables X1, X2, X3, ..., such that for each i, the value of Xi is either 0 or 1, and for all values of i, the probability p that Xi = 1 is the same.
In other words, a Bernoulli process is a sequence of independent identically distributed Bernoulli trials.
Independence of the trials implies that the process is memoryless. Given that the probability p is known, past outcomes provide no information about future outcomes. (If p is unknown, however, the past informs about the future indirectly, through inferences about p.)
If the process is infinite, then from any point the future trials constitute a Bernoulli process identical to the whole process, the fresh-start property.
The two possible values of each Xi are often called "success" and "failure". Thus, when expressed as a number 0 or 1, the outcome may be called the number of successes on the ith "trial".
Two other common interpretations of the values are true or false and yes or no. Under any interpretation of the two values, the individual variables Xi may be called Bernoulli trials with parameter p.
In many applications time passes between trials, as the index i increases. In effect, the trials X1, X2, ..., Xi, ... happen at "points in time" 1, 2, ..., i, .... That passage of time and the associated notions of "past" and "future" are not necessary, however. Most generally, any Xi and Xj in the process are simply two from a set of random variables indexed by {1, 2, ..., n} in the finite case, or by {1, 2, 3, ...} in the infinite case.
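To make the definition concrete, the following minimal sketch (Python; the parameter p = 0.6, the helper name bernoulli_process, and the seed are illustrative choices only) simulates a finite stretch of such a process.

```python
import random

def bernoulli_process(p, n, seed=None):
    """Simulate n independent Bernoulli(p) trials, returned as 0/1 values."""
    rng = random.Random(seed)
    return [1 if rng.random() < p else 0 for _ in range(n)]

trials = bernoulli_process(p=0.6, n=20, seed=42)
print(trials)                       # e.g. [1, 0, 1, 1, ...]
print(sum(trials) / len(trials))    # empirical success frequency, near 0.6
```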
One experiment with only two possible outcomes, often referred to as "success" and "failure", usually encoded as 1 and 0, can be modeled as a Bernoulli distribution.[1] Several random variables and probability distributions besides the Bernoulli distribution itself may be derived from the Bernoulli process: the number of successes in the first n trials has a binomial distribution; the number of trials needed to obtain the first success has a geometric distribution; and the number of failures before the r-th success has a negative binomial distribution. The negative binomial variables may be interpreted as random waiting times.
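The sketch below (plain Python; the parameters p = 0.4, n = 10 and r = 3 are arbitrary illustrative choices) shows how single samples of these derived variables can be read off directly from a stream of Bernoulli trials.

```python
import random

rng = random.Random(1)
p = 0.4                      # illustrative success probability

def trial():
    """One Bernoulli(p) trial: 1 = success, 0 = failure."""
    return 1 if rng.random() < p else 0

# Binomial: number of successes in the first n trials.
n = 10
binomial_sample = sum(trial() for _ in range(n))

# Geometric: number of trials needed to obtain the first success.
trials_needed = 1
while trial() == 0:
    trials_needed += 1
geometric_sample = trials_needed

# Negative binomial: number of failures before the r-th success.
r, successes, failures = 3, 0, 0
while successes < r:
    if trial() == 1:
        successes += 1
    else:
        failures += 1
negative_binomial_sample = failures

print(binomial_sample, geometric_sample, negative_binomial_sample)
```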
The Bernoulli process can be formalized in the language of probability spaces as a random sequence of independent realisations of a random variable that can take values of heads or tails. The state space for an individual value is denoted by $2 = \{H, T\}$.
Consider the countably infinite direct product of copies of $2 = \{H, T\}$. It is common to examine either the one-sided set $\Omega = 2^{\mathbb{N}} = \{H, T\}^{\mathbb{N}}$ or the two-sided set $\Omega = 2^{\mathbb{Z}}$. The finite-length sequences of coin flips, the cylinder sets, generate a sigma algebra, specifically a Borel algebra; this gives the measurable space $(\Omega, \mathcal{B})$, where $\mathcal{B}$ is the sigma algebra generated by the cylinder sets.
If the chances of flipping heads or tails are given by the probabilities $\{p, 1-p\}$, then one can define a natural measure on the product space, given by $P = \{p, 1-p\}^{\mathbb{N}}$ (or by $P = \{p, 1-p\}^{\mathbb{Z}}$ for the two-sided process). In other words, each individual coordinate is a discrete random variable $X$ with a Bernoulli distribution of parameter $p$, whose probability mass function is $p_X(1) = P(X = 1) = p$ and $p_X(0) = P(X = 0) = 1 - p$. We denote this distribution by Ber($p$).
Given a cylinder set, that is, a specific sequence of coin flip results $[\omega_1, \omega_2, \ldots, \omega_n]$ at times $1, 2, \ldots, n$, the probability of observing this particular sequence is $P([\omega_1, \omega_2, \ldots, \omega_n]) = p^k (1-p)^{n-k}$, where $k$ is the number of times that $H$ appears in the sequence and $n-k$ is the number of times that $T$ appears. Equivalently, $P(X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n) = p^k (1-p)^{n-k}$, where each $X_i$ is a binary-valued random variable and $x_i = [\omega_i = H]$ in Iverson bracket notation, equal to $1$ if $\omega_i = H$ and $0$ if $\omega_i = T$. The measure $P$ defined in this way is called the Bernoulli measure.
Note that the probability of any specific, infinitely long sequence of coin flips is exactly zero; this is because $\lim_{n \to \infty} p^n = 0$ for any $0 \le p < 1$.
To conclude the formal definition, a Bernoulli process is then given by the probability triple $(\Omega, \mathcal{B}, P)$, as defined above.
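As an informal check of this definition, the following sketch (Python with NumPy; the pattern HTHH and p = 0.3 are arbitrary illustrative choices) compares the closed-form Bernoulli measure of a cylinder set with a Monte Carlo estimate.

```python
import numpy as np

rng = np.random.default_rng(0)
p = 0.3                      # probability of H (illustrative choice)
pattern = "HTHH"             # a cylinder set: the first n flips are fixed
n, k = len(pattern), pattern.count("H")

exact = p**k * (1 - p)**(n - k)   # P([w1, ..., wn]) = p^k (1-p)^(n-k)

# Monte Carlo estimate: simulate many length-n prefixes and count matches.
trials = 200_000
flips = rng.random((trials, n)) < p           # True = H
target = np.array([c == "H" for c in pattern])
estimate = np.mean(np.all(flips == target, axis=1))

print(exact, estimate)   # the two numbers should be close
```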
See main articles: Law of large numbers, Central limit theorem and Binomial distribution. Let us assume the canonical process with $H$ represented by $1$ and $T$ represented by $0$. The law of large numbers states that the average of the sequence, $\bar{X}_n := \frac{1}{n}\sum_{i=1}^{n} X_i$, approaches the expected value almost surely; that is, the events which do not satisfy this limit have zero probability. The expected value of flipping heads, represented by $1$, is given by $p$. In fact, one has $\operatorname{E}[X_i] = P([X_i = 1]) = p$ for any given random variable $X_i$ in the infinite sequence of Bernoulli trials that compose the Bernoulli process.
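A minimal numerical illustration of this convergence (Python with NumPy; $p = 0.25$ and the sample sizes are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(7)
p = 0.25   # illustrative value

# Running average of a single long Bernoulli sequence: it should settle near p.
flips = (rng.random(100_000) < p).astype(int)
running_mean = np.cumsum(flips) / np.arange(1, flips.size + 1)

for n in (10, 100, 1_000, 10_000, 100_000):
    print(n, running_mean[n - 1])   # approaches 0.25 as n grows
```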
One is often interested in knowing how often $H$ will be observed in a sequence of $n$ coin flips. The number $N(k,n)$ of strings of length $n$ that contain exactly $k$ occurrences of $H$ is given by the binomial coefficient $N(k,n) = \binom{n}{k} = \frac{n!}{k!(n-k)!}$.
If the probability of flipping heads is given by $p$, then the total probability of seeing a string of length $n$ with $k$ heads is $P([S_n = k]) = \binom{n}{k} p^k (1-p)^{n-k}$, where $S_n = \sum_{i=1}^{n} X_i$. The probability measure thus defined is known as the binomial distribution.
As the above formula shows, if $n = 1$ the binomial distribution reduces to a Bernoulli distribution; the Bernoulli distribution is thus exactly the special case of the binomial distribution with $n = 1$.
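The following sketch (plain Python; the parameter $p = 0.35$ and the choice $n = 5$ are illustrative) evaluates the binomial probability mass function and checks that the case $n = 1$ is the Bernoulli distribution Ber($p$).

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(S_n = k) = C(n, k) * p^k * (1-p)^(n-k)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

p = 0.35   # illustrative parameter

# For n = 1 the binomial pmf is exactly the Bernoulli pmf: P(1) = p, P(0) = 1 - p.
print(binomial_pmf(1, 1, p), binomial_pmf(0, 1, p))   # 0.35 0.65

# A short table of P(S_n = k) for n = 5.
print([round(binomial_pmf(k, 5, p), 4) for k in range(6)])
```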
Of particular interest is the question of the value of $S_n$ for very long sequences of coin flips, that is, in the limit $n \to \infty$. In this case, one may make use of Stirling's approximation to the factorial, and write $n! = \sqrt{2\pi n}\; n^n e^{-n} \left(1 + \mathcal{O}\!\left(\tfrac{1}{n}\right)\right).$
Inserting this into the expression for P(k,n), one obtains the Normal distribution; this is the content of the central limit theorem, and this is the simplest example thereof.
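As a numerical illustration of this normal approximation (not a proof), the sketch below (plain Python; $n = 1000$ and $p = 0.3$ are illustrative choices) compares the exact binomial probabilities near the peak with the Gaussian density of mean $np$ and variance $np(1-p)$.

```python
from math import comb, sqrt, pi, exp

def binomial_pmf(k, n, p):
    # Exact probability P(S_n = k).
    return comb(n, k) * p**k * (1 - p)**(n - k)

def gaussian_approx(k, n, p):
    # Normal density with mean n*p and variance n*p*(1-p), evaluated at k.
    mu, var = n * p, n * p * (1 - p)
    return exp(-(k - mu) ** 2 / (2 * var)) / sqrt(2 * pi * var)

n, p = 1000, 0.3   # illustrative values
for k in (270, 290, 300, 310, 330):
    print(k, binomial_pmf(k, n, p), gaussian_approx(k, n, p))  # values nearly agree
```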
The combination of the law of large numbers, together with the central limit theorem, leads to an interesting and perhaps surprising result: the asymptotic equipartition property. Put informally, one notes that, yes, over many coin flips, one will observe H exactly p fraction of the time, and that this corresponds exactly with the peak of the Gaussian. The asymptotic equipartition property essentially states that this peak is infinitely sharp, with infinite fall-off on either side. That is, given the set of all possible infinitely long strings of H and T occurring in the Bernoulli process, this set is partitioned into two: those strings that occur with probability 1, and those that occur with probability 0. This partitioning is known as the Kolmogorov 0-1 law.
The size of this set is interesting, also, and can be explicitly determined: the logarithm of it is exactly the entropy of the Bernoulli process. Once again, consider the set of all strings of length $n$. The size of this set is $2^n$. Of these, only a certain subset is likely; the size of this subset is $2^{nH}$ for $H \le 1$. Using Stirling's approximation in the expression for $P(k,n)$, solving for the location and width of the peak, and finally taking $n \to \infty$, one finds $H = -p\log_2 p - (1-p)\log_2 (1-p).$
This value is the Bernoulli entropy of a Bernoulli process. Here, H stands for entropy; not to be confused with the same symbol H standing for heads.
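A small sketch (plain Python; $p = 0.3$ and $n = 100$ are illustrative) computes the Bernoulli entropy and contrasts the number of all strings, $2^n$, with the approximate size of the typical set, $2^{nH}$.

```python
from math import log2

def bernoulli_entropy(p):
    """H = -p*log2(p) - (1-p)*log2(1-p), in bits per flip."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)

p, n = 0.3, 100   # illustrative values
H = bernoulli_entropy(p)
print(H)                           # about 0.881 bits per flip
print(2.0 ** n, 2.0 ** (n * H))    # all strings vs. size of the typical set
```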
John von Neumann posed a question about the Bernoulli process regarding the possibility of a given process being isomorphic to another, in the sense of the isomorphism of dynamical systems. The question long defied analysis, but was finally and completely answered with the Ornstein isomorphism theorem. This breakthrough resulted in the understanding that the Bernoulli process is unique and universal; in a certain sense, it is the single most random process possible; nothing is 'more' random than the Bernoulli process (although one must be careful with this informal statement; certainly, systems that are mixing are, in a certain sense, "stronger" than the Bernoulli process, which is merely ergodic but not mixing. However, such processes do not consist of independent random variables: indeed, many purely deterministic, non-random systems can be mixing).
The Bernoulli process can also be understood to be a dynamical system, as an example of an ergodic system and specifically, a measure-preserving dynamical system, in one of several different ways. One way is as a shift space, and the other is as an odometer. These are reviewed below.
See main articles: Bernoulli scheme and Dyadic transformation. One way to create a dynamical system out of the Bernoulli process is as a shift space. There is a natural translation symmetry on the product space $\Omega = 2^{\mathbb{N}}$ given by the shift operator $T(X_0, X_1, X_2, \ldots) = (X_1, X_2, \ldots).$ The Bernoulli measure, defined above, is translation-invariant; that is, given any cylinder set $\sigma \in \mathcal{B}$, one has $P(T^{-1}(\sigma)) = P(\sigma).$
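Translation invariance can be checked numerically. In the sketch below (Python with NumPy; the pattern and $p = 0.3$ are illustrative), the preimage $T^{-1}(\sigma)$ of a cylinder set $\sigma$ fixing the first three coordinates is the cylinder set fixing coordinates one through three to the same pattern, and both are seen to have the same empirical probability.

```python
import numpy as np

rng = np.random.default_rng(3)
p = 0.3
pattern = np.array([True, False, True])   # H, T, H: an illustrative cylinder set

# Monte Carlo check of shift invariance: the probability that a sequence lies in
# sigma (pattern at positions 0..2) equals the probability that it lies in
# T^{-1}(sigma) (the same pattern at positions 1..3).
trials = 200_000
seqs = rng.random((trials, 4)) < p        # four coordinates suffice here
in_sigma = np.all(seqs[:, 0:3] == pattern, axis=1)
in_preimage = np.all(seqs[:, 1:4] == pattern, axis=1)
print(in_sigma.mean(), in_preimage.mean())   # both near p*(1-p)*p = 0.063
```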
Instead of the probability measure $P$, consider some arbitrary function on the sigma algebra. The pushforward $f \circ T^{-1}$, defined by $\left(f \circ T^{-1}\right)(\sigma) = f(T^{-1}(\sigma))$, is again a map $\mathcal{B} \to \mathbb{R}$. Thus, the shift $T$ induces another map, $\mathcal{L}_T$, on the space of all functions $\mathcal{B} \to \mathbb{R}$: given $f : \mathcal{B} \to \mathbb{R}$, one defines $\mathcal{L}_T f = f \circ T^{-1}.$ The map $\mathcal{L}_T$ is a linear operator, since $\mathcal{L}_T(f+g) = \mathcal{L}_T(f) + \mathcal{L}_T(g)$ and $\mathcal{L}_T(af) = a\,\mathcal{L}_T(f)$ for functions $f, g$ and constant $a$. This linear operator is called the transfer operator or the Ruelle-Frobenius-Perron operator. Its largest eigenvalue is the Frobenius-Perron eigenvalue, which in this case is $1$, and the associated eigenvector is the invariant measure, here the Bernoulli measure: that is, $\mathcal{L}_T(P) = P.$
If one restricts $\mathcal{L}_T$ to act on polynomials, then the eigenfunctions are, curiously, the Bernoulli polynomials.
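This can be illustrated numerically for the dyadic transformation $x \mapsto 2x \bmod 1$, which realizes the fair-coin Bernoulli shift; the transfer operator then acts on functions on the unit interval as $(\mathcal{L}f)(x) = \tfrac{1}{2}\left[f(x/2) + f((x+1)/2)\right]$. The sketch below (Python with NumPy; the evaluation grid and the hard-coded polynomials are illustrative) checks that the first few Bernoulli polynomials are eigenfunctions with eigenvalues $2^{-n}$.

```python
import numpy as np

def bernoulli_poly(n, x):
    """The first few Bernoulli polynomials, hard-coded for this check."""
    if n == 1:
        return x - 0.5
    if n == 2:
        return x**2 - x + 1.0 / 6.0
    if n == 3:
        return x**3 - 1.5 * x**2 + 0.5 * x
    raise ValueError("only n = 1, 2, 3 are tabulated here")

def transfer(f, x):
    # Transfer (Ruelle-Frobenius-Perron) operator of the dyadic map x -> 2x mod 1.
    return 0.5 * (f(x / 2.0) + f((x + 1.0) / 2.0))

x = np.linspace(0.0, 1.0, 101)
for n in (1, 2, 3):
    f = lambda t, n=n: bernoulli_poly(n, t)
    lhs = transfer(f, x)            # L B_n evaluated on a grid
    rhs = 2.0 ** (-n) * f(x)        # expected eigenvalue 2^(-n)
    print(n, np.allclose(lhs, rhs))   # True, True, True
```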