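In probability theory, a probability space or a probability triple $(\Omega,\mathcal{F},P)$ is a mathematical construct that provides a formal model of a random process or "experiment". For example, one can define a probability space which models the throwing of a die.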
A probability space consists of three elements:[1] [2]
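1. A sample space, $\Omega$, which is the set of all possible outcomes.
2. An event space, $\mathcal{F}$, which is a set of events, an event being a set of outcomes in the sample space.
3. A probability function, $P$, which assigns to each event in the event space a probability, which is a number between 0 and 1 (inclusive).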
In order to provide a model of probability, these elements must satisfy probability axioms.
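In the example of the throw of a standard die,
1. The sample space $\Omega$ is typically the set $\{1,2,3,4,5,6\}$, where each element is a label representing the outcome of the die landing on that label. For example, $1$ represents the outcome that the die lands on 1.
2. The event space $\mathcal{F}$ could be the set of all subsets of the sample space, which then contains simple events such as $\{5\}$ ("the die lands on 5") as well as complex events such as $\{2,4,6\}$ ("the die lands on an even number").
3. The probability function $P$ then maps each event to the number of outcomes in that event divided by 6; for example, $\{5\}$ is mapped to $1/6$, and $\{2,4,6\}$ is mapped to $3/6=1/2$.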
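As a minimal sketch of the die model, assuming the event space is the full power set and the die is fair, the three elements can be written out in Python. The names `omega`, `events`, `power_set`, and `prob` are illustrative choices, not standard API.

```python
from fractions import Fraction
from itertools import combinations

# Sample space for one throw of a standard die.
omega = frozenset({1, 2, 3, 4, 5, 6})

def power_set(s):
    """Return all subsets of s as a set of frozensets."""
    items = list(s)
    return {frozenset(c)
            for r in range(len(items) + 1)
            for c in combinations(items, r)}

# Event space: here, every subset of the sample space (an assumption).
events = power_set(omega)

def prob(event):
    """Probability of an event: number of outcomes in it divided by 6."""
    return Fraction(len(event), len(omega))

p_five = prob(frozenset({5}))        # simple event "the die lands on 5"
p_even = prob(frozenset({2, 4, 6}))  # complex event "the die lands on an even number"
```

Exact fractions are used instead of floats so that the values can be compared with $1/6$ and $1/2$ without rounding.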
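When an experiment is conducted, it results in exactly one outcome $\omega$ from the sample space $\Omega$. All the events in the event space $\mathcal{F}$ that contain the selected outcome $\omega$ are said to "have occurred". The probability function $P$ must be defined so that, if the experiment were repeated arbitrarily many times, the number of occurrences of each event as a fraction of the total number of experiments would most likely tend towards the probability assigned to that event.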
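The link between repeated experiments and assigned probabilities can be illustrated with a small simulation. This is only a sketch: it assumes a fair die, uses Python's pseudo-random generator with a fixed seed, and the helper `relative_frequency` is a hypothetical name introduced here.

```python
import random

rng = random.Random(0)   # fixed seed so the run is repeatable
n_trials = 60_000

# Each experiment yields exactly one outcome from {1, ..., 6}.
outcomes = [rng.randint(1, 6) for _ in range(n_trials)]

def relative_frequency(event, outcomes):
    """Fraction of experiments in which the event occurred."""
    return sum(1 for w in outcomes if w in event) / len(outcomes)

freq_five = relative_frequency({5}, outcomes)        # should be near 1/6
freq_even = relative_frequency({2, 4, 6}, outcomes)  # should be near 1/2
```

With this many trials the observed frequencies should typically lie within a few thousandths of $1/6$ and $1/2$, although any finite run can deviate.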
The Soviet mathematician Andrey Kolmogorov introduced the notion of a probability space and the axioms of probability in the 1930s. In modern probability theory, there are alternative approaches for axiomatization, such as the algebra of random variables.
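A probability space is a mathematical triplet $(\Omega,\mathcal{F},P)$ that presents a model for a particular class of real-world situations. As with other models, its author ultimately defines which elements $\Omega$, $\mathcal{F}$, and $P$ will contain.
- The sample space $\Omega$ is the set of all possible outcomes. An outcome is the result of a single execution of the model, and every run of the experiment must produce exactly one outcome.
- The σ-algebra $\mathcal{F}$ is the collection of all the events we would like to consider. Here an "event" is a set of zero or more outcomes, that is, a subset of the sample space.
- The probability measure $P$ is a set function returning an event's probability, a real number between zero and one. Thus $P$ is a function $P:\mathcal{F}\to[0,1].$
The probability measure must satisfy two requirements. First, the probability of a countable union of mutually exclusive events must equal the countable sum of the probabilities of these events. For example, the probability of the union of the mutually exclusive events $\text{Head}$ and $\text{Tail}$ in the random experiment of one coin toss, $P(\text{Head}\cup\text{Tail})$, is the sum of the probability for $\text{Head}$ and the probability for $\text{Tail}$, $P(\text{Head})+P(\text{Tail})$. Second, the probability of the sample space $\Omega$ must be equal to 1, which reflects the fact that some outcome must occur. In the previous example, the probability of the set of outcomes, $P(\{\text{Head},\text{Tail}\})$, must equal one, because it is certain that the outcome of a single coin toss is either $\text{Head}$ or $\text{Tail}$ (the model neglects any other possibility).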
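Not every subset of the sample space $\Omega$ must necessarily be considered an event: some subsets are simply not of interest, and others cannot be "measured".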
In short, a probability space is a measure space such that the measure of the whole space is equal to one.
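The expanded definition is the following: a probability space is a triple $(\Omega,\mathcal{F},P)$ consisting of:
- the sample space $\Omega$, an arbitrary non-empty set;
- the σ-algebra $\mathcal{F}\subseteq2^{\Omega}$ (also called σ-field), a set of subsets of $\Omega$, called events, such that:
  - $\mathcal{F}$ contains the sample space: $\Omega\in\mathcal{F}$;
  - $\mathcal{F}$ is closed under complements: if $A\in\mathcal{F}$, then also $(\Omega\setminus A)\in\mathcal{F}$;
  - $\mathcal{F}$ is closed under countable unions: if $A_i\in\mathcal{F}$ for $i=1,2,\dots$, then also $\bigcup_{i=1}^{\infty}A_i\in\mathcal{F}$;
  - by the previous two properties and De Morgan's law, $\mathcal{F}$ is also closed under countable intersections: if $A_i\in\mathcal{F}$ for $i=1,2,\dots$, then also $\bigcap_{i=1}^{\infty}A_i\in\mathcal{F}$;
- the probability measure $P:\mathcal{F}\to[0,1]$, a function on $\mathcal{F}$ such that:
  - $P$ is countably additive (also called σ-additive): if $\{A_i\}_{i=1}^{\infty}\subseteq\mathcal{F}$ is a countable collection of pairwise disjoint sets, then $P\left(\bigcup_{i=1}^{\infty}A_i\right)=\sum_{i=1}^{\infty}P(A_i)$;
  - the measure of the entire sample space is equal to one: $P(\Omega)=1$.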
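For a finite sample space the axioms in this definition can be checked mechanically. The following is a sketch under that assumption: on a finite set, closure under pairwise unions already gives closure under countable unions, and additivity on disjoint pairs gives countable additivity. The function names and the dictionary representation of $P$ are illustrative choices.

```python
from fractions import Fraction
from itertools import combinations

def is_sigma_algebra(omega, family):
    """Check the sigma-algebra axioms for a finite family of subsets of omega."""
    omega = frozenset(omega)
    family = {frozenset(a) for a in family}
    if omega not in family:                           # contains the sample space
        return False
    if any(omega - a not in family for a in family):  # closed under complements
        return False
    # Closed under unions (pairwise is enough on a finite set).
    return all(a | b in family for a in family for b in family)

def is_probability_measure(omega, family, p):
    """Check values in [0, 1], P(omega) = 1, and additivity on disjoint pairs.

    p is a dict mapping each event (a frozenset) to its probability;
    family is assumed to be a sigma-algebra already.
    """
    omega = frozenset(omega)
    family = {frozenset(a) for a in family}
    if any(not (0 <= p[a] <= 1) for a in family):
        return False
    if p[omega] != 1:
        return False
    return all(p[a | b] == p[a] + p[b]
               for a, b in combinations(family, 2) if not (a & b))

# Usage: one flip of a fair coin.
coin_omega = {"H", "T"}
coin_family = [set(), {"H"}, {"T"}, {"H", "T"}]
coin_p = {frozenset(): Fraction(0), frozenset({"H"}): Fraction(1, 2),
          frozenset({"T"}): Fraction(1, 2), frozenset({"H", "T"}): Fraction(1)}
```

A family such as `[set(), {1}, {1, 2, 3}]` on the sample space `{1, 2, 3}` fails the check because the complement `{2, 3}` is missing.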
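Discrete probability theory needs only at most countable sample spaces $\Omega$. Probabilities can be ascribed to points of $\Omega$ by the probability mass function $p:\Omega\to[0,1]$ such that $\sum_{\omega\in\Omega}p(\omega)=1$. All subsets of $\Omega$ can be treated as events (thus, $\mathcal{F}=2^{\Omega}$ is the power set). The probability measure takes the simple form $P(A)=\sum_{\omega\in A}p(\omega)$ for all $A\subseteq\Omega$.
The greatest σ-algebra $\mathcal{F}=2^{\Omega}$ describes the complete information. In general, a σ-algebra $\mathcal{F}\subseteq2^{\Omega}$ corresponds to a finite or countable partition $\Omega=B_1\cup B_2\cup\dots$, the general form of an event $A\in\mathcal{F}$ being $A=B_{k_1}\cup B_{k_2}\cup\dots$.
The case $p(\omega)=0$ is permitted by the definition, but rarely used, since such $\omega$ can safely be excluded from the sample space.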
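The discrete case can be sketched directly: a probability mass function determines the measure by summation, and a partition determines a σ-algebra as the set of all unions of its blocks. The four-point sample space and its weights below are hypothetical values chosen for illustration.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical probability mass function on a four-point sample space.
pmf = {"a": Fraction(1, 2), "b": Fraction(1, 4),
       "c": Fraction(1, 8), "d": Fraction(1, 8)}

def prob(event):
    """P(A) = sum of p(w) over the outcomes w in A."""
    return sum((pmf[w] for w in event), Fraction(0))

def events_from_partition(blocks):
    """All unions of blocks: the sigma-algebra generated by a finite partition."""
    blocks = [frozenset(b) for b in blocks]
    return {frozenset().union(*chosen)
            for r in range(len(blocks) + 1)
            for chosen in combinations(blocks, r)}

# Coarse information: only whether the outcome lies in {a, b} or in {c, d}.
coarse = events_from_partition([{"a", "b"}, {"c", "d"}])
# Complete information: the partition into single points gives the power set.
complete = events_from_partition([{w} for w in pmf])
```

A partition with $k$ blocks yields $2^k$ events, so `coarse` has 4 events and `complete` has 16.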
If $\Omega$ is uncountable, still, it may happen that $P(\{\omega\})\neq0$ for some $\omega$; such $\omega$ are called atoms. They are an at most countable (maybe empty) set, whose probability is the sum of probabilities of all atoms. If this sum is equal to 1 then all other points can safely be excluded from the sample space, returning us to the discrete case. Otherwise, if the sum of probabilities of all atoms is between 0 and 1, then the probability space decomposes into a discrete (atomic) part (maybe empty) and a non-atomic part.
If $P(\{\omega\})=0$ for all $\omega\in\Omega$ (in this case, $\Omega$ must be uncountable, because otherwise $P(\Omega)=1$ could not be satisfied), then the summation formula $P(A)=\sum_{\omega\in A}p(\omega)$ of the discrete case fails: the probability of a set is not necessarily the sum over the probabilities of its elements, as summation is only defined for countably many elements. This makes the theory of probability spaces much more technical. A formulation stronger than summation, measure theory, is applicable. Initially the probabilities are ascribed to some "generator" sets (see the examples). Then a limiting procedure allows assigning probabilities to sets that are limits of sequences of generator sets, or limits of limits, and so on. All these sets together form the σ-algebra $\mathcal{F}$. Sets belonging to $\mathcal{F}$ are called measurable.
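A probability space $(\Omega,\mathcal{F},P)$ is said to be a complete probability space if for all $B\in\mathcal{F}$ with $P(B)=0$ and all $A\subset B$ one has $A\in\mathcal{F}$.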
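Completeness can be illustrated on a small finite space. The sketch below uses a hypothetical three-point sample space in which the event $\{b,c\}$ has probability zero but its subsets $\{b\}$ and $\{c\}$ are not events, so the space is not complete; enlarging the event space to the power set makes it complete. All names are illustrative.

```python
from fractions import Fraction
from itertools import combinations

def subsets(s):
    """All subsets of s as frozensets."""
    items = list(s)
    return [frozenset(c)
            for r in range(len(items) + 1)
            for c in combinations(items, r)]

def is_complete(family, p):
    """True if every subset of every probability-zero event is itself an event."""
    family = {frozenset(a) for a in family}
    return all(a in family
               for b in family if p[b] == 0
               for a in subsets(b))

omega = frozenset({"a", "b", "c"})

# Not complete: {b, c} is a null event, but {b} and {c} are not events.
family = [frozenset(), frozenset({"a"}), frozenset({"b", "c"}), omega]
p = {frozenset(): Fraction(0), frozenset({"a"}): Fraction(1),
     frozenset({"b", "c"}): Fraction(0), omega: Fraction(1)}

# Completed version: all subsets are events; all mass sits on the point "a".
completed_p = {s: Fraction(1) if "a" in s else Fraction(0) for s in subsets(omega)}
```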
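If the experiment consists of just one flip of a fair coin, then the outcome is either heads or tails: $\Omega=\{H,T\}$. The σ-algebra $\mathcal{F}=2^{\Omega}$ contains $2^2=4$ events, namely: $\{H\}$ ("heads"), $\{T\}$ ("tails"), $\{\}$ ("neither heads nor tails"), and $\{H,T\}$ ("either heads or tails"); in other words, $\mathcal{F}=\{\{\},\{H\},\{T\},\{H,T\}\}$. There is a fifty percent chance of tossing heads and fifty percent for tails, so the probability measure in this example is $P(\{\})=0$, $P(\{H\})=0.5$, $P(\{T\})=0.5$, $P(\{H,T\})=1$.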
The fair coin is tossed three times. There are 8 possible outcomes: $\Omega=\{HHH,HHT,HTH,HTT,THH,THT,TTH,TTT\}$ (here "HTH", for example, means that the first time the coin landed heads, the second time tails, and the last time heads again). The complete information is described by the σ-algebra $\mathcal{F}=2^{\Omega}$ of $2^8=256$ events, where each event is a subset of $\Omega$.
Alice knows the outcome of the second toss only. Thus her incomplete information is described by the partition $\Omega=A_1\sqcup A_2=\{HHH,HHT,THH,THT\}\sqcup\{HTH,HTT,TTH,TTT\}$, where ⊔ is the disjoint union, and the corresponding σ-algebra $\mathcal{F}_{\text{Alice}}=\{\{\},A_1,A_2,\Omega\}$.
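Bryan knows only the total number of tails. His partition contains four parts: $\Omega=B_0\sqcup B_1\sqcup B_2\sqcup B_3=\{HHH\}\sqcup\{HHT,HTH,THH\}\sqcup\{TTH,THT,HTT\}\sqcup\{TTT\}$; accordingly, his σ-algebra $\mathcal{F}_{\text{Bryan}}$ contains $2^4=16$ events.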
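The two σ-algebras are incomparable: neither $\mathcal{F}_{\text{Alice}}\subseteq\mathcal{F}_{\text{Bryan}}$ nor $\mathcal{F}_{\text{Bryan}}\subseteq\mathcal{F}_{\text{Alice}}$; both are sub-σ-algebras of $2^{\Omega}$.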
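The three-toss example can be checked by enumerating the σ-algebras generated by each observer's partition. In this sketch Alice is taken to know the second toss, and Bryan is taken to know only the total number of tails; the latter is an assumption of the sketch, and the helper names are illustrative.

```python
from itertools import combinations, product

# Sample space: all 8 sequences of three tosses, e.g. "HTH".
omega = frozenset("".join(t) for t in product("HT", repeat=3))

def partition_by(key):
    """Group outcomes that the observer cannot tell apart (same key value)."""
    groups = {}
    for w in omega:
        groups.setdefault(key(w), set()).add(w)
    return [frozenset(g) for g in groups.values()]

def sigma_algebra(blocks):
    """All unions of partition blocks."""
    return {frozenset().union(*chosen)
            for r in range(len(blocks) + 1)
            for chosen in combinations(blocks, r)}

f_alice = sigma_algebra(partition_by(lambda w: w[1]))          # second toss
f_bryan = sigma_algebra(partition_by(lambda w: w.count("T")))  # number of tails (assumed)
f_full = sigma_algebra(partition_by(lambda w: w))              # complete information

alice_in_bryan = f_alice <= f_bryan
bryan_in_alice = f_bryan <= f_alice
```

Under these assumptions neither inclusion holds, while both σ-algebras are contained in the full power set of 256 events.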
If 100 voters are to be drawn randomly from among all voters in California and asked whom they will vote for as governor, then the set of all sequences of 100 Californian voters would be the sample space Ω. We assume that sampling without replacement is used: only sequences of 100 different voters are allowed. For simplicity an ordered sample is considered, that is, a sequence (Alice, Bryan) is different from (Bryan, Alice). We also take for granted that each potential voter knows exactly their future choice, that is, they do not choose randomly.
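Alice knows only whether or not Arnold Schwarzenegger has received at least 60 votes. Her incomplete information is described by the σ-algebra $\mathcal{F}_{\text{Alice}}$ that contains: (1) the set of all sequences in Ω where at least 60 people vote for Schwarzenegger; (2) the set of all sequences where fewer than 60 vote for Schwarzenegger; (3) the whole sample space Ω; and (4) the empty set ∅.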
Bryan knows the exact number of voters who are going to vote for Schwarzenegger. His incomplete information is described by the corresponding partition $\Omega=B_0\sqcup B_1\sqcup\dots\sqcup B_{100}$, and the σ-algebra $\mathcal{F}_{\text{Bryan}}$ consists of $2^{101}$ events.
In this case, Alice's σ-algebra is a subset of Bryan's: $\mathcal{F}_{\text{Alice}}\subset\mathcal{F}_{\text{Bryan}}$.
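The inclusion of Alice's σ-algebra in Bryan's can be seen on a scaled-down version of the voter example. The sketch below is a toy model, not the California setting: it assumes a hypothetical population of five voters, a sample of three instead of 100, and a threshold of two votes instead of 60.

```python
from itertools import combinations, permutations

# Hypothetical toy population: True means "votes for the candidate".
votes = {"v1": True, "v2": True, "v3": True, "v4": False, "v5": False}
SAMPLE_SIZE = 3   # stands in for the 100 sampled voters
THRESHOLD = 2     # stands in for the 60-vote threshold

# Ordered samples without replacement: 5 * 4 * 3 = 60 sequences.
omega = frozenset(permutations(votes, SAMPLE_SIZE))

def votes_for(sample):
    """Number of sampled voters who vote for the candidate."""
    return sum(votes[v] for v in sample)

def partition_by(key):
    """Group samples that the observer cannot tell apart (same key value)."""
    groups = {}
    for w in omega:
        groups.setdefault(key(w), set()).add(w)
    return [frozenset(g) for g in groups.values()]

def sigma_algebra(blocks):
    """All unions of partition blocks."""
    return {frozenset().union(*chosen)
            for r in range(len(blocks) + 1)
            for chosen in combinations(blocks, r)}

# Alice knows only whether the threshold was reached; Bryan knows the exact count.
f_alice = sigma_algebra(partition_by(lambda s: votes_for(s) >= THRESHOLD))
f_bryan = sigma_algebra(partition_by(votes_for))
```

Because Alice's partition is obtained by merging blocks of Bryan's partition, every event she can describe is also one of his, which is the same mechanism that gives the inclusion in the full-size example.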
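A number between 0 and 1 is chosen at random, uniformly. Here $\Omega=[0,1]$, $\mathcal{F}$ is the σ-algebra of Borel sets on $\Omega$, and $P$ is the Lebesgue measure on $[0,1]$.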
In this case, the open intervals of the form $(a,b)$, where $0<a<b<1$, could be taken as the generator sets. Each such set can be ascribed the probability $P((a,b))=b-a$, which generates the Lebesgue measure on $[0,1]$, and the Borel σ-algebra on $\Omega$.
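The assignment of probability $b-a$ to the interval $(a,b)$ can be compared against a simulation. This is a rough sketch: it assumes Python's `random.random()` as a stand-in for a uniform draw from $[0,1]$, and the function names are illustrative.

```python
import random

rng = random.Random(1)   # fixed seed so the run is repeatable
samples = [rng.random() for _ in range(100_000)]

def interval_prob(a, b):
    """Probability ascribed to the generator set (a, b): its length."""
    return b - a

def empirical_prob(a, b):
    """Fraction of simulated draws that fall in the open interval (a, b)."""
    return sum(1 for x in samples if a < x < b) / len(samples)

exact = interval_prob(0.2, 0.5)
observed = empirical_prob(0.2, 0.5)
```

The simulation only approximates the measure; it says nothing about sets that cannot be built from intervals, which is where the Borel σ-algebra matters.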
A fair coin is tossed endlessly. Here one can take $\Omega=\{0,1\}^{\infty}$, the set of all infinite sequences of numbers 0 and 1. Cylinder sets may be used as the generator sets. Each such set describes an event in which the first $n$ tosses have resulted in a fixed sequence, and the rest of the sequence may be arbitrary. Each such event can be naturally given the probability of $2^{-n}$.
These two non-atomic examples are closely related: a sequence $(x_1,x_2,\dots)\in\{0,1\}^{\infty}$ leads to the number $2^{-1}x_1+2^{-2}x_2+\cdots\in[0,1]$. This is not a one-to-one correspondence between $\{0,1\}^{\infty}$ and $[0,1]$, however: it is an isomorphism modulo zero, which allows for treating the two probability spaces as two forms of the same probability space. In fact, all non-pathological non-atomic probability spaces are the same in this sense. They are so-called standard probability spaces. Basic applications of probability spaces are insensitive to standardness. However, non-discrete conditioning is easy and natural on standard probability spaces; otherwise it becomes obscure.
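The relation between the two examples can be made concrete for finite prefixes. In the sketch below, a cylinder set fixed by its first $n$ tosses is mapped through the binary expansion to an interval of $[0,1]$, and the length of that interval equals the cylinder's probability $2^{-n}$. The function names are illustrative.

```python
from fractions import Fraction

def cylinder_prob(prefix):
    """Probability that the first len(prefix) tosses equal prefix: 2**-n."""
    return Fraction(1, 2 ** len(prefix))

def to_number(bits):
    """Map a finite 0/1 prefix (x1, ..., xn) to the sum of x_k * 2**-k."""
    return sum(Fraction(x, 2 ** k) for k, x in enumerate(bits, start=1))

def cylinder_interval(prefix):
    """Endpoints of the interval of [0, 1] reached by sequences starting with prefix."""
    low = to_number(prefix)                 # all remaining tosses are 0
    return low, low + cylinder_prob(prefix) # all remaining tosses are 1

low, high = cylinder_interval((0, 1, 1))
```

For the prefix `(0, 1, 1)` the endpoints are $3/8$ and $1/2$, an interval of length $1/8=2^{-3}$. Endpoints are where the correspondence fails to be one-to-one, since a dyadic number has two binary expansions.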
See main article: Probability distribution.
See main article: Random variable. A random variable $X$ is a measurable function $X:\Omega\to S$ from the sample space $\Omega$ to another measurable space $S$ called the state space.
If $A\subset S$, the notation $\Pr(X\in A)$ is a commonly used shorthand for $P(\{\omega\in\Omega:X(\omega)\in A\})$.
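The shorthand $\Pr(X\in A)$ amounts to measuring a preimage. As a small sketch, assume two fair dice as the sample space and let $X$ be the sum of the two faces; the names `omega`, `P`, `X`, and `pr_X_in` are illustrative.

```python
from fractions import Fraction
from itertools import product

# Sample space: ordered pairs of faces of two fair dice (36 outcomes).
omega = list(product(range(1, 7), repeat=2))

def P(event):
    """Uniform probability measure on the finite sample space."""
    return Fraction(len(event), len(omega))

def X(w):
    """Random variable: the sum of the two faces."""
    return w[0] + w[1]

def pr_X_in(A):
    """Pr(X in A) = P({w in omega : X(w) in A})."""
    preimage = {w for w in omega if X(w) in A}
    return P(preimage)

p_seven = pr_X_in({7})
p_extremes = pr_X_in({2, 12})
```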
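If $\Omega$ is countable, we almost always define $\mathcal{F}$ as the power set of $\Omega$, i.e. $\mathcal{F}=2^{\Omega}$, which is trivially a σ-algebra and the biggest one we can create using $\Omega$. We can therefore omit $\mathcal{F}$ and just write $(\Omega,P)$ to define the probability space.
On the other hand, if $\Omega$ is uncountable and we use $\mathcal{F}=2^{\Omega}$, we get into trouble defining our probability measure $P$ because $\mathcal{F}$ is too "large", i.e. there will often be sets to which it is impossible to assign a unique measure. In this case, we have to use a smaller σ-algebra $\mathcal{F}$, for example the Borel algebra of $\Omega$, which is the smallest σ-algebra that makes all open sets measurable.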
See main article: Conditional probability. Kolmogorov's definition of probability spaces gives rise to the natural concept of conditional probability. Every set $A$ with non-zero probability (that is, $P(A)>0$) defines another probability measure $P(B\mid A)=\frac{P(B\cap A)}{P(A)}$ on the space. This is usually pronounced as the "probability of B given A".
For any event $A$ such that $P(A)>0$, the function $Q$ defined by $Q(B)=P(B\mid A)$ for all events $B$ is itself a probability measure.
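That conditioning on an event of positive probability yields a new probability measure can be checked on a finite example. The sketch below assumes a fair die and conditions on the event "even"; `conditional` is a hypothetical helper that returns the measure $Q(B)=P(B\cap A)/P(A)$.

```python
from fractions import Fraction

omega = frozenset(range(1, 7))  # one throw of a fair die

def P(event):
    """Uniform probability measure on the die's sample space."""
    return Fraction(len(frozenset(event) & omega), len(omega))

def conditional(A):
    """Return Q with Q(B) = P(B | A); requires P(A) > 0."""
    A = frozenset(A)
    if P(A) == 0:
        raise ValueError("conditioning event must have non-zero probability")
    def Q(B):
        return P(frozenset(B) & A) / P(A)
    return Q

Q = conditional({2, 4, 6})   # condition on "the die lands on an even number"
q_two = Q({2})
```

Here $Q$ assigns total mass 1 to the sample space, mass 0 to the odd outcomes, and is additive on disjoint events, as the statement above requires.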
See main article: Statistical independence. Two events, $A$ and $B$, are said to be independent if $P(A\cap B)=P(A)P(B)$.
Two random variables, $X$ and $Y$, are said to be independent if any event defined in terms of $X$ is independent of any event defined in terms of $Y$. Formally, they generate independent σ-algebras, where two σ-algebras $G$ and $H$, which are subsets of $\mathcal{F}$, are said to be independent if any element of $G$ is independent of any element of $H$.
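Both notions of independence can be tested by brute force on a finite space. This sketch assumes two tosses of a fair coin; the σ-algebra generated by a random variable is built as all unions of its level sets, and every name below is an illustrative choice.

```python
from fractions import Fraction
from itertools import combinations, product

omega = frozenset(product("HT", repeat=2))  # two tosses of a fair coin

def P(event):
    """Uniform probability measure on the four outcomes."""
    return Fraction(len(event), len(omega))

def independent(A, B):
    """Events A and B are independent if P(A and B) = P(A) * P(B)."""
    return P(A & B) == P(A) * P(B)

def generated_sigma_algebra(X):
    """All unions of level sets {X = value}: the events defined in terms of X."""
    levels = {}
    for w in omega:
        levels.setdefault(X(w), set()).add(w)
    blocks = [frozenset(b) for b in levels.values()]
    return {frozenset().union(*chosen)
            for r in range(len(blocks) + 1)
            for chosen in combinations(blocks, r)}

def independent_variables(X, Y):
    """X and Y are independent if their generated sigma-algebras are."""
    return all(independent(a, b)
               for a in generated_sigma_algebra(X)
               for b in generated_sigma_algebra(Y))

first_heads = frozenset(w for w in omega if w[0] == "H")
second_heads = frozenset(w for w in omega if w[1] == "H")
both_heads = first_heads & second_heads

def first(w): return w[0]
def second(w): return w[1]
def heads_count(w): return w.count("H")
```

In this model the two tosses are independent random variables, whereas the first toss and the total number of heads are not.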
See main article: Mutual exclusivity. Two events, $A$ and $B$, are said to be mutually exclusive or disjoint if the occurrence of one implies the non-occurrence of the other, i.e., their intersection is empty. This is a stronger condition than the probability of their intersection being zero.
If $A$ and $B$ are disjoint events, then $P(A\cup B)=P(A)+P(B)$. This extends to a (finite or countably infinite) sequence of events. However, the probability of the union of an uncountable set of events is not the sum of their probabilities. For example, if $Z$ is a normally distributed random variable, then $P(Z=x)$ is 0 for any $x$, but $P(Z\in\mathbb{R})=1$.
The event $A\cap B$ is referred to as "A and B", and the event $A\cup B$ as "A or B".