Intuitively, an algorithmically random sequence (or random sequence) is a sequence of binary digits that appears random to any algorithm running on a (prefix-free or not) universal Turing machine. The notion can be applied analogously to sequences on any finite alphabet (e.g. decimal digits). Random sequences are key objects of study in algorithmic information theory.
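In measure-theoretic probability theory, introduced by Andrey Kolmogorov in 1933, there is no such thing as a random sequence. For example, consider flipping a fair coin infinitely many times. Any particular sequence, be it 0000... or 011010..., is exactly as likely as any other, so the language of measure-theoretic probability offers no way to call one sequence more random than another. Intuitively, however, 011010... looks more random than 0000..., and algorithmic randomness theory formalizes this intuition.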
As different types of algorithms are sometimes considered, ranging from algorithms with specific bounds on their running time to algorithms which may ask questions of an oracle machine, there are different notions of randomness. The most common of these is known as Martin-Löf randomness (K-randomness or 1-randomness), but stronger and weaker forms of randomness also exist. When the term "algorithmically random" is used to refer to a particular single (finite or infinite) sequence without clarification, it is usually taken to mean "incompressible" or, in the case the sequence is infinite and prefix algorithmically random (i.e., K-incompressible), "Martin-Löf–Chaitin random".
Since its inception, Martin-Löf randomness has been shown to admit many equivalent characterizations (in terms of compression, randomness tests, and gambling) that bear little outward resemblance to the original definition, but each of which satisfies our intuitive notion of properties that random sequences ought to have: random sequences should be incompressible, they should pass statistical tests for randomness, and it should be difficult to make money betting on them. The existence of these multiple definitions of Martin-Löf randomness, and the stability of these definitions under different models of computation, give evidence that Martin-Löf randomness is a fundamental property of mathematics and not an accident of Martin-Löf's particular model.
It is important to disambiguate between algorithmic randomness and stochastic randomness. Unlike algorithmic randomness, which is defined for computable (and thus deterministic) processes, stochastic randomness is usually said to be a property of a sequence that is a priori known to be generated by (or is the outcome of) an independent identically distributed equiprobable stochastic process.
Because infinite sequences of binary digits can be identified with real numbers in the unit interval, random binary sequences are often called (algorithmically) random real numbers. Additionally, infinite binary sequences correspond to characteristic functions of sets of natural numbers; therefore those sequences might be seen as sets of natural numbers.
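The two identifications above can be illustrated on finite prefixes. The following is a minimal sketch; the function names are hypothetical, and it assumes bit positions are indexed from 0 when a prefix is read as a characteristic function.

```python
from fractions import Fraction

def prefix_to_real(bits):
    """Return the dyadic rational 0.b1b2...bn (base 2) for a finite bit prefix."""
    return sum(Fraction(b, 2 ** (i + 1)) for i, b in enumerate(bits))

def prefix_to_set(bits):
    """Return {i : bits[i] == 1}, reading the prefix as a characteristic function.

    Assumption: positions are numbered from 0.
    """
    return {i for i, b in enumerate(bits) if b == 1}

prefix = [0, 1, 1, 0, 1]
print(prefix_to_real(prefix))         # the real number 0.01101 in binary
print(sorted(prefix_to_set(prefix)))  # the positions holding a 1
```

A finite prefix only pins the real number down to an interval of width 2^-n; the full identification needs the infinite sequence.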
The class of all Martin-Löf random (binary) sequences is denoted by RAND or MLR.
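Richard von Mises formalized the notion of a test for randomness in order to define a random sequence as one that passed all tests for randomness. He defined a "collective" (kollektiv) to be an infinite binary string $x_{1:\infty}$ satisfying two conditions. First, the limiting frequency of ones exists: $\lim_{n} \frac{1}{n} \sum_{i=1}^{n} x_i = p \in (0,1)$. Second, for any "admissible" rule that picks out an infinite subsequence $(x_{m_i})_i$ from the string, the subsequence has the same limiting frequency: $\lim_{n} \frac{1}{n} \sum_{i=1}^{n} x_{m_i} = p$.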
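To pick out a subsequence, first pick a binary function $\phi$ that, given any finite binary string $x_{1:k}$, outputs either 0 or 1. If it outputs 1, then $x_{k+1}$ is added to the subsequence; otherwise that bit is skipped and the rule moves on. Only rules that pick out an infinite subsequence are considered.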
Stated in another way, each infinite binary string is a coin-flip game, and an admissible rule is a way for a gambler to decide when to place bets. A collective is a coin-flip game where there is no way for one gambler to do better than another over the long run. That is, there is no gambling system that works for the game.
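A selection rule of this kind can be sketched in a few lines. The code below is an illustration only: `after_a_one` is a hypothetical rule, and the "coin" sequence comes from a seeded pseudorandom generator, which is computable and therefore not itself algorithmically random. It contrasts that sequence with the alternating string 0101..., whose overall frequency is 1/2 but which this rule exploits.

```python
import random

def select_subsequence(bits, phi):
    """Keep a bit exactly when phi, applied to the bits before it, returns 1."""
    chosen, seen = [], []
    for bit in bits:
        if phi(seen) == 1:      # the decision uses only the bits seen so far
            chosen.append(bit)
        seen.append(bit)
    return chosen

def after_a_one(prefix):
    """Hypothetical rule: select the next bit only right after seeing a 1."""
    return 1 if prefix and prefix[-1] == 1 else 0

def frequency(bits):
    """Fraction of ones in a finite list of bits."""
    return sum(bits) / len(bits) if bits else float("nan")

rng = random.Random(0)
coin = [rng.randint(0, 1) for _ in range(100_000)]
alternating = [k % 2 for k in range(100_000)]   # 0101...

print(frequency(coin), frequency(select_subsequence(coin, after_a_one)))
print(frequency(alternating), frequency(select_subsequence(alternating, after_a_one)))
```

On the alternating string the selected bits are all 0, so the frequency of the subsequence differs from that of the whole string and the string is not a collective.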
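The definition generalizes from a binary alphabet to a countable alphabet: the frequency of each letter must converge to a limit greater than zero, and for any admissible rule that picks out an infinite subsequence $(x_{m_i})_i$ from the string, the frequency of each letter in the subsequence must converge to the same limit.
Usually the admissible rules are taken to be those computable by a Turing machine, and one requires $p = 1/2$; this gives the Mises–Wald–Church random sequences. Requiring $p = 1/2$ is not a real restriction, since from a sequence with $p = 1/2$ one can construct random sequences with any other computable $p \in (0,1)$.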
Theorem. (Abraham Wald, 1936, 1937)[3] If there are only countably many admissible rules, then almost any sequence is a collective.
Proof sketch: Use measure-theoretic probability.
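Fix one admissible rule. Sample a random sequence from Bernoulli space. With probability 1 (use martingales), the subsequence picked by the admissible rule still has $\lim_{n} \frac{1}{n} \sum_{i=1}^{n} x_{m_i} = p$. Now consider all countably many admissible rules together. Since a countable intersection of probability-1 events has probability 1, with probability 1 every subsequence picked by an admissible rule still has $\lim_{n} \frac{1}{n} \sum_{i=1}^{n} x_{m_i} = p$.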
Counterexample. (Jean Ville, 1939)[4] If there are only countably many admissible rules, then there exists a collective with $\frac{1}{n} \sum_{k=1}^{n} x_k \geq p$ for all $n$.
Proof: See.[5]
Intuitively, the long-time average of a random sequence should oscillate on both sides of $p$, much as a random walk crosses the origin infinitely many times.
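The Ville counterexample suggests that the Mises–Wald–Church sense of randomness is not good enough, because some random sequences do not satisfy some laws of randomness. For example, the Ville counterexample violates one of the laws of the iterated logarithm: that law implies that, with probability 1, the running average falls below $p$ infinitely often, which the Ville sequence never does.
Naively, one can fix this by requiring a sequence to satisfy all possible laws of randomness, where a "law of randomness" is a property that is satisfied by all sequences with probability 1. However, for each infinite sequence $y_{1:\infty} \in 2^{\mathbb{N}}$, the property $x_{1:\infty} \neq y_{1:\infty}$ is itself a law of randomness, so no sequence could satisfy them all and there would be no random sequences.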
Per Martin-Löf (1966)[6] defined "Martin-Löf randomness" by only allowing laws of randomness that are Turing-computable. In other words, a sequence is random if and only if it passes all Turing-computable tests of randomness.
The thesis that the definition of Martin-Löf randomness "correctly" captures the intuitive notion of randomness has been called the Martin-Löf–Chaitin Thesis; it is somewhat similar to the Church–Turing thesis.[7]
Martin-Löf–Chaitin Thesis. The mathematical concept of "Martin-Löf randomness" captures the intuitive notion of an infinite sequence being "random".
Church–Turing thesis. The mathematical concept of "computable by Turing machines" captures the intuitive notion of a function being "computable".
Just as Turing-computability has many equivalent definitions, Martin-Löf randomness also has many equivalent definitions; see the next section.
Martin-Löf's original definition of a random sequence was in terms of constructive null covers; he defined a sequence to be random if it is not contained in any such cover. Gregory Chaitin, Leonid Levin and Claus-Peter Schnorr proved a characterization in terms of algorithmic complexity: a sequence is random if there is a uniform bound on the compressibility of its initial segments. Schnorr gave a third equivalent definition in terms of martingales. Li and Vitanyi's book An Introduction to Kolmogorov Complexity and Its Applications is the standard introduction to these ideas.
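A finite string $w$ is called c-incompressible, for a natural number $c$, if $K(w) \geq |w| - c$, where $K(w)$ denotes the prefix-free Kolmogorov complexity of $w$.
An infinite sequence S is Martin-Löf random if and only if there is a constant c such that all of S's finite prefixes are c-incompressible. More succinctly, $K(w) \geq |w| - O(1)$ for every finite prefix $w$ of S.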
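A recursively enumerable sequence of effective open sets $U_i$ is a constructive (or effective) null cover if $U_{i+1} \subseteq U_i$ and $\mu(U_i) \leq 2^{-i}$ for each natural number $i$, where $\mu$ is the fair-coin measure on Cantor space. Every effective null cover determines a $G_\delta$ set of measure 0, namely the intersection of the sets $U_i$.
A sequence is defined to be Martin-Löf random if it is not contained in any $G_\delta$ set determined by a constructive null cover.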
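A martingale is a function $d:\{0,1\}^*\to[0,\infty)$ such that, for all finite strings $w$, $d(w) = (d(w^\smallfrown 0) + d(w^\smallfrown 1))/2$, where $a^\smallfrown b$ is the concatenation of the strings $a$ and $b$. This is the fairness condition: viewed as a betting strategy, the martingale plays against fair odds. A martingale $d$ is said to succeed on a sequence $S$ if $\limsup_{n\to\infty} d(S \upharpoonright n) = \infty$, where $S \upharpoonright n$ is the first $n$ bits of $S$. A martingale $d$ is constructive (also called weakly computable or lower semi-computable) if there exists a computable function $\widehat{d}:\{0,1\}^* \times \mathbb{N} \to \mathbb{Q}$ such that, for all finite binary strings $w$, $\widehat{d}(w,t) \leq \widehat{d}(w,t+1) < d(w)$ for all positive integers $t$, and $\lim_{t\to\infty} \widehat{d}(w,t) = d(w)$.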
A sequence is Martin-Löf random if and only if no constructive martingale succeeds on it.
The Kolmogorov complexity characterization conveys the intuition that a random sequence is incompressible: no prefix can be produced by a program much shorter than the prefix.
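Kolmogorov complexity itself is not computable, so incompressibility cannot be checked directly. As a rough illustration only, an off-the-shelf compressor gives an upper bound on how short a description can be for one particular decompressor. The sketch below uses zlib as such a stand-in; it is an assumption made for illustration and not the prefix-free complexity $K$ used in the definition.

```python
import os
import zlib

def compressed_bits(data: bytes) -> int:
    """Length in bits of the zlib encoding of data (an upper-bound proxy only)."""
    return 8 * len(zlib.compress(data, 9))

regular = bytes(4096)        # 4096 zero bytes: a highly regular string
noisy = os.urandom(4096)     # bytes from the operating system's entropy source

for name, data in [("regular", regular), ("noisy", noisy)]:
    # Print the raw length in bits next to the compressed length in bits.
    print(name, 8 * len(data), compressed_bits(data))
```

The regular string shrinks to a small fraction of its length, while the noisy one typically does not shrink at all. A failure of zlib to compress a string proves nothing about its Kolmogorov complexity.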
The null cover characterization conveys the intuition that a random real number should not have any property that is "uncommon". Each measure 0 set can be thought of as an uncommon property. It is not possible for a sequence to lie in no measure 0 sets, because each one-point set has measure 0. Martin-Löf's idea was to limit the definition to measure 0 sets that are effectively describable; the definition of an effective null cover determines a countable collection of effectively describable measure 0 sets and defines a sequence to be random if it does not lie in any of these particular measure 0 sets. Since the union of a countable collection of measure 0 sets has measure 0, this definition immediately leads to the theorem that there is a measure 1 set of random sequences. Note that if we identify the Cantor space of binary sequences with the interval [0,1] of real numbers, the measure on Cantor space agrees with Lebesgue measure.
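An effective measure 0 set can be interpreted as a Turing machine that is able to tell, given an infinite binary string, whether the string looks random at various levels of statistical significance. The set is the intersection of shrinking sets $U_1 \supset U_2 \supset U_3 \supset \cdots$, and since each set $U_n$ is specified by an enumerable sequence of prefixes, if an infinite binary string is in $U_n$, then the Turing machine can confirm in finite time that the string does fall inside $U_n$. It can therefore reject the hypothesis that the string is random at significance level $2^{-n}$. If the Turing machine can reject the hypothesis at every significance level, then the string is not random.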
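As a toy instance of such a test, take $U_n$ to be the set of sequences that begin with $n$ zeros, so that $\mu(U_n) = 2^{-n}$ and the intersection contains only the all-zero sequence. This particular test and the function name below are illustrative assumptions. The sketch reports the deepest level at which a finite prefix already witnesses rejection.

```python
def rejection_level(prefix):
    """Largest n such that the prefix already proves membership in U_n,
    where U_n = {sequences starting with n zeros} and mu(U_n) = 2**-n."""
    n = 0
    for bit in prefix:
        if bit != 0:
            break           # a 1 appears: no deeper level can contain the sequence
        n += 1
    return n

print(rejection_level([0, 0, 0, 1, 0]))  # rejected at levels 1 to 3, never beyond
print(rejection_level([1, 0, 0]))        # never rejected at any level n >= 1
print(rejection_level([0] * 20))         # still rejected at every level seen so far
```

Only the all-zero sequence is rejected at every level, so it is the only sequence that fails this test.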
The martingale characterization conveys the intuition that no effective procedure should be able to make money betting against a random sequence. A martingale d is a betting strategy. d reads a finite string w and bets money on the next bit. It bets some fraction of its money that the next bit will be 0, and the remainder of its money that the next bit will be 1. d doubles the money it placed on the bit that actually occurred, and it loses the rest. d(w) is the amount of money it has after seeing the string w. Since the bet placed after seeing the string w can be calculated from the values d(w), d(w0), and d(w1), calculating the amount of money it has is equivalent to calculating the bet. The martingale characterization says that no betting strategy implementable by any computer (even in the weak sense of constructive strategies, which are not necessarily computable) can make money betting on a random sequence.
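The betting interpretation can be made concrete with exact arithmetic. The sketch below is illustrative: `capital` and `mostly_zero` are hypothetical names, the initial capital is assumed to be 1, and the strategy shown is a simple computable one rather than a general constructive martingale.

```python
from fractions import Fraction

def capital(w, stake_on_zero):
    """Capital d(w) after betting along w, starting from d(empty string) = 1.

    stake_on_zero(prefix) is the fraction of current capital placed on 0;
    the rest is placed on 1. The stake on the bit that occurs is doubled.
    """
    d = Fraction(1)
    for k, bit in enumerate(w):
        q = stake_on_zero(w[:k])
        d *= 2 * q if bit == 0 else 2 * (1 - q)
    return d

def mostly_zero(prefix):
    """Hypothetical strategy: always put 3/4 of the capital on 0."""
    return Fraction(3, 4)

def d(w):
    return capital(w, mostly_zero)

w = [0, 1, 0]
assert d(w) == (d(w + [0]) + d(w + [1])) / 2   # fairness condition holds
print(d([0] * 10))      # capital grows along the all-zero sequence
print(d([0, 1] * 5))    # capital shrinks along 0101...
```

Along the all-zero sequence the capital is multiplied by 3/2 at every step, so this martingale succeeds on it. Along 0101... it is multiplied by 3/4 every two steps and tends to 0.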
There is a universal constructive martingale $\mathbf{d}$. This martingale is universal in the sense that, given any constructive martingale $d$, if $d$ succeeds on a sequence, then $\mathbf{d}$ succeeds on that sequence as well. Thus, $\mathbf{d}$ succeeds on every sequence in RAND^c, the complement of RAND (but, since $\mathbf{d}$ is constructive, it succeeds on no sequence in RAND). (Schnorr 1971)
There is a constructive null cover of RAND^c. This means that all effective tests for randomness (that is, constructive null covers) are, in a sense, subsumed by this universal test for randomness, since any sequence that passes this single test for randomness will pass all tests for randomness. (Martin-Löf 1966) Intuitively, this universal test for randomness says "if the sequence has increasingly long prefixes that can be increasingly well-compressed on this universal Turing machine, then it is not random"; see the next section.
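Construction sketch: Enumerate the effective null covers as $((U_{m,n})_n)_m$. The enumeration is itself effective. A universal effective null cover is then obtained by diagonalization: $(\bigcup_n U_{n,n+k+1})_k$.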
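The measure bound behind the diagonal construction can be checked directly. The worked estimate below is a sketch under stated assumptions: $U_{m,n}$ denotes level $n$ of the $m$-th effective null cover, with $\mu(U_{m,n}) \leq 2^{-n}$ and covers indexed from $m = 0$, and $V_k$ is a name introduced here for the $k$-th level of the diagonal cover.

```latex
V_k = \bigcup_{n \geq 0} U_{n,\,n+k+1},
\qquad
\mu(V_k) \;\leq\; \sum_{n \geq 0} \mu\bigl(U_{n,\,n+k+1}\bigr)
\;\leq\; \sum_{n \geq 0} 2^{-(n+k+1)} \;=\; 2^{-k}.
```

The levels are nested, $V_{k+1} \subseteq V_k$, because $U_{n,n+k+2} \subseteq U_{n,n+k+1}$ for every $n$. Any sequence lying in every level of some cover $m$ lies in $U_{m,m+k+1} \subseteq V_k$ for every $k$, so it is caught by the diagonal cover as well.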
If a sequence fails an algorithmic randomness test, then it is algorithmically compressible. Conversely, if it is algorithmically compressible, then it fails an algorithmic randomness test.
Construction sketch: Suppose the sequence fails a randomness test. Then it can be compressed by lexicographically enumerating all sequences that fail the test and coding the position of the sequence in that list. This is called "enumerative source encoding".[9]
Conversely, if the sequence is compressible, then by the pigeonhole principle, only a vanishingly small fraction of sequences are like that, so we can define a new test for randomness by "has a compression by this universal Turing machine". Incidentally, this is the universal test for randomness.
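For example, consider a binary sequence sampled IID from the Bernoulli distribution. After taking a large number $N$ of samples, we should have about $M \approx pN$ ones. We can code for this sequence as "Generate all binary sequences with length $N$ and $M$ ones; of those, take the $i$-th sequence in lexicographic order."
By the Stirling approximation, $\log_2\binom{N}{pN} \approx NH(p)$, where $H$ is the binary entropy function. The description therefore consists of prefix codes for the numbers $N$ and $M$, which take $O(\log N)$ bits, and a prefix code for the index $i$, which takes about $NH(p)$ bits.
When $N$ is large, this description has just $\sim H(p)N$ bits, so the sequence is compressible, with compression ratio $\sim H(p)$. In particular, the compression ratio is one (incompressible) only when $p = 1/2$.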
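The enumerative code described above can be sketched as follows. The function names are hypothetical, and the code handles only the index $i$; the prefix codes for $N$ and $M$ are left out. It ranks a string among all strings with the same length and number of ones, inverts the ranking, and compares the index length with $NH(p)$.

```python
from math import ceil, comb, log2

def rank(bits):
    """Lexicographic index of bits among strings with the same length and number of ones."""
    n, m, r = len(bits), sum(bits), 0
    for pos, b in enumerate(bits):
        if b == 1:
            # Strings that agree so far but have 0 here come first; they must
            # fit all m remaining ones into the n - pos - 1 later positions.
            r += comb(n - pos - 1, m)
            m -= 1
    return r

def unrank(n, m, r):
    """Inverse of rank: the r-th string of length n with m ones."""
    bits = []
    for pos in range(n):
        c = comb(n - pos - 1, m)   # number of candidates with 0 at this position
        if r < c:
            bits.append(0)
        else:
            bits.append(1)
            r -= c
            m -= 1
    return bits

def binary_entropy(p):
    """Binary entropy H(p) in bits, for 0 < p < 1."""
    return -p * log2(p) - (1 - p) * log2(1 - p)

N, M = 1000, 100
index_bits = ceil(log2(comb(N, M)))
print(index_bits, N * binary_entropy(M / N))   # index length versus N*H(p)
```

With $p = 0.1$ the index needs fewer than half of the $N$ bits, whereas for $M = N/2$ the count $\log_2\binom{N}{M}$ is within a few bits of $N$, matching the claim that only $p = 1/2$ is incompressible.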
See main article: Impossibility of a gambling system. Consider a casino offering fair odds at a roulette table. The roulette table generates a sequence of random numbers. If this sequence is algorithmically random, then there is no lower semi-computable strategy to win, which in turn implies that there is no computable strategy to win. That is, for any gambling algorithm, the long-term log-payoff is zero (neither positive nor negative). Conversely, if this sequence is not algorithmically random, then there is a lower semi-computable strategy to win.
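The class RAND is a $\Sigma^0_2$ subset of Cantor space, where $\Sigma^0_2$ refers to the second level of the arithmetical hierarchy: a sequence S is in RAND if and only if some open set in the universal effective null cover does not contain S, and this property is definable by a $\Sigma^0_2$ formula.
There is a random sequence that is $\Delta^0_2$, that is, computable relative to an oracle for the halting problem; Chaitin's Ω is an example.
No random sequence is decidable, computably enumerable, or co-computably-enumerable. Since these correspond to the $\Delta^0_1$, $\Sigma^0_1$, and $\Pi^0_1$ levels of the arithmetical hierarchy, $\Delta^0_2$ is the lowest level of the arithmetical hierarchy where random sequences can be found.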
As each of the equivalent definitions of a Martin-Löf random sequence is based on what is computable by some Turing machine, one can naturally ask what is computable by a Turing oracle machine. For a fixed oracle A, a sequence B that satisfies the equivalent definitions with computability taken relative to A (e.g., no martingale which is constructive relative to the oracle A succeeds on B) is said to be random relative to A. Two sequences, while themselves random, may contain very similar information, and therefore neither will be random relative to the other. Any time there is a Turing reduction from one sequence to another, the second sequence cannot be random relative to the first, just as computable sequences are themselves nonrandom; in particular, this means that Chaitin's Ω is not random relative to the halting problem.
An important result relating to relative randomness is van Lambalgen's theorem, which states that if C is the sequence composed from A and B by interleaving the first bit of A, the first bit of B, the second bit of A, the second bit of B, and so on, then C is algorithmically random if and only if A is algorithmically random, and B is algorithmically random relative to A. A closely related consequence is that if A and B are both random themselves, then A is random relative to B if and only if B is random relative to A.
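The interleaving used in van Lambalgen's theorem is easy to state in code. The sketch below uses hypothetical names and works on finite prefixes only, since the theorem itself concerns infinite sequences.

```python
def join(a, b):
    """Interleave equal-length prefixes: a0 b0 a1 b1 ..."""
    return [bit for pair in zip(a, b) for bit in pair]

def split(c):
    """Recover the even-position and odd-position subsequences of c."""
    return c[0::2], c[1::2]

a = [1, 1, 0, 1]
b = [0, 0, 1, 0]
c = join(a, b)
print(c)
print(split(c) == (a, b))   # the two halves are recovered exactly
```

Because `split` inverts `join`, the interleaved sequence carries exactly the information of the pair, which is the setting in which the theorem relates the randomness of C to that of A and B.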
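Relative randomness gives us the first notion that is stronger than Martin-Löf randomness, namely randomness relative to some fixed oracle A. For any oracle, this is at least as strong, and for most oracles it is strictly stronger, since there will be Martin-Löf random sequences that are not random relative to the oracle A. Important oracles often considered are the halting problem, $\emptyset'$, and the nth jump oracle, $\emptyset^{(n)}$, as these oracles are able to answer specific questions that arise naturally. A sequence that is random relative to the oracle $\emptyset^{(n-1)}$ is called n-random; a sequence is therefore 1-random if and only if it is Martin-Löf random. The n-random sequences sometimes arise when considering more complicated properties. For example, there are only countably many $\Delta^0_2$ sets, so one might think that these should be non-random. However, the halting probability Ω is $\Delta^0_2$ and 1-random; only from 2-randomness onward is it impossible for a random set to be $\Delta^0_2$.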
Additionally, there are several notions of randomness that are weaker than Martin-Löf randomness. Some of these are weak 1-randomness, Schnorr randomness, computable randomness, and partial computable randomness. Yongge Wang showed[11] that Schnorr randomness is different from computable randomness. Kolmogorov–Loveland randomness is known to be no stronger than Martin-Löf randomness, but it is not known whether it is actually weaker.
At the opposite end of the randomness spectrum there is the notion of a K-trivial set. These sets are anti-random in that every initial segment is logarithmically compressible (i.e., $K(w) \leq K(|w|) + b$ for each initial segment $w$ and some constant $b$), yet they need not be computable.
Wald, A. (1937). Die Widerspruchsfreiheit des Kollektivbegriffes der Wahrscheinlichkeitsrechnung. Ergebnisse eines Mathematischen Kolloquiums, 8, 38–72.