In combinatorial mathematics, a de Bruijn sequence of order n on a size-k alphabet A is a cyclic sequence in which every possible length-n string on A occurs exactly once as a substring (i.e., as a contiguous subsequence). Such a sequence is denoted by and has length, which is also the number of distinct strings of length n on A. Each of these distinct strings, when taken as a substring of, must start at a different position, because substrings starting at the same position are not distinct. Therefore, must have at least symbols. And since has exactly symbols, de Bruijn sequences are optimally short with respect to the property of containing every string of length n at least once.
The number of distinct de Bruijn sequences is
kn-1 | |
\dfrac{\left(k!\right) |
The sequences are named after the Dutch mathematician Nicolaas Govert de Bruijn, who wrote about them in 1946. As he later wrote, the existence of de Bruijn sequences for each order together with the above properties were first proved, for the case of alphabets with two elements, by . The generalization to larger alphabets is due to . Automata for recognizing these sequences are denoted as de Bruijn automata.
In most applications, A = .
The earliest known example of a de Bruijn sequence comes from Sanskrit prosody where, since the work of Pingala, each possible three-syllable pattern of long and short syllables is given a name, such as 'y' for short–long–long and 'm' for long–long–long. To remember these names, the mnemonic yamātārājabhānasalagām is used, in which each three-syllable pattern occurs starting at its name: 'yamātā' has a short–long–long pattern, 'mātārā' has a long–long–long pattern, and so on, until 'salagām' which has a short–short–long pattern. This mnemonic, equivalent to a de Bruijn sequence on binary 3-tuples, is of unknown antiquity, but is at least as old as Charles Philip Brown's 1869 book on Sanskrit prosody that mentions it and considers it "an ancient line, written by Pāṇini".[1]
In 1894, A. de Rivière raised the question in an issue of the French problem journal L'Intermédiaire des Mathématiciens, of the existence of a circular arrangement of zeroes and ones of size
2n
2n
n
2n-1-n | |
2 |
2n-1-n | |
2 |
Karl Popper independently describes these objects in his The Logic of Scientific Discovery (1934), calling them "shortest random-like sequences".
The de Bruijn sequences can be constructed by taking a Hamiltonian path of an n-dimensional de Bruijn graph over k symbols (or equivalently, an Eulerian cycle of an (n - 1)-dimensional de Bruijn graph).
An alternative construction involves concatenating together, in lexicographic order, all the Lyndon words whose length divides n.[2]
An inverse Burrows–Wheeler transform can be used to generate the required Lyndon words in lexicographic order.
de Bruijn sequences can also be constructed using shift registers or via finite fields.
Goal: to construct a B(2, 4) de Bruijn sequence of length 24 = 16 using Eulerian (n − 1 = 4 − 1 = 3) 3-D de Bruijn graph cycle.
Each edge in this 3-dimensional de Bruijn graph corresponds to a sequence of four digits: the three digits that label the vertex that the edge is leaving followed by the one that labels the edge. If one traverses the edge labeled 1 from 000, one arrives at 001, thereby indicating the presence of the subsequence 0001 in the de Bruijn sequence. To traverse each edge exactly once is to use each of the 16 four-digit sequences exactly once.
For example, suppose we follow the following Eulerian path through these vertices:
000, 000, 001, 011, 111, 111, 110, 101, 011,
110, 100, 001, 010, 101, 010, 100, 000.
These are the output sequences of length k:
0 0 0 0
_ 0 0 0 1
_ _ 0 0 1 1
This corresponds to the following de Bruijn sequence:
0 0 0 0 1 1 1 1 0 1 1 0 0 1 0 1
The eight vertices appear in the sequence in the following way:
1 1 1 1 0 1 1 0 0 1 0 1 0 1 1 1 0 1 1 0 0 1 0 1 0 0 1 1 0 1 1 0 0 1 0 1 0 0 0 1 0 1 1 0 0 1 0 1 0 0 0 0 0 1 1 0 0 1 0 1 0 0 0 0 1 1 1 0 0 1 0 1 0 0 0 0 1 1 1 0 0 1 0 1 0 0 0 0 1 1 1 0 0 1 0 1 0 0 0 0 1 1 1 1 0 1 0 1 0 0 0 0 1 1 1 1 0 1 0 1 0 0 0 0 1 1 1 1 0 1 0 1 0 0 0 0 1 1 1 1 0 1 1 1 0 0 0 0 1 1 1 1 0 1 1 0 0} 0 0 0 1 1 1 1 0 1 1 0 0 0 0 1 1 1 1 0 1 1 0 0 1 0 1 1 1 1 0 1 1 0 0 1 0