In mathematics, in the areas of combinatorics and computer science, a Lyndon word is a nonempty string that is strictly smaller in lexicographic order than all of its rotations. Lyndon words are named after mathematician Roger Lyndon, who investigated them in 1954, calling them standard lexicographic sequences.[1] Anatoly Shirshov introduced Lyndon words in 1953 calling them regular words.[2] Lyndon words are a special case of Hall words; almost all properties of Lyndon words are shared by Hall words.
Several equivalent definitions exist.
A
k
n>0
n
k
Alternately, a word
w
w<v
v
w=uv
u
Another characterisation is the following: A Lyndon word has the property that it is nonempty and, whenever it is split into two nonempty substrings, the left substring is always lexicographically less than the right substring. That is, if
w
w=uv
u
v
u<v
w
\ge2
u
v
u<v
w=uv
u
v
v
The Lyndon words over the two-symbol binary alphabet, sorted by length and then lexicographically within each length class, form an infinite sequence that begins
0, 1, 01, 001, 011, 0001, 0011, 0111, 00001, 00011, 00101, 00111, 01011, 01111, ...The first string that does not belong to this sequence, "00", is omitted because it is periodic (it consists of two repetitions of the substring "0"); the second omitted string, "10", is aperiodic but is not minimal in its permutation class as it can be cyclically permuted to the smaller string "01".
The empty string also meets the definition of a Lyndon word of length zero. The numbers of binary Lyndon words of each length, starting with length zero, form the integer sequence
1, 2, 1, 2, 3, 6, 9, 18, 30, 56, 99, 186, 335, ... Lyndon words correspond to aperiodic necklace class representatives and can thus be counted with Moreau's necklace-counting function.[3] [5]
provides an efficient algorithm for listing the Lyndon words of length at most
n
s
w
w
w
x
n
x
x
w=00111
x=00111 00\cancel{111}
x=00111 01
The worst-case time to generate the successor of a word
w
O(n)
n
x
w
w
w
w
n
n
n
n
According to the Chen–Fox–Lyndon theorem, every string may be formed in a unique way by concatenating a sequence of Lyndon words, in such a way that the words in the sequence are nonincreasing lexicographically.[7] The final Lyndon word in this sequence is the lexicographically smallest suffix of the given string.[8] A factorization into a nonincreasing sequence of Lyndon words (the so-called Lyndon factorization) can be constructed in linear time.[8] Lyndon factorizations may be used as part of a bijective variant of the Burrows–Wheeler transform for data compression,[9] and in algorithms for digital geometry.[10]
Such factorizations can be written (uniquely) as finite binary trees, with the leaves labelled by the alphabet, with each rightward branch given by the final Lyndon word in the sequence.Such trees are sometimes called standard bracketings and can be taken as factorization of the elements of a free group or as the basis elements for a free Lie algebra. These trees are a special case of Hall trees (as Lyndon words are a special case of Hall words), and so likewise, the Hall words provide a standard order, called the commutator collecting process for groups, and basis for Lie algebras.[11] Indeed, they provide an explicit construction for the commutators appearing in the Poincaré–Birkhoff–Witt theorem needed for the construction of universal enveloping algebras.
Every Lyndon word can be understood as a permutation, the suffix standard permutation.
developed an algorithm for finding the standard factorization that runs in linear time and constant space. It iterates over a string trying to find as long a Lyndon word as possible. When it finds one, it adds it to the result list and proceeds to search the remaining part of the string. The resulting list of strings is the standard factorization of the given string. More formal description of the algorithm follows.
Given a string S of length N, one should proceed with the following steps:
If one concatenates together, in lexicographic order, all the Lyndon words that have length dividing a given number n, the result is a de Bruijn sequence, a circular sequence of symbols such that each possible length-n sequence appears exactly once as one of its contiguous subsequences. For example, the concatenation of the binary Lyndon words whose length divides four is
0 0001 0011 01 0111 1This construction, together with the efficient generation of Lyndon words, provides an efficient method for constructing a particular de Bruijn sequence in linear time and logarithmic space.[12]
Lyndon words have an application to the description of free Lie algebras, in constructing a basis for the homogeneous part of a given degree; this was Lyndon's original motivation for introducing these words.[13] Lyndon words may be understood as a special case of Hall sets.[13]
For prime p, the number of irreducible monic polynomials of degree d over the field
Fp
A theorem of Radford states that a shuffle algebra over a field of characteristic 0 can be viewed as a polynomial algebra over the Lyndon words. More precisely, let A be an alphabet, let k be a field of characteristic 0 (or, more general, a commutative -algebra), and let R be the free noncommutative k-algebra . The words over A can then be identified with the "noncommutative monomials" (i.e., products of the xa) in R; namely, we identify a word (a1,a2,...,an) with the monomial xa1xa2...xan. Thus, the words over A form a k-vector space basis of R. Then, a shuffle product is defined on R; this is a k-bilinear, associative and commutative product, which is denoted by ш, and which on the words can be recursively defined by
1 ш v = v for any word v;
u ш 1 = u for any word u;
ua ш vb = (u ш vb)a + (ua ш v)b for any and any words u and v.The shuffle algebra on the alphabet A is defined to be the additive group R endowed with ш as multiplication. Radford's theorem now states that the Lyndon words are algebraically independent elements of this shuffle algebra, and generate it; thus, the shuffle algebra is isomorphic to a polynomial ring over k, with the indeterminates corresponding to the Lyndon words.