Complexity function explained

In computer science, the complexity function of a word or string (a finite or infinite sequence of symbols from some alphabet) is the function that counts the number of distinct factors (substrings of consecutive symbols) of that string. More generally, the complexity function of a formal language (a set of finite strings) counts the number of distinct words of given length.

Complexity function of a word

Let u be a (possibly infinite) sequence of symbols from an alphabet. Define the functionp_u(n) of a positive integer n to be the number of different factors (consecutive substrings) of length n from the string u.^[1] ^[2] ^[3] ^[4]

For a string u of length at least n over an alphabet of size k we clearly have

1\lep_u(n)\lekⁿ ,

the bounds being achieved by the constant word and a disjunctive word,^[5] for example, the Champernowne word respectively.^[6] For infinite words u, we have p_u(n) bounded if u is ultimately periodic (a finite, possibly empty, sequence followed by a finite cycle). Conversely, if p_u(n) ≤ n for some n, then u is ultimately periodic.^[2] ^[7]

An aperiodic sequence is one which is not ultimately periodic. An aperiodic sequence has strictly increasing complexity function (this is the Morse–Hedlund theorem),^[8] so p(n) is at least n+1.^[9]

A set S of finite binary words is balanced if for each n the subset S_n of words of length n has the property that the Hamming weight of the words in S_n takes at most two distinct values. A balanced sequence is one for which the set of factors is balanced.^[10] A balanced sequence has complexity function at most n+1.^[11]

A Sturmian word over a binary alphabet is one with complexity function n + 1.^[12] A sequence is Sturmian if and only if it is balanced and aperiodic.^[13] ^[14] An example is the Fibonacci word.^[12] ^[15] More generally, a Sturmian word over an alphabet of size k is one with complexity n+k−1. An Arnoux-Rauzy word over a ternary alphabet has complexity 2n + 1:^[12] an example is the Tribonacci word.^[16]

For recurrent words, those in which each factor appears infinitely often, the complexity function almost characterises the set of factors: if s is a recurrent word with the same complexity function as t are then s has the same set of factors as t or δt where δ denotes the letter doubling morphism a → aa.^[17]

Complexity function of a language

Let L be a language over an alphabet and define the function p_L(n) of a positive integer n to be the number of different words of length n in L^[18] The complexity function of a word is thus the complexity function of the language consisting of the factors of that word.

a(bb)^*a

takes values 3 and 4 on odd and even n≥2 respectively. There is an analogue of the Morse–Hedlund theorem: if the complexity of L satisfies p_L(n) ≤ n for some n, then p_L is bounded and there is a finite language F such that^[18]

L\subseteq\{xy^kz:x,y,z\inF, k\inN\} .

A polynomial or sparse language is one for which the complexity function p(n) is bounded by a fixed power of n. A regular language which is not polynomial is exponential: there are infinitely many n for which p(n) is greater than kⁿ for some fixed k > 1.^[19]

Related concepts

The topological entropy of an infinite sequence u is defined by

H_top(u)=\lim_n

	logp_u(n)
	nlogk

The limit exists as the logarithm of the complexity function is subadditive.^[20] ^[21] Every real number between 0 and 1 occurs as the topological entropy of some sequence is applicable,^[22] which may be taken to be uniformly recurrent^[23] or even uniquely ergodic.^[24]

For x a real number and b an integer ≥ 2 then the complexity function of x in base b is the complexity function p(x,b,n) of the sequence of digits of x written in base b.If x is an irrational number then p(x,b,n) ≥ n+1; if x is rational then p(x,b,n) ≤ C for some constant C depending on x and b.^[25] It is conjectured that for algebraic irrational x the complexity is bⁿ (which would follow if all such numbers were normal) but all that is known in this case is that p grows faster than any linear function of n.^[26]

The abelian complexity function p^ab(n) similarly counts the number of occurrences of distinct factors of given length n, where now we identify factors that differ only by a permutation of the positions. Clearly p^ab(n) ≤ p(n). The abelian complexity of a Sturmian sequence satisfies p^ab(n) = 2.^[27]

References

Book: Allouche . Jean-Paul . Shallit . Jeffrey . Jeffrey Shallit . 978-0-521-82332-6 . . Automatic Sequences: Theory, Applications, Generalizations . 2003 . 1086.11015 .
Book: Berstel . Jean . Lauve . Aaron . Reutenauer . Christophe . Saliola . Franco V. . Combinatorics on words. Christoffel words and repetitions in words . CRM Monograph Series . 27 . Providence, RI . . 2009 . 978-0-8218-4480-9 . 1161.68043 .
Book: Berthé . Valérie. Valérie Berthé . Rigo . Michel . Combinatorics, automata, and number theory . Encyclopedia of Mathematics and its Applications . 135 . Cambridge . . 2010 . 978-0-521-51597-9 . 1197.68006 .
Book: Bugeaud, Yann . Distribution modulo one and Diophantine approximation . Cambridge Tracts in Mathematics . 193 . Cambridge . . 2012 . 978-0-521-11169-0 . 1260.11001.
Book: Cassaigne . Julien . Nicolas . François . Factor complexity . 163–247 . Berthé . Valérie. Valérie Berthé . Rigo . Michel . Combinatorics, automata, and number theory . Encyclopedia of Mathematics and its Applications . 135 . Cambridge . . 2010 . 978-0-521-51597-9 . 1216.68204 .
Book: Lothaire, M. . M. Lothaire

. M. Lothaire . Algebraic combinatorics on words . With preface by Jean Berstel and Dominique Perrin . Reprint of the 2002 hardback . Encyclopedia of Mathematics and Its Applications . 90. Cambridge University Press . 2011 . 978-0-521-18071-9 . 1221.68183 .

Book: Pytheas Fogg, N. . Berthé, Valérie. Valérie Berthé. Ferenczi, Sébastien. Mauduit, Christian. Siegel, A. . Substitutions in dynamics, arithmetics and combinatorics . Lecture Notes in Mathematics . 1794 . Berlin . . 2002 . 3-540-44141-7 . 1014.11015 .

Notes and References

Lothaire (2011) p.7
Pytheas Fogg (2002) p.3
Berstel et al (2009) p.82
Allouche & Shallit (2003) p.298
Bugeaud (2012) p.91
Cassaigne & Nicolas (2010) p.165
Allouche & Shallit (2003) p.302
Cassaigne & Nicolas (2010) p.166
Lothaire (2011) p.22
Allouche & Shallit (2003) p.313
Lothaire (2011) p.48
Pytheas Fogg (2002) p.6
Lothaire (2011) p.46
Allouche & Shallit (2003) p.318
Information Processing Letters . 1995 . 307–312 . 10.1016/0020-0190(95)00067-M . A division property of the Fibonacci word . Aldo . de Luca . 54 . 6 .
Pytheas Fogg (2002) p.368
Berstel et al (2009) p.84
Berthé & Rigo (2010) p.166
Berthé & Rigo (2010) p.136
Pytheas Fogg (2002) p.4
Allouche & Shallit (2003) p.303
Cassaigne & Nicolas (2010) p.169
Berthé & Rigo (2010) p.391
Berthé & Rigo (2010) p.169
Bugeaud (2012) p.91
Berthé & Rigo (2010) p.414
Book: On the Asymptotic Abelian Complexity of Morphic Words . Developments in Language Theory. Proceedings, 17th International Conference, DLT 2013, Marne-la-Vallée, France, June 18-21, 2013 . 94–105 . 2013 . 10.1007/978-3-642-38771-5_10 . 978-3-642-38770-8 . Lecture Notes in Computer Science . 7907 . 0302-9743 . . Berlin, Heidelberg . Marie-Pierre . Béal . Olivier . Carton . Francine . Blanchet-Sadri . Nathan . Fox .