Cobham's theorem should not be confused with Cobham's thesis.
Cobham's theorem is a theorem in combinatorics on words that has important connections with number theory, notably transcendental numbers, and automata theory. Informally, the theorem gives the condition for the members of a set S of natural numbers written in bases b1 and base b2 to be recognised by finite automata. Specifically, consider bases b1 and b2 such that they are not powers of the same integer. Cobham's theorem states that S written in bases b1 and b2 is recognised by finite automata if and only if S differs by a finite set from a finite union of arithmetic progressions. The theorem was proved by Alan Cobham in 1969[1] and has since given rise to many extensions and generalisations.[2] [3]
Let
n>0
n0n1 … nh
n=n0+n1b+ … +n
h | |
hb |
where
0\len0,n1,\ldots,nh<b
nh>0
n0n1 … nh
\langlen\rangleb
nb
A set of natural numbers S is recognisable in base or more simply -recognisable or -automatic if the set
\{nb\midn\inS\}
b
\{0,1,\ldots,b-1\}
Two positive integers
k
\ell
p
q
kp=\ellq
84=163
More equivalent statements of the theorem have been given. The original version by Cobham is the following:[1] Another way to state the theorem is by using automatic sequences. Cobham himself calls them "uniform tag sequences.".[4] The following form is found in Allouche and Shallit's book:[5] We can show that the characteristic sequence of a set of natural numbers S recognisable by finite automata in base k is a k-automatic sequence and that conversely, for all k-automatic sequences
u
0\lei<k
Si
s
us=i
k
Cobham's theorem can be formulated in first-order logic using a theorem proven by Büchi in 1960.[6] This formulation in logic allows for extensions and generalisations. The logical expression uses the theory[7]
\langleN,+,Vr\rangle
of natural integers equipped with addition and the function
Vr
Vr(0)=1
m | |
V | |
r(n)=r |
rm
r
V2(20)=4
V3(20)=1
A set of integers
S
\langleN,+,Vr\rangle
Vr
Examples:
Vr
(\existsy)(x=y+y+1)
\{2n\midn\ge0\}
V2(x)=x
\langleN,+,Vk\rangle
\langleN,+,V\ell\rangle
An automatic sequence is a particular morphic word, whose morphism is uniform, meaning that the length of the images generated by the morphism for each letter of its input alphabet is the same. A set of integers is hence k-recognisable if and only if its characteristic sequence is generated by a uniform morphism followed by a coding, where a coding is a morphism that maps each letter of the input alphabet to a letter of the output alphabet. For example, the characteristic sequence of the powers of 2 is produced by the 2-uniform morphism (meaning each letter is mapped to a word of length 2) over the alphabet
B=\{a,0,1\}
a\mapstoa1 , 1\mapsto10 , 0\mapsto00
which generates the infinite word
a11010001 …
followed by the coding (that is, letter to letter) that maps
a
0
0
1
011010001 …
The notion has been extended as follows:[8] a morphic word
s
\alpha
\alpha
s=\pi(f\omega(b))
where the morphism
f:B*\toB*
B
f\omega(b)
\alpha>1
f
M(f)=(mx,y)x\in
mx,y
x
f(y)
A set S of natural numbers is
\alpha
s
\alpha
z>1
\{z'\in\Complex,|z'|<z\}
We then have the following statement:[8]
The logic equivalent permits to consider more general situations: the automatic sequences over the natural numbers
\N
\Z
\Nm
\R
\Rm
\Z
We code the base
k
0
k-1
k
-6=-8+2
1010
010*
110*
11000
-16+8=-8
\Nm
A subset
X
Nm
k
X
m
For example, in base 2, we have
3=112
9=10012
\begin{pmatrix}3\\9\end{pmatrix}
\begin{pmatrix}0011\\1001\end{pmatrix}=\begin{pmatrix}0\\1\end{pmatrix}\begin{pmatrix}0\\0\end{pmatrix}\begin{pmatrix}1\\0\end{pmatrix}\begin{pmatrix}1\\1\end{pmatrix}
m
Other extensions have been given to the real numbers and vectors of real numbers.
Samuel Eilenberg announced the theorem without proof in his book;[10] he says "The proof is correct, long, and hard. It is a challenge to find a more reasonable proof of this fine theorem." Georges Hansel proposed a more simple proof, published in the not-easily accessible proceedings of a conference.[11] The proof of Dominique Perrin[12] and that of Allouche and Shallit's book[13] contains the same error in one of the lemmas, mentioned in the list of errata of the book.[14] This error was uncovered in a note by Tomi Kärki,[15] and corrected by Michel Rigo and Laurent Waxweiler.[16] This part of the proof has been recently written.[17]
In January 2018, Thijmen J. P. Krebs announced, on Arxiv, a simplified proof of the original theorem, based on Dirichlet's approximation criterion instead of that of Kronecker; the article appeared in 2021.[18] The employed method has been refined and used by Mol, Rampersad, Shallit and Stipulanti.[19]
fr:Jean-Paul Allouche
. Automatic Sequences: theory, applications, generalizations. Shallit. Jeffrey. Jeffrey Shallit. Cambridge University Press. 2003. 0-521-82332-3. Cambridge.fr:Jean-Paul Allouche
. Automatic Sequences: theory, applications, generalizations. Shallit. Jeffrey. Jeffrey Shallit. Cambridge University Press. 2003. 0-521-82332-3. Cambridge. 350.fr:Jean-Paul Allouche
. Automatic Sequences: theory, applications, generalizations. Shallit. Jeffrey. Jeffrey Shallit. Cambridge University Press. 2003. 0-521-82332-3. Cambridge.