In formal language theory, the Chomsky - Schützenberger enumeration theorem is a theorem derived by Noam Chomsky and Marcel-Paul Schützenberger about the number of words of a given length generated by an unambiguous context-free grammar. The theorem provides an unexpected link between the theory of formal languages and abstract algebra.
In order to state the theorem, a few notions from algebra and formal language theory are needed.
Let
N
N
f=f(x)=
infty | |
\sum | |
k=0 |
akxk=a0+a1x1+a2x2+a3x3+ …
ak
N
f
g
an
bn
f(x) ⋅ g(x)=
infty | |
\sum | |
k=0 |
k | |
\left(\sum | |
i=0 |
aibk-i\right)xk.
In particular, we write
f2=f(x) ⋅ f(x)
f3=f(x) ⋅ f(x) ⋅ f(x)
f(x)
Q(x)
p0(x),p1(x),p2(x),\ldots,pn(x)
p0(x)+p1(x) ⋅ f+p2(x) ⋅ f2+ … +pn(x) ⋅ fn=0.
A context-free grammar is said to be unambiguous if every string generated by the grammar admits a unique parse tree or, equivalently, only one leftmost derivation.Having established the necessary notions, the theorem is stated as follows.
Chomsky - Schützenberger theorem. If
L
ak:=|L \cap\Sigmak|
k
L
infty | |
G(x)=\sum | |
k=0 |
akxk
N
Q(x)
Proofs of this theorem are given by, and by .
The theorem can be used in analytic combinatorics to estimate the number of words of length n generated by a given unambiguous context-free grammar, as n grows large. The following example is given by : the unambiguous context-free grammar G over the alphabet has start symbol S and the following rules
S → M | U
M → 0M1M | ε
U → 0S | 0M1U.
To obtain an algebraic representation of the power series associated with a given context-free grammar G, one transforms the grammar into a system of equations. This is achieved by replacing each occurrence of a terminal symbol by x, each occurrence of ε by the integer '1', each occurrence of '→' by '=', and each occurrence of '|' by '+', respectively. The operation of concatenation at the right-hand-side of each rule corresponds to the multiplication operation in the equations thus obtained. This yields the following system of equations:
S = M + U
M = M²x² + 1
U = Sx + MUx²
In this system of equations, S, M, and U are functions of x, so one could also write,, and . The equation system can be resolved after S, resulting in a single algebraic equation:
.
This quadratic equation has two solutions for S, one of which is the algebraic power series . By applying methods from complex analysis to this equation, the number
an
an\inO(2+\epsilon)n
an\notinO(2-\epsilon)n
\epsilon>0
The following example is from :