Chomsky–Schützenberger representation theorem explained

In formal language theory, the Chomsky–Schützenberger representation theorem is a theorem derived by Noam Chomsky and Marcel-Paul Schützenberger in 1959 about representing a given context-free language in terms of two simpler languages. These two simpler languages, namely a regular language and a Dyck language, are combined by means of an intersection and a homomorphism.

Proofs of this theorem are found in several textbooks, e.g. Autebert, Berstel & Boasson (1997) or Davis, Sigal & Weyuker (1994).

Mathematics

Notation

A few notions from formal language theory are in order.

A context-free language is regular if it can be described by a regular expression or, equivalently, if it is accepted by a finite automaton.

A homomorphism is based on a function $h$ which maps symbols from an alphabet $\Gamma$ to words over another alphabet $\Sigma$. If the domain of this function is extended to words over $\Gamma$ in the natural way, by letting $h(xy)=h(x)h(y)$ for all words $x$ and $y$, this yields a homomorphism $h:\Gamma^*\to\Sigma^*$.
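The extension from symbols to words can be illustrated with a minimal Python sketch. The helper name `extend_to_words` and the example symbol map are assumptions made for illustration only; they do not come from the source.

```python
def extend_to_words(symbol_map):
    """Extend a symbol-to-word map to a function on whole words.

    `symbol_map` sends each symbol of Gamma to a word over Sigma.
    """
    def h(word):
        # h(xy) = h(x) h(y): map each symbol and concatenate the images.
        return "".join(symbol_map[symbol] for symbol in word)
    return h


# Hypothetical map from Gamma = {"(", ")"} to words over Sigma = {"a", "b"}.
h = extend_to_words({"(": "a", ")": "bb"})

print(h("(())"))  # aabbbb
```

Because the image of a word is built by concatenating the images of its symbols, the empty word is mapped to the empty word, and `h(x + y) == h(x) + h(y)` holds for all strings `x` and `y` over the domain alphabet.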

A matched alphabet $T\cup\overline{T}$ is an alphabet made up of two disjoint, equal-sized sets; it is convenient to think of it as a set of parenthesis types, where $T$ contains the opening parenthesis symbols and $\overline{T}$ contains the closing parenthesis symbols. For a matched alphabet $T\cup\overline{T}$, the typed Dyck language $D_T$ is given by

$$D_T=\{\,w\in(T\cup\overline{T})^*\mid w\text{ is a correctly nested sequence of parentheses}\,\}.$$

For example, the following is a valid sentence in the 3-typed Dyck language:

([[ ] ])
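Membership in a typed Dyck language can be checked with a stack. The following is a minimal sketch: the function name `is_dyck` and the concrete three-type alphabet `PAIRS` are assumptions chosen for illustration, and the spaces in the example above are treated as typographic only, not as symbols of the alphabet.

```python
def is_dyck(word, pairs):
    """Return True if `word` is correctly nested with respect to `pairs`.

    `pairs` maps each opening symbol (in T) to its closing symbol (in T-bar).
    """
    closers = set(pairs.values())
    stack = []
    for symbol in word:
        if symbol in pairs:
            # Opening symbol: remember which closing symbol must match it.
            stack.append(pairs[symbol])
        elif symbol in closers:
            # Closing symbol: it must match the most recent unmatched opener.
            if not stack or stack.pop() != symbol:
                return False
        else:
            # Symbol outside the matched alphabet.
            return False
    # Every opener must have been closed.
    return not stack


# Hypothetical matched alphabet with three parenthesis types.
PAIRS = {"(": ")", "[": "]", "{": "}"}

print(is_dyck("([[]])", PAIRS))  # True
print(is_dyck("([)]", PAIRS))    # False
```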

Theorem

A language $L$ over the alphabet $\Sigma$ is context-free if and only if there exist a matched alphabet $T\cup\overline{T}$, a regular language $R$ over $T\cup\overline{T}$, and a homomorphism $h:(T\cup\overline{T})^*\to\Sigma^*$ such that $L=h(D_T\cap R)$.

We can interpret this as saying that any context-free language can be generated by first generating a typed Dyck language, filtering it by intersection with a regular language, and finally converting each bracket into a word over the alphabet of the context-free language.
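As a small illustration of this reading, consider the context-free language of words a^n b^n with n ≥ 0. One possible representation (an assumption chosen for this sketch, not taken from the source) uses a single parenthesis type, the regular language of all opening symbols followed by all closing symbols, and the homomorphism sending "(" to "a" and ")" to "b". The Python sketch below enumerates short words by brute force, keeps those in the intersection, and applies the homomorphism; all names are hypothetical.

```python
import re
from itertools import product


def in_dyck_one_type(word):
    """Check correct nesting for the one-type alphabet {"(", ")"}."""
    depth = 0
    for symbol in word:
        depth += 1 if symbol == "(" else -1
        if depth < 0:  # a closing symbol without a matching opener
            return False
    return depth == 0


# Regular language R: any number of "(" followed by any number of ")".
R = re.compile(r"\(*\)*")

# Homomorphism h, given on symbols and extended to words by concatenation.
H = {"(": "a", ")": "b"}


def h(word):
    return "".join(H[symbol] for symbol in word)


def represented_words(max_len):
    """Images under h of all words of length <= max_len in the intersection."""
    images = set()
    for n in range(max_len + 1):
        for letters in product("()", repeat=n):
            word = "".join(letters)
            if in_dyck_one_type(word) and R.fullmatch(word):
                images.add(h(word))
    return sorted(images, key=len)


print(represented_words(6))  # ['', 'ab', 'aabb', 'aaabbb']
```

The brute-force enumeration is exponential in `max_len` and is meant only to make the three steps (generate, filter, map) visible on a toy case.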

References

Autebert, Jean-Michel; Berstel, Jean; Boasson, Luc (1997). "Context-Free Languages and Push-Down Automata". In G. Rozenberg and A. Salomaa, eds., Handbook of Formal Languages, Vol. 1: Word, Language, Grammar (pp. 111–174). Berlin. ISBN 3-540-60420-0. http://www-igm.univ-mlv.fr/~berstel/Articles/1997CFLPDA.pdf
Davis, Martin D.; Sigal, Ron; Weyuker, Elaine J. (1994). Computability, Complexity, and Languages: Fundamentals of Theoretical Computer Science (2nd ed.). Elsevier Science. p. 306. ISBN 0-12-206382-1.