In formal language theory, the Chomsky - Schützenberger representation theorem is a theorem derived by Noam Chomsky and Marcel-Paul Schützenberger in 1959 about representing a given context-free language in terms of two simpler languages. These two simpler languages, namely a regular language and a Dyck language, are combined by means of an intersection and a homomorphism.
The theorem Proofs of this theorem are found in several textbooks, e.g. or .
A few notions from formal language theory are in order.
A context-free language is regular, if it can be described by a regular expression, or, equivalently, if it is accepted by a finite automaton.
A homomorphism is based on a function
h
\Gamma
\Sigma
\Gamma
h(xy)=h(x)h(y)
x
y
h:\Gamma*\to\Sigma*
A matched alphabet
T\cup\overlineT
T
\overlineT
T\cup\overlineT
DT
DT=\{w\in(T\cup\overlineT)*\midwisacorrectlynestedsequenceofparentheses\}.
([[ ] ])
A language L over the alphabet
\Sigma
T\cup\overlineT
R
T\cup\overlineT
h:(T\cup\overlineT)*\to\Sigma*
such that
L=h(DT\capR)
We can interpret this as saying that any CFG language can be generated by first generating a typed Dyck language, filtering it by a regular grammar, and finally converting each bracket into a word in the CFG language.