In computer science theory – particularly formal language theory – Glushkov's construction algorithm, invented by Victor Mikhailovich Glushkov, transforms a given regular expression into an equivalent nondeterministic finite automaton (NFA). Thus, it forms a bridge between regular expressions and nondeterministic finite automata: two abstract representations of the same class of formal languages.
A regular expression may be used to conveniently describe an advanced search pattern in a "find and replace"–like operation of a text processing utility. Glushkov's algorithm can be used to transform it into an NFA, which furthermore is small by nature, as the number of its states equals the number of symbols of the regular expression, plus one.Subsequently, the NFA can be made deterministic by the powerset construction and then be minimized to get an optimal automaton corresponding to the given regular expression. The latter format is best suited for execution on a computer.
From another, more theoretical point of view, Glushkov's algorithm is a part of the proof that NFA and regular expressions both accept exactly the same languages; that is, the regular languages. The converse of Glushkov's algorithm is Kleene's algorithm, which transforms a finite automaton into a regular expression. The automaton obtained by Glushkov's construction is the same as the one obtained by Thompson's construction algorithm, once its ε-transitions are removed.
Given a regular expression, the Glushkov Construction Algorithm creates a non-deterministic automaton that accepts the language
L(e)
Linearisation of the expression. Each letter of the alphabet appearing in the expression is renamed, so that each letter occurs at most once in the new expression
e'
e'
L(e')
Computation of the sets
P(e')
D(e')
F(e')
P(e')
L(e')
D(e')
L(e')
F(e')
L(e')
L(e')
P(e')=\{x\inB\midxB*\capL(e')\ne\emptyset\}
D(e')=\{y\inB\midB*y\capL(e')\ne\emptyset\}
F(e')=\{u\inB2\midB*uB*\capL(e')\ne\emptyset\}
Computation of the set
Λ(e')
L(e')
Λ(e')=\{\varepsilon\}\capL(e')
\varepsilon
Computation of automaton recognizing the local language, as defined by
P(e')
D(e')
F(e')
Λ(e')
L'=(PB*\capB*D)\setminusB*(B2\setminusF)B*\cupΛ(e')
Strictly speaking, it is the computation of the automaton for the local language denoted by this linearised expression that is Glushkov's construction.
Remove the linearisation, replacing each indexed letter by the original letter of .
Consider[2] the regular expression
e=(a(ab)*)*+(ba)*
e'
+, ⋅ ,*
The obtained automaton is non-deterministic, and it has as many states as the number of letters of the regular expression, plus one. Furthermore, it has been shown[3] [4] that Glushkov's automaton is the same as Thompson's automaton when the ε-transitions are removed.
The computation of the automaton by the expression occurs often; it has been systematically used in search functions, in particular by the Unix grep command. Similarly, XML's specification also uses such constructions; for more efficiency, regular expressions of a certain kind, called deterministic expressions, have been studied.[4] [5]