In the formal language theory of computer science, left recursion is a special case of recursion where a string is recognized as part of a language by the fact that it decomposes into a string from that same language (on the left) and a suffix (on the right). For instance, 1+2+3 can be recognized as a sum because it can be broken into 1+2, also a sum, and +3, a suitable suffix.
In terms of context-free grammar, a nonterminal is left-recursive if the leftmost symbol in one of its productions is itself (in the case of direct left recursion) or can be made itself by some sequence of substitutions (in the case of indirect left recursion).
A grammar is left-recursive if and only if there exists a nonterminal symbol A that can derive a sentential form with itself as the leftmost symbol. Symbolically,

A ⇒⁺ Aα,

where ⇒⁺ indicates the operation of making one or more substitutions, and α is any sequence of terminal and nonterminal symbols.
Direct left recursion occurs when the definition can be satisfied with only one substitution. It requires a rule of the form

A → Aα

where α is a sequence of nonterminals and terminals. For example, the rule

Expression → Expression + Term

is directly left-recursive.
Indirect left recursion occurs when the definition of left recursion is satisfied via several substitutions. It entails a set of rules following the pattern

A₀ → β₀ A₁ α₀
A₁ → β₁ A₂ α₁
…
Aₙ → βₙ A₀ αₙ

where β₀, β₁, …, βₙ are sequences that can each yield the empty string, while α₀, α₁, …, αₙ may be any sequences of terminal and nonterminal symbols at all. The derivation

A₀ ⇒ β₀ A₁ α₀ ⇒⁺ A₁ α₀ ⇒ β₁ A₂ α₁ α₀ ⇒⁺ … ⇒⁺ A₀ αₙ … α₁ α₀

then gives A₀ as leftmost in its final sentential form.
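To make the definition concrete, left recursion can be detected mechanically. The sketch below is illustrative rather than canonical: the grammar encoding is an assumption of this example, and it assumes for simplicity that no nonterminal derives the empty string, in which case a nonterminal is left-recursive exactly when it lies on a cycle of the "can appear leftmost" relation.

```python
# Detect left-recursive nonterminals in a context-free grammar.
# Assumption: no nonterminal derives the empty string, so left recursion
# reduces to a cycle in the "leftmost symbol" relation.
# Grammar encoding (an assumption of this sketch):
# {nonterminal: [list of productions, each a list of symbols]}.

def left_recursive(grammar):
    # Edge A -> B whenever some production of A starts with nonterminal B.
    edges = {a: {p[0] for p in prods if p and p[0] in grammar}
             for a, prods in grammar.items()}

    def reaches(start, target, seen):
        # Depth-first search along leftmost edges.
        for nxt in edges[start]:
            if nxt == target:
                return True
            if nxt not in seen:
                seen.add(nxt)
                if reaches(nxt, target, seen):
                    return True
        return False

    return {a for a in grammar if reaches(a, a, set())}

grammar = {
    "Expr": [["Expr", "+", "Term"], ["Term"]],  # direct left recursion
    "A": [["B", "x"]],                          # indirect: A -> B -> A
    "B": [["A", "y"], ["z"]],
    "Term": [["int"]],                          # not left-recursive
}
print(sorted(left_recursive(grammar)))  # prints ['A', 'B', 'Expr']
```

Both the direct case (Expr) and the indirect cycle (A, B) are reported; Term is not.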
Left recursion is commonly used as an idiom for making operations left-associative: that an expression a+b-c-d+e is evaluated as (((a+b)-c)-d)+e. In this case, that evaluation order could be achieved as a matter of syntax via the three grammatical rules

Expression → Term
Expression → Expression + Term
Expression → Expression - Term

These only allow parsing the Expression a+b-c-d+e as consisting of the Expression a+b-c-d and the Term e, where a+b-c-d in turn consists of the Expression a+b-c and the Term d, while a+b-c consists of the Expression a+b and the Term c, etc.

Left recursion often poses problems for parsers, either because it leads them into infinite recursion (as in the case of most top-down parsers) or because they expect rules in a normal form that forbids it (as in the case of many bottom-up parsers). Therefore, a grammar is often preprocessed to eliminate the left recursion.
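The infinite-recursion failure mode is easy to see in code. Below is a minimal sketch of a naive recursive-descent parser for the directly left-recursive rule Expression → Expression + Term (the function name and token encoding are illustrative): to parse an Expression it first tries to parse an Expression at the same input position, without consuming anything, so it never terminates.

```python
# Why naive top-down parsing loops on left recursion: the first thing
# parse_expression does is call itself at the same position.

def parse_expression(tokens, pos):
    # Rule Expression -> Expression "+" Term: recurse immediately,
    # with no input consumed, so the recursion never bottoms out.
    left = parse_expression(tokens, pos)
    ...  # the "+" and the Term are never reached

try:
    parse_expression(["1", "+", "2"], 0)
except RecursionError:
    print("naive descent on a left-recursive rule never terminates")
```

In practice the Python interpreter's recursion limit converts the non-termination into a RecursionError; a parser without such a guard would simply overflow its stack.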
The general algorithm to remove direct left recursion follows. Several improvements to this method have been made.[2] For a left-recursive nonterminal A, discard the rules of the form A → A and consider those that remain:

A → Aα₁ | … | Aαₙ | β₁ | … | βₘ

where each α is a nonempty sequence of nonterminals and terminals, and each β is a sequence of nonterminals and terminals that does not start with A. Replace these with two sets of productions, one set for A:

A → β₁A′ | … | βₘA′

and another set for the fresh nonterminal A′ (often called the "tail" or the "rest"):

A′ → α₁A′ | … | αₙA′ | ε
As an example, consider the rule set
Expression → Expression + Expression | Integer | String

Applying the transformation yields

Expression → Integer Expression′ | String Expression′
Expression′ → + Expression Expression′ | ε
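This transformation can be sketched in a few lines of Python. The grammar encoding, with productions as lists of symbols and [] standing for ε, is an assumption of this sketch:

```python
# Direct-left-recursion removal: split the rules of A into
# left-recursive ones (A alpha) and the rest (beta), then build
#   A  -> beta A'          for each non-recursive rule, and
#   A' -> alpha A' | ε     for each recursive one.

def remove_direct_left_recursion(nonterminal, productions):
    alphas = [p[1:] for p in productions if p and p[0] == nonterminal]
    betas = [p for p in productions if not p or p[0] != nonterminal]
    if not alphas:
        return {nonterminal: productions}   # no direct left recursion
    tail = nonterminal + "'"                # the fresh "tail" nonterminal
    return {
        nonterminal: [b + [tail] for b in betas],
        tail: [a + [tail] for a in alphas] + [[]],   # [] is ε
    }

rules = remove_direct_left_recursion(
    "Expression",
    [["Expression", "+", "Expression"], ["Integer"], ["String"]],
)
# Expression  -> Integer Expression' | String Expression'
# Expression' -> + Expression Expression' | ε
```

Running it on the example rule set reproduces the transformed grammar above.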
The above process can be extended to eliminate all left recursion, by first converting indirect left recursion to direct left recursion on the highest numbered nonterminal in a cycle.
Inputs: a grammar, given as a set of nonterminals A₁, …, Aₙ and their productions
Output: a modified grammar generating the same language but without left recursion

For each nonterminal Aᵢ:
  Repeat until an iteration leaves the grammar unchanged:
    For each rule Aᵢ → αᵢ, with αᵢ a sequence of terminals and nonterminals:
      If αᵢ begins with a nonterminal Aⱼ and j < i:
        Let βᵢ be αᵢ without its leading Aⱼ.
        Remove the rule Aᵢ → αᵢ.
        For each rule Aⱼ → αⱼ, add the rule Aᵢ → αⱼβᵢ.
  Remove direct left recursion for Aᵢ as described above.

The inner step amounts to expanding the leading nonterminal Aⱼ in each rule Aᵢ → Aⱼβ with j < i. Because the rules for Aⱼ have already been made free of left recursion by the time Aᵢ is processed, this expansion cannot reintroduce a rule whose leading nonterminal has a smaller index.
The algorithm may be viewed as establishing a topological ordering on nonterminals: afterwards there can only be a rule Aᵢ → Aⱼβ if j > i.
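The full procedure can be sketched as follows. The grammar encoding (productions as lists of symbols, [] for ε) and the helper for direct removal are assumptions of this sketch, not a canonical implementation:

```python
# Full left-recursion elimination: for each nonterminal A_i in a fixed
# order, expand any leading A_j with j < i, then remove direct left
# recursion for A_i.

def _remove_direct(a, prods):
    # Direct-left-recursion removal, as in the earlier transformation.
    alphas = [p[1:] for p in prods if p and p[0] == a]
    betas = [p for p in prods if not p or p[0] != a]
    if not alphas:
        return {a: prods}
    tail = a + "'"
    return {a: [b + [tail] for b in betas],
            tail: [x + [tail] for x in alphas] + [[]]}   # [] is ε

def remove_left_recursion(grammar, order):
    g = {a: [list(p) for p in ps] for a, ps in grammar.items()}
    for i, ai in enumerate(order):
        changed = True
        while changed:                       # repeat until unchanged
            changed = False
            new_prods = []
            for p in g[ai]:
                if p and p[0] in order[:i]:  # leading A_j with j < i
                    aj, beta = p[0], p[1:]
                    new_prods += [q + beta for q in g[aj]]  # expand A_j
                    changed = True
                else:
                    new_prods.append(p)
            g[ai] = new_prods
        g.update(_remove_direct(ai, g[ai]))  # then clear direct recursion
    return g

# Indirectly left-recursive pair: A -> B x, B -> A z.
g = remove_left_recursion(
    {"A": [["B", "x"], ["y"]], "B": [["A", "z"], ["w"]]},
    order=["A", "B"],
)
```

On this example, B's rule B → A z is first expanded to B → B x z | y z, whose direct left recursion is then removed via a fresh tail nonterminal B′.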
Although the above transformations preserve the language generated by a grammar, they may change the parse trees that witness strings' recognition. With suitable bookkeeping, tree rewriting can recover the originals, but if this step is omitted, the differences may change the semantics of a parse.
Associativity is particularly vulnerable; left-associative operators typically appear in right-associative-like arrangements under the new grammar. For example, starting with this grammar:
Expression → Expression - Term | Term
Term → Term * Factor | Factor
Factor → (Expression) | Integer

left recursion removal turns it into:

Expression → Term Expression′
Expression′ → - Term Expression′ | ε
Term → Factor Term′
Term′ → * Factor Term′ | ε
Factor → (Expression) | Integer
Parsing the string "1 - 2 - 3" with the first grammar in an LALR parser (which can handle left-recursive grammars) would have resulted in a parse tree that groups the terms on the left, giving the correct semantics (1 - 2) - 3.

Parsing with the second grammar instead gives a tree in which terms to the right appear deeper, much as a right-recursive grammar would arrange them for 1 - (2 - 3). Properly interpreted, this tree signifies 1 + (-2 + (-3)), also correct, but less faithful to the input and much harder to implement for some operators.
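When evaluation rather than tree building is the goal, the lost associativity can be recovered during parsing by threading an accumulator through the tail rule: each "- Term" is folded onto the value built so far. A minimal sketch under the transformed grammar (function names are hypothetical, and Term is reduced to a bare integer for brevity):

```python
# Evaluate "1 - 2 - 3" left-associatively using the right-recursive
# grammar Expression -> Term Expression'; Expression' -> - Term Expression' | ε.
# The trick: Expression' receives the value parsed so far as an accumulator.

def parse_expression(tokens):
    value, rest = parse_term(tokens)
    return parse_expression_tail(value, rest)

def parse_expression_tail(acc, tokens):
    # Expression' -> - Term Expression' | ε
    if tokens and tokens[0] == "-":
        term, rest = parse_term(tokens[1:])
        return parse_expression_tail(acc - term, rest)  # fold to the left
    return acc, tokens                                  # ε case

def parse_term(tokens):
    return int(tokens[0]), tokens[1:]   # Term -> Integer, for brevity

value, _ = parse_expression(["1", "-", "2", "-", "3"])
print(value)  # (1 - 2) - 3 = -4
```

The recursion structure still mirrors the right-recursive grammar, but the accumulator makes the computed value left-associative.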
A formal grammar that contains left recursion cannot be parsed by an LL(k) parser or other naive recursive descent parser unless it is converted to a weakly equivalent right-recursive form. In contrast, left recursion is preferred for LALR parsers because it results in lower stack usage than right recursion. However, more sophisticated top-down parsers can implement general context-free grammars by use of curtailment. In 2006, Frost and Hafiz described an algorithm which accommodates ambiguous grammars with direct left-recursive production rules.[3] That algorithm was extended to a complete parsing algorithm that accommodates indirect as well as direct left recursion in polynomial time, and generates compact polynomial-size representations of the potentially exponential number of parse trees for highly ambiguous grammars, by Frost, Hafiz and Callaghan in 2007.[4] The authors then implemented the algorithm as a set of parser combinators written in the Haskell programming language.[5]