Pumping lemma for context-free languages explained

In computer science, in particular in formal language theory, the pumping lemma for context-free languages, also known as the Bar-Hillel lemma,[1] is a lemma that gives a property shared by all context-free languages and generalizes the pumping lemma for regular languages.

The pumping lemma can be used to construct a refutation by contradiction that a specific language is not context-free. Conversely, the pumping lemma does not suffice to guarantee that a language is context-free; there are other necessary conditions, such as Ogden's lemma, or the Interchange lemma.

Formal statement

If a language

L

is context-free, then there exists some integer

p\geq1

(called a "pumping length")[2]

Notes and References

  1. Book: Kreowski, Hans-Jörg . 1979 . Claus . Volker . Ehrig . Hartmut . Hartmut Ehrig . Rozenberg . Grzegorz . A pumping lemma for context-free graph languages . Graph-Grammars and Their Application to Computer Science and Biology. Lecture Notes in Computer Science . 73 . en . Berlin, Heidelberg . Springer . 270–283 . 10.1007/BFb0025726 . 978-3-540-35091-0.
  2. Stephen Scheinberg . 1960 . Note on the Boolean Properties of Context Free Languages . Information and Control . 3 . 4 . 372–375 . 10.1016/s0019-9958(60)90965-7 . free. Here: Lemma 3, and its use on p.374-375.
  3. Book: John E. Hopcroft, Jeffrey D. Ullman. Introduction to Automata Theory, Languages, and Computation. 1979. Addison-Wesley. 0-201-02988-X. Here: sect.6.1, p.129
  4. Book: Berstel . Jean . Lauve . Aaron . Reutenauer . Christophe . Saliola . Franco V. . Combinatorics on words. Christoffel words and repetitions in words . CRM Monograph Series . 27 . Providence, RI . . 2009 . 978-0-8218-4480-9 . 1161.68043 . 90 . (Also see [www-igm.univ-mlv.fr/~berstel/ Aaron Berstel's website)</ref> such that every string <math>s</math> in <math>L</math> that has a [[string length|length]] of

    p

    or more symbols (i.e. with

    |s|\geqp

    ) can be written as

    s=uvwxy

    with substrings

    u,v,w,x

    and

    y

    , such that

    1.

    |vx|\geq1

    ,

    2.

    |vwx|\leqp

    , and

    3.

    uvnwxny\inL

    for all

    n\geq0

    .

    Below is a formal expression of the Pumping Lemma.

    \begin{array}{l}(\forallL\subseteq\Sigma*)\(contextfree(L)\((\existsp\geq1)((\foralls\inL)((|s|\geqp)\((\existsu,v,w,x,y\in\Sigma*)(s=uvwxy\land|vx|\geq1\land|vwx|\leqp\land(\foralln\geq0)(uvnwxny\inL)))))))\end{array}

    Informal statement and explanation

    The pumping lemma for context-free languages (called just "the pumping lemma" for the rest of this article) describes a property that all context-free languages are guaranteed to have.

    The property is a property of all strings in the language that are of length at least

    p

    , where

    p

    is a constant—called the pumping length—that varies between context-free languages.

    Say

    s

    is a string of length at least

    p

    that is in the language.

    The pumping lemma states that

    s

    can be split into five substrings,

    s=uvwxy

    , where

    vx

    is non-empty and the length of

    vwx

    is at most

    p

    , such that repeating

    v

    and

    x

    the same number of times (

    n

    ) in

    s

    produces a string that is still in the language. It is often useful to repeat zero times, which removes

    v

    and

    x

    from the string. This process of "pumping up"

    s

    with additional copies of

    v

    and

    x

    is what gives the pumping lemma its name.

    Finite languages (which are regular and hence context-free) obey the pumping lemma trivially by having

    p

    equal to the maximum string length in

    L

    plus one. As there are no strings of this length the pumping lemma is not violated.

    Usage of the lemma

    The pumping lemma is often used to prove that a given language is non-context-free, by showing that arbitrarily long strings are in that cannot be "pumped" without producing strings outside .

    For example, if

    S\subset\N

    is infinite but does not contain an (infinite) arithmetic progression, then

    L=\{an:n\inS\}

    is not context-free. In particular, neither the prime numbers nor the square numbers are context-free.

    For example, the language

    L=\{anbncn|n>0\}

    can be shown to be non-context-free by using the pumping lemma in a proof by contradiction. First, assume that is context free. By the pumping lemma, there exists an integer which is the pumping length of language . Consider the string

    s=apbpcp

    in . The pumping lemma tells us that can be written in the form

    s=uvwxy

    , where, and are substrings, such that

    |vx|\geq1

    ,

    |vwx|\leqp

    , and

    uviwxiy\inL

    for every integer

    i\geq0

    . By the choice of and the fact that

    |vwx|\leqp

    , it is easily seen that the substring can contain no more than two distinct symbols. That is, we have one of five possibilities for :

    vwx=aj

    for some

    j\leqp

    .

    vwx=ajbk

    for some and with

    j+k\leqp

    vwx=bj

    for some

    j\leqp

    .

    vwx=bjck

    for some and with

    j+k\leqp

    .

    vwx=cj

    for some

    j\leqp

    .

    For each case, it is easily verified that

    uviwxiy

    does not contain equal numbers of each letter for any

    i1

    . Thus,

    uv2wx2y

    does not have the form

    aibici

    . This contradicts the definition of . Therefore, our initial assumption that is context free must be false.

    In 1960, Scheinberg proved that

    L=\{anbnan|n>0\}

    is not context-free using a precursor of the pumping lemma.[2]

    While the pumping lemma is often a useful tool to prove that a given language is not context-free, it does not give a complete characterization of the context-free languages. If a language does not satisfy the condition given by the pumping lemma, we have established that it is not context-free. On the other hand, there are languages that are not context-free, but still satisfy the condition given by the pumping lemma, for example

    L=\{bjckdl|j,k,l\inN\}\cup\{aibjcjdj|i,j\inN,i\ge1\}

    for with e.g. j≥1 choose to consist only of bs, for choose to consist only of as; in both cases all pumped strings are still in L.[3]

    References

    • Bar-Hillel. Y.. Yehoshua Bar-Hillel . Micha Perles . Micha Perles . Eli Shamir . Eli Shamir . On formal properties of simple phrase-structure grammars. Zeitschrift für Phonetik, Sprachwissenschaft, und Kommunikationsforschung. 14. 2. 143–172. 1961. - Reprinted in: Book: Y. Bar-Hillel . 1964 . Language and Information: Selected Essays on their Theory and Application . Addison-Wesley . Addison-Wesley series in logic . 783543642 . 0201003732 . 116–150.
    • Book: Michael Sipser . Michael Sipser . 1997 . Introduction to the Theory of Computation . PWS Publishing . 0-534-94728-X . Section 1.4: Nonregular Languages, pp. 77–83. Section 2.3: Non-context-free Languages, pp. 115–119.