Context-sensitive language explained

In formal language theory, a context-sensitive language is a language that can be defined by a context-sensitive grammar (and equivalently by a noncontracting grammar). Context-sensitive is known as type-1 in the Chomsky hierarchy of formal languages.

Computational properties

Computationally, a context-sensitive language is equivalent to a linear bounded nondeterministic Turing machine, also called a linear bounded automaton. That is a non-deterministic Turing machine with a tape of only

kn

cells, where

n

is the size of the input and

k

is a constant associated with the machine. This means that every formal language that can be decided by such a machine is a context-sensitive language, and every context-sensitive language can be decided by such a machine.

This set of languages is also known as NLINSPACE or NSPACE(O(n)), because they can be accepted using linear space on a non-deterministic Turing machine.[1] The class LINSPACE (or DSPACE(O(n))) is defined the same, except using a deterministic Turing machine. Clearly LINSPACE is a subset of NLINSPACE, but it is not known whether LINSPACE = NLINSPACE.[2]

Examples

One of the simplest context-sensitive but not context-free languages is

L=\{anbncn:n\ge1\}

: the language of all strings consisting of occurrences of the symbol "a", then "b"s, then "c"s (abc,,, etc.). A superset of this language, called the Bach language,[3] is defined as the set of all strings where "a", "b" and "c" (or any other set of three symbols) occurs equally often (etc.) and is also context-sensitive.[4] [5]

can be shown to be a context-sensitive language by constructing a linear bounded automaton which accepts . The language can easily be shown to be neither regular nor context-free by applying the respective pumping lemmas for each of the language classes to .

Similarly:

Lit{Cross}=\{ambncmdn:m\ge1,n\ge1\}

is another context-sensitive language; the corresponding context-sensitive grammar can be easily projected starting with two context-free grammars generating sentential forms in the formats

amCm

and

Bndn

and then supplementing them with a permutation production like

CBBC

, a new starting symbol and standard syntactic sugar.

LMUL3=\{ambncmn:m\ge1,n\ge1\}

is another context-sensitive language (the "3" in the name of this language is intended to mean a ternary alphabet); that is, the "product" operation defines a context-sensitive language (but the "sum" defines only a context-free language as the grammar

SaSc|R

and

RbRc|bc

shows). Because of the commutative property of the product, the most intuitive grammar for

Lit{MUL3}

is ambiguous. This problem can be avoided considering a somehow more restrictive definition of the language, e.g.

Lit{ORDMUL3}=\{ambncmn:1<m<n\}

. This can be specialized to

Lit{MUL1}=\{amn:m>1,n>1\}

and, from this, to
L
m2

=\{

m2
a

:m>1\}

,
L
m3

=\{

m3
a

:m>1\}

, etc.

LREP=\{w|w|:w\in\Sigma*\}

is a context-sensitive language. The corresponding context-sensitive grammar can be obtained as a generalization of the context-sensitive grammars for

Lit{Square}=\{w2:w\in\Sigma*\}

,

Lit{Cube}=\{w3:w\in\Sigma*\}

, etc.

Lit{EXP}=\{

2n
a

:n\ge1\}

is a context-sensitive language.[6]

Lit{PRIMES2}=\{w:|w|isprime\}

is a context-sensitive language (the "2" in the name of this language is intended to mean a binary alphabet). This was proved by Hartmanis using pumping lemmas for regular and context-free languages over a binary alphabet and, after that, sketching a linear bounded multitape automaton accepting

LPRIMES2

.[7]

Lit{PRIMES1}=\{ap:pisprime\}

is a context-sensitive language (the "1" in the name of this language is intended to mean a unary alphabet). This was credited by A. Salomaa to Matti Soittola by means of a linear bounded automaton over a unary alphabet[8] (pages 213-214, exercise 6.8) and also to Marti Penttonen by means of a context-sensitive grammar also over a unary alphabet (See: Formal Languages by A. Salomaa, page 14, Example 2.5).

An example of recursive language that is not context-sensitive is any recursive language whose decision is an EXPSPACE-hard problem, say, the set of pairs of equivalent regular expressions with exponentiation.

Properties of context-sensitive languages

See also

References

Notes and References

  1. .
  2. .
  3. Pullum . Geoffrey K. . 1983 . Context-freeness and the computer processing of human languages . .
  4. Bach, E. (1981). "Discontinuous constituents in generalized categorial grammars" . NELS, vol. 11, pp. 1 - 12.
  5. Joshi, A.; Vijay-Shanker, K.; and Weir, D. (1991). "The convergence of mildly context-sensitive grammar formalisms". In: Sells, P., Shieber, S.M. and Wasow, T. (Editors). Foundational Issues in Natural Language Processing. Cambridge MA: Bradford.
  6. Example 9.5 (p. 224) of Hopcroft, John E.; Ullman, Jeffrey D. (1979). Introduction to Automata Theory, Languages, and Computation. Addison-Wesley
  7. 10.1145/321466.321470 . On the Recognition of Primes by Automata . J. Hartmanis and H. Shank . . 15 . 3 . 382–389 . Jul 1968 . 1813/5864 . 17998039 . free .
  8. Salomaa, Arto (1969), Theory of Automata,, Pergamon, 276 pages.
  9. Book: John E. Hopcroft. Jeffrey D. Ullman. Introduction to Automata Theory, Languages, and Computation. registration. Addison-Wesley. 1979. 9780201029888.
    Exercise 9.10, p.230. In the 2000 edition, the chapter on context-sensitive languages has been omitted.
  10. Immerman . Neil . 1988 . Nondeterministic space is closed under complementation . SIAM J. Comput. . 5 . 935–938 . 10.1137/0217058 . 17 . https://web.archive.org/web/20040625094023/http://www.cs.umass.edu/~immerman/pub/space.pdf . 2004-06-25 . live. 10.1.1.54.5941 .