Range concatenation grammar (RCG) is a grammar formalism developed by Pierre Boullier [1] in 1998 as an attempt to characterize a number of phenomena of natural language, such as Chinese numbers and German word order scrambling, which are outside the bounds of the mildly context-sensitive languages.[2]
From a theoretical point of view, any language that can be parsed in polynomial time belongs to the subset of RCG called positive range concatenation grammars, and reciprocally.
Though intended as a variant on Groenink's literal movement grammars (LMGs), RCGs treat the grammatical process more as a proof than as a production. Whereas LMGs produce a terminal string from a start predicate, RCGs aim to reduce a start predicate (which predicates of a terminal string) to the empty string, which constitutes a proof of the terminal strings membership in the language.
A Positive Range Concatenation Grammar (PRCG) is a tuple
G=(N,~T,~V,~S,~P)
N
T
V
\dim:N → N\setminus\{0\}
S\inN
\dim(S)=1
P
\psi0 → \psi1\ldots\psim
\psii
Ai(\alpha1,\ldots,
\alpha | |
\dim(Ai) |
)
Ai\inN
\alphai\in(T\cupV)\star
A Negative Range Concatenation Grammar (NRCG) is defined like a PRCG, but with the addition that some predicates occurring in the right-hand side of a clause can have the form
\overline{Ai(\alpha1,\ldots,
\alpha | |
\dim(Ai) |
)}
A Range Concatenation Grammar is a positive or a negative one. Although PRCGs are technically NRCGs, the terms are used to highlight the absence (PRCG) or presence (NRCG) of negative predicates.
A range in a word
w\inT\star
\langlel,r\ranglew
0\leql\leqr\leqn
n
w
\langlel1,r1\ranglew
\langlel2,r2\ranglew
r1=l2
\langlel1,r1\ranglew ⋅ \langlel2,r2\ranglew=\langlel1,r2\ranglew
T\cupV
For a word
w=w1w2\ldotswn
wi\inT
\langlel,r\ranglew=w1\ldotswl-1\bulletwl\ldotswr-1\bulletwr\ldotswn
The strings of predicates being rewritten represent constraints that the string being tested has to satisfy (if positive), or in the case of negative predicates not satisfy. The order of predicates is irrelevant. Rewrite steps amount to replacing one constraint by zero or more simpler constraints.
Like LMGs, RCG clauses have the general schema
A(x1,...,xn)\to\alpha
\alpha
xi
xy
ab
x=\epsilon, y=ab; x=a, y=b; x=ab, y=\epsilon
xy
Predicate terms come in two forms, positive (which produce the empty string on success), and negative (which produce the empty string on failure/if the positive term does not produce the empty string). Negative terms are denoted the same as positive terms, with an overbar, as in
\overline{A(x1,...,xn)}
The rewrite semantics for RCGs is rather simple, identical to the corresponding semantics of LMGs. Given a predicate string
A(\alpha1,...,\alphan)
\alphai
A(x1,...,xn)\to\beta
\beta
xi
For example, given the rule
A(x,ayb)\toB(axb,y)
x
y
a
b
A(a,abb)
B(aab,b)
A(a,abb)
A(x,ayb)
x=a, y=b
A(x,ayb)\toA(x,x) A(y,y)
A(a,abb)
A(a,a) A(b,b)
A proof/recognition of a string
\alpha
S(\alpha)
S(\alpha)
RCGs are capable of recognizing the non-linear index language
\{www:w\in\{a,b\}*\}
Letting x, y, and z be variable symbols:The proof for is then
S(abbabbabb) ⇒ A(abb,abb,abb) ⇒ A(bb,bb,bb) ⇒ A(b,b,b) ⇒ A(\epsilon,\epsilon,\epsilon) ⇒ \epsilon
Or, using the more correct dotted notation for ranges:
S(\bullet{}abbabbabb\bullet{}) ⇒ A(\bullet{}abb\bullet{}abbabb,abb\bullet{}abb\bullet{}abb,abbabb\bullet{}abb\bullet{}) ⇒ A(a\bullet{}bb\bullet{}abbabb,abba\bullet{}bb\bullet{}abb,abbabba\bullet{}bb\bullet{})
⇒ A(ab\bullet{}b\bullet{}abbabb,abbab\bullet{}b\bullet{}abb,abbabbab\bullet{}b\bullet{}) ⇒ A(\epsilon,\epsilon,\epsilon) ⇒ \epsilon
For a string of
3n
\binom{3n+2}{2}=
(3n+2)(3n+1) | |
2 |
x,y,z
n
\epsilon
Every context-free grammar (CFG) can be converted into a range concatenation grammar:
A
1
A(x)
A\toBC
A(xy)\toB(x)C(y)
A\toa
a
A(a)\to\epsilon
The intersection and union of two range concatenation languages are trivially range concatenation languages:
S
A
B
S(x)\toA(x)B(x)
S
A
B
S(x)\toA(x)
S(x)\toB(x)
A consequence of the above is that it is undecidable whether a (positive) range concatenation language is nonempty, because it is undecidable whether the intersection of two context-free languages is nonempty. Hence range concatenation grammars are not generative.