Tree transducer explained

In theoretical computer science and formal language theory, a tree transducer (TT) is an abstract machine taking as input a tree, and generating output – generally other trees, but models producing words or other structures exist. Roughly speaking, tree transducers extend tree automata in the same way that word transducers extend word automata.

Manipulating tree structures instead of words enables TT to model syntax-directed transformations of formal or natural languages. However, TT are not as well-behaved as their word counterparts in terms of algorithmic complexity, closure properties, and so on. In particular, most of the main classes are not closed under composition.

The main classes of tree transducers are:

Top-Down Tree Transducers (TOP)

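A TOP T is a tuple $(Q, \Sigma, \Gamma, I, \delta)$, where Q is a finite set of states, Σ is a finite ranked alphabet (the input alphabet), Γ is a finite ranked alphabet (the output alphabet), I is a subset of Q (the initial states), and δ is a set of rules of the form $q(f(x_1,\dots,x_n)) \to u$, where f is a symbol of Σ, n is the arity of f, q is a state, and u is a tree on Γ and $Q \times \{1,\dots,n\}$, such pairs being nullary.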

Examples of rules and intuitions on semantics

For instance, $q(f(x_1,x_2,x_3)) \to g(a, q'(x_1), h(q''(x_3)))$ is a rule (one customarily writes $q(x_i)$ instead of the pair $(q, x_i)$), and its intuitive semantics is that, under the action of q, a tree with f at the root and three children is transformed into $g(a, q'(x_1), h(q''(x_3)))$ where, recursively, $q'(x_1)$ and $q''(x_3)$ are replaced, respectively, with the application of $q'$ on the first child and with the application of $q''$ on the third.

The semantics of each state of the transducer T, and of T itself, is a binary relation between input trees (on Σ) and output trees (on Γ).

A way of defining the semantics formally is to see $\delta$ as a term rewriting system, provided that in the right-hand sides the calls are written in the form $q(x_i)$, where states q are unary symbols. Then the semantics $[\![q]\!]$ of a state q is given by

$[\![q]\!] = \{\, u \mapsto v \mid u \text{ is a tree on } \Sigma,\ v \text{ is a tree on } \Gamma,\ \text{and } q(u) \to_\delta^{*} v \,\}.$

The semantics of T is then defined as the union of the semantics of its initial states:

$[\![T]\!] = \bigcup_{q \in I} [\![q]\!],$

where I denotes the set of initial states of T.
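The rewriting semantics above can be illustrated with a small interpreter. This is only a sketch under stated assumptions: trees are encoded as Python tuples `(label, child_1, ..., child_n)`, the hypothetical class `Call` stands for a pair $(q, x_i)$, and the state names `q1` and `q2` stand in for $q'$ and $q''$. None of these names come from the literature.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Call:
    """A pair (q, x_i) in a right-hand side (hypothetical encoding, child is 1-based)."""
    state: str
    child: int


def outputs(rules, state, tree):
    """Set of output trees v such that state(tree) rewrites to v."""
    symbol, children = tree[0], tree[1:]
    results = set()
    # Several rules may share a left-hand side, so every one of them is tried.
    for rhs in rules.get((state, symbol), []):
        results |= expand(rules, rhs, children)
    return results


def expand(rules, rhs, children):
    """Instantiate one right-hand side against the children of the input node."""
    if isinstance(rhs, Call):
        return outputs(rules, rhs.state, children[rhs.child - 1])
    label, subterms = rhs[0], rhs[1:]
    options = [expand(rules, sub, children) for sub in subterms]
    # Every combination of sub-results gives one output tree.
    return {(label, *combo) for combo in product(*options)}


def semantics(rules, initial_states, tree):
    """Union of the outputs of all initial states on the given input tree."""
    result = set()
    for q in initial_states:
        result |= outputs(rules, q, tree)
    return result


# The example rule q(f(x1,x2,x3)) -> g(a, q'(x1), h(q''(x3))),
# plus two assumed leaf rules so that the run can terminate.
rules = {
    ("q", "f"): [("g", ("a",), Call("q1", 1), ("h", Call("q2", 3)))],
    ("q1", "b"): [("b",)],
    ("q2", "c"): [("c",)],
}
```

With these rules, `semantics(rules, {"q"}, ("f", ("b",), ("d",), ("c",)))` yields the single tree `g(a, b, h(c))`. The second child `d` is never visited, since $x_2$ does not occur in the right-hand side.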

Determinism and domain

As with tree automata, a TOP is said to be deterministic (abbreviated DTOP) if no two rules of δ share the same left-hand side, and there is at most one initial state. In that case, the semantics of the DTOP is a partial function from input trees (on Σ) to output trees (on Γ), as are the semantics of each of the DTOP's states.
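The determinism condition is purely syntactic and can be checked directly on the rule set. The following is a minimal sketch, assuming rules are given as hypothetical triples `(state, symbol, rhs)`; the right-hand sides are opaque here, since only left-hand sides matter.

```python
from collections import Counter


def is_deterministic(rules, initial_states):
    """True if no two rules share a left-hand side and there is at most one initial state."""
    lhs_counts = Counter((state, symbol) for state, symbol, _rhs in rules)
    return len(initial_states) <= 1 and all(n == 1 for n in lhs_counts.values())


# Placeholder right-hand sides: their content plays no role in the check.
example_rules = [("q", "f", "rhs1"), ("q", "a", "rhs2")]
```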

The domain of a transducer is the domain of its semantics. Likewise, the image of a transducer is the image of its semantics.

Properties of DTOP

The domain of a DTOP is recognizable by a deterministic top-down tree automaton (DTTA), whose size is at most exponential in that of the DTOP. That the domain is DTTA-recognizable is not surprising, considering that the left-hand sides of DTOP rules are the same as for DTTA. As for the reason for the exponential explosion in the worst case (which does not exist in the word case), consider the rule $q(f(x_1,x_2)) \to g(p_1(x_1), p_2(x_1), p_3(x_2))$. In order for the computation to succeed, it must succeed for both children. That means that the right child must be in the domain of $p_3$. As for the left child, it must be in the domain of both $p_1$ and $p_2$. Generally, since subtrees can be copied, a single subtree can be evaluated by multiple states during a run, despite the determinism, and unlike DTTA. Thus the construction of the DTTA recognizing the domain of a DTOP must account for sets of states and compute the intersections of their domains, hence the exponential. In the special case of linear DTOP, that is to say DTOP where each $x_i$ appears at most once in the right-hand side of each rule, the construction is linear in time and space.
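The role of sets of states can be made concrete with a domain membership test. The sketch below does not build the DTTA itself; it only runs the corresponding check on one input tree, under assumed encodings: trees are tuples `(label, child_1, ..., child_n)`, `Call` is a hypothetical class for a pair $(q, x_i)$, and the leaf rules for `p1`, `p2` and `p3` are invented for the illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Call:
    """A pair (q, x_i) in a right-hand side (hypothetical encoding, child is 1-based)."""
    state: str
    child: int


def calls(rhs):
    """Yield every Call occurring in a right-hand side."""
    if isinstance(rhs, Call):
        yield rhs
    else:
        for sub in rhs[1:]:
            yield from calls(sub)


def in_domain(rules, states, tree):
    """True if `tree` is in the domain of every state in `states` (deterministic rules)."""
    symbol, children = tree[0], tree[1:]
    # needed[i] collects all states that will be applied to child i + 1.
    needed = [set() for _ in children]
    for q in states:
        rhs = rules.get((q, symbol))
        if rhs is None:
            return False
        for call in calls(rhs):
            needed[call.child - 1].add(call.state)
    return all(in_domain(rules, s, c) for s, c in zip(needed, children))


# The rule from the text, plus assumed leaf rules: p1 accepts a and b,
# p2 accepts only a, p3 accepts only c.
rules = {
    ("q", "f"): ("g", Call("p1", 1), Call("p2", 1), Call("p3", 2)),
    ("p1", "a"): ("a",),
    ("p1", "b"): ("b",),
    ("p2", "a"): ("a",),
    ("p3", "c"): ("c",),
}
```

Here the left child is checked against the set `{p1, p2}`, so `f(a, c)` is in the domain while `f(b, c)` is not, because `p2` has no rule for `b`. A DTTA for the domain would use such sets as its states, which is where the exponential comes from.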

The image of a DTOP is not necessarily a regular tree language. Consider the transducer coding the transformation $f(x) \to g(x,x)$; that is, duplicate the child of the input. This is easily done by a rule $q(f(x_1)) \to g(p(x_1), p(x_1))$, where p encodes the identity. Then, absent any restriction on the child of the input, the image, which consists of the trees of the form $g(t,t)$, is a classical non-regular tree language.
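The duplicating transducer can be run on a concrete input. This is a sketch only: the input alphabet (a unary `s` and a nullary `z`) is an assumption made for the example, as are the tuple encoding of trees and the hypothetical `Call` class for pairs $(q, x_i)$.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Call:
    """A pair (q, x_i) in a right-hand side (hypothetical encoding, child is 1-based)."""
    state: str
    child: int


def run(rules, state, tree):
    """Output of a deterministic top-down run, or None if no rule applies."""
    rhs = rules.get((state, tree[0]))
    if rhs is None:
        return None
    return build(rules, rhs, tree[1:])


def build(rules, rhs, children):
    """Instantiate a right-hand side; None propagates if any call fails."""
    if isinstance(rhs, Call):
        return run(rules, rhs.state, children[rhs.child - 1])
    subterms = [build(rules, sub, children) for sub in rhs[1:]]
    if any(sub is None for sub in subterms):
        return None
    return (rhs[0], *subterms)


rules = {
    ("q", "f"): ("g", Call("p", 1), Call("p", 1)),  # q(f(x1)) -> g(p(x1), p(x1))
    ("p", "s"): ("s", Call("p", 1)),                # p is the identity on s/z trees
    ("p", "z"): ("z",),
}

two = ("s", ("s", ("z",)))
result = run(rules, "q", ("f", two))
```

The value of `result` is `g(s(s(z)), s(s(z)))`: both children of `g` are copies of the same subtree, and it is this equality constraint between siblings that a tree automaton cannot check for arbitrarily large subtrees.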

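The domain of a DTOP cannot, in general, be restricted to a regular tree language. That is, given a DTOP T and a regular tree language L, one cannot in general build a DTOP $T'$ such that the semantics of $T'$ is that of T, restricted to L.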

This property is linked to the reason deterministic top-down tree automata are less expressive than bottom-up automata: once you go down a given path, information from other paths is inaccessible. Consider the transducer coding the transformation $f(x,y) \to y$; that is, output the right child of the input. This is easily done by a rule $q(f(x_1,x_2)) \to p(x_2)$, where p encodes the identity. Now let's say we want to restrict this transducer to the finite (and thus, in particular, regular) domain $\{f(c,a), f(c,b)\}$. We must use the rules $q(f(x_1,x_2)) \to p(x_2)$, $p(a) \to a$, $p(b) \to b$. But in the first rule, $x_1$ does not appear at all, since nothing is produced from the left child. Thus, it is not possible to test that the left child is c. In contrast, since we produce from the right child, we can test that it is a or b. In general, the criterion is that DTOP cannot test properties of subtrees from which they do not produce output.
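Running the three rules above shows the problem directly. This is a sketch under the same kind of assumptions as any toy interpreter: trees are tuples `(label, child_1, ..., child_n)`, `Call` is a hypothetical class for a pair $(q, x_i)$, and the extra leaf symbol `d` is introduced only to have an input outside the intended domain.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Call:
    """A pair (q, x_i) in a right-hand side (hypothetical encoding, child is 1-based)."""
    state: str
    child: int


def run(rules, state, tree):
    """Output of a deterministic top-down run, or None if no rule applies."""
    rhs = rules.get((state, tree[0]))
    if rhs is None:
        return None
    return build(rules, rhs, tree[1:])


def build(rules, rhs, children):
    """Instantiate a right-hand side; None propagates if any call fails."""
    if isinstance(rhs, Call):
        return run(rules, rhs.state, children[rhs.child - 1])
    subterms = [build(rules, sub, children) for sub in rhs[1:]]
    if any(sub is None for sub in subterms):
        return None
    return (rhs[0], *subterms)


rules = {
    ("q", "f"): Call("p", 2),  # q(f(x1,x2)) -> p(x2): x1 is never visited
    ("p", "a"): ("a",),
    ("p", "b"): ("b",),
}

intended = run(rules, "q", ("f", ("c",), ("a",)))    # input inside the target domain
unintended = run(rules, "q", ("f", ("d",), ("a",)))  # left child is not c
rejected = run(rules, "q", ("f", ("c",), ("c",)))    # right child is neither a nor b
```

Both `intended` and `unintended` evaluate to the leaf `a`, so the transducer also accepts `f(d, a)`; only the right child is tested, which is why `rejected` is `None`.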

DTOP are not closed under composition. This follows from the point about domain restriction: composing the DTOP encoding identity on $\{f(c,a), f(c,b)\}$ with the one encoding $f(x,y) \to y$ must yield a transducer with the semantics $\{f(c,a) \mapsto a,\ f(c,b) \mapsto b\}$, which we know is not expressible by a DTOP.

Bottom-Up Tree Transducers (BOT)

As in the simpler case of tree automata, bottom-up tree transducers are defined similarly to their top-down counterparts, but proceed from the leaves of the tree to the root, instead of from the root to the leaves. Thus the main difference is in the form of the rules, which are of the form $f(q_1(x_1),\dots,q_n(x_n)) \to q(u)$.
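A deterministic bottom-up run can be sketched as follows. The encoding is an assumption made for the illustration: trees are tuples `(label, child_1, ..., child_n)`, the hypothetical class `Var` stands for a variable $x_i$ in the output tree u, rules are keyed by the input symbol together with the tuple of states reached on the children, and final states are left to the caller.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Var:
    """The variable x_i in the output part of a rule (hypothetical encoding, 1-based)."""
    index: int


def run_bottom_up(rules, tree):
    """Return (state, output) reached at the root, or None if no rule applies."""
    results = [run_bottom_up(rules, child) for child in tree[1:]]
    if any(r is None for r in results):
        return None
    child_states = tuple(state for state, _ in results)
    rule = rules.get((tree[0], child_states))
    if rule is None:
        return None
    state, template = rule
    return state, substitute(template, [out for _, out in results])


def substitute(template, outputs):
    """Replace each x_i in the template with the output computed for child i."""
    if isinstance(template, Var):
        return outputs[template.index - 1]
    return (template[0], *(substitute(sub, outputs) for sub in template[1:]))


# Assumed example rules:
#   c -> qc(c),  a -> q(a),  b -> q(b),  f(qc(x1), q(x2)) -> qf(x2)
rules = {
    ("c", ()): ("qc", ("c",)),
    ("a", ()): ("q", ("a",)),
    ("b", ()): ("q", ("b",)),
    ("f", ("qc", "q")): ("qf", Var(2)),
}
```

With these rules, `run_bottom_up(rules, ("f", ("c",), ("a",)))` returns the state `qf` with output `a`, while `f(d, a)` and `f(a, a)` yield `None`. In this sketch the state reached on the left child records that it is `c`, even though the output computed for that child is discarded.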
