Tree Adjoining Grammars

The TAG formalism can be motivated because it provides a larger domain of locality in which to state linguistic dependencies. In most formalisms, dependencies can be defined between the elements of a rule (i.e. between the nodes of a local tree). In TAG, it is possible to state dependencies between nodes of trees which are further apart, because the basic building blocks of the formalism are trees. For example, the relation between a topicalized constituent and its governor can be stated locally in a TAG, whereas in most other formalisms this relation must be stated with some global mechanism. Furthermore, the adjunction operation provides for the analysis of (at least some) kinds of discontinuous constituency constructions. This is reflected in the formal power of the formalism: some TAGs recognize context-sensitive languages.

An initial tree is a tree of which the interior nodes are all labelled with non-terminal symbols, and the nodes on the frontier are either labelled with terminal symbols, or with non-terminal symbols, which are marked with the substitution marker ( ).

An auxiliary tree is defined as an initial tree, except that exactly one of
its frontier nodes must be marked as *foot node* (`*'). The
foot node must be labelled with a non-terminal symbol which is the
same as the label of the root node.

As an example, consider the following initial and auxiliary trees in figure 4.5.

In initial tree , the

Derived trees are built from initial and auxiliary trees by *substitution* and *adjunction*.
Substituting a tree
in a tree
simply
replaces a substitution node in
with , under the
convention that the non-terminal symbol of the substitution node is
the same as the root node of . For example, substituting
in
gives the following tree:

Only initial trees, and derived trees, can be substituted in another tree.

Adjunction is a more complex operation.
Adjoining an auxiliary tree
at some node *n* of a derived tree
proceeds as follows. Firstly, the non-terminal symbol of the
root node (and hence the non-terminal symbol of the foot node)
of
should be the same as the
non-terminal symbol associated with *n*. The sub-tree *t* of
rooted by *n* is removed from , and
is substituted for
it instead; where *t* is substituted in the foot node of . An
illustration of the adjunction operation is presented in
figure 4.6. In (1) the sub-tree
at the node *n* where adjunction takes place, is identified. In (2) the
root node of the auxiliary tree is substituted at node *n*,
and the sub-tree of *n* is substituted in the foot node of the
auxiliary tree, resulting in (3).

As an example, consider the adjunction of at the `n' node of the derived tree:

This yields the derived tree in figure 4.7.

As a further example, we might then adjoin into this derived tree at the

A way to look at this problem from a more general perspective is
presented in [105]. In a way, an elementary tree *partially describes* possible trees. For example, the elementary tree
of figure 4.5 can be interpreted as in
figure 4.9.

Thus, we might view such an initial tree as licensing any tree which
can be built by adjoining at its interior nodes. For that reason,
there is always a sense in which we view such nodes `from the bottom'
or `from the top'. This intuition is formalized in constraint-based
versions as follows. Each of the nodes of an elementary tree is
associated with two variables (called
*top* and
*bot*) representing
the view `from the top' and `from the bottom' on which constraints can
be defined.

In the case of the substitution of a tree , at a substitution
node *n* of a tree , the result is as before, under the
convention that the
*top* variable of the root node of
is constrained to be equal to the
*top* variable of *n*, and
similarly the
*bot* variable of the root of
is equal to
the
*bot* variable of *n*.

Adjunction is slightly more complex. Adjoining an auxiliary tree
at some node *n* of a derived tree
is defined as
before, under the convention that the
*top* variable of *n* is
constrained to be equal to the
*top* variable of the root node of
; furthermore the
*bot* variable of *n* is constrained to be
equal to the
*bot* variable of the foot node of . See
figure 4.10 for an illustration. The
top features of the adjunction node are unified with the top features
of the root node of the auxiliary tree. The bottom features of the
adjunction node are unified with the bottom features of the foot node
of the auxiliary tree.

Finally, at the end of a derivation the
*top* and
*bot* variables
of each node are constrained to be equal as well. A derived tree with
a node of which the
*top* and
*bot* variables cannot be unified is not
regarded as a tree derived by the grammar.

Examples of the use of constraints for TAG are the usual number agreement, case marking, etc. However, it is also possible to use the constraint machinery to define `sign-based' TAGs in which each node is associated with a value for a `phonology' and a `semantics' attribute. We shall see that this will crucially illustrate the need for two distinct variables for each node.

The figures 4.11 and 4.12 illustrate how this might work.

Thus, the argument structure of the bottom part of the verb `saw' is defined as a binary predicate, of which the arguments are the argument structures of the subject and the object. The argument structure of the top part of this verb, however, is not yet known. For example, modifiers may adjoin later at this node to construct a more complex argument structure, built from the simple one associated with the bottom part of the node. Once no more adjunctions are applied, the argument structure is percolated to the verb phrase node: hence the top part of the verb is unified with the bottom part of the verb-phrase. Again, the bottom-part of the verb phrase may be modified, because of adjunctions. Its top-part is unified with the bottom part of the root node, because the argument structure should be percolated, if no more adjunctions apply.

1998-09-30