Tree Adjoining Grammars

Next: Linear and non-erasing grammars Up: Beyond concatenation Previous: Sequence union

Tree Adjoining Grammars

Tree Adjoining Grammar (TAG) is a formalism originally proposed in [40]. Several variations on that formalism are developed, among which we will be interested in lexicalized (LTAG) [1,81] and constraint-based (FTAG) [106,105] versions. A TAG consists of a number of elementary trees, which can be combined with a substitution, and an adjunction operation.

The TAG formalism can be motivated because it provides a larger domain of locality in which to state linguistic dependencies. In most formalisms, dependencies can be defined between the elements of a rule (i.e. between the nodes of a local tree). In TAG, it is possible to state dependencies between nodes of trees which are further apart, because the basic building blocks of the formalism are trees. For example, the relation between a topicalized constituent and its governor can be stated locally in a TAG, whereas in most other formalisms this relation must be stated with some global mechanism. Furthermore, the adjunction operation provides for the analysis of (at least some) kinds of discontinuous constituency constructions. This is reflected in the formal power of the formalism: some TAGs recognize context-sensitive languages.

Ordinary TAG.

A Tree Adjoining Grammar consists of a set of elementary trees, divided in initial and auxiliary trees. These trees constitute the basic building blocks of the formalism. Operations of adjunction and substitution are defined which build derived trees from elementary trees. TAG thus constitute a tree-generating system, rather than a string-generating system as context-free grammar. The string language of a TAG is defined as the yields of all the trees derived by a TAG.

An initial tree is a tree of which the interior nodes are all labelled with non-terminal symbols, and the nodes on the frontier are either labelled with terminal symbols, or with non-terminal symbols, which are marked with the substitution marker ( $\downarrow$ ).

An auxiliary tree is defined as an initial tree, except that exactly one of its frontier nodes must be marked as foot node (`*'). The foot node must be labelled with a non-terminal symbol which is the same as the label of the root node.

As an example, consider the following initial and auxiliary trees in figure 4.5.

**Figure 4.5:** Three initial and four auxiliary trees as an example of a Tree Adjoining Grammar.
$\begin{figure} \begin{center} \leavevmode \unitlength1pt \beginpicture \setplot... ...ntig}\input{boom-eenentwintig}\input{boom-tweeentwintig}\end{center}\end{figure}$

In initial tree $\alpha_{1}^{}$ , the np node is a substitution node, and left is a terminal symbol associated with a frontier node. In auxiliary tree $\beta_{4}^{}$ the foot node is the leftmost daughter of the root node. with is a terminal symbol at a frontier node, and the np node is a substitution node.

Derived trees are built from initial and auxiliary trees by substitution and adjunction. Substituting a tree $\alpha$ in a tree $\alpha{^\prime}$ simply replaces a substitution node in $\alpha{^\prime}$ with $\alpha$ , under the convention that the non-terminal symbol of the substitution node is the same as the root node of $\alpha$ . For example, substituting $\alpha_{2}^{}$ in $\alpha_{3}^{}$ gives the following tree:

$\begin{figure} \begin{center} \leavevmode \unitlength1pt \beginpicture \setplot... ...option \put{\hbox{boy}} [Bl] at 30.97 18.00 \endpicture \end{center}\end{figure}$

Only initial trees, and derived trees, can be substituted in another tree.

Adjunction is a more complex operation. Adjoining an auxiliary tree $\beta$ at some node n of a derived tree $\gamma$ proceeds as follows. Firstly, the non-terminal symbol of the root node (and hence the non-terminal symbol of the foot node) of $\beta$ should be the same as the non-terminal symbol associated with n. The sub-tree t of $\gamma$ rooted by n is removed from $\gamma$ , and $\beta$ is substituted for it instead; where t is substituted in the foot node of $\beta$ . An illustration of the adjunction operation is presented in figure 4.6. In (1) the sub-tree at the node n where adjunction takes place, is identified. In (2) the root node of the auxiliary tree is substituted at node n, and the sub-tree of n is substituted in the foot node of the auxiliary tree, resulting in (3).

**Figure 4.6:** Illustration of adjunction.
$\begin{figure} \par\begin{picture} (135,100)(-10,50) \put(0,80){\makebox(0,0){\b... ...ut(40,-65){\line(1,2){10}} \put(50,-65){\line(1,2){5}} \end{picture}\end{figure}$

As an example, consider the adjunction of $\beta_{3}^{}$ at the `n' node of the derived tree:

$\begin{figure} \begin{center} \leavevmode \unitlength1pt \beginpicture \setplot... ...option \put{\hbox{boy}} [Bl] at 30.97 18.00 \endpicture \end{center}\end{figure}$

This yields the derived tree in figure 4.7.

**Figure 4.7:** Result of adjoining $\beta_{3}^{}$ .
$\begin{figure} \begin{center} \leavevmode \unitlength1pt \beginpicture \setplot... ...option \put{\hbox{boy}} [Bl] at 75.65 18.00 \endpicture \end{center}\end{figure}$

As a further example, we might then adjoin $\beta_{2}^{}$ into this derived tree at the adj node, resulting in the tree shown in figure 4.8.

**Figure 4.8:** Result of adjoining $\beta_{2}^{}$ .
$\begin{figure} \begin{center} \leavevmode \unitlength1pt \beginpicture \setplot... ...option \put{\hbox{boy}} [Bl] at 90.20 48.00 \endpicture \end{center}\end{figure}$

Adjoining constraints.

Usually, a TAG may specify adjoining constraints on the nodes of its initial and auxiliary trees. Such constraints for example may explicitly forbid adjunction on a node, or only allow adjunction of certain auxiliary trees. However, we will ignore these adjoining constraints because in FTAG these can all be simulated using feature equations.

Lexicalized TAG.

In LTAG, each elementary tree contains at least one frontier node labelled with a terminal symbol. Thus each elementary tree is associated with at least one lexical element. Note that in the example grammar above each of the elementary trees is lexical; hence the grammar provides an example of a lexicalized TAG.

Constraint-based TAG.

Adding constraints to the TAG formalism may at first sight be somewhat similar to the addition of constraints to a context-free grammar: each non-terminal symbol may be constrained by equations. However, in the case of a TAG this will not do. If we simply would constrain a node of an elementary tree, and then perform adjunctions at this node, then this would lead to a situation where these constraints are associated both to two distinct nodes.

A way to look at this problem from a more general perspective is presented in [105]. In a way, an elementary tree partially describes possible trees. For example, the elementary tree $\alpha_{1}^{}$ of figure 4.5 can be interpreted as in figure 4.9.

**Figure 4.9:** Dominance relations in TAG. Solid lines represent immediate dominance, and dotted lines any dominance.
$\begin{figure} \begin{center} \leavevmode \unitlength1pt \beginpicture \setplot... ...ption \put{\hbox{left}} [Bl] at 36.76 18.00 \endpicture \end{center}\end{figure}$

Thus, we might view such an initial tree as licensing any tree which can be built by adjoining at its interior nodes. For that reason, there is always a sense in which we view such nodes `from the bottom' or `from the top'. This intuition is formalized in constraint-based versions as follows. Each of the nodes of an elementary tree is associated with two variables (called top and bot) representing the view `from the top' and `from the bottom' on which constraints can be defined.

In the case of the substitution of a tree $\alpha$ , at a substitution node n of a tree $\alpha{^\prime}$ , the result is as before, under the convention that the top variable of the root node of $\alpha$ is constrained to be equal to the top variable of n, and similarly the bot variable of the root of $\alpha$ is equal to the bot variable of n.

Adjunction is slightly more complex. Adjoining an auxiliary tree $\beta$ at some node n of a derived tree $\gamma$ is defined as before, under the convention that the top variable of n is constrained to be equal to the top variable of the root node of $\beta$ ; furthermore the bot variable of n is constrained to be equal to the bot variable of the foot node of $\beta$ . See figure 4.10 for an illustration. The top features of the adjunction node are unified with the top features of the root node of the auxiliary tree. The bottom features of the adjunction node are unified with the bottom features of the foot node of the auxiliary tree.

**Figure 4.10:** Illustration of adjunction with constraints.
$\begin{figure} \par\begin{picture} (135,100)(-10,-30) \put(0,0){\line(1,0){120}}... ...t(30,-5){\line(-1,-2){30}} \put(0,-65){\line(1,0){60}} \end{picture}\end{figure}$

Finally, at the end of a derivation the top and bot variables of each node are constrained to be equal as well. A derived tree with a node of which the top and bot variables cannot be unified is not regarded as a tree derived by the grammar.

Examples of the use of constraints for TAG are the usual number agreement, case marking, etc. However, it is also possible to use the constraint machinery to define `sign-based' TAGs in which each node is associated with a value for a `phonology' and a `semantics' attribute. We shall see that this will crucially illustrate the need for two distinct variables for each node.

The figures 4.11 and 4.12 illustrate how this might work.

**Figure 4.11:** Example of sign-based TAG: construction of phonology, using a difference-list encoding. Each local tree defines that the bottom string of the mother is the concatenation of the top of the daughters. However, the bottom and top of each node are not equated, as adjunctions may take place at each node.
$\begin{figure} \begin{center} \leavevmode \unitlength1pt \beginpicture \setplot... ...n \put{\hbox{\nnnpb }} [Bl] at 203.66 18.00 \endpicture \end{center}\end{figure}$

The phonology associated with the bottom part of the verb is the sequence $\langle$ saw $\rangle$ . However, adjunctions may apply to this node. For that reason, the phonology of the top part of this node is not yet known. Whatever its value will become, we know that the phonology of the bottom part of the verb phrase node, is the concatenation of the top parts of the verb node and the object np. Again, adjunctions at the verb phrase node may apply. Therefore the bottom and top parts of the verb phrase node are unrelated. The phonology of the bottom of the root node is the concatenation of the top of the subject np and the verb phrase. Note that, if no adjunctions apply at all, the bottom and top parts of each node is unified, giving exactly the right result.

**Figure 4.12:** Example of a sign-based TAG: construction of semantics. The bottom and top semantic structures of each node are not equated, as adjunctions may take place at each node.
$\begin{figure} \begin{center} \leavevmode \unitlength1pt \beginpicture \setplot... ... \put{\hbox{\nnnnpb }} [Bl] at 209.72 18.00 \endpicture \end{center}\end{figure}$

Thus, the argument structure of the bottom part of the verb `saw' is defined as a binary predicate, of which the arguments are the argument structures of the subject and the object. The argument structure of the top part of this verb, however, is not yet known. For example, modifiers may adjoin later at this node to construct a more complex argument structure, built from the simple one associated with the bottom part of the node. Once no more adjunctions are applied, the argument structure is percolated to the verb phrase node: hence the top part of the verb is unified with the bottom part of the verb-phrase. Again, the bottom-part of the verb phrase may be modified, because of adjunctions. Its top-part is unified with the bottom part of the root node, because the argument structure should be percolated, if no more adjunctions apply.

Next: Linear and non-erasing grammars Up: Beyond concatenation Previous: Sequence union

Noord G.J.M. van
1998-09-30