

A sample grammar

In this section I present a simple linear and non-erasing constraint-based grammar for a (tiny) fragment of Dutch. As a caveat I want to stress that the purpose of the current section is to provide an example of possible input for the parser to be defined in the next section, rather than to provide an account that is completely satisfactory from a linguistic point of view.

There is only one parameterized, binary branching, and headed rule in the grammar. The rule does not introduce any terminals. It is defined as follows, where the first daughter represents the head:

\pr\pred
\head{\mbox{\it sign}(\avm[\mbox{\rm M}]{ \mbox{\it syn}: \mbox{\rm Syn...
...\it cp}(\mbox{\rm M}, \langle \mbox{\rm H}, \mbox{\rm Arg}\rangle).}
\epred\epr
In this grammar rule, heads select arguments using a subcat list. Argument structures are specified lexically and are percolated from head to head. Syntactic features are shared between heads (hence I make the simplifying assumption that head = functor, which may have to be revised in order to treat modification). The relation `cp' defines how the string of the mother is constructed from the strings of its daughters. In the grammar I use revised versions of Pollard's head wrapping operations to analyze cross-serial dependency and verb-second constructions. For a linguistic background of these constructions and analyses, cf. [20], [49] and many others. The value of the attribute h-s (for `headed string') consists of three parts, to implement the idea of Pollard's `headed strings'. The parts left and right represent the strings to the left and to the right of the head. The part head represents the head string. Hence, the string associated with such a term is the concatenation of the three parts from left to right. The predicate cp is defined as follows:

\pr\pred
\head{\mbox{\it cp}(\avm{ \mbox{\it h-s}: \mbox{\rm Mphon}\\
\mbox{\i...
... Mphon}),}
\body{ phon\_string(\mbox{\rm Mphon},\mbox{\rm String}).}
\epred\epr
In the first clause, the values of the attribute h-s associated with the two daughters of the rule are to be combined by the wrap predicate. Several versions of this predicate will be defined below. The value of phon of the mother node is defined with respect to its h-s value by the predicate $\mbox{\it phon\_string}$. This predicate is defined in terms of the predicate append/3. As an abbreviation I write A . B for C such that append(A,B,C). The definition of $\mbox{\it phon\_string}$ is:

\pr\pred
\head{\mbox{\it phon\_string}(\avm{ \mbox{\it left}:\mbox{\rm L} \\
\...
...mbox{\rm R} },\mbox{\rm L} \cdot \mbox{\rm H} \cdot \mbox{\rm R} ).}
\epred\epr
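As an illustration only, the headed-string data structure and the phon\_string relation can be mimicked in a few lines of Python. This is a sketch under assumptions, not part of the grammar formalism: the names HeadedString, Words and example are hypothetical, strings are modelled as tuples of words, and the relation is rendered as a one-directional function rather than a reversible predicate.

```python
from typing import NamedTuple, Tuple

# Hypothetical encoding: a string is a tuple of words.
Words = Tuple[str, ...]


class HeadedString(NamedTuple):
    """Three-part value of the h-s attribute: left, head, right."""
    left: Words
    head: Words
    right: Words


def phon_string(hs: HeadedString) -> Words:
    """Concatenate the three parts from left to right (L . H . R)."""
    return hs.left + hs.head + hs.right


# A headed string whose head is 'hoort'.
example = HeadedString(left=("bob", "leugens"),
                       head=("hoort",),
                       right=("vertellen",))
print(" ".join(phon_string(example)))
```

Tuple concatenation plays the role of append/3 here; unlike the predicate, the function cannot be run backwards to split a string into its three parts.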

A few versions of the predicate wrap are listed below, to illustrate the idea that different string operations can be defined. Each version of the predicate is associated with an atomic identifier, so that a lexical entry can subcategorize for an argument under the condition that a specific version of the predicate is used to combine with it. The purpose of this feature is similar to that of the `order' feature found in UCG [113]. For example, a verb may select an object to its left, and an infinitival verb phrase which has to be raised. For simple (left or right) concatenation the predicate is defined as follows:

\pr\pred\head{
\mbox{\it wrap}(\mbox{\rm left},\avm{ \mbox{\it left}: \mbox{\rm ...
...R} \cdot \mbox{\rm AL} \cdot \mbox{\rm AH} \cdot \mbox{\rm AR} }).
}
\epred\epr
In the first case the string associated with the argument is appended to the left of the string left of the head; in the second case this string is appended to the right of the string right of the head.
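The two concatenation cases just described can be sketched in Python as follows. This is a hedged illustration rather than the grammar's own definition: the function names wrap\_left and wrap\_right, the helper lex, and the argument order (head first, argument second) are assumptions made for the example, and headed strings are modelled as named tuples of word tuples.

```python
from typing import NamedTuple, Tuple

Words = Tuple[str, ...]


class HeadedString(NamedTuple):
    left: Words
    head: Words
    right: Words


def lex(word: str) -> HeadedString:
    """Hypothetical helper: a lexical headed string with empty left and right."""
    return HeadedString((), (word,), ())


def wrap_left(head: HeadedString, arg: HeadedString) -> HeadedString:
    """Put the whole argument string in front of the string left of the head."""
    arg_string = arg.left + arg.head + arg.right
    return HeadedString(arg_string + head.left, head.head, head.right)


def wrap_right(head: HeadedString, arg: HeadedString) -> HeadedString:
    """Put the whole argument string after the string right of the head."""
    arg_string = arg.left + arg.head + arg.right
    return HeadedString(head.left, head.head, head.right + arg_string)


# 'ontwaakt' takes its subject to the left; 'dat' takes the clause to the right.
claire_ontwaakt = wrap_left(lex("ontwaakt"), lex("claire"))
dat_claire_ontwaakt = wrap_right(lex("dat"), claire_ontwaakt)
print(dat_claire_ontwaakt)
```

In both cases the head part of the mother is the head part of the head daughter, so the position of the head inside the string remains available to later wrapping operations.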

Lexical entries for intransitive verbs such as `ontwaakt' (wakes up) are defined as follows:

\pr\pred\head{sign(\avm{
\mbox{\it syn}: \mbox{\rm v}\\
\mbox{\it sc}: \langl...
...Subj}$)}\\
\mbox{\it phon}: \langle \mbox{\rm ontwaakt}\rangle
}).}
\epred\epr
I assume that lexical entries also specify that their phon-value is dependent on the h-s value. Furthermore, the values of the left and right attributes of h-s are the empty list. Henceforth, I will not specify the values of phon and h-s explicitly, but assume that each lexical entry extends

\begin{displaymath}\avm{
\mbox{\it h-s}: \avm{ \mbox{\it left}: \langle \rangle ...
...ght}: \langle \rangle } \\ \mbox{\it phon}: \mbox{\rm Head}
}.
\end{displaymath}
Hence, bi-transitive verbs such as `vertelt' (tells) are abbreviated as follows:

\pr\pred\head{\avm{
\mbox{\it syn}: \mbox{\rm v}\\
\mbox{\it sc}: \langle \av...
...\rm vertelt($\mbox{\rm Subj}$,$\mbox{\rm Iobj}$,$\mbox{\rm Obj}$)}
}}\epred\epr
A different version of this lexical entry selects an sbar (complementizer phrase) to the right (simplifying the argument structure):

\pr\pred\head{\avm{
\mbox{\it syn}: \mbox{\rm v}\\
\mbox{\it sc}: \langle \av...
...\rm vertelt($\mbox{\rm Subj}$,$\mbox{\rm Iobj}$,$\mbox{\rm Obj}$)}
}}\epred\epr
Proper nouns such as `Arie' are simply defined as:

\pr\pred\head{\avm{
\mbox{\it syn}: \mbox{\rm n}\\
\mbox{\it sc}: \langle \ra...
...langle \mbox{\rm arie}\rangle\\
\mbox{\it sem}: \mbox{\rm arie}
}}
\epred\epr
For the sake of the example I assume that several other NPs have such a definition.

The choice of data structure for the value of the attribute h-s allows a simple definition of the verb raising (vr) version of the wrap predicate, which may be used for Dutch cross-serial dependencies:

\pr\pred\head{
\mbox{\it wrap}(\mbox{\rm vr},\avm{ \mbox{\it left}: \langle \ran...
...ox{\rm H}\\ \mbox{\it right}:
\mbox{\rm AH} \cdot \mbox{\rm AR}}). }
\epred\epr
Here the head and right string of the argument are appended to the right, whereas the left string of the argument is appended to the left. A raising verb, e.g. `hoort' (hears), is defined as:

\pr\pred\head{\avm{
\mbox{\it syn}:\mbox{\rm v}\\
\mbox{\it sc}: \langle \avm...
...\mbox{\it sem}: \mbox{\rm hoort($\mbox{\rm Sj}$,$\mbox{\rm Oj}$)}
}}
\epred\epr
In this entry `hoort' selects -- apart from its NP-subject -- two objects, an NP and a VP (with category INF). The INF still has an element in its subcat list; this element is controlled by the NP (this is performed by the sharing of InfSj). To derive the subordinate phrase

\begin{exam}
\begin{flushleft}
dat Jan Arie Bob leugens hoort vertellen\\
that ...
...ll\\
{\it that Jan hears that Arie tells lies to Bob}
\end{flushleft}\end{exam}
the main verb `hoort' first selects the infinitival `bob leugens vertellen'. These two strings are combined into `bob leugens hoort vertellen' (using the vr version of the wrap predicate). After the selection of the object, resulting in `arie bob leugens hoort vertellen', the subject is selected resulting in the string `jan arie bob leugens hoort vertellen'. This string is selected by the complementizer, resulting in `dat jan arie bob leugens hoort vertellen'. The argument structure will be instantiated as dat(hoort(jan, vertelt(arie, bob, leugens))).
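The derivation steps above can be replayed with a small Python sketch. It is an illustration under stated assumptions, not the grammar itself: all function names are hypothetical, the vr operation is assumed to require a head whose left and right strings are empty (in line with the visible parts of its definition), and the order in which `vertellen' picks up `leugens' and `bob' is assumed for the example. Subcategorization and semantics are ignored; only the string operations are modelled.

```python
from typing import NamedTuple, Tuple

Words = Tuple[str, ...]


class HeadedString(NamedTuple):
    left: Words
    head: Words
    right: Words


def lex(word: str) -> HeadedString:
    return HeadedString((), (word,), ())


def flat(hs: HeadedString) -> str:
    return " ".join(hs.left + hs.head + hs.right)


def wrap_left(head: HeadedString, arg: HeadedString) -> HeadedString:
    return HeadedString(arg.left + arg.head + arg.right + head.left,
                        head.head, head.right)


def wrap_right(head: HeadedString, arg: HeadedString) -> HeadedString:
    return HeadedString(head.left, head.head,
                        head.right + arg.left + arg.head + arg.right)


def wrap_vr(head: HeadedString, arg: HeadedString) -> HeadedString:
    """Verb raising: argument's left goes left, its head and right go right."""
    if head.left or head.right:
        # Assumption: the raising verb is still a bare lexical head.
        raise ValueError("vr expects a head with empty left and right strings")
    return HeadedString(arg.left, head.head, arg.head + arg.right)


# Infinitival 'bob leugens vertellen' (order of selection is assumed).
inf = wrap_left(wrap_left(lex("vertellen"), lex("leugens")), lex("bob"))
cluster = wrap_vr(lex("hoort"), inf)        # bob leugens hoort vertellen
with_obj = wrap_left(cluster, lex("arie"))  # object of 'hoort'
clause = wrap_left(with_obj, lex("jan"))    # subject of 'hoort'
sbar = wrap_right(lex("dat"), clause)       # complementizer selects the clause

for step in (inf, cluster, with_obj, clause, sbar):
    print(flat(step))
```

The crossing order arises in the second step: because the argument keeps its three parts separate, `hoort' can be placed between `bob leugens' and `vertellen' without any movement operation.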

Note that this analysis of verb raising constructions faces problems, because verb clusters can be coordinated. This possibility seems to indicate that an analysis in which subcategorization lists are manipulated (as discussed in the previous chapter) is more promising. For a discussion of these matters, cf. [30].

In Dutch main clauses there is usually no overt complementizer; instead the finite verb occupies the first position (in yes-no questions) or the second position (right after the topic, in ordinary declarative sentences). In the following analysis an empty complementizer selects an ordinary (finite) VP; the resulting string is formed by the following definition of wrap.

\pr\pred\head{
\mbox{\it wrap}(\mbox{\rm v2},\avm{ \mbox{\it left}: \langle \ran...
...x{\rm H} \\
\mbox{\it right}: \mbox{\rm L} \cdot \mbox{\rm R} }).
}\epred\epr
The `empty' finite complementizer is defined as:

\pr\pred\head{
\avm{
\mbox{\it syn}: \mbox{\rm comp}\\
\mbox{\it sc}: \langle...
...n}: \langle \rangle\\
\mbox{\it sem}: \mbox{\rm$\mbox{\rm Obj}$}
}}\epred\epr
whereas an ordinary complementizer, e.g. `dat' (that), is defined as:

\pr\pred\head{\avm{
\mbox{\it syn}: \mbox{\rm comp}\\
\mbox{\it sc}: \langle ...
...\langle dat \rangle\\
\mbox{\it sem}: \mbox{\rm$\mbox{\rm Obj}$}
}}\epred\epr
Thus, after the application of the empty complementizer, a verb initial sentence is formed. In the case of root sentences, some mechanism for topicalization will apply, which in some way places a further constituent before the verb. In yes-no questions, the derivation is finished at this point.
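The effect of the v2 operation on a finite VP can be sketched in Python as follows. This is a hedged illustration: the function name wrap\_v2 is hypothetical, the empty complementizer is assumed to contribute an entirely empty headed string, and the headed string given for the finite VP is an assumed input for the example, not a value taken from the figure.

```python
from typing import NamedTuple, Tuple

Words = Tuple[str, ...]


class HeadedString(NamedTuple):
    left: Words
    head: Words
    right: Words


def flat(hs: HeadedString) -> str:
    return " ".join(hs.left + hs.head + hs.right)


def wrap_v2(head: HeadedString, arg: HeadedString) -> HeadedString:
    """The argument's head becomes the head; its left and right follow it."""
    if head.left or head.head or head.right:
        # Assumption: only the empty complementizer uses this version.
        raise ValueError("v2 expects an empty head string")
    return HeadedString((), arg.head, arg.left + arg.right)


empty_comp = HeadedString((), (), ())

# Assumed headed string of the finite VP, with the finite verb as head.
vp = HeadedString(("arie", "jan", "bob"),
                  ("hoort",),
                  ("vertellen", "dat", "claire", "ontwaakt"))

question = wrap_v2(empty_comp, vp)
print(flat(vp))
print(flat(question))
```

The finite verb ends up in first position simply because the left part of the result is empty, which is the verb-initial string that a topicalization mechanism would then extend in root declaratives.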

Note that this analysis captures the special relationship between complementizers and (fronted) finite verbs in Dutch. The sentence

\begin{exam}
\begin{flushleft}
Hoort Arie Jan Bob vertellen dat Claire ontwaakt?...
...es Arie hear that Jan tells Bob that Claire wakes up?}
\end{flushleft}\end{exam}
is derived as in figure 4.13 (where the head of a string is represented in capitals).

Figure 4.13: Deriving `Hoort Arie Jan Bob vertellen dat Claire ontwaakt'
\begin{figure}
\begin{center}
[Derivation tree; the head of each string is shown in capitals.]
\end{center}\end{figure}

What remains to be done is to define the two grammar-specific predicates head/2 and yield/2. These are simply defined as follows:

\pr\pred
\head{\mbox{\it head}(\avm{ \mbox{\it syn}: \mbox{\rm Syn}},\avm{ \mbox...
...yield}(\avm{\mbox{\it phon}: \mbox{\rm String}},\mbox{\rm String}).}
\epred\epr


Noord G.J.M. van
1998-09-30