Next: A specification of the Up: An Efficient Implementation of Previous: An Efficient Implementation of

Motivation

In this paper I discuss in full detail the implementation of the head-corner parser. But first I describe the motivations for this approach. I will start by considerations that lead to the choice of a head-driven parser; I will then argue why Prolog is an appropriate language for the implementation of the head-corner parser.

Head-driven Processing

Lexicalist grammar formalisms, such as Head-driven Phrase Structure Grammar (HPSG), have two characteristic properties. Lexical elements and phrases are associated with categories that have considerable internal structure. Second, instead of construction specific rules, a small set of generic rule schemata is used. Consequently, the set of constituent structures defined by a grammar cannot be `read off' the rule set directly, but is defined by the interaction of the rule schemata and the lexical categories. Applying standard parsing algorithms to such grammars is unsatisfactory for a number of reasons. Earley parsing is intractable in general, as the rule set is simply too general. For some grammars, naive top-down prediction may even fail to terminate. [19] therefore proposes a modified version of the Earley-parser, using restricted top-down prediction. While this modification often leads to better termination properties of the parsing method, in practice it easily leads to a complete trivialization of the top-down prediction step, thus leading to inferior performance.

Bottom-up parsing is far more attractive for lexicalist formalisms, as it is driven by the syntactic information associated with lexical elements. Certain inadequacies remain, however. Most importantly, the selection of rules to be considered for application may not be very efficient. Consider, for instance, the following DCG rule:

$\begin{figure}\begin{verbatim}s([],Sem) --> Arg, vp([Arg],Sem).\end{verbatim}\end{figure}$

A parser in which application of a rule is driven by the left-most daughter, as it is for instance in a standard bottom-up active chart parser, will consider the application of this rule each time an arbitrary constituent Arg is derived. For a bottom-up active chart parser, for instance, this may lead to the introduction of large numbers of active items. Most of these items will be useless. For instance, if a determiner is derived, there is no need to invoke the rule, as there are simply no VP's selecting a determiner as subject. Parsers in which the application of a rule is driven by the rightmost daughter, such as shift-reduce and inactive bottom-up chart parsers, encounter a similar problem for rules such as:

$\begin{figure}\begin{verbatim}vp(As,Sem) --> vp([Arg\vert As],Sem), Arg.\end{verbatim}\end{figure}$

Each time an arbitrary constituent Arg is derived, the parser will consider applying this rule, and a search for a matching VP-constituent will be carried out. Again, in many cases (if Arg is instantiated as a determiner or preposition, for instance) this search is doomed to fail, as a VP subcategorizing for a category Arg may simply not be derivable by the grammar. The problem may seem less acute than that posed by uninstantiated left-most daughters for an active chart parser, as only a search of the chart is carried out and no additional items are added to it. Note, however, that the amount of search required may grow exponentially, if more than one uninstantiated daughter is present:

$\begin{figure}\begin{verbatim}vp(As) --> vp([A1,A2\vert As]), A1, A2.\end{verbatim}\end{figure}$

or if the number of daughters is not specified by the rule:

$\begin{figure}\begin{verbatim}vp([A0]) --> vp([A0,...,An]), A1,..., An.\end{verbatim}\end{figure}$

as appears to be the case for some of the rule-schemata used in HPSG.

Several authors have suggested parsing algorithms that may be more suitable for lexicalist grammars. [8] discusses the concept of head-driven parsing. The key idea is that the linguistic concept head can be used to obtain parsing algorithms which are better suited for typical natural language grammars. Most linguistic theories assume that among the daughters introduced by a rule there is one daughter which can be identified as the head of that rule. There are several criteria for deciding which daughter is the head. Two of these criteria seem relevant for parsing. First of all, the head of a rule determines to a large extent what other daughters may or must be present, as the head selects the other daughters. Second, the syntactic category and morphological properties of the mother node are, in the default case, identical to the category and morphological properties of the head daughter. These two properties suggest that it may be possible to design a parsing strategy in which one first identifies a potential head of a rule, before starting to parse the non-head daughters. By starting with the head, important information about the remaining daughters is obtained. Furthermore, since the head is to a large extent identical to the mother category, effective top-down identification of a potential head should be possible.

In [8] two different head-driven parsers are presented. First, a `head-driven' shift-reduce parser is presented which differs from a standard shift-reduce parser in that it considers the application of a rule (i.e. a reduce step) only if a category matching the head of the rule has been found. Furthermore, it may shift elements onto the parse-stack which are similar to the active items (or `dotted rules') of active chart parsers. By using the head of a rule to determine whether a rule is applicable, the head-driven shift-reduce parser avoids the disadvantages of parsers in which either the leftmost or rightmost daughter is used to drive the selection of rules. Kay also presents a `head-corner' parser. The striking property of this parser is that it does not parse a phrase from left to right, but instead operates `bidirectionally'. It starts by locating a potential head of the phrase and then proceeds by parsing the daughters to the left and the right of the head. Again, this strategy avoids the disadvantages of parsers in which rule selection is uniformly driven by either the leftmost or rightmost daughter. Furthermore, by selecting potential heads on the basis of a `head-corner table' (comparable to the left-corner table of a left-corner parser) it may use top-down filtering to minimize the search-space. This head-corner parser generalizes the left-corner parser. Kay's presentation is reminiscent of the left-corner parser as presented by [14] which itself is a version without memoization of the BUP parser [11].

Head-corner parsing has also been considered elsewhere. In [18], [22], [23] and [21] chart-based head-corner parsing for context-free grammar is considered. It is shown that, in spite of the fact that bidirectional parsing seemingly leads to more overhead than left-to-right parsing, the worst-case complexity of a head-corner parser does not exceed that of an Earley parser. Some further variations are discussed in [12].

In [25] and [26] I argue that head-corner parsing is especially useful for parsing with non-concatenative grammar formalisms. In [10] and [27] head-driven parsing strategies for Lexicalized Tree Adjoining Grammars are presented.

The head-corner parser is closely related to the semantic-head-driven generation algorithm (cf. [20] and references cited there), especially in its purely bottom-up incarnation.

Selective memoization

The head-corner parser is in many respects different from traditional chart parsers. An important difference follows from the fact that in the head-corner parser only larger chunks of computation are memoized. Backtracking still plays an important role for the implementation of search.

This may come as a surprise at first. Common wisdom is that although small grammars may be succesfully treated with a backtracking parser, larger grammars for natural languages always require the use of a datastructure such as a chart or a table of items to make sure that each computation is only performed once. In the case of constraint-based grammars, however, the cost associated with maintaining such a chart should not be under-estimated. The memory requirements for an implementation of the Earley parser for a constraint-based grammar are often outrageous. Similarly, in an Earley deduction system too much effort may be spent on small portions of computation which are inexpensive to (re-)compute anyway.

For this reason I will argue for an implementation of the head-corner parser in which only large chunks of computation are memoized. In linguistic terms, I will argue for a model in which only maximal projections are memoized. The computation that is carried out in order to obtain such a `chunk' uses a depth-first backtrack search procedure. This solution dramatically improves upon the (average case) memory requirements of a parser; moreover it also leads to an increase in (average case) time efficiency, especially in combination with goal-weakening, because of the reduced overhead associated with the administration of the chart. In each of the experiments discussed in section 7 the use of selective memoization with goal weakening out-performs standard chart-parsers.

Why Prolog

Prolog is a particularly useful language for the implementation of a head-corner parser for constraint-based grammars. This is due to the following:

Prolog provides a built-in unification operation.
Prolog provides a built-in backtrack search procedure; memoization can be applied selectively.
Underspecification can be exploited to obtain results required by certain techniques for robust parsing.
Prolog is a high-level language; this enables the application of partial evaluation techniques.

These considerations are discussed in turn:

The first consideration does not deserve much further attention. We want to exploit the fact that the primary datastructures of constraint-based grammars and the corresponding information-combining operation can be modelled by Prolog's first order terms and unification.

As was argued above, Prolog backtracking is not used to simulate an iterative procedure to build up a chart via side-effects. On the contrary, Prolog backtracking is used truly for search. Of course, in order to make this approach feasible, certain well-chosen search-goals are memoized. This is clean and logically well-defined (consider, for example, [32]), even if our implementation in Prolog uses extra-logical predicates.

The third consideration is relevant only if we are interested in robust parsing. In certain methods in robust parsing we are interested in the partial results obtained by the parser. In order to make sure that a parser is complete with respect to such partial results, it is often assumed that a parser must be applied that works exclusively bottom-up. In section 6 it will be shown that the head-corner parser (which uses a mixture of bottom-up and top-down processing) can be applied in a similar fashion by using underspecification in the top-goal. Clearly, underspecification is a concept that arises naturally in Prolog.

The fact that Prolog is a high-level language has a number of practical advantages related to the speed of development. A further advantage is obtained because techniques such as partial evaluation can be applied. For example, I have succesfully applied the Mixtus partial evaluator [17] to the head-corner parser discussed below, to obtain an additional 20% speed increase. In languages such as C partial evaluation does not seem to be possible because the low-levelness of the language makes it impossible to recognize the concepts that are required.

Left-corner parsing and head-corner parsing

As the names suggest, there are many parallels between left-corner and head-corner parsing. In fact, head-corner parsing is a generalization of left-corner parsing. Many of the techniques that will be described in the following sections can be applied to a left-corner parser as well.

A head-corner parser for a grammar in which for each rule the left-most daughter is considered to be the head, will effectively function as a left-corner parser. In such cases the head-corner parser can be said to run in `left-corner mode'. Of course, in a left-corner parser certain simplifications are possible. Based on the experiments discussed in section 7, it can be concluded that a specialized left-corner parser is only about 10% faster than a head-corner parser running in left-corner mode. This is an interesting result: it implies that if a head-corner parser is used, you can do at least (almost) as good as a left-corner parser, and, as some of the experiments indicate, often better.

Practical relevance of head-corner parsing: efficiency and robustness

The head-corner parser is one of the parsers that is being developed as part of the NWO Priority Programme on Language and Speech Technology. An overview of the Programme can be found in [4]. An important goal of the Programme is the implementation of a spoken dialogue system for public transport information (the OVIS system). The language of the system is Dutch.

In the context of the OVIS system, it is important that the parser can deal with input from the speech recognizer. The interface between the speech recognizer and the parser consists of word-graphs. In section 5 I show how the head-corner parser is generalized to deal with word-graphs.

Moreover, the nature of the application also dictates that the parser proceeds in a robust way. In section 6 I discuss the OVIS Robustness component, and I show that the use of a parser which includes top-down prediction is not an obstacle towards robustness.

In section 7 we compare the head-corner parser with the other parsers implemented in the Programme for the OVIS application. It will be shown that the head-corner parser operates much faster than implementations of a bottom-up Earley parser and related chart-based parsers. Moreover, the space requirements are far more modest too. The difference with a left-corner parser, which was derived from the head-corner parser, is small.

We performed similar experiments for the Alvey NL Tools grammar of English [6], and the English grammar of the MiMo2 system [30]. From these experiments it can be concluded that selective memoization with goal-weakening (as applied to head-corner and left-corner parsing) is substantially more efficient than conventional chart-parsing. We conclude that at least for some grammars, head-corner parsing is a good option.

Next: A specification of the Up: An Efficient Implementation of Previous: An Efficient Implementation of

Noord G.J.M. van
1998-09-24