Head-corner parsing and Robustness

Certain approaches towards robust parsing use the partial results of the parser. In such approaches it is assumed that even if no full parse for the input could be constructed, the discovery of other phrases in the input might still be useful. In order for such approaches to work it is often assumed that a bottom-up parser is essential: parsers that use top-down information (such as the head-corner parser) may fail to recognize relevant sub-parses in the context of an ungrammaticality.

In the application for which the head-corner parser was developed, robust processing is essential. In a spoken dialogue system it is often impossible to parse a full sentence, but in such cases the recognition of e.g. temporal expressions might still be very useful. Therefore, a robust processing technique which collects the remnants of the parsing process in a meaningful way seems desirable.

In this subsection we show how the head-corner parser can be used in such circumstances. The approach consists of two parts. First, the parser is modified so that it finds all derivations of the start symbol anywhere in the input. Second, the start symbol is defined in such a way that it includes all categories considered useful for the application.

Underspecification of the positions

Normally, the head-corner parser is called with a goal such as:

?- parse(s(Sem),0,12).
indicating that we want to parse a sentence from position 0 to 12 with category s(Sem) (a sentence with a semantic representation that is yet to be discovered). Suppose, however, that a specific robustness module is interested in all `maximal projections' anywhere in the sentence. Such a maximal projection may be represented by a term xp(Sem). Furthermore, there may be unary grammar rules rewriting such an xp into appropriate categories, e.g.:

xp(Sem) --> np(Sem).
xp(Sem) --> s(Sem).
xp(Sem) --> pp(Sem).
xp(Sem) --> advp(Sem).

If we want to recognize all maximal projections at all positions in the input, then we can simply give the following parse goal:

?- parse(xp(Sem),_,_).

One might expect that such an underspecified goal would dramatically slow down the head-corner parser, but this turns out not to be the case. In fact, we have observed an increase in efficiency when using underspecification. This can only be understood in the light of memoization: even though we now have a much more general goal, the number of different goals that we need to solve is much smaller.
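
To illustrate this interaction with memoization, the following is a toy sketch using SWI-Prolog's tabling directive; the actual head-corner parser implements its own memoization, and all predicates below are hypothetical stand-ins rather than the parser's real interface.

% A toy sketch (SWI-Prolog): tabling plus an underspecified goal.
:- table xp/3.

% toy input: "naar amsterdam"
word(naar, 0, 1).
word(amsterdam, 1, 2).

np(sem(W), P0, P)      :- word(W, P0, P), W \== naar.
pp(sem(naar,N), P0, P) :- word(naar, P0, P1), np(sem(N), P1, P).

xp(Sem, P0, P) :- np(Sem, P0, P).
xp(Sem, P0, P) :- pp(Sem, P0, P).

% ?- xp(Sem, P0, P).
% enumerates all maximal projections at all positions; the single table
% for this general goal is filled once and then reused, whereas
% instantiated goals such as xp(_,0,2), xp(_,1,2), ... would each be
% tabled separately.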

Also note that even though the first call to the parse predicate has variable extreme positions, this does not imply that all top-down prediction is lost: recursive calls to the parse predicate may still have instantiated left and/or right extreme positions. The same applies, with even more force, to top-down information on categories.

The Robustness Component in OVIS

In an attempt to obtain a robust natural language understanding component, we have experimented in OVIS with the techniques described in the preceding subsection. The top category (start symbol) of the OVIS grammar is defined as the category max(Sem). Moreover, there are unary rules such as max(Sem) --> np(Sem,..) for each of NP, S, PP, and ADVP.
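
Spelled out in the rule notation used earlier, these unary rules might look as follows; this is a sketch, in which the additional arguments of the OVIS categories (elided as `..' above) are abbreviated with anonymous variables, and the actual arities may differ.

max(Sem) --> np(Sem,_).    % extra arguments abbreviated
max(Sem) --> s(Sem,_).
max(Sem) --> pp(Sem,_).
max(Sem) --> advp(Sem,_).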

In the first phase, the parser finds all occurrences of the top category in the input word-graph. Thus, we obtain items for all possible maximal projections anywhere in the input graph. In the second phase, the robustness component selects a sequence of such maximal projections. The robustness procedure consists of a best-first search from the beginning of the graph to the end of the graph. A path in the input graph can be constructed by taking steps of the following two types: to move from position P to Q, you can either use a maximal projection item found by the parser that spans P to Q, or skip a transition of the word-graph from P to Q (ignoring the word on that transition).

In order to compare paths in the best-first search method, we have experimented with score functions which include some or all of the following factors: the number of transitions that are skipped, the acoustic scores of the transitions (as given by the word-graph), and bigram scores.

If bigram scores are not included, then this best-first search method can be implemented efficiently, because for each state in the word-graph we only need to keep track of the best path to that state. (With bigram scores, the cost of extending a path depends on the last word of that path, so paths arriving at the same state can no longer be collapsed.)
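
The following is a minimal sketch of such a best-first (uniform-cost) search in Prolog, assuming bigram scores are not used and skipping a transition incurs a fixed penalty; the predicates mp_item/4 and transition/3, the toy facts, and the penalty value are hypothetical stand-ins, and the OVIS implementation differs in its details.

% toy word-graph and parser items (hypothetical)
transition(naar, 0, 1).
transition(amsterdam, 1, 2).
mp_item(pp(sem(naar,amsterdam)), 0, 2, 0.5).

% a step either uses a maximal projection or skips one transition
edge(P, Q, Cost, mp(Cat)) :- mp_item(Cat, P, Q, Cost).
edge(P, Q, 1.0,  skip(W)) :- transition(W, P, Q).   % illustrative penalty

% best_path(+Start, +Final, -Steps, -Cost)
best_path(Start, Final, Steps, Cost) :-
    search([state(0.0,Start,[])], Final, [], Rev, Cost),
    reverse(Rev, Steps).

% The agenda is kept sorted by cost, and each state is expanded at most
% once, so only the best path to each word-graph state is pursued.
search([state(Cost,Final,Steps)|_], Final, _, Steps, Cost) :- !.
search([state(C,P,Steps)|Agenda0], Final, Seen, Path, Cost) :-
    (   memberchk(P, Seen)
    ->  search(Agenda0, Final, Seen, Path, Cost)
    ;   findall(state(C1,Q,[Step|Steps]),
                (edge(P, Q, StepC, Step), C1 is C + StepC),
                Next),
        append(Agenda0, Next, Agenda1),
        msort(Agenda1, Agenda),
        search(Agenda, Final, [P|Seen], Path, Cost)
    ).

% ?- best_path(0, 2, Steps, Cost).
% Steps = [mp(pp(sem(naar,amsterdam)))], Cost = 0.5.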

The resulting `best' path in general consists of a number of maximal projections. In the OVIS application these are often simple time or place expressions. The pragmatic module is able to deal with such unconnected pieces of information, and performs better when given such partial parse results.

In order to evaluate the appropriate combination of the factors determining the scoring function, and to compare this approach with other approaches, we use a corpus of word-graphs for which we know the corresponding actual utterances. We compare the sentence associated with the `best' path in the word-graph with the sentence that was actually spoken. Clearly, if the robustness component more often uses the information that was actually uttered, we have more confidence in that component. This notion of word accuracy is an approximation of semantic accuracy (or `concept accuracy').

The string comparison is defined by the minimal number of deletions and insertions needed to turn the first string into the second (Levenshtein distance), although it may be worthwhile to investigate other measures. For example, it seems likely that for our application it is less problematic to `miss' information, whereas `hallucination' is a more severe problem. This could be formalized by a scoring function in which insertion (into the analysis result) is cheaper than deletion.
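
Such an asymmetric measure could be sketched as a weighted edit distance; the weights $c_{ins}$ and $c_{del}$ below are illustrative, not values used in OVIS. Let $a_1 \ldots a_m$ be the words of the analysis result and $b_1 \ldots b_n$ the words of the actual utterance, and define:

\begin{displaymath}
d(i,0) = i \cdot c_{del} \qquad d(0,j) = j \cdot c_{ins}
\end{displaymath}

\begin{displaymath}
d(i,j) = \min \left\{ \begin{array}{ll}
d(i-1,j) + c_{del} & \mbox{(delete $a_i$: hallucinated material)}\\
d(i,j-1) + c_{ins} & \mbox{(insert $b_j$: missed material)}\\
d(i-1,j-1) & \mbox{if $a_i = b_j$}
\end{array} \right.
\end{displaymath}

Choosing $c_{ins} < c_{del}$ penalizes hallucination more heavily than missing information; the measure used above is the special case $c_{ins} = c_{del} = 1$.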

Currently, the best results are obtained with a scoring function in which bigram scores, acoustic scores, and the number of skips are included. We have also implemented a version of the system in which acoustic scores and bigram scores are used to select the best path through the word-graph. This path is then sent to the parser and the robustness component. In this `best-1-mode' the system performs somewhat worse in terms of word accuracy, but is much faster (cf. the experiments in the next section).

