Robust and Efficient Parsing.

Next: Variants of Grammar-based NLP. Up: Grammar-based NLP Previous: Computational Grammar for Dutch.

Robust and Efficient Parsing.

Parsing algorithms for strings can be generalised to parse word graphs (van Noord 1995). In the ideal case, the parser will find a path in the word graph that can be assigned an analysis according to the grammar, such that the path covers the complete time span of the utterance, i.e. the path leads from the start state to a final state. The analysis gives rise to an update of the dialogue state, which is then passed on to the dialogue manager.

However, often no such paths can be found in the word graph, due to:

errors made by the speech recogniser,
linguistic constructions not covered in the grammar, and
irregularities in the spoken utterance.

Even if no full analysis of the word graph is possible, it is usually the case that useful information can be extracted from the word graph. Consider for example the utterance:

$\exg. Ik wil van van Assen naar Amsterdam\\ I want from from Assen to Amsterdam\\ I want to travel from Assen to Amsterdam \par$

The grammar will not assign an analysis to this utterance due to the repeated preposition. However, it would be useful if the parser would discover the prepositional phrases van Assen and naar Amsterdam since in that case the important information contained in the utterance can still be recovered. Thus, in cases where no full analysis is possible the system should fall back on an approach reminiscent of concept spotting. In van Noord et al. (1999) a general algorithm is proposed which achieves this.

The first ingredient to a solution is that the parser is required to discover all occurrences of major syntactic categories (such as noun phrase, prepositional phrase, subordinate sentence, root sentence) anywhere in the word graph. Conceptually, one can think of these categories as edges which are added to the word graph in addition to the transitions produced by the speech recogniser.

For such word graphs annotated with additional category edges, a path can be defined as a sequence of steps where each step is either a transition or a category edge. A transition step is called a `skip'. For a given annotated word graph many paths are possible. On the basis of an appropriate weight function on such paths, it is possible to search for the best path. The search algorithm is a straightforward generalisation of the DAG-SHORTEST-PATH algorithm (Cormen et al. 1990).

The weight function is sensitive to the following factors:

Acoustic score. Obviously, the acoustic score present in the word graph is an important factor.
The number of skips is minimised in order to obtain a preference for the maximal projections found by the parser.
Number of maximal projections. The number of maximal projections is minimised in order to obtain a preference for more extended linguistic analyses over a series of smaller ones.
Ngram statistics.

The grammar-based NLP component is implemented in SICStus Prolog. Below we report on a number of different methods which are all variations with respect to this weight function.

Next: Variants of Grammar-based NLP. Up: Grammar-based NLP Previous: Computational Grammar for Dutch.

2000-07-10