
Conclusions

The grammar-based methods developed in Groningen perform much better than the data-oriented methods developed in Amsterdam. For word graphs, the best data-oriented method obtains a concept-accuracy error rate of 24.5%, whereas the best grammar-based method obtains 17.0%, a relative reduction of the error rate of more than 30%. For sentences a similar difference can be observed: the best data-oriented method obtains a concept-accuracy error rate of 8.5%, whereas the grammar-based method performs more than 40% better, with an error rate of 5.0%. The differences increase with sentence length.
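
The "more than 30%" and "more than 40%" figures are relative reductions of the error rate. The few Python lines below merely recompute them from the reported error rates; the function name is ours, chosen for illustration.

```python
def relative_error_reduction(baseline: float, improved: float) -> float:
    """Relative reduction of an error rate, as a percentage of the baseline."""
    return 100.0 * (baseline - improved) / baseline


# Concept-accuracy error rates (in percent) reported above:
# best data-oriented method vs. best grammar-based method.
word_graphs = relative_error_reduction(24.5, 17.0)
sentences = relative_error_reduction(8.5, 5.0)
print(f"word graphs: {word_graphs:.1f}%")  # prints "word graphs: 30.6%"
print(f"sentences:   {sentences:.1f}%")    # prints "sentences:   41.2%"
```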

The grammar-based methods require fewer computational resources than the data-oriented methods. However, their CPU-time requirements are still excessive for a small number of very large word graphs¹. For sentences, the grammar-based component performs satisfactorily, with a maximum CPU time of 610 milliseconds.

By far the most important problem for the application is the disambiguation of the word graph. The evaluation shows that NLP hardly helps here: a combination of speech scores and trigram scores achieves much better string accuracy than the data-oriented methods. The grammar-based methods incorporate the insight that Ngrams are good at disambiguating word graphs; by including Ngram statistics, they obtain similar string accuracy. To see whether NLP helps at all, we can compare the b(tr,1) method (which simply passes the best path in the word graph to the parser) with any of the other grammar-based methods. For instance, method b(tr,4) performs somewhat better than b(tr,1) (83.0% vs. 82.2% concept accuracy). This shows that NLP is in fact helpful in choosing the best path². If it were feasible to use methods b(tr,N) or f(tr,N) with larger values of N, further improvements might be possible.
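
As an illustration of disambiguating a word graph with speech and trigram scores, the following Python sketch runs a Viterbi search over an acyclic word graph and picks the path that maximises the sum of acoustic log-probabilities plus a weighted sum of trigram log-probabilities. The edge format, the weight `lm_weight`, the toy trigram table and the example words are all hypothetical, and this is not claimed to be the search used in the evaluated systems. It returns only the single best path, i.e. the kind of input that b(tr,1) hands to the parser.

```python
import math
from collections import defaultdict


def best_path(edges, start, final, trigram_logprob, lm_weight=1.0):
    """Viterbi search for the best path through an acyclic word graph.

    `edges` holds (from_node, to_node, word, acoustic_logprob) tuples, and
    nodes are assumed to be numbered in topological order. A path's score is
    the sum of its acoustic log-probabilities plus `lm_weight` times the sum
    of its trigram log-probabilities (including the end-of-sentence trigram).
    Returns (score, words), or None if `final` is unreachable.
    """
    outgoing = defaultdict(list)
    for edge in edges:
        outgoing[edge[0]].append(edge)
    # A search state is (node, w1, w2): the node reached plus the last two
    # words, which is all the history a trigram model needs.
    best = {(start, "<s>", "<s>"): (0.0, None)}  # state -> (score, backpointer)
    for node in sorted({e[0] for e in edges} | {start}):
        for state in [s for s in best if s[0] == node]:
            score, _ = best[state]
            _, w1, w2 = state
            for _, nxt, word, acoustic in outgoing[node]:
                new = score + acoustic + lm_weight * trigram_logprob(w1, w2, word)
                key = (nxt, w2, word)
                if key not in best or new > best[key][0]:
                    best[key] = (new, state)
    finals = [(value[0] + lm_weight * trigram_logprob(s[1], s[2], "</s>"), s)
              for s, value in best.items() if s[0] == final]
    if not finals:
        return None
    score, state = max(finals)
    words = []
    while state is not None:  # follow backpointers to the start state
        words.append(state[2])
        state = best[state][1]
    return score, words[::-1][1:]  # drop the "<s>" of the start state


def toy_trigram(w1, w2, w3):
    # Hypothetical toy model: two known trigrams, a flat floor elsewhere.
    known = {("<s>", "ik", "wil"): 0.6, ("ik", "wil", "naar"): 0.5}
    return math.log(known.get((w1, w2, w3), 0.01))


# Hypothetical word graph: "wiel" scores slightly better acoustically than "wil".
graph = [(0, 1, "ik", -1.0), (1, 2, "wil", -1.2), (1, 2, "wiel", -1.0),
         (2, 3, "naar", -1.0), (3, 4, "Groningen", -1.0)]
print(best_path(graph, 0, 4, toy_trigram)[1])
# prints ['ik', 'wil', 'naar', 'Groningen']
print(best_path(graph, 0, 4, toy_trigram, lm_weight=0.0)[1])
# prints ['ik', 'wiel', 'naar', 'Groningen']
```

With the trigram scores switched off (`lm_weight=0.0`) the acoustically preferred but wrong word wins; with them switched on, the trigram context overrides the small acoustic difference.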

Once a given word graph has been disambiguated, both NLP components work reasonably well, as the concept accuracy obtained for sentences shows. In that setting, too, the grammar-based NLP component performs better than the data-oriented parser; this indicates that the difference in performance between the two components is not (only) due to the introduction of Ngram statistics in the grammar-based NLP component.

The current evaluation has brought some important shortcomings of the DOP approach to light. Two important problems, for which solutions are in the making, are briefly discussed below.

The first is the inadequate definition of subtree probability. Bod's equation (1) turns out to be biased toward analyses derived from subtrees of large corpus trees. The error lies in viewing an annotated corpus as the "flat" collection of all its subtrees: information is lost when the distribution of the analyses that supply the subtrees is ignored. As a result, a large part of the probability mass is consumed by subtrees stemming from relatively rare, large trees in the tree-bank. A better model has been designed that provides a more reliable way of estimating subtree probabilities.
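
To make the bias concrete, the following Python sketch assumes that equation (1) is Bod's DOP1 estimate, in which a subtree's probability is its corpus frequency divided by the total frequency of all corpus subtrees with the same root label. It counts the subtrees (fragments) each corpus tree contributes: every nonterminal child can either be cut off or expanded by one of its own fragments, so the count grows multiplicatively with tree size. The tree encoding and the two-tree corpus are hypothetical, and the improved model mentioned above is not shown.

```python
from fractions import Fraction


def fragment_count(tree):
    """Number of DOP subtrees (fragments) rooted at the root of `tree`.

    A tree is a tuple (label, child, ...); each child is a word (str) or a
    tree. Every nonterminal child is either cut off (left as an open frontier
    node) or expanded by one of its own fragments: a factor 1 + f(child).
    Words are always kept, so a preterminal such as ("N", "trein") has 1.
    """
    count = 1
    for child in tree[1:]:
        if isinstance(child, tuple):
            count *= 1 + fragment_count(child)
    return count


def root_mass_shares(corpus, label):
    """Share of all `label`-rooted fragments that each corpus tree supplies.

    Under a DOP1-style relative-frequency estimate, this is also each tree's
    share of the probability mass of fragments with root `label`.
    """
    def rooted(tree):
        total = fragment_count(tree) if tree[0] == label else 0
        for child in tree[1:]:
            if isinstance(child, tuple):
                total += rooted(child)
        return total

    per_tree = [rooted(tree) for tree in corpus]
    return [Fraction(n, sum(per_tree)) for n in per_tree]


# Hypothetical corpus: each tree occurs exactly once.
small = ("S", ("INT", "ja"))
large = ("S",
         ("NP", ("PRON", "ik")),
         ("VP", ("V", "wil"),
                ("PP", ("P", "naar"), ("NP", ("N", "Groningen")))))
print(fragment_count(small), fragment_count(large))  # prints "2 45"
print(root_mass_shares([small, large], "S"))
# prints [Fraction(2, 47), Fraction(45, 47)]
```

Although both trees occur once, the larger one supplies 45 of the 47 S-rooted fragments and hence about 96% of their probability mass under the assumed estimate, which is the kind of skew described above.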

The second shortcoming is that existing DOP algorithms are unable to generalise over the syntactic structures in the data. Corpus-based methods such as the current implementation of DOP assume that the tree-bank from which they acquire the parser is a sufficiently rich sample of the domain; that is, that the part of the annotation scheme actually instantiated in the tree-bank does not under-generate on sentences of the domain. Our current tree-bank does not meet this assumption: for only approximately 90-91% of unseen domain utterances can the tree-bank grammar be expected to generate a parse space containing the correct syntactic/semantic tree. This figure is an upper bound on the accuracy of any probabilistic model. Enlarging the tree-bank does not guarantee good coverage, however, since the tree-bank will always represent only a sample of the domain. A solution to this problem is the development of automatic methods that generalise grammars to enhance their coverage; the goal is to improve both accuracy and coverage by generalising over the structures encountered in the tree-bank.
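
A coverage figure such as the 90-91% above can be estimated by checking, for held-out utterances with a gold tree, whether that tree lies in the parse space of the tree-bank grammar. Assuming, as in DOP1, that every subtree of a corpus tree is a fragment, this reduces to a simple test: each depth-one rule of a corpus tree is itself a fragment, and every derived tree is assembled from corpus fragments, so a gold tree is derivable exactly when each of its depth-one rules (lexical ones included) occurs somewhere in the tree-bank. The Python sketch below implements that test on hypothetical toy trees; it is one plausible way to obtain such an upper bound, not necessarily the procedure used here.

```python
def rules(tree):
    """Yield the depth-one rules (label, child labels or words) of a tree.

    A tree is a tuple (label, child, ...); each child is a word (str) or a tree.
    """
    yield tree[0], tuple(c if isinstance(c, str) else c[0] for c in tree[1:])
    for child in tree[1:]:
        if isinstance(child, tuple):
            yield from rules(child)


def coverage(treebank, held_out):
    """Fraction of held-out gold trees that the tree-bank grammar can derive."""
    grammar = {rule for tree in treebank for rule in rules(tree)}
    derivable = sum(all(rule in grammar for rule in rules(tree))
                    for tree in held_out)
    return derivable / len(held_out)


# Hypothetical toy data.
treebank = [
    ("S", ("NP", ("PRON", "ik")),
          ("VP", ("V", "wil"), ("PP", ("P", "naar"), ("NP", ("N", "Groningen"))))),
]
held_out = [
    # Identical to the tree-bank tree, so derivable.
    ("S", ("NP", ("PRON", "ik")),
          ("VP", ("V", "wil"), ("PP", ("P", "naar"), ("NP", ("N", "Groningen"))))),
    # Needs the rule VP -> V, which the tree-bank never instantiates.
    ("S", ("NP", ("PRON", "ik")), ("VP", ("V", "wil"))),
]
print(coverage(treebank, held_out))  # prints 0.5
```

Whatever the probability model, it can only select among trees that pass this test, which is why such a figure bounds accuracy from above and why generalising the grammar, rather than re-weighting it, is needed to raise it.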



2000-07-10