Test Set

Table 1 gives some indication of the difficulty of the set of 1000 word graphs. A further indication is given by the word accuracy (WA) and sentence accuracy (SA) obtained by a number of simple methods. The method speech takes only the acoustic scores found in the word graph into account; no language model is used. The method possible assumes an oracle that always chooses the best possible path through the word graph. This method can be seen as a natural upper bound on what can be achieved.

The methods speech_bigram and speech_trigram use a combination of bigram (resp. trigram) statistics and the speech score. In the latter two cases, the language model was computed from about 50K utterances (not containing the utterances from the test set). The results are summarised in table 2.
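To make the difference between speech and speech_bigram concrete, the following is a minimal sketch of a best-path search through a word graph. It rests on several assumptions that are not taken from the evaluation itself: scores are treated as costs (lower is better), states are numbered so that every transition goes from a lower to a higher state, unseen bigrams receive a fixed penalty, and the function name, the weight parameter and the toy data are all invented for illustration.

```python
from collections import defaultdict

def best_path(edges, start, final, bigram_cost=None, lm_weight=1.0, unseen_cost=10.0):
    """Return (cost, words) of the cheapest path from start to final, or None.

    edges: list of (src, dst, word, acoustic_cost) with src < dst (assumption),
           so visiting states in increasing order is a topological order.
    bigram_cost: dict mapping (previous_word, word) to a cost, or None to use
           acoustic costs only, as in the speech method.
    """
    outgoing = defaultdict(list)
    for src, dst, word, cost in edges:
        outgoing[src].append((dst, word, cost))

    # best[(state, last_word)] = (total_cost, words_so_far).
    # The last word is part of the key because the bigram cost depends on it.
    best = {(start, "<s>"): (0.0, [])}
    for state in sorted({src for src, _, _, _ in edges}):
        # Snapshot the hypotheses ending in this state before extending them.
        here = [(key, val) for key, val in best.items() if key[0] == state]
        for (_, prev_word), (total, words) in here:
            for dst, word, acoustic in outgoing[state]:
                lm = 0.0
                if bigram_cost is not None:
                    lm = lm_weight * bigram_cost.get((prev_word, word), unseen_cost)
                candidate = (total + acoustic + lm, words + [word])
                key = (dst, word)
                if key not in best or candidate[0] < best[key][0]:
                    best[key] = candidate

    finals = [val for key, val in best.items() if key[0] == final]
    return min(finals, key=lambda val: val[0]) if finals else None

# Invented toy word graph with two competing words per position.
toy_edges = [(0, 1, "ik", 1.0), (0, 1, "dik", 0.8),
             (1, 2, "wil", 1.0), (1, 2, "wel", 0.9)]
toy_bigram = {("<s>", "ik"): 0.5, ("<s>", "dik"): 3.0,
              ("ik", "wil"): 0.5, ("ik", "wel"): 2.0}

speech_only = best_path(toy_edges, 0, 2)
speech_bigram = best_path(toy_edges, 0, 2, bigram_cost=toy_bigram)
```

On this toy graph the acoustic costs alone prefer the path "dik wel", whereas adding the bigram costs selects "ik wil". A trigram variant would extend the hypothesis key with the last two words instead of one.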

Table 2: Characterisation of test set (2). Word accuracy (WA) and sentence accuracy (SA) based on acoustic score only (speech); using the best possible path through the word graph, as chosen by an oracle (possible); and using a combination of bigram (resp. trigram) scores and acoustic scores (speech_bigram, speech_trigram).

method	WA	SA
speech	69.8	56.0
possible	90.5	83.7
speech_bigram	81.1	73.6
speech_trigram	83.9	76.2
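
The two measures in the table can be sketched as follows. This section does not define them, so the definitions used here are assumptions: word accuracy is taken to be 1 - d/n, with d the word-level edit distance between the chosen path and the reference utterance and n the total number of reference words, summed over the whole test set; sentence accuracy is taken to be the proportion of utterances whose chosen path matches the reference exactly. The function names and example data are hypothetical.

```python
def edit_distance(hyp, ref):
    """Word-level Levenshtein distance between two lists of words."""
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        cur = [i]
        for j, r in enumerate(ref, start=1):
            cur.append(min(prev[j] + 1,              # extra word in hyp
                           cur[j - 1] + 1,           # word missing from hyp
                           prev[j - 1] + (h != r)))  # match or substitution
        prev = cur
    return prev[-1]

def accuracies(pairs):
    """pairs: list of (hypothesis_words, reference_words).

    Returns (word_accuracy, sentence_accuracy), both as percentages.
    """
    total_distance = sum(edit_distance(hyp, ref) for hyp, ref in pairs)
    total_ref_words = sum(len(ref) for _, ref in pairs)
    exact_matches = sum(hyp == ref for hyp, ref in pairs)
    wa = 100.0 * (1 - total_distance / total_ref_words)
    sa = 100.0 * exact_matches / len(pairs)
    return wa, sa

# Invented example: one utterance recognised perfectly, one with two errors.
example_pairs = [
    (["ik", "wil", "naar", "Amsterdam"], ["ik", "wil", "naar", "Amsterdam"]),
    (["dik", "wel"], ["ik", "wil"]),
]
wa, sa = accuracies(example_pairs)
```

Under the same assumptions, the possible method corresponds to choosing, for each word graph, the path with the smallest edit distance to the reference.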

During the development of the NLP components of OVIS2, word graphs were typically small: about 4 transitions per word on average. During the evaluation, however, the number of transitions per word for the test set was much larger. It turned out that the NLP components had trouble with very large word graphs (both memory and CPU-time requirements increase rapidly).

Recently, the treatment of such large word graphs has been improved. For example, the grammar-based NLP component has been extended with a heuristic version of the search algorithm that is not guaranteed to find the best path. In practice, this implementation returns the same answers as the original search algorithm, but about two orders of magnitude faster.
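
The heuristic used in the grammar-based component is not described in this section. Purely as an illustration of how a search can trade the best-path guarantee for speed, the sketch below uses beam pruning, a different and very common technique: in each state only the cheapest few partial paths are extended. The cost conventions, the bigram table, the function name and the toy data are all assumptions made for the example.

```python
from collections import defaultdict

def beam_best_path(edges, start, final, bigram_cost, beam_width=1, unseen_cost=10.0):
    """Approximate cheapest path, extending only `beam_width` paths per state.

    edges: list of (src, dst, word, acoustic_cost) with src < dst (assumption).
    Returns (cost, words), or None if no path reaches `final`.
    """
    outgoing = defaultdict(list)
    for src, dst, word, cost in edges:
        outgoing[src].append((dst, word, cost))

    # hyps[state] = list of (total_cost, words); "<s>" marks the sentence start.
    hyps = defaultdict(list)
    hyps[start].append((0.0, ["<s>"]))
    for state in sorted(outgoing):
        # Pruning step: discard all but the cheapest partial paths. A discarded
        # path might have become the overall best one later, so the result is
        # not guaranteed to be optimal.
        survivors = sorted(hyps[state], key=lambda h: h[0])[:beam_width]
        for total, words in survivors:
            for dst, word, acoustic in outgoing[state]:
                lm = bigram_cost.get((words[-1], word), unseen_cost)
                hyps[dst].append((total + acoustic + lm, words + [word]))

    if not hyps[final]:
        return None
    total, words = min(hyps[final], key=lambda h: h[0])
    return total, words[1:]  # drop the "<s>" marker

# Invented toy data in which the locally cheapest first word is a poor choice.
beam_edges = [(0, 1, "ik", 1.0), (0, 1, "dik", 0.8),
              (1, 2, "wil", 1.0), (1, 2, "wel", 0.9)]
beam_bigram = {("<s>", "ik"): 0.5, ("<s>", "dik"): 0.5,
               ("ik", "wil"): 0.5, ("ik", "wel"): 2.0,
               ("dik", "wil"): 3.0, ("dik", "wel"): 3.0}

narrow = beam_best_path(beam_edges, 0, 2, beam_bigram, beam_width=1)
wide = beam_best_path(beam_edges, 0, 2, beam_bigram, beam_width=2)
```

With a beam of width 1 the search commits to "dik" after the first word and ends with "dik wel"; with width 2 it recovers the cheaper path "ik wil". On realistic word graphs a moderate beam often finds the same path as an exhaustive search, which is consistent with, though not evidence for, the behaviour reported above.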

2000-07-10