Next: Results of the Evaluation
Up: Evaluation Procedure and Criteria
Previous: Computational Resources.
Some indication of the difficulty of the set of 1000 word graphs is
presented in table 1.
A further indication of the difficulty of this set of word graphs is
obtained if we look at the word and sentence accuracy obtained by a
number of simple methods. The method speech takes into account only
the acoustic scores found in the word graph; no language model is
used. The method possible assumes that there is an oracle which
always chooses the best possible path through the word graph. This
method can be seen as a natural upper bound on what can be achieved.
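The two baselines above can be illustrated with a minimal sketch. The word graph representation (a list of transitions with an acoustic cost, lower being better), the toy graph, and all function names are assumptions made for illustration; they are not taken from the OVIS2 system. The oracle is modelled here as picking the path with the smallest word-level edit distance to the reference utterance, and paths are enumerated exhaustively, which is only feasible for very small graphs.

```python
from typing import Dict, Iterator, List, Tuple

# Hypothetical representation: (from_state, to_state, word, acoustic_cost).
# Costs are assumed to behave like negative log probabilities (lower is better).
Transition = Tuple[int, int, str, float]


def paths(graph: List[Transition], start: int, final: int) -> Iterator[List[Transition]]:
    """Enumerate every path from start to final in an acyclic word graph."""
    outgoing: Dict[int, List[Transition]] = {}
    for t in graph:
        outgoing.setdefault(t[0], []).append(t)

    def walk(state: int, prefix: List[Transition]) -> Iterator[List[Transition]]:
        if state == final:
            yield prefix
            return
        for t in outgoing.get(state, []):
            yield from walk(t[1], prefix + [t])

    yield from walk(start, [])


def words(path: List[Transition]) -> List[str]:
    return [t[2] for t in path]


def edit_distance(hyp: List[str], ref: List[str]) -> int:
    """Word-level Levenshtein distance (substitutions, insertions, deletions)."""
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, 1):
        cur = [i]
        for j, r in enumerate(ref, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (h != r)))
        prev = cur
    return prev[-1]


def speech_path(graph: List[Transition], start: int, final: int) -> List[Transition]:
    """'speech': the path with the lowest total acoustic cost, no language model."""
    return min(paths(graph, start, final), key=lambda p: sum(t[3] for t in p))


def possible_path(graph: List[Transition], start: int, final: int,
                  reference: List[str]) -> List[Transition]:
    """'possible': the oracle path, i.e. the one closest to the reference."""
    return min(paths(graph, start, final),
               key=lambda p: edit_distance(words(p), reference))


# Toy word graph with two competing words on each of the first two arcs.
graph = [
    (0, 1, "i", 1.0), (0, 1, "a", 0.5),
    (1, 2, "want", 1.0), (1, 2, "won't", 0.8),
    (2, 3, "tickets", 1.0),
]
reference = ["i", "want", "tickets"]

print(words(speech_path(graph, 0, 3)))               # ['a', "won't", 'tickets']
print(words(possible_path(graph, 0, 3, reference)))  # ['i', 'want', 'tickets']
```

In this toy case the acoustically cheapest path differs from the oracle path, which is the gap between the speech and possible rows that such a comparison is meant to expose.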
The methods speech_bigram and speech_trigram use a
combination of bigram (resp. trigram) statistics and the speech score.
In the latter two cases, a language model was computed from about
50K utterances (not containing the utterances from the test set).
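How a bigram model might be combined with the speech score can be sketched as follows. The text does not specify the combination, so the weighted sum of acoustic cost and negative bigram log probability, the weight `lm_weight`, the fixed penalty for unseen bigrams, and the toy probabilities are all assumptions. The sketch also assumes that states are numbered so that every transition goes from a lower to a higher state.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical representation: (from_state, to_state, word, acoustic_cost).
Transition = Tuple[int, int, str, float]


def best_path_bigram(graph: List[Transition], start: int, final: int,
                     bigram_logprob: Dict[Tuple[str, str], float],
                     lm_weight: float = 1.0,
                     unseen_logprob: float = math.log(1e-4)) -> List[str]:
    """Dynamic programming over (state, previous word) pairs.

    Each step costs acoustic_cost + lm_weight * -log P(word | previous word).
    """
    # best[(state, last_word)] = (total_cost, word_sequence)
    best: Dict[Tuple[int, str], Tuple[float, List[str]]] = {(start, "<s>"): (0.0, [])}
    # Sorting by source state visits transitions in topological order,
    # given the state numbering assumed above.
    for src, dst, word, acoustic in sorted(graph):
        for (state, prev), (cost, seq) in list(best.items()):
            if state != src:
                continue
            lm_cost = -bigram_logprob.get((prev, word), unseen_logprob)
            total = cost + acoustic + lm_weight * lm_cost
            key = (dst, word)
            if key not in best or total < best[key][0]:
                best[key] = (total, seq + [word])
    finals = [v for (state, _), v in best.items() if state == final]
    return min(finals, key=lambda v: v[0])[1]


graph = [
    (0, 1, "i", 1.0), (0, 1, "a", 0.5),
    (1, 2, "want", 1.0), (1, 2, "won't", 0.8),
    (2, 3, "tickets", 1.0),
]
# Toy bigram log probabilities; every other bigram counts as unseen.
bigrams = {
    ("<s>", "i"): math.log(0.5),
    ("i", "want"): math.log(0.4),
    ("want", "tickets"): math.log(0.3),
}

print(best_path_bigram(graph, 0, 3, bigrams))  # ['i', 'want', 'tickets']
```

A trigram variant would track the previous two words in the dynamic programming key instead of one. Setting `lm_weight` to 0 reduces the search to the acoustic score alone.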
The results are summarised in table 2.
Table 2: Characterisation of test set (2). Word accuracy (WA) and
sentence accuracy (SA), in percent, based on acoustic scores only
(speech); using the best possible path through the word graph
(possible); and using a combination of bigram (resp. trigram) scores
and acoustic scores (speech_bigram, resp. speech_trigram).

method         | WA   | SA   |
speech         | 69.8 | 56.0 |
possible       | 90.5 | 83.7 |
speech_bigram  | 81.1 | 73.6 |
speech_trigram | 83.9 | 76.2 |

During the development of the NLP components of OVIS2, word graphs
were typically small: about 4 transitions per word on average. During
the evaluation, however, the number of transitions per word for the
test set was much larger. It turned out that the NLP components had
trouble with very large word graphs (both memory and CPU-time
requirements increase rapidly).
Recently, improvements have been made in the treatment of such large
word graphs. For example, the grammar-based NLP component has been
extended with a heuristic version of the search algorithm which is not
guaranteed to find the best path. In practice this implementation
returns the same answers as the original search algorithm, but two
orders of magnitude faster.
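The text does not say which heuristic is used. Purely as an illustration of how a search can trade optimality for speed, the following sketch uses beam pruning as a stand-in technique: at each state only the `beam_width` cheapest hypotheses are extended. The graph format, the context-dependent cost function `toy_lm_cost`, and all names are hypothetical, and states are again assumed to be numbered in topological order.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List, Optional, Tuple

# Hypothetical representation: (from_state, to_state, word, acoustic_cost).
Transition = Tuple[int, int, str, float]
Hypothesis = Tuple[float, List[str]]


def beam_search(graph: List[Transition], start: int, final: int,
                step_cost: Callable[[List[str], str], float],
                beam_width: int = 2) -> Optional[List[str]]:
    """Left-to-right search that keeps at most beam_width hypotheses per state."""
    outgoing: DefaultDict[int, List[Transition]] = defaultdict(list)
    for t in graph:
        outgoing[t[0]].append(t)

    hyps: DefaultDict[int, List[Hypothesis]] = defaultdict(list)
    hyps[start] = [(0.0, [])]
    for state in sorted(outgoing):
        # Pruning step: hypotheses outside the beam are never extended,
        # so the globally best path can be lost.
        beam = sorted(hyps[state], key=lambda h: h[0])[:beam_width]
        for _, dst, word, acoustic in outgoing[state]:
            for cost, seq in beam:
                hyps[dst].append((cost + acoustic + step_cost(seq, word),
                                  seq + [word]))
    if not hyps[final]:
        return None
    return min(hyps[final], key=lambda h: h[0])[1]


def toy_lm_cost(history: List[str], word: str) -> float:
    """Toy context-dependent cost: 'want' is expensive unless it follows 'i'."""
    if word == "want" and history[-1:] != ["i"]:
        return 5.0
    return 0.0


graph = [
    (0, 1, "a", 0.5), (0, 1, "i", 1.0),
    (1, 2, "want", 1.0),
]

print(beam_search(graph, 0, 2, toy_lm_cost, beam_width=1))  # ['a', 'want']
print(beam_search(graph, 0, 2, toy_lm_cost, beam_width=2))  # ['i', 'want']
```

With a beam of one, the locally cheaper hypothesis "a" crowds out "i" and the search ends on the more expensive path; with a wider beam both searches agree. A heuristic of this kind returns the exact answer whenever the best path survives pruning at every state.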
2000-07-10