Improving question-answering systems using phonetic information
Jelte van Waterschoot and Mariet Theune


Most multimodal agents use spoken language to interact with users. Speech recognition currently works well in isolated environments, but in real-world settings recognition errors occur more frequently: accents, overlapping voices, and background noise all degrade recognition performance. Our research focuses on multimodal question-answering (QA) systems that work with a database of questions and corresponding answers. For such a system, an approximation of the user’s question can be sufficient for providing the correct answer; a perfect transcription is not necessary. We propose to deal with speech recognition errors via a method called phonetic encoding, which transforms sentences (the questions in the QA database as well as the speech recognition results) into their phonetic representations. We then use both the original transcript and its phonetic encoding to improve the accuracy of matching the user’s utterance to questions in the database. We investigated different methods for encoding sentences phonetically, continuing the work of Wang, Artstein, Leuski, and Traum (2011). The evaluation of the system is ongoing and will be presented at CLIN 29.
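To illustrate the general idea (a minimal sketch, not our actual implementation), the Python fragment below uses classic Soundex as a stand-in phonetic encoder and ranks database questions by the overlap of surface tokens and phonetic codes with the recognized transcript. The encoder choice and the Jaccard similarity measure are assumptions made for illustration only.

    # Sketch only: Soundex stands in for the phonetic encoder; any
    # grapheme-to-phoneme method could take its place.
    def soundex(word: str) -> str:
        """Classic Soundex: first letter plus up to three consonant-class digits."""
        codes = {**dict.fromkeys("bfpv", "1"), **dict.fromkeys("cgjkqsxz", "2"),
                 **dict.fromkeys("dt", "3"), "l": "4",
                 **dict.fromkeys("mn", "5"), "r": "6"}
        word = word.lower()
        encoded = word[0].upper()
        prev = codes.get(word[0], "")
        for ch in word[1:]:
            code = codes.get(ch, "")
            if code and code != prev:
                encoded += code
            if ch not in "hw":  # h and w do not reset the previous code
                prev = code
        return (encoded + "000")[:4]

    def phonetic_match(transcript: str, questions: list[str]) -> str:
        """Pick the DB question with the highest combined word + phonetic-code overlap."""
        def features(sentence: str) -> set[str]:
            words = sentence.lower().split()
            # Combine surface tokens with their phonetic codes, mirroring the
            # idea of matching on both the transcript and its encoding.
            return set(words) | {soundex(w) for w in words}
        t = features(transcript)
        return max(questions, key=lambda q: len(t & features(q)) / len(t | features(q)))

For example, a misrecognized transcript such as "wear is the meuseum" would still share the phonetic codes of "where is the museum" with the intended database question, even though the surface tokens differ.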

Wang, W. Y., Artstein, R., Leuski, A., & Traum, D. (2011). Improving spoken dialogue understanding using phonetic mixture models. In Proceedings of the Twenty-Fourth International Florida Artificial Intelligence Research Society Conference (FLAIRS-24) (pp. 225–238).