Wendy Tromp (2003)
Predicting phonemes
The use of language models in a new way of speech recognition
Master's thesis, Rijksuniversiteit Groningen.
[ Paper (PDF, 5427 kb) ]


A major disadvantage of current automated speech recognition systems is that they cannot distinguish between background noise and speech, and therefore interpret all input as speech. Tjeerd Andringa describes this in the first chapter of his thesis, Continuity Preserving Signal Processing (Andringa, 2002).
He and the company Human Quality Speech Technologies (HuQ for short) work on techniques that use characteristics of the human voice to separate speech from noise. In this process, voiced (periodic) and unvoiced (aperiodic) parts are treated separately. Even though both voiced and unvoiced parts can serve as a basis for speech recognition, the unvoiced parts are more difficult to separate from the noise than the (robust) voiced parts. A tool is needed that can predict which voiceless parts (either in the future or in the past) can be expected (chapter 7, paragraph 2 of Andringa, 2002), using knowledge (or hypotheses) about the voiced parts of the signal.
In this research, a Perl script creates a language model from a corpus of spontaneously spoken language by extracting phonotactic rules from it. Phonotactic rules describe which sequences of letters, or in this case phonemes, occur in a language. With this language model, a guided search can be conducted for the correct voiceless phonemes.
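To make the idea of a phonotactic language model concrete, the following is a minimal sketch in Python, not the Perl script used in the thesis. It assumes a toy corpus of phoneme sequences, a hypothetical voiceless inventory, and a simple trigram count keyed on the left and right neighbours of a phoneme. The names `CORPUS`, `VOICELESS`, `build_context_model` and `predict_voiceless` are illustrative only.

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus: each utterance is a list of phoneme symbols.
CORPUS = [
    ["s", "t", "a", "t"],
    ["s", "t", "o", "p"],
    ["p", "a", "s"],
    ["t", "a", "s"],
]

# Assumed (incomplete) set of voiceless phonemes, for illustration only.
VOICELESS = {"p", "t", "k", "s", "f"}


def build_context_model(utterances):
    """Count which phoneme occurs between each (left, right) neighbour pair."""
    counts = defaultdict(Counter)
    for phonemes in utterances:
        padded = ["#"] + list(phonemes) + ["#"]  # "#" marks an utterance boundary
        for left, mid, right in zip(padded, padded[1:], padded[2:]):
            counts[(left, right)][mid] += 1
    return counts


def predict_voiceless(model, left, right):
    """Rank voiceless phonemes seen between `left` and `right`, most frequent first."""
    candidates = model.get((left, right), Counter())
    return [(p, n) for p, n in candidates.most_common() if p in VOICELESS]


model = build_context_model(CORPUS)
ranking = predict_voiceless(model, "a", "#")
print(ranking)
```

Given hypotheses about the robust voiced parts (here the vowel "a" followed by the end of the utterance), the ranking narrows the search to the voiceless phonemes that the corpus actually allows in that position. A context that never occurs in the corpus yields an empty list.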

The goal of this research is to investigate how useful knowledge gathered from a corpus is in speech recognition. Two corpora of spoken Dutch have been selected for this purpose. They are first tested for representativeness. Representative language models created from the corpora are then tested for usefulness. The amount of data and knowledge is varied, and a test is conducted to find out whether different speaking styles require different models to represent them.

This paper is built up as follows. Chapter 1 gives the theoretical background of the research. It explains how current automated speech recognition works and how it fails, and proposes a new approach using Continuity Preserving Signal Processing (CPSP) and language models. Chapter 2 presents the corpora used in this paper. Chapter 3 discusses the language models used to improve automated speech recognition, and the test method. Chapter 4 tests the reliability of the language models created from the corpora outlined in Chapter 2, while Chapter 5 tests their usefulness, varying several parameters in testing. Finally, Chapter 6 draws conclusions.