- Wendy Tromp (2003)
- Predicting phonemes
- The use of language models in a new way of speech recognition
- Master's thesis, Rijksuniversiteit Groningen.
A major disadvantage of current automated speech recognition systems is
that they cannot distinguish between background noise and speech,
and interpret all input as speech. Tjeerd Andringa describes this in the
first chapter of his thesis, Continuity Preserving Signal Processing.
He and the company Human Quality Speech Technologies (HuQ for short) work on
techniques that use characteristics of the human voice to separate speech
from noise. In this process, voiced (periodic) and unvoiced (aperiodic)
parts are treated separately. Even though both voiced and unvoiced parts
can serve as a basis for speech recognition, the unvoiced parts are more
difficult to separate from the noise than the (robust) voiced parts. A
tool is needed that can predict which voiceless parts (either in the
future or in the past) can be expected (chapter 7, paragraph 2 of
Andringa, 2002), using knowledge (or hypotheses) about the voiced parts
of the signal.
In this research, a Perl script creates a language model of a corpus of
spontaneously spoken language by extracting phonotactical rules from it.
Phonotactical rules are rules that describe which sequences of letters,
or in this case phonemes, exist in a language. With this language model,
a guided search can be conducted for the correct voiceless phonemes.
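The thesis itself uses a Perl script for this step; as a minimal illustration of the idea, the sketch below builds a bigram phonotactic model from phoneme transcriptions and uses it to rank candidate phonemes given a known (voiced) context. The phoneme symbols and corpus are hypothetical toy data, not taken from the corpora studied here.

```python
from collections import defaultdict

def build_bigram_model(transcriptions):
    """Count phoneme bigrams and turn them into conditional probabilities."""
    counts = defaultdict(lambda: defaultdict(int))
    for phonemes in transcriptions:
        for prev, nxt in zip(phonemes, phonemes[1:]):
            counts[prev][nxt] += 1
    # P(next | prev) = count(prev, next) / count(prev, *)
    model = {}
    for prev, nexts in counts.items():
        total = sum(nexts.values())
        model[prev] = {p: n / total for p, n in nexts.items()}
    return model

def rank_candidates(model, context, candidates):
    """Order candidate phonemes by probability given the preceding phoneme."""
    probs = model.get(context, {})
    return sorted(candidates, key=lambda p: probs.get(p, 0.0), reverse=True)

# Toy corpus: phoneme transcriptions with hypothetical SAMPA-like symbols.
corpus = [["s", "p", "r", "a", "k"], ["s", "p", "e", "l"], ["s", "t", "a", "t"]]
model = build_bigram_model(corpus)
print(rank_candidates(model, "s", ["p", "t", "k"]))  # → ['p', 't', 'k']
```

In this toy corpus "s" is followed by "p" twice and by "t" once, so a guided search for an uncertain phoneme after "s" would try "p" first; "k" never follows "s" here and is ranked last.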
The goal of this research is to investigate the usefulness of knowledge
gathered from a corpus for speech recognition. Two corpora of spoken Dutch
have been selected for this purpose. They will be tested for
representativeness. Representative language models created from the
corpora will be tested for usefulness. There will be variations in the
amount of data and knowledge, and a test is conducted to find out whether
different speaking styles require different models to represent them.
This paper is built up as follows. Chapter 1 presents the theoretical
background behind the research: it explains how current automated speech
recognition works and how it fails, and proposes a new approach using CPSP
and language models. Chapter 2 presents the corpora used in this paper.
Chapter 3 discusses the language models used to improve automated speech
recognition, and the test method. Chapter 4 tests the reliability of the
language models created from the corpora outlined in Chapter 2, while
Chapter 5 tests their usefulness, varying several parameters in testing.
Finally, in Chapter 6, conclusions are drawn.