Next: Future Work
Up: Practical Considerations
Previous: Membership and Non-membership Predicates
Smaller Transducers
The operations on predicate-augmented finite state recognizers and
transducers discussed here have been fully implemented and integrated
in a finite state toolkit. Although the implementation of these
operations is more involved than for normal automata it turned out
that the introduction of predicates has improved performance
considerably, because automata are smaller.
For example, consider the soundex algorithm expressed as a regular
expression, presented at the Xerox web-site.11 The
soundex algorithm maps proper names to four-letter codes, where
`similar' names are assigned the same code. This algorithm can be
used to match names that are misspelled, for instance due
to poor handwriting or voice transmission; similar problems occur in
historical archives. A description of the algorithm and some
historical remarks are given in [23].
The compilation of the
soundex regular expression yields a transducer with 1217 transitions.
By design, the soundex algorithm treats various classes of characters
identically. Using predicates for each of these classes yields a
transducer with 198 transitions. The construction is four times faster
as well. Depending on how predicates are implemented, running the
resulting transducer might be slower. In our experiments these effects
were not noticeable.
The observation that the use of predicates generally leads to
transducers with fewer states can be observed in practically relevant
examples as well. Consider the following hypothetical phonological
rule:
This rule indicates that an e should be mapped to an a if it is
followed by a consonant and an a. Assuming an alphabet consisting of
5 vowels and 21 consonants, the corresponding minimal transducer for
this example consists of 24 states and 620 transitions. If predicates
are used, the resulting automaton only has 4 states and 10
transitions (cf. figure 2).
Figure:
A minimal transducer with predicates implementing the
phonological rule
. The equivalent minimal
transducer without predicates has 24 states and 620
transitions.
 |
Next: Future Work
Up: Practical Considerations
Previous: Membership and Non-membership Predicates
Noord G.J.M. van
2001-06-22