next up previous
Next: Future Work Up: Practical Considerations Previous: Membership and Non-membership Predicates


Smaller Transducers

The operations on predicate-augmented finite state recognizers and transducers discussed here have been fully implemented and integrated in a finite state toolkit. Although the implementation of these operations is more involved than for normal automata it turned out that the introduction of predicates has improved performance considerably, because automata are smaller.

For example, consider the soundex algorithm expressed as a regular expression, presented at the Xerox web-site.11 The soundex algorithm maps proper names to four-letter codes, where `similar' names are assigned the same code. This algorithm can be used to match names that are misspelled, for instance due to poor handwriting or voice transmission; similar problems occur in historical archives. A description of the algorithm and some historical remarks are given in [23]. The compilation of the soundex regular expression yields a transducer with 1217 transitions. By design, the soundex algorithm treats various classes of characters identically. Using predicates for each of these classes yields a transducer with 198 transitions. The construction is four times faster as well. Depending on how predicates are implemented, running the resulting transducer might be slower. In our experiments these effects were not noticeable.

The observation that the use of predicates generally leads to transducers with fewer states can be observed in practically relevant examples as well. Consider the following hypothetical phonological rule:


\begin{displaymath}\mbox{\tt e} \rightarrow \mbox{\tt a} / \_\!\_\!\_ \mbox{\tt C a}\end{displaymath}

This rule indicates that an e should be mapped to an a if it is followed by a consonant and an a. Assuming an alphabet consisting of 5 vowels and 21 consonants, the corresponding minimal transducer for this example consists of 24 states and 620 transitions. If predicates are used, the resulting automaton only has 4 states and 10 transitions (cf. figure 2).

Figure: A minimal transducer with predicates implementing the phonological rule $\mbox{\tt e} \rightarrow \mbox{\tt a} /
\_\!\_\!\_ \mbox{\tt C a}$. The equivalent minimal transducer without predicates has 24 states and 620 transitions.
\begin{figure}
\par\footnotesize\par\begin{pspicture}(-1,-1)(10,6)
\psset{unit=0...
...gle\overline{\{\mbox{\tt a,e,i,o,u}\}}\rangle$}}
\end{pspicture}\par\end{figure}


next up previous
Next: Future Work Up: Practical Considerations Previous: Membership and Non-membership Predicates
Noord G.J.M. van
2001-06-22