next up previous
Next: Overview Up: Introduction Previous: Determinization of non-functional transducers

Previous Work

A possible implementation of the question mark operator is the introduction of a special symbol ? in finite state automata.2 This special symbol is understood as `any alphabet symbol not mentioned in the automaton', in order to translate examples such as ?-a. This technique requires that each question mark operator is expanded into the set of symbols occurring in the regular expression as a whole. This solution (implemented in a previous version of the FSA Utilities [34] and in xfst, the Xerox regular expression compiler [18]) therefore leads to a proliferation of transitions. For example, the expression $(a..z \cdot ? - d)$ would result in an automaton with 52 transitions: 26 transitions from the initial state to an intermediate state for each of the letters of the alphabet and 26 transitions from this intermediate state to a final state for each of the letters except `d', as well as for ?.3

The idea to allow predicates on transitions instead of symbols is also mentioned in [37] and [38]. The details of this proposal, however, are not given. Apparently, in Watson's proposal predicates potentially inspect arbitrary parts of the input, and consume arbitrary prefixes of the input; the resulting formalism is therefore much more powerful, and hence various closure and efficiency properties are not applicable. In contrast, for the type of predicates proposed here, these attractive properties in fact are applicable, as is shown in the remainder of the article.


next up previous
Next: Overview Up: Introduction Previous: Determinization of non-functional transducers
Noord G.J.M. van
2001-06-22