A possible implementation of the question mark operator is the
introduction of a special symbol ? in finite state
automata.2 This special symbol is understood as `any alphabet symbol not
mentioned in the automaton', in order to translate examples such as
?-a
. This technique requires that each question mark operator
is expanded into the set of symbols occurring in the regular
expression as a whole. This solution (implemented in a previous
version of the FSA Utilities [34] and in
xfst, the Xerox regular expression compiler [18])
therefore leads to a proliferation of transitions. For example, the
expression
would result in an automaton with 52
transitions: 26 transitions from the initial state to an intermediate
state for each of the letters of the alphabet and 26 transitions from
this intermediate state to a final state for each of the letters
except `d', as well as for ?.3
The idea to allow predicates on transitions instead of symbols is also mentioned in [37] and [38]. The details of this proposal, however, are not given. Apparently, in Watson's proposal predicates potentially inspect arbitrary parts of the input, and consume arbitrary prefixes of the input; the resulting formalism is therefore much more powerful, and hence various closure and efficiency properties are not applicable. In contrast, for the type of predicates proposed here, these attractive properties in fact are applicable, as is shown in the remainder of the article.