Finite State Calculus

Finite state approaches have proven to be very successful for efficient encoding of phonological rules. In particular, the work of [8] has provided a compiler from classical generative phonology rewriting rules to finite state transducers. This work has clearly shown how apparently procedural rules can be recast in a declarative, reversible framework.

In the process of developing their rule compiler, Kaplan & Kay also developed a high-level finite state calculus. They argue convincingly that this calculus provides an appropriate high-level approach for expressing regular languages and relations. The alternative conception in term of states and transitions can become unwieldy for all but the simplest cases.¹

Kaplan & Kay's finite state calculus now exists in multiple implementations, the most well-known of which is that of [13]. In this paper, however, we will use the alternative implementation provided by the FSA Utilities [22,23,24]. The FSA Utilities allows the programmer to introduce new regular expression operators of arbitrary complexity. This higher-level interface allows us to express our algorithm more easily. The syntax of the FSA Utilities calculus is summarized in Table 1.

`[]`	empty string
`[E1,E2,...,En]`	concatenation of `E1...En`
`{}`	empty language
`{E1,E2,...,En}`	union of `E1...En`
`(E)`	grouping for op. precedence
`E*`	Kleene closure
`E+`	Kleene plus
`E^`	optionality
`E1 - E2`	difference
`~E`	complement
`$ E`	containment
`E1 & E2`	intersection
`?`	any symbol
`E1 x E2`	cross-product
`A o B`	composition
`domain(E)`	domain of a transduction
`range(E)`	range of a transduction
`identity(E)`	identity transduction²
`inverse(E)`	inverse transduction

The finite state calculus has proven to be a very useful tool for the development of higher-level finite state operators [10,14,11,4]. An interesting feature of most such operators is that they are implemented using a generate-and-test paradigm. [11], for example, introduces an algorithm for a leftmost-longest replacement operator. Somewhat simplified, we may view this algorithm as having two steps. First, the generator freely marks up possible replacement sites. Then the tester, which is an identity transducer, filters out those cases not conforming to the leftmost-longest strategy. Since the generator and tester are both implemented as transducers, they can be composed into a single transducer, which eliminates the inefficiency normally associated with generate-and-test algorithms.