Finite-state Approximation and -moves

Next: Subset construction and -moves Up: Introduction Previous: Finite-state Language Processing

Finite-state Approximation and $\epsilon$ -moves

In experimenting with finite-state approximation techniques for context-free and more powerful grammatical formalisms (such as the techniques presented in [2], [17], [19], [18], [6], [14], [15], [10] ) we have found that the resulting automata often are extremely large. Moreover, the automata contain many $\epsilon$ -moves ( jumps). And finally, if such automata are determinised then the resulting automata are often smaller. It turns out that a straightforward implementation of the subset construction determinisation algorithm performs badly for such inputs. In this paper we consider a number of variants of the subset-construction algorithm which differ in their treatment of $\epsilon$ -moves.

Although we have observed that finite-state approximation techniques typically yield automata with large amounts of $\epsilon$ -moves, this is obviously not a necessity. Instead of trying to improve upon determinisation techniques for such automata it might be more fruitful, perhaps, to try to improve these approximation techniques in such a way that more compact automata are produced. ¹ However, because research into finite-state approximation is still of an exploratory and experimental nature, it can be argued that more robust determinisation algorithms do still have a role to play: it can be expected that approximation techniques are much easier to define and implement if the resulting automaton is allowed to be non-deterministic and to contain $\epsilon$ -moves.

Note furthermore that even if our primary motivation is in finite-state approximation, the problem of determinising finite-state automata with $\epsilon$ -moves may be relevant in other areas of language research as well.

Next: Subset construction and -moves Up: Introduction Previous: Finite-state Language Processing

2000-07-10