next up previous
Next: Subset construction and -moves Up: Introduction Previous: Finite-state Language Processing

Finite-state Approximation and $\epsilon $-moves

In experimenting with finite-state approximation techniques for context-free and more powerful grammatical formalisms (such as the techniques presented in [2], [17], [19], [18], [6], [14], [15], [10] ) we have found that the resulting automata often are extremely large. Moreover, the automata contain many $\epsilon $-moves ( jumps). And finally, if such automata are determinised then the resulting automata are often smaller. It turns out that a straightforward implementation of the subset construction determinisation algorithm performs badly for such inputs. In this paper we consider a number of variants of the subset-construction algorithm which differ in their treatment of $\epsilon $-moves.

Although we have observed that finite-state approximation techniques typically yield automata with large amounts of $\epsilon $-moves, this is obviously not a necessity. Instead of trying to improve upon determinisation techniques for such automata it might be more fruitful, perhaps, to try to improve these approximation techniques in such a way that more compact automata are produced. 1 However, because research into finite-state approximation is still of an exploratory and experimental nature, it can be argued that more robust determinisation algorithms do still have a role to play: it can be expected that approximation techniques are much easier to define and implement if the resulting automaton is allowed to be non-deterministic and to contain $\epsilon $-moves.

Note furthermore that even if our primary motivation is in finite-state approximation, the problem of determinising finite-state automata with $\epsilon $-moves may be relevant in other areas of language research as well.


next up previous
Next: Subset construction and -moves Up: Introduction Previous: Finite-state Language Processing

2000-07-10