In experimenting with finite-state approximation techniques for context-free and more powerful grammatical formalisms (such as the techniques presented in [2], [17], [19], [18], [6], [14], [15], [10] ) we have found that the resulting automata often are extremely large. Moreover, the automata contain many -moves ( jumps). And finally, if such automata are determinised then the resulting automata are often smaller. It turns out that a straightforward implementation of the subset construction determinisation algorithm performs badly for such inputs. In this paper we consider a number of variants of the subset-construction algorithm which differ in their treatment of -moves.
Although we have observed that finite-state approximation techniques typically yield automata with large amounts of -moves, this is obviously not a necessity. Instead of trying to improve upon determinisation techniques for such automata it might be more fruitful, perhaps, to try to improve these approximation techniques in such a way that more compact automata are produced. 1 However, because research into finite-state approximation is still of an exploratory and experimental nature, it can be argued that more robust determinisation algorithms do still have a role to play: it can be expected that approximation techniques are much easier to define and implement if the resulting automaton is allowed to be non-deterministic and to contain -moves.
Note furthermore that even if our primary motivation is in finite-state approximation, the problem of determinising finite-state automata with -moves may be relevant in other areas of language research as well.