In experimenting with finite-state approximation techniques for
context-free and more powerful grammatical formalisms (such as the
techniques presented in
[2],
[17],
[19],
[18],
[6],
[14],
[15],
[10]
) we have found that the resulting automata often
are extremely large. Moreover, the automata contain many
-moves ( jumps). And finally, if such automata are
determinised then the resulting automata are often smaller. It
turns out that a straightforward implementation of the subset
construction determinisation algorithm performs badly for such
inputs. In this paper we consider a number of variants of the
subset-construction algorithm which differ in their treatment of
-moves.
Although we have observed that finite-state approximation techniques
typically yield automata with large amounts of -moves, this
is obviously not a necessity. Instead of trying to improve upon
determinisation techniques for such automata it might be more
fruitful, perhaps, to try to improve these approximation techniques in
such a way that more compact automata are produced. 1 However, because
research into finite-state approximation is still of an exploratory
and experimental nature, it can be argued that more robust
determinisation algorithms do still have a role to play: it can be
expected that approximation techniques are much easier to define and
implement if the resulting automaton is allowed to be
non-deterministic and to contain
-moves.
Note furthermore that even if our primary motivation is in
finite-state approximation, the problem of determinising finite-state
automata with -moves may be relevant in other areas of
language research as well.