The experiments were performed using the FSA Utilities. The
FSA Utilities tool-box
[20,22,7,23] is a collection of
tools to manipulate regular expressions, finite-state automata and
finite-state transducers. Manipulations include determinisation,
minimisation, composition, complementation, intersection, Kleene
closure, etc. Various visualisation tools are available to browse
finite-state automata. The tool-box is implemented in SICStus Prolog,
and is available free of charge under Gnu General Public License via
anonymous ftp at ftp://ftp.let.rug.nl/pub/vannoord/Fsa/, and via the
web at http://www.let.rug.nl/~vannoord/Fsa/. At the time of our
initial experiments with finite-state approximation, an old version of
the tool-box was used, which ran into memory problems for some of these
automata. For this reason, the subset construction algorithm has been
re-implemented, paying special attention to the treatment of
-moves. Three variants of the subset construction algorithm
are identified which differ in the way
-moves are treated:
The motivation for this paper is the experience that the first
approach turns out to be impractical for automata with very large
numbers of -moves. An integration of the subset
construction algorithm with the computation of
-reachable
states performs much better in practice for such automata.
Section 2 presents a short statement of the problem (how to
determinise a given finite-state automaton), and a subset construction
algorithm which solves this problem in the absence of
-moves. Section 3 defines a number of subset
construction algorithms which differ with respect to the treatment of
-moves. Most aspects of the algorithms are not new and have
been described elsewhere, and/or were incorporated in previous
implementations; a comparison of the different algorithms had not been
performed previously. We provide a comparison with respect to the size
of the resulting deterministic automaton (in section 3) and
practical efficiency (in section 4). Section 4
provides experimental results both for randomly generated automata and
for automata generated by approximation algorithms. Our
implementations of the various algorithms are also compared with
AT&T's FSM utilities [13], to establish that the
experimental differences we find between the algorithms are truly
caused by differences in the algorithm (as opposed to accidental
implementation details).