Subset construction and -moves

Next: Subset Construction Up: Introduction Previous: Finite-state Approximation and -moves

Subset construction and $\epsilon$ -moves

The experiments were performed using the FSA Utilities. The FSA Utilities tool-box [20,22,7,23] is a collection of tools to manipulate regular expressions, finite-state automata and finite-state transducers. Manipulations include determinisation, minimisation, composition, complementation, intersection, Kleene closure, etc. Various visualisation tools are available to browse finite-state automata. The tool-box is implemented in SICStus Prolog, and is available free of charge under Gnu General Public License via anonymous ftp at ftp://ftp.let.rug.nl/pub/vannoord/Fsa/, and via the web at http://www.let.rug.nl/~vannoord/Fsa/. At the time of our initial experiments with finite-state approximation, an old version of the tool-box was used, which ran into memory problems for some of these automata. For this reason, the subset construction algorithm has been re-implemented, paying special attention to the treatment of $\epsilon$ -moves. Three variants of the subset construction algorithm are identified which differ in the way $\epsilon$ -moves are treated:

per graph: The most obvious and straightforward approach is sequential in the following sense. Firstly, an equivalent automaton without $\epsilon$ -moves is constructed for the input. In order to do this, the transitive closure of the graph consisting of all $\epsilon$ -moves is computed. Secondly, the resulting automaton is then treated by a subset construction algorithm for $\epsilon$ -free automata. Different variants of per graph can be identified, depending on the implementation of the $\epsilon$ -removal step.
per state: For each state which occurs in a subset produced during subset construction, compute the states which are reachable using $\epsilon$ -moves. The results of this computation can be memorised, or computed for each state in a preprocessing step. This is the approach mentioned briefly in [9].²
per subset: For each subset Q of states which arises during subset construction, compute $Q' \supseteq Q$ which extends Q with all states which are reachable from any member of Q using $\epsilon$ -moves. Such an algorithm is described in [1].

The motivation for this paper is the experience that the first approach turns out to be impractical for automata with very large numbers of $\epsilon$ -moves. An integration of the subset construction algorithm with the computation of $\epsilon$ -reachable states performs much better in practice for such automata.

Section 2 presents a short statement of the problem (how to determinise a given finite-state automaton), and a subset construction algorithm which solves this problem in the absence of $\epsilon$ -moves. Section 3 defines a number of subset construction algorithms which differ with respect to the treatment of $\epsilon$ -moves. Most aspects of the algorithms are not new and have been described elsewhere, and/or were incorporated in previous implementations; a comparison of the different algorithms had not been performed previously. We provide a comparison with respect to the size of the resulting deterministic automaton (in section 3) and practical efficiency (in section 4). Section 4 provides experimental results both for randomly generated automata and for automata generated by approximation algorithms. Our implementations of the various algorithms are also compared with AT&T's FSM utilities [13], to establish that the experimental differences we find between the algorithms are truly caused by differences in the algorithm (as opposed to accidental implementation details).

Next: Subset Construction Up: Introduction Previous: Finite-state Approximation and -moves

2000-07-10