next up previous
Next: Subset Construction Up: Introduction Previous: Finite-state Approximation and -moves

Subset construction and $\epsilon $-moves

The experiments were performed using the FSA Utilities. The FSA Utilities tool-box [20,22,7,23] is a collection of tools to manipulate regular expressions, finite-state automata and finite-state transducers. Manipulations include determinisation, minimisation, composition, complementation, intersection, Kleene closure, etc. Various visualisation tools are available to browse finite-state automata. The tool-box is implemented in SICStus Prolog, and is available free of charge under Gnu General Public License via anonymous ftp at, and via the web at At the time of our initial experiments with finite-state approximation, an old version of the tool-box was used, which ran into memory problems for some of these automata. For this reason, the subset construction algorithm has been re-implemented, paying special attention to the treatment of $\epsilon $-moves. Three variants of the subset construction algorithm are identified which differ in the way $\epsilon $-moves are treated:

per graph
The most obvious and straightforward approach is sequential in the following sense. Firstly, an equivalent automaton without $\epsilon $-moves is constructed for the input. In order to do this, the transitive closure of the graph consisting of all $\epsilon $-moves is computed. Secondly, the resulting automaton is then treated by a subset construction algorithm for $\epsilon $-free automata. Different variants of per graph can be identified, depending on the implementation of the $\epsilon $-removal step.
per state
For each state which occurs in a subset produced during subset construction, compute the states which are reachable using $\epsilon $-moves. The results of this computation can be memorised, or computed for each state in a preprocessing step. This is the approach mentioned briefly in [9].2
per subset
For each subset Q of states which arises during subset construction, compute $Q' \supseteq Q$ which extends Q with all states which are reachable from any member of Q using $\epsilon $-moves. Such an algorithm is described in [1].

The motivation for this paper is the experience that the first approach turns out to be impractical for automata with very large numbers of $\epsilon $-moves. An integration of the subset construction algorithm with the computation of $\epsilon $-reachable states performs much better in practice for such automata.

Section 2 presents a short statement of the problem (how to determinise a given finite-state automaton), and a subset construction algorithm which solves this problem in the absence of $\epsilon $-moves. Section 3 defines a number of subset construction algorithms which differ with respect to the treatment of $\epsilon $-moves. Most aspects of the algorithms are not new and have been described elsewhere, and/or were incorporated in previous implementations; a comparison of the different algorithms had not been performed previously. We provide a comparison with respect to the size of the resulting deterministic automaton (in section 3) and practical efficiency (in section 4). Section 4 provides experimental results both for randomly generated automata and for automata generated by approximation algorithms. Our implementations of the various algorithms are also compared with AT&T's FSM utilities [13], to establish that the experimental differences we find between the algorithms are truly caused by differences in the algorithm (as opposed to accidental implementation details).

next up previous
Next: Subset Construction Up: Introduction Previous: Finite-state Approximation and -moves