This
page describes the FSA Utilities toolbox: a collection of
utilities to manipulate regular expressions, finite-state automata
and finite-state transducers. Manipulations include automata
construction from regular expresssions, determinization (both for
finite-state acceptors and finite-state transducers),
minimization, composition, complementation, intersection, Kleene
closure, etc. Various visualization tools are available to browse
finite-state automata. Interpreters are provided to apply finite
automata. Finite automata can also be compiled into stand-alone C
programs. FSA6 extends FSA5 by allowing **predicates** on arcs
instead of atomic symbols. If you want to compile FSA yourself,
then you need SICStus Prolog, SWI-Prolog or YAP. In addition, there
are binaries available for various platforms (created with SICStus
Prolog; you don't need SICStus Prolog in order to use these
binaries).
The toolbox comes with an optional graphical user interface
(SICStus Prolog only) and an optional command interpreter. The
toolbox can also be applied as a UNIX filter, or as a Prolog
library.

**DOWNLOAD**- The sources. Download this package if you want to compile FSA yourself (required: either SICStus, YAP or SWI Prolog). In order to use the graphical user interface (SICStus only) you also need Tcl/Tk. Note: versions fsa6-2xx require SICStus 3.X; versions fsa6-3xx require SICStus 4.X.
- Check out the directory with
binary distributions (created with SICStus
Prolog; you don't need SICStus Prolog in order to use these
binaries. You do need Tcl/Tk though).
**NEW**The binary distribution for Windows has not been updated for a long time, but today it is up-to-date again! September 2006. - Highly recommended. Contains implementations of examples and algorithms from a variety of papers including a number of papers by Karttunen, Roche and Schabes, Mohri and Sproat, Gerdemann and van Noord, Kaplan and Kay, a twolevel rule compiler by Malouf, minimization implementation of Grimley-Evans, etc, etc.
- the incomple revision history is in CHANGES.

- The package is implemented in Prolog, and requires one of
the following Prolog compilers (latest version):
- SICStus Prolog; the graphical user interface requires Tcl/Tk.
- YAP
- SWI

- FSA is available under the conditions of the GNU General public licence, see also the Copyright notice.
- Installation instructions (these are part of the distribution as well).
- The FSA6 Reference Manual, generated automatically from the documentation in the source files. This is part of the distribution as well.

- Construction of finite automata on the basis of regular
expressions. User-defined regular expression operators are
supported. Built-in regular expression operators include:
- Concatenation, Kleene closure, Union and Option.
- Complement, Difference and Intersection.
- Composition,Cross-Product. Same-Length-Cross-Product. Domain. Range. Identity. Inversion.
- Intervals. Term-complement. Containment. Reversal.
- Perfect hashes.
- Minimization, Determinization (for various types of automata)
- `Any'-variable which matches any symbol.

- Arbitrary predicates can be used instead of atomic symbols. This extension is described in an article with Dale Gerdemann. [postscript, pdf ]
- FSA supports four types of automata:
- recognizers
- weighted recognizers
- transducers
- weighted transducers

- Macro's and other user-defined regular expression operators. As an example consider the implementations of Mohri's and Sproat's compiler of rewrite rules (ACL 1996), or the implementation of Karttunen's replace operator (ACL 1995). This functionality is described in a paper for WIA 99: Gertjan van Noord, Dale Gerdemann, An Extendible Regular Expression Compiler for Finite-state Approaches in Natural Language Processing. Accepted for WIA 99, Potsdam Germany. [postscript, html]; and exploited in Dale Gerdemann, Gertjan van Noord, Transducers from Rewrite Rules with Backreferences. EACL 99, Bergen Norway. [html, postscript, cmp-lg]
- Prolog Code Generation of a FSA or FST into a Prolog program (which can be used to check whether a given string is in the language defined by the automaton, or to generate the transduction of a given string w.r.t. a given transducer).
- C Code Generation of a deterministic FSA or FST. The resulting program will check each line for acceptance by the automaton (for acceptors) or produce the transduction of each line. This is much faster (100 times) than the compiled Prolog version. Currently it works only for deterministic machines.
- Determinization. A number of variants of the subset construction algorithm are provided, as well as an algorithm to remove epsilon transitions. These implementations are described and compared in a paper presented at FSMNLP, June 1998, Ankara, which appeared in Computational Linguistics. [postscript, pdf, html ]
- Minimization. Minimization of a FSA. Two different minimization algorithms are supported.
- Determinization and minimization of transducers. There is support for the construction of a sub-sequential transducer, given a finite-state transducer. Moreover, the system supports the minimization of a given sub-sequential transducer.
- Acceptance, Transduction and Production. Check a given string for acceptance by FA. Transduce a given string with a transducer. Produce all strings / pairs generated by FA.
- Visualisation. Much attention has been paid to be able to visualize finite state recognizers and finite state transducers. Support includes built-in visualization (Tk, LaTeX and Postscript) and interfaces to third party graph visualization software (dotty, VCG and daVinci).
- Conversion into/from the representation used in AT&T's FSM toolset (by Mohri, Pereira, Riley). FSM is available from http://www2.research.att.com/~fsmtools/fsm/