FSA6.2xx: Finite State Automata Utilities
This
page describes the FSA Utilities toolbox: a collection of
utilities to manipulate regular expressions, finite-state automata
and finite-state transducers. Manipulations include automata
construction from regular expresssions, determinization (both for
finite-state acceptors and finite-state transducers),
minimization, composition, complementation, intersection, Kleene
closure, etc. Various visualization tools are available to browse
finite-state automata. Interpreters are provided to apply finite
automata. Finite automata can also be compiled into stand-alone C
programs. FSA6 extends FSA5 by allowing predicates on arcs
instead of atomic symbols. If you want to compile FSA yourself,
then you need SICStus Prolog, SWI-Prolog or YAP. In addition, there
are binaries available for various platforms (created with SICStus
Prolog; you don't need SICStus Prolog in order to use these
binaries).
The toolbox comes with an optional graphical user interface
(SICStus Prolog only) and an optional command interpreter. The
toolbox can also be applied as a UNIX filter, or as a Prolog
library.
Links
- DOWNLOAD
- The sources. Download this
package if you want to compile FSA yourself (required: either
SICStus, YAP or SWI Prolog). In order to use the graphical
user interface (SICStus only) you also need Tcl/Tk. Note: versions
fsa6-2xx require SICStus 3.X; versions fsa6-3xx require SICStus 4.X.
- Check out the directory with
binary distributions (created with SICStus
Prolog; you don't need SICStus Prolog in order to use these
binaries. You do need Tcl/Tk though).
NEW The binary distribution for Windows has not
been updated for a long time, but today it is up-to-date
again! September 2006.
- Highly recommended.
Contains implementations of examples and algorithms from a
variety of papers including a number of papers by
Karttunen, Roche and Schabes, Mohri and Sproat, Gerdemann
and van Noord, Kaplan and Kay, a twolevel rule compiler by
Malouf, minimization implementation of Grimley-Evans,
etc, etc.
- the incomple revision history is in CHANGES.
- The package is implemented in Prolog, and requires one of
the following Prolog compilers (latest version):
- SICStus
Prolog; the graphical user interface requires
Tcl/Tk.
- YAP
- SWI
- FSA is available under the conditions of the GNU General public licence, see also the Copyright notice.
- Installation instructions (these
are part of the distribution as well).
- The FSA6 Reference Manual, generated
automatically from the documentation in the source files. This
is part of the distribution as well.
- There's a demo of FSA available on
line; every so often it is broken...
- Bouma's tutorial for FSA (in Dutch)
- FSA tutorial for
high-school students (in Dutch)
- course material for FSA Utilities (weeks
2, 3 and 4) (in Dutch)
- more course material for FSA Utilities (pages 7-17)
Highlights
- Construction of finite automata on the basis of regular
expressions. User-defined regular expression operators are
supported. Built-in regular expression operators include:
- Concatenation, Kleene closure, Union and Option.
- Complement, Difference and Intersection.
- Composition,Cross-Product. Same-Length-Cross-Product.
Domain. Range. Identity. Inversion.
- Intervals. Term-complement. Containment. Reversal.
- Perfect hashes.
- Minimization, Determinization (for various types of automata)
- `Any'-variable which matches any symbol.
- Arbitrary predicates can be used instead of atomic symbols. This
extension is described in an article with Dale
Gerdemann.
[postscript,
pdf
]
- FSA supports four types of automata:
- recognizers
- weighted recognizers
- transducers
- weighted transducers
- Macro's and other user-defined regular expression
operators. As an example consider the implementations of
Mohri's and Sproat's compiler of rewrite rules (ACL 1996),
or the implementation of Karttunen's replace operator (ACL
1995). This functionality is described in a paper for WIA
99: Gertjan van Noord, Dale Gerdemann, An Extendible Regular
Expression Compiler for Finite-state Approaches in Natural
Language Processing. Accepted for WIA 99, Potsdam Germany.
[postscript, html]; and exploited in
Dale Gerdemann, Gertjan van Noord, Transducers from Rewrite
Rules with Backreferences. EACL 99, Bergen Norway. [html, postscript, cmp-lg]
- Prolog Code Generation of a FSA or FST into a Prolog
program (which can be used to check whether a given string is in
the language defined by the automaton, or to generate the
transduction of a given string w.r.t. a given transducer).
- C Code Generation of a deterministic FSA or FST. The
resulting program will check each line for acceptance by the
automaton (for acceptors) or produce the transduction of each
line. This is much faster (100 times) than the compiled Prolog
version. Currently it works only for deterministic machines.
- Determinization. A number of variants of the subset
construction algorithm are provided, as well as an algorithm to
remove epsilon transitions. These implementations are described
and compared in a paper
presented at FSMNLP, June 1998, Ankara, which appeared in
Computational Linguistics.
[postscript,
pdf,
html
]
- Minimization. Minimization of a FSA. Two different
minimization algorithms are supported.
- Determinization and minimization of transducers. There is
support for the construction of a sub-sequential transducer,
given a finite-state transducer. Moreover, the system supports
the minimization of a given sub-sequential transducer.
- Acceptance, Transduction and Production. Check a given
string for acceptance by FA. Transduce a given string with a
transducer. Produce all strings / pairs generated by FA.
- Visualisation. Much attention has been paid to be able to
visualize finite state recognizers and finite state
transducers. Support includes built-in visualization (Tk, LaTeX
and Postscript) and interfaces to third party graph
visualization software (dotty, VCG and daVinci).
- Conversion into/from the representation used in AT&T's
FSM toolset (by Mohri, Pereira, Riley). FSM is available from http://www.research.att.com/sw/tools/fsm