contents index next

5. Formats of Finite State Automata

FSA is capable of reading and writing finite automata in a number of different formats. The defaul format for reading is determined by the global variable read. The default format for writing is determined by the global variable write.

The following formats are available both for reading and writing:

For writing, the following additional formats are available:

The normal format is the internal format used in FSA6. The module fsa_data provides a consistent interface to this format.

Here is a table indicating the relative speed of the standard input and output formats:

format      compact    fast     normal
writing          20       1          4
reading           5       1          4

Here is a table indicating the relative size of the standard input and output formats (measured in bytes):

compact    fast     normal
      1     1.8        1.6

5.1. The old format

In the old format a finite automaton is given as a Prolog program. The automaton is defined by clauses for the predicates:

Note that in this format states can be are arbitrary ground Prolog terms (these will be converted to integers). In the case of transducers, Sym is a pair Left/Right. The empty list [] is used to indicate the empty string. In the case of sequential transducers, Right must be a list of symbols.

5.2. The compact format

The compact format fairly closely follows the normal format. See the documentation on the normal format in the fsa_data module for more information. In this format a file is an ordinary text file. The format is intended to be used for machines only, and is not very helpful for human consumption.

Example:

fsa write=compact -r '[class(a..f),{g,h}]'
fsa6
r
fsa_preds
3
0
1
0       in([a,b,c,d,e,f])       2
2       g       1
        h       1

5.3. The fast format

The fast format uses the same Prolog term representation as the normal format, except that library(fastrw) is used to read and write the Prolog term. This implies that reading and writing of automata in this format is very fast; the disadvantage is that fast is a binary format and therefore cannot be (easily) treated by other programs.

5.4. Internal Format of Finite Automata

This module provides a consistent interface to the internal format of finite automata. A finite automaton is a term

fa(Symbols,States,Starts,Finals,Transs,Jumps)

Symbols is a term r(Sig) (for recognizers) or t(SigD,SigR) for transducers. Weighted automata have r(Sig,fsa_frozen) and t(SigD,SigR,fsa_frozen). Here Sig, SigD, SigR are the predicate modules.

States is an integer indicating the number of states in the automaton.

Starts is an ordered list of integers indicating the start states of the automaton.

Finals is an ordered list of integers indicating the final states of the automaton.

Transs is an ordered list of triples trans(A,B,C) where A and C are integers indicating source and target state, and B is the symbol part. The symbol part is different for the different types of automata:

recognizers: P where P is a predicate

weighted recognizers: P/W where P is a predicate and W is a number

letter transducer: P/Q where P and Q is a predicate or the empty list

sequence transducer: P/Q where P is a predicate or the empty list, and Q is a list of predicates

weighted letter transducer: P/(Q/W) where P and Q is a predicate or the empty list, and W is a number

weighted sequence transducer: P/(Q/W) where P is a predicate or the empty list, and Q is a list of predicates

In addition, transducers allow a term '$@'(P) anywhere where a predicate is allowed (P a predicate).

Jumps is an ordered list of pairs jump(A,B) where A and B are integers indicating source and target state. This implies there is an epsilon transition from A to B.

contents index next