contents index next

8. C Code Generation

FSA supports the production of C code on the basis of a finite automaton. Before the automaton is translated into C code, it is determinized. In the case of transducers, Mohri's determinization algorithm is applied. Note however that certain transductions cannot be determinized: in that case the algorithm will not terminate.

The C program will contain definitions of the following functions:

int accepts(char *in)
int w_accepts(char *in,int *out)
int t_accepts(char *in,char *out)
int wt_accepts(char *in,char *out, int *wout)

In each case, the functions return 1 if the string in is accepted. Otherwise they return 0. For transducers the resulting string or weight is available in out and wout.

If the global variable c_with_main is set to on, then the resulting C program will also contain a main function. This function is defined in such a way that it reads lines from standard input and applies the corresponding accept functions for each line. For recognizers, either yes or no is printed to standard error. For transducers, the transduction is written to standard output; if the input string is not in the domain of the transducer then no is written to standard error.

The representation of the finite-automaton in C is similar to the technique explained on page 43 (table 4.2) of Jan Daciuk's dissertation `Incremental Construction of Finite-State Automata and Transducers and their use in the Natural Language Processing'. Politechnika Gdanska, 1998.

The special input symbol ^A is used in the representation of the automaton in C to indicate a symbol not otherwise mentioned in the automaton: it will match any such symbol. Similarly, in transducers the symbol ^B is used to indicate an unknown symbol with an associated identity. The corresponding output symbol is also ^B. When a string is transduced this ^B is replaced by the actual input symbol (by means of a queue). An unknown output symbol without an associated identity is represented using the symbol given by the global flag fl_arbitrary_symbol. In the case of final states with multiple outputs a special meta-notation is used using a special symbol given by the global variable fl_multiple_symbol_start which starts a sequence of possible outputs where each output is seperated using a symbol given by the global variable fl_multiple_symbol_sep.

contents index next