
10. JAVA Code Generation

FSA supports the production of JAVA code on the basis of a finite automaton. Before the automaton is translated into JAVA code, it is determinized. In the case of transducers, Mohri's determinization algorithm is applied. Note, however, that certain transductions cannot be determinized; in that case the algorithm will not terminate.
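To illustrate what determinizing a recognizer involves, here is a minimal sketch of the textbook subset construction for an automaton without epsilon moves. It is not FSA's own code and does not cover Mohri's algorithm for transducers; the class name, the data layout and the example automaton are all hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Illustrative sketch only: the names and the data layout are assumptions.
final class SubsetConstructionSketch {

    // Hypothetical NFA for (a|b)*a: state 0 is the start state, state 1 is final.
    // nfa.get(q) maps an input symbol to the set of states reachable from q.
    static List<Map<Character, Set<Integer>>> exampleNfa() {
        List<Map<Character, Set<Integer>>> nfa = new ArrayList<>();
        Map<Character, Set<Integer>> fromState0 = new TreeMap<>();
        fromState0.put('a', new TreeSet<>(List.of(0, 1)));
        fromState0.put('b', new TreeSet<>(List.of(0)));
        nfa.add(fromState0);
        nfa.add(new TreeMap<>()); // state 1 has no outgoing transitions
        return nfa;
    }

    // Returns the deterministic transition table. Each deterministic state is
    // a set of NFA states; each symbol leads to exactly one such set.
    static Map<Set<Integer>, Map<Character, Set<Integer>>> determinize(
            List<Map<Character, Set<Integer>>> nfa, int start) {
        Map<Set<Integer>, Map<Character, Set<Integer>>> dfa = new LinkedHashMap<>();
        Deque<Set<Integer>> todo = new ArrayDeque<>();
        Set<Integer> first = new TreeSet<>(List.of(start));
        dfa.put(first, new TreeMap<>());
        todo.add(first);
        while (!todo.isEmpty()) {
            Set<Integer> subset = todo.remove();
            Map<Character, Set<Integer>> moves = dfa.get(subset);
            // Merge the transitions of all NFA states in this subset.
            for (int q : subset) {
                for (Map.Entry<Character, Set<Integer>> e : nfa.get(q).entrySet()) {
                    moves.computeIfAbsent(e.getKey(), k -> new TreeSet<>())
                         .addAll(e.getValue());
                }
            }
            // Every target subset not seen before becomes a new state.
            for (Set<Integer> target : moves.values()) {
                if (!dfa.containsKey(target)) {
                    dfa.put(target, new TreeMap<>());
                    todo.add(target);
                }
            }
        }
        return dfa;
    }
}
```

For the example automaton the result has two deterministic states, {0} and {0,1}; a deterministic state is final if it contains a final NFA state, so here only {0,1} is final.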

The JAVA program defines a class (named after the given output file name) which inherits from Applet. The class defines the following methods:

static void main(String argv[])

An instance of the class is an applet in which you can type strings that are checked against the automaton. When run from the command line, the JAVA program starts a graphical user interface in which you can input strings if -w is the only option given; otherwise it reads lines from standard input and writes the result of applying the automaton to standard output.

public void gui();

starts a graphical user interface in which you can input strings.

public DFA automaton();

returns the automaton part of the applet. This DFA class in turn defines the following methods:

public boolean Recognizer();
public boolean Transducer();
public boolean WeightedRecognizer();
public boolean WeightedTransducer();

As well as:

public void filter ()

reads lines from standard input and displays the result of running each line through the automaton to standard output.

public boolean accepts ( String in )
public String transduces ( String in )
public Integer weighs ( String in )
public StringWeightPair string_weight( String in )
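The behaviour described for filter can be pictured as a simple read-test-print loop. The sketch below is not the FSA implementation: a Predicate stands in for the accepts method of the generated DFA class, the reader and writer stand in for standard input and output, and the "yes"/"no" output format is an assumption made only for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.PrintWriter;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.util.function.Predicate;

// Illustrative sketch only: class name, method names and output format are
// assumptions, not part of the classes distributed with FSA.
final class FilterSketch {

    // Reads lines until end of input and reports, for each line, whether
    // the stand-in recognizer accepts it.
    static void filter(BufferedReader in, PrintWriter out, Predicate<String> accepts)
            throws IOException {
        String line;
        while ((line = in.readLine()) != null) {
            out.print(line + "\t" + (accepts.test(line) ? "yes" : "no") + "\n");
        }
        out.flush();
    }

    // Convenience wrapper that runs the loop over a string, using a
    // hypothetical recognizer for strings that start with "abc".
    static String run(String text) {
        StringWriter result = new StringWriter();
        try {
            filter(new BufferedReader(new StringReader(text)),
                   new PrintWriter(result),
                   s -> s.startsWith("abc"));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return result.toString();
    }
}
```

With the real generated class one would pass its accepts method instead of the stand-in predicate, provided the FSA class files are on the class path.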

The `main' method is provided only if the global variable java_with_main is set to on.

To run the JAVA code generated by FSA you need a JAVA compiler, as well as the class files which are distributed with FSA. You have to ensure that the JAVA compiler knows where to find these files. For instance, if you have installed FSA in /usr/local/lib, then the class files are in /usr/local/lib/fsa/Java. After running the FSA command:

% fsa write=java -r '[a,b,c,? *]' z.java

you compile the JAVA file with e.g.:

% javac -classpath /usr/local/lib/fsa/Java:\
                    /usr/lib/java/lib/classes.zip:. z.java

Instead of passing the -classpath option to javac, it is preferable to include the relevant class directories in the CLASSPATH environment variable. You can now run the program using one of:

% java z -w
% java z

The representation of a finite automaton in JAVA is similar to the technique explained on page 43 (table 4.2) of Jan Daciuk's dissertation `Incremental Construction of Finite-State Automata and Transducers and their use in the Natural Language Processing' (Politechnika Gdanska, 1998), except that instead of the number of transitions we have a boolean flag indicating for each line whether that line is the last transition for the current state.
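The idea of a transition table with a last-transition flag can be sketched as follows. This is not the table FSA actually emits: the column layout, the use of -1 for a state without outgoing transitions, and the example language {ab, ac, b} are assumptions chosen to keep the sketch small.

```java
// Illustrative sketch only: the table layout and all names are assumptions.
final class TableAutomatonSketch {

    // One entry per transition. A state is identified by the row index of
    // its first transition; its transitions occupy consecutive rows, and
    // LAST marks the final row of that run. The start state begins at row 0.
    static final char[]    SYMBOL       = { 'a',   'b',  'b',   'c'  };
    static final int[]     TARGET       = {  2,    -1,   -1,    -1   };
    static final boolean[] TARGET_FINAL = { false, true, true,  true };
    static final boolean[] LAST         = { false, true, false, true };
    static final boolean   START_FINAL  = false;

    static boolean accepts(String input) {
        int state = 0;                      // first row of the current state
        boolean isFinal = START_FINAL;
        for (int i = 0; i < input.length(); i++) {
            if (state < 0) {
                return false;               // no outgoing transitions left
            }
            char c = input.charAt(i);
            int row = state;
            // Scan the rows of the current state until the symbol matches
            // or the row flagged as last has been tried.
            while (SYMBOL[row] != c) {
                if (LAST[row]) {
                    return false;
                }
                row++;
            }
            isFinal = TARGET_FINAL[row];
            state = TARGET[row];
        }
        return isFinal;
    }
}
```

Here TableAutomatonSketch.accepts("ab") and accepts("b") return true, while accepts("a") and accepts("abc") return false. Because each state's rows end with a flagged row, the table needs no per-state transition count.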

The special input symbol ^A is used in the representation of the automaton in JAVA to indicate a symbol not otherwise mentioned in the automaton: it will match any such symbol. Similarly, in transducers the symbol ^B is used to indicate an unknown symbol with an associated identity. The corresponding output symbol is also ^B. When a string is transduced, this ^B is replaced by the actual input symbol (by means of a queue). An unknown output symbol without an associated identity is represented using the symbol given by the global flag fl_arbitrary_symbol. In the case of final states with multiple outputs, a special meta-notation is used: the symbol given by the global variable fl_multiple_symbol_start starts a sequence of possible outputs, and the outputs are separated by the symbol given by the global variable fl_multiple_symbol_sep.
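The treatment of ^B can be pictured with a small sketch. It assumes that ^A and ^B denote the control characters U+0001 and U+0002, and it uses a hypothetical one-state transducer that rewrites a to x and copies every other symbol; none of the names below come from the FSA class files.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Illustrative sketch only: the names, the table and the choice of control
// characters are assumptions.
final class UnknownSymbolSketch {

    static final char ANY = '\u0001';          // ^A: unknown symbol
    static final char ANY_IDENTITY = '\u0002'; // ^B: unknown symbol, copied to output

    // Symbols mentioned explicitly in the (hypothetical) automaton.
    static final String KNOWN = "a";

    // Rows of a one-state transducer: { input label, output label }.
    static final char[][] ROWS = { { 'a', 'x' }, { ANY_IDENTITY, ANY_IDENTITY } };

    // An ordinary label matches itself; ^A and ^B match any symbol that is
    // not otherwise mentioned in the automaton.
    static boolean matches(char label, char input) {
        if (label == ANY || label == ANY_IDENTITY) {
            return KNOWN.indexOf(input) < 0;
        }
        return label == input;
    }

    static String transduce(String input) {
        StringBuilder raw = new StringBuilder();        // output, may contain ^B
        Deque<Character> pending = new ArrayDeque<>();  // inputs matched by ^B
        for (char c : input.toCharArray()) {
            boolean moved = false;
            for (char[] row : ROWS) {
                if (matches(row[0], c)) {
                    if (row[0] == ANY_IDENTITY) {
                        pending.add(c);                 // remember the actual symbol
                    }
                    raw.append(row[1]);
                    moved = true;
                    break;
                }
            }
            if (!moved) {
                return null;                            // no transition applies
            }
        }
        // Replace each ^B in the output by the queued input symbols, in order.
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < raw.length(); i++) {
            char o = raw.charAt(i);
            out.append(o == ANY_IDENTITY ? pending.remove() : o);
        }
        return out.toString();
    }
}
```

In this sketch UnknownSymbolSketch.transduce("a?a") returns "x?x": the ? is not mentioned in the automaton, so it matches ^B, is queued, and is substituted back when the output is assembled.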
