Finite-state techniques have been used extensively in language processing. For instance, it has been shown that phonological models consisting of sets of context-sensitive rewrite rules are, given some reasonable assumptions, finite-state [50,59,54]. Recently, a similar result has been obtained for more modern phonological models expressed in Optimality Theory [57].
Syntactic research has almost always followed Chomsky's arguments against finite-state grammars as a competence model, but there are some exceptions [62,43].
In linguistic applications the use of finite-state techniques is very widespread; the following overview lists only a small number of recent publications:
If we restrict our attention to finite-state approaches to natural language syntax in which a distinction is maintained between a competence grammar (expressed in a formalism with at least context-free power) and language performance, then two different approaches can be identified as methods to realize the link between competence and performance.
Firstly, some have proposed parsing algorithms which interpret such a grammar, where the parser is associated with a fixed maximal amount of memory. Publications such as [52]; [88]; [93] are examples of this interpreter approach. [93] shows how a left-corner parser [97], augmented with an operation of ``eager'' composition and formalised as a pushdown automaton, leads to a parser which uses its stack only for center-embedding constructions. Left-recursive and right-recursive structures, on the other hand, can be processed without exploiting the stack. In contrast, a purely bottom-up or top-down parser formalised in a similar fashion is forced to use its stack for right-recursive (respectively left-recursive) structures.
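A minimal sketch of this interpreter approach, under purely illustrative assumptions: the toy center-embedding language a^n b^n is recognised by a pushdown automaton whose stack is capped at a fixed depth, so that the parser runs in a fixed maximal amount of memory and fails explicitly when an input would exceed it. The grammar and the depth bound are stand-ins, not taken from the cited works.

```python
MAX_DEPTH = 3  # the fixed maximal amount of memory (illustrative choice)

def recognise(s, max_depth=MAX_DEPTH):
    """Recognise a^n b^n with at most `max_depth` stack cells.

    Returns True/False, or raises MemoryError when the input would
    require more stack than the fixed bound allows.
    """
    stack = []
    i = 0
    # shift phase: push one cell per 'a'
    while i < len(s) and s[i] == 'a':
        if len(stack) >= max_depth:
            raise MemoryError("stack bound exceeded")
        stack.append('A')
        i += 1
    # reduce phase: pop one cell per 'b'
    while i < len(s) and s[i] == 'b' and stack:
        stack.pop()
        i += 1
    # accept iff the whole input was consumed and the stack is empty
    return i == len(s) and not stack

print(recognise("aabb"))  # within the bound: True
print(recognise("abab"))  # not in the language: False
```

Inputs within the memory bound are recognised exactly; an input such as "aaaabbbb" raises MemoryError rather than being silently misclassified, which is one possible design choice for the behaviour at the bound.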
In contrast, in finite-state approximation a finite-state grammar is explicitly constructed, often on the basis of a specific parsing algorithm restricted by some mechanism that limits its recognising power to finite-state languages. This approach, exemplified in [84]; [96]; [85]; [77] and [51], could then be called a compilation approach.
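One simple way to sketch the compilation approach (again under illustrative assumptions, not following any of the cited constructions): take a pushdown recogniser for the toy language a^n b^n, bound its stack depth, and compile every reachable bounded configuration into a state of an explicit finite-state automaton. Because the stack depth is bounded, the state set is finite.

```python
MAX_DEPTH = 3  # illustrative bound on stack depth

def compile_bounded_fsa(max_depth=MAX_DEPTH):
    """Compile the bounded pushdown recogniser for a^n b^n into an FSA.

    States are pairs (phase, depth): the parsing phase ('shift' or
    'reduce') together with the current stack depth <= max_depth.
    """
    trans = {}
    for d in range(max_depth + 1):
        if d < max_depth:
            # shifting an 'a' pushes one cell
            trans[(('shift', d), 'a')] = ('shift', d + 1)
        if d > 0:
            # reading a 'b' pops one cell
            trans[(('shift', d), 'b')] = ('reduce', d - 1)
            trans[(('reduce', d), 'b')] = ('reduce', d - 1)
    start = ('shift', 0)
    finals = {('reduce', 0)}  # fully reduced: empty stack after n >= 1 pairs
    return start, finals, trans

def run(word, start, finals, trans):
    """Run the compiled automaton; missing transitions mean rejection."""
    state = start
    for c in word:
        if (state, c) not in trans:
            return False
        state = trans[(state, c)]
    return state in finals

start, finals, trans = compile_bounded_fsa()
print(run("aabb", start, finals, trans))      # True
print(run("aaaabbbb", start, finals, trans))  # False: beyond the bound
```

Note the difference from the interpreter sketch of the same idea: here inputs beyond the bound are simply rejected by the compiled automaton, since the corresponding configurations were never compiled in.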
Chomsky ([25], footnote 10), discussing finite-state devices, quite rightly warns us that
The assumption that sentences are produced or recognised by a device of this sort tells us almost nothing about the method of processing.
In this context, it is worth mentioning the result by Stearns [106] that it is decidable whether a deterministic pushdown machine recognises a regular language. Furthermore, Stearns shows that if the pushdown machine recognises a regular language, then the number of states of a finite-state automaton recognising the same language can be bounded by an expression of the order $t^{q^{q^q}}$, where q is the number of states, and t the size of the alphabet of the pushdown automaton. Stearns concludes that^{2.2}
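To get a feel for why Stearns draws this conclusion, consider the sheer magnitude of a bound of this shape. Reading the exponent as the tower $q^{q^q}$ (an interpretation of the bound as stated here), even toy machines yield numbers far too large to write out, so the sketch below computes only the number of decimal digits of the bound via logarithms.

```python
import math

def bound_digits(q, t):
    """Number of decimal digits of t ** (q ** (q ** q)).

    Computed as ceil(q^(q^q) * log10(t)), since the bound itself is
    astronomically large for all but the smallest q and t.
    """
    exponent = q ** (q ** q)  # the tower q^(q^q)
    return math.ceil(exponent * math.log10(t))

print(bound_digits(2, 2))  # 2^(2^(2^2)) = 2^16 = 65536: 5 digits
print(bound_digits(3, 2))  # already trillions of digits
```

With only three states and a two-letter alphabet, the bound has on the order of 10^12 decimal digits, which makes the case for keeping a pushdown design rather vivid.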
If this bound cannot be improved significantly, then it would appear profitable in some cases to maintain a pushdown design for a recogniser even if a finitestate design is possible.
In light of these observations, it may be difficult to choose between the interpreter and compiler approaches, or still other approaches that may be designed (for instance, finite-state automata augmented with a finite number of registers [62,5]). Minimised finite-state automata are an appropriate normal form for regular languages, but they are only `minimal' in the sense that there is no smaller automaton within that representation. For example, suppose you have a very complicated language X. Furthermore, you want to have an automaton for the language . The minimal automaton for that language will essentially need to contain a copy of the automaton for X, as in the first automaton of figure 2.2. However, if you had a single global memory cell containing the value of the first letter you read, then you would need only one copy of X, as in the second automaton of figure 2.2. So even though the two representations are formally equivalent, their sizes may differ dramatically.
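Since the particular language built from X is elided above, here is a hypothetical instance with the same shape (both the language and the stand-in for X are assumptions, not taken from the text or from figure 2.2): take the language of strings whose first and last letters agree and whose middle lies in X. A minimal DFA must track X's progress separately for each possible first letter, duplicating X's automaton; a single register holding the first letter lets one copy of X's automaton suffice.

```python
def x_step(state, c):
    """One copy of a DFA for the stand-in X = strings with an even
    number of 'a's (states 0 and 1; 0 is accepting)."""
    return state ^ 1 if c == 'a' else state

def accepts(s):
    """Register automaton for { s w s : s in {a, b}, w in X }.

    One register stores the first letter; one shared copy of X's
    automaton processes the middle.  A plain minimal DFA would need a
    separate copy of X's automaton per possible first letter.
    """
    if len(s) < 2:
        return False
    register = s[0]
    if s[-1] != register:  # the register replaces the duplicated copies
        return False
    state = 0
    for c in s[1:-1]:
        state = x_step(state, c)
    return state == 0      # accept iff the middle is in X

print(accepts("abba"))  # middle "bb": zero 'a's (even), ends match: True
print(accepts("aaba"))  # middle "ab": one 'a' (odd): False
```

Here the register automaton needs only the two states of X's automaton (plus the register), independently of the alphabet size; the equivalent minimal DFA grows with the number of possible first letters.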

As a starting point, the minimal deterministic finite-state automata constructed in the compiler approach may provide a normal form: a representation that we can use to compare different approaches.