A major research question concerns the possibility of *approximating*
an underlying general and abstract grammar by
techniques of a much simpler sort. The idea that a competence
grammar might be approximated by finite-state means goes back to early
work by Chomsky. For instance, in [25] it is proposed
that, among other things, the theory of grammar should make available
a function *g* which returns for a given grammar *G*_{i} and a given
amount of memory *n*, the description of a finite automaton that takes
sentences as input and gives structural descriptions as output.
Chomsky then continues (page 121):

[this] would take us one step closer to a theory of the actual use of language. We can attempt to construct *g* in such a way that *g*(*i*,*n*) will be a reasonable model for the production (or recognition) of sentences by the speaker (or hearer) who has internalised the grammar *G*_{i} and who has a memory capacity determined by the value of *n*. Notice that although the grammar *G*_{i} mastered by the user of a language is of course finite, it is not to be expected (and, in the case of natural languages, it is not in fact true) that a finite automaton can be constructed which will be able to accept (or generate) all and only the sentences generated by *G*_{i}, or which will be able to ``understand'' just these sentences [...]. This is no stranger than the fact that someone who has learned the rules of multiplication perfectly (perhaps without being able to state them) may be unable to calculate 3,872 x 18,694 in his head, although the rules that he has mastered uniquely determine the answer.

There are essentially three observations which motivate the view that the processing of natural language is finite-state:

- humans have a finite (small, limited, fixed) amount of memory available for language processing
- humans have problems with certain grammatical constructions, such as center-embedding, which are impossible to describe by finite-state means
- humans process natural language very efficiently (in linear time)

The first argument is expressed in [24] on page 390 as follows:

[...] we must conclude that the *competence* of the native speaker cannot be characterised by a finite automaton [...]. Nevertheless, the *performance* of the speaker or hearer must be representable by a finite automaton of some sort. The speaker/hearer has only a finite memory, a part of which he uses to store the rules of grammar (a set of rules for a device with unbounded memory), and a part of which he uses for computation in actually producing a sentence or ``perceiving'' its structure and understanding it.

These considerations are sufficient to show the importance of gaining a better understanding of the source and extent of the excess generative power of context-free grammars over finite automata (even though context-free grammars are demonstrably not fully adequate for the grammatical description of natural languages).

Pulman (1986) comments as follows (page 198):

Expressed in this way, the claim that the parsing abilities of speaker-hearers can be modelled in finite-state terms is fairly trivial: the performance of any cognitive ability by any mortal organism with finite powers is likewise guaranteed to be finite state in character.

We could add that not just mortal organisms face this restriction, but
any physical object, including modern computers. Pulman then
introduces *strict* finite-state devices, which are finite-state
devices with a ``fairly small'' amount of fixed memory, and suggests
that Chomsky (and others) had this in mind when stating
the finite-state requirement above.

The second argument, concerning center-embedding, is perhaps the best-known argument for the hypothesis that natural language processing is finite-state in nature. Grammatical constructions involving center-embedding are very hard for humans to process [73,26]. For example, the following sentences are grammatical, but hard to understand:

If the embedding occurs at the edge of a word group, the examples
are easy to understand, indicating that the observed difficulty in
understanding is not of a semantic nature:

Such observations find a natural description in a finite-state approach, since finite-state devices are incapable of describing such center-embedded constructions (of arbitrary depth).
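The point can be made concrete with a small sketch (not from the text, and with hypothetical symbols `a`/`b` standing in for the opening and closing halves of an embedding): a finite-state device can recognise center-embedded dependencies of the pattern a^n b^n only up to some fixed depth *k*, since tracking the depth requires one state per level.

```python
def make_bounded_recognizer(k):
    """Recognise a^n b^n for n <= k using only finitely many
    configurations: the depth counter is bounded by k, so this is
    equivalent to a finite automaton with O(k) states."""
    def accepts(s):
        depth = 0          # current embedding depth, bounded by k
        phase = 'open'     # reading opening symbols, then closing ones
        for ch in s:
            if phase == 'open' and ch == 'a':
                depth += 1
                if depth > k:     # depth exceeded: reject, analogous
                    return False  # to human failure on deep embedding
            elif ch == 'b' and depth > 0:
                phase = 'close'
                depth -= 1
            else:
                return False
        return depth == 0 and phase == 'close'
    return accepts

recognizer = make_bounded_recognizer(3)
print(recognizer('aabb'))      # True: depth 2 is within the bound
print(recognizer('aaaabbbb'))  # False: depth 4 exceeds the bound
```

Unbounded a^n b^n is the textbook example of a non-regular language; the bound *k* is exactly what turns it into something a finite-state device can handle, mirroring the fixed-memory view of performance sketched above.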

Of particular interest for a Dutch audience is the observation that the famous Dutch cross-serial dependency construction becomes extremely hard to understand if the number of argument-selecting verbs taking part in the construction is larger than three. This construction has been the basis of the most convincing argument to date that natural languages cannot be described by context-free means [47,102].

Similar sentences in which the recursion occurs at the edge of the
word-group are much easier to understand:

Again, such a limit might easily be explained in a finite-state
approach, because finite-state devices are incapable of describing
such crossing dependencies.

Finally, let us consider the third argument, concerning the observed
efficiency of human language processing. For a given grammar or
grammatical formalism, computational efficiency is expressed by
stating the number of steps required to compute the analysis of a
sentence of length *n*. In the simplest case, the number of steps
increases proportionally with *n*; the complexity is then said to be
linear. For more powerful grammatical formalisms, the number of steps
grows faster than *n*, e.g. proportionally with *n*^{3} for
context-free grammars. This suggests that such more powerful
grammatical formalisms are unlikely models of human language
processing, since humans display no noticeable delay in understanding
longer sentences.
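The linear-time property of finite-state processing follows directly from how an automaton works: exactly one transition is made per input symbol. A minimal sketch (the toy automaton and its category symbols are hypothetical, not from the text) makes the step count explicit:

```python
def run_dfa(transitions, start, finals, sentence):
    """Run a deterministic finite automaton given as a dict
    {(state, symbol): next_state}. Returns (accepted, steps);
    steps is at most len(sentence), i.e. linear in n."""
    state, steps = start, 0
    for symbol in sentence:
        state = transitions.get((state, symbol))
        steps += 1
        if state is None:          # no transition: reject early
            return False, steps
    return state in finals, steps

# Toy automaton accepting sequences of the hypothetical shape:
# det noun verb (det noun)*  -- e.g. ``the dog sees the cat''
t = {
    (0, 'det'): 1, (1, 'noun'): 2, (2, 'verb'): 3,
    (3, 'det'): 4, (4, 'noun'): 3,
}
ok, steps = run_dfa(t, 0, {3}, ['det', 'noun', 'verb', 'det', 'noun'])
print(ok, steps)  # steps equals the sentence length: linear time
```

By contrast, a general context-free recogniser such as CKY fills a table of O(*n*^{2}) cells, each requiring O(*n*) work, giving the *n*^{3} behaviour mentioned above.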