A major research question concerns the possibility of approximating an underlying general and abstract grammar by techniques of a much simpler sort. The idea that a competence grammar might be approximated by finite-state means goes back to early work by Chomsky. For instance, in  it is proposed that, among other things, the theory of grammar should make available a function g which returns, for a given grammar Gi and a given amount of memory n, the description of a finite automaton that takes sentences as input and gives structural descriptions as output. Chomsky then continues (page 121):
[this] would take us one step closer to a theory of the actual use of language. We can attempt to construct g in such a way that g(i,n) will be a reasonable model for the production (or recognition) of sentences by the speaker (or hearer) who has internalised the grammar Gi and who has a memory capacity determined by the value of n. Notice that although the grammar Gi mastered by the user of a language is of course finite, it is not to be expected (and, in the case of natural languages, it is not in fact true) that a finite automaton can be constructed which will be able to accept (or generate) all and only the sentences generated by Gi, or which will be able to ``understand'' just these sentences [...]. This is no stranger than the fact that someone who has learned the rules of multiplication perfectly (perhaps without being able to state them) may be unable to calculate 3,872 x 18,694 in his head, although the rules that he has mastered uniquely determine the answer.
There are essentially three observations which motivate the view that the processing of natural language is finite-state:
The first argument is expressed in  on page 390 as follows:
[...] we must conclude that the competence of the native speaker cannot be characterised by a finite automaton [...]. Nevertheless, the performance of the speaker or hearer must be representable by a finite automaton of some sort. The speaker/hearer has only a finite memory, a part of which he uses to store the rules of grammar (a set of rules for a device with unbounded memory), and a part of which he uses for computation in actually producing a sentence or ``perceiving'' its structure and understanding it.
These considerations are sufficient to show the importance of gaining a better understanding of the source and extent of the excess generative power of context-free grammars over finite automata (even though context-free grammars are demonstrably not fully adequate for the grammatical description of natural languages).
Pulman (1986) comments as follows (page 198):
Expressed in this way, the claim that the parsing abilities of speaker-hearers can be modelled in finite-state terms is fairly trivial: the performance of any cognitive ability by any mortal organism with finite powers is likewise guaranteed to be finite state in character.
We could add that not just mortal organisms face this restriction, but any physical object, including modern computers. Pulman then introduces strict finite-state devices, which are finite-state devices with a ``fairly small'' amount of fixed memory, and suggests that Chomsky (and others) had this in mind when stating the finite-state requirement above.
The second argument, concerning center-embedding, is perhaps the best-known argument for the hypothesis that natural language processing is finite-state in nature. Grammatical constructions involving center-embedding are very hard for humans to process [73,26]. For example, the following sentences are grammatical but hard to understand:
If the embedding occurs at the edge of a word group, the examples are easy to understand, indicating that the observed difficulty in understanding is not of a semantic nature:
Such observations find a natural description in a finite-state approach, since finite-state devices are incapable of describing such center-embedded constructions of arbitrary depth.
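This incapability can be made concrete with a small sketch (all names below are invented for illustration, and the toy language a^n b^n merely stands in for center-embedded dependencies): a recognizer whose only memory is a finite set of states, here the nesting depths 0 through some fixed bound, necessarily rejects embeddings deeper than that bound, while accepting the shallow embeddings that humans also find easy.

```python
# Illustrative sketch: a finite-state recognizer for the toy language
# a^n b^n, which mimics center-embedding of depth n. The recognizer's
# states are just the depths 0..max_depth, so it is a genuine
# finite-state device and cannot track arbitrarily deep nesting.

def make_bounded_recognizer(max_depth):
    """Return a recognizer with states 0..max_depth.

    It accepts a^n b^n for 1 <= n <= max_depth and necessarily
    rejects deeper embeddings: there is no state left to count them.
    """
    def accepts(s):
        depth = 0
        closing = False            # have we started the b's yet?
        for ch in s:
            if ch == 'a':
                if closing or depth == max_depth:
                    return False   # ran out of states: depth overflow
                depth += 1
            elif ch == 'b':
                if depth == 0:
                    return False   # unmatched b
                closing = True
                depth -= 1
            else:
                return False
        return depth == 0 and closing
    return accepts

recognizer = make_bounded_recognizer(3)
print(recognizer('aabb'))      # depth 2, within the bound: True
print(recognizer('aaaabbbb'))  # depth 4 exceeds the bound: False
```

The point of the sketch is that the failure beyond depth 3 is forced by the finite state set, not programmed in by hand, mirroring the observed human difficulty with deep center-embedding.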
Of particular interest for a Dutch audience is the observation that the famous Dutch cross-serial dependency construction becomes extremely hard to understand if more than three argument-selecting verbs take part in the construction. This construction has been the basis of the most convincing argument to date that natural languages cannot be described by context-free means [47,102].
Similar sentences in which the recursion occurs at the edge of the word-group are much easier to understand:
Again, such a limit might easily be explained in a finite-state approach, because finite-state devices are incapable of describing such crossing dependencies.
Finally, let us consider the third argument, concerning the observed efficiency of human language processing. For a given grammar or grammatical formalism, computational efficiency is expressed by stating the number of steps required to compute the analysis of a sentence of length n. In the simplest case, the number of steps increases proportionally with n; the complexity is then said to be linear. For more complex grammatical formalisms, the number of steps increases faster than n: for context-free grammars, for instance, it increases proportionally with n^3. This implies that such more powerful grammatical formalisms are unlikely models of human language processing, since humans display no noticeable delay in understanding longer sentences.
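The contrast between linear and cubic behaviour can be sketched as follows (the two counting functions are hypothetical illustrations, not a real parser): a finite automaton performs one transition per word, whereas a CKY-style recognizer for a context-free grammar does one unit of work for each (start, split, end) triple of positions, a quantity that grows roughly as n^3.

```python
# Hypothetical step counters illustrating the complexity claim:
# finite-state recognition is linear in sentence length n, while
# CKY-style context-free recognition fills an O(n^2) table of spans
# with O(n) split points per span, i.e. about n^3 steps in total.

def fsa_steps(n):
    # one transition per word
    return n

def cky_steps(n):
    # one unit of work per (start, split, end) triple
    steps = 0
    for span in range(2, n + 1):              # span length
        for start in range(n - span + 1):     # span start position
            for split in range(1, span):      # binary split points
                steps += 1
    return steps

for n in (10, 20, 40):
    print(n, fsa_steps(n), cky_steps(n))
```

Doubling the sentence length doubles the finite-state step count but multiplies the CKY count by roughly eight, which is the asymmetry the argument appeals to: only the linear device scales the way human comprehension appears to.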