What to do?

Next: Bibliography Up: The intersection of a Previous: Undecidability

Subsections

What to do?

The following approaches towards the undecidability problem can be taken:

limit the power of the FSA
limit the power of the DCG
compromise completeness
compromise soundness

These approaches are discussed now in turn.

Limit the FSA

Rather than assuming the input for parsing is a FSA in its full generality, we might assume that the input is an ordinary word graph (a FSA without cycles).

Thus the techniques for robust processing that give rise to such cycles cannot be used. One example is the processing of an unknown sequence of words, e.g. in case there is noise in the input and it is not clear how many words have been uttered during this noise.

It is not clear to me right now what we loose (in practical terms) if we give up such cycles.

Note that it is easy to verify that parsing is decidable on the basis of a FSA without cycles and a DCG that is off-line parsable.

Limit the DCG

Another approach is to limit the size of the categories that are being employed. This is the GPSG and F-TAG approach. In that case we are not longer dealing with DCGs but rather with CFGs (which have been shown to be insufficient in general for the description of natural languages).

Compromise completeness

Completeness in this context means: the parse forest grammar contains all possible parses. It is possible to compromise here, in such a way that the parser is guaranteed to terminate.

For example, if we assume that each edge in the FSA is associated with a probability it is possible to define a threshold such that each partial result that is derived has a probability higher than the threshold. Thus, it is still possible to have cycles in the FSA, but anytime the cycle is `used' the probability decreases and if too many cycles are encountered the threshold will cut of that derivation.

For any threshold it is the case that the intersection problem of off-line parsable DCGs and FSA is decidable.

Compromise soundness

Soundness in this context should be understood as the property that all parse trees in the parse forest grammar are valid parse trees. A possible way to ensure termination is to remove all constraints from the DCG and parse according to this context-free skeleton. The resulting parse-forest grammar will be too general most of the times.

A practical variation can be conceived as follows. From the DCG we take its context-free skeleton. This skeleton is obtained by removing the constraints from each of the grammar rules. Then we compute the intersection of the skeleton with the input FSA. This results in a parse forest grammar. Finally, we add the corresponding constraints from the DCG to the grammar rules of the parse forest grammar.

This has the advantage that the result is still sound, although the size of the parse forest grammar is not optimal (as a consequence it is not guaranteed that the parse forest grammar contains a parse tree).

Next: Bibliography Up: The intersection of a Previous: Undecidability

Noord G.J.M. van
1998-09-29