Next: Head-corner Parsing and Robustness Up: Robust Parsing with the Previous: Introduction

Head-corner Parsing for Natural Language Understanding

Representation

I assume that grammars are defined in the Definite Clause Grammar formalism [5]. Without any loss of generality I assume that no external Prolog calls (the ones that are defined within { and }) are used. Moreover I will assume that such a grammar is represented in a somewhat different manner to make the definition of the parser easier, and to make sure that rules are indexed appropriately. This representation will in practice be compiled from a representation in user-friendly notation.

More specifically, I will assume that grammar rules are represented by the predicate headed_rule/4 in which the first argument is the head of the rule, the second argument is the mother node of the rule, the third argument is the reversed list of daughters left of the head, and the fourth argument is a list of the daughters right of the head.

As an example, the DCG rule

x(A,E) --> a(A), b(B,A), x(C,B), d(C,D), e(D,E).

(1)

of which the third daughter constitutes the head, is represented now as:

headed_rule( x(C,B), x(A,E), [b(B,A), a(A)], [d(C,D), e(D,E)]).

(2)

I will be assuming furthermore that lexical lookup has been performed already by another module. This module has asserted clauses for the predicate lexical_analysis/3 where the first two arguments are the string positions and the third argument is the (lexical) category. For an input sentence `Time flies like an arrow' this module may produce the following set of clauses:

lexical_analysis(0,1,verb).  lexical_analysis(0,1,noun).
lexical_analysis(0,2,noun).  lexical_analysis(1,2,noun).
lexical_analysis(1,2,verb).  lexical_analysis(2,3,prep).
lexical_analysis(2,3,verb).  lexical_analysis(3,4, det).
lexical_analysis(4,5,noun).

(3)

The Basics

A simple non-deterministic version of the head-corner parser is now given as follows. The predicate visible to the rest of the world will be the predicate parse/3. This predicate is defined in terms of the parse/5 predicate. The extra arguments introduce a pair of indices which represent the extreme positions between which a parse should be found. This is explained in more detail in [8].

% parse(?Cat,+P0,+P)
parse(Cat,P0,P) :- parse(Cat,P0,P,P0,P).

(4)

A goal category can be parsed if a predicted lexical category can be shown to be a head-corner of that goal.

% parse(?Cat,?P0,?P,+E0,+E) 
% there is a category Cat from P0-P within the interval E0-E
parse(Cat,P0,P,E0,E) :-
    predict(Cat,P0,P,SmallCat,Q0,Q),
    head_corner(SmallCat,Q0,Q,Cat,P0,P,E0,E).

(5)

The head-corner predicate constructs in a bottom-up fashion larger and larger head-corners.

% head_corner(?Small,+P0,+P,?Cat,?Q0,?Q,+E0,+E)
% Small from P0-P is a head-corner of Cat from Q0-Q 
% where Q0-Q occurs within E0-E
head_corner(Cat,P0,P,Cat,P0,P,_,_).
head_corner(Small,Q0,Q,Cat,P0,P,E0,E) :-
    headed_rule(Small,Mother,RevLeftDs,RightDs),
    parse_left_ds(RevLeftDs,QL,Q0,E0),  parse_right_ds(RightDs,Q,QR,E),
    head_corner(Mother,QL,QR,Cat,P0,P,E0,E).

(6)

To parse a list of daughter categories we have to parse each daughter category in turn.

% parse_left_ds(+RevLeftDs,-Qleft,+Qright,+ExtremeLeft) 
% there are categories LeftDs from Qleft to Qright 
% s.t. RevLeftDs is reverse of LeftDs, and ExtremeLeft < Qleft. 
parse_left_ds([],Q,Q,_).
parse_left_ds([H|T],Q0,Q,E) :-
    parse(H,Q1,Q,E,Q), parse_left_ds(T,Q0,Q1,E).

% parse_right_ds(+RightDs,+Qleft,-Qright,+ExtremeRight) 
% there are categories RightDs from Qleft to Qright s.t. 
% Qright < ExtremeRight. 
parse_right_ds([],Q,Q,_).
parse_right_ds([H|T],Q0,Q,E) :-
    parse(H,Q0,Q1,Q0,E),  parse_right_ds(T,Q1,Q,E).

(7)

A predicted category must be a lexical category that lies somewhere between the extreme positions.

% predict(?Cat,-P0,-P,+E0,+E,-SmallCat,-Q0,-Q) 
% SmallCat from Q0-Q is a lexical category and possible 
% head-corner of Cat from P0-P, where P0-P is within E0-E.
predict(_Cat,_P0,_P,E0,E,Small,Q0,Q) :-
    lexical_analysis(Q0,Q,Small),  between(Q0,Q,E0,E).
% between(+P0,+P,+E0,+E) P0-P occurs within E0-E
between(P0,P,E0,E) :-  E0 =< P0, P =< E.

(8)

Improvements

The simple and non-deterministic version of the head-corner parser given above is turned into a very efficient parser by making use of the following techniques. Each of these techniques is explained in [8] in more detail:

the use of top-down information by a table representing the head-corner relation. Furthermore indexing is used for efficient table lookup. The head-corner relation includes information on the start and end positions (e.g. the requirement that the head of an sbar is a complementizer at the left-most position of that sbar phrase.
the use of underspecification of position information in the case of empty productions (epsilon rules).
the (limited) use of memo-ization. Memo-ization is only applied for the parse/5 predicate. This implies that each maximal projection is computed only once; partial projections of a head can be constructed during a parse any number of times, as can sequences of categories (considered as sisters to a head). Active chart parsers `memo' everything; inactive chart parsers only memo categories, but not sequences of categories. In our proposal, we memo only those categories that are `maximal projections', i.e. projections of a head which unify with the top category (start symbol) or with a non-head daughter of a rule.
Note that nothing would prevent us to memo other predicates as well. Experience suggests that the cost of maintaining tables for e.g. the head_corner relation is (much) higher than the associated profit. The use of memoization for only the parse/5 goals implies that the memory requirements of the head-corner parser in terms of the number of items that is being recorded is much smaller than in ordinary chart parsers. Not only do we refrain from asserting so-called active items, but we also refrain from asserting inactive items for non-maximal projections of heads. In practice the difference in space requirements is enormous (2 orders of magnitude). This difference may in turn be a significant reason for the practical efficiency of the head-corner parser. ¹
the use of goal-weakening. The insight behind `goal weakening' in the context of memo-ization is that we may combine a number of slightly different goals into a single more general goal. Very often it is much cheaper to solve this single (but more general) goal, than to solve each of the specific goals in turn. Clearly, one must be careful not to remove essential information from the goal (in the worst case this may even lead to non-termination of otherwise well-behaved programs).
Depending on the properties of a particular grammar, it may for example be worthwile to restrict a given category to its syntactic features before we attempt to solve the parse goal of that category. Shieber's restriction operator [6] can be used here. Thus we essentially throw some information away before an attempt is made to solve a (memoized) goal. For example, the category
x(A,B,f(A,B),g(A,h(B,i(C))))

(9)
may be weakened into:
x(A,B,f(_,_),g(_,_))

(10)
Note that goal weakening is sound. An answer a to a weakened goal g is only considered as an answer for g if a and g unify. Also note that goal-weakening is complete in the sense that for an answer a to a goal g there will always be an answer a' to the weakening of g such that a' subsumes a. For practical implementations the use of goal weakening can be extremely important. It is my experience that a well-chosen goal weakening operator may reduce parsing times with an order of magnitude.
Compact representation of parse trees by packing ambiguities.

Parsing Word-graphs

The head-corner parser is one of the parsers developed within the NWO Priority Programme on Language and Speech Technology. In this program a spoken dialog system is developed for public transportation information [2]

In this system the input for the parser is not a simple list of words, but rather a word-graph: a directed, acyclic graph where the states are points in time, and the edges are labelled with word hypotheses and their corresponding probability. Thus, such word-graphs are acyclic weighted finite-state automata.

In certain approaches toward ill-formed input processing the desire to generalize from input strings to input finite-state automata is also clearly present. E.g., in [3] a framework for ill-formed input processing is described in which certain common errors are modelled as (weighted) finite-state transducers. The composition of an input sentence with these transducers produces a (weighted) finite state automaton which is then input for the parser.

The generalization from strings to weighted automata introduces essentially two complications. Firstly, we cannot use string indices anymore. Secondly we need to keep track of the probabilities of the words used in a certain derivation.

Parsing on the basis of a finite-state automaton can be seen as the computation of the intersection of that automaton with the grammar. It can be shown that if the definite-clause grammar is off-line parsable, and if the finite-state automaton is acyclic, then this computation can be guaranteed to terminate [7]. Moverover, existing techniques for parsing based on strings can be generalized easily by using the names of states in the automaton instead of the usual string indices.

In the head-corner parser, this leads to a change in the definition of the predicate between/4. Rather than a simple integer comparison, we now need to check that a derivation from P0 to P can be extended to a derivation from E0 to E by checking that there are paths in the word-graph from E0 to P0 and from P to E.

The predicate between/4 is implemented using memo-ization as follows. It is assumed that state names are integers; to rule out cyclic word-graphs we also require that for all transitions from P0 to P it is the case that P0 < P. Transitions in the word-graph are represented by clauses of the form wordgraph:trans(P0,Sym,P,Score) which indicate that there is a transition from state P0 to P with symbol Sym and probability Score.

between(L0,R0,L,R) :-  connection(L,L0), connection(R0,R).

connection(A,B):-
        (  var(A)          -> true
        ;  var(B)          -> true
        ;  A=:=B           -> true
        ;  B < A           -> fail % word-graphs are acyclic
        ;  ok_conn(A,B)    -> true
        ;  fail_conn(A,B)  -> fail
        ;  wordgraph:trans(A,_,X,_),
           connection(X,B) -> assertz(ok_conn(A,B))
        ;  assertz(fail_conn(A,B)),
           fail 
        ).

(11)

Accounting for Word-graph Probabilities.

In order to account for the probabilities of a derivation (defined as the product of the probabilites associated with all transitions from the word-graph involved in the derivation) we assume that the predicate lexical_analysis represents the probability of the piece of the word-graph that it covers by an extra argument. During the first phase probabilities are ignored. During the second phase (when a particular derivation is constructed) the probabilities are combined.

Next: Head-corner Parsing and Robustness Up: Robust Parsing with the Previous: Introduction

Noord G.J.M. van
1998-09-25