Next: Discussion Up: Reversibility and Self-Monitoring in Previous: Generation of Unambiguous Utterances

Generation of Paraphrases

When parsing an utterance yields several readings, one way to determine the intended meaning is to start a clarification dialog. In such a dialog the multiple interpretations of the parsed utterance are contrasted by restating them in different text forms. The dialog partner who produced the ambiguous utterance is then asked to choose the appropriate paraphrase, e.g., by asking her `Do you mean X or Y?'.

This situation has already been exemplified in section 4, fig. 1. In this example, parsing of S (`Remove the folder with the system tools') has led to two readings LF' and LF''. The two semantic forms are then paraphrased by means of the utterances S' and S'' (`Do you mean ``Remove the folder by means of the system tools'' or ``Remove the folder that contains the system tools''?').

A naive version

A first naive algorithm that generates paraphrases using a reversible grammar can be described as follows. Consider the situation in fig. 1. Suppose S is the input for the parser. Then the set

{(S, LF'), (S, LF'')}

is computed. LF' and LF'' are then each given as input to the generator to compute possible paraphrases. The sets

{(LF', S'), (LF', S)}

and

{(LF'', S), (LF'', S'')}

result. By comparing the elements of the sets obtained during generation with the set obtained during parsing, one can easily determine the two paraphrases S' and S'', because the grammar defines the relationship between strings and logical forms: S' and S'' are the generated strings that differ from S. Note that if this relationship is effectively reversible (see section 2) then this problem is effectively computable.
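The comparison step can be illustrated with a minimal Prolog sketch. It assumes a toy relation rel/2 standing in for the reversible grammar; the atoms s, s1, s2, lf1 and lf2 are placeholders for S, S', S'', LF' and LF'', and the predicate names parse/2, generate/2 and naive_paraphrases/2 are invented for this illustration only.

```prolog
:- use_module(library(lists)).

% Toy relation between strings and logical forms (an assumption,
% standing in for the relation defined by a reversible grammar).
rel(s,  lf1).
rel(s,  lf2).
rel(s1, lf1).
rel(s2, lf2).

% Parsing and generation use the same relation in opposite directions.
parse(Str, LF)    :- rel(Str, LF).
generate(LF, Str) :- rel(Str, LF).

% naive_paraphrases(+Str, -Pairs): for every reading LF of Str,
% collect every generated string that differs from Str.
naive_paraphrases(Str, Pairs) :-
    findall(LF, parse(Str, LF), LFs),
    findall(LF-Para,
            ( member(LF, LFs),
              generate(LF, Para),
              Para \== Str ),
            Pairs).
```

With these facts, the query `?- naive_paraphrases(s, Pairs).` binds Pairs to `[lf1-s1, lf2-s2]`. Note that the second findall/3 enumerates every string the relation offers for each reading, which is exactly the `all-paraphrases' behavior that makes this approach naive.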

This `generate-and-test' approach is naive for the following reasons. Firstly, it assumes that all possible paraphrases are generated at once. Although `all-parses' algorithms are widely used for parsing in natural language systems, a corresponding `all-paraphrases' strategy is not practical because in general the search space during generation is much larger (a consequence of the modular design discussed in section 3). Secondly, the algorithm only guarantees that an ambiguous utterance is restated differently. Irrelevant paraphrases may be produced because the source of the ambiguity is not used directly.

A suitable strategy

The crucial point in generating paraphrases is that one has to guarantee not only that an ambiguous utterance is restated differently, but also that only relevant paraphrases are produced, i.e., paraphrases that appropriately resolve structural ambiguities.

In order to take into account the source of the ambiguity found during parsing, the basic idea of the proposed approach is to generate paraphrases along `parsed' structures. Suppose that parsing of an utterance has yielded two interpretations LF' and LF'' with corresponding derivation trees d₁ and d₂. It is now possible to generate a new utterance for each logical form LF_i by means of the monitored generation algorithm described in the previous section. In this case, the corresponding derivation tree d_i of LF_i is marked by means of the other trees. The marked tree is then used to `guide' the generation step in the way already described.

The paraphrasing algorithm in detail

Because most of the required predicates, as well as the definitions of signs and rules, were already given in section 5, we can directly specify the top-level predicate interactive_parsing, which works as follows.

The predicate find_all_parse computes all possible parses of a given string Str; TreeSet contains the derivation trees extracted from the set of parsed structures SignSet. If the parser obtains multiple interpretations, then a paraphrase has to be generated for each element of SignSet. This is done by means of the predicate generate_paraphrases, whose definition is given below. All computed Paraphrases are then presented to the user, who has to choose the appropriate one. The logical form of the chosen Sign determines the result of the paraphrasing process.

For each parsed sign of the form sign(LF,Str,Syn,D) a paraphrase is generated in the following way. First, its derivation tree D is marked by means of the set of derivation trees contained in TreeSet. The resulting marked derivation tree Guide is then used to guide the generation of the sign's logical form LF using the predicate mgen. Note that this directly reflects the definition of the predicate revision given in the previous section. Therefore we can simply specify the predicate generate_paraphrases as follows:

    % base case: no parsed signs left
    generate_paraphrases([], _, []).
    generate_paraphrases([Sign|ParsedSigns], TreeSet, [Paraphrased|T]) :-
        revision(Sign, TreeSet, Paraphrased),
        !,  % one alternative for each reading
        generate_paraphrases(ParsedSigns, TreeSet, T).
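The control flow described above can be put together in a small runnable sketch. It is an illustration under stated assumptions, not the authors' definition: find_all_parse/3 and revision/3 are replaced by canned stubs, choose/3 is a hypothetical stand-in for the user dialog, the extra Choice argument of interactive_parsing/3 exists only so that the sketch runs without user input, and returning the single logical form directly when there is only one reading is likewise an assumption.

```prolog
:- use_module(library(lists)).

% Hypothetical stub for the parser: canned signs sign(LF,Str,Syn,D)
% and their derivation trees for one toy input string.
find_all_parse(s, Signs, Trees) :-
    Signs = [sign(lf1, s, syn, d1), sign(lf2, s, syn, d2)],
    Trees = [d1, d2].

% Hypothetical stub for revision/3 of the previous section:
% it simply looks up a canned paraphrase for each reading.
revision(sign(lf1, _, Syn, _), _TreeSet, sign(lf1, s1, Syn, d1a)).
revision(sign(lf2, _, Syn, _), _TreeSet, sign(lf2, s2, Syn, d2a)).

generate_paraphrases([], _, []).
generate_paraphrases([Sign|ParsedSigns], TreeSet, [Paraphrased|T]) :-
    revision(Sign, TreeSet, Paraphrased),
    !,  % one alternative for each reading
    generate_paraphrases(ParsedSigns, TreeSet, T).

% choose(+Paraphrases, +N, -Sign): hypothetical replacement for the
% dialog; N is the number the user would enter.
choose(Paraphrases, N, Sign) :- nth1(N, Paraphrases, Sign).

% interactive_parsing(+Str, +Choice, -LF)
interactive_parsing(Str, Choice, LF) :-
    find_all_parse(Str, SignSet, TreeSet),
    (   SignSet = [sign(LF, _, _, _)]      % single reading (assumed case)
    ->  true
    ;   generate_paraphrases(SignSet, TreeSet, Paraphrases),
        choose(Paraphrases, Choice, sign(LF, _, _, _))
    ).
```

With these stubs, `?- interactive_parsing(s, 2, LF).` yields `LF = lf2`, i.e. the logical form of the second paraphrase offered.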

A simple example

In order to clarify how the strategy works, we consider the attachment example of section 5 again. Suppose that for the sentence

Die Männer haben die Frau mit dem Fernglas gesehen.
The men have the woman with the telescope seen.

the parser has determined the derivation trees in figure 4 with corresponding (simplified) semantic representations:

mit(fernglas, sehen(pl (mann), frau))

for the left and

sehen(pl (mann), mit(frau, fernglas))

for the right tree. For the first reading the paraphrase

Die Männer haben mit dem Fernglas die Frau gesehen.

is generated in the same way as described in section 5. In this case the left tree of figure 4 is marked by means of the right one.

To obtain a paraphrase for the second reading, the right derivation tree of figure 4 is marked by means of the left one. In this case markers are placed in the right tree at the nodes named `pp_mod' and `gesehen'. If the grammar allows `mit(frau, fernglas)' to be realized as a relative clause, then the paraphrase

Die Männer haben die Frau, die das Fernglas ...
The men have the woman, who the telescope has, seen.

is generated. Otherwise, the markers are pushed up successively to the root node `topic' of that tree, yielding the paraphrase:

Die Frau mit dem Fernglas haben die Männer g... ...en.
The woman with the telescope have the men seen.

Now, the produced paraphrases are given to the user who has to choose the appropriate one. In the current implementation this is simply done by entering the corresponding number of the selected paraphrase.

Properties

In principle, the same properties hold as those already discussed for the monitored generator. This means that only unambiguous paraphrases are generated, and therefore it is guaranteed that the same paraphrase is not produced for different interpretations. This is important because a paraphrase, say S', could itself be ambiguous in such a way that it has the same interpretations as S. The same utterance S' could then be generated as a paraphrase for both LF' and LF''. For example, the following German sentence:

Den Studenten hat der Professor benotet, der das ...
...C has the professor marked, who developed the program.

is ambiguous because it is not clear who developed the program. If a paraphrase is to be generated which expresses that the student developed the program, then this can be done by means of the utterance:

Der Professor hat den Studenten benotet, der das ...
...the-ASC student-ACC marked, who developed the program.

But this utterance still has the same ambiguity. This means that one also has to check the ambiguity of the paraphrase. An unambiguous solution for the example is, e.g., the utterance:

Den Studenten, der das Programm entwickelte hat d...
...C, who developed the program has the professor marked.
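The condition that a paraphrase must itself be unambiguous can be made concrete with a small Prolog sketch. It is only a toy filter over a fixed table of readings, not the monitored generation algorithm of the previous section. All names are assumptions made for illustration: s_amb1 and s_amb2 stand for the two ambiguous sentences above, s_clear for the unambiguous one, and student_dev and prof_dev for the readings in which the student or the professor developed the program.

```prolog
% reading(String, LogicalForm): toy parse results (assumed data).
reading(s_amb1,  student_dev).
reading(s_amb1,  prof_dev).
reading(s_amb2,  student_dev).
reading(s_amb2,  prof_dev).
reading(s_clear, student_dev).

% A string is unambiguous if parsing it yields exactly one reading.
unambiguous(Str) :-
    findall(LF, reading(Str, LF), [_]).

% unambiguous_paraphrase(+LF, +Source, -Para): Para expresses LF,
% differs from the source string, and has no other reading.
unambiguous_paraphrase(LF, Source, Para) :-
    reading(Para, LF),
    Para \== Source,
    unambiguous(Para).
```

For the query `?- unambiguous_paraphrase(student_dev, s_amb1, P).` the candidate s_amb2 is rejected because it still has two readings, and the only answer is `P = s_clear`.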

The advantage of our approach is that only one paraphrase for each interpretation is produced and that the source of the ambiguity is used directly. Therefore, the generation of irrelevant paraphrases is avoided.

Furthermore, we do not need special predefined `ambiguity specialists', as proposed by [Meteer and Shaked 1988], but rather use the parser to detect possible ambiguities. Hence our approach is much more independent of the underlying grammar.


Noord G.J.M. van
1998-09-30