Jori Mur (2004)
Offline answer extraction using Dependency Relations
Master's thesis, Rijksuniversiteit Groningen.
[ Paper (PDF, 294 kb) ]


The quantity of information available online increases every day. It is up to the user to find, in this huge amount of information, the relevant part that answers the question he has in mind. Search engines such as Google and Altavista, so-called Information Retrieval systems, can help the user in this search. The user types in one or more query terms, and a list of links to relevant documents is returned. But often one still has to read a large amount of text to find an answer.

Question Answering systems offer an alternative. Such systems take a natural language question as input, analyse it, and look for an answer in a large text collection. They then return the answer directly instead of giving a list of links. The questions are usually factoid questions, whose answers are typically named entities.

At the University of Amsterdam, a multiple-strategy approach has been implemented in the QA system, because it turned out that different kinds of questions are best answered by different kinds of strategies. One of these strategies is the offline answer extraction method. For many types of questions the answer occurs in a fixed pattern. The offline answer extraction method exploits this idea to transform unstructured data into semi-structured data. Information that could be an answer to a question is extracted from a text collection using regular expressions and stored in a table.
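
As a minimal sketch of this idea (not the system described above), the following Python fragment extracts birth years with a single surface pattern and stores them in a table. The pattern, the table layout, the function name `build_birth_table` and the example sentences are all assumptions made for illustration.

```python
import re
from collections import defaultdict

# Hypothetical surface pattern: "<Name> (born <year>)".
BORN_PATTERN = re.compile(r"([A-Z][a-z]+(?: [A-Z][a-z]+)+) \(born (\d{4})\)")


def build_birth_table(sentences):
    """Scan the text once, offline, and store every match in a lookup table."""
    table = defaultdict(set)
    for sentence in sentences:
        for name, year in BORN_PATTERN.findall(sentence):
            table[name].add(year)
    return table


# Invented example sentences; the second one does not fit the pattern.
corpus = [
    "Jan Peters (born 1950) opened the conference.",
    "Jan Peters, who was born in 1950, opened the conference.",
]
table = build_birth_table(corpus)
```

At question time, a question such as "When was Jan Peters born?" can then be answered by a table lookup instead of a search through the full text collection. Only the first sentence contributes an entry here, because the second one phrases the same fact differently.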

It is important to increase recall for this method: a question cannot be answered if the answer is not in the table at all. A persistent problem with regular expressions is that information that differs slightly from the predefined patterns will not be put in the table.

A solution to this problem could be to extract the information using a dependency parser. Dependency parsing is based on relations between words. With a dependency parser, useful information that differs from the usual surface patterns will still be found and extracted, because extraction looks at grammatical relations between words rather than at their surface order. In this thesis I have investigated to what extent using dependency relations instead of regular expressions in the offline answer extraction method improves the results of question answering.
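
The contrast with surface patterns can be illustrated with a small Python sketch. It is only an illustration under stated assumptions: the parses are written by hand rather than produced by a parser, and the relation labels (`subject`, `time`, `place`) and the names `Dep` and `extract_birth_fact` are hypothetical, not those of any particular parser or of the thesis.

```python
from typing import NamedTuple, Optional


class Dep(NamedTuple):
    """One dependency relation: head word, relation label, dependent."""
    head: str
    relation: str
    dependent: str


def extract_birth_fact(deps: list[Dep]) -> Optional[tuple[str, str]]:
    """Return (person, year) if 'born' has both a subject and a time dependent."""
    subject = year = None
    for dep in deps:
        if dep.head != "born":
            continue
        if dep.relation == "subject":
            subject = dep.dependent
        elif dep.relation == "time":
            year = dep.dependent
    if subject is not None and year is not None:
        return (subject, year)
    return None


# Hand-written parses (assumed, not parser output) of two sentences with
# different word order: "Jan Peters was born in 1950." and
# "In 1950 Jan Peters was born in Utrecht."
parse_a = [Dep("born", "subject", "Jan Peters"), Dep("born", "time", "1950")]
parse_b = [
    Dep("born", "time", "1950"),
    Dep("born", "subject", "Jan Peters"),
    Dep("born", "place", "Utrecht"),
]

fact_a = extract_birth_fact(parse_a)
fact_b = extract_birth_fact(parse_b)
```

Both parses yield the same fact even though the sentences differ on the surface, because the extraction rule depends only on which words are related and how, not on where they appear in the sentence.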

The results of my experiments confirm my hypothesis that recall would increase using a dependency parser and that more questions would be answered correctly. I found that extracting potential answers with a dependency parser performed significantly better than extracting them with regular expressions.