This is the Groningen page for the project Coreference Resolution for Extracting Answers (COREA).
The project is part of the Stevin-initiative of the Dutch and Flemish governments, and will be carried out in collaboration with the Language Technology Group of the University of Antwerp and Language and Computing.
Project Summary
Coreference resolution is a key ingredient for the automatic
interpretation of text. It has been studied mainly from a linguistic
perspective, with an emphasis on establishing potential antecedents
for pronouns. Practical applications, such as Information Extraction
(IE), summarization and Question Answering (QA), require accurate
identification of coreference relations between noun phrases in
general. Computational systems that automatically assign such
relations require a sufficient amount of
annotated data for training and testing. For Dutch, annotated data is
scarce and coreference resolution systems are lacking.
In this project, we aim to develop a robust system for assigning such
relations automatically, and we will investigate the effect of making
coreference relations explicit on the accuracy of systems for IE and
QA. We will annotate a limited amount of application-specific corpus
material, which is required for the evaluation of the coreference
resolution system in the context of IE and QA.
The project contributes to the goals of Stevin by providing a robust
coreference resolution system which is applicable in a range of
applications for Dutch, such as information extraction, question
answering and summarization. In addition, general guidelines for
coreference annotation will become available and a tool will be
developed to support the annotation of coreference in text. Finally, a
limited amount of data annotated with coreferential information,
including spoken language data, will be produced.
The postdoc in Groningen will be connected to local work on
syntactic annotation within the Stevin-initiative and to the
QA system which is being developed within the IMIX programme.
The full text of the scientific parts of the proposal can be found here.
Annotated Corpora
Annotated texts are provided as XML. A stylesheet is used to support
visualization (tested in Firefox and Opera; highlighting does not work in Internet Explorer). Click on the _inline.xml files to see
the texts. For two corpora, annotated texts are available: