D-MAP: Deep Meaning Annotation Project

Description

The Deep Meaning Annotation Project (D-MAP) is concerned with the development of a large-scale deep semantic annotation of text, and with its use for applying machine learning methods in computational semantics. The focus is on the integration of different aspects of meaning (word senses, thematic roles, quantifier scope, tense and aspect, anaphora, presupposition, rhetorical relations, background knowledge) into a single semantic formalism.

Such a resource does not exist yet, and we expect it to have a large impact on computational semantics because it will enable (a) quantitative evaluation of semantic formalisms and (b) the use of statistical methods. We believe that D-MAP will play the same role in computational semantics as treebanks did (and still do) for computational syntax. It can also influence related areas of research such as natural language generation, automatic summarisation, and machine translation. The project started in autumn 2010 and will run for five years.

Method

Manual annotation is unlikely to produce a corpus of the size we envisage, namely around 100,000 texts. Our method will therefore combine existing techniques for automated deep semantic analysis of texts, which we will develop further, with modern methods for using web communities to acquire gold-standard annotations. The latter will be based on crowdsourcing and games with a purpose, and are inspired by successful initiatives such as Phrase Detectives and Mechanical Turk.

The theoretical backbone is provided by Discourse Representation Theory, a formal theory of meaning developed by the philosopher of language Hans Kamp. Extensions of the theory are required to bridge the gap between theory and practice, in particular in the areas of text segmentation, tense and aspect, idioms, plurals, and comparatives. An important aspect of this modelling is the ability to express these linguistic phenomena in a first-order language, which enables the practical use of first-order theorem provers and model builders.
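To illustrate why a first-order target language matters, here is a minimal sketch of the standard translation from Discourse Representation Structures (DRSs) to first-order logic: the discourse referents of a DRS become existential quantifiers over the conjunction of its conditions, while the referents of an implication's antecedent are universally quantified over the whole implication. The class names (`DRS`, `Pred`, `Not`, `Imp`), the function `drs_to_fol`, and the ASCII output notation are hypothetical choices made for this example; this is not the project's own code, and it covers only atomic predicates, negation and implication (no tense, plurals, or presupposition).

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Pred:
    """Atomic condition, e.g. farmer(x) or own(x, y)."""
    name: str
    args: List[str]


@dataclass
class Not:
    """Negated sub-DRS: ~K."""
    drs: "DRS"


@dataclass
class Imp:
    """Implication between two sub-DRSs: K1 => K2."""
    antecedent: "DRS"
    consequent: "DRS"


Condition = Union[Pred, Not, Imp]


@dataclass
class DRS:
    """A DRS 'box': discourse referents plus conditions on them."""
    referents: List[str]
    conditions: List[Condition] = field(default_factory=list)


def conj(formulas: List[str]) -> str:
    """Join formulas with '&'; an empty conjunction is 'true'."""
    if not formulas:
        return "true"
    if len(formulas) == 1:
        return formulas[0]
    return "(" + " & ".join(formulas) + ")"


def cond_to_fol(cond: Condition) -> str:
    """Translate a single DRS condition into a first-order formula string."""
    if isinstance(cond, Pred):
        return f"{cond.name}({', '.join(cond.args)})"
    if isinstance(cond, Not):
        return f"~{drs_to_fol(cond.drs)}"
    if isinstance(cond, Imp):
        # Antecedent referents are universally quantified over the whole implication.
        antecedent = conj([cond_to_fol(c) for c in cond.antecedent.conditions])
        formula = f"({antecedent} -> {drs_to_fol(cond.consequent)})"
        for x in reversed(cond.antecedent.referents):
            formula = f"all {x}.{formula}"
        return formula
    raise TypeError(f"unknown condition: {cond!r}")


def drs_to_fol(drs: DRS) -> str:
    """Referents of a DRS become existential quantifiers over its conditions."""
    formula = conj([cond_to_fol(c) for c in drs.conditions])
    for x in reversed(drs.referents):
        formula = f"exists {x}.{formula}"
    return formula


# "Every farmer who owns a donkey beats it" (pronoun 'it' assumed already resolved to y).
donkey = DRS([], [Imp(
    DRS(["x", "y"], [Pred("farmer", ["x"]), Pred("donkey", ["y"]), Pred("own", ["x", "y"])]),
    DRS([], [Pred("beat", ["x", "y"])]),
)])

print(drs_to_fol(donkey))
# all x.all y.((farmer(x) & donkey(y) & own(x, y)) -> beat(x, y))
```

The donkey sentence shows the benefit of going through DRT: the indefinite "a donkey" ends up universally quantified, as intended, and the resulting formula is ordinary first-order logic that a theorem prover or model builder could process after conversion to that tool's input syntax.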

Results

The Groningen Meaning Bank. This is the development version; stable versions will be published periodically.

People

Valerio Basile (PhD student)
Johan Bos (coordinator)
Kilian Evang (PhD student)
Noortje Venhuizen (PhD student)