abstract_hessel

Singleton detection using semantic word vectors and neural networks.

Using semantic word vectors and neural networks, a state-of-the-art singleton detection system is developed. Singleton detection is a pre-processing task for coreference resolution that aims to filter out singleton mentions (mentions that are not part of any coreference cluster). Filtering out singleton mentions reduces the search space of the coreference resolution system, which helps to improve its performance. Instead of the features used in previous approaches (part-of-speech tags, surface semantics, named entity information, etc.), we use semantic word vectors, a relatively new technology that represents words as real-valued, high-dimensional dense vectors. These vectors have shown promise as features in other NLP tasks, such as named-entity recognition, paraphrase detection, syntactic parsing and sentiment analysis. In addition to the word representations, a recursive autoencoder is used to generate vector representations for mentions consisting of multiple words. These features are fed to a multi-layer perceptron classifier, which achieves state-of-the-art performance (79.5% overall accuracy) in singleton detection on the CoNLL 2011 and 2012 Shared Task data (i.e., the OntoNotes corpus). It is shown that off-the-shelf semantic word vectors are information-rich features that can be used not only for ‘low-level’ syntactic NLP tasks, but also for tasks that are more semantic in nature, such as singleton detection. This shows their promise for further advances in natural language processing.
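
The pipeline described above (word vectors, a recursive autoencoder for multi-word mentions, and a multi-layer perceptron classifier) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not the system evaluated here: the 4-dimensional toy vectors, the random untrained weights, the function names, and in particular the fixed left-to-right merge order, which stands in for a trained recursive autoencoder that would normally learn its weights by minimizing reconstruction error.

```python
import math
import random

def compose(left, right, weights, bias):
    """Encode two child vectors of size d into one parent vector of size d."""
    children = left + right  # concatenation, length 2d
    return [math.tanh(sum(w * c for w, c in zip(row, children)) + b)
            for row, b in zip(weights, bias)]

def encode_mention(word_vectors, weights, bias):
    """Fold a multi-word mention into one vector, merging left to right
    (a simplification; hypothetical helper, not the thesis implementation)."""
    parent = word_vectors[0]
    for vec in word_vectors[1:]:
        parent = compose(parent, vec, weights, bias)
    return parent

def singleton_probability(features, hidden_w, hidden_b, out_w, out_b):
    """Forward pass of a one-hidden-layer perceptron with a sigmoid output."""
    hidden = [math.tanh(sum(w * x for w, x in zip(row, features)) + b)
              for row, b in zip(hidden_w, hidden_b)]
    logit = sum(w * h for w, h in zip(out_w, hidden)) + out_b
    return 1.0 / (1.0 + math.exp(-logit))

rng = random.Random(0)
d = 4  # toy dimensionality; real word vectors are far larger

def rand_vector(n):
    return [rng.uniform(-0.5, 0.5) for _ in range(n)]

def rand_matrix(rows, cols):
    return [rand_vector(cols) for _ in range(rows)]

# Toy stand-ins for off-the-shelf word vectors (assumed values).
toy_vectors = {word: rand_vector(d) for word in ("the", "prime", "minister")}

# Untrained parameters: encoder maps 2d -> d, classifier maps d -> 3 -> 1.
enc_w, enc_b = rand_matrix(d, 2 * d), rand_vector(d)
hid_w, hid_b = rand_matrix(3, d), rand_vector(3)
out_w, out_b = rand_vector(3), 0.0

mention = ["the", "prime", "minister"]
mention_vec = encode_mention([toy_vectors[w] for w in mention], enc_w, enc_b)
prob = singleton_probability(mention_vec, hid_w, hid_b, out_w, out_b)
print(f"P(singleton) for {' '.join(mention)!r}: {prob:.3f}")
```

With untrained weights the printed probability is meaningless; the point of the sketch is only that a mention of any length is reduced to a vector of the same size as a single word vector, so the classifier can take fixed-size input.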

It is hypothesized that, because semantic word vectors contain more semantic information than other commonly used features, and because they are a type of feature not already used by coreference resolution systems, a singleton detection system based on them can benefit coreference resolution more than earlier mention filtering systems. To test this, performance is evaluated with the most recent versions of the Stanford and Berkeley coreference resolution systems, which are among the state of the art in English coreference resolution. The improvement with the Stanford system is good (a 0.7-point increase in CoNLL F1-score), but the improvement with the Berkeley system is not (a 0.3-point increase). As such, the conclusion has to be drawn that, as they are used in this study, semantic word vectors do not have significant added value for coreference resolution performance.
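
For readers unfamiliar with the metric, the CoNLL F1-score used in the shared tasks is the unweighted mean of the F1-scores of three coreference metrics: MUC, B-cubed and entity-based CEAF. The sketch below only shows that arithmetic. The function name is hypothetical and the input scores are made-up placeholders, not results from this study.

```python
def conll_f1(muc_f1, b_cubed_f1, ceaf_e_f1):
    """CoNLL F1: unweighted mean of the MUC, B-cubed and CEAF-e F1-scores."""
    return (muc_f1 + b_cubed_f1 + ceaf_e_f1) / 3.0

# Placeholder scores (assumed values, for illustration only).
baseline = conll_f1(60.0, 50.0, 46.0)
with_filtering = conll_f1(61.0, 50.5, 46.6)

print(f"baseline CoNLL F1:       {baseline:.1f}")
print(f"with singleton filter:   {with_filtering:.1f}")
print(f"difference (in points):  {with_filtering - baseline:+.1f}")
```

Because the score is an average, a reported gain of a few tenths of a point can come from a small shift in all three metrics or from a larger shift in just one of them.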

abstract_hessel.txt · Last modified: 2019/02/06 16:03 (external edit)