Automatic Recognition of Discourse Relations with Neural Networks Produces State-of-the-art Quality
Elizaveta Kuzmenko and Talita Anthonio


Recognizing implicit discourse relations is a difficult task, in part because corpora annotated with implicit discourse relations are scarce; most studies have therefore relied on the RST and PDTB data sets. This work demonstrates that implicit discourse relations can be labelled automatically with limited resources and simple feature extraction while still achieving state-of-the-art results.
We create a corpus of implicit discourse relations from Gutenberg texts, following the method of Marcu and Echihabi (2002) for mimicking implicit discourse relations: we extract pairs of sentences joined by an explicit discourse connective and then delete the connective. Our system recognizes four relation types (ELABORATION, CONTRAST, CONDITION and CAUSE-EXPLANATION) and, in addition, successfully identifies sentence pairs that hold no discourse relation.
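The extraction step above can be sketched as follows. This is a minimal illustration, not the paper's actual pipeline: the connective inventory and the mapping to relation labels here are invented placeholders, and the real patterns of Marcu and Echihabi (2002) are more elaborate.

```python
import re

# Hypothetical connective-to-relation mapping for illustration only;
# the paper's inventory follows Marcu and Echihabi (2002).
CONNECTIVES = {
    "but": "CONTRAST",
    "because": "CAUSE-EXPLANATION",
    "if": "CONDITION",
    "for example": "ELABORATION",
}

def make_implicit(sentence_pair):
    """Strip a sentence-initial explicit connective from the second
    sentence, yielding an artificially 'implicit' relation instance
    paired with the relation label implied by the connective."""
    s1, s2 = sentence_pair
    for conn, label in CONNECTIVES.items():
        pattern = re.compile(r"^" + re.escape(conn) + r"[,\s]+", re.IGNORECASE)
        if pattern.match(s2):
            return (s1, pattern.sub("", s2, count=1)), label
    return None  # no explicit connective found; pair is discarded

pair = ("The roads were icy.", "But the train kept to its schedule.")
result = make_implicit(pair)
# result -> (("The roads were icy.", "the train kept to its schedule."),
#            "CONTRAST")
```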
Instead of using word embeddings and complex lexical features, we attempt to reach maximum quality with simple bag-of-words features. This is achieved with a multi-layer perceptron using a ReLU activation function. Our system attains state-of-the-art quality for binary classifiers, with F-scores in different settings ranging from 0.4 to 0.8. For multiclass classification the F-score is 0.45, which is on a par with similar studies (F1 = 44.98 in Liu et al., 2016). The best performance is achieved by binary classifiers that use all words as features. The results are close to the state of the art despite the fact that we use a small, noisy dataset and simple bag-of-words representations. Experimenting with additional features (word embeddings, tf-idf, etc.) and more complex neural architectures is left for future work.
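A classifier of the kind described (bag-of-words features fed to a multi-layer perceptron with ReLU activation) could be sketched with scikit-learn as below. The toy sentence pairs, labels, and hidden-layer size are illustrative assumptions, not the paper's data or hyperparameters.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.neural_network import MLPClassifier

# Toy stand-in instances: each string is a concatenated sentence pair
# with its explicit connective already removed (the real corpus is
# built from Gutenberg texts).
pairs = [
    "the roads were icy the train was late",
    "the roads were icy the train kept to its schedule",
    "you press the button the machine starts",
    "the window broke someone threw a stone",
]
labels = ["CAUSE-EXPLANATION", "CONTRAST", "CONDITION", "CAUSE-EXPLANATION"]

# Simple bag-of-words features, no embeddings or lexical resources.
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(pairs)

# Multi-layer perceptron with ReLU activation; the hidden-layer size
# here is a guess for illustration.
clf = MLPClassifier(hidden_layer_sizes=(100,), activation="relu",
                    max_iter=2000, random_state=0)
clf.fit(X, labels)
predictions = clf.predict(X)
```

In practice one binary classifier per relation type (relation vs. no relation, or relation vs. relation) can be trained the same way by restricting the label set.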