Word Representations for Out-of-Domain Part-of-Speech Tagging
Erik Schill and Daniël de Kok


Recurrent neural network part-of-speech taggers have been shown to outperform
traditional taggers based on log-linear and hidden Markov models (Ling et al.,
2015; Plank et al., 2016), with reported accuracies higher than 97% on
languages such as English and German. Even though these results are impressive,
the evaluation is typically performed on relatively clean, in-domain text. It
is well-known that real-world performance is often substantially lower (see
e.g. Gimpel et al., 2011). When applied out of domain, taggers are often
hampered by out-of-vocabulary tokens, differences in orthography, and syntactic
variation.

We investigate the impact of different word representations on out-of-domain
tagging of German, in order to assess their capacity to generalize across
domains. We train models on the TüBa-D/Z newspaper corpus (Telljohann et al.,
2017) and evaluate the taggers on the NoSta-D non-standard varieties corpus
(Dipper et al., 2013). First, we compare skip-gram (Mikolov et al., 2013) and
structured skip-gram embeddings (Ling et al., 2015) and show that the use of
the syntax-oriented structured skip-gram embeddings is beneficial. Then we look
at subword representations, comparing character-level RNNs to a simpler
feed-forward layer that captures prefix and suffix features (de Kok, 2015). We
show that the simpler feed-forward model not only leads to drastically shorter
training times, but also outperforms character RNNs in out-of-domain tagging.