Does Syntactic Knowledge Transfer Cross-Lingually in Multilingual Neural Language Models?
Prajit Dhar, Arianna Bisazza and Wessel Kraaij


Recent work has shown that multilingual neural models can be successfully trained for tasks such as language modeling (Östling and Tiedemann, 2017) and translation (Johnson et al., 2017), suggesting that these models learn to share useful knowledge cross-lingually through their learned representations. However, a thorough analysis of such models is still lacking.

We provide an initial analysis of the possible transfer of syntactic knowledge across languages in a bilingual model of next word prediction. Taking psycholinguistic studies (Jarvis and Pavlenko, 2008; Kootstra et al., 2012) as inspiration, we evaluate various language models on a recently introduced long-range agreement benchmark (Gulordava et al., 2018). Specifically, we investigate whether training the model on a large corpus of a helper language yields an increased agreement accuracy in the target language, which would be evidence for syntactic transfer. French is taken as the helper language, with Italian or Russian as the target language.
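
As a rough illustration of this evaluation protocol, the sketch below computes agreement accuracy over benchmark items in the style of Gulordava et al. (2018): the model is counted as correct when it assigns a higher probability to the agreeing verb form than to the non-agreeing one. The item fields and the `score` interface are hypothetical placeholders for illustration, not the authors' released code.

```python
# Minimal sketch of long-range agreement evaluation.
# The model interface (`score`) and the item fields are assumptions.

from typing import Callable, List, NamedTuple


class AgreementItem(NamedTuple):
    prefix: List[str]   # sentence tokens up to the target position
    correct: str        # verb form that agrees with its subject
    wrong: str          # same verb with the wrong number


def agreement_accuracy(
    items: List[AgreementItem],
    score: Callable[[List[str], str], float],
) -> float:
    """Fraction of items where the LM prefers the correct verb form.

    `score(prefix, word)` is assumed to return log P(word | prefix)
    under the (mono- or bilingual) language model being evaluated.
    """
    hits = 0
    for item in items:
        if score(item.prefix, item.correct) > score(item.prefix, item.wrong):
            hits += 1
    return hits / len(items)
```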

For both target languages, the bilingual models yield small gains over the monolingual target models on naturally occurring sentences and a slight decrease in accuracy on nonce sentences, suggesting that language relatedness does not matter. Contrary to our expectations, we also observed that words of the helper language are often assigned considerable probability mass even in target-language-only contexts.
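
The by-language probability analysis can be pictured as follows: the bilingual model's next-word distribution over the joint vocabulary is aggregated according to the language each word type belongs to, so that a large helper-language share in a target-language-only context indicates leaked probability mass. The vocabulary partition and the distribution interface below are assumptions for illustration, not the actual implementation.

```python
# Hedged sketch of aggregating next-word probability mass by language.
# `lang_of` (a word-to-language map) is an assumed preprocessing step.

from typing import Dict


def mass_by_language(
    next_word_probs: Dict[str, float],
    lang_of: Dict[str, str],
) -> Dict[str, float]:
    """Sum P(next word) over vocabulary items of each language.

    `lang_of` maps a word type to e.g. 'fr', 'it', 'ru', or 'shared'
    (types occurring in both training corpora).
    """
    totals: Dict[str, float] = {}
    for word, prob in next_word_probs.items():
        lang = lang_of.get(word, "unk")
        totals[lang] = totals.get(lang, 0.0) + prob
    return totals
```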

This is an extension of our previous work (Dhar and Bisazza, 2018), adding the new Russian model results and an analysis of the probability distribution by language.