Machine Translation Dutch <> Gronings
Rick Kosse


Gronings is a dialect of Dutch spoken in the north-eastern part of the Netherlands. While closely related to Dutch, Gronings differs mainly in grammar (word order and pro-drop for second-person pronouns). Hardly any language technology or language resources are available for Gronings. In this work we aim to build a competitive machine translation (MT) system for Dutch to Gronings and vice versa, despite the scarcity of resources. We approach this task by first preprocessing and aligning a parallel Dutch-Gronings corpus of roughly 8,000 sentence pairs. Using the NLTK library, the text was split into sentences and tokenized, and punctuation was normalized. The sentences on both sides of the corpus were then aligned with Hunalign. This sentence aligner can use a bilingual dictionary to guide the alignment. We used a dictionary from a user-based website about Gronings, and alignment recall was higher with the dictionary than without it. In recent studies, neural networks have proven to be very effective for MT, and our MT systems are therefore built under this paradigm. Since Gronings is a dialect of Dutch and a large part of the vocabulary is shared, we investigate whether character-based MT is more effective than word-based MT or MT based on sub-word units.
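To make the difference between word-based and character-based input concrete, the following minimal sketch segments one tokenized Dutch sentence both ways. It is an illustration only, not the preprocessing used in this work: the function names, the example sentence, and the `▁` word-boundary marker are assumptions, and it uses plain Python rather than NLTK.

```python
from typing import List

# Visible marker for original word boundaries (an assumption of this sketch).
SPACE = "▁"


def word_segment(sentence: str) -> List[str]:
    """Word-level units: split an already tokenized sentence on whitespace."""
    return sentence.split()


def char_segment(sentence: str) -> List[str]:
    """Character-level units: one symbol per character.

    Runs of whitespace are collapsed to a single space first, and each
    space is replaced by SPACE so word boundaries can be restored later.
    """
    normalized = " ".join(sentence.split())
    return [SPACE if ch == " " else ch for ch in normalized]


def char_desegment(units: List[str]) -> str:
    """Invert char_segment: join the symbols and restore the spaces."""
    return "".join(units).replace(SPACE, " ")


if __name__ == "__main__":
    sentence = "Ik ga naar huis ."  # hypothetical tokenized Dutch input
    print(word_segment(sentence))
    print(char_segment(sentence))
    print(char_desegment(char_segment(sentence)))
```

With character-level units the model's vocabulary is a small, closed set of symbols, so words that differ only slightly in spelling between Dutch and Gronings share most of their input symbols. The cost is that sequences become much longer than at the word level.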