Going Dutch: Creating SimpleNLG-NL
Ruud de Jong and Mariët Theune
We present SimpleNLG-NL, an adaptation of the surface realiser SimpleNLG that can be used to generate Dutch text. Surface realisation is the last step of Natural Language Generation, following content determination and sentence planning. The system was developed with a novel approach based on target sentences from a treebank. Using the bilingual SimpleNLG-EnFr as a basis, we applied an iterative generate-evaluate-revise cycle to determine the subset of Dutch grammar to be implemented. For every target sentence, we wrote the SimpleNLG-NL input code and analysed the resulting output. Differences between the target sentence and the output were addressed by implementing the relevant Dutch grammar rules. After two rounds of this cycle with treebank sentences and two sets of unit tests, 74 out of 86 sentences (86.0%) were generated successfully. One example of an implemented Dutch grammar feature is Separable Complex Verbs, which consist of a verb prefixed by another word and require special syntactic and morphological handling (e.g. the past tense of toekennen ‘assign’ is kende toe, while the participle is toegekend). We believe that the grammatical coverage of SimpleNLG-NL is large enough to generate simple sentences. The system will be used in the POSTHCARD project, which uses a simulation of Alzheimer’s patients to train caregivers.
SimpleNLG-NL is released under the MPL 1.1 license and is available on GitHub.
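To make the Separable Complex Verb example from the abstract concrete, the following is a minimal sketch in Python of why such verbs need special handling: the prefixed word detaches and follows the verb in the past tense, but stays attached in front of the participle. This is not the SimpleNLG-NL API (SimpleNLG-NL itself is a Java library); the class `SeparableVerb` and its field and method names are hypothetical, and the inflected base forms are supplied by hand rather than derived by rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SeparableVerb:
    """Toy model of a Dutch Separable Complex Verb (hypothetical, not SimpleNLG-NL)."""

    particle: str         # the prefixed word, e.g. "toe"
    base_past: str        # past tense of the base verb, e.g. "kende"
    base_participle: str  # participle of the base verb, e.g. "gekend"

    def past_tense(self) -> str:
        # The particle separates and follows the inflected verb.
        # Only the split form cited in the text is modelled here.
        return f"{self.base_past} {self.particle}"

    def past_participle(self) -> str:
        # The particle stays attached, in front of the base participle.
        return f"{self.particle}{self.base_participle}"


# The example verb from the text: toekennen 'assign'.
toekennen = SeparableVerb(particle="toe", base_past="kende", base_participle="gekend")
print(toekennen.past_tense())
print(toekennen.past_participle())
```

The sketch only covers the two forms mentioned in the text; a realiser also has to decide where the separated word is placed within the full sentence, which is outside the scope of this illustration.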