Neural Machine Translation with and without parallel data
Dimitar Shterionov and Andy Way


Neural Machine Translation (NMT), as a data-driven paradigm, relies heavily on the amount and quality of the available parallel data; this is even more evident for the newer Transformer architecture. In many scenarios, however, e.g. for low-resource language pairs, there is not enough data to train a good NMT system. One approach to overcome this problem is to use a pivot language to bridge the source (L1) and target (L2) languages: parallel data between L1 and some other language (L*), and between L* and the target L2, is used to learn to translate between L1 and L2.
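As an illustration, a pivot setup is often realised as a cascade of two translation steps, as in the minimal sketch below; the two model callables are placeholders standing in for trained L1-L* and L*-L2 engines, not an actual implementation.

    # Cascaded pivot translation: L1 -> L* -> L2.
    # The two "models" passed in are assumed to be callables mapping a source
    # sentence to its translation (e.g. wrappers around trained NMT engines).
    def pivot_translate(sentences, l1_to_lstar, lstar_to_l2):
        pivot_sentences = [l1_to_lstar(s) for s in sentences]  # L1 -> L*
        return [lstar_to_l2(s) for s in pivot_sentences]       # L* -> L2

    # Dummy stand-ins for real models, for illustration only.
    l1_to_lstar = lambda s: f"[L* version of: {s}]"
    lstar_to_l2 = lambda s: f"[L2 version of: {s}]"
    print(pivot_translate(["a source sentence"], l1_to_lstar, lstar_to_l2))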

Another, more recent approach in Neural MT, Zero-Shot Translation (ZST), exploits parallel data in multiple languages to train a single NMT system that can translate between all possible combinations of those languages. Experiments have shown that ZST-based models can translate between language pairs for which no parallel data was provided during training. In a recent study we explored ZST in a low-resource scenario, building NMT systems for Indian languages with scarce parallel data.
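The sketch below shows the data preparation commonly used for such multilingual models: each source sentence is prefixed with an artificial token naming the desired target language, and all language pairs are mixed into one training corpus. The "<2xx>" token format is one common convention and an assumption here, not necessarily the exact setup used in our study.

    # Build a single multilingual training corpus from several bilingual corpora.
    # corpora: dict mapping (src_lang, tgt_lang) -> list of (src, tgt) sentence pairs.
    def build_multilingual_corpus(corpora):
        mixed = []
        for (src_lang, tgt_lang), pairs in corpora.items():
            for src, tgt in pairs:
                # Prefix the source with a token telling the model the target language.
                mixed.append((f"<2{tgt_lang}> {src}", tgt))
        return mixed

    corpora = {
        ("en", "hi"): [("hello", "namaste")],
        ("hi", "en"): [("namaste", "hello")],
    }
    for src, tgt in build_multilingual_corpus(corpora):
        print(src, "|||", tgt)

After training on such a mixture, the model can in principle be asked to translate a direction for which no parallel data was included, simply by choosing the corresponding target-language token; this is the zero-shot case.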

In this work we focus on NMT systems built without parallel data, using both the ZST approach and a pivot language, and we investigate the effects of adding small amounts of parallel data to such engines to boost their performance. We apply our methodology in low-resource as well as high-resource scenarios and analyse the impact of the additional parallel data on translation quality. Our results are encouraging and show that even a little parallel data boosts the performance of the NMT system.
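One simple way to add such data, sketched below under stated assumptions, is to oversample the small direct L1-L2 corpus and mix it into the existing training data before continuing training; the oversampling factor and the fine-tuning step itself are illustrative choices, not the exact procedure of the study.

    import random

    # Mix a small direct L1-L2 corpus into a larger training corpus.
    # Oversampling the small corpus is one common heuristic; the factor of 5
    # and the random seed are assumptions made for this sketch.
    def mix_in_parallel_data(base_corpus, small_parallel, oversample=5, seed=1):
        mixed = list(base_corpus) + list(small_parallel) * oversample
        random.Random(seed).shuffle(mixed)
        return mixed

    # The resulting corpus would then be used to continue (fine-tune) training
    # of the pivot- or ZST-based system.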