CLIN 29 in Groningen

Sentiment Preservation in Machine Translation
Pintu Lohar, Haithem Afli and Andy Way

For most people nowadays, interacting with social media is an everyday occurrence. Twitter has become a popular platform for sharing information. Its 280-character limit encourages users to post short, informal texts (tweets) that often convey a certain degree of sentiment. Unfortunately, sentiment analysis (SA) tools are available almost exclusively for English, so a large body of customer feedback in another language must first be translated into English before SA can be performed. In many cases, maintaining sentiment polarity in the target language is much more important than the actual translation quality. However, machine translation (MT) can sometimes alter the sentiment in the target language [Mohammad et al., 2016]. Furthermore, parallel Twitter data for MT training is scarce. This paper highlights our three major contributions in this area: (i) our published English–German parallel corpus of FIFA 2014 World Cup tweets (FooTweets) with manually annotated sentiment scores, (ii) a suite of sentiment-specific MT systems that focus on preserving sentiment during translation, and (iii) a nearest sentiment class-combination method that extends the existing sentiment-specific MT systems by adding training data from the nearest sentiment class. Our extensive evaluation yielded the following observations: (i) our published resource is very useful for translating user-generated content, (ii) the sentiment-specific MT systems significantly improve sentiment preservation with only a slight deterioration in overall translation quality, and (iii) the proposed class-combination approach is capable of achieving a proper balance between translation quality and sentiment preservation.
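
To make the idea of nearest sentiment class-combination more concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes each parallel tweet carries a numeric sentiment score, that the scores are binned into three classes with a hypothetical threshold of 0.2, and that the "nearest" class is the one whose mean score is closest. The function names, the threshold, and the toy corpus are all made up for illustration.

```python
from collections import defaultdict
from statistics import mean


def classify(score, threshold=0.2):
    """Bin a sentiment score into a class (threshold is a hypothetical choice)."""
    if score < -threshold:
        return "negative"
    if score > threshold:
        return "positive"
    return "neutral"


def partition(corpus):
    """Group (source, target, score) triples by sentiment class."""
    classes = defaultdict(list)
    for src, tgt, score in corpus:
        classes[classify(score)].append((src, tgt, score))
    return dict(classes)


def nearest_class(classes, name):
    """Return the other class whose mean score is closest to that of `name`.

    Defining nearness by mean score is an assumption of this sketch.
    """
    centroids = {c: mean(s for _, _, s in pairs) for c, pairs in classes.items()}
    others = [c for c in centroids if c != name]
    return min(others, key=lambda c: abs(centroids[c] - centroids[name]))


def combined_training_sets(classes):
    """Extend each class's training data with the data of its nearest class."""
    return {
        name: pairs + classes[nearest_class(classes, name)]
        for name, pairs in classes.items()
    }


# Toy data, invented for illustration only (not taken from FooTweets).
toy_corpus = [
    ("What a terrible match", "Was für ein schreckliches Spiel", -0.8),
    ("Awful refereeing again", "Schon wieder schlechte Schiedsrichter", -0.6),
    ("Kick-off is at nine", "Anstoß ist um neun", 0.0),
    ("Second half has started", "Die zweite Halbzeit hat begonnen", 0.1),
    ("Brilliant goal!", "Geniales Tor!", 0.9),
]

classes = partition(toy_corpus)
training_sets = combined_training_sets(classes)
```

Each entry of `training_sets` would then be used to train one sentiment-specific MT system. The larger combined sets are intended to recover some translation quality, while the restriction to the nearest class keeps the added data close in sentiment.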