CLIN 29 in Groningen

Automatic modal sense disambiguation in the vaccination debate.
Elizabeth King and Roser Morante

We present research into the automatic processing of modal auxiliary senses in a corpus of texts related to the vaccination debate. The research has two primary aims: to identify the most informative features for machine learning classification of modal senses, and to discover whether modal senses are a useful feature for predicting vaccination stance. Informed by previous research by Ruppenhofer and Rehbein (2012), Zhou et al. (2015), Marasović and Frank (2016) and King (2018), and by a corpus analysis that investigated potential causes of sense ambiguity, we implemented two main feature groups: context and subject. Support Vector Machines (SVMs) and gradient-boosted trees are used to test combinations of context and subject features for the modal sense disambiguation task, and an SVM is used to ascertain whether incorporating modal senses as a feature aids in predicting vaccination stance. The majority of the experiments improve on most-frequent-class baselines, and the addition of subject features is shown to be largely informative. The classifiers have different strengths: the boosted trees yield more correct predictions for the dynamic sense, and the SVM yields more correct predictions for the epistemic-dynamic sense. Results on the informativeness of modal senses as a predictor of vaccination stance are not conclusive; further research is needed, with a larger corpus and additional features. The experiments used DM50, the Disneyland Measles corpus of 50 documents. We plan to annotate 200 additional documents and also to use MASC (Zhou et al. 2015).
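As a rough illustration of two ideas mentioned in the abstract, the following sketch shows one way a context feature could be extracted around a modal auxiliary and how a most-frequent-class baseline could be scored. It is not the authors' implementation: the function names, the window size, the `<pad>` token, and the toy labels are all assumptions made for the example, and the subject features are not modelled because the abstract does not define them.

```python
from collections import Counter


def context_features(tokens, index, window=2):
    """Build a feature dict from the words around tokens[index].

    Hypothetical helper: the window size and feature names are
    illustrative assumptions, not taken from the abstract.
    """
    features = {"modal": tokens[index].lower()}
    for offset in range(-window, window + 1):
        if offset == 0:
            continue  # the modal itself is already recorded
        pos = index + offset
        # Positions outside the sentence get a padding symbol.
        word = tokens[pos].lower() if 0 <= pos < len(tokens) else "<pad>"
        features[f"w[{offset:+d}]"] = word
    return features


def most_frequent_class_baseline(train_labels, test_labels):
    """Predict the majority training label for every test item.

    Returns the majority label and the resulting test accuracy.
    """
    majority = Counter(train_labels).most_common(1)[0][0]
    correct = sum(1 for label in test_labels if label == majority)
    return majority, correct / len(test_labels)


# Toy data, invented for illustration only.
tokens = "Parents should vaccinate their children".split()
feats = context_features(tokens, index=1)

train = ["dynamic", "dynamic", "epistemic-dynamic"]
test = ["dynamic", "epistemic-dynamic", "dynamic", "dynamic"]
majority, accuracy = most_frequent_class_baseline(train, test)
```

A feature dict of this shape could be passed to a vectoriser and then to a classifier such as an SVM; any classifier worth keeping should score above the baseline accuracy computed here.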