You Write Like You Eat
Angelo Basile


The meaning of a sentence can be conveyed in different forms: pronunciation, word choice, and constructions can all differ. When such variation is linked to social factors like age or gender, it is called social variation. In this work, we investigate what restaurant reviews can reveal about syntactic variation between socio-economic groups. We work on user-generated text and collect the data automatically: starting from the Yelp data set, we use distant supervision to assign each author a label denoting their inferred socio-economic status. We use the price range of a restaurant as a proxy for the income of its reviewers, in the same way that Labov (2006) used the prestige of a store as a proxy for the social class of its workers to investigate phonetic variation in New York. We test our hypothesis with a machine learning approach: given a labeled corpus, we argue that if a classifier can accurately predict the class of a text using only syntactic features, then there must be some form of syntactic variation between the groups. We report significant positive results. Furthermore, we experiment with abstract features and test whether the results hold across languages. To the best of our knowledge, this is the first computational study to investigate syntactic variation along the socio-economic axis.
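The distant supervision step described above can be illustrated with a minimal sketch. It is not the procedure used in this work: the toy data, the `label_authors` helper, the use of the mean price range per author, and the 2.5 threshold on a 1 to 4 price scale are all assumptions made only for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy data: (author_id, restaurant_id) review pairs and a
# price range per restaurant on an assumed 1 (cheap) to 4 (expensive) scale.
reviews = [
    ("u1", "r1"), ("u1", "r2"),
    ("u2", "r3"), ("u2", "r4"),
    ("u3", "r1"), ("u3", "r4"),
]
price_range = {"r1": 1, "r2": 1, "r3": 4, "r4": 3}


def label_authors(reviews, price_range, threshold=2.5):
    """Assign each author a coarse socio-economic label.

    The label is derived from the mean price range of the restaurants the
    author reviewed. Both the mean and the threshold are assumptions.
    """
    prices_by_author = defaultdict(list)
    for author, restaurant in reviews:
        if restaurant in price_range:  # skip restaurants without price info
            prices_by_author[author].append(price_range[restaurant])
    return {
        author: "high" if mean(prices) >= threshold else "low"
        for author, prices in prices_by_author.items()
    }


labels = label_authors(reviews, price_range)
```

Labels obtained this way could then serve as prediction targets for a classifier restricted to syntactic features, so that any accuracy above a baseline can be attributed to syntax rather than vocabulary.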

References:
- W. Labov, The Social Stratification of English in New York City, Cambridge University Press, 2006.