Detecting linguistic differences between product reviews in high and low price categories of web shop Coolblue
Stijn Eikelboom


This study was performed as a bachelor thesis under the supervision of Martijn Wieling at the University of Groningen. It takes first steps in the virtually unexplored field of detecting linguistic differences between product reviews in high and low price categories. Previous work has studied terminology in positive and negative restaurant reviews and has confirmed that product reviews play an important role in customer decisions and thus affect sales, but differences between price categories have received little attention.

Reviews available on the Dutch web shop Coolblue are used to train a supervised machine learning classifier that distinguishes between Relatively cheap and Relatively expensive products. Experimenting with different configurations yields a best-performing model that reaches an F1-score of 0.90 and hence outperforms the baseline of 0.50. These results show that linguistic differences between the two classes exist and that products can be classified automatically into price classes based on them.
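
For readers unfamiliar with the F1-score, the following is a minimal sketch of how such a score can be computed for a two-class problem. It is not the evaluation code of this study: the function names, the toy labels and the choice of macro-averaging over both classes are assumptions made purely for illustration.

```python
def f1_for_class(gold, predicted, label):
    """F1 for one class: the harmonic mean of precision and recall."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == label and p == label)
    fp = sum(1 for g, p in pairs if g != label and p == label)
    fn = sum(1 for g, p in pairs if g == label and p != label)
    if tp == 0:
        # No correct predictions for this class, so F1 is defined as 0 here.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(gold, predicted):
    """Unweighted mean of the per-class F1 scores (an assumed averaging scheme)."""
    labels = sorted(set(gold))
    return sum(f1_for_class(gold, predicted, label) for label in labels) / len(labels)


# Hypothetical toy data: four products with gold and predicted price classes.
gold = ["cheap", "cheap", "expensive", "expensive"]
predicted = ["cheap", "expensive", "expensive", "expensive"]

score = macro_f1(gold, predicted)
print(f"macro F1 = {score:.2f}")
```

On a balanced two-class dataset, a score near 0.50 is roughly what guessing would achieve, which is why a score of 0.90 indicates that the review text carries real information about the price class.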

Finally, the Most Informative Features of the generated models are analysed to gain insight into the terminology used by reviewers of cheap and expensive products. Differences between the classes are found in the extremity of sentiment words, the product characteristics described and the attitude towards the price level. Additional length features suggest that higher-priced products receive either more or longer reviews. All in all, the results and the dataset produced in this study will hopefully open doors to further studies in this area.