Domain adaptation in sentiment analysis: combining cross-domain data for better text classification

In this paper we investigate whether combining data from different domains is a useful way to accomplish domain adaptation in sentiment analysis. Text classification is carried out with the Naive Bayes algorithm, implemented using the Python library NLTK. The data consists of user-generated game reviews web-scraped from the online gaming platform Steam. The genre categories of these reviews serve as the different domains. A training set consisting of 70-90% reviews from the genre Sports, with the remainder drawn from another genre, yields the best results for an overall good classification system, with an average accuracy score around 0.75. A potential reason for this observation is that the feature set for Sports is relatively sparse and as a result lends itself as a good base set for sentiment analysis. Additionally, domain-dependent behavior is observed when genres are used as different domains, which might mean that a smaller differentiating factor such as genre can be utilized in overcoming the domain adaptation problem.
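As a rough illustration of the data-mixing idea described above, the following sketch builds a training set in which a chosen fraction of reviews comes from a base genre and the rest from a second genre. It is not the thesis code: the function names `bag_of_words` and `mix_training_set`, the boolean bag-of-words features, the whitespace tokenization, and the toy reviews are all assumptions made for this example.

```python
import random


def bag_of_words(text):
    """Map a review to boolean features, one per distinct lowercase token.

    Assumption: simple whitespace tokenization; the thesis may use
    a different feature extractor.
    """
    return {f"contains({tok})": True for tok in text.lower().split()}


def mix_training_set(base, other, base_fraction, size, seed=0):
    """Sample `size` labeled reviews, `base_fraction` of them from `base`.

    `base` and `other` are lists of (review_text, label) pairs from two
    genres. Returns a shuffled list of (feature_dict, label) pairs.
    """
    if not 0.0 <= base_fraction <= 1.0:
        raise ValueError("base_fraction must be between 0 and 1")
    n_base = round(size * base_fraction)
    n_other = size - n_base
    if n_base > len(base) or n_other > len(other):
        raise ValueError("not enough reviews to fill the requested mix")
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    mixed = rng.sample(base, n_base) + rng.sample(other, n_other)
    rng.shuffle(mixed)
    return [(bag_of_words(text), label) for text, label in mixed]


# Made-up placeholder reviews, not data from the thesis.
sports = [(f"sports review {i} fun", "pos" if i % 2 == 0 else "neg")
          for i in range(10)]
other = [(f"rpg review {i} dull", "pos" if i % 2 == 0 else "neg")
         for i in range(10)]

# 80% Sports, 20% other genre, inside the 70-90% range the abstract reports.
train = mix_training_set(sports, other, base_fraction=0.8, size=10)
```

The returned list of `(feature_dict, label)` pairs has the shape that `nltk.NaiveBayesClassifier.train` accepts, so with NLTK installed it could be passed to that method directly and evaluated with `nltk.classify.accuracy` on a held-out set.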

The complete Bachelor Thesis can be downloaded here:
Thesis

Computational Linguistics Reading Group
