This page is obsolete. Please refer to the new page: http://valeriobasile.github.io/twita

Sentiment Analysis

This page is obsolete. Please refer to the new page: http://valeriobasile.github.io/twita

As part of the TWITA project investigation, we developed a lexicon for Sentiment Analysis called Sentix (Sentiment Italian Lexicon). The lexicon is the result of the alignment of several resources:

WordNet - a large lexical database of English
MultiWordNet - a multilingual lexical database
BabelNet - A very large multilingual ontology
SentiWordNet - a lexical resource for opinion mining

Sentix is available for download from the downloads page.

An entry in our lexicon consists in an italian lemma, part-of-speech (noun, verb, adjective, adverb), WordNet synset ID, a positive and a negative score from SentiWordNet, a polarity score ranging from -1 to 1, and an intensity score ranging from 0 to 1. Here is an example entry:

lemma	POS	Wordnet synset ID	positive score	negative score	polarity	intensity
naturale	A	00074346	0,75	0,125	0,789	0,76

Polarity and Intensity

In SentiWordNet the positive p and negative n polarity scores are assigned to synsets independently, that is, the sum of positive, negative and neutral score is 1, but no constraint is posed on the relative size of p and n. Geometrically, a synset in SentiWordNet is a point in the cartesian space where its y coordinate is the positive score and the y coordinate is the negative score. Since x+y≤1, the sentiment plane is restricted to a triangle:
geometric sentiment space

Intuitively, synsets that fall into the P region are "positive", synsets that fall into the N region are "negative", and synsets that fall into the U region are "neutral". However, the three regions are not clearly delimited.

Formally, given a cartesian-to-polar coordinates transformation:
geometric sentiment space
We define Polarity (π) and Intensity (ι) as:

Polarity (π) ranges from -1 (totally negative) to 1 (totally positive). Intensity ranges from 0 (totally neutral) to 1 (totally polarized). Polarity and Intensity are functions of the positive and negative scores found in SentiWordNet, thus they don't add any informationi to it. However, they are intuitive measures of lexical sentiment.

Polypathy

When it's not possible to perform Word Sense Disambiguation, then all senses of a word must be considered (or one can use heuristics such as taking the most frequent sense). Many words are polysemous, that is, they have multiple meanings. The meanings of a word can convey sentiments with opposite polarities. The polypathy of a lemma is a measure of the spread of its senses on the polarity scale.

One way of compute the polypathy of word is the standard deviation of the polarity of its senses, but this method does not take into account the intensity of the synsets. An alternative method is to take the mean distance of all synsets from their centroid in the sentiment plane. Two lists of Italian and English words with their polypath (computed as standard deviation of polarities) are available for downloads in the downloads page.

Statistics

Resource	Part of speech	Lemmas	Synsets
Sentix	noun	52,257	12,228
	adjective	3,359	1,805
	verb	2,775	1,260
	adverb	1,351	750
	all	59,742	16,043
Sentix (only aligned with MultiWordNet)	noun	10,042	4,779
	adjective	3,359	1,805
	verb	2,775	1,260
	adverb	1,351	750
	all	17,527	8,594
Sentix (only aligned with Babelnet)	noun	45,517	11,567
	adjective	0	0
	verb	0	0
	adverb	0	0
	all	45,517	11,567