This page is obsolete. Please refer to the new page: http://valeriobasile.github.io/twita
Downloads
The full collection of Italian tweets is not available for download due to
the Twitter terms and conditions, so we release TWITA as a list of the IDs
of the tweets we collected. This is enough to recreate the collection, minus
the tweets that have since been deleted. One inefficient way of downloading a
tweet given its ID is with the wget command:
$ wget http://www.twitter.com/uid/statuses/TWEET_ID
and then parse the output, looking for the tag
<p class="js-tweet-text tweet-text">
A better way would be to use the
Twitter Streaming API.
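The wget-and-parse approach described above can be sketched in Python as follows. This is only a minimal illustration, not the method used to build TWITA: the URL pattern and the tag class are taken from the text above (with "uid" kept as a literal placeholder), while the names tweet_url, fetch_tweet_html, TweetTextParser and extract_tweet_text are hypothetical helpers introduced here. It assumes the tweet page still uses that markup, which may no longer be true.

```python
from html.parser import HTMLParser
from urllib.request import urlopen

# URL pattern copied from the wget example; "uid" is kept as a placeholder.
URL_TEMPLATE = "http://www.twitter.com/uid/statuses/{tweet_id}"

# Tags that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link", "wbr"}


def tweet_url(tweet_id):
    """Build the page URL for a tweet ID (hypothetical helper)."""
    return URL_TEMPLATE.format(tweet_id=tweet_id)


def fetch_tweet_html(tweet_id, timeout=10):
    """Download the tweet page; raises an error if the tweet is unavailable."""
    with urlopen(tweet_url(tweet_id), timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


class TweetTextParser(HTMLParser):
    """Collects the text of the first <p class="js-tweet-text tweet-text">."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.depth = 0       # > 0 while inside the target paragraph
        self.done = False    # True once the first target paragraph is closed
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if self.done or tag in VOID_TAGS:
            return
        if self.depth > 0:
            self.depth += 1  # nested tag (e.g. a link) inside the paragraph
            return
        classes = (dict(attrs).get("class") or "").split()
        if tag == "p" and "js-tweet-text" in classes and "tweet-text" in classes:
            self.depth = 1

    def handle_endtag(self, tag):
        if self.done or tag in VOID_TAGS or self.depth == 0:
            return
        self.depth -= 1
        if self.depth == 0:
            self.done = True

    def handle_data(self, data):
        if self.depth > 0:
            self.chunks.append(data)


def extract_tweet_text(html):
    """Return the tweet text found in the page, or None if the tag is absent."""
    parser = TweetTextParser()
    parser.feed(html)
    text = "".join(parser.chunks).strip()
    return text or None
```

A typical use would be to call extract_tweet_text(fetch_tweet_html(tweet_id)) for each ID in a list, treating download errors and None results as deleted or unavailable tweets.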
TWITA tweet ID lists
- February 2012 (4,788,492 tweets 31MB)
- March 2012 (6,915,622 tweets 45MB)
- April 2012 (7,440,299 tweets 48MB)
- May 2012 (8,474,224 tweets 54MB)
- June 2012 (9,047,127 tweets 58MB)
- July 2012 (8,978,568 tweets 58MB)
- August 2012 (8,482,334 tweets 54MB)
- September 2012 (9,695,155 tweets 62MB)
- October 2012 (7,379,208 tweets 47MB)
- November 2012 (11,306,397 tweets 71MB)
- December 2012 (3,622,892 tweets 23MB)
- January 2013 (12,715,122 tweets 80MB)
- February 2013 (12,014,345 tweets 76MB)
- March 2013 (12,933,829 tweets 82MB)
- April 2013 (8,509,986 tweets 53MB)
- May 2013 (18,154,491 tweets 113MB)
- June 2013 (5,125,215 tweets 32MB)
Note: there is a gap in the collection from 12 to 30 June 2013, due to a
failure to switch to the new version of the Twitter API.
Hashtag frequency lists
Resources for Sentiment Analysis
- Sentix - a lexicon for Sentiment Analysis of Italian (503KB gzipped text file)
- polypathy_en - List of English lemmas ordered by their polypathy score (73.7 gzipped text file)
- polypathy_it - List of Italian lemmas ordered by their polypathy score (30.7 gzipped text file)