Unraveling the Chinese Whispergame – Can we detect Circular Reporting?
Maaike de Boer, Matthias Fäth, Judith Dijk and Freek Bomhof


The news often suffers from the Chinese Whispergame, in which one source reports on an event and others copy the content with slight changes. As a result, the information appears to come from multiple sources, whereas it originates from a single one. In this research, we try to detect this circular reporting by performing two experiments on real news data. We collected 1002 news articles from several English and Spanish sources, of which roughly 750 are about Venezuela and the remainder about unrelated topics. From the articles we extracted n-grams, stylometric features, and Names, Organizations and Locations (NOL) as features. In the first experiment, we classify whether an article is about Venezuela or not, using state-of-the-art classifiers such as Random Forest, SGD, and k-NN. The results show that both accuracy and F1-score exceed 90% for all classifiers used. The second experiment concerns the detection of similar articles. We use the cosine similarity between the n-grams and the set overlap between the NOLs as classifiers, in which a score > 0.9 indicates that two articles are the same. Because we do not have a ground truth, we use Amazon Mechanical Turk to check that article pairs classified as positive are indeed the same and that randomly sampled pairs are not. The results show that the three human annotators barely agreed on the similarity of the articles (Fleiss’ kappa < 0.4), which suggests that unraveling the Chinese Whispergame is hard even for humans. Future work includes the collection of ground-truth data.
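The two similarity scores used in the second experiment can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the n-gram type or length, nor the exact set-overlap measure, so word bigrams and the Jaccard index are assumed here, and all function names are hypothetical. Only the 0.9 threshold is taken from the text.

```python
from collections import Counter
from math import sqrt

THRESHOLD = 0.9  # from the abstract: a score > 0.9 marks two articles as the same


def word_ngrams(text, n=2):
    """Count word n-grams (assumption: whitespace tokens, lowercased, n=2)."""
    tokens = text.lower().split()
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def cosine_similarity(a, b):
    """Cosine similarity between two n-gram count vectors stored as Counters."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def set_overlap(x, y):
    """Set overlap of two NOL collections (assumption: Jaccard index)."""
    x, y = set(x), set(y)
    if not x and not y:
        return 0.0
    return len(x & y) / len(x | y)


def classify_pair(text_a, text_b, nol_a, nol_b, n=2):
    """Return the two scores and whether each one exceeds the threshold."""
    ngram_score = cosine_similarity(word_ngrams(text_a, n), word_ngrams(text_b, n))
    nol_score = set_overlap(nol_a, nol_b)
    return {
        "ngram_score": ngram_score,
        "nol_score": nol_score,
        "same_by_ngrams": ngram_score > THRESHOLD,
        "same_by_nol": nol_score > THRESHOLD,
    }
```

The two scores are kept separate because the abstract describes them as separate classifiers; how they were combined, if at all, is not stated.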