Professional Documents
Culture Documents
Author Final
Author Final
on Social Media
terval between the time tweet T was posted and the time the N th 20
retweet was posted. Negative
Average number of retweets Neutral
Positive
4.5 Methods for Statistical Analysis 15
Initially, we examined the mean and distribution of the measures
of message virality for the message sentiments. We classified all
tweets as positive, negative, or neutral. For each category, we ob- 10
tained the mean and distribution of the number of retweets and N -
retweet time. When analyzing dataset DS , we estimated the mean
of the number of retweets in the population because dataset DS 5
was obtained from dataset DA by biased sampling. The method
of estimating the mean number of retweets of positive tweets was
as follows. Let µpi be the sample mean of the number of retweets
0
of positive tweets in section i (Table 1) and in dataset DS , and let Subjective Objective
fip be the number of positive tweets in section i and in dataset DA classification classification
divided by the number of positive tweets in dataset DA .∑ The mean
number of retweets of positive tweets was estimated as i fip µpi .
Figure 1: Relation between tweet sentiment and the mean num-
Next, we performed regression analysis to investigate the effects
ber of retweets. Left-hand bars show the estimated mean val-
of message sentiment on its virality considering other factors re-
ues obtained from dataset DS , and the right-hand bars show
lated to retweet behavior. We used the variables shown in Table 5.
the simple mean values obtained from dataset DA . The pop-
Following the method in Ref. [27], we used the presence of URLs,
ulation is the tweets whose number of retweets is more than
hashtags, and the number of followers as control variables because
one. Retweet volume of negative tweets is larger than those of
these factors affect message diffusion [22,27,28]. Using these con-
positive and neutral tweets.
trol variables, we examined the effects of message sentiment on its
virality eliminating the effects of other factors. We did not include a
variable for the activity of twitter users because this does not affect where βn is the regression coefficient. Note that follower is log
message diffusion [27]. We did not use dataset DS for regression transformed because the distribution of the number of followers is
analysis because it was obtained from biased sampling. heavy-tailed. For the regression of N RTtime, we used a simple
Following the method in Ref. [27], we used a binomial regres- linear regression model.
sion model for regression of RTnum because the variance of the
number of retweets is large (Tables 1 and 2). In the negative bi-
nomial regression model, the relation between dependent and inde-
5. RESULTS
pendent variables is modeled as
5.1 Analysis of Descriptive Statistics
To address RQ1, we examined the mean number of retweets
log(RTnum) =β0 + β1 URL + β2 hash for each category based on tweet sentiment (Fig. 1). Bars on the
(1)
+ β3 log(follower ) + β4 pos + β5 neg, left-hand side of the figure show the results obtained from dataset
DS , and bars on the right-hand side show the results obtained from
RTnum =eβ0 × eβ1 URL × eβ2 hash dataset DA . The results of dataset DS show estimated mean values
(2)
× follower β3 × eβ4 pos × eβ5 neg , that are explained in Section 4.5.
1 1
0.9 0.9
0.8 0.8
Empirical CDF
Empirical CDF
0.7 0.7
0.6 0.6
0.5 0.5
0.4 0.4
0.3 Negative 0.3 Negative
0.2 Neutral 0.2 Neutral
0.1 Positive 0.1 Positive
0 0
2 20 200 2000 2 20 200 2000
Number of retweets Number of retweets
(a) Subjective classification (b) Objective classification
Figure 2: Cumulative distribution of the number of retweets for each category. Note that (a) shows the cumulative distributions of
the retweet volume of the sampled tweets, not the total population. Negative tweets tend to have a larger retweet volume than positive
and neutral tweets.
20 18
18 16
16 14
N retweet time [h]
14 12
12
10
10
8
8
6 6 Negative
Negative
4 Neutral 4 Neutral
Positive Positive
2 2
0 0
500 1000 1500 2000 500 1000 1500 2000
Retweet count N Retweet count N
(a) Subjective classification (b) Objective classification
Figure 3: Average N -retweet time for each category. Average N -retweet time of negative tweets is shorter than those of positive and
neutral tweets.
Figure 1 shows that the retweet volume of negative tweets is ap- category. Average N -retweet time was obtained by calculating the
proximately 1.2–1.6-fold that of neutral tweets, and the retweet vol- average N -retweet time for tweets that were retweeted at least N
umes of positive and neutral tweets are similar to each other. We times. Because the number of samples with a large retweet count,
performed the pairwise test on the results of dataset DA using the N , is limited, the average values fluctuate if N is large.
Steel-Dwass [12, 25] method, and found that there were signifi- Figure 3 shows that average N -retweet time of negative tweets
cant differences in the number of retweets between any two cate- is shorter than those of positive and neutral tweets. In particular,
gories based on sentiment (p < 0.05). These results suggest that when N > 100, the average N -retweet time of negative tweets
the retweet volume of negative tweets is larger than that of neutral is approximately 20% shorter than those of positive and neutral
and positive tweets and the retweet volume of positive tweets is tweets. Note that the fraction of tweets retweeted more than 100
similar to neutral tweets. The differences of the mean values ob- times is only 1% in the collected dataset. Namely, tweets with a
tained with datasets DS and DA may be caused by the difference retweet count of N > 100 have high virality in terms of diffusion
between objective and subjective classifications (Table 4). volume. These results suggest that negative tweets spread faster
Next, we investigated the distributions of the number of retweets than positive and neutral tweets, particularly for tweets with large
for each category (Fig. 2). Figure 2 confirms that negative tweets diffusion volume. The diffusion time of negative tweets was ap-
tend to have a larger retweet volume than positive and neutral tweets. proximately 20% shorter than that of positive and neutral tweets,
We can also find that positive tweets tend to have slightly larger namely the diffusion speed of negative tweets was about 1.25-fold
retweet volume than neutral tweets (Fig. 2 (b)). that of positive and neutral tweets. In contrast, the N -retweet time
Next, we tackled retweet speed to answer RQ2 by using average of positive tweets was slightly longer than that of neutral tweets.
N retweet time. Figure 3 shows average N -retweet times for each
1 1
0.9 0.9
0.8 0.8
Empirical CDF
Empirical CDF
0.7 0.7
0.6 0.6
0.5 0.5
0.4 0.4
0.3 Negative 0.3 Negative
0.2 Neutral 0.2 Neutral
0.1 Positive 0.1 Positive
0 0
0 5 10 15 20 25 30 0 5 10 15 20 25 30
10-retweet time [h] 10-retweet time [h]
(a) Subjective classification (b) Objective classification
Figure 4: Cumulative distribution of 10-retweet time for each category. 10-retweet time for negative tweets and tweets with other
sentiment is similar.
1 1
0.9 0.9
0.8 0.8
Empirical CDF
Empirical CDF
0.7 0.7
0.6 0.6
0.5 0.5
0.4 0.4
0.3 Negative 0.3 Negative
0.2 Neutral 0.2 Neutral
0.1 Positive 0.1 Positive
0 0
0 5 10 15 20 25 30 0 5 10 15 20 25 30
100-retweet time [h] 100-retweet time [h]
(a) Subjective classification (b) Objective classification
Figure 5: Cumulative distribution of 100-retweet time for each category. 100-retweet time for negative tweets and tweets with other
sentiment is similar.
We investigated the distribution of N -retweet time of tweets for positive tweets, particularly when the retweet count was large, and
each category. Figures 4, 5, and 6 show the cumulative distributions positive and neutral tweets spread at similar speeds.
of N -retweet time for each category when N = 10, 100, and 1000,
respectively. 5.2 Regression Analysis
These results confirm that negative tweets spread faster than neu- The results in the previous section show that the message senti-
tral and positive tweets do if the retweet count, N , is large. Fig- ment and virality are closely related to each other. In this section,
ure 6 shows that the diffusion speed of negative tweets is faster than we perform regression analysis to investigate the relation between
tweets with other sentiment when N = 1000. In contrast, Figs. 4 message sentiment and virality, eliminating the effects of other fac-
and 5 show that N -retweet time for negative tweets and tweets with tors affecting message diffusion. We performed negative binomial
other sentiment is similar. The difference in N -retweet time be- regression analysis for investigating the effects of message senti-
tween positive and neutral tweets was only observed in Fig. 5(b). ment on diffusion volume. The dependent variable was RTnum,
The pairwise test with the Steel-Dwass method [12, 25] shows that and the independent variables were pos, neg, follower, URL, and
there is a significant difference in 10-, 100-, and 1000-retweet time hash. Table 6 shows the regression analysis results. The regression
among tweet sentiment categories in dataset DA (p < 0.05). coefficient, β, and the values of eβ for each variable are shown in
These analyses show similar results from datasets DS and dataset the table to demonstrate the effects of each independent variable on
DA , which suggests that the results are robust. For RQ1, our re- the dependent variable.
sults suggest that in terms of retweet volume, negative tweets were The result of the regression analysis shows that whether the sen-
the most viral and the virality of positive tweets was similar to neu- timent of a tweet is negative or positive increases its number of
tral tweets. For RQ2, negative tweets spread faster than neutral and retweets in the model. The strength of the effect of each vari-
able can be estimated from the regression coefficient, eβ (Eq.(2)).
1 1
0.9 0.9
0.8 0.8
Empirical CDF
Empirical CDF
0.7 0.7
0.6 0.6
0.5 0.5
0.4 0.4
0.3 Negative 0.3 Negative
0.2 Neutral 0.2 Neutral
0.1 Positive 0.1 Positive
0 0
0 5 10 15 20 25 30 0 5 10 15 20 25 30
1000-retweet time [h] 1000-retweet time [h]
(a) Subjective classification (b) Objective classification
Figure 6: Cumulative distribution of 1000-retweet time for each category. The diffusion speed of negative tweets is faster than tweets
with other sentiment when N = 1000.
sion volume of negative tweets was 1.2–1.6-fold that of positive of the 20th International Conference on World Wide Web
and neutral tweets, and that the diffusion speed of negative tweets (WWW’11), pages 665–674, Mar. 2011.
was 1.25-fold that of positive and neutral tweets when the diffusion [11] J. Cheng, L. Adamic, P. A. Dow, J. M. Kleinberg, and
volume was large. J. Leskovec. Can cascades be predicted? In Proceedings of
the 23rd International Conference on World Wide Web
Acknowledgements (WWW’14), pages 925–936, Apr. 2014.
The authors would like to thank Dr. Mitsuo Yoshida of Toyohashi [12] M. Dwass. Some k-sample rank-order tests. In Contributions
University of Technology for his support to the data collection, and to Probability and Statistics, pages 198–202. Stanford
Hisayuki Mori of Kwansei Gakuin University for helping the anal- University Press, 1960.
yses. This work was partly supported by JSPS KAKENHI Grant [13] P. Gonçalves, M. Araújo, F. Benevenuto, and M. Cha.
Number 25280030 and 26870076. Comparing and combining sentiment analysis methods. In
Proceedings of the first ACM Conference on Online Social
8. REFERENCES Networks (COSN’13), pages 27–38, Oct. 2013.
[1] Mecab: Yet Another Part-of-Speech and Morphological [14] A. Gruzd, S. Doiron, and P. Mai. Is happiness contagious
Analyzer. http://mecab.sourceforge.net. online? A case of Twitter and the 2010 Winter Olympics. In
[2] A. Agarwal, B. Xie, I. Vovsha, O. Rambow, and Proceedings of the 44th Hawaii International Conference on
R. Passonneau. Sentiment analysis of Twitter data. In System Sciences (HICSS’11), pages 1–9, Jan. 2011.
Proceedings of the Workshop on Languages in Social Media [15] L. Hansen, A. Arvidsson, F. Nielsen, E. Colleoni, and
(LSM’11), pages 30–38, June 2011. M. Etter. Good friends, bad news - Affect and virality in
[3] E. Bakshy, J. M. Hofman, W. A. Mason, and D. J. Watts. Twitter. Future Information Technology, 185:34–43, Dec.
Everyone’s an influencer: Quantifying influence on Twitter. 2011.
In Proceedings of the 4th ACM International Conference on [16] L. Hong, G. Convertino, and E. H. Chi. Language matters in
Web Search and Data Mining (WSDM’11), pages 65–74, Twitter: A large scale study. In Proceedings of the 5th
Feb. 2011. International AAAI Conference on Weblogs and Social
[4] E. Bakshy, I. Rosenn, C. Marlow, and L. Adamic. The role of Media (ICWSM’11), pages 518–521, July 2011.
social networks in information diffusion. In Proceedings of [17] L. Hong, O. Dan, and B. Davison. Predicting popular
the 21st International Conference on World Wide Web messages in Twitter. In Proceedings of the 20th International
(WWW’12), pages 519–528, Apr. 2012. Conference on World Wide Web (WWW’11), pages 57–58,
[5] L. Barbosa and J. Feng. Robust sentiment detection on Apr. 2011.
Twitter from biased and noisy data. In Proceedings of the [18] D. Kempe, J. M. Kleinberg, and E. Tardos. Maximizing the
23rd International Conference on Computational Linguistics spread of influence through a social network. In Proceedings
(COLING’10), pages 36–44, Aug. 2010. of the 9th ACM SIGKDD International Conference on
[6] R. F. Baumeister and E. Bratslavsky. Bad is stronger than Knowledge Discovery and Data Mining (KDD’03), pages
good. Review of General Psychology, 5(4):323–370, Dec. 137–146, Aug. 2003.
2001. [19] E. Kontopoulos, C. Berberidis, T. Dergiades, and
[7] J. Berger. Arousal increases social transmission of N. Bassiliades. Ontology-based sentiment analysis of Twitter
information. Psychological Science, 22(7):891–893, July posts. Expert Systems with Applications, 40(10):4065–4074,
2011. Aug. 2013.
[8] J. Berger and K. L. Milkman. What makes online content [20] A. Kupavskii, L. Ostroumova, A. Umnov, S. Usachev,
viral? Journal of Marketing Research, 49(2):192–205, Apr. P. Serdyukov, G. Gusev, and A. Kustarev. Prediction of
2012. retweet cascade size over time. In Proceedings of the 21st
[9] D. Blei, A. Ng, and M. Jordan. Latent Dirichlet allocation. ACM International Conference on Information and
Journal of Machine Learning Research, 3:993l–1022, Jan. Knowledge Management (CIKM’12), pages 2335–2338, Oct.
2003. 2012.
[10] C. Budak, D. Agrawal, and A. El Abbadi. Limiting the [21] H. Kwak, C. Lee, H. Park, and S. Moon. What is Twitter, a
spread of misinformation in social networks. In Proceedings social network or a news media? In Proceedings of the 19th
International Conference on World Wide Web (WWW’10), [28] B. Suh, L. Hong, P. Pirolli, and E. Chi. Want to be
pages 591–600, Apr. 2010. retweeted? Large scale analytics on factors impacting
[22] N. Naveed, T. Gottron, J. Kunegis, and A. Alhadi. Bad news retweet in Twitter network. In Proceedings of the 2nd IEEE
travel fast: A content-based analysis of interestingness on International Conference on Social Computing
Twitter. In Proceedings of the ACM Web Science Conference (SocialCom’10), pages 177–184, Aug. 2010.
2011 (WebSci’11), pages 1–7, June 2011. [29] H. Takamura, T. Inui, and M. Okumura. Extracting semantic
[23] S. Rosenthal, P. Nakov, S. Kiritchenko, S. M. Mohammad, orientations of words using spin model. In Proceedings of
A. Ritter, and V. Stoyanov. SemEval-2015 task 10: the 43rd Annual Meeting on Association for Computational
Sentiment analysis in Twitter. In Proceedings of the 9th Linguistics (ACL’05), pages 133–140, June 2005.
International Workshop on Semantic Evaluation [30] H. Takamura, T. Inui, and M. Okumura. Extracting semantic
(SemEval’15), pages 451–463, June 2015. orientations using spin model. IPSJ Journal, 47(2):627–637,
[24] P. Rozin and E. B. Royzman. Negativity bias, negativity Feb. 2006. (in Japanese).
dominance, and contagion. Personality and Social [31] S. E. Taylor. Asymmetrical effects of positive and negative
Psychology Review, 5(4):296–320, Nov. 2001. events: The mobilization-minimization hypothesis.
[25] R. G. D. Steel. A rank sum test for comparing all pairs of Psychological Bulletin, 110(1):67–85, July 1991.
treatments. Technometrics, 2(2):197–207, May 1960. [32] S. Wen, J. Jiang, Y. Xiang, S. Yu, W. Zhou, and W. Jia. To
[26] S. Stieglitz and L. Dang-Xuan. Political communication and shut them up or to clarify: Restraining the spread of rumors
influence through microblogging—an empirical analysis of in online social networks. IEEE Transactions on Parallel &
sentiment in Twitter messages and retweet behavior. In Distributed Systems, 25(12):3306–3316, Dec. 2014.
Proceedings of the 45th Hawaii International Conference on [33] S. Yang, A. Kolcz, A. Schlaikjer, and P. Gupta. Large-scale
System Science (HICSS’12), pages 3500–3509, Jan. 2012. high-precision topic modeling on Twitter. In Proceedings of
[27] S. Stieglitz and L. Dang-Xuan. Emotions and information the 20th ACM SIGKDD Conference on Knowledge Discovery
diffusion in social media—sentiment of microblogs and and Data Mining (KDD’14), pages 1907–1916, Aug. 2014.
sharing behavior. Journal of Management Information
Systems, 29(4):217–247, 2013.