Rathore2018 - Epidemic Model-Based Visibility Estimation in Online Social Networks

Epidemic model based visibility estimation in Online Social Networks

Nemi Chandra Rathore
Department of Computer Science & Engineering
Indian Institute of Technology Patna
Patna, Bihar, India
Email: nemi@iitp.ac.in

Somanath Tripathy
Department of Computer Science & Engineering
Indian Institute of Technology Patna
Patna, Bihar, India
Email: som@iitp.ac.in

Abstract—The emergence of various Online Social Network (OSN) services has revolutionized the way people express themselves among their social connections and to the world. Twitter is one of the most popular OSNs; it allows its users to share ideas with their followers and the public in the form of tweets. Predicting the visibility of a tweet is an interesting problem, and such a prediction might be useful in estimating the privacy risk caused by the tweet. In this paper, we propose a technique inspired by epidemic models to predict the visibility of a tweet. Our model exploits user interest and relationship strength to predict the visibility of a tweet. The evaluation results show that one can predict the total number of likes and re-tweets of a tweet with an accuracy of approximately 89%.

Keywords—On-line Social Network; Privacy; Visibility Prediction; Information Diffusion; Forwarding Probability; Keyword Extraction.

I. INTRODUCTION

Online Social Networks (OSNs) are web-based services that allow users to create an articulated virtual social interaction network with others as per their interest. In recent years, the number of Internet users has increased tremendously worldwide due to various reasons. This phenomenon has boosted the growth of various social network sites such as Facebook (www.facebook.com), Twitter (www.twitter.com) and many more. These platforms allow users to share ideas, events, actions, activities and feelings with their contacts or even with the public. These messages may be in a variety of formats such as text, audio, video, picture, animation and so on. Twitter is also one of the popular micro-blogging services; it allows its users to share messages, called tweets, with a maximum length of 280 characters.

Most OSN users spend a significant amount of time on such social sites on a regular basis. They share a variety of information on these sites in the form of their profile and posts but do not have any idea about their audience. All of these objects might reveal highly sensitive and personal information about users. For example, a user’s OSN profile typically includes her/his gender, sexual orientation, email, education, profession and so on. Further, users voluntarily publish a variety of information on the OSN using the different data/activity sharing services it offers. This huge amount of personal information about users on such sites attracts malicious users, who might misuse that information in order to launch various kinds of attacks [1], [2], [3], [4].

Therefore, OSNs like Twitter urgently require a mechanism that allows their users to know who can access their tweets and who cannot. Presently, Twitter allows its users to make all their tweets either public or private. If users choose the public setting (the default), their tweets become accessible even to non-Twitter users. Tweets of users with the private setting are available to followers only. Such coarse settings do not offer enough privacy to the users. Hence, methods to measure and restrict the visibility of a tweet need to be developed.

We firmly believe that an estimate of the visibility of a tweet might help in controlling potential

978-1-5386-5314-2/18/$31.00 ©2018 IEEE


Authorized licensed use limited to: University of Exeter. Downloaded on June 17,2020 at 19:50:53 UTC from IEEE Xplore. Restrictions apply.
privacy leakages caused by a tweet. Here, by the visibility (or publicity) of a tweet, we mean how many re-tweets or likes a tweet might get; in this paper, we use visibility and publicity as synonyms. Hence, in this paper, we propose a model inspired by epidemic models for estimating the publicity of a tweet. The model exploits the follower’s interest in the topic associated with a tweet, the follower’s trust in the forwarding user of the tweet, and the topology of the local graph of the source user of the tweet. Another objective of this study is to find out whether user behavioral parameters such as user interest and trust, and topological attributes like the number of followers (out-degree), have any impact on the visibility of a tweet. The prediction of the publicity of a tweet also has other applications such as viral marketing, information propagation and influence measurement.

The main contributions of this paper are as follows:

• We propose a model inspired by epidemic models [5], [6] of information diffusion to predict the publicity of a tweet using user interest, trust, number of followers and hop count.
• We use Naive Bayes, Multinomial Naive Bayes and Linear SVC models for extracting topics from tweets. During the study, we find that the Multinomial Naive Bayes classifier gives the highest classification accuracy, which confirms that for small text data like tweets Multinomial Naive Bayes outperforms Linear SVC [7].

The paper is organized into six sections. Section II gives a brief review of the related work. Section III presents our proposed model. Section IV provides details about the implementation and evaluation of the proposed model. Finally, Section V presents the results of the evaluation, and Section VI concludes our work with future directions.

II. RELATED WORK

The research on visibility prediction of user content on OSN platforms is in its early stage. Cristofaro et al. [8] presented a privacy-preserving model, inspired by Twitter, where a tweeter u encrypts all his tweets to control their visibility. Each tweet t has an associated Access Control List (ACL) defined by u which governs who can access t. However, this solution is not practicable for Twitter, as it requires modifying the present architecture of Twitter. Hogg et al. [9] proposed a stochastic model to predict user response to a post on Twitter, focusing only on predicting user behavior. Zhu et al. [10] used the visibility of a node to predict link formation among users on Twitter. They define visibility as a measure of the effort required to discover a user on an OSN.

Some works have focused on measuring the influence of a user on other users [11], [12], [13], [14], [15]. These models try to measure a user’s capacity to spread a piece of information in the network. Most of them focus on the topological properties of a node to estimate its information spreading capability. However, none of these methods measures the visibility of a node from the perspective of privacy preservation. Some machine learning based models have been proposed in [16], [17], [18] that use various user and tweet parameters to predict whether a tweet will be retweeted or not. The authors of [16] gave a model to predict the number of re-tweets in a time interval, but they focused on PageRank and user influence to predict a tweet cascade. In [19], the authors used objects attached to tweets to predict tweet visibility. Rathore et al. [20] proposed a mechanism for predicting the visibility of a user based on topological parameters of the OSN graph. However, none of these methods uses the user’s behavioral parameters to measure the visibility of a tweet.

III. PROPOSED MODEL FOR MEASURING TWEET PUBLICITY ON TWITTER

To predict the visibility of a tweet, we propose to use the user’s interest in the topic of a tweet and the trust in the source of the tweet, in addition to the user’s degree. The following observations are the basis of our proposal:

1) The followers of a user often forward content whose topic matches their interest. We believe that content matching a user’s interest has a higher probability of being forwarded by the user. Hence, we infer the interest of a user from the keywords of previously forwarded tweets. These keywords allow us to develop a measure of a user’s interest, which in turn provides the probability of forwarding a tweet further.
2) Moreover, the strength of the relationship between the source user (the one who generates the tweet) and the forwarding user (the one who forwards/shares it further) also affects the forwarding decision.
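
The paper later formalizes these two observations as Equation (4), a forwarding probability equal to the interest probability times the relationship strength, and Equation (5), a trust score equal to the fraction of a user's tweets that a follower liked or forwarded within a time window. The following is a minimal Python sketch of those two formulas, not the authors' implementation. The function names, the input validation, the value returned when no tweets exist in the window, and the use of the trust score as the relationship strength in the usage lines are all assumptions of this sketch.

```python
def trust(engaged: int, published: int) -> float:
    """Eq. (5): share of u's tweets in the time window that v liked or forwarded.

    engaged   -- m, the number of u's tweets liked or forwarded by v
    published -- |Z_u|, the number of tweets u published in the window
    """
    if published < 0 or engaged < 0 or engaged > published:
        raise ValueError("need 0 <= engaged <= published")
    if published == 0:
        # Assumption of this sketch: no tweets means no evidence of trust.
        return 0.0
    return engaged / published


def forwarding_probability(interest: float, relationship: float) -> float:
    """Eq. (4): interest probability multiplied by relationship strength."""
    for value in (interest, relationship):
        if not 0.0 <= value <= 1.0:
            raise ValueError("inputs must lie in [0, 1]")
    return interest * relationship


if __name__ == "__main__":
    # Hypothetical numbers: v engaged with 3 of u's 12 tweets and has an
    # interest probability of 0.8 in the tweet's topic.
    eta = trust(engaged=3, published=12)
    alpha = forwarding_probability(interest=0.8, relationship=eta)
    print(f"trust = {eta:.2f}, forwarding probability = {alpha:.2f}")
```

Because both factors lie in [0, 1], the product can never exceed either one, so a follower with low trust in the source forwards rarely even when the topic matches her/his interest.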

A. The System Model

An OSN is portrayed by a directed graph $G = (V, E)$, where each node in the set $V$ represents either a user or a tweet. Let $T = \{t_1, t_2, \ldots, t_k\}$ be the finite set of all the topics that Twitter users tweet about. Each user $v \in V$ has interest in at least one topic $t \in T$. $E$ is the set of directed edges, where each edge represents an association between two nodes. The edges in $E$ may be of two types: 1) an edge between two users, and 2) an edge between a user and a tweet. Edge $(u_1, u_2)$, s.t. $u_1, u_2 \in V$, represents that $u_1$ follows $u_2$. Edge $(tw_1, u_1)$ represents that tweet $tw_1$ is available to $u_1$. On the other hand, edge $(u_1, tw_1)$ indicates that the tweet is generated by user $u_1$. Figure 1 shows a small Twitter network with two types of nodes, one representing users and the other representing their shared objects (like tweets). A directed edge from a user to a tweet represents that the user has generated that tweet, and its reverse edge (i.e., a directed edge from the tweet to the user) represents that the tweet is available to the user.

Fig. 1: Twitter Network Graph

1) Diffusion of a Tweet: Figure 2 shows our proposed model, which is inspired by the epidemic models proposed in [5], [6]. We refer to those users that are likely to forward a tweet further as Susceptible nodes (we use the terms user and node interchangeably) and denote the set of such users by $S$. Users to whom a tweet is visible are likely to forward it further. Here, it is important to note that we only consider followers as susceptible users, as they can see the tweet on their home timeline. The reason behind this assumption is that the probability of a non-follower visiting the home timeline of the source user and forwarding the tweet would be very low, and hence may be ignored.

A user $u \in S$ may increase the visibility of a tweet $tw$ by forwarding it further with probability $\alpha_t^u(tw)$, where $t \in T$ is the topic that $tw$ is associated with. The probability $\alpha_t^u(tw)$ is called the forwarding probability of $u$ for $tw$. For $u$ and $tw$, this probability will be higher if $t$ matches the interest of $u$. Moreover, this probability also depends on the nature of the relationship between the tweet owner and the forwarding user. A higher value of the forwarding probability would result in higher visibility of a tweet.

Fig. 2: Proposed Model (a Susceptible node S becomes an Influenced node I with probability α and a Deactivated node D with probability 1 − α)

We assume that each tweet is associated with a topic reflected by a set of keywords. The topic indicates the subject of the idea communicated by a tweet. We further assume that each user has
interest in at least one topic $t \in T$ with non-zero probability and only forwards messages belonging to the topics of her/his interest.

As soon as a user $u \in V$ publishes a tweet $tw$ on topic $t$, it becomes available to all the followers of $u$. As a result, the followers of $u$ become susceptible to forwarding $tw$ further. Any of these followers may forward $tw$ depending on her/his interest and the influence of $u$ on her/him. For simplicity, we assume that a tweet is forwarded by any user at most once. Once a follower forwards $tw$, s/he increases the visibility of $tw$. Let $\alpha_t$ be the average forwarding probability, over all susceptible users, with respect to the topic $t \in T$ associated with $tw$. The users who have forwarded $tw$ are termed influenced users for the topic $t$ and are denoted by the set $I_t$. The influenced users make $tw$ available to their respective followers. As a result, some of their followers also become susceptible and forward the message further with probability $\alpha_t$, while the others remain uninterested in the message with probability $\beta = 1 - \alpha_t$. We refer to the latter as Deactivated nodes or Neutral nodes with respect to the topic $t$ and denote the set of such users by $D_t$.

Let $M$ be the total number of users in the OSN, and let each user have $k$ followers at one-hop distance on average. Then there will be $k + k^2$ followers up to 2-hop distance, $k + k^2 + k^3$ up to 3-hop distance, and so on. Furthermore, $k$, $k + \alpha_t k^2$ and $k + \alpha_t k^2 + \alpha_t^2 k^3$ are the numbers of susceptible users up to hop distance 1, 2 and 3 respectively. Hence, the total number of susceptible users for tweet $tw$ up to hop length $h$ from the source user $u$ is given by the following equation:

$$
\begin{aligned}
S_t^h(u) &= k + \alpha_t k^2 + \alpha_t^2 k^3 + \alpha_t^3 k^4 + \cdots + \alpha_t^{h-1} k^h \\
         &= \sum_{i=1}^{h} \alpha_t^{i-1} k^i \\
         &= k\left[1 + \alpha_t k + (\alpha_t k)^2 + \cdots + (\alpha_t k)^{h-1}\right] \\
         &= k\left[\frac{1 - (\alpha_t k)^h}{1 - \alpha_t k}\right], \quad \alpha_t k > 1
\end{aligned}
$$

$$
S_t^h(u) = k\left[\frac{(\alpha_t k)^h - 1}{\alpha_t k - 1}\right] \tag{1}
$$

Similarly, the number of users up to hop length $h$ who have shared $tw$ further (i.e., the influenced users) is given by:

$$
I_t^h(u) = \alpha_t S_t^h(u) = \sum_{i=1}^{h} \alpha_t^{i} k^i \tag{2}
$$

Let $M_h^u$ be the total number of users up to hop length $h$ from $u$. Then the number of users who remain reluctant to forward the message up to hop length $h$ is given by

$$
D_t^h(u) = M_h^u - I_t^h(u) \tag{3}
$$

Moreover, $I_t + D_t = M$, where $I_t$ and $D_t$ are the total numbers of influenced (infected) and deactivated users respectively for the topic $t$.

2) Forwarding Probability: Let $tw$ be a tweet with topic $t \in T$. The forwarding probability of $tw$ for user $u$ is proportional to the level of interest that $u$ has in $t$ and to the relationship strength of $u$ with the tweet’s forwarder/owner. Let $\tau_t^u \in [0, 1]$ be the probability of interest that $u$ has in the topic of $tw$ shared by her/his friend $v$, and let $r_{uv} \in [0, 1]$ be the relationship strength of link $(u, v)$. Then, the forwarding probability of the tweet $tw$ for user $u$ is given by

$$
\alpha_t^u = \tau_t^u \cdot r_{uv} \tag{4}
$$

Trust: The forwarding probability of a tweet $tw$ of $u$ by her/his follower $f$ is also proportional to the level of trust $f$ has in $u$. Measuring trust between two OSN users is not a trivial task. Some schemes to measure trust have been proposed in the literature [21]. Restricting ourselves to privacy only, we give a simple formula to measure the trust a user $v$ has in a user $u$ for a time window $\Delta$. Let $Z_u$ be the set of tweets of $u$, out of which $m$ tweets are liked or forwarded by user $v$. Then the trust that $v$ has in $u$ is given by the following formula:

$$
\eta_{v,u}^{\Delta} = \frac{m}{|Z_u|} \tag{5}
$$

User Interest: Users have different preferences concerning the topic associated with a tweet. They like/forward tweets that match their preferences and ignore tweets that do not match any of their interests. To find the forwarding probability of a user, we estimate the probability of interest for a
predetermined set of topics using a supervised learning technique over the set of tweets the user has shared in the past.

We trained a Naive Bayes (NB) classifier, a Multinomial Naive Bayes (MNB) classifier and a Linear SVC for detecting the topics from the set of tweets of Twitter users. We chose the NB classifier as it is one of the popular classifiers used in text processing [22]. As features, we supplied the frequency distribution of the keywords that we extracted using Algorithm 2 from the set of tweets of the target user. We used Algorithm 1 to prepare the frequency distribution of the keywords extracted from the tweets of the user. We chose a frequency threshold of 6, since it gave us the maximum classification accuracy during the experiment.

Algorithm 1: Generates the frequency distribution of keywords from a given multi-set of keywords.
  Data: K: a multi-set of keywords
  Data: µ: frequency threshold
  Result: F: a set of tuples (k, f), where k is a keyword and f is its frequency of occurrence.
  F = {}                          ▷ an empty dictionary
  for each k ∈ K do               ▷ count occurrences of k
      if k ∈ F then
          F[k] = F[k] + 1
      else
          F[k] = 1
  for each k ∈ F do               ▷ drop keywords rarer than the threshold
      if F[k] < µ then
          remove(F[k])
  return F

Feature Extraction: To find the user’s interest in the chosen topics, we exploited keywords extracted from the set of tweets of a user u. We used Algorithm 2, which extracts keywords from a tweet tw by employing some of the Natural Language Processing functions available in the NLTK library [23]. This algorithm takes a tweet tw and a set of stop words as arguments and returns a list of keywords that occur in tw. We used a list of stop words fitted to our needs to remove any stop word occurring in tw.

Algorithm 2: Extracts keywords from a given tweet.
  Data: tw: a tweet
  Data: st_words: a list of stop words
  Result: keywords: a finite multi-set of keywords
  text = remove_emoji(tw)         ▷ removes emojis and similar symbols
  text = clean_text(text)         ▷ removes all URLs, images etc.
  noun_phrases = np_extractor(text)
  words = []
  for each np in noun_phrases do
      words.extend(split(np))
  keywords = []
  for each w in words do
      if w ∉ st_words then        ▷ skip stop words
          w = spell_correct(w)
          w = lemmatize(w)
          keywords.append(w)
  return keywords

IV. IMPLEMENTATION & EVALUATION

To evaluate the proposed model, we sampled a subgraph of Twitter using BFS sampling. We chose BFS sampling because it is one of the popular methods to get a plausible sample graph of an OSN [24].

A. DataSet

The dataset we used for our experiment consists of 100176 Twitter users with their follower information and recent tweets. Figure 3 shows the out-degree distribution of our sampled Twitter graph, which indicates that around 99,000 of the users have fewer than 100 followers (out-degree). Further, we took a Twitter account as the source for the BFS sampling algorithm. As most of the popular accounts on Twitter have a large number
of followers, retrieving all of them might result in a graph with a very large number of nodes. Therefore, to get a sample graph of reasonable size, we restricted our sampling algorithm to retrieve at most 500 followers per user.

Fig. 3: Out-degree Distribution in the graph.

Moreover, we retrieved 3200 recent tweets of these Twitter users with favorite and retweet counts (at present, Twitter allows one to retrieve at most 3200 recent tweets of a user). To extract the above-mentioned information from Twitter, we wrote a script using Python 2.7.12 [25] employing the tweepy API [26] to extract the tweets of users along with their friends and followers. The tweepy library is a Python library to communicate with Twitter’s REST API [27]. We also retrieved the short description of each user from her/his profile, which also gives information about the user’s interest. The dataset was stored in a MySQL Server database, version 5.7.18-0ubuntu0.16.10.1 (Ubuntu). Further, the tweets of all the users were stored in separate csv files. We performed all of these operations, including the experiment, on a machine with an Intel i3 processor, 4GB RAM and the 64-bit Ubuntu 16.10 operating system.

B. Training & testing the classifier

In order to get the training and testing data, we took five topics of interest for our experiment: Politics & Governance, Film & Music, Sports, Tourism and Science & Technology. We identified some Twitter accounts that were dedicated to exactly one of the above topics, for example @nature and @ScienceChannel for Science & Technology, and @politico, @BBCPolitics and @mygovindia for Politics & Governance. Each of these accounts had the majority of its tweets belonging to only one of the chosen topics. We identified 100 such user accounts in total (20 per topic) and manually annotated them with the respective topics. We extracted the frequency distribution of all the keywords that appeared in the tweet sets of these accounts. This distribution was fed, with the respective topic label, to the chosen classifier model instances for training. We used 70% of the data for training and 30% for testing. Our training and testing sets were disjoint. We also measured Precision, Recall and F-measure for the different topic categories for all the classifier models. The results are shown in Table I.

TABLE I: Classification Metrics

Metrics              Topic / Statistic         NB        MNB       Linear SVC
-----------------------------------------------------------------------------
Accuracy Statistics  Average Accuracy (%)      57.256    95.692    76.077
                     Variance in Accuracy      220.170   17.059    188.020
                     Std Dev in Accuracy       14.838    4.130     13.712
Precision            Film & Music              0.428     0.470     0.417
                     Politics & Governance     0.820     0.864     0.838
                     Science & Technology      0.806     0.938     0.852
                     Sports                    0.843     0.925     0.891
                     Tourism                   0.696     0.782     0.778
Recall               Film & Music              0.891     0.944     0.944
                     Politics & Governance     0.684     0.981     0.988
                     Science & Technology      0.426     0.981     0.995
                     Sports                    0.637     0.963     0.963
                     Tourism                   0.488     0.980     0.980
F-Score              Film & Music              0.512     0.560     0.524
                     Politics & Governance     0.691     0.899     0.886
                     Science & Technology      0.528     0.948     0.907
                     Sports                    0.657     0.917     0.899
                     Tourism                   0.483     0.840     0.838

To estimate the skill of our trained models, we randomly divided the dataset into a training set and a test set using the ratio of 70% to 30%, performed the training and test experiments 50 times, and recorded the average values of the various parameters as shown in Table I. From Table I, we can observe that the MNB classifier achieves the highest average accuracy, approximately 95.7%, with the minimum variance in accuracy in comparison to the other two models.

V. RESULTS AND DISCUSSION

To evaluate our model, we first retrieved the interest of all the Twitter users using the MNB classifier we trained earlier. The output of the classifier gives the probability distribution of the user over the chosen topics. During the evaluation of our model,
we made predictions for different users for hop count values ranging from 2 to 5 and recorded the results. We divided users into two categories: non-celebrity and celebrity. We refer to users having a follower count of less than 300 as non-celebrity users and to the rest as celebrity users. We made predictions for 100 randomly selected tweets belonging to randomly selected users of both categories. After that, we compared the predicted values with the actual number of likes (favorite count) plus the re-tweet count each tweet has got. Here, we refer to this value as the publicity/visibility value. We then calculated the prediction error (in %) by subtracting the predicted value from the actual publicity value and averaged all of the results. Since our model also made some over-predictions, we calculated the root mean square error for these predictions and plotted it against the hop count. Figures 4 and 5 show these results.

Fig. 4: Error rate vs Hop-Count in tweet publicity prediction for Non-Celebrity users

Fig. 5: Error rate vs Hop-Count in tweet publicity prediction for Celebrity users

From Figure 4, we can observe that our proposed model gives the maximum accuracy of approximately 89% if we take a hop count of 4 for a non-celebrity user. For a hop count of 5, however, it starts over-predicting and the error rate becomes more than 80%. On the other hand, Figure 5 shows that for celebrity users our model achieves a maximum accuracy of only 38%. We believe that this is because our dataset has at most 500 followers of such users.

These results confirm that the follower’s interest in the topic of a tweet, trust, and local topological parameters like out-degree impact the visibility of a tweet. Therefore, if we can hide a tweet from the followers having a higher interest in the topic of the tweet, high trust in the forwarding user, or both, then the visibility of the tweet might be controlled. Similarly, we can keep the visibility of a tweet low by hiding it from a follower with a high out-degree or a higher interest in the topic of the tweet.

VI. CONCLUSION & FUTURE WORK

In this paper, we proposed an epidemic-based model to predict the publicity of a tweet using user interest and the trust between the user and their followers. The results of the evaluation show that our model achieves a fair prediction with hop length 4 for non-celebrity users. Our results also show that the MNB classifier works well with small texts like tweets. In the future, we will explore more sophisticated features, like bigrams and trigrams in a tweet, that might improve the accuracy of our predictions further. Moreover, we shall work to exploit the trust and forwarding probability of followers to control the visibility of tweets.

REFERENCES

[1] E. Zheleva and L. Getoor, Privacy in Social Networks: A Survey. Boston, MA: Springer US, 2011, pp. 277–306.
[2] C. Zhang et al., “Privacy and security for online social networks: Challenges and opportunities,” Netwrk. Mag. of Global Internetwkg., vol. 24, no. 4, pp. 13–18, Jul. 2010.
[3] C. D. Marsan, “15 worst internet privacy scandals of all time,” Jan 2012. [Online]. Available: http://www.networkworld.com/article/2185187/security/15-worst-internet-privacy-scandals-of-all-time.html
[4] L. C. Williams, “The 9 biggest privacy and security breaches that rocked 2013,” Dec 2013. [Online]. Available: https://thinkprogress.org/the-9-biggest-privacy-and-security-breaches-that-rocked-2013-416a61e194450
[5] D. Gruhl et al., “Information diffusion through blogspace,” in Proceedings of the 13th International Conference on World Wide Web, ser. WWW ’04. ACM, 2004, pp. 491–501.
[6] R. Zafarani et al., Social Media Mining: An Introduction. New York, NY, USA: Cambridge University Press, 2014.
[7] H. Mao et al., “Loose tweets: An analysis of privacy leaks on twitter,” in Proceedings of the 10th Annual ACM Workshop on Privacy in the Electronic Society, ser. WPES ’11. ACM, 2011, pp. 1–12.
[8] E. D. Cristofaro et al., “Hummingbird: Privacy at the time of twitter,” in 2012 IEEE Symposium on Security and Privacy, May 2012, pp. 285–299.
[9] T. Hogg et al., “Stochastic models predict user behavior in social media,” CoRR, vol. abs/1308.2705, 2013.
[10] L. Zhu and K. Lerman, “A visibility-based model for link prediction in social media,” in Proceedings of the ASE/IEEE Conference on Social Computing, 2014.
[11] D. Kempe et al., “Maximizing the spread of influence through a social network,” in Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ser. KDD ’03. ACM, 2003, pp. 137–146.
[12] N. Du et al., “Scalable influence estimation in continuous-time diffusion networks,” in Proceedings of the 26th International Conference on Neural Information Processing Systems, ser. NIPS’13, 2013, pp. 3147–3155.
[13] A. Goyal et al., “Learning influence probabilities in social networks,” in Proceedings of the Third ACM International Conference on Web Search and Data Mining, ser. WSDM ’10. ACM, 2010, pp. 241–250.
[14] J. Yang and J. Leskovec, “Modeling information diffusion in implicit networks,” in Proceedings of the 2010 IEEE International Conference on Data Mining, ser. ICDM ’10. IEEE Computer Society, 2010, pp. 599–608.
[15] A. Guille et al., “Information diffusion in online social networks: A survey,” SIGMOD Rec., vol. 42, no. 2, pp. 17–28, Jul. 2013.
[16] A. Kupavskii et al., “Prediction of retweet cascade size over time,” in Proceedings of the 21st ACM International Conference on Information and Knowledge Management, ser. CIKM ’12. ACM, 2012, pp. 2335–2338.
[17] M. M. Anwar et al., “Predicting the spread of a new tweet in twitter,” in Databases Theory and Applications, M. A. Sharaf, M. A. Cheema, and J. Qi, Eds. Cham: Springer International Publishing, 2015, pp. 104–116.
[18] M. Jenders et al., “Analyzing and predicting viral tweets,” in Proceedings of the 22nd International Conference on World Wide Web, ser. WWW ’13 Companion. New York, NY, USA: ACM, 2013, pp. 657–664.
[19] E. F. Can et al., “Predicting retweet count using visual cues,” in Proceedings of the 22nd ACM International Conference on Information & Knowledge Management, ser. CIKM ’13. ACM, 2013, pp. 1481–1484.
[20] N. C. Rathore et al., Predicting User Visibility in Online Social Networks Using Local Connectivity Properties. Springer International Publishing, 2015, pp. 419–430.
[21] W. Sherchan et al., “A survey of trust in social networks,” ACM Comput. Surv., vol. 45, no. 4, pp. 47:1–47:33, Aug. 2013.
[22] H. Mao et al., “Loose tweets: An analysis of privacy leaks on twitter,” in Proceedings of the 10th Annual ACM Workshop on Privacy in the Electronic Society, ser. WPES ’11. ACM, 2011, pp. 1–12.
[23] “Natural language processing toolkit,” June 2018. [Online]. Available: http://www.nltk.org/
[24] M. Kurant et al., “Towards unbiased bfs sampling,” IEEE Journal on Selected Areas in Communications, vol. 29, no. 9, pp. 1799–1809, October 2011.
[25] “Python,” July 2018. [Online]. Available: https://www.python.org/
[26] “Tweepy,” July 2017. [Online]. Available: http://tweepy.readthedocs.io/en/v3.5.0/
[27] “Twitter developer documentation,” July 2017. [Online]. Available: https://dev.twitter.com/rest/public
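
As a supplementary illustration of Section III-A, the closed form in Equation (1) can be checked numerically against the explicit sum it is derived from, and Equations (2) and (3) follow from it directly. The sketch below is only a rough Python rendering under stated assumptions: the function names are hypothetical, the number of reachable users is taken as k + k^2 + ... + k^h as described in the text, and the fallback for α_t k = 1 (where the closed form is undefined) is an addition of this sketch rather than part of the paper.

```python
def susceptible_sum(k: float, alpha: float, h: int) -> float:
    """Explicit sum behind Eq. (1): sum over i = 1..h of alpha^(i-1) * k^i."""
    return sum(alpha ** (i - 1) * k ** i for i in range(1, h + 1))


def susceptible_closed(k: float, alpha: float, h: int) -> float:
    """Closed form of Eq. (1): k * ((alpha*k)^h - 1) / (alpha*k - 1)."""
    ratio = alpha * k
    if ratio == 1:
        # Not covered by Eq. (1): every term of the series equals k.
        return k * h
    return k * (ratio ** h - 1) / (ratio - 1)


def influenced(k: float, alpha: float, h: int) -> float:
    """Eq. (2): users up to hop h who forwarded the tweet."""
    return alpha * susceptible_closed(k, alpha, h)


def reachable(k: float, h: int) -> float:
    """Users up to hop h when every user has k followers: k + k^2 + ... + k^h."""
    return sum(k ** i for i in range(1, h + 1))


def deactivated(k: float, alpha: float, h: int) -> float:
    """Eq. (3): reachable users up to hop h who did not forward the tweet."""
    return reachable(k, h) - influenced(k, alpha, h)


if __name__ == "__main__":
    # Hypothetical values: 10 followers per user, average forwarding
    # probability 0.5, diffusion followed up to 3 hops.
    k, alpha, h = 10, 0.5, 3
    print(susceptible_sum(k, alpha, h))     # 10 + 50 + 250
    print(susceptible_closed(k, alpha, h))  # same value from the closed form
    print(influenced(k, alpha, h), deactivated(k, alpha, h))
```

With these hypothetical inputs the sum and the closed form agree, which is a quick sanity check on the geometric-series step of the derivation.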

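
As a further supplementary illustration, Algorithm 1 (the keyword frequency distribution with a frequency threshold µ) maps naturally onto Python's standard library. This is a minimal sketch rather than the authors' script: the function name and the sample keywords are hypothetical, and only the threshold value of 6 comes from the paper.

```python
from collections import Counter
from typing import Dict, Iterable


def keyword_frequencies(keywords: Iterable[str], threshold: int) -> Dict[str, int]:
    """Count keyword occurrences and keep those seen at least `threshold` times.

    Mirrors Algorithm 1: build the frequency dictionary F, then remove
    every keyword whose frequency is below the threshold.
    """
    counts = Counter(keywords)
    return {kw: n for kw, n in counts.items() if n >= threshold}


if __name__ == "__main__":
    # Hypothetical multi-set of keywords extracted from one user's tweets.
    sample = ["cricket"] * 7 + ["match"] * 6 + ["rain"] * 2
    print(keyword_frequencies(sample, threshold=6))
```

Building a new dictionary instead of deleting entries while iterating avoids the runtime error Python raises when a dictionary changes size during iteration.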
