Authorized licensed use limited to: NATIONAL INSTITUTE OF TECHNOLOGY WARANGAL. Downloaded on June 26,2023 at 04:47:21 UTC from IEEE Xplore. Restrictions apply.
generating authors follow more people to attract other users to follow them. So, for rumour-generating authors, NF2 is much larger than for normal authors. It is calculated from the author's account as

NF2_a = |FE_a| (1)

where FE_a denotes the set of followees (accounts followed) of author a.

4. Average post speed per day: this refers to the number of posts an author publishes on OSM per day. As the main intention of rumour-generating authors is to spread their information, they post more posts per day than normal users. Hence the average number of posts per day is much higher for rumour-generating authors, and it is calculated from the posts they publish on OSM in a day.

5. Number of possible online social media sources: this refers to the total number of users who post a particular post, or similar posts, instead of forwarding it. Usually one person or a small number of persons initiates a rumour post on OSM, while authentic or normal users originate many posts that are unrelated to each other. Hence the source of rumour initiation is a single user, or a set no larger than a group, if the rumour was initiated by a colluding group of users.

6. User role: user role measures the ratio of followers to followees for a user. A user with a large follower-to-followee ratio is regarded as an author, while a user with a small follower-to-followee ratio is regarded as a reader or receiver. The numbers of followers and followees can be derived from the author's account.

7. Profile picture: this is also considered a user-behaviour-related feature. A user with a large number of profile-picture changes can be regarded as a rumour-generating author, as normal, calm authors do not change their profile pictures rigorously. A rumour generator tries to change the profile picture for every new post, choosing a picture relevant to the post. If the author uses such pictures, readers are attracted and spread the rumour widely.

b. Features based on reader's behaviour

Under this subsection, we explore the features related to the behaviour of readers, who can also be called users. We extract three such features, namely the number of comments and reposts, the number of questioned comments, and the number of corrections. The details of these three features follow.

1. Number of comments and reposts: almost all OSM sites allow users to comment on and repost a post they have seen or read on their timeline. Both reposts and comments reflect the behaviour of a reader and can contribute towards rumour detection. The number of comments is the total number of people who expressed opinions on a post, and the number of reposts is the number of people who reposted it. Generally these factors indicate the popularity of a post: a larger value denotes that the post is popular on OSM. Since rumour posts also look tempting and popular and describe events of interest, most readers either comment or repost. Hence the total number of comments and reposts for a rumour post is much larger than for a normal post.

2. Questioned comments ratio (QCR): posts related to rumours originate from unreliable sources, so they are prone to be challenged during their determination process. Mendoza et al. [10] discovered that false information is questioned much more than information that turns out to be true. Almost all OSM platforms provide a commenting facility so that users can freely express their opinions or feelings on any post. According to Mendoza et al., a post with a larger number of questioned comments has a higher probability of being a rumour. Hence we use the QCR to capture the questioning behaviour of readers. Mathematically, the QCR is calculated as

QCR_p = |QC_p| / |C_p| (2)

where QC_p is the set of questioned comments and C_p the set of all comments on post p. A larger QCR value indicates that post p is a rumour, and vice versa.

3. Number of corrections: on OSM, many posts attract correction attempts once they are found to be disinformation or misinformation. According to Shirai et al. [21], 14.7% of individuals or organizations would follow a rumour-correction strategy on finding that a post is a rumour. Rumour posts subjected to correction are far more numerous than normal posts. Hence a post with a larger number of corrections is regarded as a rumour.

After representing each post with the set of ten features, we apply machine learning algorithms to train a classifier for the detection of rumours. Here we used the SVM and K-NN algorithms for classification. As the task is binary, these classifiers can separate rumours from non-rumours effectively.

IV. EXPERIMENTAL RESULTS

In this section, we explain the details of the experimental results. We first describe the dataset and then the results of our experiments.

1. Experimental Setup

For experimental validation, we used a standard dataset called the Zubiaga dataset [22]. It is a standard, well-known dataset in this field that has been used in various studies, and it covers a wide range of topics including politics, health and disasters. Zubiaga et al. employed the Twitter Streaming API to acquire tweets in two different settings: 1) particular rumours identified a priori, and 2) breaking news likely to spark multiple rumours. They collected tweets from five breaking-news events, namely Charlie Hebdo, Ferguson, Germanwings Crash, Ottawa Shooting and Sydney Siege. After collecting a large volume of tweets, they sampled only the tweets that provoked a large number of retweets, then manually annotated each tweet as either rumour or non-rumour. In total, they collected 6425 tweets, of which 4023 are rumour-related and the remaining 2402 are non-rumour-related.
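The classification step — ten behavioural features per post fed to K-NN or SVM — can be illustrated with a dependency-free nearest-neighbour sketch. In practice a library implementation of K-NN or SVM would be used; the toy two-dimensional vectors here are invented purely for illustration.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, x, k=3):
    """Classify x by majority vote among its k nearest training points."""
    dists = sorted((math.dist(x, xi), yi) for xi, yi in zip(train_X, train_y))
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

# Toy 2-dimensional behaviour vectors, e.g. [QCR, correction count];
# label 1 = rumour, 0 = non-rumour. Values are invented for illustration.
train_X = [[0.9, 5], [0.8, 4], [0.7, 6], [0.1, 0], [0.2, 1], [0.0, 0]]
train_y = [1, 1, 1, 0, 0, 0]

print(knn_predict(train_X, train_y, [0.85, 5]))  # 1: near the rumour cluster
print(knn_predict(train_X, train_y, [0.05, 0]))  # 0: near the normal cluster
```

The same interface extends directly to the full ten-feature vectors; an SVM would instead learn a separating hyperplane between the two labelled clusters.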
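The evaluation below relies on 5-fold cross-validation. A minimal index-splitting sketch (not the authors' code) of that protocol:

```python
def kfold_indices(n, k=5):
    """Yield (train, test) index lists for k-fold cross-validation."""
    folds = [list(range(i, n, k)) for i in range(k)]  # k nearly equal folds
    for i in range(k):
        test = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, test

# With 10 posts and k = 5: five experiments, each training on 8 posts
# and testing on the remaining 2.
splits = list(kfold_indices(10, 5))
print(len(splits), len(splits[0][0]), len(splits[0][1]))  # 5 8 2
```

Each post appears in exactly one test fold, so averaging the per-fold metrics uses every labelled example exactly once for testing.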
2. Results and Discussion
After the preparation of the dataset, the next step is feature extraction, in which every tweet is represented with the full set of ten features. Then the classifiers are trained one by one. To obtain reliable results on the dataset, we employed k-fold cross-validation: the entire dataset is partitioned into k groups of equal size, and k separate learning experiments are performed, using k-1 groups for training and the one remaining group for testing. Finally, the average over all five folds is taken as the final performance. Here k is taken as 5, to explore the sensitivity of the proposed method to the size of the dataset. For performance assessment, we used three metrics, namely precision, recall and F1-score. Recall is the fraction of relevant documents that are retrieved, and precision is the fraction of retrieved documents that are relevant. F1-score is the harmonic mean of recall and precision. These metrics are measured from the confusion matrix, as shown in Table 1.

Table 1. Sample confusion matrix

                          Predicted
                  Rumour                Non-Rumour
Original
  Rumour          True Positive (TP)    False Negative (FN)
  Non-Rumour      False Positive (FP)   True Negative (TN)

During testing, the posts are tested one by one and the performance metrics are measured from the confusion matrix depicted above. Based on the TPs, FPs, FNs and TNs, the metrics are defined as follows:

Precision = TP / (TP + FP) (3)

Recall = TP / (TP + FN) (4)

F1-score = (2 x Precision x Recall) / (Precision + Recall) (5)

Here a TP is counted when a rumour post is detected as a rumour, an FN when a rumour is detected as a non-rumour, an FP when a non-rumour is detected as a rumour, and a TN when a non-rumour is detected as a non-rumour. At every validation, the performance is measured; the results reported in Table 2 belong to the simulation study with the full set of features and the two machine learning algorithms, K-NN and SVM.
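Eqs. (3)–(5) can be computed directly from the four confusion-matrix counts. The counts below are illustrative numbers only, not the paper's results.

```python
def metrics(tp, fp, fn, tn):
    """Precision, recall and F1-score per Eqs. (3)-(5)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Confusion-matrix counts for one hypothetical fold (illustrative
# numbers only, not the paper's results).
p, r, f1 = metrics(tp=50, fp=30, fn=25, tn=20)
print(round(p, 4), round(r, 4), round(f1, 4))  # 0.625 0.6667 0.6452
```

Note that the harmonic mean in Eq. (5) is pulled toward the smaller of precision and recall, so a classifier must do well on both to score a high F1.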
Fig.1 Precision analysis at different features and classifiers

For the simulation studies, we considered three feature sets (the full set, reader features only and author features only) and applied the two machine learning algorithms to each; hence there are six simulation studies in total. In every simulation study, we measured the performance through precision, recall and F1-score.

Fig. 1 shows the precision obtained at the different feature sets and classifiers. From Fig. 1, we can see that the maximum precision is obtained with the full set of features, while the minimum precision is observed with the reader features. Since the reader's behaviour has less significance in the representation of a rumour post, it shows the lowest precision. Compared to the reader's behaviour, the author's behaviour is more important and thus gains greater precision. On the other side, the SVM has
gained better precision with the full set of features and the author features, while K-NN gained better performance with the reader features. The average precision with the full set of features is observed as 68.1050%, while for the reader and author features it is observed as 48.3887% and 57.9928% respectively.

Fig. 2 Recall analysis at different features and classifiers

In terms of recall (Fig. 2), the full set of features again gives the best performance, achieving a recall rate of 59.7720%, which is approximately a 10% improvement over the author features. Next, Fig. 3 describes the performance of the proposed approach in terms of F1-score. As the F1-score is the harmonic mean of precision and recall, it summarizes the detection performance. From Fig. 3, the average F1-score with the full set of features is observed as 63.5389%, while for the reader and author features it is observed as 38.6673% and 53.7621% respectively.

Fig. 3 F1-score analysis at different features and classifiers

V. CONCLUSION

Recently, Online Social Media has become an easy and flexible source for information sharing. Moreover, it provides users free access to share their opinions and feelings regarding several real-world events. Though it has several benefits to society, it is also misused through the posting of false news or rumours, which causes serious damage to society. Hence, rumour detection is required to identify whether a post is a rumour or a non-rumour. Towards this goal, this paper proposed a composite-feature-set-based rumour detection strategy assisted by machine learning algorithms. Mainly, this approach considered features related to user behaviour and analyzed both author and reader behaviour. With the help of these behaviours, each post is represented with ten features and processed through machine learning algorithms for classification. Two machine learning algorithms, namely SVM and K-NN, are used to train and test on the data. For experimental validation, we used the standard Zubiaga dataset, and the performance is measured through precision, recall and F1-score. The average precision for the K-NN algorithm with the full feature
[4] F. Chierichetti, S. Lattanzi, and A. Panconesi, "Rumor spreading in social networks," in Automata, Languages and Programming. New York, NY, USA: Springer, 2009, pp. 375–386.
[5] L. Hang, "Overview of statistical learning methods," in Statistical Learning Methods. Beijing, China: Tsinghua University Press, 2012, pp. 7–24.
[6] M. A. Hall, "Correlation-based feature selection for machine learning," Ph.D. dissertation, Dept. Comput. Sci., The University of Waikato, Hamilton, New Zealand, 1999.
[7] S. Sun, H. Liu, J. He, and X. Du, "Detecting event rumors on Sina Weibo automatically," in Web Technologies and Applications. New York, NY, USA: Springer, 2013, pp. 120–131.
[8] G. Cai, H. Wu, and R. Lv, "Rumors detection in Chinese via crowd responses," in Proc. IEEE/ACM Int. Conf. Adv. Social Netw. Anal. Min. (ASONAM'14), 2014, pp. 912–917.
[9] T. Takahashi and N. Igata, "Rumor detection on Twitter," in Proc. Joint 6th Int. Conf. Soft Comput. Intell. Syst. (SCIS); 13th Int. Symp. Adv. Intell. Syst. (ISIS), 2012, pp. 452–457.
[10] M. Mendoza, B. Poblete, and C. Castillo, "Twitter under crisis: Can we trust what we RT?," in Proc. 1st Workshop Social Media Anal. (SOMA'10), 2010, pp. 71–79.
[11] Md. Rashed Ibn Nawab, Kazi Md. Shahiduzzaman, Titya Eng, and Md Noor Jamal, "Rumor detection in social media with user information protection," European Journal of Electrical Engineering and Computer Science, vol. 4, no. 4, July 2020.
[12] Junjie Cen and Yongbo Li, "A rumor detection method from social network based on deep learning in big data environment," Computational Intelligence and Neuroscience, vol. 2022, Article ID 1354233, 8 pages, 2022.
[13] Sushila Shelke and Vahida Attar, "Rumor detection in social network based on user, content and lexical features," Multimedia Tools and Applications, vol. 81, pp. 17347–17368, 2022.
[14] Zhirui Luo, Qingqing Li, and Jun Zheng, "Deep feature fusion for rumor detection on Twitter," IEEE Access, vol. 9, pp. 126065–126074, 2021.
[15] J. Devlin, M. W. Chang, K. Lee, and K. Toutanova, "BERT: Pre-training of deep bidirectional transformers for language understanding," in Proc. Conf. North Amer. Chapter Assoc. Comput. Linguistics, Hum. Lang. Technol., 2019, pp. 4171–4186.
[16] Y. Liu, M. Ott, N. Goyal, J. Du, M. Joshi, D. Chen, O. Levy, M. Lewis, L. Zettlemoyer, and V. Stoyanov, "RoBERTa: A robustly optimized BERT pretraining approach," 2019, arXiv:1907.11692.
[17] A. Kumar, M. P. S. Bhatia, and S. R. Sangwan, "Rumour detection using deep learning and filter-wrapper feature selection in benchmark Twitter dataset," Multimed. Tools Appl., August 2021.
[18] Aoshuang Ye, Lina Wang, Run Wang, Wenqi Wang, Jianpeng Ke, and Danlei Wang, "An end-to-end rumour detection model based on feature aggregation," Complexity, vol. 2021, Article ID 6659430, 16 pages, 2021.
[19] M. Alizadeh, J. N. Shapiro, C. Buntain, and J. A. Tucker, "Content-based features predict social media influence operations," Science Advances, vol. 6, Article ID eabb5824, 2020.
[20] G. Liang, W. He, C. Xu, L. Chen, and J. Zeng, "Rumor identification in micro-blogging systems based on users' behavior," IEEE Transactions on Computational Social Systems, vol. 2, pp. 99–108, 2015.
[21] T. Shirai et al., "Estimation of false rumor diffusion model and estimation of prevention model of false rumor diffusion on Twitter (in Japanese)," in 26th Annu. Conf. Jpn. Soc. Artif. Intell., 2012, vol. 26, pp. 1–4.
[22] A. Zubiaga, M. Liakata, and R. Procter, "Learning reporting dynamics during breaking news for rumour detection in social media," Oct. 2016, arXiv:1610.07363. [Online]. Available: https://arxiv.org/abs/1610.07363