Paper Jan1
Abstract: The speed at which rumours spread on social media platforms has a
drastic impact on how individuals think, form opinions and react to those
rumours. To refute rumours, one must detect them before they cause distress and
harm. This paper presents methods for rumour detection, particularly on social
media, as a means of limiting such harm. It proposes a machine learning approach
combined with sentiment analysis for rumour detection. Classification procedures
such as K-nearest neighbor, Decision tree and Random forest are implemented and
achieve high accuracy.
I. Introduction
A rumour is an unverified story, news item or statement that comes from an unreliable
source and has no concrete evidence behind it. With advances in technology and social
media, there is little control over how quickly such rumours spread. Information can turn
into a rumour if its source is questionable or if it is altered as it is passed on. It is
therefore imperative to investigate the authenticity of the information.
Rumour detection is the procedure of checking the genuineness of a piece of information
or a statement. It faces several challenges before any conclusion can be drawn, such as
understanding the heterogeneous mix of data, identifying the source of the rumour and
identifying the latest rumours from time-period data.
This paper proposes a sentiment analysis technique for detecting positive or negative
sentiment in text. In NLP, sentiment analysis deciphers the sentiment that the writer of a
social media post expresses in the text. Sentiment analysis models focus on feeling and
thus help to find the sentiment behind a rumour (positive, negative or neutral). The paper
also applies classification techniques such as K-nearest neighbor, Decision tree and
Random forest, which not only help in categorizing the status of information but also
allow a comparison in terms of accuracy.
II. Background
Bondielli and Marcelloni [29] explain that detecting rumours is essential given the
volume and velocity of user-generated information on social media. Social media allows
information to propagate regardless of whether its source has been verified or whether it
is true. Forwarding and sharing of content, combined with the lack of validation, fuels
rumours because it permits exchange and broadcasting at an unmatched scale. This can be
harmful when users are exposed to damaging or undesirable content. Pathak et al. [2] give
an overview of rumour detection, datasets and application areas, and perform a
comparative analysis of state-of-the-art rumour detection approaches. In addition, most
social media platforms allow users to form groups based on their shared interests;
however, such virtual alignments may lead to the creation of echo chambers in which
participants’ views are amplified and reinforced. Such echo chambers also make
unconfirmed posts appear more trustworthy: when a group member receives a piece of
information, they might believe it because it comes from their “own” people. Kumar and
Sangwan [4] propose that rumour detection can be extended to a multi-level, fine-grained
classification in which rumours are identified as misinformation, disinformation, hoaxes
and so on. Novel and hybrid machine learning techniques such as fuzzy and neuro-fuzzy
methods can also be used for detection.
A. Sentiment Analysis
The information on social media is used to detect the sentiment hidden in the text.
Sentiment analysis is an NLP method that deciphers the sentiment a person expresses in
written text; that is, it decodes and interprets how positive or negative the text is. The
dictionary-based approach applied here uses three dictionaries containing terms that
convey positive, negative or harmful sentiment. The positive dictionary holds terms that
convey positive sentiment, and the negative dictionary holds terms that convey negative
sentiment. For example, the terms “good” and “excellent” carry positivity and find a hit
in the positive dictionary. Similarly, the word “bad” finds a place in the negative
dictionary. If a word does not find a hit in either of them, it falls into the normal
category. The critical one, the harmful dictionary, contains words that refer to physical
or mental harm to an individual, for example “beat” and “stab”. These words are clearly
negative, but they also carry a “harmful” connotation. Language analysis indicates around
a hundred or more words in the harmful dictionary.
The sentiment analysis technique classifies each term into one of the three categories.
Prior to analysis, one needs to create a Sentiment Dictionary of Lists with Parameters and
Label in sorted form. Applying an Ordered Sequential Model Search Method to the
dictionaries returns the prediction: the search returns a list of [sentiment, similarity
score] pairs for a word. The word can then be categorized as Harmful, Negative or
Positive, taking its TF-IDF into account.
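To make the TF-IDF weighting mentioned above concrete, the following is a minimal sketch in plain Python. The function name `tf_idf` and the three toy posts are assumptions made for illustration and are not taken from the paper's implementation. The sketch uses the common tf × log(N / df) variant; the weighting actually used by the authors may differ.

```python
import math
from collections import Counter

def tf_idf(documents):
    """Return one {term: weight} dict per document using tf * log(N / df)."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: the number of documents in which each term occurs.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights

# Hypothetical toy posts, not drawn from the PHEME dataset.
posts = [
    "plane crash reported near nice",
    "crash rumour spreads fast",
    "officials confirm plane crash",
]
weights = tf_idf(posts)
```

In this toy corpus the term "crash" appears in every post, so its weight is zero, while a term that appears in only one post, such as "rumour", receives a positive weight. This is the property that lets TF-IDF highlight the distinctive words of a message.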
The predictions for some terms and their scores, obtained with the sequential search
method, are explained here.
Example: the prediction of the sentiment for the term “bad” is given below. The
table shows the sentiment of the text with its similarity score for three different words.
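The dictionary lookup described in this section can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the word lists are tiny stand-ins built from the examples given in the text, the search order (harmful, then negative, then positive) is assumed, and the similarity score is computed with Python's `difflib.SequenceMatcher`, which the paper does not specify. The names `DICTIONARIES`, `score_word` and `predict` are hypothetical.

```python
from difflib import SequenceMatcher

# Tiny stand-ins for the three dictionaries; the real lists are far longer.
# The harmful dictionary is searched first because its words are also negative.
DICTIONARIES = [
    ("harmful", ["beat", "bombard", "raid", "stab"]),
    ("negative", ["bad"]),
    ("positive", ["good", "excellent"]),
]

def score_word(word):
    """Scan each dictionary in order and return [sentiment, best similarity] pairs."""
    word = word.lower()
    pairs = []
    for sentiment, terms in DICTIONARIES:
        best = max(SequenceMatcher(None, word, term).ratio() for term in terms)
        pairs.append([sentiment, round(best, 3)])
    return pairs

def predict(word, threshold=1.0):
    """Return the first sentiment whose score reaches the threshold, else 'normal'."""
    for sentiment, score in score_word(word):
        if score >= threshold:
            return sentiment
    return "normal"
```

With the default threshold of 1.0 only exact matches count as hits, so `predict("bad")` returns "negative", `predict("stab")` returns "harmful", and a word found in none of the lists falls into the normal category. Lowering the threshold would let near matches, such as misspellings, count as hits.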
[Figure: chart with the segments “Negative words”, “Positive words” and “Harmful negative words”; the extracted value labels are 29.2 and 69.6.]
Fig. 1. Sentiment Word Frequency Ratio
Positive Dictionary (Opinion Lexicon: Positive): The file contains a list of 2041 words
that convey positive sentiment and are therefore categorized as POSITIVE opinion words
[9], [10].
Negative Dictionary (Opinion Lexicon: Negative): The file contains a list of 4818 words
that convey negative sentiment and are therefore categorized as NEGATIVE opinion words
[9], [10].
Harmful Dictionary: A new dictionary, known as the Harmful dictionary, was prepared. It
consists of words that denote harm or an adverse effect, and it reflects the
psychological condition of an individual through words. The Harmful dictionary currently
contains more than 400 words. A word may be negative, but if it is also harmful it causes
greater damage. Examples: beat, bombard, raid, stab.
For this research we used the standard PHEME dataset. This benchmark dataset is a
collection of tweets connected to breaking news events. The tweets are labelled ‘rumour’
or ‘non-rumour’, as judged by expert journalists.
One of the breaking news events used as a dataset in our simulation is
#germanwingscrash: “On 24th March 2015, an Airbus A320-211 scheduled for the
international Germanwings Flight 9525 from Barcelona–El Prat Airport in Spain to
Düsseldorf Airport in Germany crashed 100 km (62 mi) north-west of Nice in the French
Alps, killing all 144 passengers and six crew members. The crash was a deliberate one
caused by the co-pilot diagnosed with suicidal tendencies and declared unfit for work by
his doctor”. The dataset contains 238 rumours and 231 non-rumours.
A. Accuracy
Classification Algorithm     Accuracy
Multinomial Naïve Bayes      97.42%
[Figure: bar chart of accuracy (%) for the different classification methods: Logistic Regression, Multinomial Bayes, K nearest, Decision Tree and Random Forest. The y-axis runs from 93.5 to 98; the one extracted value label is 97.42.]
B. Confusion Matrix
The confusion matrix tabulates actual values against predicted values. It comprises the
true negative, false positive, false negative and true positive counts.
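As a small illustration of how these four counts are obtained, the sketch below tallies them from two lists of labels and derives accuracy from them. The function name `confusion_matrix`, the choice of "rumour" as the positive class and the six example labels are assumptions for illustration; they are not results from the paper.

```python
def confusion_matrix(actual, predicted, positive="rumour"):
    """Count true negatives, false positives, false negatives and true positives."""
    counts = {"tn": 0, "fp": 0, "fn": 0, "tp": 0}
    for truth, guess in zip(actual, predicted):
        if truth == positive:
            counts["tp" if guess == positive else "fn"] += 1
        else:
            counts["fp" if guess == positive else "tn"] += 1
    return counts

# Hypothetical labels for six posts; not results from the paper.
actual = ["rumour", "rumour", "non-rumour", "non-rumour", "rumour", "non-rumour"]
predicted = ["rumour", "non-rumour", "non-rumour", "rumour", "rumour", "non-rumour"]

matrix = confusion_matrix(actual, predicted)
# Accuracy is the share of posts on the diagonal of the matrix.
accuracy = (matrix["tp"] + matrix["tn"]) / len(actual)
```

For these six labels the matrix holds two true positives, two true negatives, one false positive and one false negative, so four of the six predictions are correct.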
a. Text Searching
In the text searching step we check whether a particular message, given as input, is a
rumour or not. Our dataset is divided into two sets, a training set and a test set. The
message is classified as a rumour or not on the basis of the authenticity labels in the
dataset. In our observations, the Multinomial Naïve Bayes algorithm was more accurate
than the other algorithms, so we used it in our implementation of rumour detection. It
gives good results in identifying information as real or fake.
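The classification step can be sketched with a small Multinomial Naive Bayes classifier written in plain Python. This is a minimal sketch under stated assumptions, not the authors' code: it uses raw word counts with add-one (Laplace) smoothing and whitespace tokenization, and the class name `MultinomialNB` and the four training messages are hypothetical stand-ins for the PHEME tweets.

```python
import math
from collections import Counter, defaultdict

class MultinomialNB:
    """Multinomial Naive Bayes over word counts with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(text.lower().split())
        self.vocabulary = {w for counts in self.word_counts.values() for w in counts}
        return self

    def log_scores(self, text):
        """Return the unnormalised log posterior of each class for one message."""
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocabulary)
        scores = {}
        for label, doc_count in self.class_counts.items():
            words_in_class = sum(self.word_counts[label].values())
            score = math.log(doc_count / total_docs)  # class prior
            for word in text.lower().split():
                if word not in self.vocabulary:
                    continue  # ignore words never seen in training
                count = self.word_counts[label][word]
                score += math.log((count + 1) / (words_in_class + vocab_size))
            scores[label] = score
        return scores

    def predict(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)

# Hypothetical training messages; labelled PHEME tweets would take their place.
train_texts = [
    "unconfirmed report of explosion downtown",
    "sources claim bridge has collapsed",
    "police confirm road closed after accident",
    "officials confirm power restored",
]
train_labels = ["rumour", "rumour", "non-rumour", "non-rumour"]
model = MultinomialNB().fit(train_texts, train_labels)
```

Calling `model.predict("unconfirmed claim of explosion")` returns "rumour" on this toy data, because every known word in the message was seen only in rumour-labelled training messages. In practice the model would be fitted on the training set and evaluated on the held-out test set.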
The input to our simulation is a news item in the form of text. After the
classification procedures are applied, Multinomial Bayes gives excellent results,
identifying the news as fake with an accuracy of 99%.
Input text:
The F1 score is the harmonic mean of precision and recall:
F1 = 2 × (Precision × Recall) / (Precision + Recall)
where precision is the ratio of true positive predictions to the total number of positive
predictions made by the model. It measures the accuracy of positive predictions.
Recall is the ratio of true positive predictions to the total number of actual positives in
the dataset. It measures the model's ability to correctly identify all positive instances.
The F1 score ranges from 0 to 1, with 1 indicating perfect precision and recall and 0
indicating the worst possible performance. A high F1 score indicates that a model
achieves both high precision and high recall, which is desirable in many classification
tasks.
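The definitions above can be checked with a few lines of Python. The function names and the counts passed to `precision_recall_f1` are assumptions for illustration; the values 0.794 and 0.773 are the precision and recall reported in the second row of the table that follows.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall and F1 from raw counts, guarding against empty denominators."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall, f1_score(precision, recall)

# Hypothetical counts for illustration.
p, r, f1 = precision_recall_f1(tp=8, fp=2, fn=2)

# Precision and recall as reported in the second row of the results table.
table_f1 = f1_score(0.794, 0.773)
```

Here `table_f1` rounds to 0.783, which agrees with the F1 value reported in that row.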
Methodology                  P       R       F1
TFIDF+IG-ACONB               0.776   0.745   0.732
TFIDF_IG,CO,SU,GR, NB        0.794   0.773   0.783
[Figure: Performance Metrics for “News” for Multinomial Bayes. Bar chart of Precision, Recall, Accuracy and F1 (x-axis: different parameters; y-axis from 0 to 1.2, labelled “in %”).]
[Figure: chart of Accuracy and F1 for Logistic Regression, Multinomial Bayes, K-nearest, Decision Tree and Random Forest (y-axis from 0 to 1.2).]
References
1. Kumar, A., Bhatia, M.P.S. & Sangwan, S.R. Rumour detection using deep learning and
filter-wrapper feature selection in benchmark twitter dataset. Multimed Tools Appl (2021).
https://doi.org/10.1007/s11042-021-11340-x
2. Pathak, Ajeet & Mahajan, Aditee & Singh, Keshav & Patil, Aishwarya & Nair, Anusha. (2020).
Analysis of Techniques for Rumor Detection in Social Media. Procedia Computer Science.
167. 2286-2296. 10.1016/j.procs.2020.03.281.
3. Sarah A. Alkhodair, Steven H.H. Ding, Benjamin C.M. Fung, Junqiang Liu, Detecting
breaking news rumors of emerging topics in social media, Information Processing &
Management, Volume 57, Issue 2, 2020, 102018, ISSN 0306-4573,
https://doi.org/10.1016/j.ipm.2019.02.016.
4. Kumar, A., & Sangwan, S. R. (2018). Rumor Detection Using Machine Learning Techniques
on Social Media. Lecture Notes in Networks and Systems, 213–221.
doi:10.1007/978-981-13-2354-6_23
5. Kumar, A., Sangwan, S.R. & Nayyar, A. Rumour veracity detection on twitter using particle
swarm optimized shallow classifiers. Multimed Tools Appl 78, 24083–24101 (2019).
https://doi.org/10.1007/s11042-019-7398-6.
6. Walaa Medhat, Ahmed Hassan, Hoda Korashy, Sentiment analysis algorithms and
applications: A survey, Ain Shams Engineering Journal, Volume 5, Issue 4, 2014
7. Alonso, M.A.; Vilares, D.; Gómez-Rodríguez, C.; Vilares, J. Sentiment Analysis for Fake
News Detection. Electronics 2021, 10, 1348.
8. V. Sivasangari, Ashok Kumar Mohan, K. Suthendran, M. Sethumadhavan, “Isolating Rumors
Using Sentiment Analysis”, Journal of Cyber Security and Mobility. 2018 7.
10.13052/jcsm2245-1439.7113.
9. Kapusta, Jozef & Benko, Ľubomír & Munk, Michal. (2020). Fake News Identification Based on
Sentiment and Frequency Analysis. 10.1007/978-3-030-36778-7_44. Pages 1093-1113
10. O. Ajao, D. Bhowmik and S. Zargari, "Sentiment Aware Fake News Detection on Online
Social Networks," ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech
and Signal Processing (ICASSP), Brighton, United Kingdom, 2019, pp. 2507-2511
11. Q. Li, S. Shah, R. Fang, A. Nourbakhsh and X. Liu, "Tweet Sentiment Analysis by
Incorporating Sentiment-Specific Word Embedding and Weighted Text Features", 2016
IEEE/WIC/ACM International Conference on Web Intelligence (WI), Omaha, NE, 2016, pp.
568-571.
12. A. J. J. Mary and L. Arockiam, "Jen-Ton: A framework to enhance the accuracy of aspect
level sentiment analysis in big data," 2017 International Conference on Inventive Computing
and Informatics (ICICI), Coimbatore, 2017, pp. 452-457.
13. Kula S., Choraś M., Kozik R., Ksieniewicz P., Woźniak M. (2020) Sentiment Analysis for
Fake News Detection by Means of Neural Networks. In: Krzhizhanovskaya V. et al. (eds)
Computational Science – ICCS 2020. ICCS 2020. Lecture Notes in Computer Science, vol
12140. Springer, Cham.
14. Minqing Hu and Bing Liu. "Mining and Summarizing Customer Reviews", Proceedings of the
ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-
2004), Aug 22-25, 2004, Seattle, Washington, USA
15. Bing Liu, Minqing Hu and Junsheng Cheng. "Opinion Observer: Analyzing and Comparing
Opinions on the Web." Proceedings of the 14th International World Wide Web conference
(WWW-2005), May 10-14, 2005, Chiba, Japan.
16. Walaa Medhat, Ahmed Hassan, Hoda Korashy, Sentiment analysis algorithms and
applications: A survey, Ain Shams Engineering Journal, Volume 5, Issue 4, 2014
17. Alonso, M.A.; Vilares, D.; Gómez-Rodríguez, C.; Vilares, J. Sentiment Analysis for Fake
News Detection. Electronics 2021, 10, 1348.
18. V. Sivasangari, Ashok Kumar Mohan, K. Suthendran, M. Sethumadhavan, “Isolating Rumors
Using Sentiment Analysis”, Journal of Cyber Security and Mobility. 2018 7.
10.13052/jcsm2245-1439.7113.
19. Kapusta, Jozef & Benko, Ľubomír & Munk, Michal. (2020). Fake News Identification Based on
Sentiment and Frequency Analysis. 10.1007/978-3-030-36778-7_44. Pages 1093-1113
20. O. Ajao, D. Bhowmik and S. Zargari, “Sentiment Aware Fake News Detection on Online
Social Networks,” ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech
and Signal Processing (ICASSP), Brighton, United Kingdom, 2019, pp. 2507-2511
21. Q. Li, S. Shah, R. Fang, A. Nourbakhsh and X. Liu, “Tweet Sentiment Analysis by
Incorporating Sentiment-Specific Word Embedding and Weighted Text Features”, 2016
IEEE/WIC/ACM International Conference on Web Intelligence (WI), Omaha, NE, 2016, pp.
568-571.
22. A. J. J. Mary and L. Arockiam, "Jen-Ton: A framework to enhance the accuracy of
aspect level sentiment analysis in big data," 2017 International Conference on Inventive
Computing and Informatics (ICICI), Coimbatore, 2017, pp. 452-457.
23. Kula S., Choraś M., Kozik R., Ksieniewicz P., Woźniak M. (2020) Sentiment Analysis for
Fake News Detection by Means of Neural Networks. In: Krzhizhanovskaya V. et al. (eds)
Computational Science – ICCS 2020. ICCS 2020. Lecture Notes in Computer Science, vol
12140. Springer, Cham.
24. Minqing Hu and Bing Liu. “Mining and Summarizing Customer Reviews”, Proceedings of
the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
(KDD-2004), Aug 22-25, 2004, Seattle, Washington, USA
25. Bing Liu, Minqing Hu and Junsheng Cheng. “Opinion Observer: Analyzing and Comparing
Opinions on the Web.” Proceedings of the 14th International World Wide Web
conference (WWW-2005), May 10-14, 2005, Chiba, Japan.
26. Yuhang Yu. 2021. Review of the Application of Machine Learning in Rumor Detection. In
Proceedings of the 5th International Conference on Control Engineering and Artificial
Intelligence (CCEAI '21). Association for Computing Machinery, New York, NY, USA, 46–
52. https://doi.org/10.1145/3448218.3448238
27. Rani, N, Das, P, Bhardwaj, AK. Rumor, misinformation among web: A contemporary review
of rumor detection techniques during different web waves. Concurrency Computat Pract Exper.
2022; 34(1):e6479. https://doi.org/10.1002/cpe.6479
28. Rani, Neetu and Das, Prasenjit and Bharadwaj, Amit, Rumour Detection in Online Social
Networks: Recent Trends (March 30, 2020). Proceedings of the International Conference on
Innovative Computing & Communications (ICICC) 2020,
http://dx.doi.org/10.2139/ssrn.3564070
29. Alessandro Bondielli, Francesco Marcelloni, A survey on fake news and rumour detection
techniques, Information Sciences, Volume 497, 2019, Pages 38-55, ISSN 0020-0255,
https://doi.org/10.1016/j.ins.2019.05.035
30. Olan, F., Jayawickrama, U., Arakpogun, E.O. et al. Fake news on Social Media: the Impact on
Society. Inf Syst Front (2022). https://doi.org/10.1007/s10796-022-10242-z
31. Raza, S., Ding, C. Fake news detection based on news content and social contexts: a
transformer-based approach. Int J Data Sci Anal 13, 335–362 (2022).
https://doi.org/10.1007/s41060-021-00302-z
32. Takahashi T, Igata N (2012) Rumor detection on twitter. In: The 6th international conference
on soft computing and intelligent systems, and the 13th international symposium on advanced
intelligence systems, IEEE, pp 452–457
33. Zhao Z, Resnick P, Mei Q (2015) Enquiring minds: early detection of rumors in social media
from enquiry posts. In: Proceedings of the 24th international conference on world wide web.
International world wide web conferences steering committee, pp 1395–1405