Sentiment Analysis and Sarcasm Detection of Tweets
Jasleen Kaur
Department of Computer Science
Punjab Engineering College
Chandigarh
February 20, 2019
Abstract
Over the last few years, a lot of research has been carried out in the
area of Sentiment Analysis of textual data available on social networking
websites, e.g. Facebook, Twitter, Instagram and YouTube. Sentiment
Analysis is the contextual mining of text to determine whether the
opinion expressed in a piece of text is positive, negative or neutral.
Many challenges are associated with sentiment analysis; one of them is
sarcasm. Every day, thousands of slang words are created and used on
social media. The ambiguous nature of sarcasm sometimes makes it hard to
detect, not only for computers but also for humans.
Thus, an existing collection of positive and negative sentiments may not
be sufficient to detect sarcasm correctly. Generally, sarcasm is ignored
during Sentiment Analysis because of its complex behaviour and the
difficulties associated with it. Consequently, the outcome of these
analyses is distorted. Hence, sarcasm detection is a key problem in
Sentiment Analysis that requires immediate attention. This paper
addresses the problem of sarcasm detection through the most popular
style of sarcasm: “positive sentiment associated with a negative
situation”.
1 Introduction
Social media has become widely popular. It is now a primary means of
communication, allowing people in any corner of the world to communicate
easily. Because of this versatility and the large volume of data it
produces, Sentiment Analysis (SA) has become a heavily researched area.
The overall idea of Sentiment Analysis is to find a text’s polarity. The
presence of sarcasm in text degrades the performance of current SA systems.
One of the most challenging topics in sentiment analysis is sarcasm
detection. NLP systems such as dialogue systems, summarization systems
and brand monitoring systems are used to detect the polarity of posts or
tweets. Consider the following tweet: “A perfect senseless movie!”.
In this example, the two words “perfect” and “senseless” are of opposite
polarity, positive and negative respectively, but the tweet as a whole
carries a negative emotion. A movie summarization system that does not
use sarcasm detection is likely to rate this review as positive.
Therefore, an accurate sarcasm detection mechanism that can detect
sarcastic posts on social media is needed. The objective of the proposed
model is to recognize sarcasm on Twitter that arises from contrast, where
a negative situation is referred to with a positive sentiment. The
problem is addressed by taking advantage of the most popular style of
sarcasm: “positive sentiment associated with a negative situation”. Two
ensemble-based approaches are used: a voted ensemble classifier and a
random forest classifier. Existing approaches to sarcasm detection depend
on an existing collection of positive and negative sentiments to train
the classifier. In this work, a seeding algorithm is instead used to
generate the training corpus, and a pragmatic classifier detects sarcasm
based on emoticons. The rest of the paper is organized as follows:
Section 2 reviews related work and the motivation behind this paper,
Section 3 specifies the proposed methodology and Section 4 concludes the
paper.
2 Related Work
Sarcasm is a popular way to express views and emotions on social media,
and it is hard to identify unless the actual context is known. The amount
of sarcastic content on well-known social networking websites like
Twitter has increased substantially in the last 10 years. In speech, the
tone usually gives the sarcasm away, but that is not the case in written
text. Sarcasm significantly affects sentiment analysis, yet it is often
ignored because of its ambiguous behaviour. Sarcastic tweets are
prevalent on Twitter; e.g. “All your clothes are incredibly horrible!!!”
might be considered a text of positive polarity, but in reality a
negative emotion is associated with the tweet. Sometimes the feeling
conveyed by a person through a tweet is completely different from the
literal statement. A method is therefore required that automatically
detects the sarcasm associated with tweets. This paper proposes two
ensemble-based methods: Random Forest and Weighted Ensemble.
Riloff et al. considered sarcasm as a contrast between a positive
sentiment and a negative situation. They used a novel bootstrapping
algorithm to generate corpora of positive and negative phrases. To train
machine learning based classifiers, they used tweets containing the
hashtag #sarcasm and applied them to Naive Bayes and SVM [4].
Joshi et al. presented a model based on the explicit and implicit
incongruity of sentiments expressed in tweets. To detect sarcasm, the
text of the tweet was broken into 2-grams and 3-grams, and the congruity
of the grams was tested against an existing corpus of positive and
negative words. The model used lexical and pragmatic features of the
tweet as input to a support vector machine. It outperformed the existing
systems, improving performance by 10 percent [2].
The SCUBA system of Rajadesingan et al. aimed to address the task of
sarcasm detection on Twitter by using the behavioral features intrinsic
to users expressing sarcasm. The authors identified such traits using the
user’s past tweets and employed theories from behavioral and
psychological studies to construct a behavioral modeling framework tuned
for detecting sarcasm. The work first theorized the core forms of sarcasm
using existing psychological and behavioral studies. Next, it developed
computational features to capture these forms of sarcasm using the user’s
current and past tweets. Finally, it combined these features to train
Naive Bayes and SVM classifiers [3].
It is observed that [1] [2] [3] identify contextualised sarcasm, i.e.
sarcasm that arises in conversation between two people. Also, [1] [2] [3]
[4] [5] use machine learning algorithms like Naive Bayes and Support
Vector Machine to accomplish this task, and [1] [2] [3] [4] depend on an
external corpus of positive and negative sentiment phrases to detect
sarcasm.
3 Proposed Methodology
Generally, any tweet can be treated as a phrase that is positive,
negative or a combination of both. Sarcasm can be expressed through words
or emoticons. The proposed methodology recognizes sarcasm as the contrast
that arises when a positive sentiment is attached to a negative
situation. The implementation flow for any tweet is shown in Figure 1.
The first step is to identify the sentiment attached to the tweet, i.e.
positive, negative or both. In the next step, emoticons are extracted
from the tweet. Based on these two steps, the tweet is identified as
either “Sarcastic” or “Non-sarcastic”. The proposed methodology consists
of the following steps:
3.2 Seeding
In this paper, a new technique known as the seeding algorithm is
proposed. The seeding algorithm is explained as follows:
1. A series of positive and negative sentiment phrases is generated using
the seeding algorithm. These sentiment phrases are treated as the input
to the machine learning classifiers and the pragmatic classifier, for
feature extraction and emoticon behaviour respectively.
2. A positive sentiment word acts as the seed for the algorithm. ‘Like’
is the most popular positive sentiment word and is used as the seed in
this paper. The word ‘like’ can be used in two different situations: one
where an emotion is described and another where a comparison is made.
3. Here, the word ‘like’ is used to describe an emotion. For example,
‘The girl there is just like my cousin’ shows a comparison, while ‘She
likes you’ represents an emotion. Of these two, the latter is the sense
used here.
4. The seed is applied to tweets that contain hashtags like “sarcasm”,
“humorous” and “sarcastic”, and a series of negative situation phrases is
generated from them. If the seed is followed by any unigram, bigram or
trigram and the rules given in Table 1 are satisfied, then that
particular n-gram is appended to the negative situation phrase list.
5. Consider the tweet “She likes to waste her time on pathetic movie
#sarcasm”. The seed word ‘likes’ in the tweet introduces a negative
situation. The resulting n-grams are “waste”, “waste her” and “waste her
time”, but only the unigram “waste” follows the rules as per Table 2. So
the word appended to the list is “waste”. The list of positive sentiment
phrases is then computed from the previously generated negative situation
phrases.
6. For each phrase in the negative situation phrase list, if the rules
mentioned in Table 2 are satisfied by any preceding unigram or bigram,
then these words are appended to the positive sentiment phrase list.
These phrases can in turn be used to generate further negative situation
phrases, and so on.
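The candidate generation in steps 4 and 5 can be sketched in Python as
follows. This is a minimal illustration, not the paper's implementation:
the function names, the small tokenizer and the skipping of a leading
"to" (needed to reproduce the n-grams listed in step 5) are assumptions.
Because the rule tables are not reproduced in this text, the rule check
is passed in as a placeholder function.

```python
import re

# Hashtags named in step 4; the "#" spelling is an assumption.
SARCASM_TAGS = {"#sarcasm", "#sarcastic", "#humorous"}
SEED_FORMS = {"like", "likes"}  # surface forms of the seed word 'like'


def tokenize(tweet):
    """Lowercase the tweet and split it into words and hashtags."""
    return re.findall(r"#?\w+", tweet.lower())


def candidate_ngrams(tokens, seed_forms=SEED_FORMS, max_n=3):
    """Return the unigram, bigram and trigram that follow the first seed word."""
    for i, tok in enumerate(tokens):
        if tok in seed_forms:
            rest = [t for t in tokens[i + 1:] if t not in SARCASM_TAGS]
            # Assumption: skip the infinitive marker so that
            # "likes to waste" yields "waste", as in the worked example.
            if rest and rest[0] == "to":
                rest = rest[1:]
            return [" ".join(rest[:n]) for n in range(1, max_n + 1)
                    if len(rest) >= n]
    return []


def negative_situation_phrases(tweets, passes_rules):
    """Collect rule-passing n-grams from tweets tagged as sarcastic.

    `passes_rules` is a hypothetical stand-in for the paper's rule table.
    """
    phrases = []
    for tweet in tweets:
        tokens = tokenize(tweet)
        if not SARCASM_TAGS.intersection(tokens):
            continue  # only hashtagged tweets are used for seeding
        for gram in candidate_ngrams(tokens):
            if passes_rules(gram) and gram not in phrases:
                phrases.append(gram)
    return phrases


tweet = "She likes to waste her time on pathetic movie #sarcasm"
print(candidate_ngrams(tokenize(tweet)))
```

For the example tweet, the sketch produces the same three candidates as
step 5. Which of them survive depends entirely on the rule check supplied
by the caller.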
3.3 Lexical Classifier
The training dataset is divided into two groups based on the positive
sentiment and negative situation phrases present in each tweet:
1. The first group consists of tweets that contain either positive
sentiment phrases alone or negative situation phrases alone. This group
is given as input to the pragmatic classifier to find emoticon-based
sarcasm.
2. The second group consists of tweets that contain both positive
sentiment and negative situation phrases. The machine learning
classifiers use this group as their input.
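The split into two groups can be sketched in Python as below. This is an
illustrative sketch under stated assumptions: the function names and the
whole-word matching strategy are not from the paper, and the
"unassigned" label for tweets that match neither phrase list is an
assumption, since the text does not say how such tweets are handled.

```python
import re


def contains_phrase(tweet, phrases):
    """True if any phrase occurs in the tweet as a whole-word sequence."""
    padded = " " + " ".join(re.findall(r"\w+", tweet.lower())) + " "
    return any(f" {p.lower()} " in padded for p in phrases)


def route_tweet(tweet, positive_phrases, negative_phrases):
    """Decide which classifier receives the tweet."""
    has_pos = contains_phrase(tweet, positive_phrases)
    has_neg = contains_phrase(tweet, negative_phrases)
    if has_pos and has_neg:
        return "machine_learning"  # group 2: both kinds of phrase
    if has_pos or has_neg:
        return "pragmatic"         # group 1: one kind of phrase only
    return "unassigned"            # assumption: not covered by the text


print(route_tweet("She likes to waste her time", ["likes"], ["waste"]))
```

Matching on whole words keeps a phrase such as "waste" from firing on
"wasted"; a real system would more likely normalize word forms first.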
categories:
1. Positive emoticon.
2. Negative emoticon.
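As an illustration of how these two emoticon categories could be used,
the following Python sketch assigns a tweet to a category and flags a
contrast with the polarity of its sentiment phrase. The emoticon sets are
small hypothetical examples, and the contrast rule is an assumption
derived from the paper's general "positive sentiment, negative situation"
idea, not a rule stated in the text.

```python
# Hypothetical, deliberately small emoticon lexicons (not from the paper).
POSITIVE_EMOTICONS = {":)", ":-)", ":D", ";)"}
NEGATIVE_EMOTICONS = {":(", ":-(", ":'(", ":/"}


def emoticon_category(tweet):
    """Return 'positive', 'negative', or None for the emoticons in a tweet."""
    tokens = tweet.split()
    has_pos = any(t in POSITIVE_EMOTICONS for t in tokens)
    has_neg = any(t in NEGATIVE_EMOTICONS for t in tokens)
    if has_pos == has_neg:  # no emoticon found, or mixed signals
        return None
    return "positive" if has_pos else "negative"


def pragmatic_is_sarcastic(phrase_polarity, tweet):
    """Assumed rule: sarcasm when the emoticon contradicts the phrase polarity.

    `phrase_polarity` is 'positive' or 'negative', as found by phrase matching.
    """
    category = emoticon_category(tweet)
    return category is not None and category != phrase_polarity


print(pragmatic_is_sarcastic("negative", "Stuck in traffic again :)"))
```

Tweets with no emoticon, or with emoticons from both categories, are left
unflagged in this sketch because the contrast cannot be determined.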
4 Conclusion
Presently, most work on Sarcasm Detection in Sentiment Analysis is based
on the n-gram method and uses Naive Bayes and SVM as machine learning
classifiers. Random forest and weighted ensemble methods, which are
ensemble-based approaches, are considered more efficient in terms of
precision. To calculate the efficiency of the weighted ensemble, the
efficiency of its component classifiers must be calculated first; this
can be further improved by using machine learning classifiers. The
pragmatic classifier also increases the accuracy of the proposed system,
as it relies on emoticon-based sarcasm, which is not observed by the
machine learning classifier. The seeding algorithm introduced for
training set generation is independent of any previous work on sentiment
analysis. The seeding algorithm will keep evolving as language use and
the field of linguistics continue to change.
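The dependence of a weighted ensemble on its component classifiers can be
made concrete with a small Python sketch of weighted voting. This is a
generic illustration rather than the paper's exact scheme: the function
name is hypothetical, and using each component's measured accuracy as its
weight is an assumption.

```python
from collections import defaultdict


def weighted_vote(predictions, weights):
    """Combine one label per classifier into a single label.

    Each classifier's vote counts in proportion to its weight. On a tie,
    the label that appeared first in `predictions` wins.
    """
    if len(predictions) != len(weights):
        raise ValueError("need exactly one weight per prediction")
    totals = defaultdict(float)
    for label, weight in zip(predictions, weights):
        totals[label] += weight
    return max(totals, key=totals.get)


# Assumed weights, e.g. validation accuracy of each component classifier.
votes = ["sarcastic", "non-sarcastic", "non-sarcastic"]
print(weighted_vote(votes, [0.9, 0.4, 0.4]))
```

With equal weights the same votes would give "non-sarcastic", which is
why the quality of the component classifiers has to be measured before
the ensemble itself can be evaluated.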
References
[1] Bamman, David, and Noah A. Smith. “Contextualized Sarcasm Detection
on Twitter.” ICWSM. 2015.
[2] Joshi, Aditya, Vinita Sharma, and Pushpak Bhattacharyya. “Harnessing
Context Incongruity for Sarcasm Detection.” ACL (2). 2015.
[3] Rajadesingan, Ashwin, Reza Zafarani, and Huan Liu. “Sarcasm Detection
on Twitter: A Behavioral Modeling Approach.” Proceedings of the Eighth
ACM International Conference on Web Search and Data Mining. ACM, 2015.
[4] Riloff, Ellen, et al. “Sarcasm as Contrast between a Positive
Sentiment and Negative Situation.” EMNLP. Vol. 13. 2013.
[5] Jain, Tanya, Nilesh Agrawal, Garima Goyal, and Niyati Aggrawal.
“Sarcasm Detection of Tweets: A Comparative Study.” 2017 Tenth
International Conference on Contemporary Computing (IC3). 2017.