
Sentiment Analysis and Sarcasm Detection of Tweets
Jasleen Kaur
Department of Computer Science
Punjab Engineering College
Chandigarh
February 20, 2019

Abstract
Over the last few years, a lot of research has been carried out on the Sentiment Analysis of textual data from social networking websites such as Facebook, Twitter, Instagram and YouTube. Sentiment Analysis is the contextual mining of text to determine whether the opinion expressed in a piece of text is positive, negative or neutral. Sentiment Analysis faces many challenges, and one of them is sarcasm. Every day, thousands of slang words are created and used on social media, and the ambiguous nature of sarcasm sometimes makes it hard to detect, not only for computers but also for humans.
Thus, the existing collections of positive and negative sentiments may not be adequate for detecting sarcasm. Sarcasm is generally ignored during Sentiment Analysis because of its complex behaviour and the difficulties associated with it; consequently, the outcome of these analyses is distorted. Sarcasm detection is therefore a key problem in Sentiment Analysis that requires immediate attention. This paper addresses the problem of sarcasm detection through the most popular style of sarcasm: “positive sentiment associated with a negative situation”.

1 Introduction
Social media has gained enormous popularity in recent years. It is now a primary means of communication, allowing people in any corner of the world to communicate easily. Because of this versatility and the huge volume of data, Sentiment Analysis has become one of the most researched areas. The overall idea of Sentiment Analysis is to find the polarity of a text. The presence of sarcasm in text obstructs the performance of current Sentiment Analysis (SA) systems, and sarcasm detection is one of the most challenging topics in the field. NLP systems such as dialogue systems, summarization systems and brand-monitoring systems are used to detect the polarity of posts or tweets. Consider the tweet “A perfect senseless movie!”. In this example, the words “perfect” and “senseless” have opposite polarities, positive and negative respectively, yet the tweet as a whole expresses a negative emotion. A movie summarization system that does not use sarcasm detection may nevertheless rate it as a positive review. Therefore, an accurate mechanism that can detect sarcastic posts on social media is the need of the hour. The objective of the proposed model is to recognize sarcasm on Twitter that arises from a contrast in which a negative situation is described with a positive sentiment. This problem is addressed by taking advantage of the most popular style of sarcasm: “positive sentiment associated with a negative situation”. Two ensemble-based approaches are used: a voted ensemble classifier and a random forest classifier. Existing approaches to sarcasm detection train the classifier using existing collections of positive and negative sentiments. In this paper, by contrast, a seeding algorithm is used to generate the training corpus, and a pragmatic classifier detects sarcasm based on emoticons. The rest of the paper is organized as follows: Section 2 reviews related work, Section 3 describes the proposed methodology and Section 4 concludes the paper.

2 Related Work
Sarcasm is a popular way to express views and emotions on social media, and it is hard to identify unless the actual context is known. The amount of sarcastic content on popular social networking websites such as Twitter has grown enormously over the last ten years. In speech, the tone of voice usually gives sarcasm away, but written text offers no such cue. Sentiment analysis is significantly affected by sarcasm, yet sarcasm is often ignored because of its uncertain behaviour. Sarcastic tweets are prevalent on Twitter: for example, “All your clothes are incredibly horrible!!!” might be considered a text of positive polarity, but in reality a negative emotion is associated with the tweet. Sometimes the feeling a person conveys through a tweet is completely different from the literal statement. A method is therefore required that automatically examines the sarcasm associated with tweets. This paper proposes two ensemble-based methods: Random Forest and Weighted Ensemble.
Riloff et al. considered sarcasm as a contrast between a positive sentiment and a negative situation. They used a novel bootstrapping algorithm to generate corpora of positive sentiment and negative situation phrases. To train machine-learning-based classifiers, they used tweets containing the hashtag #sarcasm and applied Naive Bayes and SVM [4].
Joshi et al. presented a model based on the explicit and implicit incongruity of sentiments expressed in tweets. To detect sarcasm, the text of a tweet was broken into 2-grams and 3-grams, and the congruity of the grams was tested using an existing corpus of positive and negative words. The lexical and pragmatic features of the tweet were used as features for a support vector machine. The model outperformed existing systems, improving performance by 10 percent [2]. The SCUBA system aimed to address sarcasm detection on Twitter using behavioral features intrinsic to users who express sarcasm. It identified such traits from the user's past tweets and drew on theories from behavioral and psychological studies to construct a behavioral modeling framework tuned for detecting sarcasm. It first theorized the core forms of sarcasm using existing psychological and behavioral studies. Next, it developed computational features to capture these forms of sarcasm from the user's current and past tweets. Finally, it combined these features to train Naive Bayes and SVM classifiers [3].
It is observed that [4] [3] [2] identify contextualised sarcasm, i.e. sarcasm that arises in a conversation between two people. In addition, [1] [2] [3] [4] [5] use machine learning algorithms such as Naive Bayes and Support Vector Machines for this task, and [1] [2] [3] [4] depend on an external corpus of positive and negative sentiment phrases to detect sarcasm.

3 Proposed Methodology
In general, a tweet can be viewed as a phrase, which in turn can be positive, negative, or a combination of both. Sarcasm can be expressed through words or emoticons. The proposed methodology recognizes sarcasm as the contrast between a positive sentiment and a negative situation. The implementation flow for a tweet is shown in Figure 1. The first step is to identify the sentiment attached to the tweet, i.e. positive, negative or both. In the next step, emoticons are extracted from the tweet. Based on these two steps, the tweet is identified as either “Sarcastic” or “Non-sarcastic”. The proposed methodology consists of the following steps:

3.1 Data Extraction and Cleaning


This step involves the extraction and cleaning of the data as follows:
i. Use the Tweety API to collect tweets from Twitter that contain hashtags such as 'sarcasm', 'humorous' and 'sarcastic'.
ii. Train the machine learning classifiers on the obtained collection of tweets.
iii. Clean the collected tweets and standardize slang words according to a dictionary of slang words; for example, 'Sorryyyyyyy' is standardized to 'Sorry'.
iv. Use the standard edit distance algorithm to eliminate spelling mistakes.
v. Use regular expressions to remove numeric data and punctuation marks.
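Cleaning steps iii to v can be sketched in Python as below. This is a minimal sketch, not the paper's implementation: the slang dictionary `SLANG`, the vocabulary `VOCAB`, the rule of collapsing runs of three or more repeated characters, and the one-edit threshold for spelling correction are all illustrative assumptions. Tweet collection (step i) is omitted because it needs network access and API credentials.

```python
import re

# Hypothetical slang dictionary; the paper's actual dictionary is not given.
SLANG = {"u": "you", "ur": "your", "gonna": "going to"}

# Hypothetical vocabulary that spelling correction may map words onto.
VOCAB = {"sorry", "you", "your", "going", "to", "wasted", "hours",
         "on", "this", "movie", "time", "great"}


def collapse_elongation(word):
    """Collapse runs of three or more identical characters to one ('sorryyyy' -> 'sorry')."""
    return re.sub(r"(.)\1{2,}", r"\1", word)


def edit_distance(a, b):
    """Standard Levenshtein distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute (or match)
        prev = curr
    return prev[-1]


def correct_spelling(word, max_dist=1):
    """Replace an unknown word by its nearest vocabulary word, if it is close enough."""
    if word in VOCAB:
        return word
    best = min(sorted(VOCAB), key=lambda v: edit_distance(word, v))
    return best if edit_distance(word, best) <= max_dist else word


def clean_tweet(text):
    text = text.lower()
    text = re.sub(r"\d+", " ", text)      # step v: remove numeric data
    text = re.sub(r"[^\w\s]", " ", text)  # step v: remove punctuation (and '#')
    tokens = []
    for tok in text.split():
        tok = collapse_elongation(tok)    # step iii: 'sorryyyyyyy' -> 'sorry'
        tok = SLANG.get(tok, tok)         # step iii: slang -> standard form
        tokens.extend(correct_spelling(t) for t in tok.split())  # step iv
    return tokens


print(clean_tweet("Sorryyyyyyy u wasted 2 hours on thiss movi!!!"))
# ['sorry', 'you', 'wasted', 'hours', 'on', 'this', 'movie']
```

A real system would use a full dictionary for `VOCAB`; with a tiny vocabulary, nearest-word correction can map legitimate words onto the wrong entry, which is why this sketch only corrects words within a single edit.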

3.2 Seeding
In this paper, a new technique called the seeding algorithm is proposed. It works as follows:
1. A series of positive and negative sentiment phrases is generated by the seeding algorithm. These phrases are used as input to the machine learning classifiers (for feature extraction) and to the pragmatic classifier (for emoticon behaviour).
2. A positive sentiment word acts as the seed of the algorithm. 'Like' is among the most common positive sentiment words and is used as the seed in this paper. The word 'like' can be used in two different situations: one where an emotion is described and the other where a comparison is made.
3. Here, the word 'like' is used to describe an emotion. For example, 'The girl there is just like my cousin' expresses a comparison, while 'She likes you' expresses an emotion. Of these two senses, only the latter is used.
4. The seed is applied to tweets containing hashtags such as “sarcasm”, “humorous” and “sarcastic”, and a series of negative situation phrases is generated from them. If the seed is followed by a unigram, bigram or trigram that satisfies the rules given in Table 1, that gram is appended to the negative situation phrase list.
5. Consider the tweet “She likes to waste her time on pathetic movie #sarcasm”. Here the seed word 'likes' is followed by a negative situation, so the candidate n-grams are “waste”, “waste her” and “waste her time”. However, only the unigram “waste” follows the rules in Table 1, so “waste” is the word appended to the list. The list of positive sentiment phrases is then computed from the previously generated negative situation phrases.
6. For each phrase in the negative situation phrase list, any preceding unigram or bigram that satisfies the rules in Table 2 is appended to the positive sentiment phrase list. These phrases can in turn be used to generate further negative situation phrases, and so on.
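The seeding steps above can be sketched as a small bootstrapping loop. Because Tables 1 and 2 are not reproduced here, the rule sets `TABLE1` and `TABLE2`, the toy part-of-speech lexicon `POS`, the lemma map `LEMMA`, and the choice to skip an infinitival 'to' after the seed (inferred from the worked example) are all hypothetical stand-ins; a real implementation would use a proper tagger and the paper's own rules.

```python
# Hypothetical toy POS lexicon and lemma map (stand-ins for a real tagger/lemmatizer).
POS = {"she": "PRP", "her": "PRP", "i": "PRP", "to": "TO", "on": "IN",
       "like": "VB", "love": "VB", "wait": "VB", "waste": "VB", "really": "RB",
       "time": "NN", "movie": "NN", "pathetic": "JJ"}
LEMMA = {"likes": "like", "loves": "love"}

# Hypothetical stand-ins for Table 1 (negative situations) and
# Table 2 (positive sentiments), written as allowed tag sequences.
TABLE1 = {("VB",), ("VB", "NN"), ("VB", "DT", "NN")}
TABLE2 = {("VB",), ("RB", "VB")}

SARCASM_TAGS = {"#sarcasm", "#sarcastic", "#humorous"}


def tags(gram):
    return tuple(POS.get(w, "UNK") for w in gram)


def tokenize(tweet):
    """Lower-case, drop hashtags, and map inflected forms to lemmas."""
    return [LEMMA.get(w, w) for w in tweet.lower().split() if not w.startswith("#")]


def occurrences(tokens, phrase):
    """Start indices at which the phrase (a tuple of words) occurs in tokens."""
    n = len(phrase)
    return [i for i in range(len(tokens) - n + 1) if tuple(tokens[i:i + n]) == phrase]


def bootstrap(tweets, seed="like", rounds=2):
    # Only tweets carrying a sarcasm-style hashtag are used.
    docs = [tokenize(t) for t in tweets if SARCASM_TAGS & set(t.lower().split())]
    positive, negative = {(seed,)}, set()
    for _ in range(rounds):
        # Steps 4-5: n-grams following a positive phrase become negative situations.
        for toks in docs:
            for p in positive:
                for i in occurrences(toks, p):
                    j = i + len(p)
                    if j < len(toks) and toks[j] == "to":  # assumption: skip 'to'
                        j += 1
                    for n in (1, 2, 3):
                        gram = tuple(toks[j:j + n])
                        if len(gram) == n and tags(gram) in TABLE1:
                            negative.add(gram)
        # Step 6: unigrams/bigrams preceding a negative situation become positive phrases.
        for toks in docs:
            for s in negative:
                for i in occurrences(toks, s):
                    k = i - 1 if i > 0 and toks[i - 1] == "to" else i
                    for n in (1, 2):
                        if k - n >= 0 and tags(tuple(toks[k - n:k])) in TABLE2:
                            positive.add(tuple(toks[k - n:k]))
    return positive, negative


TWEETS = [
    "She likes to waste her time on pathetic movie #sarcasm",
    "I really love to waste time #sarcastic",
    "I like to wait for friends",  # no sarcasm hashtag, so it is ignored
]
positive, negative = bootstrap(TWEETS, rounds=1)
print(sorted(negative))  # [('waste',)]
print(sorted(positive))  # [('like',), ('love',), ('really', 'love')]
```

With these toy rules, the first round mirrors the worked example: only “waste” is accepted after “likes”, and “love” and “really love” are then learned because they precede “waste”. A second round can use those new positive phrases to find further situations such as “waste time”.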

3.3 Lexical Classifier
The training dataset is further divided into two groups based on the positive sentiment and negative situation phrases present in each tweet:
1. The first group contains tweets that have either positive sentiment phrases alone or negative situation phrases alone. This group is given as input to the pragmatic classifier to find emoticon-based sarcasm.
2. The second group contains tweets that have both positive sentiment and negative situation phrases. The machine learning classifiers use this group as their input.
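The split into two groups can be expressed as a simple filter. This sketch assumes tweets are already tokenized and that the positive sentiment and negative situation phrases are given as tuples of words; the function names are hypothetical, and tweets matching neither list are simply set aside, since the paper does not say how they are handled.

```python
def contains(tokens, phrase):
    """True if the phrase (a tuple of words) occurs contiguously in tokens."""
    n = len(phrase)
    return any(tuple(tokens[i:i + n]) == phrase for i in range(len(tokens) - n + 1))


def split_groups(tweets, positive, negative):
    pragmatic_group, ml_group, unmatched = [], [], []
    for toks in tweets:
        has_pos = any(contains(toks, p) for p in positive)
        has_neg = any(contains(toks, s) for s in negative)
        if has_pos and has_neg:
            ml_group.append(toks)         # group 2: both kinds of phrase
        elif has_pos or has_neg:
            pragmatic_group.append(toks)  # group 1: only one kind of phrase
        else:
            unmatched.append(toks)        # not covered by the paper; kept aside here
    return pragmatic_group, ml_group, unmatched


positive = {("love",)}
negative = {("waste", "time")}
tweets = [["i", "love", "to", "waste", "time"], ["i", "love", "it", ":)"],
          ["waste", "time", "again"], ["hello", "world"]]
pragmatic, ml, other = split_groups(tweets, positive, negative)
print(len(pragmatic), len(ml), len(other))  # 2 1 1
```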

3.4 Machine Learning Classifier


Data mining and machine learning techniques are used to extract and classify the important information in the collected tweets. Most current work in this field uses Naive Bayes or Support Vector Machines to detect sarcasm. Here, two different approaches, Random Forest and Weighted Ensemble, are presented for this task.
1. Random Forest: Random Forest is an ensemble learning technique in which multiple decision trees are constructed. Each decision tree is trained on a different random sample of the training data set. The output class is the mode of the classes predicted by the individual decision trees (for regression tasks, the mean of their outputs).
2. Weighted Ensemble: Weighted Ensemble is also a form of ensemble-based learning, which combines several machine learning classifiers. Each classifier is assigned a weight according to its accuracy. The weighted ensemble classifier estimates the output class by combining the individual classifiers' outputs according to these weights, either as a weighted mode of their predicted classes or as a weighted mean of their scores. In this paper, the weighted ensemble consists of Naive Bayes, Logistic Regression and Random Forest.
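The aggregation rules can be illustrated without any training code. The sketch below assumes each component classifier has already produced a label and a probability of sarcasm for a tweet, and that its accuracy was measured on held-out data; the classifier names, probabilities and accuracies are hypothetical. `majority_vote` is the mode used by a Random Forest over its trees; since the paper does not say whether the weighted ensemble votes on labels or averages scores, both `weighted_vote` and `weighted_average` are shown.

```python
from collections import Counter


def majority_vote(labels):
    """Mode of the predicted labels (how a Random Forest combines its trees' votes)."""
    return Counter(labels).most_common(1)[0][0]


def weighted_vote(labels, weights):
    """Hard weighted voting: each label scores the sum of its voters' weights."""
    scores = Counter()
    for label, w in zip(labels, weights):
        scores[label] += w
    return max(scores, key=scores.get)


def weighted_average(probs, weights, threshold=0.5):
    """Soft weighted voting: accuracy-weighted mean of P(sarcastic)."""
    p = sum(pr * w for pr, w in zip(probs, weights)) / sum(weights)
    return ("sarcastic" if p >= threshold else "non-sarcastic"), p


# Hypothetical held-out accuracies and per-tweet probabilities of sarcasm.
accuracy = {"naive_bayes": 0.80, "logistic_regression": 0.70, "random_forest": 0.75}
p_sarcastic = {"naive_bayes": 0.90, "logistic_regression": 0.40, "random_forest": 0.45}

names = list(accuracy)
weights = [accuracy[n] for n in names]
probs = [p_sarcastic[n] for n in names]
labels = ["sarcastic" if p >= 0.5 else "non-sarcastic" for p in probs]

print(majority_vote(labels))              # non-sarcastic
print(weighted_vote(labels, weights))     # non-sarcastic
print(weighted_average(probs, weights)[0])  # sarcastic
```

In this example the confident, most accurate classifier tips the soft weighted average towards “sarcastic”, while both hard votes follow the two less confident classifiers. In practice, scikit-learn's `VotingClassifier` accepts a `weights` argument for the same purpose.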

3.5 Emoticon Extraction


Emoticons are visual representations of facial expressions that convey a person's mood. Almost every social networking site supports emoticons, and each emoticon is assigned a unique code. In this step, the unique emoticons used in the tweets are extracted. The meaning and sentiment associated with each emoticon, in terms of the person's mood (e.g. sad, happy or angry), are then identified, and the emoticons are classified into two categories:
1. Positive emoticon.
2. Negative emoticon.
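Extracting emoticons and mapping them to the two categories can be sketched with a lookup table and a regular expression. The table `EMOTICON_POLARITY` is a hypothetical, deliberately small lexicon covering a few text emoticons and two emoji; the paper's own list of emoticon codes is not given.

```python
import re

# Hypothetical emoticon lexicon; a real system would use a much larger one.
EMOTICON_POLARITY = {
    ":)": "positive", ":-)": "positive", ":D": "positive", ":*": "positive",
    "<3": "positive", "\U0001F600": "positive",  # grinning face emoji
    ":(": "negative", ":-(": "negative", ":'(": "negative", ">:(": "negative",
    "\U0001F622": "negative",                   # crying face emoji
}

# Try longer codes first so that, at any position, the longest match wins.
_PATTERN = re.compile("|".join(
    re.escape(e) for e in sorted(EMOTICON_POLARITY, key=len, reverse=True)))


def extract_emoticons(tweet):
    """All emoticons in the tweet, in order of appearance."""
    return _PATTERN.findall(tweet)


def classify_emoticons(tweet):
    """Pair each extracted emoticon with its category (positive/negative)."""
    return [(e, EMOTICON_POLARITY[e]) for e in extract_emoticons(tweet)]


print(classify_emoticons("We like to waste our time :("))  # [(':(', 'negative')]
print(classify_emoticons("I am so jealous of you :*"))     # [(':*', 'positive')]
```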

3.6 Pos-neg Recognition


The preprocessed tweets act as the dataset for this step. A positive or negative sentiment is assigned to every tweet in the dataset, and this label remains unchanged in the subsequent steps. In this step, sarcasm detection is implemented by looking for a negative emotion preceded by a positive sentence. The output of this step is used to build the seeding corpus from n-grams.
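The pos-neg rule can be sketched as a left-to-right scan that flags a tweet when a negative word appears after a positive one, followed by collecting the n-grams of flagged tweets. The word lists `POSITIVE` and `NEGATIVE`, the word-level (rather than sentence-level) test and the maximum n-gram length are assumptions for illustration.

```python
# Hypothetical tiny sentiment lexicons; the paper does not specify its own.
POSITIVE = {"love", "like", "perfect", "great", "enjoy"}
NEGATIVE = {"waste", "senseless", "horrible", "ignored", "pathetic"}


def positive_then_negative(tokens):
    """True if some positive word occurs before some negative word."""
    seen_positive = False
    for tok in tokens:
        if tok in POSITIVE:
            seen_positive = True
        elif tok in NEGATIVE and seen_positive:
            return True
    return False


def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def build_seed_corpus(tweets, max_n=3):
    """Collect the 1..max_n-grams of every tweet flagged by the pos-neg rule."""
    corpus = []
    for toks in tweets:
        if positive_then_negative(toks):
            for n in range(1, max_n + 1):
                corpus.extend(ngrams(toks, n))
    return corpus


print(positive_then_negative("a perfect senseless movie".split()))  # True
print(positive_then_negative("senseless but perfect".split()))      # False
```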

3.7 Pragmatic Classifier


The pragmatic classifier deals with emoticon-based sarcasm. Consider the following emoticon-based cases:
Case 1: Text with mixed (crossed) sentiment is accompanied by a negative emoticon, e.g. “We like to waste our time :(” (sad).
Case 2: A positive emoticon follows a negative statement, e.g. “I am so jealous of you :*” (affection).
Case 3: The user uses different kinds of emoticons in succession.
In each of these scenarios, the overall polarity is computed by taking both the emoticons and the text into account.
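The three cases can be written as simple rules that combine the text and the emoticons. This is a hedged sketch: the word lists, the rule order and the exact conditions (for example, treating mixed sentiment as the presence of at least one positive and one negative word) are assumptions, since the paper describes the cases only by example. Emoticons are assumed to be already extracted and labelled positive or negative, in order of appearance.

```python
# Hypothetical word lists used only for this illustration.
POSITIVE = {"like", "love", "great", "perfect", "enjoy"}
NEGATIVE = {"waste", "jealous", "horrible", "senseless", "pathetic"}


def text_polarity(tokens):
    """+1 for each positive word, -1 for each negative word."""
    return sum((t in POSITIVE) - (t in NEGATIVE) for t in tokens)


def pragmatic_label(tokens, emoticons):
    """emoticons: polarities ('positive'/'negative') in order of appearance."""
    has_pos = any(t in POSITIVE for t in tokens)
    has_neg = any(t in NEGATIVE for t in tokens)
    # Case 3: emoticons of different kinds used in succession.
    if len(set(emoticons)) > 1:
        return "sarcastic"
    # Case 1: mixed ("cross") sentiment in the text plus a negative emoticon.
    if emoticons and emoticons[0] == "negative" and has_pos and has_neg:
        return "sarcastic"
    # Case 2: a positive emoticon after an overall negative statement.
    if emoticons and emoticons[0] == "positive" and text_polarity(tokens) < 0:
        return "sarcastic"
    return "non-sarcastic"


print(pragmatic_label("we like to waste our time".split(), ["negative"]))  # sarcastic
print(pragmatic_label("i am so jealous of you".split(), ["positive"]))     # sarcastic
print(pragmatic_label("great game".split(), ["positive"]))                 # non-sarcastic
```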

4 Conclusion
Presently, most work on sarcasm detection in Sentiment Analysis is based on n-gram methods and uses Naive Bayes and SVM as machine learning classifiers. Random Forest and weighted ensemble methods, both ensemble-based approaches, are considered more effective in terms of precision. Evaluating the weighted ensemble requires first measuring the accuracy of its component classifiers, since these accuracies determine the weights, and the ensemble can be further improved by using better machine learning classifiers as components. The pragmatic classifier also increases the accuracy of the proposed system, as it captures emoticon-based sarcasm that the machine learning classifiers do not observe. Finally, a seeding algorithm was introduced for training-set generation that is independent of the sentiment corpora used in previous work. The seeding algorithm can continue to evolve as language use changes.

References
[1] Bamman, David, and Noah A. Smith. “Contextualized Sarcasm Detection on Twitter.” ICWSM, 2015.
[2] Joshi, Aditya, Vinita Sharma, and Pushpak Bhattacharyya. “Harnessing Context Incongruity for Sarcasm Detection.” ACL (2), 2015.
[3] Rajadesingan, Ashwin, Reza Zafarani, and Huan Liu. “Sarcasm Detection on Twitter: A Behavioral Modeling Approach.” Proceedings of the Eighth ACM International Conference on Web Search and Data Mining. ACM, 2015.
[4] Riloff, Ellen, et al. “Sarcasm as Contrast between a Positive Sentiment and Negative Situation.” EMNLP, Vol. 13, 2013.
[5] Jain, Tanya, Nilesh Agrawal, Garima Goyal, and Niyati Aggrawal. “Sarcasm Detection of Tweets: A Comparative Study.” 2017 Tenth International Conference on Contemporary Computing (IC3), 2017.
