A minor project report on

"Sentiment Analysis of Twitter Data on GST"

submitted in partial fulfillment of the requirements for the degree of

Bachelor of Technology

In Discipline of School of Engineering





under the guidance of


School of Computer Science and Engineering



This is to certify that the project report entitled

"Sentiment Analysis of Twitter Data on GST"



in partial fulfilment of the requirements for the award of the degree of Bachelor of
Technology in the discipline of Computer Science and Engineering is a bona fide record of
work carried out under my guidance and supervision at the School of Computer Science and
Engineering, KIIT University.

Signature of Supervisor:
Dr. Manoj Kumar Mishra
School of Computer Science and Engineering
KIIT University

The Project was evaluated by us on _____________


Working on this project was interesting. However, it would not have been possible without the
kind support and guidance of everyone who gave us the opportunity to extend our
knowledge and research in the field. We would like to extend our sincere thanks to all of them.

We are highly indebted to Dr. Manoj Kumar Mishra for guidance and supervision, as well as
for providing all necessary information regarding the project work. We would like to express
our gratitude to the members of KIIT University for always co-operating with and encouraging
us in the completion of this project, and our special thanks for giving us such attention
and time.

Abhishek Kumar
Asmita Mukherjee
Abhishek Chanda

This project addresses the problem of sentiment analysis on Twitter, that is, classifying tweets
according to the sentiment expressed in them, which can be positive, negative or neutral. Twitter
is an online micro-blogging and social-networking platform which allows users to write short
status updates with a maximum length of 140 characters. It is a rapidly expanding service with
over 200 million registered users.

The growing popularity of social media has created an opportunity to explore and track
the response to new reforms and policies in India. Many researchers have analysed the
tweets posted by the citizens of a nation on Twitter, a microblogging website where users read
and write millions of tweets on a variety of topics on a daily basis. Our goal is not only to
classify the reactions based on their sentiment, but also to predict whether upcoming
tweets on this issue will strike a positive or a negative note [1].
In this paper, Twitter is used as a forum to understand the sentiments of the citizens of
India towards the Goods and Services Tax launched by the Indian Government on 1st July
2017. The R language is used to extract the tweets from Twitter, and a machine learning
algorithm such as Naïve Bayes is used to analyse the Twitter data stream.
The tweets originating in India before the implementation of GST have been analysed.
We have also analysed the tweets after the implementation of the GST. Then, we have
compared the tweets on the basis of anger, anticipation, disgust, fear, joy, sadness, and

Keywords: Sentiment Analysis, Twitter, Word cloud, GST, Review, Opinion Mining
1. Introduction
1.1 Motivation
1.2 Domain Introduction
1.3 Aim of our work
1.4 Scope of our work

2. Review of Literature
2.1 Related work

3. Report on the present investigation

3.1 Sentiment Analysis
3.1.1 Sentence Level
3.1.2 Document Level
3.1.3 Feature Level
3.2 Proposed Methodology
3.2.1 Steps to extract the tweets
3.3 Pre-processing of extracted tweets
3.3.1 Cleaning of text
3.3.2 Filtering
3.3.3 Removal of stop words
3.3.4 Construction of n-grams
3.4 Lexical Analysis
3.5 Calculating sentiment score
3.6 Visualization

4. Results and Discussion

4.1 Extracting GST tweets
4.2 Access twitter data sets
4.3 Classification of tweets
4.4 Discussion on the result

5. Summary and Conclusion

5.1 Conclusion
5.2 Future Recommendation

6. References
Chapter 1 | Introduction

Goods and Services Tax (GST) is an indirect tax applicable throughout India which replaced
multiple taxes levied by the central and state governments. It was introduced as The
Constitution (One Hundred and First Amendment) Act 2016, following the passage of the
Constitution 122nd Amendment Bill. The GST is governed by the GST Council, whose
Chairman is the Finance Minister of India. Under GST, goods and services are taxed at the
following rates: 0%, 5%, 12%, 18%, and 28% [3].
There is a special rate of 0.25% on rough precious and semi-precious stones and 3% on gold.
The GST, India's biggest tax reform in 70 years of independence, was launched at midnight
on 30 June 2017 [2a] by the Prime Minister of India, Narendra Modi. The launch was marked
by a historic midnight (June 30-July 1, 2017) session of both houses of Parliament convened
in the Central Hall of Parliament [4].

1.1 Motivation
This project presents the results of sentiment analysis on Twitter data. We classify
tweets according to the sentiment expressed in them: positive, negative or neutral. We have
chosen to work with Twitter since we feel it is a better approximation of public sentiment
than conventional internet articles and web blogs. The reason is that the amount of
relevant data is much larger for Twitter than for traditional blogging sites. Moreover,
the response on Twitter is more prompt and also more general. Sometimes the result may differ
from the known facts, but in the majority of cases the result is correct [2].

1.2 Domain Introduction

This project of analyzing sentiments from Twitter data comes under the topic “Sentiment
Analysis”. Sentiment analysis refers to the use of natural language processing and text
analysis to systematically identify, extract, quantify, and study affective states and
subjective information. Sentiment analysis is widely applied to voice-of-the-customer
materials such as reviews and survey responses, to online and social media, and to healthcare
materials, for applications that range from marketing to customer service to clinical medicine.
Generally speaking, sentiment analysis aims to determine the attitude of a speaker, writer, or
other subject with respect to some topic, or the overall contextual polarity of or emotional
reaction to a document or event.

1.3 Aim of our work

Our aim is to understand, using Twitter data, the sentiments of people before GST was applied
and the sentiments of people after it was applied. The result will indicate how effective the
implementation of GST in our country has been, as perceived by the public.

1.4 Scope of our work

The output of our project will help people gain more insight into GST and analyze
its advantages and disadvantages on the basis of the observed sentiments.
Chapter 2 | Review of Literature

2.1 Related Work

Social media has been explored to estimate the popularity of politicians and the sentiment of
the general public towards recently introduced policies such as budgets and tax reforms.
Social networking sites have also been used to compare the political preferences people
express online with those observed in elections. Social media can be analyzed on a daily or
hourly basis during an electoral campaign to gain detailed insight into the emotions of
voters. By monitoring and analyzing conversations on social networking sites, it is possible
to track trends in real time, capture any sudden change, and gauge public opinion well before
poll results are declared. A few studies claim that analyzing social media allows a reliable
forecast of the final result. One study [11] states that the number of times a
candidate is mentioned in blog posts is a good predictor of electoral success and can yield
better predictions than election polls. Some researchers claim that the more Facebook
supporters an electoral candidate has, the better the candidate's chances of winning. Party
mentions on Twitter have been compared with the results of the 2009 German election, with the
conclusion that the relative number of tweets related to each party is a good predictor of its
vote share. A better way to analyze tweets considers not just the count of mentions of a party
or candidate name but also the sentiment attached to the tweets. A sentiment classifier based
on lexical induction has been built, and correlations have been found between several polls
conducted during the 2008 presidential election and the content of wall posts available on
Facebook [10].
Other studies show similar results, reporting a correlation between Obama’s approval rate and
the sentiment expressed by Twitter users. Sentiment analysis of tweets also performed quite
well in predicting the results of both the 2011 and the 2012 legislative elections in the
Netherlands.
Chapter 3 | Report on the present investigation

3.1 Sentiment Analysis

3.1.1 Sentence Level

The task at this level is to examine individual sentences and determine whether each
sentence expresses a positive, neutral or negative opinion. The first step is to identify
whether the sentence is subjective or objective.

3.1.2 Document Level

The task at this level is to determine whether a whole opinion document expresses
a positive, negative or neutral opinion.

3.1.3 Feature Level

Neither sentence-level nor document-level analysis discovers what exactly people
liked and did not like. Instead of looking at language constructs, feature-level
(aspect-level) analysis looks directly at the opinion itself.

3.2 Proposed Methodology

3.2.1. Steps to extract the tweets

1. The first step is the creation of a Twitter application.
2. In R, the twitteR package acts as an interface to the Twitter web API.
3. The ROAuth package is used for authentication.
4. Twitter credential objects, namely the consumer key, consumer
secret, access token, and access secret, are created.
5. During authentication, the user is automatically redirected to a URL, clicks on
“Authorize app”, and enters the unique 7-digit number to get linked to the

Figure 3.1: Twitter API Consumer Key and Request, Authorise, and Access URLs
Figure 3.2: Access Token, Access Token Secret, Owner ID

Figure 3.3: Application Permission
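
One way to keep the four credentials from the steps above out of the script itself is to read them from environment variables. The following is a minimal sketch in base R, not the procedure used in this project; the variable names (TWITTER_CONSUMER_KEY and so on) and the helper read_credentials() are hypothetical.

```r
# Hypothetical environment variable names; any names can be used.
credential_vars <- c(
  consumer_key    = "TWITTER_CONSUMER_KEY",
  consumer_secret = "TWITTER_CONSUMER_SECRET",
  access_token    = "TWITTER_ACCESS_TOKEN",
  access_secret   = "TWITTER_ACCESS_SECRET"
)

# Read the four credentials and fail early if any of them is not set.
read_credentials <- function(vars = credential_vars) {
  values <- Sys.getenv(vars, unset = NA)
  missing <- vars[is.na(values) | values == ""]
  if (length(missing) > 0) {
    stop("Missing environment variables: ", paste(missing, collapse = ", "))
  }
  # Return a list named consumer_key, consumer_secret, access_token, access_secret.
  setNames(as.list(unname(values)), names(vars))
}

# Usage (requires the twitteR package and valid credentials):
# creds <- read_credentials()
# setup_twitter_oauth(creds$consumer_key, creds$consumer_secret,
#                     creds$access_token, creds$access_secret)
```

The returned list can then be passed to the authentication call of the twitteR package, as indicated in the commented usage lines.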

3.3 Pre-processing of extracted tweets

After the tweets are retrieved, a sentiment analysis tool can be applied directly to the
raw tweets, but in most cases this results in very poor performance.
Pre-processing techniques are therefore necessary to obtain better results. The
extracted tweets, i.e. short messages from Twitter, serve as raw data, which is
pre-processed in the following steps:

3.3.1 Cleaning text

Text is cleaned by removing unnecessary data from the Twitter data set, such as
HTML tags, emoticons, extra white space, and numbers.
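
The cleaning step can be sketched with base R regular expressions. This is an illustrative sketch under our own assumptions, not the exact code used in this project: the function name clean_text() is hypothetical, and emoticons are approximated by dropping every character outside printable ASCII.

```r
# Remove HTML tags, non-ASCII symbols (e.g. emoji), numbers and extra white space.
clean_text <- function(x) {
  x <- gsub("<[^>]+>", " ", x, perl = TRUE)          # HTML tags
  x <- gsub("[^\\x20-\\x7E]", " ", x, perl = TRUE)   # anything outside printable ASCII
  x <- gsub("[0-9]+", " ", x, perl = TRUE)           # numbers
  x <- gsub("\\s+", " ", x, perl = TRUE)             # collapse runs of white space
  trimws(x)
}

cleaned <- clean_text("<b>GST</b>  rate is   18 percent")
print(cleaned)
```

The function is vectorised, so it can be applied to a whole character vector of tweets at once.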

3.3.2 Filtering
Filtering is a further cleaning of the raw data. In this step, URL links, special words
in Twitter (e.g. “RT”, which means retweet), user names in Twitter (e.g. @abhi, where
the @ symbol indicates a user name), and emoticons are removed.
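
A minimal sketch of this filtering step in base R follows. The function name filter_tweet() and the sample tweet, including its example.com link, are hypothetical and serve only as an illustration.

```r
# Remove URL links, the retweet marker "RT" and @user names from a tweet.
filter_tweet <- function(x) {
  x <- gsub("https?://\\S+", " ", x, perl = TRUE)  # URL links
  x <- gsub("\\bRT\\b", " ", x, perl = TRUE)       # retweet marker
  x <- gsub("@\\w+", " ", x, perl = TRUE)          # user names
  x <- gsub("\\s+", " ", x, perl = TRUE)           # collapse runs of white space
  trimws(x)
}

filtered <- filter_tweet("RT @abhi GST rollout today https://example.com/gst")
print(filtered)
```

URLs are removed first so that an "@" or "RT" inside a link is not matched separately.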

3.3.3 Removal of Stop-words

Articles such as “a”, “an”, “the” and other stop-words such as “to”, “of”, “is”,
“are”, “this”, “for” are removed in this step.
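
Stop-word removal can be sketched as a simple set difference on the words of a tweet. The list below contains only the stop-words named above; a real list would be much longer, and the function name remove_stop_words() is hypothetical.

```r
# The stop-words named in the text; a complete list would be longer.
stop_words <- c("a", "an", "the", "to", "of", "is", "are", "this", "for")

# Lower-case the text, split it on white space and drop every stop-word.
remove_stop_words <- function(x, stop_words) {
  words <- strsplit(tolower(x), "\\s+")[[1]]
  paste(words[!(words %in% stop_words)], collapse = " ")
}

kept <- remove_stop_words("This is the launch of GST", stop_words)
print(kept)
```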

3.3.4 Construction of n-grams

A set of n-grams is constructed from consecutive words. Negation words such as
“no” and “not” are attached to the word which follows or precedes them.
For instance, “I do not like remix music” yields the bigrams “I do+not”,
“do+not like”, “not+like remix”, and “remix music”. This procedure improves the
accuracy of the classification because negation plays an important role in
sentiment analysis. Negation needs to be taken into account because it is a
very common linguistic construction that affects polarity.
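
The idea of negation-aware bigrams can be sketched in base R as follows. This is a simplified variant, not the exact scheme described above: it attaches a negation word only to the word that follows it, and the function names attach_negation() and make_bigrams() are hypothetical.

```r
# Attach each negation word to the word that follows it, e.g. "not like" -> "not+like".
attach_negation <- function(words, negations = c("no", "not")) {
  out <- character(0)
  i <- 1
  while (i <= length(words)) {
    if (words[i] %in% negations && i < length(words)) {
      out <- c(out, paste0(words[i], "+", words[i + 1]))
      i <- i + 2
    } else {
      out <- c(out, words[i])
      i <- i + 1
    }
  }
  out
}

# Build bigrams from consecutive tokens.
make_bigrams <- function(tokens) {
  if (length(tokens) < 2) return(character(0))
  paste(head(tokens, -1), tail(tokens, -1))
}

words   <- strsplit(tolower("I do not like remix music"), "\\s+")[[1]]
tokens  <- attach_negation(words)
bigrams <- make_bigrams(tokens)
print(bigrams)
```

Because "not" travels with "like", a lexicon lookup on the token "not+like" no longer counts "like" as a positive word.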

3.4 Lexical Analysis

Lexical analysis may be carried out using a lexicon-based approach, which uses a set of
positive and negative words. A database created by Hu and Liu, containing 2006 positive
and 4783 negative sentiment words, is loaded into R; the words in the tweets are
compared with the words in the database and the sentiment is predicted.
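
Loading such a word list into R can be sketched as below. This is a minimal sketch that assumes each list is a plain text file with one word per line and comment lines starting with ";"; the file names in the usage comment and the function name load_lexicon() are assumptions.

```r
# Read a word list: one word per line, skipping blank lines and ";" comment lines.
load_lexicon <- function(path) {
  lines <- trimws(readLines(path, warn = FALSE))
  lines <- lines[lines != "" & !startsWith(lines, ";")]
  tolower(lines)
}

# Usage (assumed file names):
# pos_words <- load_lexicon("positive-words.txt")
# neg_words <- load_lexicon("negative-words.txt")
```
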
3.5 Calculating sentiment score

Using a scoring function, the score of every tweet is calculated with the Hu and Liu lexicon.
Sentiment Score = Σ positive words – Σ negative words

Polarity types

(i) Positive polarity - The number of positive words is greater than the number of
negative words.
(ii) Negative polarity - The number of negative words is greater than the number of
positive words.
(iii) Neutral polarity - The numbers of positive and negative words are the same, or
there are no opinion words at all.
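
The score and the three polarity types can be sketched in a few lines of base R. The two word lists below are tiny hypothetical stand-ins for the full lexicon, and the function names sentiment_score() and polarity() are ours.

```r
# Tiny hypothetical word lists standing in for the full lexicon.
pos_words <- c("good", "benefit", "happy", "hope")
neg_words <- c("bad", "fear", "sorrow", "confusing")

# Score = number of positive words minus number of negative words.
sentiment_score <- function(tweet, pos_words, neg_words) {
  words <- strsplit(tolower(tweet), "[^a-z]+")[[1]]
  sum(words %in% pos_words) - sum(words %in% neg_words)
}

# Map a score to one of the three polarity types.
polarity <- function(score) {
  if (score > 0) "positive" else if (score < 0) "negative" else "neutral"
}

tweets <- c("GST brings hope and benefit",
            "Fear and sorrow over GST",
            "GST starts today")
scores <- sapply(tweets, sentiment_score, pos_words, neg_words, USE.NAMES = FALSE)
labels <- sapply(scores, polarity)
print(data.frame(tweet = tweets, score = scores, polarity = labels))
```

A tweet without any opinion words receives a score of zero and is therefore labelled neutral, which matches case (iii).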

3.6 Visualization

The results of sentiment analysis can be visualized graphically in RStudio, since a rich
set of graphical packages is available in R. In this paper, word clouds, pie charts and
bar charts are used to represent the outcomes of the sentiment analysis.
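
A bar chart and a pie chart of polarity counts can be drawn with base R graphics, as sketched below. The polarity vector is made-up sample data, not a result of this study, and word clouds are left out because they need an additional package.

```r
# Made-up polarity labels for six tweets (illustration only, not project results).
polarity <- c("positive", "negative", "positive", "neutral", "positive", "negative")

# Count each class in a fixed order and convert the counts to percentages.
counts <- table(factor(polarity, levels = c("positive", "negative", "neutral")))
shares <- round(100 * prop.table(counts), 1)

# Bar chart of the counts.
barplot(counts, main = "Sentiment of tweets", xlab = "Polarity",
        ylab = "Number of tweets", col = c("green", "red", "grey"))

# Pie chart labelled with the percentage of each class.
pie(counts, labels = paste0(names(counts), " (", shares, "%)"),
    main = "Share of each polarity", col = c("green", "red", "grey"))
```

When the script is run non-interactively, R writes the charts to a PDF file instead of opening a plot window.
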
Chapter 4 | Results and Discussion

4.1 Extracting GST Tweets

Before mining any data from Twitter using its APIs, we have to authenticate with Twitter
using an application created on Twitter. Once the application is created, we obtain the
consumer key, consumer secret, access token, and access secret, with which the API
client authenticates itself with the Twitter authentication server.

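library(twitteR)  # provides setup_twitter_oauth() and searchTwitter()

# Credentials issued for the Twitter application
consumer_key <- "6mSYUluKJbpqgIIhbVTL3JpMK"
consumer_secret <- "GxuDV8KkkKrpslrQk6Xk5jhs9pZitDuzEO56tliKa513in2Fkb"
access_token <- "981146936077008896-v5URCf1uPUUYawCwJkaJYxab74mCu1C"
access_secret <- "xkgwj520PdKVYd7giqF18pbB7iAdKglxjyDuGx8x2Uv6S"

# Authenticate this R session with the Twitter API
setup_twitter_oauth(consumer_key, consumer_secret, access_token, access_secret)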

4.2 Access twitter data sets

Once the API client is authenticated with the Twitter authentication service, a token is
generated and made available for every transaction with the Twitter server. Using this token,
tweets are mined using hashtags. We use the searchTwitter() function to access the data. In
this work we extract 1000 tweets on GST [8].

tweets <- searchTwitter("GST", n = 1000, since = NULL, until = NULL, lang = "en")


Figure 4.1: Word-Cloud of GST (Before Implementation of GST)

Figure 4.2: Sentiment of people day before the implementation of GST

4.3 Classification of tweets

The polarity operation is applied to the pre-processed data set, which contains
cleaned data with bigram features. The polarity function generates a sentiment
score for each tweet, indicating whether the tweet is negative or positive, and from
these scores we identify the positive and negative reactions of the public [7].

The earlier word-cloud and bar-graph represent the sentiment of people before the
implementation of GST. The word-cloud and bar-graph below represent the
sentiment of people after GST came into effect.

Figure 4.3: Word-cloud of GST (after GST came into effect)

Figure 4.4: Bar-graph of the sentiment of people after the implementation of GST
Figure 4.5: Pie-chart of the result

4.4 Discussion on the result

Before the implementation of GST, people were fearful about it. Many people did not
know what the outcomes of GST would be, and they expressed fear and sorrow.

The condition of people can be seen easily in the bar-graph. More than 60% of people
had a negative sentiment about GST. Some sections of society expressed joy, as they
saw some hope of benefit from GST.

After the implementation of GST, however, the happiness of the majority of people is
visible. About 65% of people have shown a positive sentiment about the implementation
of GST. Some sections of society are still in sorrow and fear [10].

The Government is making many changes to GST so that every section of society gets
equal benefit from it.
Chapter 5 | Summary and Conclusion

5.1 Conclusion

The increasing use of social media websites by Internet users has raised interest in
the opportunity to understand the relation between people’s preferences and actual political
This study focuses on the question of whether the data from social networking sites can be
utilized to interpret the attitude of the citizens of a nation towards various policies.
We analysed 3,000 Twitter messages mentioning the keyword “GST” over two days, viz. one
day prior to the announcement and the day of the announcement. We have observed that
Twitter is very commonly used as a platform for deliberation by the citizens of India. We
conclude that social media is a powerful and reliable source of public opinion as far as
a nation like India is concerned [6]. The discussions on Twitter are equivalent to traditional
discussions and are capable of giving a fair idea of the emotions of the general public. Our
sentiment analysis of people’s emotions shows acceptance of GST, but with a strong
feeling of anticipation.

5.2 Future Recommendation

In future, we plan to extend this analysis to real time, processing tweets as they arrive
over time. We also want to extend our work to analysing the stock market and the
effect of GST on it.
