"Sentiment Analysis of Twitter Data On GST": Submitted in Partial Fulfillment of The Requirements For The Degree of
Bachelor of Technology
By
DECEMBER 2018
CERTIFICATE
in partial fulfilment of the requirements for the award of the degree of Bachelor of
Technology in the discipline of Computer Science and Engineering is a bona fide record of
work carried out under my (our) guidance and supervision at the School of Computer Science and
Engineering, KIIT University.
Signature of Supervisor:
Dr. Manoj Kumar Mishra
School of Computer Science and
Engineering
KIIT University
DECEMBER 2018
ACKNOWLEDGEMENT
Working on this project was interesting. However, it would not have been possible without the
kind support and guidance of everyone who gave us the opportunity to extend our
knowledge and research in the field. We would like to extend our sincere thanks to all of them.
We are highly indebted to Dr. Manoj Kumar Mishra for guidance and supervision, as well as
for providing all the necessary information regarding the project work. We would like to express
our gratitude to the members of KIIT University for always co-operating with and encouraging
us in the completion of this project, and our special thanks for
giving us such attention and time.
Abhishek Kumar
Asmita Mukherjee
Abhishek Chanda
Abstract
This project addresses the problem of sentiment analysis on Twitter, that is, classifying tweets
according to the sentiment expressed in them: positive, negative or neutral. Twitter
is an online micro-blogging and social-networking platform which allows users to write short
status updates with a maximum length of 140 characters. It is a rapidly expanding service with over
200 million registered users.
The growing popularity of social media has created an opportunity to explore and track
the response to new reforms and policies in India. Many researchers have analysed the
tweets posted by a nation's citizens on Twitter, a microblogging website where users read
and write millions of tweets on a variety of topics every day. Our goal is not only to
classify the reactions by sentiment, but also to predict whether upcoming
tweets on this issue will strike a positive or a negative note[1].
In this paper, Twitter is used as a forum to understand the sentiments of the citizens of
India towards the Goods and Services Tax (GST) launched by the Indian Government on 1 July
2017. The R language is used to extract tweets from Twitter, and a
machine learning algorithm such as Naïve Bayes is used to analyse the Twitter data stream.
Tweets originating in India before the implementation of GST have been analysed,
as have tweets posted after its implementation. We then
compared the two sets of tweets on the basis of anger, anticipation, disgust, fear, joy, sadness and
surprise[1,2].
Keywords: Sentiment Analysis, Twitter, Word cloud, GST, Review, Opinion Mining
TABLE OF CONTENTS
1. Introduction
1.1 Motivation
1.2 Domain Introduction
1.3 Aim of our work
1.4 Scope of our work
2. Review of Literature
2.1 Related work
6. References
Chapter 1 | Introduction
Goods and Services Tax (GST) is an indirect tax applicable throughout India which replaced
multiple taxes levied by the central and state governments. It was introduced as the
Constitution (One Hundred and First Amendment) Act 2016, following the passage of the
Constitution 122nd Amendment Bill. The GST is governed by a GST Council, whose
Chairman is the Finance Minister of India. Under GST, goods and services are taxed at the
following rates: 0%, 5%, 12%, 18% and 28%[3].
There is a special rate of 0.25% on rough precious and semi-precious stones and 3% on gold.
The Goods and Services Tax (GST), India's biggest tax reform in 70 years of independence,
was launched at midnight on 30 June 2017 [2a] by the Prime Minister of India, Narendra
Modi. The launch was marked by a historic midnight (30 June to 1 July 2017) session of both
houses of Parliament, convened in the Central Hall of Parliament[4].
1.1 Motivation
This project addresses sentiment analysis of Twitter data. We classify
tweets according to the sentiment expressed in them: positive, negative or neutral. We have
chosen to work with Twitter because we feel it is a better approximation of public sentiment than
conventional internet articles and web blogs. The reason is that the amount of
relevant data is much larger for Twitter than for traditional blogging sites. Moreover,
the response on Twitter is more prompt and also more general. Sometimes the result may differ
from the known facts, but in the majority of cases it is correct[2].
Our aim is to use Twitter data to understand people's sentiments before GST was introduced
and after it was introduced. The result will indicate how effective the
implementation of GST has been in our country.
The output of our project will help people gain more insight into GST and
analyse its advantages and disadvantages on the basis of the sentiments found.
Chapter 2 | Review of Literature
At the sentence level, the task is to determine whether each sentence
expresses a positive, neutral or negative opinion. The first step is to identify whether
the sentence is subjective or objective.
At the document level, the task is to determine whether a whole opinion document expresses
a positive, negative or neutral opinion.
Neither sentence-level nor document-level analysis discovers what exactly people
liked and did not like. Instead of looking at language constructs, aspect-level analysis directly
looks at the opinion itself.
After the tweets are retrieved, a sentiment analysis tool can be applied to the raw tweets, but in most
cases this gives very poor performance.
Pre-processing techniques are therefore necessary to obtain better results. We
extract tweets, i.e. short messages, from Twitter and use them as raw data. This raw
data needs to be pre-processed. Pre-processing involves the following steps:
3.3.2 Filtering
Filtering is the cleaning of raw data. In this step, URL links (e.g.
http://twitter.com), special Twitter words (e.g.
“RT”, which means ReTweet), Twitter user names (e.g. @abhi, where the @ symbol
indicates a user name) and emoticons are removed.
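The filtering step can be sketched in base R as follows. This is a minimal illustration and not the authors' actual code: the helper name clean_tweet is hypothetical, and the emoticon pattern is an assumption that covers only a few simple ASCII emoticons such as :) and ;-(.

```r
# Hypothetical helper: removes the Twitter-specific noise described in the filtering step.
clean_tweet <- function(text) {
  text <- gsub("https?://\\S+", " ", text, perl = TRUE)   # URL links
  text <- gsub("\\bRT\\b", " ", text, perl = TRUE)         # retweet marker
  text <- gsub("@\\w+", " ", text, perl = TRUE)            # user names
  text <- gsub("[:;=]-?[()DPp]", " ", text, perl = TRUE)   # simple ASCII emoticons (assumed set)
  text <- gsub("\\s+", " ", text, perl = TRUE)             # collapse repeated whitespace
  trimws(text)
}

raw <- "RT @abhi GST is live :) http://twitter.com"
cleaned <- clean_tweet(raw)
print(cleaned)
```

Because gsub() is vectorised, the same function can be applied to a whole character vector of tweets in one call. The emoticon pattern is deliberately narrow, so it may also match a colon followed by a capital letter inside ordinary text; a production filter would need a more careful rule.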
Lexical analysis may be carried out using a lexicon-based approach, which uses a set of
positive and negative words. A lexicon compiled by Hu and Liu, containing 2006 positive
and 4783 negative sentiment words, is loaded into R; the words in each tweet are
compared with the words in the lexicon and the sentiment is predicted.
3.5 Calculating sentiment score
Using a scoring function, the score of every tweet is calculated with the Hu and Liu lexicon.
Sentiment Score = Σ positive words – Σ negative words
Polarity types
(i) Positive polarity - The number of positive words is greater than the number of negative
words.
(ii) Negative polarity - The number of negative words is greater than the number of
positive words.
(iii) Neutral polarity - The numbers of positive and negative words are equal, or there are
no opinion words at all.
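The scoring rule and the three polarity types can be sketched in base R as shown here. The two short word lists are toy stand-ins chosen for illustration, not the actual lexicon, and the function names score_tweet and polarity are hypothetical.

```r
# Toy stand-ins for the positive and negative word lists (assumption for illustration).
positive_words <- c("good", "great", "benefit", "happy")
negative_words <- c("bad", "fear", "burden", "confusing")

# Sentiment score = number of positive words minus number of negative words.
score_tweet <- function(text, pos = positive_words, neg = negative_words) {
  words <- unlist(strsplit(tolower(text), "[^a-z]+"))
  words <- words[nzchar(words)]            # drop empty tokens
  sum(words %in% pos) - sum(words %in% neg)
}

# Map a score to one of the three polarity types.
polarity <- function(score) {
  if (score > 0) "positive" else if (score < 0) "negative" else "neutral"
}

tweets <- c("GST is a great benefit",
            "GST is confusing and a burden",
            "GST starts today")
scores <- vapply(tweets, score_tweet, integer(1), USE.NAMES = FALSE)
labels <- vapply(scores, polarity, character(1))
print(data.frame(tweet = tweets, score = scores, polarity = labels))
```

A tweet with no opinion words scores zero and is therefore labelled neutral, which matches case (iii).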
3.6 Visualization
Before mining any data from Twitter using its APIs, we have to authenticate with Twitter
using an application created on Twitter. Once the application is created, we get access
to a consumer key, consumer secret, access token and access secret, with which the API client
authenticates itself with the Twitter authentication server.
# Credentials issued for the Twitter application
consumer_key <- "6mSYUluKJbpqgIIhbVTL3JpMK"
consumer_secret <- "GxuDV8KkkKrpslrQk6Xk5jhs9pZitDuzEO56tliKa513in2Fkb"
access_token <- "981146936077008896-v5URCf1uPUUYawCwJkaJYxab74mCu1C"
access_secret <- "xkgwj520PdKVYd7giqF18pbB7iAdKglxjyDuGx8x2Uv6S"
Once the API client is authenticated with the Twitter authentication service, a token is generated and
made available for every transaction with the Twitter server. Using this token, tweets
are mined by hashtag. We use the searchTwitter() function to access the data. In this work we
extract 1000 tweets on GST[8].
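The authentication and retrieval steps might look like the sketch below, assuming the twitteR package (which provides searchTwitter()) is installed. The wrapper name fetch_gst_tweets, the environment variable names, the "#GST" query string and the lang = "en" filter are all assumptions made for illustration; the report does not state them.

```r
# Hypothetical wrapper around twitteR; nothing contacts Twitter until it is called.
fetch_gst_tweets <- function(n = 1000, query = "#GST") {
  # Assumed convention: credentials are read from environment variables
  # rather than being written into the script.
  twitteR::setup_twitter_oauth(
    Sys.getenv("TWITTER_CONSUMER_KEY"),
    Sys.getenv("TWITTER_CONSUMER_SECRET"),
    Sys.getenv("TWITTER_ACCESS_TOKEN"),
    Sys.getenv("TWITTER_ACCESS_SECRET")
  )
  tweets <- twitteR::searchTwitter(query, n = n, lang = "en")  # list of status objects
  twitteR::twListToDF(tweets)                                   # one row per tweet
}
```

Calling fetch_gst_tweets() would return a data frame whose text column holds the tweet bodies, ready for pre-processing. Reading the keys from the environment keeps them out of source files that may later be shared.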
The polarity operation is applied to the pre-processed data set, which contains
cleaned data with bigram features. The polarity function generates a sentiment
score for each tweet, indicating whether the tweet is negative or positive, which tells us what the
public regards as the positives and negatives[7].
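Bigram features are pairs of adjacent words. The base R sketch below shows one simple way to build them from a cleaned tweet; the function name make_bigrams is hypothetical and the tokenisation (lower-casing and splitting on non-letters) is an assumption, since the report does not describe how its bigrams were formed.

```r
# Hypothetical helper: returns the adjacent word pairs of a piece of text.
make_bigrams <- function(text) {
  words <- unlist(strsplit(tolower(text), "[^a-z]+"))
  words <- words[nzchar(words)]                 # drop empty tokens
  if (length(words) < 2) return(character(0))   # no pair can be formed
  paste(words[-length(words)], words[-1])       # word i joined with word i + 1
}

bigrams <- make_bigrams("GST is not good")
print(bigrams)
```

Pairs such as "not good" keep a negation attached to the word it modifies, which single-word features cannot do.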
The earlier word-cloud and bar-graph represent the sentiment of people before the
implementation of GST. The word-cloud and bar-graph below represent the
sentiment of people after GST came into effect.
Figure 4.3: Word-cloud of GST (after GST came into effect)
Figure 4.4: Bar-graph of the sentiment of people after the implementation of GST
Figure 4.5: Pie-chart of the result
Before the implementation of GST, people were fearful of it. Many people did not
know what the outcomes of GST would be, and so they felt fear and sorrow.
This can easily be seen in the bar-graph: more than 60% of people
had a negative sentiment about GST. Some sections of society felt joy, as they
saw some hope of benefiting from GST.
After the implementation of GST, however, the happiness of the majority of people is
visible: about 65% of people showed a positive sentiment about the implementation
of GST. Some sections of society are still in sorrow and fear[10].
The Government is making many changes to GST so that every section of society gets
equal benefit from it.
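Percentages of this kind can be derived from per-tweet polarity labels, as in the base R sketch below. The label vectors are made-up examples for illustration only and are not the study's data; the function name polarity_share is hypothetical.

```r
# Hypothetical helper: percentage of tweets in each polarity class.
polarity_share <- function(labels, classes = c("positive", "negative", "neutral")) {
  counts <- vapply(classes, function(cl) sum(labels == cl), integer(1))
  round(100 * counts / length(labels), 1)
}

# Made-up labels for illustration only (NOT the results reported here).
before <- c("negative", "negative", "negative", "positive", "neutral")
after  <- c("positive", "positive", "positive", "negative", "neutral")

share_before <- polarity_share(before)
share_after  <- polarity_share(after)
print(rbind(before = share_before, after = share_after))
```

The resulting named vectors can be passed to base graphics functions such as barplot() or pie() to produce charts like those in the figures.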
Chapter 5 | Summary and Conclusion
5.1 Conclusion
The increasing use of social media websites by Internet users has raised interest in
the opportunity to understand the relation between people’s preferences and actual political
behaviour.
This study focuses on the question of whether data from social networking sites can be
used to interpret the attitude of a nation's citizens towards various policies.
We analysed 3,000 Twitter messages mentioning the keyword “GST” over two days, viz. one
day prior to the announcement and the day of the announcement. We observed that
Twitter is very commonly used as a platform for deliberation by citizens of India. We
conclude that social media is a powerful and reliable source of public opinion as far as
a nation like India is concerned[6]. The discussions on Twitter are equivalent to traditional
discussions and are capable of giving a fair idea of the emotions of the general public. Our
sentiment analysis of people’s emotions shows acceptance of GST, but
with a strong feeling of anticipation.
In future, we plan to run this analysis in real time as tweets arrive.
We also want to extend our work to analysing the stock market and the
effect of GST on it.
Chapter 6 | References
1. The Constitution (One Hundred and First Amendment) Act, Amendment No. 101 of Retrieved from Internet, 2016, 8.
2. India's midnight 'tryst with destiny': GST rolled out. Dynamite News.com. (ANI). Retrieved from Internet, 2017.
5. Congress To Boycott GST Launch, Arun Jaitley Suggests Broader Shoulders, NDTV, 29, 2017.
6. Gloor PA, Krauss J, Nann S, Fischbach K, Schoder D. Web science 2.0: Identifying trends through semantic social network analysis. CSE. International Conference on Computational Science and Engineering, Vancouver, BC, 2009; 4:215-222.
7. Barbera P. Birds of the same feather tweet together. Bayesian ideal point estimation using Twitter data, 2012.