ARTICLE 370: R Studio Case Study


PROJECT REPORT ON

TWITTER SENTIMENT ANALYSIS ON


Is Scrapping of Article 370 Justified?

Submitted by:
Ayushi Agarwal 19070243003
Varun Shrivastava 19070243023

Symbiosis Institute of Geoinformatics, Pune


MSc Data Science and Spatial Analytics
(2019-2021)

Project Guide:
Om Prakash Lalchandani

INTRODUCTION

After weeks of intense speculation about the situation in Jammu and Kashmir, the
Narendra Modi government finally revealed its cards. On 5th August 2019, Home Minister
Amit Shah announced in Parliament the revocation of Article 370.
The announcement triggered a surge in online activity, especially on Twitter, where people
share their views and grievances. In this project, we analyse what people posted on
Twitter about the decision, using data science techniques. This analysis helps us
understand whether Twitter users around the world viewed the decision as good or bad.

WHAT IS TWITTER SENTIMENT ANALYSIS?

Twitter is an online news and social networking service that enables users to send and read
short messages called "tweets" (originally limited to 140 characters; the limit was raised
to 280 in November 2017). Because most tweets are public, Twitter is a rich source of public
opinion from people all over the world and of all age groups.

Sentiment analysis, a subfield of Natural Language Processing (NLP), is the process of
determining the emotional tone behind a series of words. It is used to understand the
attitudes, opinions and emotions expressed in an online mention.

Applied to Twitter, text analysis provides a way to track the opinions and attitudes
expressed on the web and to determine whether a topic is received positively or negatively.

ARTICLE 370

Article 370 of the Indian Constitution was a “temporary provision” that granted special
autonomous status to Jammu and Kashmir. As a result, many constitutional provisions that
applied to other states, including those related to citizenship, ownership of property and
fundamental rights, did not apply in the same way to J&K.

The project follows the phases of a data science project:

1. REQUIREMENTS UNDERSTANDING

We analyse the sentiment of tweets that contain the hashtag #article370.
The objective is to find out whether Twitter users consider the scrapping of Article 370
good or bad for India. The code is divided into the following parts:

• Extracting tweets using a Twitter application
• Cleaning the tweets for further analysis
• Getting sentiment scores and plotting word frequencies
• Building a word cloud and performing sentiment analysis

We use the R programming language to perform the text mining. The tweets are collected
directly through the Twitter API.

2. ANALYSIS AND FEASIBILITY STUDY

Since a Twitter developer application gives direct access to tweets through the API, it is
easy to collect the data, and a large set of R packages is available to analyse it.
Also, R Studio is open-source software that provides an easy interface to statistical
computing and analysis.
Thus the project is both economically and technically feasible.

3. DATA PREPARATION – cleaning, formatting

Tools and Packages used:


twitteR: provides an interface to the Twitter web API
ROAuth: provides an interface to the OAuth 1.0 specification, which allows users to
connect to the server and authenticate themselves.
stringr: makes string functions more consistent, simpler and easier to use. It does this by
ensuring that function and argument names are consistent and that all functions handle NA
values and zero-length character vectors appropriately.
ggplot2: helps build plots step by step.
tm: framework for text mining applications inside R
wordcloud: creates attractive word clouds from text-mining results.
sentiment: package with tools for sentiment analysis, covering positive/negative polarity
and emotion classification
RCurl: provides functions to compose general HTTP requests and to fetch URIs
RJSONIO: package that allows conversion to and from data in JSON format

The steps taken are:

Creating a Twitter application:

• First, we create a developer account to access the services and facilities of Twitter.
• Then we create an app in Twitter and generate a key and a secret key to authenticate
ourselves.
• Next, we invoke the Twitter API through the app we created, using the keys and access
tokens it provides.
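Packages such as ROAuth and twitteR sign every API request with these keys behind the scenes. To show what that authentication involves, here is a minimal Python sketch (the report itself used R) of the OAuth 1.0 HMAC-SHA1 request signature described in RFC 5849. The endpoint URL, query parameters and secrets are placeholders chosen for illustration; a real request also signs the full set of oauth_* parameters.

```python
import base64
import hashlib
import hmac
from urllib.parse import quote


def percent_encode(value: str) -> str:
    # RFC 5849 section 3.6: only unreserved characters (A-Z a-z 0-9 - . _ ~) stay unencoded
    return quote(value, safe="~")


def signature_base_string(method: str, url: str, params: dict) -> str:
    # Encode each key and value, sort by encoded key (then value), join as key=value pairs
    pairs = sorted((percent_encode(k), percent_encode(v)) for k, v in params.items())
    param_string = "&".join(f"{k}={v}" for k, v in pairs)
    # Base string = METHOD & encoded base URL & encoded parameter string
    return "&".join([method.upper(), percent_encode(url), percent_encode(param_string)])


def hmac_sha1_signature(base_string: str, consumer_secret: str, token_secret: str = "") -> str:
    # The signing key joins both secrets with '&', even when the token secret is empty
    key = f"{percent_encode(consumer_secret)}&{percent_encode(token_secret)}"
    digest = hmac.new(key.encode("utf-8"), base_string.encode("utf-8"), hashlib.sha1).digest()
    return base64.b64encode(digest).decode("ascii")


# Placeholder values only (hypothetical secrets). A real request would also put
# oauth_consumer_key, oauth_nonce, oauth_timestamp, oauth_token,
# oauth_signature_method and oauth_version into params before signing.
base = signature_base_string(
    "GET",
    "https://api.twitter.com/1.1/search/tweets.json",
    {"q": "#article370", "count": "100"},
)
signature = hmac_sha1_signature(base, "CONSUMER_SECRET", "ACCESS_TOKEN_SECRET")
print(base)
print(signature)
```

The resulting signature is sent in the Authorization header alongside the other oauth_* values; the client package takes care of this on every call.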

Extracting and saving tweets:

Once authorization is complete, we can extract tweets and save them in data frame
format for further analysis. We tried to fetch 10,000 tweets containing the hashtag
#article370, starting from 5th August 2019 (the date of the announcement).
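The search API returns results one page at a time (at most 100 tweets per request on the v1.1 standard search endpoint), so a request for 10,000 tweets has to page backwards through the results with the max_id parameter. Client packages usually do this internally. The Python sketch below illustrates the idea and is not the report's R code; fetch_page and fake_fetch are hypothetical stand-ins for the real HTTP call.

```python
from typing import Any, Callable, Dict, List, Optional

Tweet = Dict[str, Any]
# Hypothetical page fetcher: (max_id, count) -> tweets ordered newest first
PageFetcher = Callable[[Optional[int], int], List[Tweet]]


def collect_tweets(fetch_page: PageFetcher, target: int, page_size: int = 100) -> List[Tweet]:
    collected: List[Tweet] = []
    max_id: Optional[int] = None  # None means "start from the newest tweets"
    while len(collected) < target:
        page = fetch_page(max_id, min(page_size, target - len(collected)))
        if not page:
            break  # no older matching tweets are available
        collected.extend(page)
        # The next request asks only for tweets strictly older than the oldest one seen
        max_id = min(int(t["id"]) for t in page) - 1
    return collected


# Fake fetcher standing in for the search endpoint: only 250 matching tweets exist
FAKE_IDS = list(range(250, 0, -1))


def fake_fetch(max_id: Optional[int], count: int) -> List[Tweet]:
    eligible = [i for i in FAKE_IDS if max_id is None or i <= max_id]
    return [{"id": i, "text": f"tweet {i}"} for i in eligible[:count]]


tweets = collect_tweets(fake_fetch, target=10_000)
print(len(tweets))  # can be fewer than requested when the search runs out of results
```

Stopping on an empty page matters: a search can hold fewer matching tweets than the number requested.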

Next, we convert these tweets into a data frame using the function twListToDF,
which makes them easier to understand and work with.
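twListToDF turns a list of tweet objects into a data frame with one row per tweet and one column per field. A rough Python analogue is sketched below. The field names are loosely modelled on twListToDF columns such as text, screenName and retweetCount, the sample tweets are invented, and writing the table out as CSV is an assumption about how the data might be saved.

```python
import csv
import io
from typing import Any, Dict, List


def statuses_to_columns(statuses: List[Dict[str, Any]], fields: List[str]) -> Dict[str, List[Any]]:
    # One list per field, as in a data frame; a missing field becomes None (R would use NA)
    return {field: [status.get(field) for status in statuses] for field in fields}


def columns_to_csv(columns: Dict[str, List[Any]]) -> str:
    # Write the column-oriented table as CSV text: a header row, then one row per tweet
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(columns.keys())
    writer.writerows(zip(*columns.values()))
    return buffer.getvalue()


# Invented sample records; the second one lacks retweetCount on purpose
statuses = [
    {"id": 2, "text": "Historic day", "screenName": "user_a", "retweetCount": 5},
    {"id": 1, "text": "Worried about the curfew", "screenName": "user_b"},
]
table = statuses_to_columns(statuses, ["id", "text", "screenName", "retweetCount"])
print(columns_to_csv(table))
```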

Data Cleaning:

We build a corpus of the tweets using the function Corpus from the tm package. The
corpus must be cleaned before analysis: we remove stop words, punctuation and numbers,
strip extra white space, and convert the text to lower case. These elements carry little
or no emotional content.
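In tm these steps correspond to transformations applied with tm_map, such as content_transformer(tolower), removePunctuation, removeNumbers, removeWords with stopwords("english"), and stripWhitespace. The Python sketch below mimics that sequence on a single tweet for illustration; its stop-word list is a tiny hypothetical one, not tm's list.

```python
import re
import string

# Deliberately tiny, hypothetical stop-word list; tm's stopwords("english") is far longer
STOP_WORDS = {"the", "is", "a", "an", "of", "to", "and", "in", "on", "for", "this", "it"}


def clean_text(text: str) -> str:
    text = text.lower()                                               # convert to lower case
    text = text.translate(str.maketrans("", "", string.punctuation))  # remove punctuation
    text = re.sub(r"\d+", "", text)                                   # remove numbers
    words = [w for w in text.split() if w not in STOP_WORDS]          # remove stop words
    return " ".join(words)                                            # split/join also collapses extra spaces


print(clean_text("The decision on Article 370 is HISTORIC!!  "))
```

Order matters: lower-casing first ensures that "The" matches the stop word "the".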

The tweets are cleaned by:
• Removing extra punctuation
• Removing redundant blank spaces
• Removing URLs
• Removing control and special characters
• Removing emoticons
• Removing mentions

The corpus contains the tweet text along with hashtags and URLs. We remove the hashtags
and URLs so that only the main text of each tweet is left for sentiment analysis. To do
this, we write a function and apply it to the corpus.
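One possible version of such a function is sketched below in Python; the report's own function was written in R and is not shown. It uses regular expressions to drop URLs, mentions, hashtags, control characters and emoticons, and to collapse repeated punctuation and blank spaces. Treating every non-ASCII character as an emoticon is a crude simplifying assumption, since it also removes non-English letters.

```python
import re

URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#\w+")
CONTROL_RE = re.compile(r"[\x00-\x1f\x7f]")     # tabs, newlines and other control characters
NON_ASCII_RE = re.compile(r"[^\x00-\x7f]")      # crude emoticon/emoji filter (assumption)
REPEATED_PUNCT_RE = re.compile(r"([!?.,])\1+")  # collapse runs such as "!!!" to "!"
SPACES_RE = re.compile(r"\s+")


def strip_tweet_noise(tweet: str) -> str:
    text = URL_RE.sub(" ", tweet)      # remove URLs first so their slashes and dots go too
    text = MENTION_RE.sub(" ", text)   # remove @mentions
    text = HASHTAG_RE.sub(" ", text)   # remove #hashtags
    text = CONTROL_RE.sub(" ", text)
    text = NON_ASCII_RE.sub(" ", text)
    text = REPEATED_PUNCT_RE.sub(r"\1", text)
    return SPACES_RE.sub(" ", text).strip()  # collapse redundant blank spaces


print(strip_tweet_noise("@user Historic day!!! #Article370 https://t.co/abc \u2764\n"))
```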

4. MODEL/PROTOTYPE DEVELOPMENT

Next, the cleaned corpus is transformed into a document-term matrix. A document-term
matrix records how often each word in the corpus occurs in each document: rows
correspond to documents (tweets) and columns to words.
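To make that structure concrete, here is a minimal Python sketch that builds a document-term matrix from already-cleaned tweets, with one row per tweet and one column per word. It illustrates the idea only; tm's DocumentTermMatrix stores the same information in a sparse representation.

```python
from collections import Counter
from typing import List, Tuple


def document_term_matrix(docs: List[str]) -> Tuple[List[str], List[List[int]]]:
    # Count the words in each document (one Counter per row)
    counts = [Counter(doc.split()) for doc in docs]
    # Vocabulary = every distinct word in the corpus, sorted so column order is stable
    vocabulary = sorted(set().union(*counts))
    # Cell [i][j] = how often word j occurs in document i (Counter returns 0 if absent)
    matrix = [[row[term] for term in vocabulary] for row in counts]
    return vocabulary, matrix


vocab, dtm = document_term_matrix(["kashmir india decision", "india article article"])
print(vocab)
for row in dtm:
    print(row)
```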

Once all these steps are complete, the sentiment of each tweet can be scored and
visualized with a histogram and other plots.
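A simple way to obtain such a score is lexicon-based: count the positive words in a tweet, subtract the negative ones, then tally how many tweets receive each score. The Python sketch below assumes a tiny, hypothetical word list and sample tweets; the report relied on R packages and the NRC dictionary for its actual scores.

```python
from collections import Counter
from typing import Iterable

# Hypothetical mini-lexicon for illustration only; it is not the dictionary used in the report
POSITIVE = {"good", "historic", "welcome", "peace", "development"}
NEGATIVE = {"bad", "unconstitutional", "fear", "curfew", "sad"}


def polarity(tweet: str) -> int:
    # Score = number of positive words minus number of negative words
    words = tweet.split()
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


def score_histogram(tweets: Iterable[str]) -> Counter:
    # Map each score to the number of tweets that received it (the bars of a histogram)
    return Counter(polarity(t) for t in tweets)


tweets = ["historic decision welcome", "curfew fear sad", "good day bad news", "article kashmir"]
for score, count in sorted(score_histogram(tweets).items()):
    print(f"{score:+d} {'#' * count}")
```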

Plot word frequencies:

The frequencies of the 10 most frequent words are plotted below.
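The plotted frequencies are the total number of times each word appears across all cleaned tweets, that is, the column sums of the document-term matrix. A minimal Python sketch, using invented sample tweets and a text bar chart in place of ggplot2:

```python
from collections import Counter
from typing import List, Tuple


def top_terms(docs: List[str], n: int = 10) -> List[Tuple[str, int]]:
    # Total count of each word across all cleaned tweets
    freq = Counter(word for doc in docs for word in doc.split())
    return freq.most_common(n)


docs = ["article kashmir india", "article decision kashmir", "article india"]
for word, count in top_terms(docs):
    print(f"{word:<10} {'#' * count}")
```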

The above word cloud shows that the most frequently used words in the tweets are
article, Kashmir, India, decision and so on. The colour and size of each word indicate
its frequency. For example, ‘Article’ occurs more often than any other word, followed by
words about the government's decision on Kashmir.

A word cloud is one of the best tools for visualizing many of the words and terms
contained in the tweets at once.
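In a word cloud, more frequent words are drawn larger. The exact scaling used by the R wordcloud package is not described in this report, so the Python sketch below simply assumes a linear mapping from frequency to font size between a hypothetical minimum and maximum size.

```python
from typing import Dict


def font_sizes(freqs: Dict[str, int], min_size: float = 10.0, max_size: float = 40.0) -> Dict[str, float]:
    # Assumed linear scaling: the rarest word gets min_size, the most frequent gets max_size
    lo, hi = min(freqs.values()), max(freqs.values())
    span = (hi - lo) or 1  # avoid division by zero when all frequencies are equal
    return {word: min_size + (f - lo) / span * (max_size - min_size) for word, f in freqs.items()}


print(font_sizes({"article": 30, "kashmir": 20, "india": 10}))
```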

5. COMMUNICATION / PRESENTATION

Our main aim is to analyse people's sentiment about Article 370. The analysis covers
eight emotions and two sentiments, positive and negative. The eight emotions (anger,
anticipation, disgust, fear, joy, sadness, surprise and trust) and their corresponding
valence are obtained from the NRC dictionary.
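Emotion counts of this kind come from looking up each word in a word-emotion lexicon and adding one to every category the word is tagged with. The Python sketch below shows that counting step with a tiny, hypothetical lexicon shaped like the NRC one (each word maps to a set of categories); its entries are invented for illustration and are not taken from NRC.

```python
from collections import Counter
from typing import Iterable

EMOTIONS = ["anger", "anticipation", "disgust", "fear", "joy",
            "sadness", "surprise", "trust", "negative", "positive"]

# Hypothetical mini-lexicon in the shape of the NRC word-emotion lexicon.
# These entries are illustrative only and are NOT copied from NRC.
LEXICON = {
    "historic": {"anticipation", "positive"},
    "peace": {"joy", "trust", "positive"},
    "fear": {"fear", "negative"},
    "curfew": {"fear", "sadness", "negative"},
    "betrayal": {"anger", "disgust", "sadness", "negative"},
}


def emotion_counts(tweets: Iterable[str]) -> Counter:
    # Start every category at zero so categories with no hits still appear in the bar chart
    counts = Counter({e: 0 for e in EMOTIONS})
    for tweet in tweets:
        for word in tweet.split():
            counts.update(LEXICON.get(word, ()))  # +1 for each category the word carries
    return counts


result = emotion_counts(["historic day peace", "curfew fear"])
for emotion in EMOTIONS:
    print(f"{emotion:<13} {result[emotion]}")
```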

The bar graph above visualizes the various sentiments behind the tweets. As expected,
positive is the highest, followed by trust and negative.

CONCLUSION
Our conclusion is based solely on the 15,412 tweets we pulled from the Twitter API on
12th October 2019.

Reactions were mostly positive, since the positive sentiment has the highest score,
followed by negative and trust. This means a large number of people think that the decision
to revoke Article 370 will bring positive changes. However, an almost comparable number of
people think that it will unnecessarily open Pandora’s box.
So, among the tweets analysed, most users appear to favour scrapping Article 370.

LIMITATIONS
• Not effective in detecting sarcasm
• Sentiment analysis can never be 100% accurate
• Tweets that use the hashtag in an unrelated or wrong context are still included in the
results, and no error or warning is raised.
