ARTICLE 370: R Studio Case Study
Submitted by:
Ayushi Agarwal 19070243003
Varun Shrivastava 19070243023
Project Guide:
Om Prakash Lalchandani
INTRODUCTION
After days and weeks of intense speculation about the situation in Jammu and Kashmir, the
Narendra Modi government finally revealed its cards. On 5th August 2019, Home Minister
Amit Shah announced in Parliament the suspension of Article 370.
Online activity has surged since then, especially on Twitter, where people share their
views and displeasure. Here we analyse what people are posting on Twitter about the
decision, using data science techniques. This analysis helps us understand whether
people across the globe regard the decision as good or bad.
Twitter is an online news and social networking service that enables users to send and read
short messages called "tweets" (originally limited to 140 characters, 280 since November
2017). Twitter is thus a public platform and a mine of opinion from people all over the
world and of all age groups.
Sentiment analysis, a subfield of Natural Language Processing (NLP), is the process of
determining the emotional tone behind a series of words. It is used to gain an
understanding of the attitudes, opinions and emotions expressed within an online mention.
Text analysis of tweets provides a means of tracking feelings and attitudes on the web
and of determining whether a topic is received positively or negatively by users.
ARTICLE 370
Article 370 of the Indian Constitution was a “temporary provision” which granted special
autonomous status to Jammu and Kashmir. This meant that provisions of the constitution
related to citizenship, ownership of property, and fundamental rights that are applicable
to other states did not automatically apply to J&K.
The various phases of a data science project are:
1. REQUIREMENTS UNDERSTANDING
We will analyse the sentiment of tweets that contain the hashtag #article370.
The objective is to find out whether people consider the scrapping of Article 370 good or
bad for India.
The code is divided into the following parts:
We will use the R programming language to perform text mining. Tweets are collected
directly through the Twitter API.
Since the Twitter application is created for the purpose of analysing data, it is easy to
fetch the tweets and analyse them with the large set of packages available.
Also, RStudio is open-source software that provides an easy interface for statistical
computing and analysis.
Thus the project is feasible both economically and technically.
Steps taken are:
Creating Twitter Application:
Next, we will transform those tweets into a data frame using the function twListToDF
(from the twitteR package), which makes them easier to understand and work with.
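The list-to-data-frame step can be pictured with a minimal base R sketch. It does not call the Twitter API or twListToDF; the `tweets` list, its field names and the helper `records_to_df` are hypothetical stand-ins chosen only to show the shape of the transformation.

```r
# Hypothetical stand-in for the list of tweets returned by the API:
# every element is a named list with the same (assumed) fields.
tweets <- list(
  list(text = "Article 370 decision announced #article370",
       screenName = "user_a", retweetCount = 12),
  list(text = "Thoughts on #article370 and Kashmir",
       screenName = "user_b", retweetCount = 3)
)

# Turn each record into a one-row data frame, then stack the rows.
records_to_df <- function(records) {
  rows <- lapply(records, function(r) as.data.frame(r, stringsAsFactors = FALSE))
  do.call(rbind, rows)
}

tweets_df <- records_to_df(tweets)
str(tweets_df)  # one row per tweet, one column per field
```

Once the tweets are in this tabular form, the text column can be pulled out on its own for cleaning.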
Data Cleaning:
We will build a corpus of those tweets using the function Corpus, which is available in
the tm package. The corpus needs to be cleaned for better analysis: we remove stop
words, punctuation and numbers, strip extra white space, and convert the text to
lower case. Basically, these are things that don’t express any emotion.
The tweets are cleaned by removing:
• Extra punctuation
• Redundant blank spaces
• URLs
• Control and special characters
• Emoticons
• Mentions
The corpus contains the tweet text, hashtags, and URLs. We need to remove the hashtags
and URLs so that we are left only with the main tweet text on which to run our sentiment
analysis. Ideally, we write a function for this and apply it to the corpus.
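A cleaning function of this kind might look like the sketch below. It uses base R regular expressions rather than the tm transformations, and the names `clean_tweet`, `remove_stop_words` and the short `stop_words` vector are assumptions made for illustration (a real run would more likely use a full stop-word list such as the one shipped with tm).

```r
# Remove URLs, mentions, hashtags, non-ASCII symbols, control characters,
# punctuation and numbers; then lower-case and squeeze white space.
clean_tweet <- function(x) {
  x <- gsub("https?://[^[:space:]]+", " ", x)             # URLs
  x <- gsub("@[[:alnum:]_]+", " ", x)                     # mentions
  x <- gsub("#[[:alnum:]_]+", " ", x)                     # hashtags
  x <- iconv(x, from = "UTF-8", to = "ASCII", sub = " ")  # emoji and other non-ASCII
  x <- gsub("[[:cntrl:]]", " ", x)                        # control characters
  x <- gsub("[[:punct:]]", " ", x)                        # punctuation, text emoticons
  x <- gsub("[[:digit:]]+", " ", x)                       # numbers
  x <- tolower(x)
  x <- gsub("[[:space:]]+", " ", x)                       # redundant blank spaces
  trimws(x)
}

# Illustrative stop-word list only; far shorter than a real one.
stop_words <- c("rt", "the", "is", "a", "of", "on", "and", "to")

# Drop every word that appears in the stop-word list.
remove_stop_words <- function(x, stops) {
  vapply(strsplit(x, " ", fixed = TRUE), function(words) {
    paste(words[!(words %in% stops)], collapse = " ")
  }, character(1))
}

raw <- c(
  "RT @user_a: The decision on #article370 is HISTORIC!!! https://example.com/abc 2019",
  "Kashmir   and India :)"
)
cleaned <- remove_stop_words(clean_tweet(raw), stop_words)
print(cleaned)
```

URLs, mentions and hashtags are stripped before punctuation so that the `@`, `#` and `://` markers are still present when their patterns are matched.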
4. MODEL/PROTOTYPE DEVELOPMENT
Now the cleaned corpus is transformed into a document-term matrix. A document-term
matrix records how often every word in the corpus occurs in each document. Column names
are words and row names are documents.
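To make the layout concrete, here is a small base R sketch that builds such a matrix by hand for three made-up documents. The tm package provides DocumentTermMatrix for the same purpose on a corpus; the `docs` vector and variable names below are illustrative assumptions, not the project's data.

```r
# Three tiny, already-cleaned documents (made up for illustration).
docs <- c("article kashmir decision", "kashmir india", "article article india")

tokens <- strsplit(docs, " ", fixed = TRUE)
vocab  <- sort(unique(unlist(tokens)))

# One row per document, one column per term; each cell is a word count.
dtm <- t(vapply(tokens, function(words) {
  as.integer(table(factor(words, levels = vocab)))
}, integer(length(vocab))))
dimnames(dtm) <- list(paste0("doc", seq_along(docs)), vocab)
print(dtm)

# Column sums give the overall frequency of each term across the corpus.
term_freq <- sort(colSums(dtm), decreasing = TRUE)
print(term_freq)
```

The sorted `term_freq` vector is the kind of word/frequency input that frequency plots and word clouds are drawn from.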
After completing all these tasks, we can plot histograms and other charts to visualize
the sentiment of the users.
The above word cloud shows that the most frequently used words in the tweets are
article, Kashmir, India, decision and so on. The different colours and sizes of the words
indicate their frequency. For example, ‘Article’ has a higher frequency than the other
words, followed by the words relating to the government's decision on Kashmir, and so on.
A word cloud is one of the best tools for visualizing most of the words and terms
contained in the tweets.
5. COMMUNICATION / PRESENTATION
Our main aim is to analyse the sentiments of people around Article 370. The analysis
covers eight different emotions and two sentiments, positive and negative.
The eight emotions (anger, anticipation, disgust, fear, joy, sadness, surprise, trust)
and their corresponding valence are obtained from the NRC dictionary.
The above bar graph is used to visualize the various sentiments behind the
tweets. As expected, positive is highest, followed by Trust & Negative.
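The mechanics of lexicon-based emotion scoring can be sketched in base R as follows. The tiny `lexicon` table is invented for illustration and is NOT the NRC dictionary; `score_tweet` is a hypothetical helper. In practice a packaged implementation such as get_nrc_sentiment() from the syuzhet package is commonly used, though whether this project used it is an assumption.

```r
# Toy word-to-category lexicon (made-up entries, not the NRC dictionary).
lexicon <- data.frame(
  word     = c("hope", "hope", "peace", "peace", "fear", "fear", "angry", "angry"),
  category = c("anticipation", "positive", "trust", "positive",
               "fear", "negative", "anger", "negative"),
  stringsAsFactors = FALSE
)

# The eight emotions plus the two sentiments.
categories <- c("anger", "anticipation", "disgust", "fear", "joy",
                "sadness", "surprise", "trust", "negative", "positive")

# Count, for one tweet, how many of its words fall into each category.
score_tweet <- function(text, lexicon, categories) {
  words   <- strsplit(tolower(text), "[^a-z]+")[[1]]
  matched <- as.character(unlist(lapply(words, function(w) {
    lexicon$category[lexicon$word == w]
  })))
  counts <- table(factor(matched, levels = categories))
  setNames(as.integer(counts), categories)
}

sample_tweets <- c("Hope and peace for Kashmir",
                   "Angry and full of fear",
                   "No opinion")

# One row per tweet, one column per emotion or sentiment.
scores <- t(vapply(sample_tweets, score_tweet, integer(length(categories)),
                   lexicon = lexicon, categories = categories,
                   USE.NAMES = FALSE))
colnames(scores) <- categories

# Totals across all tweets: the values a bar graph of sentiments would show.
totals <- colSums(scores)
print(totals)
```

Passing `totals` to `barplot()` would give a bar graph of the same kind as the one described above.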
CONCLUSION
Our conclusion is based purely on the 15,412 tweets we pulled from the Twitter API
on 12th October 2019.
Reactions were mostly positive, since the positive sentiment has the highest score,
followed by Negative & Trust. This means a large number of people think that the
decision to revoke Article 370 will bring positive changes. However, an almost
comparable number of people think that it will unnecessarily open a Pandora’s box.
So, on the evidence of these tweets, most Indians appear to be in favour of abolishing or
scrapping Article 370.
LIMITATIONS
• Not effective in detecting sarcasm
• Cannot achieve 100% accuracy in analysing sentiments
• Searching for a wrong or unrelated hashtag still returns results, with no error message