Reinterpreting Corona As Coded Sentiments


Reinterpreting Corona as Coded Sentiments

BITS ZG628T: Dissertation

by

DIVYA BHARTI

2018HT12498

Dissertation work carried out at

DXC Technology, Noida

BIRLA INSTITUTE OF TECHNOLOGY & SCIENCE


PILANI (RAJASTHAN)

December 2020
Reinterpreting Corona as Coded Sentiments

BITS ZG628T: Dissertation

By

DIVYA BHARTI

2018HT12498

Dissertation work carried out at

DXC Technology, Noida

Submitted in partial fulfillment of M.Tech. Software Systems degree programme

Under the Supervision of


Anurag Malik

Associate Professor
MIT, Moradabad

BIRLA INSTITUTE OF TECHNOLOGY & SCIENCE


PILANI (RAJASTHAN)

December 2020
CERTIFICATE
This is to certify that the Dissertation entitled “Reinterpreting Corona as Coded
Sentiments”, submitted by DIVYA BHARTI, ID No. 2018HT12498, in partial fulfillment
of the requirements of the M.Tech. Software Systems degree of BITS, embodies the
bonafide work done by him/her under my supervision.

--------------------
Signature of the Supervisor
Anurag Malik
Associate Professor
MIT, Moradabad

Place: New Delhi
Date: 28.09.20
Birla Institute of Technology & Science, Pilani
Work-Integrated Learning Programmes Division
Second Semester 2019-2020
BITS ZG628T: Dissertation
ABSTRACT

ID No. : 2018HT12498

NAME OF THE STUDENT : DIVYA BHARTI

EMAIL ADDRESS : divyabharti171990@gmil.com

STUDENT’S EMPLOYING ORGANIZATION & LOCATION : DXC Technology, Noida

SUPERVISOR’S NAME : Anurag Malik

SUPERVISOR’S EMPLOYING ORGANIZATION & LOCATION : MIT, Moradabad

SUPERVISOR’S EMAIL ADDRESS : Anurag_malik@rediffmail.com

DISSERTATION TITLE : Reinterpreting Corona as Coded Sentiments

ABSTRACT:

The broad area of this project is the application of existing machine learning
techniques for sentiment analysis. The dissertation work covers the following areas:
i. Tweets Extraction:
Tweets containing COVID as the keyword form the input for the model. The Twitter
API will be used to extract them.
ii. Sentiment Analysis Model:
This model will be used to predict the sentiment of COVID-19-related posts on the
Twitter platform.
iii. Machine Learning Model:
Sentiment analysis of the tweets will be based on the concept of polarity. The
polarity of the words present in each tweet will be calculated, and this will be
the basis for the sentiment analysis.
iv. Python:
Python will be used for building the machine learning model.
v. React:
React will be used for building the User Interface (UI) for this project.

Keywords: API, Corona, COVID-19, Machine Learning, Python, Sentiment Analysis, Twitter

Signature of the Student Signature of the Supervisor

Name: Divya Bharti Name: Anurag Malik

Date: 28.09.20 Date: 25.09.2020


ACKNOWLEDGEMENTS

The satisfaction and euphoria that accompanies the successful completion of any task
would be incomplete without mentioning the people who made it possible, because
success is the epitome of hard work, perseverance, undeterred missionary zeal,
steadfast determination and most of all “Encouraging Guidance”.

I express my gratitude to my supervisor, Anurag Malik, for providing me with a means
of attaining my most cherished goals.

I record my heartfelt thanks and gratitude to my Additional Examiner, Diksha Gupta,
for giving me the opportunity to carry out this project, and for the guidance and
moral support extended to me throughout the duration of the project work.
Table of Contents

INTRODUCTION

OBJECTIVES

SCOPE OF WORK

DESIGN OF THE PROJECT

MID SEMESTER PROGRESS

FUTURE PLAN OF WORK

PLAN OF WORK

LITERATURE REFERENCES

INTRODUCTION:

In 2020, the world faced the coronavirus pandemic. People used different platforms
to express their views, one of which is Twitter. During this time, misinformation
also spread through social media platforms, because the data behind the pandemic
was vast and scattered, and the information that surfaced was often incomplete or
inaccurate.
Twitter is considered ideal for this project because it is one of the most common
platforms on which people express their sentiments.
In this work, I am trying to build a model that performs sentiment analysis, an AI
technique, to visualize the complete picture of COVID-19.

OBJECTIVES:

i. To build a Machine Learning model that performs sentiment analysis of the
Twitter data.
ii. To build a User Interface (UI) for the same.

SCOPE OF WORK:

i. Studying the various models used for sentiment analysis.
ii. Building a User Interface (UI) that presents this analysis.
iii. Building a backend API that is called by the UI.
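
As an illustration of item iii, the following is a minimal sketch of what such a backend endpoint could look like, using only the Python standard library. It is not the project's implementation: the `/summary` path, the names `summary_payload`, `SummaryHandler` and `make_server`, and the placeholder counts are all assumptions made for this example.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Placeholder counts standing in for the real analysis output (hypothetical).
SENTIMENT_COUNTS = {"Positive": 0, "Neutral": 0, "Negative": 0}


def summary_payload(counts):
    """Return the JSON body a UI could consume: counts plus percentages."""
    total = sum(counts.values())
    percentages = {
        label: round(100 * n / total, 1) if total else 0.0
        for label, n in counts.items()
    }
    return json.dumps({"total": total, "counts": counts, "percentages": percentages})


class SummaryHandler(BaseHTTPRequestHandler):
    """Serves GET /summary (a hypothetical path); other paths return 404."""

    def do_GET(self):
        if self.path != "/summary":
            self.send_error(404)
            return
        body = summary_payload(SENTIMENT_COUNTS).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def make_server(port=8000):
    """Build (but do not start) the server; call serve_forever() on the result."""
    return HTTPServer(("127.0.0.1", port), SummaryHandler)
```

Keeping the payload construction in a plain function separates the response format from the HTTP handling, so the same function could be reused if a web framework is chosen later.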

DESIGN OF THE PROJECT:

The diagram below shows the design of the project.

[Figure: project design. Blocks shown: Data Gathering (Twitter API); Data Storage;
Preprocessing (Cleaning the Data); Polarity Calculation & Sentiment Analysis using
TextBlob, SentiWordNet and W-WSD; Visualization.]
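
The design lists three alternative polarity calculators (TextBlob, SentiWordNet and W-WSD). One way to keep them interchangeable is to treat each as a function from text to a polarity score. The sketch below illustrates that idea under the assumption that the stages run in the order listed. The names `Scorer`, `toy_lexicon_scorer`, `preprocess` and `run_pipeline` are hypothetical, and the toy lexicon is invented for this example; it is not the lexicon of any of the three libraries.

```python
from typing import Callable, Dict, List

# A scorer maps cleaned text to a polarity value in [-1.0, 1.0].
Scorer = Callable[[str], float]

# Toy lexicon, invented for this example only.
TOY_LEXICON = {"good": 1.0, "safe": 0.5, "bad": -1.0, "scared": -0.5}


def toy_lexicon_scorer(text: str) -> float:
    """Average the lexicon scores of the known words; 0.0 if none are known."""
    hits = [TOY_LEXICON[w] for w in text.lower().split() if w in TOY_LEXICON]
    return sum(hits) / len(hits) if hits else 0.0


def preprocess(text: str) -> str:
    """Placeholder cleaning step: collapse repeated whitespace."""
    return " ".join(text.split())


def run_pipeline(raw_tweets: List[str], scorer: Scorer) -> List[Dict[str, object]]:
    """Preprocess each tweet, score it, and return rows ready for visualization."""
    rows = []
    for raw in raw_tweets:
        cleaned = preprocess(raw)
        rows.append({"text": cleaned, "polarity": scorer(cleaned)})
    return rows


rows = run_pipeline(
    ["Feeling  good and safe", "So scared , this is bad", "No opinion"],
    toy_lexicon_scorer,
)
for row in rows:
    print(row)
```

Because `run_pipeline` only depends on the `Scorer` signature, a TextBlob-based or SentiWordNet-based function could be passed in place of `toy_lexicon_scorer` without changing the other stages.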

MID SEMESTER PROGRESS:
The following work has been completed as part of the mid-semester progress:
1. Data Collection
2. Cleaning of the Data
3. Sentiment Analysis of the Sample Data

1. Data Collection

Twitter data can be extracted through the Twitter API. However, Twitter recently
changed its endpoints, so I wrote the Python code below to extract tweets that
contain the word “COVID”.

import time

import pandas as pd
import tweepy

# Twitter API credentials (placeholders)
consumer_key = "XXX"
consumer_secret = "XXX"
access_key = "XXX"
access_secret = "XXX"

auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_key, access_secret)
# wait_on_rate_limit makes tweepy wait when the rate limit is reached
api = tweepy.API(auth, wait_on_rate_limit=True)


def query_to_csv(text_query, count):
    """Search for tweets matching text_query and save up to `count` of them as CSV."""
    try:
        # api.search is the Tweepy 3.x name; Tweepy 4.0 renamed it to api.search_tweets
        tweets = tweepy.Cursor(api.search, q=text_query).items(count)
        tweets_list = [[tweet.id, tweet.created_at, tweet.text] for tweet in tweets]

        # tweet information
        tweets_df = pd.DataFrame(tweets_list, columns=['ID', 'Datetime', 'Text'])
        tweets_df.to_csv(r'C:\Users\admin\Desktop\COVID.csv', sep=',', index=False)

    except Exception as e:
        print('failed on_status,', str(e))
        time.sleep(60 * 5)


# count is the number of tweets to be retrieved
text_query = 'COVID'
count = 7000

query_to_csv(text_query, count)

The following data was collected:

COVID.csv
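
To make the layout of the collected file concrete without calling the Twitter API, the sketch below writes the same three columns (ID, Datetime, Text) from stand-in objects. It is an illustration only: the function name `tweets_to_csv`, the sample tweets, and the skipping of repeated IDs are assumptions added for this example and are not part of the extraction script.

```python
import csv
import io
from datetime import datetime
from types import SimpleNamespace


def tweets_to_csv(tweets, out):
    """Write ID, Datetime and Text of each tweet to `out`, skipping repeated IDs."""
    writer = csv.writer(out)
    writer.writerow(['ID', 'Datetime', 'Text'])
    seen = set()
    written = 0
    for tweet in tweets:
        if tweet.id in seen:
            continue  # assumption: the same tweet should be stored only once
        seen.add(tweet.id)
        writer.writerow([tweet.id, tweet.created_at.isoformat(), tweet.text])
        written += 1
    return written


# Stand-in objects exposing the three attributes the extraction script reads.
fake_tweets = [
    SimpleNamespace(id=1, created_at=datetime(2020, 9, 1, 10, 0), text='COVID cases rising'),
    SimpleNamespace(id=2, created_at=datetime(2020, 9, 1, 10, 5), text='Stay safe, everyone'),
    SimpleNamespace(id=1, created_at=datetime(2020, 9, 1, 10, 0), text='COVID cases rising'),
]

buffer = io.StringIO()
written = tweets_to_csv(fake_tweets, buffer)
print(written)
print(buffer.getvalue())
```

Writing to an in-memory buffer keeps the example independent of any file path; passing an open file object instead would produce a file on disk.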

2. Cleaning of Data & Sentiment Analysis

The extracted tweets contain links and symbols (e.g., @) that are not needed for the
analysis. The data is therefore cleaned first, and sentiment analysis is then
performed using the polarity concept. The code for this is shown below:
import re

import matplotlib.pyplot as plt
import pandas as pd
from textblob import TextBlob
from wordcloud import WordCloud

plt.style.use('fivethirtyeight')

# Load the collected tweets and keep only the third column ('Text')
pth = pd.read_csv(r'C:\Users\admin\Desktop\COVID.csv')
col1 = pth.iloc[:, 2]
df = pd.DataFrame(col1)


def cleanTxt(text):
    """Strip mentions, '#' signs, retweet markers, links and colons from a tweet."""
    text = re.sub(r'@[A-Za-z0-9_]+', '', text)   # remove @mentions
    text = re.sub(r'#', '', text)                # remove the '#' symbol
    text = re.sub(r'RT[\s]+', '', text)          # remove retweet markers
    text = re.sub(r'https?:\/\/\S+', '', text)   # remove hyperlinks
    text = re.sub(r':', '', text)                # remove leftover colons
    return text


def getSubjectivity(text):
    return TextBlob(text).sentiment.subjectivity


def getPolarity(text):
    return TextBlob(text).sentiment.polarity


print("len", len(df))
df['Tweets'] = df['Text'].apply(cleanTxt)
df['Subjectivity'] = df['Tweets'].apply(getSubjectivity)
df['Polarity'] = df['Tweets'].apply(getPolarity)

# Join all cleaned tweets, separated by spaces, for the word cloud
allWords = ' '.join([twts for twts in df['Tweets']])

# Word cloud of the words in the cleaned tweets
wordCloud = WordCloud(width=500, height=300, random_state=21,
                      max_font_size=119).generate(allWords)
plt.imshow(wordCloud, interpolation="bilinear")
plt.axis('off')
plt.show()


def getAnalysis(score):
    """Map a polarity score to a sentiment label."""
    if score < 0:
        return 'Negative'
    elif score == 0:
        return 'Neutral'
    else:
        return 'Positive'


df['Analysis'] = df['Polarity'].apply(getAnalysis)
print(df)

# Print all the positive tweets
# j = 1
# sortedDF = df.sort_values(by=['Polarity'])
# for i in range(0, sortedDF.shape[0]):
#     if sortedDF['Analysis'][i] == 'Positive':
#         print(str(j) + ') ' + sortedDF['Tweets'][i])
#         print()
#         j = j + 1

# Print all the negative tweets
# j = 1
# sortedDF = df.sort_values(by=['Polarity'], ascending=False)
# for i in range(0, sortedDF.shape[0]):
#     if sortedDF['Analysis'][i] == 'Negative':
#         print(str(j) + ') ' + sortedDF['Tweets'][i])
#         print()
#         j = j + 1

# Plot polarity against subjectivity (one point per tweet)
plt.figure(figsize=(8, 6))
plt.scatter(df['Polarity'], df['Subjectivity'], color='Blue')

plt.title('Sentiment Analysis')
plt.xlabel('Polarity')
plt.ylabel('Subjectivity')
plt.show()

# Percentage of positive tweets
ptweets = df[df.Analysis == 'Positive']
ptweets = ptweets['Tweets']
positive_pct = round((ptweets.shape[0] / df.shape[0]) * 100, 1)
print(positive_pct)

# Percentage of negative tweets
ntweets = df[df.Analysis == 'Negative']
ntweets = ntweets['Tweets']
negative_pct = round((ntweets.shape[0] / df.shape[0]) * 100, 1)
print(negative_pct)

# Show the value counts and plot them as a bar chart
print(df['Analysis'].value_counts())

plt.title('Sentiment Analysis')
plt.xlabel('Sentiment')
plt.ylabel('Counts')
df['Analysis'].value_counts().plot(kind='bar')
plt.show()
The output of the above code contains the subjectivity and polarity of each tweet.
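
To show the effect of the cleaning and labelling steps on a single input, the following is a small standard-library sketch. It uses cleaning patterns similar to those in the listing and the same sign-based labelling rule. The sample tweet and the polarity scores are invented for illustration (they are not TextBlob output), and the function names `clean_tweet`, `label` and `percentages` are hypothetical.

```python
import re
from collections import Counter


def clean_tweet(text):
    """Remove mentions, '#' signs, retweet markers, links and colons."""
    text = re.sub(r'@[A-Za-z0-9_]+', '', text)
    text = re.sub(r'#', '', text)
    text = re.sub(r'RT\s+', '', text)
    text = re.sub(r'https?://\S+', '', text)
    text = re.sub(r':', '', text)
    return text.strip()


def label(score):
    """Negative below zero, Neutral at zero, Positive above zero."""
    if score < 0:
        return 'Negative'
    if score == 0:
        return 'Neutral'
    return 'Positive'


def percentages(labels):
    """Share of each label, in percent, rounded to one decimal place."""
    counts = Counter(labels)
    total = len(labels) or 1  # avoid division by zero on empty input
    return {name: round(100 * counts[name] / total, 1)
            for name in ('Positive', 'Neutral', 'Negative')}


sample = "RT @who: Stay home, stay safe #COVID https://t.co/abc123"
print(clean_tweet(sample))

# Invented polarity scores standing in for real model output.
scores = [0.5, 0.0, -0.2, 0.8, 0.0]
labels = [label(s) for s in scores]
print(labels)
print(percentages(labels))
```

Note that a pattern such as `RT\s+` also matches the letters "RT" at the end of an ordinary word followed by a space, so a stricter pattern may be needed for real data.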

FUTURE PLAN OF WORK:
1. Creating the dashboard for the sentiment analysis of Corona tweets.
2. Working on more sample data by creating a solution to extract older tweets.
The earlier solution for extracting old tweets no longer works because Twitter
has changed its endpoints.
3. Working on a framework that can be deployed on a server.

PLAN OF WORK:

The tentative timelines for the above work are:

Phases                        Description of Work                                                            Start date      End date        Status

Dissertation outline          Literature review & prepare dissertation outline report                        10th Aug 2020   31st Aug 2020   Completed
Data collection               Extracting data from Twitter                                                   1st Sept 2020   15th Sept 2020  Completed
Data preprocessing            Cleaning of the extracted data                                                 16th Sept 2020  18th Sept 2020  Completed
Sentiment analysis            Calculating the sentiment analysis                                             19th Sep 2020   28th Sep 2020   Completed
Testing through UI interface  Developing the UI and deploying it                                             29th Sept 2020  30th Oct 2020   In Progress
Dissertation review           Submit dissertation to Supervisor & Additional Examiner for review & feedback  1st Nov 2020    15th Nov 2020   To Be Completed
Submission of dissertation    Final review & submission                                                      16th Nov 2020   30th Nov 2020   To Be Completed

LITERATURE REFERENCES:
1. https://www.semanticscholar.org/paper/Word-frequency-and-sentiment-analysis-of-twitter-Rajput-Grover/ada0c18910fcb079778c996129bd671f25d27e9e
2. https://www.emeraldgrouppublishing.com/journal/idd/using-data-science-understand-coronavirus-pandemic
3. https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3574385
4. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7152888/
5. https://arxiv.org/abs/2004.03925
