Reinterpreting Corona as Coded Sentiments
BITS ZG628T: Dissertation
by
DIVYA BHARTI
2018HT12498
Dissertation work carried out at
DXC Technology, Noida
BIRLA INSTITUTE OF TECHNOLOGY & SCIENCE

PILANI (RAJASTHAN)
December 2020
Reinterpreting Corona as Coded Sentiments
By
DIVYA BHARTI
2018HT12498
Dissertation work carried out at
DXC Technology, Noida
Submitted in partial fulfillment of M.Tech. Software Systems degree programme
Under the Supervision of

Anurag Malik
Associate Professor
MIT, Moradabad
BIRLA INSTITUTE OF TECHNOLOGY & SCIENCE

PILANI (RAJASTHAN)
December 2020
CERTIFICATE
This is to certify that the Dissertation entitled “Reinterpreting Corona as Coded
Sentiments”, submitted by DIVYA BHARTI having ID No. 2018HT12498 in
partial fulfillment of the requirements of the M.Tech. Software Systems degree of BITS,
embodies the bona fide work done by him/her under my supervision.
--------------------
Signature of the Supervisor
Anurag Malik
Associate Professor
MIT, MORADABAD
Place: New Delhi
Date: 28.09.20
Birla Institute of Technology & Science, Pilani
Work-Integrated Learning Programmes Division
Second Semester 2019-2020
ABSTRACT
ID No. : 2018HT12498
NAME OF THE STUDENT : DIVYA BHARTI
EMAIL ADDRESS : divyabharti171990@gmil.com
STUDENT’S EMPLOYING ORGANIZATION & LOCATION : DXC Technology, Noida
SUPERVISOR’S NAME : Anurag Malik
SUPERVISOR’S EMPLOYING ORGANIZATION & LOCATION : MIT, Moradabad
SUPERVISOR’S EMAIL ADDRESS : Anurag_malik@rediffmail.com
DISSERTATION TITLE : Reinterpreting Corona as Coded Sentiments
ABSTRACT:
The broad area of this project is sentiment analysis using existing machine learning
techniques. The dissertation work covers the following areas.
i. Tweet extraction:
Tweets containing the keyword COVID form the input to the model. They are
extracted using the Twitter API.
ii. Sentiment analysis model:
This model is used to predict the sentiment expressed about COVID-19 on the
Twitter platform.
iii. Machine learning model:
Sentiment analysis of the tweets is based on the concept of polarity. The
polarity of the words in each tweet is calculated, and this polarity is the
basis of the sentiment analysis.
iv. Python:
Python is used to build the machine learning model.
v. React:
React is used to build the User Interface (UI) for this project.
Keywords: API, Corona, COVID-19, Machine Learning, Python, Sentiment Analysis, Twitter
Signature of the Student          Signature of the Supervisor
Name: Divya Bharti                Name: Anurag Malik
Date: 28.09.20                    Date: 25.09.2020

ACKNOWLEDGEMENTS
The satisfaction and euphoria that accompany the successful completion of any task
would be incomplete without mentioning the people who made it possible, because
success is the epitome of hard work, perseverance, undeterred missionary zeal,
steadfast determination and most of all “Encouraging Guidance”.
I express my gratitude to my supervisor Anurag Malik for providing me a means of
attaining my most cherished goals.
I record my heartfelt thanks and gratitude to my additional Examiner Diksha Gupta
for giving me the opportunity to carry out this project, and for the guidance and
moral support extended to me throughout the duration of the project work.
Table of Contents
INTRODUCTION
OBJECTIVES
SCOPE OF WORK
DESIGN OF THE PROJECT
MID SEMESTER PROGRESS
FUTURE PLAN OF WORK
PLAN OF WORK
LITERATURE REFERENCES
INTRODUCTION:
In 2020, the world faced the coronavirus pandemic. People used different platforms
to express their views, one of which is Twitter. During this time, misinformation
also spread through social media platforms, because the data behind the pandemic
was large and scattered, and the information that surfaced was often incomplete
or inaccurate.
Twitter is considered well suited to this project because it is a widely used
platform on which people express their sentiments.
In this work, I am trying to build a model that performs sentiment analysis, an
AI technique, to visualize the complete picture of COVID-19.
OBJECTIVES:
i. To build a machine learning model that performs sentiment analysis of Twitter data.
ii. To build a User Interface (UI) for the same.
SCOPE OF WORK:
i. Studying the various models for sentiment analysis.
ii. Building a User Interface (UI) for displaying this analysis.
iii. Building a backend API that is called by the UI.
DESIGN OF THE PROJECT:
The diagram below shows the design of the project.
[Figure: project design. Blocks shown: Data Gathering (Twitter API); Data Storage; Preprocessing (cleaning the data); Polarity Calculation & Sentiment Analysis using TextBlob, SentiWordNet and W-WSD; Visualization.]
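To make the polarity calculation stage of the design more concrete, the sketch below scores a text by averaging the polarity of the words it recognises. It is only an illustration under stated assumptions: the lexicon `TOY_LEXICON` and the function `toy_polarity` are hypothetical, and real resources such as TextBlob's lexicon or SentiWordNet are far larger and handle negation and intensifiers, which this sketch does not.

```python
import re

# Hypothetical toy lexicon: word -> polarity in [-1.0, 1.0].
# This is NOT the lexicon used by TextBlob or SentiWordNet.
TOY_LEXICON = {
    "good": 0.7,
    "safe": 0.5,
    "recovered": 0.6,
    "bad": -0.7,
    "death": -0.8,
    "fear": -0.6,
}


def toy_polarity(text):
    """Average the lexicon scores of the known words in text (0.0 if none)."""
    words = re.findall(r"[a-z]+", text.lower())
    scores = [TOY_LEXICON[w] for w in words if w in TOY_LEXICON]
    return sum(scores) / len(scores) if scores else 0.0


print(toy_polarity("Good news: my uncle recovered"))  # a positive score
print(toy_polarity("So much fear and death"))         # a negative score
print(toy_polarity("Stay at home"))                   # no known words
```

A text with no known words gets a polarity of 0.0, which is why a lexicon-based approach tends to report many tweets as neutral.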
MID SEMESTER PROGRESS:
The following work has been done as part of the mid-semester progress:
1. Data collection
2. Cleaning of the data
3. Sentiment analysis of the sample data
Data collection:
Twitter data can be extracted through the Twitter API. However, Twitter recently
changed its endpoints, so I wrote the following Python code to extract tweets
containing the word “COVID”.
import time

import pandas as pd
import tweepy

# Twitter API credentials (placeholders)
consumer_key = "XXX"
consumer_secret = "XXX"
access_key = "XXX"
access_secret = "XXX"

auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_key, access_secret)
# wait_on_rate_limit makes Tweepy wait when the rate limit is reached
api = tweepy.API(auth, wait_on_rate_limit=True)


def query_to_csv(text_query, count):
    """Search for tweets matching text_query and save up to count of them to CSV."""
    try:
        # api.search is the Tweepy 3.x name (renamed search_tweets in Tweepy 4)
        tweets = tweepy.Cursor(api.search, q=text_query).items(count)
        # Keep the ID, creation time and text of each tweet
        tweets_list = [[tweet.id, tweet.created_at, tweet.text] for tweet in tweets]
        tweets_df = pd.DataFrame(tweets_list, columns=['ID', 'Datetime', 'Text'])
        tweets_df.to_csv(r'C:\Users\admin\Desktop\COVID.csv', sep=',', index=False)
    except Exception as e:
        print('failed on_status,', str(e))
        time.sleep(60 * 5)


text_query = 'COVID'
count = 7000  # number of tweets to be retrieved
query_to_csv(text_query, count)
The collected data is saved in the file COVID.csv.
Cleaning of Data & Sentiment Analysis:
The extracted tweets contain links and symbols (e.g., @) that are not needed for the
analysis. The data is therefore cleaned first, and sentiment analysis is then
performed using the polarity concept. The full script for both steps is shown below.
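As a quick illustration of the cleaning step before the full script, the sketch below applies the same kind of regular-expression cleaning to a single made-up tweet. The function name `clean_tweet`, the sample text and the exact patterns are assumptions chosen for illustration; they are a simplified variant of the cleaning in the project script, not a copy of it.

```python
import re


def clean_tweet(text):
    """Strip the retweet marker, mentions, links and '#' signs from a tweet."""
    text = re.sub(r"\bRT\s+", "", text)        # retweet marker
    text = re.sub(r"@\w+:?", "", text)         # @mentions (and a trailing colon)
    text = re.sub(r"https?://\S+", "", text)   # links
    text = text.replace("#", "")               # keep the hashtag word, drop '#'
    return re.sub(r"\s+", " ", text).strip()   # collapse leftover whitespace


# Made-up example tweet
sample = "RT @some_user: Stay safe! #COVID https://example.com/abc"
print(clean_tweet(sample))  # prints: Stay safe! COVID
```

Only the words that carry sentiment remain, so the polarity calculation is not distorted by user names or URLs.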
import re

import matplotlib.pyplot as plt
import pandas as pd
from textblob import TextBlob
from wordcloud import WordCloud

plt.style.use('fivethirtyeight')

# Load the tweets collected in the previous step
pth = pd.read_csv(r'C:\Users\admin\Desktop\COVID.csv')
# Keep only the tweet text (the third column, 'Text')
df = pd.DataFrame(pth.iloc[:, 2])
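def cleanTxt(text):
    """Remove mentions, '#' signs, retweet markers, links and colons."""
    text = re.sub(r'@[A-Za-z0-9_]+', '', text)   # @mentions
    text = re.sub(r'#', '', text)                # '#' sign (the word is kept)
    text = re.sub(r'RT[\s]+', '', text)          # retweet marker
    text = re.sub(r'https?:\/\/\S+', '', text)   # links
    text = re.sub(r':', '', text)                # leftover colons
    return text


def getSubjectivity(text):
    """TextBlob subjectivity, from 0 (objective) to 1 (subjective)."""
    return TextBlob(text).sentiment.subjectivity


def getPolarity(text):
    """TextBlob polarity, from -1 (negative) to 1 (positive)."""
    return TextBlob(text).sentiment.polarity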
print("len", len(df))

# Clean each tweet, then score its subjectivity and polarity
df['Tweets'] = df['Text'].apply(cleanTxt)
df['Subjectivity'] = df['Tweets'].apply(getSubjectivity)
df['Polarity'] = df['Tweets'].apply(getPolarity)

# Word cloud of all cleaned tweets (joined with spaces so words do not run together)
allWords = ' '.join([twts for twts in df['Tweets']])
wordCloud = WordCloud(width=500, height=300, random_state=21,
                      max_font_size=119).generate(allWords)
plt.imshow(wordCloud, interpolation="bilinear")
plt.axis('off')
plt.show()
def getAnalysis(score):
    """Label a polarity score as Negative, Neutral or Positive."""
    if score < 0:
        return 'Negative'
    elif score == 0:
        return 'Neutral'
    else:
        return 'Positive'


df['Analysis'] = df['Polarity'].apply(getAnalysis)
print(df)
# Print all the positive tweets
# j = 1
# sortedDF = df.sort_values(by=['Polarity'])
# for i in range(0, sortedDF.shape[0]):
#     if sortedDF['Analysis'][i] == 'Positive':
#         print(str(j) + ') ' + sortedDF['Tweets'][i])
#         print()
#         j = j + 1

# Print all the negative tweets
# j = 1
# sortedDF = df.sort_values(by=['Polarity'], ascending=False)
# for i in range(0, sortedDF.shape[0]):
#     if sortedDF['Analysis'][i] == 'Negative':
#         print(str(j) + ') ' + sortedDF['Tweets'][i])
#         print()
#         j = j + 1
# Plot polarity against subjectivity
plt.figure(figsize=(8, 6))
plt.scatter(df['Polarity'], df['Subjectivity'], color='Blue')
plt.title('Sentiment Analysis')
plt.xlabel('Polarity')
plt.ylabel('Subjectivity')
plt.show()
# Percentage of positive tweets
ptweets = df[df.Analysis == 'Positive']
ptweets = ptweets['Tweets']
positive_pct = round((ptweets.shape[0] / df.shape[0]) * 100, 1)
print(positive_pct)

# Percentage of negative tweets
ntweets = df[df.Analysis == 'Negative']
ntweets = ntweets['Tweets']
negative_pct = round((ntweets.shape[0] / df.shape[0]) * 100, 1)
print(negative_pct)
# Show the number of tweets per sentiment label
print(df['Analysis'].value_counts())
df['Analysis'].value_counts().plot(kind='bar')
plt.title('Sentiment Analysis')
plt.xlabel('Sentiment')
plt.ylabel('Counts')
plt.show()
The output of the above code contains the subjectivity and polarity of each tweet.
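The polarity scores in this output are what the script turns into sentiment labels and percentages. A minimal standalone sketch of that step is given below, in plain Python without pandas. The helper names `label` and `label_shares` and the sample scores are illustrative assumptions, not part of the project code.

```python
from collections import Counter

LABELS = ("Positive", "Neutral", "Negative")


def label(polarity):
    """Map a polarity score to a sentiment label (negative, zero, positive)."""
    if polarity < 0:
        return "Negative"
    if polarity == 0:
        return "Neutral"
    return "Positive"


def label_shares(polarities):
    """Return the percentage of each label, rounded to one decimal place."""
    total = len(polarities)
    if total == 0:
        return {name: 0.0 for name in LABELS}
    counts = Counter(label(p) for p in polarities)
    return {name: round(100 * counts[name] / total, 1) for name in LABELS}


# Made-up polarity scores for five tweets
scores = [0.5, 0.0, -0.25, 0.8, 0.0]
print(label_shares(scores))
# prints: {'Positive': 40.0, 'Neutral': 40.0, 'Negative': 20.0}
```

Because only a score of exactly 0 counts as neutral, even a very weak polarity such as 0.01 is labelled positive. A small tolerance band around zero would be one possible refinement.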
FUTURE PLAN OF WORK:
1. Creating a dashboard for the sentiment analysis of Corona tweets.
2. Working on more sample data by creating a solution to extract older tweets.
The earlier solution for extracting old tweets no longer works because Twitter
has changed its endpoints.
3. Working on a framework that can be deployed on a server.
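One kind of figure the planned dashboard could show is how the average polarity changes from day to day. The sketch below is a rough illustration only: it assumes that each row pairs a timestamp string in the form `YYYY-MM-DD HH:MM:SS` with a polarity score, and the function name `daily_mean_polarity` and the sample rows are hypothetical. It produces a JSON string that a backend API could return to the UI.

```python
import json
from collections import defaultdict
from datetime import datetime


def daily_mean_polarity(rows):
    """Group (timestamp, polarity) pairs by calendar day and average them."""
    by_day = defaultdict(list)
    for timestamp, polarity in rows:
        # Assumed timestamp format; adjust if the CSV stores dates differently
        day = datetime.strptime(timestamp, "%Y-%m-%d %H:%M:%S").date().isoformat()
        by_day[day].append(polarity)
    return {day: round(sum(v) / len(v), 3) for day, v in sorted(by_day.items())}


# Made-up rows: (tweet timestamp, polarity score)
rows = [
    ("2020-09-20 10:15:00", 0.5),
    ("2020-09-20 18:40:00", -0.1),
    ("2020-09-21 09:05:00", 0.0),
]
print(json.dumps(daily_mean_polarity(rows)))
```

Keeping the aggregation in a plain function, separate from any web framework, would let the same code serve both the dashboard and offline analysis.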
PLAN OF WORK:
The tentative timelines for the above work are:
Phases | Description of Work | Start date | End date | Status
Dissertation outline | Literature review & preparing the dissertation outline report | 10th Aug 2020 | 31st Aug 2020 | Completed
Data collection | Extracting data from Twitter | 1st Sept 2020 | 15th Sept 2020 | Completed
Data preprocessing | Cleaning of the extracted data | 16th Sept 2020 | 18th Sept 2020 | Completed
Sentiment analysis | Calculating the sentiment analysis | 19th Sept 2020 | 28th Sept 2020 | Completed
Testing through UI interface | Developing the UI and deploying it | 29th Sept 2020 | 30th Oct 2020 | In Progress
Dissertation review | Submitting the dissertation to the Supervisor & Additional Examiner for review & feedback | 1st Nov 2020 | 15th Nov 2020 | To Be Completed
Submission | Final review & submission of the dissertation | 16th Nov 2020 | 30th Nov 2020 | To Be Completed
LITERATURE REFERENCES:
1. https://www.semanticscholar.org/paper/Word-frequency-and-sentiment-analysis-of-twitter-Rajput-Grover/ada0c18910fcb079778c996129bd671f25d27e9e
2. https://www.emeraldgrouppublishing.com/journal/idd/using-data-science-understand-coronavirus-pandemic
3. https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3574385
4. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7152888/
5. https://arxiv.org/abs/2004.03925
