Reinterpreting Corona As Coded Sentiments
by
DIVYA BHARTI
2018HT12498
DXC Technology, Noida
December 2020
Reinterpreting Corona as Coded Sentiments
By
DIVYA BHARTI
2018HT12498
DXC Technology, Noida
Associate Professor
MIT, Moradabad
December 2020
CERTIFICATE
This is to certify that the Dissertation entitled “Reinterpreting Corona as Coded
Sentiments”, submitted by DIVYA BHARTI, ID No. 2018HT12498, in partial fulfillment
of the requirements of the M.Tech. Software Systems degree of BITS, embodies the
bonafide work done by him/her under my supervision.
--------------------
Signature of the Supervisor
Anurag Malik
Associate Professor
MIT, Moradabad
Place: New Delhi
Date: 28.09.20
Birla Institute of Technology & Science, Pilani
Work-Integrated Learning Programmes Division Second Semester
2019-2020
BITS ZG628T: Dissertation
ABSTRACT
ID No. : 2018HT12498
This project falls within the broad area of Machine Learning techniques for
Sentiment Analysis. The dissertation work covers the following application areas.
i. Tweet Extraction:
Tweets containing COVID as a keyword form the input to the model. They are
retrieved using the Twitter API.
ii. Sentiment Analysis Model:
Sentiment analysis of the tweets is based on the polarity concept. A polarity
score is calculated from the words present in each tweet, and that score is
the basis for classifying the sentiment of the tweet.
iv. Python:
React for building the User Interface (UI) for this project.
Keywords: API, Corona, COVID-19, Machine Learning, Python, Sentiment Analysis, Twitter
The satisfaction and euphoria that accompany the successful completion of any task
would be incomplete without mentioning the people who made it possible, because
success is the epitome of hard work, perseverance, undeterred missionary zeal,
steadfast determination and, most of all, “Encouraging Guidance”.
I record my heartfelt thanks and gratitude to my Additional Examiner, Diksha Gupta,
for giving me the opportunity to carry out this project, and for the guidance and
moral support extended to me throughout the duration of the project work.
Table of Contents
INTRODUCTION
OBJECTIVES
SCOPE OF WORK
DESIGN OF WORK
PLAN OF WORK
LITERATURE REFERENCES
INTRODUCTION:
In 2020, the world faced the coronavirus (COVID-19) pandemic. People used different
platforms to express their views, Twitter being one of them. During this time,
misinformation also spread through social media platforms. The data behind the
pandemic was large and scattered, so the information that reached the public was
often incomplete or inaccurate.
Twitter is considered well suited to this project because it is one of the most
widely used platforms on which people express their sentiments.
In this work, I aim to build a model that applies sentiment analysis, a technique
used in AI, to tweets in order to visualize the overall picture of Covid-19.
OBJECTIVES:
SCOPE OF WORK:
[Figure: project workflow showing Data Gathering (Twitter API), Data Storage, and Visualization]
MID SEMESTER PROGRESS:
The following work has been completed as part of the mid-semester progress:
1. Data Collection
2. Cleaning of the Data
3. Sentiment Analysis of the Sample Data
Data collection:
Twitter data can be extracted through the Twitter API. However, Twitter recently
changed its endpoints, so I wrote the Python code below to extract tweets that
contain “COVID” as one of their words.
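import time

import pandas as pd
import tweepy

# Twitter API credentials (placeholders)
consumer_key = "XXX"
consumer_secret = "XXX"
access_key = "XXX"
access_secret = "XXX"

# The authentication and search steps are not shown in the original listing.
# The calls below are an assumed reconstruction using tweepy 3.x
# (api.search was renamed to api.search_tweets in tweepy 4.x).
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_key, access_secret)
api = tweepy.API(auth, wait_on_rate_limit=True)


def query_to_csv(text_query, count):
    try:
        # Fetch up to `count` tweets matching the query
        tweets = tweepy.Cursor(api.search, q=text_query).items(count)
        # tweet information
        tweets_list = [[tweet.id, tweet.created_at, tweet.text] for tweet in tweets]
        tweets_df = pd.DataFrame(tweets_list, columns=['ID', 'Datetime', 'Text'])
        tweets_df.to_csv(r'C:\Users\admin\Desktop\COVID.csv', sep=',', index=False)
    except BaseException as e:
        print('failed on_status,', str(e))
        time.sleep(60 * 5)


# Assumed values: the original listing does not show how these are set
text_query = 'COVID'
count = 100
query_to_csv(text_query, count)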
The following data was collected:
COVID.csv
The extracted tweets contain extraneous links and symbols (e.g., @). The data is
therefore cleaned first, and sentiment analysis is then performed using the
polarity concept. The code for this is shown below:
import re

import matplotlib.pyplot as plt
import pandas as pd
from textblob import TextBlob

plt.style.use('fivethirtyeight')

# Load the collected tweets and keep only the third column ('Text')
pth = pd.read_csv(r'C:\Users\admin\Desktop\COVID.csv')
df = pd.DataFrame(pth.iloc[:, 2])


def cleanTxt(text):
    text = re.sub(r'@[A-Za-z0-9_]+', '', text)  # remove @mentions
    text = re.sub(r'#', '', text)               # remove the '#' symbol
    text = re.sub(r'\bRT[\s]+', '', text)       # remove retweet markers
    text = re.sub(r'https?:\/\/\S+', '', text)  # remove hyperlinks
    text = re.sub(r':', '', text)               # remove leftover colons
    return text


def getSubjectivity(text):
    return TextBlob(text).sentiment.subjectivity


def getPolarity(text):
    return TextBlob(text).sentiment.polarity


# getAnalysis is called below but is not defined in the original listing;
# the zero-centred thresholds used here are an assumption.
def getAnalysis(score):
    if score < 0:
        return 'Negative'
    elif score == 0:
        return 'Neutral'
    return 'Positive'


print("len", len(df))

df['Tweets'] = df['Text'].apply(cleanTxt)
df['Subjectivity'] = df['Tweets'].apply(getSubjectivity)
df['Polarity'] = df['Tweets'].apply(getPolarity)
df['Analysis'] = df['Polarity'].apply(getAnalysis)
print(df)

# Polarity vs. subjectivity. The original listing omits the plotting call,
# so this scatter plot is an assumed reconstruction.
plt.scatter(df['Polarity'], df['Subjectivity'])
plt.title('Sentiment Analysis')
plt.xlabel('Polarity')
plt.ylabel('Subjectivity')
plt.show()

# Number of tweets per sentiment label
print(df['Analysis'].value_counts())
plt.title('Sentiment Analysis')
plt.xlabel('Sentiment')
plt.ylabel('Counts')
df['Analysis'].value_counts().plot(kind='bar')
plt.show()
The output of the above code contains the subjectivity and polarity of each tweet.
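To illustrate how word-level polarity can become a tweet-level sentiment, the sketch below averages the scores of known words and maps the result to a label. It is a simplified illustration only: the mini-lexicon, the function names toy_polarity and label_from_polarity, and the zero-centred thresholds are all assumptions made for this example, and this is not TextBlob's actual lexicon or algorithm.

```python
# Hypothetical mini-lexicon: word -> polarity in [-1.0, 1.0].
# These values are invented for illustration, not taken from TextBlob.
TOY_LEXICON = {'good': 0.7, 'safe': 0.5, 'bad': -0.7, 'worse': -0.8}


def toy_polarity(text):
    """Average the polarity of the words found in the lexicon (0.0 if none)."""
    words = text.lower().split()
    scores = [TOY_LEXICON[w] for w in words if w in TOY_LEXICON]
    return sum(scores) / len(scores) if scores else 0.0


def label_from_polarity(score):
    """Map a polarity score to a label (assumed thresholds around zero)."""
    if score < 0:
        return 'Negative'
    if score == 0:
        return 'Neutral'
    return 'Positive'


for tweet in ('stay safe', 'things got worse', 'good news but bad timing'):
    score = toy_polarity(tweet)
    print(tweet, '->', score, label_from_polarity(score))
```

A tweet with no known words, or one whose positive and negative words cancel out, comes out as Neutral under this scheme.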
FUTURE PLAN OF WORK:
1. Creating the dashboard for the sentiment analysis of Corona tweets.
2. Working with more sample data by building a solution to extract older
tweets. The earlier solution for extracting old tweets no longer works because
Twitter has changed its endpoints.
3. Working on a framework that can be deployed on a server.
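As one possible starting point for the dashboard in item 1, the sketch below turns a list of per-tweet sentiment labels into counts and percentages that a UI could display. The function name sentiment_summary and the sample labels are hypothetical; the dashboard design itself is not specified in this report.

```python
from collections import Counter


def sentiment_summary(labels):
    """Return {label: (count, percentage)} for a list of sentiment labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {
        label: (n, round(100.0 * n / total, 1))
        for label, n in counts.items()
    }


# Hypothetical sample of labels, as produced by the sentiment analysis step
labels = ['Positive', 'Negative', 'Positive', 'Neutral']
summary = sentiment_summary(labels)
for label, (n, pct) in sorted(summary.items()):
    print(f'{label}: {n} tweets ({pct}%)')
```

Keeping the summary as a plain dictionary means it can be serialized to JSON for a web front end without further conversion.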
PLAN OF WORK:
Phase | Work to be done | Start date | End date | Status
Data collection | Extracting data from Twitter | 1st Sept 2020 | 15th Sept 2020 | Completed
Data preprocessing | Cleaning of the extracted data | 16th Sept 2020 | 18th Sept 2020 | Completed
Sentiment Analysis | Calculating the sentiment analysis | 19th Sept 2020 | 28th Sept 2020 | Completed
Testing through UI interface | Developing the UI and deploying it | 29th Sept 2020 | 30th Oct 2020 | In Progress
Dissertation Review | Submit dissertation to Supervisor & Additional Examiner for review & feedback | 1st Nov 2020 | 15th Nov 2020 | To Be Completed
Submission | Final review & submission of dissertation | 16th Nov 2020 | 30th Nov 2020 | To Be Completed
LITERATURE REFERENCES:
1. https://www.semanticscholar.org/paper/Word-frequency-and-sentiment-analysis-of-twitter-Rajput-Grover/ada0c18910fcb079778c996129bd671f25d27e9e
2. https://www.emeraldgrouppublishing.com/journal/idd/using-data-science-understand-coronavirus-pandemic
3. https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3574385
4. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7152888/
5. https://arxiv.org/abs/2004.03925