
Business Data Management
DISC 325
SUBMITTED TO:
DR. USSAMA YAQUB
SUBMITTED BY:
(GROUP 14)
HARIS SAEED 21110274
SYED QAIM ABBAS 21110264
TALHA REHAN 21110351
HANAN NAUSHAHI 22110296
Data Extraction

➢ The dataset consists of multiple JSON objects, written one per line.
➢ A for loop skips blank lines, checks whether each remaining line is a
JSON object, and parses it with json.loads, which yields one tweet as a
JSON object.
➢ Each parsed tweet is appended to a list with tweets.append.
➢ The list of tweets is then loaded into pandas; json_normalize flattens
the nested JSON objects into columns, and the resulting dataframe is
used for further cleaning.
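The steps above can be sketched roughly as follows. This is a minimal illustration, not the group's actual code: the helper name load_tweets and the in-memory sample lines are assumptions made up for the example, and the real dataset would be read from a file instead.

```python
import io
import json

import pandas as pd


def load_tweets(lines):
    """Parse an iterable of text lines holding one JSON object per line."""
    tweets = []
    for line in lines:
        line = line.strip()
        if not line:  # skip blank lines
            continue
        try:
            tweets.append(json.loads(line))  # one tweet as a dict
        except json.JSONDecodeError:
            continue  # skip lines that are not valid JSON
    return tweets


# Made-up sample: two tweets, one blank line and one malformed line.
sample = io.StringIO(
    '{"id": 1, "text": "hello", "user": {"screen_name": "a", "verified": false}}\n'
    "\n"
    "not json\n"
    '{"id": 2, "text": "world", "user": {"screen_name": "b", "verified": true}}\n'
)

tweets = load_tweets(sample)

# json_normalize flattens nested objects into dotted column names,
# for example "user.verified".
df = pd.json_normalize(tweets)
print(df.columns.tolist())
```

With a real file, the same helper could be called as load_tweets(open(path, encoding="utf-8")), where path is whatever location the dataset is stored at.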
Data Extraction (contd.)

The figure shows the head of the initial dataframe (df) as first extracted, consisting of 378 columns and 65795 rows.
Data Cleaning
A dataframe (ddf) was extracted from df consisting of the following variables:
➢ Tweet created at (date and time)
➢ Tweet id
➢ Text of tweet
➢ Source of tweet (Android phone, iOS phone, Windows etc.)
➢ The username of the author of the tweet
➢ The user id of the author of the tweet
➢ The user description
➢ User’s URL
➢ No. of followers of the user
➢ Friends count for the user (no. of users this user is following)
➢ Favorites (cumulative likes the user's tweets have received)
➢ Statuses count (no. of tweets the user has made since creation)
➢ User created at (date and time)
➢ Location of the user
➢ Language of the user’s tweet
➢ User verification info (true or false)
➢ User protection info (true or false)
➢ The number of retweets for the tweet made by the user
Exploratory Data analysis

The dataframe ddf, a subset of the original pandas dataframe df, was analyzed to give
the following results:
➢ There are 38400 unique users.
➢ Of these 38400 unique users, 291 are verified and 38109 are unverified.
➢ The total number of tweets is 64511.
➢ Of these tweets, 49306 are retweets.
➢ 18077 tweets contain a URL.
➢ There are 56600 mentions (@) in total within the text field.
➢ Of all the tweets, 7911 are replies.
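Counts like the ones above could be computed along these lines. The slides do not say how retweets, URLs, mentions or replies were detected, so the rules used here (text starting with "RT @", an http(s) link in the text, @handles in the text, a non-null in_reply_to_status_id) are assumptions, as are the column names and the four made-up rows.

```python
import pandas as pd

# Made-up stand-in for ddf; column names are assumed.
ddf = pd.DataFrame(
    {
        "user.id": [1, 1, 2, 3],
        "user.verified": [True, True, False, False],
        "text": [
            "RT @joebiden: vote https://t.co/x",
            "hello @realdonaldtrump and @joebiden",
            "no mentions here",
            "@someone replying to you",
        ],
        "in_reply_to_status_id": [None, None, None, 42.0],
    }
)

# One row per user, so verified users are not counted once per tweet.
users = ddf.drop_duplicates("user.id")

summary = {
    "unique_users": users["user.id"].nunique(),
    "verified_users": int(users["user.verified"].sum()),
    "total_tweets": len(ddf),
    "retweets": int(ddf["text"].str.startswith("RT @").sum()),
    "tweets_with_url": int(ddf["text"].str.contains(r"https?://", regex=True).sum()),
    "mentions": int(ddf["text"].str.count(r"@\w+").sum()),
    "replies": int(ddf["in_reply_to_status_id"].notna().sum()),
}
print(summary)
```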
Tweets Count By Device

The "source" field contained URLs that embed the name of the device used to post each tweet. Code was
written to discard the URL and keep only the device name in the "source" field. This visualization shows
the tweet count per device in our dataframe.
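A possible version of that extraction step is sketched below. It assumes the "source" values are HTML anchor tags with the device name as the link text, which the slides do not show explicitly; the helper name device_name and the sample values are made up for the illustration.

```python
import re
from collections import Counter

# Captures the visible text between <a ...> and </a>.
ANCHOR_TEXT = re.compile(r"<a[^>]*>(.*?)</a>", re.IGNORECASE | re.DOTALL)


def device_name(source):
    """Return the link text of an anchor tag, or the raw value if no tag is found."""
    source = source or ""
    match = ANCHOR_TEXT.search(source)
    return match.group(1).strip() if match else source.strip()


# Made-up sample values in the assumed format.
sources = [
    '<a href="http://twitter.com/download/android" rel="nofollow">Twitter for Android</a>',
    '<a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a>',
    '<a href="http://twitter.com/download/android" rel="nofollow">Twitter for Android</a>',
]

counts = Counter(device_name(s) for s in sources)
print(counts.most_common())
```

The resulting counts are what a bar chart of tweets per device would be drawn from.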
Frequency of Tweets Over 22 Minutes

The dataframe contained a "created_at" field holding the date and time of each tweet. The field was
converted to the "datetime" datatype and the frequency of tweets over the 22 minutes (19:01-19:23) was plotted.
The plot shows the frequency staying at around 3000 tweets per minute across the whole period.
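The conversion and per-minute count could look roughly like this. The timestamp format string and the sample dates are assumptions chosen for the illustration, not values taken from the dataset.

```python
import pandas as pd

# Made-up timestamps in an assumed "Wed Jan 01 19:01:05 +0000 2020" style.
ddf = pd.DataFrame(
    {
        "created_at": [
            "Wed Jan 01 19:01:05 +0000 2020",
            "Wed Jan 01 19:01:40 +0000 2020",
            "Wed Jan 01 19:02:10 +0000 2020",
            "Wed Jan 01 19:04:59 +0000 2020",
        ]
    }
)

# Convert the strings to timezone-aware datetimes.
ddf["created_at"] = pd.to_datetime(
    ddf["created_at"], format="%a %b %d %H:%M:%S %z %Y"
)

# Count tweets in each one-minute bin; empty minutes appear as 0.
per_minute = ddf.set_index("created_at").resample("1min").size()
print(per_minute)
```

Calling per_minute.plot() would then give a line chart of tweets per minute, provided matplotlib is installed.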
Tweets Count by Language

The following graph shows the tweet count by language. As the graph makes clear, almost all the tweets are in
English, followed by undefined language, then Spanish and French.
Sentiment Analysis

➢ Sentiment analysis with the textblob library is used to determine the sentiment of a given text.

➢ To check people's sentiment towards Donald Trump and Joe Biden, the whole dataframe (ddf) was divided
into two dataframes, Trump_data and Biden_data, based on mentions of @realdonaldtrump and @joebiden
in the “text” field of ddf.

➢ Using the textblob library, the polarity of each tweet in Trump_data and Biden_data was determined.
Polarity ranges from -1 to +1. Negative polarity indicates negative sentiment, positive polarity
indicates positive sentiment, and a polarity of 0 is considered neutral.

➢ Based on this, two new fields were added to Trump_data and Biden_data: Sentiment polarity
(a numeric variable) and Overall Sentiment (“negative”, “positive” or “neutral”). Rows with
neutral sentiment were excluded from the dataframes, as they can add noise to the overall analysis.
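The split-and-label procedure described above might be sketched as follows. The helper names (label_sentiment, textblob_polarity, candidate_rows) and the demo tweets are made up for the illustration. The demo uses hand-assigned stand-in scores so that it runs without textblob installed; on real data the scorer would be textblob_polarity, which relies on TextBlob(text).sentiment.polarity.

```python
def label_sentiment(polarity):
    """Map a polarity in [-1, 1] to an overall sentiment label."""
    if polarity > 0:
        return "positive"
    if polarity < 0:
        return "negative"
    return "neutral"


def textblob_polarity(text):
    """Polarity from TextBlob; requires the third-party textblob package."""
    from textblob import TextBlob  # lazy import so the demo below runs without it

    return TextBlob(text).sentiment.polarity


def candidate_rows(texts, handle, scorer):
    """Keep tweets mentioning `handle`, score them, and drop neutral ones."""
    rows = []
    for text in texts:
        if handle.lower() not in text.lower():
            continue
        polarity = scorer(text)
        sentiment = label_sentiment(polarity)
        if sentiment != "neutral":
            rows.append({"text": text, "polarity": polarity, "sentiment": sentiment})
    return rows


# Made-up tweets with hand-assigned stand-in scores (not TextBlob output).
demo_scores = {
    "@realDonaldTrump great rally tonight": 0.8,
    "@JoeBiden terrible answer": -1.0,
    "@JoeBiden and @realDonaldTrump debate at 9": 0.0,
    "nothing to see": 0.5,
}
tweets = list(demo_scores)

trump_rows = candidate_rows(tweets, "@realdonaldtrump", demo_scores.get)
biden_rows = candidate_rows(tweets, "@joebiden", demo_scores.get)
print(trump_rows)
print(biden_rows)
```

The tweet that mentions both handles would land in both subsets if it were not neutral; whether the original analysis handled such tweets differently is not stated in the slides.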
Overall Sentiment Visualization

Donald Trump Joe Biden


Devices Used by Trump’s Supporters
Devices Used by Biden’s Supporters
Comparison of Trump’s and Biden’s Overall Sentiment

The dataset indicated more support and less negative sentiment for Trump than for Biden. In Trump's
dataframe, 70.30% of tweets support him and 29.69% have a negative sentiment. Similarly, 63.38% of tweets in
Biden's dataframe support him and 36.62% have a negative sentiment.
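Percentages of this kind follow from the label counts once neutral rows are removed. A small sketch, with a made-up list of labels and a hypothetical helper name sentiment_shares:

```python
from collections import Counter


def sentiment_shares(labels):
    """Return each label's share of the total, in percent, rounded to 2 places."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: round(100 * n / total, 2) for label, n in counts.items()}


# Made-up labels: 7 positive and 3 negative tweets.
labels = ["positive"] * 7 + ["negative"] * 3
shares = sentiment_shares(labels)
print(shares)
```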
Most Positive Tweets in Trump’s Data
Most Negative Tweets in Trump’s Data
Most Positive Tweets in Biden’s Data
Most Negative Tweets in Biden’s Data
Influence of Trump’s and Biden’s Supporters on Twitter

➢ It was hypothesized, from personal observation on social media, that most publicly followed figures are
Democrats. An analysis of the data for both Trump's and Biden's supporters supported this: the verified
ratio was 0.74 for Biden supporters and 0.11 for Trump supporters.
➢ The analysis of the user favorites field showed that Biden's supporters had a lower average of all-time
likes (42266) than Trump's supporters (around 50608). This suggests that Trump's supporters might have
more influence on Twitter than Biden's supporters.
➢ A similar analysis of average followers pointed the same way: Biden supporters had 2510 followers on
average and Trump supporters had 3634. This also suggests that Trump's supporters might have more
influence on Twitter than Biden's supporters.
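The three comparisons above could be computed with one grouped aggregation, as in the sketch below. The slides do not define how a "supporter" was identified or how the verified ratio was scaled, so the table of supporters, its column names and its values are all assumptions made up for the illustration.

```python
import pandas as pd

# Made-up supporters table; one row per positive tweet, columns assumed.
supporters = pd.DataFrame(
    {
        "candidate": ["Trump", "Trump", "Trump", "Biden", "Biden"],
        "user.id": [1, 1, 2, 3, 4],
        "user.verified": [False, False, True, True, False],
        "user.followers_count": [100, 100, 300, 50, 150],
        "user.favourites_count": [10, 10, 30, 20, 40],
    }
)

# Keep one row per user and candidate so frequent posters are not over-weighted.
per_user = supporters.drop_duplicates(["candidate", "user.id"])

influence = per_user.groupby("candidate").agg(
    verified_share=("user.verified", "mean"),  # fraction of verified users
    avg_followers=("user.followers_count", "mean"),
    avg_favourites=("user.favourites_count", "mean"),
)
print(influence)
```

Deduplicating by user before averaging is a design choice of this sketch; averaging over tweets instead would weight prolific accounts more heavily.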
Word Cloud of Trump Data
Word Cloud of Biden Data
THANK YOU!
