
SOCIAL MEDIA ANALYTICS

END TERM PROJECT

Group 2:
Akshat Sharma (PGP18007)
K Sunder (PGP18038)
Srijan Chauhan (PGP18151)
Toshar Chaudhary (PGP18248)
STRUCTURED DATA ANALYSIS
Twitter Activity Data of Donald Trump’s Inauguration
CRISP-DM Perspective
Business Understanding

Data Understanding

Data Preparation

Modeling

Evaluation

Deployment
STAGE ONE – DETERMINE BUSINESS OBJECTIVES
• The first stage of the CRISP-DM process is to understand what you want to
accomplish from a business perspective. Your organization may have competing
objectives and constraints that must be properly balanced.
• What are the desired outputs of the project?
Objective – Analysis of the tweets posted during the US presidential inauguration, and how its findings can be capitalised on for subsequent inaugurations.
Produce project plan – Selection of the dataset; selection of the tool; analysis based on different time periods, types of tweet and prominent keywords.
Business success criteria – A) Determination of the day of the week with the highest popularity. B) Determination of the hour of the day with the highest popularity. C) Determination of the types of tweet and the prominent keywords.
STAGE TWO – DATA UNDERSTANDING
• The second stage of the CRISP-DM process requires you to acquire
the data listed in the project resources.
Initial data collection report - The data sources acquired together
with their locations, the methods used to acquire them and any
problems encountered.
Data Description report – The data acquired is in .csv format.
Surface features – 15,000 records across 16 fields.
Verify data quality – The data is complete and contains no missing values or errors.
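The "Verify data quality" step can be checked mechanically. The sketch below uses only Python's standard `csv` module and assumes the file has a header row; it counts records, fields and blank values per field, which is one way a "15,000 records across 16 fields, no missing values" statement could be confirmed. The file name in the comment is hypothetical, since the deck does not name its CSV.

```python
from __future__ import annotations

import csv
import io
from typing import Iterable


def profile_csv(lines: Iterable[str]) -> tuple[int, int, dict[str, int]]:
    """Return (record count, field count, missing-value count per field)."""
    reader = csv.DictReader(lines)
    fields = list(reader.fieldnames or [])
    missing = {name: 0 for name in fields}
    n_records = 0
    for row in reader:
        n_records += 1
        for name in fields:
            value = row.get(name)
            # Treat an absent field (short row) or a blank string as missing.
            if value is None or value.strip() == "":
                missing[name] += 1
    return n_records, len(fields), missing


# Tiny in-memory sample for illustration; in practice pass an open file handle,
# e.g. open("tweets.csv", newline="", encoding="utf-8") (hypothetical file name).
sample = io.StringIO("id,text,created_at\n1,hello,2017-01-20\n2,,2017-01-21\n")
print(profile_csv(sample))
# (2, 3, {'id': 0, 'text': 1, 'created_at': 0})
```

A "complete" dataset in this sense is one where every entry in the returned dictionary is zero.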
STAGE THREE – DATA PREPARATION AND MODELLING
• The dataset taken:
• Around 15,000 tweets spread across the week following Donald Trump’s inauguration, formatted to suit our needs.

• Data Cleaning:
• Removal of the following from the tweets:
• Hashtags
• URLs
• Special characters
• Punctuation
• Conversion of the text to lowercase
• Timestamp corrections.
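The text-cleaning steps above can be sketched with regular expressions. This is a minimal sketch under stated assumptions: the deck does not say how each step was implemented, so the patterns below are illustrative, hashtags are assumed to be removed as whole tokens, and the timestamp corrections are not covered.

```python
import re

# Assumed patterns: the deck lists the cleaning steps but not how they were implemented.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
HASHTAG_RE = re.compile(r"#\w+")            # drops the whole hashtag token, not just "#"
SPECIAL_RE = re.compile(r"[^A-Za-z0-9\s]")  # punctuation, "@", emoji; also non-ASCII letters


def clean_tweet(text: str) -> str:
    """Apply the listed cleaning steps to one tweet and return the cleaned text."""
    text = URL_RE.sub(" ", text)      # URLs first, because they contain punctuation
    text = HASHTAG_RE.sub(" ", text)  # hashtags before "#" is stripped as a special character
    text = SPECIAL_RE.sub(" ", text)  # special characters and punctuation
    text = text.lower()               # lowercase
    return " ".join(text.split())     # collapse the leftover whitespace


print(clean_tweet("RT @user: Watching the #Inauguration now!!! https://t.co/abc"))
# rt user watching the now
```

The order matters: stripping punctuation before URLs would break the URL pattern, and stripping "#" before hashtags would leave the hashtag word behind.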
Inauguration of Donald Trump – CRISP-DM
Business Scenario
• Analyze users’ tweets during the inauguration of the US President in January 2017
• Finding out the most popular times at which people react
• Finding out what kind of tweets users are sending out – retweets or organic tweets
• Targeting specific user audiences active during specific times of day or days of the week
• Positioning our own tweets in the time periods when these trends rise again
STAGE FIVE – EVALUATION AND REVIEW
• During this step, we assess the degree to which the model meets our business objectives and seek to determine whether there is some business reason why this model is deficient.

• Assessment of data mining results - Summarize assessment results in terms of business success criteria, including a final statement regarding whether the project already meets the initial business objectives.
• Approved models - After assessing models with respect to business success
criteria, the generated models that meet the selected criteria become the
approved models.
• List of possible actions - List the potential further actions, along with the reasons
for and against each option.
• Decision - Describe the decision as to how to proceed, along with the rationale.
Analysis – Day of Week
Sunday and Thursday were the most popular days for users to tweet about Trump’s inauguration.
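The day-of-week comparison reduces to grouping tweets by weekday and counting. A minimal sketch, assuming the corrected timestamps are ISO-8601 strings already in the time zone of interest; the deck does not state the timestamp format, and the sample timestamps are illustrative, not project data.

```python
from collections import Counter
from datetime import datetime
from typing import Iterable

# Locale-independent day names, indexed by datetime.weekday() (Monday == 0).
DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]


def tweets_per_weekday(timestamps: Iterable[str]) -> Counter:
    """Count tweets per day of the week from ISO-8601 timestamp strings."""
    return Counter(DAY_NAMES[datetime.fromisoformat(ts).weekday()] for ts in timestamps)


# Illustrative timestamps only (20 January 2017 was a Friday).
counts = tweets_per_weekday([
    "2017-01-20T12:00:00",
    "2017-01-22T10:00:00",
    "2017-01-22T21:30:00",
    "2017-01-26T19:00:00",
])
print(counts.most_common(1))
# [('Sunday', 2)]
```

Since the dataset covers a single week, each weekday corresponds to exactly one calendar date, so this comparison cannot separate weekday effects from events on that particular day.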
Analysis – Hour of Day
Users tweet more during evening and night hours. This suggests that users prefer not to tweet during work hours.
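The hour-of-day finding can be expressed as a histogram over hours plus the share of tweets outside work hours. A sketch under two assumptions not stated in the deck: timestamps are ISO-8601 strings in the users' local time, and "work hours" means 09:00 to 16:59.

```python
from collections import Counter
from datetime import datetime
from typing import Iterable

# Assumed definition of work hours (09:00 to 16:59 local time); the deck does not define it.
WORK_HOURS = range(9, 17)


def tweets_per_hour(timestamps: Iterable[str]) -> Counter:
    """Count tweets per hour of the day (0 to 23) from ISO-8601 timestamp strings."""
    return Counter(datetime.fromisoformat(ts).hour for ts in timestamps)


def share_outside_work_hours(hour_counts: Counter) -> float:
    """Fraction of tweets posted outside WORK_HOURS."""
    total = sum(hour_counts.values())
    if total == 0:
        return 0.0
    outside = sum(n for hour, n in hour_counts.items() if hour not in WORK_HOURS)
    return outside / total


hours = tweets_per_hour([
    "2017-01-21T20:15:00",
    "2017-01-21T22:40:00",
    "2017-01-23T11:00:00",
    "2017-01-23T20:59:59",
])
print(share_outside_work_hours(hours))
# 0.75
```

Because US users span several time zones, converting to local time before bucketing matters; with UTC timestamps, an evening peak on the East Coast would appear after midnight.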
Analysis – Type of Tweets
The majority of the tweets are retweets, and organic tweets are comparatively few. This suggests that users put little effort into writing their own tweets.
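Splitting tweets into retweets and organic tweets needs only a labelling rule. A minimal sketch, assuming retweets are recognised by the conventional "RT @" prefix in the raw text; if the dataset has a dedicated retweet field, that field would be the more reliable signal.

```python
from collections import Counter
from typing import Iterable


def tweet_type(raw_text: str) -> str:
    """Label one tweet as 'retweet' or 'organic'.

    Assumption: retweets are recognised by the conventional "RT @" prefix in the raw
    text, so this must run before cleaning (which strips "@" and lowercases "RT").
    """
    return "retweet" if raw_text.lstrip().startswith("RT @") else "organic"


def type_breakdown(raw_texts: Iterable[str]) -> Counter:
    """Count retweets and organic tweets."""
    return Counter(tweet_type(text) for text in raw_texts)


print(type_breakdown(["RT @a: watching now", "My own take on today", "RT @b: wow"]))
# Counter({'retweet': 2, 'organic': 1})
```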
Analysis – Prominent Keywords
The most prominent keywords identified were ‘Inauguration’ and ‘Trump’. The data suggests that people are talking more about the process than the person.
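Prominent keywords are the most frequent tokens once stopwords are excluded. A sketch assuming the tweets have already been cleaned and lowercased; the stopword list is a small hypothetical one for illustration, and the sample tweets are not project data.

```python
from __future__ import annotations

from collections import Counter
from typing import Iterable

# Small illustrative stopword list (hypothetical); a fuller list would normally be used.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "is", "in", "on", "for", "rt", "it", "this"}


def top_keywords(cleaned_tweets: Iterable[str], n: int = 10) -> list[tuple[str, int]]:
    """Return the n most frequent non-stopword tokens across already-cleaned tweets."""
    counts = Counter(
        word
        for tweet in cleaned_tweets
        for word in tweet.split()
        if word not in STOPWORDS
    )
    return counts.most_common(n)


print(top_keywords(["rt watching the inauguration", "trump inauguration today", "inauguration of trump"], n=2))
# [('inauguration', 3), ('trump', 2)]
```

Note that if hashtags are removed as whole tokens during cleaning, words appearing only as hashtags (for example `#Inauguration`) do not contribute to these counts.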
UNSTRUCTURED DATA ANALYSIS
Sentiment Analysis of Twitter Reactions to Avengers: Endgame
CRISP-DM Perspective
Business Understanding
Decode the sentiment of netizens towards the film ‘Avengers: Endgame’ using unstructured data from Twitter.

Data Understanding
The dataset contained the following information for the period under
analysis.
• Tweet text
• Whether the tweet was retweeted and, if so, how many times
• Tweet creation date
• Tweet creator name
Data Preparation
• Removal of the following from the tweets:
• Hashtags
• URLs
• Special characters
• Removal of punctuation
• Conversion of all characters to lowercase
• Removal of stopwords
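The preparation steps above can be chained into one function that returns tokens. A minimal sketch: the regex patterns and the tiny stopword set are hypothetical stand-ins (the deck names neither), and URLs are handled before hashtags because a URL can itself contain "#".

```python
from __future__ import annotations

import re
import string

URL_RE = re.compile(r"https?://\S+|www\.\S+")
HASHTAG_RE = re.compile(r"#\w+")
NON_ALNUM_RE = re.compile(r"[^A-Za-z0-9\s]")
PUNCT_TABLE = str.maketrans("", "", string.punctuation)
# Hypothetical minimal stopword set for illustration only.
STOPWORDS = {"i", "it", "the", "a", "an", "and", "is", "was", "this", "to", "of", "so"}


def preprocess(text: str) -> list[str]:
    """Clean one tweet following the listed steps and return its tokens."""
    text = URL_RE.sub(" ", text)        # URLs
    text = HASHTAG_RE.sub(" ", text)    # hashtags (whole token)
    text = text.translate(PUNCT_TABLE)  # ASCII punctuation, deleted in place ("don't" -> "dont")
    text = NON_ALNUM_RE.sub(" ", text)  # remaining special characters, e.g. emoji
    text = text.lower()                 # lowercase
    return [word for word in text.split() if word not in STOPWORDS]  # stopwords


print(preprocess("I LOVED it!!! #AvengersEndgame was epic \U0001F525 https://t.co/xyz"))
# ['loved', 'epic']
```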
Modeling
• Around 15,000 tweets containing the hashtag #AvengersEndgame
• Sentiment analysis performed on the data to gauge users’ opinions of the movie
• Data analyzed for user behavior –
• Positive or negative reaction
• Most popular words
• Generating word clouds for popular words
• Categorizing tweets into positive and negative
• Finding common words in positive and negative tweets
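Categorizing tweets as positive or negative and then finding the common words in each group can be sketched with a simple lexicon count. This swaps in the most basic lexicon-based method with hypothetical word lists, because the deck does not say which sentiment tool or lexicon was used; tweets with no net polarity are labelled "neutral" here as an added assumption.

```python
from __future__ import annotations

from collections import Counter

# Hypothetical polarity word lists for illustration; the deck does not name its lexicon or tool.
POSITIVE = {"love", "loved", "amazing", "epic", "best", "great"}
NEGATIVE = {"boring", "bad", "worst", "disappointed", "hate"}


def polarity(tokens: list[str]) -> str:
    """Classify a tokenised tweet by (positive hits - negative hits); ties are 'neutral'."""
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def common_words_by_polarity(tweets: list[list[str]], n: int = 5) -> dict[str, list[tuple[str, int]]]:
    """Group tweets by polarity and return the n most common words in each group."""
    groups: dict[str, Counter] = {"positive": Counter(), "negative": Counter(), "neutral": Counter()}
    for tokens in tweets:
        groups[polarity(tokens)].update(tokens)
    return {label: counter.most_common(n) for label, counter in groups.items()}


tweets = [["loved", "endgame", "epic"], ["endgame", "boring"], ["watched", "endgame"]]
print([polarity(t) for t in tweets])
# ['positive', 'negative', 'neutral']
```

Words such as "endgame" appear in every group, which is why comparing common words across positive and negative tweets is more informative than either list alone.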
Data Evaluation
Most frequently used words
For the period under analysis, at least 5 of the 10 most frequently used words are associated with the film, suggesting that the film’s promotion was on the right track.
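The "at least 5 of the top 10" check amounts to intersecting the top-k word list with a set of film-related terms. A sketch in which both the term set and the counts are hypothetical, since the deck does not list which words it treated as film-related.

```python
from __future__ import annotations

from collections import Counter

# Hypothetical set of film-related terms; the deck does not list which words it counted.
FILM_TERMS = {"avengers", "endgame", "marvel", "thanos", "ironman", "spoilers"}


def film_related_in_top(word_counts: Counter, k: int = 10) -> list[str]:
    """Return the film-related words among the k most frequent words, in rank order."""
    top = [word for word, _ in word_counts.most_common(k)]
    return [word for word in top if word in FILM_TERMS]


counts = Counter({"endgame": 50, "avengers": 40, "movie": 30, "thanos": 20, "today": 10})
print(film_related_in_top(counts, k=3))
# ['endgame', 'avengers']
```

With the real counts, the claim holds when `len(film_related_in_top(counts))` is at least 5; the result depends directly on how the term set is chosen.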
Most frequent words, in order
Bar chart of the 20 most common words
Popular Words
Comparison of popular words and sentiments

The popular words have been mapped against the sentiments they carry. The words related to the film ‘Avengers: Endgame’ lean mostly towards ‘Joy’ and ‘Trust’.
Comparison of popular words against sentiments
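Mapping popular words to emotions such as ‘Joy’ and ‘Trust’ can be done by summing word frequencies per emotion in a word-emotion lexicon. These labels match the categories of lexicons such as the NRC Emotion Lexicon, which covers eight emotions including joy and trust, but the deck does not say which lexicon it used. The dictionary below is a hypothetical stand-in, not real lexicon entries.

```python
from __future__ import annotations

from collections import Counter

# Hypothetical word-to-emotion entries for illustration only; a real analysis would load a
# full emotion lexicon instead of this hand-written dictionary.
EMOTION_LEXICON: dict[str, set[str]] = {
    "love": {"joy", "trust"},
    "epic": {"joy"},
    "cry": {"sadness"},
}


def emotion_profile(word_counts: Counter) -> Counter:
    """Sum word frequencies per emotion; words absent from the lexicon are ignored."""
    profile: Counter = Counter()
    for word, count in word_counts.items():
        for emotion in EMOTION_LEXICON.get(word, set()):
            profile[emotion] += count
    return profile


print(emotion_profile(Counter({"love": 3, "epic": 2, "cry": 1, "endgame": 5})).most_common(2))
# [('joy', 5), ('trust', 3)]
```

Film-specific words such as "endgame" usually have no lexicon entry, so the emotion profile reflects only the general-vocabulary words surrounding them.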
