Social Media Analytics: End Term Project
Group 2:
Akshat Sharma (PGP18007)
K Sunder (PGP18038)
Srijan Chauhan (PGP18151)
Toshar Chaudhary (PGP18248)
STRUCTURAL DATA ANALYSIS
Twitter Activity Data of Donald Trump’s Inauguration
CRISP-DM Perspective
Business Understanding
Data Understanding
Data Preparation
Modeling
Evaluation
Deployment
STAGE ONE – DETERMINE BUSINESS OBJECTIVES
• The first stage of the CRISP-DM process is to understand what you want to
accomplish from a business perspective. Your organization may have competing
objectives and constraints that must be properly balanced.
• What are the desired outputs of the project?
Objective - Analyse the tweets posted during the US presidential inauguration
so that the findings can be capitalised on for subsequent inaugurations.
Produce project plan - Select the dataset, select the tool, and analyse the
tweets by time period, tweet type and prominent keywords.
Business success criteria - A) Determine the day of the week with the highest
popularity. B) Determine the hour of the day with the highest popularity. C)
Determine the types of tweet and the prominent keywords.
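Success criteria A and B can be illustrated with a minimal sketch. It assumes each tweet carries a creation timestamp in the form `YYYY-MM-DD HH:MM:SS`; the format string, the function name `busiest_day_and_hour` and the sample values are illustrative assumptions, not taken from the project dataset.

```python
from collections import Counter
from datetime import datetime

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]

def busiest_day_and_hour(timestamps, fmt="%Y-%m-%d %H:%M:%S"):
    """Return (weekday name, hour) with the most tweets.

    `fmt` is an assumed timestamp format; adjust it to the real data.
    """
    parsed = [datetime.strptime(ts, fmt) for ts in timestamps]
    day_counts = Counter(WEEKDAYS[dt.weekday()] for dt in parsed)
    hour_counts = Counter(dt.hour for dt in parsed)
    return day_counts.most_common(1)[0][0], hour_counts.most_common(1)[0][0]

# Made-up sample timestamps, for illustration only.
sample_timestamps = [
    "2017-01-20 17:05:00",
    "2017-01-20 17:45:10",
    "2017-01-21 09:12:30",
]
top_day, top_hour = busiest_day_and_hour(sample_timestamps)
print(top_day, top_hour)
```

Here tweet volume stands in for "popularity"; retweet counts could be summed per day or hour instead if that is the intended measure.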
STAGE TWO – DATA UNDERSTANDING
• The second stage of the CRISP-DM process requires you to acquire
the data listed in the project resources.
Initial data collection report - The data sources acquired together
with their locations, the methods used to acquire them and any
problems encountered.
Data description report – The data acquired is in .csv format.
Surface features – 15,000 records across 16 fields.
Verify data quality – The data is complete and contains no missing
values or errors.
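The data quality check can be reproduced with a short script that counts empty cells per column. This is a sketch under assumptions: the column names in the sample are hypothetical, and an in-memory string stands in for the real .csv file.

```python
import csv
import io

def count_missing(csv_file):
    """Return (row count, {column: number of empty cells}) for a CSV file object."""
    reader = csv.DictReader(csv_file)
    fieldnames = reader.fieldnames or []
    missing = {name: 0 for name in fieldnames}
    rows = 0
    for row in reader:
        rows += 1
        for name in fieldnames:
            value = row.get(name)
            # A cell counts as missing if it is absent or blank.
            if value is None or value.strip() == "":
                missing[name] += 1
    return rows, missing

# Hypothetical columns and values, for illustration only.
sample_csv = io.StringIO(
    "text,created,retweet_count\n"
    "Hello world,2017-01-20 17:05:00,3\n"
    "Another tweet,,0\n"
)
row_count, missing_counts = count_missing(sample_csv)
print(row_count, missing_counts)
```

For the real dataset, the same function can be called on a file opened with `open(path, newline="", encoding="utf-8")`; a result with all counts at zero would support the "no missing values" statement.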
STAGE THREE – DATA PREPARATION AND MODELLING
• The dataset taken:
• Around 15,000 tweets spread across the week following Donald Trump’s inauguration,
formatted to suit our needs.
• Data Cleaning:
• Removal of the following from the tweets:
• Hashtags
• URLs
• Special Characters
• Convert the text to lowercase
• Punctuation
• Timestamp corrections.
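The text-cleaning steps listed above could look roughly like the following. This is a sketch, not the project's actual code: the regular expressions are assumptions, and "removal of hashtags" is read here as dropping the whole hashtag token rather than only the `#` sign.

```python
import re

def clean_tweet(text):
    """Strip URLs, hashtags, punctuation and special characters; lowercase the rest."""
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)  # URLs
    text = re.sub(r"#\w+", " ", text)                   # whole hashtag tokens (assumption)
    text = text.lower()
    text = re.sub(r"[^a-z0-9\s]", " ", text)            # punctuation and special characters
    return " ".join(text.split())                       # collapse repeated whitespace

# Made-up tweet, for illustration only.
cleaned = clean_tweet("Watching the #Inauguration LIVE!!! https://example.com/live :)")
print(cleaned)
```

URLs are removed before punctuation so that a link is dropped as a unit instead of being split into fragments.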
Inauguration of Donald Trump – CRISP-DM
Business Scenario
• Analyze tweets of users during the inauguration of the US President in
January 2017
• Finding out the most popular times at which people react
• Finding out what kind of tweets users are sending out: retweets or
organic tweets
• Targeting the specific user audience that is active at particular times of day or
days of the week
• Positioning our own tweets in the time periods when these
trends rise again
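The split between retweets and organic tweets can be sketched as below. It assumes a retweet is recognisable by the conventional `RT @` prefix in the tweet text; if the dataset has a dedicated retweet flag column, that field would be the better signal.

```python
from collections import Counter

def tweet_type(text):
    """Label a tweet 'retweet' if it starts with 'RT @', otherwise 'organic' (assumed rule)."""
    return "retweet" if text.lstrip().startswith("RT @") else "organic"

# Made-up tweets, for illustration only.
sample_tweets = [
    "RT @someuser: Watching the ceremony now",
    "Big crowd on the mall today",
    "So many people talking about this",
]
type_counts = Counter(tweet_type(t) for t in sample_tweets)
print(type_counts)
```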
STAGE FIVE – EVALUATION AND REVIEW
• During this step, we assessed the degree to which the model meets our business
objectives and sought to determine whether there is some business reason why the model
is deficient.
Data Understanding
The dataset contained the following information for the period under
analysis.
• Tweet text
• Whether the tweet was retweeted and, if so, how many times
• Tweet creation date
• Tweet creator name
Data Preparation
• Removal of the following from the tweets:
• Hashtags
• URLs
• Special Characters
• Removal of punctuation
• Convert all characters into lowercase
• Removal of stopwords
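Stopword removal, the last step in the list above, can be sketched as follows. The stopword set here is a small hypothetical sample; a real analysis would normally use a fuller list from a text-processing library.

```python
# Hypothetical, deliberately short stopword list for illustration.
STOPWORDS = {"the", "a", "an", "is", "of", "to", "and", "in", "at"}

def remove_stopwords(text, stopwords=STOPWORDS):
    """Split already-cleaned, lowercased text on whitespace and drop stopwords."""
    return [word for word in text.split() if word not in stopwords]

tokens = remove_stopwords("watching the inauguration live in washington")
print(tokens)
```

The function expects text that has already been lowercased and stripped of punctuation, since the comparison against the stopword set is exact.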
Modeling
• Around 15000 tweets containing the hashtag #AvengersEndgame
• Sentiment analysis performed on the data to find out the user
reviews of the movie
• Data analyzed for user behavior –
• Positive or negative reaction
• Most popular words
• Generating word clouds for popular words
• Categorizing tweets into positive and negative
• Finding common words in positive and negative tweets
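The categorisation and common-word steps can be illustrated with a simple lexicon-based sketch. The slides do not say which sentiment method was used, so this is an assumption: the word lists are hypothetical, and a "neutral" label is added here for tweets that match neither list.

```python
from collections import Counter

# Hypothetical sentiment lexicons, for illustration only.
POSITIVE = {"great", "good", "love", "proud", "amazing"}
NEGATIVE = {"bad", "sad", "hate", "terrible", "awful"}

def classify(text):
    """Score a tweet by counting lexicon hits and return a sentiment label."""
    words = text.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

def common_words_by_sentiment(tweets, top_n=3):
    """Return the top_n most frequent words within each sentiment class."""
    counts = {"positive": Counter(), "negative": Counter(), "neutral": Counter()}
    for tweet in tweets:
        counts[classify(tweet)].update(tweet.lower().split())
    return {label: c.most_common(top_n) for label, c in counts.items()}

# Made-up tweets, for illustration only.
sample_tweets = [
    "great day proud moment",
    "terrible speech bad crowd",
    "watching the ceremony",
]
labels = [classify(t) for t in sample_tweets]
by_sentiment = common_words_by_sentiment(sample_tweets)
print(labels)
```

The per-class word counts are the kind of input a word cloud or a positive-versus-negative comparison chart would be built from.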
Data Evaluation
Most frequently used words
It is found that, for the period under
analysis, at least 5 of the top 10 most
frequently used words are confirmed to be
associated with the film, suggesting that
the promotion of the film was on the
right track.
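The check described above, counting how many of the most frequent words relate to the topic, can be sketched as follows. The token list and the set of topic terms are made-up placeholders, not the project's data or its actual keyword list.

```python
from collections import Counter

def top_words(tokens, n=10):
    """Return the n most frequent tokens, most frequent first."""
    return [word for word, _ in Counter(tokens).most_common(n)]

def topic_overlap(words, topic_terms):
    """Return the words that also appear in the set of topic terms."""
    return [word for word in words if word in topic_terms]

# Placeholder tokens and topic terms, for illustration only.
sample_tokens = "trump inauguration trump speech inauguration trump".split()
topic_terms = {"trump", "inauguration", "president"}

top = top_words(sample_tokens, n=3)
related = topic_overlap(top, topic_terms)
print(top, len(related))
```

With the real data, `n=10` and a hand-curated term set would reproduce the "at least 5 of the top 10" comparison.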
Most frequent words, put in order
Bar chart of 20 most common words
Popular Words
Comparison of popular words and sentiments