Defense Project Slide Final

A FINAL PROJECT DEFENSE
ON
Analysis of Twitter Data Mining on Genres of Netflix Web Series
Presented By
Biplove Pokhrel
MSCmKE007
Communication and Knowledge Engineering Program

Pashchimanchal Campus
Pokhara, Nepal
OUTLINE
• INTRODUCTION
• PROBLEM STATEMENT
• OBJECTIVES
• LITERATURE REVIEW
• METHODOLOGY
• RESULTS
• ANALYSIS
• REFERENCES
INTRODUCTION
• People are free to express their opinions.
• Different opinions carry different value.
• Social media has become one of the best places for people to
express their feelings.
• The comments and status updates we share in public shape our
perspective on products, modes of entertainment, and other
everyday things.
INTRODUCTION (contd.)
• Sentiment analysis is a text classification tool that analyses a
message and tells whether the underlying sentiment is
positive, negative, or neutral.
• Netflix has been a key social video-watching platform for the
past 5 years.
• This paper explores the most-watched genres and the
interaction between celebrities and their audience on
social media.
• This paper also visualizes Twitter metrics such as Popularity,
Follower Rank, and General Activity for various celebrities in
their respective genres and discusses how actively they
participate in discussions on Twitter.
INTRODUCTION
Entity | USER | TWEET
USER | Follows / is followed by | Mentions, replies to, posts, tweets, likes, retweets to
TWEET | Posted by, retweeted by, liked by, replied by | Replies / is replied from
PROBLEM STATEMENT
• Analysis of genres as a whole is rare these days.
• The genre-shift paradigm, which explains audience
interest, can be studied only by analysing the
sentiments and metrics found on social network
media.
• This kind of analysis can also indicate the
popularity of a genre and the particular interest of
the audience or followers in a genre on Netflix.
OBJECTIVES
• To perform sentiment analysis on tweets from the audience
or followers of the web series and their lead
actors/actresses.
• To analyse different social network metrics based on Twitter
data obtained through the Twitter API.
LITERATURE REVIEW
S.N. | Title | Published Year | Feature | Authors
1 | Measuring user influence on Twitter: A survey | 2016 AD | Different Twitter metrics to discuss the activity of popular users on Twitter | F. Riquelme and P. González-Cantergiani
2 | Sentiment Analysis Of Tweets Using Support Vector Machine | 2017 AD | Mining sentiments or emotions of tweets about politicians using Support Vector Machine | Sunman Rani, Jaswinder Singh
3 | Sentiment Analysis for Predicting the Popularity of Web Series | 2019 AD | Sentiment analysis for predicting the most popular web series among four cartoon series | Department of Computer Science and Technology, Manav Rachna University, Faridabad, India
4 | Sentiment analysis on movie review data using machine learning approach | 2019 AD | Sentiment analysis on movie reviews with different classifiers | A. Rahman and M. S. Hossen
Why Web Series, and not Movies or Comic Series?
• Movies are generally reviewed only once.
• The plot of a web series runs longer than that of a movie.
• Comic series appeal to a limited age group.
• Users want to engage in discussion with popular users as the
plot of a web series changes.
• A specific plot change in a series can become a major topic of
discussion.
METHODOLOGY
Fig: Workflow. Twitter Data Collection → Data Pre-processing → Feature Extraction and Selection → Features (Activity Metrics, Popularity Metrics, Sentiment Polarity Classifier) → Calculation of Metrics for Each Genre → Analysis of Genres
METHODOLOGY
Data Acquisition
• Twitter data sets are collected using the Tweepy library.
• Tweets from the Twitter handles of popular users of each genre
are collected, along with tweets under hashtags of popular keywords.
METHODOLOGY
Text Preprocessing
• Converting all letters to lower or upper case
• Converting numbers into words or removing numbers
• Removing punctuation, accent marks, and other diacritics
• Removing extra white space
• Expanding abbreviations
• Removing stop words, sparse terms, and particular words
• Removing URLs and unnecessary emojis
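The preprocessing steps listed above can be sketched with the standard library alone. This is an illustration under assumptions: the stop-word list and abbreviation table are tiny hypothetical samples, numbers are removed rather than spelled out, and the function name `preprocess` is not from the project.

```python
import string
import re
import unicodedata

# Tiny illustrative lists; a real run would use fuller resources.
STOP_WORDS = {"a", "an", "the", "is", "are", "to", "of", "and", "in", "it", "this"}
ABBREVIATIONS = {"gr8": "great", "u": "you", "pls": "please"}  # hypothetical table

# Map every ASCII punctuation character to a space.
PUNCT_TO_SPACE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def preprocess(tweet):
    """Return a list of cleaned tokens for one tweet."""
    text = tweet.lower()
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)        # remove URLs
    text = unicodedata.normalize("NFKD", text)                # split accents from letters
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = text.encode("ascii", "ignore").decode("ascii")     # drop emojis and other non-ASCII
    text = text.translate(PUNCT_TO_SPACE)                     # remove punctuation
    tokens = text.split()                                     # also collapses white space
    tokens = [ABBREVIATIONS.get(tok, tok) for tok in tokens]  # expand abbreviations
    tokens = [tok for tok in tokens if not tok.isdigit()]     # remove numbers
    return [tok for tok in tokens if tok not in STOP_WORDS]   # remove stop words


print(preprocess("Loved the finale of Série 2!! gr8 acting 😍 https://t.co/abc"))
# prints ['loved', 'finale', 'serie', 'great', 'acting']
```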
METHODOLOGY
Fig: Extraction of tweets
METHODOLOGY
Fig: Visualization of tokens
METHODOLOGY
Feature Extraction using BoW
• A bag-of-words model, or BoW for short, is a way of
extracting features from text for use in modeling, such as with
machine learning algorithms.
• The objective is to turn each text into a vector that we can use
as input or output for a machine learning model.
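A bag-of-words vector simply counts how often each vocabulary term occurs in a text. The sketch below shows this with the standard library; the two token lists are made-up examples and the function names are hypothetical.

```python
from collections import Counter


def build_vocabulary(token_lists):
    """Map each distinct term to a column index (alphabetical order)."""
    terms = sorted({tok for tokens in token_lists for tok in tokens})
    return {term: idx for idx, term in enumerate(terms)}


def to_vector(tokens, vocab):
    """Count occurrences of each vocabulary term in one tokenized text."""
    counts = Counter(tokens)
    vector = [0] * len(vocab)
    for term, n in counts.items():
        if term in vocab:          # terms outside the vocabulary are ignored
            vector[vocab[term]] = n
    return vector


docs = [["great", "show", "great", "cast"], ["boring", "show"]]
vocab = build_vocabulary(docs)
vectors = [to_vector(doc, vocab) for doc in docs]
print(vocab)    # {'boring': 0, 'cast': 1, 'great': 2, 'show': 3}
print(vectors)  # [[0, 1, 2, 1], [1, 0, 0, 1]]
```

Word order is discarded; only the counts per term survive, which is what makes the representation a "bag".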
METHODOLOGY
Fig: Corpus formation

METHODOLOGY
Generate Sentiments from Dictionary of Words
AFINN Sentiment Lexicon
• The AFINN lexicon is a list of English terms manually rated for valence with an
integer between -5 (negative) and +5 (positive) by Finn Årup Nielsen between 2009
and 2011.
• The original lexicon contains some multi-word phrases, but they are excluded here.
Source: http://www2.imm.dtu.dk/pubdb/views/publication_details.php?id=6010
METHODOLOGY
Fig: Sample of Words Rated in Sentiment Affinity
METHODOLOGY
• Each unigram is matched against the AFINN-111 lexicon and placed in one of four
categories: Very Positive, Positive, Negative, or Very Negative.
• Sentiment class division:
• If (Very_Positive_count + Positive_count) > (Very_Negative_count +
Negative_count), the tweet is classified as Positive.
• If (Very_Positive_count + Positive_count) < (Very_Negative_count +
Negative_count), the tweet is classified as Negative.
• If Very_Positive_count == Very_Negative_count == Positive_count ==
Negative_count, the tweet is classified as Neutral.
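The class-division rule can be sketched as follows. Several parts are assumptions: the valence scores in `LEXICON` are illustrative values in the AFINN range rather than entries copied from the lexicon file, the thresholds separating "very" from ordinary categories (±4) are not given in the slides, and every tie between the positive and negative totals is treated as Neutral, which is broader than the four-way equality stated above.

```python
from collections import Counter

# Illustrative valence scores in the AFINN range (-5..+5); not the real lexicon file.
LEXICON = {"superb": 5, "good": 3, "like": 2, "bad": -3, "boring": -3, "horrible": -4}


def categorize(valence):
    """Map a valence score to one of four categories (thresholds are assumed)."""
    if valence >= 4:
        return "very_positive"
    if valence > 0:
        return "positive"
    if valence <= -4:
        return "very_negative"
    if valence < 0:
        return "negative"
    return None  # words with no valence are not counted


def classify(tokens, lexicon=LEXICON):
    """Apply the count-based rule to one tokenized tweet."""
    counts = Counter()
    for tok in tokens:
        category = categorize(lexicon.get(tok, 0))
        if category:
            counts[category] += 1
    positive = counts["very_positive"] + counts["positive"]
    negative = counts["very_negative"] + counts["negative"]
    if positive > negative:
        return "Positive"
    if positive < negative:
        return "Negative"
    return "Neutral"  # assumption: any tie counts as Neutral


print(classify(["good", "superb", "bad"]))  # Positive
print(classify(["boring", "plot"]))         # Negative
print(classify(["superb", "show", "bad"]))  # Neutral
```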
METHODOLOGY
Machine Learning Model
• A Support Vector Machine (SVM) is a supervised model for classification and
regression problems.
• It can solve linear and, with kernels, non-linear problems and works well for many
practical problems.
• The idea of SVM is simple: the algorithm finds a line or hyperplane that
separates the data into classes.
• Training data set for the model: Rotten Tomatoes Movie Reviews
• Link: https://www.kaggle.com/stefanoleone992/rotten-tomatoes-movies-and-critic-reviews-dataset
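To make the hyperplane idea concrete, here is a minimal sketch of a linear SVM trained by sub-gradient descent on the hinge loss. It is not the project's model: the slides do not state the SVM implementation or kernel, the four training sentences are made up (not Rotten Tomatoes data), and the task is reduced to two classes (+1 positive, -1 negative) to keep the code short.

```python
def featurize(text, vocab):
    """Bag-of-words count vector for one text."""
    vector = [0.0] * len(vocab)
    for tok in text.lower().split():
        if tok in vocab:
            vector[vocab[tok]] += 1.0
    return vector


def train_linear_svm(X, y, epochs=50, lr=0.1, lam=0.01):
    """Minimize hinge loss plus an L2 penalty with per-example sub-gradient steps."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            w = [wj * (1.0 - lr * lam) for wj in w]       # L2 shrinkage
            if margin < 1.0:                              # inside the margin: push away
                w = [wj + lr * yi * xj for wj, xj in zip(w, xi)]
                b += lr * yi
    return w, b


def predict(text, w, b, vocab):
    """Return +1 or -1 depending on the side of the hyperplane."""
    score = sum(wj * xj for wj, xj in zip(w, featurize(text, vocab))) + b
    return 1 if score >= 0 else -1


# Hypothetical toy training set.
train = [("great fun show", 1), ("loved it great", 1),
         ("boring bad plot", -1), ("bad awful show", -1)]
terms = sorted({tok for text, _ in train for tok in text.split()})
vocab = {term: idx for idx, term in enumerate(terms)}
X = [featurize(text, vocab) for text, _ in train]
y = [label for _, label in train]
w, b = train_linear_svm(X, y)

print(predict("great fun", w, b, vocab))          # 1
print(predict("awful boring plot", w, b, vocab))  # -1
```

A three-class version (Positive, Negative, Neutral) could be built from several such binary classifiers, for example one per class against the rest.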
METHODOLOGY
USER METRICS CALCULATION
Metric | Definition
OT1 | Original tweets of the user
RP1 | Replies from the user
RT1 | Retweets from the user
FT1 | Favorite tweets from the user
GA | General Activity (OT1 + RP1 + RT1 + FT1)
F1 | Accounts that the user follows
F3 | Followers of the user
Follower Rank | Defined as F1/(F1+F3)
Popularity | Defined as in-links in the network
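The two formula-based metrics in the table can be computed directly from the counts. The sketch below uses the definitions exactly as given in the table; the account figures are hypothetical.

```python
def general_activity(ot1, rp1, rt1, ft1):
    """GA = original tweets + replies + retweets + favorite tweets."""
    return ot1 + rp1 + rt1 + ft1


def follower_rank(f1, f3):
    """Follower Rank = F1 / (F1 + F3), with F1 and F3 as defined in the table."""
    total = f1 + f3
    return f1 / total if total else 0.0  # avoid division by zero for empty accounts


# Hypothetical account: follows 500 accounts and has 999,500 followers.
ga = general_activity(ot1=1200, rp1=300, rt1=450, ft1=50)
fr = follower_rank(f1=500, f3=999_500)
print(ga)  # 2000
print(fr)  # 0.0005
```

With these definitions, an account that follows few others but has many followers gets a Follower Rank close to zero.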
RESULTS
CONFUSION MATRIX FOR DIFFERENT GENRES
(rows: actual class; columns: predicted class)

Genre 1
Actual | Predicted Positive | Predicted Negative | Predicted Neutral
Positive | 1145 | 0 | 9
Negative | 443 | 4556 | 58
Neutral | 31 | 0 | 3960

Genre 2
Actual | Predicted Positive | Predicted Negative | Predicted Neutral
Positive | 1244 | 0 | 6
Negative | 443 | 11628 | 58
Neutral | 10 | 0 | 4440

Genre 3
Actual | Predicted Positive | Predicted Negative | Predicted Neutral
Positive | 1756 | 0 | 9
Negative | 637 | 6671 | 57
Neutral | 39 | 0 | 4782

Genre 4
Actual | Predicted Positive | Predicted Negative | Predicted Neutral
Positive | 1708 | 0 | 9
Negative | 879 | 9034 | 70
Neutral | 44 | 0 | 7749
RESULTS
Genre 5 (rows: actual class; columns: predicted class)
Actual | Predicted Positive | Predicted Negative | Predicted Neutral
Positive | 2520 | 0 | 13
Negative | 889 | 10473 | 87
Neutral | 41 | 0 | 7379

Genre | Accuracy | Precision | Recall | F-Measure
Genre 1 | 0.946 | 0.9922 | 0.707 | 0.825
Genre 2 | 0.969 | 0.9952 | 0.7203 | 0.8357
Genre 3 | 0.9469 | 0.9942 | 0.7203 | 0.836
Genre 4 | 0.948 | 0.994 | 0.649 | 0.785
Genre 5 | 0.953 | 0.994 | 0.730 | 0.842
Fig: Evaluation metrics
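As a worked check, the evaluation metrics can be recomputed from a confusion matrix. The sketch below uses the Genre 1 matrix from the slides and the Positive class; the function names are hypothetical. For that class the row-wise rate is about 0.9922 and the column-wise rate about 0.7072, which correspond to the Precision and Recall values listed for Genre 1. Under the usual convention with actual classes in rows, the row-wise rate is called recall and the column-wise rate precision, so which label applies depends on how the matrix is oriented; the F-measure is the same either way.

```python
# Genre 1 confusion matrix; class order is Positive, Negative, Neutral.
GENRE_1 = [
    [1145, 0, 9],
    [443, 4556, 58],
    [31, 0, 3960],
]


def accuracy(matrix):
    """Share of all tweets that lie on the diagonal."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / sum(sum(row) for row in matrix)


def row_rate(matrix, k):
    """Diagonal cell divided by the total of row k."""
    return matrix[k][k] / sum(matrix[k])


def column_rate(matrix, k):
    """Diagonal cell divided by the total of column k."""
    return matrix[k][k] / sum(row[k] for row in matrix)


def f_measure(p, r):
    """Harmonic mean of the two rates."""
    return 2 * p * r / (p + r)


acc = accuracy(GENRE_1)
by_row = row_rate(GENRE_1, 0)
by_column = column_rate(GENRE_1, 0)
print(round(acc, 3), round(by_row, 4), round(by_column, 4))
# prints 0.947 0.9922 0.7072
print(f_measure(by_row, by_column))
```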
RESULTS
Genre | Classified Positive | Classified Negative | Classified Neutral
Genre 1 (Comedy) | 1145 | 4556 | 3960
Genre 2 (Drama) | 1244 | 8905 | 4440
Genre 3 (Sci-Fi) | 1708 | 9034 | 4782
Genre 4 (Romance) | 1756 | 6671 | 7749
Genre 5 (Action) | 2520 | 10473 | 7379
Fig: Data of Tweets showing Different Sentiment
RESULTS
Genre | General Activity | Mean Popularity | Mean Follower Rank
Genre 1 (Comedy) | 44556 | 1 | 0.0003
Genre 2 (Drama) | 34304 | 1 | 0.0004
Genre 3 (Sci-Fi) | 65078 | 1 | 0.0004
Genre 4 (Romance) | 51855 | 1 | 0.0004
Genre 5 (Action) | 56524 | 1 | 0.0005
Table 19: Data of Tweets of Different Genre and their metrics
RESULTS
Fig: Tweets Classification in Different Genre (bar chart of Classified Positive, Classified Negative, and Classified Neutral tweets for Genre 1 (Comedy) through Genre 5 (Action))
RESULTS
[Bar chart: General Activity for Genre 1 (Comedy), Genre 2 (Drama), Genre 3 (Sci-Fi), Genre 4 (Romance), and Genre 5 (Action)]
ANALYSIS
Finding | Genre
Genre with the most positive-sentiment tweets | Action
Genre with the most negative-sentiment tweets | Action
Genre with the most General Activity | Sci-Fi
Genre with the least General Activity | Drama
Genre whose celebrities have more chance to interact with people | Comedy
REFERENCES
• A. Pak and P. Paroubek, “Twitter for Sentiment Analysis: When Language Resources are Not Available,” 2011 22nd International Workshop on Database and Expert Systems Applications, 2011.
• A. U. Khan, M. Khan, and M. B. Khan, “Naïve Multi-label Classification of YouTube Comments Using Comparative Opinion Mining,” Procedia Computer Science, vol. 82, pp. 57–64, 2016.
• A. Rahman and M. S. Hossen, “Sentiment analysis on movie review data using machine learning approach,” in 2019 International Conference on Bangla Speech and Language Processing (ICBSLP), 2019, pp. 1–4.
• U. Kumari, A. K. Sharma, and D. Soni, “Sentiment analysis of smart phone product review using SVM classification technique,” 2017 International Conference on Energy, Communication, Data Analytics and Soft Computing (ICECDS), Chennai, India, 2017, pp. 1469–1474, doi: 10.1109/ICECDS.2017.8389689.
• F. Riquelme and P. González-Cantergiani, “Measuring user influence on Twitter: A survey,” Information Processing & Management, vol. 52, no. 5, pp. 949–975, 2016.
• M. Cha, H. Haddadi, F. Benevenuto, and K. Gummadi, “Measuring User Influence in Twitter: The Million Follower Fallacy,” 2010.
• J. Sun and J. Tang, “A Survey of Models and Algorithms for Social Influence Analysis,” Social Network Data Analytics, pp. 177–214, 2011.
• S. Abe, Support Vector Machines for Pattern Classification, Springer-Verlag London Limited, 2008, 350 pp.
• I. Steinwart and C. Scovel, “Fast rates for support vector machines using Gaussian kernels,” The Annals of Statistics, vol. 35, no. 2, pp. 575–607, 2007, doi: 10.1214/009053606000001226.
• A. Pal and S. Counts, “Identifying topical authorities in microblogs,” Proceedings of the fourth ACM international conference on Web search and data mining - WSDM '11, 2011.
APPENDIX
SERIES AND TWITTER HANDLES
