
A FINAL PROJECT DEFENSE

ON
Analysis of Twitter Data Mining on Genres of Netflix
Web Series

Presented By
Biplove Pokhrel
MSCmKE007

Communication and Knowledge Engineering Program


Pashchimanchal Campus
Pokhara, Nepal
OUTLINE

• INTRODUCTION
• PROBLEM STATEMENT
• OBJECTIVES
• LITERATURE REVIEW
• METHODOLOGY
• RESULTS
• ANALYSIS
• REFERENCES

INTRODUCTION

• Everyone has the freedom to hold and express an opinion.
• Different opinions carry different value.
• Social media has become one of the best places for people to
express their feelings.
• The comments and status updates that we share in public shape our
perspective on products, modes of entertainment, and
other everyday things.

INTRODUCTION (CONTD.)

• Sentiment analysis is a text classification technique that analyses a
message and tells whether the underlying sentiment is
positive, negative, or neutral.
• Netflix has been a key video-watching platform for the past
5 years.
• This paper analyses the most-watched genres and the
interaction between celebrities and their audience on
social media.
• This paper also visualizes Twitter metrics such as
Popularity, Follower Rank, and General Activity for the
celebrities in each genre, and discusses how
actively they participate in discussions on
Twitter.
INTRODUCTION (CONTD.)

Relationships between users and tweets:

• USER to USER: follows / is followed by
• USER to TWEET: posts, tweets, likes, mentions, replies to, retweets
• TWEET to USER: posted by, retweeted by, liked by, replied by
• TWEET to TWEET: replies / is replied from

PROBLEM STATEMENT

• Few studies today analyse a genre as a
whole.
• The genre shift paradigm, which explains audience
interest, can be studied only by analysing the
sentiments and metrics found on social network
media.
• This kind of analysis can also indicate the
popularity of a genre and the particular interest of
the audience or followers in a Netflix genre.

OBJECTIVES

• To perform sentiment analysis of tweets from the
audience or followers of the web series and their lead
actors/actresses.
• To analyse social network metrics based on Twitter
data obtained through the Twitter API.

LITERATURE REVIEW
| S.N. | Title | Published Year | Feature | Authors |
|------|-------|----------------|---------|---------|
| 1 | Measuring user influence on Twitter: A survey | 2016 AD | Different Twitter metrics to discuss the activity of popular users on Twitter | F. Riquelme and P. González-Cantergiani |
| 2 | Sentiment Analysis Of Tweets Using Support Vector Machine | 2017 AD | Mining sentiments or emotions of tweets about politicians using Support Vector Machine | Sunman Rani, Jaswinder Singh |
| 3 | Sentiment Analysis for Predicting the Popularity of Web Series | 2019 AD | Sentiment analysis for predicting the most popular web series among four cartoon series | Department of Computer Science and Technology, Manav Rachna University, Faridabad, India |
| 4 | Sentiment analysis on movie review data using machine learning approach | 2019 AD | Sentiment analysis on different movie reviews with different classifiers | A. Rahman and M. S. Hossen |

Why Web Series, Not Movies or Comic Series?

• Movies are generally reviewed only once.
• The plot of a web series runs longer than that of a movie.
• Comic series appeal to a limited age group.
• Users want to engage in discussion with popular users as
the plot of a web series changes.
• A specific plot change in a series can become a major topic of
discussion.

METHODOLOGY

Workflow:
Twitter data collection
→ Data pre-processing
→ Feature extraction and selection
→ Features: activity metrics, popularity metrics, sentiment polarity classifier
→ Calculation of metrics for each genre
→ Analysis of genres

METHODOLOGY

Data Acquisition
• Twitter data sets are collected using the Tweepy
library.
• The data comprise tweets from the Twitter handles of popular users in each genre,
along with tweets carrying hashtags of popular keywords.

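A minimal sketch of the collection step, assuming Tweepy 4.x, the Twitter API v2 recent-search endpoint, and a valid bearer token. The slides do not say which Tweepy calls or API version the project used, so the helper names, the `lang:en` filter, and the example hashtag and handle are all hypothetical.

```python
def build_query(hashtags, handles, lang="en"):
    """Combine hashtags and account handles into one search query string."""
    terms = ["#" + tag.lstrip("#") for tag in hashtags]
    terms += ["from:" + handle.lstrip("@") for handle in handles]
    joined = " OR ".join(terms)
    # Restricting to one language is an assumption, not stated in the slides.
    return f"({joined}) lang:{lang}"


def fetch_recent_tweets(bearer_token, query, limit=100):
    """Return the text of recent tweets matching the query (needs network access)."""
    import tweepy  # imported lazily so build_query works without Tweepy installed

    client = tweepy.Client(bearer_token=bearer_token)
    # The recent-search endpoint accepts max_results between 10 and 100.
    response = client.search_recent_tweets(query=query, max_results=limit)
    return [tweet.text for tweet in (response.data or [])]


if __name__ == "__main__":
    # Hypothetical hashtag and handle, for illustration only.
    print(build_query(["ExampleSeries"], ["@example_actor"]))
```

Only `build_query` runs offline; `fetch_recent_tweets` requires credentials and is shown for orientation.
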
METHODOLOGY

Text Preprocessing
• Converting all letters to lower or upper case
• Converting numbers into words or removing numbers
• Removing punctuation, accent marks, and other diacritics
• Removing white space
• Expanding abbreviations
• Removing stop words, sparse terms, and particular words
• Removing URLs and unnecessary emojis

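The preprocessing steps can be sketched with the Python standard library alone. This is an illustration rather than the project's actual code: the stop-word set and abbreviation table are tiny hypothetical samples, numbers are removed rather than spelled out, and all non-ASCII characters (including emojis) are dropped.

```python
import re
import string
import unicodedata

# Hypothetical sample lists; a real run would use fuller resources.
STOP_WORDS = {"a", "an", "the", "is", "are", "and", "or", "to", "of", "in", "on"}
ABBREVIATIONS = {"omg": "oh my god", "imo": "in my opinion"}

URL_RE = re.compile(r"https?://\S+|www\.\S+")
DIGITS_RE = re.compile(r"\d+")
PUNCT_TABLE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def preprocess(tweet):
    """Return the cleaned tokens of one tweet."""
    text = tweet.lower()                                  # case folding
    text = URL_RE.sub(" ", text)                          # remove URLs
    text = unicodedata.normalize("NFKD", text)            # split accents from letters
    text = text.encode("ascii", "ignore").decode("ascii")  # drop accents and emojis
    text = DIGITS_RE.sub(" ", text)                       # remove numbers
    text = text.translate(PUNCT_TABLE)                    # remove punctuation
    tokens = [ABBREVIATIONS.get(tok, tok) for tok in text.split()]
    tokens = " ".join(tokens).split()                     # re-split expanded phrases
    return [tok for tok in tokens if tok not in STOP_WORDS]


if __name__ == "__main__":
    print(preprocess("OMG the new Season 2 is AMAZING!!! https://t.co/abc #Netflix"))
```

Splitting on whitespace at the end also collapses any extra white space left behind by the earlier removals.
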
METHODOLOGY

Fig: Extraction of tweets

METHODOLOGY

Fig: Visualization of tokens

METHODOLOGY

Feature Extraction using Bag-of-Words (BoW)

• A bag-of-words model, or BoW for short, is a way of
extracting features from text for use in modeling, such as with
machine learning algorithms.
• The objective is to turn each text into a vector that we can use
as input or output for a machine learning model.

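As a rough illustration of the bag-of-words idea, the sketch below builds a vocabulary from tokenized tweets and turns each tweet into a vector of word counts. The function names and sample tokens are hypothetical, and the slides do not state whether the project used raw counts or another weighting.

```python
from collections import Counter


def build_vocabulary(token_lists):
    """Map every distinct token to a column index (alphabetical order)."""
    vocab = sorted({tok for tokens in token_lists for tok in tokens})
    return {tok: idx for idx, tok in enumerate(vocab)}


def vectorize(tokens, vocabulary):
    """Return the count vector of one document over the vocabulary."""
    vector = [0] * len(vocabulary)
    for tok, count in Counter(tokens).items():
        idx = vocabulary.get(tok)
        if idx is not None:  # ignore tokens unseen when the vocabulary was built
            vector[idx] = count
    return vector


if __name__ == "__main__":
    docs = [["great", "show", "great", "cast"], ["boring", "show"]]
    vocab = build_vocabulary(docs)
    for doc in docs:
        print(vectorize(doc, vocab))
```

Word order is discarded; only how often each vocabulary word occurs in the tweet is kept.
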
METHODOLOGY

Fig: Corpus formation


METHODOLOGY

Generate Sentiments from Dictionary of Words

AFINN Sentiment Lexicon


• The AFINN lexicon is a list of English terms manually rated for valence with an
integer between -5 (negative) and +5 (positive) by Finn Årup Nielsen between 2009
and 2011.
• The original lexicon contains some multi-word phrases, but they are excluded here.

Source: http://www2.imm.dtu.dk/pubdb/views/publication_details.php?id=6010

METHODOLOGY

Fig: Sample of words rated in sentiment affinity

METHODOLOGY

• Every word from the AFINN-111 lexicon is used to categorize each unigram into
one of four categories: Very Positive, Positive, Negative, and Very Negative.

• Sentiment Class Division:

• If (Very_Positive_count + Positive_count) > (Very_Negative_count +
Negative_count), then the tweet is classified as Positive.
• If (Very_Positive_count + Positive_count) < (Very_Negative_count +
Negative_count), then the tweet is classified as Negative.
• If Very_Positive_count == Very_Negative_count == Positive_count ==
Negative_count, then the tweet is classified as Neutral.

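The class division rule can be sketched as follows. Several parts are assumptions: the lexicon is a toy excerpt written in the AFINN style (a real run would load the AFINN-111 file), the cut-off of ±4 for the "very" categories is hypothetical, and any tie between the positive and negative totals is treated as Neutral, which is broader than the all-counts-equal condition on the slide.

```python
from collections import Counter

# Toy excerpt in the AFINN style (word -> integer valence in [-5, +5]).
TOY_LEXICON = {"awesome": 4, "good": 3, "love": 3,
               "bad": -3, "boring": -3, "catastrophic": -4}

VERY_THRESHOLD = 4  # assumed cut-off for the "very" categories


def categorize(score):
    """Map a non-zero valence score to one of the four categories."""
    if score >= VERY_THRESHOLD:
        return "very_positive"
    if score > 0:
        return "positive"
    if score <= -VERY_THRESHOLD:
        return "very_negative"
    return "negative"


def classify(tokens, lexicon=TOY_LEXICON):
    """Label a tokenized tweet as Positive, Negative, or Neutral."""
    counts = Counter()
    for tok in tokens:
        score = lexicon.get(tok)
        if score:  # skip words missing from the lexicon or rated 0
            counts[categorize(score)] += 1
    positive = counts["very_positive"] + counts["positive"]
    negative = counts["very_negative"] + counts["negative"]
    if positive > negative:
        return "Positive"
    if positive < negative:
        return "Negative"
    return "Neutral"  # ties, including tweets with no lexicon words


if __name__ == "__main__":
    print(classify(["awesome", "show", "but", "boring", "ending", "love"]))
```

Because the rule adds the "very" and plain counts together, the choice of threshold does not change the final label; it only affects the per-category counts.
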
METHODOLOGY

Machine Learning Model

• A Support Vector Machine (SVM) is, in its basic form, a linear model for classification
and regression problems.
• It can solve linear problems and, with kernel functions, non-linear problems, and it works
well for many practical problems.
• The idea of SVM is simple: the algorithm finds a line or a hyperplane that
separates the data into classes.

• Training data set for the model: Rotten Tomatoes Movie Reviews
• Link: https://www.kaggle.com/stefanoleone992/rotten-tomatoes-movies-and-critic-reviews-dataset

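For reference, the separating hyperplane of a linear SVM can be written in the standard textbook form below. The symbols are introduced here for illustration and do not appear in the slides: x is a feature vector (for example a BoW vector), w the weight vector, b the bias, y_i ∈ {-1, +1} the label of training example i, and C the regularization constant.

```latex
% Decision function: which side of the hyperplane w^T x + b = 0 a point falls on
\hat{y} = \operatorname{sign}\left( w^{\top} x + b \right)

% Soft-margin training objective (hinge loss plus margin term)
\min_{w,\, b} \; \frac{1}{2} \lVert w \rVert^{2}
  + C \sum_{i=1}^{n} \max\left( 0,\; 1 - y_i \left( w^{\top} x_i + b \right) \right)
```

This form separates two classes. With three classes (Positive, Negative, Neutral), binary SVMs are usually combined one-vs-rest or one-vs-one; the slides do not state which scheme was used.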
METHODOLOGY

USER METRICS CALCULATION

| Metric | Definition |
|--------|------------|
| OT1 | Original tweets of the user |
| RP1 | Replies from the user |
| RT1 | Retweets from the user |
| FT1 | Favorite tweets from the user |
| GA | General Activity (OT1 + RP1 + RT1 + FT1) |
| F1 | Accounts that the user follows |
| F3 | Followers of the user |
| Follower Rank | Defined as F1 / (F1 + F3) |
| Popularity | Defined as in-links in the network |

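A small sketch of the two formula-based metrics, following the definitions on this slide exactly (F1 as the accounts the user follows, F3 as the user's followers). The `UserCounts` container and the sample numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class UserCounts:
    ot1: int  # original tweets of the user
    rp1: int  # replies from the user
    rt1: int  # retweets from the user
    ft1: int  # favorite tweets from the user
    f1: int   # accounts that the user follows
    f3: int   # followers of the user


def general_activity(user):
    """GA = OT1 + RP1 + RT1 + FT1."""
    return user.ot1 + user.rp1 + user.rt1 + user.ft1


def follower_rank(user):
    """Follower Rank = F1 / (F1 + F3); 0.0 for an account with no connections."""
    total = user.f1 + user.f3
    return user.f1 / total if total else 0.0


if __name__ == "__main__":
    # Made-up counts for a celebrity-like account that follows few users.
    user = UserCounts(ot1=120, rp1=30, rt1=40, ft1=10, f1=50, f3=99950)
    print(general_activity(user), follower_rank(user))
```

Under these definitions, an account that follows few users but has many followers gets a Follower Rank close to 0.
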
RESULTS

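CONFUSION MATRIX FOR DIFFERENT GENRES

Genre 1

| Actual \ Predicted | Positive | Negative | Neutral |
|--------------------|----------|----------|---------|
| Positive | 1145 | 0 | 9 |
| Negative | 443 | 4556 | 58 |
| Neutral | 31 | 0 | 3960 |

Genre 2

| Actual \ Predicted | Positive | Negative | Neutral |
|--------------------|----------|----------|---------|
| Positive | 1244 | 0 | 6 |
| Negative | 443 | 11628 | 58 |
| Neutral | 10 | 0 | 4440 |

Genre 3

| Actual \ Predicted | Positive | Negative | Neutral |
|--------------------|----------|----------|---------|
| Positive | 1756 | 0 | 9 |
| Negative | 637 | 6671 | 57 |
| Neutral | 39 | 0 | 4782 |

Genre 4

| Actual \ Predicted | Positive | Negative | Neutral |
|--------------------|----------|----------|---------|
| Positive | 1708 | 0 | 9 |
| Negative | 879 | 9034 | 70 |
| Neutral | 44 | 0 | 7749 |
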
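RESULTS

Genre 5

| Actual \ Predicted | Positive | Negative | Neutral |
|--------------------|----------|----------|---------|
| Positive | 2520 | 0 | 13 |
| Negative | 889 | 10473 | 87 |
| Neutral | 41 | 0 | 7379 |

| Genres | Accuracy | Precision | Recall | F-Measure |
|--------|----------|-----------|--------|-----------|
| Genre 1 | 0.946 | 0.9922 | 0.707 | 0.825 |
| Genre 2 | 0.969 | 0.9952 | 0.7203 | 0.8357 |
| Genre 3 | 0.9469 | 0.9942 | 0.7203 | 0.836 |
| Genre 4 | 0.948 | 0.994 | 0.649 | 0.785 |
| Genre 5 | 0.953 | 0.994 | 0.730 | 0.842 |

Fig: Evaluation metrics
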
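The sketch below shows how such figures can be derived from a confusion matrix, using the Genre 1 counts and treating rows as actual classes and columns as predicted classes. Under that reading, the Positive class has recall 1145/1154 ≈ 0.992 and precision 1145/1619 ≈ 0.707; the evaluation table lists the same two values under the opposite headings, which would match a matrix whose rows are predictions. The slides do not say which orientation was intended, and the F-measure is the same either way.

```python
LABELS = ["Positive", "Negative", "Neutral"]

# Genre 1 counts from the slide: rows = actual class, columns = predicted class.
GENRE_1 = [
    [1145, 0, 9],
    [443, 4556, 58],
    [31, 0, 3960],
]


def accuracy(matrix):
    """Share of all tweets that lie on the diagonal (correctly classified)."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total


def class_scores(matrix, k):
    """Return (precision, recall, F-measure) for the class at index k."""
    true_pos = matrix[k][k]
    actual = sum(matrix[k])                    # row total
    predicted = sum(row[k] for row in matrix)  # column total
    precision = true_pos / predicted if predicted else 0.0
    recall = true_pos / actual if actual else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f_measure


if __name__ == "__main__":
    print("accuracy:", round(accuracy(GENRE_1), 4))
    p, r, f = class_scores(GENRE_1, LABELS.index("Positive"))
    print("precision:", round(p, 4), "recall:", round(r, 4), "F:", round(f, 4))
```
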
RESULTS

| Genre | Classified Positive | Classified Negative | Classified Neutral |
|-------|---------------------|---------------------|--------------------|
| Genre 1 (Comedy) | 1145 | 4556 | 3960 |
| Genre 2 (Drama) | 1244 | 8905 | 4440 |
| Genre 3 (Sci-Fi) | 1708 | 9034 | 4782 |
| Genre 4 (Romance) | 1756 | 6671 | 7749 |
| Genre 5 (Action) | 2520 | 10473 | 7379 |

Fig: Data of tweets showing different sentiment

RESULTS

| Genre | General Activity | Mean Popularity | Mean Follower Rank |
|-------|------------------|-----------------|--------------------|
| Genre 1 (Comedy) | 44556 | 1 | 0.0003 |
| Genre 2 (Drama) | 34304 | 1 | 0.0004 |
| Genre 3 (Sci-Fi) | 65078 | 1 | 0.0004 |
| Genre 4 (Romance) | 51855 | 1 | 0.0004 |
| Genre 5 (Action) | 56524 | 1 | 0.0005 |

Table 19: Data of tweets of different genres and their metrics

RESULTS

Fig: Tweets classification in different genres (bar chart of Classified Positive, Classified Negative, and Classified Neutral tweets for Genre 1 (Comedy), Genre 2 (Drama), Genre 3 (Sci-Fi), Genre 4 (Romance), and Genre 5 (Action))

RESULTS

Fig: General Activity by genre (bar chart for Genre 1 (Comedy), Genre 2 (Drama), Genre 3 (Sci-Fi), Genre 4 (Romance), and Genre 5 (Action))

ANALYSIS

| Finding | Genre |
|---------|-------|
| Genre with the most positive-sentiment tweets | Action |
| Genre with the most negative-sentiment tweets | Action |
| Genre with the most General Activity | Sci-Fi |
| Genre with the least General Activity | Drama |
| Genre whose celebrities have more chance to interact with people | Comedy |

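The first four findings can be checked against the counts reported in the RESULTS tables. The short sketch below copies those counts and picks the largest or smallest value per column; the variable names are illustrative, and the comparison uses raw tweet counts rather than shares of each genre's total.

```python
# (positive, negative, neutral) tweet counts per genre, from the RESULTS table.
SENTIMENT_COUNTS = {
    "Comedy": (1145, 4556, 3960),
    "Drama": (1244, 8905, 4440),
    "Sci-Fi": (1708, 9034, 4782),
    "Romance": (1756, 6671, 7749),
    "Action": (2520, 10473, 7379),
}

# General Activity per genre, from the metrics table.
GENERAL_ACTIVITY = {
    "Comedy": 44556,
    "Drama": 34304,
    "Sci-Fi": 65078,
    "Romance": 51855,
    "Action": 56524,
}

most_positive = max(SENTIMENT_COUNTS, key=lambda g: SENTIMENT_COUNTS[g][0])
most_negative = max(SENTIMENT_COUNTS, key=lambda g: SENTIMENT_COUNTS[g][1])
most_active = max(GENERAL_ACTIVITY, key=GENERAL_ACTIVITY.get)
least_active = min(GENERAL_ACTIVITY, key=GENERAL_ACTIVITY.get)

if __name__ == "__main__":
    print(most_positive, most_negative, most_active, least_active)
```
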
REFERENCES

• A. Pak and P. Paroubek, “Twitter for Sentiment Analysis: When Language Resources are Not Available,”
2011 22nd International Workshop on Database and Expert Systems Applications, 2011.
• A. U. Khan, M. Khan, and M. B. Khan, “Naïve Multi-label Classification of YouTube Comments Using
Comparative Opinion Mining,” Procedia Computer Science, vol. 82, pp. 57–64, 2016.
• A. Rahman and M. S. Hossen, “Sentiment analysis on movie review data using machine learning
approach,” in 2019 International Conference on Bangla Speech and Language Processing (ICBSLP),
2019, pp. 1–4.
• U. Kumari, A. K. Sharma, and D. Soni, “Sentiment analysis of smart phone product review using SVM
classification technique,” 2017 International Conference on Energy, Communication, Data Analytics and
Soft Computing (ICECDS), Chennai, India, 2017, pp. 1469–1474, doi: 10.1109/ICECDS.2017.8389689.
• F. Riquelme and P. González-Cantergiani, “Measuring user influence on Twitter: A survey,” Information
Processing & Management, vol. 52, no. 5, pp. 949–975, 2016.
• M. Cha, H. Haddadi, F. Benevenuto, and K. Gummadi, “Measuring User Influence in Twitter: The
Million Follower Fallacy,” 2010.
• J. Sun and J. Tang, “A Survey of Models and Algorithms for Social Influence Analysis,” Social Network
Data Analytics, pp. 177–214, 2011.
• S. Abe, Support Vector Machines for Pattern Classification, Springer-Verlag London Limited, 2008, 350
pp.
• I. Steinwart and C. Scovel, “Fast rates for support vector machines using Gaussian kernels,” The Annals
of Statistics, vol. 35, no. 2, pp. 575–607, 2007, doi: 10.1214/009053606000001226.
• A. Pal and S. Counts, “Identifying topical authorities in microblogs,” Proceedings of the fourth ACM
international conference on Web search and data mining - WSDM '11, 2011.

APPENDIX

SERIES AND TWITTER HANDLES
