DON HONORIO VENTURA STATE UNIVERSITY

Chapter I

The Problem and its Background

Introduction

Before the invention of the Internet, producing and disseminating content

was always a costly and challenging undertaking. The Internet makes things

significantly simpler, quicker, and less expensive (Blank, 2013). In the early

2000s, the Internet's network bandwidth increased dramatically. Improved media

compression methods, along with increasingly powerful personal computer

systems, made media streaming feasible. When data is sent across

a computer network in a constant, continuous stream that permits playback to

continue while new data is being received, the technique is known as streaming

(Fecheyr-Lippens, 2010).

Platforms that allow individuals to stream video content over the internet

are often referred to as user-generated live streaming systems (Pires & Simon,

2015). The triumph of digital content platforms like YouTube is predicated on

the ingenuity of independent content creators and the adeptness of content

distribution mechanisms (Qian & Jain, 2024).

In recent years, live video streaming has become a global business and

social phenomenon. Numerous streaming services, like Twitch and YouTube

Live, have been established and have experienced remarkable global expansion.

However, studies have not paid enough attention to understanding the

extensive engagement behavior displayed by viewers of live video streaming

(Hu et al., 2017).

COLLEGE OF COMPUTING STUDIES


MAIN CAMPUS
In this study, researchers aim to develop a system that interprets a dataset

comprising comments from YouTube Video on Demand (VOD) and utilizes

sentiment analysis techniques to assess the value or sentiment conveyed within

these comments.

Sentiment analysis is a developing area at the interface of computer

science and linguistics that aims to automatically identify the sentiment

expressed in text. Positive or negative judgment conveyed through language is

known as sentiment. Sentiment analysis research gathers data from text's

linguistic structure, the context of words used in the text, and both positive and

negative phrases (Taboada, 2016).

Understanding the sentiments of viewers is paramount for streamers and

content creators. Viewer sentiment can profoundly impact the success of a live

stream, influencing factors such as viewer retention, engagement levels, and

overall audience satisfaction. Positive sentiments foster a supportive and

engaging atmosphere, encouraging continued participation and attracting new

viewers. Conversely, negative sentiments can lead to dissatisfaction,

disengagement, and potential backlash from the audience.

By conducting sentiment analysis on viewer comments, streamers can

gain valuable insights into audience preferences, reactions, and interests. This

information enables streamers to tailor their content to better align with viewer

expectations, ultimately leading to increased viewer satisfaction and loyalty.

Additionally, understanding viewer sentiments empowers streamers to

effectively manage and respond to feedback, thereby fostering a positive and

interactive community around their content.

Statement of the Problem

The challenge faced by digital content creators, such as YouTube

streamers, revolves around handling the large volume of comments their videos

attract. It's impractical for them to manually sift through hundreds or thousands

of comments. While likes and dislikes offer some insight into viewer feedback,

comments provide a more detailed perspective (Shoufan, 2019). User comments

on YouTube are typically available in an unorganized manner, making analysis

challenging (Bhuiyan et al., 2017). Furthermore, because the number of comments

grows so quickly, it is difficult to assess them manually (Sainath Pichad et al., 2023).

The study specifically seeks to address the following questions:

1. What are the requirements to produce a web system that utilizes sentiment

analysis?

2. How does a pre-trained model function within a web system?

3. How can sentiment analysis help digital content creators?

4. Is the use of sentiment analysis helpful in the area of digital content?

Objectives of the Study

The main objective of the study, titled "Viewer Sentiment Analysis in Live

Stream VODs: Insights from YouTube Comments," is to create a web-based

system that utilizes sentiment analysis. Specifically, this study aims to:

1. Apply preprocessing techniques and perform sentiment analysis using the

best-performing model among three compared models.

2. Develop a module for interpreting YouTube VOD comments.

2.1 Positive comment analysis

2.2 Neutral comment analysis

2.3 Negative comment analysis

3. Develop a module for summarizing all labeled comments.

4. Generate insights for digital content creators to analyze their audience's

preferences.
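
As a rough illustration of objectives 2 and 3, the sketch below tallies comments that have already been labeled positive, neutral, or negative and reports each label's share. The function name `summarize_labels` and the sample labels are hypothetical and are not taken from the study's system.

```python
from collections import Counter

LABELS = ("positive", "neutral", "negative")

def summarize_labels(labels):
    """Return the count and percentage share of each sentiment label."""
    counts = Counter(labels)
    # Only the three expected labels contribute to the total.
    total = sum(counts[label] for label in LABELS)
    summary = {}
    for label in LABELS:
        share = (counts[label] / total * 100) if total else 0.0
        summary[label] = {"count": counts[label], "percent": round(share, 1)}
    return summary

# Hypothetical labels for four comments.
sample = ["positive", "negative", "positive", "neutral"]
print(summarize_labels(sample))
```

A summary of this kind could feed the insights described in objective 4, for example by showing a creator what share of comments on a VOD were negative.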

Significance of the Study

Sentiment analysis emerges as a vital tool for analyzing audience

engagement and emotional responses in live streaming, particularly through the

interpretation of YouTube comments. This methodology offers streamers,

viewers, and researchers valuable insights into viewer preferences and

emotional reactions. Such findings can be instrumental in tailoring content to

enhance viewer engagement, retention, and satisfaction, thereby strengthening

reputation management.

Streamers: Streamers would benefit greatly from this study by gaining

access to a powerful tool for analyzing audience engagement and emotional

responses. This insight allows them to tailor their content more effectively to

enhance viewer engagement, retention, and satisfaction. By understanding their

audience’s preferences and reactions, streamers can improve their content

strategies, thereby managing their reputation more efficiently.

Viewers: Viewers stand to gain from the enhanced content quality that

results from streamers using sentiment analysis. As streamers become more

attuned to their audience's preferences and emotional responses, they can create

content that better aligns with what viewers enjoy and find engaging.

Future Researchers: Researchers in the field of sentiment analysis and

social media can expand their understanding of the dynamics within online

communities. By analyzing YouTube comments from live streams, they can

develop more sophisticated models and tools for sentiment analysis, contributing

to the academic and practical advancements in this area of study.

Definition of Terms

Digital Content. Content in digital format that can be accessed,

consumed, and distributed electronically. This includes text, images, videos,

audio, and multimedia content that is created and shared across digital platforms.

Digital Content Creators. Individuals or entities that produce original

digital content for online distribution and consumption. Digital content creators

may include writers, photographers, videographers, graphic designers, and other

creative professionals who produce content for websites, social media, streaming

platforms, and other digital channels.

Livestream Content. Video content that is broadcast in real time over

the internet, allowing viewers to watch events as they happen. Livestreams often

include live commentary, interaction with viewers, and real-time engagement

features.

Live Streaming. The process of broadcasting live video or audio content

over the internet in real-time. Live streaming allows content creators to engage

with their audience in real-time, enabling live interaction, commentary, and

feedback.

Reputation Management. The practice of monitoring and shaping how a

brand is perceived by its audience, with the goal of maintaining a positive

reputation. Reputation management involves proactive strategies to address and

mitigate negative perceptions while enhancing positive perceptions of the brand.

Sentiment Analysis. The process of computationally identifying and

categorizing opinions expressed in a piece of text as positive, negative, or neutral.

Sentiment Trends. Patterns or changes observed in the emotional tone or

sentiment expressed by viewers over a period of time. Analyzing sentiment trends

can reveal shifts in audience attitudes, preferences, and reactions to content.

Streamer. Someone who broadcasts live video content over the internet,

engaging with their audience in real-time through platforms like Twitch,

YouTube, or Facebook Gaming. They cover diverse activities such as gaming, art

creation, cooking, or discussions, building communities of followers who interact

with their content and contribute to the streamer's online presence.

Video on Demand (VOD). Videos that viewers can access at any time,

rather than at a scheduled broadcast time. VOD allows users to watch videos

whenever they choose, providing flexibility in viewing content.

Viewer. An individual who watches or observes something,

particularly a form of media such as television, videos, or live performances.

Viewer Engagement. The level of interaction and participation from

viewers, which can include actions such as liking, commenting, sharing, and

subscribing. Higher levels of engagement indicate a stronger connection between

the audience and the content.

YouTube. An American video-sharing platform owned by Google,

YouTube is accessible globally. It was launched on February 14, 2005, by former

PayPal employees Steve Chen, Chad Hurley, and Jawed Karim.

Scope and Limitation

The study focuses on using a pre-trained model to analyze YouTube VOD

comments through lexicon-based sentiment analysis and data collection via the

YouTube API, aiming to develop a system for categorizing comments into

positive, neutral, and negative sentiments.
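
To make the idea of lexicon-based categorization concrete, the following is a minimal sketch, not the study's actual model. It assumes a tiny hypothetical lexicon (`TOY_LEXICON`) and a hypothetical decision margin; a real lexicon contains thousands of scored entries.

```python
import re

# Hypothetical toy lexicon mapping words to valence scores.
TOY_LEXICON = {
    "love": 2.0, "great": 1.5, "good": 1.0,
    "bad": -1.0, "boring": -1.5, "hate": -2.0,
}

def classify_comment(text, lexicon=TOY_LEXICON, margin=0.5):
    """Label a comment by summing the valence of the words it contains."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = sum(lexicon.get(token, 0.0) for token in tokens)
    if score >= margin:
        return "positive"
    if score <= -margin:
        return "negative"
    return "neutral"

print(classify_comment("I love this stream, great content"))
print(classify_comment("This was boring and bad"))
print(classify_comment("See you next week"))
```

Because the score depends only on individual words, a sarcastic comment such as "Oh great, another delay" would be labeled positive by this sketch, which reflects a known weakness of word-level lexicons.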

The system is designed to work only with YouTube videos, and the

findings may not be applicable to other platforms. Data is gathered only from

publicly available YouTube VOD comments; comments from private and

membership VODs are not included. Additionally, the sentiment analysis

algorithm is limited to text data and may struggle to interpret sarcasm, slang,

tone of voice, and non-verbal cues in YouTube comments.
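
The study does not state which API endpoint or client library it uses to gather public comments. As one possible sketch, the code below only builds a request URL for the `commentThreads` resource of the YouTube Data API v3 and makes no network call. The function name, the video ID, and the API key shown are hypothetical placeholders.

```python
from urllib.parse import urlencode

# Assumed endpoint: the commentThreads resource of the YouTube Data API v3.
API_ENDPOINT = "https://www.googleapis.com/youtube/v3/commentThreads"

def build_comment_request(video_id, api_key, page_token=None, max_results=100):
    """Build the URL for one page of top-level comments on a public video."""
    params = {
        "part": "snippet",
        "videoId": video_id,
        "maxResults": max_results,
        "textFormat": "plainText",
        "key": api_key,
    }
    if page_token:
        # Later pages are requested with the token returned by the previous page.
        params["pageToken"] = page_token
    return f"{API_ENDPOINT}?{urlencode(params)}"

# Placeholder values for illustration only.
print(build_comment_request("VIDEO_ID", "API_KEY"))
```

Comments on private or members-only VODs would not be returned to an ordinary API key, which is consistent with the limitation stated above.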

Conceptual Framework

Figure 1: Independent and Dependent Variables

Figure 2: Input Process Output

Chapter II

Review of Related Literature

Sentiment Analysis

In a study titled "Sentiment Analysis: A Comparative Study on Different

Approaches," Devika, Sunitha, and Ganesh (2016) investigated various

methods of sentiment analysis. They compared

different approaches, including machine learning, rule-based, and lexicon-based

methods, to discern their effectiveness in analyzing sentiment from textual data.

The study aimed to provide insights into the strengths and limitations of each

approach, offering valuable guidance for future sentiment analysis research.

In a survey titled "Sentiment analysis algorithms and applications: A survey,"

Medhat, Hassan, and Korashy (2023) explored different sentiment analysis

methods. They discussed machine learning and

lexicon-based techniques, along with related fields like emotion detection. The

study emphasized the importance of these methods for understanding opinions

in text and highlighted the need for further research in sentiment analysis.

Al-Sabbagh (2020) found that emojis are important for understanding

people's sentiments in online comments. The study suggests that considering both text

and emojis is crucial, especially for Arabic comments. The study also notes that

emojis vary between platforms, so it's essential to analyze them separately.

Overall, the study highlights the potential of emojis for sentiment analysis and

suggests further investigation into analyzing politeness levels.

Akhtar (2019) observed that, with the rise of video-sharing sites such as

YouTube, users often express their opinions about the content that they

consume by leaving comments. The study used three class attributes (positive,

neutral, and negative), and the TextBlob library was used to process textual

data in Python.

A study by Li, M., Chen, Zhao, and Li, Q. (2021) proposed a sentiment

analysis model based on the BERT model for Chinese stock reviews. This avoids

building a new dictionary and extracting features manually. The researchers

found that the BERT+FC model, obtained via fine-tuning, performs the best among

the different variants they designed. The proposed model includes the BERT model

and a classifier layer. This classifier layer is stacked on top of BERT and jointly

fine-tuned with the model. The BERT model was pre-trained on the Chinese

Wikipedia corpus, and the dataset, taken from GitHub, contains a total of 9,204

labeled reviews; the model reached 88.09% accuracy. To verify the efficacy of the

proposed BERT model, it was compared with TextCNN, TextRNN, Att-BLSTM, and

TextCRNN. The proposed model performed approximately 2.14% to 3% better than

the rest of the methods in terms of accuracy in sentiment analysis of Chinese stock

review text. The proposed model outperforms traditional methods because it relies

on fine-tuning the BERT language model.

Wankhade et al. (2022) conducted a thorough assessment of sentiment

analysis methods, applications, and challenges. The study examined feature

selection and data gathering techniques while delving into several levels of

sentiment analysis, such as the document, sentence, phrase, and

aspect levels. They tackled problems like sarcasm, irony, and language-specific

difficulties while looking at applications in social media, business, and

healthcare. The study made recommendations for future research aimed at

improving sentiment analysis's efficacy and accuracy.

Tan and Chia (2022) stated that increased usage of social media has

resulted in massive amounts of textual data, which contains important

information that may assist businesses. Commercial applications have grown

swiftly, and some have incorporated mature academic findings, particularly in

the field of opinion and sentiment analysis. Prior to undertaking sentiment

analysis, the topic of interest must be identified, a task known as topic

modeling. Topic modeling had not been recognized as reuse-ready for broad

use. In their study, the authors implemented the Latent Dirichlet Allocation

(LDA) algorithm with an established sentiment analysis tool, Valence Aware

Dictionary for Sentiment Reasoning (VADER). Using the Reuse Readiness Levels

(RRL), they found that LDA for topic modeling is at RRL 4, where it might be

reused by most users with some effort, additional cost, and calculated risk.

Sentiment Analysis on YouTube

Social media platforms play an important role in business, entertainment,

marketing, education, media, and communications. YouTube's distinctive

behavior has made it the most popular medium for sharing videos in society.

YouTube allows anyone to create an account in any category and upload videos

to be viewed by millions of other people. This has been a trend in the

entertainment sector, allowing video assets hosted online to reach a larger

audience and acquire popularity. Many YouTube channel owners adopt various

tactics to make their videos popular. Evaluating user comments and determining

requirements can yield recommendations that help YouTubers increase video

popularity (Gajanayake & Sandanayake, 2021).

YouTube comments have an impact on users' perceptions of video

content, which influences their decision to subscribe to such channels (Danda &

Talarczyk, 2021).

Singh and Tiwari (2021) observed that the growth of textual information

has created new opportunities for research in machine learning (ML) and

natural language processing (NLP). They analyzed YouTube comments,

employing six ML algorithms to identify trends and sentiments in user

feedback, which can provide valuable insights into public mood and

real-world events.

Another study conducted by Mulholland et al. (2017) investigated the use

of technology to analyze emotions in YouTube comments. They developed a

system called 360-MAM-Select, which included tools for understanding

sentiments (360-MAM-Affect) and adding game-like features (360-Gamify).

Using this system, they examined comments on popular YouTube channels and

found that Sadness, Surprise, and Joy were the most common emotions

expressed.

Bhuiyan et al. (2017) introduced a Natural Language Processing (NLP)-based

method for retrieving relevant and popular YouTube videos by analyzing

sentiment in user comments. Their approach involves four key steps: collecting

and preprocessing comments, generating datasets, measuring sentiment using

SentiStrength, and rating videos based on standard deviation values. Their

experiments, conducted on 1000 YouTube videos spanning various categories,

yielded promising accuracy, particularly excelling in science and technology

videos with an accuracy of 75.435%. This approach aims to improve the

effectiveness of YouTube video retrieval by considering the sentiment expressed

in user comments.

Another study, conducted by Mukhopadhyay et al. (2022), highlights the

value of sentiment analysis in improving interaction between creators and their

audience on YouTube. Sentiment analysis serves as a tool to help YouTubers

understand viewer feedback from comments. By employing a lexicon-based

approach, it categorizes comments as positive, negative, or neutral, offering

valuable insights to creators regarding audience sentiment. Despite its promise,

challenges such as handling ambiguous words and improving accuracy persist.

Nonetheless, the research underscores the significance of sentiment analysis in

enhancing interaction between creators and their audience on YouTube.

In a study by Yang (2020), the analysis of health-related vlog comments

on YouTube revealed a prevalent positive, narrative, and externally focused

language style. Moreover, emotionally charged favorable comments were

strongly associated with increased purchase intent. Interestingly, consumers

tended to prioritize the product over the endorsers when it came to health-related

items.

Sentiment Analysis of YouTube Video Comments Using Deep Neural

Networks

According to Cunha et al. (2019), YouTube has become a major source of

entertainment. Being a "YouTuber" is now considered a career, and many

people make videos to attract audiences and earn money through views,

reputation, and subscriptions to their respective channels. There are few

quantifiable measures of reputation.

The number of likes and dislikes is an easy approach to analyze a video's

reputation. If the number of likes exceeds the number of dislikes, the material is

good; on the other hand, a high number of dislikes compared to likes usually

indicates poor content. Although the number of likes a video receives provides a

summary of its success, it does not explain the fundamental causes of its success

or failure.

Another way to assess a video's reputation is to examine its comments to

see how people feel about the material. Prior to AI and machine learning,

manual analysis could only handle a limited number of comments per video.

Most popular YouTube channels, on the other hand, receive 1,000 or more

comments per video and post at least five videos every week. As a result, the

work of manually assessing video comments is quickly becoming

insurmountable, and machine reviewing is rapidly becoming a must-have

commercial strategy advantage for major YouTubers.

Sentiment Analysis on Twitch.tv

Kobs et al. (2020) found that sentiment analysis helps

streamers understand viewer feedback during live stream events. Comments

during a stream reflect the audience's emotions, allowing streamers to adjust

their behavior or presentation accordingly. The study introduced various

sentiment analysis methods to automatically assess comments in active streams,

helping streamers gauge whether an event is positively or negatively perceived.

Despite challenges with Twitch's unique language, the study showed that these

methods effectively capture viewer preferences.

In a different study, Yildiz (2022) chose Twitch.tv as the platform and scraped

livestream chats and clips for data. Since the data was unlabeled, two people

labeled each message as Positive, Neutral, or Negative. The study used

a method called semi-supervised learning, which is quicker and easier to

evaluate than other methods. They compared two models: convolutional neural

network (CNN) and lexicon-based models. Surprisingly, the lexicon-based

model performed better than the CNN model.

In a recent study on sentiment analysis of Twitch.tv livestream messages,

machine learning methods were employed to analyze viewer interactions in real

time (Chouhan et al., 2021). The study utilized a range of models, including

Support Vector Classifier, Logistic Regression, Decision Tree Classifier,

Random Forest Classifier, and Multinomial Naïve Bayes. The Support Vector

Classifier emerged as the top performer in discerning sentiment from livestream

chat messages. This research sheds light on the effectiveness of machine

learning in understanding viewer sentiment during livestreaming sessions on

Twitch.tv.

Sentiment Analysis on Other Platforms

In another study, researchers examined the marketing strategies of

PewDiePie, Markiplier, and Kwebbelkop on Facebook pages (Poecze, Ebster, &

Strauss, 2019). These YouTuber gamers were selected based on their common

traits and high popularity. The study employed judgment sampling followed by

criterion sampling to retrieve comments for sentiment analysis. Utilizing

Netvizz (version 1.45), researchers extracted posts from each YouTuber's page,

categorized them, and analyzed Facebook metrics and sentiment. Supervised

and unsupervised learning methods were used for sentiment classification,

considering the unique language used in the gaming community. The study's

findings were divided into Facebook metric analysis, sentiment analysis, and a

comparison of metrics and sentiment analyses.

In a study by Melville, Gryc, and Lawrence (2009), the researchers looked at

sentiments expressed in blogs, which are important for understanding user

feedback and preferences. The study classified sentiments into positive and

negative categories, using lexical classification when labeled data was scarce.

They introduced the Pooling Multinomials classifier, blending background

knowledge with training examples for better sentiment classification. The

researchers tested their approach on various types of blogs, finding Linear

Pooling to be the most accurate method. This simplified approach helps in

understanding and categorizing sentiments expressed in blogs for better

decision-making.

In a study titled 'Sentiment Analysis of Cyberbullying on Instagram User

Comments,' Naf’an et al. (2019) explored sentiment analysis of cyberbullying

within Instagram user comments. The study focuses

on discussions related to the 2017 Jakarta Governor Election and examines how

social media enables individuals to freely express opinions, both positive and

negative. It highlights the significant impacts of social media, including the

dissemination of information and changes in online transactions. Additionally,

the research involves sentiment analysis of public opinions on the 2014

Indonesian presidential and vice-presidential candidates through Twitter,

reflecting the internet's transformative influence on conventional transaction

processes.

In their study, Birjali, Kasri, and Beni-Hssane (2021) explored sentiment

analysis in healthcare, focusing on Twitter data. They discussed its application

in areas like rehabilitation, disease prevention, diagnosis and treatment, and

promoting good health. Using Twitter data logs and the MooM dataset, they

categorized tweets based on trending topics and described their alignment and

sorting process. Additionally, they discussed a recommendation system based on

user requirements and Twitter data analysis. Overall, their study provided

insights into sentiment analysis in healthcare, highlighting its challenges and

trends.

Sentiment Analysis in Gaming

Thompson et al. (2017) explore sentiment analysis in gaming, focusing on

StarCraft 2 chat. They use SO-CAL for sentiment and toxicity detection, finding

it effective. Their study emphasizes the need for tailored dictionaries in gaming

sentiment analysis. They highlight sentiment analysis' growing importance in

gaming research, especially for understanding the social context of learning.

They also note the overlap between sentiment classification and toxicity

detection tasks.

Guzsvinecz and Szűcs (2023) studied player reviews on Steam to help

game developers understand emotional reactions to top game genres. Using

natural language processing, they analyzed nearly 36 million reviews. They

found negative reviews are longer and written sooner than positive ones. Each

genre shows different emotional patterns, with action and adventure games

receiving mixed reviews and role-playing and strategy games getting mostly

positive feedback.

The study highlights the importance of understanding player emotions,

which are influenced by game design and mechanics. By analyzing reviews,

developers can gain valuable insights into player experiences, helping to

improve games and boost player engagement.

Strååt & Verhagen (2017) found that user review sentiments on Metacritic

align closely with their ratings. They analyzed reviews for Dragon Age and

Mass Effect games, focusing on aspects like combat, story, and character.

Negative reviews mentioned these aspects more, indicating user dissatisfaction.

The study highlights the value of user-generated content for understanding

player feedback and improving games.

Sentiment Analysis in Business

Taboada (2016) provides an overview of sentiment analysis from a

linguistic perspective, highlighting the state of the art in automatically extracting

sentiment and opinion. The Stanford Deep Learning for Sentiment Analysis

model is noted for its success. Challenges in sentiment analysis include the

unreliability of individual words and phrases, particularly due to phenomena like

downtoning and nonveridicality. Identifying negative sentiment is difficult

because negative evaluations are often expressed in positive terms. One

application of sentiment analysis is correlating market sentiments with stock

prices.

Sasikala and Sheela (2020) introduced new ways to understand customer

sentiments in online product reviews. They used a method called sentiment

analysis to figure out how people felt about products. Then, they used another

technique called IANFIS to predict if a product would be in high or low demand

in the future. Based on this prediction, they made sure that the best reviews of

products were shown first to customers. This helps customers find great

products more easily. The researchers aimed to make online shopping simpler

and more enjoyable for everyone.

In the book "Advances in Business Information Systems and Analytics

Book Series" by Vyas and Uma (2019), Chapter 2 discusses how to understand

feelings in product reviews. It explains that sentiment analysis helps figure out if

a review is positive, negative, or neutral. This helps businesses see what they're

doing well and what needs improvement in their products. Sentiment analysis

also helps marketing teams target their ads better. The chapter explains how

sentiment analysis works and mentions that it usually uses supervised learning

techniques. Overall, the chapter aims to make sentiment analysis easier to

understand for everyone.

In the study of sentiment analysis, Mukherjee (2020) explores its critical

role in contemporary business landscapes, particularly in gauging customer

sentiments towards products or services offered by companies. The study

underscores the significance of discerning whether customers hold positive or

negative views, as it can profoundly influence strategic decisions. However,

Mukherjee highlights the challenges inherent in accurately extracting sentiment

from text, across different languages, due to ambiguities and nuances like

sarcasm, which pose considerable obstacles for computational analysis.

Hartmann et al. (2023) studied how accurately sentiment analysis methods

can gauge people's feelings. They found that using machine learning was better

than other methods. Looking at whole documents, not just sentences, improved

accuracy by 7%. Using more data helped traditional methods. They found that

advanced methods, like transfer learning, worked best, especially for analyzing

product reviews.

Sentiment Analysis using VADER Model

In a study titled “Using VADER sentiment and SVM for predicting customer

response sentiment,” Borg and Boldt (2020) explored sentiment analysis of

customer support emails for a Swedish telecom business. The researchers used

VADER with a sentiment lexicon to label the emails. Two support vector machine

models were trained on email content and sentiment labels. The outcomes

demonstrated that a predictable pattern in emails can be used to prepare specific

measures for customers with negative responses, as well as to provide feedback

on the possible responses to customer emails.

According to Elbagir and Yang (2019), social media technologies take many

forms, including blogs, business networks, photo sharing, forums, microblogs,

enterprise social networks, video sharing networks, and social networks. As the

number of social media technologies has grown, so has the popularity of online

social networking sites like Facebook, YouTube, and Twitter, which allow people

to express and share their ideas and opinions about life events. Currently, numerous

programs, such as Linguistic Inquiry and Word Count (LIWC), can extract

complex aspects from texts. However, many of these tools require some

programming experience. The Valence Aware Dictionary and Sentiment Reasoner

(VADER) is used to assess tweet polarity and classify them based on multiclass

sentiment analysis.

Amin, Hossain, Akther, and Alam (2019) modified the VADER model to

support Bengali sentiment polarity identification. Sentiment analysis for Bengali

lags far behind that for English, but polarity-lexicon-based work can build on

resources such as SentiWordNet and VADER. The researchers

of the study developed a model that identifies Bengali text sentiments. They created

two dictionaries of negation and booster words. The negation list consists of

common negative words in Bengali that shift the polarity of the text toward either

positive or negative. The booster dictionary includes Bengali words that boost the

valence of the text. The Bengali lexicon was constructed by translating the VADER

lexicon with a bilingual dictionary and assigning the corresponding polarity scores. The

valences of the words range from -4 to +4, where -4 represents the most

negative, +4 represents the most positive, and 0 represents neutrality. Preprocessing

techniques such as punctuation removal, Bengali stop-word removal, and

stemming were applied. The position of booster words was also identified by using

three processes: Bigram, Trigram, and Negation. The final step in developing a

Bengali VADER is valence calculation. The researchers evaluated their model by

comparing unmodified VADER and Bengali VADER. First, the Bengali text was

translated to English with MyMemory Translator by VADER. They also translated

the text into English by Google Translate and Python Translator. Unmodified

VADER, using the three translators, labeled the text with a positive sentiment,

whereas in the Bengali context it should be negative. The Bengali VADER model that

was developed correctly labeled the text with negative sentiment. The researchers

also evaluated the models with a positive sentiment Bengali text. As expected,

unmodified VADER labeled it with negative sentiment whereas the Bengali

VADER labeled the text correctly.

"A complete VADER-Based Sentiment Analysis of Bitcoin (BTC) Tweets

during the Era of COVID-19" was the title of a study done in 2020 by Pano and

Kashef. The researchers experimented with both full-length and truncated tweets to

find out how different text preparation methods affect VADER sentiment scores.

Texts were deemed positive if their VADER sentiment score was above 0.05 and

negative if it was below -0.05. Additionally, the researchers found that phrases

work better with bigger datasets. Furthermore, cleaning the data before employing

a regex tool to separate sentences proved to be the most successful preprocessing

strategy for establishing a positive association between Bitcoin prices and tweet

volume. This study indicated that preprocessing techniques are significant in

sentiment analysis and showed how they might strengthen the association between sentiment

on social media and financial indicators like Bitcoin prices.
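
The thresholding rule described in this study can be illustrated with a minimal Python sketch. It assumes that a sentiment score between -1 and 1 has already been computed for each text; the function name and the sample scores are hypothetical and are not taken from the cited study.

```python
def label_from_score(score, threshold=0.05):
    """Map a VADER-style score in [-1, 1] to a sentiment label.

    Follows the rule quoted above: above +0.05 is positive,
    below -0.05 is negative, anything in between is neutral.
    """
    if score > threshold:
        return "positive"
    if score < -threshold:
        return "negative"
    return "neutral"


# Hypothetical scores used only to exercise the rule.
example_scores = [0.62, 0.05, -0.04, -0.31]
example_labels = [label_from_score(s) for s in example_scores]
print(example_labels)
```

A score of exactly 0.05 falls in the neutral band here because the rule as quoted uses "above"; some implementations treat the boundary as positive instead.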

Garay et al. (2019) conducted a study examining the opinions and emotions

expressed within the anti-vaccine movements on social media. The data, consisting

of tweets and excerpts, has been processed to remove irrelevant information and

background noise. This information is then grouped using the k-means clustering

algorithm. Each word within a cluster is subjected to analysis by the VADER

sentiment tool. The prevalent sentiments within each cluster help to characterize the

overall mood of the cluster. The findings shed light on various insights regarding

vaccines, including concerns about side effects, post-vaccination injuries, perceived

ineffectiveness, potential harm from ingredients, the beliefs of the unvaccinated

elite, reinforcement of the right to not vaccinate, the presence of toxic components,

the profit motives of big pharmaceutical companies, purported links to autism, and

reported health issues following vaccination. To assess the results of the k-means

clustering, the silhouette score is calculated to determine the proximity of data

points to other nearby clusters. The average silhouette score that is derived is

0.013540022, indicating that the data points are close to the decision boundaries.
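
For reference, the silhouette coefficient of a point compares its mean distance to the members of its own cluster (a) with its mean distance to the nearest other cluster (b), as (b - a) / max(a, b). The sketch below is a plain-Python illustration with made-up two-dimensional points, not the cited authors' code; it assumes Euclidean distance and at least two clusters. Values near 1 suggest well-separated clusters, while values near 0 suggest overlapping ones.

```python
import math


def mean_distance(point, others):
    """Average Euclidean distance from one point to a list of points."""
    return sum(math.dist(point, other) for other in others) / len(others)


def average_silhouette(points, labels):
    """Average silhouette coefficient; assumes at least two clusters."""
    n = len(points)
    total = 0.0
    for i in range(n):
        same = [points[j] for j in range(n) if labels[j] == labels[i] and j != i]
        if not same:
            continue  # a single-member cluster contributes 0 by convention
        a = mean_distance(points[i], same)
        b = min(
            mean_distance(points[i], [points[j] for j in range(n) if labels[j] == other])
            for other in set(labels)
            if other != labels[i]
        )
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / n


# Hypothetical data: two tight, distant clusters versus interleaved ones.
separated = average_silhouette([(0, 0), (0, 1), (10, 0), (10, 1)], [0, 0, 1, 1])
interleaved = average_silhouette([(0, 0), (1, 0), (0.5, 0), (1.5, 0)], [0, 0, 1, 1])
print(round(separated, 3), round(interleaved, 3))
```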

Sentiment Analysis using RoBERTa Model

A study conducted by Semary et al. (2023) introduced a hybrid model

based on a transformer model and deep learning models to improve

sentiment classification. RoBERTa, a BERT variant, was chosen to produce vectors for the

input sentences, while a Long Short-Term Memory (LSTM) model along with

a Convolutional Neural Network (CNN) model was used to capture the context and

sentiment of each sentence. The results show that the suggested hybrid model

is effective in classifying sentiment.

Liao et al. (2020) stated that aspect-category sentiment analysis can provide

richer and more detailed information than document-level sentiment analysis

because it seeks to predict the sentiment polarities of different aspect categories

within the same text. The primary challenge of aspect-category sentiment analysis is

that multiple aspect categories can display distinct polarities in the same text.

Previous research has combined Long Short-Term Memory (LSTM) and attention

mechanisms to predict the sentiment polarity of a given aspect category; however,

LSTM-based algorithms are not bidirectional text feature extraction methods.

A multi-task aspect-category sentiment analysis model based on RoBERTa (Robustly

Optimized BERT Pre-training Approach) was therefore proposed.

RoBERTa uses a deep bidirectional Transformer to extract features from both

text and aspect tokens, and the cross-attention mechanism directs the model to

focus on the most relevant characteristics for the aspect category. According to the

experimental data, the suggested model outperforms existing models used in

aspect-category sentiment analysis.

RoBERTa-GRU is a hybrid sentiment analysis model proposed by Tan,

Lee, and Lim (2023) to tackle challenges in this field like imbalanced datasets. The

recurrent neural network (RNN) is represented by Gated Recurrent Units (GRU).

Preprocessing techniques ensured the quality of the analysis and involved

removing stop words, punctuation, hashtags, and URLs from the raw text data.

A data augmentation technique was applied to give more attention to the minority

classes in the imbalanced dataset. The techniques considered were Thesaurus

Substitution, Text Generation, and Word Embedding. Global Vectors for Word

Representation (GloVe) was chosen as the word embedding technique for data

augmentation. The model was evaluated by using three datasets: the Internet Movie

Database (IMDb) dataset, Sentiment140 dataset, and the Twitter US Airline

dataset. The results of RoBERTa, BERT, BERT-GRU, and RoBERTa-GRU

showed that the RoBERTa-GRU model outperformed the other methods on the three

datasets. The RoBERTa-GRU model had accuracy scores of 94.63% for the IMDb dataset,

89.59% for the Sentiment140 dataset, and 91.52% for the Twitter US Airline dataset. This

study states that the combination of RoBERTa and GRU is a promising solution for

various NLP tasks and is an efficient model for sentiment analysis.

In order to improve sentiment and emotion analysis, Uddagiri Sirisha and

Bolem Sai Chandana's study, "Aspect-Based Sentiment & Emotion Analysis with

ROBERTa and LSTM" (2022), used LSTM (Long Short-Term Memory networks)

and the transformer-based model RoBERTa. By utilizing the advantages of both

models for aspect-based analysis, this work aimed to increase the precision of

sentiment and emotion detection in textual data. They discovered that the

combination strategy produced more accurate sentiment analysis results by

successfully capturing complex sentiments.

Warstadt et al. (2020) argued that pretraining models on self-supervised

linguistic tasks is effective because it helps them learn features that aid in

understanding language. However, it is also important for pre-trained models to not

only learn to represent linguistic features but also to prioritize the use of those

features during fine-tuning. With this objective in mind, a new English-language

diagnostic set called MSGS (the Mixed Signals Generalization Set) has been

introduced. It comprises 20 ambiguous binary classification tasks used to test

whether a pre-trained model favors linguistic or surface generalizations during fine-

tuning. RoBERTa models were pre-trained from scratch on datasets ranging from

1M to 1B words, and their performance on MSGS was compared to the publicly

available RoBERTa-base. The study found that models can learn to represent

linguistic features with relatively small amounts of pretraining data, but they

require significantly more data to learn to prioritize linguistic generalizations over

surface ones. Eventually, with around 30B words of pretraining data, RoBERTa-

base did demonstrate a linguistic bias consistently. The conclusion drawn is that

while self-supervised pretraining is effective in learning useful inductive biases,

there is still potential to improve the speed at which models learn to recognize

which features are important.

Sentiment Analysis using BERT Model

Sayeed et al. (2023) evaluated the BERT model by examining its use in

studies involving multiple languages, restaurants, agriculture, Automated Essay Scoring

(AES), Twitter, and Google Play. The approach involves fine-tuning a pre-trained BERT

model to perform language understanding tasks. Text pre-processing is conducted to

clean up the raw data and convert it to numerical values before

feeding it to the BERT model. The outcomes showed that BERT exceeded

standards on general language tasks, including sentiment analysis, paraphrase

recognition, and linguistic acceptability. However, the detection of neutral reviews and

false reviews poses problems for the BERT model's accuracy.

A study by Li, M., Chen, Zhao, and Li, Q. (2021) proposed a sentiment

analysis model based on the BERT model for Chinese stock reviews. This avoids

building a new dictionary and extracting features manually. The researchers found

out that the fine-tuned BERT+FC model performs the best among the different

variants they designed. The proposed model includes a BERT model and a classifier

layer. This classifier layer is stacked on top of BERT and jointly fine-tuned with

the model. The BERT model was pre-trained on the Chinese Wikipedia corpus, and the dataset, taken from

GitHub, has a total of 9,204 labeled reviews; the reported accuracy was 88.09%. To

verify the efficacy of the proposed BERT model, it was compared with TextCNN,

TextRNN, Att-BLSTM, and TextCRNN. The proposed model performed

approximately 2.14% to 3% better than the rest of the methods in terms of accuracy

in sentiment analysis of Chinese stock review text. The proposed model

outperforms traditional methods because it relies on fine-tuning the BERT

language model.

Singh et al. (2019) claimed that sentiment analysis, also known as opinion

mining, uses natural language processing techniques to analyze a person's opinion

or emotion. Sentiment analysis is accomplished using several metrics such as

Average Likes and Re-tweets over a period, Intensity Analysis, Polarity and

Subjectivity, as well as WordCloud. In addition, the Bidirectional Encoder

Representations from Transformers (BERT) model is employed to classify public

perceptions about the subject as positive, negative, or neutral.

In another study conducted by Sousa and Sakiyama (2019), titled "BERT for

Stock Market Sentiment Analysis", the researchers based their stock price

predictions on the moods expressed in social media and financial news. After fine-tuning, the

BERT model was able to accurately identify nuanced emotions from the text and

match them with trends in the stock market. The study revealed that a BERT

sentiment model might greatly increase the accuracy of stock market predictions by

combining both positive and negative sentiment indicators.

Ashir (2021) stated that e-commerce reviews are increasingly valued by both

customers and businesses. Businesses rely on sentiment research to enhance

product quality and make educated decisions in a fiercely competitive business

climate, which drives its high demand. The goal of this review article is to

investigate and assess the applicability of the BERT model, a Natural Language

Processing (NLP) approach, to sentiment analysis in a variety of domains. The

approach has been used in investigations including numerous languages, restaurant

enterprises, agriculture, Automated Essay Scoring (AES), Twitter, and Google

Play. The BERT model's fine-tuning procedures include employing pre-trained

BERT to accomplish a variety of language comprehension tasks. Text pre-

processing cleans up the data and converts it to numbers before it is sent into

BERT, which builds vectors for each input token. The review found that BERT

beat the norm on a variety of general language comprehension tasks, including

sentiment analysis, paraphrase detection, question-answering, and linguistic

acceptability. The identification of neutral reviews and the existence of fraudulent

reviews in the dataset are two issues that affect the model's accuracy. Training is

also sluggish due to its size and the large number of weights to update. Additional

research might be undertaken to increase the BERT model's accuracy by creating a

fake review classification model and providing more training to the model in

detecting neutral reviews.

Synthesis

The review of the current literature found that employing

machine-learning algorithms or natural language processing to analyze user feedback

can provide valuable insights to a streamer, allowing them to adjust based on

viewers' reviews to improve on games and boost engagement. However, ambiguous

words and nuances such as sarcasm may pose a barrier to computational

analysis. The study by Kobs et al. (2020) established various methods

for streamers to check whether a video is seen in a positive or negative light. In

comparison, the researchers of this study use a web-based system for sentiment analysis and

compare three models to determine which is most accurate in depicting users’

sentiment.

System Technical Background

The technical background of the "Behind the Screens" Sentiment Analysis

system includes an overview of its development plan and the requirements

necessary to prevent technical issues, ensuring optimal performance.

C.J. Hutto and Eric Gilbert launched Valence Aware Dictionary and

Sentiment Reasoner (VADER) in June 2014. VADER is a lexicon- and rule-based

sentiment analysis tool that is specifically built for social media

sentiment. VADER uses a combination of sentiment lexicons, which are

collections of lexical features (e.g., words) that are generally labeled as

positive or negative depending on their semantic orientation. VADER not only

calculates the positivity and negativity scores, but it also indicates how positive

or negative a sentiment is.
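
To make the lexicon-and-rules idea concrete, the following is a toy Python sketch of a VADER-style scorer. It is not the VADER implementation: the mini-lexicon, the word lists, and the function name are hypothetical, and the booster increment, negation scalar, and normalization constant are modeled on values commonly cited for the reference implementation and should be treated as assumptions.

```python
import math

# Hypothetical mini-lexicon on a -4..+4 valence scale (illustrative values only).
TOY_LEXICON = {"good": 1.9, "great": 3.1, "bad": -2.5, "boring": -1.3}
BOOSTERS = {"very": 0.293, "really": 0.293}  # assumed intensity increments
NEGATIONS = {"not", "never", "no"}
NEGATION_SCALAR = -0.74  # assumed damping-and-flip factor
ALPHA = 15               # assumed normalization constant


def toy_compound(text):
    """Sum rule-adjusted word valences and squash the total into (-1, 1)."""
    tokens = text.lower().split()
    total = 0.0
    for i, token in enumerate(tokens):
        valence = TOY_LEXICON.get(token)
        if valence is None:
            continue
        # Rule 1: a booster directly before the word strengthens it.
        if i > 0 and tokens[i - 1] in BOOSTERS:
            valence += math.copysign(BOOSTERS[tokens[i - 1]], valence)
        # Rule 2: a negation within the three preceding words flips it.
        if any(word in NEGATIONS for word in tokens[max(0, i - 3):i]):
            valence *= NEGATION_SCALAR
        total += valence
    return total / math.sqrt(total * total + ALPHA)


for sample in ["good stream", "very good stream", "not good", "hello chat"]:
    print(sample, round(toy_compound(sample), 3))
```

The magnitude of the returned value, not only its sign, carries information, which is the sense in which such a tool reports how positive or negative a text is.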

In the late 2010s, transformer-based models revolutionized the field of

Natural Language Processing (NLP). Among these, Bidirectional Encoder

Representations from Transformers (BERT), developed by Google researchers

in 2018, stood out as a significant advancement. BERT's impressive

performance quickly set a new standard for NLP tasks such as language

comprehension, question answering, and named entity recognition.

Following BERT's introduction, Facebook AI researchers developed

RoBERTa (Robustly Optimized BERT Approach), a variation that improved

upon BERT by modifying hyperparameters and the pretraining scheme.

RoBERTa's transformer-based architecture processes input sequences and

constructs contextualized representations using self-attention mechanisms,

making it highly effective for sentiment analysis tasks.
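
The self-attention step mentioned above can be sketched in a few lines of Python. This is a deliberately simplified illustration, assuming identity projections (queries, keys, and values are all the raw input vectors) and a single head; actual BERT and RoBERTa layers use learned projection matrices and multiple heads. The function names and input vectors are hypothetical.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(vectors):
    """Scaled dot-product self-attention with identity projections."""
    dim = len(vectors[0])
    outputs = []
    for query in vectors:
        # Similarity of this token to every token, scaled by sqrt(dim).
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in vectors
        ]
        weights = softmax(scores)
        # Each output is a weighted mix of all token vectors (its context).
        outputs.append(
            [sum(w * vec[j] for w, vec in zip(weights, vectors)) for j in range(dim)]
        )
    return outputs


# Two hypothetical 2-dimensional token vectors.
contextualized = self_attention([[1.0, 0.0], [0.0, 1.0]])
print(contextualized)
```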

"Behind the Screens" utilizes Microsoft Visual Studio's integrated

development environment (IDE) for its development. Visual Studio supports

various open-source software development platforms, including Jupyter

Notebook and Python 3.11.7 (virtual environment), alongside RoBERTa. This

setup enables the generation of both native and managed code.

The sentiment analysis for "Behind the Screens" is built using these

technologies to process and analyze viewer comments. The system employs

RoBERTa to understand and categorize sentiments accurately, facilitating

deeper insights into audience reactions.


Chapter III

Methodology of the Study

Research Methodology

The researchers used sentiment analysis to analyze live stream video-on-

demand (VOD) comments on YouTube and interpret or summarize them based on

the overall sentiment. In this study, the researchers would like to emphasize that

a live stream VOD is different from other YouTube videos, as VODs are recorded

during a live stream in which viewers can interact with the streamer in real time. A

VOD also serves as the archived recording of the stream, whereas other YouTube videos are

recorded and edited beforehand. The comments under the VOD were extracted,

not the YouTube live chat during the stream.

On YouTube, viewers can leave a comment under a video unless the

comment section is disabled, and with comments, a viewer can see the opinions of

others. YouTubers, specifically the streamers in this study, also check the

comments under their videos to see the reactions, opinions, or insights of their

viewers. Sometimes, there are too many comments to read and it is a hassle to do

it one by one. The researchers will develop a website that will get the overall

sentiment of the comments under a YouTube video and give the summary of the

intent.

Research Design

Quantitative and descriptive research designs were used by the researchers

to conduct the study. Quantitative research takes numerical data and measurement

into account and supposes that circumstances in the study can be measured

(Watson, 2015). Descriptive research generates both qualitative and quantitative

data that describe the current state of a phenomenon (Koh & Owen, 2000).

Quantitative research is applied for sentiment analysis as it involves

measurement scales or sentiment categories (positive, negative, and neutral)

expressed as numerical scores or labels. To identify the best-performing Natural

Language Processing (NLP) pre-trained model for analyzing the sentiment of

each comment, the researchers compared three (3) models for sentiment

analysis: Valence Aware Dictionary and sEntiment Reasoner (VADER),

Bidirectional Encoder Representations from Transformers (BERT), and Robustly

Optimized BERT Approach (RoBERTa). The VADER model, developed by Gilbert &

Hutto (2014), is used for general sentiment analysis, particularly for social media

texts or comments found online (Wu et al., 2024). Researchers from Google

designed the BERT model to improve fine-tuning approaches and capture the context

of unlabeled texts by pre-training deep bidirectional representations (Devlin et al.,

2019). RoBERTa is an extension of the BERT model proposed by Facebook AI

researchers, and the version used is fine-tuned for sentiment analysis with the TweetEval benchmark

(Barbieri et al., 2020). Descriptive research comes into play when labeling or

summarizing the numerical scores from analyzing sentiments of the comments

into negative, positive, or neutral categories.

Data Collection

In the study, purposive sampling was used to choose three (3) YouTube

streamers: Lilypichu, Valkyrae, and Sykkuno. These streamers share common

characteristics by being variety streamers, playing the same games, and mainly

streaming on YouTube under a contract. They are also part of the same circle

called OfflineTV and Friends, and were nominated for or won an award at The

Streamer Awards. Before YouTube, they used to stream on another streaming

platform, Twitch.

Criterion sampling, a purposive sampling method, was done to allow

comparisons between the sampled streamers. The same number of VODs (n=200)

with the same sampling time scale end point were chosen. Lilypichu’s latest VOD

in the playlist was streamed on April 28, 2024, while Valkyrae and Sykkuno’s

latest livestreams in the playlists were on April 29, 2024. All comments under the

sampled VODs were retrieved for the purpose of sentiment analysis.

The researchers created three (3) YouTube playlists and named them

according to the sampled streamers, which consist of 200 VODs from each of the

streamers. A Python program (version 3.11.7) was used to retrieve all the comments

under each VOD in a playlist. The YouTube Data API, accessed with a

developer (API) key, was used to get the comments. Then, the output was saved as a

CSV file. Comments, along with their timestamps, username, video ID, and date,

were retrieved. A total of 31,054 comments were extracted from 600 livestream

VODs as shown in Table 1.

Table 1: YouTube streamers sorted by the number of extracted comments in

ascending order.

No.   YouTube Streamer   Number of VODs   Number of Comments
1     Lilypichu          200              3,663
2     Sykkuno            200              10,983
3     Valkyrae           200              16,408
      TOTAL              600              31,054
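
The comment retrieval step described in this section can be sketched as follows. This is not the researchers' program: it is a minimal illustration, using only the Python standard library, of how a request URL for the YouTube Data API v3 commentThreads endpoint might be built and how one page of its JSON response could be flattened into CSV rows. The function names and CSV column names are hypothetical, the sketch covers top-level comments only, and the actual network call and pagination loop are omitted.

```python
import csv
import io
from urllib.parse import urlencode

API_ENDPOINT = "https://www.googleapis.com/youtube/v3/commentThreads"
CSV_FIELDS = ["video_id", "username", "published_at", "comment"]  # assumed columns


def build_request_url(video_id, api_key, page_token=None):
    """Build the URL for one page of top-level comments on a video."""
    params = {
        "part": "snippet",
        "videoId": video_id,
        "maxResults": 100,
        "textFormat": "plainText",
        "key": api_key,
    }
    if page_token:
        params["pageToken"] = page_token  # continue from the previous page
    return API_ENDPOINT + "?" + urlencode(params)


def rows_from_response(payload):
    """Flatten one decoded JSON response page into a list of row dicts."""
    rows = []
    for item in payload.get("items", []):
        top = item["snippet"]["topLevelComment"]["snippet"]
        rows.append(
            {
                "video_id": item["snippet"]["videoId"],
                "username": top["authorDisplayName"],
                "published_at": top["publishedAt"],
                "comment": top["textDisplay"],
            }
        )
    return rows


def rows_to_csv(rows):
    """Serialize the rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=CSV_FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

In a full script, each response page would be fetched with the built URL, and the loop would continue while the response contains a nextPageToken value.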

Data Preprocessing

Preprocessing techniques were performed to clean and organize the

comments retrieved for classification. Texts from the internet usually contain

noise such as HTML tags, scripts, and special characters. The Natural Language

Toolkit (NLTK) was used for language processing, especially for the VADER model.

Removal of numbers, conversion of emojis or emoticons to text, and correction of

spelling were performed before classifying the sentiment of each comment. For the overall

sentiment of all the comments, English stop words, punctuation, and special

characters were removed, as these do not contribute to the overall sentiment

relating to each streamer and content of the study. Finally, the data were

converted to lowercase, stemmed, and tokenized.
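
A minimal sketch of the cleaning steps for the overall-sentiment pass is given below. It uses only the Python standard library, so the emoticon map and stop-word list are tiny hypothetical stand-ins for the fuller resources a toolkit such as NLTK provides, and spelling correction and stemming are left out for brevity.

```python
import re
import string

# Hypothetical, deliberately small resources for illustration only.
EMOTICONS = {":)": "smile", ":(": "sad", "<3": "love"}
STOP_WORDS = {"the", "a", "an", "is", "was", "to", "and", "of", "this"}


def clean_for_overall_sentiment(comment):
    """Return lowercase tokens with emoticons expanded and noise removed."""
    text = comment
    # Convert emoticons to words first, before punctuation is stripped.
    for emoticon, word in EMOTICONS.items():
        text = text.replace(emoticon, f" {word} ")
    text = re.sub(r"\d+", " ", text)  # remove numbers
    text = text.translate(str.maketrans("", "", string.punctuation))
    tokens = text.lower().split()     # lowercase and tokenize on whitespace
    return [token for token in tokens if token not in STOP_WORDS]


print(clean_for_overall_sentiment("This stream was the BEST :) 10/10!!"))
```

The order of the steps matters in this sketch: emoticons such as "<3" contain digits and punctuation, so they are expanded before numbers and punctuation are removed.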

Sentiment Analysis Methodology

As mentioned, the datasets were analyzed using three (3)

models to compare which model would perform best for the study. Capitalization

and punctuation were kept, as they give a more subtle understanding of the

emotions behind the comments. First, the VADER model was used to classify the

datasets. Each comment extracted from the VODs was analyzed for its sentiment.

Next, the BERT model classified the datasets. Unlike the other two (2) models, the polarity

scores of this model range from zero (0) to five (5): 0 being the most negative, 3 being

neutral, and 5 being the most positive. Lastly, the datasets were classified

using the RoBERTa model. The polarity score and the classification of each

comment were computed in a for-loop.
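
The per-comment loop can be illustrated with the sketch below. It does not load any model; the ratings and logits are hypothetical stand-ins for model outputs, and the three-class label order for the RoBERTa-style head is an assumption. The rating rule follows the scale described above, with 3 treated as neutral.

```python
import math

# Assumed label order for a three-class sentiment head.
ROBERTA_LABELS = ["negative", "neutral", "positive"]


def label_from_rating(rating):
    """Map a rating on the 0-5 scale described above to a label."""
    if rating < 3:
        return "negative"
    if rating > 3:
        return "positive"
    return "neutral"


def label_from_logits(logits):
    """Apply softmax to raw scores and return the top label and its probability."""
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    probabilities = [e / total for e in exps]
    best = max(range(len(probabilities)), key=lambda i: probabilities[i])
    return ROBERTA_LABELS[best], probabilities[best]


# Hypothetical per-comment outputs standing in for real model predictions.
bert_ratings = [5, 3, 1]
roberta_logits = [[-1.2, 0.3, 2.5], [2.0, 0.1, -1.0], [0.0, 1.5, 0.2]]

results = []
for rating, logits in zip(bert_ratings, roberta_logits):
    roberta_label, confidence = label_from_logits(logits)
    results.append((label_from_rating(rating), roberta_label, round(confidence, 3)))

print(results)
```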

The rest of the preprocessing techniques were then applied to the dataset. The

comments were counted by sentiment to determine the emotions of the viewers

towards the streamer. The five (5) most frequent words that appeared in the

comments per streamer based on sentiment were extracted. These words

were also compared across streamers. Word clouds were used to

visualize the sentiment of the viewers per streamer. Lastly, all the comments per

streamer will be interpreted or summarized based on the underlying sentiment.
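
The counting steps above can be sketched with the standard library's Counter. The function name and sample data are hypothetical, and taking the most common label as the overall sentiment is an assumption made for this illustration rather than a rule stated in the study.

```python
from collections import Counter


def summarize_sentiment(labeled_comments, top_n=5):
    """Count labels and find the most frequent words per sentiment.

    labeled_comments is a list of (tokens, label) pairs, where tokens
    is the cleaned word list of one comment.
    """
    sentiment_counts = Counter(label for _, label in labeled_comments)
    words_by_sentiment = {}
    for tokens, label in labeled_comments:
        words_by_sentiment.setdefault(label, Counter()).update(tokens)
    top_words = {
        label: counter.most_common(top_n)
        for label, counter in words_by_sentiment.items()
    }
    # Assumption: the majority label stands in for the overall sentiment.
    overall = sentiment_counts.most_common(1)[0][0] if sentiment_counts else None
    return sentiment_counts, top_words, overall


# Hypothetical cleaned comments.
sample = [
    (["love", "stream"], "positive"),
    (["love", "game"], "positive"),
    (["boring"], "negative"),
]
counts, top_words, overall = summarize_sentiment(sample)
print(counts, top_words, overall)
```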

Software Development Methodology

Figure 3: Iterative Software Development Life Cycle

The iterative model will be used for developing the system. This software

development life cycle has elements of the waterfall model in iterative form.

Initially, the iterative model implements parts of the total system and adds

functionality in the next iterations (Alshamrani & Bahattab, 2015).

Iteration 1. Initial requirements were gathered through sentiment analysis.

The main functions of the system were designed. The main function should

accept any YouTube video URL and analyze the comment section’s overall

sentiment. No backend programming was added yet. The developers of the study

designed the initial user interface and the basic requirements of the system. The

system was not ready to accept any user input in this stage. The developers

implemented the system and reviewed if there were functions to be removed,

added, or improved.

Iteration 2. The developers will take all the reviews, feedback, and

suggestions into account. The frontend will be improved and the basic functions

of the system will be programmed. The user should be able to input a YouTube

video URL and have its comments analyzed for the overall sentiment. The results of the analysis

should be returned. The system will be tested, implemented, and reviewed for

changes. This stage will continue until the application achieves user satisfaction.

Iteration 3. This stage will still include all the changes, improvements,

and feedback from previous iterations. In this stage, the system will undergo user

testing and be implemented. The system will be reviewed to check for further

improvements or errors. Then, the system will be deployed and be available for

the users to utilize. The web application should be maintained by the developers

to ensure that the system is working properly.
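
Since the iterations above center on accepting a YouTube video URL, a minimal sketch of that input step is shown here. It uses only the standard library; the function name is hypothetical, and the set of URL forms handled (watch, youtu.be, and live links) is an assumption about what the system would need to accept.

```python
from urllib.parse import parse_qs, urlparse

YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com"}


def extract_video_id(url):
    """Return the video ID from a YouTube URL, or None if none is found."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    candidate = ""
    if host == "youtu.be":
        # Short links carry the ID as the path.
        candidate = parsed.path.lstrip("/")
    elif host in YOUTUBE_HOSTS:
        if parsed.path == "/watch":
            # Standard links carry the ID in the "v" query parameter.
            candidate = parse_qs(parsed.query).get("v", [""])[0]
        elif parsed.path.startswith("/live/"):
            candidate = parsed.path.split("/")[2]
    return candidate or None


print(extract_video_id("https://www.youtube.com/watch?v=abc123XYZ_-&t=10s"))
```

Returning None for unrecognized input lets the web application show a validation message instead of sending a malformed request to the comment retrieval step.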

Requirements Analysis and Documentation

This section comprises the System Work Plan, USE-CASE Diagram,

Data Flow Diagram, Entity Relationship Diagram, Visual Table of Contents

Diagram, and Deployment Diagram.

Work Plan

Figure 4: Behind the Screens System Work Plan

The researchers collected data by extracting VOD comments from three

(3) YouTube streamers: Lilypichu, Valkyrae, and Sykkuno. Preprocessing

techniques were applied to the data collected. Then, the data were analyzed and

classified. All comments (data) will be interpreted or summarized. This is the

main feature for the web application. To develop a web application, Python Flask

will be used as a framework for backend, Tailwind CSS for frontend, and MySQL

for database.

Frontend. Tailwind CSS is a utility-first framework for designing the user

interface for websites. The researchers will use this framework for designing the

frontend. Initially, the UI/UX designers of the study used Figma to design the

frontend.

Backend. The researchers will use Flask framework for developing the

web application, Behind the Screens. Flask is a micro web framework

programmed in Python that helps developers to ease common tasks in web

development (e.g. authentication, routing, sessions, templating, caching, etc.).

Database. MySQL is an open-source relational database management

system that is used for managing structured data. The web application will be

using this database management system as the user should be able to access their

past requests of analysis of comments from YouTube videos.

USE-CASE Diagram

Figure 5: Behind the Screens’ USE-CASE Diagram

Viewers and Streamers (New and Registered) and the admin are the actors in

this diagram. YouTubers who either stream or upload edited videos, as well as viewers,

are the users of the website. New users can register an account to start using the

web application. Once a user has an existing account, they can log in and utilize

the YouTube comments analyzer. They can request interpretations by entering a

URL of the desired YouTube video, and view interpretations of past requests. The

interpretation includes the overall sentiment of the comment section, five (5) most

frequently used words, and a word cloud. The user can also view how many

comments are negative, positive, or neutral with the sentiment counter. The admin

(developers) of the web application can manage and view all the user accounts,

and user requests (the YouTube comments interpretations).

Data Flow Diagram

Figure 6: Data Flow Diagram

External Entities

1. Users: YouTube URL is provided by the user and results are received.

2. Admin: The admin manages the system and is also able to view the

interpretations.

Processes

1. YouTube Comments Extraction: Extracts all the comments from the

YouTube video’s comment section.

2. Comment Sentiment Labeling: Each extracted and cleaned comment is

categorized (positive, negative, neutral).

3. Sentiment Counter: Counts the comments based on their sentiment

category.

4. Most Frequently Used Words: Identifies the five (5) most frequently used

words and how many times the words were used, and provides their

sentiments.

5. Comment Summarization: Summarizes all the comments for the overall

interpretation.

Data Stores

1. Extracted Comments: Where the comments extracted from YouTube URL

are stored.

2. Labeled Comments: Where the comments that were labeled are stored.

3. Sentiment Counts: Where the counted comments based on sentiment are

stored.

4. Frequent Words: Where the frequently used words are stored.

5. Summarized Comments: Where the overall interpretations are stored.

6. Interpretations: Where the results of the analysis are stored.

Entity Relationship Diagram

Figure 7: Entity Relationship Diagram for Database

The database has the following tables: loginReq for login requirements, which also

serves as an audit trail; admin and user, which consist of basic user and admin

information; Youtube URL for storing the video URL; comments where

comments extracted from the YouTube video are stored; labeledComments where

comments that were labeled using sentiment analysis are stored;

sentimentCounter where counted positive, negative, and neutral comments are

stored; FrequentWords where words from the comments that frequently appeared

were counted and labeled are stored; Lastly, summarizedComments where the

summary of all the comments from the YouTube video are stored.
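
As a rough sketch, the analysis tables named above could be created in SQLite as shown below. Only the table names come from the description (with Youtube URL written as youtubeUrl); every column name, key, and constraint is an assumption made for illustration, since the diagram's attributes are not reproduced in the text. The loginReq, admin, and user tables are omitted for brevity.

```python
import sqlite3

# All columns and relationships below are hypothetical (assumption).
# Note: SQLite does not enforce REFERENCES unless PRAGMA foreign_keys = ON.
SCHEMA = """
CREATE TABLE youtubeUrl (
    id  INTEGER PRIMARY KEY,
    url TEXT NOT NULL
);
CREATE TABLE comments (
    id    INTEGER PRIMARY KEY,
    urlId INTEGER NOT NULL REFERENCES youtubeUrl(id),
    body  TEXT NOT NULL
);
CREATE TABLE labeledComments (
    commentId INTEGER PRIMARY KEY REFERENCES comments(id),
    sentiment TEXT NOT NULL
        CHECK (sentiment IN ('positive', 'negative', 'neutral'))
);
CREATE TABLE sentimentCounter (
    urlId    INTEGER PRIMARY KEY REFERENCES youtubeUrl(id),
    positive INTEGER NOT NULL DEFAULT 0,
    negative INTEGER NOT NULL DEFAULT 0,
    neutral  INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE FrequentWords (
    urlId       INTEGER NOT NULL REFERENCES youtubeUrl(id),
    word        TEXT NOT NULL,
    occurrences INTEGER NOT NULL,
    sentiment   TEXT NOT NULL,
    PRIMARY KEY (urlId, word)
);
CREATE TABLE summarizedComments (
    urlId   INTEGER PRIMARY KEY REFERENCES youtubeUrl(id),
    summary TEXT NOT NULL
);
"""


def create_database(path: str = ":memory:") -> sqlite3.Connection:
    """Open a database at the given path and create the sketch schema."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


conn = create_database()
conn.execute(
    "INSERT INTO youtubeUrl (id, url) VALUES (1, 'https://www.youtube.com/watch?v=EXAMPLE')"
)
conn.execute("INSERT INTO comments (id, urlId, body) VALUES (1, 1, 'Great stream')")
conn.execute("INSERT INTO labeledComments VALUES (1, 'positive')")
row = conn.execute(
    "SELECT c.body, l.sentiment FROM comments c "
    "JOIN labeledComments l ON l.commentId = c.id"
).fetchone()
print(row)
```

Storing labels in a separate table keyed by comment, as the description suggests, keeps the raw extracted comments unchanged if the labeling is rerun.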

System Flowchart

This section contains the system flowchart of the web system "Behind the

Screens." It is divided into three parts: login and registration, user, and admin.

Figure 8: System Flowchart for login and registration

Figure 9: System Flowchart for User

Figure 10: System Flowchart for Admin

Visual Table of Contents Diagram

Figure 11: Visual Table of Contents Diagram

Deployment Diagram

Figure 12: Deployment Diagram

Research Tools and Instruments

The researchers used sentiment analysis to determine the emotions of the

viewers by classifying each comment of the VODs per streamer. Sentiment

analysis was used to assess the overall sentiment of the viewers towards the

streamer.

Observation was also conducted to determine the correlation between

viewer comments and the content. The researchers watched ongoing live streams

and live stream VODs on YouTube. They observed the chat during live streams and

analyzed the recorded chat in VODs to identify the behavior of the viewers. The

observation was guided by the following questions:

1. How frequently do viewers send messages during different parts of

the stream?

2. How do the viewers respond to the streamer's reactions during specific

parts of the stream?

3. How do emotional responses vary depending on the content being

streamed?

4. What are the tasks of the moderators in a streamer's chat?

5. How does the streamer interact with the chat and how often do they do

it?

The researchers interviewed both viewers and streamers to gather more

information for the development of the web application. Questionnaires were

given to those who could not attend the interview due to schedule conflicts,

especially the streamers. The researchers used open-ended questions to give the

streamers and viewers more freedom to express their thoughts and insights. The

researchers designed two sets of guide questions: one for the streamers and

another for the viewers.

Guide questions for interviewing viewers:

1. Do you watch YouTube videos? If yes, how often?

2. What type of content do you usually watch? Can you give an example of a

creator you watch?

3. Do you read the comment section?

4. What do you think about the comments? Do they influence your opinion

on the video?

5. Do you sometimes leave a comment or have you ever thought of leaving a

comment?

6. If you do, what motivates you to leave a comment on a YouTube video?

7. What kind of comments do you usually leave on YouTube videos? Why?

8. What types of comments do you find most helpful when deciding whether

to watch a video?

9. Do you think it is important for a creator to read (and respond) to

comments? If yes, how so?

10. If you do leave a comment, what type/s of video content motivate you to

leave detailed comments rather than short ones?

11. If the content creator saw a comment of yours, how would you feel if they

liked and/or replied to you?

12. Have you ever felt discouraged from commenting due to the nature of

other comments on a video? Why?

13. In what ways do you think viewer comments contribute to the success or

failure of a YouTube video?

Guide questions for interviewing streamers:

1. How long have you been streaming?

2. What inspired you to start streaming?

3. What type of content do you usually stream?

4. What do you usually feel when you read the chat? How does it affect you?

5. Do you engage in playful banter with your chat?

6. What do you feel when you receive positive messages?

7. Have you ever encountered a negative message or comment? If so, how do

you handle it?

8. Have you ever felt discouraged from continuing what you do because of a

comment?

9. Have you ever timed out or banned a viewer based on their chat? If so, can

you give an example of what they said?

10. Do you think viewers leaving a chat message (on a livestream) or a comment

(on a pre-recorded video uploaded to, say, YouTube) is important or makes a

difference in your content creation? Why or why not?

Implementation Plan

Project Manager

The project manager is responsible for managing the overall project. He or

she manages, organizes, and plans for the researchers. He or she is also

responsible for distributing tasks, conducting meetings, and communicating with

the thesis adviser and other personnel involved with the study.

Frontend Developer

They are responsible for designing and developing the user interface of the

website.

Backend Developer

Backend developers are assigned to develop the processes and logic

behind the website, ensuring the website is working as intended.

Manuscript Writer

Writers are assigned to write and document the processes that took place

while conducting the study and developing the system.

Data Collector

They are responsible for collecting the data necessary for the study.

Because sentiment analysis operates on text, the data collector gathers about

30,000 comments for the study.

Gantt Chart

Figure 13: Gantt Chart

References

Alrumaih, A., Al-Sabbagh, A., Alsabah, R., Kharrufa, H., & Baldwin, J. (2020).

Sentiment analysis of comments in social media. International Journal of

Electrical and Computer Engineering, 10(6), 5917. Retrieved May 18, 2024, from

https://doi.org/10.11591/ijece.v10i6.pp5917-5922

Barbieri, F., Camacho-Collados, J., Neves, L., & Espinosa-Anke, L. (2020,

October 26). TweetEval: Unified Benchmark and Comparative Evaluation

for Tweet Classification. arXiv. https://doi.org/10.48550/arXiv.2010.12421

Bird, S., Loper, E., & Klein, E. (2009). Natural Language Processing with

Python. O’Reilly Media Inc.

Birjali, M., Kasri, M., & Beni-Hssane, A. (2021). A comprehensive survey on

sentiment analysis: Approaches, challenges and trends. Knowledge-based

Systems, 226, 107134. Retrieved May 19, 2024, from

https://doi.org/10.1016/j.knosys.2021.107134

Chouhan, A., Halgekar, A., Rao, A., Khankhoje, D., & Narvekar, M. (2021).

Sentiment Analysis of Twitch.tv Livestream Messages using Machine

Learning Methods. 2021 Fourth International Conference on Electrical,

Computer and Communication Technologies (ICECCT). Retrieved May

18, 2024, from https://doi.org/10.1109/icecct52121.2021.9616932

Danda, A. (2021). Gaming sentiment: The relationship of comment sentiment and

subscriber growth rate. Journal of Student Research, 10(2). Retrieved May

15, 2024, from https://doi.org/10.47611/jsrhs.v10i2.1722

Devlin, J., Chang, M. W., Lee, K., & Toutanova, K. (2019, May 24). BERT: Pre-

training of Deep Bidirectional Transformers for Language Understanding.

arXiv. https://doi.org/10.48550/arXiv.1810.04805

Gajanayake, G., & Sandanayake, T. (2020). Trending Pattern identification of

YouTube gaming channels using sentiment analysis. Retrieved May 15,

2024, from https://www.semanticscholar.org/paper/Trending-Pattern-

Identification-of-YouTube-Gaming-Gajanayake-Sandanayake/

7c90f945e738487209c1194f53e9260c60ec3e52

Hartmann, J., Heitmann, M., Siebert, C., & Schamp, C. (2023). More than a

Feeling: Accuracy and Application of Sentiment Analysis. International

Journal of Research in Marketing, 40(1), 75–87. Retrieved May 19, 2024,

from https://doi.org/10.1016/j.ijresmar.2022.05.005

Kobs, K., Zehe, A., Bernstetter, A., Chibane, J., Pfister, J., Tritscher, J., & Hotho,

A. (2020b). Emote-Controlled. ACM Transactions on Social Computing,

3(2), 1–34. Retrieved May 15, 2024, from https://doi.org/10.1145/3365523

Mukherjee, S. (2020). Sentiment analysis. In Apress eBooks (pp. 113–127).

Retrieved May 18, 2024, from https://doi.org/10.1007/978-1-4842-6543-

7_7

Mukhopadhyay, A., Patel, S., & Parmar, V. (2022). Sentiment Analysis on

YouTube using Lexicon Based Approach. Retrieved May 15, 2024, from

https://ijarcce.com/papers/sentiment-analysis-on-youtube-using-lexicon-

based-approach/

Naf’an, M. Z., Bimantara, A. A., Larasati, A., Risondang, E. M., & Setya

Nugraha, N. A. (2019, April). Sentiment analysis of cyberbullying on Instagram user comments. Retrieved May 19,

2024, from https://www.semanticscholar.org/paper/Sentiment-Analysis-

of-Cyberbullying-on-Instagram-Naf%E2%80%99an-Bimantara/

5ed9294f98c53e8c8b4d5b06ff5c54091f0e1054?p2df

Poecze, F., Ebster, C., & Strauss, C. (2019). Let’s play on Facebook: using

sentiment analysis and social media metrics to measure the success of

YouTube gamers’ post types. Personal and Ubiquitous Computing, 26(3),

901–910. Retrieved May 18, 2024, from https://doi.org/10.1007/s00779-

019-01361-7

Qian, K., & Jain, S. (2024). Digital Content Creation: An analysis of the impact of

recommendation systems. Management Science. Retrieved April 28, 2024,

from https://doi.org/10.1287/mnsc.2022.03655

Sainath Pichad, Sunit Kamble, Rohan Kalamb, & Chavan, S. (2023, May 10).

Analysing Sentiments for YouTube Comments using Machine Learning.

IJRASET. Retrieved May 31, 2024, from

https://www.ijraset.com/research-paper/analysing-sentiments-for-youtube-

comments

Sasikala, P., & Sheela, L. M. I. (2020). Sentiment analysis of online product

reviews using DLMNN and future prediction of online product using

IANFIS. Journal of Big Data, 7(1). Retrieved May 19, 2024, from

https://doi.org/10.1186/s40537-020-00308-7

Singh, R. (2021). YOUTUBE COMMENTS SENTIMENT ANALYSIS.

ResearchGate. Retrieved May 15, 2024, from

https://www.researchgate.net/publication/351351202_YOUTUBE_COM

MENTS_SENTIMENT_ANALYSIS

Shoufan, A. (2019). What motivates university students to like or dislike an

educational online video? A sentimental framework. Computers &

Education, 134, 132–144. Retrieved May 22,

2024, from https://doi.org/10.1016/j.compedu.2019.02.008

Tibor, G., & Szűcs, J. (2023). Length and sentiment analysis of reviews about

top-level video game genres on the steam platform. Retrieved May 19,

2024, from https://ouci.dntb.gov.ua/en/works/7BKXyaD4/

Vyas, V., & Uma, V. (2019). Approaches to sentiment analysis on product

reviews. In Advances in business information systems and analytics book

series (pp. 15–30). Retrieved May 18, 2024, from

https://doi.org/10.4018/978-1-5225-4999-4.ch002

Wu, Y., Lin, M., & Yao, W. (2024, April 19). The Influence of Titles on YouTube

Trending Videos. https://doi.org/10.54254/2753-7064/29/20230835

Yang, Z. (2020, December 11). Text and sentiment analysis of YouTube

health-related vlog comments and brand endorsement effectiveness.

Retrieved May 18, 2024, from

https://repositories.lib.utexas.edu/items/ebee793a-80ef-420d-bc19-

4e9a9841faae

Yildiz, S. N. (2022). Comparison of Various Methods of Sentiment Analysis: For

the Case of Twitch. Retrieved May 18, 2024, from

https://arno.uvt.nl/show.cgi?fid=161872.

Borg, A., & Boldt, M. (2020). Using VADER sentiment and SVM for

predicting customer response sentiment. Expert Systems with

Applications, 162, 113746. https://doi.org/10.1016/j.eswa.2020.113746

Semary, N., Ahmed, W., Amin, K., Pławiak, P., & Hammad, M. (2023).

Improving sentiment classification using a RoBERTa-based hybrid model.

Frontiers in Human Neuroscience, 17.

https://doi.org/10.3389/fnhum.2023.1292010

Sayeed, M. S., Roji, V., & Anbananthen, K. (2023). BERT: A Review of

Applications in Sentiment Analysis. HighTech and Innovation Journal, 4,

453–462. https://doi.org/10.28991/HIJ-2023-04-02-015

Akhtar, M. (2019). Sentiment Analysis on YouTube Comments: A brief

study.

Elbagir, S., & Yang, J. (2019). Twitter Sentiment Analysis using Natural

Language Toolkit and VADER Sentiment. International MultiConference

of Engineers and Computer Scientists.

https://www.iaeng.org/publication/IMECS2019/IMECS2019_pp12-16.pdf

Singh, M., Jakhar, A. K., & Pandey, S. (2021). Sentiment analysis on the impact

of coronavirus in social life using the BERT model. Social Network

Analysis and Mining, 11(1). https://doi.org/10.1007/s13278-021-00737-z

Liao, W., Zeng, B., Yin, X., & Wei, P. (2020). An improved aspect-category

sentiment analysis model for text sentiment analysis based on RoBERTa.

Applied Intelligence, 51(6), 3522–3533. https://doi.org/10.1007/s10489-

020-01964-1

Cunha, A., Costa, M., & Pacheco, M. (2019). Sentiment Analysis of YouTube

Video Comments Using Deep Neural Networks.

https://doi.org/10.1007/978-3-030-20912-4_51

Amin, A., Hossain, I., Akther, A., & Alam, K. M. (2019). Bengali VADER: A

Sentiment Analysis Approach Using Modified VADER.

https://doi.org/10.1109/ecace.2019.8679144

Li, M., Chen, L., Zhao, J., & Li, Q. (2021). Sentiment analysis of Chinese stock

reviews based on BERT model. Applied Intelligence, 51(7), 5016–5024.

https://doi.org/10.1007/s10489-020-02101-8

Tan, K. L., Lee, C. P., & Lim, K. M. (2023). RoBERTa-GRU: A Hybrid Deep

Learning Model for Enhanced Sentiment Analysis. Applied Sciences,

13(6), 3915. https://doi.org/10.3390/app13063915

Alzahrani, M., Aldhyani, T., Alsubari, S., Althobaiti, M., & Fahad, A. (2022).

Developing an Intelligent System with Deep Learning Algorithms for

Sentiment Analysis of E-Commerce Product Reviews. Computational

Intelligence and Neuroscience, 2022, 1–10.

https://doi.org/10.1155/2022/3840071

Pano, T., & Kashef, R. (2020). A Complete VADER-Based Sentiment Analysis

of Bitcoin (BTC) Tweets during the Era of COVID-19. Big Data and

Cognitive Computing, 4(4), 33. https://doi.org/10.3390/bdcc4040033

Sousa, M. G., Sakiyama, K., De Souza Rodrigues, L., De Moraes, P. H.,

Fernandes, E. R., & Matsubara, E. (2019). BERT for Stock Market

Sentiment Analysis. https://www.semanticscholar.org/paper/BERT-for-

Stock-Market-Sentiment-Analysis-Sousa-Sakiyama/

e03d32c04c6bb4d2383ac4df25f954dd941152c3

Sirisha, U., & Chandana, B. S. (2022). Aspect based Sentiment & Emotion

Analysis with ROBERTa, LSTM. International Journal of Advanced

Computer Science and Applications, 13(11).

https://doi.org/10.14569/ijacsa.2022.0131189

Wankhade, M., Rao, A. C. S., & Kulkarni, C. (2022). A survey on sentiment

analysis methods, applications, and challenges. Artificial Intelligence

Review, 55(7), 5731–5780. https://doi.org/10.1007/s10462-022-10144-1

Tan, J., & Chia, W. (2022). Research Output to Industry Use: A Readiness

Study for Topic Modelling with Sentiment Analysis.

https://doi.org/10.1007/978-981-16-8515-6_2

Garay, J. L., Yap, R., & Sabellano, M. J. (2019). An analysis on the insights of

the anti-vaccine movement from social media posts using k-means

clustering algorithm and VADER sentiment analyzer. IOP Conference

Series: Materials Science and Engineering, 482, 012043.

https://doi.org/10.1088/1757-899X/482/1/012043

Warstadt, A., Zhang, Y., Li, H.-S., Liu, H., & Bowman, S. (2020). Learning

Which Features Matter: RoBERTa Acquires a Preference for Linguistic

Generalizations (Eventually).
