02. Manuscript (2)
Chapter I
Introduction
was always a costly and challenging undertaking. The Internet makes things
significantly simpler, quicker, and less expensive (Blank, 2013). However, the
computer systems, had made media streaming feasible. When data is sent across
continue while new data is being received, the technique is known as streaming
(Fecheyr-Lippens, 2010).
Platforms that allow individuals to stream video content over the internet
are often referred to as user-generated live streaming systems (Pires & Simon,
In recent years, live video streaming has become a global business and
Live, have been established and have experienced remarkable global expansion.
However, studies have not given enough attention to comprehending the
al., 2017).
these comments.
linguistic structure, the context of words used in the text, and both positive and
content creators. Viewer sentiment can profoundly impact the success of a live
gain valuable insights into audience preferences, reactions, and interests. This
information enables streamers to tailor their content to better align with viewer
streamers, revolves around handling the large volume of comments their videos
attract. It's impractical for them to manually sift through hundreds or thousands
of comments. While likes and dislikes offer some insight into viewer feedback,
1. What are the requirements to produce a web system that utilizes sentiment
analysis?
The main objective of the study, titled "Viewer Sentiment Analysis in Live
system, that utilizes sentiment analysis. Specifically, this study aims to:
preferences.
reputation management.
responses. This insight allows them to tailor their content more effectively to
Viewers: Viewers stand to gain from the enhanced content quality that
attuned to their audience's preferences and emotional responses, they can create
content that better aligns with what viewers enjoy and find engaging.
social media can expand their understanding of the dynamics within online
develop more sophisticated models and tools for sentiment analysis, contributing
audio, and multimedia content that is created and shared across digital platforms.
digital content for online distribution and consumption. Digital content creators
creative professionals who produce content for websites, social media, streaming
the internet, allowing viewers to watch events as they happen. Livestreams often
features.
over the internet in real-time. Live streaming allows content creators to engage
feedback.
Streamer. Someone who broadcasts live video content over the internet,
YouTube, or Facebook Gaming. They cover diverse activities such as gaming, art
Video on Demand (VOD). Videos that viewers can access at any time,
rather than at a scheduled broadcast time. VOD allows users to watch videos
viewers, which can include actions such as liking, commenting, sharing, and
comments through lexicon-based sentiment analysis and data collection via the
The system is designed to work only with YouTube videos, and the
findings may not be applicable to other platforms. Data is gathered only from
algorithm is limited to text data and may struggle to interpret sarcasm, slang,
Sentiment Analysis
The study aimed to provide insights into the strengths and limitations of each
lexicon-based techniques, along with related fields like emotion detection. The
in text and highlighted the need for further research in sentiment analysis.
people's sentiments in online comments. They suggest that considering both text
and emojis is crucial, especially for Arabic comments. The study also notes that
Overall, the study highlights the potential of emojis for sentiment analysis and
behavior has made it the most popular medium for sharing videos in society.
YouTube allows anyone to create an account in any category and upload videos
audience and acquire popularity. Many YouTube channel owners are using
various tactics to make their videos popular. Evaluate user comments, determine
content, which influences their decision to subscribe to such channels (Danda &
Talarczyk, 2021).
Singh and Tiwari (2021) have used sentiment analysis to explore the
(ML) and natural language processing (NLP). In a similar case, they analyzed
sentiments in user feedback, which can provide valuable insights into public
Using this system, they examined comments on popular YouTube channels and
found that Sadness, Surprise, and Joy were the most common emotions
expressed.
based method for retrieving relevant and popular YouTube videos by analyzing
sentiment in user comments. Their approach involves four key steps: collecting
in user comments.
tended to prioritize the product over the endorsers when it came to health-related
items.
Kobs et al. (2020) found that sentiment analysis helps
Despite challenges with Twitch's unique language, the study showed that these
Twitch.tv as the platform. In this study, researchers scraped livestream chats and
clips for data. Since the data was unlabeled, they had to find a solution. They
had two people label each message as Positive, Neutral, or Negative. They used
evaluate than other methods. They compared two models: convolutional neural
time (Chouhan et al., 2021). The study utilized a range of models, including
Random Forest Classifier, and Multinomial Naïve Bayes. The Support Vector
Twitch.tv.
Strauss, 2019). These YouTuber gamers were selected based on their common
traits and high popularity. The study employed judgment sampling followed by
Netvizz (version 1.45), researchers extracted posts from each YouTuber's page,
considering the unique language used in the gaming community. The study's
feedback and preferences. The study classified sentiments into positive and
negative categories, using lexical classification when labeled data was scarce.
decision-making.
cyberbullying within Instagram user comments was explored. The study focuses
on discussions related to the 2017 Jakarta Governor Election and examines how
social media enables individuals to freely express opinions, both positive and
processes.
promoting good health. Using Twitter data logs and the MooM dataset, they
categorized tweets based on trending topics and described their alignment and
user requirements and Twitter data analysis. Overall, their study provided
trends.
StarCraft 2 chat. They use SO-CAL for sentiment and toxicity detection, finding
it effective. Their study emphasizes the need for tailored dictionaries in gaming
They also note the overlap between sentiment classification and toxicity
detection tasks.
Guzsvinecz & Szűcs (2023) studied player reviews on Steam to help game
negative reviews are longer and written sooner than positive ones. Each genre
shows different emotional patterns, with action and adventure games receiving
feedback.
Strååt & Verhagen (2017) found that user review sentiments on Metacritic
align closely with their ratings. They analyzed reviews for Dragon Age and
Mass Effect games, focusing on aspects like combat, story, and character.
sentiment and opinion. The Stanford Deep Learning for Sentiment Analysis
model is noted for its success. Challenges in sentiment analysis include the
prices.
analysis to figure out how people felt about products. Then, they used another
in the future. Based on this prediction, they made sure that the best reviews of
products were shown first to customers. This helps customers find great
products more easily. The researchers aimed to make online shopping simpler
Book Series" by Vyas and Uma (2019), Chapter 2 discusses how to understand
feelings in product reviews. It explains that sentiment analysis helps figure out if
a review is positive, negative, or neutral. This helps businesses see what they're
doing well and what needs improvement in their products. Sentiment analysis
also helps marketing teams target their ads better. The chapter explains how
sentiment analysis works and mentions that it usually uses supervised learning
can gauge people's feelings. They found that using machine learning was better
than other methods. Looking at whole documents, not just sentences, improved
accuracy by 7%. Using more data helped traditional methods. They found that
advanced methods, like transfer learning, worked best, especially for analyzing
product reviews.
Synthesis
After reviewing the current literature, it has been found that employing
can provide valuable insights to a streamer, allowing them to adjust based on the
words and nuances such as sarcasm may pose a barrier to computational
analysis. The study by Kobs et al. (2020) established various methods in
comparison, the researchers utilize a web browser system for sentiment analysis by
sentiment.
C.J. Hutto and Eric Gilbert launched Valence Aware Dictionary and
tool based on language and rules that is specifically built for social media
calculates the Positivity and Negativity scores, but it also determines whether a
performance quickly set a new standard for NLP tasks such as language
The sentiment analysis for "Behind the Screens" is built using these
Chapter III
Research Methodology
the overall sentiment. In this study, the researchers would like to emphasize that
a live stream VOD differs from other YouTube videos: a VOD is recorded during a
live stream, in which viewers can interact with the streamer in real time, and it
serves as the archived recording of that stream. Other YouTube videos, by
contrast, are recorded and edited beforehand. Comments under the VOD were extracted and
comment section is disabled, and with comments, a viewer can see the opinions of
others. YouTubers, specifically the streamers in this study, also check the
comments under their videos to see the reactions, opinions, or insights of their
viewers. Sometimes there are too many comments to read, and going through them
one by one is a hassle. The researchers will develop a website that computes the
overall sentiment of the comments under a YouTube video and gives the summary of the
intent.
to conduct the study. Quantitative research takes numerical data and measurement
into account and supposes that circumstances in the study can be measured
data that describe the current state of a phenomenon (Koh & Owen, 2000).
defined into numerical scores or labels. To know the best performing Natural
each comment, the researchers compared three (3) language models for sentiment
Hutto (2014), is used for general sentiment analysis, particularly for social media
texts or comments found online (Wu et al., 2024). Researchers from Google
designed the BERT model to improve fine-tuning approaches and get the context
(Barbieri et al., 2020). Descriptive research comes into play when labeling or
In the study, purposive sampling was used to choose three (3) YouTube
characteristics by being variety streamers, playing the same games, and mainly
streaming on YouTube under a contract. They are also part of the same circle
called OfflineTV and Friends, and were nominated for or won an award on The
platform, Twitch.
comparisons between the sampled streamers. The same number of VODs (n=200) was
chosen for each streamer, with the same end point on the sampling time scale.
Lilypichu’s latest VOD
in the playlist was streamed on April 28, 2024, while Valkyrae and Sykkuno’s
latest livestreams in the playlists were on April 29, 2024. All comments under the
The researchers created three (3) YouTube playlists, each named after one of the
sampled streamers and containing 200 of that streamer's VODs. A Python program
(version 3.11.7) was used to retrieve all the comments under each VOD in a
playlist. The YouTube Data API, accessed with a developer (API) key, was used to
get the comments. Then, the output was saved as a CSV file. Comments, along with
their timestamps, usernames, video IDs, and dates, were retrieved. A total of
31,054 comments were extracted from 600 livestream
ascending order.
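The retrieval step described above can be sketched roughly as follows. This is a minimal illustration, not the researchers' actual program: it assumes the commentThreads endpoint of the YouTube Data API v3 is called over plain HTTPS with the Python standard library, and the function names and CSV column names are hypothetical. Only top-level comments are read.

```python
import csv
import json
import urllib.parse
import urllib.request

# Endpoint of the YouTube Data API v3 that lists comment threads for a video.
API_URL = "https://www.googleapis.com/youtube/v3/commentThreads"

# Hypothetical column names; the study lists these fields but not their labels.
FIELDS = ["video_id", "username", "timestamp", "comment"]


def build_request_url(video_id, api_key, page_token=None):
    """Build the request URL for one page of top-level comments."""
    params = {
        "part": "snippet",
        "videoId": video_id,
        "maxResults": 100,  # largest page size the endpoint accepts
        "textFormat": "plainText",
        "key": api_key,
    }
    if page_token:
        params["pageToken"] = page_token
    return API_URL + "?" + urllib.parse.urlencode(params)


def parse_page(payload):
    """Turn one API response (a dict) into flat rows plus the next page token."""
    rows = []
    for item in payload.get("items", []):
        top = item["snippet"]["topLevelComment"]["snippet"]
        rows.append({
            "video_id": item["snippet"]["videoId"],
            "username": top["authorDisplayName"],
            "timestamp": top["publishedAt"],
            "comment": top["textDisplay"],
        })
    return rows, payload.get("nextPageToken")


def fetch_comments(video_id, api_key):
    """Follow nextPageToken until every page of comments has been read."""
    rows, token = [], None
    while True:
        url = build_request_url(video_id, api_key, token)
        with urllib.request.urlopen(url) as response:
            page_rows, token = parse_page(json.load(response))
        rows.extend(page_rows)
        if not token:
            return rows


def save_csv(rows, path):
    """Write the collected rows to a CSV file with a header line."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)
```

Calling fetch_comments once per video ID in a playlist and passing the combined rows to save_csv would give one CSV file per streamer. A real API key is needed for the network call, so only the URL building and response parsing can be tried offline.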
Data Preprocessing
comments retrieved for classification. Text from the internet usually contains
noise such as HTML tags, scripts, and special characters. The Natural Language
Toolkit (NLTK) was used for language processing, particularly for the VADER model.
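As an illustration of removing this kind of noise, the sketch below strips HTML tags and decodes HTML entities while leaving capitalization and punctuation untouched. It is an assumption about how such cleaning could be done with the standard library, not the researchers' code; the function name clean_comment is hypothetical.

```python
import html
import re

TAG_RE = re.compile(r"<[^>]+>")   # anything that looks like an HTML tag
SPACE_RE = re.compile(r"\s+")     # runs of whitespace


def clean_comment(text):
    """Remove markup noise but keep capitalization and punctuation."""
    text = TAG_RE.sub(" ", text)   # drop tags such as <br> or <b>...</b>
    text = html.unescape(text)     # decode entities such as &#39; and &amp;
    return SPACE_RE.sub(" ", text).strip()
```

Tags are stripped before entities are decoded so that an escaped "&lt;3" survives as "<3" instead of being mistaken for the start of a tag.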
spellings were done to get the sentiment of each comment. These preprocessing
techniques were applied before classifying the sentiment of each comment. For the
overall sentiment of all the comments, English stop words, punctuation, and special
relating to each streamer and content of the study. Finally, the data were
As mentioned, the datasets were analyzed using three (3) models to determine
which model would perform best for the study. Capitalization and punctuation
were kept, as they give a more nuanced understanding of the emotions behind the
comments. First, the VADER model was used to classify the datasets. Each comment
extracted from the VODs was analyzed for its sentiment. Next, the BERT model
classified the datasets. Unlike the other two (2) models, the polarity scores of
this model range from zero (0) to five (5): 0 being the most negative, 3 being
neutral, and 5 being the most positive. Lastly, the datasets were classified
using the RoBERTa model. The polarity score of each comment was computed in a for-
The rest of the preprocessing techniques were applied to the dataset. The
sentiments of the comments were counted to determine the emotions of the viewers
towards the streamer. The five (5) most frequent words that appeared in the
comments per streamer, based on sentiment, were extracted. These words were also
compared across the streamers. Word clouds were used to
visualize the sentiment of the viewers per streamer. Lastly, all the comments per
The iterative model will be used for developing the system. This software
development life cycle applies elements of the waterfall model in iterative form.
Initially, the iterative model implements parts of the total system and adds
The main functions of the system were designed. The main function should be
to accept any YouTube video URL and analyze the comment section’s overall
sentiment. No backend programming was added yet. The developers of the study
designed the initial user interface and the basic requirements of the system. The
system was not ready to accept any user input in this stage. The developers
suggestions into account. The frontend will be improved and the basic functions
of the system will be programmed. The user should be able to input a YouTube
video URL and have its comments analyzed for the overall sentiment. The results
of the analysis should be returned. The system will be tested, implemented, and
reviewed for changes. This stage will continue until the application achieves
user satisfaction.
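Accepting a YouTube video URL as input implies extracting the video ID before any comments can be requested. The sketch below shows one way this could be done with the standard library. It is an assumption, not the system's actual code: the function name extract_video_id is hypothetical, and only the common watch, youtu.be, live, embed, and shorts URL forms are handled.

```python
from urllib.parse import parse_qs, urlparse


def extract_video_id(url):
    """Return the video ID from a YouTube URL, or None if none is found."""
    parsed = urlparse(url.strip())
    host = (parsed.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    if host == "youtu.be":
        # Short links carry the ID as the first path segment.
        video_id = parsed.path.lstrip("/").split("/")[0]
        return video_id or None
    if host in ("youtube.com", "m.youtube.com"):
        if parsed.path == "/watch":
            # Standard watch pages carry the ID in the "v" query parameter.
            return parse_qs(parsed.query).get("v", [None])[0]
        parts = parsed.path.strip("/").split("/")
        if len(parts) >= 2 and parts[0] in ("live", "embed", "shorts"):
            return parts[1] or None
    return None
```

Returning None for anything unrecognized lets the web application show a validation message instead of sending a malformed request. URLs without a scheme (for example "youtube.com/watch?v=...") are also rejected by this sketch.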
Iteration 3. This stage will still include all the changes, improvements,
and feedback from previous iterations. In this stage, the system will undergo user
testing and be implemented. The system will be reviewed to check for further
improvements or errors. Then, the system will be deployed and made available for
the users to utilize. The web application should be maintained by the developers
Work Plan
techniques were applied to the data collected. Then, the data were analyzed and
main feature for the web application. To develop the web application, Flask
(Python) will be used as the backend framework, Tailwind CSS for the frontend,
and MySQL for the database.
frontend. Initially, the UI/UX designers of the study used Figma to design the
frontend.
Backend. The researchers will use the Flask framework for developing the
system that is used for managing structured data. The web application will
use this database management system because the user should be able to access their
Viewers and Streamers (New and Registered) and the admin are the actors in
this diagram. YouTubers who either stream or upload edited videos, and viewers,
are the users of the website. New users can register an account to start using the
web application. Once a user has an existing account, they can log in and utilize
URL of the desired YouTube video, and view interpretations of past requests. The
interpretation includes the overall sentiment of the comment section, five (5) most
frequently used words, and a word cloud. The user can also view how many
comments are negative, positive, or neutral with the sentiment counter. The admin
1. Users: The user provides a YouTube URL and receives the results.
2. Admin: The admin manages the system and is also able to view the
interpretations.
Processes
category.
4. Most Frequently Used Words: Identifies the five (5) most frequently used
words, counts how many times each word was used, and provides their
sentiments.
interpretation.
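The classification, counting, and frequent-word processes listed above can be sketched as follows. This is a minimal illustration under stated assumptions, not the system's implementation. The ±0.05 cut-offs are the thresholds commonly recommended for VADER's compound score, and the study does not state which cut-offs it used. The stop-word set is a tiny stand-in for a full English list. Taking the most common label as the overall sentiment is an assumption. All function names are hypothetical, and the per-word sentiment mentioned in process 4 is not covered.

```python
import re
from collections import Counter

# Tiny illustrative stop-word set; a full English list would be used in practice.
STOP_WORDS = {"a", "an", "and", "i", "is", "it", "so", "the", "this", "to", "was"}


def label_from_compound(compound, threshold=0.05):
    """Map a VADER-style compound score in [-1, 1] to a sentiment label.

    With NLTK, the score would come from
    SentimentIntensityAnalyzer().polarity_scores(text)["compound"].
    """
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


def count_sentiments(labels):
    """Sentiment counter: how many comments fall under each label."""
    counts = Counter(labels)
    return {name: counts.get(name, 0) for name in ("positive", "neutral", "negative")}


def top_words(comments, n=5):
    """Return the n most frequent non-stop-words with their counts."""
    words = []
    for text in comments:
        for word in re.findall(r"[a-z']+", text.lower()):
            if word not in STOP_WORDS:
                words.append(word)
    return Counter(words).most_common(n)


def overall_sentiment(counts):
    """Assumed rule: the label with the most comments is the overall sentiment."""
    return max(counts, key=counts.get)
```

For example, the compound scores 0.8, 0.6, -0.4, and 0.0 give two positive comments, one negative, and one neutral, so the assumed majority rule reports an overall positive sentiment.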
Data Stores
are stored.
2. Labeled Comments: Where the comments that were labeled are stored.
stored.
The database has tables called: loginReq for login requirements, which also
serves as an audit trail; admin and user, which consist of basic user and admin
comments extracted from the YouTube video are stored; labeledComments where
stored; FrequentWords where words from the comments that frequently appeared
were counted and labeled are stored; and lastly, summarizedComments, where the
summary of all the comments from the YouTube video is stored.
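To make the table descriptions more concrete, the sketch below creates three of the named tables. It is only an illustration: SQLite (from the Python standard library) stands in for MySQL so that the example runs without a database server, every column name is hypothetical because the text does not list the columns, and the remaining tables are omitted.

```python
import sqlite3

# Table names come from the text; all column names are hypothetical.
SCHEMA = """
CREATE TABLE labeledComments (
    id        INTEGER PRIMARY KEY,
    video_id  TEXT NOT NULL,
    comment   TEXT NOT NULL,
    sentiment TEXT NOT NULL
        CHECK (sentiment IN ('positive', 'neutral', 'negative'))
);
CREATE TABLE FrequentWords (
    id        INTEGER PRIMARY KEY,
    video_id  TEXT NOT NULL,
    word      TEXT NOT NULL,
    frequency INTEGER NOT NULL,
    sentiment TEXT NOT NULL
);
CREATE TABLE summarizedComments (
    id        INTEGER PRIMARY KEY,
    video_id  TEXT NOT NULL,
    summary   TEXT NOT NULL
);
"""


def create_database(path=":memory:"):
    """Create the illustrative tables and return the open connection."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn
```

Keying every table on a video identifier is one way to let a user retrieve the stored interpretation of a past request without re-running the analysis.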
This section contains the system flowchart of the web system “Behind the
Screens.” It is divided into three (3) parts: login and registration, admin, and user.
analysis was used to assess the overall sentiment of the viewers towards the
streamer.
viewer comments and the content. The researchers watched ongoing live streams
and live stream VODs on YouTube. They observed the chat during live
streams and analyzed the recorded chat in VODs to identify the behavior of
the viewers.
the stream?
streamed?
5. How does the streamer interact with the chat and how often do they do
it?
given to those who were not able to attend the interview due to conflicts of
allow the streamers and viewers more freedom to express their thoughts and
insights. The researchers designed different sets of guide questions:
one set for the streamers and another for the viewers.
2. What type of content do you usually watch? Can you give an example of a
4. What do you think about the comments? Do they influence your opinion
on the video?
comment?
8. What types of comments do you find most helpful when deciding whether
to watch a video?
10. If you do leave a comment, what type/s of video content motivate you to
11. If the content creator saw a comment of yours, what would you feel if they
13. In what ways do you think viewer comments contribute to the success or
4. What do you usually feel when you read the chat? How does it affect you?
comment?
9. Have you ever timed out or banned a viewer based on their chat? If so, can
10. Do you think leaving a chat (on a livestream) or a comment (if it's a pre-
Project Manager
she manages, organizes, and plans for the researchers. He or she is also
the thesis adviser and other personnel involved with the study.
Frontend Developer
They are responsible for designing and developing the user interface of the
website.
Backend Developer
Manuscript Writer
Writers are assigned to write and document the processes that took place
Data Collector
They are responsible for collecting the data necessary for the study. In this
study, sentiment analysis usually involves text analysis. The data collector
Alrumaih, A., Al-Sabbagh, A., Alsabah, R., Kharrufa, H., & Baldwin, J. (2020).
and Computer Engineering, 10(6), 5917. Retrieved May 18, 2024, from
https://doi.org/10.11591/ijece.v10i6.pp5917-5922
Bird, S., Loper, E., & Klein, E. (2009). Natural Language
https://doi.org/10.1016/j.knosys.2021.107134
https://doi.org/10.1080/1369118x.2013.777758
https://doi.org/10.1016/j.procs.2016.05.124
Devlin, J., Chang, M. W., Lee, K., & Toutanova, K. (2019, May 24). BERT: Pre-
DOI. https://doi.org/10.48550/arXiv.1810.04805
HTTP-Live-Streaming-Fecheyr
Lippens/57d33cda30c2d497b694470aaa8b502613851fa5
7c90f945e738487209c1194f53e9260c60ec3e52
Hartmann, J., Heitmann, M., Siebert, C., & Schamp, C. (2023). More than a
from https://doi.org/10.1016/j.ijresmar.2022.05.005
Hu, M., Zhang, M., & Wang, Y. (2017). Why do audiences choose to keep
Hutto, C. J., & Gilbert, E. (2014, May 16). VADER: A Parsimonious Rule-Based
https://doi.org/10.1609/icwsm.v8i1.14550
Kobs, K., Zehe, A., Bernstetter, A., Chibane, J., Pfister, J., Tritscher, J., & Hotho,
Koh, E.T., & Owen, W.L. (2000, October 31). Descriptive Research and
5_12
Medhat, W., Hassan, A., & Korashy, H. (2014b). Sentiment analysis algorithms
https://doi.org/10.1016/j.asej.2014.04.011
Melville, P., Gryc, W., & Lawrence, R. D. (2009). Sentiment analysis of blogs by
7_7
https://ijarcce.com/papers/sentiment-analysis-on-youtube-using-lexicon-
based-approach/
Mulholland, E., Kevitt, P., Lunney, T., Farren, J., & Wilson, J. (2015). 360-
https://www.semanticscholar.org/paper/360-MAM-Affect%3A-Sentiment-
analysis-with-the-Google-Mulholland-Kevitt/
83680ab962dce27b0ba9e0be563f6834c29b1162
https://www.semanticscholar.org/paper/Retrieving-YouTube-video-by-
135966ba8a70573ce4d00f20286cda04c848d934
Naf’an, M. Z., Bimantara, A. A., Larasati, A., Risondang, E. M., & Setya
of-Cyberbullying-on-Instagram-Naf%E2%80%99an-Bimantara/
5ed9294f98c53e8c8b4d5b06ff5c54091f0e1054
Pires, K., & Simon, G. (2015). YouTube Live and Twitch: a tour of user-
https://www.semanticscholar.org/paper/YouTube-live-and-Twitch%3A-a-
tour-of-user-generated-Pires-Simon/
e4e3ebd24a35195d845d373ef41fceb49ded2da9
Poecze, F., Ebster, C., & Strauss, C. (2019). Let’s play on Facebook: using
019-01361-7
Qian, K., & Jain, S. (2024). Digital Content Creation: An analysis of the impact of
from https://doi.org/10.1287/mnsc.2022.03655
Pichad, S., Kamble, S., Kalamb, R., & Chavan, S. (2023, May 10).
comments
IANFIS. Journal of Big Data, 7(1). Retrieved May 19, 2024, from
https://doi.org/10.1186/s40537-020-00308-7
https://www.researchgate.net/publication/351351202_YOUTUBE_COM
MENTS_SENTIMENT_ANALYSIS
Strååt, B., & Verhagen, H. (2017). Using User Created Game Reviews for
ws.org/Vol-1956/GHItaly17_paper_01.pdf
040518
https://doi.org/10.1146/annurev-linguistics-011415-040518
Thompson, J. J., Leung, B. H. M., Blair, M., & Taboada, M. (2017). Sentiment
https://www.semanticscholar.org/paper/Sentiment-analysis-of-player-chat-
messaging-in-the-Thompson-Leung/
27fcbf779e8a6d54a6c8d33c63d679017059ffaf
Guzsvinecz, T., & Szűcs, J. (2023). Length and sentiment analysis of reviews about
top-level video game genres on the Steam platform. Retrieved May 19,
https://doi.org/10.4018/978-1-5225-4999-4.ch002
https://doi.org/10.7748/ns.29.31.44.e8681
Wu, Y., Lin, M., & Yao, W. (2024, April 19). The Influence of Titles on YouTube
Yang, Z. (2020, December 11). Text and sentiment analysis of YouTube
https://repositories.lib.utexas.edu/items/ebee793a-80ef-420d-bc19-
4e9a9841faae
https://arno.uvt.nl/show.cgi?fid=161872.