Group 5 Emcrypt Thesis Manuscript
A Thesis
Presented to the Faculty of the College of Computer and Information Sciences
Polytechnic University of the Philippines
Sta. Mesa, Manila
Casinsinan, Cj C.
January 2024
ABSTRACT
TABLE OF CONTENTS
Title Page
Abstract
Table of Contents
List of Tables
List of Figures
1. The Problem and Its Setting
Introduction
Theoretical Framework
Conceptual Framework
Statement of the Problem
Hypothesis
Scope and Limitations of the Study
Significance of the Study
Definition of Terms
2. Review of Literature and Studies
Related Literature
Cryptocurrency
Sentiment Analysis on Cryptocurrency
Related Studies
Cryptocurrency
X (formerly known as Twitter)
Emoji (Emoticons)
Emotag1200
Emotion Recognition
Intensity Level Recognition
Keyword Spotting Method
LSTM-SVM Approach for Sentiment Analysis
Neutrality in Sentiment Analysis
Recognition of Emotion from Microblog (REM)
Deep Learning
Sentiment Analysis
Sentiment Analysis using X (formerly known as Twitter) Data
Sentiment Analysis in Cryptocurrency
Sentiment Analysis with Emoticons
Sentiment Analysis with Emotion and Intensity level recognition
Sentiment Analysis using LSTM-GRU Ensembled
Sentiment analysis using Recognition of emotion from microblogs (REM)
Synthesis of the Study
3. Methodology
Research Design
Source of Data
Instruments
System Architecture
Development Details
Research Instrument
Data Generation/ Data Gathering Procedure
Ethical Considerations
Statistical Data Analysis
Confusion Matrix
Evaluation Metrics
Precision
Recall
F-measure
Hypothesis Testing
Paired T-test
Rating System
Summary of Findings
Conclusions
Recommendations
References
Appendices
Figure 3. LSTM-SVM architecture
Figure 13. Pre-processing Diagram
Introduction
In today's digital age, cryptocurrencies have become extremely popular all around
the world. They have brought about major changes in the financial world and have
transformed the way people interact with digital assets. As of 2023, there are about 420
million crypto users worldwide, with over 20,000 cryptocurrencies in circulation
(Ariella, 2023). Cryptocurrency has gained significant traction in the Philippines, which
ranks second in the world in cryptocurrency adoption (Navalan, 2022). This growth aligns
with the country's plan to modernize its financial industry, with the government planning
to launch its digital currency project later this year. The new President of the
Philippines, Ferdinand Marcos Jr., advocates for digital and technological advancements,
emphasizing their importance in the widespread adoption of cryptocurrency in the future
(Ghosh, 2022). Additionally, the government and central bank are collaborating with
experts to ensure a secure environment for investors and stakeholders utilizing
blockchain or crypto technology (Mason, 2022).
Sentiment analysis of cryptocurrency discussions has become increasingly important
for making informed investment decisions. It provides broad market insights that can be
useful for forming trading strategies (Dwivedi & Vemareddy, 2023). In recent years, users
of various social media platforms have become accustomed to using a set of graphic
symbols to describe their feelings in online interactions. Emojis, the emotive icons found
on all platforms, have become a universal language. In addition to text, people
increasingly use emojis and punctuation marks to express feelings or sentiments that
cannot be adequately communicated in words. Most recent cryptocurrency sentiment
analysis systems do not consider keywords, emojis, and ending punctuation marks as part
of the analysis. However, a combination of keywords, ending punctuation marks, and emojis
could lead to better performance in sentiment analysis (Sagum, Navarro, & Jasper, 2019).
This study aims to analyze sentiments, emotions, and intensity levels in tweets
associated with cryptocurrency. Such tweets are commonly used for predicting
cryptocurrency market prices. The study aims to provide insight into the sentiments
surrounding cryptocurrencies on social media platforms, particularly X (formerly known as
Twitter), and to determine whether the combination of keywords, emojis, and punctuation
marks has a significant effect on the performance of the tool in analyzing sentiments. It
focuses specifically on supervised machine learning models that predict people's
sentiments, emotions, and intensity levels regarding the cryptocurrency market. X
(formerly known as Twitter) is widely used as a platform for expressing opinions and
thoughts on specific topics, making it a valuable source of data for this analysis.
The goal of the proposed study is to create a tool with improved performance
for sentiment analysis with emotion and intensity level recognition for tweets about
cryptocurrencies by combining advanced machine learning techniques, particularly Long
Short-Term Memory (LSTM) with Support Vector Machine (SVM) as the classifier. To
accomplish this, the proponents first compile a dataset of tweets containing
textual content, punctuation marks, and emojis. The dataset is then analyzed using
machine learning and natural language processing (NLP) techniques, with an emphasis
on sentiment and emotion extraction and classification. The proponents intend to identify
patterns and trends in the emotional expressions of X (formerly known as Twitter) users
who are interested in cryptocurrencies, thereby providing valuable insight into the potential
effects of these emotions and sentiments on the dynamics of the cryptocurrency market.
Theoretical Framework
This paper's theoretical framework begins with sentiment analysis that considers the
combination of keywords, ending punctuation marks, and emojis, which is the focus of this
research study. The analysis is carried out by two algorithms used together: the Long
Short-Term Memory (LSTM) algorithm and the Support Vector Machine (SVM) algorithm.
Figure 2. Recognition of Emotion from Microblogs (REM)
Islam et al. (2021) describe Recognition of Emotion from Microblogs (REM) as a
method for identifying the emotions conveyed in microblog posts that contain emoticons.
The researchers employ LSTM (Long Short-Term Memory) as a deep learning model to capture
the sequential nature of the emoticon-text combination. REM aims to recognize the emotions
associated with emoticons, as emoticons often serve to express emotions in text-based
communication. By training the LSTM model on a dataset of emoticons and their
corresponding emotions, the researchers leverage recurrent neural networks to learn the
contextual and emotional information conveyed by emoticons in the given microblog texts.
This approach allows for the automatic recognition of emotions in microblog posts,
enhancing the understanding of sentiment and emotional content in online communication.
Cimino & Dell'Orletta (2023) state that combining LSTM and SVM in sentiment
analysis offers the benefits of LSTM's ability to capture sequential information and
semantic meaning in text, along with SVM's robust classification framework. This
combination improves the model's accuracy in classifying sentiments, even for complex
expressions. During training, the model learns representations from labeled data, with the
LSTM layer capturing features and the SVM component classifying them into sentiment
classes. Through iterative parameter optimization, the model's performance is enhanced.
Once trained, the LSTM-SVM model can be used to analyze sentiment in new text inputs
by extracting features through the LSTM layer and assigning sentiment labels using the
SVM classifier.
According to Padme & Kulkarni (2018), as sentiment analysis (SA) develops in the field
of Natural Language Processing, studies have advanced beyond polarity. Recently, the
aim of SA has developed from determining the polarity to knowing the attitude of the
speaker. The attitude can be the speaker's evaluation, affective state, or emotional
communication. One example is emotional state detection, such as "sad", "happy", and
"angry".
Conceptual Framework
The researchers use a conceptual model to depict the study's variables. The
feature selection phase includes keywords, ending punctuation marks, and emojis as
independent variables. The outputs are affected by the statistical treatment that relies
on these variables. Intervening variables, located in the middle section, include the
combination of emojis and punctuation marks and repeated emojis and punctuation marks;
these also affect the outputs as they undergo the process. The dependent variable is the
system's output: the polarity (positive, negative, or neutral), the emotion (happy, sad,
anger, surprise, anticipation, fear), and the intensity level (low, medium, high), all
derived from the input section. Modifying the independent variables will result in
changes in the performance of the tool.
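As an illustration only, the independent and intervening variables described above could be extracted from a tweet along the following lines. This is a minimal sketch, not the tool's actual implementation: the names `EMOJIS`, `KEYWORDS`, and `extract_features` are hypothetical, the emoji and keyword sets are tiny stand-ins for the full lists, and the sketch assumes each emoji is a single code point.

```python
import re

# Hypothetical subset of supported emojis (single code points only).
EMOJIS = {"🚀", "🔥", "😂", "😭", "😡", "💰", "📉"}
# Hypothetical cryptocurrency-related keywords.
KEYWORDS = {"bullish", "bearish", "moon", "crash", "scam", "profit"}

def extract_features(tweet):
    """Return the keywords, emojis, and ending punctuation found in a tweet."""
    words = re.findall(r"[a-z']+", tweet.lower())
    keywords = [w for w in words if w in KEYWORDS]
    emojis = [ch for ch in tweet if ch in EMOJIS]

    # Ending punctuation: the run of . ? ! that closes the text,
    # ignoring any emojis or spaces that trail after it.
    stripped = tweet.rstrip()
    while stripped and stripped[-1] in EMOJIS:
        stripped = stripped[:-1].rstrip()
    match = re.search(r"[.?!]+$", stripped)
    ending = match.group(0) if match else ""

    return {
        "keywords": keywords,
        "emojis": emojis,
        "ending_punctuation": ending,
        # Intervening variables: repetition of emojis or punctuation.
        "repeated_emoji": len(emojis) > len(set(emojis)),
        "repeated_punctuation": len(ending) > 1,
    }
```

For example, `extract_features("Bitcoin to the moon!!! 🚀🚀")` yields the keyword `moon`, two rocket emojis, the ending punctuation `!!!`, and both repetition flags set to true.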
Statement of the Problem
The researchers aim to develop a tool for sentiment analysis with emotion and
intensity level recognition in cryptocurrency-related tweets considering the combination of
keywords, ending punctuation marks, and emoticons.
a. Precision
b. Recall
c. F-Measure
a. Precision
b. Recall
c. F-Measure
Hypothesis
Scope and Limitations of the Study
Table 1
🎵 musical note 💰 money bag
🎶 musical notes 📷 camera
👀 eyes 🔞 no one under eighteen
👅 tongue 🔥 fire
👇 backhand index pointing down 🔫 pistol
👈 backhand index pointing left 🔴 red circle
👉 backhand index pointing right 😀 grinning face
👊 oncoming fist 😁 beaming face with smiling eyes
👋 waving hand 😂 face with tears of joy
👌 OK hand 😃 grinning face with big eyes
👍 thumbs up 😄 grinning face with smiling eyes
👎 thumbs down 😅 grinning face with sweat
👏 clapping hands 😆 grinning squinting face
👑 crown 😇 smiling face with halo
👻 ghost 😈 smiling face with horns
💀 skull 😉 winking face
💁 person tipping hand 😊 smiling face with smiling eyes
💃 woman dancing 😋 face savoring food
💋 kiss mark 😌 relieved face
💎 gem stone 😍 smiling face with heart-eyes
💐 bouquet 😎 smiling face with sunglasses
😐 neutral face 😏 smirking face
😑 expressionless face 🙅 person gesturing NO
😒 unamused face 🙆 person gesturing OK
😓 downcast face with sweat 🙈 see-no-evil monkey
😔 pensive face 🙊 speak-no-evil monkey
😕 confused face 🙋 person raising hand
😖 confounded face 🙌 raising hands
😘 face blowing a kiss 🙏 folded hands
😙 kissing face with smiling eyes ‼ double exclamation mark
😚 kissing face with closed eyes ↩ right arrow curving left
😛 face with tongue ↪ left arrow curving right
😜 winking face with tongue ▶ play button
😝 squinting face with tongue ◀ reverse button
😞 disappointed face ☀ sun
😟 worried face ☑ check box with check
😠 angry face ☝ index pointing up
😡 pouting face ☺ smiling face
😢 crying face ♥ heart suit
😣 persevering face ♻ recycling symbol
😤 face with steam from nose ⚡ high voltage
😥 sad but relieved face ⚽ soccer ball
😨 fearful face ✅ check mark button
😩 weary face ✈ airplane
😪 sleepy face ✊ raised fist
😫 tired face ✋ raised hand
😬 grimacing face ✌ victory hand
😭 loudly crying face ✔ check mark
😰 anxious face with sweat ✨ sparkles
😱 face screaming in fear ❄ snowflake
😳 flushed face ❌ cross mark
😴 sleeping face ❗ exclamation mark
😶 face without mouth ❤ red heart
😷 face with medical mask ➡ right arrow
😹 cat with tears of joy ⬅ left arrow
😻 smiling cat with heart-eyes ⭐ star
😲 Astonished Face 😮 Face with Open Mouth
😵 Dizzy Face 💭 Thought Balloon
❗ Exclamation Mark ⚡ High Voltage
🎊 Confetti Ball 🙁 Slightly Frowning Face
🔪 Hocho 🌕 Full Moon
🚀 Rocket 📉 Down Trend
🤣 Rolling on the Floor Laughing 💸 Money with Wings
This study has several limitations. First, the sentiment analysis covers a total of
1,500 main posts (tweets) and uses the replies in the comments as a secondary source of
data. The tool accepts uploads of Excel and CSV files only. Second, the study only
considers posts that use cryptocurrency hashtags. Lastly, text that ends with two or more
punctuation marks or is very emotional is not always assumed to be sarcastic. Therefore,
this study does not address the detection of figurative language, such as sarcasm, before
the assessment procedure for emotion recognition.
Significance of the Study
RESEARCHERS. This research will help with specific issues such as emotion
recognition in the field of natural language processing. These beneficiaries could
pursue the recommendations and further research.
SOCIAL MEDIA. Social media users might also evaluate the study
based on their own observations. The outcome will be quite instructive.
People online are expected to look at the study's findings because the researchers
used social media as the study's domain.
Definition of Terms
The following is a list of the terms that are used in this research study:
Emotion Recognition - the identification of the emotion expressed in a tweet: happy,
sad, fearful, surprised, angry, or anticipation.
Remove Stopwords - a process that keeps only emotional words and discards
unnecessary words.
Social Media - refers to online platforms and websites that allow users to create
or share information.
X (formerly known as Twitter) - a social media platform where people
share or post whatever is on their minds.
CHAPTER 2
REVIEW OF LITERATURE AND STUDIES
Related Literature
Cryptocurrency is a type of digital money that is created through
cryptographic techniques using binary data. It lets people buy, sell, or trade it securely
without needing a government or bank. While you can use cryptocurrencies to buy
things, many people also use them as a way to invest money for a short or long
time. There are lots of different cryptocurrencies available, but the most well-known
and expensive one is Bitcoin, which currently costs more than $19,000.
Cryptocurrency is a growing industry worldwide that started about 13 years ago. It
has become popular and important, with more than 20,000 digital currencies being
used today. The most well-known cryptocurrencies are Bitcoin, Ethereum, and
Tether. Around 200,000 Bitcoin transactions happen every day as of November
2022. In 2023, there are about 45 million people in the United States and 420
million people globally who use cryptocurrency. About 16% of Americans have
used, invested in, or traded cryptocurrencies. The total value of blockchain
technology worldwide is currently $10.02 billion as of 2022 and is expected to
reach $67.4 billion by 2026, with an annual growth rate of 68.4%. (Ariella, 2023)
The Philippines has put a lot of focus on blockchain technology and its
potential uses. The country's central bank has noticed a significant increase in the
adoption of cryptocurrencies, especially during the COVID-19 pandemic. The long
period of isolation introduced the concept of digital tokens to the country's growing
middle class and tech-savvy millennials through popular blockchain games like
Axie Infinity. At one point, 40% of the game's players were from the Philippines.
Bitcoin trading volumes also reached new highs on certain crypto exchanges in
July this year. The number of cryptocurrency transactions grew by 362% compared
to the previous year, with a total value of around $1.82 billion. As a result, the
Philippines now ranks second in the Global Crypto Adoption Index, indicating high
individual interest in digital assets. In response, the country's central bank plans to
launch a digital currency project in late 2022 to improve payments and aid in
economic recovery. The central bank is also collaborating with solution providers
to monitor financial institutions using blockchain and cryptocurrency, showing
increasing acceptance of the technology in finance. However, there is still
uncertainty and differing opinions among regulators, legislators, and market
participants regarding the future of blockchain and crypto in the Philippines. Clear
consensus and understanding of the technology's impact are still needed.
(Navalan, 2022)
According to Yilmaz, the steps for conducting cryptocurrency sentiment analysis
are as follows:
In today's digital age, social media plays a significant role in people's lives,
and the content they share online holds valuable insights. Natural language
processing (NLP) techniques have been widely used to understand public
sentiment expressed in social media posts. Sentiment Analysis, a crucial aspect
of NLP, focuses on computationally analyzing opinions, emotions, attitudes, or
sentiments in written texts. Within this field, social media sentiment analysis
(SMSA) specifically aims to understand and represent sentiments expressed in
short social media posts. (Chen, 2023)
Emojis, small in-text graphics, have become increasingly popular in
social media communication. They serve as graphical symbols that allow users to
express emotions and convey meanings in a concise and convenient way.
Statistics published in 2021 by Emojipedia, a well-known emoji reference site, show that
over one-fifth of tweets (21.54%) and more than half of Instagram comments
contain emojis. Despite their widespread use in online communication, emojis are
not widely embraced in the field of NLP and SMSA. During the data preprocessing
stage, emojis are often removed along with other unstructured elements like URLs,
stop words, unique characters, and images. While some researchers have recently
started exploring the potential of including emojis in SMSA, it is still a niche
approach that requires further investigation. Chen's project aims to evaluate the
compatibility of popular BERT encoders with emojis and explore different methods
of incorporating emojis in SMSA to enhance accuracy. (Chen, 2023)
Table 2
🆖🆙 Bitcoin NGU, "Number Go Up technology"
⚛ Cosmos Cosmos ATOM Supporter
💩 Misc Shitcoin
🐂 Misc Permabull
🐻 Misc Permabear
Table 3
These are the emojis depicting common day-to-day expressions (Singh, 2022)
Three punctuation marks can end a sentence: the period (.), the question mark (?),
and the exclamation point (!). A single space always follows a sentence, regardless of
the punctuation used. Periods, the most common punctuation mark, end statements and
signal a neutral tone. Question marks are used at the end of questions, whether they
expect an answer or are rhetorical. Some questions are polite requests. Exclamation
points indicate strong emotions or high volume and often mark the end of a sentence.
They are sometimes overused on the internet. (Quillbot, English Composition)
Out of the three punctuation marks commonly used to end sentences, only the
exclamation point is used to show strong emotions. When we write, we don't have the
advantage of using tone and other verbal cues to convey our emotions. Instead, we rely
on grammar, including punctuation, to establish the tone of our writing. Using an
exclamation point is a quick way to indicate that we are expressing strong emotion.
Exclamatory sentences always end with an exclamation point and are used to express
intense emotions, regardless of the specific emotion being conveyed. It's important to
consider the intensity of the emotion you want to express when deciding whether to use
an exclamation point. In formal writing, it's appropriate to use only one exclamation point,
while in informal writing, you can use a few more, but it's best not to overdo it. (Craiker,
2022).
Related Studies
Cryptocurrency
The evolution of the cryptocurrency market in the past decade has been
nothing short of meteoric, with its user base exploding from a modest 5 million in
2016 to a staggering 300+ million by the close of 2021. This trend of rampant
growth has not been exclusive to the cryptocurrency market, with the NFT
marketplace experiencing a similar expansion, from 670,000 users in 2020 to over
44 million in 2022. However, with such growth also comes an increase in market
volatility and risk. This volatility stems in part from the inherent nature of
cryptocurrency markets, which lack a central governing authority. Instead, prices
are highly susceptible to various external factors, including public sentiment,
natural disasters, global news, and international crises. (Begüm Yılmaz, 2023)
Naila Aslam, Furqan Rustam, Ernesto Lee, Patrick Bernard Washington,
and Imran Ashraf (2022) state that cryptocurrency is an alternative
medium of exchange consisting of numerous decentralized crypto coin types. The
essence of each crypto coin is in its cryptographic foundation. Secure peer-to-peer
transactions are enabled through cryptography in this secure and decentralized
exchange network. Since its inception in 2009, Bitcoin has become a digital
commodity of interest as some believe the crypto coins' worth is comparable to
that of traditional fiat currency.
Furthermore, the sentiment reflected in the X (formerly known as Twitter)
community about altcoins and their prices has been established to have a certain
correlation. To measure this daily X (formerly known as Twitter) sentiment, an
aggregation of sentiment across numerous tweets for a specific altcoin is
necessary. Sentiment Analysis, a sub-field of computational Natural Language
Processing, aims to discern positive, negative, and neutral opinions in a text.
However, analyzing tweets presents unique challenges due to their irregular
grammar, high emoticon usage, and frequent sarcasm (Emre Sasmaz and F.
Boray Tek, 2021). The rapid advancement in the field of sentiment analysis on
cryptocurrency-related tweets, with a particular focus on the role of emoticons,
underlines the need for more nuanced and advanced models. This literature review
aims to provide a foundation for such advancements.
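The aggregation of tweet-level sentiment into a daily value per altcoin, mentioned above, can be illustrated with a short sketch. It assumes each tweet has already been scored with a polarity between -1 and 1 by some upstream classifier and that a simple mean is an acceptable aggregate; the function name `daily_sentiment` and the tuple layout are hypothetical.

```python
from collections import defaultdict
from statistics import mean

def daily_sentiment(scored_tweets):
    """Average tweet-level polarity scores per (date, coin) pair.

    scored_tweets: iterable of (date, coin, score) tuples, where score is a
    polarity in [-1, 1] assumed to come from an upstream classifier.
    """
    buckets = defaultdict(list)
    for date, coin, score in scored_tweets:
        buckets[(date, coin)].append(score)
    # One aggregate value per day and coin.
    return {key: mean(scores) for key, scores in buckets.items()}
```

Other aggregates, such as a median or a follower-weighted mean, would fit the same structure; the choice of the mean here is only for simplicity.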
Emoji (Emoticons)
According to P. S. Dandannavar, S. R. Mangalwede, and S. B.
Deshpande (2019), the rapid expansion of the World Wide Web over the past
years has led to a corresponding surge in user-generated content across various
social media platforms, web forums, and blogs. Sites like X (formerly known as
Twitter) and Facebook have emerged as significant hubs of online communication,
with millions of users sharing their sentiments, opinions, and experiences daily.
This treasure trove of data offers rich insights into public sentiment on a myriad of
topics, from products and services to socio-political issues.
aiming to glean valuable sentiment-based insights from these unstructured data
sources. A key feature of this user-generated content is the pervasive use of
emoticons, particularly among younger users, to express sentiments that might be
challenging to convey through text alone (P. S. Dandannavar, S. R. Mangalwede
& S. B. Deshpande, 2019).
EmoTag1200
EmoTag 1200 is a natural language processing (NLP) tool specifically
designed for emotion analysis. It is built upon a comprehensive dataset of 1,200
emotion tags, which cover a wide range of emotional states and expressions. This
powerful tool employs advanced machine learning algorithms and deep neural
networks to accurately identify and classify emotions in text-based content.
gain a deeper understanding of how people feel and react to specific topics,
products, or experiences.
Emotion Recognition
The study by Nourah & Mohamed (2020) gives an in-depth review of the
most recent state-of-the-art approaches and strategies for emotion recognition in
textual data. Emotion recognition is important in many applications, including
sentiment analysis, customer feedback analysis, mental health monitoring, and
human-computer interaction. The survey further delves into the emergence of
deep learning techniques, such as Convolutional Neural Networks (CNNs),
Recurrent Neural Networks (RNNs), and Transformers, for emotion recognition in
text. These approaches leverage the ability of deep neural networks to
automatically learn and extract relevant features from raw text data. The authors
highlight the advantages and limitations of each deep learning method and present
various architectures proposed in the literature. The survey also describes an
emotion-detecting system based on the keyword-spotting technique. The challenge of
locating occurrences of keywords from a given set as substrings in each string is
known as the keyword pattern matching problem. This topic has already been
investigated, and strategies for tackling it have been proposed. In the context of
emotion detection, this approach is based on specified keywords. These words are
classed as disgusted, sad, glad, furious, afraid, startled, and so on.
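A minimal sketch of the keyword-spotting idea is given below for illustration. The keyword lists in `EMOTION_KEYWORDS` are hypothetical, since the classes are named above but the words are not, and the sketch matches whole tokens rather than arbitrary substrings, which is a simplification that avoids false hits such as "sad" inside "crusade".

```python
import re
from collections import Counter

# Hypothetical keyword lists for a few of the emotion classes.
EMOTION_KEYWORDS = {
    "sad": {"sad", "unhappy", "miserable"},
    "glad": {"glad", "happy", "delighted"},
    "furious": {"furious", "angry", "outraged"},
    "afraid": {"afraid", "scared", "terrified"},
}

def spot_emotions(text):
    """Count keyword hits per emotion class."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter()
    for emotion, keywords in EMOTION_KEYWORDS.items():
        counts[emotion] = sum(1 for tok in tokens if tok in keywords)
    return counts

def dominant_emotion(text):
    """Return the class with the most hits, or None when nothing matches."""
    counts = spot_emotions(text)
    emotion, hits = counts.most_common(1)[0]
    return emotion if hits > 0 else None
```

Text with no listed keyword returns `None`, which reflects a known weakness of keyword spotting: it cannot label text whose emotion is implied rather than stated.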
Figure 7 Main steps of a keyword-spotting technique
model to identify emotions in text. Convolutional neural networks (CNN) and Bi-
GRU were exploited as deep learning techniques.
In this study, all three datasets (text sentences, dialogs, and tweets) are
integrated. The merged dataset includes over 14,500 text sentences. Every
text phrase is labeled according to six types of emotions (based on its syntactic and
semantic polarities): pleasure, disgust, fear, surprise, rage, and sorrow (Chen et
al., 2019). The content is in English and includes some punctuation and emojis.
The collection solely comprises text phrases and their associated emotions. Each
dataset is split into two categories of data: training and testing, with an 80:20 split.
The researchers performed many experiments using various methods to
get the best accuracy for their proposed model. They compared emotion classification
with a machine learning approach, a deep learning approach, and their hybrid model
approach on the multi-text dataset consisting of sentences, tweets, and dialogs.
Three datasets are used for performing these experiments. Among the ML
classifiers, SVM gives the highest accuracy of 78.97%. Among the DL methods, the
Bi-GRU model achieves the highest accuracy of 79.46%, and the CNN model
achieves the highest F1-score of 80.76. The hybrid model achieved a precision
of 82.39, a recall of 80.40, an F1 score of 81.27, and an accuracy of 80.11%.
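For reference, precision, recall, and the F1 score for a single class can be computed from raw counts as sketched below. The function name `precision_recall_f1` is hypothetical, and how the cited study averaged these values across its six emotion classes is not stated here, so the sketch covers the per-class case only.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and F1 for one class from raw counts.

    tp: true positives, fp: false positives, fn: false negatives.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1
```

With 3 true positives, 1 false positive, and 3 false negatives, the function gives a precision of 0.75, a recall of 0.5, and an F1 score of 0.6.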
Intensity Level Recognition
Das and Bandyopadhyay (2010) explore three levels of
intensity (low, medium, and high) in the context of emotional expression in
sentences. They focus on two categories of intensifiers, positive and negative, to
assign these intensity levels. These intensifiers can be part of the emotional
expression itself. The authors analyze the parts of speech (POS) surrounding the
emotion-laden word in a sentence, specifically looking at adjectives (JJ) and adverbs (RB),
as they are likely candidates for intensifiers. To determine an intensifier's polarity,
they consult the SentiWordNet database. Each potential intensifier is checked
for its presence in SentiWordNet, and if found, its positive and negative scores are
retrieved. The intensifier is then categorized as positive or negative based on
whichever score is higher on average. Additionally, the authors compiled a list of
commonly used negative words and consider words involved in negative [negation
modifier] dependency relations as negative words too. The methodology involves
applying specific rules (outlined in Table 4) to understand how various intensifiers
and negations contribute to the assignment of post-emotion tags and intensity
levels in sentences. These rules help in systematically determining the role and
impact of these linguistic elements in conveying emotional intensity.
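The intensifier step described above can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: `LEXICON` is a hypothetical stand-in for SentiWordNet with made-up scores, the window size and the tie-breaking rule are assumptions, and the mapping to low, medium, or high intensity is omitted because it depends on the rules of Table 4.

```python
from statistics import mean

# Hypothetical stand-in for SentiWordNet: word -> list of (pos, neg) scores,
# one pair per sense. Real scores would come from the SentiWordNet database.
LEXICON = {
    "very": [(0.25, 0.0), (0.5, 0.0)],
    "barely": [(0.0, 0.375), (0.125, 0.25)],
    "terribly": [(0.0, 0.625)],
}

def intensifier_polarity(word):
    """Label a candidate intensifier by its higher average score.

    Returns None when the word is not in the lexicon. Ties are treated as
    positive, which is an assumption of this sketch.
    """
    senses = LEXICON.get(word.lower())
    if not senses:
        return None
    avg_pos = mean(p for p, _ in senses)
    avg_neg = mean(n for _, n in senses)
    return "positive" if avg_pos >= avg_neg else "negative"

def candidate_intensifiers(tagged_tokens, anchor_index, window=2):
    """Pick adjectives (JJ) and adverbs (RB) near the emotion word.

    tagged_tokens: list of (word, POS tag) pairs for one sentence.
    anchor_index: position of the emotion-laden word.
    """
    lo = max(0, anchor_index - window)
    hi = min(len(tagged_tokens), anchor_index + window + 1)
    return [
        word
        for i, (word, tag) in enumerate(tagged_tokens[lo:hi], start=lo)
        if i != anchor_index and tag in {"JJ", "RB"}
    ]
```

For the tagged sentence "I am very happy today" with "happy" as the emotion word, the only candidate is "very", which the stand-in lexicon labels as a positive intensifier.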
Table 4
Rules for tagging an emotional anchoring vector with intensity
about the emotions associated with different aspects of the reviewed products or
services.
First, the researchers compiled a list of relevant keywords related to the
specific domain or industry they were studying. These keywords represented
different aspects or features of the products or services being reviewed. For
example, in a study analyzing restaurant reviews, keywords might include "food
quality," "service," "ambience," and "price." Once the keyword list was established,
the researchers applied the keyword spotting technique to the online user-
generated reviews. They scanned each review and looked for the presence of
these predefined keywords. Whenever a keyword was identified within a review, it
signaled the presence of a particular aspect being discussed.
After identifying the aspects, the researchers then focused on the
surrounding context of the keywords to extract the emotional sentiments
associated with each aspect. This involved analyzing the text surrounding the
keywords, including adjectives, adverbs, and other sentiment-bearing words, to
determine the emotional tone expressed by the reviewer. Positive sentiments
might indicate satisfaction or enjoyment, while negative sentiments could imply
dissatisfaction or disappointment.
By employing the keyword spotting method, the study aimed to provide a
fine-grained analysis of emotions expressed towards different aspects of the
reviewed products or services. This approach allowed the researchers to gain
insights into which aspects were most positively or negatively perceived by users,
helping businesses and decision-makers understand the strengths and
weaknesses of their offerings from a customer's emotional perspective.
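The two stages described above, spotting aspect keywords and then reading the surrounding context, can be illustrated with the following sketch. All lexicons are hypothetical, aspect keywords are limited to single words for simplicity (a multi-word aspect such as "food quality" would need phrase matching), and the context is a fixed window of two tokens on each side.

```python
import re

# Hypothetical lexicons for a restaurant-review domain.
ASPECT_KEYWORDS = {"food", "service", "ambience", "price"}
POSITIVE_WORDS = {"great", "friendly", "delicious", "cozy"}
NEGATIVE_WORDS = {"slow", "rude", "bland", "expensive"}

def aspect_sentiments(review, window=2):
    """Map each spotted aspect keyword to a sentiment from nearby words."""
    tokens = re.findall(r"[a-z]+", review.lower())
    results = {}
    for i, tok in enumerate(tokens):
        if tok not in ASPECT_KEYWORDS:
            continue
        # Context: up to `window` tokens on each side of the keyword.
        context = tokens[max(0, i - window): i + window + 1]
        score = sum(w in POSITIVE_WORDS for w in context) - sum(
            w in NEGATIVE_WORDS for w in context
        )
        if score > 0:
            results[tok] = "positive"
        elif score < 0:
            results[tok] = "negative"
        else:
            results[tok] = "neutral"
    return results
```

For the review "The food was delicious but the service was slow", the sketch labels `food` as positive and `service` as negative. A wider window would start to mix the sentiment words of neighbouring aspects, which is why the window is kept small here.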
text and generates a fixed-length representation, often referred to as an
embedding.
The generated embeddings from the LSTM model are then used as input
to an SVM classifier. The SVM classifier is trained on these embeddings to predict
the sentiment of new, unseen text instances. The SVM takes advantage of its
ability to handle high-dimensional feature spaces and find an optimal hyperplane
that separates different sentiment classes.
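The SVM stage can be illustrated in isolation with a small linear SVM trained by subgradient descent on the regularized hinge loss. This sketch makes several assumptions: the input vectors are hand-written stand-ins for LSTM embeddings (no LSTM is trained here), the task is binary with labels +1 and -1, and the function names and hyperparameter values are hypothetical. A production system would more likely rely on an established SVM library.

```python
def train_linear_svm(vectors, labels, epochs=200, lr=0.1, reg=0.01):
    """Fit a linear SVM by subgradient descent on the regularized hinge loss.

    vectors: fixed-length embeddings (stand-ins for LSTM outputs here).
    labels: +1 or -1 for each vector.
    """
    dim = len(vectors[0])
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(vectors, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            # L2 shrinkage on every step.
            w = [wi * (1 - lr * reg) for wi in w]
            # Hinge term contributes only when the margin is violated.
            if margin < 1:
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b

def predict(w, b, x):
    """Return +1 or -1 depending on which side of the hyperplane x falls."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score >= 0 else -1

# Stand-in two-dimensional "embeddings" for positive and negative texts.
embeddings = [[0.9, 0.1], [0.8, 0.3], [0.1, 0.9], [0.2, 0.7]]
sentiments = [1, 1, -1, -1]
weights, bias = train_linear_svm(embeddings, sentiments)
```

Once trained, `predict(weights, bias, vector)` assigns a sentiment label to the embedding of a new text, mirroring the hand-off from the LSTM to the SVM described above.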
By combining the strengths of LSTM in capturing contextual information
and SVM in classification, the Tandem LSTM-SVM approach aims to improve the
accuracy of sentiment analysis. This approach takes advantage of LSTM's ability
to model complex dependencies in the text data while leveraging SVM's robust
classification capabilities. The study likely evaluates the performance of the
Tandem LSTM-SVM approach on benchmark sentiment analysis datasets,
comparing it against other existing approaches. It would measure accuracy,
precision, recall, and F1-score to assess the effectiveness of the proposed
approach.
Overall, the Tandem LSTM-SVM approach for sentiment analysis attempts
to enhance the accuracy of sentiment classification tasks by combining the
strengths of LSTM and SVM algorithms, resulting in a more robust and accurate
sentiment analysis model.
Neutrality in Sentiment Analysis
According to Valdivia, Luzón, Wang, and Herrera (2018), there has been a
recent surge in interest in sentiment analysis, leading to
the development of numerous algorithms designed to categorize text based on the
expressed sentiment, typically categorized as positive, neutral, or negative. Often,
neutral sentiments are overlooked in many sentiment analysis approaches due
to their vague nature and minimal informational content. Their paper introduces a
strategy to enhance the significance of neutral sentiments by defining the
distinction between positive and negative opinions, aiming to boost the efficiency
of the model. The authors implement various sentiment analysis techniques on diverse
datasets to extract sentiment values and identify neutral sentiments through a
consensus approach, essentially filtering them out using a weighted aggregation
of different models. They then assess the efficacy of both individual and combined
models in classification tasks. The findings indicate that combined methods
generally surpass individual models in effectiveness, leading to the conclusion that
recognizing neutrality is crucial in differentiating between positive and negative
sentiments and consequently in enhancing the accuracy of sentiment
classification.
probabilities or scores to each emotion category (e.g., happy, sad, angry,
etc.).
Deep Learning
Boquiren, Garcia, Hungria, and De Goma (2022) applied deep learning to
assess the effects of backward slang on two deep learning models, namely Long
Short-Term Memory (LSTM) and Bidirectional LSTM (Bi-LSTM), in the context of
Tagalog Sentiment Analysis. The study was motivated by the increasing popularity
of backward slang among Tagalog tweets and the need for effective methods to
determine general sentiments correlated to a topic in the context of the growing
number of internet users in the Philippines. The study emphasizes the importance
of deep learning techniques in sentiment analysis, especially when dealing with
complex languages such as Tagalog.
Deep learning is a type of machine learning approach that uses a multilayer
neural network to automatically learn and extract features from data rather than
relying on manual feature extraction. Deep learning models also measure
hyperparameters automatically, which can result in better accuracy and
performance. Deep learning techniques are currently the best solutions for
problems in image and speech recognition, as well as natural language
processing. (Dang, Garcia, De la Prieta, 2020)
Sentiment Analysis
Sentiment analysis appears to be a promising tool for predicting market
behaviors and guiding investment decisions. Specifically, analysis of tweets from
customers or thought leaders can illuminate the relationship between public
sentiment and cryptocurrency prices. This analysis, particularly when coupled with
recognition of emotion intensity levels and consideration of emoticons, can provide
a deeper understanding and potentially offer a predictive edge in this highly
volatile market. However, as this field is still in its infancy, it also faces
numerous challenges that warrant further investigation (Yılmaz, 2023).
According to Abdullah et al. (2019), sentiment analysis is a method or tool for
evaluating the opinions of individuals or groups, such as those expressed by
customers in communication with customer support or by followers of a brand. Many
existing sentiment analysis methods can handle large volumes of data with greater
accuracy. Sentiment analysis, also known as opinion mining, uses natural language
processing, text mining, computational linguistics, and biometrics to identify,
extract, evaluate, and analyze emotional states and subjective information. Its
goal is to detect the polarity of text documents or short sentences and classify
them as positive, negative, or neutral.
media analysis gains more attention, there is a growing interest in Natural
Language Processing (NLP) and Artificial Intelligence (AI) technologies related to
text analysis. The study used X (formerly known as Twitter) data to perform
sentiment analysis on the topic of COVID-19 in England. According to Qi & Shabrina
(2022), X (formerly known as Twitter) is a social media platform where users share
their thoughts and opinions using short posts called tweets, which can include text,
pictures, and videos. Users can interact with tweets using the like, comment, and
repost buttons. X (formerly known as Twitter) has more than 206 million daily
active users, and analyzing information available on the platform can provide
insights into changes in people's perceptions, actions, and behavior. The
researchers collected tweets from three major cities in England and divided them
into three stages: the early stage, the middle stage, and the late stage. They used
two different approaches to analyze the sentiment of the tweets: lexicon-based
approaches and supervised machine-learning approaches. The results showed
that the public sentiment towards COVID-19 changed over time. In the early stage,
the public sentiment was mostly positive. In the middle stage, the public sentiment
became more negative. In the late stage, the public sentiment became more
positive again. According to Qi & Shabrina (2022), the increase in confirmed cases
and the decrease in vaccination volume might be the reason for the increase in
negative sentiments. The supervised machine learning approaches performed
better than the lexicon-based approaches.
keywords within sentiment data that capture the recurring themes or topics in the
text. Such approaches are often utilized to quickly and efficiently understand broad
themes within public sentiment.
A recent study applied Latent Semantic Analysis and Singular Value
Decomposition to X (formerly known as Twitter) data from before and after the
COVID-19 pandemic, grouping key themes related to Bitcoin and cryptocurrency
sentiment (Dwivedi & Vemareddy, 2023). This analysis yielded valuable insights into
how public sentiment towards Cryptocurrency evolved in response to the
pandemic, highlighting key themes in negative sentiments related to crypto trading.
This study contributes to the literature on text mining by providing a contextual
framework for analyzing the public's sentiment toward Bitcoin and other
cryptocurrencies before and after COVID-19. Such understanding can illuminate
key public concerns, which can then be shared with a broader community for
further exploration and action. Through the lens of sentiment analysis, we can gain
a deeper understanding of the complex dynamics that drive the cryptocurrency
market, informing smarter investment decisions and fostering a more
comprehensive understanding of this burgeoning financial landscape.
The study of Yılmaz (2023) focuses on the status and challenges of
cryptocurrency sentiment analysis in 2023. The cryptocurrency market has grown
rapidly in recent years, from 5 million owners in 2016 to more than 300 million in
2021. The researcher sees a similar trend in the NFT marketplace, where the number
of users rose from 670,000 in 2020 to more than 44 million in 2022. However,
investing in cryptocurrency can be risky because the market can fluctuate sharply.
For instance, in 2022, Bitcoin lost more than 60% of its value and Dogecoin lost
55%, putting investors in a difficult position.
Cryptocurrency sentiment analysis faces several challenges. Models that are
not trained on the terminology of the crypto market may yield misleading results.
Identifying bot accounts can be difficult, especially if the dataset is not labeled
manually, and bot accounts are estimated to send almost 15% of the tweets about
cryptocurrency, which distorts the sentiment analysis results. To overcome these
challenges, the study recommends a holistic approach that combines polarity,
emotion, and aspect-based analysis; training a domain-specific model that includes
the terminology of the crypto market; and detecting bot accounts using neural
networks and contextualized representations of each text.
A recent study shows that neural networks achieve 82% accuracy in
identifying bots.
According to Tudor-Mirce and Dulău Mircea (2019), the frequency of
cryptocurrency-related news and social media posts is increasing rapidly, and there
is a link between media attention and cryptocurrency prices. Sentiment analysis of
publicly accessible web media may therefore help forecast cryptocurrency prices.
Bitcoin is a highly volatile virtual currency created for payments in which the
sender and recipient cannot be identified. The study used the sentiment of tweets
and daily cryptocurrency price data to predict the movement of Bitcoin's price.
FinBERT was used for sentiment analysis, leading to higher accuracy. The mean
absolute percentage error (MAPE) was 9.45% for sentiment prediction using FinBERT
and 3.6% for price prediction using GRU. Future work will involve using sentiments
from multiple media to predict Bitcoin's price.
Sentiment Analysis with Emotion and Intensity level recognition
The study of Navarro and Victore (2019) focuses on sentiment analysis with
emotion and intensity level recognition that considers ending punctuation marks.
Although this topic is not new, most computer scientists have approached it using
natural language processing techniques. According to Burton (2016), there are
several approaches to sentiment analysis that employ various characteristics.
Sentiment analysis is used to identify the polarity of text; evaluate sarcasm,
irony, and figurative language; and detect emotions. The most frequent data sets
for sentiment analysis include reviews, comments, ratings, and feedback. It
determines whether a phrase or text is positive, negative, or neutral. Contests on
the topic focus on judging complicated statements with uncertain contexts, such as
sarcasm, irony, and figurative language.
The researchers were able to show the difference between the approach
of considering and disregarding punctuation marks in Sentiment Analysis. Based
on the results of the evaluation, their system was consistent when the classification
considered the ending punctuation marks. The F-Measures of 80.27% and 69.82%
for considering and disregarding the symbols, respectively, show that there's a
difference between the two classifications. Also, after obtaining a p-value that is
less than the alpha level used, it is confirmed that EMOSIS has a significant
difference in performance compared with existing systems.
Sentiment Analysis using LSTM-GRU Ensembled
Aslam et al. (2022) state that understanding public sentiment towards
cryptocurrencies is crucial, given the impact of public opinion on the market
dynamics of these digital assets. This perspective is supported by a study that
conducted sentiment analysis and emotion detection on tweets related to
cryptocurrency, a common method used to predict cryptocurrency market prices. A key
development in this field is the use of advanced machine learning and deep learning
approaches, including the LSTM-GRU ensemble model. This model integrates the
capabilities of two recurrent
neural networks, Long Short-Term Memory (LSTM) and Gated Recurrent Unit
(GRU). The GRU is trained on features extracted by the LSTM, thereby enhancing
the accuracy of the analysis. Various feature extraction methods, such as term
frequency-inverse document frequency, word2vec, and Bag of Words (BoW), have
been explored to improve the performance of these models. The study found that
machine learning models performed better when using BoW features.
translated into relevant emotional words, and a Long Short-Term Memory (LSTM)
model is employed for emotion classification. LSTM is a type of recurrent neural
network that can effectively capture sequential or time-series information.
The study verifies the proposed REM method using X (formerly known as
Twitter) data and compares its recognition performances with existing methods
that only consider text expressions without emoticons. The results show that the
emoticon-based REM method achieves higher recognition accuracy, highlighting
its potential for applications in microblogs. By incorporating emoticons into the
analysis, the proposed REM method improves the understanding of emotions
expressed in microblog posts and enhances the accuracy of emotion classification.
Synthesis of the Study
Social media has a big impact on different parts of everyone's life. It provides a
great deal of information, but understanding that information can be difficult. One
way to understand it better is through specialized tools such as sentiment analysis,
which helps identify the emotions expressed in social media posts. The study
"Sentiment Analysis on Cryptocurrency-Related Tweets with Emotion and Intensity Level
Recognition" looks at something that previous sentiment analyses have often ignored:
the importance of keywords, emoticons, and punctuation marks. Emoticons are like
facial expressions in text form, and they are important for showing emotions on
social media. Punctuation marks, in turn, help in analyzing the intensity level of
the emotion.
Chapter 3
METHODOLOGY
The objective of the study is to create a tool for sentiment analysis with emotion
and intensity level recognition in Cryptocurrency-related tweets considering the
combination of keywords, ending punctuation marks, and emojis. This chapter presents
the Research Design, Sources of Data, Development Process, Research Instrument, Data
Gathering Procedure, and Statistical Data Analysis.
Research Design
The researchers utilized a pretest design and a posttest design and compared
their outcomes. The pretest acts as an initial evaluation to establish a common level
of proficiency in identifying polarity, emotion, and intensity. Its purpose is to
ensure that all data sets begin with a similar baseline. Following the pretest, the
data undergo training sessions in which they are exposed to text-based stimuli that
include emojis, keywords, and ending punctuation marks. After the training, a
posttest assessment is conducted to measure the extent to which the ability to
recognize polarity, emotion, and intensity has changed. By comparing the results of
the pretest and posttest, the researchers can determine whether the inclusion of
emojis and ending punctuation marks has had a significant impact on recognition
capabilities. This process allows for the evaluation of the effectiveness of
incorporating emojis, keywords, and ending punctuation marks in enhancing the
recognition of polarity, emotion, and intensity.
The researchers intend to merge a deep learning method, specifically Long
Short-Term Memory (LSTM), with a Support Vector Machine (SVM) classifier. The goal
is to develop a tool that could potentially provide higher performance in sentiment
analysis with emotion and intensity level recognition for cryptocurrency-related
tweets by considering the combination of keywords, emojis, and punctuation marks. The
performance of the proposed tool in terms of Precision, Recall, and F-measure was
tested and compared with its performance when the combination of keywords, ending
punctuation marks, and emojis is not considered.
Source of Data
The data used in this study are sourced exclusively from the English tweet
stream on X (formerly known as Twitter). The primary source of data is X (formerly
known as Twitter) posts that contain information related to cryptocurrencies and
feature the hashtags "#crypto" and "#cryptocurrency". These tweets formed the target
population for the research, and they were selected because they met the
requirements for the sample needed for recognizing emotion and intensity levels. The
proposed tool for emotion and intensity level recognition focused on analyzing texts,
phrases, and sentences, with particular attention given to emojis that express
emotional content.
In addition to the primary data from X (formerly known as Twitter) posts, the
secondary source of data involved analyzing the replies to these main posts, which
provided further insights into the sentiment and opinions expressed by users within
the cryptocurrency community.
Three respondents also took part in this experiment. The first respondent is
Mr. Soren Louis Anore, an expert in cryptocurrency trading with knowledge of digital
currencies, market trends, and investment strategies. The second respondent is Mr.
Kirck Michael Britos De Leon, a language practitioner skilled in areas such as
translation, interpretation, and linguistics. The third respondent is Dr. Rodrigo V.
Lopiga, a faculty member of the Department of Psychology, College of Social Sciences
and Development, Polytechnic University of the Philippines, who specializes in
understanding human behavior and emotions. The data annotated by the three experts
undergo a majority voting process. If there is a discrepancy among the three experts,
the language practitioner's judgment is selected for determining polarity, and the
psychologist's decision is used for assessing emotion and intensity level. This
approach is founded on the research of Nandwani and Verma (2021), which focuses on
the analysis of sentiment and detection of emotions from text. The annotated data are
intended for training, testing, and evaluation.
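The adjudication rule described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the researchers' actual code: the function name `resolve_label` and the annotator keys (`trader`, `linguist`, `psychologist`) are hypothetical, and a "discrepancy" is assumed to mean that all three experts gave different labels.

```python
from collections import Counter

# Hypothetical keys for the three annotators; the designated expert
# decides only when no two annotators agree (an assumption).
TIE_BREAKER = {
    "polarity": "linguist",       # language practitioner decides polarity
    "emotion": "psychologist",    # psychologist decides emotion
    "intensity": "psychologist",  # psychologist decides intensity level
}


def resolve_label(task, votes):
    """Return the majority label, or the designated expert's label if all differ."""
    label, count = Counter(votes.values()).most_common(1)[0]
    if count >= 2:  # at least two of the three experts agree
        return label
    return votes[TIE_BREAKER[task]]
```

For example, if two experts label a tweet as positive and one as negative, the majority label is kept; the tie-breaking expert is consulted only when all three labels differ.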
Instruments
System Architecture
Figure 9 shows two parts. The first component feeds the cryptocurrency-related
tweets into the pre-processing stage. The researchers used the Natural Language
Toolkit (NLTK), a Python library, to construct various aspects of the pre-processing
modules, together with essential libraries such as 're' for regular expressions,
'pandas' for data manipulation, and 'numpy' for numerical operations. These libraries
collectively form the foundation of the pre-processing phase, enabling efficient and
effective text analysis and manipulation.
Tweets
During the initial phase, proponents gather X (formerly known as Twitter)
cryptocurrency-related tweets as part of the data collection process.
Tweets Filtering
In tweet filtering, the tweets are screened so that only those associated with
cryptocurrency are retained. Hashtag matching is performed in a case-insensitive
manner, which means that capitalization variations in hashtags, such as
#Cryptocurrency, are ignored during the filtering process.
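A case-insensitive hashtag filter of this kind could look like the sketch below. It is only an illustration: the function names `is_crypto_tweet` and `filter_tweets` are hypothetical, and the target hashtags are assumed to be the two named in the Source of Data section.

```python
import re

# Hashtags named in the study; matching ignores capitalization.
TARGET_HASHTAGS = {"#crypto", "#cryptocurrency"}
HASHTAG_PATTERN = re.compile(r"#\w+")


def is_crypto_tweet(text):
    """Return True if the tweet carries at least one target hashtag."""
    found = {tag.lower() for tag in HASHTAG_PATTERN.findall(text)}
    return not TARGET_HASHTAGS.isdisjoint(found)


def filter_tweets(tweets):
    """Keep only the tweets that carry a target hashtag."""
    return [tweet for tweet in tweets if is_crypto_tweet(tweet)]
```

Because whole hashtags are compared after lowercasing, "#Cryptocurrency" and "#CRYPTO" are accepted, while a longer tag such as "#cryptography" is not.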
Pre-processing Phase
• Remove numbers.
The first process involves removing any numerical digits from the text data.
Numerical digits don't contribute to the sentiment of the text.
• Remove links, usernames, mentions, and hashtags.
After removing the numbers, the tool removes the links, usernames, mentions,
and hashtags, as they do not contribute much to the sentiment expressed in
the text. These elements may carry additional information, but they do not
reflect the sentiment of the text.
• Spell correction
After removing the links and hashtags, spelling correction is performed to
fix any misspelled words in the text data, because incorrectly spelled words
can affect the accuracy of the sentiment analysis.
• Emoticon converter
If the user chooses the “Combination of Keywords, Ending Punctuation
Marks, and Emoticon Features” option, the tool converts the emojis/emoticons
to their textual representation, allowing the model used in sentiment
analysis to understand the expressed sentiment more accurately.
• Remove Emoticons and Punctuation Marks
If the user chooses the “Plain Text Only Features” option, the tool cleans
the text by removing emoticons and punctuation marks that are not needed
for analysis.
• Removing Stop Words
After converting the emojis to their textual representation, the tool
removes the stop words to give more focus to the important information.
These are words that are not significant for sentiment analysis in a
specific context, such as "in", "at", "on", "a", "an", and "the".
• Convert to lowercase.
Converting the text to lowercase ensures that the algorithm used in
sentiment analysis treats words that differ only in case as the same word.
This helps to avoid the duplication of words and to capture the exact
sentiment used.
• Tokenization
This process splits a text document into tokens, dividing the text into
units such as words, phrases, or characters. It is important because
certain terms carry specific implications in cryptocurrency, such as
"pump", "dump", "bullish", and "moon".
• Lemmatization
After every phrase, paragraph, and sentence is split into smaller units,
the words are reduced to their base form, taking into account their
dictionary meaning and grammatical context. For example, "buying",
"bought", and "buys" would be reduced to "buy".
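The pre-processing steps listed above can be sketched end to end as follows. This is a simplified, standard-library-only illustration rather than the tool's actual implementation: the emoji map, stop-word set, and lemma table are tiny hypothetical stand-ins for the much larger resources a real tool would use (for example, those provided by NLTK), spelling correction is omitted, and lowercasing is applied before stop-word removal so that the stop-word lookup works on normalized tokens.

```python
import re

# Tiny illustrative resources (assumptions); a real tool would rely on
# larger ones such as NLTK's stop-word list and WordNet lemmatizer.
EMOJI_TO_TEXT = {"🚀": " rocket ", "😭": " crying ", ":)": " smile "}
STOP_WORDS = {"in", "at", "on", "a", "an", "the", "is", "to"}
LEMMAS = {"buying": "buy", "bought": "buy", "buys": "buy"}


def preprocess(tweet, use_emoji_and_punctuation=True):
    """Return a list of cleaned tokens for one tweet."""
    text = re.sub(r"\d+", "", tweet)                     # remove numbers
    text = re.sub(r"https?://\S+|[@#]\w+", "", text)     # remove links, mentions, hashtags
    if use_emoji_and_punctuation:
        for symbol, word in EMOJI_TO_TEXT.items():       # emoticon converter
            text = text.replace(symbol, word)
    else:
        text = re.sub(r"[^\w\s]", " ", text)             # drop emoticons and punctuation
    text = text.lower()                                  # convert to lowercase
    tokens = re.findall(r"[a-z]+|[!?.]", text)           # tokenization
    tokens = [t for t in tokens if t not in STOP_WORDS]  # remove stop words
    return [LEMMAS.get(t, t) for t in tokens]            # lemmatization


print(preprocess("Bought 100 coins at the dip 🚀!! https://t.co/x1 #Crypto"))
# ['buy', 'coins', 'dip', 'rocket', '!', '!']
```

With `use_emoji_and_punctuation=False`, the same tweet is reduced to plain-text tokens only, which mirrors the "Plain Text Only Features" option.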
the text, such as happy, sad, anticipation, fear, angry, and surprise. The
proponents will use Keras and Scikit-Learn Libraries for the Classifier.
• Polarity recognition
Polarity recognition determines the polarity of the text, such as positive,
negative, or neutral. This helps to understand the overall sentiment
conveyed by a piece of text. The Classifier that we will be using is the
LSTM-SVM Classifier.
• Emotion recognition
After the polarity recognition process, emotion recognition aims to identify
the specific emotion expressed in the processed text, such as happiness,
sadness, anger, surprise, etc. Just like in polarity recognition, we will be
using the LSTM-SVM Classifier in emotion recognition.
• Intensity Recognition
The intensity recognition process determines the intensity of the sentiment
expressed in the text. This can be performed using techniques such as
sentiment score aggregation, where the sentiment scores are assigned
differently to individual words or phrases in a text.
Sentiment Analysis
Lastly, the whole process, which includes the filtering of tweets, the
pre-processing stage, and the sentiment analysis stage, results in the final
sentiment analysis of the text. The results of these processes can provide insights
that allow individuals to make decisions based on public opinion. The results can
also be viewed as charts for polarity, emotion, and intensity level. The Intensity
Level Recognition used the rules presented in Table 4, based on the study of Das
and Bandyopadhyay (2010), for considering words, and the rules in Table 5 for
considering ending punctuation marks and emoticons. The output was the emotion and
intensity level detected based on the process performed.
Table 5
High: Exclamation Mark (>= 1) AND Question Mark (> 1) AND Emotion Weight (> 1.0)
Medium: Period (== 1) AND Question Mark (== 0) AND Emoticon Weight (> 0.5)
High
Anticipation
Low: Question Mark (== 0) AND Period (== 0) AND Exclamation Mark (== 1) AND Emotion Weight (< 0.5)
Medium: Period (== 1) AND Question Mark (== 1) AND Emotion Weight (> 0.5)
High: Exclamation Mark (>= 1) AND Question Mark (> 1) AND Emotion Weight (> 1.0)
Fear
Low: Question Mark (== 0) AND Period (== 0) AND Exclamation Mark (== 1) AND Emotion Weight (< 0.5)
Medium: Period (== 1) AND Question Mark (== 1) AND Emotion Weight (> 0.5)
High: Exclamation Mark (>= 1) AND Question Mark (> 1) AND Emotion Weight (> 1.0)
In sentiment analysis, particularly with the use of Emotag1200 annotated data for
emotion recognition, the intensity level involves a combination of emotion weights and
the presence of ending punctuation marks. The researchers and language practitioner
experts developed specific rules based on the study of Sagum, Navarro, and Victore
(2019). These rules consider the emotional weights assigned to different emoticons in
the Emotag1200 dataset. The intensity level of a sentence is determined not only by
these emotional weights but also by the type and number of ending punctuation marks.
For instance, a sentence might be classified as having medium or high emotional
intensity based on a combination of high emotional weight words or emoticons and the
use of single or multiple punctuation marks. This approach allows for a nuanced
analysis of emotional intensity, providing a more accurate reflection of the
sentiment expressed in the tweets.
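As an illustration, the Anticipation and Fear rules of Table 5 could be expressed as the short Python sketch below. Several points are assumptions rather than details confirmed by the study: the function names are hypothetical, "ending punctuation marks" are taken to be the run of '!', '?', and '.' characters at the very end of the text, the rules are checked from High to Low, and the emotion weight is supplied by the caller instead of being looked up in the Emotag1200 data.

```python
import re


def count_ending_marks(text):
    """Count '!', '?', and '.' in the run of punctuation that ends the text."""
    match = re.search(r"[!?.]+$", text.strip())
    run = match.group(0) if match else ""
    return {"!": run.count("!"), "?": run.count("?"), ".": run.count(".")}


def intensity_level(text, emotion_weight):
    """Apply the Anticipation/Fear rules of Table 5; return None if no rule matches."""
    marks = count_ending_marks(text)
    excl, quest, period = marks["!"], marks["?"], marks["."]
    if excl >= 1 and quest > 1 and emotion_weight > 1.0:
        return "High"
    if period == 1 and quest == 1 and emotion_weight > 0.5:
        return "Medium"
    if quest == 0 and period == 0 and excl == 1 and emotion_weight < 0.5:
        return "Low"
    return None
```

Returning `None` when no rule applies is a design choice of this sketch; the table does not state what the tool does with sentences that fall outside the listed conditions.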
Figure 10. EmCrypt Training (Combination of Keywords, Ending Punctuation Marks, and
Emoticons)
The Classifier for Polarity and Emotion Recognition utilized by the researchers
combines LSTM with an SVM Classifier, necessitating training data. The training involves
two distinct processes: pre-processing and the training phase itself. Additionally, tweets
with the hashtags "#crypto" and "#cryptocurrency" were specifically selected as training
data to align the Classifier's focus with the study's subject matter. During training, data
underwent pre-processing and tweet-filtering phases. The tokens, once simplified, were
entered into the database, which then categorized them as positive or negative based on
expert assessments. The training method for the emotion classifier is similar, but it targets
emotions such as happiness, sadness, surprise, anger, anticipation, and fear.
Two approaches were used for training: plain text analysis and a combination of
keywords, punctuation marks, and emojis. This dual-method training is essential to
prepare the Classifier's knowledge base before classification begins. Its purpose is
to enhance the Classifier's ability to accurately recognize and categorize emotions
and polarity (positive or negative sentiments) in text data, particularly tweets in
this context.
Figure 13. Pre-processing Diagram
The figure presents the development of the EmCrypt system. A user searches for
cryptocurrency-related tweets on X (formerly known as Twitter) using the advanced
search feature with the hashtags #crypto and #cryptocurrency. The collected tweets
then undergo preprocessing, a series of text preparation steps. This includes cleaning the
tweets by removing unwanted characters, converting emoticons to text, eliminating
emoticons and associated punctuation, getting rid of common stopwords, and breaking
the text into individual tokens. Furthermore, a lemmatization process is applied to
standardize words.
The preprocessed tweet data is stored in a database for future reference.
Subsequently, a sentiment analysis stage is executed, involving feature extraction and
classification using a combination of LSTM and SVM algorithms. The tool then saves the
results of polarity recognition, identifying whether tweets are positive, negative, or neutral,
as well as emotion recognition and intensity level recognition. Finally, these processed
and analyzed results are presented to the user through a user interface for their review
and interaction. This entire process aims to provide insights into the sentiment and
emotions expressed in cryptocurrency-related tweets.
In the preprocessing stage, the gathered tweets undergo data cleaning, which
involves eliminating numbers, hyperlinks, usernames, mentions, and hashtags, as well as
correcting spelling errors. Following this, the procedure moves to feature selection, which
encompasses converting emoticons and removing both emoticons and punctuation
marks. The next step is tokenization, during which the process entails removing redundant
characters, eliminating stopwords, converting text to lowercase, and employing a
tokenizer. Subsequently, lemmatization is applied. This series of steps completes the
preprocessing of cryptocurrency-related tweets, preparing them for storage in the
database.
Development Details
Python was used as the tool's programming language, MySQL as the database,
and Visual Studio Code as the development environment. The researchers accessed the
tweets through an X (formerly known as Twitter) user account and manually gathered
the data using the advanced search options of X. The tool's design serves as an
example of development phase planning. The testing and observation of the tool's
behavior during the development period were crucial for understanding its potential.
The researchers tested and debugged each component of the tool after it was completed
to determine the prerequisites for the subsequent component. The development process
was completed by repeating these phases. The tool was approved for implementation to
address the issue in the study's domain because it met the requirements and achieved
the study's objective.
Research Instrument
The study utilized an experiment paper to determine whether the tool's output,
which is the sentiment analysis of cryptocurrency-related tweets with polarity,
emotion, and intensity level recognition, matches the expected output of the tool.
Experiment paper
The first three tables show the performance of the system in polarity, emotion, and
intensity level recognition when considering the combination of keywords, emojis, and
ending punctuation marks, in terms of Precision, Recall, and F-measure. The next three
tables show the performance of the system in polarity, emotion, and intensity level
recognition using plain text only.
Data Generation/ Gathering Procedure
The researchers collected the data from the X (formerly known as Twitter) platform
because it is known for highlighting trends in cryptocurrency-related tweets, various
news, and current events. The study was prepared according to the following steps:
Ethical Considerations
The sentiment analysis study is conducted with full adherence to research ethics,
including obtaining informed consent from the university administration, course
instructors, and the participating experts, specifically the cryptocurrency trader,
language practitioner, and psychologist. Ethical considerations will prioritize
privacy, confidentiality, fairness, and equity. Measures will be taken to protect
sensitive information, anonymize data, and address biases. Transparent communication
will be maintained regarding sentiment analysis methods, limitations, and
uncertainties. Approval from an ethics committee will be sought, and data protection
protocols will be followed. The ethical compliance of the study will be monitored and
evaluated so that it can contribute valuable insight into sentiment analysis while
maintaining the utmost ethical standards.
column, excluding the diagonal, represents the FPs for that class. TN for each class is
calculated as the sum of all values in the matrix excluding the row and column of that
class. With these values, the proponents can accurately calculate recall and precision for
each class, providing a comprehensive understanding of the classifier's performance in a
multi-class setting.
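The per-class counts described above can be derived from a multi-class confusion matrix as in the following sketch. It assumes that rows hold the actual classes and columns hold the predicted classes, which is consistent with a column (excluding the diagonal) giving the FPs of a class; the function name `per_class_counts` and the example matrix are hypothetical and are not results from the study.

```python
def per_class_counts(matrix, labels):
    """Derive TP, FP, FN, and TN per class from a square confusion matrix.

    Rows are actual classes and columns are predicted classes.
    """
    total = sum(sum(row) for row in matrix)
    counts = {}
    for i, label in enumerate(labels):
        tp = matrix[i][i]
        fp = sum(matrix[r][i] for r in range(len(labels))) - tp  # rest of column i
        fn = sum(matrix[i]) - tp                                 # rest of row i
        tn = total - tp - fp - fn                                # outside row i and column i
        counts[label] = {"TP": tp, "FP": fp, "FN": fn, "TN": tn}
    return counts


# Made-up example for three emotion classes.
example = [[5, 1, 0],
           [2, 3, 1],
           [0, 2, 4]]
print(per_class_counts(example, ["happy", "sad", "angry"])["happy"])
# {'TP': 5, 'FP': 2, 'FN': 1, 'TN': 10}
```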
Evaluation Metrics
Evaluation Parameters (Agarwal & Mittal, 2019)
• True Positive (TP): This is when the tool accurately predicts the positive class,
and the actual class is indeed positive. For instance, if the tool predicts 'happy' and
the actual emotion is 'happy', it's a TP.
• True Negative (TN): TN occurs when the tool correctly identifies the negative
class. In a multi-class setting, this means for a specific class (say, 'happy'), all other
classes ('sad', 'surprise', etc.) are correctly identified as not being 'happy'.
• False Positive (FP): This happens when the tool incorrectly predicts the positive
class. For example, if the tool predicts 'happy' when the actual emotion is 'angry',
it's an FP for 'happy'.
• False Negative (FN): FN takes place when the tool incorrectly predicts the
negative class. Using the same example, if the tool predicts 'angry' (negative for
'happy') when the actual emotion is 'happy', it's an FN for 'happy'.
Precision
Precision answers the question: of all the instances predicted as a given
class X, how many were correctly predicted? It is measured as the number of
True Positives divided by the total number of True Positives and False Positives.
\[ \text{Precision} = \frac{TP}{TP + FP} \]
Where:
True Positive - the tool correctly predicts the positive sentiment for
cryptocurrency-related tweets.
False Positive - the tool incorrectly predicts positive sentiment for tweets
that should have been classified as negative.
Recall
Recall answers the question: of all the instances that should have a
label X, how many were correctly captured? It is measured as the number of
True Positives divided by the total number of True Positives and False Negatives.
\[ \text{Recall} = \frac{TP}{TP + FN} \]
Where:
True Positive - the tool correctly predicts the positive sentiment for
cryptocurrency-related tweets.
False Negative - the tool incorrectly predicts negative sentiment for
cryptocurrency-related tweets that should have been classified as
positive.
F-measure
F-measure is the harmonic mean of Precision and Recall: twice the
product of Precision and Recall divided by the sum of Precision and Recall.
\[ F = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \]
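The three measures can be computed directly from the TP, FP, and FN counts, as in the minimal sketch below. The function names are hypothetical, and returning 0.0 when a denominator is zero is an assumed convention that the study does not specify.

```python
def precision(tp, fp):
    """True Positives divided by all predicted positives."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    """True Positives divided by all actual positives."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def f_measure(p, r):
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r) if (p + r) else 0.0


# Made-up counts: TP = 5, FP = 2, FN = 1.
p, r = precision(5, 2), recall(5, 1)
print(round(p, 4), round(r, 4), round(f_measure(p, r), 4))
# 0.7143 0.8333 0.7692
```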
Hypothesis Testing
To measure the significant difference in sentiment analysis performance
between using a combination of keywords, ending punctuation marks, and emojis
and using plain text only for analyzing sentiments in cryptocurrency-related
tweets, the researchers utilized the paired t-test.
Paired T-test
The paired t-test assesses whether the mean performance of the tool using plain text
only and the mean performance of the proposed tool are statistically different from
each other.
Where:
d: difference per paired value
n: number of samples
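For reference, one standard textbook form of the paired t statistic that uses only the symbols d and n defined above is given below. It is offered as an assumption about the intended equation, not as a transcription of it; it is algebraically the same as dividing the mean difference by its standard error.

```latex
t = \frac{\sum d}{\sqrt{\dfrac{n \sum d^{2} - \left(\sum d\right)^{2}}{n - 1}}}
```

The statistic has n - 1 degrees of freedom.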
Table 7
Tweets F-measure
Polarity Recognition
Emotion Recognition
Intensity Level Recognition
Overall Performance
Rating System
To assess the tool's overall recognition performance, its scores must be
interpreted. Table 8 shows the rating system for the parameters Precision,
Recall, and F-measure.
Table 8
Rating System for the Parameters: Precision, Recall, and F-Measure (Eboña 2013)
CHAPTER 4
RESULTS AND DISCUSSIONS
This chapter presents and interprets the findings of the data collected during the
implementation of the developed tool to address the problem in the study.
Table 9
Division of Cryptocurrency-related Tweets per phase
The data used by the researchers was limited because of the restrictions that X
(formerly known as Twitter) imposed under Elon Musk on data scraping and on the number
of tweets users can read per day (Quintet, 2023). The researchers conducted experiments to
explore and analyze the data and generated the following results, which are presented in
tables that summarize the details of each experiment. The researchers used a combination
of keywords, ending punctuation marks, and emoticons in analyzing sentiments and in
recognizing their emotion and intensity level to address the gaps in sentiment analysis of
cryptocurrency-related tweets. The researchers conducted tests to assess the tool's
performance in analyzing sentiment in cryptocurrency-related tweets in terms of
Precision, Recall, and F-measure, based on TP (True Positive: the tool predicts the
positive class and the actual class is indeed positive), FP (False Positive: the tool
incorrectly predicts the positive class), TN (True Negative: the tool correctly identifies
the negative class), and FN (False Negative: the tool incorrectly predicts the negative
class). The researchers assessed the developed tool in classifying Polarity, Emotion, and
Intensity Level to obtain its overall performance. The researchers also tested whether
there is a significant difference in sentiment analysis performance between using a
combination of keywords, ending punctuation marks, and emojis and using plain text only
for analyzing sentiments in cryptocurrency-related tweets.
Based on the problem presented in Chapter 1, below are the gathered data that
respond to the Statement of the Problem regarding the overall performance of the
developed tool that considered the combination of keywords, ending punctuation marks,
and emoticons compared to using plain text only in analyzing sentiment.
a. Precision
b. Recall
c. F-Measure
The researchers tested the tool to calculate its performance on
cryptocurrency-related tweets considering the combination of keywords, ending punctuation
marks, and emoticons.
Table 10
Polarity results considering the combination of keywords, ending punctuation
marks, and emoticons.
Polarity Precision Recall F-Measure
Positive 93.55% 91.58% 92.55%
Negative 85.96% 89.09% 87.50%
Overall Polarity 89.76% 90.34% 90.03%
Verbal Interpretation Very Good Very Good Very Good
The table above shows the results for Polarity Recognition considering the
combination of keywords, ending punctuation marks, and emoticons. The tool's approach,
which prioritizes detecting the polarity of a cryptocurrency-related tweet before identifying
its emotion and intensity level, reflects a layered understanding of sentiment analysis. By
first establishing whether a tweet is positive or negative, the tool lays a foundational
context for further emotional and intensity analysis. This sentiment analysis method, based
on the study of Kumar & Bhaskari (2018), emphasizes the complexity of interpreting
sentiments.
The verbal interpretation of this parameter is determined by the rating system of
Eboña (2013), and the table presents the Precision, Recall, and F-Measure for evaluating
the sentiment polarity of tweets related to cryptocurrencies. A total of 150
cryptocurrency-related tweets were tested, with each tweet classified as positive or
negative. In answer to the first statement of the problem in Chapter 1, precision for
polarity was rated 'Very Good' at 89.76%, recall was also rated 'Very Good' at 90.34%,
and the F-measure was rated 'Very Good' at 90.03%. These scores indicate that the tool
categorizes the sentiment of cryptocurrency-related tweets accurately and reliably.
Table 11
Emotion results considering the combination of keywords, ending punctuation
marks, and emoticons.
Emotion Precision Recall F-Measure
Happy 97.44% 90.48% 93.83%
Sad 81.48% 78.57% 80.00%
Surprise 71.43% 83.33% 76.92%
Anger 66.67% 85.71% 75.00%
Anticipation 80.00% 91.43% 94.12%
Fear 76.19% 80.00% 78.05%
Overall Emotion 78.87% 84.92% 82.99%
Verbal Interpretation Fair Satisfactory Satisfactory
The table above shows the results for Emotion Recognition considering the
combination of keywords, ending punctuation marks, and emoticons. This approach by
Nourah & Mohamed (2020) highlights the complexity of emotional analysis in digital
communication, where simple textual elements like keywords and emoticons can reveal
deeper emotional undertones. After obtaining the sentiment value of each
cryptocurrency-related tweet, the tool progresses to the next step, which involves
identifying the specific emotion in the tweet. The limitation to six basic emotions (happy,
surprise, anticipation, angry, fear, and sad), as stated by Emotag1200, reflects a focused
yet comprehensive range of human emotional responses in the context of cryptocurrency
discussions. Among the six basic emotions, the 'Surprise' category contributed to a lower
overall performance, with an F-Measure of 76.92%. Additionally, Emotag1200 covers only a
limited set of emoticons that the system can detect, which leads to misclassifications. The
verbal interpretation of this parameter is determined by the rating system of Eboña (2013),
and the table presents the Precision, Recall, and F-Measure for evaluating the emotion of
tweets related to cryptocurrencies. In answer to the first statement of the problem in
Chapter 1, precision for emotion was rated 'Fair' at 78.87%, recall was rated
'Satisfactory' at 84.92%, and the F-measure was rated 'Satisfactory' at 82.99%. These
metrics demonstrate the tool's effectiveness in emotion recognition while underscoring how
difficult it is to decipher emotions in text-based communication.
Table 12
Intensity Level results considering the combination of keywords, ending
punctuation marks, and emoticons.
Intensity Level Precision Recall F-Measure
Low 87.50% 84.85% 86.15%
Medium 94.74% 87.10% 90.76%
High 85.25% 94.55% 89.66%
Overall Intensity Level 89.16% 88.83% 88.86%
Verbal Interpretation Very Good Good Good
The table above shows the results for Intensity Level Recognition with a
combination of keywords, ending punctuation marks, and emoticons. After obtaining the
emotion value of each cryptocurrency-related tweet, the tool advances to the step of
identifying the intensity level of that emotion. The tool can detect three levels of emotion
intensity: low, medium, and high (Das & Bandyopadhyay, 2010). The verbal interpretation
of this parameter is determined by the rating system of Eboña (2013), and the table
presents the Precision, Recall, and F-Measure for evaluating the intensity level of tweets
related to cryptocurrencies. In answer to the first statement of the problem in Chapter 1,
precision for the intensity level was rated 'Very Good' at 89.16%, recall was rated 'Good'
at 88.83%, and the F-Measure was rated 'Good' at 88.86%. These results highlight the
tool's high performance and reliability in detecting intensity levels in textual data.
Table 13
Summary Result of the performance of the tool considering the combination of
Keywords, Ending Punctuation Marks, and Emoticons
Precision Recall F-Measure
Polarity 89.76% 90.34% 90.03%
Emotion 78.87% 84.92% 82.99%
Intensity Level 89.16% 88.83% 88.86%
Overall Performance 85.93% 88.03% 87.29%
Verbal Interpretation Good Good Good
The table above presents the summary of the performance results of the tool,
considering the combination of keywords, ending punctuation marks, and emoticons.
Combining these elements allows several types of input to contribute to the analysis of
textual data and thereby to the performance of the tool. The verbal interpretation of the
summary result is determined by the rating system of Eboña (2013), and the table presents
the Precision, Recall, and F-Measure for evaluating the polarity, emotion, and intensity
level of tweets related to cryptocurrencies. In answer to the first statement of the problem
in Chapter 1, the overall precision of the tool was rated 'Good' at 85.93%, recall was
rated 'Good' at 88.03%, and the F-measure was rated 'Good' at 87.29%. These results
indicate that the tool is effective in identifying and interpreting cryptocurrency-related
tweets, as reflected in the good precision and recall rates. The balance between precision
and recall, as summarized by the F-measure, shows that the tool processes
cryptocurrency-related tweets accurately and consistently.
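The Overall Performance row of the summary is consistent with an unweighted mean of the Polarity, Emotion, and Intensity Level scores. The short Python sketch below checks this under that assumption; the manuscript does not state the averaging rule explicitly, and the variable names are illustrative.

```python
from statistics import mean

# Per-task scores (%) copied from the summary table for the combined-feature tool,
# in the order Polarity, Emotion, Intensity Level.
scores = {
    "precision": [89.76, 78.87, 89.16],
    "recall": [90.34, 84.92, 88.83],
    "f_measure": [90.03, 82.99, 88.86],
}

# Assumption: the overall score is the simple (unweighted) mean of the three tasks.
overall = {name: round(mean(values), 2) for name, values in scores.items()}
print(overall)
```

Under this assumption the sketch yields 85.93, 88.03, and 87.29, matching the reported overall precision, recall, and F-measure.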
2. What is the performance of the tool in cryptocurrency-related tweets using
Plain-text only in terms of:
a. Precision
b. Recall
c. F-Measure
The researchers tested the tool to calculate the performance of the tool in
cryptocurrency-related tweets using Plain-Text only.
Table 14
Polarity results using Plain-Text only
Polarity Precision Recall F-Measure
Positive 88.42% 87.50% 87.96%
Negative 78.18% 79.63% 78.90%
Overall Polarity 83.30% 83.57% 83.43%
Verbal Interpretation Satisfactory Satisfactory Satisfactory
The table above shows the results for Polarity Recognition using plain text only.
The tool's approach, which prioritizes detecting the polarity of a cryptocurrency-related
tweet before identifying its emotion and intensity level, reflects a layered understanding of
sentiment analysis. By first establishing whether a tweet is positive or negative, the tool
lays a foundational context for further emotional and intensity analysis. This sentiment
analysis method, based on the study of Kumar & Bhaskari (2018), emphasizes the
complexity of interpreting sentiments. The verbal interpretation of this parameter is
determined by the rating system of Eboña (2013), and the table presents the Precision,
Recall, and F-Measure for evaluating the sentiment polarity of tweets related to
cryptocurrencies. A total of 150 cryptocurrency-related tweets were tested, with each tweet
classified as positive or negative. In answer to the second statement of the problem in
Chapter 1, precision for polarity was rated 'Satisfactory' at 83.30%, recall was rated
'Satisfactory' at 83.57%, and the F-measure was rated 'Satisfactory' at 83.43%. These
metrics indicate that the tool, when it processes plain-text tweets exclusively, performs
at a satisfactory level in evaluating polarity.
Table 15
Emotion results using Plain-Text only
Emotion Precision Recall F-Measure
Happy 88.37% 86.36% 87.36%
Sad 75.00% 88.89% 81.36%
Surprise 78.57% 64.71% 70.97%
Anger 83.33% 71.43% 76.92%
Anticipation 81.08% 85.71% 83.33%
Fear 77.78% 70.00% 73.68%
Overall Emotion 80.69% 77.85% 78.94%
Verbal Interpretation Satisfactory Fair Fair
The table above shows the results for Emotion Recognition using plain text
only. This approach by Nourah & Mohamed (2020) highlights the complexity of emotional
analysis in digital communication, where simple textual elements like keywords and
emoticons can reveal deeper emotional undertones. After obtaining the sentiment value
of each cryptocurrency-related tweet, the tool progresses to the next step, which involves
identifying the specific emotion in the tweet. The limitation to six basic emotions
(happy, surprise, anticipation, angry, fear, and sad), as stated by Emotag1200, reflects
a focused yet comprehensive range of human emotional responses in the context of
cryptocurrency discussions. Among the six basic emotions, the 'Surprise' category
contributed to a lower overall performance, with an F-Measure of 70.97%, the lowest of
the six and a sign of frequent misclassification. The verbal interpretation of this
parameter is determined by the rating system of Eboña (2013), and the table presents the
Precision, Recall, and F-Measure for evaluating the emotion of tweets related to
cryptocurrencies. In answer to the second statement of the problem in Chapter 1, precision
for emotion was rated 'Satisfactory' at 80.69%, recall was rated 'Fair' at 77.85%, and
the F-measure was rated 'Fair' at 78.94%. These metrics suggest that the tool performed
fairly in recognizing and analyzing emotions, with slightly better performance in
precision than in recall and F-measure.
Table 16
Intensity Level results using Plain-Text only
Intensity Level Precision Recall F-Measure
Low 88.46% 71.88% 79.31%
Medium 69.49% 83.67% 75.93%
High 76.92% 81.08% 78.95%
Overall Intensity 78.29% 78.88% 78.06%
Verbal Interpretation Fair Fair Fair
The table above shows the results for Intensity Level Recognition using plain
text only. After obtaining the emotion value of each cryptocurrency-related tweet, the tool
advances to the step of identifying the intensity level of that emotion. The tool
can detect three levels of emotion intensity: low, medium, and high (Das &
Bandyopadhyay, 2010). The verbal interpretation of this parameter is determined by the
rating system of Eboña (2013), and the table presents the Precision, Recall, and F-Measure
for evaluating the intensity level of tweets related to cryptocurrencies. In answer to the
second statement of the problem in Chapter 1, precision for the intensity level was rated
'Fair' at 78.29%, recall was rated 'Fair' at 78.88%, and the F-Measure was rated 'Fair'
at 78.06%. This shows the tool's fair performance in intensity level detection in textual
data when using plain text only.
Table 17
Summary Result of the performance of the tool using Plain-Text only.
Summary Precision Recall F-Measure
Polarity 83.30% 83.57% 83.43%
Emotion 80.69% 77.85% 78.94%
Intensity Level 78.29% 78.88% 78.06%
Overall Performance 80.76% 80.10% 80.14%
Verbal Interpretation Satisfactory Satisfactory Satisfactory
The table above presents the summary of the performance results of the tool using
plain text only. The verbal interpretation of the summary result is determined by the
rating system of Eboña (2013), and the table presents the Precision, Recall, and F-Measure
for evaluating the polarity, emotion, and intensity level of tweets related to
cryptocurrencies. In answer to the second statement of the problem in Chapter 1, the
overall precision of the tool was rated 'Satisfactory' at 80.76%, recall was rated
'Satisfactory' at 80.10%, and the F-measure was rated 'Satisfactory' at 80.14%. These
findings suggest that the tool's performance in identifying and interpreting
cryptocurrency-related tweets needs improvement when it uses plain text only, as
evidenced by the precision and recall rates. The F-measure, which summarizes the balance
between precision and recall, shows that the tool can still process cryptocurrency-related
tweets with reasonable consistency, but there is room for enhancement when using plain
text only.
Table 18
Tweets F-Measure
Polarity 90.03%
Emotion 82.99%
Intensity Level 88.86%
Overall Performance 87.29%
Verbal Interpretation Good
Table 19
Tweets F-Measure
Polarity 83.43%
Emotion 78.94%
Intensity Level 78.06%
Overall Performance 80.14%
Verbal Interpretation Satisfactory
Tables 18 and 19 show the comparative performance of the EmCrypt Sentiment
Analyzer in processing cryptocurrency-related tweets. The analyzer's performance when
utilizing a combination of keywords, ending punctuation marks, and emoticons is
contrasted with its performance using only plain text. The difference in the overall
F-Measure, a 'Good' rating of 87.29% for the combination of the three features compared
with a 'Satisfactory' rating of 80.14% for plain text only, underlines the greater
effectiveness of the combined method. This suggests that including the combination of
keywords, ending punctuation marks, and emoticons offers a more nuanced and accurate
analysis of sentiment in cryptocurrency-related tweets than plain text alone. The results
highlight the importance of considering these three features together in sentiment
analysis, especially in the often ambiguous and emotionally charged field of
cryptocurrency.
Figure 14 shows the overall results and comparison between the EmCrypt
Analyzer that considers the combination of keywords, ending punctuation marks, and
emoticons and the version that uses plain text only in analyzing sentiments in
cryptocurrency-related tweets. The tool that incorporates the combination of three
features outperformed the tool that relies solely on plain text in terms of overall
performance. There is a noticeable disparity in the overall results, with the
combined-feature tool achieving a 'Good' rating at 87.29%, as opposed to the
'Satisfactory' rating of 80.14% for the plain-text-only approach. This underscores the
improved effectiveness of considering the combination of keywords, ending punctuation
marks, and emoticons in analyzing sentiments in cryptocurrency-related tweets.
Table 20
Table 20 presents the results of the paired t-test on the performance of the tool,
including a t-value of -3.63 and a p-value of 0.03. The decision to reject the null
hypothesis is based on these results, particularly the p-value being lower than the
predetermined significance level of 0.05. Maier and Lakens (2022) note that the default
use of an alpha level of .05 can be suboptimal for two reasons. First, decisions based on
data can be made more efficiently by choosing an alpha level that minimizes the combined
Type 1 and Type 2 error rate. Second, in studies with very high statistical power, p-values
lower than the alpha level can be more likely when the null hypothesis is true than when
the alternative hypothesis is true. A p-value below the 0.05 threshold indicates that the
words are significant for sentiment classification (Mondal, 2016). This outcome implies
that the differences observed in the tool's performance are statistically significant and
that the conditions being tested, namely the combined features versus plain text only, have
a measurable impact on the tool's performance. The rejection of the null hypothesis
therefore provides evidence that the tool's performance is influenced by the features it
considers, reinforcing the importance of these features in the overall effectiveness of the
sentiment analysis tool.
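The reported t-value can be reproduced with a short Python sketch. It assumes that the paired values are the three per-task F-measures (Polarity, Emotion, Intensity Level) from Tables 18 and 19 and that each difference is taken as plain text minus combined features; the manuscript does not state either choice, so both are assumptions. The sketch computes only the t statistic, not the p-value.

```python
from math import sqrt
from statistics import mean, stdev

# F-measures (%) for Polarity, Emotion, Intensity Level.
combined = [90.03, 82.99, 88.86]    # keywords + ending punctuation + emoticons
plain_text = [83.43, 78.94, 78.06]  # plain text only

# Assumed pairing and direction: plain text minus combined, per task.
d = [p - c for p, c in zip(plain_text, combined)]
n = len(d)

# Paired t statistic: mean difference divided by its standard error.
t_value = mean(d) / (stdev(d) / sqrt(n))
print(round(t_value, 2), "with", n - 1, "degrees of freedom")
```

With these three pairs the statistic rounds to -3.63, the value reported for the paired t-test. With only three pairs the test has two degrees of freedom.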
CHAPTER 5
SUMMARY OF FINDINGS, CONCLUSION AND RECOMMENDATION
This chapter presents the summary of findings from the assessment of the
performance of EmCrypt in sentiment analysis of cryptocurrency-related
tweets with emotion and intensity level recognition, considering the combination of
keywords, ending punctuation marks, and emoticons. Conclusions and
recommendations are also included in this chapter.
Summary of Findings
From the evaluation and implementation of the study, EmCrypt: Sentiment
Analysis on Cryptocurrency-related Tweets with Emotion and Intensity Level Recognition
Considering the Combination of Keywords, Ending Punctuation Marks, and Emoticons, the
researchers arrived at the following findings:
recall was ‘Fair’ with a score of 78.88%, and F-measure obtained ‘Fair’ of 78.06%. Overall,
the performance of the tool using plain text only was rated 'Satisfactory' on all three
metrics, with 80.76% precision, 80.10% recall, and 80.14% F-measure. The paired t-test
yielded a t-value of -3.63 and a p-value of 0.03, which is less than the alpha level used,
resulting in the rejection of the null hypothesis.
Conclusions
Based on the findings of the study, the researchers have arrived at the following
conclusions:
1. The tool performed more consistently when the classification considered
keywords, ending punctuation marks, and emojis than when it used plain text only.
2. The F-measures for the classification considering keywords, ending punctuation
marks, and emojis and for the plain-text-only classification show that there is a
difference between the two.
3. Because the p-value obtained is less than the alpha level used, it is confirmed
that EMCRYPT, which considers the combination of keywords, ending
punctuation marks, and emoticons, differs significantly in performance from
the plain-text-only approach. The study therefore rejected the null
hypothesis.
4. Following the rating system of Eboña (2013), the tool's performance is
interpreted as “Good” when considering the combination of keywords,
ending punctuation marks, and emoticons.
5. The Sentiment Analyzer achieved its objective of addressing key
challenges in sentiment analysis of cryptocurrency-related tweets. It
provided more precise interpretations of the emotions conveyed in tweets.
However, certain emotions were inaccurately labeled by the tool due to
limited training data and the tool's limited ability to detect different kinds of
emoticons, which led to misclassifications.
Recommendations
To improve the performance of the tool, the researchers suggest the following for
future work and development:
1. For improved surprise recognition, it is advised to include both positive and
negative surprises. Adding these types of surprises enhances the tool's
ability to better differentiate between various degrees or types of surprise,
ultimately leading to improved overall precision in identifying surprise
emotions.
2. Considering the limitations imposed on data scraping by X (formerly known
as Twitter), it is advisable to explore alternative platforms for gathering
data, which can provide more accessible and reliable sources of data.
3. It is recommended to increase the training data for a specific corpus. This
will increase the knowledge base of the classifier and could lead to better
classification of Polarity and Emotion.
4. The researchers recommend increasing the number of emoticons/emojis
that the system can detect in order to increase the performance of the
system.
References
2987. https://doi.org/10.1007/s10115-020-01449-0
statistics/
Aspect Based Emotion Analysis on Online User-Generated Reviews. (2018, July 1).
https://ieeexplore.ieee.org/document/8494183
Analysis. https://doi.org/10.5121/csit.2023.130302
Bharti, S. K., Varadhaganapathy, S., Gupta, R., Shukla, P., Bouye, M., Hinga, S. K., &
https://doi.org/10.1155/2022/2645381
Burton, N. B., [Neel Burton, M.D.]. (n.d.). What Are Basic Emotions? Neel Burton M.D.
https://www.psychologytoday.com/intl/blog/hide-and-seek/201601/what-are-
basic-emotions
Chen, B. (2023a, June 22). Emojis Aid Social Media Sentiment Analysis: Stop Cleaning
sentiment-analysis-stop-cleaning-them-out-
bb32a1e5fc8e#:~:text=Leverage%20emojis%20in%20social%20media%20senti
ment%20analysis%20to%20improve%20accuracy.&text=TL%3BDR%3A,incorpo
rate%20emojis%20in%20the%20loop
Chen, B. (2023b, June 22). Emojis Aid Social Media Sentiment Analysis: Stop Cleaning
sentiment-analysis-stop-cleaning-them-out-bb32a1e5fc8e
Cimino, A. N., & Dell’Orletta, F. (2016a). Tandem LSTM-SVM Approach for Sentiment
https://doi.org/10.4000/books.aaccademia.2003
Cimino, A. N., & Dell’Orletta, F. (2016b). Tandem LSTM-SVM Approach for Sentiment
https://doi.org/10.4000/books.aaccademia.2003
https://prowritingaid.com/punctuation-mark-express-strong-emotions
Dandannavar, P., Mangalwede, S. R., & Deshpande, S. M. (2019). Emoticons and Their
https://doi.org/10.1007/978-3-030-19562-5_19
Dang, N. C., Moreno, M. N., & De La Prieta, F. (2020). Sentiment Analysis Based on
https://doi.org/10.3390/electronics9030483
Dwivedi, D. N., & Vemareddy, A. (2023a). Sentiment Analytics for Crypto Pre and Post
Ferreira Araújo, R., Roschildt Pinto, A., & Ferrandin, M. (n.d.). Sentiment Identification
https://thescipub.com/pdf/jcssp.2023.619.628.pdf.
https://thescipub.com/pdf/jcssp.2023.619.628.pdf
Ghosh, M. (2022a, April 28). Philippines central bank to trial wholesale CBDC. Forkast.
https://forkast.news/headlines/philippines-central-bank-wholesale-cbdc/
Maier, M., & Lakens, D. (2022, April 1). Justify Your Alpha: A Primer on Two Practical
https://doi.org/10.1177/25152459221080396
Mason, R. (2022a, July 14). Philippines’ digital transformation could make it a new
transformation-could-make-it-a-new-crypto-hub
Montag, A. (2018, August 28). “HODL,” “whale” and 5 other cryptocurrency slang terms
cryptocurrency-slang-terms-mean.html
Navalan, E. (2022, September 26). Is the Philippines on track to becoming a crypto hub?
Forkast. https://forkast.news/is-philippines-becoming-crypto-hub/
Nandwani, P., & Verma, R. (2021, August 28). A review on sentiment analysis and
https://doi.org/10.1007/s13278-021-00776-6
Qi, Y., & Shabrina, Z. (2023). Sentiment analysis using Twitter data: a comparative
Quintet, J. (2023). Twitter Imposes Temporary Limits to Curb Data Scraping: Musk.
scraping/
Sagum, R., Navarro, M., & Victore, A. (2019). EMOSIS Sentiment Analysis on Tweets
Marks. International Journal of Recent Technology and Engineering, 8(4),
10289–10293. https://doi.org/10.35940/ijrte.d4518.118419
Sasmaz, E., & Tek, F. B. (2021a). Tweet Sentiment Analysis for Cryptocurrencies. In
(UBMK). https://doi.org/10.1109/ubmk52708.2021.9558914
Ensemble LSTM-GRU Model. (2022). IEEE Journals & Magazine | IEEE Xplore.
https://ieeexplore.ieee.org/abstract/document/9751065
https://www.semanticscholar.org/paper/EmoTag-%E2%80%93-Towards-an-
Emotion-Based-Analysis-of-Shoeb-
Raji/024efbeff09fdb26bb5da22310208f94aea05e0b
Ullah, M. S., Marium, S. M., Begum, S. M., & Dipa, N. S. (2020). An algorithm and
method for sentiment analysis using the text and emoticon. ICT Express, 6(4),
357–360. https://doi.org/10.1016/j.icte.2020.07.003
Valdivia, A., Luzón, M. V., Wang, Z., & Herrera, F. (2018, November 1). Consensus vote
Fusion. https://doi.org/10.1016/j.inffus.2018.03.007
View of Sentiment Analysis Based Direction Prediction in Bitcoin using Deep Learning
https://ijisae.org/index.php/IJISAE/article/view/1062/616
Wegrzyn-Wolska, K., Bougueroua, L., Yu, H., & Zhong, J. (2016). Explore the Effects of
Appendices
Appendix 1: Instrument
Experiment Paper
Materials:
a. Laptop/Computer
b. Microsoft Excel
c. Experiment Paper
d. Cryptocurrency-related tweets
POLARITY
(With combination of keyword, ending punctuation marks, and emoji)
Positive
Negative
EMOTION RECOGNITION
(With combination of keyword, ending punctuation marks, and emoji)
Actual
Emotion Happy Sad Surprise Anger Anticipation Fear
Category
Happy
Sad
Surprise
Anger
Anticipation
Fear
Total Expert
Label
Total
Predicted Emotion Category Predicted
Actual Emotion
Category Low Medium High
Low
Medium
High
Total Expert
Label
POLARITY
(Plain-text only)
Positive
Negative
EMOTION RECOGNITION
(Plain-text only)
Actual
Emotion Happy Sad Surprise Anger Anticipation Fear
Category
Happy
Sad
Surprise
Anger
Anticipation
Fear
Total Expert
Label
Total Predicted
Predicted Emotion Category
Actual Emotion
Category Low Medium High
Low
Medium
High
Appendix 2: Correspondence
Appendix 3: Ethical Clearance
Appendix 4: Screen Layout of the Tool
Screen capture of the proposed tool.
EmCrypt Analyzer Window
Sample Input and Output (Plain-text only)
Sentiment Chart
Appendix 5: Thesis Implementation Report
Introduction
Problem Statement
The researchers developed a tool for sentiment analysis with emotion and intensity
level recognition in cryptocurrency-related tweets considering the combination of
keywords, ending punctuation marks, and emoticons.
a. Precision
b. Recall
c. F-Measure
a. Precision
b. Recall
c. F-Measure
3. Is there a significant difference in sentiment analysis performance between using
a combination of keywords, ending punctuation marks, and emojis compared to
using plain text only for analyzing sentiments in cryptocurrency-related tweets?
Respondents:
The 1st respondent is Mr. Soren Louis Anore, an expert in cryptocurrency
trading with knowledge of digital currencies, market trends, and investment strategies. He
is currently a Media Analyst at Accenture.
The 2nd respondent is Mr. Kirck Michael Britos De Leon, a language practitioner
currently working on his Doctor of Philosophy degree in English Studies: Language at
the University of the Philippines – Diliman. He is skilled in areas such as translation,
interpretation, and linguistics.
The 3rd respondent is Dr. Rodrigo V. Lopiga, a faculty member at the Polytechnic
University of the Philippines, Department of Psychology, College of Social Sciences and
Development. He is an expert specializing in understanding human behavior and emotions.
Time Frame:
Activity Status Date
Chapter 1-3 Documentation Done Month of April – Month of May, 2023
Development of system Done Month of October – Month of December, 2023
Data Gathering Done Month of November – Month of December, 2023
Testing of system Done Month of December 2023
Chapter 4-5 Documentation Done Month of December 2023 – Month of January 2024
Mr. Kirck Michael Britos De Leon
• Day 1 (November 26, 2023) 7:30 PM-8:00 PM: Via Zoom Meeting
• Day 2 (November 27, 2023) 6:30 PM-7:30 PM: Via Zoom Meeting
• Day 3 (December 13, 2023) 7:30 PM-8:30 PM: Via Zoom Meeting
Implementation Procedure:
1. Manually gathered cryptocurrency-related tweets from X, formerly known as
Twitter.
2. Three experts manually annotated the Polarity, Emotion, and Intensity Level of the
cryptocurrency-related tweets.
3. Data collection and evaluation were conducted using a Majority Voting approach
by experts. In cases of disagreement among the three experts, the Language
Practitioner Expert decides the Polarity, and the Psychologist is responsible for
determining the Emotion and Intensity Level.
4. Data Acquisition: 900 Training, 450 Testing and 150 Evaluation
5. Input the annotated data from the experts into the system.
6. Filled out the experimental paper and compared the data annotated by the experts
with the outcomes generated by the tool.
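The labeling rule in step 3 can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function name `resolve_label`, the annotator keys, and the sample votes are hypothetical, and the manuscript does not describe how the rule was actually carried out.

```python
from collections import Counter


def resolve_label(votes, tie_breaker):
    """Return the majority label; if no label has a majority, defer to one expert.

    votes: dict mapping annotator role -> label
    tie_breaker: role whose label decides when the experts disagree
    """
    label, count = Counter(votes.values()).most_common(1)[0]
    if count > len(votes) / 2:
        return label
    return votes[tie_breaker]


# Hypothetical annotations for one tweet.
polarity_votes = {"trader": "positive", "linguist": "negative", "psychologist": "negative"}
emotion_votes = {"trader": "fear", "linguist": "sad", "psychologist": "anger"}

# Polarity disagreements go to the language practitioner,
# emotion and intensity disagreements go to the psychologist.
print(resolve_label(polarity_votes, "linguist"))
print(resolve_label(emotion_votes, "psychologist"))
```

With three annotators and two polarity classes a majority always exists, so the tie-breaker matters mainly for emotion and intensity, where all three experts can give different labels.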
Experiment Results
POLARITY
(With combination of keyword, ending punctuation marks, and emoji)
Positive 87 8 95 87 49 6 8
Negative 6 49 55 49 87 8 6
EMOTION RECOGNITION
(With combination of keyword, ending punctuation marks, and emoji)
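| Actual Emotion Category | Happy | Sad | Surprise | Anger | Anticipation | Fear | Total | TP | TN | FP | FN |
| Happy | 38 | 3 | 1 | 0 | 0 | 0 | 42 | 38 | 91 | 1 | 4 |
| Sad | 0 | 22 | 1 | 1 | 0 | 4 | 28 | 22 | 107 | 3 | 6 |
| Surprise | 1 | 0 | 15 | 1 | 1 | 0 | 18 | 15 | 114 | 6 | 3 |
| Anger | 0 | 0 | 0 | 6 | 0 | 1 | 7 | 6 | 123 | 3 | 1 |
| Anticipation | 0 | 0 | 2 | 1 | 32 | 0 | 35 | 32 | 97 | 1 | 3 |
| Fear | 0 | 2 | 2 | 0 | 0 | 16 | 20 | 16 | 113 | 5 | 4 |
| Total Expert Label | 39 | 27 | 21 | 9 | 33 | 21 | 150 | | | | |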
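The TP, FP, and FN columns of the emotion table can be re-derived from the six-by-six grid of counts. The sketch below is an interpretation, not the researchers' own computation: it assumes FN is the off-diagonal sum of a class's row and FP is the off-diagonal sum of its column, which matches the printed values for every class except Sad, where the derivation gives FP = 5 and the table lists 3. The printed TN values appear to equal the sum of the other classes' correct predictions, so the code reports that figure alongside the conventional one-vs-rest TN. All names are illustrative.

```python
LABELS = ["Happy", "Sad", "Surprise", "Anger", "Anticipation", "Fear"]

# Grid copied from the emotion table (keyword, punctuation, and emoji).
MATRIX = [
    [38, 3, 1, 0, 0, 0],
    [0, 22, 1, 1, 0, 4],
    [1, 0, 15, 1, 1, 0],
    [0, 0, 0, 6, 0, 1],
    [0, 0, 2, 1, 32, 0],
    [0, 2, 2, 0, 0, 16],
]


def per_class_counts(matrix):
    """Derive per-class counts from a square confusion matrix."""
    size = len(matrix)
    total = sum(sum(row) for row in matrix)
    trace = sum(matrix[i][i] for i in range(size))
    counts = []
    for i in range(size):
        tp = matrix[i][i]
        fn = sum(matrix[i]) - tp                  # rest of the row
        fp = sum(row[i] for row in matrix) - tp   # rest of the column
        counts.append({
            "tp": tp,
            "fp": fp,
            "fn": fn,
            # Assumption: how the table's TN column seems to be computed.
            "tn_as_tabulated": trace - tp,
            # Conventional one-vs-rest true negatives, for comparison.
            "tn_one_vs_rest": total - tp - fp - fn,
        })
    return counts


emotion_counts = dict(zip(LABELS, per_class_counts(MATRIX)))
for label, c in emotion_counts.items():
    print(label, c)
```

The two TN figures differ because the conventional definition also counts items of other classes that were confused among themselves, as long as the class in question was not involved.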
INTENSITY LEVEL RECOGNITION
(With combination of keyword, ending punctuation marks, and emoji)
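Predicted Emotion Category (columns Low, Medium, High)
| Actual Emotion Category | Low | Medium | High | Total Predicted | TP | TN | FP | FN |
| Low | 28 | 1 | 4 | 33 | 28 | 106 | 4 | 5 |
| Medium | 3 | 54 | 5 | 62 | 54 | 80 | 3 | 8 |
| High | 1 | 2 | 52 | 55 | 52 | 82 | 9 | 3 |
| Total Expert Label | 32 | 57 | 61 | 150 | | | | |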
POLARITY
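(Plain-text only)
| Polarity | Positive | Negative | Total | TP | TN | FP | FN |
| Positive | 84 | 12 | 96 | 84 | 43 | 11 | 12 |
| Negative | 11 | 43 | 54 | 43 | 84 | 12 | 11 |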
EMOTION RECOGNITION
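(Plain-text only)
| Actual Emotion Category | Happy | Sad | Surprise | Anger | Anticipation | Fear | Total | TP | TN | FP | FN |
| Happy | 38 | 1 | 2 | 0 | 3 | 0 | 44 | 38 | 84 | 5 | 6 |
| Sad | 0 | 24 | 1 | 0 | 0 | 2 | 27 | 24 | 98 | 8 | 3 |
| Surprise | 2 | 0 | 11 | 0 | 3 | 1 | 17 | 11 | 111 | 3 | 6 |
| Anger | 0 | 1 | 0 | 5 | 0 | 1 | 7 | 5 | 117 | 1 | 2 |
| Anticipation | 3 | 1 | 0 | 1 | 30 | 0 | 35 | 30 | 92 | 7 | 5 |
| Fear | 0 | 5 | 0 | 0 | 1 | 14 | 20 | 14 | 108 | 4 | 6 |
| Total Expert Label | 43 | 32 | 14 | 6 | 37 | 18 | 150 | | | | |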
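INTENSITY LEVEL RECOGNITION
(Plain-text only)
Predicted Emotion Category (columns Low, Medium, High)
| Actual Emotion Category | Low | Medium | High | Total Predicted | TP | TN | FP | FN |
| Low | 46 | 14 | 4 | 64 | 46 | 71 | 6 | 18 |
| Medium | 3 | 41 | 5 | 49 | 41 | 76 | 18 | 8 |
| High | 3 | 4 | 30 | 37 | 30 | 87 | 9 | 7 |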
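To compare the two settings at a glance, the correctly classified counts (the TP column of each table) can be summed and divided by the 150 evaluation items. The sketch below computes this overall accuracy as simple arithmetic on the tabulated values; it is not necessarily the metric the study reports, and the dictionary layout and names are illustrative.

```python
TOTAL_ITEMS = 150  # size of the evaluation set

# TP values copied from the six tables, in row order.
CORRECT = {
    "Polarity": {
        "combined": [87, 49],
        "plain_text": [84, 43],
    },
    "Emotion": {
        "combined": [38, 22, 15, 6, 32, 16],
        "plain_text": [38, 24, 11, 5, 30, 14],
    },
    "Intensity": {
        "combined": [28, 54, 52],
        "plain_text": [46, 41, 30],
    },
}


def accuracy(correct_per_class, total=TOTAL_ITEMS):
    """Share of items whose predicted label matches the expert label."""
    return sum(correct_per_class) / total


accuracies = {
    task: {setting: accuracy(tp) for setting, tp in settings.items()}
    for task, settings in CORRECT.items()
}

for task, result in accuracies.items():
    print(
        f"{task}: combined={result['combined']:.4f} "
        f"plain_text={result['plain_text']:.4f}"
    )
```

Here `combined` refers to the setting with keywords, ending punctuation marks, and emoji, and `plain_text` to the plain-text-only setting.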
Proof of Implementation
Cryptocurrency Trader Expert: Mr. Soren Louis Anore
Language Practitioner Expert: Mr. Kirck Michael Britos De Leon
Psychology Expert: Dr. Rodrigo V. Lopiga
Appendix 6:
Biographical Statement
Dayag, Jahren Hans P. He was born in Antipolo City on December 29, 2001. He
attended Miljohn Christian Academy for his basic education, Tomas Claudio Colleges for
junior high school, and STI College for senior high school. He is currently a
fourth-year student at the Polytechnic University of the Philippines. He is a
proficient Computer Science student with a keen interest and expertise in software
engineering and web development. He shows a solid grasp of technological principles and
applies them effectively to craft visually appealing and user-friendly designs. His
fields of interest in Computer Science are Software Engineering, Artificial Intelligence,
and Web Development.
Ebue, Lyndon Jeff E. He was born in Masinloc, Zambales on December 23, 2001.
He attended Taltal Elementary School for his basic education and continued at Northern
Zambales College, Inc. for his junior and senior high school years. He is currently a
fourth-year student at the Polytechnic University of the Philippines. He is a
knowledgeable and detail-oriented Computer Science student with a strong aptitude for
UI design, graphic design, and photography. He possesses a deep understanding of
technological principles and is able to apply them to create visually appealing and
user-friendly designs. His fields of interest in Computer Science are UI/UX and Software
Engineering.
Tumbaga, John Jeffrei O. He was born on October 29, 2001, in Valenzuela City.
He attended Canumay West Elementary School for his basic education, continued at
Canumay West National High School for junior high school, and progressed to
Pamantasan ng Lungsod ng Valenzuela for senior high school. He is currently a
fourth-year student at the Polytechnic University of the Philippines, taking the
Bachelor of Science in Computer Science program. He is a diverse tech enthusiast who is
eager to blend creativity with analytical skills to craft innovative digital solutions.
His fields of interest in Computer Science are Data Analysis, Website Development,
Software Engineering, and UI/UX Design.