Group 5 Emcrypt Thesis Manuscript

EMCRYPT: SENTIMENT ANALYSIS ON CRYPTOCURRENCY-RELATED TWEETS
WITH EMOTION AND INTENSITY LEVEL RECOGNITION

A Thesis
Presented to the Faculty of the College of Computer and Information Sciences
Polytechnic University of the Philippines
Sta. Mesa, Manila

In Partial Fulfillment of the Requirements for the Degree


Bachelor of Science in Computer Science

Casinsinan, Cj C.

Dayag, Jahren Hans P.

Ebue, Lyndon Jeff E.

Tumbaga, John Jeffrei O.

January 2024
ABSTRACT

Sentiment analysis is a field of natural language processing that studies opinions by
determining the polarity, emotion, and intensity level of a sentence. This study addresses
a key challenge in sentiment analysis of cryptocurrency-related tweets: accounting for the
combination of keywords, ending punctuation marks, and emojis present in a tweet, all of
which play a significant role in emotion recognition and intensity level recognition. The
approach was applied to tweets expressing opinions about cryptocurrency. This addressed the
objective of the research, which is to determine whether there is a significant difference
in performance between considering the combination of keywords, ending punctuation marks,
and emojis and using plain text only.
The cryptocurrency-related tweets, which served as inputs, were first subjected
to pre-processing. The combination of LSTM and SVM served as the classifier for
determining the polarity and emotion of the cryptocurrency-related tweets. The tweets then
underwent intensity level recognition, which is determined by the intensifiers within the
tweets based on the Rules for Tagging an Emotional Sentence with Intensity, considering
emotion weights and ending punctuation marks. The study used precision, recall, and
F-measure to evaluate the overall performance of the tool under both the combined method
and plain text only. According to the Rating System of Eboña (2013), the developed tool
scored 87.29%, a "good" performance, when considering the combination of keywords, ending
punctuation marks, and emojis, and 80.14%, a "satisfactory" performance, when using plain
text only. Based on the result of the T-test, the tool's P-value is 0.03, which is less
than the alpha level used. The null hypothesis was therefore rejected, confirming that
considering the combination of keywords, ending punctuation marks, and emoticons yields a
statistically significant difference in performance compared with using plain text only.
TABLE OF CONTENTS

Page
Title Page i
Abstract ii
Table of Contents iii
List of Tables vi
List of Figures viii
1. The Problem and Its Setting

Introduction 1
Theoretical Framework 4
Conceptual Framework 7
Statement of the Problem 8
Hypothesis 8
Scope and Limitations of the Study 9
Significance of the Study 12
Definition of Terms 13

2. Review of Literature and Studies

Related Literature 14
Cryptocurrency 14
Sentiment Analysis on Cryptocurrency 15
Related Studies 22
Cryptocurrency 22
X (formerly known as Twitter) 23
Emoji (Emoticons) 24
Emotag1200 25
Emotion Recognition 26
Intensity Level Recognition 29
Keyword Spotting Method 29
LSTM-SVM Approach for Sentiment Analysis 30
Neutrality in Sentiment Analysis 31
Recognition of Emotion from Microblog (REM) 32
Deep Learning 33
Sentiment Analysis 34
Sentiment Analysis using X (formerly known as Twitter) Data 34
Sentiment Analysis in Cryptocurrency 35
Sentiment Analysis with Emoticons 37
Sentiment Analysis with Emotion and Intensity level recognition 38
Sentiment Analysis using LSTM-GRU Ensembled 39
Sentiment analysis using Recognition of emotion from microblogs (REM) 39
Synthesis of the Study 41

3. Methodology

Research Design 42
Source of Data 43
Instruments 45
System Architecture 45
Development Details 54
Research Instrument 54
Data Generation/ Data Gathering Procedure 55
Ethical Considerations 56
Statistical Data Analysis 56
Confusion Matrix 56
Evaluation Metrics 57
Precision 57
Recall 58
F-measure 58
Hypothesis Testing 58
Paired T-test 58
Rating System 59

4. Results and Discussion 60


5. Summary of Findings, Conclusions, and Recommendations

Summary of Findings 71
Conclusions 72
Recommendations 73

References 74
Appendices

Appendix 1: Research Instrument 78


Experiment Paper 78
Appendix 2: Correspondence 81
Appendix 3: Ethical Clearance 84
Appendix 4: Screen Layout of the Tool 85
Appendix 5: Thesis Implementation Result 88
Experiment Paper Result 91
Proof of Implementation 94
Appendix 6: Biographical Statement 97
LIST OF TABLES

Number Title Page

1 Emojis/Emoticons included in the tool 9

2 List of Emoji/Emoticons that is used in Cryptocurrency World 17

3 These are the emojis depicting common day-to-day expressions 20

4 Rules for tagging an emotional anchoring vector with intensity 29

5 Rules for Tagging an Emotional Sentence with Intensity considering Emotion Weights and Ending Punctuation Marks 49

6 Sample Confusion Matrix 56

7 Sample Summary of Overall Performance 59

8 Rating System for the Parameters: Precision, Recall, and F-Measure (Eboña 2013) 59

9 Division of Cryptocurrency-related Tweets per phase 60

10 Polarity results considering the combination of keywords, ending punctuation marks, and emoticons. 61

11 Emotion results considering the combination of keywords, ending punctuation marks, and emoticons. 62

12 Intensity Level results considering the combination of keywords, ending punctuation marks, and emoticons. 63

13 Summary Result of the performance of the tool considering the combination of Keywords, Ending Punctuation Marks, and Emoticons 64

14 Polarity results using Plain-Text only 65

15 Emotion results using Plain-Text only 66

16 Intensity Level results using Plain-Text only 67

17 Summary Result of the performance of the tool using Plain-Text only. 67

18 Overall performance of the tool using combination of Keywords, Ending Punctuation Marks, and Emoticons 68

19 Overall performance of the tool using Plain-Text only. 69

20 Paired T-Test Result of the tool 70


LIST OF FIGURES

Number Title Page

1 Procedure for Sentiment Analysis 4

2 Recognition of Emotion from Microblogs (REM) 5

3 LSTM-SVM architecture 5

4 Keywords Based (Keyword Spotting) Method 6

5 Conceptual Framework of the System 7

6 Elon Musk's tweets on cryptocurrency and how Bitcoin prices change accordingly. 15

7 Main steps of a keyword-spotting technique 27

8 Text-based Emotion Recognition Using a Deep Learning Approach 28

9 System Architecture of the Tool 45

10 EmCrypt Training (Combination of Keywords, Ending Punctuation Marks, and Emoticons) 51

11 EmCrypt Training (Plain Text Only) 51

12 Sequence Diagram of the Tool 52

13 Pre-processing Diagram 53

14 Overall results and comparison of the tool performance 69
Chapter 1

THE PROBLEM AND ITS SETTING

Introduction

In today's digital age, cryptocurrencies have become extremely popular all around
the world. They have brought about major changes in the financial world and have
transformed the way people interact with digital assets. As of 2023, there are about 420
million crypto users globally, with over 20,000 cryptocurrencies in circulation worldwide
(Ariella, 2023). Cryptocurrency has gained significant traction in the Philippines, which
now ranks second in the world in cryptocurrency adoption (Navalan, 2022). This growth
aligns with the country's plan to modernize its financial industry, with the government
planning to launch its digital currency project later this year. The new President of the
Philippines, Ferdinand Marcos Jr., advocates for digital and technological advancements,
emphasizing their importance in the widespread adoption of cryptocurrency in the future
(Ghosh, 2022). Additionally, the government and central bank are collaborating with experts
to ensure a secure environment for investors and stakeholders utilizing blockchain or
crypto technology (Mason, 2022).

Social media platforms, specifically X (formerly known as Twitter), have emerged as
important venues for cryptocurrency traders, enthusiasts, and investors to express
and share their opinions, exchange market information, and analyze the latest trends in
this dynamic industry. Analyzing people's sentiments and emotions can therefore help in
determining market value. Cryptocurrency investors anticipate both profits and losses due
to price fluctuations. To aid their decision making, they rely on various tools to gauge
public sentiment toward cryptocurrency, since its demand is influenced by public opinion
and government policies (Colianni et al., 2015). Studies reveal that tweets with positive
sentiments significantly affect cryptocurrency demand, and vice versa (Wołk, 2019).

Sentiment analysis for cryptocurrency has become an increasingly important way to
inform smart investment decisions. It provides broad market insights that can be useful
for forming trading strategies (Dwivedi & Vemareddy, 2023). In recent years, users of
various social media platforms have become accustomed to using a set of graphic symbols to
describe their feelings in online interactions. Emojis, the emotive icons found on all
platforms, have become a universal language. In addition to text, emojis and punctuation
marks are increasingly used by people to express feelings or sentiments that cannot be
adequately communicated in words. Most recent cryptocurrency sentiment analysis
systems do not consider keywords, emojis, and ending punctuation marks as part of the
analysis. However, a combination of keywords, ending punctuation marks, and emojis
could lead to better performance in sentiment analysis (Sagum, Navarro, & Jasper, 2019).

The existing research in cryptocurrency sentiment analysis has demonstrated the
effectiveness of traditional methods, such as lexicon-based approaches and supervised
machine-learning models, using plain-text tweets only (Aslam et al., 2022). Given
the potential impact of these online matters on cryptocurrency markets, it is essential to
gain a thorough understanding of the underlying ideas and sentiments expressed in tweets
about cryptocurrencies. This study focuses on sentiment analysis with emotion and
intensity level recognition in cryptocurrency-related tweets, with a particular emphasis on
the role of a combination of keywords, ending punctuation marks, and emojis in shaping
and conveying these sentiments and emotions.

Analyzing the combination of keywords, emojis, and ending punctuation marks in posts on
X (formerly known as Twitter) can significantly improve the performance of sentiment
analysis. The phrase "I am so freaking happy rn, Bitcoin prices just shot up" reads very
differently from "😂😂😂 I am so freaking happy rn!?! Bitcoin prices just shot up 🚀🚀🚀!!!",
and the emotion in the text can be "happy" or "anticipation". Adding the combination of
keywords, emojis, and punctuation marks to a sentiment analysis could improve both its
performance and the understanding of the sentiment of a message.
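
To make the contrast between the two example tweets concrete, the following is a minimal sketch of how the three kinds of cues could be separated from a tweet. It is not the tool's actual implementation: the function name `extract_features` and the emoji character ranges are assumptions made for illustration, and the ranges only roughly approximate the set of Unicode emojis.

```python
import re

# Assumption: a rough emoji matcher covering the main pictographic blocks.
# It is an approximation, not a complete Unicode emoji definition.
EMOJI_PATTERN = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]")

# The three ending punctuation marks considered in the study: . ? !
ENDING_PATTERN = re.compile(r"[.?!]+$")


def extract_features(tweet: str) -> dict:
    """Split a tweet into plain text, emojis, and its trailing punctuation run."""
    emojis = EMOJI_PATTERN.findall(tweet)
    text = EMOJI_PATTERN.sub("", tweet).strip()
    match = ENDING_PATTERN.search(text)
    ending = match.group(0) if match else ""
    return {"text": text, "emojis": emojis, "ending_punctuation": ending}


plain = extract_features("I am so freaking happy rn, Bitcoin prices just shot up")
rich = extract_features(
    "😂😂😂 I am so freaking happy rn!?! Bitcoin prices just shot up 🚀🚀🚀!!!"
)
print(plain["emojis"], repr(plain["ending_punctuation"]))
print(rich["emojis"], repr(rich["ending_punctuation"]))
```

The plain-text version yields no emojis and no ending punctuation, while the second version yields six emojis and a run of three exclamation points, which is the extra signal the combined method can use.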

This study aims to analyze sentiments, emotions, and intensity levels by utilizing
tweets associated with cryptocurrency. These tweets are commonly utilized for predicting
cryptocurrency market prices. It aims to provide insight into the sentiments surrounding
cryptocurrencies on social media platforms, particularly X (formerly known as Twitter), and
to comprehend how the combination of keywords, emojis, and punctuation marks has a
significant effect on the performance of the tool in analyzing sentiments. This study has a
specific focus on utilizing supervised machine learning models to predict people's
sentiments, emotions, and intensity levels regarding the cryptocurrency market. X
(formerly known as Twitter) is widely used as a platform for expressing opinions and
thoughts on specific topics, making it a valuable source of data for this analysis.

The goal of the proposed study is to create a tool that has improved performance
for sentiment analysis with an emotion and intensity level recognition for tweets about
cryptocurrencies by combining advanced machine learning techniques, particularly Long
Short-Term Memory (LSTM) with Support Vector Machine (SVM) as Classifier. To
accomplish this, the proponents must first compile a complete dataset of tweets containing
textual content, punctuation marks, and emojis. The dataset is then analyzed using
machine learning and natural language processing (NLP) techniques, with an emphasis
on sentiment and emotion extraction and classification. The proponents intend to identify
patterns and trends in the emotional expressions of X (formerly known as Twitter) users
who are interested in cryptocurrencies, thereby providing valuable insight into the potential
effects of these emotions and sentiments on the dynamics of the cryptocurrency market.

Theoretical Framework

This paper's theoretical framework begins with sentiment analysis that considers the
combination of keywords, ending punctuation marks, and emojis, which is the focus of this
research study. It is followed by the processes carried out by the algorithms used: the
Long Short-Term Memory (LSTM) algorithm combined with the Support Vector Machine (SVM)
algorithm.

Figure 1. Procedure for Sentiment Analysis

According to Kumar & Bhaskari (2018), sentiment analysis is a popular technique
used to categorize documents or text into various polarities, such as positive, negative, or
neutral. It finds extensive application in assessing reviews, particularly on social media
platforms. Reviews can be in the form of textual feedback or ratings, which undergo
various evaluations to determine their quality. Social networking sites have also
incorporated sentiment analysis to gauge people's sentiments. X (formerly known as
Twitter), for instance, is a prime example where a multitude of opinionated texts surfaces.
Numerous studies have been conducted to evaluate tweets based on their respective
domains. Figure 1 illustrates the process of sentiment analysis.

Figure 2. Recognition of Emotion from Microblogs (REM)

According to Islam et al. (2021), Recognition of Emotion from Microblogs (REM) is an
algorithm for recognizing emotion in microblog posts whose text contains emoticons. REM is
used to identify and understand the emotions conveyed through emoticons in microblog
posts. The researchers employ Long Short-Term Memory (LSTM) as a deep learning model to
capture the sequential nature of the emoticon-text combination. REM aims to recognize the
emotions associated with emoticons, as emoticons often serve to express emotions in
text-based communication. By training the LSTM model on a dataset of emoticons and their
corresponding emotions, the researchers leverage the power of recurrent neural networks to
learn the contextual and emotional information conveyed by emoticons in the given
microblog texts. This approach allows for the automatic recognition of emotions in
microblog posts, enhancing the understanding of sentiment and emotional content in online
communication.

Figure 3. LSTM-SVM architecture

Cimino & Dell'Orletta (2023) state that combining LSTM and SVM in sentiment
analysis offers the benefits of LSTM's ability to capture sequential information and
semantic meaning in text, along with SVM's robust classification framework. This
combination improves the model's accuracy in classifying sentiments, even for complex
expressions. During training, the model learns representations from labeled data, with the
LSTM layer capturing features and the SVM component classifying them into sentiment
classes. Through iterative parameter optimization, the model's performance is enhanced.
Once trained, the LSTM-SVM model can be used to analyze sentiment in new text inputs
by extracting features through the LSTM layer and assigning sentiment labels using the
SVM classifier.
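
For reference, the feature-extraction and classification stages described above can be written out as follows. This is the standard textbook LSTM formulation paired with a linear one-vs-rest SVM, given here as an assumption about the general shape of such an architecture rather than the exact configuration used by Cimino & Dell'Orletta or by this study. The symbols are notation introduced for this illustration: $x_t$ is the embedding of the $t$-th token, $h_t$ and $c_t$ are the hidden and cell states, $\sigma$ is the logistic function, $\odot$ is element-wise multiplication, and $T$ is the number of tokens.

```latex
\begin{aligned}
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) \\
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) \\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\
h_t &= o_t \odot \tanh(c_t) \\[4pt]
\hat{y} &= \arg\max_{k} \left( w_k^{\top} h_T + b_k \right)
\end{aligned}
```

Under this reading, the final hidden state $h_T$ serves as the feature vector for the tweet, and each class $k$ has its own SVM weight vector $w_k$ and bias $b_k$; the predicted label is the class with the largest decision value.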

Figure 4. Keywords Based (Keyword Spotting) Method

According to Padme & Kulkarni (2018), as sentiment analysis (SA) develops within the field
of Natural Language Processing, studies have advanced beyond polarity. Recently, the
aim of SA has developed from determining the polarity to knowing the attitude of the
speaker. The attitude can be the speaker's evaluation, affective state, or emotional
communication. One example is emotional state detection, such as "sad", "happy", and
"angry".

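
The keyword-spotting idea behind emotional state detection can be sketched as a simple lexicon lookup. The lexicon below is hypothetical and tiny; it is meant only to show the mechanism, not to reproduce the keyword list used by the tool. The function name `spot_emotions` is likewise an assumption.

```python
# Hypothetical mini-lexicon for illustration only; a real keyword list
# would be far larger and domain-specific.
EMOTION_KEYWORDS = {
    "happy": "happy",
    "glad": "happy",
    "sad": "sad",
    "crying": "sad",
    "angry": "angry",
    "furious": "angry",
}


def spot_emotions(sentence: str) -> dict:
    """Count how many keywords of each emotion appear in the sentence."""
    counts = {}
    for token in sentence.lower().split():
        word = token.strip(".,!?\"'")  # drop surrounding punctuation
        emotion = EMOTION_KEYWORDS.get(word)
        if emotion is not None:
            counts[emotion] = counts.get(emotion, 0) + 1
    return counts


print(spot_emotions("I am so happy and glad, not sad!"))
```

Note that this sketch counts "sad" even though it is negated in the example sentence, which illustrates why plain keyword spotting is usually combined with other cues.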
Conceptual Framework

Figure 5. Conceptual Framework of the System

The researchers utilize a conceptual model to depict the study's variables. The
feature selection phase includes keywords, ending punctuation marks, and emojis as
independent variables, and the statistical treatment of the outputs relies on these
variables. Intervening variables, located in the middle section, include combinations of
emojis and punctuation marks as well as repeated emojis and punctuation marks; they also
affect the outputs as the input undergoes the process. The dependent variable is the
system's output, derived from the input section: the polarity (positive, negative, or
neutral), emotion (happy, sad, anger, surprise, anticipation, fear), and intensity level
(low, medium, high).
Modifying the independent variables will result in changes in the performance of the tool.

Statement of the Problem

The researchers aim to develop a tool for sentiment analysis with emotion and
intensity level recognition in cryptocurrency-related tweets considering the combination of
keywords, ending punctuation marks, and emoticons.

Specifically, this study is intended to answer the following sub-problems:

1. What is the performance of the tool in cryptocurrency-related tweets considering
the combination of keywords, ending punctuation marks, and emoticons in terms of:

a. Precision

b. Recall

c. F-Measure

2. What is the performance of the tool in cryptocurrency-related tweets using Plain-text
only in terms of:

a. Precision

b. Recall

c. F-Measure

3. Is there a significant difference in sentiment analysis performance between using
a combination of keywords, ending punctuation marks, and emojis compared to
using plain text only for analyzing sentiments in cryptocurrency-related tweets?
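
As a reminder of how the three measures in sub-problems 1 and 2 relate to each other, the sketch below computes them for a single class from lists of true and predicted labels. It assumes the balanced F-measure (F1); the function name and the sample labels are made up for illustration and are not data from this study.

```python
def precision_recall_f1(y_true, y_pred, positive):
    """Compute precision, recall, and balanced F-measure for one class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if p == positive and t == positive)
    fp = sum(1 for t, p in pairs if p == positive and t != positive)
    fn = sum(1 for t, p in pairs if p != positive and t == positive)

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Made-up labels for five tweets (illustrative only).
y_true = ["pos", "pos", "neg", "neg", "pos"]
y_pred = ["pos", "neg", "neg", "pos", "pos"]
print(precision_recall_f1(y_true, y_pred, positive="pos"))
```

Here two of the three predicted positives are correct and two of the three actual positives are found, so precision, recall, and F-measure all come out to two thirds.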

Hypothesis

Ho: There is no significant difference in sentiment analysis performance between
using a combination of keywords, ending punctuation marks, and emojis compared to
using plain text only for analyzing sentiments in cryptocurrency-related tweets.
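
The null hypothesis is tested by comparing paired scores from the two conditions. The sketch below computes only the paired t statistic and its degrees of freedom using the Python standard library. The scores are invented for illustration and are not results from this study. Obtaining the P-value additionally requires the t distribution, for example from a statistics package such as SciPy (`scipy.stats.ttest_rel`).

```python
import math
import statistics


def paired_t_statistic(combined, plain):
    """Return (t, degrees_of_freedom) for paired samples of equal length."""
    if len(combined) != len(plain) or len(combined) < 2:
        raise ValueError("need two samples of equal length with at least 2 pairs")
    diffs = [c - p for c, p in zip(combined, plain)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    if sd_diff == 0:
        raise ValueError("all differences are identical; t is undefined")
    t = mean_diff / (sd_diff / math.sqrt(n))
    return t, n - 1


# Invented scores (in percent) for three evaluation runs, illustrative only.
combined_scores = [90, 80, 85]
plain_scores = [80, 75, 70]
t, df = paired_t_statistic(combined_scores, plain_scores)
print(round(t, 4), df)
```

The null hypothesis is rejected when the P-value associated with this t statistic and its degrees of freedom falls below the chosen alpha level.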

Scope and Limitations of the Study

This study focuses on conducting sentiment analysis on X (formerly known as Twitter)
posts related to cryptocurrency, and the sample set was confined to English-language
tweets only. The researchers utilized a sample of tweets using the hashtags #crypto and
#cryptocurrency. Tweets were filtered by these hashtags (#) to avoid collecting incorrect
data. The tool assessed a tweet's polarity (positive or negative), six fundamental
emotions (happy, sad, angry, surprise, anticipation, fear), three ending punctuation
marks, namely the period (.), question mark (?), and exclamation point (!), and intensity
level (low, medium, high). The neutral label is excluded because it is considered to be
semantically less informative than the positive and negative labels. The classification of
emotion falls under Happiness, Sadness, Surprise, Anger, Anticipation, and Fear. The study
also included emojis in the analysis of the tweets. The input is limited to one sentence
of up to 280 characters. Here is the list of the emojis/emoticons included:

Table 1

Emojis/Emoticons included in the tool

Emoji Name Emoji Name


🌈 rainbow 💓 beating heart
🌙 crescent moon 💔 broken heart
🌚 new moon face 💕 two hearts
🌞 sun with face 💖 sparkling heart
🌟 glowing star 💗 growing heart
🌷 tulip 💘 heart with arrow
🌸 cherry blossom 💙 blue heart
🌹 rose 💚 green heart
🌺 hibiscus 💛 yellow heart
🍀 four leaf clover 💜 purple heart
🍃 leaf fluttering in wind 💞 revolving hearts
🍕 pizza 💤 zzz
🍻 clinking beer mugs 💥 collision
🎀 ribbon 💦 sweat droplets
🎈 balloon 💩 pile of poo
🎉 party popper 💪 flexed biceps
🎤 microphone 💫 dizzy
🎥 movie camera 💭 thought balloon
🎧 headphone 💯 hundred points
🎵 musical note 💰 money bag
🎶 musical notes 📷 camera
👀 eyes 🔞 no one under eighteen
👅 tongue 🔥 fire
👇 backhand index pointing down 🔫 pistol
👈 backhand index pointing left 🔴 red circle
👉 backhand index pointing right 😀 grinning face
👊 oncoming fist 😁 beaming face with smiling eyes
👋 waving hand 😂 face with tears of joy
👌 OK hand 😃 grinning face with big eyes
👍 thumbs up 😄 grinning face with smiling eyes
👎 thumbs down 😅 grinning face with sweat
👏 clapping hands 😆 grinning squinting face
👑 crown 😇 smiling face with halo
👻 ghost 😈 smiling face with horns
💀 skull 😉 winking face
💁 person tipping hand 😊 smiling face with smiling eyes
💃 woman dancing 😋 face savoring food
💋 kiss mark 😌 relieved face
💎 gem stone 😍 smiling face with heart-eyes
💐 bouquet 😎 smiling face with sunglasses
😐 neutral face 😏 smirking face
😑 expressionless face 🙅 person gesturing NO
😒 unamused face 🙆 person gesturing OK
😓 downcast face with sweat 🙈 see-no-evil monkey
😔 pensive face 🙊 speak-no-evil monkey
😕 confused face 🙋 person raising hand
😖 confounded face 🙌 raising hands
😘 face blowing a kiss 🙏 folded hands
😙 kissing face with smiling eyes ‼ double exclamation mark
😚 kissing face with closed eyes ↩ right arrow curving left
😛 face with tongue ↪ left arrow curving right
😜 winking face with tongue ▶ play button
😝 squinting face with tongue ◀ reverse button
😞 disappointed face ☀ sun
😟 worried face ☑ check box with check
😠 angry face ☝ index pointing up
😡 pouting face ☺ smiling face
😢 crying face ♥ heart suit
😣 persevering face ♻ recycling symbol
😤 face with steam from nose ⚡ high voltage
😥 sad but relieved face ⚽ soccer ball
😨 fearful face ✅ check mark button
😩 weary face ✈ airplane
😪 sleepy face ✊ raised fist
😫 tired face ✋ raised hand
😬 grimacing face ✌ victory hand
😭 loudly crying face ✔ check mark
😰 anxious face with sweat ✨ sparkles
😱 face screaming in fear ❄ snowflake
😳 flushed face ❌ cross mark
😴 sleeping face ❗ exclamation mark
😶 face without mouth ❤ red heart
😷 face with medical mask ➡ right arrow
😹 cat with tears of joy ⬅ left arrow
😻 smiling cat with heart-eyes ⭐ star
😲 Astonished Face 😮 Face with Open Mouth
😵 Dizzy Face 💭 Thought Balloon
❗ Exclamation Mark ⚡ High Voltage
🎊 Confetti Ball 🙁 Slightly Frowning Face
🔪 Hocho 🌕 Full Moon
🚀 Rocket 📉 Down Trend
🤣 Rolling on the Floor Laughing 💸 Money with Wings

This study has several limitations. Firstly, the sentiment analysis covers a total of
1500 tweets, analyzing main posts as the primary source of data and replies in the
comments as a secondary source. The tool accepts only Excel and CSV files for upload.
Secondly, the study only considers posts using cryptocurrency hashtags. Lastly, text that
ends with two or more punctuation marks or is highly emotional cannot always be assumed to
be sarcastic. Therefore, this study does not address the detection of figurative language,
such as sarcasm, prior to the assessment procedure for emotion recognition.
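
The scope rules stated in this section (the two hashtags and the 280-character limit) can be expressed as a simple input check. This is a minimal sketch under those stated rules only: the function name `is_in_scope` is hypothetical, and the English-language and single-sentence requirements are not checked here.

```python
# Hashtags and length limit as stated in the scope of the study.
REQUIRED_HASHTAGS = ("#crypto", "#cryptocurrency")
MAX_LENGTH = 280


def is_in_scope(tweet: str) -> bool:
    """Return True if the tweet fits the length limit and carries a required hashtag."""
    if len(tweet) > MAX_LENGTH:
        return False
    tokens = tweet.lower().split()
    # Strip trailing punctuation so that "#crypto!" still counts as "#crypto".
    return any(token.rstrip(".,!?") in REQUIRED_HASHTAGS for token in tokens)


print(is_in_scope("Bitcoin is up today! #crypto"))
print(is_in_scope("Bitcoin is up today!"))
```

Matching whole tokens rather than substrings means a different hashtag that merely starts with the same letters, such as one with extra characters appended, is not accepted by this sketch.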

Significance of the Study

Sentiment analysis with emotion and intensity level recognition continues to
present difficulties. The study's key findings may be valuable to the following
beneficiaries in particular:

STUDENTS. By examining the polarity and emotion of cryptocurrency-related tweets,
students can gain insights into the overall market sentiment. This information can help
them understand current trends, market behavior, and potential investment
opportunities.

RESEARCHERS. The study can support further research on specific issues such as emotion
recognition in the field of natural language processing. Researchers can build on its
recommendations in future studies.

CRYPTOCURRENCY TRADERS. Cryptocurrency-related tweets offer a platform for
community discussions, debates, and information sharing. By analyzing the sentiment
and emotion of these tweets, traders can gain insights into the sentiments and opinions
of the crypto community, potentially uncovering valuable insights or contrarian
viewpoints.

COMPANIES/MARKET. By analyzing the sentiment and emotions expressed in
cryptocurrency tweets related to a company or its products and services, businesses can
gain an understanding of how their brand is perceived by the crypto community. This
feedback can help identify strengths, weaknesses, and areas for improvement in their
brand image.

SOCIAL MEDIA. Social media users may also evaluate the study based on their own
observations. Because the researchers used social media as the study's domain, people
online are expected to take an interest in the study's findings, which should prove
instructive.

Definition of Terms

The following is a list of the terms that are used in this research study:

Cryptocurrency - a digital or virtual form of currency used in trading, which is a
topic commonly seen on X (formerly known as Twitter).

Emoticon Converter - a process of converting emoticons into text.

Emotion Recognition - the identification of the emotion of a tweet as happy,
sad, fearful, surprised, angry, or anticipation.

Ending Punctuation Marks - the period (.), question mark (?), and exclamation
point (!).

Hashtag - a symbol (#) used to indicate the subject of the tweet.

Intensity Level - the degree of a sentiment, indicating whether it is low, medium, or high.

Lemmatization - a process of reducing words to their root word.

Polarity - the sentiment label assigned to each piece of content to capture the
subjective nature of the text data.

Remove Stopwords - a process that keeps emotional words only and discards
unnecessary words.

Sentiment Analysis - a process for classifying the polarity, emotion, and
intensity level of the tweets.

Social Media - online platforms and websites that allow users to create
or share information.

Tokenization - a process of breaking a stream of text up into words, phrases, and
symbols.

Tweet - a short user-generated message that contains reactions, thoughts, or
opinions in the form of text, image, or video.

X (formerly known as Twitter) - a social media platform where people share or post
whatever is on their minds.
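
Several of the terms defined above (Emoticon Converter, Tokenization, Remove Stopwords) name pre-processing steps that can be chained together. The following is a minimal sketch of such a chain, not the tool's actual pre-processing code. The emoji names are taken from Table 1, while the stopword list and all function names are hypothetical. Lemmatization is omitted because it normally relies on a dictionary or an NLP library.

```python
import re

# A few emoji-to-name pairs taken from Table 1 (not the full table).
EMOJI_NAMES = {
    "🚀": "rocket",
    "🔥": "fire",
    "💰": "money bag",
    "😂": "face with tears of joy",
}

# Hypothetical stopword list for illustration; real lists are much longer.
STOPWORDS = {"i", "am", "the", "a", "an", "is", "are", "to", "of", "just"}


def convert_emojis(text: str) -> str:
    """Emoticon Converter: replace each known emoji with its textual name."""
    for emoji, name in EMOJI_NAMES.items():
        text = text.replace(emoji, f" {name} ")
    return text


def tokenize(text: str) -> list:
    """Tokenization: split into words (optionally hashtagged) and ending marks."""
    return re.findall(r"#?\w+|[.?!]", text.lower())


def remove_stopwords(tokens: list) -> list:
    """Remove Stopwords: drop tokens that appear in the stopword list."""
    return [token for token in tokens if token not in STOPWORDS]


def preprocess(tweet: str) -> list:
    return remove_stopwords(tokenize(convert_emojis(tweet)))


print(preprocess("Bitcoin is up 🚀!"))
```

Ending punctuation marks are kept as separate tokens in this sketch so that they remain available to later steps that take them into account.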

CHAPTER 2
REVIEW OF LITERATURE AND STUDIES

Related Literature
Cryptocurrency is a type of digital money that is created through
cryptographic techniques using binary data. It lets people buy, sell, or trade it securely
without needing a government or bank. While you can use cryptocurrencies to buy
things, many people also use them as a way to invest money for a short or long
time. There are lots of different cryptocurrencies available, but the most well-known
and expensive one is Bitcoin, which currently costs more than $19,000.
Cryptocurrency is a growing industry worldwide that started about 13 years ago. It
has become popular and important, with more than 20,000 digital currencies being
used today. The most well-known cryptocurrencies are Bitcoin, Ethereum, and
Tether. Around 200,000 Bitcoin transactions happen every day as of November
2022. In 2023, there are about 45 million people in the United States and 420
million people globally who use cryptocurrency. About 16% of Americans have
used, invested in, or traded cryptocurrencies. The total value of blockchain
technology worldwide is currently $10.02 billion as of 2022 and is expected to
reach $67.4 billion by 2026, with an annual growth rate of 68.4%. (Ariella, 2023)

The Philippines has put a lot of focus on blockchain technology and its
potential uses. The country's central bank has noticed a significant increase in the
adoption of cryptocurrencies, especially during the Covid-19 pandemic. The long
period of isolation introduced the concept of digital tokens to the country's growing
middle class and tech-savvy millennials through popular blockchain games like
Axie Infinity. At one point, 40% of the game's players were from the Philippines.
Bitcoin trading volumes also reached new highs on certain crypto exchanges in
July this year. The number of cryptocurrency transactions grew by 362% compared
to the previous year, with a total value of around $1.82 billion. As a result, the
Philippines now ranks second in the Global Crypto Adoption Index, indicating high
individual interest in digital assets. In response, the country's central bank plans to
launch a digital currency project in late 2022 to improve payments and aid in
economic recovery. The central bank is also collaborating with solution providers
to monitor financial institutions using blockchain and cryptocurrency, showing
increasing acceptance of the technology in finance. However, there is still
uncertainty and differing opinions among regulators, legislators, and market
participants regarding the future of blockchain and crypto in the Philippines. Clear
consensus and understanding of the technology's impact are still needed.
(Navalan, 2022)

Recent research indicates that people generate a large amount of data,
more than 100 MB per minute, which includes their thoughts and feelings on
different topics. Some of this data consists of reviews, feedback, social media
posts, and blog posts where people express their opinions about cryptocurrency
markets. Interestingly, the sentiment expressed in these posts and comments can
have a connection with the movements in market prices. For instance, when Elon
Musk added the hashtag for Bitcoin to his X (formerly known as Twitter) bio, the
price of Bitcoin increased from $32,000 to $38,000 within a short span of time. This
suggests that analyzing the sentiment expressed in customers' or influential
individuals' tweets can provide valuable insights into the relationship between
cryptocurrency prices and public sentiment. Therefore, sentiment analysis can
serve as a useful tool for identifying opportune moments and locations for investing
in cryptocurrency. In the realm of cryptocurrency trading, the skill of interpreting
charts plays a pivotal role for traders in pinpointing lucrative market prospects.
Through the application of technical analysis, investors gain the ability to discern
prevailing market trends and forecast prospective price trajectories of various
assets. (Yilmaz, 2023)
Figure 6. Elon Musk’s tweets on cryptocurrency and how Bitcoin prices
change accordingly.

According to Yilmaz (2023), the steps for conducting cryptocurrency sentiment
analysis are as follows:

1. Collect data related to cryptocurrencies, such as investors' reviews, texts
including public sentiment, tweets mentioning crypto, etc.
2. Gather historical price changes of various cryptocurrencies.
3. Clean the dataset to get rid of the unrelated items.
4. Label the content in the dataset based on emotional tone as either
negative, positive, or neutral, either manually or using automated tools.
5. Train your model with a labeled dataset.
6. Evaluate the performance of your model.

Challenges of cryptocurrency sentiment analysis (Yilmaz, 2023)


● Models that are not trained in the terminology of the crypto market may
yield misleading results.
● Identifying bot accounts can be challenging, especially if the dataset is not
labeled manually.
● The number of tweets sent regarding cryptocurrency by bot accounts is
estimated to be almost 15%, distorting the sentiment analysis results.

How to overcome these challenges? (Yilmaz, 2023)


● Generate a holistic approach by combining polarity, emotion, and aspect-
based analysis.
● Train a domain-specific model that includes the terminology of the crypto
market.
● Detect bot accounts using neural networks and contextualized
representations of each text. A recent study shows that neural networks
achieve 82% accuracy in identifying bots.

In today's digital age, social media plays a significant role in people's lives,
and the content they share online holds valuable insights. Natural language
processing (NLP) techniques have been widely used to understand public
sentiment expressed in social media posts. Sentiment Analysis, a crucial aspect
of NLP, focuses on computationally analyzing opinions, emotions, attitudes, or
sentiments in written texts. Within this field, social media sentiment analysis
(SMSA) specifically aims to understand and represent sentiments expressed in
short social media posts. (Chen, 2023)

Emojis, small in-text graphics, have become increasingly popular in social
media communication. They serve as graphical symbols that allow users to
express emotions and convey meanings in a concise and convenient way.
Statistics published in 2021 by Emojipedia, a well-known emoji reference site,
show that over one-fifth of tweets (21.54%) and more than half of Instagram
comments contain emojis. Despite their widespread use in online communication,
emojis are not widely embraced in the field of NLP and SMSA. During the data
preprocessing stage, emojis are often removed along with other unstructured
elements like URLs, stop words, unique characters, and images. While some
researchers have recently started exploring the potential of including emojis in
SMSA, it is still a niche approach that requires further investigation. Chen's
project aims to evaluate the compatibility of popular BERT encoders with emojis
and explore different methods of incorporating emojis in SMSA to enhance
accuracy (Chen, 2023).
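
To make the contrast with emoji-stripping preprocessing concrete, the sketch below separates a tweet into its plain text and its emojis so that both can be kept as features. It is a rough heuristic rather than Chen's method: it treats every character in the Unicode category "So" (Symbol, other) as an emoji, which covers most pictographs but also a few non-emoji symbols. The function name `split_text_and_emojis` is hypothetical.

```python
import unicodedata

# Characters that only modify a neighbouring emoji and carry no meaning alone.
EMOJI_MODIFIERS = {"\ufe0f", "\u200d"}  # variation selector-16, zero-width joiner


def split_text_and_emojis(tweet):
    """Return (text_without_emojis, list_of_emoji_characters).

    Heuristic: a character counts as an emoji when its Unicode general
    category is "So". This is an approximation, not a full emoji parser.
    """
    emojis = []
    kept = []
    for ch in tweet:
        if unicodedata.category(ch) == "So":
            emojis.append(ch)
        elif ch in EMOJI_MODIFIERS:
            continue
        else:
            kept.append(ch)
    text = " ".join("".join(kept).split())  # collapse leftover whitespace
    return text, emojis
```

Keeping the emoji list alongside the text lets a later step decide how to use it, for example by looking each emoji up in an emotion lexicon.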

Table 2

List of Emojis/Emoticons Used in the Cryptocurrency World

Emoticon    Coin         Meaning

⚡          Bitcoin      Lightning Network
🔑          Bitcoin      Proof of Keys Movement (withdraw from exchanges)
🦡          Bitcoin      Honeybadger (doesn't care), Bitcoin mascot
🌋⚒        Bitcoin      Volcano Mining (in El Salvador) is badass
🆖🆙        Bitcoin      NGU, "Number Go Up technology"
☣          Bitcoin      Toxic Bitcoin Maximalist
🍕          Bitcoin      Bitcoin Pizza Day
🌈          Bitcoin      Rainbow Chart
🐇🕳        Bitcoin      Bitcoin Rabbit Hole
💊          Bitcoin      Red-pilled ("The Matrix") for the Bitcoin future
🦇🔊        Ethereum     "Ultra Sound Money"
🔥🔥🔥      Ethereum     Burning of transaction fees
🐬          Ethereum     The "Flippening"
☀          Solana       SOL Supporter
🔺          Avalanche    AVAX Supporter
🦄          Uniswap      UNI Supporter
🍣          SushiSwap    SUSHI Supporter
⏳          IOTA         IOTA Supporter, "Coordicide" is coming (or not)
🌍          Terra        Terra LUNA Supporter
⚛          Cosmos       Cosmos ATOM Supporter
👻          Aave         Aave Supporter, Aavegotchi, Aave Ghost
🐶🚀🌕      Dogecoin     DOGE to the moon
💩          Misc         Shitcoin
💎🙌        Misc         "Diamond Hands" = not selling
🚀          Misc         To The Moon
🐋          Misc         Whale, Rich Crypto Holder/Trader
👨‍🌾          Misc         DeFi Yield Farming
👀          Misc         Look, open your eyes! Probably nothing
🐂          Misc         Permabull
🐻          Misc         Permabear
🥒          Misc         Green Dildo (Chart going up)

Table 3

These are the emojis depicting common day-to-day expressions (Singh, 2022)

Three punctuation marks can end a sentence: the period (.), the question mark (?),
and the exclamation point (!). A single space always follows the sentence, regardless of
the punctuation used. The period is the most common of the three and marks a neutral
statement. The question mark ends a question, whether it expects an answer or is
rhetorical; some questions are polite requests. The exclamation point indicates strong
emotion or high volume and is sometimes overused on the internet (Quillbot, English
Composition).

Out of the three punctuation marks commonly used to end sentences, only the
exclamation point is used to show strong emotions. When we write, we don't have the
advantage of using tone and other verbal cues to convey our emotions. Instead, we rely
on grammar, including punctuation, to establish the tone of our writing. Using an
exclamation point is a quick way to indicate that we are expressing strong emotion.
Exclamatory sentences always end with an exclamation point and are used to express
intense emotions, regardless of the specific emotion being conveyed. It's important to
consider the intensity of the emotion you want to express when deciding whether to use
an exclamation point. In formal writing, it's appropriate to use only one exclamation point,
while in informal writing, you can use a few more, but it's best not to overdo it. (Craiker,
2022).
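
The role of ending punctuation described above can be sketched in Python as follows. This is an illustrative helper under stated assumptions, not the rule set used in this study: the name `ending_punctuation` is hypothetical, and the function only inspects the literal end of the text, so a tweet that ends with an emoji after its punctuation would need the emoji removed first.

```python
import re

# A run of sentence-ending marks at the very end of the text (trailing spaces allowed).
ENDING_RUN_RE = re.compile(r"([.?!]+)\s*$")


def ending_punctuation(text):
    """Return (mark, count): the final ending mark and how many times it repeats.

    Returns (None, 0) when the text does not end with '.', '?' or '!'.
    """
    match = ENDING_RUN_RE.search(text)
    if not match:
        return None, 0
    run = match.group(1)
    mark = run[-1]
    # Count only the trailing repetitions of the final mark, e.g. "?!" -> ('!', 1).
    count = len(run) - len(run.rstrip(mark))
    return mark, count
```

A repeated exclamation point, such as the three marks in "To the moon!!!", could then be treated as a signal of stronger emotion than a single one.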

Related Studies

Cryptocurrency
The evolution of the cryptocurrency market in the past decade has been
nothing short of meteoric, with its user base exploding from a modest 5 million in
2016 to a staggering 300+ million by the close of 2021. This trend of rapid
growth has not been exclusive to the cryptocurrency market, with the NFT
marketplace experiencing a similar expansion, from 670,000 users in 2020 to over
44 million in 2022. However, with such growth also comes an increase in market
volatility and risk. This volatility stems in part from the inherent nature of
cryptocurrency markets, which lack a central governing authority. Instead, prices
are highly susceptible to various external factors, including public sentiment,
natural disasters, global news, and international crises (Yilmaz, 2023).

Naila Aslam, Furqan Rustam, Ernesto Lee, Patrick Bernard Washington, and
Imran Ashraf (2022) state that cryptocurrency is an alternative medium of
exchange consisting of numerous decentralized crypto coin types. The essence of
each crypto coin is in its cryptographic foundation. Secure peer-to-peer
transactions are enabled through cryptography in this secure and decentralized
exchange network. Since its inception in 2009, Bitcoin has become a digital
commodity of interest as some believe the crypto coins' worth is comparable to
that of traditional fiat currency.

In light of these observations, a burgeoning field of research has emerged,
focusing on the correlation between public sentiment and cryptocurrency market
movements. Current studies suggest that data from consumers' reviews,
feedback, social media posts, or blog posts can provide valuable insights into
market trends. A notable example of this correlation was witnessed when Elon
Musk's X (formerly known as Twitter) biography update featuring a Bitcoin hashtag
led to a swift price surge from $32,000 to $38,000 within hours (Yilmaz, 2023).

Considering that the exchange rates of cryptocurrencies are notorious for being
volatile, our team strives to develop an effective trading strategy that can be
applied to a variety of cryptocurrencies. This method for determining the optimal
time to trade involves correlating prices with one of today's most popular social
media sources, X (formerly known as Twitter). The advantages of using X (formerly
known as Twitter) include having access to some of the earliest and fastest news
updates in a concise format as well as being able to extract data from this social
media platform with relative ease.

X (formerly known as Twitter)


The proliferation of social media platforms, particularly X (formerly known
as Twitter), has had a significant impact on cryptocurrency trading. Many traders
leverage tweets to inform their daily trading strategies, emphasizing the growing
importance of sentiment analysis in the realm of cryptocurrencies. The
digitalization of banking and the rise of cryptocurrencies have been remarkable.
With the advent of blockchain technology, the number of cryptocurrencies,
including Bitcoin (BTC) and Ethereum (ETH), has proliferated (Emre Sasmaz and
F. Boray Tek, 2021). As such, the valuation of these digital currencies has become
an area of significant interest and study.

Furthermore, the sentiment that the X (formerly known as Twitter)
community expresses about altcoins has been found to correlate with their
prices. To measure this daily X (formerly known as Twitter) sentiment, an
aggregation of sentiment across numerous tweets for a specific altcoin is
necessary. Sentiment Analysis, a sub-field of computational Natural Language
Processing, aims to discern positive, negative, and neutral opinions in a text.
However, analyzing tweets presents unique challenges due to their irregular
grammar, high emoticon usage, and frequent sarcasm (Emre Sasmaz and F.
Boray Tek, 2021). The rapid advancement in the field of sentiment analysis on
cryptocurrency-related tweets, with a particular focus on the role of emoticons,
underlines the need for more nuanced and advanced models. This literature review
aims to provide a foundation for such advancements.
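
The daily aggregation mentioned above can be sketched as a simple average of per-tweet scores grouped by date and coin. This is a minimal illustration, not the aggregation used by Sasmaz and Tek: the function name `daily_sentiment`, the `(date, coin, score)` tuple layout, and the use of an unweighted mean are all assumptions.

```python
from collections import defaultdict
from statistics import mean


def daily_sentiment(scored_tweets):
    """Average per-tweet polarity scores for each (date, coin) pair.

    scored_tweets: iterable of (date, coin, score) tuples, where score is a
    polarity value such as -1.0 (negative) to 1.0 (positive).
    """
    buckets = defaultdict(list)
    for date, coin, score in scored_tweets:
        buckets[(date, coin)].append(score)
    return {key: mean(scores) for key, scores in buckets.items()}
```

The resulting daily series per coin is the kind of signal that could then be compared against that coin's daily price changes.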

According to Quintet (2023), Elon Musk introduced temporary restrictions
on the number of tweets users can view per day on Twitter in response to high
levels of data scraping and manipulation of the platform. Initially, these limits were
set at 6,000 posts for verified accounts and 600 posts for unverified accounts per
day, with stricter limits for new unverified accounts. Musk later increased these
limits to 10,000, 1,000, and 500 posts per day, respectively, for verified, unverified,
and new unverified accounts. The decision was made to combat organizations that
aggressively scrape data, affecting the user experience on Twitter.

Emoji (Emoticons)
According to P. S. Dandannavar, S. R. Mangalwede, and S. B.
Deshpande (2019), the rapid expansion of the World Wide Web over the past
years has led to a corresponding surge in user-generated content across various
social media platforms, web forums, and blogs. Sites like X (formerly known as
Twitter) and Facebook have emerged as significant hubs of online communication,
with millions of users sharing their sentiments, opinions, and experiences daily.
This treasure trove of data offers rich insights into public sentiment on a myriad of
topics, from products and services to socio-political issues.

Recognizing this, researchers have increasingly turned to social media
posts, SMS messages, and other informational texts for Sentiment Analysis,
aiming to glean valuable sentiment-based insights from these unstructured data
sources. A key feature of this user-generated content is the pervasive use of
emoticons, particularly among younger users, to express sentiments that might be
challenging to convey through text alone (P. S. Dandannavar, S. R. Mangalwede
& S. B. Deshpande, 2019).

Notwithstanding their widespread use, many current Sentiment Analysis
(SA) systems have largely overlooked the potential significance of emoticons,
often leaving them out
of their analytical models. This oversight fails to acknowledge that emoticons can
serve as powerful indicators of sentiment and can offer valuable cues for sentiment
analysis (P. S. Dandannavar, S. R. Mangalwede & S. B. Deshpande, 2019). Given
the growing usage of emoticons and their potential value for sentiment analysis,
the current study seeks to investigate the reliability of emoticons as cues in
Sentiment Analysis. Specifically, the study will compare Sentiment Analysis
conducted on tweets with emoticons against those without emoticons to discern
their influence. In the context of cryptocurrency-related tweets, considering
emoticons in Sentiment Analysis could provide more nuanced and accurate
sentiment analyses. This could further enhance the predictive power of Sentiment
Analysis for cryptocurrency market trends, contributing to the emerging field of
sentiment analysis on cryptocurrency-related tweets.

EmoTag1200
EmoTag 1200 is a natural language processing (NLP) tool specifically
designed for emotion analysis. It is built upon a comprehensive dataset of 1,200
emotion tags, which cover a wide range of emotional states and expressions. This
powerful tool employs advanced machine learning algorithms and deep neural
networks to accurately identify and classify emotions in text-based content.

The primary purpose of EmoTag 1200 is to assist in understanding and
interpreting human emotions conveyed through written language. By analyzing
text inputs, such as social media posts, customer reviews, or any textual data,
EmoTag 1200 provides valuable insights into the emotional tone and sentiment
embedded within the text. This allows businesses, researchers, and individuals to
gain a deeper understanding of how people feel and react to specific topics,
products, or experiences.

Overall, EmoTag 1200 is an NLP tool that accurately analyzes and
classifies emotions in text-based content. Its applications span across industries,
from business and marketing to research and mental health. By unlocking the
emotional insights contained within textual data, EmoTag 1200 empowers
organizations and individuals to make data-driven decisions and gain a deeper
understanding of human emotions in the digital age.

Emotion Recognition
The study by Nourah & Mohamed (2020) gives an in-depth review of the
most recent state-of-the-art approaches and strategies for emotion recognition in
textual data. Emotion recognition is important in many applications, including
sentiment analysis, customer feedback analysis, mental health monitoring, and
human-computer interaction. The survey further delves into the emergence of
deep learning techniques, such as Convolutional Neural Networks (CNNs),
Recurrent Neural Networks (RNNs), and Transformers, for emotion recognition in
text. These approaches leverage the ability of deep neural networks to
automatically learn and extract relevant features from raw text data. The authors
highlight the advantages and limitations of each deep learning method and present
various architectures proposed in the literature. The survey also describes an
emotion detection system based on the keyword-spotting technique. The challenge
of locating occurrences of keywords from a given set as substrings in each string
is known as the keyword pattern matching problem. This topic has already been
investigated, and strategies for tackling it have been proposed. In the context of
emotion detection, this approach relies on predefined keywords, which are
classed as disgusted, sad, glad, furious, afraid, startled, and so on.

Figure 7. Main steps of a keyword-spotting technique
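
A keyword-spotting emotion detector of the kind described above can be sketched in a few lines of Python. The lexicon below is a deliberately tiny, hypothetical example and does not come from the survey or from this study; the function names `spot_emotions` and `dominant_emotion` are likewise illustrative, and returning "neutral" when no keyword matches is an assumption.

```python
import re
from collections import Counter

# Hypothetical emotion lexicon for illustration only.
EMOTION_KEYWORDS = {
    "happy": {"glad", "happy", "bullish", "excited"},
    "sad": {"sad", "rekt", "disappointed"},
    "anger": {"furious", "angry", "scam"},
    "fear": {"afraid", "scared", "panic"},
}


def spot_emotions(text):
    """Count how many keywords of each emotion class occur in the text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for token in tokens:
        for emotion, keywords in EMOTION_KEYWORDS.items():
            if token in keywords:
                counts[emotion] += 1
    return counts


def dominant_emotion(text):
    """Return the emotion with the most keyword hits, or 'neutral' if none."""
    counts = spot_emotions(text)
    if not counts:
        return "neutral"
    return counts.most_common(1)[0][0]
```

Because the match is purely lexical, the sketch cannot handle negation ("not happy") or sarcasm, which is one of the known limitations of keyword spotting.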

Furthermore, the survey includes standard machine learning algorithms as
well as more modern deep learning methods for emotion perception in text.
Feature engineering, in which handcrafted features such as lexical, syntactic, and
semantic aspects are employed to express emotions in text, is one of the traditional
approaches addressed. The authors also investigate machine learning
methods such as Support Vector Machines (SVM), Naive Bayes, and Random
Forests, which are extensively employed for emotion categorization.
In addition to discussing individual techniques, the researchers examine
publicly available emotion datasets that have been widely used for training and
assessing emotion detection algorithms. They examine the features and limits of
various datasets, highlighting the importance of varied and balanced data to
provide strong and impartial emotion identification algorithms. The work discusses
open issues and suggests new research avenues in the field of emotion
identification in text, such as cross-lingual emotion recognition, multimodal
emotion recognition, and addressing the subjectivity and contextuality of
emotions.
Bharti et al. (2022) conducted a study titled "Text-Based Emotion
Recognition Using Deep Learning Approach," which proposed a deep learning
system based on gated recurrent units (GRU) for emotion recognition. The
researchers' data is taken from three different datasets: ISEAR, WASSA, and
Emotion-stimulus, which have text and emotions as the attributes. These datasets
consist of three different types of text: normal sentences, tweets, and dialogs. The
proposed scheme used a hybrid (machine learning + deep learning) model to
identify emotions in text. Convolutional neural networks (CNN) and Bi-GRU were
exploited as deep learning techniques.

Figure 8. Text-based Emotion Recognition Using a Deep Learning Approach

In this study, all three data sets, text sentences, dialogs, and tweets, are
integrated. Over 14500 text sentences are included in the merged dataset. Every
text phrase is identified with six types of emotions (according to its syntactic and
semantic polarities): pleasure, disgust, fear, surprise, rage, and sorrow (Chen et
al., 2019). The content is in English and includes some punctuation and emojis.
The collection solely comprises text phrases and their associated emotions. Each
dataset is split into two categories of data: training and testing, with an 80:20 split.
The researchers performed many experiments using various methods to
get the best accuracy for their proposed model. They compared emotion
classification with a machine learning approach, a deep learning approach, and
their hybrid model on the multi-text dataset consisting of sentences, tweets, and
dialogs. Three datasets are used for performing these experiments. Among the ML
classifiers, SVM gives the highest accuracy of 78.97%. Among the DL methods, the
Bi-GRU model achieves the highest accuracy of 79.46%, and the CNN model
achieves the highest F1-score of 80.76. The hybrid model has achieved a precision
of 82.39, a recall of 80.40, an F1 score of 81.27, and an accuracy of 80.11%.
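
For reference, the metrics reported above can be computed from true and predicted labels as in the following Python sketch. It uses the standard definitions for a single class treated as "positive"; how Bharti et al. averaged the scores across their emotion classes is not stated here, so the sketch makes no attempt to reproduce their figures. The function names are illustrative.

```python
def precision_recall_f1(y_true, y_pred, positive):
    """Precision, recall and F1 for one class, using the standard definitions."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1


def accuracy(y_true, y_pred):
    """Share of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
```

For a multi-class emotion task, these per-class values are typically averaged over the classes to obtain a single precision, recall, and F1 score.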

Intensity Level Recognition
Das and Bandyopadhyay (2010) explored three levels of intensity (low,
medium, and high) in the context of emotional expression in sentences. They
focused on two categories of intensifiers, positive and negative, to assign these
intensity levels. These intensifiers can be part of the emotional expression itself.
The authors analyzed the parts of speech (POS) surrounding the emotion-laden
word in a sentence, specifically adjectives (JJ) and adverbs (RB), as these are
likely candidates for intensifiers. To determine an intensifier's polarity, they
consulted the SentiWordNet database: each potential intensifier is checked for its
presence in SentiWordNet, and if found, its positive and negative scores are
retrieved. The intensifier is then categorized as positive or negative based on
whichever score is higher on average. Additionally, they compiled a list of
commonly used negative words and treated words involved in negation modifier
dependency relations as negative words too. The methodology involves applying
specific rules (outlined in Table 4) to understand how various intensifiers and
negations contribute to the assignment of post-emotion tags and intensity levels in
sentences. These rules help in systematically determining the role and impact of
these linguistic elements in conveying emotional intensity.
Table 4
Rules for tagging an emotional anchoring vector with intensity
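
The intensifier-selection and polarity steps described by Das and Bandyopadhyay can be sketched as below. This is a hedged illustration only: it does not query SentiWordNet and instead expects the caller to supply lists of positive and negative scores; the window size of two tokens, the function names, and the handling of ties are assumptions; and the mapping from intensifiers to low, medium, or high intensity is left out because it depends on the rules in Table 4.

```python
from statistics import mean


def candidate_intensifiers(tagged_tokens, emotion_index, window=2):
    """Return adjectives (JJ*) and adverbs (RB*) near the emotion word.

    tagged_tokens: list of (word, pos_tag) pairs for one sentence.
    emotion_index: position of the emotion-laden word in that list.
    window: how many tokens on each side to inspect (an assumed value).
    """
    lo = max(0, emotion_index - window)
    hi = min(len(tagged_tokens), emotion_index + window + 1)
    found = []
    for i in range(lo, hi):
        word, tag = tagged_tokens[i]
        if i != emotion_index and tag.startswith(("JJ", "RB")):
            found.append(word)
    return found


def intensifier_polarity(pos_scores, neg_scores):
    """Label an intensifier by whichever average score is higher.

    pos_scores / neg_scores: the word's positive and negative scores, one per
    sense, as retrieved from a sentiment lexicon. Returns None when the word
    has no scores or the averages tie (tie handling is an assumption).
    """
    if not pos_scores or not neg_scores:
        return None
    avg_pos, avg_neg = mean(pos_scores), mean(neg_scores)
    if avg_pos > avg_neg:
        return "positive"
    if avg_neg > avg_pos:
        return "negative"
    return None
```

In use, each word returned by `candidate_intensifiers` would be looked up in the lexicon and passed to `intensifier_polarity` before the tagging rules are applied.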

Keyword Spotting Method


The study of Padme & Kulkarni (2018) on Aspect Based Emotion Analysis of
Online User-Generated Reviews utilized the keyword spotting method as part
of its analysis. This method involved identifying specific keywords or phrases within
the reviews that were indicative of aspects or topics being discussed. By focusing
on these keywords, the researchers were able to extract valuable information
about the emotions associated with different aspects of the reviewed products or
services.
First, the researchers compiled a list of relevant keywords related to the
specific domain or industry they were studying. These keywords represented
different aspects or features of the products or services being reviewed. For
example, in a study analyzing restaurant reviews, keywords might include "food
quality," "service," "ambience," and "price." Once the keyword list was established,
the researchers applied the keyword spotting technique to the online user-
generated reviews. They scanned each review and looked for the presence of
these predefined keywords. Whenever a keyword was identified within a review, it
signaled the presence of a particular aspect being discussed.
After identifying the aspects, the researchers then focused on the
surrounding context of the keywords to extract the emotional sentiments
associated with each aspect. This involved analyzing the text surrounding the
keywords, including adjectives, adverbs, and other sentiment-bearing words, to
determine the emotional tone expressed by the reviewer. Positive sentiments
might indicate satisfaction or enjoyment, while negative sentiments could imply
dissatisfaction or disappointment.
By employing the keyword spotting method, the study aimed to provide a
fine-grained analysis of emotions expressed towards different aspects of the
reviewed products or services. This approach allowed the researchers to gain
insights into which aspects were most positively or negatively perceived by users,
helping businesses and decision-makers understand the strengths and
weaknesses of their offerings from a customer's emotional perspective.
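
The aspect-level procedure described above (spot an aspect keyword, then read the sentiment words around it) can be sketched as follows. The keyword and sentiment word lists are hypothetical and far smaller than a real lexicon, the function name `aspect_sentiments` is illustrative, and the two-token context window is an assumption rather than a value taken from Padme and Kulkarni.

```python
import re

# Hypothetical single-word triggers mapped to the aspect they indicate.
ASPECT_KEYWORDS = {
    "food": "food quality",
    "service": "service",
    "ambience": "ambience",
    "price": "price",
}
# Hypothetical sentiment-bearing words.
POSITIVE = {"great", "delicious", "friendly", "cozy", "fair"}
NEGATIVE = {"slow", "bland", "rude", "noisy", "expensive"}


def aspect_sentiments(review, window=2):
    """Score each aspect by the sentiment words found near its keyword.

    Returns {aspect: score}, where score = positive hits - negative hits
    within `window` tokens on either side of the aspect keyword.
    """
    tokens = re.findall(r"[a-z]+", review.lower())
    results = {}
    for i, token in enumerate(tokens):
        if token in ASPECT_KEYWORDS:
            context = tokens[max(0, i - window): i + window + 1]
            score = sum(w in POSITIVE for w in context) - sum(w in NEGATIVE for w in context)
            aspect = ASPECT_KEYWORDS[token]
            results[aspect] = results.get(aspect, 0) + score
    return results
```

The window size matters: a window that is too wide lets a sentiment word attached to one aspect leak into the score of a neighbouring aspect.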

LSTM-SVM Approach for Sentiment Analysis


Cimino and Dell’Orletta (2023) proposed the Tandem LSTM-SVM approach
for sentiment analysis, which combines two popular machine learning
algorithms, namely Long Short-Term Memory (LSTM) and Support Vector
Machines (SVM), to improve the accuracy of sentiment analysis tasks. In this
approach, LSTM and SVM are used in a tandem fashion
to leverage their complementary strengths. The process begins with training an
LSTM model on a large corpus of labeled text data to learn the underlying patterns
and sentiment information. The LSTM model extracts meaningful features from the
text and generates a fixed-length representation, often referred to as an
embedding.
The generated embeddings from the LSTM model are then used as input
to an SVM classifier. The SVM classifier is trained on these embeddings to predict
the sentiment of new, unseen text instances. The SVM takes advantage of its
ability to handle high-dimensional feature spaces and find an optimal hyperplane
that separates different sentiment classes.
By combining the strengths of LSTM in capturing contextual information
and SVM in classification, the Tandem LSTM-SVM approach aims to improve the
accuracy of sentiment analysis. This approach takes advantage of LSTM's ability
to model complex dependencies in the text data while leveraging SVM's robust
classification capabilities. The study likely evaluates the performance of the
Tandem LSTM-SVM approach on benchmark sentiment analysis datasets,
comparing it against other existing approaches. It would measure accuracy,
precision, recall, and F1-score to assess the effectiveness of the proposed
approach.
Overall, the Tandem LSTM-SVM approach for sentiment analysis attempts
to enhance the accuracy of sentiment classification tasks by combining the
strengths of LSTM and SVM algorithms, resulting in a more robust and accurate
sentiment analysis model.
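
The hand-off between the two models can be written out with the standard textbook formulations of an LSTM cell and a soft-margin linear SVM. The notation below is an assumption made for illustration (the symbols $W$, $U$, $b$, $w$, $C$, and $\xi_i$ are not taken from Cimino and Dell’Orletta), and the binary SVM shown would be extended to several sentiment classes with a scheme such as one-vs-rest.

```latex
% LSTM cell at time step t, for input token vector x_t
\begin{aligned}
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) \\
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) \\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}

% Fixed-length embedding of a text with T tokens: the final hidden state
e = h_T

% Soft-margin linear SVM trained on the embeddings e_i with labels y_i \in \{-1, +1\}
\min_{w,\, b,\, \xi} \; \frac{1}{2}\lVert w \rVert^2 + C \sum_{i=1}^{N} \xi_i
\quad \text{subject to} \quad
y_i \,(w^\top e_i + b) \ge 1 - \xi_i, \qquad \xi_i \ge 0

% Prediction for a new text with embedding e
\hat{y} = \operatorname{sign}(w^\top e + b)
```

Read this way, the LSTM is responsible only for turning a variable-length tweet into the vector $e$, while the SVM is responsible only for drawing the separating hyperplane in that vector space.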
Neutrality in Sentiment Analysis
Valdivia, Luzón, Wang, and Herrera (2018) note that there has been a
surge of interest in sentiment analysis in recent times, leading to
the development of numerous algorithms designed to categorize text based on the
expressed sentiment, typically categorized as positive, neutral, or negative. Often,
neutral sentiments are overlooked in many sentiment analysis approaches due
to their vague nature and minimal informational content. Their paper introduces a
strategy to enhance the significance of neutral sentiments by defining the
distinction between positive and negative opinions, aiming to boost the efficiency
of the model. The authors implemented various sentiment analysis techniques on
diverse datasets to extract sentiment values and identify neutral sentiments
through a consensus approach, essentially filtering them out using a weighted
aggregation of different models. They then assessed the efficacy of both individual
and combined models in classification tasks. The findings indicate that combined
methods generally surpass individual models in effectiveness, leading to the
conclusion that recognizing neutrality is crucial in differentiating between positive
and negative sentiments and consequently in enhancing the accuracy of sentiment
classification.
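
The idea of detecting neutrality through a weighted aggregation of several models can be sketched as follows. This is a simplified illustration and not the consensus model of Valdivia et al.: the function name `consensus_polarity`, the assumption that every model outputs a polarity score between -1 and 1, and the neutral band of 0.2 are all hypothetical choices.

```python
def consensus_polarity(scores, weights=None, neutral_band=0.2):
    """Combine polarity scores from several models into one label.

    scores: one polarity score in [-1, 1] per model for the same text.
    weights: optional per-model weights (defaults to equal weights).
    neutral_band: aggregated scores within +/- this value count as neutral
                  (an assumed threshold).
    Returns (label, aggregated_score).
    """
    if weights is None:
        weights = [1.0] * len(scores)
    total = sum(weights)
    aggregated = sum(s * w for s, w in zip(scores, weights)) / total
    if abs(aggregated) <= neutral_band:
        return "neutral", aggregated
    return ("positive" if aggregated > 0 else "negative"), aggregated
```

When the individual models disagree strongly, their scores cancel out and the text falls into the neutral band, which is one way such disagreement can be used as a signal of neutrality.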

Recognition of Emotion from Microblogs (REM)


According to Islam et al. (2021), the REM algorithm involves the following
steps:

Data Collection: The algorithm collects microblog posts or tweets from
various sources that contain both emoticons and text. These posts are used as
the training data for the algorithm.

Pre-processing: The text data is pre-processed to remove any irrelevant
information, such as URLs, hashtags, or user mentions. It may also involve
normalizing the text by converting it to lowercase, removing punctuation, and
applying other techniques to standardize the input.

Feature Extraction: The algorithm extracts relevant features from both the
text and the emoticons. Textual features may include word frequency, sentiment
scores, or linguistic patterns. Emoticon features involve analyzing the emoticons'
visual characteristics, such as the shape, orientation, or combination of symbols.

Training: The REM algorithm uses a machine learning model to train on the
pre-processed data. The model learns to recognize patterns and relationships
between the extracted features and the corresponding emotions. Various machine
learning techniques can be applied, such as support vector machines (SVM),
neural networks, or random forests.

Emotion Classification: Once the model is trained, it can classify new
microblog posts by predicting the emotions associated with the text and
emoticons. The algorithm takes the extracted features from the input and passes
them through the trained model. The model then assigns probabilities or scores to
each emotion category (e.g., happy, sad, angry, etc.).

Post-processing: The algorithm may apply post-processing techniques to
refine the emotion classification results. For example, it can incorporate contextual
information by considering adjacent posts or the overall sentiment of the author.
This step aims to improve the accuracy and coherence of emotion recognition.

Evaluation and Fine-tuning: The algorithm's performance is evaluated by
comparing its predictions to manually labeled data or by using other evaluation
metrics. If necessary, the algorithm can be fine-tuned by adjusting the model's
parameters or incorporating additional training data to improve its accuracy and
generalization capabilities.

The Recognition of Emotion from Emoticon with Text in Microblog (REM)
algorithm analyzes microblog posts containing emoticons and text to automatically
recognize the expressed emotions. It collects data from microblogging platforms,
pre-processes the text, extracts features from both the text and emoticons, and
trains a machine-learning model. The trained model is then used to classify
emotions in new posts by predicting the emotion associated with the text and
emoticons. Post-processing techniques may be applied to refine the results, and
the algorithm's performance is evaluated and fine-tuned as needed. Overall, REM
combines textual and visual cues to provide efficient and automated emotion
recognition in microblogs.
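
The way text and emoticon cues can be combined into per-emotion scores is sketched below. This is not the REM algorithm itself: REM trains a machine-learning model, whereas this sketch substitutes two small hand-written lookup tables for the learned model purely to show the scoring and selection step. The cue tables, the function name `classify_emotion`, and the emoticon weight of 2.0 are all hypothetical.

```python
from collections import Counter

# Hypothetical cue tables standing in for a trained model.
TEXT_CUES = {"love": "happy", "great": "happy", "hate": "angry", "lost": "sad", "crash": "sad"}
EMOTICON_CUES = {":)": "happy", ":D": "happy", ":(": "sad", ">:(": "angry"}


def classify_emotion(post, emoticon_weight=2.0):
    """Score each emotion from word and emoticon cues; return (label, scores).

    Emoticons are weighted more heavily than words (an assumed choice).
    Returns ("neutral", {}) when no cue is found.
    """
    scores = Counter()
    for token in post.split():
        if token in EMOTICON_CUES:
            scores[EMOTICON_CUES[token]] += emoticon_weight
        else:
            word = token.strip(".,!?").lower()
            if word in TEXT_CUES:
                scores[TEXT_CUES[word]] += 1.0
    if not scores:
        return "neutral", {}
    return max(scores, key=scores.get), dict(scores)
```

In the second test-style case below, a positive word paired with a sad emoticon is labeled "sad", which illustrates how an emoticon can override the literal wording of a post.

```python
print(classify_emotion("I love this coin :)"))
print(classify_emotion("great project :("))
```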

Deep Learning
Boquiren, Garcia, Hungria, and De Goma (2022) examined the effects of
backward slang on two deep learning models, namely Long Short-Term Memory
(LSTM) and Bidirectional LSTM (Bi-LSTM), in the context of Tagalog Sentiment
Analysis. The study was motivated by the increasing popularity
of backward slang among Tagalog tweets and the need for effective methods to
determine general sentiments correlated to a topic in the context of the growing
number of internet users in the Philippines. The study emphasizes the importance
of deep learning techniques in sentiment analysis, especially when dealing with
complex languages such as Tagalog.
Deep learning is a type of machine learning approach that uses a multilayer
neural network to automatically learn and extract features from data rather than
relying on manual feature extraction. Deep learning models also measure
hyperparameters automatically, which can result in better accuracy and
performance. Deep learning techniques are currently the best solutions for
problems in image and speech recognition, as well as natural language
processing. (Dang, Garcia, De la Prieta, 2020)

Sentiment Analysis
Sentiment analysis appears to be a promising tool for predicting market
behaviors and guiding investment decisions. Specifically, analysis of tweets from
customers or thought leaders can illuminate the relationship between public
sentiment and cryptocurrency prices. This sentiment analysis, particularly when
coupled with recognition of emotion intensity levels and consideration of
emoticons, can provide a deeper understanding and potentially offer a predictive
edge in this highly volatile market. However, as this field is still in its infancy, it also
faces numerous challenges that warrant further investigation (Yilmaz, 2023).

According to Abdullah A. et al. (2019), sentiment analysis is a method or
tool for evaluating people's opinions or group assessments, such as those
expressed by customers in communication with customer support or followers of a
brand. Many existing sentiment analysis methods in the market can
handle large volumes of data with greater accuracy. The goal of sentiment
analysis is to categorize whether the expressed sentiment is positive, negative, or
neutral. Sentiment analysis, also known as opinion mining, involves using
natural language processing, text mining, computational linguistics, and biometrics
to identify, extract, evaluate, and analyze emotional states and subjective
information. It aims to detect the polarity of text documents
or short sentences and classify them as positive, negative, or neutral.

Sentiment Analysis using X (formerly known as Twitter) Data


According to Qi & Shabrina (2022), the trend of using X (formerly known
as Twitter) data for sentiment analysis has become increasingly popular. As social
media analysis gains more attention, there is a growing interest in Natural
Language Processing (NLP) and Artificial Intelligence (AI) technologies related to
text analysis. The study used X (formerly known as Twitter) data to perform
sentiment analysis on the topic of COVID-19 in England. According to Qi & Shabrina
(2022), X (formerly known as Twitter) is a social media platform where users share
their thoughts and opinions using short posts called tweets, which can include text,
pictures, and videos. Users can interact with tweets using the like, comment, and
repost buttons. X (formerly known as Twitter) has more than 206 million daily
active users, and analyzing information available on the platform can provide
insights into changes in people's perceptions, actions, and behavior. The
researchers collected tweets from three major cities in England and divided them
into three stages: the early stage, the middle stage, and the late stage. They used
two different approaches to analyze the sentiment of the tweets: lexicon-based
approaches and supervised machine-learning approaches. The results showed
that the public sentiment towards COVID-19 changed over time. In the early stage,
the public sentiment was mostly positive. In the middle stage, the public sentiment
became more negative. In the late stage, the public sentiment became more
positive again. According to Qi & Shabrina (2022), the increase in confirmed cases
and the decrease in vaccination volume might be the reason for the increase in
negative sentiments. The supervised machine learning approaches performed
better than the lexicon-based approaches.

Sentiment Analysis in Cryptocurrency


The study of Dwivedi D. and Vemareddy A. (2023) states that the impact of
market sentiment on investment decisions, particularly in the volatile
cryptocurrency market, has been a significant focus in recent financial literature.
As markets are heavily influenced by psychology, sentiment analysis presents a
promising avenue for forming investment strategies and identifying potential
opportunities. Sentiment analysis of cryptocurrencies such as Bitcoin offers broad
market insights that can be leveraged to inform trading strategies.
In the cryptocurrency market, sentiment acts as a critical tool for traders,
encapsulating the public's opinions, attitudes, moods, and perspectives. A
recurring theme in the literature is the use of topic extraction to identify the most
common subjects within a wide range of sentiments. This process uncovers
keywords within sentiment data that capture the recurring themes or topics in the
text. Such approaches are often utilized to quickly and efficiently understand broad
themes within public sentiment.
A recent study applied the principle of Latent Semantic Analysis and
Singular Value Decomposition to X (formerly known as Twitter) data from before and
after COVID-19, grouping key themes related to Bitcoin and cryptocurrency sentiment
(Dwivedi, D. and Vemareddy, A., 2023). This analysis yielded valuable insights into
how public sentiment towards Cryptocurrency evolved in response to the
pandemic, highlighting key themes in negative sentiments related to crypto trading.
This study contributes to the literature on text mining by providing a contextual
framework for analyzing the public's sentiment toward Bitcoin and other
cryptocurrencies before and after COVID-19. Such understanding can illuminate
key public concerns, which can then be shared with a broader community for
further exploration and action. Through the lens of sentiment analysis, we can gain
a deeper understanding of the complex dynamics that drive the cryptocurrency
market, informing smarter investment decisions and fostering a more
comprehensive understanding of this burgeoning financial landscape.
The study of Yilmaz, B. (2023) focuses on sentiment analysis of
Cryptocurrency in 2023, its status, and challenges. The cryptocurrency market has
grown exponentially recently, from 5 million owners in 2016 to 300+ million in 2021.
The researcher sees a similar trend in the NFT marketplace as the number of users
was 670,000 in 2020 and increased to 44+ million in 2022. However, investing in
Cryptocurrency can be risky as there can be extreme fluctuations in the market.
For instance, in 2022, while Bitcoin lost more than 60% of its value, Dogecoin lost
55% of it, putting investors in a difficult position.
Cryptocurrency sentiment analysis faces several challenges. Models that are
not trained on the terminology of the crypto market may yield misleading
results. Identifying bot accounts can be challenging, especially if the dataset
is not labeled manually, and bot accounts are estimated to send almost 15% of
the tweets regarding Cryptocurrency, which distorts the sentiment analysis
results. These challenges can be addressed by taking a holistic approach that
combines polarity, emotion, and aspect-based analysis, by training a
domain-specific model that includes the terminology of the crypto market,
and by detecting bot accounts using neural networks and contextualized representations
of each text.
A recent study shows that neural networks achieve 82% accuracy in
identifying bots. In the study of Tudor-Mirce and DULĂU Mircea (2019), the
frequency of cryptocurrency-related news and social media posts is increasing
rapidly, and there is a link between media attention and cryptocurrency prices.
Sentiment analysis of publicly accessible web media may help forecast
cryptocurrency prices. Bitcoin is a virtual currency created for payments where the
sender and recipient cannot be identified and has a high volatility rate. The study
used tweets' sentiment and crypto's daily price data to predict the movement of
Bitcoin's price. FinBERT was used for sentiment analysis, leading to higher
accuracy. The mean absolute percentage error (MAPE) was 9.45% for sentiment
prediction using FinBERT and 3.6% for price prediction using GRU. The future
work will involve using sentiments from multiple media to predict Bitcoin's price.

Sentiment Analysis with Emoticons


The study by Ullah et al. (2020) discusses sentiment analysis on social
media that uses both emoticons and text to gauge the public sentiment of
users towards a particular topic. Unlike most sentiment analyses, the researchers
created an emoticon language and examined sentiment using both text and
emoticons. They analyzed the data using machine learning and deep learning
models with TF-IDF, bag-of-words, and n-gram features. The
publication discusses their technique and outcomes in depth and compares them
with existing systems. The main contribution of the study is an
algorithm that can analyze the sentiment of social media data containing both
emoticons and text, such as airline data collected from X (formerly known
as Twitter). The study shows the importance of considering emoticons in
sentiment analysis. The proposed system applied several machine learning and
deep learning models to the collected data (emoticons and text), and the
research concluded that its algorithm outperformed existing approaches. In the future,
the researchers state that the study could be extended to the field of multilingual
data.

Sentiment Analysis with Emotion and Intensity Level Recognition

The study of Navarro & Victore (2019) focuses on sentiment analysis with
emotion and intensity level recognition that considers ending punctuation marks.
The topic is not new, and most computer scientists have approached it
using natural language processing techniques. According to Burton (2016), there
are several approaches to sentiment analysis that employ various characteristics.
Sentiment analysis is used to identify the polarity of text, evaluate
sarcasm, irony, and figurative language, and detect emotions. The most frequent
data sets for sentiment analysis include reviews, comments, ratings, and feedback.
It determines whether a phrase or text is positive, negative, or neutral. Contests
on the topic focus on judging complicated statements with uncertain contexts, such
as sarcasm, irony, and figurative language.

While others focused on polarity, progress was made on the subject,
delving into deeper themes such as emotion recognition, in which the emotion of
the text is recognized, whether it be anger, disgust, fear, happiness, sadness,
positive surprise or negative surprise. The study has a total of 600 tweets gathered
and tested. The computed Precision, Recall, and F-Measure were compared with
each of the evaluation measures calculated based on the evaluation of the three
experts. Also, the researchers showed the difference in results between
considering and disregarding punctuation marks.

The researchers were able to show the difference between the approach
of considering and disregarding punctuation marks in Sentiment Analysis. Based
on the results of the evaluation, their system was consistent when the classification
considered the ending punctuation marks. The F-Measures of 80.27% and 69.82%
for considering and disregarding the symbols, respectively, show that there's a
difference between the two classifications. Also, after obtaining a p-value that is
less than the alpha level used, it is confirmed that EMOSIS has a significant
difference in performance compared with existing systems.

Sentiment Analysis using LSTM-GRU Ensemble

In their recent study, Naila Aslam et al. (2022) state that
understanding public sentiment towards cryptocurrencies is crucial, given the
impact of public opinion on the market dynamics of these digital assets. This
perspective is supported by a study that conducted sentiment analysis and
emotion detection on tweets related to Cryptocurrency, a common method used to
predict cryptocurrency market prices. A key development in this field is the use of
advanced machine learning and deep learning approaches, including the LSTM-
GRU ensemble model. This model integrates the capabilities of two recurrent
neural networks, Long Short-Term Memory (LSTM) and Gated Recurrent Unit
(GRU). The GRU is trained on features extracted by the LSTM, thereby enhancing
the accuracy of the analysis. Various feature extraction methods, such as term
frequency-inverse document frequency, word2vec, and Bag of Words (BoW), have
been explored to improve the performance of these models. The study found that
machine learning models performed better when using BoW features.

In terms of emotion analysis, tools such as TextBlob and Text2Emotion
were employed. These tools revealed that happiness was the most expressed
emotion towards cryptocurrency use, followed by fear and surprise. The LSTM-
GRU ensemble model demonstrated superior performance compared to other
machine learning and deep learning models, achieving an accuracy of 0.99 for
sentiment analysis and 0.92 for emotion prediction. These findings indicate a
promising future for sentiment analysis in the cryptocurrency industry, especially
with the use of advanced deep-learning techniques. However, further research is
required to enhance the understanding of public sentiment and its impact on
cryptocurrency markets (Naila Aslam et al., 2022).

Sentiment Analysis using Recognition of Emotion from Microblogs (REM)


The study of J. Islam et al. (2020) discusses the Recognition of Emotion
from Microblogs (REM) by utilizing sentiment analysis. Microblogs, such as X
(formerly known as Twitter), are popular platforms for sharing opinions and
expressing emotions. Emoticons, which are graphical emotional icons, are widely
used in microblogs alongside texts. The proposed REM method in this study aims
to preserve the semantic relationship between texts and emoticons. Emoticons are
translated into relevant emotional words, and a Long Short-Term Memory (LSTM)
model is employed for emotion classification. LSTM is a type of recurrent neural
network that can effectively capture sequential or time-series information.

The study verifies the proposed REM method using X (formerly known as
Twitter) data and compares its recognition performances with existing methods
that only consider text expressions without emoticons. The results show that the
emoticon-based REM method achieves higher recognition accuracy, highlighting
its potential for applications in microblogs. By incorporating emoticons into the
analysis, the proposed REM method improves the understanding of emotions
expressed in microblog posts and enhances the accuracy of emotion classification.

Synthesis of the Study

Social media has a big impact on different parts of everyone's lives. It provides a lot
of information but understanding it can be difficult. One way to understand it better is by
using special tools. One tool is called sentiment analysis, which helps figure out the
emotions expressed in social media posts. The study "Sentiment Analysis on
Cryptocurrency-Related Tweets with Emotion and Intensity Level Recognition" looks at
something that previous sentiment analyses have often ignored. It focuses on the
importance of keywords, emoticons, and punctuation marks. Emoticons are like facial
expressions in text form, and they are important for showing emotions on social media.
Punctuation marks, on the other hand, help to analyze the intensity level of the
emotion.

The study specifically focuses on cryptocurrency-related tweets, a domain where
the combination of keywords, emoticons, and ending punctuation marks often intensifies the
conveyed sentiment. In this context, emoticons and punctuation marks are not merely
decorative elements but instrumental in expressing the intensity of emotion. The research
suggests a new method of sentiment analysis that takes the combination of keywords,
emoticons, and ending punctuation marks into account in recognition of the inherent
sentiment they convey. This approach not only identifies the sentiment of the tweet but
also measures the degree of emotion by considering the keywords, emoticons, and
punctuation marks used. The study concludes that including the combination of keywords,
ending punctuation marks, and emoticons in sentiment analysis makes it easier to
understand emotions and their intensity. This new approach is important for improving
sentiment analysis tools when dealing with the unique language of social media.

Chapter 3
METHODOLOGY

The objective of the study is to create a tool for sentiment analysis with emotion
and intensity level recognition in Cryptocurrency-related tweets considering the
combination of keywords, ending punctuation marks, and emojis. This chapter presents
the Research Design, Sources of Data, Development Process, Research Instrument, Data
Gathering Procedure, and Statistical Data Analysis.

Research Design

The researchers used experimental research as the study’s research design,
wherein the researchers investigated whether the combination of emojis, keywords, and
ending punctuation marks has a significant effect on recognizing polarity, emotion,
and intensity level. This design provides a systematic and controlled way
to investigate the effects of the combination of emojis, keywords, and ending punctuation
marks on the recognition of polarity, emotion, and intensity level. It follows a
scientific approach in which one set of variables is kept constant while the other
set is manipulated and measured as the subject of the experiment, to determine whether
there is a significant difference in the result.

The researchers utilized two designs, which involved comparing the outcomes of
the pretest and posttest designs. The pretest design acts as an initial evaluation to
establish a common level of proficiency in identifying polarity, emotion, and intensity. Its
purpose is to ensure that all data sets begin with a similar baseline. Following the pretest,
the data undergo training sessions where they are exposed to text-based stimuli that
include emojis, keywords, and ending punctuation marks. Following the training, a posttest
assessment is conducted to measure the extent to which the data's ability to recognize
polarity, emotion, and intensity has changed. By comparing the results of the pretest and
posttest, researchers can determine whether the inclusion of emojis and ending
punctuation marks has had a significant impact on the data's recognition capabilities. This
process allows for the evaluation of the effectiveness of incorporating emojis, keywords,
and ending punctuation marks in enhancing the data's recognition of polarity, emotion,
and intensity.

The researchers intend to merge a deep learning method, particularly the Long
Short-Term Memory (LSTM), with the Support Vector Machine (SVM) Classifier. The goal
is to develop a tool that could potentially provide higher performance in sentiment analysis
with emotion and intensity level recognition for cryptocurrency-related tweets considering
the combination of keywords, emojis, and punctuation marks. The performance of the
proposed tool in terms of Precision, Recall, and F-Measure was tested and compared
with its performance without considering the combination of keywords, ending
punctuation marks, and emojis.

Source of Data

X (formerly known as Twitter), a widely popular social networking platform
nowadays, often serves as a medium for expressing views on various matters. The rapid
expansion of social media platforms, particularly X (formerly known as Twitter), has
significantly impacted cryptocurrency transactions. Many traders utilize tweets as a basis
for their daily trading strategies, underlining the growing importance of sentiment analysis
in the cryptocurrency marketplace.

The data that is used in this study is exclusively sourced from the English tweet
stream on X (formerly known as Twitter). The primary source of data is X (formerly known
as Twitter) posts containing information related to cryptocurrencies and featuring the
hashtags "#crypto" and "#cryptocurrency". These tweets formed the target population for
the research, and they were selected as they met the requirements for the sample needed
for recognizing emotion and intensity levels. The proposed tool for emotion and intensity
level recognition focused on analyzing texts, phrases, and sentences, with particular
attention given to emojis that express emotional content.

In addition to the primary data from X (formerly known as Twitter) posts, the
secondary source of data involved analyzing the replies to these main posts, which
provided further insights into the sentiment and opinions expressed by users within the
cryptocurrency community. There were also three respondents in this experiment. The first
respondent is Mr. Soren Louis Anore, an expert in cryptocurrency trading with knowledge of
digital currencies, market trends, and investment strategies. The second respondent is Mr.
Kirck Michael Britos De Leon, a language practitioner skilled in areas like translation,
interpretation, and linguistics. The third respondent is Dr. Rodrigo V. Lopiga, a faculty
member at the Polytechnic University of the Philippines, Department of Psychology,
College of Social Sciences and Development, specializing in understanding human
behavior and emotions. The data annotated by the three experts undergoes a majority
voting process. If there is a discrepancy among the three experts, the Language
Practitioner Expert's judgment is selected for determining Polarity, and the Psychologist's
decision is used for assessing Emotion and Intensity Level. This approach is founded on
the research conducted by Nandwani, P., and Verma, R. in 2021, which focuses on the
analysis of sentiment and detection of emotions from text. This data is intended for
training, testing, and evaluation.
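
The annotation rule described above can be illustrated with a minimal Python sketch. It is not the researchers' actual code: the role keys (`crypto_trader`, `language_practitioner`, `psychologist`) and the function name `resolve_label` are hypothetical, and a "discrepancy" is assumed here to mean that no two experts gave the same label.

```python
from collections import Counter

# Which expert decides when there is no majority, per the rule in the text.
# The role names are hypothetical keys chosen for this illustration.
TIE_BREAKER = {
    "polarity": "language_practitioner",
    "emotion": "psychologist",
    "intensity": "psychologist",
}

def resolve_label(task, votes):
    """Return the final label for one tweet.

    task  -- "polarity", "emotion", or "intensity"
    votes -- dict mapping an expert role to that expert's label
    """
    label, count = Counter(votes.values()).most_common(1)[0]
    if count >= 2:
        # At least two of the three experts agree: majority vote wins.
        return label
    # Assumed meaning of "discrepancy": all three experts disagree.
    return votes[TIE_BREAKER[task]]

example = resolve_label(
    "polarity",
    {"crypto_trader": "positive",
     "language_practitioner": "negative",
     "psychologist": "positive"},
)
```

In this example two experts agree, so the majority label is kept and the tie-breaking expert is not consulted.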

In this study, the researchers applied a combination of quota and purposive
sampling strategies to gather the necessary sample. Quota sampling confines the sample
size to 1500 tweets, whereas purposive sampling selectively chooses samples that meet
set requirements. Given the study's focus on a specific quantity of tweets from a
designated area of interest, the sampling methods can be classified as both quota and
purposive sampling. The chosen tweets for sampling must be related to cryptocurrencies
and should include a combination of keywords, emojis, and punctuation marks.

Instruments
System Architecture

Figure 9. System Architecture of the Tool

Figure 9 shows two parts. The initial component involves feeding the
cryptocurrency-related tweets into the pre-processing stage. The researchers used
the Natural Language Toolkit (NLTK), a Python library, to construct various aspects
of the pre-processing modules, together with essential libraries such as ‘re’ for
regular expressions, ‘pandas’ for data manipulation, and ‘numpy’ for numerical
operations. These libraries collectively form the foundation of the pre-processing
phase, enabling efficient and effective text analysis and manipulation.

Tweets
During the initial phase, proponents gather X (formerly known as Twitter)
cryptocurrency-related tweets as part of the data collection process.

Tweets Filtering
In tweet filtering, the tweets are sorted to keep those associated with
Cryptocurrency. Hashtag matching is performed in a case-insensitive manner. This
means that capitalization variations in hashtags, such as #Cryptocurrency, are
treated as the same hashtag during the filtering process.
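
A minimal sketch of case-insensitive hashtag matching is shown below. It uses only the Python standard library, and the helper names `extract_hashtags` and `is_crypto_tweet` are hypothetical, not names from the tool itself. The two target hashtags are the ones named in this study.

```python
import re

# Hashtags used by the study to select tweets, stored in lowercase.
TARGET_HASHTAGS = {"#crypto", "#cryptocurrency"}

def extract_hashtags(tweet):
    """Return the set of hashtags in a tweet, lowercased."""
    return {tag.lower() for tag in re.findall(r"#\w+", tweet)}

def is_crypto_tweet(tweet):
    """True if the tweet carries at least one target hashtag, ignoring case."""
    return bool(extract_hashtags(tweet) & TARGET_HASHTAGS)

tweets = [
    "Bitcoin is going to the moon! #Cryptocurrency",
    "Lovely weather today #sunny",
    "Dumped everything... #CRYPTO",
]
filtered = [t for t in tweets if is_crypto_tweet(t)]
```

Because whole hashtags are compared rather than substrings, a tag such as #cryptography is not mistaken for #crypto.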

Pre-processing Phase
• Remove numbers.
The first process involves removing any numerical digits from the text data.
Numerical digits do not contribute to the sentiment of the text.
• Remove links, usernames, mentions, and hashtags.
After removing the numbers, the tool removes the links, usernames, mentions, and
hashtags, as they do not contribute much to the sentiment expressed in the
text. These may include additional information, but they do not reflect the
sentiment of the text.
• Spell corrections
After removing the links and hashtags, spelling correction is performed to
fix any misspelled words in the text data, since incorrectly spelled words can
affect the accuracy of the sentiment analysis.
• Emoticon converter
If the user chooses the “Combination of Keywords, Ending Punctuation
Marks, and Emoticon Features”, the tool converts the emojis/emoticons to their
textual representation, allowing the model used in sentiment analysis to
understand the sentiment expressed more accurately.
• Remove Emoticons and Punctuation Marks
If the user chooses the “Plain Text Only Features”, the tool cleans the text by
taking out emoticons and punctuation marks that are not needed for analysis.
• Removing Stop Words
After converting the emojis to their textual representation, the tool
removes the stop words to give more focus to the important information.
These are the words that are not significant in sentiment analysis in a
specific context, such as "in", "at", "on", "a", "an", and "the".
• Convert to lowercase.
Converting the text to lowercase ensures that the algorithm used in
sentiment analysis treats words with different cases as the same word. This
helps to avoid the duplication of words and capture the exact sentiment used.
• Tokenization
This process splits a text document into tokens by dividing
the text into units such as words, phrases, or characters. This process is
important because some words carry specific implications
in Cryptocurrency, such as "pump", "dump", "bullish", and "moon".
• Lemmatization
After every phrase, paragraph, and sentence is split into smaller units,
the words are reduced to their base form, considering their dictionary meaning and
grammatical context. For example, "buying", "bought", and "buys"
would be reduced to "buy".
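
The pre-processing steps above can be sketched in a few lines of Python. This is an illustration only, written with the standard library rather than NLTK so that it runs without downloads. The lookup tables `EMOJI_TO_TEXT`, `STOP_WORDS`, and `LEMMAS` are tiny hypothetical stand-ins for a real emoji dictionary, stop word list, and lemmatizer, and spelling correction is left out. Links are removed before digits here so that numbers inside a URL do not leave fragments behind, which differs slightly from the order listed above.

```python
import re

# Toy lookup tables, assumed for illustration only.
EMOJI_TO_TEXT = {"🚀": "rocket", "😭": "crying", "😊": "smiling"}
STOP_WORDS = {"in", "at", "on", "a", "an", "the"}
LEMMAS = {"buying": "buy", "bought": "buy", "buys": "buy"}

def preprocess(tweet, use_features=True):
    """Return the list of tokens for one tweet.

    use_features=True  keeps emojis (as words) and the marks . ! ?
    use_features=False mimics the plain-text-only setting.
    """
    text = re.sub(r"https?://\S+", " ", tweet)  # links
    text = re.sub(r"[@#]\w+", " ", text)        # usernames, mentions, hashtags
    text = re.sub(r"\d+", " ", text)            # numbers
    if use_features:
        # Emoticon converter: replace each known emoji with a word.
        for emoji, word in EMOJI_TO_TEXT.items():
            text = text.replace(emoji, f" {word} ")
        pattern = r"\w+|[.!?]"   # words plus punctuation marks as tokens
    else:
        pattern = r"\w+"         # words only; emojis and punctuation dropped
    tokens = re.findall(pattern, text.lower())           # lowercase + tokenize
    tokens = [t for t in tokens if t not in STOP_WORDS]  # stop word removal
    return [LEMMAS.get(t, t) for t in tokens]            # toy lemmatization

tweet = "@trader I bought 2 BTC at the dip!!! 🚀 https://t.co/x1 #crypto"
with_features = preprocess(tweet, use_features=True)
plain_text = preprocess(tweet, use_features=False)
```

The same tweet yields a longer token list in the feature setting because the exclamation marks and the converted emoji are kept, which is the information the intensity rules rely on.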

Sentiment Analysis Phase


• Database
EmCrypt Database is the storage system where the pre-processed data is
stored. The EmCrypt database allows us to organize and access the
processed text data efficiently.
• Classifier model
A classifier model is trained to classify text into different categories of
sentiments, such as positive and negative. It also classifies the emotion of
the text, such as happiness, sadness, anticipation, fear, anger, and surprise. The
proponents used the Keras and Scikit-Learn libraries for the Classifier.
• Polarity recognition
Polarity recognition determines the polarity of the text, such as positive,
negative, or neutral. This helps to understand the overall sentiment
conveyed by a piece of text. The Classifier that we will be using is the
LSTM-SVM Classifier.
• Emotion recognition
After the polarity recognition process, emotion recognition aims to identify
the specific emotion expressed in the processed text, such as happiness,
sadness, anger, surprise, etc. Just like in polarity recognition, we will be
using the LSTM-SVM Classifier in emotion recognition.
• Intensity Recognition
The intensity recognition process determines the intensity of the sentiment
expressed in the text. This can be performed using techniques such as
sentiment score aggregation, where the sentiment scores are assigned
differently to individual words or phrases in a text.

Sentiment Analysis
Lastly, the whole process, which includes the filtering of tweets, the
pre-processing stage, and the sentiment analysis stage, results in the final sentiment
analysis of the text. The result of these processes can provide insights that allow
individuals to make decisions based on public opinion. The result can also be viewed
using charts for Polarity, Emotion, and Intensity Level. The Intensity Level
Recognition used the rules presented in Table 4 for considering words, based on
the study of Das & Bandyopadhyay (2010), and Table 5 for considering ending
punctuation marks and emoticons. The output was the emotion and intensity level
detected based on the process performed.

Table 5

Rules for Tagging an Emotional Sentence with Intensity considering Emotion
Weights and Ending Punctuation Marks

Emotion       Intensity  Rule
Happiness     Low        Question Mark (== 0) AND Period (== 0) AND Exclamation Mark (== 1) AND Emotion Weight (< 0.5)
Happiness     Medium     Period (== 1) AND Question Mark (== 0) AND Emotion Weight (> 0.5)
Happiness     High       Exclamation Mark (>= 1) AND Emoticon Weight (> 1.0)
Sadness       Low        Question Mark (== 0) AND Period (== 0) AND Exclamation Mark (== 1) AND Emotion Weight (< 0.5)
Sadness       Medium     Period (== 1) AND Question Mark (== 1) AND Emotion Weight (> 0.5)
Sadness       High       Exclamation Mark (>= 1) AND Question Mark (> 1) AND Emotion Weight (> 1.0)
Surprise      Low        Question Mark (== 0) AND Period (== 0) AND Exclamation Mark (== 1) AND Emotion Weight (< 0.5)
Surprise      Medium     Period (== 1) AND Question Mark (== 1) AND Exclamation Mark (== 1) AND Emotion Weight (> 0.5)
Surprise      High       Exclamation Mark (>= 1) AND Question Mark (> 1) AND Emotion Weight (> 1.0)
Anger         Low        Question Mark (== 0) AND Period (== 0) AND Exclamation Mark (== 1) AND Emotion Weight (< 0.5)
Anger         Medium     Period (== 1) AND Question Mark (== 0) AND Emoticon Weight (> 0.5)
Anger         High       Exclamation Mark (>= 1) AND Emotion Weight (> 1.0)
Anticipation  Low        Question Mark (== 0) AND Period (== 0) AND Exclamation Mark (== 1) AND Emotion Weight (< 0.5)
Anticipation  Medium     Period (== 1) AND Question Mark (== 1) AND Emotion Weight (> 0.5)
Anticipation  High       Exclamation Mark (>= 1) AND Question Mark (> 1) AND Emotion Weight (> 1.0)
Fear          Low        Question Mark (== 0) AND Period (== 0) AND Exclamation Mark (== 1) AND Emotion Weight (< 0.5)
Fear          Medium     Period (== 1) AND Question Mark (== 1) AND Emotion Weight (> 0.5)
Fear          High       Exclamation Mark (>= 1) AND Question Mark (> 1) AND Emotion Weight (> 1.0)

In sentiment analysis, particularly with the use of Emotag120 annotated data for
emotion recognition, the Intensity level involves a combination of emotion weights and the
presence of ending punctuation marks. The researchers and language practitioner experts
have developed specific rules based on the study of Sagum, R., Navarro, M., & Victore,
A. (2019). They consider the emotional weights assigned to different emoticons in the
Emotag120 dataset. The intensity level of a sentence is determined not only by these
emotional weights but also by the type and number of ending punctuation marks. For
instance, a sentence might be classified as having medium or high emotional intensity
based on a combination of high emotional weight words or emoticons and the use of single
or multiple punctuation marks. This approach allows for a nuanced analysis of emotional
intensity, providing a more accurate reflection of the sentiment expressed in the tweets.
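
To make the rule-based tagging concrete, the sketch below applies the three Happiness rows of Table 5 in Python. It is an illustration under stated assumptions, not the tool's implementation: the function names are hypothetical, the weight in the High row is treated as the same emotion weight used in the other rows, the ending punctuation is assumed to be the last thing in the sentence, and the High rule is checked first so that it wins if two rules happen to match. Inputs that match no row return `None`, since the table does not cover every combination.

```python
import re

def count_ending_marks(sentence):
    """Count '.', '?' and '!' in the run of punctuation that ends the sentence."""
    match = re.search(r"[.!?]+\s*$", sentence)
    ending = match.group(0) if match else ""
    return {
        "period": ending.count("."),
        "question": ending.count("?"),
        "exclamation": ending.count("!"),
    }

def happiness_intensity(sentence, emotion_weight):
    """Tag a Happiness sentence as Low, Medium, or High; None if no rule matches."""
    marks = count_ending_marks(sentence)
    q, p, e = marks["question"], marks["period"], marks["exclamation"]
    # Assumption: High is tested first so it takes priority on overlap.
    if e >= 1 and emotion_weight > 1.0:
        return "High"
    if p == 1 and q == 0 and emotion_weight > 0.5:
        return "Medium"
    if q == 0 and p == 0 and e == 1 and emotion_weight < 0.5:
        return "Low"
    return None

high = happiness_intensity("Bitcoin is pumping!", 1.2)
medium = happiness_intensity("Nice little gain today.", 0.7)
low = happiness_intensity("Small win!", 0.3)
```

The other five emotions would follow the same pattern with their own rows of the table substituted for the three conditions.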

Figure 10. EmCrypt Training (Combination of Keywords, Ending Punctuation Marks, and
Emoticons)

Figure 11. EmCrypt Training (Plain Text Only)

The Classifier for Polarity and Emotion Recognition utilized by the researchers
combines LSTM with an SVM Classifier, necessitating training data. The training involves
two distinct processes: pre-processing and the training phase itself. Additionally, tweets
with the hashtags "#crypto" and "#cryptocurrency" were specifically selected as training
data to align the Classifier's focus with the study's subject matter. During training, data
underwent pre-processing and tweet-filtering phases. The tokens, once simplified, were
entered into the database, which then categorized them as positive or negative based on
expert assessments. The training method for the emotion classifier is similar, but it targets
emotions such as happiness, sadness, surprise, anger, anticipation, and fear.

Two approaches were used for training: plain text analysis and a combination of
keywords, punctuation marks, and emojis. This dual-method training is essential to
prepare the Classifier's knowledge base before classification begins. The purpose of using
two approaches for training - plain text analysis and a combination of keywords,
punctuation marks, and emojis - is to enhance the Classifier's ability to accurately
recognize and categorize emotions and polarity (positive or negative sentiments) in text
data, particularly tweets in this context.

Figure 12. Sequence Diagram of the Tool

Figure 13. Pre-processing Diagram

The figure presents the development of the system of EmCrypt. A user searched
for tweets related to cryptocurrency on X (formerly known as Twitter). The user used the
advanced search feature in X with the hashtags #crypto and #cryptocurrency. The collected tweets
then undergo preprocessing, a series of text preparation steps. This includes cleaning the
tweets by removing unwanted characters, converting emoticons to text, eliminating
emoticons and associated punctuation, getting rid of common stopwords, and breaking
the text into individual tokens. Furthermore, a lemmatization process is applied to
standardize words.
The preprocessed tweet data is stored in a database for future reference.
Subsequently, a sentiment analysis stage is executed, involving feature extraction and
classification using a combination of LSTM and SVM algorithms. The tool then saves the
results of polarity recognition, identifying whether tweets are positive, negative, or neutral,
as well as emotion recognition and intensity level recognition. Finally, these processed
and analyzed results are presented to the user through a user interface for their review
and interaction. This entire process aims to provide insights into the sentiment and
emotions expressed in cryptocurrency-related tweets.

In the preprocessing stage, the gathered tweets undergo data cleaning, which
involves eliminating numbers, hyperlinks, usernames, mentions, and hashtags, as well as
correcting spelling errors. Following this, the procedure moves to feature selection, which
encompasses converting emoticons and removing both emoticons and punctuation
marks. The next step is tokenization, during which the process entails removing redundant
characters, eliminating stopwords, converting text to lowercase, and employing a
tokenizer. Subsequently, lemmatization is applied. This series of steps completes the
preprocessing of cryptocurrency-related tweets, preparing them for storage in the
database.

Development Details

Python was used as the tool's programming language, MySQL as the database,
and Visual Studio Code as the programming system. The researchers accessed the
tweets using the X (formerly known as Twitter) user account and manually gathered the
data using X (formerly known as Twitter) advanced search options. The tool's design
serves as an example of development phase planning. The testing and observation of the
tool's behavior during the development period were crucial for understanding its potential.
The researchers tested and debugged each tool component after it was
completed to determine the prerequisites for the subsequent component. The
development process was completed by repeating these phases. The tool was approved
for implementation to address the issue in the study's domain because it met the
requirements and achieved the study's objective.

Research Instrument

The study utilized the experiment paper to determine if the tool output, which is
sentiment analysis of cryptocurrency-related tweets with polarity, emotion and intensity
level recognition, will match the expected output of the tool.

Experiment paper

The first three tables show the performance of the system in polarity, emotion, and
intensity level recognition considering the combination of keywords, emojis, and
ending punctuation marks in terms of Precision, Recall, and F-Measure. The next three
tables show the performance of the system in polarity, emotion, and intensity level
recognition using plain text only.

Data Generation/ Gathering Procedure

The researchers collected the data on the X (formerly known as Twitter) platform since
it is known for highlighting trends in cryptocurrency-related tweets, various news, and
current events. The study was conducted according to the following steps:

● Pre-Experimentation- The proponents gathered tweets conveying opinions about
cryptocurrency. Using a user account on the X (formerly known as Twitter) platform,
the proponents accessed X's advanced search functionality and defined search
parameters with hashtags related to cryptocurrencies.
1. The researchers gathered a total of 1,500 tweets. Of these, 900 were used
for training, 450 for testing, and 150 for evaluation. The programming
language that the researchers used is Python. Search parameters were
defined by specifying relevant hashtags, a three-week timeframe, and
search terms associated with cryptocurrencies, enabling the filtering of
tweets. The tweets were filtered using X's advanced search options, and
the data was collected and recorded in a Microsoft spreadsheet. A portion
of the data was manually labeled by the three experts using majority
voting for training, testing, and evaluation purposes.
● Experimentation- The data set was divided into 60% training, 30% testing, and 10%
evaluation.
1. The researchers developed the EmCrypt Analyzer, a sentiment analysis tool
that takes a cryptocurrency-related tweet as input and predicts its
sentiment label based on the patterns learned from the training data.
● Post Experimentation- The problems in the Statement of the Problem were answered
using the results of the experiment and the formulas for Precision, Recall, and
F-Measure.
1. The proponents used the data that was manually gathered from X (formerly
known as Twitter) and annotated by the expert respondents. After developing
the sentiment analysis tool, the researchers proceeded to evaluate its
performance. The evaluation process involved assessing how well the tool
classifies the sentiment expressed in cryptocurrency-related tweets when
considering the combination of keywords, emoji, and ending punctuation marks
compared to using plain text only.
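
The majority voting mentioned in the Pre-Experimentation step can be illustrated with a
short Python sketch. The function name and the tie-handling rule (returning None when no
label has a strict majority) are assumptions made for illustration, since the text does
not state how disagreements among the three experts were resolved.

```python
from collections import Counter


def majority_label(labels):
    """Return the label chosen by more than half of the annotators, or None if there is none."""
    label, votes = Counter(labels).most_common(1)[0]
    return label if votes > len(labels) / 2 else None
```

With three annotators, `majority_label(["happy", "happy", "surprise"])` returns `"happy"`,
while three different labels return `None` and would need a separate resolution rule.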

Ethical Considerations

The sentiment analysis study is conducted with full adherence to research ethics,
including obtaining informed consent from the university administration, course
instructors, and the experts involved, specifically the cryptocurrency trader, language
practitioner, and psychologist. Ethical considerations will prioritize privacy,
confidentiality, fairness, and equity. Measures will be taken to protect sensitive
information, anonymize data, and address biases. Transparent communication will be
maintained regarding sentiment analysis methods, limitations, and uncertainties.
Approval from an ethics committee will be sought, and data protection protocols will be
followed. The study will be monitored and evaluated to ensure its ethical compliance, as
it aims to contribute valuable insight to sentiment analysis while maintaining the utmost
ethical standards.

Statistical Data Analysis


Table 6
Sample Confusion Matrix

                         Predicted Emotion Category
Actual Emotion
Category        Happy  Sad  Surprise  Anger  Anticipation  Fear
Happy           TP
Sad                    TP
Surprise                    TP
Anger                                 TP
Anticipation                                 TP
Fear                                                       TP

In a multi-class classification problem with labels such as happy, sad, surprise, anger,
anticipation, and fear, the Confusion Matrix is an essential tool. Calculating recall and
precision for binary classification is straightforward, but it can be more complex for
multi-class problems. This complexity can be managed by maintaining accurate counts of
true positives (TP), false positives (FP), false negatives (FN), and true negatives (TN)
for each label. In the matrix, the diagonal values represent the TPs for each class. The
sum of the values in a row, excluding the diagonal, gives the FNs for that class, while
the sum of the values in a column, excluding the diagonal, gives the FPs for that class.
The TN for each class is calculated as the sum of all values in the matrix excluding the
row and column of that class. With these values, the proponents can accurately calculate
recall and precision for each class, providing a comprehensive understanding of the
classifier's performance in a multi-class setting.
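
The row and column rules above can be sketched in Python as follows. The function name
and the three-class matrix are hypothetical, and the counts are invented purely for
illustration.

```python
def per_class_counts(matrix, labels):
    """Derive TP, FP, FN, and TN for every class from a square confusion matrix.

    matrix[i][j] is the number of items whose actual class is labels[i]
    and whose predicted class is labels[j].
    """
    total = sum(sum(row) for row in matrix)
    counts = {}
    for k, label in enumerate(labels):
        tp = matrix[k][k]                        # diagonal entry
        fn = sum(matrix[k]) - tp                 # rest of the row
        fp = sum(row[k] for row in matrix) - tp  # rest of the column
        tn = total - tp - fn - fp                # everything outside row k and column k
        counts[label] = {"TP": tp, "FP": fp, "FN": fn, "TN": tn}
    return counts


# Hypothetical three-class example (counts invented for illustration).
labels = ["happy", "sad", "fear"]
matrix = [
    [8, 1, 1],  # actual happy
    [2, 6, 2],  # actual sad
    [0, 1, 9],  # actual fear
]
counts = per_class_counts(matrix, labels)
```

For the 'happy' class in this example, the sketch gives TP = 8, FN = 2, FP = 2, and TN = 18.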

Evaluation Metrics
Evaluation Parameters (Agarwal & Mittal, 2019)
• True Positive (TP): This is when the tool accurately predicts the positive class,
and the actual class is indeed positive. For instance, if the tool predicts 'happy' and
the actual emotion is 'happy', it's a TP.
• True Negative (TN): TN occurs when the tool correctly identifies the negative
class. In a multi-class setting, this means for a specific class (say, 'happy'), all other
classes ('sad', 'surprise', etc.) are correctly identified as not being 'happy'.
• False Positive (FP): This happens when the tool incorrectly predicts the positive
class. For example, if the tool predicts 'happy' when the actual emotion is 'angry',
it's an FP for 'happy'.
• False Negative (FN): FN takes place when the tool incorrectly predicts the
negative class. Using the same example, if the tool predicts 'angry' (negative for
'happy') when the actual emotion is 'happy', it's an FN for 'happy'.

Precision
Given all the instances predicted as a given class X, precision is the proportion
that were correctly predicted. It is measured by the number of True Positives
divided by the total number of True Positives and False Positives.

Precision = TP / (TP + FP)

Where:
True Positive- the tool correctly predicts the positive sentiment for
cryptocurrency-related tweets.
False Positive- the tool incorrectly predicts positive sentiment for tweets
that should have been classified as negative.

Recall
Given all the instances that should have a label X, recall is the proportion
that were correctly captured. It is measured by the number of True Positives
divided by the total number of True Positives and False Negatives.

Recall = TP / (TP + FN)

Where:
True Positive- the tool correctly predicts the positive sentiment for
cryptocurrency-related tweets.
False Negative- the tool incorrectly predicts negative sentiment for
cryptocurrency-related tweets that should have been classified as
positive.

F-measure
F-measure is the harmonic mean of Precision and Recall. Multiply the
product of Precision and Recall by two and divide it by the sum of
Precision and Recall.

F-measure = (2 × Precision × Recall) / (Precision + Recall)
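
As an illustration of the three definitions above, the following Python sketch computes
them from raw counts. The function names are hypothetical, and the counts (TP = 87,
FP = 6, FN = 8) are an assumption chosen so that the result matches the Positive row
reported in Table 10; they are not taken from the source.

```python
def precision(tp, fp):
    """TP divided by all instances predicted as the class."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    """TP divided by all instances that actually belong to the class."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def f_measure(p, r):
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r) if (p + r) else 0.0


# Hypothetical counts for a single class (assumed for illustration).
tp, fp, fn = 87, 6, 8
p = precision(tp, fp)
r = recall(tp, fn)
f = f_measure(p, r)
print(f"{p:.2%} {r:.2%} {f:.2%}")  # 93.55% 91.58% 92.55%
```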

Hypothesis Testing
To measure the significant difference in sentiment analysis performance
between using a combination of keywords, ending punctuation marks, and emojis
compared to using plain text only for analyzing sentiments in
cryptocurrency-related tweets, the researchers utilized the Paired T-test.

Paired T-test
The paired t-test assesses whether the mean performance of the tool using plain
text only and that of the proposed tool are statistically different from each
other.

Where:
d: difference per paired value
n: number of samples
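
For reference, one standard way of writing the paired t statistic in terms of d and n is
given below. The symbols for the mean difference and the standard deviation of the
differences are notation assumed here, and the statistic is evaluated with n - 1 degrees
of freedom.

```latex
t = \frac{\bar{d}}{s_d / \sqrt{n}}, \qquad
\bar{d} = \frac{1}{n}\sum_{i=1}^{n} d_i, \qquad
s_d = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(d_i - \bar{d}\right)^2}
```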

Table 7

Sample Summary of Overall Performance

Tweets F-measure
Polarity Recognition
Emotion Recognition
Intensity Level Recognition
Overall Performance

Rating System
To assess the tool's overall recognition performance, it is necessary to
interpret its computed values. Table 8 presents the rating system for the
parameters Precision, Recall, and F-measure.
Table 8

Rating System for the Parameters: Precision, Recall, and F-Measure (Eboña, 2013)

Computed Value Verbal Interpretation


97%-100% Excellent
93%-96.99% Superior
89%-92.99% Very Good
85%-88.99% Good
80%-84.99% Satisfactory
75%-79.99% Fair
70%-74.99% Pass
Below 70% Fail
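
The rating system in Table 8 can be expressed as a simple lookup. The sketch below is an
illustration, not the tool's code: the function and constant names are hypothetical, and
treating each band as "at or above its lower bound" is an assumption that also covers
values falling between the printed bands (for example, 96.995%).

```python
# Lower bounds taken from Table 8, highest first.
RATING_BANDS = [
    (97.0, "Excellent"),
    (93.0, "Superior"),
    (89.0, "Very Good"),
    (85.0, "Good"),
    (80.0, "Satisfactory"),
    (75.0, "Fair"),
    (70.0, "Pass"),
]


def verbal_interpretation(score_percent):
    """Map a Precision, Recall, or F-measure percentage to its verbal interpretation."""
    for lower_bound, rating in RATING_BANDS:
        if score_percent >= lower_bound:
            return rating
    return "Fail"
```

For instance, `verbal_interpretation(89.76)` returns `"Very Good"` and
`verbal_interpretation(78.87)` returns `"Fair"`.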

CHAPTER 4
RESULTS AND DISCUSSIONS

This chapter presents and interprets the findings of the data collected during the
implementation of the developed tool to address the problem in the study.

The purpose of the study, Sentiment Analysis on Cryptocurrency-related Tweets
with Emotion and Intensity Level Recognition, is to provide insights into the sentiments
surrounding cryptocurrencies on social media platforms, particularly X (formerly known as
Twitter), and to comprehend how the combination of keywords, emojis, and punctuation
marks has a significant effect on the performance of the tool in analyzing sentiments. The
data used in the study was divided into three phases of software development. The
division of the data per phase is shown below:

Table 9
Division of Cryptocurrency-related Tweets per phase

Phase Percentage No. of Cryptocurrency-related Tweets

Training Set 60% 900

Testing Set 30% 450

Evaluation Set 10% 150

Total 100% 1500
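
The 60/30/10 division in Table 9 can be reproduced with a short Python sketch. The
function name, the placeholder tweet list, and the shuffling with a fixed seed are
assumptions made for illustration; the text does not state whether the tweets were
shuffled before being divided.

```python
import random


def split_dataset(items, train=0.6, test=0.3, seed=42):
    """Shuffle items and split them into training, testing, and evaluation subsets."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train)
    n_test = round(len(shuffled) * test)
    return (
        shuffled[:n_train],
        shuffled[n_train:n_train + n_test],
        shuffled[n_train + n_test:],  # remainder is the evaluation set
    )


tweets = [f"tweet_{i}" for i in range(1500)]  # placeholder items, not real data
train_set, test_set, eval_set = split_dataset(tweets)
print(len(train_set), len(test_set), len(eval_set))  # 900 450 150
```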

The data used by the researchers was limited because of the restrictions that Elon Musk
imposed on X (formerly known as Twitter) regarding data scraping and the number of
tweets users can read per day (Quintet, 2023). The researchers conducted experiments to
explore and analyze the data and generate the following results, and presented tables
that outline the cumulative details of the experiment to provide a concise explanation
of the findings. The researchers used a combination of keywords, ending punctuation
marks, and emoticons in analyzing sentiments and recognizing their emotion and intensity
level to address the gaps in sentiment analysis of cryptocurrency-related tweets. The
researchers conducted tests to assess the tool’s performance in analyzing sentiment in
cryptocurrency-related tweets in terms of Precision, Recall, and F-measure based on TP
(True Positive: the tool accurately predicts the positive class, and the actual class is
indeed positive), FP (False Positive: the tool incorrectly predicts the positive class),
TN (True Negative: the tool correctly identifies the negative class), and FN (False
Negative: the tool incorrectly predicts the negative class). The researchers conducted
several assessments of the developed tool in classifying Polarity, Emotion, and Intensity
level to obtain its overall performance. The researchers also determined whether there is
a significant difference in sentiment analysis performance between using a combination
of keywords, ending punctuation marks, and emojis compared to using plain text only for
analyzing sentiments in cryptocurrency-related tweets.
Based on the problem presented in Chapter 1, below are the gathered data that
respond to the Statement of the Problem regarding the overall performance of the
developed tool that considered the combination of keywords, ending punctuation marks,
and emoticons compared to using plain text only in analyzing sentiment.

1. What is the performance of the tool in cryptocurrency-related tweets considering
the combination of keywords, ending punctuation marks, and emoticons in terms of:

a. Precision

b. Recall

c. F-Measure

The researchers tested the tool to calculate its performance on
cryptocurrency-related tweets considering the combination of keywords, ending
punctuation marks, and emoticons.
Table 10
Polarity results considering the combination of keywords, ending punctuation
marks, and emoticons.
Polarity Precision Recall F-Measure
Positive 93.55% 91.58% 92.55%
Negative 85.96% 89.09% 87.50%
Overall Polarity 89.76% 90.34% 90.03%
Verbal Interpretation Very Good Very Good Very Good

The table above shows the results for Polarity Recognition considering the
combination of Keywords, Ending Punctuation Marks, and Emoticons. The tool's approach,
which prioritizes detecting the polarity of a cryptocurrency-related tweet before
identifying its emotion and intensity level, reflects a layered understanding of
sentiment analysis. By first establishing whether a tweet is positive or negative, the
tool lays a foundational context for further emotional and intensity analysis. This
sentiment analysis method, based on the study of Kumar & Bhaskari (2018), emphasizes the
complexity of interpreting sentiments.
The verbal interpretation of this parameter is determined by the Eboña (2013) rating
system, and the table presents the Precision, Recall, and F-Measure for evaluating the
sentiment polarity of tweets related to cryptocurrencies. A total of 150
cryptocurrency-related tweets were tested for evaluation, with each tweet classified as
positive or negative. Based on the first statement of the problem stated in Chapter 1,
the precision result for polarity was 'very good,' with a score of 89.76%, while recall
also achieved a 'very good' rating at 90.34%. Additionally, the F-measure scored 'very
good' at 90.03%. These high-performance metrics not only illustrate the tool’s
effectiveness in accurately categorizing cryptocurrency-related tweet sentiments but
also reflect its reliability in the nuanced field of sentiment analysis.
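
The 'Overall Polarity' row in Table 10 is consistent with an unweighted (macro) average
of the per-class values. The text does not state the averaging method, so this is an
inference from the reported figures, and the names in the sketch below are hypothetical.

```python
def macro_average(values):
    """Unweighted mean of per-class scores."""
    return sum(values) / len(values)


# Per-class values and reported overall values from Table 10 (in percent).
per_class = {
    "Precision": [93.55, 85.96],
    "Recall": [91.58, 89.09],
    "F-Measure": [92.55, 87.50],
}
reported_overall = {"Precision": 89.76, "Recall": 90.34, "F-Measure": 90.03}

# True when the macro average agrees with the reported value up to rounding.
agrees = {
    metric: abs(macro_average(values) - reported_overall[metric]) <= 0.01
    for metric, values in per_class.items()
}
print(agrees)
```

A macro average gives every class the same weight regardless of how many tweets it
contains, which is worth keeping in mind when the classes are unevenly represented.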

Table 11
Emotion results considering the combination of keywords, ending punctuation
marks, and emoticons.
Emotion Precision Recall F-Measure
Happy 97.44% 90.48% 93.83%
Sad 81.48% 78.57% 80.00%
Surprise 71.43% 83.33% 76.92%
Anger 66.67% 85.71% 75.00%
Anticipation 80.00% 91.43% 94.12%
Fear 76.19% 80.00% 78.05%
Overall Emotion 78.87% 84.92% 82.99%
Verbal Interpretation Fair Satisfactory Satisfactory

The table above shows the results for Emotion Recognition considering the
combination of Keywords, Ending Punctuation Marks, and Emoticons. This approach by
Nourah & Mohamed (2020) highlights the complexity of emotional analysis in digital
communication, where simple textual elements like keywords and emoticons can reveal
deeper emotional undertones. After obtaining the sentiment value of each
cryptocurrency-related tweet, the tool progresses to the next step, which involves
identifying the specific emotion from these tweets. The limitation to six basic emotions
(happy, surprise, anticipation, anger, fear, and sad), as defined by Emotag120, reflects
a focused yet comprehensive range of human emotional responses in the context of
cryptocurrency discussions. Among the six basic emotions, the 'Surprise' emotion
category contributed to the lower performance, with an F-Measure of 76.92%. Additionally,
Emotag120 covers only a limited set of emoticons that the system can detect, which leads
to misclassifications. The verbal interpretation of this parameter is determined by the
Eboña (2013) rating system, and the table presents the Precision, Recall, and F-Measure
for evaluating the emotion of tweets related to cryptocurrencies. Based on the first
statement of the problem stated in Chapter 1, the emotion results were ‘fair’ in terms of
precision, with a score of 78.87%; recall achieved a ‘satisfactory’ rating at 84.92%; and
the F-measure scored ‘satisfactory’ at 82.99%. These metrics not only demonstrate the
tool’s effectiveness in emotion recognition but also underscore the nuanced challenge
of deciphering emotions in text-based communication.

Table 12
Intensity Level results considering the combination of keywords, ending
punctuation marks, and emoticons.
Intensity Level Precision Recall F-Measure
Low 87.50% 84.85% 86.15%
Medium 94.74% 87.10% 90.76%
High 85.25% 94.55% 89.66%
Overall Intensity Level 89.16% 88.83% 88.86%
Verbal Interpretation Very Good Good Good

The table above shows the results for Intensity Level Recognition with a
combination of Keywords, Ending Punctuation Marks, and Emoticons. After obtaining the
emotion value of each cryptocurrency-related tweet, the tool advances to the critical step
of identifying the intensity level of these emotions. The tool can detect three levels of
emotion intensity, which are low, medium, and high (Das & Bandyopadhyay, 2010). The
verbal interpretation of this parameter is determined by the Eboña (2013) rating system,
and the table presents the Precision, Recall, and F-Measure for evaluating the intensity
level of tweets related to cryptocurrencies. Based on the first statement of the problem
stated in Chapter 1, the intensity level results were ‘very good’ in terms of precision,
with a score of 89.16%; recall achieved a ‘good’ rating at 88.83%; and the F-Measure
scored ‘good’ at 88.86%. These results highlight the tool’s high performance and
reliability in the intricate task of intensity level detection in textual data.

Table 13
Summary Result of the performance of the tool considering the combination of
Keywords, Ending Punctuation Marks, and Emoticons
Precision Recall F-Measure
Polarity 89.76% 90.34% 90.03%
Emotion 78.87% 84.92% 82.99%
Intensity Level 89.16% 88.83% 88.86%
Overall Performance 85.93% 88.03% 87.29%
Verbal Interpretation Good Good Good

The table above presents the summary of the performance results of the tool,
considering the combination of keywords, ending punctuation marks, and emoticons. This
combination of elements suggests a nuanced approach to analyzing textual data, where
various types of inputs are considered to enhance the performance of the tool. The verbal
interpretation of the summary result is determined by the Eboña (2013) rating system, and
the table presents the Precision, Recall, and F-Measure for evaluating the polarity,
emotion, and intensity level of tweets related to cryptocurrencies. Based on the first
statement of the problem stated in Chapter 1, the overall performance of the tool
obtained a ‘good’ rating in precision at 85.93%, a ‘good’ rating in recall at 88.03%,
and a ‘good’ rating in F-measure at 87.29%. These results indicate that the tool is
effective, with a high level of performance in identifying and interpreting
cryptocurrency-related tweets, as reflected in the good precision and recall rates. The
balance between precision and recall, as demonstrated by the F-measure, highlights the
tool’s ability to process cryptocurrency-related tweets accurately and consistently.

2. What is the performance of the tool in cryptocurrency-related tweets using
Plain-text only in terms of:

a. Precision

b. Recall

c. F-Measure

The researchers tested the tool to calculate its performance on
cryptocurrency-related tweets using plain text only.

Table 14
Polarity results using Plain-Text only
Polarity Precision Recall F-Measure
Positive 88.42% 87.50% 87.96%
Negative 78.18% 79.63% 78.90%
Overall Polarity 83.30% 83.57% 83.43%
Verbal Interpretation Satisfactory Satisfactory Satisfactory

The table above shows the results for Polarity Recognition using plain text only.
The tool's approach, which prioritizes detecting the polarity of a cryptocurrency-related
tweet before identifying its emotion and intensity level, reflects a layered understanding
of sentiment analysis. By first establishing whether a tweet is positive or negative, the
tool lays a foundational context for further emotional and intensity analysis. This
sentiment analysis method, based on the study of Kumar & Bhaskari (2018), emphasizes the
complexity of interpreting sentiments. The verbal interpretation of this parameter is
determined by the Eboña (2013) rating system, and the table presents the Precision,
Recall, and F-Measure for evaluating the sentiment polarity of tweets related to
cryptocurrencies. A total of 150 cryptocurrency-related tweets were tested for
evaluation, with each tweet classified as positive or negative. Based on the second
statement of the problem stated in Chapter 1, the polarity results were ‘satisfactory’
in terms of precision, with a score of 83.30%; recall achieved a ‘satisfactory’ rating
at 83.57%; and the F-measure also obtained a ‘satisfactory’ rating at 83.43%. These
performance metrics indicate that the tool, when it exclusively processes plain-text
tweets, demonstrated a satisfactory performance in evaluating polarity.

Table 15
Emotion results using Plain-Text only
Emotion Precision Recall F-Measure
Happy 88.37% 86.36% 87.36%
Sad 75.00% 88.89% 81.36%
Surprise 78.57% 64.71% 70.97%
Anger 83.33% 71.43% 76.92%
Anticipation 81.08% 85.71% 83.33%
Fear 77.78% 70.00% 73.68%
Overall Emotion 80.69% 77.85% 78.94%
Verbal Interpretation Satisfactory Fair Fair

The table above shows the results for Emotion Recognition using plain text
only. This approach by Nourah & Mohamed (2020) highlights the complexity of emotional
analysis in digital communication, where simple textual elements like keywords and
emoticons can reveal deeper emotional undertones. After obtaining the sentiment value
of each cryptocurrency-related tweet, the tool progresses to the next step, which involves
identifying the specific emotion from these tweets. The limitation to six basic emotions
(happy, surprise, anticipation, anger, fear, and sad), as defined by Emotag120, reflects
a focused yet comprehensive range of human emotional responses in the context of
cryptocurrency discussions. Among the six basic emotions, the 'Surprise' emotion
category contributed to the lower performance, with an F-Measure of 70.97%, which
reflects misclassifications by the system. The verbal interpretation of this parameter is
determined by the Eboña (2013) rating system, and the table presents the Precision,
Recall, and F-Measure for evaluating the emotion of tweets related to cryptocurrencies.
Based on the second statement of the problem stated in Chapter 1, the emotion results
were ‘satisfactory’ in terms of precision, with a score of 80.69%; recall achieved a
‘fair’ rating at 77.85%; and the F-measure obtained a ‘fair’ rating at 78.94%. These
metrics suggest that the tool performed fairly in recognizing and analyzing emotions,
with slightly better performance in precision than in recall and F-measure.

Table 16
Intensity Level results using Plain-Text only
Intensity Level Precision Recall F-Measure
Low 88.46% 71.88% 79.31%
Medium 69.49% 83.67% 75.93%
High 76.92% 81.08% 78.95%
Overall Intensity 78.29% 78.88% 78.06%
Verbal Interpretation Fair Fair Fair

The table above shows the results for Intensity Level Recognition using plain
text only. After obtaining the emotion value of each cryptocurrency-related tweet, the tool
advances to the critical step of identifying the intensity level of these emotions. The
tool can detect three levels of emotion intensity, which are low, medium, and high (Das &
Bandyopadhyay, 2010). The verbal interpretation of this parameter is determined by the
Eboña (2013) rating system, and the table presents the Precision, Recall, and F-Measure
for evaluating the intensity level of tweets related to cryptocurrencies. Based on the
second statement of the problem stated in Chapter 1, the intensity level results were
‘fair’ in terms of precision, with a score of 78.29%; recall achieved a ‘fair’ rating at
78.88%; and the F-Measure obtained a ‘fair’ rating at 78.06%. This shows the tool’s fair
performance in intensity level detection in textual data when using plain text only.

Table 17
Summary Result of the performance of the tool using Plain-Text only.
Summary Precision Recall F-Measure
Polarity 83.30% 83.57% 83.43%
Emotion 80.69% 77.85% 78.94%
Intensity Level 78.29% 78.88% 78.06%
Overall Performance 80.76% 80.10% 80.14%
Verbal Interpretation Satisfactory Satisfactory Satisfactory

The table above presents the summary of the performance results of the tool using
plain text only. The verbal interpretation of the summary result is determined by the
Eboña (2013) rating system, and the table presents the Precision, Recall, and F-Measure
for evaluating the polarity, emotion, and intensity level of tweets related to
cryptocurrencies. Based on the second statement of the problem stated in Chapter 1, the
overall performance of the tool was ‘satisfactory’ in terms of precision, with a score of
80.76%; recall achieved a ‘satisfactory’ rating at 80.10%; and the F-measure obtained a
‘satisfactory’ rating at 80.14%. These findings suggest that there is room for
improvement in the tool's performance when it comes to identifying and interpreting
cryptocurrency-related tweets, as evidenced by the precision and recall rates. The
F-measure, which assesses the balance between precision and recall, shows that the tool
can process cryptocurrency-related tweets with reasonable precision and consistency;
however, there is a need for enhancement when using plain text only.

3. Is there a significant difference in sentiment analysis performance between using
a combination of keywords, ending punctuation marks, and emojis compared to
using plain text only for analyzing sentiments in cryptocurrency-related tweets?

The researchers calculated the overall performance of the tool on
cryptocurrency-related tweets considering the combination of keywords, ending
punctuation marks, and emoticons compared to using plain text only for analyzing
sentiments.

Table 18
Overall performance of the tool using combination of Keywords, Ending
Punctuation Marks, and Emoticons

Tweets F-Measure
Polarity 90.03%
Emotion 82.99%
Intensity Level 88.86%
Overall Performance 87.29%
Verbal Interpretation Good

Table 19

Overall performance of the tool using Plain-Text only.

Tweets F-Measure
Polarity 83.43%
Emotion 78.94%
Intensity Level 78.06%
Overall Performance 80.14%
Verbal Interpretation Satisfactory

Tables 18 and 19 show the comparative efficiency of the EmCrypt Sentiment
Analyzer in processing cryptocurrency-related tweets. The analyzer's performance when
utilizing a combination of keywords, ending punctuation marks, and emoticons is
contrasted against its performance using only plain text. The difference in the overall
F-Measure, a ‘Good’ rating with a score of 87.29% for considering the combination of the
three features compared to a ‘Satisfactory’ rating of 80.14% for using plain text only,
underlines the enhanced effectiveness of the combined method. This suggests that the
inclusion of the combination of keywords, ending punctuation marks, and emoticons offers
a more nuanced and accurate analysis of sentiment in cryptocurrency-related tweets than
using plain text alone. The results highlight the importance of considering the
combination of keywords, ending punctuation marks, and emoticons in sentiment analysis,
especially in the context of the often ambiguous and emotionally charged field of
cryptocurrency.

Figure 14. Overall results and comparison of the tool performance

Figure 14 shows the overall results and comparison between the EmCrypt
Analyzer that considers the combination of keywords, ending punctuation marks, and
emoticons and the analyzer using plain text only in analyzing sentiments in
cryptocurrency-related tweets. The tool that incorporates the combination of three
features outperformed the tool that relies solely on plain text in terms of overall
performance. There is a noticeable disparity in the overall results, with the combined
feature tool achieving a ‘Good’ rating, scoring 87.29%, as opposed to the ‘Satisfactory’
rating of 80.14% for the plain text-only approach. This underscores the improved
effectiveness of considering the combination of keywords, ending punctuation marks, and
emoticons in analyzing sentiments on cryptocurrency-related tweets.

Table 20

Paired T-Test Result of the tool

T-Test Value P Value Decision

Overall Performance -3.63 0.03 Reject Null Hypothesis

Table 20 presents the Paired T-Test results for the performance of the tool, with a
T-Test Value of -3.63 and a P Value of 0.03. The decision to reject the null hypothesis
is based on these results, particularly the P-Value being lower than the predetermined
significance level of 0.05. The default use of an alpha level of .05 is suboptimal for
two reasons. First, decisions based on data can be made more efficiently by choosing an
alpha level that minimizes the combined Type 1 and Type 2 error rate. Second, it is
possible that in studies with very high statistical power, p values lower than the alpha
level can be more likely when the null hypothesis is true than when the alternative
hypothesis is true (Maier and Lakens, 2022). The P-value is less than the 0.05
threshold; hence, the words are significant for sentiment classification (Mondal,
2016). This outcome implies that the differences observed in the tool’s performance are
statistically significant. It suggests that the conditions being tested have a real,
measurable impact on the tool’s performance. The rejection of the null hypothesis is
a critical finding, indicating that the factors under study do have a significant effect.
This provides evidence that the tool’s performance is influenced by the specific
methods or conditions being tested, reinforcing the importance of these factors in the
overall effectiveness of the sentiment analysis tool.
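
The T-Test Value in Table 20 can be reproduced from the three F-measure pairs in Tables
18 and 19, as sketched below in Python. The pairing of plain-text minus combined values
and the function names are assumptions made for illustration, and the closed-form
probability used here holds only for 2 degrees of freedom.

```python
import math
from statistics import mean, stdev

# F-measures from Tables 18 and 19 (polarity, emotion, intensity level), in percent.
combined = [90.03, 82.99, 88.86]
plain_text = [83.43, 78.94, 78.06]


def paired_t(a, b):
    """Return the paired t statistic for a - b and its degrees of freedom."""
    d = [x - y for x, y in zip(a, b)]
    n = len(d)
    return mean(d) / (stdev(d) / math.sqrt(n)), n - 1


def t_cdf_2dof(t):
    """Closed-form CDF of Student's t distribution for 2 degrees of freedom."""
    return 0.5 + t / (2 * math.sqrt(2 + t * t))


t_value, dof = paired_t(plain_text, combined)
one_tailed_p = t_cdf_2dof(t_value)  # probability of a t at least this negative
two_tailed_p = 2 * one_tailed_p
print(round(t_value, 2), dof, round(one_tailed_p, 3), round(two_tailed_p, 3))
```

Under these assumptions the statistic is about -3.63 with 2 degrees of freedom. The
one-tailed probability is about 0.034, which is consistent with the reported P Value of
0.03, while the corresponding two-tailed value is about 0.068. The text does not state
which convention was used, so this is an observation from the sketch rather than a
statement from the study.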

CHAPTER 5
SUMMARY OF FINDINGS, CONCLUSION AND RECOMMENDATION

This chapter presents the summary of findings and results of the assessment on
defining the performance of EmCrypt in Sentiment Analysis in Cryptocurrency-related
tweets with Emotion and Intensity Level Recognition considering the combination of
keywords, ending punctuation marks, and emoticons. Conclusions and
Recommendations were also included in this chapter.

Summary of Findings
From the evaluation and implementation of the study, EmCrypt: Sentiment
Analysis on Cryptocurrency-related Tweets with Emotion and Intensity Level Recognition
Considering the Combination of Keywords, Ending Punctuation Marks, and Emoticons, the
researchers arrived at the following findings:

Considering the combination of keywords, ending punctuation marks, and emoticons,
polarity evaluation achieved a ‘Very Good’ rating in precision with a score of 89.76%, a
‘Very Good’ rating in recall with a score of 90.34%, and a ‘Very Good’ rating in
F-measure with a score of 90.03%. Emotion recognition achieved a ‘Fair’ rating in
precision with a score of 78.87%, a ‘Satisfactory’ rating in recall with a score of
84.92%, and a ‘Satisfactory’ rating in F-measure with a score of 82.99%. Among the six
basic emotions, the 'Surprise' emotion category contributed to the lower performance,
with an F-Measure of 76.92%. In addition, Emotag120 covers only a limited set of
emoticons that the system can detect, which leads to misclassifications. For intensity
level assessment, precision reached a ‘Very Good’ rating with a score of 89.16%, recall
obtained a ‘Good’ rating with a score of 88.83%, and F-measure achieved a ‘Good’ rating
with a score of 88.86%. Overall, the tool that considers the combination of keywords,
ending punctuation marks, and emoticons obtained a ‘Good’ rating in precision at 85.93%,
a ‘Good’ rating in recall at 88.03%, and a ‘Good’ rating in F-measure at 87.29%.

Using plain text only, polarity evaluation achieved a ‘Satisfactory’ rating in
precision with a score of 83.30%, a ‘Satisfactory’ rating in recall with a score of
83.57%, and a ‘Satisfactory’ rating in F-measure with a score of 83.43%. Emotion
recognition obtained a ‘Satisfactory’ rating in precision with a score of 80.69%, a
‘Fair’ rating in recall with a score of 77.85%, and a ‘Fair’ rating in F-measure with a
score of 78.94%. Intensity level assessment achieved a ‘Fair’ rating in precision with a
score of 78.29%, a ‘Fair’ rating in recall with a score of 78.88%, and a ‘Fair’ rating in
F-measure with a score of 78.06%. Overall, the tool using plain text only obtained a
‘Satisfactory’ rating in precision at 80.76%, a ‘Satisfactory’ rating in recall at
80.10%, and a ‘Satisfactory’ rating in F-measure at 80.14%. The Paired T-test yielded a
T-value of -3.63 and a p-value of 0.03, which is less than the alpha level used,
resulting in the rejection of the null hypothesis.

Conclusions
Based on the findings of the study, the researchers have arrived at the following
conclusions:
1. The tool performed consistently when the classification considered the
keywords, ending punctuation marks, and emojis, compared to using plain
text only.
2. The F-measures for considering keywords, ending punctuation marks, and
emojis and for plain text only show that there is a difference between the
two classifications.
3. After obtaining a p-value that is less than the alpha level used, it is
confirmed that EMCRYPT, which considers the combination of keywords,
ending punctuation marks, and emoticons, differs significantly in
performance from using plain text only. The study therefore rejected the
null hypothesis.
4. Following the Rating System of the performance of the tool by Eboña
(2013), the interpretation for the performance tool is “Good” considering the
combination of keywords, ending punctuation marks, and emoticons.
5. The Sentiment Analyzer successfully achieved its objective of addressing
key challenges in Sentiment Analysis in cryptocurrency-related tweets. It
provided more precise interpretations of the emotions conveyed in tweets.
However, certain emotions were inaccurately labeled by the tool due to
limited training data and the tool’s limited ability to detect different kind of
emoticons, which led to misclassifications.
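
The paired comparison behind conclusion 3 can be illustrated with a short Python sketch. This section does not state which values were paired, so the sketch assumes that the three per-task F-measures (polarity, emotion, intensity level) of the two settings were compared. Only the plain-text values and the intensity-level value of 88.86 are quoted in the findings; the other two combination values are recomputed here from the cell counts of the Appendix 5 confusion matrices and should be treated as assumptions. The function name paired_t_statistic is hypothetical. Under these assumptions the statistic comes out close to the reported t-value of -3.63; the p-value would additionally require the t-distribution with two degrees of freedom and is not computed here.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t_statistic(sample_a, sample_b):
    """Return the paired t statistic and degrees of freedom for two matched samples."""
    if len(sample_a) != len(sample_b) or len(sample_a) < 2:
        raise ValueError("samples must be paired and contain at least two observations")
    differences = [a - b for a, b in zip(sample_a, sample_b)]
    standard_error = stdev(differences) / sqrt(len(differences))
    return mean(differences) / standard_error, len(differences) - 1


# F-measures (percent) per task, in the order: polarity, emotion, intensity level.
# Plain-text values are quoted from the findings reported earlier in this chapter.
plain_text_f = [83.43, 78.94, 78.06]
# ASSUMPTION: only 88.86 is quoted in this section; 90.03 and 82.99 are recomputed
# from the Appendix 5 confusion matrices and are not figures quoted by the authors.
combination_f = [90.03, 82.99, 88.86]

t_value, degrees_of_freedom = paired_t_statistic(plain_text_f, combination_f)
print(f"t = {t_value:.2f}, df = {degrees_of_freedom}")
```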

Recommendations
To improve the performance of the tool, the researchers suggest the following for
future work and development:
1. For improved surprise recognition, it is advised to include both positive and
negative surprises. Adding these types of surprises enhances the tool's
ability to better differentiate between various degrees or types of surprise,
ultimately leading to improved overall precision in identifying surprise
emotions.
2. Considering the limitations imposed on data scraping by X (formerly known
as Twitter), it is advisable to explore alternative platforms for gathering
data, which can provide more accessible and reliable sources of data.
3. It is recommended to increase the training data for a specific corpus. This
will increase the knowledge base of the classifier and could lead to better
classification of polarity and emotion.
4. The researchers recommend increasing the number of emoticons/emojis
that the system can detect in order to increase the performance of the
system.

References

Alswaidan, N., & Menai, M. E. B. (2020). A survey of state-of-the-art approaches for
emotion recognition in text. Knowledge and Information Systems, 62(8), 2937–2987.
https://doi.org/10.1007/s10115-020-01449-0

Ariella, S. (2023). 30 Striking Cryptocurrency Statistics [2023]: Market Value, Bitcoin
Usage, and Trends. Zippia. https://www.zippia.com/advice/cryptocurrency-statistics/

Aspect Based Emotion Analysis on Online User-Generated Reviews. (2018, July 1).
IEEE Conference Publication | IEEE Xplore.
https://ieeexplore.ieee.org/document/8494183

B, H. G., & B, S. N. (2023). Cryptocurrency Price Prediction using Twitter Sentiment
Analysis. https://doi.org/10.5121/csit.2023.130302

Bharti, S. K., Varadhaganapathy, S., Gupta, R., Shukla, P., Bouye, M., Hinga, S. K., &
Mahmoud, A. (2022). Text-Based Emotion Recognition Using Deep Learning
Approach. Computational Intelligence and Neuroscience, 2022, 1–8.
https://doi.org/10.1155/2022/2645381

Burton, N. B., [Neel Burton, M.D.]. (n.d.). What Are Basic Emotions? Neel Burton M.D.
https://www.psychologytoday.com/intl/blog/hide-and-seek/201601/what-are-basic-emotions

Chen, B. (2023a, June 22). Emojis Aid Social Media Sentiment Analysis: Stop Cleaning
Them Out! Medium.
https://towardsdatascience.com/emojis-aid-social-media-sentiment-analysis-stop-cleaning-them-out-bb32a1e5fc8e#:~:text=Leverage%20emojis%20in%20social%20media%20sentiment%20analysis%20to%20improve%20accuracy.&text=TL%3BDR%3A,incorporate%20emojis%20in%20the%20loop

Chen, B. (2023b, June 22). Emojis Aid Social Media Sentiment Analysis: Stop Cleaning
Them Out! Medium.
https://towardsdatascience.com/emojis-aid-social-media-sentiment-analysis-stop-cleaning-them-out-bb32a1e5fc8e

Cimino, A. N., & Dell’Orletta, F. (2016a). Tandem LSTM-SVM Approach for Sentiment
Analysis. In Accademia University Press eBooks (pp. 172–177).
https://doi.org/10.4000/books.aaccademia.2003

Cimino, A. N., & Dell’Orletta, F. (2016b). Tandem LSTM-SVM Approach for Sentiment
Analysis. In Accademia University Press eBooks (pp. 172–177).
https://doi.org/10.4000/books.aaccademia.2003

Craiker, K. N. (n.d.). What Punctuation Mark Is Used to Express Strong Emotions?
https://prowritingaid.com/punctuation-mark-express-strong-emotions

Dandannavar, P., Mangalwede, S. R., & Deshpande, S. M. (2019). Emoticons and Their
Effects on Sentiment Analysis of Twitter Data. In EAI/Springer Innovations in
Communication and Computing (pp. 191–201). Springer International Publishing.
https://doi.org/10.1007/978-3-030-19562-5_19

Dang, N. C., Moreno, M. N., & De La Prieta, F. (2020). Sentiment Analysis Based on
Deep Learning: A Comparative Study. Electronics, 9(3), 483.
https://doi.org/10.3390/electronics9030483

Das, & Bandyopadhyay. (2010). Identifying Emotional Expressions, Intensities and
Sentence level Emotion Tags using a Supervised Framework. Retrieved
November 27, 2023, from https://aclanthology.org/Y10-1013.pdf

Dwivedi, D. N., & Vemareddy, A. (2023a). Sentiment Analytics for Crypto Pre and Post
Covid: Topic Modeling. In Lecture Notes in Computer Science (pp. 303–315).
Springer Science+Business Media. https://doi.org/10.1007/978-3-031-24848-1_2

Ferreira Araújo, R., Roschildt Pinto, A., & Ferrandin, M. (n.d.). Sentiment Identification
on Tweets to Forecast Cryptocurrency’s Volatility.
https://thescipub.com/pdf/jcssp.2023.619.628.pdf

Ghosh, M. (2022a, April 28). Philippines central bank to trial wholesale CBDC. Forkast.
https://forkast.news/headlines/philippines-central-bank-wholesale-cbdc/

Maier, M., & Lakens, D. (2022, April 1). Justify Your Alpha: A Primer on Two Practical
Approaches. Advances in Methods and Practices in Psychological Science.
https://doi.org/10.1177/25152459221080396

Mason, R. (2022a, July 14). Philippines’ digital transformation could make it a new
crypto hub. Cointelegraph.
https://cointelegraph.com/news/philippines-digital-transformation-could-make-it-a-new-crypto-hub

Montag, A. (2018, August 28). “HODL,” “whale” and 5 other cryptocurrency slang terms
explained. CNBC.
https://www.cnbc.com/2018/01/23/what-hodl-whale-and-other-cryptocurrency-slang-terms-mean.html

Navalan, E. (2022, September 26). Is the Philippines on track to becoming a crypto hub?
Forkast. https://forkast.news/is-philippines-becoming-crypto-hub/

Nandwani, P., & Verma, R. (2021, August 28). A review on sentiment analysis and
emotion detection from text. Social Network Analysis and Mining.
https://doi.org/10.1007/s13278-021-00776-6

Qi, Y., & Shabrina, Z. (2023). Sentiment analysis using Twitter data: a comparative
application of lexicon- and machine-learning-based approach. Social Network
Analysis and Mining, 13(1). https://doi.org/10.1007/s13278-023-01030-x

Quintet, J. (2023). Twitter Imposes Temporary Limits to Curb Data Scraping: Musk.
Retrieved from https://www.iphoneincanada.ca/2023/07/01/twitter-limits-data-scraping/

Sagum, R., Navarro, M., & Victore, A. (2019). EMOSIS Sentiment Analysis on Tweets
with Emotion and Intensity Level Recognition Considering Ending Punctuation
Marks. International Journal of Recent Technology and Engineering, 8(4),
10289–10293. https://doi.org/10.35940/ijrte.d4518.118419

Sasmaz, E., & Tek, F. B. (2021a). Tweet Sentiment Analysis for Cryptocurrencies. In
2021 6th International Conference on Computer Science and Engineering
(UBMK). https://doi.org/10.1109/ubmk52708.2021.9558914

Sentiment Analysis and Emotion Detection on Cryptocurrency Related Tweets Using
Ensemble LSTM-GRU Model. (2022). IEEE Journals & Magazine | IEEE Xplore.
https://ieeexplore.ieee.org/abstract/document/9751065

Shoeb, A. a. M. (2019). EMOTAG – towards an emotion-based analysis of emojis.
https://www.semanticscholar.org/paper/EmoTag-%E2%80%93-Towards-an-Emotion-Based-Analysis-of-Shoeb-Raji/024efbeff09fdb26bb5da22310208f94aea05e0b

Ullah, M. S., Marium, S. M., Begum, S. M., & Dipa, N. S. (2020). An algorithm and
method for sentiment analysis using the text and emoticon. ICT Express, 6(4),
357–360. https://doi.org/10.1016/j.icte.2020.07.003

Valdivia, A., Luzón, M. V., Wang, Z., & Herrera, F. (2018, November 1). Consensus vote
models for detecting and filtering neutrality in sentiment analysis. Information
Fusion. https://doi.org/10.1016/j.inffus.2018.03.007

View of Sentiment Analysis Based Direction Prediction in Bitcoin using Deep Learning
Algorithms and Word Embedding Models. (n.d.).
https://ijisae.org/index.php/IJISAE/article/view/1062/616

Wegrzyn-Wolska, K., Bougueroua, L., Yu, H., & Zhong, J. (2016). Explore the Effects of
Emoticons on Twitter Sentiment Analysis. https://doi.org/10.5121/csit.2016.61006

Yılmaz, B. (2023a). Cryptocurrency Sentiment Analysis: Statistics & How It Works.
AIMultiple. https://research.aimultiple.com/cryptocurrency-sentiment-analysis/

Appendices

Appendix 1: Instrument

Experiment Paper

Experiment Paper of EMCRYPT: Sentiment Analysis on Cryptocurrency-Related
Tweets with Emotion and Intensity Level Recognition

Materials:
a. Laptop/Computer
b. Microsoft Excel
c. Experiment Paper
d. Cryptocurrency-related tweets

POLARITY
(With combination of keyword, ending punctuation marks, and emoji)

                            Predicted Polarity Category
Actual Polarity Category    Positive    Negative    Total Predicted
Positive
Negative
Total Expert Label

EMOTION RECOGNITION
(With combination of keyword, ending punctuation marks, and emoji)

                            Predicted Emotion Category
Actual Emotion Category     Happy    Sad    Surprise    Anger    Anticipation    Fear    Total Predicted
Happy
Sad
Surprise
Anger
Anticipation
Fear
Total Expert Label

INTENSITY LEVEL RECOGNITION
(With combination of keyword, ending punctuation marks, and emoji)

                                   Predicted Intensity Level Category
Actual Intensity Level Category    Low    Medium    High    Total Predicted
Low
Medium
High
Total Expert Label

POLARITY
(Plain-text only)

                            Predicted Polarity Category
Actual Polarity Category    Positive    Negative    Total Predicted
Positive
Negative
Total Expert Label

EMOTION RECOGNITION
(Plain-text only)

                            Predicted Emotion Category
Actual Emotion Category     Happy    Sad    Surprise    Anger    Anticipation    Fear    Total Predicted
Happy
Sad
Surprise
Anger
Anticipation
Fear
Total Expert Label

INTENSITY LEVEL RECOGNITION
(Plain-text only)

                                   Predicted Intensity Level Category
Actual Intensity Level Category    Low    Medium    High    Total Predicted
Low
Medium
High
Total Expert Label

Appendix 2: Correspondence

Appendix 3: Ethical Clearance

Appendix 4: Screen Layout of the Tool
Screen capture of the proposed tool.

EmCrypt Onboarding window

EmCrypt User Manual Window

EmCrypt Analyzer Window

Sample Input and Output (EmCrypt)

Sample Input and Output (Plain-text only)

Sentiment Chart

Appendix 5: Thesis Implementation Report

Introduction

The EmCrypt Analyzer is a sentiment analyzer for cryptocurrency-related tweets.
It analyzes the sentiments surrounding cryptocurrency and recognizes the emotion and
intensity level. Users have two options for inputting cryptocurrency-related tweets into the
tool: they can either upload a text file using the "upload file" button or directly type the
cryptocurrency-related tweets in the provided text field. Once the cryptocurrency-related
tweets are evaluated, the proposed tool will generate output that includes the polarity,
emotion, and intensity level identified in the cryptocurrency-related tweets.

Problem Statement

The researchers developed a tool for sentiment analysis with emotion and intensity
level recognition in cryptocurrency-related tweets considering the combination of
keywords, ending punctuation marks, and emoticons.

Specifically, this study is intended to answer the following sub-problems:

1. What is the performance of the tool in cryptocurrency-related tweets considering
the combination of keywords, ending punctuation marks, and emoticons in terms of:

a. Precision

b. Recall

c. F-Measure

2. What is the performance of the tool in cryptocurrency-related tweets using
Plain-text only in terms of:

a. Precision

b. Recall

c. F-Measure

3. Is there a significant difference in sentiment analysis performance between using
a combination of keywords, ending punctuation marks, and emojis compared to
using plain text only for analyzing sentiments in cryptocurrency-related tweets?

Respondents:
The 1st respondent is Mr. Soren Louis Anore, an expert in cryptocurrency
trading, with knowledge of digital currencies, market trends, and investment strategies. He
is currently a Media Analyst at Accenture.

The 2nd respondent is Mr. Kirck Michael Britos De Leon, a language practitioner
currently pursuing his Doctor of Philosophy in English Studies: Language at
the University of the Philippines – Diliman. He is skilled in areas such as translation,
interpretation, and linguistics.

The 3rd respondent is Dr. Rodrigo V. Lopiga, a faculty member at the Polytechnic
University of the Philippines, Department of Psychology, College of Social Sciences and
Development. He is an expert specializing in understanding human behavior and emotions.

Time Frame:
Activity                      Status    Date
Chapter 1-3 Documentation     Done      April – May 2023
Development of system         Done      October – December 2023
Data Gathering                Done      November – December 2023
Testing of system             Done      December 2023
Chapter 4-5 Documentation     Done      December 2023 – January 2024

Experts Data Annotation Time Frame


Mr. Soren Louis Anore
• Day 1 (November 30, 2023) 9:30 AM-10:30 AM: Home Office
• Day 2 (December 4, 2023) 1:30 PM-2:30 PM: Home Office
• Day 3 (December 13, 2023) 3:00 PM-5:00 PM: Home Office

Mr. Kirck Michael Britos De Leon
• Day 1 (November 26, 2023) 7:30 PM-8:00 PM: Via Zoom Meeting
• Day 2 (November 27, 2023) 6:30 PM-7:30 PM: Via Zoom Meeting
• Day 3 (December 13, 2023) 7:30 PM-8:30 PM: Via Zoom Meeting

Dr. Rodrigo V. Lopiga


• Day 1 (November 30, 2023) 1:30 PM-2:30 PM: Faculty of Department of
Psychology
• Day 2 (December 13, 2023) 1:30 PM-3:00 PM: Faculty of Department of
Psychology

Implementation Procedure:
1. The researchers manually gathered cryptocurrency-related tweets from X, formerly
known as Twitter.
2. Three experts manually annotated the polarity, emotion, and intensity level of the
cryptocurrency-related tweets.
3. Data collection and evaluation were conducted using a majority voting approach
among the experts. In cases of disagreement among the three experts, the Language
Practitioner Expert decided the polarity, and the Psychologist was responsible for
determining the emotion and intensity level.
4. The acquired data were divided into 900 tweets for training, 450 for testing, and
150 for evaluation.
5. The data annotated by the experts were input into the system.
6. The researchers filled out the experiment paper and compared the data annotated by
the experts with the outcomes generated by the tool.
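
The consolidation rule in step 3 can be sketched as follows. This is a minimal illustration, not the researchers' actual procedure or code: it assumes that "disagreement" means no label is chosen by more than half of the experts, and the role names, labels, and the function name consolidate_label are hypothetical.

```python
from collections import Counter


def consolidate_label(votes, tie_breaker):
    """Return the majority label, or the tie-breaker's label when no majority exists.

    votes: dict mapping an expert role to the label that expert assigned.
    tie_breaker: role whose label decides when no label has a majority.
    """
    label, count = Counter(votes.values()).most_common(1)[0]
    if count > len(votes) / 2:
        return label
    return votes[tie_breaker]


# ASSUMPTION: role names are illustrative; the deciding expert per task follows step 3.
TIE_BREAKERS = {
    "polarity": "language_practitioner",
    "emotion": "psychologist",
    "intensity": "psychologist",
}

# Hypothetical annotations for a single tweet.
annotations = {
    "polarity": {"crypto_trader": "Positive", "language_practitioner": "Positive", "psychologist": "Negative"},
    "emotion": {"crypto_trader": "Happy", "language_practitioner": "Surprise", "psychologist": "Anticipation"},
    "intensity": {"crypto_trader": "High", "language_practitioner": "Medium", "psychologist": "Medium"},
}

final_labels = {
    task: consolidate_label(votes, TIE_BREAKERS[task]) for task, votes in annotations.items()
}
print(final_labels)
```

In this example the polarity and intensity labels are settled by a two-to-one majority, while the emotion label falls back to the psychologist because all three experts chose different emotions.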

Issues and Concerns


The following are the issues and concerns that were encountered during implementation:
1. Surprise recognition was limited because positive and negative surprises were not
distinguished.
2. Restrictions on data scraping imposed by X limited the gathering of reliable data.
3. The limited training data affected the classification.
4. The tool could detect only a limited set of emoticons.

Experiment Results

Experiment Results of EMCRYPT: Sentiment Analysis on Cryptocurrency-Related
Tweets with Emotion and Intensity Level Recognition

POLARITY
(With combination of keyword, ending punctuation marks, and emoji)

                            Predicted Polarity Category
Actual Polarity Category    Positive    Negative    Total Predicted    TP    TN    FP    FN
Positive                          87           8                 95    87    49     6     8
Negative                           6          49                 55    49    87     8     6
Total Expert Label                93          57                150

EMOTION RECOGNITION
(With combination of keyword, ending punctuation marks, and emoji)

                           Predicted Emotion Category
Actual Emotion Category    Happy    Sad    Surprise    Anger    Anticipation    Fear    Total Predicted    TP     TN    FP    FN
Happy                         38      3           1        0               0       0                 42    38     91     1     4
Sad                            0     22           1        1               0       4                 28    22    107     3     6
Surprise                       1      0          15        1               1       0                 18    15    114     6     3
Anger                          0      0           0        6               0       1                  7     6    123     3     1
Anticipation                   0      0           2        1              32       0                 35    32     97     1     3
Fear                           0      2           2        0               0      16                 20    16    113     5     4
Total Expert Label            39     27          21        9              33      21                150

INTENSITY LEVEL RECOGNITION
(With combination of keyword, ending punctuation marks, and emoji)

                                   Predicted Intensity Level Category
Actual Intensity Level Category    Low    Medium    High    Total Predicted    TP     TN    FP    FN
Low                                 28         1       4                 33    28    106     4     5
Medium                               3        54       5                 62    54     80     3     8
High                                 1         2      52                 55    52     82     9     3
Total Expert Label                  32        57      61                150
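
The precision, recall, and F-measure reported for intensity level recognition can be recomputed from the table above. The sketch below is a minimal illustration that assumes rows are the actual levels, columns are the predicted levels, and each metric is macro-averaged over the three levels; the function name macro_metrics is hypothetical. Under these assumptions the result agrees with the 89.16% precision, 88.83% recall, and 88.86% F-measure stated in the findings.

```python
def macro_metrics(matrix):
    """Macro-averaged precision, recall, and F-measure (percent) from a confusion matrix.

    matrix[i][j] is the number of tweets whose actual class is i and predicted class is j.
    ASSUMPTION: macro-averaging is inferred from the reported scores, not stated here.
    """
    size = len(matrix)
    precisions, recalls, f_measures = [], [], []
    for k in range(size):
        tp = matrix[k][k]
        fp = sum(matrix[i][k] for i in range(size)) - tp  # predicted k, actually another class
        fn = sum(matrix[k]) - tp  # actually k, predicted as another class
        precisions.append(tp / (tp + fp) if tp + fp else 0.0)
        recalls.append(tp / (tp + fn) if tp + fn else 0.0)
        f_measures.append(2 * tp / (2 * tp + fp + fn) if tp + fp + fn else 0.0)
    return tuple(
        round(100 * sum(values) / size, 2) for values in (precisions, recalls, f_measures)
    )


# Cell counts copied from the intensity level table (with combination).
intensity_with_combination = [
    [28, 1, 4],   # actual Low
    [3, 54, 5],   # actual Medium
    [1, 2, 52],   # actual High
]

precision, recall, f_measure = macro_metrics(intensity_with_combination)
print(precision, recall, f_measure)
```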

POLARITY
(Plain-text only)

                            Predicted Polarity Category
Actual Polarity Category    Positive    Negative    Total Predicted    TP    TN    FP    FN
Positive                          84          12                 96    84    43    11    12
Negative                          11          43                 54    43    84    12    11
Total Expert Label                95          55                150

EMOTION RECOGNITION
(Plain-text only)

                           Predicted Emotion Category
Actual Emotion Category    Happy    Sad    Surprise    Anger    Anticipation    Fear    Total Predicted    TP     TN    FP    FN
Happy                         38      1           2        0               3       0                 44    38     84     5     6
Sad                            0     24           1        0               0       2                 27    24     98     8     3
Surprise                       2      0          11        0               3       1                 17    11    111     3     6
Anger                          0      1           0        5               0       1                  7     5    117     1     2
Anticipation                   3      1           0        1              30       0                 35    30     92     7     5
Fear                           0      5           0        0               1      14                 20    14    108     4     6
Total Expert Label            43     32          14        6              37      18                150
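
The TP, TN, FP, and FN columns of the table above can be derived from the cell counts, as the following minimal Python sketch shows. It assumes rows are the actual emotions and columns are the predicted emotions. It also assumes, based only on the numbers in the table, that the TN column holds the correct predictions of all other classes, which is not the conventional definition of a true negative. The function name per_class_counts is hypothetical.

```python
EMOTIONS = ["Happy", "Sad", "Surprise", "Anger", "Anticipation", "Fear"]

# Cell counts copied from the emotion recognition table (Plain-text only).
emotion_plain_text = [
    [38, 1, 2, 0, 3, 0],   # actual Happy
    [0, 24, 1, 0, 0, 2],   # actual Sad
    [2, 0, 11, 0, 3, 1],   # actual Surprise
    [0, 1, 0, 5, 0, 1],    # actual Anger
    [3, 1, 0, 1, 30, 0],   # actual Anticipation
    [0, 5, 0, 0, 1, 14],   # actual Fear
]


def per_class_counts(matrix, labels):
    """Derive one-vs-rest counts for each class from a square confusion matrix."""
    correct_total = sum(matrix[i][i] for i in range(len(labels)))
    counts = {}
    for k, label in enumerate(labels):
        tp = matrix[k][k]
        counts[label] = {
            "TP": tp,
            # ASSUMPTION: mirrors the table's TN column (correct predictions of the
            # other classes), not the conventional true-negative count.
            "TN": correct_total - tp,
            "FP": sum(row[k] for row in matrix) - tp,
            "FN": sum(matrix[k]) - tp,
        }
    return counts


counts = per_class_counts(emotion_plain_text, EMOTIONS)
for label, values in counts.items():
    print(label, values)
```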

INTENSITY LEVEL RECOGNITION
(Plain-text only)

                                   Predicted Intensity Level Category
Actual Intensity Level Category    Low    Medium    High    Total Predicted    TP    TN    FP    FN
Low                                 46        14       4                 64    46    71     6    18
Medium                               3        41       5                 49    41    76    18     8
High                                 3         4      30                 37    30    87     9     7
Total Expert Label                  52        59      39                150

Proof of Implementation
Cryptocurrency Trader Expert: Mr. Soren Louis Anore

Language Practitioner Expert: Mr. Kirck Michael Britos De Leon

Psychology Expert: Dr. Rodrigo V. Lopiga

Appendix 6: Biographical Statement

Casinsinan, Cj C. He was born in Pililla, Rizal on October 8, 2002. He attended
Halayhayin Elementary School for his basic education, continued at Pililla National
High School for junior high school, and progressed to Our Lady of Fatima University
for senior high school. He is currently a fourth-year student at the Polytechnic University
of the Philippines, pursuing a Bachelor of Science in Computer Science. Not only
does he have an enthusiasm for design, but he is also a highly creative individual with
extensive knowledge of various programming languages. This unique combination of
creativity and technical skill distinguishes him in his field. His fields of interest in
Computer Science are Software Engineering, UI/UX Design, and Artificial Intelligence.

Dayag, Jahren Hans P. He was born in Antipolo City on December 29, 2001. He
attended Miljohn Christian Academy for his basic education, continued at Tomas
Claudio Colleges for junior high school, and attended STI College for senior high
school. He is currently a fourth-year student at the Polytechnic University of the
Philippines. He is a proficient Computer Science student with a keen interest and
expertise in software engineering and web development. He shows a solid grasp of
technological principles and applies them effectively to craft visually appealing and
user-friendly designs. His fields of interest in Computer Science are Software
Engineering, Artificial Intelligence, and Web Development.

Ebue, Lyndon Jeff E. He was born in Masinloc, Zambales on December 23, 2001.
He attended Taltal Elementary School for his basic education and continued at Northern
Zambales College, Inc. for both his junior and senior high school years.
He is currently a fourth-year student at the Polytechnic University of the Philippines.
He is a knowledgeable and detail-oriented Computer Science student with a strong
aptitude for UI design, graphic design, and photography. He possesses a deep
understanding of technological principles and is able to apply them to create visually
appealing and user-friendly designs. His fields of interest in Computer Science are
UI/UX and Software Engineering.

Tumbaga, John Jeffrei O. He was born on October 29, 2001, in Valenzuela City.
He attended Canumay West Elementary School for his basic education, continued at
Canumay West National High School for junior high school, and progressed to
Pamantasan ng Lungsod ng Valenzuela for senior high school. He is currently a
fourth-year student at the Polytechnic University of the Philippines, pursuing a
Bachelor of Science in Computer Science. He is a diverse tech enthusiast and is eager
to blend creativity with analytical skills to craft innovative digital solutions. His fields
of interest in Computer Science are Data Analysis, Website Development, Software
Engineering, and UI/UX Design.
