Cyberbullying Detection Based On Emotion

Received 15 March 2023, accepted 14 May 2023, date of publication 29 May 2023, date of current version 7 June 2023.
Digital Object Identifier 10.1109/ACCESS.2023.3280556

MOHAMMED AL-HASHEDI 1, LAY-KI SOON 2, HUI-NGO GOH 1,
AMY HUI LAN LIM 1, AND EU-GENE SIEW 3

1 Faculty of Computing and Informatics, Multimedia University, Cyberjaya, Selangor 63100, Malaysia
2 School of Information Technology, Monash University Malaysia, Subang Jaya, Selangor 47500, Malaysia
3 School of Business, Monash University Malaysia, Subang Jaya, Selangor 47500, Malaysia
Corresponding author: Hui-Ngo Goh (hngoh@mmu.edu.my)

This work was supported by the Fundamental Research Grant Scheme, Ministry of Education, Malaysia, under
Grant FRGS/1/2017/ICT02/MMU/02/6.
ABSTRACT Due to the detrimental consequences caused by cyberbullying, a great deal of research has been
undertaken to propose effective techniques to resolve this recurring problem. The research presented in this
paper is motivated by the fact that negative emotions can be caused by cyberbullying. This paper proposes
cyberbullying detection models that are trained on contextual, emotion, and sentiment features.
An Emotion Detection Model (EDM) was constructed using Twitter datasets whose annotations have been
improved. Emotions and sentiment were extracted from cyberbullying datasets using the EDM and
lexicon-based approaches. Two cyberbullying datasets, from Wikipedia and Twitter respectively, were further
improved by comprehensive annotation of emotion and sentiment features. The results show that anger, fear and
guilt were the major emotions associated with cyberbullying. Subsequently, the extracted emotions were used as
features in addition to contextual and sentiment features to train models for cyberbullying detection. The
results demonstrate that using emotion and sentiment features has improved the performance of detecting
cyberbullying by 0.5 to 0.6 recall. The proposed models also outperformed the state-of-the-art models by
a 0.7 F1-score. The main contribution of this work is two-fold: a comprehensive emotion-annotated
dataset for cyberbullying detection, and empirical proof that emotions are effective features for
cyberbullying detection.
INDEX TERMS Cyberbullying, BERT, emotion mining, sentiment analysis.
The associate editor coordinating the review of this manuscript and approving it for publication was
Giacomo Fiumara.
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License.
For more information, see https://creativecommons.org/licenses/by-nc-nd/4.0/
VOLUME 11, 2023

I. INTRODUCTION
The advancement of information and communication technologies has provided an avenue for the online
community to publish and respond to user-generated content (UGC). Unfortunately, such convenience has been
abused by online bullies, causing harm to others via threatening, harassing, humiliating, intimidating,
manipulating, or controlling targeted victims [1]. These acts, termed ‘cyberbullying,’ are defined as the
willful and repeated harm inflicted using electronic devices [2], [3], [4].

Cyberbullying (CB) can have a severe impact on a victim’s mental health, ranging from negative emotions
(anger, fear, sadness, guilt, etc.) to depression, and even suicidal thoughts [5], [6], [7]. Emotion mining
is one of the focus areas in affective computing that aims to identify, analyze, and evaluate the human state
of mind towards various events or encounters [8]. Emotion analysis has significantly impacted different
sectors such as the stock market, consumers’ feedback, and recommendations [9], [10], [11]. However,
researchers have not deeply taken emotion analysis into consideration for cyberbullying detection. Therefore,
including emotion features to facilitate cyberbullying detection can improve detection accuracy due to the
strong relationship between cyberbullying and negative emotions.

The textual format of cyberbullying comes in two main forms, either via explicitly articulated profane words
or implicitly expressed ironic or sarcastic statements that do not contain vulgar words. Detecting sarcastic
and ironic expressions is a challenging and problematic task [12]. Based on the literature review and
experiments, the datasets used for cyberbullying currently suffer from several limitations, which
are data sparsity, a large imbalance between labels, and limitations to specific social networking sites [13].
The efforts to create an autonomous cyberbullying detection model have been unsuccessful so far due to the
high speed and diversity of user-generated content (UGC) on the Internet [14].

The solution proposed in this work is driven by the hypothesis that the detection of cyberbullying can be
improved through emotion mining. It consists of three stages.

The first stage is to obtain a clean, balanced, and feature-rich dataset. The effectiveness of machine
learning models is heavily dependent on the quality of the training datasets [15], [16]. However, datasets
that cover all forms of cyberbullying are rare [13], [17]. Additionally, the nature of UGC varies across
different social networking sites. For example, tweets on Twitter have a different character limit compared
to Facebook posts. As a result, the first stage of our proposed solution focuses on cleaning, transforming,
and integrating data from various sources to produce a clean, balanced, and feature-rich dataset. This stage
accounts for at least 70% of the data lifecycle in this work.

The second stage focuses on extracting features from the user-generated content (UGC), including semantic,
syntactic, emotional, and sentiment features. The correct extraction and integration of these features will
lead to a robust and effective cyberbullying detection model, regardless of the social media platform used.
Semantic and syntactic features can be obtained by using language models such as BERT (Bidirectional Encoder
Representations from Transformers).

In this paper, emotions are extracted from text using both supervised and unsupervised approaches. The
supervised approach involves applying deep learning (DL) algorithms on an emotion dataset obtained from
Twitter. The tweets are classified into specific emotions using hashtags. To validate the automatic labeling
for the emotion dataset, three validation steps have been proposed and implemented. These steps aim to verify
the sentiment and emotions of instances using lexicons and to train emotion models using small, validated
datasets.

On the other hand, the unsupervised approach is based on a predefined lexicon, which is a keyword-matching
technique that assigns the dominant emotions associated with a word. This is a word-level technique that
helps identify explicit expressions in the text. Along with emotion features, sentiment features are
extracted from text, and an overall polarity is assigned to each instance.

After extracting the cyberbullying features, deep learning (DL) models are trained in the third stage to
classify the data instances into the discrete categories of cyberbullying or non-cyberbullying. Two sets of
experiments are conducted. The first set uses selected language models, including BERT (BERT base and BERT
large) and XLNet. The second set includes additional features, namely emotions and sentiment. The results
show that the inclusion of these additional features has improved the performance of the cyberbullying
detection models by 5% to 6%. In summary, the main contributions of this work are:
• An improved and thoroughly processed cyberbullying dataset annotated with emotion and sentiment features.
• A label validation for the emotion dataset that discards instances with inaccurate annotations.
• A proposed DL-based cyberbullying model that incorporates semantic, syntactic, emotion, and sentiment
features.

Table 1 defines the variables and notations used in this paper as a reference for readers.

TABLE 1. Variables and notations used in this paper.

The remainder of the paper is structured as follows. Section II presents the related works and highlights the
research gaps. Section III discusses the preparation and pre-processing of the datasets. Section IV presents
an EDM

that aims to identify emotions associated with cyberbullying. In Section V, a series of experiments is
presented, followed by the results and analysis in Section VI. Section VII concludes and offers some
recommendations for future work.

II. RELATED WORKS
In the past, different studies have attempted to detect cyberbullying through different sets of features
[13], [18], [19], [20], [21]. These features vary and depend on the social media platforms that are being
explored. For instance, demographic information such as gender, personality, number of followers, and date of
account creation is adopted in some studies [22], [23], [24]. Other works have identified bullies using
tweets, user profile, and network-based attributes on Twitter. These features focused on distinguishing
bullies from regular users [25]. Other features such as the number of comments and likes are also taken into
consideration by [26]. In addition, features such as binary sentiment are used to identify cyberbullying by
related works [17], [27], [28].

Profile and usage features such as gender, number of followers, and hours spent online can be faked, or may
not be available due to data protection policies practiced by some social media platforms. Furthermore, the
available features vary from one social media platform to another. Many works rely on the textual UGC itself
for cyberbullying detection. The content is represented in word vectors or language models [13], [29].
In addition, there are other features that can be extracted from the textual UGC. These features are
independent of the target social media platform since they are generated purely from the UGC.

Generally, there are two categories of such features, namely, syntactic and semantic features. Syntactic
features are normally obtained from sentence parsing or dependency structure. These include part-of-speech
(POS) tagging, named entity recognition (NER) and term frequency inverse document frequency (TF-IDF). The
TF-IDF is used to weigh the importance of a word in a document and convert words to numerical embeddings.
Some studies use this technique to convert UGC to embeddings and feed them to machine learning classifiers
to classify cyberbullying [30], [31].

On the other hand, semantic features aim to enrich the UGC from the meaning perspective. Word embedding is
a feature extraction technique that calculates a word’s weight based on its contexts. Examples of word
embedding models are word2vec and GloVe. Works that have implemented these models are [13] and [32].
Furthermore, more semantic features were included in the UGC. For instance, some works extend a corpus by
including a dictionary of important words [33], [34], [35], as well as the sentiment of the text [36], [37],
[38], [39]. The extracted syntactic and semantic features are then transformed into textual representation
models through more advanced models such as BERT [40] or XLNet [41]. These state-of-the-art (SOTA) language
models have shown a huge improvement in natural language processing (NLP) applications such as sentiment
analysis, question answering, topic classification and machine translation [40], [42], [43]. With BERT, text
can be classified into the binary cyberbullying or non-cyberbullying categories.

Since cyberbullying has a strong impact on the victim’s psychology, emotions and sentiment can be extracted
from the content to further improve the detection [44], [45]. Sentiment analysis has been widely used in
different domains, such as products and service reviews, political analysis, etc. [46], [47], [48]. However,
there are very few studies that incorporate emotion and sentiment in detecting cyberbullying, particularly
for emotions [20], [49], [50], [59].

For sentiment analysis, there are typically three main classes, namely, positive, negative and neutral.
Intuitively, negative sentiment is more relevant for detecting cyberbullying. For emotion categories, this
work uses guilt and love together with the six basic emotions defined by [51] (anger, disgust, fear, joy,
sadness, and surprise). Emotions are more refined, as they portray the exact feeling of the victims. By
identifying the type of emotion in a text, more information can be revealed as to whether cyberbullying is
taking place. Therefore, this study focuses on emotions due to the strong relationship between the impact of
bullying and negative emotions.

III. DATASETS
This section describes the datasets employed in this work. Firstly, we examine the cyberbullying datasets
used to train cyberbullying detection models (CDMs). Secondly, we detail the emotion dataset used to
construct the emotion detection model (EDM).

There are two major datasets used for CDMs. The toxic dataset was sourced from Wikipedia, and the hate speech
dataset was crawled from Twitter. Wikipedia and Twitter are two different platforms in terms of text format.
Wikipedia comments are formal and lengthy, while tweets are short, informal and may include misspelled
words. Therefore, it is essential to train CDMs based on these two datasets.

To extract emotions from CB datasets, two emotion datasets were combined to train the EDM, namely Cleaned
Balanced Emotional Tweets (CBET) and Twitter Emotion Corpus (TEC). Both datasets were extracted from Twitter
and labeled using hashtag keywords, such as #anger, #fear, etc. No human judgment was involved in the
labeling process. Therefore, a validation procedure was followed to discard any potentially incorrect
instances. The reason for combining the two datasets is to increase the size of the dataset that remains
after discarding wrongly labeled instances. A detailed description of each dataset is provided in the
following sections. Table 2 summarizes the four major datasets used for CDMs and EDM.

TABLE 2. Summary of datasets used in this work.

A. CYBERBULLYING DATASETS
In this study, considerable effort has been devoted to data pre-processing, including acquisition, cleaning,
and sampling. Two main challenges faced by cyberbullying datasets are data sparsity and imbalanced datasets.
Data sparsity in cyberbullying refers to a shortage in the number of instances that cover both explicit and
implicit forms of cyberbullying. Another significant problem that can affect the performance of any machine
learning model is the imbalance in the ratio between cyberbullying and non-cyberbullying classes. In
cyberbullying datasets, positive instances, which are cyberbullying, are in the minority. For example, the
Formspring dataset mostly contains insults, lacking other forms of cyberbullying [52]. Additionally, there is
a large imbalance in the ratio between the cyberbullying class (CB) and non-cyberbullying class (non-CB),
with positive instances consisting of only 6% of the total dataset. To overcome these limitations, this study
has used two datasets from two different social media sources.

1) TOXIC DATASETS
a: DATASET ACQUISITION AND PRE-PROCESSING
The Toxic dataset is collected from the Wikipedia platform by the Conversation AI team, which was founded by
Jigsaw and Google (https://www.kaggle.com/c/jigsaw-toxic-comment-classification-challenge/data). The datasets
are publicly shared on Kaggle for competition and research use. The goal of these efforts is to support
research towards making the internet a safer place by identifying toxic comments, which include rude,
disrespectful, threatening, insulting, and hate speech. This dataset is especially valuable for detecting
cyberbullying because it contains instances of explicit and implicit expressions, overcoming the sparsity
problem faced by previous cyberbullying datasets. The dataset contains approximately 214,000 comments, with
only 10% being classified as cyberbullying. Pre-processing steps have been applied, such as removing
punctuation, symbols, and non-English letters, lemmatizing the texts, and converting the words to lowercase.
Words containing digits were also removed as they may introduce noise to word representation models. Stop
words were not removed as BERT can handle contextual information effectively. This dataset was chosen for
cyberbullying detection due to its rich content and appropriate size for building robust models.

b: DATA SAMPLING
To address the imbalanced nature of the dataset, this study proposes an under-sampling technique based on an
analysis of the data. The non-cyberbullying class has about 193,000 comments, while the cyberbullying class
has about 21,000 comments. The under-sampling approach applied to the majority class aims to improve the
class balance and ensure relatively similar text lengths for both classes. Rather than simply discarding
instances randomly, a careful procedure is designed to construct a useful dataset.

To under-sample the non-cyberbullying class, the number of words in each instance is counted. Instances with
two or fewer words, or with more than 254 words, are removed from both the CB and non-CB classes. This step is
taken because instances with two or fewer words are often meaningless and could negatively impact the
performance of the model. Instances with more than 254 words may not be fully captured by the DL vector due
to its size. As a result of this step, 638 instances were removed from the CB class, which represents only
2.8% of all the CB instances.

Next, to ensure that the two classes have similar text lengths, instances are categorized into 12 groups
based on the word count. Approximately 73% of the instances contain 3 to 60 words. These instances are
divided into six categories, each with a range of 10 words. Instances with more than 60 words are divided
into six categories with different ranges, as shown in Fig. 1.

Finally, to achieve a similar length of instances and the desired ratio of 25-75% between the CB and non-CB
classes, three times the number of CB instances are selected from the non-CB instances for each category,
while the remaining instances are discarded. Fig. 1 displays the 12 categories with the word ranges for each
category on the x-axis, and the number of instances for the CB, non-CB, and under-sampled non-CB classes on
the y-axis. The number of under-sampled non-CB instances is calculated by:

$$\text{sampled}_{\text{nonCB}} = \#\text{CB} \times 3 \tag{1}$$

Ultimately, the aim is to reduce the imbalance from 10-90% to 25-75%. In other words, the non-CB class should
be triple the size of the cyberbullying class. The ratio is determined based on the performance of the BERT
representation in training the model. As the CB class contains quite a good number of instances, this
improved ratio, albeit still imbalanced, does not negatively affect the training, as shown in the experiment
results.

FIGURE 1. Dataset distribution for each category before and after sampling for both cyberbullying and
non-cyberbullying classes.

2) HATE SPEECH AND OFFENSIVE DATASET (TWITTER DATASETS)
Due to the popularity of Twitter and the difference in nature between it and Wikipedia (toxic dataset), it is
necessary to include tweets when building a cyberbullying detection model. The toxic dataset, derived from
the Wikipedia comments page, is cleaner, longer, and contains fewer misspelled words. Meanwhile, Twitter
datasets often consist of short, informal text and may include intentional spelling errors and user tags,
links, and hashtags.

In this study, a hate speech and offensive language dataset is improved and utilized to create a more robust
model. Hate speech is another term for bullying and includes threats, insults, and offensive language. This
dataset was deemed suitable for building cyberbullying detection models with some improvements. The dataset
was collected using the Twitter API and a hate speech lexicon compiled by hatespeech.org. Approximately
25,000 tweets were selected and manually labeled by CrowdFlower [53].

Out of the 25,000 tweets, 83% were classified as hate speech or offensive, while the remaining 17% were
categorized as the negative class. To address the imbalance between the two classes, another publicly
available Twitter dataset from Kaggle was utilized. This dataset was labeled as insult or non-insult
instances. The combined dataset, named the cyberbullying dataset (CD), is shown in Table 3 after
oversampling to increase the size of the negative class.

TABLE 3. Data distribution before and after sampling of cyberbullying datasets.

B. EMOTION DATASETS
Developing emotion models based on clean and accurately labeled datasets is a critical step in identifying
emotions within cyberbullying instances. However, there are several challenges in creating accurate emotion
datasets. Some existing emotion datasets are automatically labeled by using keyword hashtags, where an
instance is tagged with a specific emotion class if it is hashtagged with an emotion term (#anger, #fear,
etc.). This approach is often used due to the high cost of manual labeling by experts. The hashtags are
believed to accurately reflect the intent of the user who posted the content.

However, there are many instances of tweets that do not align with the emotion hashtags. Some examples of
these cases are listed in Table 4. Given the potential inaccuracies of hashtags, this study proposes a
process to validate the instances’ labels.

TABLE 4. Examples of instances that were wrongly labeled using hashtags and discarded after the validation
procedure.

In this study, two emotion datasets are validated and utilized for the development of emotion models. The
first dataset is the Cleaned Balanced Emotional Tweets (CBET) dataset [54], which is based on a basic
emotions model [51] that includes anger, disgust, fear, joy, sadness, and surprise. This dataset also
includes the emotions of thankfulness, love, and guilt. The second dataset is the Twitter Emotion Corpus
(TEC) [55], which is also crawled from Twitter using the Twitter API and labeled automatically using
hashtags. To produce more accurately labeled datasets, three validation steps are carried out, as depicted in
Fig. 2.

The first validation step is focused on checking the general sentiment of each instance. Sentiment analysis
is used to determine the overall emotion expressed in the text, whether it is positive, negative, or neutral.
A lexicon-based approach is used to identify the sentiment polarity of each instance. The AFINN lexicon,
which is developed based on microblog English word lists [56], is utilized in this process. Each word in the
lexicon has been manually rated for valence, with an integer score ranging from +5 (positive) to -5
(negative). Before identifying the sentiment of each instance, misspelled words are corrected using the
Norvig probabilistic method [57]. A lexicon-based approach requires correcting misspelled words to produce
accurate results. If an instance is labeled with a negative emotion (anger, fear, sadness, guilt,

and disgust) but the identified sentiment is positive, it will be discarded, and vice versa.

FIGURE 2. Validation steps to validate the annotation of emotion datasets.

Secondly, an emotion lexicon is utilized to validate the labeling of the automatic process. The NRC Emotion
Lexicon, which covers eight emotions (anger, fear, anticipation, trust, surprise, sadness, joy, and disgust)
and a binary value of positive or negative, is employed to assess the overall emotion of a given instance.
Each term in the lexicon is linked to one or multiple emotions, which have been annotated manually through
crowdsourcing. The lexicon is used to generate a list of emotions for every instance in the dataset through
keyword lookup. To determine the emotion type of an instance, the emotions in the lexicon are categorized
into negative and positive emotions. Negative emotions include anger, fear, sadness, disgust, and negative
sentiment, while positive emotions include the rest. A formula is then devised to classify an instance as
overall positive or negative based on the list of emotions generated. If the majority of emotions for the
instance are negative, the instance is considered overall negative and vice versa. Finally, a comparison is
made between the labels of the dataset and the emotion class produced by the lexicon. If there is any
discrepancy, the instance is discarded.

The last validation step is performed through a DL model trained on a small, manually labeled dataset, the
Emotion Intensity (EmoInt) dataset by [58]. The EmoInt dataset contains four emotions: anger, fear, sadness
and joy. Since the priority is mainly on the emotions that are associated with cyberbullying, it is a
valuable dataset because it contains three negative emotions. The dataset consists of 7,000 tweets, which,
although small, is sufficient to build a robust emotion detection model. BERT is used for word
representation, and the data is split into 80% for training and 20% for testing. The results of the model
show an 89% recall and precision value. The model is then used to predict the emotions for the instances
produced in the previous step. The correctly predicted instances are preserved, while the others are
discarded. The trained DL model is built to validate instances with the four relevant emotions only. Due to
the unavailability of manually labeled datasets for the other emotions, those instances are checked with the
first two validation steps only. The produced dataset after the validation process is shown in Fig. 3.

There are 32,000 instances spanning eight fundamental emotions. The validated emotion dataset (VED) covers
five negative emotions (anger, fear, sadness, guilt and disgust) and three positive emotions (joy, love and
surprise).

FIGURE 3. Validated emotion dataset instances distribution.

IV. EMOTION DETECTION MODEL (EDM)
This section outlines the training process for the emotion detection model in our proposed solution, with a
focus on emotions related to cyberbullying. As shown in Fig. 4, the EDM is trained using the validated
emotion dataset (VED) as a multi-class classification task where each instance is assigned to a single
emotion. Similar to the cleaning steps applied to the Toxic dataset, the VED underwent data cleaning. The
data was split into 80% for training and 20% for testing, and a stratification technique was employed to
ensure a fair representation of each emotion in the dataset. The BERT base model was chosen for word
representations, as it is capable of effectively capturing the contextual features of each emotion. The
specific BERT base used in the experiment has 768 hidden units, 12 attention heads, and 110 million
parameters.

FIGURE 4. Emotion detection model (EDM).

The deep learning model for emotion detection is built using the Keras libraries. Keras is a high-level
library built

on top of TensorFlow. During BERT pre-training, 15% of the input tokens are masked, meaning that the masked
words must be predicted based on the surrounding words. A [CLS] token is placed at the start of the input
sequence for classification purposes. The input sequence of words is encoded into the input layer, which
then flows through the stack. Self-attention is applied at each layer, and the results of each layer are
passed through the feed-forward network. The features produced by BERT are encoded into hidden layers, which
apply activation functions to normalize the output of the nodes. In this experiment, a Rectified Linear
Unit (ReLU) activation function is attached to a dense layer with 256 nodes, and the outputs are then
forwarded to a classification layer with eight nodes, each representing a distinct emotion. The Softmax
activation function is used in the classification layer to normalize the outputs and convert them into
probabilities that sum up to one.

A. EMOTIONS ASSOCIATED WITH CYBERBULLYING
To empirically examine the emotions associated with cyberbullying, the EDM built using the VED was applied to
classify the emotions of instances in the CD for the Toxic and Twitter datasets. The labeled CD was fed into
the EDM for predicting the type of emotion each instance is associated with. The Toxic dataset in the CD
contains 21,830 instances, with 92% classified as negative emotions (anger, fear, sadness, disgust, and
guilt), and 8% as positive emotions (joy, love, and surprise). Anger was the most prevalent emotion, followed
by fear. The Twitter dataset in the CD contains 20,620 cyberbullying instances, with 88% categorized as
negative emotions and the rest as positive. Guilt was the most prevalent emotion, followed by disgust. Fig. 5
shows the emotion distribution of cyberbullying for both datasets. The results demonstrate that negative
emotions can be used as indicators in detecting cyberbullying.

FIGURE 5. Emotion distribution for CB instances in the toxic and Twitter datasets.

V. EXPERIMENTS
This section presents experiments conducted on the CD, consisting of the Twitter and Toxic datasets, to
evaluate the efficacy of the proposed method. Given the challenges in detecting cyberbullying content, this
work investigated various features that could enhance the detection of cyberbullying. The experiments were
divided into two parts. In the first part, only word representation models were used without any additional
features, while in the second part, emotion and sentiment features were extracted from the text and combined
with the word representations.

A. LANGUAGE MODELS
To extract semantic features from text, words need to be transformed into meaningful embeddings. Word
embeddings represent words as numerical vectors that can be fed into neural networks. These embeddings
capture the contextual relationships between words in a sequence, and can generate domain-specific textual
features based on labeled data. Advances in language representation extraction from text have been made in
recent years, and various types of language models have been explored for cyberbullying detection [13].

The most recent and highly regarded model is BERT, which has created a sensation in the machine learning
community for a wide range of NLP tasks. BERT uses bidirectional training of a transformer, which considers
text sequences from both the left and right directions, providing a deeper understanding of each word and its
context in a text sequence. The pre-trained BERT model has been trained on large corpora, such as the
BooksCorpus and English Wikipedia. To fine-tune BERT for specific NLP tasks, a classification layer is added
on top of the pre-trained core model. BERT has two versions: BERT base, which has 12 layers, 768 hidden
units, 12 attention heads, and 110 million parameters, and BERT large, which has 24 layers, 1024 hidden
units, 16 attention heads, and 336 million parameters [40].

Another language model used in the experiments is XLNet, which is based on bidirectional encoding and is
trained through generalized autoregressive pre-training. Unlike BERT, which predicts only 15% of the tokens
in a sequence, XLNet predicts all tokens in random order. The XLNet base model has 12 layers, 768 hidden
units, and 110 million parameters [41].

B. PART 1 (USING WORD REPRESENTATIONS VIA LANGUAGE MODELS ONLY)
In this work, experiments are conducted to detect cyberbullying using BERT base, BERT large, and XLNet
pre-trained word representation models. The experiments are performed using the PyTorch machine learning
framework and pytorch-transformers, an open-source Python library that supports NLP applications. The high
processing needs of these pre-trained models are supported by Amazon SageMaker, a cloud computing service for
building, training, and deploying machine learning models. The experiments are conducted on the Toxic and
Twitter datasets, with the data split 80% for training and 20% for testing. Each model has its own
configuration, which is determined by the pre-trained models.
VOLUME 11, 2023 53913
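The 80%/20% train/test split used in Part 1 can be illustrated with a minimal sketch. The paper does not state how the split was drawn, so the shuffling, the fixed seed, the helper name `split_80_20`, and the toy comments below are all assumptions made for illustration only.

```python
import random


def split_80_20(texts, labels, test_fraction=0.2, seed=42):
    """Shuffle paired (text, label) examples and split them into train and test lists.

    Hypothetical helper: the seed and the plain (non-stratified) shuffle are
    assumptions, not details taken from the paper.
    """
    if len(texts) != len(labels):
        raise ValueError("texts and labels must have the same length")
    indices = list(range(len(texts)))
    random.Random(seed).shuffle(indices)
    n_test = int(round(len(indices) * test_fraction))
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    train = [(texts[i], labels[i]) for i in train_idx]
    test = [(texts[i], labels[i]) for i in test_idx]
    return train, test


# Toy data standing in for pre-processed comments and their CB / non-CB labels.
texts = [f"comment {i}" for i in range(10)]
labels = [i % 2 for i in range(10)]
train, test = split_80_20(texts, labels)
print(len(train), len(test))
```

With imbalanced data such as the Toxic dataset, a stratified split that preserves the class ratio in both parts may be preferable; whether the authors did so is not stated.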

FIGURE 6. Extract emotion and sentiment features to be incorporated for CB Detection Models (CDMs).

C. PART 2 (USING EMOTION AND SENTIMENT FEATURES WITH WORD REPRESENTATIONS VIA LANGUAGE MODELS)
Fig. 6 shows our proposed solution, which includes sentiment and emotion features in the CB datasets. The proposed solution starts with the aforementioned data pre-processing steps. Each instance of the CB dataset is then fed into the EDM to classify the type of emotion associated with it, which can be anger, fear, sadness, guilt, disgust, joy, love, or surprise. A list of emotions is also extracted based on the NRC lexicon, matching each word in the instance to the types of emotion it reveals. The AFINN sentiment lexicon produces a sentiment polarity and sentiment score for each instance. These features (emotion based on EDM, list of emotions, sentiment polarity, and sentiment score) were all added to the CB datasets for each instance. We hypothesize that these features provide more information for the trained model to detect cyberbullying scenarios. The features which have been considered are:
• Contextual features (BERT base): These features represent semantic, syntactic and contextual features that can be extracted using word representation models.
• Emotions based on EDM: The overall emotion of an instance that is detected based on the emotion detection model (EDM).
• Emotions based on Lexicon: This is based on the NRC Emotion Lexicon, which gives a list of emotions based on the words of the cyberbullying instance. The list of emotions is further collated to one single overall emotion, which is either negative or positive, based on the frequency formula as described in the second step of validating the emotion dataset.
• Sentiment: The sentiment polarity is extracted based on the AFINN lexicon. It contains a list of English words that are manually assigned a score for valence.
Several experiments are conducted to determine the effectiveness of the detection model based on the various textual features. However, taking into consideration the strength of the word representations in the pre-trained models, more attention has been given to BERT. The settings considered are:
• BERT base
• BERT base + emotions (EDM)
• BERT base + emotion (EDM) + lexicon emotions
• BERT base + Sentiment
• All features.
These experiments are conducted using the Keras open-source library based on TensorFlow, which provides a Python interface for artificial neural networks. Keras is used instead of PyTorch because of its flexibility in including additional features as inputs to the neural network. The Keras functional model API is used since it can handle models with multiple inputs.

D. EVALUATION METRICS
Recall, precision, and F1 measure are used to measure the performance of the detection models. These metrics are widely used in classification models. The importance of each metric depends on the classified topic.
Recall: Calculates the percentage of actual positives a model correctly classified. In our case, the true positive is the number of CB instances that have been correctly identified. The recall is important since it identifies how many actual CB instances are identified correctly. The false negatives are the CB instances that have been flagged as non-CB.
Precision: Measures the percentage of instances predicted as CB that are actually CB, that is, the true positives out of all predicted CB instances. In real-time applications, precision is often the more useful metric, since avoiding false positives is given higher priority.
F1-score: Measures the harmonic mean of precision and recall. It is calculated with the following formula:
F1 = 2 × (precision × recall) / (precision + recall) (2)

VI. RESULTS AND DISCUSSION
In this section, the results obtained during the experimental simulations are presented and analyzed. The results are obtained using PyTorch and SageMaker for BERT base, XLNet and BERT large.

A. RESULTS OF THE EMOTION DETECTION MODEL (EDM)
The performance of the model was measured using recall, precision, and F1-score, which were calculated using the Sklearn Python library. The model achieved a score of 0.84 for all three metrics, with only minor differences between them. This result is considered good given the number of classes and the size of the dataset in each class. To gain a deeper understanding of the model’s performance, we calculated the true positive rate (TPR) for each emotion. Table 5 displays the TPR, the number of predicted instances, and the total number of instances for each emotion. As seen in the table, the model performed well in correctly classifying instances of fear and joy, while love and guilt had the lowest TPR with scores of 0.67 and 0.75, respectively. The results indicate that the more instances a class has in the training dataset, the better the model performs in predicting instances
of that class. Despite this, the overall performance of the model is good and it can be further utilized to detect emotions in cyberbullying cases.

TABLE 5. True positive rate for each emotion.

TABLE 7. Results for the Twitter hate speech dataset.

TABLE 8. Results of cyberbullying detection models with added features for toxic dataset.
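The metrics defined in Section V-D and the per-emotion true positive rate reported for the EDM can be sketched in plain Python. The authors computed their scores with the Sklearn library; the hand-written functions, their names, and the toy labels below are assumptions used only to make the definitions concrete, and they do not reproduce any result from the paper.

```python
from collections import Counter


def binary_metrics(y_true, y_pred, positive=1):
    """Return (precision, recall, F1) for the positive (cyberbullying) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # Equation (2): harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def per_class_tpr(y_true, y_pred):
    """Return, for each class, the fraction of its instances predicted correctly."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: hits[label] / totals[label] for label in totals}


# Toy binary labels (1 = CB, 0 = non-CB), invented for illustration.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
precision, recall, f1 = binary_metrics(y_true, y_pred)
print(precision, recall, f1)

# Toy emotion labels, invented for illustration.
emo_true = ["anger", "anger", "fear", "joy", "joy", "love"]
emo_pred = ["anger", "fear", "fear", "joy", "joy", "joy"]
tpr = per_class_tpr(emo_true, emo_pred)
print(tpr)
```

In the binary toy case there are three true positives, one false positive, and one false negative, so precision, recall, and F1 all equal 0.75. The per-class TPR is the same quantity as per-class recall, which is why classes with few training instances tend to show it most clearly.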
B. WORD REPRESENTATION MODELS
Table 6 and Table 7 present the results of using the pre-trained models BERT base, XLNet, and BERT large for cyberbullying detection. The differences between the models are minor in both datasets. BERT base demonstrates consistency across all three metrics in both datasets and outperforms XLNet and BERT large in recall and F1-score on the Toxic dataset by 8% recall and 4% F1-score. XLNet and BERT large, however, achieve higher precision than BERT base by 1%. XLNet had the lowest recall of 82%. For the Twitter dataset, the results are high, and there is no significant difference between the three models, as shown in Table 7. BERT base had a higher precision than BERT large and XLNet, while BERT large had a higher recall than BERT base and XLNet. All three models scored 97% on the Twitter dataset in terms of F1-score. The overall results indicate that these models can perform well in detecting cyberbullying and there is no significant difference in using BERT base, BERT large, or XLNet to train the models.

TABLE 6. Results for wikipedia toxic dataset.

C. BERT BASE RESULTS WITH ADDED FEATURES
Table 8 shows all the results of cyberbullying detection models with the additional extracted features for the Toxic dataset, while Table 9 shows the results for the Twitter dataset. Three metrics evaluate the performance of each model. Looking at the results in general, added features have improved the performance of the cyberbullying detection model. In the Toxic dataset, all emotion features and sentiment features added to BERT word representation models have outperformed the BERT only model by 5% and 6% recall, and F1-score by 2% and 3% respectively. Regarding the precision measure, BERT with emotion detected by EDM has produced the highest score of 0.95 on the Toxic dataset.

Using all emotion features achieved the highest recall and F1 measure values of 0.98 and 0.97 respectively. For the precision metric, BERT + emotion EDM + emotion extracted from the NRC lexicon model and BERT + sentiment model produced the highest precision of 0.97. Emotion features and sentiment features added to BERT have performed consistently well on both datasets.

The Toxic dataset shows relatively lower scores than the Twitter dataset, due to the imbalance ratio between cyberbullying and non-cyberbullying classes, where the minority instances belong to the cyberbullying class. However, it is more practical for real-time applications to handle this imbalance, because the majority of instances are not classified as cyberbullying in some applications.

On the other hand, the Twitter dataset produced relatively higher results than the Toxic dataset, due to the distribution and the nature of the dataset. The cyberbullying class contains the majority of the instances, and the Twitter instances are relatively shorter than in the Toxic datasets. Therefore, models could perform very well using all features. Nevertheless, understanding the cyberbullying language by a model which contains the majority of cyberbullying instances can benefit some applications where that kind of language is highly
occurring such as in online gaming, and video broadcasting where users can leave comments.

TABLE 9. Results of cyberbullying detection models with added features for the twitter dataset.

D. COMPARISON OF RESULTS WITH RELATED STUDIES
A related study by Balakrishnan et al. [20] investigated cyberbullying through the use of features such as baseline features (text, profile features, and network features), sentiment, emotion, and the Big Five personality traits. The dataset used was extracted from Twitter and contained 9,484 annotated tweet IDs. Random Forest, Naive Bayes, and J48 machine learning algorithms were applied to detect cyberbullying using the open-source tool WEKA 3.8. Several experiments were conducted based on the mentioned features. Although the datasets and classification algorithms used in the study differ from those in our study, a comparison was made based on the performance of the models for common features and Twitter datasets. The F1-score was used for comparison since it is a common evaluation metric in both studies.

TABLE 10. Comparison with related studies based on common features and twitter datasets.

As shown in Table 10, our models outperformed the models in the Balakrishnan study for both the baseline and sentiment experiments, with a 7% improvement. Despite the difference in baseline features between the two studies, our models still scored 13% higher than the related study.
Another study by Maity et al. [59] used sentiment and emotion features to improve the performance of their cyberbullying detection models. The dataset consists of 6084 tweets in a mixture of English and Indian languages and was processed using multilingual BERT for word representations. Their baseline model achieved an F1-score of 0.80, while the model that incorporated sentiment and emotion features had an F1-score of 0.82. Our models outperformed their models by 16% and 14% in both experiments, respectively.

VII. CONCLUSION
This research paper proposes cyberbullying detection models (CDMs) that utilize emotion features to enhance the efficiency of detecting cyberbullying. In this work, all critical steps were taken into consideration, from data preparation to deep learning models. The preparation of the datasets is a major challenge that all intelligent systems must overcome, as their success is entirely dependent on the quality of the datasets. Hence, meticulous attention was given to the datasets, from the acquisition and pre-processing, up to sampling. There is a sparsity of cyberbullying datasets that encompass all forms of cyberbullying, such as threatening, harassing, humiliating, intimidating, and manipulating or controlling targeted victims. To address this issue, this research utilized two datasets. The first is the toxic dataset collected by the Conversation AI team, and the second is the Twitter dataset. Cyberbullying datasets generally face an imbalance between their labels; therefore, sampling techniques were developed to reduce the imbalance ratio. The limitations of sparsity and imbalance in the cyberbullying dataset were addressed and resolved.
After the preparation of the datasets, extracting textual features was the second step in detecting cyberbullying. The focus was on features that can be extracted from the text itself, such as syntactic, semantic, contextual, emotion, and sentiment features. Features related to the perpetrator, such as gender, age, and social media profiles, are not considered because they are dependent on a specific social media platform and can sometimes be difficult to obtain due to privacy policies. Nevertheless, emotion features were thoroughly investigated through the use of a deep learning model and lexicon-based approach. To build an emotion detection model, the CBET dataset, which was collected from Twitter using hashtag keywords, was used. The dataset was labeled using hashtags as the keywords. Due to the inaccuracy of the hashtag labeling, a procedure was then carried out to validate the annotation of the emotion dataset labels. The validated dataset was then used to train the emotion detection model (EDM) using BERT as a pre-trained word representation model. This model was used to study and explore the emotions related to cyberbullying texts. The results indicate that most cyberbullying texts are categorized as negative emotions.
Emotions and sentiment were drawn out from cyberbullying datasets through the use of EDM and NRC lexicon for emotions and AFINN lexicon for sentiment. These features were fed to deep learning models to train cyberbullying detection models. A set of experiments was carried out with different selections of features to investigate the best set of features for cyberbullying detection. The results show that emotions and sentiment features improve the precision of
cyberbullying detection and outperformed the use of BERT contextual features. The use of emotion features added to BERT resulted in a recall score of 0.87 on the Toxic dataset, enhancing recall by 0.05 (5 percentage points) compared to using BERT alone. The use of sentiment features scored a recall of 0.88, improving on BERT alone by 0.06.
Future work can be focused on enhancing emotion datasets in terms of size and annotations. Our study employed an automatic validation procedure to eliminate instances with potentially inaccurate annotations. This approach was adopted due to the high cost of human annotation, particularly for large datasets.

REFERENCES
[1] S. Modha, P. Majumder, T. Mandl, and C. Mandalia, "Detecting and visualizing hate speech in social media: A cyber watchdog for surveillance," Expert Syst. Appl., vol. 161, Dec. 2020, Art. no. 113725.
[2] S. Hinduja and J. W. Patchin, "Bullying, cyberbullying, and suicide," Arch. Suicide Res., vol. 14, no. 3, pp. 206–221, Jul. 2010.
[3] R. M. Kowalski, G. W. Giumetti, A. N. Schroeder, and M. R. Lattanner, "Bullying in the digital age: A critical review and meta-analysis of cyberbullying research among youth," Psychol. Bulletin, vol. 140, p. 1073, 2014, doi: 10.1037/a0035618.
[4] V. Balakrishnan, "Cyberbullying among young adults in Malaysia: The roles of gender, age and internet frequency," Comput. Hum. Behav., vol. 46, pp. 149–157, May 2015.
[5] S. M. B. Bottino, C. M. C. Bottino, C. G. Regina, A. V. L. Correia, and W. S. Ribeiro, "Cyberbullying and adolescent mental health: Systematic review," Cadernos Saude Publica, vol. 31, no. 3, pp. 463–475, Mar. 2015.
[6] R. M. Kowalski, S. P. Limber, and P. W. Agatston, Cyberbullying: Bullying in the Digital Age. Hoboken, NJ, USA: Wiley, 2012.
[7] X.-W. Chu, C.-Y. Fan, Q.-Q. Liu, and Z.-K. Zhou, "Cyberbullying victimization and symptoms of depression and anxiety among Chinese adolescents: Examining hopelessness as a mediator and self-compassion as a moderator," Comput. Hum. Behav., vol. 86, pp. 377–386, Sep. 2018.
[8] A. Yadollahi, A. G. Shahraki, and O. R. Zaiane, "Current state of text sentiment analysis from opinion to emotion mining," ACM Comput. Surv., vol. 50, no. 2, pp. 1–33, Mar. 2018.
[9] Y. Ge, J. Qiu, Z. Liu, W. Gu, and L. Xu, "Beyond negative and positive: Exploring the effects of emotions in social media during the stock market crash," Inf. Process. Manage., vol. 57, no. 4, Jul. 2020, Art. no. 102218.
[10] C. Yang, X. Chen, L. Liu, and P. Sweetser, "Leveraging semantic features for recommendation: Sentence-level emotion analysis," Inf. Process. Manage., vol. 58, no. 3, May 2021, Art. no. 102543.
[11] L. Jiang, L. Liu, J. Yao, and L. Shi, "A hybrid recommendation model in social media based on deep emotion analysis and multi-source view fusion," J. Cloud Comput., vol. 9, no. 1, pp. 1–16, Dec. 2020.
[12] P. Parameswaran, A. Trotman, V. Liesaputra, and D. Eyers, "Detecting the target of sarcasm is hard: Really?" Inf. Process. Manage., vol. 58, no. 4, Jul. 2021, Art. no. 102599.
[13] M. Al-Hashedi, L.-K. Soon, and H.-N. Goh, "Cyberbullying detection using deep learning and word embeddings: An empirical study," in Proc. 2nd Int. Conf. Comput. Intell. Intell. Syst., Nov. 2019, pp. 17–21.
[14] J. Chun, J. Lee, J. Kim, and S. Lee, "An international systematic review of cyberbullying measurements," Comput. Hum. Behav., vol. 113, Dec. 2020, Art. no. 106485.
[15] D. A. Winkler, "Role of artificial intelligence and machine learning in nanosafety," Small, vol. 16, no. 36, Sep. 2020, Art. no. 2001883.
[16] A. L’Heureux, K. Grolinger, H. F. Elyamany, and M. A. M. Capretz, "Machine learning with big data: Challenges and approaches," IEEE Access, vol. 5, pp. 7776–7797, 2017.
[17] S. Murnion, W. J. Buchanan, A. Smales, and G. Russell, "Machine learning and semantic analysis of in-game chat for cyberbullying," Comput. Secur., vol. 76, pp. 197–213, Jul. 2018.
[18] L. Cheng, J. Li, Y. N. Silva, D. L. Hall, and H. Liu, "XBully: Cyberbullying detection within a multi-modal context," in Proc. 12th ACM Int. Conf. Web Search Data Mining, Jan. 2019, pp. 339–347.
[19] V. Balakrishnan, S. Khan, T. Fernandez, and H. R. Arabnia, "Cyberbullying detection on Twitter using big five and dark triad features," Personality Individual Differences, vol. 141, pp. 252–257, Apr. 2019.
[20] V. Balakrishnan, S. Khan, and H. R. Arabnia, "Improving cyberbullying detection using Twitter users’ psychological features and machine learning," Comput. Secur., vol. 90, Mar. 2020, Art. no. 101710.
[21] H. Rosa, N. Pereira, R. Ribeiro, P. C. Ferreira, J. P. Carvalho, S. Oliveira, L. Coheur, P. Paulino, A. M. Veiga Simão, and I. Trancoso, "Automatic cyberbullying detection: A systematic review," Comput. Hum. Behav., vol. 93, pp. 333–345, Apr. 2019.
[22] J. N. Navarro and J. L. Jasinski, "Going cyber: Using routine activities theory to predict cyberbullying experiences," Sociol. Spectr., vol. 32, no. 1, pp. 81–94, Jan. 2012.
[23] V. Nahar, S. Al-Maskari, X. Li, and C. Pang, "Semi-supervised learning for cyberbullying detection in social networks," in Proc. Australas. Database Conf. Cham, Switzerland: Springer, 2014, pp. 160–171.
[24] M. A. Al-garadi, K. D. Varathan, and S. D. Ravana, "Cybercrime detection in online communications: The experimental case of cyberbullying detection in the Twitter network," Comput. Hum. Behav., vol. 63, pp. 433–443, Oct. 2016.
[25] D. Chatzakou, I. Leontiadis, J. Blackburn, E. D. Cristofaro, G. Stringhini, A. Vakali, and N. Kourtellis, "Detecting cyberbullying and cyberaggression in social media," ACM Trans. Web, vol. 13, no. 3, pp. 1–51, Aug. 2019.
[26] H. Hosseinmardi, S. Arredondo Mattson, R. Ibn Rafiq, R. Han, Q. Lv, and S. Mishra, "Detection of cyberbullying incidents on the Instagram social network," 2015, arXiv:1503.03909.
[27] J.-M. Xu, X. Zhu, and A. Bellmore, "Fast learning for sentiment analysis on bullying," in Proc. 1st Int. Workshop Issues Sentiment Discovery Opinion Mining, Aug. 2012, pp. 1–6.
[28] J. A. Patch, "Detecting bullying on Twitter using emotion lexicons," Ph.D. thesis, Dept. Sci., Univ. Georgia, Athens, GA, USA, 2015. [Online]. Available: https://getd.libs.uga.edu/pdfs/patch_jerrad_a_201505_ms.pdf
[29] M. J. Berger, "Large scale multi-label text classification with semantic word vectors," Stanford Univ., Stanford, CA, USA, Tech. Rep., 2015. [Online]. Available: https://dspace.bracu.ac.bd/xmlui/handle/10361/6420
[30] Z. L. Chia, M. Ptaszynski, F. Masui, G. Leliwa, and M. Wroczynski, "Machine learning and feature engineering-based study into sarcasm and irony classification with application to cyberbullying detection," Inf. Process. Manage., vol. 58, no. 4, Jul. 2021, Art. no. 102600.
[31] J. Eronen, M. Ptaszynski, F. Masui, A. Smywiński-Pohl, G. Leliwa, and M. Wroczynski, "Improving classifier training efficiency for automatic cyberbullying detection with feature density," Inf. Process. Manage., vol. 58, no. 5, Sep. 2021, Art. no. 102616.
[32] Z. Mossie and J.-H. Wang, "Vulnerable community identification using hate speech detection on social media," Inf. Process. Manage., vol. 57, no. 3, May 2020, Art. no. 102087.
[33] M. Dadvar, D. Trieschnigg, R. Ordelman, and F. de Jong, "Improving cyberbullying detection with user context," in Proc. Eur. Conf. Inf. Retr. Cham, Switzerland: Springer, 2013, pp. 693–696.
[34] S. Mahbub, E. Pardede, and A. S. M. Kayes, "Detection of harassment type of cyberbullying: A dictionary of approach words and its impact," Secur. Commun. Netw., vol. 2021, pp. 1–12, Jun. 2021.
[35] E. Hutson, "Cyberbullying in adolescence," Adv. Nursing Sci., vol. 39, no. 1, pp. 60–70, 2016.
[36] M. Z. Naf’an, A. A. Bimantara, A. Larasati, E. M. Risondang, and N. A. S. Nugraha, "Sentiment analysis of cyberbullying on Instagram user comments," J. Data Sci. Appl., vol. 2, pp. 38–48, Jan. 2019.
[37] H. Dani, J. Li, and H. Liu, "Sentiment informed cyberbullying detection in social media," in Proc. Joint Eur. Conf. Mach. Learn. Knowl. Discovery Databases. Cham, Switzerland: Springer, 2017, pp. 52–67.
[38] M. Sintaha, S. B. Satter, N. Zawad, C. Swarnaker, and A. Hassan, "Cyberbullying detection using sentiment analysis in social media," Ph.D. thesis, Dept. Comput. Sci. Eng., BRAC Univ., Dhaka, Bangladesh, 2016. [Online]. Available: https://dspace.bracu.ac.bd/xmlui/handle/10361/6420
[39] V. Nahar, S. Unankard, X. Li, and C. Pang, "Sentiment analysis for effective detection of cyber bullying," in Proc. Asia–Pacific Web Conf. Cham, Switzerland: Springer, 2012, pp. 767–774.
[40] J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, "BERT: Pre-training of deep bidirectional transformers for language understanding," 2018, arXiv:1810.04805.
[41] Z. Yang, Z. Dai, Y. Yang, J. Carbonell, R. R. Salakhutdinov, and Q. V. Le, "XLNet: Generalized autoregressive pretraining for language understanding," in Advances in Neural Information Processing Systems, H. Wallach, H. Larochelle, A. Beygelzimer, F. d’Alché-Buc, E. Fox, and R. Garnett, Eds. Red Hook, NY, USA: Curran Associates, 2019. [Online]. Available: https://proceedings.neurips.cc/paper/2019/file/dc6a7e655d7e5840e66733e9ee67cc69-Paper.pdf
[42] Z. Xinxi, "Single task fine-tune BERT for text classification," in Proc. 2nd Int. Conf. Comput. Vis., Image, Deep Learn., Oct. 2021, pp. 194–206.
[43] S. González-Carvajal and E. C. Garrido-Merchán, "Comparing BERT against traditional machine learning text classification," 2020, arXiv:2005.13012.
[44] M. N. Turliuc, C. Măirean, and M. Boca-Zamfir, "The relation between cyberbullying and depressive symptoms in adolescence. The moderating role of emotion regulation strategies," Comput. Hum. Behav., vol. 109, Jan. 2020, Art. no. 106341.
[45] A.-L. Camerini, L. Marciano, A. Carrara, and P. J. Schulz, "Cyberbullying perpetration and victimization among children and adolescents: A systematic review of longitudinal studies," Telematics Informat., vol. 49, Jun. 2020, Art. no. 101362.
[46] O. Araque, I. Corcuera-Platas, J. F. Sánchez-Rada, and C. A. Iglesias, "Enhancing deep learning sentiment analysis with ensemble techniques in social applications," Expert Syst. Appl., vol. 77, pp. 236–246, Jul. 2017.
[47] A. H. Alamoodi, B. B. Zaidan, A. A. Zaidan, O. S. Albahri, K. I. Mohammed, R. Q. Malik, E. M. Almahdi, M. A. Chyad, Z. Tareq, A. S. Albahri, H. Hameed, and M. Alaa, "Sentiment analysis and its applications in fighting COVID-19 and infectious diseases: A systematic review," Expert Syst. Appl., vol. 167, Apr. 2021, Art. no. 114155.
[48] J. Guerreiro and P. Rita, "How to predict explicit recommendations in online reviews using text mining and sentiment analysis," J. Hospitality Tourism Manage., vol. 43, pp. 269–272, Jun. 2020.
[49] M. Fortunatus, P. Anthony, and S. Charters, "Combining textual features to detect cyberbullying in social media posts," Proc. Comput. Sci., vol. 176, pp. 612–621, Jan. 2020.
[50] W. A. Prabowo and F. Azizah, "Sentiment analysis for detecting cyberbullying using TF-IDF and SVM," Jurnal Rekayasa Sistem Teknologi Informasi, vol. 4, no. 6, pp. 1142–1148, Dec. 2020.
[51] P. Ekman, W. V. Friesen, and P. Ellsworth, Emotion in the Human Face: Guidelines for Research and an Integration of Findings. Amsterdam, The Netherlands: Elsevier, 2013.
[52] K. Reynolds, A. Kontostathis, and L. Edwards, "Using machine learning to detect cyberbullying," in Proc. 10th Int. Conf. Mach. Learn. Appl. Workshops, vol. 2, Dec. 2011, pp. 241–244.
[53] T. Davidson, D. Warmsley, M. Macy, and I. Weber, "Automated hate speech detection and the problem of offensive language," in Proc. Int. AAAI Conf. Web Social Media, 2017, vol. 11, no. 1, pp. 512–515.
[54] A. G. Shahraki and O. R. Zaiane, "Lexical and learning-based emotion mining from text," in Proc. Int. Conf. Comput. Linguistics Intell. Text Process., vol. 9, 2017, pp. 24–55.
[55] S. Mohammad, "# emotional tweets," in Proc. 6th Int. Workshop Semantic Eval., 2012, pp. 246–255.
[56] F. Å. Nielsen, "A new ANEW: Evaluation of a word list for sentiment analysis in microblogs," 2011, arXiv:1103.2903.
[57] P. Norvig. (2007). How to Write a Spelling Corrector. [Online]. Available: https://norvig.com/spell-correct.html
[58] N. Ide, A. Herbelot, and L. Màrquez, "Proceedings of the 6th joint conference on lexical and computational semantics (SEM 2017)," in Proc. 6th Joint Conf. Lexical Comput. Semantics, 2017.
[59] K. Maity, A. Kumar, and S. Saha, "A multitask multimodal framework for sentiment and emotion-aided cyberbullying detection," IEEE Internet Comput., vol. 26, no. 4, pp. 68–78, Jul. 2022.

MOHAMMED AL-HASHEDI received the bachelor’s degree in information technology from Multimedia University, in 2017, where he is currently pursuing the master’s degree with the Faculty of Computing and Informatics. He was a Programmer and a Technical Manager with Al-Jifri Export Company. His research interests include text mining and machine learning.

LAY-KI SOON received the Ph.D. degree (engineering) in Web engineering from Soongsil University, South Korea, in 2009. She was a Senior Lecturer with the Faculty of Computing and Informatics (FCI), Multimedia University, where she was also the Deputy Dean (research and innovation) from 2016 to 2018. She was a Research Fellow with the Telekom Malaysia Research and Development on Database Optimization Project. To date, she has graduated six Ph.D. and two master’s students. Besides teaching at university, she has also conducted courses on relational databases and NoSQL databases for corporate employees. Her research interest includes data mining, with an emphasis on text mining.

HUI-NGO GOH received the bachelor’s degree (Hons.) in computer science and the M.Sc. degree from University Putra Malaysia, in 1999 and 2001, respectively, and the Ph.D. degree in information technology from Multimedia University, in 2014. She has served as a Deputy Dean (academic) and the Dean with the Faculty of Computing and Informatics, Multimedia University, from 2016 to 2021, where she is currently a Senior Lecturer. As a Researcher, she is supervising and co-supervising post-graduate students and a project leader and/or project member of government grants. As a trainer, she has conducted training on social media analytics, big data essentials, text processing using Python, and advanced data engineering. She is a certified HRDF trainer. Her research interest includes text processing.

AMY HUI LAN LIM received the B.I.T. degree (Hons) in information systems engineering, the M.Sc. degree in information technology, and the Ph.D. degree in information technology from Multimedia University. She is currently a Senior Lecturer with the Faculty of Computing and Informatics, Multimedia University, Cyberjaya. Her research interests include artificial intelligence, information systems, data mining, and business process management.

EU-GENE SIEW served as a Lecturer with KDU Penang. He was an auditor in an audit firm and an accountant in a manufacturing company. He is currently a Senior Lecturer with the School of Business, Monash University Malaysia. He has researched and published in top-tier journals and presented papers at conferences, including academic and industry seminars in the area of data mining, Web mining, and accounting information systems.