Cyberbullying Detection Based On Emotion
ABSTRACT Due to the detrimental consequences caused by cyberbullying, a great deal of research has been
undertaken to propose effective techniques to resolve this recurring problem. The research presented in this
paper is motivated by the fact that cyberbullying can cause negative emotions. This paper proposes
cyberbullying detection models that are trained on contextual, emotion, and sentiment features.
An Emotion Detection Model (EDM) was constructed using Twitter datasets whose annotations have been
improved. Emotions and sentiment were extracted from cyberbullying datasets using the EDM and
lexicon-based approaches. Two cyberbullying datasets, from Wikipedia and Twitter respectively, were further
improved by comprehensive annotation of emotion and sentiment features. The results show that anger, fear
and guilt were the major emotions associated with cyberbullying. Subsequently, the extracted emotions were
used as features in addition to contextual and sentiment features to train models for cyberbullying detection.
The results demonstrate that using emotion and sentiment features improved the performance of detecting
cyberbullying by 0.5 to 0.6 recall. The proposed models also outperformed the state-of-the-art models by
a 0.7 f1-score. The main contribution of this work is two-fold: a comprehensive emotion-annotated dataset
for cyberbullying detection, and empirical evidence that emotions are effective features for
cyberbullying detection.
are data sparsity, a large imbalance between labels, and limitations to specific social networking sites [13]. The efforts to create an autonomous cyberbullying detection model have been unsuccessful so far due to the high speed and diversity of user-generated content (UGC) on the Internet [14].
The solution proposed in this work is driven by the hypothesis that the detection of cyberbullying can be improved through emotion mining. It consists of three stages.
The first stage is to obtain a clean, balanced, and feature-rich dataset. The effectiveness of machine learning models is heavily dependent on the quality of the training datasets [15], [16]. However, datasets that cover all forms of cyberbullying are rare [13], [17]. Additionally, the nature of UGC varies across different social networking sites. For example, tweets on Twitter have a different character limit compared to Facebook posts. As a result, the first stage of our proposed solution focuses on cleaning, transforming, and integrating data from various sources to produce a clean, balanced, and feature-rich dataset. This stage accounts for at least 70% of the data lifecycle in this work.
The second stage focuses on extracting features from the user-generated content (UGC), including semantic, syntactic, emotional, and sentiment features. The correct extraction and integration of these features will lead to a robust and effective cyberbullying detection model, regardless of the social media platform used. Semantic and syntactic features can be obtained by using language models such as BERT (Bidirectional Encoder Representations from Transformers). In this paper, emotions are extracted from text using both supervised and unsupervised approaches. The supervised approach involves applying deep learning (DL) algorithms on an emotion dataset obtained from Twitter. The tweets are classified into specific emotions using hashtags. To validate the automatic labeling for the emotion dataset, three validation steps have been proposed and implemented. These steps aim to verify the sentiment and emotions of instances using lexicons and to train emotion models using small, validated datasets.
On the other hand, the unsupervised approach is based on a predefined lexicon, which is a keyword-matching technique that assigns the dominant emotions associated with a word. This is a word-level technique that helps identify explicit expressions in the text. Along with emotion features, sentiment features are extracted from text, and an overall polarity is assigned to each instance.
After extracting the cyberbullying features, DL models are trained in the third stage to classify the data instances into the discrete categories of cyberbullying or non-cyberbullying. Two sets of experiments are conducted. The first set uses selected language models, including BERT (BERT base and BERT large) and XLNet. The second set includes additional features, namely emotions and sentiment. The results show that the inclusion of these additional features has improved the performance of the cyberbullying detection models by 5% to 6%. In summary, the main contributions of this work are:
• An improved and thoroughly processed cyberbullying dataset annotated with emotion and sentiment features.
• A label validation for the emotion dataset that discards instances with inaccurate annotations.
• A proposed DL-based cyberbullying model that incorporates semantic, syntactic, emotion, and sentiment features.
Table 1 defines the variables and notations used in this paper to serve readers as reference points to easily comprehend the meaning of these terms.
TABLE 1. Variables and notations used in this paper.
The remainder of the paper is structured as follows. Section II presents the related works and highlights the research gaps. Section III discusses the preparation and pre-processing of the datasets. Section IV presents an EDM
that aims to identify emotions associated with cyberbullying. In Section V, a series of experiments is presented, followed by the results and analysis in Section VI. Section VII concludes and offers some recommendations for future work.
II. RELATED WORKS
In the past, different studies have attempted to detect cyberbullying through different sets of features [13], [18], [19], [20], [21]. These features vary and depend on the social media platforms that are being explored. For instance, demographic information such as gender, personality, number of followers, and date of account creation is adopted in some studies [22], [23], [24]. Other works have identified bullies using tweets, user profile, and network-based attributes on Twitter. These features focused on distinguishing bullies from regular users [25]. Other features such as the number of comments and likes are also taken into consideration by [26]. In addition, features such as binary sentiment are used to identify cyberbullying by related works [17], [27], [28].
Profile and usage features such as gender, number of followers, and hours spent online can be faked, or may not be available due to data protection policies practiced by some social media platforms. Furthermore, the available features vary from one social media platform to another. Many works rely on the textual UGC itself for cyberbullying detection. The content is represented in word vectors or language models [13], [29]. In addition, there are other features that can be extracted from the textual UGC. These features are independent of the target social media platform since they are generated purely from the UGC.
Generally, there are two categories of such features, namely, syntactic and semantic features. Syntactic features are normally obtained from sentence parsing or dependency structure. These include part-of-speech (POS) tagging, named entity recognition (NER) and term frequency inverse document frequency (TF-IDF). TF-IDF is used to weigh the importance of a word in a document and to convert words to numerical embeddings. Some studies use this technique to convert UGC to embeddings and feed them to machine learning classifiers to classify cyberbullying [30], [31].
On the other hand, semantic features aim to enrich the UGC from the meaning perspective. Word embedding is a feature extraction technique that calculates a word's weight based on its contexts. Examples of word embedding models are word2vec and GloVe. Works that have implemented these models are [13] and [32]. Furthermore, more semantic features have been included in the UGC. For instance, some works extend a corpus by including a dictionary of important words [33], [34], [35], as well as the sentiment of the text [36], [37], [38], [39]. The extracted syntactic and semantic features are then transformed into textual representation models through more advanced models such as BERT [40] or XLNet [41]. These state-of-the-art (SOTA) language models have shown a huge improvement in natural language processing (NLP) applications such as sentiment analysis, question answering, topic classification and machine translation [40], [42], [43]. With BERT, text can be classified into the binary cyberbullying or non-cyberbullying categories.
Since cyberbullying has a strong impact on the victim's psychology, emotions and sentiment can be extracted from the content to further improve the detection [44], [45]. Sentiment analysis has been widely used in different domains, such as products and service reviews, political analysis, etc. [46], [47], [48]. However, there are very few studies that incorporate emotion and sentiment in detecting cyberbullying, particularly for emotions [20], [49], [50], [59].
For sentiment analysis, there are typically three main classes, namely, positive, negative and neutral. Intuitively, negative sentiment is more relevant for detecting cyberbullying. For emotion categories, guilt and love are used in this work together with the six basic emotions defined by [51] (anger, disgust, fear, joy, sadness, and surprise). Emotions are more refined, as they portray the exact feeling of the victims. By identifying the type of emotion in a text, more information can be revealed as to whether cyberbullying is taking place. Therefore, this study focuses on emotions due to the strong relationship between the impact of bullying and negative emotions.
III. DATASETS
This section describes the datasets employed in this work. Firstly, we examine the cyberbullying datasets used to train cyberbullying detection models (CDMs). Secondly, we detail the emotion dataset used to construct the emotion detection model (EDM).
There are two major datasets used for CDMs. The toxic dataset was sourced from Wikipedia, and the hate speech dataset was crawled from Twitter. Wikipedia and Twitter are two different platforms in terms of text format. Wikipedia comments are formal and lengthy, while tweets are short, informal and may include misspelled words. Therefore, it is essential to train CDMs based on these two datasets.
To extract emotions from CB datasets, two emotion datasets were combined to train the EDM, namely Cleaned Balanced Emotional Tweets (CBET) and Twitter Emotion Corpus (TEC). Both datasets were extracted from Twitter and labeled using hashtag keywords, such as #anger, #fear, etc. Human judgment was not involved in the labeling process. Therefore, a validation procedure was followed to discard any potentially incorrect instances. The reason for combining the two datasets is to increase the size of the dataset after discarding wrongly labeled instances. A detailed description of each dataset is provided in the following sections. Table 2 summarizes the four major datasets used for CDMs and EDM.
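The hashtag-based labeling described for the emotion datasets can be illustrated with a minimal Python sketch. The hashtag-to-emotion mapping, the function name `label_by_hashtag`, and the rule of skipping tweets that carry zero or several different emotion hashtags are assumptions made for illustration; they are not the actual collection procedure of CBET or TEC.

```python
import re

# Hypothetical mapping from hashtag keywords to emotion labels (illustrative only).
EMOTION_HASHTAGS = {"anger": "anger", "fear": "fear", "joy": "joy", "sadness": "sadness"}

HASHTAG_RE = re.compile(r"#(\w+)")


def label_by_hashtag(tweet):
    """Return (clean_text, emotion), or None if the tweet has no single emotion hashtag."""
    tags = [t.lower() for t in HASHTAG_RE.findall(tweet)]
    emotions = {EMOTION_HASHTAGS[t] for t in tags if t in EMOTION_HASHTAGS}
    if len(emotions) != 1:
        # Assumption: unlabeled or ambiguously labeled tweets are skipped.
        return None

    def drop_emotion_tag(match):
        # Remove only the emotion hashtags so the label does not leak into the text.
        return "" if match.group(1).lower() in EMOTION_HASHTAGS else match.group(0)

    clean = " ".join(HASHTAG_RE.sub(drop_emotion_tag, tweet).split())
    return clean, emotions.pop()
```

Because no human checks these labels, a tweet such as a sarcastic message tagged #joy would still be labeled joy, which is why a validation procedure is needed afterwards.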
TABLE 2. Summary of datasets used in this work.
A. CYBERBULLYING DATASETS
In this study, considerable effort has been devoted to data pre-processing, including acquisition, cleaning, and sampling. Two main challenges faced by cyberbullying datasets are data sparsity and imbalanced datasets. Data sparsity in cyberbullying refers to a shortage in the number of instances that cover both explicit and implicit forms of cyberbullying. Another significant problem that can affect the performance of any machine learning model is the imbalance in the ratio between cyberbullying and non-cyberbullying classes. In cyberbullying datasets, positive instances, which are cyberbullying, are in the minority. For example, the Formspring dataset mostly contains insults, lacking other forms of cyberbullying [52]. Additionally, there is a large imbalance in the ratio between the cyberbullying class (CB) and non-cyberbullying class (non-CB), with positive instances constituting only 6% of the total dataset. To overcome these limitations, this study has used two datasets from two different social media sources.
1) TOXIC DATASETS
a: DATASET ACQUISITION AND PRE-PROCESSING
The Toxic dataset is collected from the Wikipedia platform by the Conversation AI team, which was founded by Jigsaw and Google (https://www.kaggle.com/c/jigsaw-toxic-comment-classification-challenge/data). The datasets are publicly shared on Kaggle for competition and research use. The goal of these efforts is to support research towards making the internet a safer place by identifying toxic comments, which include rude, disrespectful, threatening, insulting, and hate speech. This dataset is especially valuable for detecting cyberbullying because it contains instances of explicit and implicit expressions, overcoming the sparsity problem faced by previous cyberbullying datasets. The dataset contains approximately 214,000 comments, with only 10% being classified as cyberbullying. Pre-processing steps have been applied, such as removing punctuation, symbols, and non-English letters, lemmatizing the texts, and converting the words to lowercase. Words containing digits were also removed as they may introduce noise to word representation models. Stop words were not removed as BERT can handle contextual information effectively. This dataset was chosen for cyberbullying detection due to its rich content and appropriate size for building robust models.
b: DATA SAMPLING
To address the imbalanced nature of the dataset, this study proposes an under-sampling technique based on an analysis of the data. The non-cyberbullying class has about 193,000 comments, while the cyberbullying class has about 21,000 comments. The under-sampling approach applied to the majority class aims to improve the class balance and ensure relatively similar text lengths for both classes. Rather than simply discarding instances randomly, a careful procedure is designed to construct a useful dataset.
To under-sample the non-cyberbullying class, the number of words in each instance is counted. Instances with two or fewer words or with more than 254 words are removed from both the CB and non-CB classes. This step is taken because instances with two or fewer words are often meaningless and could negatively impact the performance of the model. Instances with more than 254 words may not be fully captured by the DL vector due to its size. As a result of this step, 638 instances were removed from the CB class, which represents only 2.8% of all the CB instances.
Next, to ensure that the two classes have similar text lengths, instances are categorized into 12 groups based on the word count. Approximately 73% of the instances contain 3 to 60 words. These instances are divided into six categories, each with a range of 10 words. Instances with more than 60 words are divided into six categories with different ranges, as shown in Fig. 1.
Thirdly, to achieve a similar length of instances and the desired ratio of 25-75% between the CB and non-CB classes, three times the number of CB instances are selected from the non-CB instances for each category, while the remaining instances are discarded. Fig. 1 displays the 12 categories with the word ranges for each category on the x-axis, and the number of instances for the CB, non-CB, and under-sampled non-CB classes on the y-axis. The number of under-sampled non-CB instances is calculated by:
$\mathrm{sampled}_{\mathrm{nonCB}} = \#\mathrm{CB} \times 3 \qquad (1)$
Ultimately, the aim is to reduce the imbalance from 10-90% to 25-75%. In other words, the non-CB class should be triple the size of the cyberbullying class. The ratio is determined based on the performance of the BERT representation in training the model. As the CB class contains a reasonably large number of instances, this improved ratio, albeit still imbalanced, does not negatively affect the training, as shown in the experiment results.
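The length-aware under-sampling procedure can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the upper edges of the six groups above 60 words are hypothetical (the actual ranges appear only in Fig. 1), the selection of non-CB instances inside each group is done by seeded random sampling because the text does not specify the selection rule, and the names `word_bin` and `undersample` are invented for this sketch.

```python
import random
from collections import defaultdict

MIN_WORDS, MAX_WORDS = 3, 254

# Upper word-count edge of each of the 12 groups. The first six follow the text
# (10-word ranges up to 60 words); the last six are HYPOTHETICAL placeholders.
BIN_UPPER_EDGES = [10, 20, 30, 40, 50, 60, 80, 100, 130, 160, 200, 254]


def word_bin(text):
    """Return the index of the word-count group, or None if the text is filtered out."""
    n = len(text.split())
    if n < MIN_WORDS or n > MAX_WORDS:
        return None
    for index, upper in enumerate(BIN_UPPER_EDGES):
        if n <= upper:
            return index
    return None


def undersample(cb_texts, non_cb_texts, ratio=3, seed=0):
    """Keep all valid CB texts and, per group, at most ratio * #CB non-CB texts (Eq. 1)."""
    rng = random.Random(seed)
    cb_bins, non_cb_bins = defaultdict(list), defaultdict(list)
    for texts, bins in ((cb_texts, cb_bins), (non_cb_texts, non_cb_bins)):
        for text in texts:
            group = word_bin(text)
            if group is not None:
                bins[group].append(text)

    kept_cb = [t for group in sorted(cb_bins) for t in cb_bins[group]]
    kept_non_cb = []
    for group in sorted(cb_bins):
        candidates = non_cb_bins.get(group, [])
        k = min(len(candidates), ratio * len(cb_bins[group]))
        # Assumption: random choice within the group; the paper's rule is not given.
        kept_non_cb.extend(rng.sample(candidates, k))
    return kept_cb, kept_non_cb
```

Sampling per group rather than over the whole class is what keeps the length distribution of the retained non-CB instances close to that of the CB class.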
2) HATE SPEECH AND OFFENSIVE DATASET (TWITTER DATASETS)
Due to the popularity of Twitter and the difference in nature between it and Wikipedia (toxic dataset), it is necessary to include tweets when building a cyberbullying detection model. The toxic dataset, derived from the Wikipedia comments page, is cleaner, longer, and contains fewer misspelled words. Meanwhile, Twitter datasets often consist of short, informal text and may include intentional spelling errors and user tags, links, and hashtags.
In this study, a hate speech and offensive language dataset is improved and utilized to create a more robust model. Hate speech is another term for bullying and includes threats, insults, and offensive language. This dataset was deemed suitable for building cyberbullying detection models with some improvements. The dataset was collected using the Twitter API and a hate speech lexicon compiled by hatespeech.org. Approximately 25,000 tweets were selected and manually labeled by CrowdFlower [53].
Out of the 25,000 tweets, 83% were classified as hate speech or offensive, while the remaining 17% were categorized as the negative class. To address the imbalance between the two classes, another publicly available Twitter dataset from Kaggle was utilized. This dataset was labeled as insult or non-insult instances. The combined dataset, named the cyberbullying dataset (CD), is shown in Table 3 after over-sampling to increase the size of the negative class.
TABLE 3. Data distribution before and after sampling of cyberbullying datasets.
B. EMOTION DATASETS
Developing emotion models based on clean and accurately labeled datasets is a critical step in identifying emotions
TABLE 4. Some of the instances have been wrongly labeled by using hashtags and ignored after the validation procedure.
In this study, two emotion datasets are validated and utilized for the development of emotion models. The first dataset is the Cleaned Balanced Emotional Tweets (CBET) dataset [54], which is based on a basic emotions model [51] that includes anger, disgust, fear, joy, sadness, and surprise. This dataset also includes the emotions of thankfulness, love, and guilt. The second dataset is the Twitter Emotion Corpus (TEC) [55], which is also crawled from Twitter using the Twitter API and labeled automatically using hashtags. To produce more accurately labeled datasets, three validation steps are carried out, as depicted in Fig. 2.
The first validation step is focused on checking the general sentiment of each instance. Sentiment analysis is used to determine the overall emotion expressed in the text, whether it is positive, negative, or neutral. A lexicon-based approach is used to identify the sentiment polarity of each instance. The AFINN lexicon, which is developed based on microblog English word lists [56], is utilized in this process. Each word in the lexicon has been manually rated for valence, with an integer score ranging from +5 (positive) to -5 (negative). Before identifying the sentiment of each instance, misspelled words are corrected using the Norvig probabilistic method [57]. A lexicon-based approach requires correcting misspelled words to produce accurate results. If an instance is labeled with a negative emotion (anger, fear, sadness, guilt,
and 20% for testing. The model achieves 89% recall and precision.
The model is then used to predict the emotions for the instances produced in the previous step.
The correctly predicted instances are preserved, while the
others are discarded. The trained DL model is built to validate
instances with the four relevant emotions only. Due to the
unavailability of manually labeled datasets for the other emotions,
those instances are checked with the first two validation
steps only. The dataset produced by the validation process
is shown in Fig. 3.
There are 32,000 instances spanning eight fundamental
emotions. The validated emotion dataset (VED) covers five
negative emotions (anger, fear, sadness, guilt and disgust) and
three positive emotions (joy, love and surprise).
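Two pieces of the validation procedure lend themselves to a short Python sketch: lexicon-based sentiment polarity, and the final step that keeps only instances whose hashtag label agrees with the trained emotion model. The valence values below are toy numbers in the AFINN style (integers between -5 and +5), not real AFINN entries; `predict` stands in for the trained DL model, and `covered_emotions` is a hypothetical parameter for the emotions that model was trained on. Summing word valences to obtain the polarity is also an assumption of this sketch.

```python
# Toy valence lexicon in the AFINN style; the scores are illustrative only.
TOY_VALENCE = {"love": 3, "happy": 3, "hate": -3, "awful": -3, "scared": -2}


def sentiment_polarity(text, lexicon=TOY_VALENCE):
    """Sum the word valences and map the total to a polarity label."""
    words = [w.strip(".,!?;:") for w in text.lower().split()]
    score = sum(lexicon.get(w, 0) for w in words)
    if score > 0:
        return "positive", score
    if score < 0:
        return "negative", score
    return "neutral", score


def keep_validated(instances, predict, covered_emotions):
    """Keep (text, label) pairs that the emotion model confirms.

    Instances whose label is outside covered_emotions pass through unchanged,
    since they are only checked by the earlier lexicon-based steps.
    """
    kept = []
    for text, label in instances:
        if label not in covered_emotions or predict(text) == label:
            kept.append((text, label))
    return kept
```

In practice `predict` would wrap the classifier trained on the small manually labeled dataset; any callable that maps a text to an emotion label fits this sketch.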
FIGURE 6. Extract emotion and sentiment features to be incorporated for CB Detection Models (CDMs).
C. PART 2 (USING EMOTION AND SENTIMENT FEATURES WITH WORD REPRESENTATIONS VIA LANGUAGE MODELS)
Fig. 6 shows our proposed solution, which includes sentiment and emotion features in the CB datasets. The proposed solution starts with the aforementioned data pre-processing steps. Each instance of the CB dataset is then fed into the EDM to classify the type of emotion associated with it, which can be anger, fear, sadness, guilt, disgust, joy, love, or surprise. A list of emotions is also extracted based on the NRC lexicon, matching each word in the instance to the types of emotion it reveals. The AFINN sentiment lexicon produces a sentiment polarity and sentiment score for each instance. These features (emotion based on EDM, list of emotions, sentiment polarity, and sentiment score) were all added to the CB datasets for each instance. We hypothesize that these features provide more information for the trained model to detect cyberbullying scenarios. The features which have been considered are:
• Contextual features (BERT base): These features represent semantic, syntactic and contextual features that can be extracted using word representation models.
• Emotions based on EDM: The overall emotion of an instance that is detected based on the emotion detection model (EDM).
• Emotions based on Lexicon: This is based on the NRC Emotion Lexicon, which gives a list of emotions based on the words of the cyberbullying instance. The list of emotions is further collated to one single overall emotion, which is either negative or positive, based on the frequency formula as described in the second step of validating the emotion dataset.
• Sentiment: The sentiment polarity is extracted based on the AFINN lexicon. It contains a list of English words that are manually assigned a score for valence.
Several experiments are conducted to determine the effectiveness of the detection model based on the various textual features. However, taking into consideration the strength of the word representations in the pre-trained models, more attention has been given to BERT. The settings considered are:
• BERT base
• BERT base + emotions (EDM)
• BERT base + emotion (EDM) + lexicon emotions
• BERT base + Sentiment
• All features.
These experiments are conducted using the Keras open-source library based on TensorFlow, which provides a Python interface for artificial neural networks. The reason for using Keras instead of PyTorch is the flexibility to include features as inputs for the neural network built. The Keras functional model API is used since it can handle models with multiple inputs.
D. EVALUATION METRICS
Recall, precision, and F1 measure are used to measure the performance of the detection models. These metrics are widely used in classification models. The importance of each metric depends on the classified topic.
Recall: Calculates the percentage of actual positives a model correctly classified. In our case, the true positive is the number of CB instances that have been correctly identified. The recall is important since it identifies how many actual CB instances are identified correctly. The false negatives are the CB instances that have been flagged as non-CB.
Precision: Measures the percentage of instances predicted as CB that are actually CB, out of all the instances predicted as CB. In real-time applications, precision is the more useful metric, since false positives are given higher priority.
F1-score: Measures the harmonic mean of precision and recall. It is calculated using the following formula:
$F_1 = 2 \times \frac{\mathrm{precision} \times \mathrm{recall}}{\mathrm{precision} + \mathrm{recall}} \qquad (2)$
VI. RESULTS AND DISCUSSION
In this section, the results obtained during the experimental simulations are presented and analyzed. The results are obtained using PyTorch and SageMaker for BERT base, XLNet and BERT large.
A. RESULTS OF THE EMOTION DETECTION MODEL (EDM)
The performance of the model was measured using recall, precision, and F1-score, which were calculated using the scikit-learn (Sklearn) Python library. The model achieved a score of 0.84 for all three metrics, with only minor differences between them. This result is considered satisfactory given the number of classes and the size of the dataset in each class. To gain a deeper understanding of the model's performance, we calculated the true positive rate (TPR) for each emotion. Table 5 displays the TPR, the number of predicted instances, and the total number of instances for each emotion. As seen in the table, the model performed well in correctly classifying instances of fear and joy, while love and guilt had the lowest TPR with scores of 0.67 and 0.75, respectively. The results indicate that the more instances a class has in the training dataset, the better the model performs in predicting instances
TABLE 5. True positive rate for each emotion.
TABLE 7. Results for twitter hate speech wikipedia toxic dataset.
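As a worked illustration of the evaluation metrics and of the per-emotion true positive rate reported in Table 5, the following Python sketch computes them from lists of true and predicted labels. It uses only the standard library; the function names and the `"CB"` label string are assumptions for this example, and the paper itself reports using the Sklearn library rather than hand-written code.

```python
def binary_metrics(y_true, y_pred, positive="CB"):
    """Return (precision, recall, f1) for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0  # correct among predicted CB
    recall = tp / (tp + fn) if tp + fn else 0.0      # correct among actual CB
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean, Eq. (2)
    return precision, recall, f1


def per_class_tpr(y_true, y_pred):
    """True positive rate per class: correct predictions / actual instances."""
    totals, hits = {}, {}
    for t, p in zip(y_true, y_pred):
        totals[t] = totals.get(t, 0) + 1
        if t == p:
            hits[t] = hits.get(t, 0) + 1
    return {label: hits.get(label, 0) / count for label, count in totals.items()}
```

For a multi-class model such as the EDM, the per-class TPR is the same quantity as the recall of each class, which is why it exposes weak classes that an averaged score can hide.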
TABLE 9. Results of cyberbullying detection models with added features for the twitter dataset.
occurring such as in online gaming, and video broadcasting where users can leave comments.
D. COMPARISON OF RESULTS WITH RELATED STUDIES
A related study by Balakrishnan [20] investigated cyberbullying through the use of features such as baseline features (text, profile features, and network features), sentiment, emotion, and the big 5 personality. The dataset used was extracted from Twitter and contained 9,484 annotated tweet IDs. Random Forest, Naive Bayes, and J48 machine learning algorithms were applied to detect cyberbullying using the open-source tool WEKA 3.8. Several experiments were conducted based on the mentioned features. Although the datasets and classification algorithms used in the study differ from those in our study, a comparison was made based on the performance of the models for common features and Twitter datasets. The F1-score was used for comparison since it is a common evaluation metric in both studies.
TABLE 10. Comparison with related studies based on common features and twitter datasets.
As shown in Table 10, our models outperformed the models in the Balakrishnan study for both the baseline and sentiment experiments, with a 7% improvement. Despite the difference in baseline features between the two studies, our models still scored 13% higher than the related study.
Another study by Maity et al. [59] used sentiment and emotion features to improve the performance of their cyberbullying detection models. The dataset consists of 6084 tweets in a mixture of English and Indian languages and was processed using multilingual BERT for word representations. Their baseline model achieved an f1-score of 0.80, while the model that incorporated sentiment and emotion features had an f1-score of 0.82. Our models outperformed their models by 16% and 14% in both experiments, respectively.
VII. CONCLUSION
This research paper proposes cyberbullying detection models (CDMs) that utilize emotion features to enhance the efficiency of detecting cyberbullying. In this work, all critical steps were taken into consideration, from data preparation to deep learning models. The preparation of the datasets is a major challenge that all intelligent systems must overcome, as their success is entirely dependent on the quality of the datasets. Hence, meticulous attention was given to the datasets, from acquisition and pre-processing up to sampling. Datasets that encompass all forms of cyberbullying, such as threatening, harassing, humiliating, intimidating, and manipulating or controlling targeted victims, are sparse. To address this issue, this research utilized two datasets. The first is the toxic dataset collected by the Conversation AI team, and the second is the Twitter dataset. Cyberbullying datasets generally suffer from an imbalance between their labels; therefore, sampling techniques were developed to reduce the imbalance ratio. The limitations of sparsity and imbalance in the cyberbullying dataset were addressed and resolved.
After the preparation of the datasets, extracting textual features was the second step in detecting cyberbullying. The focus was on features that can be extracted from the text itself, such as syntactic, semantic, contextual, emotion, and sentiment features. Features related to the perpetrator, such as gender, age, and social media profiles, are not considered because they are dependent on a specific social media platform and can sometimes be difficult to obtain due to privacy policies. Nevertheless, emotion features were thoroughly investigated through the use of a deep learning model and a lexicon-based approach. To build an emotion detection model, the CBET dataset, which was collected from Twitter and labeled using hashtag keywords, was used. Due to the inaccuracy of the hashtag labeling, a procedure was then carried out to validate the annotation of the emotion dataset labels. The validated dataset was then used to train the emotion detection model (EDM) using BERT as a pre-trained word representation model. This model was used to study and explore the emotions related to cyberbullying texts. The results indicate that most cyberbullying texts are categorized as negative emotions.
Emotions and sentiment were extracted from cyberbullying datasets through the use of the EDM and the NRC lexicon for emotions and the AFINN lexicon for sentiment. These features were fed to deep learning models to train cyberbullying detection models. A set of experiments was carried out with different selections of features to investigate the best set of features for cyberbullying detection. The results show that emotions and sentiment features improve the precision of
cyberbullying detection and outperformed the use of BERT [18] L. Cheng, J. Li, Y. N. Silva, D. L. Hall, and H. Liu, ‘‘XBully: Cyberbullying
contextual features. The use of emotion features added to detection within a multi-modal context,’’ in Proc. 12th ACM Int. Conf. Web
Search Data Mining, Jan. 2019, pp. 339–347.
BERT resulted in a recall score of 0.87 on the Toxic dataset, [19] V. Balakrishnan, S. Khan, T. Fernandez, and H. R. Arabnia, ‘‘Cyberbully-
enhancing the performance of cyberbullying detection by ing detection on Twitter using big five and dark triad features,’’ Personality
0.5 compared to using BERT alone. While the use of sen- Individual Differences, vol. 141, pp. 252–257, Apr. 2019.
[20] V. Balakrishnan, S. Khan, and H. R. Arabnia, ‘‘Improving cyberbullying
timent features scored 0.88 recall, improving the model by detection using Twitter users’ psychological features and machine learn-
0.6 recall compared to using BERT alone which in general is ing,’’ Comput. Secur., vol. 90, Mar. 2020, Art. no. 101710.
greater than the baseline. [21] H. Rosa, N. Pereira, R. Ribeiro, P. C. Ferreira, J. P. Carvalho, S. Oliveira,
L. Coheur, P. Paulino, A. M. Veiga Simão, and I. Trancoso, ‘‘Automatic
Future work can be focused on enhancing emotion datasets cyberbullying detection: A systematic review,’’ Comput. Hum. Behav.,
in terms of size and annotations. Our study employed an auto- vol. 93, pp. 333–345, Apr. 2019.
matic validation procedure to eliminate instances with poten- [22] J. N. Navarro and J. L. Jasinski, ‘‘Going cyber: Using routine activities
theory to predict cyberbullying experiences,’’ Sociol. Spectr., vol. 32, no. 1,
tially inaccurate annotations. This approach was adopted due pp. 81–94, Jan. 2012.
to the high cost of human annotation, particularly for large [23] V. Nahar, S. Al-Maskari, X. Li, and C. Pang, ‘‘Semi-supervised learning for
datasets. cyberbullying detection in social networks,’’ in Proc. Australas. Database
Conf. Cham, Switzerland: Springer, 2014, pp. 160–171.
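As an illustration of the feature combination summarised in this conclusion, the following is a minimal
sketch and not the authors' implementation. It assumes that emotion and sentiment scores are simply
concatenated to a contextual embedding before classification, and it computes recall as
TP / (TP + FN). The function names, the short stand-in embedding and all numeric values are hypothetical.

```python
from typing import List, Sequence


def combine_features(contextual: Sequence[float],
                     emotion: Sequence[float],
                     sentiment: float) -> List[float]:
    """Concatenate contextual, emotion and sentiment features into one vector.

    Assumption: plain concatenation; the paper may combine features differently.
    """
    return list(contextual) + list(emotion) + [sentiment]


def recall(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """Recall = true positives / (true positives + false negatives)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp / (tp + fn) if (tp + fn) else 0.0


if __name__ == "__main__":
    # Hypothetical stand-in for a (much longer) BERT sentence embedding.
    contextual = [0.12, -0.40, 0.33]
    # Hypothetical scores for anger, fear and guilt.
    emotion = [0.7, 0.2, 0.1]
    # Hypothetical sentiment polarity in [-1, 1].
    sentiment = -0.8

    features = combine_features(contextual, emotion, sentiment)
    print(len(features))

    # Toy labels: 1 = cyberbullying, 0 = not cyberbullying.
    y_true = [1, 1, 1, 0, 0]
    y_pred = [1, 0, 1, 0, 1]
    print(f"{recall(y_true, y_pred):.2f}")
```

Recall is the relevant metric here because a missed cyberbullying instance (a false negative) lowers it
directly, whereas false alarms do not.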
[24] M. A. Al-garadi, K. D. Varathan, and S. D. Ravana, ‘‘Cybercrime detection
REFERENCES in online communications: The experimental case of cyberbullying detec-
[1] S. Modha, P. Majumder, T. Mandl, and C. Mandalia, ‘‘Detecting and visu- tion in the Twitter network,’’ Comput. Hum. Behav., vol. 63, pp. 433–443,
alizing hate speech in social media: A cyber watchdog for surveillance,’’ Oct. 2016.
Expert Syst. Appl., vol. 161, Dec. 2020, Art. no. 113725. [25] D. Chatzakou, I. Leontiadis, J. Blackburn, E. D. Cristofaro, G. Stringh-
[2] S. Hinduja and J. W. Patchin, ‘‘Bullying, cyberbullying, and suicide,’’ Arch. ini, A. Vakali, and N. Kourtellis, ‘‘Detecting cyberbullying and cyber-
Suicide Res., vol. 14, no. 3, pp. 206–221, Jul. 2010. aggression in social media,’’ ACM Trans. Web, vol. 13, no. 3, pp. 1–51,
Aug. 2019.
[3] R. M. Kowalski, G.W. Giumetti, A.N. Schroeder, M.R. Lattanner, ‘‘Bully-
[26] H. Hosseinmardi, S. Arredondo Mattson, R. Ibn Rafiq, R. Han, Q. Lv, and
ing in the digital age: A critical review and meta-analysis of cyberbullying
S. Mishra, ‘‘Detection of cyberbullying incidents on the Instagram social
research among youth,’’ Psychol. Bulletin, vol. 140, p. 1073, 2014, doi:
network,’’ 2015, arXiv:1503.03909.
10.1037/a0035618.
[27] J.-M. Xu, X. Zhu, and A. Bellmore, ‘‘Fast learning for sentiment analy-
[4] V. Balakrishnan, ‘‘Cyberbullying among young adults in Malaysia:
sis on bullying,’’ in Proc. 1st Int. Workshop Issues Sentiment Discovery
The roles of gender, age and internet frequency,’’ Comput. Hum. Behav.,
Opinion Mining, Aug. 2012, pp. 1–6.
vol. 46, pp. 149–157, May 2015.
[28] J. A. Patch, ‘‘Detecting bullying on Twitter using emotion lexicons,’’
[5] S. M. B. Bottino, C. M. C. Bottino, C. G. Regina, A. V. L. Correia,
Ph.D. thesis, Dept. Sci., Univ. Georgia, Athens, GA, USA, 2015. [Online].
and W. S. Ribeiro, ‘‘Cyberbullying and adolescent mental health: Sys-
Available: https://getd.libs.uga.edu/pdfs/patch_jerrad_a_201505_ms.pdf
tematic review,’’ Cadernos Saude Publica, vol. 31, no. 3, pp. 463–475,
[29] M. J. Berger, ‘‘Large scale multi-label text classification with semantic
Mar. 2015.
word vectors,’’ Stanford Univ., Stanford, CA, USA, Tech. Rep., 2015.
[6] R. M. Kowalski, S. P. Limber, and P. W. Agatston, Cyberbullying: Bullying
[Online]. Available: https://dspace.bracu.ac.bd/xmlui/handle/10361/6420
in the Digital Age. Hoboken, NJ, USA: Wiley, 2012.
[30] Z. L. Chia, M. Ptaszynski, F. Masui, G. Leliwa, and M. Wroczynski,
[7] X.-W. Chu, C.-Y. Fan, Q.-Q. Liu, and Z.-K. Zhou, ‘‘Cyberbullying ‘‘Machine learning and feature engineering-based study into sarcasm and
victimization and symptoms of depression and anxiety among Chi- irony classification with application to cyberbullying detection,’’ Inf. Pro-
nese adolescents: Examining hopelessness as a mediator and self- cess. Manage., vol. 58, no. 4, Jul. 2021, Art. no. 102600.
compassion as a moderator,’’ Comput. Hum. Behav., vol. 86, pp. 377–386, [31] J. Eronen, M. Ptaszynski, F. Masui, A. Smywiński-Pohl, G. Leliwa, and
Sep. 2018. M. Wroczynski, ‘‘Improving classifier training efficiency for automatic
[8] A. Yadollahi, A. G. Shahraki, and O. R. Zaiane, ‘‘Current state of text cyberbullying detection with feature density,’’ Inf. Process. Manage.,
sentiment analysis from opinion to emotion mining,’’ ACM Comput. Surv., vol. 58, no. 5, Sep. 2021, Art. no. 102616.
vol. 50, no. 2, pp. 1–33, Mar. 2018. [32] Z. Mossie and J.-H. Wang, ‘‘Vulnerable community identification using
[9] Y. Ge, J. Qiu, Z. Liu, W. Gu, and L. Xu, ‘‘Beyond negative and positive: hate speech detection on social media,’’ Inf. Process. Manage., vol. 57,
Exploring the effects of emotions in social media during the stock market no. 3, May 2020, Art. no. 102087.
crash,’’ Inf. Process. Manage., vol. 57, no. 4, Jul. 2020, Art. no. 102218. [33] M. Dadvar, D. Trieschnigg, R. Ordelman, and F. de Jong, ‘‘Improving
[10] C. Yang, X. Chen, L. Liu, and P. Sweetser, ‘‘Leveraging semantic features cyberbullying detection with user context,’’ in Proc. Eur. Conf. Inf. Retr.
for recommendation: Sentence-level emotion analysis,’’ Inf. Process. Man- Cham, Switzerland: Springer, 2013, pp. 693–696.
age., vol. 58, no. 3, May 2021, Art. no. 102543. [34] S. Mahbub, E. Pardede, and A. S. M. Kayes, ‘‘Detection of harassment type
[11] L. Jiang, L. Liu, J. Yao, and L. Shi, ‘‘A hybrid recommendation model of cyberbullying: A dictionary of approach words and its impact,’’ Secur.
in social media based on deep emotion analysis and multi-source view Commun. Netw., vol. 2021, pp. 1–12, Jun. 2021.
fusion,’’ J. Cloud Comput., vol. 9, no. 1, pp. 1–16, Dec. 2020. [35] E. Hutson, ‘‘Cyberbullying in adolescence,’’ Adv. Nursing Sci., vol. 39,
[12] P. Parameswaran, A. Trotman, V. Liesaputra, and D. Eyers, ‘‘Detecting the no. 1, pp. 60–70, 2016.
target of sarcasm is hard: Really?’’ Inf. Process. Manage., vol. 58, no. 4, [36] M. Z. Naf’an, A. A. Bimantara, A. Larasati, E. M. Risondang, and
Jul. 2021, Art. no. 102599. N. A. S. Nugraha, ‘‘Sentiment analysis of cyberbullying on Instagram user
[13] M. Al-Hashedi, L.-K. Soon, and H.-N. Goh, ‘‘Cyberbullying detection comments,’’ J. Data Sci. Appl., vol. 2, pp. 38–48, Jan. 2019.
using deep learning and word embeddings: An empirical study,’’ in Proc. [37] H. Dani, J. Li, and H. Liu, ‘‘Sentiment informed cyberbullying detection
2nd Int. Conf. Comput. Intell. Intell. Syst., Nov. 2019, pp. 17–21. in social media,’’ in Proc. Joint Eur. Conf. Mach. Learn. Knowl. Discovery
[14] J. Chun, J. Lee, J. Kim, and S. Lee, ‘‘An international systematic review of Databases. Cham, Switzerland: Springer, 2017, pp. 52–67.
cyberbullying measurements,’’ Comput. Hum. Behav., vol. 113, Dec. 2020, [38] M. Sintaha, S. B. Satter, N. Zawad, C. Swarnaker, and A. Hassan, ‘‘Cyber-
Art. no. 106485. bullying detection using sentiment analysis in social media,’’ Ph.D. the-
[15] D. A. Winkler, ‘‘Role of artificial intelligence and machine learning in sis, Dept. Comput. Sci. Eng., BRAC Univ., Dhaka, Bangladesh, 2016.
nanosafety,’’ Small, vol. 16, no. 36, Sep. 2020, Art. no. 2001883. [Online]. Available: https://dspace.bracu.ac.bd/xmlui/handle/10361/6420
[16] A. L’Heureux, K. Grolinger, H. F. Elyamany, and M. A. M. Capretz, [39] V. Nahar, S. Unankard, X. Li, and C. Pang, ‘‘Sentiment analysis for
‘‘Machine learning with big data: Challenges and approaches,’’ IEEE effective detection of cyber bullying,’’ in Proc. Asia–Pacific Web Conf.
Access, vol. 5, pp. 7776–7797, 2017. Cham, Switzerland: Springer, 2012, pp. 767–774.
[17] S. Murnion, W. J. Buchanan, A. Smales, and G. Russell, ‘‘Machine learning [40] J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, ‘‘BERT: Pre-training
and semantic analysis of in-game chat for cyberbullying,’’ Comput. Secur., of deep bidirectional transformers for language understanding,’’ 2018,
vol. 76, pp. 197–213, Jul. 2018. arXiv:1810.04805.
LAY-KI SOON received the Ph.D. degree (engineering) in Web engineering from Soongsil University,
South Korea, in 2009. She was a Senior Lecturer with the Faculty of Computing and Informatics (FCI),
Multimedia University, where she was also the Deputy Dean (research and innovation) from 2016 to 2018.
She was a Research Fellow with the Telekom Malaysia Research and Development on Database Optimization
Project. To date, she has graduated six Ph.D. and two master’s students. Besides teaching at university,
she has also conducted courses on relational databases and NoSQL databases for corporate employees. Her
research interest includes data mining, with an emphasis on text mining.

HUI-NGO GOH received the bachelor’s degree (Hons.) in computer science and the M.Sc. degree from
University Putra Malaysia, in 1999 and 2001, respectively, and the Ph.D. degree in information
technology from Multimedia University, in 2014. She has served as a Deputy Dean (academic) and the
Dean with the Faculty of Computing and Informatics, Multimedia University, from 2016 to 2021, where
she is currently a Senior Lecturer. As a Researcher, she is supervising and co-supervising
post-graduate students and a project leader and/or project member of government grants. As a trainer,
she has conducted training on social media analytics, big data essentials, text processing using
Python, and advanced data engineering. She is a certified HRDF trainer. Her research interest
includes text processing.

AMY HUI LAN LIM received the B.I.T. degree (Hons) in information systems engineering, the M.Sc. degree
in information technology, and the Ph.D. degree in information technology from Multimedia University.
She is currently a Senior Lecturer with the Faculty of Computing and Informatics, Multimedia
University, Cyberjaya. Her research interests include artificial intelligence, information systems,
data mining, and business process management.