A Supervised Framework for Sentiment Analysis:

A Two-Stage Approach

Prashantkumar M. Gavali 1,2* & Suresh K. Shirgave 1

1 DKTE Society’s Textile and Engineering Institute, Ichalkaranji, Maharashtra, India
2 Department of Technology, Shivaji University, Kolhapur, Maharashtra, India

Abstract
Sentiment analysis (SA) is a vital part of natural language
processing (NLP). It involves analyzing textual data to determine
expressed sentiment, be it positive or negative. Transformer-
based models have gained popularity for sentiment prediction
from the text. However, achieving promising results with these
models requires extensive training data and processing power.
Additionally, when using pre-trained transformer models for
sentiment analysis, they do not generate sentiment-specific
text embeddings as they are trained on large general-purpose
corpora. To address these challenges, this paper presents a two-
step framework. Firstly, the framework aims to learn high-level
sentiment-oriented embeddings of text. It generates embeddings
with Siamese Network and compares them based on their
sentiment class using the triplet loss function. This approach
enables the framework to capture distinctions between positive
and negative sentences, ensuring the generation of sentiment-
specific embeddings. Secondly, it incorporates a classification layer
on top of the embedding layer to enhance sentiment classification.
Experimental results showcase the effectiveness of our proposed
framework, surpassing baseline sentiment analysis results on
various benchmark datasets.

Received 14 July 2023; Revised 30 August 2023; Accepted 30 August 2023

* Correspondence: gavalipm87@gmail.com
Journal of Cognitive Science 24(3): 283-312 September 2023
©2023 Institute for Cognitive Science, Seoul National University

Key words: natural language, neural network, sentiment analysis, text embedding

1. Introduction

Sentiment analysis (SA) is a natural language processing (NLP) task
that involves the extraction and identification of sentiments or opinions
expressed in a piece of text. This field has evolved from simple techniques,
such as using a sentiment dictionary, to more sophisticated approaches,
including the use of deep learning and transformer models.
Previously, the identification of sentiment was accomplished by
employing a sentiment dictionary (Baccianella et. al, 2010), which is a
collection of words and phrases associated with positive, negative, or neutral
sentiments. This approach involves examining the presence or absence of
these words and phrases in a given text (Lapesa & Evert, 2017; Tan et. al,
2015; Chikersal et. al., 2015) to determine the overall sentiment of the text.
It is a quick and easy method for conducting sentiment analysis, but its
accuracy is limited by the fact that the meaning of words and phrases varies
depending on the context in which they are used. Furthermore, creating a
comprehensive sentiment dictionary that encompasses all possible words
and phrases for identifying emotion is challenging.
The machine learning paradigm makes it possible to identify sentiment by
modeling it as a classification task. This approach utilizes various machine
learning algorithms, such as support vector machines (Thakur & Deshpande,
2019; Borg & Boldt, 2020, Pang et. al., 2002), decision trees (Zuo, 2018),
and neural networks (Duncan & Zhang, 2015; Poongothai & Sangeetha,
2020). However, despite their promising results in sentiment analysis, these
models require manual feature engineering to identify crucial text features.
Selecting the most relevant features for sentiment analysis from a given
text demands a high level of expertise and experience. The contemporary
deep learning approach addresses the challenge of manual feature
engineering. It enables prediction models to understand advanced features
autonomously. Additionally, factors like the availability of large amounts
of data and processing power have significantly increased the popularity
of this technique. Convolutional Neural Networks (CNN) (Kim, 2014;
Conneau et. al., 2017; Dai et. al., 2022), Recurrent Neural Networks (RNN)
(Tang et. al., 2015), Long-Short Term Memory (LSTM) (Chen et. al., 2016;
Sivakumar & Uyyala, 2021), and Bi-directional LSTM (Bi-LSTM) (Arbane
et. al., 2023) are some well-known deep learning variants. These models
are frequently used for sentiment analysis tasks. The CNN model employs
a kernel that convolutes the text, allowing it to detect local relationships
between words within a given window size while ignoring relationships
beyond the kernel. The RNN model, on the other hand, effectively handles
sequential data, allowing for the identification of relationships between
words that go beyond the kernel size used by CNN. However, RNN models
still have limitations when it comes to long-term dependencies found
in long sentences. The LSTM model effectively addresses this issue by
incorporating various gates, such as the forget gate, to exclude irrelevant
words. Nonetheless, LSTM models only consider words in one direction,
specifically from left to right. To address this limitation, the bi-LSTM was
developed, which allows words to be processed from left to right as well
as right to left. Nonetheless, these sequential models face difficulties in
effectively capturing long-range dependencies, owing to the limited context
they can store within their fixed-length hidden state.
Transformers (Vaswani et. al., 2017) are a type of neural network
architecture that uses self-attention mechanisms to weigh the importance
of different parts of the input when making predictions. This attention
mechanism allows them to effectively capture long-range dependencies.
Bidirectional Encoder Representations from Transformers (BERT) is a
transformer-based model built on the encoder mechanism. The encoder
provides the representation of text by considering the context of the word
and can be used on various natural language processing tasks (Devlin et.
al., 2019). However, encoder representation is not inherently sentiment-
oriented. This representation must be sentiment-oriented to get better results
in sentiment analysis. For example, the pre-trained BERT model places the
opposite-sentiment statements “I am happy by using this product” and “I
am sad by using this product” close to each other, with a high cosine
similarity score of 0.9969. This lack of sentiment information makes it
difficult for the model to perform well on the sentiment analysis task.
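
For concreteness, the following is a minimal sketch (not the authors' code) of such a similarity check, assuming the Hugging Face transformers library, the bert-base-uncased checkpoint, and mean pooling over token vectors; the exact score depends on these choices, which are assumptions.

```python
# Minimal sketch: cosine similarity between BERT embeddings of two
# opposite-sentiment sentences. Assumptions: Hugging Face transformers,
# bert-base-uncased, mean pooling over non-padding tokens.
import tensorflow as tf
from transformers import AutoTokenizer, TFAutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
bert = TFAutoModel.from_pretrained("bert-base-uncased")

def embed(sentences):
    enc = tokenizer(sentences, padding=True, return_tensors="tf")
    hidden = bert(**enc).last_hidden_state                   # (batch, seq, 768)
    mask = tf.cast(enc["attention_mask"], tf.float32)[..., None]
    return tf.reduce_sum(hidden * mask, axis=1) / tf.reduce_sum(mask, axis=1)

a, b = embed(["I am happy by using this product",
              "I am sad by using this product"])
cos_sim = tf.reduce_sum(tf.nn.l2_normalize(a, 0) * tf.nn.l2_normalize(b, 0))
print(float(cos_sim))  # typically very high despite the opposite sentiment
```
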
So, it is hypothesized that developing a framework that generates
sentiment-oriented embeddings of input text and uses them
for sentiment analysis tasks improves sentiment results. This research work
presents a new two-fold deep learning framework that develops a sentiment-
oriented representation of text first and then identifies the sentiment of
the text. The triplet loss function helps the model to distinguish between
positive and negative text. The proposed framework uses far fewer
parameters than BERT. Further, after training, it adds
one more layer for sentiment identification from the sentiment-oriented text
embedding.

The present paper aims to highlight its main contributions as follows:


1) Our paper proposes a novel two-step deep learning framework that
can leverage sentiment information of text with training samples to
generate sentiment-oriented embedding using the triplet loss function.
It brings text embeddings of positive sentences closer to one another
and farther from those of negative sentences, and vice-versa. Then it uses
the labeled sentences for sentiment identification. We have also shown that
supervised training for generating sentiment-oriented embedding and
sentiment identification is feasible.

2) We devise a framework more lightweight than the BERT model for
generating sentiment embedding and another layer for sentiment
identification of text. Its effectiveness and efficiency are checked on the
sentiment identification task.

The remainder of the paper is structured as follows: Section 2 outlines
various text representation and sentiment analysis methods. Section 3
describes the proposed two-step supervised framework. Section 4 discusses
important aspects of architecture. Section 5 discusses the experimental
results, whereas Section 6 highlights the conclusion and future scope.

2. Background

The field of sentiment analysis has developed around two key activities.
The first involves the development of sentiment-oriented representations
of text, while the second involves identifying sentiment by considering
important features of natural language. Both of these activities are described
in the following subsection.

2.1 Text Representation


Natural language text is a form of unstructured data that requires
conversion into a structured format for effective utilization in sentiment
analysis models. There are several techniques used to represent words as
vectors, including shallow CBOW and skip-gram models (Mikolov et. al.,
2013) that consider word context for generating word embeddings. However,
these embeddings are context-dependent and lack a comprehensive global
representation. To capture both syntactic and semantic relationships, global
co-occurrence statistics (Pennington et. al., 2014) of words in a corpus are
employed. Additionally, fastText (Joulin et. al., 2017) represents words
using characters, but it doesn't inherently capture the sentiment. Therefore,
researchers incorporate emoticons (Tang et. al., 2016) as sentiment signals
when constructing text representations with neural networks. Furthermore,
researchers have made advancements in incorporating sentiment intensity
into word embeddings. Deep learning models (Yu et. al., 2018; Dragoni &
Petrucci, 2017; Wang, 2021; Kabakus, 2022; Shi et. al., 2021) have been
utilized to generate sentence embeddings, but they face limitations in
handling long sentences. Transformer-based models address this challenge
by effectively encoding long-term sentence structures and achieving
superior results through multi-head attention mechanisms. GPT (Radford et.
al., 2018), as a transformer-based model, provides text encoding but focuses
solely on a left-to-right direction. To overcome this limitation, BERT (Devlin
et. al., 2019) considers text from both directions, although it only masks a
single word. Researchers aiming for a deeper understanding of language
have employed techniques such as masking multiple words and shuffling
word sequences. Numerous contemporary transformer-based models
have emerged as effective approaches for representing text by considering
contextual information. However, these models often overlook the aspect
of sentiment-oriented representation. Several attempts have been made to
imbue sentiment-oriented characteristics into these embeddings through
transfer learning (Boy et. al., 2021; Toledo and Marcacini, 2022; Zhang
et. al., 2019; Zhao et. al., 2017). However, it is worth noting that transfer
learning approaches may encounter challenges in terms of generalization.

2.2 Sentiment Analysis Techniques


Deep learning models, including recurrent neural networks (RNNs)
and convolutional neural networks (CNNs), have achieved remarkable
success in sentiment analysis tasks. RNNs, specifically LSTM and GRU
variants, are effective in capturing long-range dependencies and sequential
information in text, enabling them to comprehend sentiment in a more
nuanced manner. CNNs, on the other hand, excel at extracting local features
and patterns from text, which can be valuable in sentiment analysis, where
certain word combinations or phrases may carry specific sentiments.
However, the true breakthrough in sentiment analysis came with the
advent of transformer-based models. Models such as BERT (Bidirectional
Encoder Representations from Transformers) and GPT (Generative Pre-
trained Transformer) have revolutionized sentiment analysis performance.
Transformers are designed to process the entire input sequence
simultaneously, allowing them to efficiently capture global dependencies
and contextual information. By leveraging extensive pre-trained language
representations and fine-tuning techniques, these models can generate
sentiment predictions and adapt to diverse domains and languages.

Summary of discussion:
In summary, deep learning models require massive data and processing
power to work from scratch for downstream tasks like sentiment analysis.
If pre-trained models (Ke et. al. 2020; Zhou et. al., 2020) are used for
sentiment analysis in their original form, they do not provide better sentiment
results, as they do not consider the sentiment orientation of the text.
Transfer learning techniques are widely used to adapt pre-trained models for
sentiment analysis. However, they suffer from the problem of generalization.
Also, to the best of our knowledge, updating the weights again requires huge
amounts of data and time, while an end-to-end model does not assure that it
is developing sentiment-oriented embeddings for sentiment analysis. So,
we believe that ensuring the text embedding generated by the model on
relatively small datasets is sentiment-oriented, and training the model on
this sentiment-oriented embedding with a single layer can improve the
accuracy of sentiment identification.

3. Model architecture

Figure 1 indicates the overall design of the suggested framework. It
comprises a Siamese network (Koch et. al., 2015; Gleize et. al., 2019; Müller
et. al., 2022), an encoder, and a sentiment classifier along with a sentiment-
aware layer. The Siamese network helps to handle three text inputs at a time
to understand similar and opposite sentiment words. The encoder module
generates the context-aware embedding of the text by using the attention
mechanism while the sentiment-aware layer is responsible for generating
sentiment-oriented embedding from the context-aware encoding with the
help of the triplet loss function. The proposed framework takes textual data
as input and produces sentiment-based embeddings using a sentiment-aware
layer in the first phase. In the second phase, a simple feedforward neural
network is employed for sentiment identification. All the components are
explained in detail in the following subsections.

Figure 1. Framework Architecture

3.1 Siamese Network


A Siamese Network, also known as a Twin Network, is an artificial
neural network that comprises three identical sub-networks. All these sub-
networks have the same configuration and weights. Typically, only one sub-
network is trained and others share the configuration and learned weights.
Such networks are employed to determine the similarity between text
embeddings generated by the proposed model.
It takes three inputs, two from the positive category and one from the
negative category. Each subnetwork, consisting of an encoder and a sentiment-
aware layer, takes one input at a time and generates a text embedding. All
three outputs, one from each subnetwork, are combined and forwarded to
the triplet loss function to learn the similarity between text embeddings.
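
As an illustration, the sketch below shows one way such a shared-weight, three-input arrangement can be wired in Keras; the simple embedding-and-pooling sub-network is only a stand-in for the BERT-like encoder and sentiment-aware layer of Sections 3.2 and 3.3, and all names and sizes are assumptions.

```python
# Minimal sketch of a three-input Siamese arrangement with shared weights.
# The stand-in sub-network replaces the BERT-like encoder used in the paper.
import tensorflow as tf

def build_subnetwork(vocab_size=30000, max_len=64, dim=30):
    inp = tf.keras.Input(shape=(max_len,), dtype="int32")
    x = tf.keras.layers.Embedding(vocab_size, 128)(inp)
    x = tf.keras.layers.GlobalAveragePooling1D()(x)
    out = tf.keras.layers.Dense(dim, activation="tanh")(x)   # sentiment-aware layer
    return tf.keras.Model(inp, out, name="shared_subnetwork")

shared = build_subnetwork()
positive_1 = tf.keras.Input(shape=(64,), dtype="int32", name="positive_1")
positive_2 = tf.keras.Input(shape=(64,), dtype="int32", name="positive_2")
negative   = tf.keras.Input(shape=(64,), dtype="int32", name="negative")

# The same weights produce all three embeddings, which are then compared
# by the triplet loss (Section 3.4).
embeddings = [shared(positive_1), shared(positive_2), shared(negative)]
siamese = tf.keras.Model([positive_1, positive_2, negative], embeddings)
```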

3.2 Encoder
The encoder component of the framework is responsible for generating
the encoding of input text. It is a BERT-like encoder with 6 encoding
units, each having 12 attention heads. The primary function of the Encoder
component is to preprocess the original input text received from the
Siamese network and convert it into a structured format. It then learns
the context-aware embedding of the text, even when dealing with long
sentences. This enables the framework to capture contextual information
effectively and generate meaningful representations of the input text.
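
One plausible way to instantiate such an encoder, assuming the Hugging Face transformers library and otherwise default BERT dimensions (the paper does not specify them), is sketched below.

```python
# Minimal sketch: a randomly initialized BERT-like encoder with 6 encoding
# units and 12 attention heads each. Hidden sizes are assumed defaults.
from transformers import BertConfig, TFBertModel

config = BertConfig(
    num_hidden_layers=6,      # 6 encoding units
    num_attention_heads=12,   # 12 heads per unit
    hidden_size=768,          # must be divisible by the number of heads
    intermediate_size=3072,
)
encoder = TFBertModel(config)  # trained within the framework, not pre-trained
```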

3.3 Sentiment Aware Layer


A fully connected layer follows the context-aware embedding generated
by the attention-based encoder model. It is trainable
and its purpose is to generate sentiment-oriented text embeddings for the
input text. By leveraging the information captured in the encoder's output,
this trainable layer generates embeddings that specifically represent the
sentiment aspects of the text. It includes 30 neurons with a ‘tanh’ activation
function to represent text embedding of size 30 dimensions. Every node has
an output between -1 and 1. These text embeddings are fed into the triplet
loss function, which directs the model to keep opposite sentiment texts
apart.
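
A minimal sketch of this layer is given below; mean pooling of the encoder output before the fully connected layer is an assumption, since the paper does not state the pooling strategy.

```python
# Minimal sketch of the sentiment-aware layer: 30 'tanh' units applied to a
# pooled, context-aware representation from the encoder (pooling is assumed).
import tensorflow as tf

sentiment_aware = tf.keras.layers.Dense(30, activation="tanh",
                                        name="sentiment_aware_layer")

def sentiment_embedding(encoder_output):
    # encoder_output: (batch, seq_len, hidden) from the BERT-like encoder
    pooled = tf.reduce_mean(encoder_output, axis=1)   # (batch, hidden)
    return sentiment_aware(pooled)                    # (batch, 30), values in [-1, 1]
```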

3.4. Triplet Loss


The triplet loss function is used to train a neural network that generates
embeddings for each data sample. The embeddings are similar for examples
from the same class (referred to as "positive" examples) and dissimilar
for examples from different classes (referred to as "negative" examples).
The loss function takes three embeddings of text generated by the neural
network from the two positive and one negative classes respectively.
Let text1, text2, and text3 be the text inputs from the positive, positive, and
negative classes respectively; then o(text1), o(text2), and o(text3) represent the
generated embeddings for text1, text2, and text3 respectively. The loss function
calculates the similarity and dissimilarity using equation 1 as

υ(o(text1), o(text2)) ≤ υ(o(text1), o(text3)) (1)

The function υ calculates the cosine distance between the embedding


generated for text1 and text2. This distance must be less than or equal to the
distance between the dissimilar entities text1 and text3. Moving the right-hand
side of equation (1) to the left-hand side and adding a margin µ, the constraint
becomes

υ(o(text1), o(text2)) - υ(o(text1), o(text3)) + µ ≤ 0 (2)

The value µ sets the minimum margin between the cosine distances computed
for positive and negative samples of text. This loss function ensures that the
model’s sentiment-aware layer generates sentiment-oriented text embeddings.
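
A minimal sketch of a cosine-distance triplet loss consistent with equations (1) and (2) follows; the concrete margin value and the mean reduction over the batch are assumptions.

```python
# Minimal sketch: triplet loss over cosine distances between the embeddings
# of two positive texts and one negative text. The margin value is assumed.
import tensorflow as tf

def cosine_distance(a, b):
    a = tf.nn.l2_normalize(a, axis=-1)
    b = tf.nn.l2_normalize(b, axis=-1)
    return 1.0 - tf.reduce_sum(a * b, axis=-1)

def triplet_loss(emb_pos1, emb_pos2, emb_neg, margin=0.5):
    # Distance between the two positive texts should be smaller, by at least
    # the margin µ, than the distance to the negative text.
    d_pos = cosine_distance(emb_pos1, emb_pos2)
    d_neg = cosine_distance(emb_pos1, emb_neg)
    return tf.reduce_mean(tf.maximum(d_pos - d_neg + margin, 0.0))
```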

3.5 Sentiment Classifier


The sentiment classifier is responsible for identifying the sentiment of
a given piece of text. It is attached to the sentiment-aware layer in a feed-
forward fashion. After obtaining the sentiment-oriented text embedding, it
uses only one layer to identify the sentiment of the text.
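
A minimal sketch of this second stage is shown below; the sigmoid output and binary cross-entropy loss are assumptions for the two-class positive/negative setting.

```python
# Minimal sketch: a single dense layer on top of the 30-dimensional
# sentiment-oriented embedding for binary sentiment prediction.
import tensorflow as tf

inputs = tf.keras.Input(shape=(30,))                      # sentiment-oriented embedding
outputs = tf.keras.layers.Dense(1, activation="sigmoid")(inputs)
classifier = tf.keras.Model(inputs, outputs)
classifier.compile(optimizer="adam", loss="binary_crossentropy",
                   metrics=["accuracy"])
```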

4. Discussion

4.1 Selection of Triplet Loss Function


Various loss functions like cross-entropy loss, contrastive loss, and
margin ranking loss are widely used in sentiment analysis. Cross entropy
loss is useful for sentiment classification while the remaining two are used
for developing the embedding of input text.
The cross-entropy loss is the standard choice for classification tasks
however, when it comes to building embeddings of input text, cross-
entropy loss might not be as useful. Cross-entropy loss focuses on the
categorical distribution of classes and doesn't directly aim at learning
meaningful semantic representations of the input. So, it is not very useful
for embedding development.
While contrastive loss aims to create embeddings that capture
relationships between data points, it does so by encouraging similar
examples to be close together without explicitly emphasizing a desired
separation from dissimilar examples. Triplet loss adds an extra layer
of control by focusing on the relative distances between positive and
negative examples in addition to promoting proximity between similar
pairs using µ from equation 2. This nuanced emphasis on both relationships
and separations makes triplet loss more effective in scenarios where both
capturing similarities and differences are crucial, such as sentiment analysis.
Triplet loss holds an advantage over margin loss by not only establishing
a desired margin between different classes but also by directly optimizing
the distances between individual examples. While margin loss sets a fixed
threshold for class separation, triplet loss refines this concept by using
triplets of two positive and one negative example, encouraging embeddings to
not only maintain a certain distance between classes but also to emphasize
the relative positioning of each embedding. This makes triplet loss more
adaptable and effective for tasks like embedding learning and similarity
measurement.
The choice of the triplet loss function for our model was motivated by
its efficacy in addressing the challenges inherent in the sentiment analysis
task. Triplet loss is particularly well-suited for tasks involving embedding
generation, where the objective is to pull similar instances closer together
in the embedding space while pushing dissimilar instances apart. In the
proposed model, we likewise want the embedding of a positive sentence to be
close to other positive sentence embeddings and, at the same
time, far apart from negative sentiment embeddings. This
embedding generation setup helps the model to distinguish between positive
and negative sentiment more easily than the mixed positive and negative
sentiment embeddings.

4.2 Need for Siamese Network in Proposed Framework


A Siamese network is a neural network architecture designed to learn and
compare similarity between pairs of data points. It consists of two or more
identical subnetworks that share weights and parameters, taking in separate
input samples. The outputs from these subnetworks are then compared
to measure the similarity between the input pairs. Siamese networks find
valuable applications across various natural language processing (NLP)
tasks due to their ability to handle similarity and relationship learning. It
is widely used in tasks such as Text Matching (Pang et. al., 2016), Named
Entity Recognition (Oniani et. al., 2022), Question Answering (Shonibare,
2021), Recommendation system (Serrano & Bellogin, 2023; Angelovska,
2021) and image retrieval (Wiggers et. al., 2019).

The Siamese Network is extensively used in sentiment analysis (Zhang et. al.,
2022) to handle the problem of performance degradation with a limited amount
of training data. The authors injected external knowledge about the limited training
data into deep neural networks for sentiment identification. This external
information was identified by checking the similarity with the Siamese
Network. Further, the Siamese Network is used to develop sentiment-aware
text representation in resource-poor language (Choudhary et. al., 2018).
The authors trained both resource-rich and resource-poor languages
together on similar text using a similarity index. This provided
better sentiment results for the resource-poor language. Further, the Siamese
Network is also used in topic modeling (Huang, 2018) which is useful in
aspect-based sentiment analysis.
The primary goal of the proposed framework is to develop sentiment-
oriented embedding for text. In the proposed model, the Siamese network
plays a crucial role in enhancing text embeddings based on the underlying
sentiment. To ensure that positive sentiment texts are proximate while
being distinctly separated from negative ones, the triplet loss function is
employed. This function operates on the embeddings produced for two
positive texts and one negative text. A standard neural network generates
embeddings and updates weights sequentially. But the requirement here
is simultaneous embedding generation for similarity check. This can be
achieved through shared weights. The Siamese network achieves this by
leveraging identical subnetworks to create consistent embeddings and
shared weights. This enables the desired sentiment-driven proximity and
distinction within the embedding space. Therefore, the proposed model uses
a Siamese Network for embedding generation.

5. Experimental Results

5.1 Experimental Setting


Purpose of Experiments
The pre-trained BERT model generates the text embedding dynamically
by considering local words. However, these embeddings are not sentiment-
oriented. So we have proposed a framework in two steps. These steps
will result in the development of sentiment-aware text representation and
sentiment identification. Extensive experiments were carried out to assess
the performance of the suggested framework. Experiments aim to address
the following questions:
1. Does the modification of embedding improve the result of sentiment
analysis?
2. Does the proposed framework develop sentiment-oriented embedding?
3. Are the modified embeddings separated sufficiently from the
dissimilar classes?

To answer these questions, we used two benchmarked datasets and
sentiment analysis models. Tensorflow 2.0 is used to implement these
models. Training is carried out on a system equipped with an Nvidia
GeForce GTX Titan X graphics card. Before beginning actual training,
a few parameters such as epochs, batch size, and learning rate must be
established. These are known as hyper-parameters. We've set the epochs to
25, the batch size to 32, and the learning rate to 0.001.
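
The snippet below is a minimal sketch of this configuration; the choice of the Adam optimizer and the illustrative placeholder dataset are assumptions.

```python
# Minimal sketch of the reported training configuration:
# 25 epochs, batch size 32, learning rate 0.001 (optimizer choice assumed).
import tensorflow as tf

EPOCHS, BATCH_SIZE, LEARNING_RATE = 25, 32, 1e-3
optimizer = tf.keras.optimizers.Adam(learning_rate=LEARNING_RATE)

# Illustrative placeholder data, batched with these hyper-parameters.
placeholder = tf.data.Dataset.from_tensor_slices(
    (tf.zeros((100, 64), tf.int32), tf.zeros((100,), tf.int32)))
train_ds = placeholder.shuffle(100).batch(BATCH_SIZE).repeat(EPOCHS)
```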

Datasets
The proposed framework uses datasets for two purposes: (1) for
generating sentiment-oriented embedding, and (2) for training the model for the
sentiment identification task. For sentiment embedding generation, we
have used a sentiment-oriented sentence-labeled dataset (Dimitrios, 2015).
Following are some of the reasons for selecting this dataset for generating
sentiment-aware embedding.
• Relevance: Since the framework is used for sentiment analysis, the
specified dataset is relevant as it contains text data with annotations for
the sentiment.
• Quality: The dataset is of high quality as it is without irrelevant and
noisy information. The reviews present in this dataset are clear, concise,
and without spelling and grammar errors.
• Balance: The dataset includes 1500 positive and 1500 negative reviews.
• Diversity: All the reviews present in the dataset are selected from the
famous IMDB, Amazon, and Yelp websites. Thus it includes a wide
range of text styles, genres, and topics. This helps to ensure that the
model can generalize well to new and unseen text data.
• Annotation Quality: The dataset contains high-quality annotations or
labels.

For the second task, sentiment identification, the following datasets were
used.
1. SST-2 (Socher et. al., 2013): The Stanford Sentiment Treebank dataset
is a benchmark dataset commonly used for sentiment analysis tasks. It
consists of movie reviews from the Rotten Tomatoes website, labeled
with either a positive or negative sentiment.
2. IMDB (Maas et. al., 2011): IMDB is a well-known and extensively
used movie review website. This website's user reviews are helpful for
sentiment analysis. The IMDB dataset includes 50,000 movie reviews
taken from the IMDB website. These reviews are categorized as
positive or negative.
Table 1 summarizes the experimental datasets used in this study. The table
includes columns for the number of training, development, testing, and
total examples, as well as a column indicating the available classes in each
dataset. The final column, "balanced," indicates whether or not each training
class includes an equal number of examples.

Table 1. Statistics of Datasets

Dataset              # Train   # Dev   # Test   # Total   Classes              Balanced
Sentiment Sentences  2900      50      50       3000      Positive, Negative   Yes
SST-2                8544      1101    2210     11855     Positive, Negative   Yes
IMDB                 48000     1000    1000     50000     Positive, Negative   Yes

Preprocessing
The preprocessing task mainly consists of tokenization, sentence
segmentation, adding special tokens, padding and truncation, and input
encoding. These preprocessing steps help to convert the raw text data into a
format that is suitable for the framework to process and make predictions.
• Tokenization: It breaks down words into smaller sub-words and
characters to handle out-of-vocabulary words. This helps to reduce the
size of the vocabulary, which is essential in large-scale NLP models.
• Sentence segmentation: It splits the input into individual sentences.
• Padding and truncation: The input sequences must be of the same
length, so shorter sequences must be padded with zeros, and longer
sequences must be truncated to a maximum length.
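
The sketch below illustrates these steps with a WordPiece tokenizer from the Hugging Face transformers library; the tokenizer checkpoint and the maximum length of 64 are assumptions.

```python
# Minimal sketch of the preprocessing pipeline: sub-word tokenization,
# special tokens, padding with zeros, and truncation to a maximum length.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoded = tokenizer(
    ["Very good quality. Excellent product.", "Bad product."],
    padding="max_length",   # pad shorter sequences with zeros
    truncation=True,        # truncate longer sequences
    max_length=64,          # assumed maximum length
    return_tensors="tf",
)
# encoded["input_ids"] and encoded["attention_mask"] are fed to the encoder.
```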

Classifiers
The following classification models are considered to evaluate the impact
of the proposed model on sentiment analysis.


1. TF-IDF (Pang & Lee, 2005): The logistic regression model is trained
using TF-IDF features, with L2 regularization to classify the sentiment
of the text.
2. Naïve Bayes (Pang & Lee, 2005): This probabilistic classification
algorithm modeled the joint probability of the input features and the
output labels using Bayes' theorem to identify the sentiment of the text.
3. SVM (Borg & Boldt, 2020): The sentiment classification of textual
data was performed using the SVM model, which utilized text
representations generated by averaging the GloVe word embeddings.
4. CNN (Kim, 2014): CNN models have been trained using a combination
of convolutional, pooling, and fully-connected layers, with dropout
regularization to prevent overfitting and pre-trained Word2Vec word
embeddings to represent the input text.
5. LSTM (Sivakumar & Uyyala, 2021): The LSTM network consisting of
input, forget, and output gates is used to identify the sentiment of the
text.
6. BERT (Devlin et. al., 2019): The pre-trained BERT model is used to identify
the sentiment of the text. After the encoder, one dense layer was added
to determine the sentiment of the text.
7. MC-AttCNN-AttRNN (Cheng et. al., 2020): Multi-channel model with
CNN and RNN for extracting local and long-term features using an
attention-based mechanism.
8. Routing Vector (Abaskohi et. al., 2023): The contrastive paraphrase-
based prompt learning model is also used for sentiment analysis on
various datasets.

Evaluation Metrics
Various metrics like accuracy, precision, recall, and f1 score, are available
to assess the sentiment models. Accuracy directly measures the percentage
of correctly classified instances. In addition to the accuracy metric, the F1
score is more promising as it considers precision and recall at the same time.
The formula to calculate the F1 score (Sasaki, 2007) is:

F1 score = 2*(P * R) / (P+R) (3)

Where P and R are precision and recall respectively.


For evaluation purposes, we have considered both accuracy and f1 score.
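
As a small worked example of equation (3), the function below computes the F1 score from illustrative confusion-matrix counts; the counts themselves are not from the paper.

```python
# Worked example of equation (3): F1 = 2*(P*R)/(P+R).
def f1_score(tp, fp, fn):
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

print(f1_score(tp=90, fp=10, fn=20))  # P = 0.90, R ≈ 0.818 -> F1 ≈ 0.857
```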

5.2 Results
Comparative Sentiment Results
Experiments were carried out to compare the performance of the
proposed framework to that of various benchmark models. Table 2 shows
sentiment results (Accuracy and F1 score) for the TF-IDF logistic regression,
Naïve Bayes, SVM, LSTM, CNN, and BERT models on the SST-2 and
IMDB datasets. From Table 2, it is observed that the proposed model
outperforms the listed models on both datasets. This is because the proposed
two-stage supervised framework develops sentiment-oriented embeddings
and uses the same for sentiment analysis.

Table 2. Sentiment Analysis Result on Different Datasets with Different
Classification Networks

                      Sentiment Sentence      SST-2                 IMDB
Models                Accuracy  F1 Score      Accuracy  F1 Score    Accuracy  F1 Score
TF-IDF                87.54     86.75         86.57     78.47       87.65     87.86
Naïve Bayes           84.75     78.80         82.86     72.56       86.46     80.42
SVM                   87.42     86.45         85.42     82.57       86.75     86.42
CNN                   84.26     85.42         82.45     84.52       87.42     87.62
LSTM                  86.45     86.74         84.30     83.72       86.75     86.20
BERT                  88.65     87.65         87.45     87.46       89.56     90.71
MC-AttCNN-AttRNN      89.75     88.89         88.75     88.96       90.58     90.42
RV                    89.70     89.64         88.70     88.84       90.87     90.13
Proposed Framework    91.89     90.87         89.80     90.56       91.75     91.62

Embedding Visualization
The t-SNE (t-distributed Stochastic Neighbor Embedding) (Hinton &
Roweis, 2013) is a commonly used method for visualizing data with high
dimensions in a space with fewer dimensions. Embeddings generated by
the proposed model, in task 1, are visualized using t-SNE. It is useful to
know whether the generated embeddings are sentiment-oriented. For the
comparison, we have also visualized text embedding generated by the
BERT encoder. The text embedding dimensions are reduced to 2 by the
t-SNE algorithm. These dimensions are used to plot the embedding in a
scatter plot graph. The t-SNE algorithm is trained with 450 iterations and
a perplexity of 30. We obtained the two-dimensional embedding graph
as shown in Fig. 2 and Fig. 3. Fig. 2 shows embedding generated by the
BERT model while Fig. 3 shows embedding generated by the proposed
framework.
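
The sketch below shows how such a projection can be produced with scikit-learn's TSNE under the stated settings (450 iterations, perplexity 30); the random embeddings and labels are stand-ins for the framework's actual 30-dimensional outputs.

```python
# Minimal sketch: projecting 30-dimensional text embeddings to 2-D with t-SNE.
# Random arrays stand in for the real embeddings and sentiment labels.
import numpy as np
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt

embeddings = np.random.rand(200, 30)          # stand-in text embeddings
labels = np.random.randint(0, 2, size=200)    # 0 = negative, 1 = positive

coords = TSNE(n_components=2, perplexity=30,
              n_iter=450,                     # called max_iter in newer scikit-learn
              init="random", random_state=0).fit_transform(embeddings)
plt.scatter(coords[:, 0], coords[:, 1], c=labels, s=12)
plt.title("t-SNE of sentiment-oriented text embeddings")
plt.show()
```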

Figure 2. Visualization of text Embeddings in 2-dimensional Space by BERT model

Figure 3. Visualization of text Embeddings in 2-dimensional Space by the proposed model

Small circles that are filled in Figure 2 and Figure 3 represent the
two-dimensional embedding. The closely spaced-filled circles signify
embeddings that are closer together. Figure 3 shows that positive text
embeddings, represented by purple circles, are closer to each other, while
negative text embeddings, represented by red circles, are also closer to
each other. This indicates that positive and negative embeddings are more
coherent and distinct. As expected, text embeddings are separated based
on their sentiment orientation. Negative text representations are more
coherent than positive ones because fewer positive text embeddings are
intermixed with negative representations. However, this is only true for a
smaller number of embeddings when compared to the original embedding
representation provided by BERT shown in Figure 2. Figure 2 illustrates
that the positive and negative embeddings are located near one another,
suggesting that they do not take into account the sentiment of the text and
do not differentiate the representations based on sentiment.

Similarity of Text Embedding


There are two levels of sentiment: positive and negative. The embeddings
of positive text should exhibit a higher degree of proximity to one another,
whereas these embeddings must be distanced further from the negative
embedding. To examine this phenomenon, we selected two sample sentences
from each category and evaluated their similarity and dissimilarity with the
remaining reviews. We utilized cosine distances between the embeddings to
calculate these metrics.

Table 3. Proximity between sample sentence and other samples using the
proposed model

Class: Positive
  Sample Sentence: “Very good quality. Excellent product.”
  Most Similar: “Excellent product for the price.” (cosine distance = 0.0158)
  Most Dissimilar: “Honestly it didn't taste THAT fresh.)” (cosine distance = 0.5631)

Class: Positive
  Sample Sentence: “Great quality at a given price.”
  Most Similar: “The reception through this headset is excellent.” (cosine distance = 0.0534)
  Most Dissimilar: “I really loved the storyline and the poler bear was kinda cute. But if anyone has a question about Fort Steele, just ask away:)” (cosine distance = 0.4984)

Class: Negative
  Sample Sentence: “Bad product. Very sad by using the product.”
  Most Similar: “Disappointing accessory from a good manufacturer.” (cosine distance = 0.0350)
  Most Dissimilar: “10/10” (cosine distance = 0.5026)

Class: Negative
  Sample Sentence: “Very Disappointing Performance.”
  Most Similar: “Very Displeased.” (cosine distance = 0.0403)
  Most Dissimilar: “It works!” (cosine distance = 0.4537)

According to the findings presented in Table 3, it is apparent that the
sample sentences belonging to the positive and negative classes demonstrate
a higher degree of proximity to other sentences within their respective
sentiment classes. Conversely, these same sample sentences are significantly
further away from sentences belonging to dissimilar sentiment classes.
In summary, the experimental sentiment analysis answers the
questions posed in Section 5.1. The answers to these questions are:
• The modification of embedding improves the result of sentiment
analysis.
• The proposed framework develops sentiment-oriented embedding.
• The modified embeddings are separated sufficiently from the dissimilar
classes.

6. Conclusion and Future Scope

Expressions such as "I am extremely pleased with this product" and "I
am extremely dissatisfied with this product" convey opposite sentiments
in natural language. Pre-trained BERT models generate sentence
representations that consider the context. However, the high cosine
similarity between these two sentences indicates that text embeddings are
not sentiment-focused. To address this limitation, this paper introduced a
novel two-fold supervised BERT model that developed sentiment-aware
embeddings for text. The text representations were then used as input
for a feedforward neural network, which was designed to determine the
overall sentiment of the given text. A comprehensive investigation was
conducted to assess the effectiveness of intermediate text embeddings
generated for sentiment analysis. The findings indicated that the proposed
model facilitated the development of text embeddings specific to sentiment
orientation and improved the performance of sentiment analysis.
In the future, the framework can be extended further for developing
embedding according to the fine-grained sentiments of text like extremely
positive, positive, neutral, negative, and extremely negative. The new loss
function or different model architecture can help to consider fine levels
of sentiment while generating embeddings. This provision may improve
the result of fine-grained sentiment analysis. Further, the proposed system
can also be extended for handling sarcasm and irony present in the text
while developing sentiment embeddings. In this case, sentences might
have similar structures but use sarcasm, irony, or other forms of indirect
expression. These cases can mislead the system while developing sentiment-
oriented embeddings. Enhancing the system's ability to handle sarcasm
and irony may involve incorporating a deeper understanding of contextual
cues and linguistic nuances. This may lead to better sentiment-oriented
embedding generation and classification even for text containing sarcasm or irony.

Acknowledgement The authors would like to thank all those who contributed to the
completion of this research study and paper.

Funding No funding was received for this research work.

Declarations

Conflict of Interest The authors have no relevant financial or non-financial interests to
disclose.

References

Abaskohi, A., Rothe, S., Yaghoobzadeh, Y. (2023). LM-CPPF: Paraphrasing-
Guided Data Augmentation for Contrastive Prompt-Based Few-Shot Fine-
Tuning. arXiv:2211.11754v3.
Angelovska, M., Sheikholeslami, S., Dunn, B., and Payberah, A. H. (2021).
Siamese Neural Networks for Detecting Complementary Products. In
Proceedings of the 16th Conference of the European Chapter of the
Association for Computational Linguistics: Student Research Workshop,
pages 65–70, online. Association for Computational Linguistics.
Arbane, M., Benlamri, R., Brik, Y., and Alahmar, A. (2023). Social media-based
Covid-19 sentiment classification model using Bi-LSTM. Expert Systems
with Applications. Vol. 212 (2023)
Baccianella, S., Esuli, A., Sebastiani, F. (2010). SentiWordNet 3.0: An Enhanced
Lexical Resource for Sentiment Analysis and Opinion Mining. Proceedings
of the International Conference on Language Resources and Evaluation,
LREC 2010, 17-23 May 2010, Valletta, Malta
Borg, A., Boldt, M. (2020). Using VADER sentiment and SVM for predicting
customer response sentiment. Expert Systems with Applications. 162:
113746, (2020) DOI: 10.1016/j.eswa.2020.113746
Boy, S., Ruiter, D., Klakow, D. (2021). Emoji-Based Transfer Learning for
Sentiment Tasks. Proceedings of the 16th Conference of the European
Chapter of the Association for Computational Linguistics: Student
Research Workshop. April 2021. Association for Computational Linguistics.
PP 103-110. DOI: 10.18653/v1/2021.eacl-srw.15
Chikersal, P., Poria, S. and Cambria, E. (2015). SeNTU: Sentiment Analysis of
Tweets by Combining a Rule-based Classifier with Supervised Learning.
In Proceedings of the 9th International Workshop on Semantic Evaluation
(SemEval 2015). 647–651, Denver, Colorado. Association for Computational
Linguistics
Chen, H., Sun, M., Tu, C., Lin, Y., and Liu, Z. (2016). Neural sentiment
classification with user and product attention. In Proceedings of the 2016
conference on Empirical Methods in Natural Language processing. 1650-
1659
Cheng, Y., Yao, L., Xiang, G., Zhang, G., Tang, T., Zhong. L. (2020). Text
Sentiment Orientation Analysis Based on Multi-Channel CNN and
Bidirectional GRU with Attention Mechanism. IEEE Access.
Choudhary, N., Singh, R., Bindlish, I., Shrivastava, M. (2018). Emotions are
Universal: Learning Sentiment Based Representations of Resource-Poor
Languages using Siamese Networks. arxiv. arXiv:1804.00805
Conneau, A., Schwenk, H., Cun, Y. and Barrault, L. (2017). Very deep
convolutional networks for text classification. In Proceedings of the 15th
Conference of the European Chapter of the Association for Computational
Linguistics. 1:1107–1116
Dai, A.A., Hu, X.H., Nie, J.Y., Chen, J.P. (2022). Learning from word semantics
to sentence syntax by graph convolutional networks for aspect-based
sentiment analysis. International Journal of Data Science and Analytics.
Volume 14, Issue 1, Page 17-26. DOI:10.1007/s41060-022-00315-2
Devlin, J., Chang, M. W., Lee, K. and Toutanova, K. (2019). BERT: Pre-
training of Deep Bidirectional Transformers for Language Understanding.
Proceedings of the 2019 Conference of the North American Chapter
of the Association for Computational Linguistics: Human Language
Technologies. 1:4171-4186.
Dragoni, M. and Petrucci, G. (2017). A Neural Word Embeddings Approach
for Multi-Domain Sentiment Analysis. IEEE Transactions on Af fective
Computing. vol. 8, no. 4, pp. 457-470. doi: 10.1109/TAFFC.2017.2717879.
Dimitrios, K., Misha, D., Nando De, F., and Padhraic, S. (2015). From Group to
Individual Labels using Deep Features. KDD. Dataset available at: https://
archive.ics.uci.edu/ml/datasets/Sentiment+Labelled+Sentences
Duncan, B., Zhang, Y. (2015). Neural networks for sentiment analysis on Twitter.
2015 IEEE 14th International Conference on Cognitive Informatics &
Cognitive Computing (ICCI*CC). DOI: 10.1109/ICCI-CC.2015.7259397
Gleize, M., Shnarch, E., Choshen, L., Dankin, L., Moshkowich, G., Aharonov, R.,
and Slonim, N. (2019). Are You Convinced? Choosing the More Convincing
Evidence with a Siamese Network. In Proceedings of the 57th Annual
Meeting of the Association for Computational Linguistics, pages 967–976,
Florence, Italy. Association for Computational Linguistics.
Hinton, G., and Roweis, S. (2013). Stochastic Neighbor Embeddings. Neural
Information Processing Systems.
Huang, M., Rao, Y., Liu, Y., Xie, H., Wang, F. (2018). Siamese Network-Based
Supervised Topic Modeling. Proceedings of the 2018 Conference on
Empirical Methods in Natural Language Processing. Oct-Nov 2018.
Brussels, Belgium, Association for Computational Linguistics, 4652-4662.
Joulin, A., Grave, E., Bojanowski, P., and Mikolov, T. (2017). Bag of Tricks
for Efficient Text Classification. Proceedings of the 15th Conference of
the European Chapter of the Association for Computational Linguistics.
Volume 2, Short Papers, pages 427–431.
Kabakus, A. T. (2022). A novel COVID-19 sentiment analysis in Turkish based
on the combination of convolutional neural network and bidirectional long-
short term memory on Twitter. Concurrency and Computation-Practice and
Experience. Vol. 34, Issue 22. DOI: 10.1002/cpe.6883.
Ke, P., Ji, H., Liu, S., Zhu, X., Huang, M. (2020). SentiLARE: Sentiment-
Aware Language Representation Learning with Linguistic Knowledge.
arXiv:1911.02493
Kim, Y. 2014. Convolutional neural networks for sentence classification.
Proceedings of the (2014) Conference on Empirical Methods in Natural
Language Processing. 1746-1751 (2014) https://www.aclweb.org/anthology/
D14-1181/
Koch, G., Zemel, R., and Salakhutdinov, R. (2015). Siamese Neural Networks
for One-shot Image Recognition. Proceedings of the 32nd International
Conference on Machine Learning. Lille, France.
Lapesa, G. and Evert, S. (2017). Large Scale Evaluation of Dependency-based
DSMs: Are They worth of Efforts. 15th Conference of the European
Chapter of the Association for Computational Linguistics. 2:394-400
Maas, A. L., Daly, R. E., Pham P. T., Huang D., Ng, A., and Potts, C.(2011).
Learning word vectors for sentiment analysis. Proceedings of the 49th
Annual Meeting of the Association for Computational Linguistics: Human
Language Technologies. 1:142-150. https://www.aclweb.org/anthology/P11-
1015/
Mikolov, T., Chen, K., Corrado, G., and Dean, J. (2013). Efficient estimation
of word representations in vector space. In Proc. Int. Conf. Learning
Representations.
Müller, T., Pérez-Torró, G., Franco-Salvador, M. (2022). Few-Shot Learning with
Siamese Networks and Label Tuning. In Proceedings of the 60th Annual
Meeting of the Association for Computational Linguistics (Volume 1: Long
Papers), pages 8532–8545, Dublin, Ireland. Association for Computational
Linguistics.
Oniani, D., Sivarajkumar, S., Wang, Y. (2022). Few Shot Learning for
Clinical Natural Language Processing using Siamese Neural Network.
arXiv:2208.14923v2.
Pang, B., and Lee, L. (2005). Seeing stars: Exploiting class relationships for
sentiment categorization with respect to rating scales. Proceedings of the
43rd Annual Meeting on Association for Computational Linguistics. 115-
124.
Pang, B., Lee, L., and Vaithyanathan, S. (2002). Thumbs up? Sentiment
classification using machine learning techniques. In Proc. Assoc. Comput.
Linguistics Conf. Empirical Methods Natural Lang. Process. 10: 79-86
Pang, L., Lan, Y., Guo, J., Xu, J., Wan, S., Cheng, X. (2016). Text Matching as
Image Recognition. Arxiv. https://doi.org/10.48550/arXiv.1602.06359
Pennington, J., Socher, R., and Manning, C. (2014). GloVe: Global Vectors for
Word Representation. Proceedings of the 2014 Conference on Empirical
Methods in Natural Language Processing (EMNLP). Pages 1532–1543.
Poongothai, M. and Sangeetha, M. (2020). Chronological-brain storm
optimization based support vector neural network for sentiment
classification using map reduce framework. Sadhana-Academy Proceedings
in Engineering Sciences. 45(1) DOI: 10.1007/s12046-020-01342-0
Radford, A., Narasimhan, K., Salimans, T., Sutskever, I. (2018). Improving
language understanding by generative pre-training.
Sasaki Y. (2007). The Truth of the F-measure.
Serrano, N., Bellogin, A. (2023). Siamese Neural Network in Recommendation.
Neural Computing and Application. 35. 13941–13953. https://doi.
org/10.1007/s00521-023-08610-0
Shi, Y., Zheng, Y., Guo, K., and Ren, X. (2021). Stock Movement Prediction
with Sentiment Analysis Based on Deep Learning Network. Concurrency
and Computation-Practice and Experience. Vol. 33. Issue 6. DOI:10.1002/
cpe.6076
Shonibare, O. (2021). ASBERT: Siamese and Triplet network embedding for
Open Question Answering. arXiv:2104.08558
Sivakumar, M., Uyyala, S.R. (2021). Aspect-based sentiment analysis of mobile
phone reviews using LSTM and fuzzy logic. International Journal of Data
Science and Analytics. Volume 12, Issue 4, Page 355-367. DOI: 10.1007/
s41060-021-00277-x
Socher, R., Perelygin, A., Wu, J., Chuang, J., Manning, C., Ng, A., Potts, C. (2013).
Recursive deep models for semantic compositionality over a sentiment
treebank. In Proceedings of the 2013 Conference on Empirical Methods in
Natural Language Processing. Pages 1631–1642.
Tan, L.I., Phang, W.S., Chin, K.O. and Patricia, A. (2015). Rule-Based Sentiment
Analysis for Financial News. 2015 IEEE International Conference on
Systems, Man, and Cybernetics. pp. 1601-1606
Tang, D., Wei, F., Qin, B., Yang, N., Liu, T., and Zhou, M. (2016). Sentiment
Embeddings with Applications to Sentiment Analysis. IEEE Transactions
on Knowledge and Data Engineering. Vol. 28, No. 2.
Tang, D., Qin, B., and Liu, T. (2015). Document modeling with gated recurrent
neural network for sentiment classification. Proceedings of the 2015
Conference on Empirical Methods in Natural Language Processing. 1422-
1432.
Thakur, R. K., and Deshpande, M. V. (2019). Kernel Optimized-Support Vector
Machine and Mapreduce framework for sentiment classification of train
reviews. Sadhana-Academy Proceedings in Engineering Sciences. 44(1)
DOI: 10.1007/s12046-018-0980-1
Toledo, G., and Marcacini, R. (2022). Transfer Learning with Joint Fine-Tuning
for Multimodal Sentiment Analysis. arxiv-preprint. arXiv:2210.05790.
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A., Kaiser,
L., Polosukhin, I. (2017). Attention is all you need. arXiv:1706.03762
Wang, Y., Huang, G., Li, J., Li, H., Zhou, Y. and Jiang, H. (2021). Refined Global
Word Embeddings Based on Sentiment Concept for Sentiment Analysis.
IEEE Access.
Wiggers, K. L., Britto, A. S., Heutte, L., Koerich A. L. and Oliveira, L. S. (2019).
Image Retrieval and Pattern Spotting using Siamese Neural Network. 2019
International Joint Conference on Neural Networks (IJCNN), Budapest,
Hungary, 2019, pp. 1-8. doi: 10.1109/IJCNN.2019.8852197.
Yu, L., Li, K., and Zhang, X. (2018). Refining Word Embeddings Using Intensity
Scores for Sentiment Analysis. IEEE Transactions on Audio, Speech, and
Language Processing. Vol. 26, No. 3
Zhang, S., Zhang, X., Chan, J., and Rosso, P. (2019). Irony detection via
sentiment-based transfer learning. Information Processing & Management.
56(5):1633-1644.
Zhang, J., Mao, K., Xu, Y., Li, P. (2022). KASN: Knowledge-Aware Siamese
Network for sentiment analysis. AIIPCC 2022. The Third International
Conference on Artificial Intelligence. Information Processing and Cloud
Computing. pp. 1-8.
Zhao, C., Wang, S., and Li, D. (2017). Deep transfer learning for social media
cross-domain sentiment classification. In Chinese National Conference on
Social Media Processing. Springer, Singapore. 232-243.
Zuo, Z. (2018). Sentiment analysis of steam review datasets using naive bayes
and decision tree classifier.
Zhou, J., Tian, J., Wang, R., Wu, Y., Xiao, W., He, L. (2020). SentiX: A
Sentiment-Aware Pre-Trained Model for Cross-Domain Sentiment Analysis.
Proceedings of the 28th International Conference on Computational
Linguistics, pages 568–579
