Deep Sentient Network with Multifarious Features and Inter-Mutual Attention Mechanism for Target-Specific Sentiment Classification

Deepak Chowdary Edara, Venkataramaphanikumar S, and Venkata Krishna Kishore Kolli
Department of CSE, VFSTR Deemed to be University, Vadlamudi, Guntur, India
edara.deepakchowdary@yahoo.com, svphanikumar@yahoo.com, kishorekvk_1@yahoo.com

Abstract—Target-based aspect-level sentiment analysis (TBASA) seeks to discover the polarity of a text towards certain aspect terms within it. Most recent studies utilize deep learning (DL) frameworks such as Convolutional Neural Networks (CNN) and Recurrent Neural Networks (RNN) to predict the influence of multiple contextual aspects on sentiment polarity. Both CNN and RNN have been used successfully to create complex semantic representations. However, existing approaches fail to capture sequence information due to high dimensionality. In this paper, an approach called the Deep Sentient Network (DSN), with a novel inter-mutual attention mechanism, is proposed to tackle this issue. The proposed model feeds the sequence information identified by the RNN into the CNN to consistently predict polarity. It also learns the contextual and target terms sequentially to understand the mutual influence between the features. Furthermore, both Part-of-Speech (POS) and position information are included in the input layer as background knowledge. Finally, a series of experiments is performed on various benchmark datasets to verify the efficacy of the proposed approach.

Keywords—Sentiment Analysis, multifarious features, inter-mutual attention, Convolutional Neural Network, Bi-LSTM, target-specific sentiment classification

I. INTRODUCTION

Sentiment analysis (SA) is an essential task in Natural Language Processing (NLP) that has gained considerable interest from academia and industry. Conventional SA is divided into several granularities, such as the sentence level and the document level. The purpose of aspect-level sentiment analysis (ASA) is to determine the polarity towards distinct aspects in each sentence. For example, the polarity of "bathroom" and "service" in the sentence "the bathroom is good, but the service is poor" is positive and negative, respectively. Traditional techniques often handcraft extensive aspect features for classifying sentiments [1], [2]; however, their development is tedious and time-consuming, and their performance is highly dependent on domain expertise and suffers from significant limitations. With the success of DL in question answering [3], machine translation [4], [5], and other domains, studies have brought RNNs and their variants into target-based sentiment analysis (TSA), such as the Hierarchical Bidirectional LSTM [6], the target-connection LSTM (TC-LSTM), and the target-dependent Long Short-Term Memory (TD-LSTM) [7]. Deep neural networks such as ATAE-LSTM [8] and IAN [9] use attention mechanisms to ensure that the model focuses on the aspects of a word. After including position and other pertinent information in the model, RNN approaches have achieved improved performance, but training remains time-intensive. RNN-CNN-based approaches offer researchers an alternative technique with an understandable design and better performance [10]–[12].

In earlier research, an RNN with an attention technique or a CNN with a gating mechanism was utilized to increase performance. However, each technique compromises the merits of the other. Thus, a novel approach called the Deep Sentient Network (DSN) is proposed to identify and extract multifarious semantic relations and sequence information with aspect-specific correlation. Additionally, to discriminate context-relevant information, the word embedding incorporates position and POS information.

The main contributions of this paper are:
1) A unique DSN is built on the combination of Bi-LSTM and CNN.
2) The efficiency of positional information provided as supplementary information with contextual terms is explored, and the effectiveness of task-relevant information obtained through filtered POS information is examined.
3) The experimental findings reveal that our DSN outperforms existing methods on the frequently used benchmark datasets.

The remaining sections of this paper are structured as follows. Section II presents related work on ASA. Section III describes the proposed model in detail. Section IV demonstrates the competitive performance on the benchmark datasets and establishes the efficiency of our DSN. Section V concludes the work.

Hu and Liu [13] used an emotion dictionary to provide a label for each adjective in a phrase; the polarity of the sentiment is reversed when negations occur in the preceding words. Finally, the same sentiment label is assigned to all of a sentence's targets using a majority vote. To train classifiers, machine learning algorithms typically use semantic features incorporated into the dataset; for instance, an SVM-based sentiment label was generated using a parse tree for each aspect. Jiang et al. [1] proposed an SVM-based approach to analyze the grammatical patterns in text. The authors of [6] developed a novel LSTM to focus primarily on intra-sequences. The authors of [5] were the first to introduce LSTM to ASA to exploit the relationship between target and contextual words; they separated a sentence into left and right components combined with contextual aspects and deployed two distinct LSTMs to simulate the interactions, although an LSTM is only capable of collecting information passed from one text to another. The authors of [7] developed a novel attention-based approach by concatenating both word and aspect embeddings at the LSTM input for classifying sentiment polarities. The significant lack of training examples is perhaps the most serious drawback of fine-grained SA. The authors of [8] utilized an LSTM network to capture target-based sequence information to guide the creation of an attention weight vector.

Recently, DL models such as CNN [9] and RNN [14]–[17] have attained impressive performance in ASA. The authors of [18] proposed a novel graph-based approach to develop sentiment dictionaries with competitive performance. Still, all of the above strategies take a long time and require substantial manual effort. The authors of [19] introduced a gating mechanism with different filters into a CNN to derive n-gram features and demonstrated that a CNN-based approach can be a feasible alternative. Memory networks have been successful in NLP and deliver a new solution for sentiment classification. The authors of [20] proposed a novel attention-based LSTM framework that fetches information acquired from a large corpus of documents to accomplish ASA; they further employed a CNN to extract higher-level semantic aspects, covering local and global aspects, compared to an LSTM. Tang et al. [21] introduced a novel attention framework based on external storage and utilized various filtering layers to assess the significance of text elements. The authors of [22] proposed a new technique to accomplish aspect extraction and sentiment classification at the aspect level using a deep memory network.

III. PROPOSED MODEL

A. Overview
This section provides an outline of the proposed DSN approach; each layer of the DSN is then discussed in detail. For ASA, let us assume the sentence S = w_1, w_2, ..., w_N and the aspect term T = t_1, t_2, ..., t_M, where T is a subsequence of S, M is the target length, and N is the sentence length. The objective of the DSN is to deduce the polarity of several aspects within the given text. Initially, the contextual and target terms are converted into word embedding representations, accompanied by the fusion of POS and position information. Then, the converted representations are fed into two distinct Bi-LSTMs to capture the sequence-based context of the sentence. Subsequently, the Bi-LSTM hidden-state output is passed into a CNN, which applies maximum pooling to determine the relationship between target and contextual words. In the end, a classifier uses the combined output of two different attention mechanisms. Figure 1 depicts the entire architecture of the DSN.

[Fig. 1. Proposed architecture for TBSA: multifarious features (word, position, and POS embeddings) for the context and target feed two Bi-LSTM layers, followed by a convolution layer with max pooling (MP), the inter-mutual attention module, and the output layer.]

B. Multifarious Features

1. POS Embedding
POS features are used as prior knowledge for analyzing polarity in sentiment classification. Simply employing all POS categories does not contribute to sentiment polarity and adds noise to the proposed model. Motivated by the efforts in [23]–[25], a POS tag from the candidate set {P_verb, P_adv, P_adj, P_oth} is therefore applied to every term in the input. In our experiment, we utilize the Natural Language Toolkit (NLTK) to identify the POS index of the sentence "We finally decided to visit this restaurant, and to our pleasure, they have terrace dining, wonderful since I had my jack with me." The POS index of the input is thus pos^c = {P_oth, P_adv, P_oth, P_oth, P_verb, P_oth, P_oth, P_oth, P_verb, P_oth, P_oth, P_oth, P_oth, P_adj, P_oth, P_oth, P_oth, P_oth, P_oth, P_oth, P_oth, P_oth, P_oth, P_oth, P_oth, P_oth, P_oth, P_oth, P_oth, P_oth, P_oth, P_oth}, and pos^t = "P_oth" represents the POS index of the aspect. We determine the POS embeddings of the target words pos^t = [pos^t_1, pos^t_2, ..., pos^t_M] and the context words pos^c = [pos^c_1, pos^c_2, ..., pos^c_N] by looking them up in the POS matrix P^{pos} ∈ R^{d_pos × Num}, which is initialized randomly and adjusted throughout the training phase, where Num is the number of POS tags and d_pos denotes the POS embedding dimension.
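As an illustrative aid, the following minimal Python sketch shows how this POS-indexing step could be implemented with NLTK: Penn Treebank tags are collapsed into the four-way candidate set and looked up in a randomly initialized embedding matrix. The helper names and the assignment of the 36-dimensional setting from Section IV-C to the POS embedding are our assumptions, not the authors' code.

```python
# Minimal sketch of the POS-feature step: map NLTK's Penn Treebank tags onto the
# paper's candidate set {P_verb, P_adv, P_adj, P_oth} and look the result up in a
# randomly initialized, trainable POS embedding matrix.
import numpy as np
import nltk  # may require: nltk.download('punkt'); nltk.download('averaged_perceptron_tagger')

POS_TAGS = ["Pverb", "Padv", "Padj", "Poth"]          # candidate set from the paper
TAG_TO_IDX = {t: i for i, t in enumerate(POS_TAGS)}

def coarse_pos(penn_tag: str) -> str:
    """Collapse a Penn Treebank tag into the four-way candidate set."""
    if penn_tag.startswith("VB"):
        return "Pverb"
    if penn_tag.startswith("RB"):
        return "Padv"
    if penn_tag.startswith("JJ"):
        return "Padj"
    return "Poth"

d_pos = 36                                             # assumed POS dimension (Sec. IV-C)
P_pos = np.random.uniform(-0.1, 0.1, (len(POS_TAGS), d_pos))  # trainable in the real model

sentence = ("We finally decided to visit this restaurant, and to our pleasure, "
            "they have terrace dining, wonderful since I had my jack with me.")
tokens = nltk.word_tokenize(sentence)
pos_c = [coarse_pos(tag) for _, tag in nltk.pos_tag(tokens)]  # e.g. ['Poth', 'Padv', ...]
pos_emb = P_pos[[TAG_TO_IDX[t] for t in pos_c]]               # shape: (N, d_pos)
print(pos_c[:6], pos_emb.shape)
```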

2. Positional Embedding
Preliminary studies have recognized the importance of position information and have typically addressed it with a pre-defined scheme or through training. Tang et al. [1] contributed two methods for incorporating position information into ASA. Following these earlier frameworks, we assume that the contextual phrases closest to the specified target contribute more sentiment. Moreover, a relative distance measure is used to quantify the relationship between the target and context, resulting in a one-dimensional word-position index that accommodates one or more aspects in the text. The target itself is indexed as 0 in the word-position index. Equation (1) gives the relative distance r_i of the i-th word:

r_i = \begin{cases} i - t_e, & i > t_e \\ 0, & t_s \le i \le t_e \\ t_s - i, & i < t_s \end{cases}   (1)

where r_i is the relative distance of the i-th word in the text, and t_s and t_e are the start and end positions of the target, respectively.

To give a clearer picture, consider the review "All the pasta is great, and the cooked lasagna is one of the tastiest I had in the town." For the target "pasta", the position index list [2, 1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18] is calculated: the index of each target term is 0, while the index of the other contextual words rises with increasing distance. The position index list becomes [7, 6, 5, 4, 3, 2, 1, 0, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13] when "cooked lasagna" is selected as the aspect term. After acquiring the relative distances r_i, we compute the position embeddings p = p_1, p_2, ..., p_N by looking up the position matrix P^p ∈ R^{d_p × N}, which is initialized arbitrarily and updated throughout the training process, where d_p indicates the position embedding dimension and N is the sentence length.
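The rule in equation (1) is simple enough to transcribe directly; the sketch below reproduces it for the review above. The helper name is illustrative, and the exact index lists depend on how the sentence is tokenized.

```python
# Sketch of the relative-distance computation in Eq. (1): each word's index is its
# distance to the nearest target boundary; target words themselves get 0.
def relative_distances(n_words: int, ts: int, te: int) -> list:
    """ts/te: start and end token positions of the target (inclusive)."""
    out = []
    for i in range(n_words):
        if i > te:
            out.append(i - te)      # word after the target
        elif i < ts:
            out.append(ts - i)      # word before the target
        else:
            out.append(0)           # word inside the target span
    return out

tokens = ("All the pasta is great , and the cooked lasagna is one of "
          "the tastiest I had in the town").split()
print(relative_distances(len(tokens), ts=2, te=2))   # target "pasta" -> [2, 1, 0, 1, 2, ...]
print(relative_distances(len(tokens), ts=8, te=9))   # target "cooked lasagna" -> [..., 1, 0, 0, 1, ...]
```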
C. Word Embedding
We look up the embedding matrix E ∈ R^{v×d} to transform the N contextual terms [c_1, c_2, ..., c_N] and the target of M words [t_1, t_2, ..., t_M], where v is the vocabulary size and d is the word embedding dimension. As a preliminary step, we initialize E with pre-trained GloVe word vectors [26]. The contextual and aspect embeddings are denoted [e^c_1, e^c_2, ..., e^c_N] and [e^t_1, e^t_2, ..., e^t_M], respectively. The POS embedding, position embedding, and word embedding are then concatenated to obtain the i-th context word representation x^c_i = pos^c_i ⊕ p_i ⊕ e^c_i. Since the position index of the target itself is recorded as 0, the target representation directly links the word embedding with the POS embedding of the labeled target: x^t_i = pos^t_i ⊕ e^t_i.

D. Bidirectional Long Short-Term Memory Network
The LSTM network [27] extends the basic RNN to solve the long-range dependence problem of sequences [28], [29]. There are three gating mechanisms in the LSTM unit, namely the forget gate f_t, the input gate i_t, and the output gate o_t. These gates regulate the flow of information for reading, storing, and updating at every time step. Equations (2)–(7) define the LSTM cell:

f_t = \sigma(W_f [h_{t-1}; x_t] + b_f)   (2)
i_t = \sigma(W_i [h_{t-1}; x_t] + b_i)   (3)
o_t = \sigma(W_o [h_{t-1}; x_t] + b_o)   (4)
\tilde{c}_t = \tanh(W_c [h_{t-1}; x_t] + b_c)   (5)
c_t = i_t \otimes \tilde{c}_t + f_t \otimes c_{t-1}   (6)
h_t = o_t \otimes \tanh(c_t)   (7)
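Equations (2)–(7) transcribe almost directly into code. The sketch below implements a single LSTM step in NumPy for exposition; the actual model would run one such pass forward and one backward (a Bi-LSTM) using a library implementation, and the dimensions here merely borrow the Section IV-C settings.

```python
# Direct NumPy transcription of the LSTM cell in Eqs. (2)-(7).
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """W/b hold the four gate parameter sets keyed 'f', 'i', 'o', 'c'."""
    z = np.concatenate([h_prev, x_t])          # [h_{t-1}; x_t]
    f = sigmoid(W["f"] @ z + b["f"])           # Eq. (2) forget gate
    i = sigmoid(W["i"] @ z + b["i"])           # Eq. (3) input gate
    o = sigmoid(W["o"] @ z + b["o"])           # Eq. (4) output gate
    c_hat = np.tanh(W["c"] @ z + b["c"])       # Eq. (5) candidate state
    c = i * c_hat + f * c_prev                 # Eq. (6) cell state
    h = o * np.tanh(c)                         # Eq. (7) hidden state
    return h, c

d_in, d_h = 436, 200                           # assumed input / hidden sizes
W = {k: np.random.randn(d_h, d_h + d_in) * 0.01 for k in "fioc"}
b = {k: np.zeros(d_h) for k in "fioc"}
h = c = np.zeros(d_h)
for x_t in np.random.randn(5, d_in):           # run over a toy 5-step sequence
    h, c = lstm_step(x_t, h, c, W, b)
print(h.shape)                                 # (200,)
```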

E. Convolutional Neural Network
The CNN [30]–[33] has been a revolutionary development for pattern recognition, image classification, and sentiment classification. We connect the CNN to the Bi-LSTM hidden-layer states to extract spatial and temporal relationships from the contextual and target aspects. For the contextual aspects, we adopt a convolution layer with several convolution kernels of various widths, using the context hidden states as input; higher-level information is extracted from the input patterns in this way. Specifically, we employ a 1-D convolution layer to build local features with different filters over a fixed window size. Each filter is a latent semantic detector that follows an n-gram feature pattern. Precisely, let X = [v_1, v_2, ..., v_l] be the fixed input sequence of length l with padding. We extract features by sliding a convolution filter W^c_i ∈ R^{K^c_i × 2d_h} over the input window. Further, max pooling (MP) and a ReLU activation function are applied to identify the more important context aspects g^c_i, as shown in equation (8). To compute the relationship between context and target aspects, the output representations of the q convolution filters are concatenated as the final context knowledge I^c = [g^c_1 ⊕ g^c_2 ⊕ ... ⊕ g^c_q]. To identify the most relevant target aspects, we likewise feed the target hidden states into the CNN; the most significant aspects of the p target convolution filters form I^t = [u^t_1 ⊕ u^t_2 ⊕ ... ⊕ u^t_p], where u^t_i is computed in equation (9):

g^c_i = \max_i(\mathrm{ReLU}(h^c_{i:i+K^c} * W^c_i + b_c))   (8)
u^t_i = \max_i(\mathrm{ReLU}(h^t_{i:i+K^t} * W^t_i + b_t))   (9)

where b_c and b_t are biases, and K^c_i and K^t_i indicate the widths of the i-th contextual and target convolution filters, respectively.
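The following sketch reads equations (8)–(9) literally: each filter is slid over the hidden states, activated with ReLU, and max-pooled over time to a single feature per filter. The filter widths and count follow Section IV-C; the paper does not fully specify whether pooling is per filter or per dimension, so this is one plausible reading.

```python
# Sketch of Eqs. (8)-(9): 1-D convolution over Bi-LSTM states, ReLU, max pooling.
import numpy as np

def conv_feature(H, W_filt, b):
    """H: (seq_len, 2*d_h) hidden states; W_filt: (K, 2*d_h) one filter; returns g_i."""
    K = W_filt.shape[0]
    windows = [H[i:i + K] for i in range(H.shape[0] - K + 1)]
    acts = [max(0.0, float(np.sum(w * W_filt) + b)) for w in windows]  # ReLU
    return max(acts)                                                   # max pooling over time

d_h, seq_len, n_filters = 200, 20, 99
H_c = np.random.randn(seq_len, 2 * d_h)            # context Bi-LSTM outputs
filters = [(np.random.randn(K, 2 * d_h) * 0.01, 0.0)
           for K in (2, 3, 4) for _ in range(n_filters // 3)]  # widths from Sec. IV-C
I_c = np.array([conv_feature(H_c, W_f, b_f) for W_f, b_f in filters])
print(I_c.shape)                                   # one pooled feature per filter
```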
F. Inter-Mutual Attention Network
Each term in a sentence expresses sentiment to a different degree. Consequently, we develop an attention mechanism [34] that analyzes the target and contextual terms identically and estimates their interactions inter-mutually to regulate the generation of attention weight vectors for a specific aspect of the model. The attention weights of aspects over contextual terms, α_i, and of contextual terms over aspects, β_i, are obtained as follows:

\alpha_i = \frac{\exp(\mu(g^c_i, I^t))}{\sum_{j=1}^{q} \exp(\mu(g^c_j, I^t))}   (10)
\mu(g^c_i, I^t) = \tanh(g^c_i W_a (I^t)^T + b_a)   (11)
\beta_i = \frac{\exp(\mu(u^t_i, I^c))}{\sum_{j=1}^{p} \exp(\mu(u^t_j, I^c))}   (12)

where b_a and W_a are the bias term and the corresponding weight matrix. The final aspect representation v and the context representation a are then determined by the relative attention weights:

v = \sum_{i=1}^{q} \alpha_i g^c_i   (13)
a = \sum_{i=1}^{p} \beta_i u^t_i   (14)
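A sketch of equations (10)–(14) follows, under the assumption that each pooled feature g^c_i (and u^t_i) is a small vector and that I^t, I^c are their concatenations; the paper's notation leaves the exact shapes open, so the dimensions here are purely illustrative.

```python
# Sketch of the inter-mutual attention in Eqs. (10)-(14): each feature of one side is
# scored against the summary of the other side, scores are softmax-normalized, and
# the weighted sums give the final representations v and a.
import numpy as np

def softmax(s):
    e = np.exp(s - s.max())
    return e / e.sum()

def mutual_attention(G, I_other, W_a, b_a):
    """G: (n, d) features of one side; I_other: (m,) summary of the other side."""
    scores = np.tanh(G @ W_a @ I_other + b_a)   # Eq. (11): tanh(g W_a I^T + b_a)
    weights = softmax(scores)                   # Eqs. (10)/(12)
    return weights @ G                          # Eqs. (13)/(14): weighted sum

q, p, d = 99, 99, 8                             # filters per side, per-feature size (assumed)
G_c, U_t = np.random.randn(q, d), np.random.randn(p, d)
I_c, I_t = G_c.reshape(-1), U_t.reshape(-1)     # concatenated summaries
v = mutual_attention(G_c, I_t, np.random.randn(d, I_t.size) * 0.01, 0.0)  # Eq. (13)
a = mutual_attention(U_t, I_c, np.random.randn(d, I_c.size) * 0.01, 0.0)  # Eq. (14)
print(v.shape, a.shape)
```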
The final sentence representation S for the classifier combines the aspect representation v and the context representation a through a non-linear layer, as shown in equation (15):

z = \tanh(W_s S + b_s)   (15)

Lastly, we apply a sigmoid-weighted linear unit (SiLU) to obtain the positive, negative, and neutral polarities based on equations (16)–(17):

z_k = \sum_i w_{ik} z_i + b_k   (16)
a_k(x) = z_k \sigma(z_k)   (17)
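The output stage of equations (15)–(17) in sketch form, with illustrative dimensions; the SiLU, z·σ(z), is applied to the class scores before the label is read off.

```python
# Sketch of Eqs. (15)-(17): non-linear fusion, class projection, SiLU activation.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

d, n_classes = 16, 3                       # positive / negative / neutral
v, a = np.random.randn(d // 2), np.random.randn(d // 2)
S = np.concatenate([v, a])                 # combined aspect + context representation
W_s, b_s = np.random.randn(d, d) * 0.1, np.zeros(d)
z = np.tanh(W_s @ S + b_s)                 # Eq. (15): non-linear fusion layer
W_k, b_k = np.random.randn(n_classes, d) * 0.1, np.zeros(n_classes)
z_k = W_k @ z + b_k                        # Eq. (16): class scores
a_k = z_k * sigmoid(z_k)                   # Eq. (17): SiLU activation
print(int(np.argmax(a_k)))                 # predicted polarity index
```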
G. Model Training
We optimize the model to reduce the loss between X and Y given in the training dataset, where X is the actual value and Y is the predicted value. The cross-entropy loss with L2 regularization is used, where the coefficient λ weights the L2 penalty over all parameters θ, as shown in equation (18):

L = -\sum_a \sum_b y_{ab} \log \tilde{y}_{ab} + \lambda \|\theta\|^2   (18)
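Equation (18) transcribes into a few lines; the sketch below assumes one-hot labels y and predicted probabilities ỹ, with λ = 1e-5 as in Section IV-C.

```python
# Sketch of the training objective in Eq. (18): cross-entropy over all samples a and
# classes b, plus lambda * ||theta||^2 L2 regularization over the trainable parameters.
import numpy as np

def dsn_loss(y, y_tilde, params, lam=1e-5):
    ce = -np.sum(y * np.log(y_tilde + 1e-12))    # cross-entropy term
    l2 = sum(np.sum(w ** 2) for w in params)     # ||theta||^2
    return ce + lam * l2

y = np.eye(3)[[0, 2, 1]]                          # three one-hot labels
y_tilde = np.array([[0.8, 0.1, 0.1],
                    [0.2, 0.1, 0.7],
                    [0.3, 0.5, 0.2]])             # model probabilities
params = [np.random.randn(4, 4), np.random.randn(4)]
print(round(dsn_loss(y, y_tilde, params), 4))
```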
IV. EXPERIMENTAL ANALYSIS & RESULTS

A. Datasets
We perform our evaluations on the benchmark SemEval datasets [35]–[39], which contain many customer reviews in the Restaurant and Laptop sectors. Additionally, we evaluate a real-time cancer tweets dataset to analyze the mood of cancer patients in the medical domain. There are four sentiment polarities in the SemEval datasets (positive, negative, conflict, and neutral), while the cancer tweet dataset has only three (positive, neutral, and negative). We eliminate the conflict polarity by leveraging valuable existing linguistic resources. Table I presents the details of the datasets.

TABLE I. DETAILED STATISTICS ABOUT DATASETS

B. Baseline Models
The proposed DSN model is compared with the following conventional and modern approaches to evaluate its effectiveness.

IGCN [40]: A CNN-based network that models targets independently and employs a gate mechanism to understand their interaction. This model also includes both position and POS information in the data.

CMA-MemNet [41]: Comprises a memory network and a CNN with multi-head attention to compensate for the sequential contextual information neglected by memory networks.

GCAE [19]: The training data is passed through a convolution filter and then fed into a gated convolution network with aspect embedding.

IAN [9]: Pools and averages the hidden states of two distinct LSTMs, which handle the target and contextual aspects uniformly, to generate interactive attention weight vectors.

ATAE-LSTM [8]: Feeds the LSTM with aspect embedding concatenated to word embedding, and obtains the attention-layer weight vector from hidden states concatenated with word embedding.

TD-LSTM [6]: Concatenates the hidden states of a left LSTM and a right LSTM for the final prediction.

C. Parameter Setting
We utilize pre-trained 300-dimensional GloVe word embedding vectors in this experiment. The multifarious (position and POS) embedding dimensions are set to 100 and 36; both are randomly initialized and updated throughout training. The number of hidden units is set to 200 and dropout to 0.5. The number of CNN kernels is set to 100, with kernel sizes of 2, 3, and 4. The L2 regularization coefficient is set to 1e-5. The Adam optimizer is chosen with a learning rate of 0.001. The number of epochs is set to 25 with a batch size of 32. Finally, accuracy is chosen as the evaluation metric.
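For reference, the settings above collected into a plain config dictionary — a hypothetical convenience for wiring them into a training script, not part of the paper:

```python
# Hyperparameters of Sec. IV-C as a config dict (illustrative key names).
CONFIG = {
    "word_emb_dim": 300,        # pre-trained GloVe vectors
    "position_emb_dim": 100,    # multifarious embeddings, randomly initialized
    "pos_emb_dim": 36,
    "hidden_units": 200,
    "dropout": 0.5,
    "cnn_kernels": 100,
    "kernel_sizes": (2, 3, 4),
    "l2_coefficient": 1e-5,
    "optimizer": "adam",
    "learning_rate": 0.001,
    "epochs": 25,
    "batch_size": 32,
    "metric": "accuracy",
}
```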

D. Results
To measure the effectiveness of our DSN model, several modern methods reported in the literature were selected for comparison. As observed in Table II, DSN outperforms the conventional models on both the Laptop and Restaurant datasets. TD-LSTM is the poorest: despite using conventional LSTMs to gather target-based information from the preceding and following contexts, the target information itself is ignored. ATAE-LSTM outperforms TD-LSTM because it includes an attention mechanism to ensure that the model prioritizes the significant target information in a particular text. IAN is superior to TD-LSTM and ATAE-LSTM because it employs an interactive attention mechanism to acquire the relationships among contextual and target words. GCAE combines a CNN and a gating mechanism to acquire significant multi-granular n-gram aspects; consequently, its efficiency is much better than that of the earlier LSTM variants. CMA-MemNet concurrently employs a CNN and a memory network to discover high-quality lexical and pragmatic features associated with targets, and it surpasses GCAE in performance. Because of its separate modeling of aspect-based information and interactive learning, IGCN performs similarly to CMA-MemNet on the Restaurant datasets, while achieving higher efficiency than CMA-MemNet on the Laptop datasets. DRN is another novel approach developed for target-specific sentiment classification; it performs better than the existing models on all the benchmark datasets, improving the accuracy rate by capturing the inter-sequence relationships between the contextual aspects through a multichannel feature mechanism.

Thus, the proposed DSN model learns POS and position information as supplementary input along with the embedding representations fed into the Bi-LSTM, which exhibits superior performance in extracting task-relevant sequential information; simultaneously, the CNN captures high-quality syntactic information. As a result, the proposed model surpasses all baseline approaches on the various datasets utilized in this study.

TABLE II. ACCURACY COMPARISON OF THE PROPOSED DSN MODEL WITH VARIOUS STATE-OF-THE-ART MODELS ON DIFFERENT DATASETS

V. CONCLUSION
In this paper, a novel deep-learning framework is proposed to perform target-specific sentiment classification. DSN combines position and POS knowledge with the embedding representations and employs a CNN and a Bi-LSTM jointly to capture the sequential aspects as well as the high-level spatial and temporal features. In addition, we designed a novel attention network to model the inter-mutual relations between the target and contextual aspects. The effectiveness of the DSN was demonstrated by the experimental results achieved on benchmark datasets from various domains. Furthermore, the results show that the performance of TSC can be effectively raised by combining POS and position representations as essential components. However, the DSN does not consider the explicit relationship between position information and POS information, since we are primarily concerned with how they perform individually. In the immediate future, we will investigate how to model the inter-sequence relationships between the contextual features effectively and use more coherent and incisive pre-trained frameworks to improve the efficiency of TSC.

REFERENCES
[1] L. Jiang, M. Yu, M. Zhou, X. Liu, and T. Zhao, "Target-dependent Twitter sentiment classification," in Proc. ACL-HLT 2011, vol. 1, 2011.
[2] L. Dong, F. Wei, C. Tan, D. Tang, M. Zhou, and K. Xu, "Adaptive recursive neural network for target-dependent Twitter sentiment classification," in Proc. ACL 2014, pp. 49–54, 2014.
[3] M. Esposito, E. Damiano, A. Minutolo, G. de Pietro, and H. Fujita, "Hybrid query expansion using lexical resources and word embeddings for sentence retrieval in question answering," Information Sciences, vol. 514, pp. 88–105, Apr. 2020, doi: 10.1016/j.ins.2019.12.002.
[4] A. V. Miceli Barone, J. Helcl, R. Sennrich, B. Haddow, and A. Birch, "Deep architectures for neural machine translation," 2018, doi: 10.18653/v1/w17-4710.
[5] K. Cho et al., "Learning phrase representations using RNN encoder-decoder for statistical machine translation," 2014, doi: 10.3115/v1/d14-1179.
[6] D. Tang, B. Qin, X. Feng, and T. Liu, "Effective LSTMs for target-dependent sentiment classification," in Proc. COLING 2016, pp. 3298–3307, 2016.
[7] S. Ruder, P. Ghaffari, and J. G. Breslin, "A hierarchical model of reviews for aspect-based sentiment analysis," 2016, doi: 10.18653/v1/d16-1103.
[8] Y. Wang, M. Huang, L. Zhao, and X. Zhu, "Attention-based LSTM for aspect-level sentiment classification," in Proc. EMNLP 2016, pp. 606–615, 2016, doi: 10.18653/v1/d16-1058.
[9] Q. Zhang, R. Lu, Q. Wang, Z. Zhu, and P. Liu, "Interactive multi-head attention networks for aspect-level sentiment classification," IEEE Access, vol. 7, 2019, doi: 10.1109/ACCESS.2019.2951283.
[10] T. Chen, R. Xu, Y. He, and X. Wang, "Improving sentiment analysis via sentence type classification using BiLSTM-CRF and CNN," Expert Systems with Applications, vol. 72, pp. 221–230, 2017, doi: 10.1016/j.eswa.2016.10.065.
[11] A. U. Rehman, A. K. Malik, B. Raza, and W. Ali, "A hybrid CNN-LSTM model for improving accuracy of movie reviews sentiment analysis," Multimedia Tools and Applications, 2019, doi: 10.1007/s11042-019-07788-7.
[12] B. Guo, C. Zhang, J. Liu, and X. Ma, "Improving text classification with weighted word embeddings via a multi-channel TextCNN model," Neurocomputing, vol. 363, pp. 366–374, 2019, doi: 10.1016/j.neucom.2019.07.052.
[13] M. Hu and B. Liu, "Mining and summarizing customer reviews," 2004, doi: 10.1145/1014052.1014073.
[14] F. Yang, C. Du, and L. Huang, "Ensemble sentiment analysis method based on R-CNN and C-RNN with fusion gate," International Journal of Computers, Communications and Control, 2019, doi: 10.15837/ijccc.2019.2.3375.
[15] J. P. C. Chiu and E. Nichols, "Named entity recognition with bidirectional LSTM-CNNs," Transactions of the Association for Computational Linguistics, 2016, doi: 10.1162/tacl_a_00104.
[16] Y. Ma, H. Peng, T. Khan, E. Cambria, and A. Hussain, "Sentic LSTM: a hybrid network for targeted aspect-based sentiment analysis," Cognitive Computation, 2018, doi: 10.1007/s12559-018-9549-x.
[17] S. Wen et al., "Memristive LSTM network for sentiment analysis," IEEE Transactions on Systems, Man, and Cybernetics: Systems, 2019, doi: 10.1109/tsmc.2019.2906098.
[18] D. Rao and D. Ravichandran, "Semi-supervised polarity lexicon induction," 2009, doi: 10.3115/1609067.1609142.
[19] W. Xue and T. Li, "Aspect based sentiment analysis with gated convolutional networks," in Proc. ACL 2018, vol. 1, pp. 2514–2523, 2018, doi: 10.18653/v1/p18-1234.
[20] Y. Ma, H. Peng, and E. Cambria, "Targeted aspect-based sentiment analysis via embedding commonsense knowledge into an attentive LSTM," 2018.
[21] D. Tang, B. Qin, and T. Liu, "Aspect level sentiment classification with deep memory network," 2016, doi: 10.18653/v1/d16-1021.

[22] X. Li and W. Lam, "Deep multi-task learning for aspect term extraction with memory interaction," 2017, doi: 10.18653/v1/d17-1310.
[23] S. Gu, L. Zhang, Y. Hou, and Y. Song, "A position-aware bidirectional attention network for aspect-level sentiment analysis," in Proc. COLING 2018, pp. 774–784, 2018.
[24] G. Wang, Z. Zhang, J. Sun, S. Yang, and C. A. Larson, "POS-RS: a random subspace method for sentiment classification based on part-of-speech analysis," Information Processing and Management, 2015, doi: 10.1016/j.ipm.2014.09.004.
[25] O. Das and R. Chandra Balabantaray, "Sentiment analysis of movie reviews using POS tags and term frequencies," International Journal of Computer Applications, vol. 96, no. 25, pp. 36–41, 2014, doi: 10.5120/16952-7048.
[26] J. Pennington, R. Socher, and C. D. Manning, "GloVe: global vectors for word representation," 2014, doi: 10.3115/v1/d14-1162.
[27] D. C. Edara, L. P. Vanukuri, V. Sistla, and V. K. K. Kolli, "Sentiment analysis and text categorization of cancer medical records with LSTM," Journal of Ambient Intelligence and Humanized Computing, 2019, doi: 10.1007/s12652-019-01399-8.
[28] G. Xu, Y. Meng, X. Qiu, Z. Yu, and X. Wu, "Sentiment analysis of comment texts based on BiLSTM," IEEE Access, 2019, doi: 10.1109/ACCESS.2019.2909919.
[29] M. Rhanoui, M. Mikram, S. Yousfi, and S. Barzali, "A CNN-BiLSTM model for document-level sentiment analysis," Machine Learning and Knowledge Extraction, 2019, doi: 10.3390/make1030048.
[30] S. Poria, E. Cambria, and A. Gelbukh, "Aspect extraction for opinion mining with a deep convolutional neural network," Knowledge-Based Systems, vol. 108, pp. 42–49, 2016, doi: 10.1016/j.knosys.2016.06.009.
[31] C. N. dos Santos and M. Gatti, "Deep convolutional neural networks for sentiment analysis of short texts," in Proc. COLING 2014, pp. 69–78, 2014.
[32] S. Poria, E. Cambria, and A. Gelbukh, "Aspect extraction for opinion mining with a deep convolutional neural network," Knowledge-Based Systems, vol. 108, pp. 42–49, 2016, doi: 10.1016/j.knosys.2016.06.009.
[33] A. Severyn and A. Moschitti, "Twitter sentiment analysis with deep convolutional neural networks," pp. 1–12, doi: 10.1145/2766462.2767830.
[34] A. Vaswani et al., "Attention is all you need," in Advances in Neural Information Processing Systems, vol. 2017-December, pp. 5999–6009, 2017. [Online]. Available: https://arxiv.org/abs/1706.03762v5
[35] S. Rosenthal, P. Nakov, S. Kiritchenko, S. M. Mohammad, A. Ritter, and V. Stoyanov, "SemEval-2015 Task 10: sentiment analysis in Twitter," in Proc. SemEval 2015, pp. 451–463, 2015. [Online]. Available: http://alt.qcri.org/semeval2015/cdrom/pdf/SemEval078.pdf
[36] K. Cortis et al., "SemEval-2017 Task 5: fine-grained sentiment analysis on financial microblogs and news," in Proc. SemEval-2017, pp. 519–535, 2017. [Online]. Available: http://www.aclweb.org/anthology/S17-2089
[37] H. Hamdan, "SentiSys at SemEval-2016 Task 4: feature-based system for sentiment analysis in Twitter," pp. 195–202, 2016.
[38] J. Wagner et al., "DCU: aspect-based polarity classification for SemEval Task 4," in Proc. SemEval, pp. 223–229, 2014, doi: 10.3115/v1/s14-2036.
[39] S. Kiritchenko, X. Zhu, C. Cherry, and S. M. Mohammad, "NRC-Canada-2014: detecting aspects and sentiment in customer reviews," 2014, doi: 10.3115/v1/s14-2076.
[40] A. Kumar, V. T. Narapareddy, V. Aditya Srikanth, L. B. M. Neti, and A. Malapati, "Aspect-based sentiment classification using interactive gated convolutional network," IEEE Access, vol. 8, 2020, doi: 10.1109/ACCESS.2020.2970030.
[41] Y. Zhang, B. Xu, and T. Zhao, "Convolutional multi-head self-attention on memory for aspect sentiment classification," IEEE/CAA Journal of Automatica Sinica, vol. 7, no. 4, 2020, doi: 10.1109/JAS.2020.1003243.
[42] P. Zhao, L. Hou, and O. Wu, "Modeling sentiment dependencies with graph convolutional networks for aspect-level sentiment classification," Knowledge-Based Systems, vol. 193, 2020, doi: 10.1016/j.knosys.2019.105443.
