Sentiment Analysis using Word2vec-CNN-BiLSTM Classification

Wang Yue* and Lei Li†

*Li Laboratory, Graduate School of Science and Engineering, Hosei University,
3-7-2 Kajinocho, Koganei-shi, Tokyo 184-8584, Japan
Email: yue.wang.4q@stu.hosei.ac.jp

†Department of Science and Engineering, Hosei University,
3-7-2 Kajinocho, Koganei-shi, Tokyo 184-8584, Japan
Email: lilei@hosei.ac.jp

2020 Seventh International Conference on Social Networks Analysis, Management and Security (SNAMS), © 2020 IEEE. DOI: 10.1109/SNAMS52053.2020.9336549
Abstract— Traditional single-structure neural network algorithms for short-text sentiment classification are prone to errors. To address this problem, the word vector model (Word2vec), the Bidirectional Long Short-Term Memory network (BiLSTM) and the Convolutional Neural Network (CNN) are combined. Experiments show that the CNN-BiLSTM model combined with Word2vec word embedding achieves an accuracy of 91.48%, which demonstrates that the hybrid network model performs better than single-structure neural networks on short text.

Keywords — sentiment analysis, CNN, BiLSTM, Word2vec, text classification
I. INTRODUCTION

With the rapid development of the Internet, the number of Internet users has increased sharply. As people browse the Internet, they generate a large amount of information that reflects their views and attitudes and carries great commercial and social value [1]. Text remains an important medium through which people produce and obtain information. Sentiment classification of short text helps complete push services for users and improves the user experience.

In recent years, the analysis of affective tendency in text has attracted the attention of many scholars and has become a hot topic. With the rise of neural networks, related algorithms have shown strong classification performance on text. Emotional detection classifies words, sentences and documents into a set of emotions defined by psychological models. It is therefore of great practical significance to analyze the emotional information contained in text and to classify the text accordingly.

Sentiment analysis [2] is a set of linguistic operations belonging to the automatic processing of natural language. Its objective is to identify the sentiment expressed in a text and to predict its polarity (positive or negative) towards a given subject [3].

In this paper, to address the individual weaknesses and leverage the distinct advantages of LSTM and CNN, we propose a Word2vec-CNN-BiLSTM hybrid model that classifies text from the Quora dataset.

The results were obtained by fixing the amount of data and evaluating the performance at each training epoch. The accuracy under the Word2vec embedding is 0.8125 for CNN, 0.7983 for LSTM, 0.8569 for BiLSTM, 0.8744 for CNN-LSTM, and 0.9148 for the model proposed in this paper. The results show that the Word2vec-CNN-BiLSTM model is superior to the other models.

The rest of the paper is organized as follows: Section II reviews previous research. Section III describes the principles and details of the proposed model. Section IV gives the related work, Section V gives the experimental analysis, Section VI gives future work, and Section VII gives the conclusion.

II. PREVIOUS RESEARCH

Kim [4] was the first to use a CNN for text feature extraction in text classification.

Shen et al. [5] proposed a special design that combines the CNN and BiLSTM models for optimal performance. They found that this combination gave an accuracy of 89.7%, better than the accuracy of either model individually.

Yenter and Verma [6] proposed a CNN-LSTM model for opinion analysis on the IMDB database. This work differs from the previous one in that the results are concatenated after the application of the LSTM layer. The model achieved 89% accuracy.

The main work of this paper is as follows: (1) proposing a hybrid short-text classification model based on Word2vec, CNN and BiLSTM; (2) using the results of sentiment analysis as samples to train the parameters of the Word2vec-CNN-BiLSTM model; (3) comparing the accuracy of multiple models on the Quora dataset.

III. METHODOLOGY

A. CNN

A Convolutional Neural Network (CNN) is a feed-forward neural network with a convolution structure. It uses convolution to extract local feature vectors. Gradient descent is applied to minimize the loss, adjusting the weight parameters of each network layer through layer-by-layer feedback, and iterative training improves the accuracy of the network. The convolutional neural network adopted in this paper is mainly composed of an input layer, a convolutional layer, a pooling layer, a fully connected layer and an output layer.
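As a concrete illustration of this five-layer stack, the following is a minimal Keras sketch; the vocabulary size, sequence length, filter count and kernel size are illustrative assumptions, not values reported in the paper.

```python
# Minimal sketch of a text CNN with the five layers named above (Keras).
# vocab_size, seq_len and the filter settings are illustrative assumptions.
from tensorflow.keras import layers, models

vocab_size, embed_dim, seq_len = 20000, 100, 50  # assumed values

model = models.Sequential([
    layers.Embedding(vocab_size, embed_dim, input_length=seq_len),  # input/embedding layer
    layers.Conv1D(128, 5, activation="relu"),   # convolutional layer: local n-gram features
    layers.GlobalMaxPooling1D(),                # pooling layer
    layers.Dense(64, activation="relu"),        # fully connected layer
    layers.Dense(3, activation="softmax"),      # output layer: positive/negative/neutral
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```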

B. BiLSTM

Bi-directional Long Short-Term Memory (BiLSTM) combines a forward LSTM and a backward LSTM. BiLSTM is well suited to sequence labeling tasks that depend on both the preceding and the following context, and it is therefore often used to model context information in NLP [7].

To obtain a representation of a sentence from the representations of its words, we can combine the word vectors, for example by summing them or taking their average. Using an LSTM instead captures longer-distance dependencies, because the LSTM learns what to remember and what to forget through the training process. However, modeling sentences with a single LSTM has a problem: it cannot encode information from back to front. BiLSTM is a better way to capture bi-directional semantic dependencies.
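A minimal Keras sketch of such a bidirectional encoder is shown below; all sizes are illustrative assumptions.

```python
# Sketch of a BiLSTM classifier: a forward and a backward LSTM read the
# sequence in both directions and their final states are merged.
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Embedding(20000, 100, input_length=50),  # assumed sizes
    layers.Bidirectional(layers.LSTM(64)),          # forward + backward LSTM
    layers.Dense(3, activation="softmax"),          # positive/negative/neutral
])
```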
C. Word2vec

Word2vec is one of the approaches to word embedding. It is a word embedding method proposed by Mikolov at Google in 2013 [8].

Word2vec has two training modes, CBOW (Continuous Bag-of-Words Model) and Skip-gram (Continuous Skip-gram Model). Fig. 1 gives a simple illustration:

1) CBOW: The current word is predicted from its context. It is equivalent to taking a word out of a sentence and asking you to guess what it is [9].

2) Skip-gram: The current word is used to predict its context. It is like giving you a word and asking you to guess what might come before and after it [10].

Fig. 1. Word2vec structure chart.
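In gensim, the two modes are selected with the sg flag; the toy corpus below is only for illustration.

```python
# Training Word2vec in both modes with gensim; sg selects the mode.
from gensim.models import Word2Vec

sentences = [["this", "movie", "is", "great"],
             ["this", "movie", "is", "terrible"]]  # toy corpus (assumption)

cbow = Word2Vec(sentences, vector_size=100, window=5, min_count=1, sg=0)      # CBOW: context -> word
skipgram = Word2Vec(sentences, vector_size=100, window=5, min_count=1, sg=1)  # Skip-gram: word -> context

vec = cbow.wv["movie"]  # 100-dimensional word vector
```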
IV. RELATED WORK

This paper presents a BiLSTM-CNN model based on Word2vec; BiLSTM and CNN are adopted to build a hybrid neural network model used to classify the emotion of short text. Combining the CNN and RNN models requires a particular form, since each model has a specific architecture and its own strengths: the CNN is known for its ability to extract as many features as possible from the text [11], while the BiLSTM keeps the chronological order between words in a document and has the ability to ignore unnecessary words using the forget gate [12].

Hence, we propose the following architecture composed of three parts, which are described in more detail below (Figure 2):

A. Pre-processing Part

In this stage, data cleansing and pre-processing are performed. Then, a distributed document representation using the Doc2Vec embedding is applied to prepare the data for convolution. The resulting vector is passed as the input to the next stage.

B. Convolution Part

In this stage, convolution and max pooling layers are applied for feature extraction, producing high-level features. The output of this stage is the input of the next stage.

Fig. 2. CNN-BiLSTM general architecture.

C. BiLSTM/Fully Connected Part

In this stage, BiLSTM and fully connected layers are applied for document sentiment classification. The output of this stage is the final classification of the document (as positive, negative or neutral) [13].

D. Algorithm

With the notation defined below, the computation of the two branches and their fusion is:

u_i = tanh(w_c · p_i + b_c)            (1)
α_i = exp(u_i) / Σ_j exp(u_j)          (2)
S_cnn = Σ_i α_i p_i                    (3)
v_i = tanh(w_l · q_i + b_l)            (4)
β_i = exp(v_i) / Σ_j exp(v_j)          (5)
S_bilstm = Σ_i β_i q_i                 (6)
S = [S_cnn ; S_bilstm]                 (7)
In formulas (1) to (3), the output of the CNN network structure is expressed as the vector [p1, p2, ..., pn]. The features obtained by convolution are first passed through a tanh activation function to obtain the nonlinear transform u_i; adding nonlinear factors to the output features improves the expressive ability of the model. The softmax function then yields the attention weight α_i of each component, W_cnn = (α_1, α_2, ..., α_n), where each weight represents the importance of the corresponding word. The weighted sum of the output vectors of the CNN structure gives the sentence-level semantic vector S_cnn extracted by the CNN.

Similarly, the output of the BiLSTM network structure is expressed as the vector [q1, q2, ..., qn]. In formulas (4) to (6), a tanh activation function performs a nonlinear transformation on the features obtained by the LSTM encoding to obtain v_i, the softmax function yields the attention weight of each component, W_lstm = (β_1, β_2, ..., β_n), and the weighted sum of the vectors [q1, q2, ..., qn] output by the BiLSTM structure gives the sentence-level semantic vector S_bilstm extracted by the LSTM.

Finally, the semantic vector representations extracted by the CNN and the BiLSTM are spliced together, as in formula (7), and used as the input of the later matching layer, so that richer features are extracted by combining the respective advantages of the two networks.
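To make the fusion concrete, the following NumPy sketch implements the attention pooling and splicing narrated by formulas (1)-(7). The weight shapes, the random stand-in data and all names are assumptions reconstructed from the text, not the authors' code.

```python
# Hedged sketch of the attention fusion in formulas (1)-(7).
import numpy as np

def attention_pool(H, w, b):
    """H: (n, d) output vectors of one branch; returns a (d,) sentence vector."""
    u = np.tanh(H @ w + b)                 # (1)/(4): scalar score per position
    alpha = np.exp(u - u.max())
    alpha = alpha / alpha.sum()            # (2)/(5): softmax attention weights
    return alpha @ H                       # (3)/(6): weighted sum -> sentence vector

n, d = 50, 64                              # assumed sequence length / feature size
P = np.random.randn(n, d)                  # stand-in CNN outputs    [p1 .. pn]
Q = np.random.randn(n, d)                  # stand-in BiLSTM outputs [q1 .. qn]
w, b = np.random.randn(d), 0.0             # assumed attention parameters

S = np.concatenate([attention_pool(P, w, b),
                    attention_pool(Q, w, b)])  # (7): splice the two semantic vectors
```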
V. EXPERIMENTAL ANALYSIS

The Word2vec-CNN-BiLSTM model is mainly divided into the following steps (Fig. 3):

Step 1: Preprocess the short-text data and remove the stop words and low-frequency words.

Step 2: Train the data with Word2vec to get the required word vectors: import the preprocessed text data into Word2vec to obtain the word vector representation of the text.

Step 3: Convert the generated word indices into the input of the CNN through the word embedding layer, add the pooling layer structure, and feed the data after the pooling operation into the BiLSTM network layer.

Step 4: Finally, add the fully connected layer and the classifier. In this experiment, the data were divided into positive, negative and neutral, and the evaluation indexes were calculated. The experiment evaluates the classification effect of the model from the prediction score and the prediction accuracy.

Fig. 3. Word2vec-CNN-BiLSTM model.
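Steps 1-4 can be made concrete in a short script. The following is a hedged sketch, assuming gensim for Word2vec and Keras for the network; the toy corpus, dimensions and layer sizes are illustrative assumptions, not the paper's settings.

```python
# Hedged end-to-end sketch of Steps 1-4: preprocessing, Word2vec embeddings,
# CNN + pooling, BiLSTM, then a fully connected classifier.
import numpy as np
from gensim.models import Word2Vec
from tensorflow.keras import layers, models

tokenized = [["great", "question"], ["bad", "answer"]]   # Step 1: cleaned tokens (toy data)
w2v = Word2Vec(tokenized, vector_size=100, min_count=1)  # Step 2: train word vectors

vocab = {w: i + 1 for i, w in enumerate(w2v.wv.index_to_key)}  # index 0 reserved for padding
emb = np.zeros((len(vocab) + 1, 100))
for w, i in vocab.items():
    emb[i] = w2v.wv[w]                                   # copy Word2vec vectors into the matrix

model = models.Sequential([                              # Steps 3-4
    layers.Embedding(len(vocab) + 1, 100, weights=[emb], trainable=False),  # word embedding layer
    layers.Conv1D(128, 3, activation="relu"),            # CNN layer
    layers.MaxPooling1D(2),                              # pooling layer
    layers.Bidirectional(layers.LSTM(64)),               # BiLSTM layer
    layers.Dense(3, activation="softmax"),               # classifier: positive/negative/neutral
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```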
A. The Data Set

Quora Question Pairs [14,15]: The data come from Quora and contain 2006 question pairs: 1256 neutral, 482 positive and 273 negative (Table 1).

Fig. 4. Data set example.

TABLE I. DATASET DESCRIPTION

  Data set             Sample classification   Sample size   Data size
  Training set (75%)   positive                362           1504
                       negative                205
                       neutral                 942
  Test set (25%)       positive                120           502
                       negative                68
                       neutral                 314

B. Model Parameters

We compare the performance of different configurations for processing the short-text dataset. The accuracy was computed for three numbers of epochs (8, 10 and 20), two values of the batch size (32 and 64) [16,17], and the optimizers SGD and Adam (Table 2).

Accuracy refers to the proportion of correct predictions made by the model:

Accuracy = (1/N) Σ_{i=1}^{N} I(ŷ_i = y_i)    (8)

In formula (8), ŷ_i is the predicted label of sample x_i, y_i is the actual label of x_i [18], I is the indicator function, and N represents the size of the test set.

The Word2vec-CNN-BiLSTM model achieved an accuracy of 91.48% (Table 2 and Fig. 5).

TABLE II. WORD2VEC-CNN-BILSTM CONFIGURATIONS

  Word embedding   Epochs   Batch size   Optimizer   Accuracy
  Word2vec         8        32           Adam        81.18%
  Word2vec         8        32           SGD         67.72%
  Word2vec         8        64           Adam        82.49%
  Word2vec         8        64           SGD         69.53%
  Word2vec         10       32           Adam        84.55%
  Word2vec         10       32           SGD         71.21%
  Word2vec         10       64           Adam        87.03%
  Word2vec         10       64           SGD         73.40%
  Word2vec         20       32           Adam        88.96%
  Word2vec         20       32           SGD         75.67%
  Word2vec         20       64           Adam        91.48%
  Word2vec         20       64           SGD         78.96%

Fig. 5. CNN-BiLSTM accuracy and loss.
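The configuration grid of Table II could be reproduced with a loop like the following sketch; build_model, x_train/y_train and x_test/y_test are assumed placeholders for the model constructor and the prepared Quora splits, not code from the paper.

```python
# Sketch of the epochs x batch size x optimizer sweep behind Table II.
from itertools import product

results = {}
for epochs, batch, opt in product([8, 10, 20], [32, 64], ["adam", "sgd"]):
    m = build_model()  # assumed helper returning the Word2vec-CNN-BiLSTM model above
    m.compile(optimizer=opt, loss="sparse_categorical_crossentropy", metrics=["accuracy"])
    m.fit(x_train, y_train, epochs=epochs, batch_size=batch, verbose=0)
    _, acc = m.evaluate(x_test, y_test, verbose=0)  # accuracy as in formula (8)
    results[(epochs, batch, opt)] = acc
```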

C. Comparative Experimental Analysis

In order to verify the classification effect of the Word2vec-CNN-BiLSTM combination model proposed in this paper, the combined model was compared with single models. The comparison models are the LSTM model, the CNN model, the BiLSTM model and the CNN-LSTM model. The specific experimental results are shown in Table 3. Compared with the CNN-LSTM model, the accuracy of the Word2vec-CNN-BiLSTM model proposed in this paper is improved by about 4.04%, which shows that the Word2vec-CNN-BiLSTM model has a better classification effect.

The proposed model was the highest at 91.48%, followed by CNN-LSTM, BiLSTM, CNN and LSTM.

TABLE III. MODEL COMPARISON

  Word embedding   Model        Accuracy
  Word2vec         LSTM         79.83%
  Word2vec         CNN          81.25%
  Word2vec         BiLSTM       85.69%
  Word2vec         CNN-LSTM     87.44%
  Word2vec         CNN-BiLSTM   91.48%
VI. CONCLUSIONS AND FUTURE WORK

Because the research is not yet very detailed, the accuracy does not fully meet expectations. We need to refine the analysis further, for example by determining the form of the emotional tags, studying methods of extracting sentiment, and studying models for mining or generating comments for sentiment analysis.

Theoretically, as the depth of the model increases, the expressive ability of the model should improve due to the increase in parameters. To counter the overfitting observed during training, future work will balance the model by adjusting the dropout rate and adopting L2 regularization.
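As an illustration of these two remedies, a Keras layer stack might apply them as follows; the dropout rate and L2 coefficient are illustrative assumptions, not tuned values from this work.

```python
# Dropout and L2 regularization as mentioned above (illustrative values).
from tensorflow.keras import layers, regularizers

regularized_head = [
    layers.Bidirectional(layers.LSTM(64)),
    layers.Dropout(0.5),                                     # adjustable dropout rate
    layers.Dense(3, activation="softmax",
                 kernel_regularizer=regularizers.l2(1e-4)),  # L2 penalty on the weights
]
```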
VII. CONCLUSION

In this paper, a combination of convolutional and bidirectional recurrent neural networks for short-text sentiment analysis with Word2vec embedding was presented. The combined CNN-BiLSTM model gives good results, since it benefits from the CNN's ability to extract features and the BiLSTM's ability to learn bidirectional dependencies of the text.

One disadvantage of the current model is that it requires more training data and training time than the existing baselines. Even with this limitation, it can be effective in classification tasks that require a lot of training data.

Word2vec places words with similar meanings close to each other; by training on a large corpus to obtain word vectors, the problem of polysemy can be alleviated. BiLSTM is also used to capture long-distance bidirectional dependencies and to encode information from back to front. In the experiments, it was found that the Word2vec-CNN-BiLSTM model makes text analysis easier and more accurate, which greatly improves the accuracy of sentiment classification of short text.

VIII. ACKNOWLEDGMENT

First and foremost, I would like to show my deepest gratitude to my supervisor, Professor Lei Li, a respectable, responsible and resourceful scholar, who has provided me with valuable guidance at every stage of the writing of this thesis. Without his enlightening instruction, impressive kindness and patience, I could not have completed my thesis. His keen and vigorous academic observation enlightens me not only in this thesis but also in my future study. I would also like to thank all my teachers who helped me work out the outline of the paper and gave me many valuable suggestions. Last but not least, I would like to thank my parents for their encouragement and support.
REFERENCES

[1] Ceraj, T.; Kliman, I.; Kutnjak, M. Redefining Cancer Treatment: Comparison of Word2vec Embeddings Using Deep BiLSTM Classification Model; Text Analysis and Retrieval 2019 Course Project Reports; Faculty of Electrical Engineering and Computing, University of Zagreb: Zagreb, Croatia, July 2019.
[2] Rehman, A.U.; Malik, A.K.; Raza, B.; Ali, W. A Hybrid CNN-LSTM Model for Improving Accuracy of Movie Reviews Sentiment Analysis. Multimed. Tools Appl. 2019, 78, 26597–26613.
[3] Ren, P.J.; Chen, Z.M.; Ren, Z.C.; et al. Leveraging contextual sentence relations for extractive summarization using a neural attention model. In Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2017; pp. 95–104.
[4] Yoon, J.; Kim, H. Multi-Channel Lexicon Integrated CNN-BiLSTM Models for Sentiment Analysis. In Proceedings of the 29th Conference on Computational Linguistics and Speech Processing (ROCLING 2017), Taipei, Taiwan, 27–28 November 2017; pp. 244–253.
[5] Shen, Q.; Wang, Z.; Sun, Y. Sentiment analysis of movie reviews based on CNN-BLSTM. In International Conference on Intelligence Science; Springer: Berlin, Germany, 2017; pp. 164–171.
[6] Zheng, Z.; Huang, S.; Tu, Z.; Dai, X.-Y.; Chen, J. Dynamic past and future for neural machine translation. In Proceedings of EMNLP-IJCNLP, 2019.
[7] Srivastava, S.K.; Singh, S.K.; Suri, J.S. A healthcare text classification system and its performance evaluation: A source of better intelligence by characterizing healthcare text. In Cognitive Informatics, Computer Modelling, and Cognitive Science; Elsevier BV: Amsterdam, The Netherlands, 2020; pp. 319–369.
[8] Kang, M.; Ahn, J.; Lee, K. Opinion mining using ensemble text hidden Markov models for text classification. Expert Syst. Appl. 2018, 94, 218–227.
[9] Li, P.; Zhao, F.; Li, Y.; Zhu, Z. Law text classification using semi-supervised convolutional neural networks. In Proceedings of the 2018 Chinese Control and Decision Conference (CCDC), IEEE, Shenyang, China, 9–11 June 2018; pp. 309–313.
[10] Seguí, F.L.; Aguilar, R.A.E.; De Maeztu, G.; García-Altés, A.; Garcia-Cuyàs, F.; Walsh, S.; Castro, M.S.; Vidal-Alaball, J. Teleconsultations between Patients and Healthcare Professionals in Primary Care in Catalonia: The Evaluation of Text Classification Algorithms Using Supervised Machine Learning. Int. J. Environ. Res. Public Health 2020, 17, 1093.
[11] Yenter, A.; Verma, A. Deep CNN-LSTM with combined kernels from multiple branches for IMDb review sentiment analysis. In Proceedings of the 2017 IEEE 8th Annual Ubiquitous Computing, Electronics and Mobile Communication Conference (UEMCON), New York, NY, USA, 19–21 October 2017; pp. 540–546.
[12] Dong, L.; Yang, N.; Wang, W.; Wei, F.; Liu, X.; Wang, Y.; Gao, J.; Zhou, M.; Hon, H.-W. Unified language model pre-training for natural language understanding and generation. arXiv preprint arXiv:1905.03197, 2019.
[13] Radford, A.; Wu, J.; Child, R.; Luan, D.; Amodei, D.; Sutskever, I. Language models are unsupervised multitask learners. OpenAI Blog, 2019.
[14] Shankar, I.; Nikhil, D. First Quora dataset release: Question pairs. [2019-03-01] https://data.Quora.com/First-Quora-Dataset-Release-Question-Pairs.
[15] Jasmir, J.; Nurmaini, S.; Malik, R.F.; Abidin, D.Z. Text Classification of Cancer Clinical Trials Documents Using Deep Neural Network and Fine Grained Document Clustering. In Proceedings of the Sriwijaya International Conference on Information Technology and Its Applications (SICONIAN 2019), Palembang, Indonesia, 16 November 2019; Atlantis Press: Paris, France, 2020; pp. 396–404.
[16] She, X.; Zhang, D. Text Classification Based on Hybrid CNN-LSTM Model. In Proceedings of the 2018 11th International Symposium on Computational Intelligence and Design (ISCID), Hangzhou, China, 8–9 December 2018; Volume 2, pp. 185–189.
[17] Mikolov, T.; Sutskever, I.; Chen, K.; Corrado, G.S.; Dean, J. Distributed representations of words and phrases and their compositionality. In Proceedings of Advances in Neural Information Processing Systems, Tahoe, NV, USA, 5–10 December 2013; pp. 3111–3119.
[18] Yousfi, S.; Rhanoui, M.; Mikram, M. Comparative Study of CNN and RNN for Opinion Mining in Long Text. In Proceedings of the International Conference on Modern Intelligent Systems Concepts, Rabat, Morocco, 12–13 December 2018.