Classification
Wang Yue* and Lei Li†
*Li Laboratory, Graduate School of Science and Engineering, Hosei University, 3-7-2 Kajinocho, Koganei-shi, Tokyo 184-8584, Japan
Email: yue.wang.4q@stu.hosei.ac.jp
†Department of Science and Engineering, Hosei University, 3-7-2 Kajinocho, Koganei-shi, Tokyo 184-8584, Japan
Email: lilei@hosei.ac.jp
2020 Seventh International Conference on Social Networks Analysis, Management and Security (SNAMS) | 978-0-7381-1180-3/20/$31.00 ©2020 IEEE | DOI: 10.1109/SNAMS52053.2020.9336549
Abstract: Traditional neural-network-based short text classification algorithms for sentiment classification are prone to errors. To solve this problem, the Word Vector Model (Word2vec), Bidirectional Long Short-Term Memory networks (BiLSTM) and a convolutional neural network (CNN) are combined. The experiment shows that the CNN-BiLSTM model with Word2vec word embeddings achieves an accuracy of 91.48%. This shows that the hybrid network model performs better than single-structure neural networks on short text.

Keywords: sentiment analysis, CNN, BiLSTM, Word2vec, text classification

I. INTRODUCTION
With the rapid development of the Internet, the number of Internet users has increased sharply. When people surf the Internet, a large amount of information is generated, which reflects people's views and attitudes and contains great commercial and social value [1]. Text is still an important way for people to produce and obtain information. The emotional classification of short text helps to improve push services and the user experience.
In recent years, the analysis of the affective tendency of text has attracted the attention of many scholars and has become a hot topic. With the rise of neural networks, their related algorithms have shown better results in text classification. Emotion detection and classification assigns text words, sentences and documents to a group of emotions by using psychological models. Therefore, it is of great practical significance to analyze the emotional information contained in text and to classify text by emotion.
Sentiment analysis [2] is a set of linguistic operations belonging to the automatic processing of natural language. Its objective is to identify the sentiment expressed in the text and to predict its polarity (positive or negative) towards a given subject [3].
In this paper, to address the individual weaknesses and leverage the distinct advantages of LSTM and CNN, we propose a Word2vec-CNN-BiLSTM hybrid model that classifies text using the Quora dataset.
The results were obtained by fixing the amount of data and evaluating the performance at each training epoch. The accuracy under the Word2vec embedding is 0.8125 for CNN, 0.7983 for LSTM, 0.8569 for BiLSTM, 0.8744 for CNN-LSTM, and 0.9148 for the model proposed in this paper. These results show that the Word2vec-CNN-BiLSTM model is superior to the other models.
The rest of the paper is organized as follows: Section 2 reviews the previous research. Section 3 describes the principles and details of the proposed model. Section 4 gives the relevant work, Section 5 gives the experimental analysis, Section 6 gives the future work, and Section 7 gives the conclusion.

II. PREVIOUS RESEARCH
Kim Y. [4] was the first to use a CNN for text feature extraction in text classification.
Shen et al. [5] proposed a special design that combines the CNN and BiLSTM models for optimal performance. They found that this combination gave an accuracy of 89.7%, better than the accuracy of either model individually.
Yenter and Verma [6] proposed a CNN-LSTM model for opinion analysis on the IMDB database. This work differs from the previous one in that they concatenated the results after the application of the LSTM layer. This model achieved 89% accuracy.
The main work of this paper is as follows: (1) proposing a hybrid model for short text classification based on Word2vec, CNN and BiLSTM; (2) using the results of emotion analysis as samples to train the parameters of the Word2vec-CNN-BiLSTM model; (3) comparing the accuracy of multiple models on the Quora dataset.

III. METHODOLOGY

A. CNN
A convolutional neural network (CNN) is a feed-forward neural network with a convolutional structure. It uses convolution to extract local feature vectors. Gradient descent is used to minimize the loss by adjusting the weight parameters of the network layer by layer through feedback, and iterative training improves the accuracy of the network. The convolutional neural network adopted in this paper is mainly composed of an input layer, a convolutional layer, a pooling layer, a fully connected layer and an output layer.

B. BiLSTM
Bi-directional Long Short-Term Memory (BiLSTM) is a combination of a forward LSTM and a backward LSTM. The bi-directional long short-term memory is well suited for sequential
Authorized licensed use limited to: Dalhousie University. Downloaded on June 18,2021 at 22:26:27 UTC from IEEE Xplore. Restrictions apply.
labeling tasks that depend on the preceding and following context. Therefore, it is often used to model context information in NLP [7].
If we want a representation of a sentence, we can build it from the representations of its words, for example by summing the representations of all words or by taking their average. An LSTM model can better capture longer-distance dependencies, because the LSTM learns what to remember and what to forget through the training process. However, an LSTM that models a sentence cannot encode information from back to front. BiLSTM is a better way to capture bi-directional semantic dependencies.

C. Word2vec
Word2vec is one approach to word embedding. It was proposed by Mikolov at Google in 2013 [8].
Word2vec has two training modes, CBOW (Continuous Bag-of-Words Model) and Skip-gram (Continuous Skip-gram Model). Fig. 1 gives a simple explanation:
1) CBOW
The current word is predicted from its context. It is equivalent to taking a word out of a sentence and asking you to guess what it is [9].

A. Pre-processing Part
In this stage, data cleansing and pre-processing are performed. Then, distributed document representation using Doc2Vec embedding is applied to prepare data for convolution. The resulting vector is passed as an input to the next stage.

B. Convolution Part
In this stage, convolution and max pooling layers are applied to extract high-level features. The output of this stage is the input of the next stage.
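As a rough illustration of the Convolution Part (a convolution over word vectors followed by max pooling), the sketch below uses only the Python standard library. The function names `conv1d` and `max_pool`, the toy embeddings, the filter weights, the ReLU activation and the pooling size are all assumptions made for this example; the paper does not specify them.

```python
from typing import List

Vector = List[float]


def conv1d(sentence: List[Vector], kernel: List[Vector], bias: float = 0.0) -> List[float]:
    """Slide one filter over consecutive word vectors (no padding, stride 1)."""
    width = len(kernel)
    feature_map = []
    for start in range(len(sentence) - width + 1):
        window = sentence[start:start + width]
        total = bias
        for word_vec, kernel_row in zip(window, kernel):
            total += sum(x * w for x, w in zip(word_vec, kernel_row))
        # ReLU activation: an assumption, not stated in the paper.
        feature_map.append(max(0.0, total))
    return feature_map


def max_pool(feature_map: List[float], pool_size: int) -> List[float]:
    """Non-overlapping max pooling; a trailing partial window is kept."""
    return [
        max(feature_map[i:i + pool_size])
        for i in range(0, len(feature_map), pool_size)
    ]


# Toy sentence: 5 words, each a 3-dimensional embedding (made-up values).
sentence = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [1.0, 1.0, 0.0],
    [0.0, 1.0, 1.0],
]
# One filter spanning 2 consecutive words (made-up weights).
kernel = [
    [1.0, 0.0, -1.0],
    [0.0, 1.0, 0.0],
]

feature_map = conv1d(sentence, kernel)       # one value per 2-word window
pooled = max_pool(feature_map, pool_size=2)  # strongest response per pool
print(feature_map, pooled)
```

In a full model many such filters would run in parallel, and the pooled sequence would be passed to the next stage rather than printed.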
The SoftMax function obtains the attention weight of each component, Wcnn, which holds one weight for each of the n components. Each weight represents the importance of the corresponding word.
A weighted sum of the output vectors of the CNN structure then gives the sentence-level semantic vector extracted by the CNN.
Similarly, the output of the BiLSTM network structure is expressed as the vector [q1, q2, ..., qn]. Formulas (4) to (6) use the tanh activation function to perform a nonlinear transformation on the features obtained by LSTM coding, obtain the attention weight of each component, Wlstm, by using the SoftMax function, and finally weight and sum the vectors [q1, q2, ..., qn] output from the BiLSTM structure to obtain Sbilstm, the sentence-level semantic vector extracted by the LSTM.
Hence, the semantic vector representations extracted from the CNN and the BiLSTM are concatenated (Formula (7)), and the result is used as the input of the subsequent matching layer, so as to extract richer features by combining their respective advantages.

V. EXPERIMENTAL ANALYSIS
The Word2vec-CNN-BiLSTM model is mainly divided into the following steps (Fig. 3):
Fig. 3. Word2vec-CNN-BiLSTM Model
Step 1: Preprocess the short text data and remove the stop words and low-frequency words.
Step 2: Train Word2vec on the data to get the required word vectors: import the preprocessed text data into Word2vec to obtain the word vector representation of the text.
Step 3: Convert the generated word indices into the input of the CNN through the word embedding layer, then add the pooling layer structure, and feed the data after the pooling operation into the BiLSTM network layer.
Step 4: Finally, add the fully connected layer and the classifier. In this experiment, data were divided into positive, negative and neutral, and evaluation indexes were calculated. This experiment evaluates the classification effect of the model from the prediction score and prediction accuracy.

A. The Data Set
Fig. 4. Data set example
Quora Question Pairs [14,15]: The data is from Quora and contains 2006 question pairs: 1256 neutral, 482 positive and 273 negative ones (Table 1).

TABLE I. DATASET DESCRIPTION
The data set         Sample classification   Sample size   Data size
Training set (75%)   positive                362           1504
Training set (75%)   negative                205           1504
Training set (75%)   neutral                 942           1504
Test set (25%)       positive                120           502
Test set (25%)       negative                68            502
Test set (25%)       neutral                 314           502

B. Model Parameters
This experiment compares the performance of different configurations for processing the short-text dataset. The accuracy was computed for three numbers of epochs (i.e., 8, 10 and 20), two values of the batch size (i.e., 32 and 64) [16,17], and the optimizers SGD and Adam (Table 2, the highest value is highlighted in red).
Accuracy refers to the proportion of correct predictions made by the model:
(8)
In formula (8), the predicted label of each sample is compared with the actual label of that sample [18], and N represents the size of the test set.
The Word2vec-CNN-BiLSTM model achieved an accuracy of 91.48% (Table 2 and Fig. 5).

TABLE II. WORD2VEC-CNN-BILSTM CONFIGURATIONS
Word embedding   Epochs   Batch size   Optimizer   Accuracy
Word2vec         8        32           Adam        81.18%
Word2vec         8        32           SGD         67.72%
Word2vec         8        64           Adam        82.49%
Word2vec         8        64           SGD         69.53%
Word2vec         10       32           Adam        84.55%
Word2vec         10       32           SGD         71.21%

CNN-BiLSTM 91.48%
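The accuracy metric used in the experiments, described in the text as the proportion of correct predictions over a test set of size N, can be sketched as follows. The function name `accuracy` and the toy label lists are assumptions made for illustration and are not taken from the paper's data.

```python
from typing import Sequence


def accuracy(predicted: Sequence[str], actual: Sequence[str]) -> float:
    """Return the fraction of samples whose predicted label equals the actual label."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if len(actual) == 0:
        raise ValueError("the test set must not be empty")
    correct = sum(1 for p, a in zip(predicted, actual) if p == a)
    return correct / len(actual)  # len(actual) plays the role of N


# Toy three-class example (made-up labels).
predicted = ["positive", "neutral", "neutral", "negative", "neutral"]
actual = ["positive", "neutral", "negative", "negative", "positive"]

score = accuracy(predicted, actual)
print(f"accuracy = {score:.2%}")
```

Because the three classes in the data set are unbalanced, accuracy alone can hide weak performance on the smaller classes, so per-class counts are worth inspecting alongside it.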
[2] Rehman, A.U.; Malik, A.K.; Raza, B.; Ali, W. A Hybrid CNN-LSTM Model for Improving Accuracy of Movie Reviews Sentiment Analysis. Multimed. Tools Appl. 2019, 78, 26597–26613.
[3] REN P J, CHEN Z M, REN Z C, et al. Leveraging contextual sentence relations for extractive summarization using a neural attention model[C]. Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2017: 95-104.
[4] Yoon, J.; Kim, H. Multi-Channel Lexicon Integrated CNN-BiLSTM Models for Sentiment Analysis. In Proceedings of the 29th Conference on Computational Linguistics and Speech Processing (ROCLING 2017), Taipei, Taiwan, 27–28 November 2017; pp. 244–253.
[5] Shen, Q.; Wang, Z.; Sun, Y. Sentiment analysis of movie reviews based on cnn-blstm. In International Conference on Intelligence Science; Springer: Berlin, Germany, 2017, pp. 164–171.
[6] Zheng, Z.; Huang, S.; Tu, Z.; DAI, X.-Y.; and CHEN, J. 2019. Dynamic past and future for neural machine translation. In EMNLP-IJCNLP.
[7] Srivastava, S.K.; Singh, S.K.; Suri, J.S. A healthcare text classification system and its performance evaluation: A source of better intelligence by characterizing healthcare text. In Cognitive Informatics, Computer Modelling, and Cognitive Science; Elsevier BV: Amsterdam, The Netherlands, 2020; pp. 319–369.
[8] Kang, M.; Ahn, J.; Lee, K. Opinion mining using ensemble text hidden Markov models for text classification. Expert Syst. Appl. 2018, 94, 218–227.
[9] Li, P.; Zhao, F.; Li, Y.; Zhu, Z. Law text classification using semi-supervised convolutional neural networks. In Proceedings of the 2018 Chinese Control and Decision Conference (CCDC), Institute of Electrical and Electronics Engineers (IEEE), Shenyang, China, 9–11 June 2018; pp. 309–313.
[10] Seguí, F.L.; Aguilar, R.A.E.; De Maeztu, G.; García-Altés, A.; Garcia-Cuyàs, F.; Walsh, S.; Castro, M.S.; Vidal-Alaball, J. Teleconsultations between Patients and Healthcare Professionals in Primary Care in Catalonia: The Evaluation of Text Classification Algorithms Using Supervised Machine Learning. Int. J. Environ. Res. Public Health 2020, 17, 1093.
[11] Yenter, A.; Verma, A. Deep CNN-LSTM with combined kernels from multiple branches for IMDb review sentiment analysis. In Proceedings of the 2017 IEEE 8th Annual Ubiquitous Computing, Electronics and Mobile Communication Conference (UEMCON), New York, NY, USA, 19–21 October 2017; pp. 540–546.
[12] Li Dong, Nan Yang, Wenhui Wang, Furu Wei, Xiaodong Liu, Yu Wang, Jianfeng Gao, Ming Zhou, and Hsiao-Wuen Hon. 2019. Unified language model pre-training for natural language understanding and generation. arXiv preprint arXiv:1905.03197.
[13] Radford, A.; Wu, J.; Child, R.; Luan, D.; Amodei, D.; and Sutskever, I. 2019. Language models are unsupervised multitask learners. OpenAI Blog.
[14] SHANKAR I, NIKHILD. First Quora dataset release: question pairs [EB/OL]. [2019-03-01] https://data.Quora.com/First-Quora-Dataset-Release-Question-Pairs.
[15] Jasmir, J.; Nurmaini, S.; Malik, R.F.; Abidin, D.Z. Text Classification of Cancer Clinical Trials Documents Using Deep Neural Network and Fine Grained Document Clustering. In Proceedings of the Sriwijaya International Conference on Information Technology and Its Applications (SICONIAN 2019), Palembang, Indonesia, 16 November 2019; Atlantis Press: Paris, France, 2020; pp. 396–404.
[16] She, X.; Zhang, D. Text Classification Based on Hybrid CNN-LSTM Hybrid Model. In Proceedings of the 2018 11th International Symposium on Computational Intelligence and Design (ISCID), Hangzhou, China, 8–9 December 2018; Volume 2, pp. 185–189.
[17] Mikolov, T.; Sutskever, I.; Chen, K.; Corrado, G.S.; Dean, J. Distributed representations of words and phrases and their compositionality. In Proceedings of the Advances in Neural Information Processing Systems, Tahoe, NV, USA, 5–10 December 2013; pp. 3111–3119.
[18] Yousfi, S.; Rhanoui, M.; Mikram, M. Comparative Study of CNN and RNN For Opinion Mining in Long Text. In Proceeding of the International Conference on Modern Intelligent Systems Concepts, Rabat, Morocco, 12–13 December 2018.