Disease Prediction and Early Intervention System Based On Symptom Similarity Analysis
Received November 13, 2019, accepted November 29, 2019, date of publication December 5, 2019,
date of current version December 23, 2019.
Digital Object Identifier 10.1109/ACCESS.2019.2957816
ABSTRACT With the development of computer technology, the digitization of medical data has become a reality. How to analyze these data thoroughly to predict a patient's disease and conduct early intervention has now become a major research direction. The patient's own expression of feelings is also an aspect that cannot be ignored, and doctors record the pathological characteristics of patients in the system. In this paper, we propose a sentence similarity model that carries out symptom similarity analysis to achieve elementary disease prediction and early intervention. The model makes use of word embedding and a convolutional neural network (CNN) to extract a sentence vector that contains keyword information about the patient's feelings and symptoms. In order to increase the accuracy of sentence similarity computation, the model integrates a syntactic tree and a neural network into the computation process. Our main innovation is the use of a symptom similarity analysis model for disease prediction and early intervention; the SPO (subject-predicate-object) kernel is another of our innovations. Finally, the results of experiments on the Microsoft Research paraphrase identification corpus (MSRP) indicate that our model achieves excellent performance, reaching 83.9% in terms of F1 and accuracy. Furthermore, we also conducted experiments on the data of the Semantic Textual Similarity (STS) task. The Pearson correlation coefficient indicates that our results are closer to the gold standard scores, which illustrates that the model can extract the key information of a sentence well and thereby realize disease prediction and early intervention.
INDEX TERMS Convolutional neural network, disease prediction, early intervention, symptom similarity analysis.
a fundamental subtask in NLP that is often used to solve larger tasks, a slow similarity model would slow down the whole task. Therefore, we should devise novel sentence representation models that shorten the running time, with the aim of solving real issues.

For the aforementioned problem, we propose an effective model based on CNN for symptom similarity analysis. The main contributions of this model can be summarized as follows:
1. Since the raw symptom sentence cannot be accepted by the model directly, it must be processed and encoded before being input into the model. First, in order to reduce the computational burden and improve model efficiency, we propose a kernel that extracts the main symptom information of the sentence as a preprocessing step. Second, word embedding technology is applied to map words into vectors, forming the vector representation of the patient's symptoms.

2. The task of our convolutional neural network is to learn the semantic and syntactic features from the input tensor. Finally, we use the Manhattan distance to compute the sentence similarity score (a sketch of this pipeline follows the list).

3. We used different datasets and evaluation criteria to evaluate the model, and obtained better results than related work, which supports the accuracy of patient disease prediction.
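As a concrete illustration of contributions 1 and 2, the following minimal sketch strings the steps together: SPO-preprocessed tokens are mapped to embeddings, a convolution with max-pooling produces a sentence vector, and two vectors are scored by Manhattan distance. The vocabulary, dimensions and the exp(-L1) score mapping are illustrative assumptions, not the authors' exact implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy embedding table standing in for pre-trained word vectors (assumption:
# the paper uses word embeddings; vocabulary and dimensions here are toy).
VOCAB = {"patient": 0, "has": 1, "fever": 2, "headache": 3, "reports": 4}
EMB = rng.normal(size=(len(VOCAB), 8))       # |V| x d embedding table
FILTERS = rng.normal(size=(4, 3 * 8))        # 4 filters over 3-word windows

def encode(tokens):
    """SPO-preprocessed tokens -> sentence vector via convolution + max-pool."""
    mat = np.stack([EMB[VOCAB[t]] for t in tokens])            # n x d
    windows = [mat[i:i + 3].ravel() for i in range(len(mat) - 2)]
    feats = np.stack([FILTERS @ w for w in windows])           # (n-2) x 4
    return np.tanh(feats).max(axis=0)                          # max-pool over time

def similarity(v1, v2):
    """Manhattan-distance similarity in (0, 1]; exp(-L1) is an assumption."""
    return float(np.exp(-np.abs(v1 - v2).sum()))

s1 = ["patient", "has", "fever", "headache"]   # already SPO-filtered (toy)
s2 = ["patient", "reports", "fever"]
print(similarity(encode(s1), encode(s2)))
```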
The next section discusses related work. Section 3 describes the proposed method for measuring sentence similarity. In Section 4, we present and discuss the experimental evaluation of our approach. We conclude our work in Section 5.
II. RELATED WORKS
In this paper, we propose a disease prediction model based on symptom similarity analysis. The model uses sentence similarity analysis technology to analyze the similarity of patients' symptoms; therefore, the accuracy of the sentence similarity analysis determines the accuracy of the model to a large extent. Sentence similarity analysis is a basic task of natural language processing, and many researchers have put forward various models to deal with it. At present, sentence similarity computation methods can be roughly categorized into four types: models based on word matching, models based on syntactic analysis, models based on semantic analysis, and models based on neural networks. The proposed model differs from the aforementioned models in some respects. In the following subsections, we analyze each category in turn.
A. THE COMPUTATION METHODS OF SENTENCE SIMILARITY BASED ON EDIT DISTANCE
Sentence similarity computation based on edit distance analyzes sentences at the surface word level; the disadvantage of this approach is its lack of semantic understanding of sentences, which is the main reason for its low accuracy. Lu et al. [4] proposed a method based on word2vec and edit distance, which solved the aforesaid problem to some extent: they used word2vec to obtain word vectors and calculated the edit distance using the vector distance as the weight of the replacement operation. The authors in [5] proposed a model based on improved chunking edit-distance to analyze sentence similarity. It divides the sentence into NP (noun chunk), VP (verb chunk) and ADJP (adjective chunk) units and computes the similarity between the corresponding chunks in the sentence pairs; when calculating the sentence edit distance, the higher the similarity between chunks, the lower the replacement cost. Because of the introduction of chunks, this edit-distance method makes a further step toward extracting sentence semantics. Liu et al. [6] proposed a model based on edit distance and dependency syntax. The model divides sentences into two layers: the first layer uses a semantic dictionary to calculate the distance, while the second layer covers the components dominated by the predicate central word of the sentence, whose similarity is calculated by edit distance; finally, the two results are combined into the sentence similarity. In all three methods above, a weight factor is taken into account in the edit distance calculation in order to extract the semantic information of sentences (the sketch below illustrates the word2vec-weighted variant of [4]).
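To make the idea of [4] concrete, here is a minimal sketch of an edit distance whose substitution cost is derived from word-vector distance. The toy vectors, the min(1, .) squashing and the unit insert/delete costs are our assumptions, not details taken from [4].

```python
import numpy as np

# Toy word vectors standing in for word2vec output (assumption).
VEC = {
    "fever":    np.array([1.0, 0.2]),
    "headache": np.array([0.1, 1.0]),
    "high":     np.array([0.9, 0.1]),
    "severe":   np.array([0.8, 0.3]),
}

def sub_cost(a, b):
    """Replacement weight from vector distance, squashed into [0, 1]."""
    if a == b:
        return 0.0
    d = np.linalg.norm(VEC[a] - VEC[b])
    return min(1.0, d)          # similar words are cheap to substitute

def weighted_edit_distance(s, t):
    """Standard DP edit distance with vector-weighted substitution."""
    m, n = len(s), len(t)
    dp = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = float(i)
    for j in range(n + 1):
        dp[0][j] = float(j)
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            dp[i][j] = min(dp[i - 1][j] + 1,            # delete
                           dp[i][j - 1] + 1,            # insert
                           dp[i - 1][j - 1] + sub_cost(s[i - 1], t[j - 1]))
    return dp[m][n]

print(weighted_edit_distance(["high", "fever"], ["severe", "fever"]))
```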
However, because edit distance only includes replace, delete and insert operations, the sentence meaning these models can extract is limited. This is one of the reasons why we introduce parsing trees in our model, which makes it possible to consider more sentence components.
B. THE COMPUTATION METHODS OF SENTENCE SIMILARITY BASED ON SYNTACTIC ANALYSIS
The sentence trunk can reflect the meaning of a sentence; therefore, extracting the sentence trunk requires the use of syntactic relations. Most research focuses on the dependency between adjacent words. In a complex sentence, however, the main elements are not adjacent, which requires us to consider the syntactic relationships when analyzing sentence similarity. The authors of [7] proposed a model based on syntax and modifiers, which extracts the subject, predicate, object and prepositions of a sentence as its trunk to calculate the similarity. In addition, modifiers in the sentences are added to the trunk, which brings the judgment result closer to human judgment. The authors in [8] proposed a purely syntactic approach for searching for similarities within sentences; a position filter and a counting filter are introduced into the model, which ensures the fast discarding of mismatched sequences. For QA systems, Qiu et al. [9] proposed a weighted syntactic analysis model, which contains 23 dependency elements, each of which is assigned a different weight according to its influence on the meaning of a sentence.

Like our model, all three of these models consider the subject, predicate and object of a sentence. Compared with [7], the authors of [9] gave a more detailed consideration. In our model, in addition to considering syntax, we also incorporate a convolutional neural network into the extraction of sentence features. A sketch of trunk extraction from a dependency parse follows.
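The following sketch shows the kind of subject-predicate-object trunk extraction such models perform. It operates on pre-parsed (index, word, dependency, head) tuples rather than calling a specific parser; the toy parse and the Universal-Dependencies-style labels are assumptions, since the paper itself relies on the Stanford parser.

```python
# Each token: (index, word, dependency_label, head_index); this toy parse
# stands in for real parser output (assumption: UD-style labels).
PARSE = [
    (0, "patient",  "nsubj",  2),
    (1, "often",    "advmod", 2),
    (2, "reports",  "root",  -1),
    (3, "a",        "det",    5),
    (4, "severe",   "amod",   5),
    (5, "headache", "obj",    2),
]

def spo_trunk(parse):
    """Keep only subject, predicate (root verb) and object tokens."""
    keep = {"nsubj", "root", "obj", "dobj"}   # 'dobj' for older tag sets
    return [word for _, word, dep, _ in parse if dep in keep]

print(spo_trunk(PARSE))   # ['patient', 'reports', 'headache']
```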
C. THE COMPUTATION METHODS OF SENTENCE SIMILARITY BASED ON SEMANTIC ANALYSIS
The semantic characteristics [10] of a sentence play an important role in sentence similarity computation; therefore, scholars at home and abroad use semantic analysis to improve the accuracy of sentence similarity computation. Li et al. [11] presented an algorithm that takes account of the semantic information and word order information implied in sentences to further improve the accuracy of sentence similarity calculation. They constructed the word vectors dynamically, based entirely on the words in the compared sentences rather than on a high-dimensional space. The semantic similarity of two sentences is calculated using information from a structured lexical database and from corpus statistics. The use of corpora enables their model to capture common language knowledge, and the model can also be applied to different domains through corpus switching. Syntactic information is obtained through a deep parsing process that finds the phrases in each sentence. The authors of [12] leveraged semantic dependency relationship analysis to measure the degree of sentence similarity, incorporating the semantic level and the dependency-syntactic level into the process of sentence similarity computation, and obtained satisfactory experimental results. The literature [13] presented a method for measuring the semantic similarity between texts using a corpus-based measure of word semantic similarity, and introduced a normalized and modified version of the longest common subsequence (LCS) string matching algorithm. The authors in [14] presented a novel sentence similarity measure based on semantics, dividing the model into two subsystems: semantic quantification and semantic extraction. The main function of the first subsystem is to preprocess sentences and determine the values of vectors using the WordNet semantic tree; the task of the semantic extraction subsystem is to calculate the sentence similarity score via the semantic distance evaluated in the first subsystem. The authors in [15] added an offset inference algorithm to the anchored packed tree model (APT), which takes distributional composition to be a process of lexeme contextualisation, to infer the exact information in sentences. The central innovation in this compositional theory is that the APT's type structure enables the precise alignment of the semantic representations of the lexemes being composed. The authors showed its effectiveness for semantic composition on two benchmark phrase similarity tasks, where the model achieved state-of-the-art performance. The authors in [16] proposed a method to acquire semantic feature vectors through a la carte embedding, which learns semantic features using corpus context. It effectively solves the problem that a set of word vectors shares a common direction. Although removing stop words or removing the top one or top few principal components is also used to solve this problem, the results are much worse than those of the proposed model. Finally, the authors illustrated the superiority of their model on the contextual rare words dataset through the Spearman coefficient between human scores and the cosine similarity between word vectors.

In the work [17], the authors put forward a brand-new computational semantic similarity model named Direct Network Transfer (DNT). In previous work, semantic similarity analysis mainly included two types of methods: the first encodes each sentence into a fixed-length vector and then computes the similarity between the vectors, while the second jointly takes both sentences as input, using the interaction between them to produce the similarity score. In the DNT model, the cosine similarity of sentence embedding pairs is used directly in the loss function during transfer learning (a sketch of this loss follows). Experiments showed that the DNT model performs well on related semantic similarity tasks.
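A minimal sketch of the DNT idea as described above: the cosine of a sentence-embedding pair enters the training loss directly. Pairing it with a mean-squared error against gold STS scores rescaled to [-1, 1] is our assumption about the loss form, not a detail confirmed by [17].

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def dnt_loss(emb_a, emb_b, gold_0_to_5):
    """Cosine of the embedding pair enters the loss directly (as in DNT);
    rescaling gold scores to [-1, 1] and using MSE is an assumption."""
    target = gold_0_to_5 / 5.0 * 2.0 - 1.0
    return (cosine(emb_a, emb_b) - target) ** 2

a, b = np.array([0.2, 0.9, 0.1]), np.array([0.3, 0.8, 0.0])
print(dnt_loss(a, b, gold_0_to_5=4.0))
```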
Semantics is closely related to syntax and word order. Compared with the syntactic structure of a sentence, word order has a greater impact on its meaning. In our model, the extraction of syntax is accomplished by the Stanford parser. For the extraction of word meaning, we pay more attention to using a convolutional neural network to learn word meanings, with the aim of obtaining a sentence vector containing richer sentence features [18].
D. THE COMPUTATION METHODS OF SENTENCE SIMILARITY BASED ON NEURAL NETWORK
The traditional neural methods of sentence similarity computation include methods based on convolutional neural networks (CNN), methods based on recurrent neural networks (RNN), and so on. The authors of [19] proposed an enhanced CNN method with the aim of extracting more effective sentence features; it allows the filters to convolve not only adjacent words in sentences but also skipped words. This sliding convolution makes it possible to extract the trunk of a sentence, although not very precisely. The authors in [20] proposed a novel RNN-based method to capture both textual similarity and semantic similarity between two sentences. Kim [21] proposed a convolutional neural network model containing multiple convolution kernels to extract sentence features through repeated convolution operations. This method, which increases the number of channels in the CNN, performs well in sentence feature extraction, but it entails complex computation. In this model, convolution kernels with different window sizes are used to convolve the sentence matrix several times and extract its elements (a sketch follows).
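A minimal sketch of the multi-window-size convolution described for [21]: kernels with window sizes 2, 3 and 4 each convolve the sentence matrix and are max-pooled over time, and the results are concatenated into one feature vector. All shapes, filter counts and the tanh nonlinearity are toy assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 8                                    # toy embedding dimension
sentence = rng.normal(size=(10, d))      # 10 words x d sentence matrix

def conv_max_pool(mat, window, n_filters, rng):
    """One channel: convolve `window`-word spans, then max-pool over time."""
    filt = rng.normal(size=(n_filters, window * d))
    spans = [mat[i:i + window].ravel() for i in range(len(mat) - window + 1)]
    feats = np.tanh(np.stack([filt @ s for s in spans]))
    return feats.max(axis=0)             # n_filters values per window size

# Kernels with window sizes 2, 3 and 4, concatenated (Kim-style, toy sizes).
vec = np.concatenate([conv_max_pool(sentence, w, 4, rng) for w in (2, 3, 4)])
print(vec.shape)                         # (12,) = 3 window sizes x 4 filters
```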
Pontes et al. [22] presented a method of analyzing sentence similarity based on CNN and LSTM. This model uses the CNN to investigate every word in the sentence and obtain its local context, and uses the LSTM to examine every word and obtain the general context information of the sentence. Unlike other neural network models, this model considers both general and local information, which makes its extraction of sentence features more sufficient. The authors in [23] proposed an RNN model incorporating an attention mechanism to assess the similarity of sentence pairs. In this model, the attention mechanism improves the …
2) EVALUATION METRICS
In this experiment, we obtained the values of four parameters of the model: true positives (TP), false positives (FP), true negatives (TN) and false negatives (FN). These parameters are then used to compute the evaluation criteria of the model: F1, accuracy, recall and precision. The calculation formulas are as follows:
$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}, \tag{13}$$

$$\mathrm{Precision} = \frac{TP}{TP + FP}, \tag{14}$$

$$\mathrm{Recall} = \frac{TP}{TP + FN}, \tag{15}$$

$$F1 = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}, \tag{16}$$
where TP means that the actual result is positive and the predicted result is positive, TN means that the actual result is negative and the predicted result is negative, FP means that the actual result is negative and the predicted result is positive, and FN means that the actual result is positive and the predicted result is negative.
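A short sketch that computes TP/TN/FP/FN and Eqs. (13)-(16) from binary labels, matching the definitions above; the sample labels are toy values.

```python
def classification_metrics(y_true, y_pred):
    """Compute TP/FP/TN/FN and Eqs. (13)-(16) for binary labels."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    accuracy = (tp + tn) / (tp + tn + fp + fn)      # Eq. (13)
    precision = tp / (tp + fp)                      # Eq. (14)
    recall = tp / (tp + fn)                         # Eq. (15)
    f1 = 2 * precision * recall / (precision + recall)  # Eq. (16)
    return accuracy, precision, recall, f1

print(classification_metrics([1, 0, 1, 1, 0], [1, 0, 0, 1, 1]))
```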
For the STS dataset, we used the similarity scores calculated by the model and the gold standard given by the official organizers to analyze the degree of correlation using the Pearson correlation, and compared it with similar work.
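For the STS evaluation, the Pearson correlation can be computed directly from its definition, as in the sketch below; the gold and predicted scores shown are toy values.

```python
import math

def pearson(xs, ys):
    """Pearson correlation between predicted and gold similarity scores."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

gold = [4.8, 3.2, 0.5, 2.9]        # toy gold standard scores (0-5 scale)
pred = [4.5, 3.0, 1.0, 3.1]        # toy model scores
print(pearson(pred, gold))
```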
B. PERFORMANCE EVALUATION
Experiment 1: We utilized the MSRP dataset to train and test the model in order to verify its performance. Compared with the traditional model based on a convolutional neural network, we incorporated the syntactic dependency tree into the model, which allows it to extract the sentence trunk. The trunk not only contains the main information of the sentence but also effectively reduces the computation of the model. Therefore, our model outperforms the other models, as shown in FIGURE 4, where the vertical axis represents the different models and the horizontal axis represents F1 and accuracy. We not only compare the results of the different models in FIGURE 4, but also analyze the merits and defects of each model as follows.

FIGURE 4. Comparison of different models.
Hu et al. [32] proposed a model based on convolutional neural networks for modeling sentences, which embeds the words of a sentence sequence and extracts sentence features by convolution and aggregation. Like most convolution models, theirs builds a large feature map using convolution units with shared weights and local receptive fields to fully capture the rich structures in word combinations.

Rus et al. [33] used a novel method to judge the semantic similarity between sentences, which assumes that the meaning of a sentence can be captured by its syntactic components and their dependencies; they refer to these as the feature blocks of a sentence. According to the algorithm, if a suitable mapping can be found between the feature blocks of two sentences and the words have the same dependencies, the sentence similarity score is assumed to be higher. Furthermore, sentence meaning is also affected by the word content in the feature blocks.

Fernando and Stevenson [34] made efficient use of word similarity information obtained from WordNet, grounded in cognitive linguistics, and proposed a matrix similarity method to analyze text similarity, which measures the similarity of two text segments using semantic similarity. Unlike Rus, Blacoe considers the semantic impact of all the words in a sentence, using a distributional method to model sentences through different combinations of phrases and sentences. Although this method is very flexible in extracting sentence features, it requires access to a very large corpus.
Experiment 2: In this experiment, we used the Pearson correlation between the prediction scores and the gold standard scores as the evaluation criterion. The gold standard contains a score between 0 and 5 for each pair of sentences, with the following interpretation: (0) the two sentences are on different topics; (1) the two sentences are not equivalent, but are on the same topic; (2) the two sentences are not equivalent, but share some details; (3) the two sentences are roughly equivalent, but some important information differs or is missing; (4) the two sentences are mostly equivalent, but some unimportant details differ; (5) the two sentences are completely equivalent, as they mean the same thing. Furthermore, we took the experimental results of some previous models and compared them with our results, which makes our data more convincing. We selected four models for comparison with our proposed model. Among them, RNN [35] and LSTM [36] are models that use neural networks to process sentence features. RNN and LSTM are good at calculating the similarity of long sentences, whereas a characteristic of the sentences here is that they have a complete subject, predicate and object and are relatively short. The Stanford parser can analyze the subject, predicate and object of such sentences well to construct a sentence vector. Based on the above discussion, our model is more suitable for processing sentences with complete syntactic structure.

TABLE 2. The comparison results: the bold number highlights the strongest result in each dataset.
Obviously, we can see that Hu's method performs worse than the other three methods, baseline aside. The reason is that the Hu model only considers the influence of words on sentences, without adding syntactic features. Rus preprocessed sentences into feature blocks carrying syntax and dependency information, which yields a small advance in F1 and accuracy. The persistence of our model's performance was commendably verified in Experiment 2. In addition, as can be seen from Experiment 3, different combinations of syntactic components extract sentence features to different degrees. In this paper, the SPO model is adopted, improving F1 by four percentage points. Meanwhile, the experimental results of the SPO model are better than those of the other models, with an accuracy of 75.8%, which shows that the SPO model can capture sentence meaning better.
V. CONCLUSION
In this paper, we propose a patient symptom similarity analysis model to achieve initial prediction and early intervention of disease. The model uses a convolutional neural network to extract the main information, including the symptoms and feelings of the patient, from the patient's description sentences in order to construct the sentence vector, and a pooling layer to reduce the dimension of the sentence representation. The main innovation of this paper lies in the preprocessing of sentences and the calculation of the similarity score. First, we utilize the SPO model to extract symptom information as the input of the neural network, which is of great significance for reducing the computational burden of the model and effectively extracting the key pathological features. Second, we use the Manhattan distance formula to process the sentence vectors output by the model and obtain the most similar disease prediction results. To validate our model, we developed three experiments: the first experiment shows that our model performs better than the other four models in terms of accuracy and F1; the second experiment illustrates that our model can maintain excellent performance continuously; and the third experiment demonstrates the influence of different syntactic combinations on sentence similarity calculation.

In future work, we will extract keywords according to the domain context to carry out more accurate sentence similarity calculation.

ACKNOWLEDGMENT
The authors would like to thank the reviewers for their helpful comments and suggestions, which have improved the presentation.
REFERENCES
[1] R. Socher, A. Karpathy, Q. V. Le, C. D. Manning, and A. Y. Ng. (2013). Grounded Compositional Semantics for Finding and Describing Images With Sentences. [Online]. Available: https://nlp.stanford.edu/
[2] Q. Xiaoyun, K. Xiaoning, Z. Chao, J. Shuai, and M. Xiuda, "Short-term prediction of wind power based on deep long short-term memory," in Proc. IEEE PES Asia-Pacific Power Energy Eng. Conf. (APPEEC), Oct. 2016, pp. 1148-1152.
[3] A. Thyagarajan, "Siamese recurrent architectures for learning sentence similarity," in Proc. 30th AAAI Conf. Artif. Intell., Nov. 2015, pp. 2-8.
[4] L. Y, "A method of sentence similarity calculation based on word2vector and edit distance," Comput. Knowl. Technol., vol. 8, no. 1, pp. 1-4, 2018.
[5] H. Zhang, Z. Yu, L. Shen, J. Guo, and X. Hong, "Naxi sentence similarity calculation based on improved chunking edit-distance," Int. J. Wireless Mobile Comput., vol. 7, no. 1, pp. 48-53, 2014.
[6] Z. J. B. Liu and H. Lin, "Chinese sentence similarity computing based on improved edit-distance and dependency grammar," Comput. Appl. Softw., vol. 25, no. 7, pp. 33-35, 2008.
[7] H. Deng, X. Zhu, Q. Li, and Q. Peng, "Sentence similarity calculation based on syntactic structure and modifier," Comput. Eng., vol. 43, no. 9, pp. 240-244, Sep. 2017.
[8] T. P. F. Mandreoli and R. Martoglia, "A syntactic approach for searching similarities within sentences," in Proc. 11th Int. Conf. Inf., 2002, pp. 1-4.
[9] C. C. G. Qiu and J. Bu, "Syntactic impact on sentence similarity measure in archive-based QA system," in Proc. Conf. Adv. Knowl. Discovery, 2007, pp. 2-3.
[10] J. Zwarts and Y. Winter, "Vector space semantics: A model-theoretic analysis of locative prepositions," J. Log. Lang. Inf., vol. 9, no. 2, pp. 169-211, 2000.
[11] Y. Li, D. Mclean, Z. A. Bandar, D. O. James, and K. Crockett, "Sentence similarity based on semantic nets and corpus statistics," IEEE Trans. Knowl. Data Eng., vol. 18, no. 8, pp. 1138-1150, Aug. 2006.
[12] B. Li, T. Liu, and B. Qin, "Chinese sentence similarity computing based on semantic dependency relationship analysis," Appl. Res. Comput., vol. 20, no. 12, pp. 15-17, 2003.
[13] A. Islam and D. Inkpen, "Semantic text similarity using corpus-based word similarity and string similarity," ACM Trans. Knowl. Discovery Data, vol. 2, no. 2, pp. 1-25, 2008.
[14] C. L. Ming, "A novel sentence similarity measure for semantic-based expert systems," Expert Syst. Appl., vol. 38, no. 5, pp. 6392-6399, 2011.
[15] T. Kober, J. Weeds, J. Reffin, and D. Weir, "Improving semantic composition with offset inference," in Proc. Comput. Res. Repository, 2017, pp. 1-9.
[16] M. Khodak, N. Saunshi, Y. Liang, T. Ma, B. Stewart, and S. Arora, "A la carte embedding: Cheap but effective induction of semantic feature vectors," in Proc. 56th Annu. Meeting Assoc. Comput. Linguistics, Melbourne, Australia, vol. 1, Jul. 2018, pp. 12-22.
[17] R. M. Li zhang and R. S. Wilson, "Transfer learning of sentence embedding for semantic similarity," Assoc. Comput. Linguistics, Stroudsburg, PA, USA, Tech. Rep., 2018.
[18] W. Lan and X. Wei, "Neural network models for paraphrase identification, semantic textual similarity, natural language inference, and question answering," in Proc. COLING, 2018, pp. 1-13.
[19] X. G. J. Guo and B. Yue, "An enhanced convolutional neural network model for answer selection," in Proc. Int. World Wide Web Conf. Steering Committee, 2017, vol. 7, no. 1, pp. 48-53.
[20] B. Ye, G. Feng, A. Cui, and M. Li, "Learning question similarity with recurrent neural networks," in Proc. IEEE Int. Conf. Big Knowl., Aug. 2017, pp. 111-118.
[21] Y. Kim, "Convolutional neural networks for sentence classification," 2014, arXiv:1408.5882. [Online]. Available: https://arxiv.org/abs/1408.5882
[22] E. L. Pontes, S. Huet, A. C. Linhares, and J.-M. Torres-Moreno, "Predicting the semantic textual similarity with Siamese CNN and LSTM," in Proc. Comput. Res. Repository, 2018, pp. 311-319.
[23] W. Zhuang and E. Chang, "Neobility at SemEval-2017 task 1: An attention-based sentence similarity model," CoRR, vol. abs/1703.05465, 2017. [Online]. Available: http://arxiv.org/abs/1703.05465
[24] D. Chen and C. D. Manning, "A fast and accurate dependency parser using neural networks," in Proc. Conf. Empirical Methods Natural Lang. Process., 2014, pp. 740-750.
[25] I. J. Goodfellow, M. Mirza, A. Courville, and Y. Bengio, "Multi-prediction deep Boltzmann machines," in Proc. Int. Conf. Neural Inf. Process. Syst., 2013, pp. 548-556.
[26] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet classification with deep convolutional neural networks," in Proc. Int. Conf. Neural Inf. Process. Syst., 2012, pp. 1097-1105.
[27] N. Kalchbrenner, E. Grefenstette, and P. Blunsom, "A convolutional neural network for modelling sentences," 2014, arXiv:1404.2188. [Online]. Available: https://arxiv.org/abs/1404.2188
[28] G. Adam, K. Asimakis, C. Bouras, and V. Poulopoulos, An Efficient Mechanism for Stemming and Tagging: The Case of Greek Language. Berlin, Germany: Springer, 2010.
[29] L. Bottou, F. E. Curtis, and J. Nocedal, "Optimization methods for large-scale machine learning," SIAM Rev., pp. 223-311, 2016.
[30] J. Gu, Z. Wang, J. Kuen, L. Ma, A. Shahroudy, B. Shuai, T. Liu, X. Wang, and G. Wang, "Recent advances in convolutional neural networks," 2015, arXiv:1512.07108. [Online]. Available: http://arxiv.org/abs/1512.07108
[31] Z. Lu and H. Li, "A deep architecture for matching short texts," in Proc. Int. Conf. Neural Inf. Process. Syst., 2013, pp. 2-6.
[32] B. Hu, Z. Lu, L. Hang, and Q. Chen, "Convolutional neural network architectures for matching natural language sentences," in Proc. Int. Conf. Neural Inf. Process. Syst., 2014, pp. 2042-2050.
[33] V. Rus, A. C. Graesser, C. Moldovan, and N. B. Niraula, "Automatic discovery of speech act categories in educational games," in Proc. Int. Educ. Data Mining Soc., 2012, p. 8.
[34] S. Fernando and M. Stevenson, "A semantic similarity approach to paraphrase detection," in Proc. Comput. Linguistics U.K. Annu. Res. Colloq., 2008, pp. 45-52.
[35] J. Wieting, M. Bansal, K. Gimpel, and K. Livescu, "Towards universal paraphrastic sentence embeddings," in Proc. ICLR, Nov. 2015.
[36] F. A. Gers, N. N. Schraudolph, and J. Schmidhuber, "Learning precise timing with LSTM recurrent networks," J. Mach. Learn. Res., vol. 3, no. 1, pp. 115-143, 2002.
[37] T. Kenter, A. Borisov, and M. Rijke, "Siamese CBOW: Optimizing word embeddings for sentence representations," in Proc. 54th Annu. Meeting Assoc. Comput. Linguistics, Berlin, Germany, Aug. 2016, pp. 941-951, doi: 10.18653/v1/P16-1089.
[38] J. Wieting, M. Bansal, K. Gimpel, and K. Livescu, "From paraphrase database to compositional paraphrase model and back," Comput. Sci., vol. 3, no. 2, pp. 345-358, 2015.

XINGZHE HUANG is currently pursuing the master's degree with the College of Computer Science and Technology, China University of Petroleum (East China). His research interests include artificial intelligence and natural language processing.