Sentiment Deeplearning 2016 Prerna
The weights on the edges are learned by back-propagation of errors through the layers of the neural network, based on the inter-connections and the non-linear functions. The labeled data is fed to the network a number of times (called epochs) for the network to learn the weight parameters until the error becomes negligible (in the ideal case). Generally, in experiments, training is run for a fixed number of epochs to reach a minimum error value when the network does not converge any further.

Figure 2: Bengio's Neural Network Model (Bengio et al., 2003)

4 Word Embeddings

Neural networks in NLP, unlike other traditional algorithms, do not take raw words as input, since the networks can understand only numbers and functions. Hence, words need to be transformed into feature vectors, in other words word embeddings (Mikolov et al., 2013), which capture the characteristics and semantics of the words if extracted properly. The word vectors can be learned.

Each word embedding may be of any dimensionality the user wishes. Higher dimensionality implies more information captured but, on the other hand, incurs higher computational expense, so a trade-off has to be struck between the two. Google has released pre-trained vectors trained on part of the Google News dataset (about 100 billion words)1, which can be used by researchers. The model contains 300-dimensional vectors for 3 million words and phrases. These can then be used as inputs to the neural networks for any NLP task.

1 available at https://code.google.com/archive/p/word2vec/

The quality of the word vectors is defined by how well the vectors separate dissimilar words while keeping similar ones close together. The closeness of word vectors is generally measured by cosine distance (Mikolov et al., 2013). For example, the embeddings of the words cat and dog are closer to that of horse than to that of play. In addition, certain directions in this induced vector space are observed to specialize towards certain semantic relationships (unlike one-hot encoding), such as gender, verb tense and even country-capital relationships between words (Mikolov et al., 2013). For example, the following relationships are captured by the pre-trained Google vectors:

(<word> denotes the corresponding word vector)

<king> − <queen> = <boy> − <girl>
<playing> − <played> = <trying> − <tried>
<Japan> − <Tokyo> = <Germany> − <Berlin>

Hence, this substantiates the utility of these vectors, i.e., word embeddings, as features for several canonical NLP prediction tasks like part-of-speech tagging, named entity recognition and sentiment classification (Collobert et al., 2011).

The window-level approach cannot be applied to macro-text-level tasks like sentiment classification, because the sentiment tag requires the whole sentence to be taken into consideration, whereas in the window-level approach only a portion of the sentence is considered at a time. For other NLP tasks as well, one word may depend on a word that does not fall within the pre-decided window. Hence, the sentence-level approach is a viable alternative, which takes the feature vectors of all the words in the input text as input (Collobert et al., 2011).

Figure 4 shows the overall structure of such a network, which takes the whole sentence as input. The sequence layer can have different structures to handle the sequence of words in the text. For sentence classification, since sentences can be of variable size, there is a pooling layer after the sequence layer, which produces a feature map of a fixed size. This is then fed into fully connected layers of neurons to produce a single tag for the whole sentence.
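The sentence-level pipeline described above (embedding lookup, a pooling layer that maps a variable-length sentence to a fixed-size feature map, then fully connected layers producing one tag) can be sketched as follows. This is a minimal illustration, not the architecture of any cited paper: the vocabulary, the random embeddings standing in for pre-trained vectors, and the untrained output weights are all invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy vocabulary; in practice the embeddings would come from
# a pre-trained model such as word2vec rather than random initialization.
vocab = {"the": 0, "movie": 1, "was": 2, "great": 3, "boring": 4}
emb_dim, n_classes = 8, 2
embeddings = rng.normal(size=(len(vocab), emb_dim))

# Weights of the fully connected output layer (these would be learned by
# back-propagation in a real system; here they stay untrained).
W = rng.normal(size=(emb_dim, n_classes))
b = np.zeros(n_classes)

def classify(sentence):
    # 1. Look up the embedding of every word in the (variable-length) sentence.
    vecs = np.stack([embeddings[vocab[w]] for w in sentence.split()])
    # 2. Pooling layer: max over the word axis yields a fixed-size feature
    #    map regardless of sentence length.
    pooled = vecs.max(axis=0)
    # 3. Fully connected layer + softmax produce one tag for the whole sentence.
    logits = pooled @ W + b
    return np.exp(logits) / np.exp(logits).sum()

print(classify("the movie was great").shape)  # (2,)
print(classify("boring").shape)               # (2,) -- any length works
```

The pooling step is what lets one network handle sentences of arbitrary length, which is exactly why it sits between the sequence layer and the fully connected layers.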
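The closeness criterion from earlier in this section (similar words have nearby vectors under cosine similarity) can be checked numerically. The four-dimensional vectors below are invented purely for illustration; real word2vec embeddings, such as the 300-dimensional Google News set, would be loaded from a pre-trained model instead.

```python
import numpy as np

# Toy embeddings, hand-made so that cat/dog/horse point in similar
# directions while play points elsewhere. Not real word2vec values.
vectors = {
    "cat":   np.array([0.9, 0.8, 0.1, 0.0]),
    "dog":   np.array([0.8, 0.9, 0.2, 0.0]),
    "horse": np.array([0.7, 0.6, 0.3, 0.1]),
    "play":  np.array([0.0, 0.1, 0.9, 0.8]),
}

def cosine_similarity(a, b):
    # Cosine of the angle between two vectors; 1.0 means identical direction.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

sim_cat_dog = cosine_similarity(vectors["cat"], vectors["dog"])
sim_cat_play = cosine_similarity(vectors["cat"], vectors["play"])
print(sim_cat_dog > sim_cat_play)  # True: cat is closer to dog than to play
```

With genuine pre-trained vectors, the same function also exposes the analogy offsets mentioned above, since relations like gender correspond to roughly constant difference vectors.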
Table 1: Results (accuracy values in %) of different machine learning approaches for sentiment classification on the Stanford Sentiment Treebank dataset
2. MPQA6: Opinion polarity detection subtask of the MPQA dataset (Wiebe et al., 2005)

(taken from papers cited alongside the classification models in the table).
Table 2: Results (accuracy values in %) of different machine learning approaches on the MR, MPQA, CR and Subj datasets
bank dataset (50.6%) – more than 8% performance gain over the traditional models. This clearly exhibits the efficacy of deep learning models for sentiment analysis.

Neural networks are versatile, owing to the fact that they avoid task-specific engineering and thereby require no prior knowledge about the data. Hence, hand-crafted features, which have to be carefully optimized to produce proper results with traditional models, are not necessary at all when using deep learning models. Although the widely popular traditional models like Naïve Bayes and SVM have proved useful for so long, the potential of the deep learning paradigm cannot be overlooked. In fact, the latter promises to perform much better than the former, with minimal constraints on the task or data for sentiment analysis.

References

Yoshua Bengio, Réjean Ducharme, Pascal Vincent, and Christian Janvin. 2003. A neural probabilistic language model. The Journal of Machine Learning Research, 3:1137–1155.

Adam L Berger, Vincent J Della Pietra, and Stephen A Della Pietra. 1996. A maximum entropy approach to natural language processing. Computational Linguistics, 22(1):39–71.

Konstantin Buschmeier, Philipp Cimiano, and Roman Klinger. 2014. An impact analysis of features in a classification approach to irony detection in product reviews. In Proceedings of the 5th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis, pages 42–49.

Stanley F Chen and Ronald Rosenfeld. 2000. A survey of smoothing techniques for ME models. IEEE Transactions on Speech and Audio Processing, 8(1):37–50.

Ronan Collobert, Jason Weston, Léon Bottou, Michael Karlen, Koray Kavukcuoglu, and Pavel Kuksa. 2011. Natural language processing (almost) from scratch. The Journal of Machine Learning Research, 12:2493–2537.

Corinna Cortes and Vladimir Vapnik. 1995. Support-vector networks. Machine Learning, 20(3):273–297.

Li Deng and Dong Yu. 2014. Deep learning: methods and applications. Foundations and Trends in Signal Processing, 7(3–4):197–387.

Andrea Esuli and Fabrizio Sebastiani. 2006. SentiWordNet: A publicly available lexical resource for opinion mining. In Proceedings of LREC, volume 6, pages 417–422.

Felix A Gers, Jürgen Schmidhuber, and Fred Cummins. 2000. Learning to forget: Continual prediction with LSTM. Neural Computation, 12(10):2451–2471.

Roberto González-Ibánez, Smaranda Muresan, and Nina Wacholder. 2011. Identifying sarcasm in Twitter: a closer look. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies: Short Papers-Volume 2, pages 581–586. Association for Computational Linguistics.

Sepp Hochreiter and Jürgen Schmidhuber. 1997. Long short-term memory. Neural Computation, 9(8):1735–1780.

Minqing Hu and Bing Liu. 2004. Mining and summarizing customer reviews. In Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 168–177. ACM.

Ozan Irsoy and Claire Cardie. 2014. Deep recursive neural networks for compositionality in language. In Advances in Neural Information Processing Systems, pages 2096–2104.

Edwin T Jaynes. 1957. Information theory and statistical mechanics. Physical Review, 106(4):620.

Thorsten Joachims. 1998. Text categorization with support vector machines: Learning with many relevant features. In European Conference on Machine Learning, pages 137–142. Springer.

Nal Kalchbrenner, Edward Grefenstette, and Phil Blunsom. 2014. A convolutional neural network for modelling sentences. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, pages 655–665.

Yoon Kim. 2014. Convolutional neural networks for sentence classification. In Proceedings of the 2014 Conference on EMNLP, pages 1746–1751.

Quoc V Le and Tomas Mikolov. 2014. Distributed representations of sentences and documents.

Andrew McCallum, Kamal Nigam, et al. 1998. A comparison of event models for naive Bayes text classification. In AAAI-98 Workshop on Learning for Text Categorization, volume 752, pages 41–48. Citeseer.

Bo Pang, Lillian Lee, and Shivakumar Vaithyanathan. 2002. Thumbs up?: sentiment classification using machine learning techniques. In Proceedings of the ACL-02 Conference on Empirical Methods in Natural Language Processing-Volume 10, pages 79–86.

Jason D Rennie, Lawrence Shih, Jaime Teevan, David R Karger, et al. 2003. Tackling the poor assumptions of naive Bayes text classifiers. In ICML, volume 3, pages 616–623. Washington, DC.

Stuart Jonathan Russell, Peter Norvig, John F Canny, Jitendra M Malik, and Douglas D Edwards. 2003. Artificial Intelligence: A Modern Approach, volume 2. Prentice Hall, Upper Saddle River.

Richard Socher, Jeffrey Pennington, Eric H Huang, Andrew Y Ng, and Christopher D Manning. 2011. Semi-supervised recursive autoencoders for predicting sentiment distributions. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, pages 151–161. Association for Computational Linguistics.

Richard Socher, Alex Perelygin, Jean Y Wu, Jason Chuang, Christopher D Manning, Andrew Y Ng, and Christopher Potts. 2013. Recursive deep models for semantic compositionality over a sentiment treebank. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), volume 1631, pages 1631–1642.

Kai Sheng Tai, Richard Socher, and Christopher D Manning. 2015. Improved semantic representations from tree-structured long short-term memory networks. arXiv preprint arXiv:1503.00075.

Sida Wang and Christopher D Manning. 2012. Baselines and bigrams: Simple, good sentiment and topic classification. In Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Short Papers-Volume 2, pages 90–94. Association for Computational Linguistics.