Liviu P. Dinu, Iulia Iuga (2012). The Naive Bayes Classifier in Opinion Mining - in Search of The Best Feature
1 Introduction
During the last decade, data text mining [15] has received a lot of attention, due
to the explosion of available data (over 80% of information is stored as text).
Typical text mining tasks include text categorization and text clustering [4],
humor characterization [9], text coherence investigation [3], and opinion mining
and sentiment analysis [12], [8].
This paper focuses on how naive Bayes classifiers work in the opinion mining
field, which has received a boost as on-line social media (blogs [2], social
networks [10], etc.) have risen and the interest in quickly determining general
opinions on certain topics has increased.
Given a set of subjective texts that express opinions about a certain object,
the purpose is to extract those attributes (features) of the object that have
been commented on in the given texts and to determine whether these texts are
positive, negative or neutral.
A couple of interesting applications are sentiment analysis
tools for Twitter status updates (http://www.tweetfeel.com/ and
http://twittersentiment.appspot.com/ are relevant examples, but not
the only ones) or the analysis of short comments on film reviews [16], [11].
1.1 Preliminaries
In the "bag-of-words" model [7], we begin by making the simplifying assumption
that a text can be represented as a collection of words in which grammar rules
are negligible and even the word order is unimportant.
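Under this assumption, a document reduces to an unordered presence map of its words. A minimal sketch of this representation (the word-to-True dictionary convention matches the feature format NLTK's classifiers accept; the lowercase whitespace tokenization is a simplification assumed here for illustration):

```python
def bag_of_words(text):
    """Represent a text as an unordered collection of its words.

    Grammar and word order are discarded; only word presence remains.
    Tokenization is a naive lowercase whitespace split, for illustration.
    """
    return {word: True for word in text.lower().split()}

# Repetition and order disappear: "a b a" and "b a a" map to the same features.
features = bag_of_words("An outstanding film with an outstanding cast")
```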
A. Gelbukh (Ed.): CICLing 2012, Part I, LNCS 7181, pp. 556–567, 2012.
© Springer-Verlag Berlin Heidelberg 2012
Bayes classifiers [6] assign the most likely class to a given example described
by its feature vector. Training such classifiers can be significantly simplified by
assuming that the features are independent given the class, that is:

\( P(X \mid C) = \prod_{i=1}^{n} P(X_i \mid C) \)    (1)
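Equation (1) can be illustrated with a small self-contained sketch. The word counts and priors below are invented toy values, not figures from the paper's corpus; Laplace smoothing and log-space arithmetic are standard implementation choices assumed here:

```python
import math

# Toy per-class word counts, assumed purely for illustration.
counts = {
    "pos": {"outstanding": 8, "fascination": 4, "ludicrous": 1},
    "neg": {"outstanding": 1, "fascination": 1, "ludicrous": 9},
}
priors = {"pos": 0.5, "neg": 0.5}
vocab = {w for c in counts.values() for w in c}

def log_likelihood(word, cls):
    # Laplace (add-one) smoothing avoids zero probability for unseen words.
    total = sum(counts[cls].values())
    return math.log((counts[cls].get(word, 0) + 1) / (total + len(vocab)))

def classify(words):
    # Equation (1): score each class by log P(C) + sum_i log P(x_i | C);
    # the product is computed in log space for numerical stability.
    scores = {
        cls: math.log(priors[cls]) + sum(log_likelihood(w, cls) for w in words)
        for cls in counts
    }
    return max(scores, key=scores.get)
```

Under these toy counts, `classify(["outstanding", "fascination"])` yields "pos" and `classify(["ludicrous"])` yields "neg".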
2 Tests
In this section we will present the results obtained when training and testing
naive Bayes classifiers on ten different feature sets. Each feature set will be
discussed in the following.
The data set used for training the naive Bayes classifiers was Polarity Dataset
v2.0 [12]. It consists of 1000 positive movie reviews and 1000 negative ones (we
can assume that the classifiers receive equal numbers of positive and negative
examples when trained). This corpus is included in NLTK (the Natural Language
ToolKit, a tool that we used when programming these tests) under the name
movie_reviews.
For the testing data we used Polarity Dataset v1.0 [12], which consists of 700
positive movie reviews and 700 negative ones.
Both databases can be found and downloaded at the following link:
http://www.cs.cornell.edu/people/pabo/movie-review-data/.
1. Test no. 1: The first test was run considering all the words as features.
For this test, the most informative features were:
avoids = True pos : neg = 13.0 : 1.0
astounding = True pos : neg = 12.3 : 1.0
slip = True pos : neg = 11.7 : 1.0
outstanding = True pos : neg = 11.5 : 1.0
ludicrous = True neg : pos = 11.0 : 1.0
fascination = True pos : neg = 11.0 : 1.0
3000 = True neg : pos = 11.0 : 1.0
insulting = True neg : pos = 11.0 : 1.0
sucks = True neg : pos = 10.6 : 1.0
hudson = True neg : pos = 10.3 : 1.0
2. Test no. 2: For the second test, we eliminated the stopwords from the texts,
expecting that they do not carry much weight in the subjectivity department. It
turned out that it made no difference whether or not we filtered the stopwords.
Same as before, the most informative features were identical to those from Test no. 1.
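The filtering step of Test no. 2 can be sketched as follows; the stopword list below is a small illustrative subset, not the full list (the actual experiments would use a complete one such as the list shipped with NLTK):

```python
# Tiny illustrative stopword list; a real run would use a complete one.
STOPWORDS = {"the", "a", "an", "of", "is", "it", "and", "to", "with"}

def content_words(text):
    """Drop stopwords, keeping only the (presumably subjective) content words."""
    return [w for w in text.lower().split() if w not in STOPWORDS]
```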
3. Test no. 3: For this test, we applied a stemmer to the words, trying to find
out whether the roots of the words alone would be sufficient to obtain the
information we are looking for. According to this test, the different forms of
the words are relevant when expressing opinions.
4. Test no. 4: For the next test, we took into consideration the bigrams (pairs
of adjacent words) from the texts, in addition to all the words. It appears that
word collocations from the text help determine polarity.
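The bigram features of Test no. 4 are simply adjacent word pairs added on top of the single words; a minimal sketch, assuming a token list as input:

```python
def words_and_bigrams(tokens):
    """Feature dict over all single words plus all adjacent word pairs."""
    features = {tok: True for tok in tokens}
    # Zip the token list against itself shifted by one to get adjacent pairs.
    features.update({pair: True for pair in zip(tokens, tokens[1:])})
    return features
```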
5. Test no. 5: For this test we considered as the feature set for training and
testing the most frequent 10,000 words. The words with the highest frequencies
are the most relevant but, still, there is room for improvement and, according
to the previous test, it appears that the collocations provide slightly more
information. That leads to the next idea: to combine the two, that is, to train
and test a classifier on all the bigrams from the texts plus the most frequent
10,000 words.
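Selecting the most frequent words (Test no. 5 uses the top 10,000) can be sketched with a frequency counter over the tokenized training documents:

```python
from collections import Counter

def top_n_words(tokenized_docs, n):
    """Return the n most frequent words across all tokenized documents."""
    freq = Counter(tok for doc in tokenized_docs for tok in doc)
    return [word for word, _ in freq.most_common(n)]
```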
6. Test no. 6: All bigrams plus the most frequent words. When using the best
words and the bigrams as the feature set for training and testing, there is no
improvement compared to the test run on bigrams alone.
The most informative features were:
(’give’, ’us’) = True pos : neg = 14.3 : 1.0
avoids = True pos : neg = 13.0 : 1.0
(’quite’, ’frankly’) = True pos : neg = 12.3 : 1.0
astounding = True pos : neg = 12.3 : 1.0
(’does’, ’so’) = True neg : pos = 12.3 : 1.0
slip = True pos : neg = 11.7 : 1.0
(’&’, ’robin’) = True neg : pos = 11.7 : 1.0
(’fairy’, ’tale’) = True neg : pos = 11.7 : 1.0
outstanding = True neg : pos = 11.5 : 1.0
ludicrous = True neg : pos = 11.0 : 1.0
7. Test no. 7: A different approach that came to mind was to use as features
those parts of speech that seem to express the most subjectivity, namely the
adjectives and the adverbs. Test number 7 was done on adjectives only.
In order to extract the adjectives from the text we used the WordNet the-
saurus (also included as a package in the Natural Language ToolKit) and
extracted the words of the movie reviews that appeared at least once in WordNet
as adjectives. We did not use a part of speech tagger, but that is a technique
worth investigating. The same tactic was used in the next test, for extracting
the adverbs.
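The adjective-selection step can be sketched as below. The paper checks each word against WordNet for at least one adjective sense; here a tiny hand-made set stands in for that lookup, so `ADJECTIVE_SENSES` is purely illustrative:

```python
# Toy stand-in for a WordNet lookup of adjective senses (illustrative only).
ADJECTIVE_SENSES = {"outstanding", "ludicrous", "insulting", "astounding"}

def adjectives_only(tokens):
    """Keep only tokens that have at least one (toy) adjective sense."""
    return [tok for tok in tokens if tok in ADJECTIVE_SENSES]
```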
8. Test no. 8: This test was done on both the adjectives and the adverbs from
the texts.
9. Test no. 9: Going in this direction, another idea came to mind: we might
benefit from adding to the adjectives, extracted from the texts in the same
manner as presented before, their WordNet synonyms.
We can notice that this has not improved our results, but the contrary. The
reason could be that we did not determine the meaning of those adjectives in
their contexts, and therefore added to the training feature sets all the possible
synonyms of those words, disregarding their actual meaning in context. An
interesting direction to go from this point would be to apply a word sense
disambiguation step before adding the synonyms.
For every feature set we computed the accuracy, negative precision, negative
recall, positive precision and positive recall. The results obtained when testing
the classification given by the naive Bayes classifier on the previous 10 feature
sets are summarized in Figure 1.
Example 1. We show in Figure 2 examples of classification of the documents
on which the testing was done into the "positive" and "negative" categories.
On the left side you can see the list of the documents classified by naive Bayes
as positive: for example, the document cv002 tok-12931.txt has a probability of
0.0229 of being negative and of 0.9771 of being positive, and is therefore included
in the positive documents list.
Remark 1. We split the texts into thirds and ran the same tests described before
on these parts. The accuracy decreased slightly for each of the thirds. This
indicates that there is no rule about having more information on sentiment
polarity in the beginning as opposed to the end or the middle.
As discussed in the previous subsection, for each feature set we built a naive
Bayes classifier that we trained on the first data set and tested on the second
one. The results for each of the feature sets listed before can be read in Table
2.1 from the previous subsection. In the following we present two methods for
combining classifiers, which we applied to the previous feature sets.
Each classifier calculates a certain probability for a document to be positive
or negative. If the probability of a text being positive is greater than that of
it being negative, the classifier assigns it the positive class, and the other way
around. We will therefore have a resulting list that looks like this:
{(neg > pos), (pos > neg), ...}.
The Majority Rule. For each document, we will assign the class that appears
a majority of times in the list generated by the classifiers.
Probability Aggregation. We calculate the sum of the positive/negative
probabilities given by each classifier. Then, if the sum of the positive probabil-
ities is greater than that of the negative ones, we assign that document the
positive class, and vice versa.
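Both combining rules can be sketched together; each classifier's output for a document is assumed here to be a (p_neg, p_pos) pair:

```python
def majority_rule(predictions):
    """Assign the class predicted by the majority of classifiers."""
    votes = sum(1 if p_pos > p_neg else -1 for p_neg, p_pos in predictions)
    return "pos" if votes > 0 else "neg"

def probability_aggregation(predictions):
    """Assign the class whose probabilities sum to the larger total."""
    neg_sum = sum(p_neg for p_neg, _ in predictions)
    pos_sum = sum(p_pos for _, p_pos in predictions)
    return "pos" if pos_sum > neg_sum else "neg"
```

Note that the two rules can disagree: with [(0.4, 0.6), (0.9, 0.1), (0.3, 0.7)] the majority rule picks positive (two of three votes), while probability aggregation picks negative (summed 1.6 vs. 1.4).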
Table 1. Method 1 (majority rule): Accuracy / Neg precision / Neg recall / Pos precision / Pos recall
c1 - c5 86.57 97.23 75.29 79.84 97.86
c1 - c7 87.79 97.32 77.71 81.45 97.86
c1 - c9 86.79 97.42 75.57 80.05 98.00
Table 2. Method 2 (probability aggregation): Accuracy / Neg precision / Neg recall / Pos precision / Pos recall
c1 - c5 87.29 97.11 76.86 80.85 97.71
c1 - c7 87.79 97.32 77.71 81.45 97.86
c1 - c9 87.14 97.27 76.43 80.59 97.86
As seen in the result Tables 1 and 2 above, we did not find a combining
method that increases the performance of the combined classifiers compared to
the individual ones. An idea would be to take advantage of the differences in
the recall and precision measurements obtained on different feature sets. That
is, suppose we have a number of classifiers trained on a certain feature set that
yields high positive precision values and another series of classifiers that have
high positive recall (and the other way around for the negative values). By
combining them, they might balance each other out and lead to better results.
In our case, we did not have good examples of independent feature sets to
implement this idea, but we consider it worth investigating.
will be made exclusively based on the algorithm discussed previously, without
taking into consideration the scores the commentators might have given for that
particular title. The algorithm scores the tested comments with positive/negative
probabilities; if the probability of a certain review being positive is somewhere
between 0.45 and 0.55, the review is considered to be neutral (and the same for
the negative probability).
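The neutral band described above can be sketched directly; the 0.45-0.55 interval is the one given in the text, and the function name is a hypothetical label for illustration:

```python
def three_way_label(p_pos, low=0.45, high=0.55):
    """Map a positive probability to pos/neutral/neg using the 0.45-0.55 band."""
    if low <= p_pos <= high:
        return "neutral"
    return "pos" if p_pos > high else "neg"
```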
We show three screenshots: the first one (Figure 3) shows the toolbar of a
browser, where the two buttons are to be found; the second and third (Figure 4)
show the dialog boxes displayed after pressing the two buttons when the user is
visiting the page of a particular title listed on the IMDb website.
In order to minimize the time spent on calculations, we start with an already
trained classifier. For this training, we used the same database as in the training
step of the tests presented in Section 2 (1000 positive and 1000 negative movie
reviews).
4 Conclusions
Nowadays, due to the explosion of the internet, we deal with an unprecedented
amount of data published by people all over the world who express their opinions
on different topics. In most cases, when we need to access and determine general
opinions on certain topics, we do not have a rating system available (such as the
one provided by IMDb), so developing opinion mining methods that are fast and
as efficient as possible is a current issue.
This study was focused on developing such a method that uses a time-efficient
classification algorithm. We decided to use the naive Bayes classifier. After per-
forming a series of tests to determine its performance when running on different
feature sets extracted from the analysed texts, we came to the conclusion that
the best option for selecting the feature set for real time applications is extract-
ing a relevant number of the most frequent words from the texts used for training
the classifier. For such a simple and apparently over-simplifying technique, it
performs very well and is extremely fast.
As we were trying to find the feature sets most likely to be relevant in opinion
mining, we made a series of assumptions, such as that groups of words might
give more information (correct) and that the most frequent words weigh more
(also correct), but we also found new leads that we think are worth further
investigation: some parts of speech (adjectives, adverbs) provide more
information than others. This seems a sensible assumption, but information is
lost when extracting the adjectives from the texts (we used the WordNet
thesaurus to determine whether each word has at least one adjective/adverb
meaning; a better solution might be to use a part of speech tagger). Then, we
tried to add to the training feature set the synonyms of the adjectives selected
that way; this method also led to decreased performance. A reason for this
might be that we did not use a disambiguation technique to determine the sense
of the words in their contexts before adding the synonyms, but added all
synonyms of the words instead. That would be the second most important idea
to investigate in future work.
Of course, even if this method gives good classification accuracy, it will be
extremely slow: applying a part of speech tagger and a disambiguation algorithm
are both time consuming. In the end, it might turn out not to have been worth
the effort if it does not gain many percentage points in the final classification
results.
Secondly, we tried to find a combining method for classifiers trained on
different feature sets in order to increase the final accuracy. We did not manage
to find such a method, but a disadvantage we had was not having a balanced set
of classifiers; that is, all the classifiers were wrong in the same way, with similar
values for precision and recall. We think that by combining classifiers that even
each other out, we would have a much better chance with the combining methods
we proposed.
References
1. Chaovalit, P., Zhou, L.: Movie Review Mining: a Comparison between Supervised
and Unsupervised Classification Approaches. In: 38th Hawaii International Con-
ference on System Sciences, HICSS 2005 (2005)
2. Conrad, J.G., Schilder, F.: Opinion mining in legal blogs. In: Proceedings of the
11th International Conference on Artificial Intelligence and Law, ICAIL 2007, pp.
231–236 (2007)
3. Dinu, A.: Short Text Categorization via Coherence Constraints. In: Proc. 13th In-
ternational Symposium on Symbolic and Numeric Algorithms for Scientific Com-
puting, SYNASC 2011, Timisoara, Romania, September 26-29, pp. 247–251 (2011)
4. Feldman, R., Sanger, J.: The Text Mining Handbook - Advanced Approaches in
Analyzing Unstructured Data. Cambridge University Press (2007)
5. Kononenko, I.: Machine learning for medical diagnosis: history, state of the art and
perspective. Artificial Intelligence in Medicine 23(1), 89–109 (2001)
6. Langley, P., Iba, W., Thompson, K.: An Analysis of Bayesian Classifiers. In: Proc.
AAAI 1992, pp. 223–228 (1992)
7. Lewis, D.D.: Naive (Bayes) at Forty: The Independence Assumption in Information
Retrieval. In: Proc. Machine Learning: ECML-1998, 10th European Conference on
Machine Learning, Chemnitz, Germany, April 21-23, pp. 4–15 (1998)
8. Mihalcea, R., Banea, C., Wiebe, J.: Learning Multilingual Subjective Language
via Cross-Lingual Projections. In: Proceedings of the 45th Annual Meeting of the
Association for Computational Linguistics, ACL 2007, Prague, Czech Republic,
June 23-30 (2007)
9. Mihalcea, R., Pulman, S.: Characterizing Humour: An Exploration of Features
in Humorous Texts. In: Gelbukh, A. (ed.) CICLing 2007. LNCS, vol. 4394, pp.
337–347. Springer, Heidelberg (2007)
10. Pak, A., Paroubek, P.: Twitter as a Corpus for Sentiment Analysis and Opinion
Mining. In: Proceedings of the International Conference on Language Resources
and Evaluation, LREC 2010, Valletta, Malta, May 17-23 (2010)
11. Pang, B., Lee, L., Vaithyanathan, S.: Thumbs up? Sentiment classification using
machine learning techniques. In: Proceedings of the 2002 Conference on Empirical
Methods in Natural Language Processing (EMNLP), pp. 79–86 (2002)
12. Pang, B., Lee, L.: Opinion Mining and Sentiment Analysis. Foundations and Trends
in Information Retrieval (FTIR) 2(1-2), 1–135 (2007)
13. Perkins, J.: Python Text Processing with NLTK 2.0 Cookbook. Packt Publishing
(2010)
14. Rish, I.: An empirical study of the naive Bayes classifier. IBM T.J. Watson
Research Center (2001),
http://domino.research.ibm.com/comm/research_people.nsf/pages/rish.pubs.html
15. Witten, I.H., Frank, E.: Data mining: practical machine learning tools and tech-
niques. Elsevier (2005)
16. Yessenov, K., Misailovic, S.: Sentiment Analysis of Movie Review Comments, Re-
port on Spring 2009 final project (2009),
http://people.csail.mit.edu/kuat/courses/6.863/
17. Beautiful Soup - HTML/XML parser for Python,
http://www.crummy.com/software/BeautifulSoup/
18. IMDbPY - package for manipulating IMDb data for Python,
http://imdbpy.sourceforge.net/
19. NLTK - Natural Language ToolKit, http://www.nltk.org/
20. PyGTK - library for implementing graphic user interfaces in Python,
http://www.pygtk.org/
21. WebKit - web browser engine, http://www.webkit.org/