Sentiment Analysis of Hotel Reviews - Performance Evaluation of Machine Learning Algorithms
Saman Zahid
Linköping University
All content following this page was uploaded by Saman Zahid on 01 May 2021.
1 ABSTRACT
Customer reviews of hotels are an important part of travel planning nowadays. People
prefer to book hotels that have a high number of positive reviews, and there are different sources
where reviews can be found to get a better insight into a hotel's reputation. Customer reviews
therefore play an important part in helping business owners improve their services.
In this project, sentiment analysis is performed on the basis of user reviews using three different
classifiers: "Naive Bayes", "Random Forest" and "Support Vector Machine". The performance of
these algorithms is assessed under two different parameter settings each. The reviews are classified
with "positive", "negative" or "average" labels.
Contents
1 ABSTRACT
2 Introduction
4 Data
4.1 Data Preprocessing
5 Method
6 Classification Result
7 Discussion
8 Conclusion
Reference
2 Introduction
Sentiment analysis, also known as "opinion mining" or "emotion AI", is used to extract and analyze
users' opinions, sentiments, emotions and responses on a certain matter. Text mining with Natural
Language Processing (NLP) techniques is often used to analyze such responses and reviews in order
to perform sentiment analysis.
With the advancement of technology and the increase in social interactions, it has become very
important for any business to consider user reviews, as they play an important role in providing the
best service to customers. Business owners can use customer reviews to identify the problems in
their system highlighted by customers and make improvements accordingly. Customer reviews
also play a vital role in establishing a company's reputation.
In this project, I have taken data on 1000 hotels. The purpose of this project is to perform sentiment
analysis at three polarity levels, positive, negative and average, using text classification. I have
chosen three different algorithms to check which performs best on this data; the approaches taken
are probabilistic, non-probabilistic discriminative, and ensemble.
Text classification is not only a process of building a classifier; it also involves several steps that are
required to clean the data and make it useful for the analysis. The steps for text classification are:
For the given condition, let c = Class (polarity) and D = Review (text document). Then, by Bayes' rule:

P(Class|Review) = P(Review|Class) * P(Class) / P(Review)
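Bayes' rule above can be worked through with concrete numbers. The counts below are hypothetical, not taken from the paper's dataset:

```python
# Suppose 70 of 100 reviews are Positive, the word "clean" appears in 40
# of those Positive reviews, and in 50 of the 100 reviews overall.
p_class = 70 / 100               # P(Class = Positive)
p_review_given_class = 40 / 70   # P("clean" | Positive)
p_review = 50 / 100              # P("clean")

# Bayes' rule: P(Class | Review) = P(Review | Class) * P(Class) / P(Review)
p_class_given_review = p_review_given_class * p_class / p_review
# (40/70) * (70/100) / (50/100) = 0.8, i.e. seeing "clean" raises the
# probability that the review is Positive from 0.7 to 0.8.
```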
Naive Bayes does not perform well when there is dependence between the attributes, but for
text classification it sometimes gives remarkable results. In (Gindl, Weichselbraun, and Scharl
2010), it is highlighted that Naive Bayes performs well for text classification because it takes
contextualization into account.
In this research, I have implemented the Multinomial Naive Bayes classifier, which is suitable for
classification with discrete predictors and for multi-class problems. In our project we have 3 labels,
and the discrete predictor is the word count, obtained using tf-idf.
The maximum a posteriori class for Naive Bayes is given as:

Class = argmax_c P(c) * Π_i P(w_i | c)

where the w_i are the words (features) of the review.
For text classification, we have already performed data preprocessing, that is, we have selected
the features that contribute most to the sentiment analysis. The expectation for random forest
is therefore raised, since there is a very low chance of insignificant features being selected while
growing the trees.
3.4 TF-IDF
TF-IDF transforms the data into term frequency (TF) times inverse document frequency (IDF).
Term frequency, as its name suggests, is the frequency count of each word in the document.
Inverse document frequency is a weighted frequency count: term frequency gives equal weight
to all the words in the document, while inverse document frequency gives less weight to commonly
occurring words such as "the", "those", "these", etc. The advantage of performing the TF-IDF
transformation is that it extracts the features of the document that are significant for classification.
IDF as used in this project is given as idf(t) = log((1 + n) / (1 + df)) + 1, where df is the document
frequency of term t and n is the total number of documents. The constant 1 is added to both
numerator and denominator to avoid division by zero. And tf-idf, as mentioned earlier, is
tfidf = tf * idf (Pedregosa et al. 2011).
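This IDF variant is the smoothed formula used by scikit-learn's `TfidfVectorizer`, so it can be checked directly. The three toy documents below are hypothetical, not from the hotel dataset:

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

docs = ["the room was clean",
        "the staff was rude",
        "clean and quiet room"]

# smooth_idf=True (the default) gives idf(t) = ln((1 + n) / (1 + df)) + 1,
# matching the formula above.
vec = TfidfVectorizer(smooth_idf=True)
X = vec.fit_transform(docs)

# "room" occurs in 2 of the 3 documents, so df = 2 and n = 3.
idf_room = vec.idf_[vec.vocabulary_["room"]]
manual = np.log((1 + 3) / (1 + 2)) + 1  # ln(4/3) + 1
```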
3.5.1 Accuracy
Accuracy is the fraction of correct predictions (true positives plus true negatives) out of the total
number of predictions:

Accuracy = Number of True Predictions / Total Predictions
3.5.2 Precision
Precision is the fraction of true positive predictions out of all positive predictions (true positives
plus false positives). It is given as:

Precision = TP / (TP + FP)
3.5.3 Recall
Recall is also known as sensitivity; sensitivity is the term used for binary classification, while recall
is more general. Recall is the fraction of true positives out of all actual positives (true positives
plus false negatives). It is given as:

Recall = TP / (TP + FN)
3.5.4 F1-Score
It is the harmonic mean of precision and recall. It is given as:
F1-Score = 2 * (Precision * Recall) / (Precision + Recall)
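The four metrics above are all available in scikit-learn. A minimal sketch with hypothetical labels follows; note that with three classes the per-class precision, recall and F1 must be averaged, and macro averaging is an assumption here, since the paper does not state which averaging it used:

```python
from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, f1_score)

# Hypothetical true and predicted labels for six test reviews.
y_true = ["positive", "positive", "negative", "average", "positive", "negative"]
y_pred = ["positive", "negative", "negative", "average", "positive", "positive"]

acc = accuracy_score(y_true, y_pred)  # 4 of 6 predictions are correct
# "macro" averaging weights each of the three classes equally.
prec = precision_score(y_true, y_pred, average="macro")
rec = recall_score(y_true, y_pred, average="macro")
f1 = f1_score(y_true, y_pred, average="macro")
```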
4 Data
For this project the dataset is taken from Kaggle. The data was originally fetched from "Datafiniti's
Business Database" (“Datafiniti Business Database,” n.d.). The dataset initially contained 10000 rows
and 26 columns. Each row contains all the information related to a hotel, the hotel's reviews and
ratings, as well as the reviewer's information. Many columns are irrelevant for this project, such as
the hotel's address, country, province and postal code, as well as the reviewer's name, province,
source of the review (URL), and the dates the review was added and seen. Thus for this project I
have taken only the most relevant columns into account, though other columns could be used for
different kinds of analysis, such as classifying the best hotels in each city.
Since the target of this project is to perform simple sentiment analysis with different classifiers,
regardless of any categorization, only 6 columns are kept. Figure 2 illustrates the overview of the data.
Label = Positive, if Rating > 3
        Average,  if Rating = 3
        Negative, if Rating < 3
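The labeling rule above is a simple piecewise mapping. A sketch of how it might be applied to the rating column:

```python
def label_from_rating(rating):
    """Map a 1-5 hotel rating to a sentiment label using the rule above."""
    if rating > 3:
        return "Positive"
    if rating == 3:
        return "Average"
    return "Negative"

# Applying the rule to each possible star rating:
labels = [label_from_rating(r) for r in [5, 4, 3, 2, 1]]
# → ["Positive", "Positive", "Average", "Negative", "Negative"]
```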
The following figure shows the actual distribution of labels in the complete dataset.
Text classification is widely performed using many different techniques, and much research is
available on sentiment analysis, such as (Hegde and Padma 2017). The steps applied in this
project to perform sentiment analysis are discussed below:
5 Method
For this project, the data is split into an 80% training set and a 20% test set. The classifiers are
trained on the training data and predictions are made on the test data.
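The 80/20 split can be done with scikit-learn's `train_test_split`. The placeholder reviews and the fixed random seed below are assumptions for illustration; the paper does not state a seed:

```python
from sklearn.model_selection import train_test_split

# Placeholder review texts and labels (the real data has thousands of rows).
reviews = ["great stay"] * 8 + ["awful room"] * 2
labels = ["Positive"] * 8 + ["Negative"] * 2

# test_size=0.2 gives the 80/20 split described above; random_state fixes
# the shuffle so the split is reproducible.
X_train, X_test, y_train, y_test = train_test_split(
    reviews, labels, test_size=0.2, random_state=0)
```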
In this project, 3 different algorithms are used, each with 2 different settings. There are other
parameters that could be changed, but since each combination of parameters can greatly influence
the performance of a classifier, I have decided to compare only 2 different settings per algorithm.
1. Multinomial Naive Bayes is applied with the default setting alpha = 1 and then with alpha
= 0.009.
2. Learning is performed with a "Linear SVM" classifier, that is, a support vector machine with
a linear kernel. Training is first done with the default settings, i.e. regularization parameter
C = 1 and class weight = 1, and then with C = 1 but class weight = "balanced".
3. Random forest is used with 100 and then 2000 trees, with the minimum number of samples
required to split an internal node set to 2.
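The six classifier/setting combinations above can be sketched as scikit-learn pipelines, each preceded by the same tf-idf transformation. The tiny stand-in corpus is hypothetical; the real project fits on the Kaggle data:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

texts = ["great clean room", "terrible rude staff", "okay average stay",
         "lovely great hotel", "awful dirty room", "fine okay place"]
labels = ["Positive", "Negative", "Average",
          "Positive", "Negative", "Average"]

# One pipeline per classifier/setting combination from the list above.
models = {
    "nb_alpha_1":     make_pipeline(TfidfVectorizer(), MultinomialNB(alpha=1.0)),
    "nb_alpha_0.009": make_pipeline(TfidfVectorizer(), MultinomialNB(alpha=0.009)),
    "svm_default":    make_pipeline(TfidfVectorizer(), LinearSVC(C=1.0)),
    "svm_balanced":   make_pipeline(TfidfVectorizer(),
                                    LinearSVC(C=1.0, class_weight="balanced")),
    "rf_100_trees":   make_pipeline(TfidfVectorizer(),
                                    RandomForestClassifier(n_estimators=100,
                                                           min_samples_split=2)),
    "rf_2000_trees":  make_pipeline(TfidfVectorizer(),
                                    RandomForestClassifier(n_estimators=2000,
                                                           min_samples_split=2)),
}
for name, model in models.items():
    model.fit(texts, labels)
    preds = model.predict(texts)
```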
6 Classification Result
The performance of each algorithm with each setting is evaluated on the test data. Predictions
were made on the test set with each classifier in turn, and the performance metrics discussed in
section 3.5 were calculated.
Figure 3 illustrates the confusion matrix heat maps for all the classifiers.
It can be observed that the only big difference appears for the Naive Bayes classifier with alpha = 1:
this value does not provide proper smoothing, and one of the labels, "average", is not predicted
at all. For all the other classifiers, the difference between the two parameter settings of the same
classifier is very small.
Naive Bayes with alpha = 0.009 has the highest number of correctly predicted instances, 1674.
SVM with class weight 1 ranks second with 1628 correctly predicted instances out of 1995 total test
instances. Random forest with default settings also performs very well, with 1625 correctly predicted
instances. It is quite interesting that although Naive Bayes with alpha = 1 misses one label entirely,
it still makes 1529 correct predictions.
Further assessment can be made by looking at the following graph:
7 Discussion
In figure 4, the performance of every classifier is plotted. For a clearer view of the values, the table
containing the performance metrics for each classifier is also given. It can be seen that in terms
of accuracy and precision, random forest (E, F) performs really well, but it yields very low recall
and F1-score. On the other hand, Naive Bayes (B) has the highest accuracy and very good
precision, recall and F1-score.
In the research of (Samal, Panda, and Behera 2017), a performance analysis of supervised learning
algorithms for sentiment analysis is carried out. The best algorithm in that case turned out to be
the SVM with a linear kernel (LinearSVC). As we can see, the linear SVM (C) performs really well
here too, with 81.60% accuracy and a 66.5% F1-score, slightly less than Multinomial Naive Bayes.
In another study (Sharif, Hoque, and Hossain 2019), sentiment analysis of restaurant reviews was
carried out with several different classifiers, and "Multinomial Naive Bayes" in combination with
tf-idf outperformed all the other algorithms. In yet another study (Dey et al. 2016), two different
datasets (movie reviews and hotel reviews) were used to analyze classifier performance; it was
concluded that Naive Bayes performs well for movie reviews but not for hotel reviews.
In this project, SVM and Multinomial Naive Bayes seem to perform well. One thing to consider is
that the data used in this project is skewed: it contains more positively labeled text than negative
or average, which introduces bias. The other problem in the case of Multinomial Naive Bayes, as
highlighted in (Rennie, n.d.), is the weight magnitude error, which is why the classifier performs
well only with an appropriate alpha.
8 Conclusion
In this project, we have performed sentiment analysis on hotel reviews and analyzed the performance
of Multinomial Naive Bayes, SVM and random forest. It was concluded that, on the basis of
precision, recall, accuracy and F1-score, Multinomial Naive Bayes with appropriate parameter
settings performs the best. It can be said that the simplicity of Multinomial Naive Bayes does not
make it less effective than the other algorithms. We can observe that both SVM and Multinomial
Naive Bayes perform really well compared to the most complex of the three, random forest; the
small amount of data could be the reason for this.
In the research of (Govindarajan 2014), a hybrid method combining Naive Bayes, SVM and a
genetic algorithm is used for sentiment analysis. It would be interesting to apply such a hybrid
method to the sentiment analysis of hotel reviews, as not much work has been done on hotel
reviews with that approach.
Reference
“Datafiniti Business Database.” n.d. https://datafiniti.co/products/business-data/.
Dey, Lopamudra, Sanjay Chakraborty, Anuraag Biswas, Beepa Bose, and Sweta Tiwari. 2016.
“Sentiment Analysis of Review Datasets Using Naive Bayes and K-NN Classifier.” arXiv
abs/1610.09982.
Gindl, Stefan, Albert Weichselbraun, and Arno Scharl. 2010. “Cross-Domain Contextualization of
Sentiment Lexicons.” In.
Govindarajan, M. 2014. “Sentiment Analysis of Restaurant Reviews Using Hybrid Classification
Method.” In.
Hegde, Y., and S. Padma. 2017. “Sentiment Analysis Using Random Forest Ensemble for Mobile
Product Reviews in Kannada.” In 2017 IEEE 7th International Advance Computing Conference
(IACC), 777–82. Los Alamitos, CA, USA: IEEE Computer Society.
https://doi.org/10.1109/IACC.2017.0160.
Pedregosa, F., G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, et al.
2011. “Scikit-Learn: Machine Learning in Python.” Journal of Machine Learning Research 12:
2825–30.
Rennie, J. D. M., L. Shih, J. Teevan, and D. R. Karger. n.d. “Tackling the Poor Assumptions of
Naive Bayes Text Classifiers.” In Twentieth International Conference on Machine Learning,
616–23. Washington, DC. https://www.aaai.org/Papers/ICML/2003/ICML03-081.pdf.
Samal, Biswaranjan, Mrutyunjaya Panda, and Anil Behera. 2017. “Performance Analysis of
Supervised Machine Learning Techniques for Sentiment Analysis.” In.
https://doi.org/10.1109/SSPS.2017.8071579.
Sharif, Omar, Mohammed Hoque, and Eftekhar Hossain. 2019. “Sentiment Analysis of Bengali
Texts on Online Restaurant Reviews Using Multinomial Naïve Bayes.” In, 1–6.
https://doi.org/10.1109/ICASERT.2019.8934655.
Sokolova, Marina, and Guy Lapalme. 2009. “A Systematic Analysis of Performance Measures
for Classification Tasks.” Information Processing & Management 45 (July): 427–37.
https://doi.org/10.1016/j.ipm.2009.03.002.