
2018 Conference on Information Communications Technology and Society (ICTAS)

A Framework for Sentiment Analysis with Opinion Mining of Hotel Reviews

Kudakwashe Zvarevashe
ICT and Society Research Group, Durban University of Technology,
P.O. Box 1334, Durban 4000, South Africa
kudakwashe.zvarevashe@gmail.com

Oludayo O Olugbara
ICT and Society Research Group, Durban University of Technology,
P.O. Box 1334, Durban 4000, South Africa
oludayoo@dut.ac.za

ISBN 978-1-5386-1001-5/18/$31.00 ©2018 IEEE

Abstract: The rapid increase in mountains of unstructured textual data, accompanied by a proliferation of tools to analyse them, has opened up great opportunities and challenges for text mining research. The automatic labelling of text data is hard because people often express opinions in complex ways that are sometimes difficult to comprehend. The labelling process involves a huge amount of effort, and mislabelled datasets usually lead to incorrect decisions. In this paper, we design a framework for sentiment analysis with opinion mining for the case of hotel customer feedback. Most available datasets of hotel reviews are not labelled, which presents a lot of work for researchers as far as the text data pre-processing task is concerned. Moreover, sentiment datasets are often highly domain sensitive and hard to create because sentiments are feelings such as emotions, attitudes and opinions that are commonly rife with idioms, onomatopoeias, homophones, phonemes, alliterations and acronyms. The proposed framework, termed sentiment polarity, automatically prepares a sentiment dataset for training and testing to extract unbiased opinions of hotel services from reviews. A comparative analysis was established with Naïve Bayes multinomial, sequential minimal optimization, complement Naïve Bayes and Composite hypercubes on iterated random projections to discover a suitable machine learning algorithm for the classification component of the framework.

Keywords: opinion mining, sentiment analysis, machine learning algorithm, natural language processing, sentiment polarity, dataset labelling

I. INTRODUCTION

In recent years, the world has experienced a tremendous rise in the volume of textual data, especially unstructured data generated by people who express opinions through various web and social media platforms for different reasons. Mountains of these textual data could initially be equated to garbage that would need to be disposed of from time to time. However, with the advancement in storage capacity accompanied by the increasing sophistication of data mining tools, opportunities and challenges have been created for analysing and deriving useful insights from these mountains of data.

In this paper, we have chosen textual data in the form of hotel reviews for sentiment analysis with opinion mining from customer perspectives. Sentiment analysis uses the techniques of natural language processing and computational linguistics to automate the classification of sentiments generated from reviews. Hotels provide satisfaction, security, comfort, luxury and lodging services for travellers and people on vacation. Mining hotel reviews is desirable to gain deeper knowledge of customer expectations and support effective management of customer relationships. It would enable hotel managers to have a good understanding of customer needs, discover areas for further improvement and improve service quality. The hotel reviews are provided exclusively by customers who have made reservations at a particular hotel. Customers post feedback about hotels that covers hygiene, quality of food, location, customer service quality and hospitality exhibited by hotel staff. Moreover, sentiment analysis of hotel reviews is crucial to understand hidden patterns generated by data that would help to effectively improve performance [1].

II. RELATED LITERATURE

Sentiment analysis [2] and opinion mining [3] are terms that refer to the field of study that analyses opinions, evaluations, appraisals, attitudes and emotions of people towards entities such as products, services, organizations, individuals, issues, events, topics and their attributes [4]. These terms were used interchangeably to define opinions that entail positive or negative sentiments [4, 5]. Sentiment determines the polarity of opinions expressed in a given review. Cambria et al. [6] disputed the interchange of these concepts by classifying opinion mining as polarity detection and sentiment analysis as focusing on emotion recognition. The opinion mining system only needs to understand polarity, which can be positive, negative or neutral depending on the nature of the sentences expressed in a review [7]. The process of detecting polarity is strongly linked to analysing sentiments on a particular subject.

Most research on sentiment analysis is focused on descriptive data. Manke and Shivale [8] explored the significance of social networks as preferred environments for opinion mining and sentiment analysis. They introduced an original method of opinion classification and tested their algorithm on real social network datasets. They concluded from their findings that social networks exhibit properties that make them suitable for opinion mining activities. Comprehensive surveys have been presented on various methods used in opinion mining [9-11] with limited focus on aspect oriented analysis.

The majority of current methods of sentiment analysis attempt to detect the polarity of a review regardless of the
entities such as hotels and facilities with their respective aspects such as food and internet access for instance. By contrast, the task of this study is concerned with aspect based sentiment analysis with the goal of identifying the aspects of given target entities and the sentiment expressed towards each aspect. The aspect based sentiment analysis summarizes what people like and dislike from reviews of products or services. It has always been a difficult task [7] because several subtasks such as feature extraction, feature grouping, polarity classification and evaluation measures have to be performed to get an unbiased opinion, usually under the assumption of error-free grammar, which is not always realistic.

Hotel reviews are an important theme of natural language processing (NLP), which is a discipline that deals with processing of textual data [12]. It is at the intersection of artificial intelligence and linguistics [6], which makes NLP techniques amenable to sentiment analysis. The analysis of sentiment can be performed using two basic approaches: lexicon-based and machine learning [9, 10]. The machine learning approach, on which this study is based, utilizes two main learning techniques: supervised and unsupervised learning. Supervised learning algorithms such as the support vector machine, Naïve Bayes, K-nearest neighbour and convolutional neural network deep learning have been applied for sentiment analysis. Supervised learning algorithms require training the machine on a labelled dataset. However, unsupervised learning algorithms such as K-means and Fuzzy C-means clustering do not require training datasets because they learn by observation.

The application of supervised learning algorithms, as applied in this study, to volumes of labelled training data on hotel reviews can provide insightful information that would help hotels improve performance and overall ratings amongst competitors. Different evaluation measures such as true positive rate, false positive rate, precision, recall, F-measure, receiver operating characteristic (ROC) area and precision recall curve (PRC) area are the benchmark metrics used to determine the accuracy of different supervised learning algorithms. However, to get accurate results for these measures, there is a need to design a framework that fosters the automatic creation of properly labelled datasets containing the actual sentiments expressed by customers.

The success of sentiment analysis and opinion mining hinges largely on the use of tools for executing different NLP tasks. Tools that can be used for the NLP tasks include Red Opal, which is used for enabling users to find products based on aspects. Tools that are used to help companies extract and analyse opinions of customers on products from blogs include SenticNet, Luminoso, Factiva, Attensity and Converseon. The NLTK, OpenNLP and Stanford CoreNLP are widely used NLP toolkits that support the implementation of basic NLP tasks such as POS tagging, named entity recognition and parsing [11]. The WEKA system is a widely used tool that contains a collection of techniques for data analysis and predictive modelling. It supports several standard data mining tasks that include data pre-processing, clustering, classification, regression, visualization, association rule mining and feature selection [13]. Data can be imported from an external source in formats such as a comma separated value (CSV) file. The data in its raw format needs to be cleaned as it may not be compatible with the processing required. Pre-processing is done to transform the raw data into a format that can be manipulated by appropriate tools. Furthermore, the larger the dataset, the more accurate the performance of the learning algorithms, which is measured in terms of the standard evaluation metrics.

III. METHODOLOGY

A. The Intuition Model

The conceptual view of the intuition model, shown in Figure 1, begins with feedback collection. Customers respond to questionnaires concerning their feelings about services received from the selected hotels. This can be done in a number of ways, for example by opening a web portal through which customers can drop comments. The next step is to label the comments based on intuition. This is done by human agents who simply read the comments and assign labels based on their perceptions. Once the data are transformed to a desired format, the next step is to convert the labelled text to feature vectors through the use of filters. This makes it easier to implement a classification algorithm for training and testing of data. The next step involves the selection of an appropriate classification algorithm, whilst the last step is the training and testing of the selected algorithm on the dataset and the capturing of results.

B. The Sentiment Polarity Based Model

The research reported in this paper was done using the sentiment polarity based model (SPBM) as illustrated in Figure 2. Just like the intuition based model (IBM), the SPBM begins with the elicitation of opinions, a step we skipped because we used the raw OpinRank dataset [14, 15]. Customers respond to questionnaires concerning services received from the selected hotels through an appropriate user interface. The next step is to label the comments based on a sentiment polarity score using a sentiment polarity algorithm. The score obtained determines whether a comment is positive, negative or neutral. Once the data are transformed, the next step is to convert the labelled text to feature vectors through the use of filters. The next step involves the selection of a suitable classification algorithm. The last step is the training and testing of the selected classification algorithm and the capturing of results. The distinguishing property of the SPBM is that labelling is automatic: it does not involve human intervention and it is quite consistent in labelling sentiments. By contrast, the IBM relies heavily on human intervention to label sentiments, which sometimes may not be consistent, and the labelling process is intrinsically laborious and time demanding.
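The labelling step of the sentiment polarity based model, in which a polarity score decides whether a comment is positive, negative or neutral, can be sketched as follows. This is a minimal illustration, not the software used in the study: the experiments use TextBlob for scoring, whereas the toy lexicon `TOY_LEXICON`, the averaging rule in `polarity_score` and the zero `margin` threshold in `label_from_score` are all assumptions made here so that the sketch runs without third-party packages. The paper does not state the thresholds it applied.

```python
from typing import Dict

# Hypothetical toy lexicon (an assumption for illustration only).
# A real run would obtain the score from a library scorer instead.
TOY_LEXICON: Dict[str, float] = {
    "clean": 0.6, "friendly": 0.7, "great": 0.8,
    "dirty": -0.6, "rude": -0.7, "terrible": -1.0,
}


def polarity_score(comment: str, lexicon: Dict[str, float] = TOY_LEXICON) -> float:
    """Return the mean polarity of the known words, or 0.0 if none is known."""
    words = [w.strip(".,!?\"'").lower() for w in comment.split()]
    hits = [lexicon[w] for w in words if w in lexicon]
    return sum(hits) / len(hits) if hits else 0.0


def label_from_score(score: float, margin: float = 0.0) -> str:
    """Map a polarity score in [-1, 1] to a class label.

    The margin (assumed 0.0 here) sets the width of the neutral band.
    """
    if score > margin:
        return "positive"
    if score < -margin:
        return "negative"
    return "neutral"


def label_comment(comment: str) -> str:
    """Label one comment automatically, with no human in the loop."""
    return label_from_score(polarity_score(comment))


if __name__ == "__main__":
    for text in ("The staff were friendly and the room was clean.",
                 "Rude staff and a dirty room.",
                 "That hotel is surely a HELLTEL!"):
        print(label_comment(text), "|", text)
```

A comment made up only of words the scorer does not know receives a score of 0.0 and falls into the neutral class, which is one way a lexicon-driven labeller can mislabel coined or sarcastic words.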
IV. EXPERIMENTAL RESULTS

Data from the OpinRank opinion based ranking dataset was acquired to experimentally test the performance of the investigated learning algorithms and to discover a suitable algorithm for the classification component of the SPBM framework. We selected the OpinRank dataset because it contains unlabelled reviews, which gave us the flexibility for custom experimentation. The OpinRank dataset contains approximately 259000 unlabelled reviews on cars and hotels, with hotel reviews from 80 to 100 hotels in 10 different cities across the world. These cities include Dubai, Beijing, London, New York City, New Delhi, San Francisco, Shanghai, Montreal, Las Vegas and Chicago. We selected hotel reviews from London, Beijing and Montreal as a matter of choice to create a sub-dataset for our experimentation. We labelled the hotel features in the dataset using sentiment polarity software written in Python with TextBlob, a library for processing textual data. The scores obtained from the sentiment polarity were then used to automatically label the data. After performing all the necessary processing steps, including labelling and filtering, the dataset was split into two subsets to create testing and training datasets. We used four classification algorithms, namely Naïve Bayes multinomial (NBM), Sequential minimal optimization (SMO), Complement Naïve Bayes (CNB) and Composite hypercubes on iterated random projections (CHIRP), to train and test the dataset. Figure 3 shows the comparative results obtained after experimentation. The Naïve Bayes multinomial algorithm had the highest precision, which reached 80.9%. It was closely followed by the complement Naïve Bayes algorithm, which had 80.5%. CHIRP was the lowest performing algorithm, scoring 75.6% in precision.

V. CONCLUSION

The sentiment analysis with opinion mining framework reported in this paper can be incorporated into a hotel technology system that can help improve customer relationship management. What good is a system that predicts the polarity of sentiments if it works with wrongly labelled data? From the sentiment polarity exercise that we did, we found that some comments may be wrongly viewed as neutral when they are in fact either positive or negative. The following example was viewed as a neutral comment: “That hotel is surely a HELLTEL!” This comment is truly negative and sarcastic, but because the word HELLTEL does not exist in the English vocabulary it was classified under the neutral class. However, most comments were labelled with much better accuracy. We believe that a lot of research can be done in this area, especially in fine tuning the feature extraction algorithm of the framework so that classification error is minimised.

The system is expected to determine sentiments the way human beings do, and labelled datasets are normally used for the system to learn automatically. The proposed framework tries to make sure that sentences are correctly labelled so that false information is not fed into the system. In a nutshell, the framework proposed in this paper helps in the automatic labelling of sentiment datasets. The experimental result of this study has indicated that the Naïve Bayes multinomial algorithm gave good performance when compared to the other classification algorithms in terms of the evaluation metrics applied. In future work, we would like to improve on automatic labelling and feature extraction, and to perform classification of customer responses based on emotions using deep learning algorithms.

ACKNOWLEDGMENT

We would like to thank Prudence Kadebu and Innocent Mapanga from Harare Institute of Technology, Belvedere, Harare for their heartfelt assistance.

REFERENCES

[1] V. Dhanalakshmi, B. Dhivya and A.M. Saravanan, “Opinion mining from student feedback data using supervised learning algorithms”. IEEE 3rd MEC International Conference on Big Data and Smart City, pp. 1-5, 2016.
[2] J. Yi, T. Nasukawa, R. Bunescu and W. Niblack, “Sentiment analyzer: Extracting sentiments about a given topic using natural language processing techniques”. In Data Mining, 2003. ICDM 2003. Third IEEE International Conference, pp. 427-434, 2003.
[3] K. Dave, S. Lawrence and D.M. Pennock, “Mining the peanut gallery: opinion extraction and semantic classification of product reviews”. In ACM Proceedings of the 12th International Conference on World Wide Web, pp. 519-528, 2003.
[4] B. Liu, “Sentiment analysis and opinion mining”. Synthesis Lectures on Human Language Technologies, vol. 5, no. 1, pp. 1-167, 2012.
[5] A. Buche, D. Chandak and A. Zadgaonkar, “Opinion mining and analysis: a survey”. International Journal on Natural Language Computing (IJNLC), vol. 2, no. 3, pp. 39-48, 2013.
[6] E. Cambria, B. Schuller, Y. Xia and C. Havasi, “New avenues in opinion mining and sentiment analysis”. IEEE Intelligent Systems, vol. 28, no. 2, pp. 15-21, 2013.
[7] M.S. Akhtar, D.K. Gupta, A. Ekbal and P. Bhattacharyya, “Feature selection and ensemble construction: a two-step method for aspect based sentiment analysis”. Knowledge-Based Systems, vol. 125, pp. 116-135, 2017.
[8] S.N. Manke and N. Shivale, “A review on: opinion mining and sentiment analysis based on natural language processing”. International Journal of Computer Applications, vol. 109, no. 4, pp. 29-32, 2015.
[9] W. Medhat, A. Hassan and H. Korashy, “Sentiment analysis algorithms and applications: a survey”. Ain Shams Engineering Journal, vol. 5, no. 4, pp. 1093-1113, 2014.
[10] J.A. Balazs and J.D. Velásquez, “Opinion mining and information fusion: a survey”. Information Fusion, vol. 27, pp. 95-110, 2016.
[11] S. Sun, C. Luo and J. Chen, “A review of natural language processing techniques for opinion mining systems”. Information Fusion, vol. 36, pp. 10-25, 2017.
[12] P.M. Nadkarni, L. Ohno-Machado and W.W. Chapman, “Natural language processing: an introduction”. Journal of the American Medical Informatics Association, vol. 18, no. 5, pp. 544-551, 2011.
[13] M.T. Khan, M. Durrani, A. Ali, I. Inayat, S. Khalid and K.H. Khan, “Sentiment analysis and the complex natural language”. Complex Adaptive Systems Modeling, vol. 4, no. 1, pp. 1-19, 2016.
[14] M. Hall, E. Frank, G. Holmes, B. Pfahringer, P. Reutemann and I.H. Witten, “The WEKA data mining software: an update”. ACM SIGKDD Explorations Newsletter, vol. 11, no. 1, pp. 10-18, 2009.
[15] K. Ganesan and C. Zhai, “Opinion-based entity ranking”. Information Retrieval, vol. 15, no. 2, pp. 116-150, 2012.
Figure 2. Sentiment polarity analysis framework
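The stages of this framework that follow labelling (conversion of labelled text to feature vectors, then training and testing a classifier) can be sketched in a few lines. The sketch below is only an illustration under stated assumptions: the study runs these stages with WEKA filters and classifiers, whereas here a from-scratch bag-of-words multinomial Naïve Bayes with add-one (Laplace) smoothing stands in for them. The class name `MultinomialNB`, the `tokenize` helper, the smoothing value `alpha` and the four training comments are hypothetical and are not taken from the paper or from OpinRank.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase a comment and split it into punctuation-stripped words."""
    words = (w.strip(".,!?\"'").lower() for w in text.split())
    return [w for w in words if w]


class MultinomialNB:
    """Bag-of-words multinomial Naive Bayes with additive smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # assumed smoothing constant (add-one)

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)       # documents per class
        self.word_counts = defaultdict(Counter)   # word counts per class
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(tokenize(doc))
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, doc):
        total_docs = sum(self.class_counts.values())
        best_label, best_logp = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + self.alpha * len(self.vocab)
            logp = math.log(n_docs / total_docs)   # log prior
            for w in tokenize(doc):
                if w in self.vocab:                # ignore unseen words
                    logp += math.log((counts[w] + self.alpha) / denom)
            if logp > best_logp:
                best_label, best_logp = label, logp
        return best_label


# Hypothetical, automatically labelled training comments.
TRAIN = [
    ("great location and friendly staff", "positive"),
    ("clean room and great breakfast", "positive"),
    ("dirty room and rude staff", "negative"),
    ("terrible food and dirty bathroom", "negative"),
]
# Hypothetical held-out comments for testing.
TEST = [
    ("friendly staff and clean room", "positive"),
    ("rude staff and terrible food", "negative"),
]

model = MultinomialNB().fit([d for d, _ in TRAIN], [l for _, l in TRAIN])
predictions = [model.predict(d) for d, _ in TEST]
accuracy = sum(p == l for p, (_, l) in zip(predictions, TEST)) / len(TEST)

if __name__ == "__main__":
    print(predictions, accuracy)
```

Because the classifier learns whatever labels it is given, any mislabelled comment produced by the labelling stage is carried straight into the per-class word counts, which is why the quality of automatic labelling matters for the later stages.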

Figure 1. Intuitive sentiment analysis framework

[Figure 3 chart: weighted average score (vertical axis, 0.00 to 0.90) against performance metric (horizontal axis) for NBM, SMO, CNB and CHIRP]

Figure 3. Weighted average score against performance metrics
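As a rough guide to how a weighted average score of this kind can be computed, the sketch below derives precision, recall and F-measure per class and then averages them. It assumes that "weighted average" means each class's score is weighted by that class's share of the true labels. The function name `weighted_scores` and the example labels are hypothetical, and no figure from the paper is reproduced here.

```python
from collections import Counter


def weighted_scores(y_true, y_pred):
    """Support-weighted average precision, recall and F-measure."""
    support = Counter(y_true)  # number of true instances per class
    totals = {"precision": 0.0, "recall": 0.0, "f_measure": 0.0}
    for label, n in support.items():
        # True positives: predicted as this class and actually this class.
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        predicted = sum(p == label for p in y_pred)
        precision = tp / predicted if predicted else 0.0
        recall = tp / n
        if precision + recall:
            f_measure = 2 * precision * recall / (precision + recall)
        else:
            f_measure = 0.0
        weight = n / len(y_true)  # class share of the true labels
        totals["precision"] += weight * precision
        totals["recall"] += weight * recall
        totals["f_measure"] += weight * f_measure
    return totals


if __name__ == "__main__":
    # Hypothetical labels for four test comments.
    truth = ["pos", "pos", "pos", "neg"]
    guess = ["pos", "pos", "neg", "neg"]
    print(weighted_scores(truth, guess))
```

For the four hypothetical comments above, the positive class has precision 1.0 and recall 2/3, the negative class has precision 0.5 and recall 1.0, and weighting by class share (0.75 and 0.25) gives a weighted precision of 0.875 and a weighted recall of 0.75.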
