
Sentiment Analysis of restaurant reviews: Mining opinion

Submitted towards partial fulfilment of the criteria


for award of PGPBA by GLIEMR

Submitted By:

Anish Kumar (BABICHO5104)


Navaneetha Krishnan (BABICHO5145)
Praveen Christopher (BABICHO5152)

Course & Batch: PGPBA JULY 2016


Mentor: Dr Monika Mittal

Great Lakes Institute of Management

Dr. Bala V. Balachandar Campus 


East Coast Road, Manamai Village, Thirukazhukundram Taluk
Kancheepuram District 
https://www.greatlakes.edu.in/

Acknowledgements

First of all, we would like to express our thanks to our guide Dr Monika Mittal,
Assistant Professor at Great Lakes Institute of Management, Chennai, for being an
excellent mentor during the period of this project thesis. Her encouragement and
valuable advice have made it possible for us to complete our work.
We would also like to thank our friends who devoted their valuable time to
reviewing our work and giving suggestions when required.

Date: 22nd April, 2016
Place: Chennai

Anish Kumar, Navaneethan Krishnan, Praveen Christopher

Certificate of Completion

I hereby certify that the project titled “Sentiment Analysis of restaurant reviews:
Mining opinion” was undertaken and completed under my supervision by Anish Kumar,
Navaneethan Krishnan, and Praveen Christopher, students of the July 2016 batch of the
Postgraduate Program in Business Analytics (PGPBA Jul 2016).

Date: Apr 22, 2016
Place: Chennai

(Dr Monika Mittal)
Mentor
Table of Contents

ABSTRACT
Introduction
Title & Objective of the Study
Need of the study
Company under Study
Data Source
Tools & Techniques
Limitations
LITERATURE REVIEW
DATA DESCRIPTION AND PREPARATION
Data Collection
Data Cleansing
Text Exploration
EXPLORATORY ANALYSIS
Clustering and Classification Algorithms used
Hierarchical Clustering
K Means Clustering
Naïve Bayes
Support Vector Machine
Predicting polarity of the review comments using Naïve Bayes and SVM
Classification of Review Texts
Feature Extraction
Results and Analysis
Definition of Terms used in the results tables
Classifier Accuracy
Classifier Precision
Classifier Recall
F-measure Metric
Measuring Precision and Recall of a Naive Bayes Classifier
Inference from Results
Aspect Based Analysis of Review text
Extracting Aspects from reviews using Boot-strapping Method
Recommendation and Applications
Recommendation
Applications
Conclusion
Bibliography
ABSTRACT

Zomato.com is an online restaurant search and discovery service which enables users
to search for a restaurant to dine out and also lets them share their reviews and
ratings of the restaurant. These reviews and ratings help other users who are
searching for a place to dine out to check what is good and what is bad about any
particular restaurant, helping them to have a good meal outside. This project deals
with business insights that can be derived from text / opinion mining of restaurant
reviews shared on Zomato by foodies. The text data available is unstructured, so
identifying what a majority of the crowd looks for when they dine out can make good
business sense for owners. Business owners have to modify their operations based on
customer preferences, and there is no better way to understand customers and what
they need, feel and want changed than a review that has no personal agenda.
Capturing the emotion of a customer who has written a review, through the choice of
words used in the review, is an essential part of improving the overall customer
experience at a restaurant. While business owners benefit from constant improvements
to the various aspects of their restaurants, customers who are willing to invest
their time benefit hugely from online reviews. So, this project looks at how the
text used in a review can influence a potential customer: what works or does not
work for a business, why people frequent a certain restaurant, and what people look
for in a restaurant.

Introduction

Title & Objective of the Study

“Sentiment Analysis of restaurant reviews: Mining opinion” is the title of this
project. The main goal of this project is to determine the polarity of the review
comments, whether positive or negative, and also to perform aspect-based review
analysis on aspects such as food, service and ambience.
Need of the study

The food & restaurant business is a multi-billion-dollar industry, and the success
or failure of an establishment depends largely on visits and revisits from its
customers. Sentiment analysis, or opinion mining, is thus one of the major tasks of
Natural Language Processing. The power of word-of-mouth advertising has never been
as effective as it is today, with online reviews helping this cause. These reviews
are taken to be largely unbiased and neutral when it comes to rendering one's
opinion, thereby helping a person who seeks online advice on a certain
restaurant/cuisine.
A review is a personal evaluation of products or services, a musical performance,
literature, culture or current affairs. We have chosen to analyse reviews of food at
restaurants, and the service which surrounds it, from zomato.com. Reviews can also
convey emotion and at times even sarcasm. Users have the choice to like, share, or
reply to reviews posted on the site.
With the variety of food available, the opportunity for a local to taste exotic food
around the corner has never been easier or faster to come by. Our options have only
gotten wider, and the choice that we make depends not only on what we want but also
on what others think of a certain place that we want to visit. The dilemma a person
faced earlier was to spend at a restaurant, learn, and then decide whether it was
worth a second visit. With a platform like Zomato, we can learn from the experience
of others to make an informed decision. The objective of the study is to provide an
understanding of what such huge volumes of unstructured data can do for users in
making suitable decisions that help themselves and others.
When consumers have to make a decision or a choice regarding a product, an important
piece of information is the reputation of that product, which is derived from the
opinion of others. Sentiment analysis can reveal what other people think about a
product. The first application of sentiment analysis is thus giving indications and
recommendations in the choice of products according to the wisdom of the crowd. When
you choose a product, you are generally attracted to certain specific aspects of the
product, so a single global rating can be deceiving. Sentiment analysis can regroup
the opinions of the reviewers and estimate ratings on certain aspects of the
product. Another use of sentiment analysis is for companies that want to know the
opinion of customers about their products; they can then improve the aspects that
the customers found unsatisfying. Sentiment analysis can also determine which
aspects are more important to customers. Finally, sentiment analysis has been
proposed as a component of other technologies. One idea is to improve information
mining in text analysis by excluding the most subjective sections of a document, or
to automatically propose internet ads for products that fit the viewer’s opinion
(and remove the others). Knowing what people think opens up numerous possibilities
in the human/machine interface domain. Sentiment analysis for determining the
opinion of a customer on a product (and consequently the reputation of the product)
is the main focus of this project.

Company under Study

Zomato is a restaurant search and discovery service founded in 2008 by Deepinder
Goyal and Pankaj Chaddah. It currently operates in 23 countries, including India,
Australia and the United States. It provides information and reviews on
restaurants, including images of menus where the restaurant does not have its own
website. The Zomato team gathers information from every restaurant on a regular
basis to keep its data fresh, and its vast community of food lovers shares reviews
and photos, so users have all that they need to make an informed choice.
Data Source

Zomato has shared its API with developers through a separate website,
developers.zomato.com, where we need to register and get a unique API key. With this
API key we can make web service calls to Zomato to collect the relevant data. The
number of web service calls is limited to 1000 calls per unique API key.

Figure-1. Zomato API
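For illustration, a minimal Python sketch of such a web service call is below; the
endpoint path, header and parameter names are assumptions based on the public
documentation at developers.zomato.com, not code from this project.

# Hypothetical sketch of a Zomato API call in Python.
import requests

API_KEY = "YOUR_UNIQUE_API_KEY"  # obtained after registering on developers.zomato.com

def get_reviews(res_id, start=0, count=20):
    # Fetch one page of reviews for the given restaurant id.
    resp = requests.get(
        "https://developers.zomato.com/api/v2.1/reviews",
        headers={"user-key": API_KEY},
        params={"res_id": res_id, "start": start, "count": count},
    )
    resp.raise_for_status()
    return resp.json()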


Tools & Techniques

We have mainly used R and Python for data cleaning and for running NLP (Natural
Language Processing) algorithms. Tableau and Excel are mainly used to draw graphs
and charts. Data collection, extraction and cleaning the data for the NLP
algorithms were the most challenging work for us. The basic steps executed after
collecting data from the Zomato API are as follows:
- Data integration in R
- Data cleaning in R and Python:
  - handling missing values
  - removing unnecessary spacing, punctuation and numbers
  - removing stop words
  - removing special and junk characters
- Running the classification and aspect-based models in Python
Limitations:

- For each restaurant, only 20 reviews are shared by the Zomato API.
- Data cleaning was difficult: most of the review comments do not have proper
content and include junk characters.
- Users have shared images in review comments, which have not been used for this
project.
- Restaurant menus are in image format, making it hard to extract content from them
for aspects.
- Review comments contain spelling mistakes and some regional languages as well.

LITERATURE REVIEW

The existing work on sentiment analysis can be classified from different points of
view: technique used, view of the text, level of detail of text analysis, rating
level, etc. From a technical point of view, we identified machine learning,
lexicon-based, statistical and rule-based approaches. The machine learning method
uses several learning algorithms to determine the sentiment by training on a known
dataset. The lexicon-based approach involves calculating the sentiment polarity of
a review using the semantic orientation of words or sentences in the review; the
“semantic orientation” is a measure of subjectivity and opinion in text. The rule-
based approach looks for opinion words in a text and then classifies it based on
the number of positive and negative words. It considers different rules for
classification such as dictionary polarity, negation words, booster words, idioms,
emoticons, mixed opinions, etc. As described in “A Study and Comparison of
Sentiment Analysis Methods for Reputation Evaluation”, statistical models represent
each review as a mixture of latent aspects and ratings. It is assumed that aspects
and their ratings can be represented by multinomial distributions, and the models
try to cluster head terms into aspects and sentiments into ratings. Another
classification is oriented more towards the structure of the text: document-level,
sentence-level or word/feature-level classification. Document-level classification
aims to find a sentiment polarity for the whole review, whereas sentence-level or
word-level classification can express a sentiment polarity for each sentence of a
review and even for each word. Our study shows that most of the methods tend to
focus on document-level classification. Most of the solutions on review
classification consider only the polarity of the review (positive/negative) and
rely on machine learning techniques. Solutions that aim at a more detailed
classification of reviews (e.g., three or five star ratings) use more linguistic
features.

"When you write a review on the web you're providing a window into your own psyche
– and the vast amount of text on the web means that researchers have millions of
pieces of data about people's mind sets," said Jurafsky, whose co-authors include
Victor Chahuneau, Bryan Routledge and Noah Smith, all from Carnegie Mellon
University.
DATA DESCRIPTION AND PREPARATION
Data Collection
The reviews have been collected from the official developers.zomato.com site, which
offers download of reviews through an API generated from the site. Though the data
is free to download, it is labour-intensive to collect, as the API allows only 10
reviews per call. The extracted reviews were converted from JSON format to CSV
format using an online converter and then consolidated into a single comma-separated
values file.
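A minimal sketch of the consolidation step is below, assuming the raw API responses
were saved as JSON files; the field names such as "user_reviews", "rating" and
"review_text" are illustrative assumptions about the response layout.

import csv
import glob
import json

with open("reviews_all.csv", "w", newline="", encoding="utf-8") as out:
    writer = csv.writer(out)
    writer.writerow(["restaurant_id", "rating", "review_text"])
    for path in glob.glob("raw_json/*.json"):
        with open(path, encoding="utf-8") as f:
            data = json.load(f)
        # Flatten each review object into one CSV row.
        for item in data.get("user_reviews", []):
            review = item["review"]
            writer.writerow([data.get("res_id"), review.get("rating"), review.get("review_text")])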
Data Cleansing
Data cleaning is viewed as a series of steps where each step increases the ‘value’
of the data: it moves from the unorganised raw state and is progressively cleaned
up. Post installation of the necessary packages in R, punctuation, numbers, HTML
links, unnecessary spaces and ‘not applicable’ values (NAs) were removed from the
review text, in that order. Then the text was converted to lower case to prepare
for analysis.

The R code used for data cleansing is shown below:


# remove punctuation
sample_text = gsub("[[:punct:]]", "", some_txt)
# remove numbers
sample_text = gsub("[[:digit:]]", "", sample_text)
# remove html links
sample_text = gsub("http\\w+", "", sample_text)
# remove unnecessary spaces
sample_text = gsub("[ \t]{2,}", " ", sample_text)
sample_text = gsub("^\\s+|\\s+$", "", sample_text)
# convert to lower case; tolower() can error on malformed characters,
# so wrap it in tryCatch and return NA on failure
try.error = function(x) tryCatch(tolower(x), error = function(e) NA)
sample_text = sapply(sample_text, try.error)
# remove the NAs introduced above
sample_text = sample_text[!is.na(sample_text)]
names(sample_text) = NULL
Figure-2. Code for data cleansing
R has a function called ‘tolower()’, which makes all the characters in a string
lower case. This is helpful for term aggregation but can be harmful if you are
trying to identify proper nouns like cities. Removing punctuation from the text is
especially useful for data from social media, but can be an issue if the need is to
identify emoticons made of punctuation. Depending on the analysis, numbers should
be removed; we are not trying to deduce currencies or text-mine quantities, so the
function was used, since numbers hold no value in this analysis.
The ‘stripWhitespace()’ function was very useful in removing extra tabs, lines and
unnecessary white space in the text, especially after the remove-punctuation and
remove-numbers functions. A very important function from the tm package is
‘removeWords()’: articles and other words which hold no interest for the analysis
were removed with it. All of these functions were applied to the corpus using the
tm_map() function.
Text Exploration

The flow chart below shows an overview of a typical data analysis project. Each
rectangle represents data in a certain state, while each arrow represents the
activities needed to get from one state to the other. The first state (raw data) is
the data as it comes in. Raw data files may lack headers, contain wrong data types
(e.g. numbers stored as strings), wrong category labels, unknown or unexpected
character encodings and so on. In short, reading such files into an R data.frame
directly is either difficult or impossible without some sort of pre-processing.
Once this pre-processing has taken place, data can be deemed technically correct:
in this state data can be read into an R data.frame, with correct names, types and
labels, without further trouble. However, that does not mean that the values are
error-free or complete. For example, an age variable may be reported negative, an
under-aged person may be registered to possess a driver's licence, or data may
simply be missing. Such inconsistencies obviously depend on the subject matter.

Raw data -> (type checking, normalizing) -> Technically correct data ->
(fix & impute) -> Consistent data -> (estimate & analyse) -> Statistical results ->
(tabulate & plot) -> Formatted output

Figure-3. Steps for data text exploration


EXPLORATORY ANALYSIS

With the help of Tableau:

The negative reviews seem to be directed more towards poor service than poor food.
Ice creams and desserts are reviewed far more in Hyderabad than in Bangalore,
whereas Italian cuisine has much higher ratings and more reviews in Bangalore than
in Hyderabad.
Reviewers have been split into four types based on the number of reviews they have
submitted and their regularity; there is no clear bias towards a certain food type,
as all levels of foodies have submitted reviews for all types of cuisines.
Eyeballing the data tells us that people who think a restaurant is great, or who
really loved a restaurant, are more likely to write a review, which supports the
adage that a disappointed customer walks away quietly and never returns.

Figure.4 – Review Polarity based on review text heading

Figure.5 – Snippet for cuisine vs foodie level


For the Bangalore and Hyderabad review analysis (in our data set) we found that
most of the reviews are shared between Chinese/Thai and Italian food. Also, Italian
food is the most popular among all the cuisines. The figure below shows the
popularity of the cuisines among the reviewers in the data set.

Word Cloud formation from the review head and review text:

Figure.6 – Word cloud showing the most frequent words used in review headers

Figure.7 – Word cloud showing the most frequent words used in review text
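A small sketch of how such word clouds can be generated in Python is below; the
third-party wordcloud package used here is an assumption, as the report does not
name its tool.

import matplotlib.pyplot as plt
from wordcloud import WordCloud

# Toy stand-in for the concatenated review headers.
headers = "good food awesome ambience worst service great place good food"
cloud = WordCloud(background_color="white").generate(headers)

plt.imshow(cloud, interpolation="bilinear")
plt.axis("off")
plt.show()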

Clustering and Classification Algorithms used:

Hierarchical Clustering
What is the hierarchical clustering algorithm? Each of the n observations is a data
point, and each data point starts as its own cluster. The distance matrix is
computed, i.e. the distance of each data point from all other data points. The two
closest data points are merged, creating a cluster and leaving n-1 clusters. From
the remaining clusters, the two closest are again merged, leaving n-2 clusters, and
this is repeated until a single cluster remains. Because the distance between two
data points is easier to calculate than the distance between two clusters, this
agglomerative procedure is the most commonly used hierarchical clustering method.
The steps and associated terms concern how to merge observations and clusters, and
how to calculate the distance between observations/clusters. After calculating the
distance of each data point to every other, the obvious case is first taken care
of: the distance of a data point to itself is zero, so that is marked and noted.
The Euclidean distance is used to calculate the distance between two data points
and is the most widely used measure (the Euclidean distance, or Euclidean metric,
is the "ordinary" straight-line distance between two points in Euclidean space;
with this distance, Euclidean space becomes a metric space, and the associated norm
is called the Euclidean norm). Here the differences between two data points are
taken and squared, the squares are summed, and the square root of the sum is taken:
d(x, y) = sqrt((x1-y1)^2 + ... + (xk-yk)^2). A Euclidean distance is simply a
straight-line distance.

A few rules of the algorithm are:

- The distance between two distinct points is always greater than 0.
- The distance of a data point to itself is zero.
- The distance between two points does not depend on direction; d(a, b) and d(b, a)
are always equal.
- The straight-line distance is always smaller than going to any other point and
then reaching the destination point (unless the third point lies on the straight
line itself).
Calculating distance between clusters: the most commonly used method of calculating
the distance between two clusters is the ‘centroid’ method. In the centroid method,
the distance between two clusters is the distance between the two mean vectors of
the clusters. At each stage of the process we combine the two clusters that have
the smallest centroid distance. The mean of each cluster is calculated: for each
variable the average is calculated, i.e. if a cluster has n observations where each
observation is described by k dimensions x1, x2, x3, ..., xk, the average of each
dimension is calculated. The centroid then describes the whole cluster, and
calculating the distance between clusters becomes easy. The two nearest clusters
are then merged.
How to decide on the optimal number of clusters?
Scree plot: a scree plot displays the eigenvalues associated with a component or
factor, in descending order, versus the number of the component or factor. You can
use scree plots in principal components analysis and factor analysis to visually
assess which components or factors explain most of the variability in the data.
The ideal pattern in a scree plot is a steep curve, followed by a bend and then a
flat or horizontal line; retain those components or factors in the steep curve
before the first point that starts the flat-line trend. You might have difficulty
interpreting a scree plot, so use your knowledge of the data and the results from
other approaches to selecting components or factors to help decide the number of
important ones. For our situation, we plot the number of clusters on the x-axis and
RMSSTD (root-mean-square standard deviation) on the y-axis, which shows how the
within-cluster standard deviation increases. The whole purpose of clustering is to
create groups which are as homogeneous within and as heterogeneous across as
possible, so we look for the point on the graph up to which merging does not change
the within-cluster deviation much; when the deviation increases significantly, it
shows that we are trying to merge clusters which are not homogeneous within. The
point on the scree plot where RMSSTD starts increasing sharply is therefore the
optimal place to stop.

Dendrogram: hierarchical clustering produces a set of nested clusters organised as
a hierarchical tree, which can be viewed as a dendrogram. The dendrogram explains
why this is called hierarchical clustering: it is the tree-like structure which
gives the sequence of merging, telling us how the clusters developed, in what order
and at what distances, and giving all the possible clusters and their relative
distances.

Figure 8. Dendrogram of the most frequent words used in review headings


When not to use hierarchical clustering: it is not feasible for large data sets and
is usually used on data sets of fewer than about 100 observations. Where the
dataset is large, a non-hierarchical method such as k-means clustering is used.
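A minimal sketch (not the project's code) of hierarchical clustering with the
centroid method and Euclidean distance, on a toy term matrix built from three
reviews:

import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import dendrogram, linkage
from sklearn.feature_extraction.text import CountVectorizer

reviews = ["food was great", "service was slow", "great ambience and food"]  # toy data
X = CountVectorizer().fit_transform(reviews).toarray()

# Centroid linkage with Euclidean distance, as described above.
Z = linkage(X, method="centroid")
dendrogram(Z, labels=["r1", "r2", "r3"])
plt.show()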

K Means Clustering

k-means clustering is a method of vector quantization, originally from signal
processing, that aims to partition n observations into k clusters in which each
observation belongs to the cluster with the nearest mean, which serves as a
prototype of the cluster. This results in a partitioning of the data space into
Voronoi cells.

k-means is one of the simplest unsupervised learning algorithms that solve the
well-known clustering problem. The procedure follows a simple and easy way to
classify a given data set through a certain number of clusters (assume k clusters)
fixed a priori. The main idea is to define k centres, one for each cluster. These
centres should be placed cunningly, because different locations cause different
results; the better choice is to place them as far away from each other as
possible. The next step is to take each point belonging to the data set and
associate it with the nearest centre. When no point is pending, the first step is
completed and an early grouping is done. At this point we need to re-calculate k
new centroids as the barycentres of the clusters resulting from the previous step.
After we have these k new centroids, a new binding has to be done between the same
data set points and the nearest new centre. A loop has been generated; as a result
of this loop, the k centres change their location step by step until no more
changes are made, in other words until the centres do not move any more.

Algorithmic steps for k-means clustering

Let X = {x1, x2, x3, ..., xn} be the set of data points and V = {v1, v2, ..., vc}
be the set of centres.
1) Randomly select ‘c’ cluster centres.
2) Calculate the distance between each data point and the cluster centres.
3) Assign each data point to the cluster centre whose distance from it is the
minimum over all the cluster centres.
4) Recalculate each new cluster centre as the mean of its assigned points:
   v_i = (1 / c_i) * sum_{j=1..c_i} x_j
where ‘c_i’ represents the number of data points in the ith cluster.
5) Recalculate the distance between each data point and the newly obtained cluster
centres.
6) If no data point was reassigned then stop; otherwise repeat from step 3).
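A sketch of these steps using scikit-learn's KMeans on a toy TF-IDF matrix; the
within-cluster sum of squares (inertia) collected below is what a scree-style plot
would display when choosing k.

from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

reviews = ["food was great", "service was slow", "great ambience", "slow billing"]  # toy data
X = TfidfVectorizer().fit_transform(reviews)

# Within-cluster sum of squares for each k, for a scree-style plot.
inertias = [KMeans(n_clusters=k, n_init=10, random_state=42).fit(X).inertia_
            for k in range(1, 4)]

# Final clustering with the chosen k.
labels = KMeans(n_clusters=2, n_init=10, random_state=42).fit_predict(X)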

Naïve Bayes:

In simple terms, a naive Bayes classifier assumes that the presence (or absence) of
a particular feature of a class is unrelated to the presence (or absence) of any
other feature, given the class variable. It's called naive because it makes the
assumption that all attributes are independent of each other.
Naïve Bayes has been studied since the 1950s, and is still a very popular method for
text categorization. It is a simple technique for constructing classifiers: models
that assign class labels to problem instances, represented as vectors
of feature values, where the class labels are drawn from some finite set. It is not
a single algorithm for training such classifiers, but a family of algorithms based
on a common principle: all naive Bayes classifiers assume that the value of a
particular feature is independent of the value of any other feature, given the
class variable. For example, a fruit may be considered to be an apple if it is red,
round, and about 10 cm in diameter. A naive Bayes classifier considers each of
these features to contribute independently to the probability that this fruit is an
apple, regardless of any possible correlations between the colour, roundness, and
diameter features.
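In symbols (a standard formulation, not taken from the report), the independence
assumption lets the classifier factor the class-conditional probability into
per-feature terms, and the predicted class is the one maximising the product:

P(c \mid x_1, \ldots, x_n) \propto P(c) \prod_{i=1}^{n} P(x_i \mid c)

\hat{c} = \arg\max_{c} P(c) \prod_{i=1}^{n} P(x_i \mid c)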
For some types of probability models, naive Bayes classifiers can be trained very
efficiently in a supervised learning setting. In many practical applications,
parameter estimation for naive Bayes models uses the method of maximum likelihood;
in other words, one can work with the naive Bayes model without accepting Bayesian
probability or using any Bayesian methods.
Despite their naive design and apparently oversimplified assumptions, naive Bayes
classifiers have worked quite well in many complex real-world situations. In 2004,
an analysis of the Bayesian classification problem showed that there are sound
theoretical reasons for the apparently implausible efficacy of naive Bayes
classifiers. Still, a comprehensive comparison with other classification algorithms
in 2006 showed that Bayes classification is outperformed by other approaches, such
as boosted trees or random forests.
Support Vector Machine

A “Support Vector Machine” (SVM) is a supervised machine learning algorithm which
can be used for both classification and regression problems. However, it is mostly
used in classification problems. In this algorithm, we plot each data item as a
point in n-dimensional space (where n is the number of features) with the value of
each feature being the value of a particular coordinate. Then, we perform
classification by finding the hyper-plane that differentiates the two classes well.

Support vectors are simply the coordinates of individual observations, and the
Support Vector Machine finds the frontier (hyper-plane/line) which best segregates
the two classes.
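A toy sketch of the idea: a linear SVM fit on two well-separated groups of 2-D
points, then used to classify new points (expected output [0 1]).

from sklearn.svm import LinearSVC

X = [[1, 2], [2, 3], [8, 8], [9, 7]]  # two well-separated groups
y = [0, 0, 1, 1]

clf = LinearSVC(C=1.0).fit(X, y)      # finds the separating hyper-plane
print(clf.predict([[2, 2], [8, 9]]))  # expected: [0 1]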
Predicting polarity of the review comments using Naïve Bayes and SVM

Classification of Review Texts

To classify reviews into different classes (positive and negative) we build our
classifier using a Python library called NLTK. NLTK is a very powerful and useful
library in Python which provides many classification algorithms; it also includes
tools for classification, clustering, regression and visualization. To install NLTK
we simply use one command in Python: ‘pip install nltk’.
In order to build our classifier, we use two built-in classifiers which come with
the NLTK library:
- Naïve Bayes classifier
- Support Vector Machine (SVM)
We use two classifiers so that we can get more reliable output. To use these
classifiers, we write a script in Python in which we first import the classifier
and then pass the training set to each classifier.

Feature Extraction
As mentioned, training and test data have been collected from the Zomato API. We
have 4000 restaurant reviews from two cities, Bengaluru and Hyderabad. We divided
the reviews into positive and negative based on the ratings given by the users:
reviews associated with a rating above 3 have been taken as positive reviews and
reviews associated with a rating below 3 have been taken as negative reviews. The
training data set has 750 positive reviews and 750 negative reviews; the remaining
reviews have been used as the test data set.
Both the training and test data must be represented in the same way for learning.
One of the ways that data can be represented is feature-based: some attributes that
are thought to capture the pattern of the data are first selected, and the entire
dataset must be represented in terms of them before it is fed to a machine learning
algorithm. Different features such as n-gram presence or n-gram frequency, POS
(part-of-speech) tags, syntactic features, or semantic features can be used. For
example, one can use keyword lexicons as features; the dataset can then be
represented by these features using either their presence or their frequency.
The feature vector plays a very important role in classification and helps to
determine the behaviour of the built classifier. The feature vector also helps in
predicting unknown data samples. There are many types of feature vectors, but in
this project we used the unigram and the bigram approach. Each review's words have
been added to generate the feature vectors; the presence/absence of sentiment words
helps to indicate the polarity of the sentences. We created a Python script to
extract the features from the training data. A code snippet for extracting features
is shown in Figure 9.

Figure 9. Code for extracting features from review text
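Since the original code figure is an image, below is a plausible reconstruction of
unigram and bigram feature extraction for NLTK; the toy labelled_reviews list is
illustrative, standing in for the real training data.

from nltk import bigrams, word_tokenize  # requires a one-time nltk.download("punkt")

labelled_reviews = [("The food was great", "pos"),
                    ("Service was terrible", "neg")]  # toy stand-in

def unigram_features(text):
    # Presence/absence of each word.
    return {word: True for word in word_tokenize(text.lower())}

def bigram_features(text):
    # Unigram presence plus bigram presence.
    tokens = word_tokenize(text.lower())
    feats = {word: True for word in tokens}
    feats.update({" ".join(bg): True for bg in bigrams(tokens)})
    return feats

train_set = [(unigram_features(text), label) for text, label in labelled_reviews]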

Once we extracted the features from the training data, they were passed to our
classifiers. A script written in Python was used to pass training sets to each
classifier. Once a classifier is trained, we can also check its accuracy by passing
it the testing set. A sample script for training and testing a classifier is shown
in Figure 10.
Figure 10. Sample code for training classifier
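Likewise, a hedged reconstruction of the training and testing script, with
train_set and test_set assumed to be lists of (feature_dict, label) pairs built as
sketched above; the SVM is wrapped through NLTK's SklearnClassifier.

from nltk.classify import NaiveBayesClassifier, accuracy
from nltk.classify.scikitlearn import SklearnClassifier
from sklearn.svm import LinearSVC

# Train both classifiers on the same feature sets.
nb_clf = NaiveBayesClassifier.train(train_set)
svm_clf = SklearnClassifier(LinearSVC()).train(train_set)

# Check accuracy on the held-out test set.
print("Naive Bayes accuracy:", accuracy(nb_clf, test_set))
print("SVM accuracy:", accuracy(svm_clf, test_set))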
The evaluation of the model is done using cross-validation. For cross-validation,
the positive and negative features extracted from the reviews are first combined
and then randomly shuffled. This is done mainly because, in cross-validation, if
the shuffling is not done then the test sets might contain only negative or only
positive review text data. To build up test sets with a fairly random distribution
of both positive and negative features, the set is shuffled 5 times. The code below
indicates the folds; n = 5 means 5-fold cross-validation.

Figure 11. Sample code for n-fold validation
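A sketch of the shuffling and fold slicing described above, as a simple manual
n-fold loop; feature_sets is assumed to be the combined list of (features, label)
pairs.

import random
from nltk.classify import NaiveBayesClassifier, accuracy

def cross_validate(feature_sets, n=5):
    # Shuffle so that no fold contains only positive or only negative reviews.
    random.shuffle(feature_sets)
    fold = len(feature_sets) // n
    scores = []
    for i in range(n):
        test_fold = feature_sets[i * fold:(i + 1) * fold]
        train_folds = feature_sets[:i * fold] + feature_sets[(i + 1) * fold:]
        clf = NaiveBayesClassifier.train(train_folds)
        scores.append(accuracy(clf, test_fold))
    return sum(scores) / n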


Below is a description of the different sets of training data.
1. Single-fold training data with single-word extraction, with and without removal
of stop words: this training data set contains a mix of positive and negative
reviews with single-word extraction, randomly shuffled only once. The training set
is again split into one sample in which no stop words are removed and another
sample with stop words removed.
2. Single-fold training data with bigram feature extraction, with and without
removal of stop words: this training data set contains a mix of positive and
negative reviews with bigram extraction, randomly shuffled only once. The training
set is again split into one sample in which no stop words are removed and another
sample with stop words removed.
3. N-fold training data with single-word feature extraction, with and without
removal of stop words: this training data set contains a mix of positive and
negative reviews with single-word extraction, randomly shuffled five times. The
training set is again split into one sample in which no stop words are removed and
another sample with stop words removed.
4. N-fold training data with bigram feature extraction, with and without removal
of stop words: this training data set contains a mix of positive and negative
reviews with bigram extraction, randomly shuffled five times. The training set is
again split into one sample in which no stop words are removed and another sample
with stop words removed.
Results and Analysis
In this section, we present the various results achieved in our implementation. The
table below shows the results on the test data, for models trained with the
different feature sets described above.

Feature set / metric      Single fold (NB)   Single fold (SVM)   N-fold CV (NB)   N-fold CV (SVM)

unigram
  accuracy                0.787375415        0.873754153         0.88981289       0.874012474
  precision               0.797302381        0.849561025         0.867899726      0.856430726
  recall                  0.843242207        0.872317322         0.891914914      0.849358353
  f-measure               0.781827242        0.858784893         0.877009618      0.852207067

unigram with stop words
  accuracy                0.852159468        0.877076412         0.906860707      0.88981289
  precision               0.839616773        0.853890082         0.889465812      0.878829495
  recall                  0.890572305        0.870490547         0.899685505      0.862341099
  f-measure               0.844331002        0.861129469         0.893929583      0.869597673

bigram
  accuracy                0.810631229        0.893687707         0.887318087      0.895218295
  precision               0.807653241        0.875606428         0.864492936      0.890466027
  recall                  0.855978538        0.879790495         0.897148814      0.862744523
  f-measure               0.803328862        0.877642276         0.875646631      0.87425202

bigram with stop words
  accuracy                0.828903654        0.895348837         0.900207900      0.893555093
  precision               0.820006641        0.880854236         0.877991014      0.888068313
  recall                  0.869328053        0.875332141         0.905064000      0.860037066
  f-measure               0.820934229        0.878002413         0.888529565      0.871886306

Figure 12. Results from classifiers on test data

Definition of Terms used in the results tables

Classifier Accuracy
Classifier accuracy, or recognition rate, is the percentage of test set tuples that
are correctly classified:
Accuracy = (TP + TN) / All
Classifier Precision
Precision measures the exactness of a classifier: Precision = TP / (TP + FP).
Higher precision means fewer false positives, while lower precision means more.
Precision is often at odds with recall, as an easy way to improve precision is to
decrease recall.
Classifier Recall
Recall measures the completeness, or sensitivity, of a classifier:
Recall = TP / (TP + FN). Higher recall means fewer false negatives, while lower
recall means more false negatives. Improving recall can often decrease precision,
because it gets increasingly harder to be precise as the sample space increases.
F-measure Metric
Precision and recall can be combined into a single metric known as the F-measure,
which is the weighted harmonic mean of precision and recall:
F = 2 * Precision * Recall / (Precision + Recall). It is a convenient summary, but
on its own it is less informative than looking at precision and recall separately.
Measuring Precision and Recall of a Naive Bayes Classifier
The NLTK package has a metrics module which provides functions for calculating all
three metrics mentioned above. But to do so, we need to build two sets for each
classification label: a reference set of correct values and a test set of observed
values. We collected reference values and observed values for each label (positive
or negative), then used those sets to calculate the precision, recall, and
F-measure of the classifier. The values collected are simply the indices of each
feature set, obtained using enumerate.
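A sketch of this procedure, assuming classifier and test_set from the earlier
snippets, and "pos"/"neg" as the label names; the reference and observed sets are
keyed by label and filled using enumerate.

import collections
from nltk.metrics import precision, recall, f_measure

refsets = collections.defaultdict(set)   # reference set of correct values
testsets = collections.defaultdict(set)  # test set of observed values

for i, (feats, label) in enumerate(test_set):    # test_set as built earlier
    refsets[label].add(i)
    testsets[classifier.classify(feats)].add(i)  # classifier: trained NB or SVM

print("pos precision:", precision(refsets["pos"], testsets["pos"]))
print("pos recall:", recall(refsets["pos"], testsets["pos"]))
print("pos F-measure:", f_measure(refsets["pos"], testsets["pos"]))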
Inference from Results
From the results table given above, we can see that accuracy is quite good for all
the classifiers and gradually increases as more features are introduced into the
training set used to train the model.
Precision and recall are also on the higher side, which means we are able to
predict most of the true positive and true negative values with fewer false
positives and false negatives; this is quite valuable when classifying a review
comment as positive or negative. The precision and recall values also vary as the
models are tested against the different training sets.
The SVM classifier has shown better results than the Naïve Bayes classifier in most
of the scenarios, possibly because the features extracted in the training set are
better suited to the SVM model than to Naïve Bayes.

Aspect Based Analysis of Review text

RATED 4.0 — [Reviewer name withheld], 45 Reviews, 110 Followers

"Visited today. Barbeque nation is a usual place. I have visited several branches
all over India.. the food was as usual good. We ordered cocktails and Mocktails,
which was good either.

We had a very unusual problem today. App in my mobile was not at all working. It
was updated to latest version in iOS. Then o tried calling their toll free number
to reserve. It is connecting with only Gujarat BBQ nation where I lived some years.
I tried hard to get the t nagar branch number to do the booking. I guess the
technological transformation of your system had done serious flaws which you need
to address immediately for better experience. Hope you rectify it soon. There is no
complaint against food, it was great as usual. NO complains about Service.

So I'm reducing the rating for the booking experience not for the food."

(In the original snapshot, sentences are colour-coded as Positive, Neutral or
Negative.)

Figure 13. Example of review text used in Aspect based Analysis

Consider the typical restaurant review shown in the snapshot above. This review
discusses multiple aspects of the restaurant, such as the food, the booking
experience and the service, but the reviewer gives only an overall rating for the
restaurant; without an explicit rating on each aspect, a user would not easily know
the reviewer's opinion of each aspect. Even though the reviewer rates the food as
great, he was unhappy with the booking experience that day, and because of that the
overall rating turned out lower.
From such sentences, we can extract different aspects and get more insights. For
example, a reviewer might like conveniences like valet parking but not necessarily
be price-conscious, and users tend to express such views in the comments. So, it is
very important to conduct a text-based analysis of the different aspects to obtain
meaningful insights.
Extracting Aspects from reviews using Boot-strapping Method:

Since this is restaurant review data, we assume that only keywords are required to
describe the specific aspects. We have referred to the boot-strapping method from
the paper below:
https://pdfs.semanticscholar.org/6ff5/05e63ffebf419736d6c65741ee63b3ea720e.pdf

Algorithm: Aspect Segmentation

Input: A collection of reviews {d1, d2, ..., d|D|}, a set of aspect keyword lists
{T1, T2, ..., Tk}, a vocabulary V, a selection threshold p and an iteration step
limit I.
Output: Reviews split into sentences with aspect assignments.

Step 0: Split all reviews into sentences, X = {x1, x2, ..., xM}.
Step 1: Match the aspect keywords in each sentence of X and record the matching
hits for each aspect i in Count(i).
Step 2: Assign the sentence an aspect label by a = argmax_i Count(i); if there is a
tie, assign the sentence multiple aspects.
Step 3: Calculate the chi-square measure of each word in V.
Step 4: Rank the words under each aspect with respect to their chi-square value and
add the top p words for each aspect into its corresponding aspect keyword list Ti.
Step 5: If the aspect keyword lists are unchanged or the iteration count exceeds I,
go to Step 6; else go to Step 1.
Step 6: Output the annotated sentences with aspect assignments.
Figure 14. Algorithm for Aspect based Analysis
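A minimal sketch of Steps 0-2, the keyword-matching core of the algorithm, with
illustrative seed keyword lists (the project's actual lists are shown in the
snippet below):

from nltk import sent_tokenize, word_tokenize  # requires nltk.download("punkt")

aspect_keywords = {                            # illustrative seed lists
    "food": {"food", "taste", "dish", "menu"},
    "service": {"service", "staff", "waiter"},
    "ambience": {"ambience", "decor", "music"},
}

def label_sentences(review_text):
    labelled = []
    for sentence in sent_tokenize(review_text):
        words = set(word_tokenize(sentence.lower()))
        # Count keyword hits for each aspect.
        counts = {a: len(words & kw) for a, kw in aspect_keywords.items()}
        best = max(counts.values())
        # argmax over aspects; ties yield multiple aspect labels.
        aspects = [a for a, c in counts.items() if c == best and best > 0]
        labelled.append((sentence, aspects))
    return labelled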

Task 1: We ran the above algorithm on our dataset with a defined set of aspect
keywords against the review text; please refer to the screenshot below. We
considered food, ambience and service as the main aspects. Extracting the ‘food’
aspect was a challenge: since most reviewers quote the name of a dish in their
review text, we would have needed the menus of all 110 restaurants in our analysis,
which was not feasible. We worked around this by including a sample menu only for
Italian cuisine in the ‘food’ aspect.

Figure 15. Snippet of Aspects used for Aspect based Analysis


Task 2: Calculate the dependency between aspects and words using the chi-square
statistic, and include words with high dependency in the corresponding aspect
keyword list. With the counts defined below, the statistic is

chi2(w, Ai) = C * (C1*C4 - C2*C3)^2 / ((C1 + C3) * (C2 + C4) * (C1 + C2) * (C3 + C4))

C1: the number of times w occurs in sentences belonging to aspect Ai
C2: the number of times w occurs in sentences not belonging to Ai
C3: the number of sentences of aspect Ai that do not contain w
C4: the number of sentences that neither belong to aspect Ai nor contain word w
C: the total number of word occurrences
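The statistic is straightforward to compute once the counts are available; a small
helper:

def chi_square(c1, c2, c3, c4, c_total):
    # Chi-square dependency of word w on aspect Ai, from the counts above.
    num = c_total * (c1 * c4 - c2 * c3) ** 2
    den = (c1 + c3) * (c2 + c4) * (c1 + c2) * (c3 + c4)
    return num / den if den else 0.0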
Task 3: Coding the algorithm. Please find below the code snippet used for
extracting the aspects based on keywords.

Figure 16. Snippet of the code applying the algorithm for Aspect based Analysis
After running the code above we get a .csv file containing the aspects and the
review sentences that fit each aspect, based on the feature words and the weights
passed to the chi-square test. A review text can be mapped not only to one aspect
but also to two or more aspects, based on the feature words used in the review
comments and how close each sentence is to a particular aspect.

Figure 17. Snippet of the resulting .csv file with sentences mapped to different
aspects

Task 4: Find the most important sentences about a particular aspect using LexRank.
To achieve this, we used an R code snippet following these steps:
Step 1: Clean the data set.
Step 2: Convert the corpus into a term-document matrix.
Definition: a document-term matrix or term-document matrix is a mathematical matrix
that describes the frequency of terms that occur in a collection of documents; in a
document-term matrix, rows correspond to documents in the collection and columns
correspond to terms.
Step 4: Calculate the weight of each term using weightTfIdf {tm}.
Definition: weight a term-document matrix by term frequency - inverse document
frequency.
Step 5: Compute the similarity between sentences and store it in an n x n matrix d.
Step 6: Compute PageRank recursively using the page rank function and display the
top-5 sentences with the highest PageRank.
Definition: calculates the Google PageRank for the specified vertices.

Figure 18. Snippet of the R code used for LexRanking
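The R snippet itself is an image; a rough Python equivalent of the same pipeline
(TF-IDF weighting, a sentence-similarity matrix, then PageRank via networkx) is
sketched below.

import networkx as nx
from sklearn.feature_extraction.text import TfidfVectorizer

def top_sentences(sentences, k=5):
    tfidf = TfidfVectorizer().fit_transform(sentences)  # term weighting (Steps 1-4)
    sim = (tfidf * tfidf.T).toarray()                   # sentence-similarity matrix (Step 5)
    scores = nx.pagerank(nx.from_numpy_array(sim))      # PageRank over the graph (Step 6)
    ranked = sorted(scores, key=scores.get, reverse=True)
    return [sentences[i] for i in ranked[:k]]

print(top_sentences(["Food was great.", "Service was slow.", "Great food overall."], k=2))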

Output:

Figure 19. Top 5 sentences in the output of the LexRank algorithm

Recommendation and Applications

Recommendation:

By extracting aspects from each review sentence, we can calculate the weight of
each aspect against the reviewer's rating for that restaurant, to score the overall
sentiment for the restaurant across different aspects like ambience, food and
service.
To calculate the weights we can use the Latent Rating Regression Model, and the
same model can predict the overall weight based on word frequency with given
pre-defined keywords.

Applications:

Aspect-level analysis for all restaurants
Given the importance of aspect extraction discussed above, we can use the same
approach to get a detailed sentiment of restaurants for different aspects. This
will help restaurants understand consumers' behaviour and expectations across
different categories.
Reviewer-aspect level
By getting insights about the different aspects each reviewer cares about, we can
use the same approach to target different clusters of customers.
Competitor prediction
While reviewing restaurants, some users tend to compare the menu and experience
with other restaurants. This can be used as a major input to understand the
competition better.
Conclusion:

In this project we extracted the overall sentiment score for each review text and
measured the sentiment with Naïve Bayes and SVM. We extracted the aspects from each
review based on pre-defined keywords. This project gave us great hands-on
experience of the process of text mining and an understanding of the basics behind
NLP. This approach can be applied to other domains as well, and we are very eager
to explore it further on bigger data sets and take the application live.

Bibliography:

1. M. Hu and B. Liu, "Mining and summarizing customer reviews", in W. Kim, R.
Kohavi, J. Gehrke, and W. DuMouchel, editors, Proc. KDD, pages 168-177, ACM, 2004.
2. K. Jarvelin and J. Kekalainen, "IR evaluation methods for retrieving highly
relevant documents", in Proceedings of SIGIR'00, pages 41-48, ACM, 2000.
3. N. Jindal and B. Liu, "Identifying comparative sentences in text documents", in
Proceedings of SIGIR'06, pages 244-251, New York, NY, USA, ACM, 2006.
4. H. Kim and C. Zhai, "Generating Comparative Summaries of Contradictory Opinions
in Text", in Proceedings of CIKM'09, pages 385-394, 2009.
5. S. Kim and E. Hovy, "Determining the sentiment of opinions", in Proceedings of
COLING, volume 4, pages 1367-1373, 2004.
6. H. Zhang, "The Optimality of Naive Bayes", Proc. FLAIRS, 2004.
7. C.D. Manning, P. Raghavan and H. Schütze, "Introduction to Information
Retrieval", Cambridge University Press, pp. 234-265, 2008.
8. A. McCallum and K. Nigam, "A comparison of event models for Naive Bayes text
classification", Proc. AAAI/ICML-98 Workshop on Learning for Text Categorization,
pp. 41-48, 1998.
9. "Support Vector Machines",
http://scikitlearn.org/stable/modules/svm.html#svm-classification
10. B. Pang, L. Lee and S. Vaithyanathan, "Thumbs up? Sentiment classification
using machine learning techniques", Proc. ACL-02 Conference on Empirical Methods in
Natural Language Processing, vol. 10, pp. 79-86, 2002.
11. B. Pang and L. Lee, "Opinion Mining and Sentiment Analysis", Foundations and
Trends in Information Retrieval, vol. 2(1-2), pp. 1-135, 2008.
12. https://en.wikipedia.org/wiki
13. https://www.zomato.com/about
