
Restaurant Rating Prediction

A Report Submitted
in Partial Fulfillment of the Requirements
for the Degree of
Bachelor of Technology
in
Computer Science & Engineering

by
Shivam Anand, Satyam Singhal, Somesh Gupta, Uddesh Singh
Group: CS-17

to the
COMPUTER SCIENCE AND ENGINEERING DEPARTMENT
MOTILAL NEHRU NATIONAL INSTITUTE OF TECHNOLOGY
ALLAHABAD, PRAYAGRAJ, INDIA
May, 2019
UNDERTAKING

I declare that the work presented in this report titled
"Restaurant Rating Prediction", submitted to the
Computer Science and Engineering Department, Motilal
Nehru National Institute of Technology, Allahabad, for
the award of the Bachelor of Technology degree in
Computer Science & Engineering, is my original work. I
have not plagiarized or submitted the same work for the award
of any other degree. In case this undertaking is found incorrect,
I accept that my degree may be unconditionally withdrawn.

May, 2019
Allahabad
Shivam Anand (20164050)

Satyam Singhal (20164047)

Somesh Gupta (20164161)

Uddesh Singh (20164141)

CERTIFICATE

Certified that the work contained in the report titled "Restaurant
Rating Prediction", by Shivam Anand, Satyam Singhal,
Somesh Gupta, Uddesh Singh, has been carried out under my
supervision and that this work has not been submitted elsewhere
for a degree.

(Dr. Dinesh Singh)


Computer Science and Engineering Dept.
M.N.N.I.T. Allahabad

May, 2019

Preface

The well-being of many businesses today relies heavily on the positive ratings given
by their customers. Our project enables any company or individual to know in
advance how successful their future restaurant is likely to be: they provide the
facilities they plan to offer as input and receive the rating their customers are likely
to give them in the future. This lets the company identify in advance which features
would lead to the failure of their restaurant and which would help it flourish.

Acknowledgment

First and foremost, we would like to take this opportunity to express our sincerest
gratitude to our mentor, Dr. Dinesh Singh, for his continuous support and constant
guidance; without his motivation, words of criticism and patience with our failures,
this work would not stand as it does today. This project grew out of his helpful
discussions, supervision and meticulous attention to detail, and we extend our
heartfelt gratitude for his support and guidance. We would also like to thank our
colleagues and seniors for supporting us and enabling us to take a different approach
to our project, and to express our deepest gratitude to our parents and family, whose
love and support have always been our greatest rock. Lastly, we thank the Almighty
God for everything and pray for His continued blessings.

Abstract

Yelp has played a crucial role in influencing business success, as it provides customers
with public information on the overall quality of businesses. Using the Yelp open
dataset from the Yelp Dataset Challenge, we extracted restaurant attributes, along
with unigrams and bigrams from reviews, to use as features for classification and
regression to predict the star rating of restaurants. The algorithms used for prediction
were linear regression, SVR, SVM, and perceptron neural networks. Analysis on the
test set shows that neural networks and SVM performed best, with classification
accuracies of 48% and 42% respectively, which is about 4 times better than random
guessing, as we are dealing with a 9-class classification problem. We found that
textual features, which include unigrams and bigrams, achieved lower classification
error during prediction than restaurant attributes.

Contents

Preface

Acknowledgment

Abstract

1 Introduction

2 Related Work

3 Libraries Used
    3.1 Pandas
    3.2 Numpy
    3.3 Scikit-Learn

4 Proposed Work
    4.1 Dataset
        4.1.1 Based On Restaurant Attributes
        4.1.2 Based On Restaurant Reviews
    4.2 Preprocessing
        4.2.1 Based On Restaurant Attributes
        4.2.2 Based On Restaurant Reviews
    4.3 Features
        4.3.1 Based On Restaurant Attributes
        4.3.2 Based On Restaurant Reviews
    4.4 How to fit data in the model for prediction
        4.4.1 Train-Test Split
        4.4.2 Cross Validation
    4.5 Models
        4.5.1 Regression
        4.5.2 Classifier
    4.6 Hyperparameter
        4.6.1 Based on Restaurant Attributes
        4.6.2 Based on Restaurant Reviews

5 Result and Discussion
    5.1 Result
        5.1.1 Based on Restaurant Attributes
        5.1.2 Based on Restaurant Reviews
    5.2 Discussion

6 Conclusion and Future Work
    6.1 Conclusion
    6.2 Future Work

References
Chapter 1

Introduction

The well-being of many businesses today relies heavily on the positive ratings given
by their customers. With the founding of Yelp in 2004, the relationship between
businesses and their customers has become more dynamic. Many businesses, for
example, offer special deals for visitors using Yelp, and previous visitors offer valuable
advice for future customers based on their experience, such as recommendations and
warnings on what to purchase. In this project, we will utilize the Yelp public dataset
to analyze the success of restaurants. In particular, we will predict the star ratings
of restaurants and find the most useful traits in determining their success. This task
is important as it will allow new businesses with limited customer input to have
a better idea of how well they will perform in the long run. This prediction will
give restaurants the opportunity to improve their services at an earlier stage in their
business. For our input features, we will use a restaurant's characteristics (hours
open, food category, review count, etc.) and unigrams and bigrams extracted from
customer reviews. After preprocessing, these features are input to our models for
classification. The output of these algorithms is a star rating prediction for each
restaurant.

Chapter 2

Related Work

There have been many works dedicated to analyzing the success of businesses based
on the Yelp dataset.

• One interesting method focuses on extracting subtopics from Yelp reviews and
predicting a star rating for each subtopic. Using an online Latent Dirichlet
Allocation algorithm and Expectation Maximization, reviews were grouped
into topics such as service, healthiness, lunch, etc., and a rating was assigned
to each topic.

• Another interesting study predicted business ratings by adopting a latent
factor model for a business and its geographical neighbours. It was shown that
there was a weak positive correlation between a business's ratings and its
neighbors' ratings. By incorporating geographical information, the proposed
methods achieved an improved rating prediction accuracy.

• One paper attempted to predict business star ratings from business attributes
such as noise level, smoking options, price range, etc. Linear regression,
decision trees, and neural networks were used to predict the business rating.

Chapter 3

Libraries Used

3.1 Pandas
In computer programming, pandas is a software library written for the Python
programming language for data manipulation and analysis. In particular, it offers
data structures and operations for manipulating numerical tables and time series. It
is free software released under the three-clause BSD license. The name is derived from
the term "panel data", an econometrics term for data sets that include observations
over multiple time periods for the same individuals.

• DataFrame object for data manipulation with integrated indexing.

• Tools for reading and writing data between in-memory data structures and
different file formats.

• Data alignment and integrated handling of missing data.
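
As a brief sketch of these capabilities in the context of this project (the file name
business.json is a placeholder for the Yelp dump, which stores one JSON object
per line):

    import pandas as pd

    # The Yelp dataset ships as newline-delimited JSON: one business per line.
    businesses = pd.read_json("business.json", lines=True)

    print(businesses.shape)                       # (rows, columns)
    print(businesses["stars"].head())             # star ratings, integrated indexing
    print(businesses["attributes"].isna().sum())  # missing-data handling in action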

3.2 Numpy
NumPy is a library for the Python programming language, adding support for large,
multi-dimensional arrays and matrices, along with a large collection of high-level
mathematical functions to operate on these arrays. The ancestor of NumPy, Nu-
meric, was originally created by Jim Hugunin with contributions from several other
developers. In 2005, Travis Oliphant created NumPy by incorporating features of
the competing Numarray into Numeric, with extensive modifications. NumPy is
open-source software and has many contributors.

3.3 Scikit-Learn
Scikit-learn (formerly scikits.learn) is a free software machine learning library for
the Python programming language. It features various classification, regression and
clustering algorithms, including support vector machines, random forests, gradient
boosting, k-means and DBSCAN, and is designed to interoperate with the Python
numerical and scientific libraries NumPy and SciPy. Scikit-learn is largely written in
Python, with some core algorithms written in Cython to achieve performance. Support
vector machines are implemented by a Cython wrapper around LIBSVM; logistic
regression and linear support vector machines by a similar wrapper around LIBLINEAR.

Chapter 4

Proposed Work

We are working on two criteria to predict the possible ratings:

• Business attributes and finding the best attributes for success.

• Sentiment analysis on textual reviews.

4.1 Dataset

4.1.1 Based On Restaurant Attributes


The dataset used in this project came from the Yelp Dataset Challenge in the form
of JSON files. The dataset contained JSON files of various businesses, from which
the restaurants were extracted. Given each restaurant, the initial data gave us the
restaurant's attributes. In total, the data included 3164 businesses.

Note: the following examples contain inline comments, which are technically not
valid JSON. This is done here to simplify the documentation and explain the
structure; the JSON files you download will not contain any comments and will be
fully valid JSON.

Figure 1: Business.json

Figure 2: Business.json

4.1.2 Based On Restaurant Reviews

Figure 3: Reviews.json

4.2 Preprocessing

4.2.1 Based On Restaurant Attributes


• Extraction of data, city-wise:
The business categories (e.g. Restaurant, Home Cleaning) are in a list in the
'categories' column, which isn't easy to parse. By one-hot encoding them,
we can more easily filter the dataset by category and extract the businesses
falling in our desired category, i.e. Restaurant. We got 1361 restaurants from
the dataset.

• Extracting individual attributes and sub-attributes as features:
We also need to one-hot encode the business attributes and their sub-attributes,
which are currently stored as dictionaries in the 'attributes' column; this
gives 73 new columns. While most of these new columns are boolean attributes,
a number of them are categorical and need special handling: we encode them
so models can use them as features, which yields 15 further columns. Attributes
are going to form a significant part of our features (a sketch of these first two
steps appears after this list).

• Splitting check-in data into time slots to use as a feature:
The 'time' column is a dictionary; we separate the data for each hour into its
own column, which adds 169 new columns.

• Excluding attributes that exist in only a minority of establishments:
Features that appeared in less than 20% of the restaurants in the training data
were filtered out to prevent excessive unnecessary features.
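
A minimal sketch of the first two steps, assuming the businesses have already been
loaded into a DataFrame named df with the columns described in Section 4.1.1:

    import pandas as pd

    # One-hot encode the category labels (assuming they are stored as a
    # comma-separated string; explode the list first if stored as a list).
    category_dummies = df["categories"].str.get_dummies(sep=", ")
    restaurants = df[category_dummies["Restaurants"] == 1]

    # Expand each 'attributes' dictionary into its own column; rows with no
    # attributes simply become NaN.
    attribute_cols = restaurants["attributes"].apply(pd.Series)
    restaurants = pd.concat([restaurants, attribute_cols], axis=1)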

4.2.2 Based On Restaurant Reviews


• Extraction of star ratings and textual reviews:
Star ratings and textual reviews of users were extracted from the chosen data
of food establishments for the city of Cleveland.

• Grouping of data according to establishments:
All reviews for a particular establishment were weighted according to users and
their helpfulness and merged into a single field, resulting in one overall review
per establishment.

• Balancing of the dataset to remove bias:
The reviews were classified according to star rating, and the rating with the
fewest reviews was chosen as a baseline to keep the number of reviews per
rating equal.

• Vectorizing and feature creation:
Each review is then tokenized into unigrams and bigrams, followed by
vectorization and polarization of adjectives using tf-idf.

4.3 Features

4.3.1 Based On Restaurant Attributes


So we have a total of 871 columns even after preprocessing. Now we drop the
undesired columns, so as to leave only the attribute columns, which are going to be
the features for our predictive models.

We drop all the unnecessary non-integer columns, such as the business name and
the various category columns, and are left with 90 attribute columns which will serve
as features for the models. The attributes which finally serve as features are:
Index([u'BikeParking', u'BusinessAcceptsCreditCards', u'Caters'])

4.3.2 Based On Restaurant Reviews


Computers deal with numbers much better than they do with text, so we need a
meaningful way to convert all the text data into matrices of numbers.

CountVectorizer
A straightforward (and oft-used) method for doing this is to count how often words
appear in a piece of text and represent each text with an array of word frequencies.
The array would be quite large and sparse, containing one element for every possible
word.
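
A minimal sketch of this idea using scikit-learn's CountVectorizer (the two reviews
are made up for illustration):

    from sklearn.feature_extraction.text import CountVectorizer

    reviews = ["the food was great", "the service was bad"]

    vectorizer = CountVectorizer()
    counts = vectorizer.fit_transform(reviews)  # sparse word-frequency matrix

    print(vectorizer.vocabulary_)  # maps each word to its column index
    print(counts.toarray())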

TF-IDF
A slightly more sophisticated approach is to use Term Frequency-Inverse Document
Frequency (TF-IDF) vectors. This approach comes from the idea that words which
are frequent within an individual text are important to that text, but words common
across the entire dataset, such as 'the', aren't very important, while less common
words such as 'Namibia' are more important. TF-IDF therefore normalises the count
of each word in each text by the number of times that word occurs in all of the texts.
If a word occurs in nearly all of the texts, we deem it less significant; if it only
appears in a few texts, we regard it as more important.

n-grams
Words often mean very different things when we combine them in different ways. We
would expect our learning algorithm to learn that a review containing the word 'bad'
is likely to be negative, while one containing the word 'great' is likely to be positive.
However, reviews containing phrases such as 'and then they gave us a full refund.
Not bad!' or 'The food was not great' will trip up our system if it only considers
words individually.

When we break a text into n-grams, we consider several words grouped together
to be a single token. 'The food was not great' would be represented using bigrams
as (the food, food was, was not, not great), and this would allow our system to
learn that 'not great' is a typically negative statement, because it appears in many
negative reviews. For our analysis, we'll stick with single words (also called unigrams)
and bigrams (two words at a time).
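
Putting TF-IDF and n-grams together, a sketch of how the reviews could be
vectorized; ngram_range=(1, 2) keeps both unigrams and bigrams such as "not great":

    from sklearn.feature_extraction.text import TfidfVectorizer

    reviews = ["the food was not great",
               "they gave us a full refund, not bad"]  # made-up reviews

    vectorizer = TfidfVectorizer(ngram_range=(1, 2))
    X = vectorizer.fit_transform(reviews)

    print(X.shape)  # one row per review, one column per unigram/bigram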

4.4 How to fit data in the model for prediction

4.4.1 Train-Test Split


We split the total dataset into training and testing data, taking 10% of the total
dataset as testing data. scikit-learn's train_test_split splits arrays or matrices into
random train and test subsets: it is a quick utility that wraps input validation and
next(ShuffleSplit().split(X, y)) into a single call for splitting (and optionally
subsampling) data in a one-liner.
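
Concretely, assuming a feature matrix X and star-rating labels y (names used for
illustration), the 10% split is a single call:

    from sklearn.model_selection import train_test_split

    # Hold out 10% of the samples for testing, as described above.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.1, random_state=42)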

4.4.2 Cross Validation


Cross-validation is a resampling procedure used to evaluate machine learning models
on a limited data sample.

The procedure has a single parameter called k that refers to the number of groups
that a given data sample is to be split into. As such, the procedure is often called
k-fold cross-validation. When a specific value for k is chosen, it may be used in place
of k in the reference to the model, such as k=10 becoming 10-fold cross-validation.

Cross-validation is primarily used in applied machine learning to estimate the
skill of a machine learning model on unseen data: that is, to use a limited sample
in order to estimate how the model is expected to perform in general when used to
make predictions on data not used during the training of the model.

It is a popular method because it is simple to understand and because it generally
results in a less biased or less optimistic estimate of the model skill than other
methods, such as a simple train/test split.
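
For example, 10-fold cross-validation of an estimator's score is a one-liner in
scikit-learn (model, X and y are assumed names):

    from sklearn.model_selection import cross_val_score

    # k = 10: the data is split into 10 folds and each fold is used exactly
    # once for validation; 'model' is any scikit-learn estimator.
    scores = cross_val_score(model, X, y, cv=10)
    print(scores.mean(), scores.std())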

4.5 Models

4.5.1 Regression

Linear Regression
In statistics, linear regression is a linear approach to modelling the relationship
between a scalar response and one or more explanatory variables. The case of one
explanatory variable is called simple linear regression; for more than one explanatory
variable, the process is called multiple linear regression. This term is distinct
from multivariate linear regression, where multiple correlated dependent variables
are predicted, rather than a single scalar variable. In linear regression, the
relationships are modeled using linear predictor functions whose unknown model
parameters are estimated from the data. Such models are called linear models. Most
commonly, the conditional mean of the response given the values of the explanatory
variables (or predictors) is assumed to be an affine function of those values; less
commonly, the conditional median or some other quantile is used.

Explanation:
Given a data set $\{y_i, x_{i1}, \ldots, x_{ip}\}_{i=1}^{n}$ of $n$ statistical units,
a linear regression model assumes that the relationship between the dependent
variable $y$ and the $p$-vector of regressors $x$ is linear. This relationship is
modeled through a disturbance term or error variable $\varepsilon$, an unobserved
random variable that adds "noise" to the linear relationship between the dependent
variable and regressors. Thus the model takes the form
$$y_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip} + \varepsilon_i = \mathbf{x}_i^{T}\boldsymbol{\beta} + \varepsilon_i, \qquad i = 1, \ldots, n,$$
where $^{T}$ denotes the transpose, so that $\mathbf{x}_i^{T}\boldsymbol{\beta}$ is
the inner product between the vectors $\mathbf{x}_i$ and $\boldsymbol{\beta}$. Often
these $n$ equations are stacked together and written in matrix notation as
$$\mathbf{y} = X\boldsymbol{\beta} + \boldsymbol{\varepsilon},$$
where $\mathbf{y} = (y_1, \ldots, y_n)^{T}$, $X$ is the $n \times (p+1)$ matrix whose
$i$-th row is $(1, x_{i1}, \ldots, x_{ip})$, $\boldsymbol{\beta} = (\beta_0, \ldots, \beta_p)^{T}$
and $\boldsymbol{\varepsilon} = (\varepsilon_1, \ldots, \varepsilon_n)^{T}$.

In linear regression, the observations (red) are assumed to be the result of ran-
dom deviations (green) from an underlying relationship (blue) between a dependent
variable (y) and an independent variable (x).

Figure 5: Linear Regression
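
A minimal sketch of fitting this model in scikit-learn, assuming the train/test split
from Section 4.4.1:

    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.metrics import mean_squared_error

    # Fit the model (estimates the coefficient vector beta) and report RMSE.
    reg = LinearRegression()
    reg.fit(X_train, y_train)
    rmse = np.sqrt(mean_squared_error(y_test, reg.predict(X_test)))
    print(rmse)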

• Polynomial Features
Generate polynomial and interaction features: a new feature matrix consisting
of all polynomial combinations of the features with degree less than or equal
to the specified degree. The number of features in the output array scales
polynomially in the number of features of the input array, and exponentially
in the degree, so high degrees can cause overfitting. For example, if an input
sample is two-dimensional and of the form [a, b], the degree-2 polynomial
features are [1, a, b, a^2, ab, b^2].

• Ridge regularization
This model solves a regression problem where the loss function is the linear
least squares function and the regularization is given by the l2-norm; it is
also known as ridge regression or Tikhonov regularization. The estimator has
built-in support for multi-variate regression (i.e., when y is a 2D array of
shape [n_samples, n_targets]). A combined sketch with polynomial features
follows below.
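
A combined sketch of the two ideas above, assuming the same train/test split as
before; alpha=1.0 is an illustrative regularization strength:

    from sklearn.linear_model import Ridge
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import PolynomialFeatures

    # Degree-2 feature expansion ([1, a, b, a^2, ab, b^2]) followed by
    # l2-regularized least squares.
    model = make_pipeline(PolynomialFeatures(degree=2), Ridge(alpha=1.0))
    model.fit(X_train, y_train)
    print(model.predict(X_test[:5]))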

Decision Tree
A decision tree is a decision support tool that uses a tree-like model of decisions and
their possible consequences, including chance event outcomes, resource costs, and
utility. It is one way to display an algorithm that only contains conditional control
statements. Decision trees are commonly used in operations research, specifically in
decision analysis, to help identify a strategy most likely to reach a goal, but are also
a popular tool in machine learning. A decision tree is a flowchart-like structure in
which each internal node represents a ”test” on an attribute (e.g. whether a coin flip
comes up heads or tails), each branch represents the outcome of the test, and each
leaf node represents a class label (decision taken after computing all attributes).
The paths from root to leaf represent classification rules. In decision analysis, a
decision tree and the closely related influence diagram are used as a visual and
analytical decision support tool, where the expected values of competing alternatives
are calculated. A decision tree consists of three types of nodes:

1. Decision nodes: typically represented by squares

2. Chance nodes: typically represented by circles

3. End nodes: typically represented by triangles
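
In this project the tree is used to regress star ratings; a sketch using the
hyperparameters reported in Section 4.6.1:

    from sklearn.tree import DecisionTreeRegressor

    # Shallow tree (max_depth=4, max_features=0.5, per Section 4.6.1) to limit
    # overfitting on the attribute features.
    tree = DecisionTreeRegressor(max_depth=4, max_features=0.5, random_state=0)
    tree.fit(X_train, y_train)
    print(tree.predict(X_test[:5]))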

4.5.2 Classifier

Logistic Regression
In statistics, the logistic model (or logit model) is a widely used statistical model
that in its basic form uses a logistic function to model a binary dependent vari-
able, although many more complex extensions exist. In regression analysis, logistic
regression (or logit regression) estimates the parameters of a logistic model.
Mathematically, a binary logistic model has a dependent variable with two possible
values, such as pass/fail, win/lose, alive/dead or healthy/sick; these are represented
by an indicator variable, where the two values are labeled ”0” and ”1”. In the lo-
gistic model, the log-odds for the value labeled ”1” is a linear combination of one
or more independent variables (”predictors”); the independent variables can each
be a binary variable (two classes, coded by an indicator variable) or a continuous
variable (any real value). The corresponding probability of the value labeled ”1” can
vary between 0 (certainly the value ”0”) and 1 (certainly the value ”1”), hence the
labeling; the function that converts log-odds to probability is the logistic function,
hence the name. The unit of measurement for the log-odds scale is called a logit,
from logistic unit, hence the alternative names. Analogous models with a different
sigmoid function instead of the logistic function can also be used, such as the probit
model; the defining characteristic of the logistic model is that increasing one of the
independent variables multiplicatively scales the odds of the given outcome at a
constant rate, with each dependent variable having its own parameter; for a binary
independent variable this generalizes the odds ratio.

Logistic regression is basically a supervised classification algorithm. In a
classification problem, the target variable (or output), y, can take only discrete
values for a given set of features (or inputs), X.
We can also say that the target variable is categorical. Based on the number of
categories, Logistic regression can be classified as:

1. Binomial: Target variable can have only 2 possible types, 0 or 1, which may
represent win vs. loss, pass vs. fail, dead vs. alive, etc.

2. Multinomial: Target variable can have 3 or more possible types which are not
ordered (i.e. the types have no quantitative significance), like disease A vs.
disease B vs. disease C.

3. Ordinal: deals with target variables with ordered categories. For example,
a test score can be categorized as: very poor, poor, good, very good. Here, each
category can be given a score like 0, 1, 2, 3.
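
Since star ratings form more than two classes, the multinomial variant applies here;
a minimal sketch, assuming the split from Section 4.4.1:

    from sklearn.linear_model import LogisticRegression

    # Multinomial logistic regression over the star-rating classes.
    clf = LogisticRegression(multi_class="multinomial", solver="lbfgs", C=1.0)
    clf.fit(X_train, y_train)
    print(clf.score(X_test, y_test))  # mean classification accuracy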

SVM
Support Vector Machine (SVM) is a supervised machine learning algorithm which
can be used for both classification and regression challenges. However, it is mostly
used in classification problems. In this algorithm, we plot each data item as a point
in n-dimensional space (where n is the number of features) with the value of
each feature being the value of a particular coordinate. Then, we perform
classification by finding the hyperplane that differentiates the two classes best
(see the figure below).

Figure 6: Support Vector Machine
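
A sketch of the classifier with the RBF-kernel settings reported in Section 4.6.1:

    from sklearn.svm import SVC

    # RBF kernel with C=6, per Section 4.6.1; C trades off margin width
    # against misclassified training points.
    svc = SVC(kernel="rbf", C=6)
    svc.fit(X_train, y_train)
    print(svc.score(X_test, y_test))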

KNN
In pattern recognition, the k-nearest neighbors algorithm (k-NN) is a non-parametric
method used for classification and regression. In both cases, the input consists of
the k closest training examples in the feature space.

• In k-NN classification, the output is a class membership. An object is classified
by a plurality vote of its neighbors, with the object being assigned to the class
most common among its k nearest neighbors (k is a positive integer, typically
small). If k = 1, the object is simply assigned to the class of its single
nearest neighbor.

• In k-NN regression, the output is the property value for the object: the
average of the values of its k nearest neighbors.

k-NN is a type of instance-based learning, or lazy learning, where the function is
only approximated locally and all computation is deferred until classification. The
k-NN algorithm is among the simplest of all machine learning algorithms. For both
classification and regression, a useful technique is to assign weights to the
contributions of the neighbors, so that nearer neighbors contribute more to the
average than more distant ones. For example, a common weighting scheme consists
in giving each neighbor a weight of 1/d, where d is the distance to the neighbor.
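
A sketch of the classifier with the settings reported in Section 4.6.1, including the
distance weighting described above:

    from sklearn.neighbors import KNeighborsClassifier

    # weights="distance" implements the 1/d neighbor weighting described above;
    # the remaining settings follow Section 4.6.1.
    knn = KNeighborsClassifier(n_neighbors=3, weights="distance",
                               algorithm="ball_tree", leaf_size=2, n_jobs=-1)
    knn.fit(X_train, y_train)
    print(knn.score(X_test, y_test))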

4.6 Hyperparameter
In machine learning, a hyperparameter is a parameter whose value is set before the
learning process begins; by contrast, the values of other parameters are derived
via training. Different model training algorithms require different hyperparameters,
while some simple algorithms (such as ordinary least squares regression) require none.
Given these hyperparameters, the training algorithm learns the parameters from
the data. For instance, LASSO is an algorithm that adds a regularization
hyperparameter to ordinary least squares regression, which has to be set before
estimating the parameters through the training algorithm.
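
The values in the next two subsections were chosen for our models; one common way
to find such values (not necessarily the procedure used here) is an exhaustive grid
search, sketched below with an illustrative grid:

    from sklearn.model_selection import GridSearchCV
    from sklearn.svm import SVC

    # Each candidate combination is scored with 5-fold cross-validation and
    # the best one is refit on all of the training data.
    param_grid = {"C": [0.1, 1, 6, 100], "kernel": ["rbf", "linear"]}
    search = GridSearchCV(SVC(), param_grid, cv=5)
    search.fit(X_train, y_train)
    print(search.best_params_, search.best_score_)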

4.6.1 Based on Restaurant Attributes

HYPERPARAMETERS APPLIED TO TRAIN REGRESSION MODELS

Restaurant Characteristics
    normalize (LinearReg)      False
    max_features (DTReg)       0.5
    max_depth (DTReg)          4
    C (SVR)                    1.0
    gamma (SVR)                0.01

HYPERPARAMETERS APPLIED TO TRAIN CLASSIFICATION MODELS

Restaurant Characteristics
    n_neighbors (KNN)          3
    n_jobs (KNN)               -1
    weights (KNN)              distance
    leaf_size (KNN)            2
    algorithm (KNN)            ball_tree
    kernel (SVC)               rbf
    C (SVC)                    6
    penalty (LogReg)           l2
    C (LogReg)                 0.001

4.6.2 Based on Restaurant Reviews

HYPERPARAMETERS APPLIED TO TRAIN REGRESSION MODELS

Restaurant Review Texts
    fit_intercept (LinearReg)  True
    normalize (LinearReg)      False
    max_features (DTReg)       0.5
    max_depth (DTReg)          8
    C (SVR)                    1000.0
    gamma (SVR)                1.0

HYPERPARAMETERS APPLIED TO TRAIN CLASSIFICATION MODELS

Restaurant Review Texts
    n_neighbors (KNN)          3
    leaf_size (KNN)            2
    penalty (LogisticReg)      l1
    C (LogisticReg)            1.0
    C (SVC)                    100.0

Chapter 5

Result and Discussion

5.1 Result

5.1.1 Based on Restaurant Attributes


Restaurants' ratings are predicted on the basis of the features provided by the
restaurants. To predict the ratings, we use classifiers such as Logistic Regression,
Support Vector Classifier and K Nearest Neighbours, and regressors such as Linear
Regression, Decision Tree Regression and Support Vector Regression. The classifiers'
accuracy scores and the regressors' RMSE values are displayed in the following tables.

Figure 8: Models' RMSE


Figure 7: Models' Accuracy

5.1.2 Based on Restaurant Reviews


Restaurants' ratings are predicted on the basis of the reviews given by their
customers. To predict the ratings, we use classifiers such as Logistic Regression,
Support Vector Classifier, K Nearest Neighbours and Multinomial Naive Bayes, and
regressors such as Linear Regression, Decision Tree Regression and Support Vector
Regression. The classifiers' accuracy scores and the regressors' RMSE values are
displayed in the following tables. The problem is also remodeled by using n for
negative and p for positive instead of 1, 2, 3, 4 and 5 for y; LinearSVC is used to
obtain the predicted result, and the report and confusion matrix obtained look as
follows.

Figure 9: Models' RMSE

Figure 10: Models' Accuracy

5.2 Discussion
• As we see from the results of both cases we worked on, the Support Vector
Machine performs best among all the models.

• We can attribute this to the fact that the number of features (attributes) and
the number of training instances are of comparable magnitude.

• The high sparsity of our chosen data also contributes to SVM being a good fit.

• We see that our classifier models are better suited to textual review analysis,
and give a better accuracy score there.

• Regression gives better results on the attributes dataset, due to the relatively
fewer features there, which makes correlations between features easier to capture.

• Increasing the dataset significantly improves the results.

Chapter 6

Conclusion and Future Work

6.1 Conclusion
• Support Vector Machine and Logistic Regression give the most accurate results
among classifiers, whereas Linear Regression and Support Vector Regression
give the closest results among regressors, with regard to our chosen dataset.

• We found the least-deviated result with an RMSE of 0.7742 for regression and
a best accuracy of around 53% for classification.

• The accuracy score of 53% suggests our models perform satisfactorily, as a
random guess would give a correct classification with probability 20%.

• If only the sentiment of a review is determined, the accuracy score shoots up
to 96% for the same dataset.

6.2 Future Work


There is a lot to explore in this existing project, so our future goal is to apply
other algorithms in order to increase the accuracy of our predictions of restaurants'
ratings, obtained both from the features provided by restaurants and from the
reviews given by restaurants' customers. We will try to make our system more
user-friendly by adding a user interface which lets the user provide, as input, the
features they are planning to have for their restaurant; our system will then provide
the user with the rating from 1 to 5 that they are likely to get for their restaurant.

Future work also includes the use of unsupervised learning algorithms in conjunction
with the supervised learning algorithms. With an algorithm such as k-means
clustering, the model is able to fit more closely to different geographic regions.
Although a large k in this case would cause severe overfitting, a reasonable k value
could result in a more accurate model due to differences in customers' desires
depending on the region. We can also try other classification algorithms, such as
Naive Bayes and random forests, for text classification.

References

[1] Public data: http://www.yelp.com/dataset challenge

[2] Wang, Junyi. Predicting Yelp Star Ratings Based on Text Analysis of User Reviews.

[3] Asghar, Nabiha. Yelp Dataset Challenge: Review Rating Prediction. 2016.

[4] Vo, Kang. Predicting Success of Restaurants in Las Vegas. 2016.
