Proceeding of 2018 IEEE International Conference on Current Trends toward Converging Technologies, Coimbatore, India

Restaurant’s Feedback Analysis System using Sentimental Analysis and Data Mining Techniques
Atharva Patil, Information Technology Engineering, Sardar Patel Institute of Technology, Mumbai, India, atharvapatil1996@gmail.com
Nishita S. Upadhyay, Information Technology Engineering, Sardar Patel Institute of Technology, Mumbai, India, nishita84@gmail.com
Karan Bheda, Information Technology Engineering, Sardar Patel Institute of Technology, Mumbai, India, bhedakaran1@gmail.com
Rupali Sawant, Information Technology Department, Sardar Patel Institute of Technology, Mumbai, India, rupali_sawant@spit.ac.in

Abstract— Restaurants face many challenges, and obtaining honest feedback on their services is one of the hardest, yet it is essential for staying at the top of the current market. Today, people express their honest sentiments through social media. In this paper, we present a methodology that addresses this challenge by performing sentiment analysis on feedback to determine its polarity, followed by clustering the positive and negative feedback obtained from that step to identify the broad topics of the organization that the feedback targets. This summarization helps restaurants improve their current processes based on the feedback received. The feedback is collected from different sources to meet these needs.

Keywords—Data Mining, Clustering, Sentiments, Feedback Analysis, Polarity, Natural Language Processing.

I. INTRODUCTION

Social media is used today by all kinds of people, young or old, poor or rich. It has become a platform for expressing views on a particular topic and for retrieving information from other posts available online. People give their opinions on social media without hesitation and are influenced by the opinions of others. These opinions can be used to improve services and to take action against wrong practices.

A sentiment is a feeling that expresses a judgement, attitude or thought. Sentiment analysis, also known as opinion mining, studies people's sentiments towards certain entities. The internet is a rich source of sentiment information: users can post their own content on various social media, such as forums, micro-blogs, or online social networking sites [1]. In this paper, we propose analysis at the sentiment level.

Data mining is the process of identifying patterns within a large dataset to acquire useful information. It requires the data to be consolidated in a data warehouse. Various data mining algorithms are applied to find meaningful insights within the data, and companies use these insights to make timely and accurate decisions about business processes, market analysis, and so on. Data mining is also known as data discovery and knowledge discovery [7].

Today, restaurant feedback is generally posted by users on various social media platforms, expressing their views about the restaurant's services. These views can be used to meet customer needs, which is the key to any restaurant's success in a competitive market. However, it is practically impossible for restaurants to keep track of and analyse all of this feedback.

II. PROPOSED METHODOLOGY

Our methodology comprises three stages: sentiment analysis, clustering of the positive and the negative feedback, and identification of the topics of the clustered feedback. These stages are discussed in detail in the following subsections.

978-1-5386-3702-9/18/$31.00 © 2018 IEEE 1

Authorized licensed use limited to: Cornell University Library. Downloaded on September 03,2020 at 05:07:24 UTC from IEEE Xplore. Restrictions apply.

Fig. 1: Operations performed in Sentimental Analysis
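The cleaning, tokenization and classification operations summarized in Fig. 1 can be sketched in plain Python. This is a minimal, dependency-free sketch under stated assumptions, not the authors' implementation: STOP_WORDS is a small hand-picked stand-in for the NLTK English stopwords list, the EMOTICON pattern is an assumed approximation of emoticon removal, and slang removal, POS tagging (nltk.pos_tag) and lemmatization are omitted so that the sketch runs without downloading NLTK data. The NaiveBayes class is a textbook multinomial Naive Bayes with add-one smoothing written for illustration, trained on the positive, negative and neutral example sentences used in the text.

```python
import math
import re
from collections import Counter, defaultdict

# Small hand-picked stand-in for the NLTK English stopwords list (assumption).
# "not" is deliberately absent: NLTK's list includes it, which would erase negations.
STOP_WORDS = {"a", "an", "the", "is", "was", "has", "in", "of", "and", "very", "it"}
# Assumed approximation of emoticons such as ":)", ":-(" or ";D".
EMOTICON = re.compile(r"[:;=]-?[()DPp]")


def clean_and_tokenize(sentence):
    """Remove emoticons and punctuation, lowercase, split() into words, drop stop words."""
    text = EMOTICON.sub(" ", sentence).lower()  # emoticons first, before case changes
    text = re.sub(r"[^a-z\s]", " ", text)       # strip punctuation and digits
    return [word for word in text.split() if word not in STOP_WORDS]


class NaiveBayes:
    """Textbook multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.word_counts = defaultdict(Counter)  # label -> word frequencies
        self.doc_counts = Counter(labels)        # label -> number of training sentences
        for words, label in zip(docs, labels):
            self.word_counts[label].update(words)
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, words):
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.doc_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for w in words:
                if w in self.vocab:  # ignore words never seen during training
                    score += math.log((counts[w] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# The three example sentences from the text, with their labels.
training = [
    ("The food is very tasty", "positive"),
    ("The service is not good", "negative"),
    ("The restaurant is in India", "neutral"),
]
model = NaiveBayes().fit(
    [clean_and_tokenize(s) for s, _ in training], [label for _, label in training]
)

print(clean_and_tokenize("The restaurant has an awesome ambience. :)"))
# ['restaurant', 'awesome', 'ambience']
print(model.predict(clean_and_tokenize("Tasty food :)")))  # positive
```

One design point: because NLTK's English stopwords list contains "not", applying it unchanged would reduce "The service is not good" to "service good"; the sketch keeps "not" so that the negative example remains distinguishable.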

Sentiment analysis
Sentiment analysis is a natural language processing and information extraction task that aims to obtain the writer's feelings, expressed in positive or negative comments, questions and requests, by analysing large numbers of documents. It determines whether a text is positive, negative or neutral using various classification techniques. This analysis can be aggregated over large sets of data, and the resulting information can be helpful in different contexts [3]. For example: positive: "The food is very tasty"; negative: "The service is not good"; neutral: "The restaurant is in India." Each of these sentences is classified according to the words it contains. To perform sentiment analysis, a classifier must first be trained so that it can classify unknown samples accurately. The steps involved are illustrated by the flowchart in Fig. 1.

First, feedback is extracted from social media using available APIs, for example through the Tweepy library for Twitter. Each feedback item is a sentence posted by a user that gives insight into some aspect of the restaurant, e.g. "The restaurant has an awesome ambience." The extracted data is then cleaned: slang and emoticons are removed using string operations, and stop words are removed using the NLTK English stopwords list. Words such as "an", "the" and "has" are stop words, so after cleaning, the example feedback becomes "restaurant awesome ambience". Next, the sentence is converted into a list of words, which can be done simply with the split() method; it returns the list of terms in the sentence, e.g. ['restaurant', 'awesome', 'ambience']. Each term in this list is then tagged with its part of speech (POS), such as noun, adjective, verb or adverb, using the Topia package or the nltk.pos_tag() method in Python. The list now contains ['restaurant - noun', 'awesome - adjective', 'ambience - noun']. Of these, the adjectives and adverbs are sufficient to determine the sentiment of the feedback. The words must also be reduced to their base form; for example, "flies" and "flying" are both converted to "fly". The NLTK lemmatizer can be used to find these base words. The phrases obtained from these steps are then processed to assign a polarity, positive or negative [2], and further processed to determine how positive or negative each one is. The positive and negative feedback identified in this way is used to train the classifier, which improves its accuracy. We use a Naive Bayes classifier in our approach. The end result of this step is a set of positive, negative and neutral sentences, of which only the positive and negative sets are useful feedback.

Clustering within positive and negative feedback
The next step is to cluster the positive and the negative sets of feedback. This determines the broad topics on which feedback has been received, and hence explains why the feedback is positive or negative.

For example, consider five positive or negative feedback items, two of which concern the ambience and three the quality of food. Here, clustering identifies the broad topics, ambience and quality of food, on which the feedback has been received.

The traditional k-means algorithm has certain drawbacks: it requires the number of clusters to be specified in advance; it depends heavily on the initial cluster centres, which lead to unstable and inaccurate results when chosen poorly; and it is sensitive to noisy data [4].

To overcome these limitations, we use modified k-means clustering with dynamic thresholding. Dynamic thresholding creates clusters dynamically, depending on the dataset. The dynamic threshold is given by [5] as

(1)

For two clusters A and B, the variables in (1) denote, respectively:
• the total number of contexts that clusters A and B have in common;
• the total number of contexts present in cluster B but not in cluster A;
• the total number of contexts present in cluster A but not in cluster B.

In the conventional k-means method, Euclidean distance is used to measure similarity, whereas in the modified k-means approach we use the Jaccard index, out of the many similarity measures available. The Jaccard similarity index (sometimes called the Jaccard similarity coefficient) measures the similarity between two sets of data, and its value ranges from 0% to 100%. The higher the percentage, the more similar the two sets. The Jaccard index between two sets can be
given as

$$J(A, B) = \frac{|A \cap B|}{|A \cup B|} \qquad (2)$$

where ∩ denotes intersection and ∪ denotes union of the two sets [6].

Modified k-means clustering with dynamic thresholding algorithm:
Input: D, a dataset containing N positive (or N negative) feedback items.
Output: a set of k clusters, where k is not fixed in advance.
Each pre-processed feedback item is represented as the set of nouns it contains, identified using POS tags, and each cluster is represented as the set of all nouns present in the feedback items of that cluster.

Step 1: Create the first cluster from the first feedback item.
Step 2: For each existing cluster, compare the next feedback item with the cluster using the Jaccard index, and compute the dynamic threshold between the cluster and the feedback item.
Step 3: If the Jaccard index is greater than or equal to the dynamic threshold, add the feedback item to that cluster and go to Step 5.
Step 4: If this condition fails for every existing cluster, create a new cluster for the feedback item.
Step 5: Repeat Steps 2 to 4 for the remaining N-1 feedback items in D.

Identifying cluster labels
The next task is to identify labels for the clusters generated in the previous step. Each cluster is a set of nouns that satisfied the similarity criterion for clustering. Each cluster is labelled with the noun that occurs most frequently in the feedback items it contains. The proposed system flow is shown in Fig. 2.

Fig. 2: Proposed System Flow

III. RESULT

Consider the following feedback collected from Zomato and Twitter, shown in Table 1:

Table 1: Collected Feedbacks

ID   Feedback
1    Always great service and clean rooms.
2    Poor quality of food.
3    The room was clean, well lit, and comfortable
4    Nice hotel. The breakfast was very good.
5    The food was quiet expensive and not worth it.
6    The staff was nice and the breakfast was great.
7    Great continental breakfast area.
8    The room was very comfortable.

Fig. 3: Positive and Negative feedbacks

Fig. 3 shows the feedback after its polarity, positive or negative, has been determined.

Fig. 4: Positive Clusters and Cluster Labels

Fig. 5: Negative Clusters and Cluster Labels

Figs. 4 and 5 show the nouns present in each positive and negative feedback item, respectively, along with the clusters formed and a suitable label for each cluster.
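The labelling step and the five clustering steps above can be sketched in Python and run on the feedback in Table 1. This is an illustrative sketch, not the authors' implementation, and the function names (jaccard, placeholder_threshold, cluster_feedback, label) are hypothetical. It rests on several assumptions: the noun sets were extracted by hand (with "rooms" reduced to "room") instead of by a POS tagger; items 2 and 5 are assumed to be the negative ones; and, since the form of the dynamic threshold in Eq. (1) is not reproduced here, placeholder_threshold is a hypothetical stand-in that admits a feedback item into a cluster whenever the two share at least one noun. A faithful implementation would pass the threshold of [5] as the threshold argument. The grouping printed below is the output of this sketch, not a reproduction of Figs. 4 and 5.

```python
from collections import Counter


def jaccard(a, b):
    """Jaccard index |A ∩ B| / |A ∪ B| as in Eq. (2); 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def placeholder_threshold(cluster_nouns, item_nouns):
    """HYPOTHETICAL stand-in for the dynamic threshold of Eq. (1), not the formula of [5].

    Returning 1/|A ∪ B| means J >= threshold exactly when the sets share a noun.
    """
    union = cluster_nouns | item_nouns
    return 1 / len(union) if union else 1.0


def cluster_feedback(items, threshold=placeholder_threshold):
    """Steps 1-5: items is a list of (feedback_id, noun_set) pairs."""
    clusters = []  # each: {"nouns": union of member nouns, "members": [(id, nouns), ...]}
    for item_id, nouns in items:
        for cluster in clusters:  # Step 2: compare with every existing cluster in turn
            if jaccard(cluster["nouns"], nouns) >= threshold(cluster["nouns"], nouns):
                cluster["nouns"] |= nouns  # Step 3: join the first cluster that passes
                cluster["members"].append((item_id, nouns))
                break
        else:  # Step 4 (and Step 1 for the very first item): start a new cluster
            clusters.append({"nouns": set(nouns), "members": [(item_id, nouns)]})
    return clusters


def label(cluster):
    """Cluster label: the noun occurring most often across the cluster's feedback items."""
    counts = Counter(noun for _, nouns in cluster["members"] for noun in nouns)
    return counts.most_common(1)[0][0]


# Hand-picked noun sets for Table 1 (assumption: items 2 and 5 are the negative ones).
positive = [(1, {"service", "room"}), (3, {"room"}), (4, {"hotel", "breakfast"}),
            (6, {"staff", "breakfast"}), (7, {"breakfast", "area"}), (8, {"room"})]
negative = [(2, {"quality", "food"}), (5, {"food"})]

for name, items in (("positive", positive), ("negative", negative)):
    for cluster in cluster_feedback(items):
        print(name, label(cluster), [item_id for item_id, _ in cluster["members"]])
# positive room [1, 3, 8]
# positive breakfast [4, 6, 7]
# negative food [2, 5]
```

Following Step 3, an item joins the first cluster that passes the test rather than the most similar one; a best-match variant would score every cluster before deciding. Because the threshold is a parameter, substituting a different threshold function changes how many clusters are created without touching the loop.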


From the above results, we observe that modified k-means with dynamic thresholding produces more accurate results than k-means clustering without dynamic thresholding, in which a threshold is fixed before clustering. Such a predefined threshold is not universal for all restaurant feedback and hence produces inaccurate results.

IV. CONCLUSION

Our proposed methodology offers a way to analyse feedback posted on various social media platforms by applying data mining combined with sentiment analysis, using a modified k-means algorithm with dynamic thresholding. The scope of this system can be extended to multinational companies, organisations and startups to boost their development.

REFERENCES

[1] Xing Fang, Justin Zhan, "Sentiment Analysis using product review data," Journal of Big Data, 2015.
[2] Soufiene Jaffali, Salma Jamoussi, Abdelmajid Ben Hamadou, Kamel Smaili, "Clustering and Classification of Like-Minded People from Their Tweets," 2014 IEEE International Conference on Data Mining Workshop, pp. 921-927, Dec. 2014.
[3] Reshma Bhonde, Binita Bhagwat, Sayali Ingulkar, Apeksha Pande, "Sentiment Analysis Based on Dictionary Approach," International Journal of Emerging Engineering Research and Technology, vol. 3, pp. 51-55, Jan. 2015.
[4] Caiquan Xiong, Zhen Hua, Ke Lv, Xuan Li, "An Improved K-means text clustering algorithm by Optimizing initial cluster centers," 2016 7th International Conference on Cloud Computing and Big Data, pp. 265-268, Nov. 2016.
[5] Pieter van de Spek, Steven Klusener, "Applying a dynamic threshold to improve cluster detection of LSI," Science of Computer Programming, vol. 76, pp. 1261-1274, Dec. 2011.
[6] Stephanie, "Jaccard Index/ Similarity Coefficient," Dec. 2016. [Online]. Available: http://www.statisticshowto.com/jaccard-index
[7] Techopedia, "Data Mining," 2015. [Online]. Available: https://www.techopedia.com/definition/1181/data-mining
