Restaurant's Feedback Analysis System Using Sentimental Analysis and Data Mining Techniques
Abstract: Restaurants face many challenges, and obtaining honest feedback on their services is among the hardest, yet it is essential for staying competitive in the current market. The current trend is that people express their honest sentiments through social media. In this paper, we therefore present a methodology that addresses this challenge by performing sentiment analysis on the feedback to determine its polarity, followed by clustering of the positive and negative feedback obtained from that step to identify the broad topics of the organization that the feedback targets. This summarization helps restaurants improve their current processes based on the feedback received. The feedback is collected from different sources to meet this need.

Keywords: Data Mining, Clustering, Sentiments, Feedback Analysis, Polarity, Natural Language Processing.

I. INTRODUCTION

Social media is used today by all kinds of people, young or old, poor or rich. It has become a platform for expressing views on a particular topic or retrieving information from other posts available online. People give their opinions on a topic on social media without hesitation and are influenced by the opinions of others. These opinions can be used to improve services and to take action against wrong practices.

A sentiment is a feeling that expresses a judgement, attitude or thought. Sentiment analysis, also known as opinion mining, studies people's sentiments towards certain entities. The internet is a rich source of sentiment information. From a user's perspective, people are able to post their own content on various social media, such as forums, micro-blogs, or online social networking sites [1]. In this paper, we propose analysis at the sentiment level.

Data mining is the process of identifying patterns within a large dataset to acquire useful information. It typically requires the data to be consolidated in a data warehouse. Various data mining algorithms are applied to find meaningful insights within the data. These insights are used by companies to make timely and accurate decisions in areas such as business processes and market analysis. Data mining is also known as data discovery and knowledge discovery [7].

Today, feedback on restaurants is generally posted by users on various social media platforms, expressing their views about the restaurant's services. These views can be used to meet customer needs, a key concern for any restaurant in a competitive market. However, it is not practical for restaurants to manually track and analyse all of this feedback.

II. PROPOSED METHODOLOGY

Our methodology comprises three parts: sentiment analysis, clustering of the positive and the negative feedback, and identification of the topics of the clustered feedback. These are discussed in detail in the following sections.
Authorized licensed use limited to: Cornell University Library. Downloaded on September 03,2020 at 05:07:24 UTC from IEEE Xplore. Restrictions apply.
Proceeding of 2018 IEEE International Conference on Current Trends toward Converging Technologies, Coimbatore, India
Sentiment analysis

Sentiment analysis is a Natural Language Processing and Information Extraction task that aims to obtain the writer's feelings expressed in positive or negative comments, questions and requests, by analysing large numbers of documents. It helps determine whether a text is positive, negative or neutral, using various classification techniques. This analysis can be aggregated over large sets of data, and the resulting information can be helpful in different contexts [3]. For example: positive: "The food is very tasty"; negative: "The service is not good"; neutral: "The restaurant is in India". Here, the sentences are classified based on the words they contain. To perform sentiment analysis, it is necessary to train a classifier so that it classifies unknown samples accurately. The steps involved are illustrated in the flowchart shown in Fig. 1.

The feedback is first extracted from social media using available APIs (e.g., Tweepy). Each feedback item is a sentence posted by a user that gives insight into some aspect of the restaurant, e.g. "The restaurant has an awesome ambience." First, the extracted data is cleaned by removing slang and emoticons using string operations, and by removing stop words using the nltk English stopwords list. For example, words like "an", "the" and "has" are stop words and are thus removed. In this example, the result after cleaning is "restaurant awesome ambience". The next step is to convert the sentence into a list of words and perform POS (part-of-speech) tagging. The list can be obtained simply with the split() method, which returns an array of the terms in the sentence, e.g. ['restaurant', 'awesome', 'ambience'].

The Topia package or the nltk.pos_tag() function in Python can be used to tag each term in the above array with its part of speech: noun, adjective, verb or adverb. Of these, the adjectives and adverbs are enough to find the sentiment of the feedback. The array now contains ['restaurant - noun', 'awesome - adjective', 'ambience - noun']. The words in the sentences then need to be reduced to their base form. For example, "flies" and "flying" are converted to the base word "fly"; the lemmatizer in nltk can be used to find the base words. The set of phrases obtained from this step is then processed to assign a polarity, positive or negative [2], and further processed to determine how positive or negative each phrase is. The positive and negative feedback identified in this way is used to train the classifier, which improves its accuracy. We use the Naive Bayes classifier in our approach. The end result of this step is a set of positive, negative and neutral sentences, of which only the positive and negative sets are useful feedback.

Clustering within positive and negative feedbacks

The next step is to perform clustering on the positive and the negative sets of feedback. This is useful for determining the broad topics that the feedback addresses, and hence why the feedback is positive or negative.

For example, consider five positive (or negative) feedback items, of which two talk about the ambience and three about the quality of food. Clustering helps us determine the broad topics, ambience and quality of food, on which the feedback has been obtained.

The traditional K-Means algorithm has certain drawbacks: it requires the number of clusters to be specified in advance; it depends heavily on the initial cluster centers, which lead to unstable and inaccurate results when selected improperly; and it is sensitive to noisy data [4].

To overcome these limitations, we use modified K-Means clustering with dynamic thresholding. Dynamic thresholding helps create clusters dynamically depending on the dataset. Dynamic thresholding is expressed by [5],

(1)

For two clusters A and B, the variables represent the following quantities:
- the total number of contexts that clusters A and B have in common;
- the total number of contexts present in cluster B, but not in cluster A;
- the total number of contexts present in cluster A, but not in cluster B.

In the conventional k-means method, Euclidean distance is used as the measure of similarity, while in the modified k-means approach, out of the many similarity measures available, we use the Jaccard index. The Jaccard similarity index (sometimes called the Jaccard similarity coefficient) is a measure of similarity between two sets of data, with a range from 0% to 100%. The higher the percentage, the more similar the two populations. The Jaccard index between two sets A and B can be given as

$$J(A, B) = \frac{|A \cap B|}{|A \cup B|} \tag{2}$$
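The cleaning and classification steps described in the sentiment analysis subsection can be sketched as follows. This is a minimal, self-contained illustration under stated assumptions, not the authors' implementation: the small STOP_WORDS set stands in for the nltk English stopword list, the training sentences and the function names (clean, train_naive_bayes, classify) are hypothetical, and the classifier is a plain multinomial Naive Bayes with add-one smoothing, a variant the paper does not specify. POS tagging and lemmatization are omitted because they depend on external nltk resources.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical stand-in for the nltk English stopword list.
STOP_WORDS = {"a", "an", "the", "has", "is", "was", "are", "in", "and", "very"}


def clean(sentence):
    """Lowercase, keep alphabetic tokens only, and drop stop words."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def train_naive_bayes(labelled_sentences):
    """Count documents per label and word occurrences per label."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for sentence, label in labelled_sentences:
        tokens = clean(sentence)
        label_counts[label] += 1
        word_counts[label].update(tokens)
        vocabulary.update(tokens)
    return label_counts, word_counts, vocabulary


def classify(sentence, model):
    """Return the label with the highest smoothed log-probability."""
    label_counts, word_counts, vocabulary = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(label_counts):
        score = math.log(label_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for token in clean(sentence):
            # Add-one (Laplace) smoothing avoids zero probabilities.
            count = word_counts[label][token] + 1
            score += math.log(count / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Illustrative training data, not taken from the paper.
TRAINING = [
    ("The food is very tasty", "positive"),
    ("The restaurant has an awesome ambience", "positive"),
    ("The service is not good", "negative"),
    ("The waiter was rude and slow", "negative"),
]
MODEL = train_naive_bayes(TRAINING)
```

With this toy stopword set, clean("The restaurant has an awesome ambience.") yields the same three terms as the example in the text. A real system would need far more training data and would have to handle negation ("not good"), which a bag-of-words model like this one ignores.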
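To make the role of the Jaccard index in clustering concrete, the sketch below groups cleaned feedback by Jaccard similarity. It is an assumption-laden illustration and not the modified K-Means of the paper: it uses a simple single-pass scheme in which each item joins the most similar existing cluster or opens a new one, and the threshold is a fixed argument because the dynamic threshold of [5] is not reproduced here. The function names and the sample feedback are hypothetical.

```python
def jaccard(a, b):
    """Jaccard index of two sets: |A & B| / |A | B| (0.0 for two empty sets)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_by_threshold(token_sets, threshold):
    """Assign each token set to the most similar cluster, or open a new one.

    Each cluster is represented by the union of the tokens of its members.
    The threshold is a fixed argument here; the dynamic threshold of [5]
    is not reproduced.
    """
    clusters = []  # list of (representative token set, member indices)
    for index, tokens in enumerate(token_sets):
        tokens = set(tokens)
        best, best_sim = None, 0.0
        for cluster in clusters:
            sim = jaccard(tokens, cluster[0])
            if sim > best_sim:
                best, best_sim = cluster, sim
        if best is not None and best_sim >= threshold:
            best[0].update(tokens)
            best[1].append(index)
        else:
            clusters.append((tokens, [index]))
    return [members for _, members in clusters]


# Illustrative, already-cleaned positive feedback (not from the paper).
FEEDBACK = [
    ["awesome", "ambience"],
    ["tasty", "food"],
    ["lovely", "ambience"],
    ["fresh", "tasty", "food"],
    ["food", "delicious"],
]
GROUPS = cluster_by_threshold(FEEDBACK, threshold=0.25)
```

For these five items, GROUPS is [[0, 2], [1, 3, 4]]: two items about ambience and three about food, mirroring the example given in the clustering subsection. The number of clusters is not specified in advance, but the result depends on the chosen threshold and on the order of the items.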
Consider the following feedback items collected from Zomato and Twitter, as shown in Table 1:
From the above result, we observe that modified k-means with dynamic thresholding produces more accurate results than the k-means algorithm without dynamic thresholding, where a threshold was defined before clustering. Such a predefined threshold is not universal for all restaurant feedback and hence produces inaccurate results.

IV. CONCLUSION

Our proposed methodology suggests a way to analyse the feedback posted on various social media platforms by applying data mining combined with sentiment analysis, with the help of a modified K-means algorithm with dynamic thresholding. The scope of this system can be extended to multinational companies, organisations and startups to boost their development.

REFERENCES

[1]. Xing Fang, Justin Zhan, "Sentiment Analysis using product review data," Journal of Big Data, 2015.
[2]. Soufiene Jaffali, Salma Jamoussi, Abdelmajid Ben Hamadou, Kamel Smaili, "Clustering and Classification of Like-Minded People from Their Tweets," 2014 IEEE International Conference on Data Mining Workshop, Dec. 2014, pp. 921-927.
[3]. Reshma Bhonde, Binita Bhagwat, Sayali Ingulkar, Apeksha Pande, "Sentiment Analysis Based on Dictionary Approach," International Journal of Emerging Engineering Research and Technology, Jan. 2015, vol. 3, pp. 51-55.
[4]. Caiquan Xiong, Zhen Hua, Ke Lv, Xuan Li, "An Improved K-means text clustering algorithm by Optimizing initial cluster centers," 2016 7th International Conference on Cloud Computing and Big Data, Nov. 2016, pp. 265-268.
[5]. Pieter van de Spek, Steven Klusener, "Applying a dynamic threshold to improve cluster detection of LSI," Science of Computer Programming, Dec. 2011, vol. 76, pp. 1261-1274.
[6]. Stephanie, "Jaccard Index/ Similarity Coefficient", Dec. 2016. [Online]. Available: http://www.statisticshowto.com/jaccard-index
[7]. Techopedia, "Data Mining", 2015. [Online]. Available: https://www.techopedia.com/definition/1181/data-mining