Professional Documents
Culture Documents
Aid SP
Aid SP
Aid SP
A Report On
“ASPECT BIASED SENTIMATE ANALYSIS ON FINE FOOD REVIEWS”
Submitted By:
MOHAMMED AIRSHAD (1KN19CS050) ANSARI MOHAMMED IZAN
(1KN19CS003)
MOHAMMED ASIF RAZA (1KN19CS051) RAYYANUDDIN A (1KN19CS083)
Chapter 01 : Introduction……………………………………………………………………………………………..
1.1 overview ……………………………………………………………………………………………………………………………………….
1. Introduction
▪ Overview
The dataset is provided by Kaggle website and was collected and published first by McAuley et al. (2013) in their
research related to online reviews. The dataset consists of reviews on fine food posted on Amazon.com. It has a
total of 568,454 reviews on 74,258 products. The attributes and their descriptions are discussed in Table 3-1
below:
1.1 Motivation
It has been estimated that the ‘Helpfulness Voting’ system brings in Amazon.com about $2.7 billion in
additional revenue (Cao et al., 2011). It is imperative for any business organization to perceive which
factors determine the helpfulness of the online reviews. This can help the online business managers
increase the revenue by designing and implementing an automated system for classification of the vast
quantity of the online consumer reviews data.
Although this voting process is an improvement in building the automated classification system, there
are some detrimental issues that need to be taken care of before proceeding further. First, reviews
which are posted most recently might have very few votes or no votes at all and therefore, identifying
the helpfulness of these reviews becomes very difficult. Moreover, the highest voted reviews get
viewed first before the newly published reviews which are yet to be voted (Liu et al., 2008). For
example, Kim et al. (2006) found in their study that, out of 20919 Amazon reviews for all MP3 player
products, 38% received 3 or lesser helpfulness votes. It is also known that there is always a considerable
bias in the human analysis of textual information as users like to believe what matches with their
dispositions (Liu et al., 2012). Studies have also shown that the ratings that the reviews get do not convey
profound information as they are either extremely high or extremely low (Ghose et al., 2007).
In light of all the above factors, it is important to build an automated text-based classification system that
predicts the helpfulness of online consumer reviews irrespective of when the reviews get posted. If the newly
posted reviews are instantaneously deemed ‘Helpful’ or ‘Not helpful’ by the system built on machine learning
classification algorithms, it gives consumers a reasonable choice to make their purchasing decisions rather than
relying solely on older reviews. Text mining and Natural Language Processing (NLP) have been used in the past to
build a text classification system to identify spam reviews (Crawford et al., 2015; Shojaee et al., 2013; Lilleberg et
al., 2015).
The same can be applied to detect whether the reviews are helpful or not.