Project Report Final


Sentiment Analysis on E-Commerce Platform May 2019

Department of Computer Science Engineering, ASIET 0



CHAPTER 1
INTRODUCTION

This project addresses the problem of judging product genuineness when shopping online.
Shoppers often struggle to tell whether a product is genuine. To be sure of quality, they
have to browse through all the user reviews, which is tiring. This project aims to create a
platform where customers can gauge the genuineness of a product from its user reviews.

To start with, let’s take a moment to pin down exactly what we are trying to do. The challenge
is to predict the genuineness of a product from its existing user reviews. We give each
product a rating from 1 to 10, with 1 being the least genuine and 10 the most genuine.

For public opinion, we collected data from Amazon and used it as our test data. We created
our training dataset from the reviews available on Amazon. The data is
pre-processed and then filtered for irrelevant characters. The data is then clustered based on the
negativity and positivity of the comments. A Naive Bayes classifier then scores the reviews on a
scale of 0 to 10 (where 5 is average), which gives the rating of an individual product. This
finding is presented as a website. To start small, only a few product categories, namely
Laptop, Camera and Mobile Phones, are covered.

1.1 UNDERSTANDING THE CLIENT AND THEIR PROBLEM

Client: online buyers. Our client wants to check the genuineness of the products available on
E-Commerce platforms. The client buys a product without actually seeing it, so ensuring its
genuineness is a worry in itself. The only way to judge the genuineness of a product is
to read the reviews given by existing users, and reading every available review to understand
the product is tiring. Our platform removes this exasperating work by analysing a product and
its reviews and deriving a rating for the product from the existing user reviews. With this
rating it is easy to see whether a product is genuine without having to read all the
reviews. With this study, clients can understand which features influence the genuineness of a
product. If the rating is good, they can be more confident that they are getting a genuine
product.

1.2 SENTIMENT ANALYSIS

Sentiment analysis is a process that extracts opinions and emotions from texts, comments and other
natural language sources. The opinions and emotions are captured using natural language
processing. As the amount of data keeps growing, natural language processing is becoming more
popular.

People who shop online are increasingly willing to leave their opinions and criticisms
online. These opinions come from a wide range of users, including people who know the product
technically and people who have no idea about its technicalities. From these comments, both
technical and non-technical readers get a good idea of the product, and new buyers are able to
assess a product even before buying it. Positive, negative and neutral reviews are all helpful
to other buyers in some way, so analysing these comments is both important and effective.

For these reasons, we decided to collect these reviews and analyse them to create a
rating system. The system takes all the reviews, processes them using the Naive Bayes algorithm,
and performs sentiment analysis on them to get the polarity, which in turn leads to the rating of
a product.

1.3 NATURAL LANGUAGE PROCESSING

Natural language processing is a field concerned with the interaction between machines and human
languages. It enables machines to understand human language. Its main challenges are to
recognize speech and to understand human language. The main syntactic tasks in natural language
processing are part-of-speech tagging, parsing, word segmentation, sentence breaking, and
terminology extraction.


1.4 SUPERVISED LEARNING

Supervised learning means training an algorithm on manually labelled examples so that it can
predict reasonably accurately on future datasets. In supervised learning, training datasets are
supplied together with the desired output. From this information the algorithm then predicts the
probabilities for unknown attributes.

1.5 BAYES THEOREM

Bayes' theorem (Bayes' rule) is used in probability theory to find the probability of an event.
It uses prior knowledge of the conditions related to an event to calculate the probability of
that event occurring. Bayes' theorem is based on conditional probability.

The mathematical equation of Bayes' theorem:

$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$

where P(A) and P(B) are the probabilities of A and B without regard to each other, called
the marginal probabilities. P(B|A) is the probability of B occurring given that A has occurred.
Finally, the answer, P(A|B), is the conditional probability of A occurring given that B is true,
provided P(B) ≠ 0.
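
As a small worked illustration of the theorem, the following Python sketch computes P(A|B) for
invented numbers. The events (A: a review is positive, B: a review contains the word "great")
and all probabilities are assumptions chosen only for the example; they are not taken from the
project's data.

```python
def bayes_posterior(p_b_given_a, p_a, p_b):
    """Return P(A|B) from P(B|A), P(A) and P(B) using Bayes' theorem."""
    if p_b == 0:
        raise ValueError("P(B) must be non-zero")
    return p_b_given_a * p_a / p_b


# Made-up illustrative values (assumptions, not project data).
p_a = 0.6              # P(A): share of positive reviews
p_b_given_a = 0.5      # P(B|A): "great" appears in a positive review
p_b_given_not_a = 0.1  # P(B|not A): "great" appears in a non-positive review

# Marginal probability of B via the law of total probability.
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)

posterior = bayes_posterior(p_b_given_a, p_a, p_b)
print(round(posterior, 3))
```

With these numbers, seeing the word "great" raises the probability that the review is positive
from the prior of 0.6 to roughly 0.88.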

1.6 NAIVE BAYES CLASSIFIER

When working on a dataset with millions of records, the Naive Bayes approach is recommended.

The Naive Bayes algorithm uses Bayes' theorem with strong independence assumptions. Bayes'
theorem works on conditional probability, which is the probability that something will happen
given that something else has already happened. The classifier predicts the probability of each
class, that is, the probability that a given record or data point belongs to a particular class.
If there are m possible classes A = {a_1, a_2, ..., a_m} for reviews T = {t_1, t_2, ..., t_n},
then using Bayes' rule we can predict the probability that a review t belongs to a class a:

$$P(a \mid t) = \frac{P(t \mid a)\,P(a)}{P(t)}$$

Naive Bayes assumes that each term or word w_k occurs independently of the others. If t_k is the
frequency of word w_k in review t and n_d is the number of unique words, the equation becomes

$$P(a \mid t) = \frac{P(a)\,\prod_{k=1}^{n_d} P(w_k \mid a)^{t_k}}{P(t)}$$

1.6.1 NAIVE BAYES WORKING

Figure 1.1 Workflow of Naive Bayes Classifier


Figure 1.1 shows the basic workflow of the Naive Bayes classifier. For each attribute, it
traverses each node and finds the probability of belonging to a specific class. Once it has found
all the values of an attribute, it moves on to the next attribute. If it does not find the
values, it goes to different nodes and checks again.

The Naive Bayes pseudocode works as follows. First it extracts the vocabulary and keeps a counter
per attribute. For each attribute, it goes to the node and checks whether the attribute belongs to
the class. Each text is tokenized into words. The probability of each tokenized word is then
measured, each word is scored, and the score is returned.

Both the training and the testing algorithms are presented below in the form of pseudocode:
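
As a rough illustration of the training and testing steps described above, the following Python
sketch implements a small multinomial Naive Bayes classifier with add-one smoothing. It is not
the report's own pseudocode: the function names, the data layout and the toy training reviews
are assumptions made for this example.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(labelled_docs):
    """Training: extract the vocabulary and count words per class.

    labelled_docs is an iterable of (list_of_tokens, class_label) pairs.
    """
    class_doc_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for tokens, label in labelled_docs:
        class_doc_counts[label] += 1
        word_counts[label].update(tokens)
        vocabulary.update(tokens)
    total_docs = sum(class_doc_counts.values())
    priors = {c: n / total_docs for c, n in class_doc_counts.items()}
    return priors, word_counts, vocabulary


def classify(tokens, priors, word_counts, vocabulary):
    """Testing: score each class by its log posterior and return the best one."""
    scores = {}
    for label, prior in priors.items():
        total_words = sum(word_counts[label].values())
        score = math.log(prior)
        for word in tokens:
            if word not in vocabulary:
                continue  # words never seen in training are ignored
            # Add-one (Laplace) smoothing keeps unseen class/word pairs non-zero.
            likelihood = (word_counts[label][word] + 1) / (total_words + len(vocabulary))
            score += math.log(likelihood)
        scores[label] = score
    return max(scores, key=scores.get), scores


# Toy training data (invented for illustration).
training = [
    (["good", "camera", "great", "battery"], "pos"),
    (["great", "value"], "pos"),
    (["bad", "battery", "poor", "screen"], "neg"),
    (["poor", "value", "bad"], "neg"),
]
model = train_naive_bayes(training)
label, scores = classify(["great", "battery"], *model)
print(label)
```

Summing log probabilities instead of multiplying raw probabilities is a design choice of this
sketch; it avoids numerical underflow on long reviews.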


CHAPTER 2
LITERATURE SURVEY

In this chapter, we briefly review some existing sentiment analysis programs for social
platforms. We also take a look at the situation before sentiment analysis. Many
sentiment analysis platforms have been designed for one or more social platforms.

2.1 BEFORE SENTIMENT ANALYSIS

Precursors to sentiment analysis include the General Inquirer, which provided hints towards
quantifying patterns in text and, separately, psychological research that examined a person’s
psychological state based on analysis of their verbal behavior. The General Inquirer is a unique
set of procedures for identifying, in a useful and meaningful way, recurrent patterns within the
rich variety of man’s written and spoken communications. It provides a flexible common
referent for testing the hypotheses of different investigators. The system is programmed to
accept actual text, look up words and phrases in dictionaries, assign descriptors, check for
specified descriptor patterns, count occurrences, and retrieve sentences with specified
characteristics.

Subsequently the method described in a patent by Volcani and Fogel looked especially at
sentiment and identified individual words and phrases in text with respect to different
emotional scales. A current system based on their work, called EffectCheck, presents synonyms
that can be used to increase or decrease the level of evoked emotion in each scale.

Work by Turney took a simple polar view of sentiment, from positive to negative, and applied
different methods for detecting the polarity of product reviews and movie reviews.
This work operates at the document level. It is an early and influential paper presenting an
unsupervised approach to review classification. It introduces three basic ideas:

One key idea is to score the polarity of a review based on the total polarity of the phrases in it.
A second idea is to use patterns of part-of-speech tags to pick out phrases that are likely to be
meaningful and unambiguous with respect to semantic orientation. Finally, these
potentially meaningful phrases are scored using pointwise mutual information (PMI) with
seed words of known polarity.

The first steps towards bringing together various approaches (learning, lexical,
knowledge-based, etc.) were taken at the 2004 AAAI Spring Symposium, where linguists, computer
scientists, and other researchers first aligned interests and proposed shared tasks and benchmark
data sets for the systematic computational research on affect, appeal, subjectivity and sentiment
in text.

2.2 METHODS AND FEATURES

Existing approaches to sentiment analysis can be grouped into three main categories:
Knowledge-based techniques
Statistical methods
Hybrid approaches

Knowledge-based techniques classify text by affect categories based on the presence of
unambiguous affect words such as happy, sad, afraid, and bored. Some knowledge bases not
only list obvious affect words, but also assign arbitrary words a probable “affinity” to particular
emotions.

Statistical methods leverage elements from machine learning such as latent semantic analysis,
support vector machines, “bag of words”, “Pointwise Mutual Information” for Semantic
Orientation, and deep learning. More sophisticated methods try to detect the holder of a
sentiment (i.e., the person who maintains the affective state) and the target (i.e., the entity
about which the affect is felt). To mine the opinion in context and get the feature about which
the speaker has opined, the grammatical relationships of words are used. Grammatical
dependency relations are obtained by deep parsing of the text.

Hybrid approaches leverage both machine learning and elements from knowledge
representation such as ontologies and semantic networks in order to detect semantics that are
expressed in a subtle manner, e.g., through the analysis of concepts that do not explicitly
convey relevant information, but which are implicitly linked to other concepts that do so.

| Knowledge-Based | Statistical | Hybrid |
|---|---|---|
| Classify text by affect categories based on the presence of unambiguous affect words. | Statistical methods leverage elements from machine learning for Semantic Orientation, and deep learning. | Hybrid approaches leverage both machine learning and elements from knowledge representation to detect semantics expressed in a subtle manner. |
| Also assign arbitrary words a probable “affinity” to particular emotions. | More sophisticated methods try to detect the holder of a sentiment and the target. | Combine features of both Knowledge-Based and Statistical methods. |
| Knowledge takes priority. | Statistical values take priority. | Both values are considered. |

Table 2.1. Comparison of Sentiment Analysis Methods

Open source tools as well as a range of free and paid sentiment analysis tools deploy machine
learning, statistics, and natural language processing techniques to automate sentiment analysis
on large collections of texts, including web pages, online news, internet discussion groups,
online reviews, web blogs, and social media.

Sentiment classification approaches also rely on the bag-of-words model, which disregards
context, grammar and even word order. Approaches that analyse the sentiment based on how
words compose the meaning of longer phrases have shown better results, but they incur an
additional annotation overhead.

A human analysis component is required in sentiment analysis, as automated systems are not
able to analyze the historical tendencies of the individual commenter or the platform, and
they often classify the expressed sentiment incorrectly.


The structure of sentiments and topics is often complex. The problem of sentiment analysis is
non-monotonic with respect to sentence extension and stop-word substitution. To address this
issue, a number of rule-based and reasoning-based approaches have been applied to sentiment
analysis, including defeasible logic programming. A number of tree traversal rules are also
applied to the syntactic parse tree to extract the topicality of sentiment in an open-domain
setting.

2.3 RELATED WORKS

Sentiment analysis on E-commerce platforms is now a popular technique among developers.
Many sentiment analysis tools are being developed and used on a wide range of platforms.
Some of the works closest to ours are discussed in this section.

2.3.1 Review Meta

ReviewMeta.com is a free web tool that analyzes millions of reviews and helps you decide
which ones to trust. Simply copy and paste any Amazon product URL into the search bar on
ReviewMeta.com for a full analysis. Its Chrome extension streamlines this process by
providing an adjusted rating for each product based only on the most trustworthy
reviews, displayed directly at the top of the browser. ReviewMeta.com is completely
independent of Amazon and Bodybuilding.com. It is not a replacement for reading
reviews, but an Amazon review checker tool that analyzes reviews and helps improve your
shopping experience. The review analysis does not guarantee whether or not fake reviews are
present; it simply shows some detailed statistics and makes an educated guess.

Simply browse Amazon as normal. When viewing a product, the extension shows the
adjusted rating and a color based on the authenticity of the reviews. To see the detailed report,
click the icon. It helps weed out biased or fake reviews and leaves you with the most honest
feedback.

Amazon.com/.ca/.co.uk/etc.: The extension reads the current URL to work out which
product you are viewing so that it can show the corresponding ReviewMeta data for that
product.
ReviewMeta.com: The extension tells the ReviewMeta website to hide the notification about
installing the extension.

Figure 2.1. ReviewMeta


2.3.2 Fakespot
Fakespot aims to give you the most up-to-date and relevant information to make your online
shopping experience as safe and trustworthy as possible. It provides consumers with a
new way of filtering product reviews to find out what real users are saying about the products
they want to buy. It analyses millions of product reviews, looking for suspicious patterns and
incentivized reviews, and then weeds out the reviews that are unreliable.

The user copies the product or business link from the browser's URL box, pastes it into the
Fakespot Analyzer tool and clicks Analyze Reviews. Fakespot then analyzes the reviews and
reviewers of the product or business.

Figure 2.2. Fakespot


2.3.3 Metacritic

It began as a simple idea back in the summer of 1999: a single score could summarize the many
entertainment reviews available for a movie or a video game. Metacritic's three founding
members found a more constructive but less profitable use of time by launching the site in
January 2001 and Metacritic has evolved over the last decade to reflect their experience
distilling many critics' voices into the single Metascore, a weighted average of the most
respected critics writing reviews online and in print.

Metacritic's mission is to help consumers make an informed decision about how to spend their
time and money on entertainment. They believe that multiple opinions are better than one, user
voices can be as important as critics, and opinions must be scored to be easy to use.

Their Metascore system is unique and merits its own explanation page.


Creating the proprietary Metascores is a complicated process. Metacritic carefully curates a
large group of the world’s most respected critics, assigns scores to their reviews, and applies a
weighted average to summarize the range of their opinions. The result is a single number that
captures the essence of critical opinion in one Metascore. Each movie, game, television show
and album featured on Metacritic gets a Metascore once at least four critics' reviews have
been collected.
The Metascore is a weighted average because Metacritic assigns more importance, or weight, to
some critics and publications than to others, based on their quality and overall stature. In
addition, for music and movies, the resulting scores are normalized (akin to "grading on a
curve" in college), which prevents scores from clumping together.

Figure 2.3. Metacritic


2.3.4 Trust You

TrustYou is a platform for accessing your Meta-Review. In addition, it provides
functionalities such as: analyzing and responding to guest feedback collected from hundreds of
online sources; acting on valuable KPI insights about guest preferences and direct competitors;
collecting and analyzing guest feedback via pre-stay, on-site, and post-stay surveys; showcasing
positive survey reviews on your website and on hundreds of travel sites; and communicating with
guests on their preferred channel.
Because Meta-Reviews provide an aggregate of all verified reviews, they naturally weed out
the extremes to get down to the nuts and bolts of what travelers need to know about a property.
At the property level, the same sophisticated analytics used to create the Meta-Reviews can be
used to identify strengths and weaknesses impacting your ratings and reviews.

Figure 2.4. TRUSTYOU


CHAPTER 3
SYSTEM DESIGN
3.1 USE CASE DIAGRAM

Figure 3.1. Use Case Diagram

● The user enters the required values on the website. The input contains values such as the
category of item (phone, laptop, camera) and the model within that category.
● This input is used to get the reviews from the database.
● A classifier model based on the Naive Bayes algorithm analyses the reviews of the
product chosen by the user and gives each review a polarity and the product a rating.
● The output is then displayed to the user through the website interface.
● The output displayed to the user is a rating of the genuineness of the product.

3.2 STATE DIAGRAM

● The state diagram describes the behavior of the system. It contains a finite number of states
to show the working of the system.
● User Interface: Here all the information is made visible to the user. It takes input from
the user and feeds it to the backend process.
● Classifier Model: The input from the user is given to the classifier. The classifier classifies
the reviews and gives a rating for the genuineness of the product.
● Print Rating: This receives the output from the classifier and passes it to the
interface to display it to users.
● Print Description: Once the user gives the input, it is passed to the database to get
the descriptions of the selected product. These are shown on the webpage.

Figure 3.2. State Diagram

3.3 ACTIVITY DIAGRAM

● The website is launched. It displays all the descriptions and reviews to the user and gets
the input from the user.
● Check whether any changes have been made to the comments.
● If no changes are made, the default values are used for the analysis.
● If a change is made, click evaluate again to recompute the ratings.
● After evaluation, the rating is calculated automatically.

Figure 3.3. Activity Diagram

3.4 SEQUENCE DIAGRAM

● The dotted line depicts the lifeline of the process.
● There are 3 entities: Web interface, Classification model and Print.
● The process starts from the Web interface entity.
● We move to the next schedule when the desired product is entered by the user.
● Midway through the classification process, we move to the next schedule, which is to
print the result.
● After printing, that process is terminated.
● In the end, all the schedules terminate.


Figure 3.4. Sequence Diagram

3.5 CLASS DIAGRAM

● Contains 4 classes: Web interface, Reviews, Analyse, Description.
● Web interface contains attributes such as the categories: phone, camera, laptop.
● The operations used in Web interface are Analyse, Description, Reviews.
● The Analyse class analyses the comments on a product and gives a
rating.
● The Review class gives the review details of products, such as the number of reviews.
● Description gives descriptions of the various products.
● Web interface is the base class.
● Review, Analyse and Description are the derived classes.


Figure 3.5. Class Diagram


CHAPTER 4
ARCHITECTURE OF SENTIMENT ANALYSER

The basic idea of our project is to gather data from E-Commerce platforms and run a sentiment
analysis on the gathered data to get the polarity of each review of a product.
The average polarity of the product's reviews is then calculated and converted
to a score out of 10, which is the rating of the product.

4.1 PROCESS
The processes in the proposed model are:

1. The first step is the collection of data from different E-commerce platforms.
2. The second step is the preprocessing of the gathered data into a supervised form.
3. The third step is building a list of positive and negative words.
4. The fourth step is to collect the data to be tested from Amazon comments.
5. The fifth step is to cluster different attributes of the product.
6. The sixth step is tokenizing and part-of-speech tagging.
7. The seventh step is to do a sentiment analysis on the data to get the polarity.
8. The eighth step is to get the average polarity and generate a rating based on the polarity.
9. The final step is to design a front end website to present the findings.

4.2 DATA COLLECTION

For gathering data, we used the automation tool ParseHub to collect the comments on a
particular product from Amazon. The steps followed to extract data using ParseHub are:

1. Open the ParseHub desktop application.
2. Open a New Project in ParseHub.
3. Type in the URL (https://www.amazon.in/) and start the project on this URL.
4. Select Settings and enter the keywords in “Starting Value” in JSON format.

{"keyword":["canon","nikon"]}


Figure 4.1. Setting ParseHub for keyword based extraction

5. Create a loop to search through all of the categories.

6. Open the command menu, select the advanced menu and add the Loop tool.

7. In the “for each” textbox enter a name (item), and in the “In” textbox enter the name of the
list (keyword).


Figure 4.2. ParseHub Looping Option

8. Click on + button next to Loop and choose Begin New Entry from Advanced Option.

9. Click on + button next to Begin New entry and add Select command.

10. Click on Amazon search bar to select it and change input type to “expression”.

11. In the input textbox write item with no quotation marks.


Figure 4.3. ParseHub Amazon selection.

12. Shift + Click on Plus button next to Input Item.

13. Add another Select command and click plus next to it.

14. Add Create New Template and click button.

Figure 4.4. Template Creation

15. Scrape all of the products for each brand


16. Change the mode from select mode to browse mode for ease of searching.

17. To select products change back to select mode.

18. A Select command will be automatically added. Scroll through the page to select all
products.

19. Selected products will be highlighted in green

Figure 4.5. Selecting attributes to scrape

20. Scrape the price, reviews and description of all the products.

21. Navigate to the details page of each product.

22. Click on plus next to Begin New Entry and choose Click command.

23. In the text box write details and click Create New Template.


Figure 4.6. Creating new template.

24. Add select command and click price of product.

25. Scroll down to see Customer Reviews selection.

26. Click plus next to select page.

27. Choose select command and click on product description to extract it as well

Figure 4.7. Product extraction

28. Run the project and download your results.

29. Click the Get Data button


30. Click on Run and Run and Save.

31. Download the data in CSV format.

Figure 4.8. Running ParseHub

4.3 PRE-PROCESSING OF DATA

After the data has been collected in CSV format, it needs to be pre-processed into
a supervised form, meaning it must not contain punctuation or additional symbols. Pre-processing
the data means removing stop words and punctuation and tokenizing the sentences. We use a
program to process the CSV file and remove unwanted characters, numbers, and spaces. We split
the data into sentences wherever a full stop is found.
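
The cleaning step described above can be sketched in Python as follows. This is only a minimal
sketch under stated assumptions: the column name review, the in-memory sample row and the
lowercasing of the text are invented for the example and are not taken from the project's
actual pre-processing program.

```python
import csv
import io
import re


def clean_review(text):
    """Split a raw review at full stops, then strip digits, punctuation,
    symbols and surplus whitespace from every sentence."""
    sentences = []
    for part in text.split("."):
        part = re.sub(r"[^A-Za-z\s]", " ", part)          # drop digits, punctuation, symbols
        part = re.sub(r"\s+", " ", part).strip().lower()  # collapse spaces (lowercasing is an assumption)
        if part:
            sentences.append(part)
    return sentences


# Stand-in for the scraped CSV file; the "review" column name is hypothetical.
raw_csv = io.StringIO('review\n"Great camera!!! Battery lasts 2 days.  Worth the price :)"\n')
rows = list(csv.DictReader(raw_csv))
cleaned = [clean_review(row["review"]) for row in rows]
print(cleaned)
```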


Figure 4.9. Data before processing

Figure 4.10. Data After Processing


4.4 WORD LIST AND ATTRIBUTE ADDITION

We added a list of positive and negative words to the word list provided by Python
NLTK. Stop words can be removed easily using NLTK, which ships with stop word lists
for many languages. To get the English stop words, you can use this code:

import nltk
from nltk.corpus import stopwords

nltk.download('stopwords')  # one-time download of the stop word corpus
stopwords.words('english')  # returns the list of English stop words

4.5 TRAINING DATASET

We used the data collected from Amazon and manually added some data from other sites.
We manually classified the training dataset into positive data and negative data. The training
data are stored as data.pos and data.neg. This data is pre-processed and stored in the right
format, which makes the training process easy.

4.6 TOKENIZATION AND POS TAGGING

After reading the comment, the classifier first tokenizes it into words, splitting on commas (,),
full stops (.), spaces and any other punctuation. After the stop words are removed, part-of-speech
tagging helps indicate the subjectivity of the comment better. Part-of-speech tagging is done by
the pre-trained POS tagger provided by Python NLTK.
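
The tokenizing and stop word removal described above can be sketched in Python as follows. To
keep the example self-contained, it uses a tiny hand-written stop word set as a stand-in for the
NLTK list, and it leaves out the part-of-speech tagging step, which needs NLTK's pre-trained
tagger data. The function names are assumptions for illustration.

```python
import re

# Tiny stand-in for NLTK's English stop word list (assumption, for a
# self-contained example; the project uses the NLTK list instead).
STOP_WORDS = {"the", "is", "a", "an", "and", "but", "it", "this", "of", "to"}


def tokenize(comment):
    """Split a comment on whitespace, commas, full stops and other punctuation."""
    tokens = re.split(r"[\s,.;:!?]+", comment.lower())
    return [t for t in tokens if t]  # drop empty strings left by the split


def remove_stop_words(tokens):
    """Keep only the tokens that are not stop words."""
    return [t for t in tokens if t not in STOP_WORDS]


tokens = tokenize("The camera is great, but the battery is poor.")
content_words = remove_stop_words(tokens)
print(content_words)
```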

4.7 SENTIMENT ANALYSIS ON DATA

Sentiment analysis is the process of detecting the contextual polarity of text. In other words, it
determines whether a piece of writing is positive, negative or neutral. After preprocessing the
data, we ran sentiment analysis on the datasets using a Naive
Bayes classifier implemented in PHP.

This algorithm predicts the probability of words belonging to a particular class (neg
or pos). It is used because both the training and the classifying steps are simple. The
preprocessed data is given as input to train the classifier, and the trained model is applied to
the test data to produce a positive, negative or neutral sentiment.


4.7.1 SentimentAnalyzerTest:

This is the main class of the sentiment analyzer. It contains all the functions essential to
sentiment analysis. All the methods and declarations are in a class file named
SentimentAnalyzer.class.php. This class initializes all the working functions of the analyzer
module with their parameters and specifies their return values.

This class contains the following functions:

_construct()

trainAnalyzer()

analyseSentence()

analyseDocument()

Figure 4.11. Sentiment Analysis Class

4.7.2 _construct():


This function is the constructor of the class. It initializes an array
named arrBayesDifference, which holds the numbers from -1 to 1.5 in increments of
0.1.

4.7.3 splitSentence():

This function performs a regular expression match. It matches the
pattern \w, that is, only word characters (letters, digits and _), omitting white space and
punctuation, in the subject $word. The text that matched the full pattern is stored in $matches.
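
The effect of matching \w can be illustrated with an equivalent Python sketch. The project's
function is written in PHP; the Python function below, its name and the sample string are
assumptions used only to show which characters the pattern keeps.

```python
import re


def split_sentence(text):
    """Return every run of word characters (letters, digits, underscore)."""
    return re.findall(r"\w+", text)


matches = split_sentence("Battery-life: 10 hrs, not_bad!")
print(matches)  # ['Battery', 'life', '10', 'hrs', 'not_bad']
```

Note that the hyphen and the colon act as separators, while the underscore in not_bad is kept
because \w treats it as a word character.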

4.7.4 insertTestData():

This function takes the test data, processes it and passes it to the classifier.

It throws an exception if a sentiment type other than the predefined types is
encountered.

In this portion, the test data is split into words and the number of occurrences of each word is
calculated. If the sentiment of a word matches that of the test data type, its count is
incremented by 1. Otherwise it is set to 0.

4.7.5 analyseSentence():

Analyse Sentence is the main part of the whole classifier. This part analyse a sentence and give
polarity to it.

This portion fixes the sentiment score as an array of positive and negative. Sentence is split to
words and stored as words.

Laplace Correction is used in this portion. To avoid the value of sentiment to be zero laplace is
used and 1 is added to the value before multiplying.

Department of Computer Science Engineering, ASIET 30


Sentiment Analysis on E-Commerce Platform May 2019

If the sentiment is positive, the Bayes difference is calculated as the ratio of the positive
score to the negative score; if the sentiment is negative, the ratio of the negative score to
the positive score is calculated.

This function returns the sentiment polarity of a given sentence together with its polarity
values. It works only for a single sentence. The function itself splits the sentence using
splitSentence, counts the words in it, and finally finds the average polarity and polarity
score from these individual word values.
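
The scoring described in this section can be sketched in Python as follows. This is an
illustration, not the project's PHP implementation. The function signature, the argument names
word_counts and type_totals, the smoothing denominator (class total plus vocabulary size) and
the "neutral" result for equal scores are all assumptions; the text states only that 1 is added
before multiplying. The training counts are assumed to be non-empty.

```python
import re


def analyse_sentence(sentence, word_counts, type_totals):
    """Return (polarity, scores, bayes_difference) for one sentence.

    word_counts maps word -> {"positive": n, "negative": m};
    type_totals maps each sentiment type to its total word count.
    """
    vocabulary_size = len(word_counts)
    scores = {"positive": 1.0, "negative": 1.0}
    for word in re.findall(r"\w+", sentence.lower()):
        counts = word_counts.get(word, {})
        for label in scores:
            # Laplace correction: add 1 so an unseen word cannot zero the product.
            numerator = counts.get(label, 0) + 1
            denominator = type_totals[label] + vocabulary_size
            scores[label] *= numerator / denominator

    if scores["positive"] > scores["negative"]:
        polarity = "positive"
        bayes_difference = scores["positive"] / scores["negative"]
    elif scores["negative"] > scores["positive"]:
        polarity = "negative"
        bayes_difference = scores["negative"] / scores["positive"]
    else:
        polarity = "neutral"
        bayes_difference = 1.0
    return polarity, scores, bayes_difference


# Hypothetical training counts, used only to exercise the function.
example_counts = {
    "good": {"positive": 3, "negative": 0},
    "bad": {"positive": 0, "negative": 3},
}
example_totals = {"positive": 3, "negative": 3}
result = analyse_sentence("Good, good!", example_counts, example_totals)
```

Because every factor is at least 1 divided by the denominator, neither score reaches zero, so
the ratio is always defined. For long texts the product of many small factors can underflow,
which is why implementations often sum logarithms instead.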

4.7.6 analyseDocument():

analyseDocument() works similarly to analyseSentence(). Here it is used for finding the polarity
and polarity score of a file that contains text data, i.e. instead of a single sentence it
handles many sentences in a single file. The parameter of the function is $filelocation, which
is the location of the target file.

This part, in both analyseSentence and analyseDocument, serves the same purpose: to evaluate
the polarity and sentiment score of the input.
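
A rough Python sketch of a document-level analyser is shown below. How the project combines the
sentence results is not stated, so the averaging here is an assumption, as are the function
signature, the score_sentence callable (any function returning a signed score for one sentence)
and the splitting of sentences on ".", "!" and "?".

```python
import re
from pathlib import Path


def analyse_document(file_location, score_sentence):
    """Score every sentence in a text file and return (polarity, average_score).

    score_sentence stands in for the classifier: it takes one sentence and
    returns a number that is above 0 for positive and below 0 for negative text.
    """
    text = Path(file_location).read_text(encoding="utf-8")
    # Splitting on ., ! and ? is a simplifying assumption about sentence boundaries.
    sentences = [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]
    if not sentences:
        return "neutral", 0.0
    average = sum(score_sentence(s) for s in sentences) / len(sentences)
    if average > 0:
        return "positive", average
    if average < 0:
        return "negative", average
    return "neutral", average
```

Passing the scorer in as an argument keeps the file handling separate from the classifier, so
the same function can be tried with any sentence-level scorer.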

4.8 RATING GENERATION

The rating is calculated by averaging the polarity (positive, negative and neutral) of all the
reviews of a product. As these polarities are fractional values, we convert the average to a
decimal, round it down using floor and show the result as the rating. Neutral reviews are
considered to have a polarity of 0 and add nothing to the polarity total, so they are left out
to make our system more efficient. We save the number of positive and negative comments and
average them to find the rating on a scale of 1 to 10. This is how the rating of a single
product is calculated.
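
The exact formula is not given in the text, so the following Python sketch shows only one
plausible reading: the share of positive comments among the non-neutral ones, scaled to 10 and
rounded down. The function name rating_from_counts and the clamp to a minimum of 1 are
assumptions made to match the stated 1 to 10 scale.

```python
def rating_from_counts(positive, negative):
    """Map positive/negative review counts to a 1-10 rating, or None if no reviews."""
    total = positive + negative
    if total == 0:
        return None  # nothing to average once neutral reviews are excluded
    # Integer floor division plays the role of floor() on the scaled average.
    rating = (10 * positive) // total
    # Clamping to at least 1 is an assumption, made to match the 1-10 scale.
    return max(1, rating)
```

For example, 8 positive and 2 negative comments give a rating of 8 under this reading.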

4.9 DATABASE CONNECTION

This initializes the connection with the database server using the given credentials. The
mysqli_connect command is used to establish the connection to the database server. If the
connection is not established, a connection failed message is shown. mysqli_select_db is then
used to select the database commodity_dataset.

The structure of the four tables used in this database is as follows:

camera:

This table stores camera information such as brand, model name, model, type, camera
resolution, display, warranty, video recording, review, status and score.

filtered_input:

This table stores the filtered inputs for all three categories.

laptop:

This table stores information about a laptop such as its brand, model name, screen size,
price, screen resolution, RAM, hard disk capacity, processor, graphics, battery, color, weight,
warranty, review, status and score.

phone:

This table contains information about a phone such as its brand, model name, resolution,
RAM, price, front camera, back camera, processor, review, battery capacity, status and score.
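
To make the table layout concrete, the sketch below recreates the phone table and a simple
lookup by one criterion. It is written in Python with the standard sqlite3 module as a stand-in;
the project itself uses MySQL through PHP's mysqli functions. The column names are derived from
the description above, while the column types and the function names create_phone_table and
find_phones are assumptions.

```python
import sqlite3

# Columns taken from the description of the phone table; the types are assumed.
PHONE_COLUMNS = {
    "brand": "TEXT", "model_name": "TEXT", "resolution": "TEXT", "ram": "TEXT",
    "price": "REAL", "front_camera": "TEXT", "back_camera": "TEXT",
    "processor": "TEXT", "review": "TEXT", "battery_capacity": "TEXT",
    "status": "TEXT", "score": "REAL",
}


def create_phone_table(connection):
    """Create the phone table on the given connection."""
    columns = ", ".join(f"{name} {kind}" for name, kind in PHONE_COLUMNS.items())
    connection.execute(f"CREATE TABLE phone ({columns})")


def find_phones(connection, column, value):
    """Return (brand, model_name, score) rows where the chosen column equals value."""
    if column not in PHONE_COLUMNS:
        # Column names cannot be bound as parameters, so only known names are allowed.
        raise ValueError(f"unknown column: {column!r}")
    query = f"SELECT brand, model_name, score FROM phone WHERE {column} = ?"
    return connection.execute(query, (value,)).fetchall()
```

The value is passed as a bound parameter and the column name is checked against a fixed list,
which keeps user input from the search form out of the SQL text.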

4.10 WEB PAGE DEVELOPMENT

We use a web page as the medium to present our analysis results to users. The web page is
designed to run on a local server provided by XAMPP. The site has a separate page for each
category, and clicking a category on the home page redirects to its page.

We use Bootstrap for styling our web pages. Bootstrap is a free and open source CSS
framework for developing responsive websites, and free Bootstrap templates are available for
easy styling. The home page interface of the project is styled with the Bootstrap framework.
The different categories of items are displayed as buttons, and pressing a button redirects to
the corresponding category PHP page.

For the home page we have home.php, which displays all the available categories.

Clicking on a category redirects to its respective page: camera.php for cameras, laptop.php
for laptops and phone.php for phones.

4.11 DESIGN STRUCTURE OF THE WHOLE PROCESS

The design follows a stepwise procedure. Data collection is followed by pre-processing. The
classifier model is then constructed and trained, and the test data is classified using this
model. After all these steps are completed, a web interface is designed for presenting the
analysis results.

Figure 4.12. Design Diagram of Sentiment analysis

CHAPTER 5
RESULT AND ANALYSIS

5.1 USER INTERFACE

The result of sentiment analysis is presented as a website run on a local server. XAMPP is used
for this purpose. XAMPP is a free and open-source cross-platform web server solution stack
package developed by Apache Friends, consisting mainly of the Apache HTTP Server,
MariaDB database, and interpreters for scripts written in the PHP and Perl programming
languages.

To run our project, start the XAMPP Apache server and MySQL, then open Google Chrome
and type the following in the URL area: http://localhost/Project_Dynamic_Facet/home.php

Here Project_Dynamic_Facet is the name of the folder in which home.php is stored.


This will open the following page:

Clicking any of these 3 buttons redirects to its respective page.

Camera:
Clicking this button redirects to the following page:
http://localhost/Project_Dynamic%20Facet/camera.php

In the drop-down box we can select the required criterion, and in the input box next to it we
type the value.

On clicking Submit, the list of cameras matching the specified criterion is displayed.

We can select a product, and this redirects to the product page:
http://localhost/Project_Dynamic%20Facet/camera2.php?model=Canon%20EOS%20100D%20SLR
Every comment can be evaluated individually, and the overall average is given at the top,
below the description of the product.

Laptop:
Clicking this button redirects to the following page:
http://localhost/Project_Dynamic%20Facet/laptop.php

Since this section has so many products, the Show drop-down box is used to limit the number
of entries displayed.
In the Search input box we can look for any particular brand or property of a laptop.

At the bottom there is also a separate search for particular properties.

There are 4 pages to navigate.

We can select a product, and this redirects to the product page:
http://localhost/Project_Dynamic%20Facet/laptop2.php?model=Toshiba%20C50-A%20P0011%20Satellite%20Laptop
Every comment can be evaluated individually, and the overall average is given at the top,
below the description of the product.

Mobile phone:
Clicking this button redirects to the following page:
http://localhost/Project_Dynamic%20Facet/phone.php

In the drop-down box we can select the required criterion, and in the input box next to it we
type the value.

On clicking Submit, the list of phones matching the specified criterion is displayed.

We can select a product, and this redirects to the product page:
http://localhost/Project_Dynamic%20Facet/phone2.php?model=Apple%20iPhone%206
Every comment can be evaluated individually, and the overall average is given at the top,
below the description of the product.
The final results are displayed along with the description and all the reviews. When a review
is evaluated, that particular review's score is given at the top along with the overall rating
and polarity.

5.2 ANALYSIS OF THE ALGORITHM

Sentiment analysis is done using the Naive Bayes algorithm. The accuracy of this algorithm can
be calculated from its confusion matrix.

Table 5.1. Confusion matrix

Accuracy = (a+d)/(a+b+c+d)
Here a and d are the diagonal entries of the matrix (the correctly classified observations) and
b and c are the off-diagonal entries (the misclassified observations). A confusion matrix is a
table that is often used to describe the performance of a classification model (or
"classifier") on a set of test data for which the true values are known. The confusion matrix
itself is relatively simple to understand, but the related terminology can be confusing.

The accuracy of the Naive Bayes classifier used in our sentiment analysis is calculated to
determine how well the analyser works. The accuracy of the system was calculated using 600
test observations.
Total Observations in Table: 600

Table 5.2. Confusion Matrix Example

Accuracy = (241+233)/600 = 474/600
Model accuracy = 0.790
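
The arithmetic above can be checked with a few lines of Python. The function names accuracy and
accuracy_from_matrix are hypothetical. Only the two correctly classified counts (241 and 233)
and the total of 600 are quoted in the text, so the example works from those figures.

```python
def accuracy(correct, total):
    """Fraction of observations that were classified correctly."""
    if total <= 0:
        raise ValueError("total must be positive")
    return correct / total


def accuracy_from_matrix(a, b, c, d):
    """Accuracy = (a+d)/(a+b+c+d) for a 2x2 confusion matrix with diagonal a, d."""
    return accuracy(a + d, a + b + c + d)


# Figures quoted in the text: 241 + 233 correct out of 600 observations.
model_accuracy = accuracy(241 + 233, 600)
print(f"{model_accuracy:.3f}")  # prints 0.790
```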

5.3 WHY NAIVE BAYES?

Naive Bayes has some characteristics which make it different from other algorithms. It is very
simple, easy to implement and fast. If the Naive Bayes conditional independence assumption
holds, it will converge more quickly than discriminative models like logistic regression.
Even if the Naive Bayes assumption does not hold, it often works well in practice. It needs
less training data. It is highly scalable: it scales linearly with the number of predictors and
data points. It can be used for both binary and multi-class classification problems. It can
also make probabilistic predictions. It handles continuous and discrete data. It is not
sensitive to irrelevant features.

The characteristics of Naive Bayes compared with other algorithms are:

Features            Naive Bayes    Max Entropy               Boosted trees           SVM              KNN
Based on            Bayes Theorem  Feature based classifier  Decision tree Learning  Distance vector  Nearest neighbor
Simplicity          Very Simple    Hard                      Moderate                Moderate         Simple
Performance         Better         Good                      Good                    Better           Poor
Accuracy            Good           High                      Poor                    Good             Good
Memory Requirement  Low            High                      Low                     High             Low
Time Required       Low            Moderate                  High                    Moderate         Very low

Table 5.3. Comparison of algorithms

CHAPTER 6
ADVANTAGES

● Easy to use interface:
The user just needs to evaluate a comment to know its polarity. Just by clicking the product,
the user can learn about the genuineness of a product.

● Avoid cheating by sellers:
Users can find the genuineness of a product from the user reviews. A fake description of the
product by a seller can therefore be identified, which makes it harder for sellers to cheat
buyers.

● Know the product before buying it:
People can learn more about the quality of a product even before buying it. Because the
genuineness of the product is taken into consideration, buyers do not have to compromise on
quality. People can learn the negatives and positives of the product from existing users even
before buying it.

● Free to use:
The project is developed as open source to address customers' problems and is available to
customers free of cost.

● Zero budget project:
Apart from internet charges, the project was developed at zero cost. There were no further
expenses.

● Useful to sellers to identify target customers:
Sellers can create better products and services, and they can formulate the marketing
messages they send out according to the sentiments being expressed by their target
audience or customers.

CHAPTER 7
DISADVANTAGES

The analyser comes with certain disadvantages:

● The accuracy and efficiency of the algorithm depend on the training dataset.
● In the current version, the data must first be entered into the database.
● It has problems recognizing sentences with sarcasm, irony, negations, jokes and
exaggerations. These can cause trouble even for a person, and failing to recognize them
can skew the results.
● Fake comments: telling legitimate users and their comments apart from fake comments is
a great challenge for the whole process.
● 'Disappointed' may be classified as a negative word for the purposes of sentiment
analysis, but within the phrase “I wasn't disappointed” it should be classified as
positive.
● Sentiment is inherently subjective from person to person, and can even be outright
irrational. An individual’s sentiment toward a brand or product may be influenced by
one or more indirect causes. Since sentiment very likely changes over time according to
a person’s mood, world events, and so forth, it is usually important to look at data from
the standpoint of time.
● It is the aggregate that matters. It is critical to mine a large and relevant sample of data
when attempting to measure sentiment. No particular data point is necessarily relevant.
With a large enough sample, outliers are diluted in the aggregate.
● Cultural and local differences, where people from some countries might be more or less
effusive in their use of language. (Every review should be in a common language, English.)
● The analyser does not go beyond the polarity of “positive” and “negative” to classify
sentiment using more fine-grained categories like “angry,” “happy,” “frustrated,” and “sad.”

So, automated sentiment analysis tools do a good job of analysing text for opinion and
attitude, but they are not perfect.

CHAPTER 8
APPLICATIONS

When experimenting with machine learning and big data, you may identify data sets containing
streams of text such as customer reviews or social media posts in which customers (or
potential customers) are talking about a product, brand or service that you offer. Mining
such data to determine how people feel about your product, brand or service is called
Sentiment Analysis. The applications of sentiment analysis in business are plentiful. Gaining
greater business value from sentiment analysis depends on what tool you use and how well
you use it to your advantage.

● E-commerce platforms like Amazon.

When embedded in E-commerce platforms, this application can help improve the working of the
platforms in many ways. It improves the platform’s reliability and can improve marketing and
make the platform more profitable. The main aspects of using it in the platforms are:

❏ Reputation management

It can also be called brand monitoring. We all know how much a good reputation means these
days, when the majority of us check social media reviews as well as review sites before making
a purchase decision. People now rarely decide to eat out without checking the reviews of a
place beforehand. The same applies to buying things online, or researching tools used
daily at work.

Negative reviews put people off, and how they are handled can define your future as a
business. A business could ignore them (not recommended), act rudely and make the situation
even worse, or apologise for whatever caused a person to write a negative opinion and do what
is best to make up for it.
But we have to be aware of those opinions in the first place. That is where social media
monitoring combined with sentiment analysis comes in. While some say it is just a fad or
something that only big businesses can use, it is believed that a social media monitoring tool
will not only help you manage your reputation, but also prevent your customers from turning to
your competitors and earn you money they could otherwise spend elsewhere.

A brand is not defined only by the product it manufactures or the services it provides. The
name and fame that build a brand depend largely on its online marketing, social campaigning,
content marketing and customer support services. Sentiment analysis in business helps in
quantifying the perception of present and potential customers regarding all these factors.
Keeping the negative sentiments in mind, you can develop more appealing branding
techniques and marketing strategies to move from a torpid to a terrific brand status. Sentiment
analysis in business can help you make this transition quickly.

❏ Customer support

Social media are channels of communication with your customers these days, and whenever
they are unhappy about something related to a product, whether or not it is the fault of the
product, they will call it out on Facebook/Twitter/Instagram.
Such mentions will appear in the dashboard in a flashing red colour, and they should be
addressed as soon as they appear.

People nowadays expect brands to respond on social media almost immediately, and if you’re
not quick enough, you might as well see them moving on to your competitors instead of
waiting for your reply.

A business thrives on the satisfaction of its customers. The experience of the customers can
be positive, negative or neutral. In the internet era, this experience becomes the text of
their social media posts and online feedback. The tone and temperament of this data can
be detected and then categorized according to the sentiments attached. This helps to know what
is working well with regard to products, services and customer support, and what needs
improvement.

Getting a positive response to your product is not always enough. The customer support system
of your company should always be impeccable no matter how phenomenal your services are.

❏ Competitor monitoring

Chances are some of your competitors are getting bad press online. This is where you could
step in, as long as you are aware of those negative mentions. It is not about aggressively
taking advantage of whatever they have neglected, but joining conversations when they do not
even bother to reply to the mentions they are getting can be helpful.

It does not necessarily need to put a competitor in a bad light; it can be a situation where
it is totally fine to offer a helping hand. Not only does this solve the problem of the person
asking, it also represents a proactive approach, indicating that you have your ear to the
ground with whatever is going on in the industry. It is still difficult for the vast majority
of tools to precisely evaluate what truly is a negative, a neutral, and a positive statement.
At the moment the technology is not advanced enough to successfully deal with sarcasm or the
context of some discussions.

Businesses applying sentiment analysis should be open to experimenting with it tactfully.
Sentiment analysis can be performed on any piece of text. So, why just settle for
applying it to your brand? Getting x% negative or positive reviews on a certain product doesn’t
make much sense if you don’t have a y% metric to compare it with. Knowing the sentiment
data of your competitors gives you the opportunity as well as the incentive to improve your
performance. Sentiment analysis in business can be very helpful in predicting customer
trends. Once you are acquainted with the current customer trends, strategies can be developed
to capitalize on them and eventually gain a leading edge over the competition.

❏ Sentiment Analysis in Business Intelligence Buildup

Having insight-rich information eliminates guesswork and enables timely decisions.
With the sentiment data about your established and new products, it is easier to estimate
your customer retention rate. Based on the results generated through sentiment analysis in
business, you can always adjust to the present market situation and satisfy your customers in
a better way. Overall, you can make immediate decisions with automated insights. Business
intelligence is all about staying dynamic throughout, and having the sentiment data gives you
that liberty. If you develop a big idea, you can test it before bringing it to life. This is
known as concept testing. Whether it is a new product, a campaign or a new logo, just put it
to concept testing and analyse the sentiments attached to it.

CHAPTER 9
SUMMARY
In this project, sentiment analysis on an E-commerce platform was developed and discussed.
Sentiment analysis is an emerging technology which helps in mining opinions from a large
body of text. To implement this project the Naive Bayes classifier was used. The Naive Bayes
classifier makes use of Bayes' theorem with conditional probability. The sentiment of a
word is determined by the ratio of its number of occurrences to the total number of words,
computed in both the positive and negative datasets. The dataset with the bigger ratio is
assigned as the sentiment of that word.
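
The ratio rule summarised above can be sketched as follows. This is a Python illustration with
hypothetical names (word_sentiment, positive_counts, negative_counts); treating a tie as
"neutral" is an assumption, since the text does not say what happens when the two ratios are
equal.

```python
def word_sentiment(word, positive_counts, negative_counts):
    """Label a word by comparing its relative frequency in the two datasets."""
    positive_total = sum(positive_counts.values())
    negative_total = sum(negative_counts.values())
    # Ratio of the word's occurrences to the total number of words in each dataset.
    positive_ratio = positive_counts.get(word, 0) / positive_total if positive_total else 0.0
    negative_ratio = negative_counts.get(word, 0) / negative_total if negative_total else 0.0
    if positive_ratio > negative_ratio:
        return "positive"
    if negative_ratio > positive_ratio:
        return "negative"
    return "neutral"  # a tie is treated as neutral here (an assumption)
```

A word that appears in both datasets, such as a product name, is assigned to whichever dataset
uses it relatively more often.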

The related works are also discussed in this report. Some applications which perform
similar functions, such as ReviewMeta, Fakespot, Metacritic and TrustYou, are discussed.

The working of the system is shown using different UML diagrams. The process and the code are
also discussed in this report.

The project collects data from Amazon using ParseHub, and this data is preprocessed and kept
in formats that make it easy to analyse. Using the training data set a classifier is built,
and using this classifier the test data is analysed.

The reviews from Amazon are displayed in a web page interface where the user can evaluate the
rating of each comment; from these the average value is found and the total rating is
displayed.

CHAPTER 10
CONCLUSION

Analysing the genuineness of a product is an important factor to consider while shopping
online. By combining natural language processing with algorithms like Naive Bayes to
perform sentiment analysis, we propose an application that analyses user reviews and predicts
the genuineness of a product.
This project focuses on solving the problem of customers or online buyers who worry about the
quality of a product. We take advantage of the user reviews available to us and use them to
analyse and predict the genuineness of a product. We break everything into logical steps that
allow us to ensure the cleanest, most realistic data for our model to analyse accurately.
For this we take a large data set, which is used to create a classification model for
predicting the genuineness of a product. This classification model is further developed into a
website which provides a rich interface for the user. The app produces its output based on the
input parameters; on the 600 test observations of Section 5.2 the classifier reached an
accuracy of 0.790.
From this project report we can understand the working of sentiment analysis on E-commerce
platforms and its importance in the future. The report also explains the working of the Naive
Bayes algorithm used in sentiment analysis.

CHAPTER 11
FUTURE ENHANCEMENT

In recent years, we have seen the democratization of sentiment analysis, in that it is now
being offered as a service. Companies such as Microsoft and IBM, as well as smaller emerging
companies, offer REST APIs that integrate easily with existing software applications. For
example, using the following publicly available Sentiment Analysis REST API from a small
start-up called Social Opinion, we pass in the text “this phone is awesome” to the following
URL:

http://api.socialopinion.co.uk/api/sentiment/?text=phone%25awesome&token=00000

The REST API then returns the following response:

REST API response after passing a text to Social Opinion in sentiment analysis

In the response, we can see the text has been identified as expressing positive emotion, with a
64% probability of that being true.

Sentiment analysis has been more than just a social analytics tool. It has been an interesting
field of study. But it is a field that is still being studied, although not at great length,
due to the intricacies of this analysis. That is, this field involves tasks that are too
complicated for machines to understand. The ability to understand sarcasm, hyperbole, positive
feelings or negative feelings has been difficult for machines, which lack feelings. Algorithms
have not been able to predict with more than 60% accuracy the feelings portrayed by people.
Yet even with so many limitations, this is a field which is growing at a great pace within
many industries. Companies want to bring sentiment analysis tools into areas such as customer
feedback, marketing, CRM and e-commerce.

