
SENTIMENTS ANALYSIS USING AI

PROJECT REPORT
Submitted in the Partial Fulfillment of the Requirements for the Award of Degree of

BACHELOR OF TECHNOLOGY
In
Electronics & Communication
Engineering

SUBMITTED BY
Name of Candidate : Shivangi
RollNo :1816691
Batch :2018-2022

CHANDIGARH ENGINEERING COLLEGE


JHANJERI, MOHALI

Affiliated to I.K Gujral Punjab Technical University, Jalandhar


CERTIFICATE

I hereby certify that the work presented in the project report entitled “Sentiments Analysis
Using AI” by Shivangi, in partial fulfillment of the requirements for the award of the degree of
B.Tech. Electronics & Communication Engineering, submitted in the Department of Electronics
and Communication Engineering at CHANDIGARH ENGINEERING COLLEGE, JHANJERI, Mohali
under I.K. GUJRAL PUNJAB TECHNICAL UNIVERSITY, JALANDHAR, is an authentic record of my
own work under the supervision of Ms. Malika Arora.

Date: 13/06/2021

Shivangi
Roll No: 1816691

Signature of the SUPERVISOR


Ms Malika Arora

Signature of H.O.D
ACKNOWLEDGEMENT

Working on this project has been a great experience, and I offer sincere thanks to my faculty
members. It was a great opportunity to work under the guidance of Ms. Malika Arora. It would
not have been possible to carry out the work with such ease without her immense help and
motivation. I consider it my privilege to express my gratitude, respect and thanks to all those
who guided me in choosing this project. I express sincere gratitude to Dr. Sajjan Singh (HOD,
ECE) for his everlasting support towards the students and for providing us this opportunity.

Shivangi
ABSTRACT

Over the last few years, e-commerce has become an indispensable part of the global retail
framework. Like many other industries, the retail landscape has undergone a substantial
transformation following the advent of the internet. Internet users can choose from various
online platforms to browse, compare, and purchase the items or services they need. With
increasing modernization, the number of e-commerce sites is also increasing. It is therefore
very important for every e-commerce company to analyse its customers' reviews and improve
its services to stay ahead of the competition. This project focuses on implementing machine
learning algorithms to perform a collective analysis of the reviews received by various
e-commerce sites. A major focus of this study was comparing different machine learning
algorithms for the task of sentiment classification. The major finding was that, of the
classification algorithms evaluated, Logistic Regression provides the highest classification
accuracy for this domain. From the evaluation in this study it can be concluded that the
proposed machine learning and natural language processing techniques are efficient and
practical methods of sentiment analysis.
TABLE OF CONTENTS

Contents
Title Page
Certificate
Acknowledgments
Abstract
Table of Contents
Chapter 1: INTRODUCTION
1.1 General Introduction
1.2 Current Open Problems/Issues
1.2.1 Linguistic Approach
1.2.2 Machine Learning Approach
1.3 Problem Statement
Chapter 2: LITERATURE SURVEY
Chapter 3: SYSTEM DESIGN
3.1 Hardware Requirements
3.2 Software Requirements
3.3 Technology Required
3.4 Data Information
3.5 Data Format
Chapter 4: METHODOLOGY FOR IMPLEMENTATION
4.1 Data Collection
4.2 Sentiment Sentence Extraction & POS Tagging
4.3 Negative Phrase Identification
4.4 Sentiment Analysis Algorithms
Chapter 5: IMPLEMENTATION DETAILS
5.1 Data Preprocessing
5.2 Training the Model
5.3 Sentiment Sentence Extraction & POS Tagging
5.4 Web Scraper
5.5 Designing UI
Chapter 6: RESULTS AND SAMPLE OUTPUTS
6.1 Collective Analysis
6.2 Dropdown Box
6.3 Textbox
Findings and Conclusion
References
Chapter 1

INTRODUCTION

1.1 General Introduction


Sentiment analysis is a part of natural language processing (NLP). NLP is used in data
analytics to extract meaningful information from large volumes of text. It is one way to learn
about current market trends from what people are thinking or saying on social media. There
are many practical applications today, such as gauging which party is favoured or gaining
popularity in an election, or helping a customer who looks at reviews before buying something
online. These applications become harder as the size of the data keeps increasing. A large part
of the effort goes into arranging this data into something meaningful before analysing it. This
arrangement of data is called text classification.

Sentiment classification and analysis is performed in Python using the NLTK module, a
Python library for natural language processing tasks. It supports multiple languages, such as
English, Hindi and Chinese, for classifying text or data into something meaningful.
Text classification can be performed in the following ways:
1. Sentiment classification
2. Feature-based sentiment classification
3. Summarization of sentiments

Sentiment classification classifies the complete document according to the sentiments or
opinions expressed in the text. The feature-based approach, however, classifies sentiment
based on the attributes of the entity (noun) mentioned in the text. This approach reveals the
good or bad qualities of particular entities from the details given about them. Opinion
summarization is similar to text summarization, but it gives a clear indication of the
sentiment attached to the text. Its output is not a substring of the given text; instead it
describes the entities in positive or negative words, so that a whole document can be
described in a few words without losing its gist. These types of classification can be
performed before actually analysing the text. After text classification, the words are tagged.

Sentiment classification can be performed at different levels:

1. Document level
2. Sentence level
3. Word level
English is one of the most preferred languages for natural language processing work. This
project handles opinions in English only and does not support other languages.

Consider two examples: "I watched the movie burger. The movie was very good and the actor
did an awesome job."
"When Modi returned from U.S.A., I got my 15 lakhs as promised by PM Modi"

The first example clearly gives a positive review of the movie and the actor. The second is
sarcastic, and the sentiment classifier is still not able to classify sarcasm. Sarcasm remains a
big problem for data analytics and a topic of research; detecting it by machine is much
harder. There are two approaches to performing sentiment analysis:
1. Linguistic approach
2. Machine learning approach

1.2 Current Open Problems/Issues

1. Linguistic approach:
This is the basic approach to sentiment analysis. It tags the tokens and then analyses them.
The problems with this approach are:
• Negation: This approach cannot deal with negation very well. At times it produces a
result opposite in sense to what was meant.
• Grammatically incorrect sentences: This approach uses datasets to match words during
tagging. If a sentence with polarity is grammatically incorrect, it cannot be matched
against the existing datasets of polar words. To be effective, the datasets must contain
all the polar words used in everyday language.
• Sometimes users say one thing but mean another. Such sentences make sense to a reader
but cannot be analysed by a machine.
2. Machine learning approach:
Methods that classify text, such as Naive Bayes or SVM, also suffer from problems such as:
• Sarcasm
• Jumbling of words
• Chat text or tweets
• Limited number of words to type
1.3 Problem Statement

• Scraping product reviews from various websites featuring various products, specifically
amazon.com.
• Analyzing and categorizing review data.
• Analyzing sentiment on the dataset at the document level (review level).
• Categorizing or classifying opinion sentiment into:
o Positive
o Negative
Chapter 2
LITERATURE SURVEY
Paper-1: Sentiment Analysis and Opinion Mining By: G. Vinodhini and RM.
Chandrasekaran [June 6, 2012]
Department of Computer Science and Engineering, Annamalai University, Annamalai
Nagar 608002.
Summary: The large volume of data on the internet today, with constantly updated and growing
social networks, news, entertainment, reviews, blogs and discussion forums, provides a large
number of opinions. Data analytics focuses on these opinions for sentiment analysis work.
Researchers are currently working to build software to detect and classify the texts available
online. The precise information extracted from these resources can tell us a lot about what users
like and dislike and what they do or do not want to buy. Other parties can use this information
to offer users better deals, and reviews can help users find better deals. After classification and
analysis, the data available on the internet can be very valuable to users. This paper is a survey
of methods in data analytics and of the problems that exist in the area of data analytics /
sentiment analysis.
Weblink: http://www.dmi.unict.it/~faro/tesi/sentiment_analysis/SA2.pdf

Paper-2: Boost up! Sentiment Categorization with Machine Learning Techniques By:
Andres Cassinelli, Chih-Wei Chen [June 5, 2009]
Summary: For computing the sentiment of a given text, opinion or review, the paper notes that
its methods perform comparably to past work on sentiment analysis of reviews, and in some
respects better. If these methods are applied to multi-class classification, the results could be
similar. The classification technique first uses part of the data as a training set and then
evaluates on the rest of the data, and the paper describes the relationship between the objects
in an efficient way.
Weblink: http://www.cs.cornell.edu/home/llee/papers/sentiment.pdf
Paper-3: Twitter as a Corpus for Sentiment Analysis and Opinion Mining By: Alexander
Pak, Patrick Paroubek [2010] Université de Paris-Sud, Laboratory LIMSI-CNRS,
Batiment 508, F-91405 Orsay Cedex, France
Summary: Today, social network sites such as Twitter, Facebook, Google Plus and LinkedIn are
popular tools for communicating with other people on the internet. Thousands of people share
information with each other. This information may be useful to some and worthless to others. If
properly analysed, this data could be very useful for some purposes. It may take the form of
opinions or results shared with others. These social sites can therefore be very effective in
generating useful information about many aspects of everyday life. However, little work has
been done so far because these social networking sites came into existence only recently.
In this paper, the authors describe the use of Twitter, one of the most popular social networks
today, for sentiment analysis.
Weblink: http://lrec-conf.org/proceedings/lrec2010/pdf/385_Paper.pdf

Paper-4: Semantic Sentiment Analysis of Twitter By: Hassan Saif, Yulan He and Harith
Alani [Nov 2012] Knowledge Media Institute, The Open University, United Kingdom
Summary: The authors introduce a novel approach of adding semantics as additional features
to the training set for sentiment analysis. For each entity extracted from tweets (e.g. iPhone),
they add its semantic concept (e.g. “Apple product”) as an additional feature, and measure the
correlation of the representative concept with negative/positive sentiment.

Paper-5: What’s Great and What’s Not: Learning to Classify the Scope of Negation for
Improved Sentiment Analysis By: Isaac G. Councill, Ryan McDonald, Leonid Velikovich
[July 2010]
Summary: The authors present a negation detection system based on a conditional random
field modelled using features from an English dependency parser. The scope of negation
detection is limited to explicit rather than implied negations within single sentences.

Paper-6: TwiSent: A Multistage System for Analyzing Sentiment in Twitter By: Subhabrata
Mukherjee, Akshat Malu, Balamurali A.R, Pushpak Bhattacharyya [Feb 2013] Dept. of
Computer Science and Engineering, IIT Bombay; IITB-Monash Research Academy
Summary: The authors present TwiSent, a sentiment analysis system for Twitter. Based on the
topic searched, TwiSent collects tweets pertaining to it and categorizes them into the different
polarity classes: positive, negative and objective. However, analyzing micro-blog posts has
many inherent challenges compared to other text genres.
Chapter 3
SYSTEM DESIGN
3.1 Hardware Requirements:
• Core i5/i7 processor
• At least 8 GB RAM
• At least 30 GB of usable hard disk space

3.2 Software Requirements:

• Python 3.x
• Anaconda Distribution
• Any operating system

3.3 Technology Required:

• Python
• Pandas
• Machine learning (classification)
• NLP (NLTK, TextBlob, VADER)
• Web scraping (Beautiful Soup)
• Database handling
• Dash from Plotly
• Transfer learning
• Power BI / Tableau

3.4 Data Information:

The Amazon reviews dataset consists of reviews from Amazon. The data span a period of 18
years and include ~35 million reviews up to 2018. Each review includes product and user
information, a rating, and a plaintext review. The Amazon reviews full score dataset is
constructed by randomly taking 24,000 samples for each review score from 1 to 5. Each chunk
contains 1,000,000 samples, and there are 33 chunks in total.
3.5 Data Format:

The dataset is stored as a .json file. A sample record is given below.
{
"reviewSummary": "Surprisingly delightful",
"reviewText": "This is a first read filled with unexpected humor and profound insights into the art of politics and policy. In brief, it is sly, wry, and wise.",
"reviewRating": "4"
}
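
As a minimal sketch of how one such record could be read and turned into a training label, the snippet below parses a record with Python's standard json module. The helper name to_label and the rule it applies (ratings above 3 are positive, below 3 negative, 3 is skipped) are illustrative assumptions; the report does not state how ratings were mapped to sentiment classes.

```python
import json

# One record in the format of the sample above (the rating is stored as a string).
RAW = ('{"reviewSummary": "Surprisingly delightful", '
       '"reviewText": "In brief, it is sly, wry, and wise.", '
       '"reviewRating": "4"}')


def to_label(rating, threshold=3):
    """Map a 1-5 star rating to 1 (positive), 0 (negative) or None (neutral).

    Assumption: ratings above `threshold` are positive, below it negative.
    """
    stars = int(float(rating))
    if stars > threshold:
        return 1
    if stars < threshold:
        return 0
    return None


record = json.loads(RAW)
label = to_label(record["reviewRating"])
print(record["reviewSummary"], "->", label)
```

Skipping the middle rating keeps the task binary, which matches the positive/negative classes named in the problem statement.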
Chapter 4
METHODOLOGY FOR IMPLEMENTATION

4.1 Data Collection:


The data consists of product reviews collected from amazon.com between May 1996 and July
2014. Each review includes the following information: 1) reviewer ID; 2) product ID; 3) rating;
4) time of the review; 5) helpfulness; 6) review text. Every rating is based on a 5-star scale,
so all ratings range from 1 star to 5 stars, with no half-star or quarter-star ratings.

4.2 Sentiment Sentence Extraction & POS Tagging:


Tokenizing the reviews after removing stop words, which carry no sentiment, is the basic
requirement for POS tagging. After stop words such as “am, is, are, the, but” are removed, the
remaining sentences are converted into tokens. These tokens take part in POS tagging. In
natural language processing, part-of-speech (POS) taggers have been developed to classify
words based on their parts of speech. For sentiment analysis, a POS tagger is very useful for
two reasons: 1) Words such as nouns and pronouns usually do not carry any sentiment, and a
POS tagger makes it possible to filter them out; 2) A POS tagger can also distinguish words
that can be used as different parts of speech.
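
The stop-word removal and tokenization step described above can be sketched in a few lines of standard-library Python. The stop-word list here is a small illustrative set and the function names are hypothetical; the sketch tokenizes first and filters afterwards, which yields the same tokens as removing stop words first.

```python
import re

# Small illustrative stop-word list; a real run would use a fuller list
# (for example the one shipped with NLTK).
STOP_WORDS = {"am", "is", "are", "the", "but", "a", "an", "and", "was", "it"}


def tokenize(text):
    """Lower-case the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def remove_stop_words(tokens):
    """Drop tokens that carry no sentiment on their own."""
    return [t for t in tokens if t not in STOP_WORDS]


tokens = remove_stop_words(tokenize("The movie was very good but the ending is weak."))
print(tokens)
```

The remaining tokens could then be passed to a POS tagger such as NLTK's nltk.pos_tag, which requires the NLTK tagger data to be downloaded first.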

4.3 Negative Phrase Identification:


Words such as adjectives and verbs can convey the opposite sentiment when combined with
negative prefixes. For instance, consider the following sentence, found in a review of an
electronic device: “The built in speaker also has its uses but so far nothing revolutionary.”
The word “revolutionary” is a positive word according to the sentiment word list. However,
the phrase “nothing revolutionary” conveys a more or less negative feeling. It is therefore
crucial to identify such phrases. In this work, two types of phrases are identified, namely
negation-of-adjective (NOA) and negation-of-verb (NOV).
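
One simple way to find such phrases, sketched below under stated assumptions, is to scan a POS-tagged sentence for a negation word that is immediately followed by an adjective (NOA) or a verb (NOV). The negation word list, the function name and the adjacency rule are assumptions for illustration; the report does not give its exact rules. The input is assumed to be (word, tag) pairs with Penn Treebank tags.

```python
# Assumed negation cues; the report does not list the ones it used.
NEGATION_WORDS = {"not", "no", "never", "nothing", "n't", "hardly"}


def find_negation_phrases(tagged):
    """Return (kind, phrase) pairs for negation-of-adjective (NOA) and
    negation-of-verb (NOV) phrases in a POS-tagged sentence.

    `tagged` is a list of (word, tag) pairs using Penn Treebank tags.
    """
    phrases = []
    for (word, _), (nxt, nxt_tag) in zip(tagged, tagged[1:]):
        if word.lower() not in NEGATION_WORDS:
            continue
        if nxt_tag.startswith("JJ"):      # adjective tags: JJ, JJR, JJS
            phrases.append(("NOA", f"{word} {nxt}"))
        elif nxt_tag.startswith("VB"):    # verb tags: VB, VBD, VBG, ...
            phrases.append(("NOV", f"{word} {nxt}"))
    return phrases


# Hand-tagged fragment of the example sentence above.
tagged = [("but", "CC"), ("so", "RB"), ("far", "RB"),
          ("nothing", "NN"), ("revolutionary", "JJ")]
print(find_negation_phrases(tagged))
```

A fuller version would also allow an adverb between the negation word and the adjective or verb, as in "not very good".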
4.4 Sentiment Analysis Algorithms
1. K-Nearest Neighbour:

• The K-NN algorithm assumes similarity between the new case and the available cases, and
puts the new case into the category whose members it most resembles.
• The K-NN algorithm stores all the available data and classifies a new data point based on
similarity. This means that when new data appears, it can easily be classified into a
well-suited category.
• The K-NN algorithm can be used for regression as well as classification, but it is mostly
used for classification problems.
• K-NN is a non-parametric algorithm, which means it does not make any assumption about
the underlying data.
• It is also called a lazy learner algorithm because it does not learn from the training set
immediately; instead it stores the dataset and acts on it only at classification time.
• In the training phase the K-NN algorithm just stores the dataset; when it gets new data,
it classifies that data into the category most similar to it.
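
The idea can be illustrated with a minimal sketch that represents each review as a bag of words and votes among the k most similar training reviews. The use of cosine similarity, the value k=3 and the tiny training set are assumptions made for illustration only, not the report's configuration.

```python
from collections import Counter
import math


def bag_of_words(text):
    """Represent a text as word counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def knn_predict(train, text, k=3):
    """Label `text` by majority vote among its k most similar training reviews."""
    query = bag_of_words(text)
    ranked = sorted(train,
                    key=lambda item: cosine(bag_of_words(item[0]), query),
                    reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Tiny made-up training set, for illustration only.
TRAIN = [
    ("great phone love it", "positive"),
    ("love the great battery", "positive"),
    ("excellent great value", "positive"),
    ("terrible phone broke fast", "negative"),
    ("bad battery terrible screen", "negative"),
    ("awful waste of money", "negative"),
]

print(knn_predict(TRAIN, "great battery love it"))
print(knn_predict(TRAIN, "terrible screen bad"))
```

Note that nothing is learned ahead of time: all the work happens inside knn_predict, which is what makes K-NN a lazy learner.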

2. Naïve Bayes classifier:

• It is mainly used in text classification, which involves high-dimensional training datasets.
• The Naïve Bayes classifier is one of the simplest and most effective classification
algorithms; it helps in building fast machine learning models that can make quick
predictions.
• It is a probabilistic classifier, which means it predicts on the basis of the probability of
an object belonging to a class.
• Some popular applications of the Naïve Bayes algorithm are spam filtering, sentiment
analysis, and classifying articles.
• The formula for Bayes' theorem is given as: P(A|B) = P(B|A) · P(A) / P(B), where P(A|B)
is the posterior probability of hypothesis A given evidence B, P(B|A) is the likelihood,
P(A) is the prior probability, and P(B) is the probability of the evidence.
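
To make the use of Bayes' theorem concrete, the following is a minimal sketch of a multinomial Naïve Bayes text classifier written from scratch. It assumes whitespace tokenization and add-one (Laplace) smoothing, and the function names and training samples are hypothetical; in practice a library implementation would normally be used.

```python
from collections import Counter, defaultdict
import math


def train_nb(samples):
    """Count words per class and documents per class."""
    word_counts = defaultdict(Counter)
    class_counts = Counter()
    for text, label in samples:
        class_counts[label] += 1
        word_counts[label].update(text.lower().split())
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, class_counts, vocab


def predict_nb(model, text):
    """Pick the class with the highest log posterior, using add-one smoothing."""
    word_counts, class_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in class_counts:
        score = math.log(class_counts[label] / total_docs)  # log prior P(class)
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            # Smoothed likelihood P(word | class); unseen words do not zero the product.
            score += math.log((word_counts[label][word] + 1)
                              / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Made-up training samples, for illustration only.
SAMPLES = [
    ("good great product", "positive"),
    ("great value good quality", "positive"),
    ("bad poor product", "negative"),
    ("poor quality bad value", "negative"),
]
model = train_nb(SAMPLES)
print(predict_nb(model, "good quality great"))
print(predict_nb(model, "poor product"))
```

The evidence term P(B) is the same for every class, so it can be dropped when only the most probable class is needed.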

3. SVM (Support Vector Machine):

Support Vector Machine or SVM is one of the most popular Supervised Learning
algorithms, which is used for Classification as well as Regression problems. However,
primarily, it is used for Classification problems in Machine Learning.

The goal of the SVM algorithm is to create the best line or decision boundary that can
segregate n-dimensional space into classes so that we can easily put the new data point in
the correct category in the future. This best decision boundary is called a hyperplane.

SVM chooses the extreme points/vectors that help in creating the hyperplane. These
extreme cases are called support vectors, and hence the algorithm is termed Support
Vector Machine. In the accompanying diagram, two different categories are separated by
a decision boundary or hyperplane.
[Figure: two categories separated by a decision boundary (hyperplane)]
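
As a rough illustration of how a separating hyperplane can be found, the sketch below trains a linear soft-margin SVM by sub-gradient descent on the hinge loss. This is a simplified stand-in, not the solver used in the project; the learning rate, regularisation strength, epoch count and toy points are all assumed values.

```python
def train_linear_svm(points, labels, lr=0.1, reg=0.01, epochs=200):
    """Fit w, b for a linear SVM by sub-gradient descent on the hinge loss.

    `labels` must be +1 or -1. `lr`, `reg` and `epochs` are illustrative values.
    """
    w = [0.0] * len(points[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(points, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1:
                # Point is inside the margin or misclassified: move the boundary.
                w = [wi - lr * (reg * wi - y * xi) for wi, xi in zip(w, x)]
                b += lr * y
            else:
                # Only the regularisation term contributes to the gradient.
                w = [wi - lr * reg * wi for wi in w]
    return w, b


def svm_predict(w, b, x):
    """Return +1 or -1 depending on which side of the hyperplane x lies."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1


# Two linearly separable toy clusters.
POINTS = [(2.0, 2.0), (3.0, 3.0), (2.5, 3.5),
          (-2.0, -2.0), (-3.0, -1.5), (-2.5, -3.0)]
LABELS = [1, 1, 1, -1, -1, -1]
w, b = train_linear_svm(POINTS, LABELS)
print(svm_predict(w, b, (4.0, 4.0)), svm_predict(w, b, (-4.0, -4.0)))
```

Only the points closest to the boundary ever trigger the margin update, which mirrors the role of the support vectors described above.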
4. Logistic Regression:
• Logistic regression predicts the output of a categorical dependent variable, so the
outcome must be a categorical or discrete value. It can be Yes or No, 0 or 1, True or
False, etc., but instead of giving an exact value of 0 or 1, it gives probabilistic values
that lie between 0 and 1.
• Logistic regression is very similar to linear regression except in how it is used. Linear
regression is used for solving regression problems, whereas logistic regression is used
for solving classification problems.
• In logistic regression, instead of fitting a regression line, we fit an S-shaped logistic
function whose output is bounded by the two limiting values 0 and 1.
• The curve of the logistic function indicates the likelihood of something, such as whether
cells are cancerous or not, or whether a mouse is obese or not based on its weight.
• Logistic regression is a significant machine learning algorithm because it can provide
probabilities and classify new data using both continuous and discrete data.
• Logistic regression can be used to classify observations using different types of data and
can easily determine the most effective variables for the classification.
[Figure: the S-shaped logistic function]
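
The logistic function is sigmoid(z) = 1 / (1 + e^(-z)). The sketch below shows it together with a minimal one-feature logistic regression fitted by gradient descent on the log loss. The feature (a hypothetical count of positive words minus negative words), the learning rate and the epoch count are assumptions chosen for illustration.

```python
import math


def sigmoid(z):
    """The S-shaped logistic function, mapping any real z into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(xs, ys, lr=0.5, epochs=500):
    """Fit weight w and bias b for one feature by gradient descent on log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = sum((sigmoid(w * x + b) - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(sigmoid(w * x + b) - y for x, y in zip(xs, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Hypothetical feature: (positive words - negative words) in a review.
XS = [-3.0, -2.0, -1.0, 1.0, 2.0, 3.0]
YS = [0, 0, 0, 1, 1, 1]
w, b = train_logistic(XS, YS)

# Probability that a review with feature value 2.5 is positive.
print(round(sigmoid(w * 2.5 + b), 3))
```

A review is labelled positive when the predicted probability exceeds 0.5, which is how the probabilistic output is turned into a class.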
Chapter 5
IMPLEMENTATION DETAILS
5.1 Data Preprocessing
The data used to train the model is almost 14 GB in size. That much data cannot be
processed at once, so it was processed in chunks, and the usable data retrieved from each
chunk was then combined. The final data was stored as

“final_review”
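
Chunked processing of this kind can be sketched as follows, assuming the file holds one JSON record per line (JSON Lines); the report does not state the exact layout. The helper names, the chunk size and the in-memory stand-in for the file are hypothetical. With pandas, read_json with lines=True and a chunksize argument offers a similar chunked reader.

```python
import io
import json
from itertools import islice


def iter_chunks(lines, chunk_size):
    """Yield lists of at most `chunk_size` parsed records from JSON lines."""
    lines = iter(lines)
    while True:
        chunk = [json.loads(line) for line in islice(lines, chunk_size)]
        if not chunk:
            return
        yield chunk


def keep_usable(record):
    """Keep only the fields needed for training (names follow the sample record)."""
    return {"text": record["reviewText"], "rating": int(record["reviewRating"])}


# Stand-in for the large file; a real run would pass an open file handle instead.
fake_file = io.StringIO("\n".join(
    json.dumps({"reviewText": f"review {i}", "reviewRating": str(1 + i % 5)})
    for i in range(7)
))

final_review = []
for chunk in iter_chunks(fake_file, chunk_size=3):
    final_review.extend(keep_usable(r) for r in chunk)

print(len(final_review))
```

Only one chunk of raw records is held in memory at a time; just the reduced records accumulate in final_review.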

5.2. Training the model:


The model was trained using the preprocessed data and various machine learning algorithms.
5.3. Sentiment Sentence Extraction & POS Tagging:
Tokenizing the reviews after removing stop words, which carry no sentiment, is the basic
requirement for POS tagging. After stop words such as “am, is, are, the, but” are removed, the
remaining sentences are converted into tokens. These tokens take part in POS tagging. In
natural language processing, part-of-speech (POS) taggers have been developed to classify
words based on their parts of speech.
5.4. Web Scraper
A web scraper was created to scrape data from Flipkart/Etsy and save the reviews/feedback
into a database, so that the newly created model could be tested on them.
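
The scrape-and-store step can be sketched as below. The report lists Beautiful Soup as the scraping library; this sketch swaps in Python's standard html.parser and sqlite3 modules so that it runs without a network connection or extra packages. The page markup, the class name "review-text" and the table layout are hypothetical, since real sites use their own structure.

```python
import sqlite3
from html.parser import HTMLParser

# Stand-in for a downloaded product page (hypothetical markup).
PAGE = """
<div class="review"><p class="review-text">Lovely print, arrived early.</p></div>
<div class="review"><p class="review-text">Colour faded after one wash.</p></div>
"""


class ReviewParser(HTMLParser):
    """Collect the text of every <p class="review-text"> element."""

    def __init__(self):
        super().__init__()
        self.reviews = []
        self._inside = False

    def handle_starttag(self, tag, attrs):
        if tag == "p" and ("class", "review-text") in attrs:
            self._inside = True
            self.reviews.append("")

    def handle_endtag(self, tag):
        if tag == "p":
            self._inside = False

    def handle_data(self, data):
        if self._inside:
            self.reviews[-1] += data


parser = ReviewParser()
parser.feed(PAGE)

# Save the scraped reviews; ":memory:" keeps the example self-contained.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE reviews (id INTEGER PRIMARY KEY, text TEXT)")
conn.executemany("INSERT INTO reviews (text) VALUES (?)",
                 [(r,) for r in parser.reviews])
conn.commit()
rows = conn.execute("SELECT text FROM reviews ORDER BY id").fetchall()
print(rows)
```

Replacing ":memory:" with a file path would persist the reviews so that the dashboard can read them later.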
5.5 Designing UI:
A dashboard was created using the Dash framework to integrate the pickled model file and
predict the sentiments of the data stored in the database.
Chapter 6
RESULTS AND SAMPLE OUTPUTS
The ultimate outcome of training on the public reviews dataset is that the machine is capable
of judging whether an entered sentence bears a positive or a negative response, and of giving
a collective result on a dataset.
6.1. Collective Analysis:
This part of the project takes a dataset as input and gives its output in the form of a pie chart.
[Figure: pie chart of positive and negative reviews]
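
The numbers behind such a pie chart are simply the share of each predicted label. A minimal sketch, with a hypothetical function name and made-up predictions:

```python
from collections import Counter


def sentiment_shares(predictions):
    """Return the percentage of each label, i.e. the slices of the pie chart."""
    counts = Counter(predictions)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: round(100 * n / total, 1) for label, n in counts.items()}


# Hypothetical model outputs for five reviews.
shares = sentiment_shares(["positive", "negative", "positive", "positive", "negative"])
print(shares)
```

The resulting dictionary can be handed to any charting library as labels and values.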

6.2. Dropdown Box:
This part of the project lets the user select one review, extracted from the website, from a
dropdown and gives the result as positive or negative.
6.3. Textbox:
This part of the project takes a typed review as input and gives the result as positive or
negative.
FINDINGS AND CONCLUSION

Findings:
The sentiment analysis is efficient for simple English, but not for any other language. Sentences
must be simple and straightforward, because the system does not handle cases such as jumbled
words or sarcastic sentences. Input can be taken from NLTK in text format and displayed
similarly. The NLTK module works really well for natural language processing. It also provides
other techniques to classify text, such as the Naive Bayes classifier or SVM, and includes
different kinds of tagging functions to add tags to tokens.

Conclusion:
Sentiment analysis deals with the classification of texts based on the sentiments they contain.
This report focuses on a typical sentiment analysis model consisting of three core steps, namely
data preparation, review analysis and sentiment classification, and describes representative
techniques involved in those steps.

Sentiment analysis is an emerging research area in text mining and computational linguistics, and
has attracted considerable research attention in the past few years. Future research should explore
sophisticated methods for opinion and product feature extraction, as well as new classification
models that can address the ordered-labels property in rating inference. Applications that utilize
results from sentiment analysis are also expected to emerge in the near future.

Future Scope:
• Using different techniques, such as supervised machine learning, to train on one part of the
text and use this training to analyse the rest of the text.
• Combining different techniques to see the results of a combined approach of algorithms.
• Extending this work to other languages, such as Hindi.
• Constructing a regular grammar to make the tagging part more efficient, and generating
custom regular expressions.
REFERENCES

1. http://www.dmi.unict.it/~faro/tesi/sentiment_analysis/SA2.pdf
2. http://www.cs.cornell.edu/home/llee/papers/sentiment.pdf
3. http://lrec-conf.org/proceedings/lrec2010/pdf/385_Paper.pdf
4. http://help.sentiment140.com/for-students
5. http://www.gbsheli.com/2009/03/twitgraph-en.html
6. http://en.wikipedia.org
7. http://ravikiranj.net/drupal/201205/code/machine-learning/how-build-twitter-sentimentanalyzer
8. www.javatpoint.com
9. https://www.google.com/search?q=dash+tutorial+for+beginners&oq=Dash+tutoria&aqs=chrome.2.0j69i57j0l3j69i60l3.8612j0j7&sourceid=chrome&ie=UTF-8
10. https://www.edureka.co/blog/web-scraping-with-python/
