Sentiments Analysis Using Ai: Project Report
PROJECT REPORT
Submitted in the Partial Fulfillment of the Requirements for the Award of Degree of
BACHELOR OF TECHNOLOGY
In
Electronics & Communication
Engineering
SUBMITTED BY
Name of Candidate: Shivangi
Roll No: 1816691
Batch: 2018-2022
I hereby certify that the work which is being presented in the project report entitled
Date: 13/06/2021
Shivangi
Roll No: 1816691
Signature of H.O.D
ACKNOWLEDGEMENT
Working on this project has been a great experience, and I offer sincere thanks to my faculty
members. It was a great opportunity to work under the guidance of Ms. Malika Arora. It would not
have been possible to carry out the work with such ease without her immense help and motivation. I
consider it my privilege to express my gratitude, respect and thanks to all those who
guided me in choosing this project. I express sincere gratitude to Dr. Sajjan
Singh (HOD, ECE), for his everlasting support towards the students for providing us this
Shivangi
ABSTRACT
Over the last few years, e-commerce has become an indispensable part of the global retail
framework. Like many other industries, the retail landscape has undergone a substantial
transformation following the advent of the internet. Internet users can choose from various
online platforms to browse, compare, and purchase the items or services they need. With
increasing modernization, the number of e-commerce sites is also increasing. Therefore, it
becomes very important for every e-commerce company to analyse its customers' reviews
and improve its services to outperform the competition. This project focuses on
implementing machine learning algorithms to perform a collective analysis of the reviews
received by various e-commerce sites. A major focus of this study was comparing
different machine learning algorithms for the task of sentiment classification. The major
finding was that, of the classification algorithms evaluated, Logistic Regression provides
the highest classification accuracy for this domain. From the evaluation in this study, it can
be concluded that the proposed machine learning and natural language processing techniques
are an efficient and practical method of sentiment analysis.
TABLE OF CONTENTS
Contents
Title Page
Certificate
Acknowledgments
Abstract
Table of Contents
Chapter 1: INTRODUCTION
1.1 General Introduction
1.2 Current Open Problems/Issues
1.2.1 Linguistic Approach
1.2.2 Machine Learning Approach
1.3 Problem Statement
Chapter 2: LITERATURE SURVEY
Chapter 3: SYSTEM DESIGN
3.1 Software Requirements
3.2 Hardware Requirements
3.3 Technology Used
3.4 Data Information
3.5 Data Format
Chapter 4: METHODOLOGY FOR IMPLEMENTATION
4.1 Data Collection
4.2 Sentiment Sentence Extraction & POS Tagging
4.3 Negative Phrase Identification
4.4 Sentiment Analysis Algorithm
References
A APPENDIX
5.4 Designing UI
INTRODUCTION
Sentiment classification and analysis is performed in Python using the NLTK module. NLTK is a
Python module for natural language processing tasks. It supports multiple
languages, such as English, Hindi and Chinese, for classifying text or data into something
meaningful.
Text classification can be performed in the following ways:
1. Sentiment classification
2. Feature-based sentiment classification
3. Summarization of sentiments
Sentiment classification classifies the complete document according to the sentiments or
opinions expressed in the text. The feature-based approach, however, classifies sentiments based
on the attributes of the entity (noun) mentioned in the text. This approach reveals the good or
bad qualities of certain entities based on the details listed with them. Opinion summarization
is similar to text summarization, but it gives a clear indication of the
sentiment attached to the text. It does not output the sentiment as a
substring of the given text; it describes the entities in positive or negative words
so that a whole document can be described in a few words without losing the
gist of the document. These types of classification can be performed before actually
analysing the text. After text classification, the words are tagged.
Consider two examples:
"I watched the movie burger. The movie was very good and the actor did an awesome job."
"When Modi returned from U.S.A., I got my 15 lakhs as promised by PM Modi"
The first example clearly talks about the movie and the actor and states a positive review. The
second is sarcastic, and a sentiment classifier is still not able to classify sarcasm. This
remains a big problem for data analytics and a topic of research, and handling it by machine is
much harder. There are two approaches to such analysis:
1. Linguistic approach
2. Machine Learning
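The difficulty with sarcasm can be illustrated with a minimal lexicon-based sketch. This is not the report's implementation: the word lists and the function name `lexicon_sentiment` are hypothetical, chosen only to show how counting sentiment words handles the two example sentences above.

```python
# Hypothetical mini-lexicon; a real system would use a much larger resource.
POSITIVE = {"good", "awesome", "great", "delightful"}
NEGATIVE = {"bad", "poor", "terrible", "boring"}

def lexicon_sentiment(text):
    """Label text by counting positive and negative lexicon words."""
    # Lowercase and strip simple punctuation so "good." matches "good".
    words = [w.strip('.,!?"').lower() for w in text.split()]
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

review = "I watched the movie burger. The movie was very good and the actor did an awesome job."
sarcasm = "When Modi returned from U.S.A., I got my 15 lakhs as promised by PM Modi"

print(lexicon_sentiment(review))   # positive
print(lexicon_sentiment(sarcasm))  # neutral
```

The sarcastic sentence contains no sentiment word at all, so a word-counting approach cannot see the intended negative meaning.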
Paper-2: Boost up! Sentiment Categorization with Machine Learning Techniques By:
Andres Cassinelli, Chih-Wei Chen [June 5, 2009]
Summary: For calculating the sentiment of a given text, opinion or review, the paper notes that
its methods perform nearly the same as past work on review and sentiment analysis, and in some
respects work better. If these methods are applied to multi-class classification, the results
could be much the same. When a classification technique is applied to the data, it first uses
part of the data as a training set to train itself and then evaluates the rest of the data,
so the technique described in the paper captures the relationship between the objects in an
efficient way.
Weblink: http://www.cs.cornell.edu/home/llee/papers/sentiment.pdf
Paper-3: Twitter as a Corpus for Sentiment Analysis and Opinion Mining By: Alexander
Pak, Patrick Paroubek [2010] University de Paris-Sud, Laboratory LIMSI-CNRS,
Batiment 508, F-91405 Orsay Cedex, France
Summary: Today, social networking sites such as Twitter, Facebook, Google Plus and LinkedIn are
popular tools for communicating with other people on the internet. Thousands of people share
information with each other. This information may be useful to some and worthless to others. If
properly analysed, this data could be very useful for some purposes. It may take the form of
opinions or results shared with others. These social sites can therefore be very effective in
generating useful information about many aspects of life today. However, little work has been
done so far because these social networking sites came into existence only recently.
In this paper, the authors describe the use of Twitter, one of the most popular social
networks at present, for sentiment analysis.
Weblink: http://lrec-conf.org/proceedings/lrec2010/pdf/385_Paper.pdf
Paper-4: Semantic Sentiment Analysis of Twitter By: Hassan Saif, Yulan He and Harith
Alani [Nov 2012] Knowledge Media Institute, The Open University, United Kingdom
Summary: The authors introduce a novel approach of adding semantics as additional features into
the training set for sentiment analysis. For each entity extracted from tweets (e.g. iPhone), they
add its semantic concept (e.g. “Apple product”) as an additional feature and measure the
correlation of the representative concept with negative/positive sentiment.
Paper-5: What’s Great and What’s Not: Learning to Classify the Scope of Negation for
Improved Sentiment Analysis By: Isaac G. Councill, Ryan McDonald, Leonid Velikovich
[July 2010]
Summary: The authors present a negation detection system based on a conditional random field
modelled using features from an English dependency parser. The scope of negation detection is
limited to explicit rather than implied negations within single sentences.
Paper-6: TwiSent: A Multistage System for Analyzing Sentiment in Twitter By: Subhabrata
Mukherjee, Akshat Malu, Balamurali A.R, Pushpak Bhattacharyya [Feb 2013] Dept. of Computer
Science and Engineering, IIT Bombay; IITB-Monash Research Academy
Summary: The authors present TwiSent, a sentiment analysis system for Twitter. Based on the topic
searched, TwiSent collects tweets pertaining to it and categorizes them into the polarity classes
positive, negative and objective. However, analyzing micro-blog posts has many inherent
challenges compared to other text genres.
Chapter 3
SYSTEM DESIGN
3.1 Hardware Requirements:
Core i5/i7 processor
At least 8 GB RAM
At least 30 GB of Usable Hard Disk Space
The Amazon reviews dataset consists of reviews from Amazon. The data span a period of 18
years, including ~35 million reviews up to 2018. Reviews include product and user
information, ratings, and a plaintext review. The Amazon reviews full score dataset is
constructed by randomly taking 24,000 samples for each review score from 1 to 5. In total
there are 1,000,000 samples in one chunk, and there are 33 chunks in total.
3.5 Data Format:
The dataset we use is a .json file. A sample of the dataset is given below.
{
"reviewSummary": "Surprisingly delightful",
"reviewText": "This is a first read filled with unexpected humor and profound insights into the art of politics and policy. In brief, it is sly, wry, and wise.",
"reviewRating": "4"
}
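A record in this format can be read with Python's standard `json` module. The sketch below parses a shortened copy of the sample and derives a sentiment label from the rating. The helper `rating_to_label` and its threshold of 3 are assumptions made for illustration; the report does not state how ratings are mapped to labels.

```python
import json

# One record in the format shown above (review text shortened).
raw = '''{
  "reviewSummary": "Surprisingly delightful",
  "reviewText": "This is a first read filled with unexpected humor.",
  "reviewRating": "4"
}'''

def rating_to_label(rating, threshold=3):
    """Hypothetical labelling rule: ratings above the threshold are positive."""
    return "positive" if int(rating) > threshold else "negative"

record = json.loads(raw)  # the rating is stored as a string, hence int() above
label = rating_to_label(record["reviewRating"])
print(record["reviewSummary"], "->", label)
```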
Chapter 4
METHODOLOGY FOR IMPLEMENTATION
The K-NN algorithm assumes similarity between the new case/data and the available cases
and puts the new case into the category that is most similar to the available categories.
The K-NN algorithm stores all the available data and classifies a new data point based on
similarity. This means that when new data appears, it can easily be classified into a
well-suited category using the K-NN algorithm.
The K-NN algorithm can be used for regression as well as for classification, but it is
mostly used for classification problems.
K-NN is a non-parametric algorithm, which means it does not make any assumption about the
underlying data.
It is also called a lazy learner algorithm because it does not learn from the training set
immediately; instead, it stores the dataset and performs an action on it at the time of
classification.
At the training phase, the K-NN algorithm just stores the dataset, and when it gets new data,
it classifies that data into the category that is most similar to the new data.
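The store-then-vote behaviour described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the report's implementation: the function name `knn_predict`, the choice of Euclidean distance and k = 3, and the toy 2-D points are all assumptions.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # "Training" is only storage: all distance work happens at prediction time.
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Toy 2-D feature vectors (invented for illustration).
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (4.0, 4.2), (4.1, 3.9), (3.8, 4.0)]
labels = ["negative", "negative", "negative", "positive", "positive", "positive"]

print(knn_predict(points, labels, (1.1, 1.0)))  # negative
print(knn_predict(points, labels, (4.0, 4.0)))  # positive
```

For text, the points would be numeric feature vectors derived from the reviews rather than hand-written coordinates.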
Support Vector Machine (SVM) is one of the most popular supervised learning
algorithms and is used for classification as well as regression problems. However,
it is primarily used for classification problems in machine learning.
The goal of the SVM algorithm is to create the best line or decision boundary that can
segregate n-dimensional space into classes so that a new data point can easily be put in
the correct category in the future. This best decision boundary is called a hyperplane.
SVM chooses the extreme points/vectors that help in creating the hyperplane. These
extreme cases are called support vectors, and hence the algorithm is termed Support
Vector Machine. Consider the diagram below, in which there are two different categories
that are classified using a decision boundary or hyperplane:
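The role of the hyperplane can be shown with a short sketch of the linear decision rule. It does not train an SVM: the weights `w` and bias `b` are hand-picked assumptions, and the helper names are hypothetical. It only shows how a known hyperplane assigns classes and which points lie closest to it.

```python
import math

def decision_value(w, b, x):
    """Signed score w.x + b; its sign tells which side of the hyperplane x is on."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def classify(w, b, x):
    """Return +1 or -1 depending on the side of the hyperplane."""
    return 1 if decision_value(w, b, x) >= 0 else -1

def distance_to_hyperplane(w, b, x):
    """Geometric distance from x to the hyperplane w.x + b = 0."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return abs(decision_value(w, b, x)) / norm

# Hand-picked hyperplane x1 + x2 - 5 = 0 (an assumption, not a trained model).
w, b = (1.0, 1.0), -5.0
points = [(1.0, 1.0), (2.0, 2.0), (3.0, 3.0), (4.0, 4.0)]

for p in points:
    print(p, classify(w, b, p), round(distance_to_hyperplane(w, b, p), 3))

# The points nearest the boundary play the role of support vectors.
margin = min(distance_to_hyperplane(w, b, p) for p in points)
support = [p for p in points if math.isclose(distance_to_hyperplane(w, b, p), margin)]
print(support)
```

An actual SVM would search for the `w` and `b` that make this smallest distance as large as possible.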
4. Logistic Regression:
Logistic Regression predicts the output of a categorical dependent variable. Therefore the
outcome must be a categorical or discrete value. It can be Yes or No, 0 or 1, True
or False, etc., but instead of giving the exact value as 0 or 1, it gives probabilistic
values that lie between 0 and 1.
Logistic Regression is very similar to Linear Regression except in how they are
used. Linear Regression is used for solving regression problems, whereas Logistic
Regression is used for solving classification problems.
In Logistic Regression, instead of fitting a regression line, we fit an "S"-shaped logistic
function, whose output is bounded by the two extreme values 0 and 1.
The curve from the logistic function indicates the likelihood of something, such as
whether cells are cancerous or not, or whether a mouse is obese or not based on its weight.
Logistic Regression is a significant machine learning algorithm because it can
provide probabilities and classify new data using continuous and discrete datasets.
Logistic Regression can be used to classify observations using different types of data
and can easily determine the most effective variables for the classification. The
image below shows the logistic function:
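The "S"-shaped function and the way it turns a score into a class can be sketched as follows. The weights, bias and 0.5 threshold are hand-picked assumptions for illustration and are not fitted to the report's data. The helper names are hypothetical.

```python
import math

def sigmoid(z):
    """Logistic function: maps any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(weights, bias, features):
    """Probability of the positive class for one feature vector."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return sigmoid(z)

def predict(weights, bias, features, threshold=0.5):
    """Turn the probability into a 0/1 class using a threshold."""
    return 1 if predict_proba(weights, bias, features) >= threshold else 0

# Hand-picked parameters (assumed for illustration, not fitted to any data).
weights, bias = [2.0, -1.0], 0.0

print(sigmoid(0))                            # 0.5
print(predict(weights, bias, [1.0, 0.5]))    # 1
print(predict(weights, bias, [0.2, 1.5]))    # 0
```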
Chapter 5
IMPLEMENTATION DETAILS
5.1 Data Preprocessing
The data used to train the model is almost 14 GB in size. That amount of data
cannot be processed at once, so the data was processed in chunks, and the usable data
retrieved from each chunk was then combined. The final data was stored as
“final_review”.
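Chunked processing of this kind can be sketched with the standard library alone. The sketch assumes the file holds one JSON record per line (JSON Lines), which the report does not confirm. The names `read_in_chunks` and `keep_usable` and the filter rule are hypothetical, and an in-memory string stands in for the large file.

```python
import io
import json
from itertools import islice

def read_in_chunks(lines, chunk_size):
    """Yield lists of parsed JSON records, reading at most `chunk_size` lines at a time."""
    iterator = iter(lines)
    while True:
        raw = list(islice(iterator, chunk_size))
        if not raw:
            break
        yield [json.loads(line) for line in raw if line.strip()]

def keep_usable(record):
    """Hypothetical filter: keep only records that have text and a rating."""
    return bool(record.get("reviewText")) and bool(record.get("reviewRating"))

# Stand-in for the large file; in practice this would be an opened dataset file.
fake_file = io.StringIO(
    '{"reviewText": "Great phone", "reviewRating": "5"}\n'
    '{"reviewText": "", "reviewRating": "1"}\n'
    '{"reviewText": "Stopped working", "reviewRating": "1"}\n'
)

final_review = []
for chunk in read_in_chunks(fake_file, chunk_size=2):
    # Only the filtered records are kept, so memory use stays bounded per chunk.
    final_review.extend(r for r in chunk if keep_usable(r))

print(len(final_review))  # 2
```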
6.2. Dropdown box:
This part of the project selects one review from a dropdown of reviews extracted from the
website and gives the result as positive or negative.
6.3. Textbox:
This part of the project takes a typed review as input and gives the result as positive or
negative.
FINDINGS AND CONCLUSION
Findings:
The sentiment analysis is effective for simple English, but not for any other language. Sentences
must be simple and straightforward because the system does not handle cases such as
jumbled word order or sarcastic sentences. Input can be taken from
NLTK in text format and displayed in the same way. The NLTK module works really well for natural
language processing. It also provides other techniques to classify text, such as the naive Bayes
classifier or SVM, and includes different kinds of tagging functions to add tags to tokens.
Conclusion:
Sentiment analysis deals with the classification of texts based on the sentiments they contain.
This article focuses on a typical sentiment analysis model consisting of three core steps, namely
data preparation, review analysis and sentiment classification, and describes representative
techniques involved in those steps.
Sentiment analysis is an emerging research area in text mining and computational linguistics, and
has attracted considerable research attention in the past few years. Future research shall explore
sophisticated methods for opinion and product feature extraction, as well as new classification
models that can address the ordered labels property in rating inference. Applications that utilize
results from sentiment analysis are also expected to emerge in the near future.
Future Scope:
Use different techniques, such as machine learning and supervised learning, to train on one part
of the text and use this training to analyze the rest of the text.
Combine different techniques to see the result of a combined approach of algorithms.
Extend this work to other languages such as Hindi.
Construct a regular grammar to make the tagging part more efficient, and generate custom
regular expressions.
REFERENCES
1. http://www.dmi.unict.it/~faro/tesi/sentiment_analysis/SA2.pdf
2. http://www.cs.cornell.edu/home/llee/papers/sentiment.pdf
3. http://lrec-conf.org/proceedings/lrec2010/pdf/385_Paper.pdf
4. http://help.sentiment140.com/for-students
5. http://www.gbsheli.com/2009/03/twitgraph-en.html
6. http://en.wikipedia.org
7. http://ravikiranj.net/drupal/201205/code/machine-learning/how-build-twitter-sentimentanalyzer
8. www.javatpoint.com
9. https://www.google.com/search?q=dash+tutorial+for+beginners&oq=Dash+tutoria&aqs=chrome.2.0j69i57j0l3j69i60l3.8612j0j7&sourceid=chrome&ie=UTF-8
10. https://www.edureka.co/blog/web-scraping-with-python/