Sentiment Analysis: A NLP And: 2. Detailed Approach


of companies and Brand Businesses .

The methodology for precise real-world calculation of an online social reputation / influence score is already available. However, there is neither an applicable nor a suitable catalogue of such tangible data available. This is observed in real-world cases where closed networks are preferred over real data. The use of clustering algorithms on various public data-sets can provide a real way to classify, notarize and rate any Brand’s perception on merit. Through this project, I hope to be able to extract an accurate outlook of 10 entities/Businesses.

2. Detailed Approach
The aim of this project is to extract useful information or knowledge from available data-sets by using programs incorporating machine learning and NLP, so that insights that were once hidden in unstructured data become more accessible. Natural language processing has many applications in today’s business world.

Natural Language Processing (NLP):
It’s a sub-field of computer science and artificial intelligence which focuses on how to program computers to process and analyze large amounts of natural language data.[1]

Sentiment Analysis:
An NLP and machine learning technique used to classify and interpret emotions in subjective data. It is often used in business to detect sentiment in social data, gauge brand reputation, and understand whether customers like a product or not. There are two main types of sentiment analysis: lexicon/rule-based and automated/machine-learning-based. Here I apply and implement the machine-learning-based type.[2] Main components of this system are shown here:

Aspect-based Sentiment Analysis:
An aspect-based classifier would be able to determine whether a sentence expresses a positive, neutral, or negative opinion. By automatically sorting the sentiment behind reviews, social media conversations, and more, you can make faster and more accurate decisions.[3] The algorithmic approaches are:
Rule-based: these systems automatically perform sentiment analysis based on a set of manually crafted rules.
Automatic: these methods, contrary to rule-based systems, don’t rely on manually crafted rules, but on machine learning techniques and classification algorithms. Their 3 parts are:

• Training and Prediction Processes
• Feature Extraction from Text
• Classification Algorithms
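The three parts listed above can be illustrated with a minimal sketch. It is not the project's actual code: the function names (`tokenize`, `train`, `predict`) and the four training tweets are made up for illustration, and a hand-written multinomial Naive Bayes with add-one smoothing stands in for whichever library classifier was really used.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Feature extraction: lowercase the text and split it into word tokens."""
    return text.lower().split()


def train(examples):
    """Training: count documents and words per label from (text, label) pairs."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, label in examples:
        label_counts[label] += 1
        for word in tokenize(text):
            word_counts[label][word] += 1
            vocabulary.add(word)
    return label_counts, word_counts, vocabulary


def predict(model, text):
    """Prediction: return the label with the highest log-probability."""
    label_counts, word_counts, vocabulary = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, float("-inf")
    for label, doc_count in label_counts.items():
        score = math.log(doc_count / total_docs)  # log prior of the label
        total_words = sum(word_counts[label].values())
        for word in tokenize(text):
            if word not in vocabulary:
                continue  # ignore words never seen during training
            # Add-one (Laplace) smoothing avoids zero probabilities.
            count = word_counts[label][word] + 1
            score += math.log(count / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Made-up training tweets, for illustration only.
TRAINING = [
    ("love this phone great battery", "positive"),
    ("great service very happy", "positive"),
    ("terrible support very slow", "negative"),
    ("hate the new update terrible", "negative"),
]

model = train(TRAINING)
print(predict(model, "great phone"))
print(predict(model, "slow and terrible"))
```

Working in log-probabilities keeps the product of many small word probabilities from underflowing, which matters once the texts are longer than a few words.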

2.1 Classification Challenges
Sentiment analysis is one of the hardest tasks in natural language processing because even humans struggle to analyze sentiments accurately. Objective texts do not contain explicit sentiments, whereas subjective texts do.
Irony and Sarcasm: people express their negative sentiments using positive words, which can be difficult for machines to detect without a thorough understanding of the context in which a feeling was expressed.
Context and Polarity: all utterances are uttered in context. Analyzing sentiment without context gets pretty difficult.
Defining Neutral:[4] as in all classification problems, defining your categories (and, in this case, the neutral tag) is one of the most important parts of the problem. What you mean by neutral, positive, or negative does matter when you train sentiment analysis models.
Human Annotator Accuracy: finally, since machines learn from the data they are fed, sentiment analysis classifiers might not be as precise as other types of classifiers in interpreting their input; accuracy is about 60-70% for text classification.

3. Problem Definition
The applications of sentiment analysis are wide-ranging and can be applied to almost any industry, from finance and retail to hospitality and technology. Some of the use cases are Social Media Monitoring, Brand Monitoring, Voice of Customer (VoC), Customer Service, and Market Research. In this paper I have focused on Brand Monitoring.

Brand monitoring offers a wealth of insights from conversations happening about your brand all over the internet. This model is designed to analyze the sentiment of tweets. It’s ideal for social listening and detecting brand sentiment in real time.

We use a variety of algorithms and classifiers to determine the most positively rated Businesses and, subsequently, the ratio at which they are rated positive, in order to calculate the most and least socially reputed brands.
For the purpose of this project I used Python and NLTK to plot and extrapolate the final ranking for a set of 10 Chinese Brands/Businesses.

Table 1: Random Unstructured Data

4. Framework Characteristics
The following opinion analysis algorithms and classification libraries are used with the Python programming language in the Spyder IDE.

4.1 Classification Algorithms
I. Naïve Bayes Classifier:
A family of probabilistic algorithms that use Bayes’s Theorem to predict the category of a text. They are among the simplest Bayesian network models, but coupled with kernel density estimation, they can achieve higher accuracy levels. Naive Bayes assumes that all predictors (or features) are independent, which rarely happens in real life. This limits the applicability of this algorithm in real-world use cases. If its assumption of the independence of features holds true, it can perform better than other models and requires much less training data.[5]
Naive Bayes is better suited for categorical input variables than numerical variables.

II. Linear Regression
Linear regression is a linear approach to modeling the relationship between a scalar response and one or more explanatory variables. It’s a very well-known algorithm in statistics used to predict some value (Y) given a set of features (X). During the modeling/plotting of the Twitter data, it was used to extrapolate data as shown in Fig 1.

Fig 1. Sentimental Analysis plotting

The NumPy and Scikit-learn scientific packages were used to run the regression, as demonstrated below:

III. Gradient Descent
An optimization algorithm that’s used when training a machine learning model. It’s based on a convex function and tweaks its parameters iteratively to minimize a given function to its local minimum. The equation below describes what gradient descent does:

b = a - γ ∇f(a)

Here a is the current position and b is the next one. The gamma in the middle is a weighting factor (the step size), and the gradient term ∇f(a) points in the direction of steepest ascent, so subtracting it moves in the direction of steepest descent. This is mainly for a future project.
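To connect the linear regression and gradient descent subsections, here is a small sketch that fits a line y = w*x + b by repeatedly applying the update "new = old - gamma * gradient" to the mean squared error. The data points, the step size `gamma=0.05`, the step count and the function names are all assumptions made for illustration; the project itself reports using NumPy and Scikit-learn rather than hand-written code.

```python
def fit_line_gradient_descent(xs, ys, gamma=0.05, steps=5000):
    """Fit y = w*x + b by gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Partial derivatives of MSE = (1/n) * sum((w*x + b - y)^2).
        grad_w = (2.0 / n) * sum((w * x + b - y) * x for x, y in zip(xs, ys))
        grad_b = (2.0 / n) * sum((w * x + b - y) for x, y in zip(xs, ys))
        # Update rule: step against the gradient, scaled by gamma.
        w -= gamma * grad_w
        b -= gamma * grad_b
    return w, b


def fit_line_closed_form(xs, ys):
    """Ordinary least squares for one feature, used here as a cross-check."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    w = sxy / sxx
    return w, mean_y - w * mean_x


# Made-up points lying exactly on y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

w_gd, b_gd = fit_line_gradient_descent(xs, ys)
w_ols, b_ols = fit_line_closed_form(xs, ys)
print(round(w_gd, 3), round(b_gd, 3))
print(w_ols, b_ols)
```

The step size matters: if gamma is too large for the data, the iterates diverge instead of settling, so in practice it is tuned or the features are rescaled first.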

4.2 Python Classification Library
• Scikit-learn is the go-to library for machine learning and has useful tools for text vectorization. Scikit-learn has implementations of Support Vector Machines and Naïve Bayes.
• Natural Language Toolkit (NLTK) has been the traditional NLP library for Python. It has an active community and offers the possibility to train ML classifiers.[6]
• Tweepy: an easy-to-use Python library for accessing the Twitter API. All the functionality of the Twitter API can be used through Tweepy.
• TextBlob: a library for processing textual data. It provides a simple API for diving into common NLP tasks such as part-of-speech tagging, noun phrase extraction, sentiment analysis, classification, translation, and more.

5. Methodology
Python is used to extrapolate and chart the data-set. In chronological order: the IDE environment is set up and the pip requirements are installed, after which the application can be accessed using the Twitter API; subsequently Google’s open-source data-set is analyzed, and the resultant data is set into a ranking chart for the 10 different Businesses.

1) Twitter developer API
The Twitter API provides companies, developers, and users with programmatic access to public Twitter data which users have agreed to share with the world. In order to access it, a Twitter account was initially used to apply for an individual Developer account as shown here:[7]
Some use cases for the API are data analytics, making a Twitter bot for automated replies, publishing and curating, and advertisement campaigns.

2) Public Data-Set
A smaller public data-set comes from Sentiment140 (discover the sentiment of a brand, product, or topic on Twitter), a site created at Stanford with databases used to discover the sentiment towards a brand, product or Business.[8] It uses classifiers built from machine learning algorithms. Other sites use a simpler keyword-based approach, which may have higher precision but lower recall.
It also provides a comprehensive list of data-set sites built for specific sentiment analysis, A.I. and data mining purposes.
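As a rough illustration of working with the public data-set, the sketch below reads rows laid out like the Sentiment140 CSV export. The six-column order and the label codes (0 = negative, 2 = neutral, 4 = positive) are stated here as an assumption about that file, the sample rows and brand names are invented, and `load_tweets` and `tweets_mentioning` are hypothetical helper names rather than anything from the project.

```python
import csv
import io

# Column order and label codes as commonly documented for the Sentiment140
# CSV export (treated here as an assumption).
COLUMNS = ["polarity", "id", "date", "query", "user", "text"]
LABELS = {"0": "negative", "2": "neutral", "4": "positive"}

# Made-up rows in that layout; a real run would open the downloaded file.
SAMPLE = '''"4","1","Mon Apr 06 22:19:45 PDT 2009","NO_QUERY","user_a","Loving the new phone from BrandX"
"0","2","Mon Apr 06 22:20:01 PDT 2009","NO_QUERY","user_b","BrandX support is slow, again"
"4","3","Mon Apr 06 22:21:10 PDT 2009","NO_QUERY","user_c","BrandY delivery was quick"
'''


def load_tweets(handle):
    """Yield (label, text) pairs from a Sentiment140-style CSV stream."""
    for row in csv.reader(handle):
        record = dict(zip(COLUMNS, row))
        yield LABELS[record["polarity"]], record["text"]


def tweets_mentioning(tweets, brand):
    """Keep only tweets whose text mentions the brand (case-insensitive)."""
    needle = brand.lower()
    return [(label, text) for label, text in tweets if needle in text.lower()]


tweets = list(load_tweets(io.StringIO(SAMPLE)))
brandx = tweets_mentioning(tweets, "BrandX")
print(brandx)
```

Using the `csv` module rather than splitting on commas keeps tweets that themselves contain commas in one field.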

3) Subjectivity and Polarity
In order to classify the language used in tweets, we create a data-frame where:
Subjectivity = (a score of 0 is fact, and a score of +1 is very much an opinion)
Polarity = (a score of -1 is the highest negative score, and a score of +1 is the highest positive score).

6. Experiments
Initial experiments were conducted by observing the output of positive tweets and negative tweets, after which the ratio of tweets with scores was determined and plotted in a graph to indicate Positive (>65%), Negative (<55%) and Neutral (55-65%) tweets. The percentage of positive to negative among 100 tweets was observed (Fig 2).

Fig 2. Plotting Graph

6.1 Evaluation metric
This project will produce a report in semi-structured XML output with the evaluated score of all 10 tallied sets/nodes. Results should be comparable to assumed scores for companies/nodes.

6.2 Results
The first author observed a comprehensive ranking for the Brands/Businesses, with higher metric nodes recording higher scores and ranking higher up in the Reputation Index. Only one of the Brands had a net Negative rank (below 55%), one Brand had a Neutral score (55-65%), and 8 out of 10 had positive scores (above 65%), as seen in Table 2 below:

Table 2: Reputation Ranking

7. Conclusion
It was observed that sentiment analysis can be applied to countless aspects of business, from brand monitoring and product analytics to customer service and market research. By incorporating it into their existing systems and analytics, leading brands can work faster, with more accuracy, toward more useful ends to better their public reputation.

During this Artificial Intelligence course a variety of ML and intelligence algorithms were learned. By using the aforementioned techniques and the chosen data-sets from Sentiment140, insight into how to use machine learning to analyse, plot, clean, classify, derive and sequence multitudes of API-led and open-source data was gained effectively and efficiently.
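For readers who want the scoring from the Subjectivity and Polarity step and from Section 6 in concrete form, the following is a minimal sketch under stated assumptions. It assumes a tweet counts as positive when its polarity is above 0 and negative when below 0, and that the reported percentage is the share of positive tweets among positive and negative ones; the paper does not spell out either rule. The function names and the example scores are hypothetical.

```python
def label_from_polarity(polarity):
    """Map a polarity score in [-1, 1] to a tweet label (sign rule assumed)."""
    if not -1.0 <= polarity <= 1.0:
        raise ValueError("polarity must lie in [-1, 1]")
    if polarity > 0:
        return "positive"
    if polarity < 0:
        return "negative"
    return "neutral"


def positive_share(polarities):
    """Percentage of positive tweets among the positive and negative ones."""
    labels = [label_from_polarity(p) for p in polarities]
    positive = labels.count("positive")
    negative = labels.count("negative")
    if positive + negative == 0:
        return None  # no opinionated tweets, so no share can be computed
    return 100.0 * positive / (positive + negative)


def reputation_band(share):
    """Bands from Section 6: below 55% negative, 55-65% neutral, above 65% positive."""
    if share < 55.0:
        return "negative"
    if share <= 65.0:
        return "neutral"
    return "positive"


# Made-up polarity scores for one brand's tweets.
scores = [0.8, 0.5, 0.1, 0.0, -0.4, 0.3, -0.2, 0.6]
share = positive_share(scores)
print(share, reputation_band(share))
```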

8. References
[1] Natural Language Processing Archives

[2] Opinion mining and sentiment analysis

[3] Aspect-Based Sentimental Analysis

[4] The Training and Prediction Processes

[5] A practical explanation of a Naive Bayes classifier

[6] Natural Language Toolkit

[7] Apply for a Twitter Developer account

[8] Sentiment Analysis Sites / Sentiment140

[9] Stanford Coursera course by Dan Jurafsky and Christopher Manning

[10] A Survey of Opinion Mining and Sentiment Analysis

[11] Recognizing Contextual Polarity in Phrase-Level Sentiment Analysis
