
Sentiment Analysis

Lexicon-Based
An Overview of Concepts and Techniques

BY: NAVEENKUMAR.M
PRADEEP.D
Abstract
• Sentiment
    • A thought, view, or attitude, especially one based mainly on emotion instead of reason
• Sentiment Analysis
    • Also known as opinion mining
    • Use of natural language processing (NLP) and computational techniques to automate the extraction or classification of sentiment from typically unstructured text
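As a rough illustration of automated sentiment classification, here is a minimal lexicon-based sketch. The word list, its weights, and the function names are hypothetical and chosen only for illustration; a real system would draw on an annotated lexicon.

```python
import re

# Hypothetical toy lexicon: word -> polarity weight (illustrative values only).
TOY_LEXICON = {
    "good": 1.0, "great": 1.0, "love": 1.0,
    "bad": -1.0, "poor": -1.0, "hate": -1.0,
}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def lexicon_score(text, lexicon=TOY_LEXICON):
    """Sum the polarity weights of every token found in the lexicon."""
    return sum(lexicon.get(token, 0.0) for token in tokenize(text))


def classify(text, lexicon=TOY_LEXICON):
    """Map the summed score to a coarse sentiment label."""
    score = lexicon_score(text, lexicon)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

Words missing from the lexicon contribute nothing, so a text made up only of unknown words comes out as neutral.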
Motivation
• Consumer information
    • Product reviews
• Marketing
    • Consumer attitudes
    • Trends
• Politics
    • Politicians want to know voters' views
    • Voters want to know politicians' stances and who else supports them
• Social
    • Find like-minded individuals or communities
Problem
• Which features to use?
    • Words (unigrams)
    • Phrases/n-grams
    • Sentences
• How to interpret features for sentiment detection?
    • Bag of words (IR)
    • Annotated lexicons (WordNet, SentiWordNet)
    • Syntactic patterns
    • Paragraph structure
Software and Hardware Requirements
Software:
-> Analysis tool: R
-> Data sets in Excel format or online data
-> OS: Windows/Linux
Hardware:
-> RAM: 512 MB or more
-> Hard disk: 80 GB
-> Processor: dual core or higher
Sentiment Analysis Challenges
• Harder than topical classification, for which bag-of-words features perform well
• Must consider other features due to:
    • Subtlety of sentiment expression
        • Irony
        • Expression of sentiment using neutral words
    • Domain/context dependence
        • Words/phrases can mean different things in different contexts and domains
    • Effect of syntax on semantics
SentiWordNet
• Based on WordNet “synsets”
    • http://wordnet.princeton.edu/
• Ternary classifier
    • Positive, negative, and neutral (objective) scores for each synset
• Provides a means of gauging sentiment for a text
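A small sketch of how ternary scores could be combined into a text-level estimate. It assumes that the three scores of an entry sum to 1, so the objective score is whatever positive and negative leave over. The entries, their values, and the averaging rule are hypothetical; real SentiWordNet scores attach to synsets rather than to bare words.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TernaryScore:
    pos: float
    neg: float

    @property
    def obj(self):
        # Objective (neutral) score is the mass left after pos and neg.
        return 1.0 - self.pos - self.neg


# Hypothetical entries keyed by word for simplicity (illustrative values only).
TOY_SCORES = {
    "excellent": TernaryScore(0.75, 0.0),
    "terrible": TernaryScore(0.0, 0.625),
    "table": TernaryScore(0.0, 0.0),
    "cheap": TernaryScore(0.25, 0.25),
}


def text_polarity(tokens, scores=TOY_SCORES):
    """Average (pos - neg) over the tokens that have an entry."""
    known = [scores[t] for t in tokens if t in scores]
    if not known:
        return 0.0
    return sum(s.pos - s.neg for s in known) / len(known)
```

Averaging is only one possible aggregation; summing the scores or weighting them by word sense would be equally valid design choices.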
SentiWordNet: Results
• 24.6% of synsets with Objective < 1.0
    • Many terms are classified with some degree of subjectivity
• 10.45% with Objective <= 0.5
• 0.56% with Objective <= 0.125
    • Only a few terms are classified as definitively subjective
• Difficult (if not impossible) to accurately assess performance
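Percentages like those above could be computed as the share of entries whose objective score falls under a threshold. The sketch below uses a made-up list of eight objective scores, not SentiWordNet data, and the function name is an assumption.

```python
def share_under(obj_scores, threshold, strict=False):
    """Fraction of scores below (strict) or at-or-below the threshold."""
    if not obj_scores:
        return 0.0
    if strict:
        hits = sum(1 for s in obj_scores if s < threshold)
    else:
        hits = sum(1 for s in obj_scores if s <= threshold)
    return hits / len(obj_scores)


# Hypothetical objective scores for eight entries (illustrative only).
toy_obj = [1.0, 1.0, 1.0, 0.875, 0.5, 1.0, 0.125, 1.0]

some_subjectivity = share_under(toy_obj, 1.0, strict=True)  # Objective < 1.0
mostly_subjective = share_under(toy_obj, 0.5)               # Objective <= 0.5
very_subjective = share_under(toy_obj, 0.125)               # Objective <= 0.125
```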
Overall Comparison

Procedure | R-Programming | RapidMiner | Weka | Orange
----------|---------------|------------|------|-------
Partitioning of dataset into training and testing sets | Pass (but limited partitioning methods) | Pass (but limited partitioning methods) | Pass (but limited partitioning methods) | Pass (but limited partitioning methods)
Descriptor scaling | Pass | Pass | Fail (cannot save parameters for scaling to apply to future datasets) | Fail (no scaling methods)
Descriptor selection | Fail (no wrapper methods) | Pass | Pass (but is not part of KnowledgeFlow) | Fail (no wrapper methods)
Parameter optimization of machine learning/statistical methods | Fail (not automatic) | Pass | Fail (not automatic) | Fail (not automatic)
Model validation using cross-validation and/or independent validation set | Pass (but limited error measurement methods) | Pass | Pass (but cannot save model so have to rebuild model for every future dataset) | Pass (but cannot save model so have to rebuild model for every future dataset)
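The descriptor scaling criterion turns on whether scaling parameters learned from one dataset can be saved and applied to future datasets. A minimal sketch of that idea follows, using min-max scaling and JSON as the storage format. Both choices, and all names and sample values, are assumptions for illustration and are not taken from any of the compared tools.

```python
import json


def fit_min_max(rows):
    """Learn per-column (min, max) from the training rows."""
    columns = list(zip(*rows))
    return [(min(col), max(col)) for col in columns]


def apply_min_max(rows, params):
    """Scale rows with previously learned (min, max) pairs."""
    scaled = []
    for row in rows:
        scaled.append([
            0.0 if hi == lo else (value - lo) / (hi - lo)
            for value, (lo, hi) in zip(row, params)
        ])
    return scaled


# Hypothetical training data with two descriptors per row.
train = [[1.0, 10.0], [3.0, 30.0], [5.0, 20.0]]
params = fit_min_max(train)

# Serialize the parameters so they can be stored and reused later.
saved = json.dumps(params)
restored = [tuple(pair) for pair in json.loads(saved)]

# A future dataset is scaled with the stored parameters, not refitted.
future_scaled = apply_min_max([[2.0, 40.0]], restored)
```

Because the future dataset is scaled with the training parameters, values outside the training range can fall outside [0, 1], as the second descriptor does here.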
FUTURE WORK
THANK YOU
