Session 7
What is Sentiment Analysis?
• A technique that identifies the sentiment expressed in a given text or piece of text
• Given a sentence or text, determine whether it is positive, negative, or neutral
• Still a hot topic in NLP, with many new developments
• A very application-oriented area of NLP
Labels for Sentiment Analysis
• Common labels:
• Binary: positive vs. negative; sometimes neutral is added as a third label
• Ranges (e.g. scores between -3 and +3)
• Usually a model outputs a probability for each label (between 0 and 1, e.g. pos=0.367, neg=0.456)
• A text can be assigned an overall polarity by summing the values of its positive words and subtracting the values of its negative ones
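A minimal Python sketch of the summing rule above. It assumes a tiny hypothetical lexicon (TOY_LEXICON) that stores signed values, so a negative word's value is subtracted as a magnitude; the tokenization (whitespace split plus stripping punctuation) is a simplification chosen only for illustration.

```python
from typing import List, Mapping

# Hypothetical toy lexicon: positive words get positive values, negative words negative ones.
TOY_LEXICON = {"best": 3, "good": 2, "dislike": -2, "hate": -3}


def tokenize(text: str) -> List[str]:
    """Lowercase whitespace tokenization with surrounding punctuation stripped."""
    return [t.strip(".,!?;:").lower() for t in text.split()]


def overall_polarity(text: str, lexicon: Mapping[str, int]) -> int:
    """Sum of the values of positive words minus the magnitudes of negative words."""
    values = [lexicon.get(token, 0) for token in tokenize(text)]
    positive = sum(v for v in values if v > 0)
    negative = sum(-v for v in values if v < 0)
    return positive - negative


print(overall_polarity("Netflix has the best selection of films.", TOY_LEXICON))  # 3
print(overall_polarity("Good films, but I hate waiting.", TOY_LEXICON))  # 2 - 3 = -1
```

A score above 0 would then be read as positive and below 0 as negative; where to draw the neutral band is a design choice the lexicon itself does not fix.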
Related fields/subfields
Subtasks of Sentiment Analysis
Challenges in Analyzing Sentiment
Easy examples of sentiment analysis
• Netflix has the best selection of films.
• I dislike the new crime series.
• I hate waiting for the next series to come out.
Challenges in Analyzing Sentiment
• Negations
• Modifiers
• Ambiguous words
• Negative terms used in a positive way
• New terms (e.g. in social media)
• Domain dependence
• Noisy text
• Multimodality
• Sarcasm
• Fake reviews
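Negations and modifiers show why plain word counting fails: "not good" still contains the positive word "good". The sketch below contrasts a plain lexicon sum with one simple rule-based workaround. It assumes that "modifiers" means intensifiers and downtoners such as "very" or "slightly"; the lexicon, negator list, modifier weights and the three-token negation window are hypothetical values chosen for illustration, not taken from any published tool.

```python
# Hypothetical lexicon, negators and modifier weights, chosen only for illustration.
LEXICON = {"good": 2.0, "bad": -2.0, "like": 1.0, "boring": -1.5}
NEGATORS = {"not", "no", "never", "don't", "isn't", "wasn't"}
MODIFIERS = {"very": 1.5, "extremely": 2.0, "slightly": 0.5}
NEGATION_WINDOW = 3  # how many preceding tokens are searched for a negator


def tokenize(text):
    return [t.strip(".,!?;:").lower() for t in text.split()]


def naive_score(text):
    """Plain lexicon sum: ignores negations and modifiers."""
    return sum(LEXICON.get(t, 0.0) for t in tokenize(text))


def rule_score(text):
    """Lexicon sum with modifier scaling and negation flipping."""
    tokens = tokenize(text)
    total = 0.0
    for i, token in enumerate(tokens):
        value = LEXICON.get(token)
        if value is None:
            continue
        # A modifier directly before the sentiment word scales its value.
        if i > 0 and tokens[i - 1] in MODIFIERS:
            value *= MODIFIERS[tokens[i - 1]]
        # A negator among the preceding NEGATION_WINDOW tokens flips the sign.
        if any(t in NEGATORS for t in tokens[max(0, i - NEGATION_WINDOW):i]):
            value = -value
        total += value
    return total


for sentence in ["The film was very good.", "The film was not very good.", "I don't like it."]:
    print(sentence, naive_score(sentence), rule_score(sentence))
```

A fixed window is crude: in "not bad, good" it also flips "good", because "not" is still within three tokens. Rules like these help with negations and modifiers but do nothing for sarcasm or domain dependence.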
Methods for Performing Sentiment Analysis
• Lexicon-based methods (also called rule-based or dictionary-based methods):
• Use lexicons of words that are pre-annotated with the sentiment they express (sentiment-bearing words)
• Ways to create such lexicons:
• Crowdsourcing
• Expert annotation
• Semi-automatic approaches
• Machine learning approaches:
• Supervised training of neural or feature-based models on sentiment-annotated corpora
• Often make use of other linguistic features such as dependency structures or POS tags
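To make the supervised route concrete, here is a minimal sketch of one feature-based model: multinomial Naive Bayes over bag-of-words counts with add-one smoothing. The choice of Naive Bayes and the six training sentences are illustrative assumptions, not the method prescribed by the slides; real systems are trained on large annotated corpora and often use neural models or richer features such as POS tags. Like the models described under "Labels", predict_proba returns one probability per label.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    return [t.strip(".,!?;:").lower() for t in text.split() if t.strip(".,!?;:")]


class NaiveBayesSentiment:
    """Multinomial Naive Bayes on bag-of-words counts with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.total_words = {c: sum(self.word_counts[c].values()) for c in self.class_counts}
        return self

    def predict_proba(self, text):
        """Return one probability per label (the values sum to 1)."""
        n_docs = sum(self.class_counts.values())
        log_scores = {}
        for c in self.class_counts:
            score = math.log(self.class_counts[c] / n_docs)  # log prior
            denom = self.total_words[c] + len(self.vocab)
            for w in tokenize(text):
                if w in self.vocab:  # words unseen in training are ignored
                    score += math.log((self.word_counts[c][w] + 1) / denom)
            log_scores[c] = score
        top = max(log_scores.values())  # subtract the max for numerical stability
        exps = {c: math.exp(s - top) for c, s in log_scores.items()}
        z = sum(exps.values())
        return {c: v / z for c, v in exps.items()}

    def predict(self, text):
        probs = self.predict_proba(text)
        return max(probs, key=probs.get)


# Hypothetical, tiny sentiment-annotated corpus.
TRAIN_TEXTS = [
    "I love this series, great acting.",
    "Great selection and excellent films.",
    "What a wonderful show, I love it.",
    "I hate the new crime series.",
    "Terrible plot and boring acting.",
    "I dislike waiting, awful service.",
]
TRAIN_LABELS = ["pos", "pos", "pos", "neg", "neg", "neg"]

model = NaiveBayesSentiment().fit(TRAIN_TEXTS, TRAIN_LABELS)
print(model.predict_proba("I love the great acting"))
print(model.predict("boring and terrible"))
```

Unlike a hand-built lexicon, the word weights here are learned from the annotated data, so the same code adapts to a new domain if it is retrained on that domain's texts.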
Applications of Sentiment Analysis
• Monitoring social media mentions of brands etc.
• Analyzing feedback from surveys and product reviews
• Analyzing incoming support tickets, e.g. to detect angry customers
• … and many more!
Sentiment Analysis – Assignments
Zip all completed tasks (code, outputs, PDF documents, etc.) into one archive and send it to me (maria.becker@gs.uni-heidelberg.de) by 09/07/2023.
ASSIGNMENT 1: Choose between options 1 and 2
Option 1 (with coding):
• Write a small Python program that counts all occurrences of sentiment-bearing words in each article of your text corpus and outputs the number of positive vs. negative words per article.
• Use a sentiment lexicon of your choice, e.g. http://mpqa.cs.pitt.edu/lexicons/subj_lexicon/
• Send me the code (in a py/ipynb file) and its output (in a PDF file).
• TextBlob and NLTK both offer sentiment analysis modules. Find out how their models work and which methods/algorithms they are based on.
• As a starting point, you can use these webpages:
• https://investigate.ai/investigating-sentiment-analysis/comparing-sentiment-analysis-tools/
• https://towardsdatascience.com/my-absolute-go-to-for-sentiment-analysis-textblob-3ac3a11d524
• Send me your observations (key points are sufficient) in a PDF file.
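A possible starting point for the counting task in Option 1, assuming the MPQA subjectivity lexicon linked above, whose lines consist of key=value pairs such as word1=... and priorpolarity=.... The sample lines and articles below are hypothetical stand-ins that only imitate that format; with the real lexicon you would pass the opened file to load_mpqa_lexicon instead. As simplifications, the sketch ignores the pos1 and stemmed1 fields and skips entries whose prior polarity is neutral or both.

```python
import re
from collections import Counter

# Hypothetical sample lines imitating the MPQA key=value format (not copied from the lexicon).
SAMPLE_LINES = [
    "type=strongsubj len=1 word1=excellent pos1=adj stemmed1=n priorpolarity=positive",
    "type=weaksubj len=1 word1=support pos1=verb stemmed1=y priorpolarity=positive",
    "type=strongsubj len=1 word1=hate pos1=verb stemmed1=y priorpolarity=negative",
    "type=weaksubj len=1 word1=crisis pos1=noun stemmed1=n priorpolarity=negative",
    "type=weaksubj len=1 word1=economy pos1=noun stemmed1=n priorpolarity=neutral",
]

# Hypothetical articles standing in for your corpus.
ARTICLES = {
    "article_1": "The excellent report found strong support for the plan.",
    "article_2": "Critics hate the plan and warn of a crisis. Another crisis looms.",
}


def load_mpqa_lexicon(lines):
    """Map each word to 'positive' or 'negative'; other polarities are skipped."""
    lexicon = {}
    for line in lines:
        fields = dict(pair.split("=", 1) for pair in line.split() if "=" in pair)
        word, polarity = fields.get("word1"), fields.get("priorpolarity")
        if word and polarity in ("positive", "negative"):
            lexicon[word.lower()] = polarity  # a later entry for the same word wins
    return lexicon


def count_polarity(text, lexicon):
    """Return (number of positive words, number of negative words) in text."""
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    counts = Counter(lexicon[t] for t in tokens if t in lexicon)
    return counts["positive"], counts["negative"]


lexicon = load_mpqa_lexicon(SAMPLE_LINES)
# With the real file (the path is hypothetical):
# with open("subj_lexicon.tff", encoding="utf-8") as f:
#     lexicon = load_mpqa_lexicon(f)
for name, text in ARTICLES.items():
    pos, neg = count_polarity(text, lexicon)
    print(f"{name}: positive={pos}, negative={neg}")
```

Repeated words are counted each time they occur ("crisis" counts twice in article_2), which matches "counts all occurrences" in the task.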
ASSIGNMENT 3 (Everybody)
ASSIGNMENT 4: Choose between options 1, 2, or 3
Option 1 (with coding):
• Modify the Jupyter Notebook so that it can be applied to your whole corpus, and analyze the results.
• This includes the following steps:
• Split each article into sentences, using punctuation marks as separators (., !, :, ?)
• Iterate over the sentences
• Calculate the average (mean) sentiment score per article
• Evaluate the results manually: Are the sentiment scores per sentence/per article correct? What are possible sources of errors?
• Send me the code (in a py/ipynb file), its output (in a text or Word file), and your observations (key points are sufficient, in a PDF file)
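The steps of Option 1 can be sketched as follows. The notebook's own sentence scorer is not reproduced here: toy_score and its word values are hypothetical placeholders, and score_article accepts any function that maps a sentence to a number, so the notebook's scorer could be passed in instead.

```python
import re
from statistics import mean

# Separators named in the assignment: ., !, : and ?
SEPARATORS = re.compile(r"[.!:?]+")

# Hypothetical placeholder scorer; replace it with the notebook's sentence scorer.
TOY_VALUES = {"great": 1.0, "good": 0.5, "bad": -0.5, "terrible": -1.0}


def toy_score(sentence):
    return sum(TOY_VALUES.get(w, 0.0) for w in re.findall(r"[a-z']+", sentence.lower()))


def split_sentences(article):
    """Split an article at the separators and drop empty pieces."""
    return [s.strip() for s in SEPARATORS.split(article) if s.strip()]


def score_article(article, score_sentence):
    """Score every sentence and return (per-sentence scores, mean score)."""
    scored = [(s, score_sentence(s)) for s in split_sentences(article)]
    average = mean(score for _, score in scored) if scored else 0.0
    return scored, average


corpus = {"article_1": "The acting was great. The plot was bad! Would I watch it again? Yes: it was good."}
for name, article in corpus.items():
    scored, average = score_article(article, toy_score)
    for sentence, score in scored:
        print(f"{name} | {score:+.2f} | {sentence}")
    print(f"{name} mean: {average:.2f}")
```

Splitting on ":" turns "Yes: it was good" into two sentences, and abbreviations such as "e.g." are split as well; both change the number of sentences and therefore the mean, so they are worth checking during the manual evaluation.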
Next Session