Sentiment, News and The Polarity Problem: Leslie Barrett


Sentiment, News and the Polarity Problem

Leslie Barrett www.lbtechconsulting.com April 13, 2010

Sentiment and Opinion


Are sentiment and opinion the same? Are feelings the same as beliefs? Sentiment can be applied to opinion but not the other way around (Kim and Hovy 2004). The question is: should it apply to anything else? Does it make sense in narrative, exposition, news data? How much text should we apply it to?

Sources
Sentiment analysis has been applied where opinion is the norm: blogs and Tweets. It has also been applied where opinion is designed to be subtle, if expressed at all: news data. So maybe news data is never really objective, or else maybe sentiment is really used as simple polarity, separating the world into human ideas of positive and negative buckets, blind to objectivity.

Polarity
Polarity is the stuff through which sentiment is measured. Sentiment is usually considered to have the poles positive and negative. These are most often translated into good and bad. Sentiment analysis is really considered useful for telling us what is good and bad in our information stream.

The Machine
So the sentiment analysis machine takes in some text and tells us whether that text says something good or bad. OK, but before we unveil our machine, we need to ask some important but often overlooked questions:
- What text is going in?
- Where does good stop and bad begin?
- What is the text about?

Why do we need Sentiment Analysis, Beavis?

So we'll know what we're thinking!

Let's Try Feeding the Machine News Data!

News headlines sound like a pretty straightforward text type to apply sentiment to, given what we've just said. Even though news is supposed to be objective, headlines sell papers and can often be dramatic. Keywords like crash, downturn and disaster are abundant and are strong sentiment indicators.
- But are headlines enough?
- We may want document-level sentiment for news.
- Does it matter what the news is about?

Some real headlines


Short-lived Coup Disappoints Bears
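A minimal sketch of why a headline like the one above can mislead a keyword-based scorer. The word lists are a hypothetical toy lexicon, not one taken from the talk, and the counting rule is only an assumption about how a naive system might work.

```python
import re

# Hypothetical toy lexicon (an assumption for illustration only).
NEGATIVE = {"crash", "downturn", "disaster", "disappoints", "coup"}
POSITIVE = {"rally", "gain", "relief", "recovery"}

def keyword_polarity(text):
    """Return (# positive keyword hits) minus (# negative keyword hits)."""
    tokens = re.findall(r"[a-z]+", text.lower())
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    return pos - neg

# "coup" and "disappoints" both count as negative hits, so the score is
# below zero, even though disappointed bears are arguably good news for
# everyone who is not betting on a falling market.
score = keyword_polarity("Short-lived Coup Disappoints Bears")
```

The keyword count has no notion of who is disappointed or why, which is the gap the following slides explore.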

Beware of Headlines in Financial News


Financial news especially is really a genre unto itself. Its polarity perspective is skewed constantly by pundit benchmarking. Beating bad expectations is better than a good quarter that falls short in pundit opinion.
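One way to picture pundit benchmarking is to score a reported figure against the expected figure rather than against zero. The sketch below assumes both numbers are already available (for example, earnings per share); the function name, the tolerance parameter and the sample values are all hypothetical.

```python
def expectation_polarity(actual, expected, tolerance=0.0):
    """Label a reported figure relative to the consensus expectation."""
    surprise = actual - expected
    if surprise > tolerance:
        return "positive"
    if surprise < -tolerance:
        return "negative"
    return "neutral"

# A loss that is smaller than feared reads as positive ...
beat = expectation_polarity(actual=-0.10, expected=-0.25)
# ... while a solid profit that misses the forecast reads as negative.
miss = expectation_polarity(actual=1.20, expected=1.35)
```

Under this view the sign of the raw number matters less than the sign of the surprise, which is why surface keywords in financial text can point the wrong way.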

Can Sentiment Analysis beat expectations?


All kinds of negatives here, but the document-level sentiment should be positive; that's how an analyst would see it. So if you skew to this, what about other news?

Objectively Bad Events Happen

Some events don't require an opinion holder. They simply have a generally agreed upon negative or positive polarity. And we need to get them right because they affect other events (e.g. crop yields).

When Bad Things Happen to Positive Sentiment


But objectively bad events have their own problems, even in the absence of expectations. The problem with polarity measures outside of the presence of an opinion holder is topic drift. An editorial or blog is likely to stick to one sentiment, but bad events can have the dreaded silver lining.

Disaster+Relief Can Spell Trouble


Despite some strong negative polarity indicators like traumatized, disaster and tsunami, this article has an overall positive theme.

Don't Quote Me!

Another problem in news data is opinion blend. Often you have an author's opinion alongside other opinions, cited directly or indirectly, that may differ from it. Or an author uses quotes to showcase two different opinions. Coverage of a debate, for example, can be very difficult for even a human to judge.

Attribution vs. Quoting


The author clearly does not believe the positive topic of the article. But Clinton believes it. So is this positive sentiment about Clinton?

Pundits vs. Authors vs. Topics


How can I be sure that bad news about my client is about my client?
Make sure the named entity in question is a topic of the document. So-called document mates don't matter.
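The talk does not say how to decide whether an entity is a topic. As a rough sketch, one hypothetical heuristic is to accept an entity that is named in the headline or mentioned often enough in the body; the function name and the `min_mentions` threshold are assumptions for illustration.

```python
import re

def is_topic(entity, headline, body, min_mentions=3):
    """Hypothetical heuristic: the entity is a topic if it appears in the
    headline, or at least `min_mentions` times in the body."""
    pattern = re.compile(r"\b" + re.escape(entity) + r"\b", re.IGNORECASE)
    if pattern.search(headline):
        return True
    return len(pattern.findall(body)) >= min_mentions

# Named in the headline: treated as a topic.
in_headline = is_topic("Acme", "Acme recalls widgets", "The recall is large.")
# Mentioned once in passing: treated as a mere document mate.
in_passing = is_topic("Acme", "Markets fall", "Globex fell. Acme also slipped.")
```

A production system would more likely rely on a dedicated topic or salience model; the sketch only shows where such a filter sits before sentiment is attributed to the entity.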

Do author names matter? Should I extract them?


Yes! Over time, if you classify by author name against other entities, you might detect bias. Do the same for known pundits on a topic; the same result may emerge.
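A minimal sketch of classifying by author against entities, assuming each article has already been reduced to an (author, entity, score) record by some upstream sentiment step. The record layout, the score range and the sample names are hypothetical.

```python
from collections import defaultdict

def mean_score_by_author(records):
    """records: iterable of (author, entity, score) tuples.
    Returns {(author, entity): mean score}."""
    totals = defaultdict(lambda: [0.0, 0])  # running [sum, count] per pair
    for author, entity, score in records:
        bucket = totals[(author, entity)]
        bucket[0] += score
        bucket[1] += 1
    return {key: total / count for key, (total, count) in totals.items()}

# Made-up records for illustration.
records = [
    ("A. Writer", "Acme", -1.0),
    ("A. Writer", "Acme", -0.5),
    ("B. Reporter", "Acme", 0.5),
]
means = mean_score_by_author(records)
```

A persistent gap between one author's mean for an entity and the means of other authors is a hint of bias, not proof; the number of articles behind each mean matters too.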

What's it all About?

Some data just tends to be multi-thematic or non-thematic. In particular, market and financial reports, which often make their way into news feeds, tend to be this way. It is very hard to get a reasonable sentiment reading on either type of document.

SEC Reports: too big, too many sections


There is the Management Discussion, which can have appropriate sentiment scores. But there are so many other sections, with no single theme. Many sections have boilerplate, such as the accounting review.

Scraping
Your data is only as good as your news feed. Sometimes a site will deliver excess content that creeps into the text field of a feed. That content could be an ad or even another article, skewing the sentiment reading for the expected article and hurting topic detection too.

Field Overlap from a Typical News Page

What to Do?
Stop doing Sentiment Analysis on news data? NO! News data is very valuable for reputation management. It can also be valuable for investment firms *if* you can tease out the jargon and pundit-speak. Document-level is still OK!

Best Practices
Good topic detection: see what's closely aligned with a theme and eliminate non-thematic or weakly thematic documents.
Good feed maintenance: you or your feed provider need to spot check for scraping problems.

Tricks & Tips


Data extraction for problem documents
If document sections are identified with tags, use them (this is true for SEC reports) and extract the good data (see Pang and Lee 2004 on extracting document portions). Write regular expression libraries to find quoted and cited material. Remove that material or use it separately.
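A small sketch of the regular expression idea, assuming quoted material is marked with straight or curly double quotes and is not nested. The pattern and function name are illustrative, not part of any existing library.

```python
import re

# Text between straight or curly double quotes (no nesting handled).
QUOTE_RE = re.compile(r'["\u201c]([^"\u201c\u201d]+)["\u201d]')

def split_quotes(text):
    """Return (quoted_spans, remainder): the quoted material, and the
    author's own text with the quotes removed."""
    quotes = [m.group(1).strip() for m in QUOTE_RE.finditer(text)]
    remainder = QUOTE_RE.sub(" ", text)
    remainder = re.sub(r"\s+", " ", remainder).strip()
    return quotes, remainder

sample = 'The plan is "a great success," the minister said. Critics disagree.'
quotes, remainder = split_quotes(sample)
```

The two outputs can then be scored separately, so that a cited opinion is not mistaken for the author's own. Indirect citation ("the minister said that ...") needs additional patterns and is not covered here.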

Topic drift is harder, but...

...you can extract the first n paragraphs. The main topical material in news is generally in the top 25% of the document. Secondary topics don't carry the same weight.
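Extracting the lead of a document can be sketched as follows, assuming paragraphs are separated by blank lines. The 25% default mirrors the rule of thumb above; the function name and the `min_paragraphs` floor are assumptions.

```python
import math

def lead_paragraphs(text, fraction=0.25, min_paragraphs=1):
    """Keep the top `fraction` of a document's paragraphs (rounded up),
    but never fewer than `min_paragraphs` for a non-empty document."""
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    if not paragraphs:
        return []
    n = max(min_paragraphs, math.ceil(len(paragraphs) * fraction))
    return paragraphs[:n]

# An eight-paragraph placeholder article: only the first two are kept.
article = "\n\n".join(f"Paragraph {i}" for i in range(1, 9))
lead = lead_paragraphs(article)
```

Sentiment is then computed on `lead` only, which trades some recall for protection against silver-lining passages later in the article.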

What's Next for Polarity?

Future directions for news-based sentiment analysis are based on looking outside of the Positive and Negative poles. Think about all the opposites in the world:
Sweet/sour, cold/hot, inside/outside, wet/dry, hard/soft

Leverage the Semantics of Opposition


There are many types of opposition to study, and they can be used in different ways:
Complementary opposites (male, female)
Reversatives (backwards, forwards)
Scalar opposites (tall, short)

There is a good deal of semantic research that has yet to be leveraged for opinion analysis and classification (Mettinger, Pustejovsky, Kennedy, Miller, inter alia).

Opposites and Opinions


Let's think of some opinions that fit into poles not definable in terms of positive and negative:
Conservative vs. Liberal
Government Expansion vs. Privatization

Can these positions be detected automatically?

Appendix/Bibliography
Kim, Soo-Min and Eduard Hovy. 2004. Determining the Sentiment of Opinions. Proceedings of COLING-04, pp. 1367-1373. Geneva, Switzerland.
Pustejovsky, James. 2000. Events and the Semantics of Opposition. In C. Tenny and J. Pustejovsky (eds.), Events as Grammatical Objects. CSLI Publications.
Mettinger, Arthur. 1994. Aspects of Semantic Opposition in English. Clarendon Press, Oxford.
Pang, Bo and Lillian Lee. 2004. A Sentimental Education: Sentiment Analysis Using Subjectivity Summarization Based on Minimum Cuts. In Proceedings of the Association for Computational Linguistics.
