Literature Review and General Observation of Recent Research in The Emerging Field of Sentiment Analysis
By Paul Prae
October 5th, 2010

The recent data explosion has spawned an incredible increase in innovation. While many new fields are emerging, many old fields have been redefined. The internet is the catalyst for these changes. This massive network holds the data that some of these new fields are focused on leveraging. Much of this data is organized and retrieved through methods that focus on definitions and context. However, these methods leave out one of the most important aspects of the creators of this data: emotion. The emotional subjectivity of human beings drives the choices we make. Sentiment analysis is the area that brings together this growth in digital data and the emotions of the users and creators of the data. This paper will cover the general concepts behind sentiment analysis and the uses of the concept in current society. It will also focus on the benefits of sentiment analysis for corporations and consumers.

Sentiment analysis is a newer field that has only recently moved from the academic realm to corporate use. Much of the current published research on the subject was developed by research facilities strongly associated with companies such as IBM. The sentiment detection of texts has witnessed a booming interest in recent years (Tang et al., 2009) with "[t]he emergence of new social media such as blogs, message boards, news, and web content" dramatically changing the ecosystems of corporations (Cai et al., 2010). The academic contributors to the subject have combined many specific areas of linguistics, computer science, artificial intelligence, and psychology. More specifically, it is "a discipline at the crossroads of NLP [natural language processing] and IR [information retrieval], and as such it shares a number of characteristics with other tasks such as information extraction and text mining" (Tang et al., 2009). Machine learning techniques, basic statistical analysis, and linguistic semantic representation are also well represented in the designs of the field. As with many new fields, sentiment analysis is a combination of a few novel concepts reapplied to a wide range of specific aspects of other older fields.

Sentiment analysis is a system of techniques that are organized and applied differently depending on the designer. Before looking at how scientists and developers are currently searching for sentiment in text, it is best to understand where they search and why. The internet is an ever-expanding search space. Searching and analyzing all possible sources of relevant information would be enormously complex. Companies, scientists, and software developers must choose a subset of this massive search space to which to apply their software. It is important to choose a search space that has the highest concentration of easily accessible relevant data. This paper will discuss some of the problems that highly unstructured text and noisy, useless, or irrelevant text can cause. The following graph from Alta Plana's Text Analytics 2009 research study, which surveyed 116 companies that use text analytics software, lists some of the top areas that companies use as the source of the text. Notice that content generated by general user discussion in open social settings dominates the list.

The importance of the data generated by the Web 2.0 phenomenon is readily apparent. Cai et al. (2010) describe this importance: "The widespread availability of consumer generated media (CGM) such as blogs, message boards, and news articles post great opportunities as well as risks to today's enterprises." As of 2009, companies had already been acting on this realization. The complexity issue is still relevant even when narrowing the search space to a single source of information. Facebook is a good example of an extraordinarily popular social media platform that generates a large amount of text that could be analyzed through its API. The search space here involves "[m]ore than 500 million active users," "over 900 million [Facebook-specific] objects (pages, groups, events and community pages)," and the "[m]ore than 30 billion pieces of content (web links, news stories, blog posts, notes, photo albums, etc.) shared each month" (http://www.facebook.com/press/info.php?statistics, 2010). The useful data is just as plentiful as the irrelevant. There are endless amounts of both being produced in outlets across the internet. It is the relevant subjective human opinion that is a rich and useful source for marketing intelligence, social psychologists, and others interested in extracting and mining opinions, views, moods, and attitudes (Tang et al., 2009). With this information sentiment analysis can begin.

The challenge that exists after the search space is established is to locate the relevant data. After the relevant data is established it can then be assessed for sentiment. These two stages are commonly referred to as subjectivity classification and sentiment classification. "Subjectivity classification is a task to investigate whether a paragraph presents the opinion of its author or reports facts... Subjectivity classification can prevent the polarity [i.e. sentiment] classifier from considering irrelevant or even potentially misleading text" (Tang et al., 2009). Depending on the application, contextual matching or a similar technique may be applied to the resulting data that is already deemed subjective. Guaranteeing that the extracted sections of the original document are contextually relevant ensures that the topics being discussed in the text are those that are important to the results the designer is examining. This concept is common in automated advertising displays: "Contextual advertising is a major type of online advertising, in which ads are placed on Web pages according to their content" (Qiu et al., 2010). After the process has narrowed the initial data down to the relevant snippets, the application of sentiment can begin.
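
The two stages described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the method of any cited paper: the cue-word sets and the function names (`is_subjective`, `polarity`, `analyze`) are hypothetical, and a real system would typically learn both classifiers from labeled data rather than use fixed word lists.

```python
import re

# Hypothetical cue lists, chosen only for illustration.
SUBJECTIVE_CUES = {"love", "hate", "great", "terrible", "frustrating",
                   "fun", "awful", "think", "feel"}
POSITIVE = {"love", "great", "fun"}
NEGATIVE = {"hate", "terrible", "frustrating", "awful"}


def tokenize(text):
    """Lowercase the text and return its alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def is_subjective(sentence):
    """Stage 1: keep a sentence only if it contains an opinion cue."""
    return any(tok in SUBJECTIVE_CUES for tok in tokenize(sentence))


def polarity(sentence):
    """Stage 2: label a sentence by counting positive and negative words."""
    tokens = tokenize(sentence)
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def analyze(sentences):
    """Filter out objective sentences, then classify the remainder."""
    return [(s, polarity(s)) for s in sentences if is_subjective(s)]
```

Calling `analyze` on a list of sentences drops the purely factual ones before any polarity label is assigned, which is the point of running subjectivity classification first.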

Sentiment classification has some variation among designers of each approach but ultimately serves the same abstract purpose. "Sentiment analysis traditionally emphasizes on classification of web comments into positive, neutral, and negative categories" (Cai et al., 2010). There are several variations of this tradition. A more common trend in recent research is to get more specific in defining the sentiment spectrum. "Sentiment classification includes two kinds of classification forms, i.e., binary sentiment classification and multi-class sentiment classification" (Tang et al., 2009). This multi-class sentiment approach will likely be the standard of the future. Human emotion spans a much more complicated spectrum than the simple black and white notions of positive and negative. Human beings have the strange capability to love and to hate something at the same time. Take this simulated example that I may hear from a roommate who is a new user and just purchased a recent video game: "I hate that I am not acquiring the same kill-to-death ratio in the new Call of Duty. The new user interface is quite frustrating. I love the challenge though. It will be fun to learn a new UI." Here the user portrays negative and positive sentiments on the same product. This is easy for humans to decipher but much more complicated for a machine. This and many other problems are being addressed in current research.
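
To make the difficulty concrete, the roommate example can be scored sentence by sentence. The sketch below is a hedged illustration only: the two word lists, the "mixed" label, and the function names are assumptions introduced here, and word counting is a deliberately naive stand-in for the classifiers discussed in the literature.

```python
import re

# Hypothetical mini-lexicons covering only the words in the example.
POSITIVE = {"love", "fun"}
NEGATIVE = {"hate", "frustrating"}


def sentence_polarity(sentence):
    """Return +1, 0, or -1 depending on which word list dominates."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    return (pos > neg) - (pos < neg)


def document_label(text):
    """Label a review, reporting 'mixed' when both polarities occur."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    scores = [sentence_polarity(s) for s in sentences]
    has_pos = any(score > 0 for score in scores)
    has_neg = any(score < 0 for score in scores)
    if has_pos and has_neg:
        return "mixed"
    if has_pos:
        return "positive"
    if has_neg:
        return "negative"
    return "neutral"


review = ("I hate that I am not acquiring the same kill-to-death ratio in "
          "the new Call of Duty. The new user interface is quite frustrating. "
          "I love the challenge though. It will be fun to learn a new UI.")
print(document_label(review))
```

A binary classifier would be forced to call this review either positive or negative, while a scheme with more classes at least has room to record that both sentiments are present.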

A few different approaches have been developed to create more accurate results. General polarity-based sentiment classification is a great step forward from the previous contextual-only approaches. Cai et al. (2010) mention that "[s]uch analysis is useful, but it lacks insights on the drivers behind the sentiments." The group developed a better solution: "To address this problem, we introduce our sentiment analysis approach which combines a unique sentiment classification approach with a topic detection approach that discovers terms that are highly correlated to different sentiment classification categories." This produces results that point to the original reasons for the given sentiment. There are more elaborate designs that break down the content into greater detail, allowing for more specific results.
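
The idea of finding terms that are highly correlated with a sentiment category can be illustrated with a simple statistic. The sketch below does not reproduce the approach of Cai et al. (2010), whose details are not given here; it swaps in a plain "lift" score, P(term | category) / P(term), as an assumed stand-in, and the function name `top_terms` and the sample comments are hypothetical.

```python
import re
from collections import Counter


def top_terms(labeled_docs, category, k=3, stopwords=frozenset()):
    """Rank terms by lift toward one sentiment category.

    labeled_docs is a list of (text, label) pairs. Lift is the share of
    documents in the category that contain the term, divided by the share
    of all documents that contain it.
    """
    in_category = Counter()
    overall = Counter()
    n_category = 0
    for text, label in labeled_docs:
        terms = set(re.findall(r"[a-z]+", text.lower())) - stopwords
        overall.update(terms)
        if label == category:
            in_category.update(terms)
            n_category += 1
    if n_category == 0:
        return []
    n_docs = len(labeled_docs)
    lift = {t: (in_category[t] / n_category) / (overall[t] / n_docs)
            for t in in_category}
    # Break ties by document count in the category, then alphabetically.
    return sorted(lift, key=lambda t: (-lift[t], -in_category[t], t))[:k]


comments = [
    ("the battery died fast", "negative"),
    ("battery drains quickly", "negative"),
    ("the screen is bright", "positive"),
    ("love the screen", "positive"),
]
print(top_terms(comments, "negative", k=1, stopwords=frozenset({"the", "is"})))
```

Terms that surface this way hint at what the complaints or compliments are about, which is the kind of insight into the drivers behind a sentiment that polarity labels alone do not give.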

After the sentiments are established, each sentiment analysis system will then use the results in ways appropriate to the application. Qiu et al. (2010) developed an idea titled Dissatisfaction-oriented Advertising based on Sentiment Analysis, or DASA, that combines traditional sentiment analysis with basic keyword matching. In this approach the software detects negative sentiment toward certain products. The advertising on the web page that contains the text then displays a product that is strong in the attributes the original text complained about. The example used in the Qiu et al. (2010) paper is one in which the writer on the forum complains about the safety of a car. After the comment is posted and a new user loads the forum page, the advertisements are reselected based on the new comment. The new advertisements now include a Volvo ad that highlights new safety features and a history of safe production standards. This process is shown in the following diagram from the same research paper.
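
The flow described above, negative sentiment plus keyword matching driving ad selection, can be sketched as follows. This is an illustrative approximation and not the DASA implementation from Qiu et al. (2010): the word lists, the attribute keywords, the ad inventory, and the function names are all hypothetical.

```python
import re

# Hypothetical resources; a real system would derive these from data.
NEGATIVE_WORDS = {"unsafe", "worried", "terrible", "poor", "bad"}
ATTRIBUTE_KEYWORDS = {
    "safety": {"safety", "unsafe", "brakes", "crash"},
    "mileage": {"mileage", "fuel", "gas"},
}
ADS = [
    {"name": "Ad: top-rated safety features", "strengths": {"safety"}},
    {"name": "Ad: best-in-class mileage", "strengths": {"mileage"}},
]


def complained_attributes(comment):
    """Return attributes mentioned in sentences that carry a negative word."""
    found = set()
    for sentence in re.split(r"[.!?]+", comment.lower()):
        tokens = set(re.findall(r"[a-z]+", sentence))
        if tokens & NEGATIVE_WORDS:
            for attribute, keywords in ATTRIBUTE_KEYWORDS.items():
                if tokens & keywords:
                    found.add(attribute)
    return found


def select_ads(comment, ads=ADS):
    """Pick ads whose strengths address what the comment complains about."""
    attributes = complained_attributes(comment)
    return [ad["name"] for ad in ads if ad["strengths"] & attributes]


print(select_ads("The brakes on my car feel terrible. I love the color."))
```

A comment that mentions an attribute without negative sentiment selects nothing in this sketch, which separates it from purely topical matching.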

Sentiment analysis can be applied to many industries. Any company under the scrutiny of public opinion should be analyzing all relevant data it can obtain. As Nick Bilton of the New York Times mentions, "When people want to know how the media business will deal with the internet, the best way to begin to understand the sweeping changes is to recognize that the consumer of entertainment and information is now in the center." Current applications take this into account and focus on the subjective user or consumer views of certain areas that enterprises will generally be interested in surveying. The most popular and basic use of sentiment analysis involves "mining text of written reviews from customers for certain products or services, and classifying the reviews into positive or negative opinions" (Ye et al., 2009). It is this type of classification that has become one of the foci of recent research endeavors sponsored by companies that realize the potential value of sentiment analysis on their data (Ye et al., 2009). Companies with a heavy online presence hold a wealth of data to which this research could readily be applied.

These same companies can choose to use text analytics software in different ways to meet different goals. Another graph from Alta Plana's Text Analytics 2009 research study shows the wide array of end goals that companies may be looking to meet when using text analytics software.

The highest use percentage shown above involves branding and reputation management. Most applications of sentiment analysis in recent research represent a similar trend. The technologies surrounding text analytics will be desired by many industries and for different applications in each industry. Taking this into consideration, different algorithms, techniques, and sometimes just small alterations will be required before sentiment analysis software from one industry can be applied to another. This may also foreshadow that the text analytics software industry will be able to create lucrative consulting firms similar to those that are currently faring well in the general management information systems sector.

The massive information systems that corporations already have could integrate aspects of existing processes with sentiment analysis. Newly refined systems could extend the capabilities of search engines, classify reviews, summarize reviews, track opinions in online discussions, analyze survey responses, implement online message sentiment filtering, create email message classification systems, and support many more techniques yet to be discovered (Tang et al., 2009). This may result in more efficient communication for public relations departments and better products created by development teams. Companies will be able to navigate through all available data and find comparisons of specific product features from competitors. "For a product manufacturer, the comparison enables it to easily gather marketing intelligence and product benchmarking information" (Tang et al., 2009). Sentiment analysis will give businesses the ability to use their preexisting text data in ways that benefit several departments within the traditional business structure. Businesses only require new software plus the necessary hardware to handle the new processing techniques and storage of the results.

Marketing companies and advertising branches of businesses are obvious beneficiaries of the conclusions derived from sentiment analysis. Major search engines and email hosts such as Yahoo and Google, as well as social media companies such as Facebook, have been implementing contextually relevant advertising to users for years now. The current web landscape demands relevancy and personalized information for users and potential consumers. "The trade-off between financial revenue and market share triggers the emergence of relevant advertising to emphasize the relevance between ads and Web pages for the sake of consumers" (Qiu et al., 2010). Qiu et al. (2010) go on to mention that "[t]argeted advertising is of great importance for internet companies to gain revenue from both advertisers and consumers. Previous approaches focus only on the topical relevance while the consumers' attitudes are ignored. These approaches fail to meet the actual needs of consumers especially when they may have negative attitudes towards the mentioned topics." Cai et al. (2010) add that a company's resistance to these new trends could have serious impact on their competitive market advantages. Leveraging the massive amount of data that is produced by the consumer voice could catalyze the growth of a company. The opposing danger is that ignoring the ever-increasing volume of public opinion could result in a company becoming a social outcast. It is to the pure benefit of companies to implement sentiment analysis if these companies have the relevant information available for such a process. The branding and marketing aspects of businesses revolve around consumer psychology. Sentiment analysis could reveal this psychology in a form that could be used for further analysis and study.

It is important to notice that implementations of such technology on the business side also benefit the consumer. Depending on the industry and the manner in which sentiment analysis is being applied, a system for presenting the results and organized conclusions from the analysis could be created. Ye et al. (2009) mention a relationship here: "With the results of sentiment classification, consumers would know the necessary information to determine which products to purchase and sellers would know the response from their customers and the performances of their competitors." It then turns into a cyclical system that should result in higher quality products over time. It is an efficient way to crowdsource useful data without the users putting forth any extra effort. The users could even be unaware that they are improving their future shopping experiences. The users and creators of the text to be analyzed will silently be benefiting two parties while expressing their natural opinions.

Sentiment analysis is a useful tool for all users of the internet. Emotional classification and organization of content will be a beneficial contribution to the vast reservoir of data the internet holds. The field has made steady progress over the last twenty years but still has much room to grow and improve. This is an exciting pursuit for those involved. The companies and researchers supporting the improvement of sentiment analysis will be contributing to an improved environment for all users. Users should enjoy communicating with a machine that understands their emotional needs and can offer effective solutions to their problems. This is the basic result of sentiment analysis even in its current form. It will only evolve to learn how to meet our needs more effectively.

List of references

Bilton, N. (2010, September 13). A Tech World that Centers on the User. New York Times: New York Edition. p. B1.
Cai, K., Spangler, S., Chen, Y., & Zhang, L. (2010). Leveraging sentiment analysis for topic detection. Web Intelligence and Agent Systems: An International Journal, 8(2010), 291-302.
Grimes, S. (2009). Text Analytics 2009: User Perspectives on Solutions and Providers. Alta Plana. Published under the Creative Commons Attribution 3.0 License.
Kho, N. D. (2010). Customer experience and sentiment analysis. KMWorld, February 2010, 10-20.
Li, N., Liang, X., Li, X., Wang, C., & Wu, D. (2009). Network Environment and Financial Risk Using Machine Learning and Sentiment Analysis. Human and Ecological Risk Assessment, 15, 227-252.
Qiu, G., He, X., Zhang, F., Shi, Y., Bu, J., & Chen, C. (2010). DASA: Dissatisfaction-oriented Advertising based on Sentiment Analysis. Expert Systems with Applications, 37(2010), 6182-6191.
Tang, H., Tan, S., & Cheng, X. (2009). A survey on sentiment detection of reviews. Expert Systems with Applications, 36(2009), 10760-10773.
Ye, Q., Zhang, Z., & Law, R. (2009). Sentiment classification of online reviews to travel destinations by supervised machine learning approaches. Expert Systems with Applications, 36(2009), 6527-6535.