Literature Review and General Observation of Recent Research in The Emerging Field of Sentiment Analysis

Prae1

By Paul Prae
October 5th, 2010

The recent data explosion has spawned an incredible increase in innovation. While many new
fields are emerging, many old fields have been redefined. The internet is the catalyst for these
changes. This massive network holds the data that some of these new fields are focused on
leveraging. Much of this data is organized and retrieved through methods that focus on
definitions and context. However, these methods leave out one of the most important aspects of
the creators of this data: emotion. The emotional subjectivity of human beings drives the choices
we make. Sentiment analysis is the area that combines this growth in digital data with the
emotions of the users and creators of the data. This paper will cover the general concepts behind
sentiment analysis and the uses of the concept in current society. It will also focus on areas
involving the benefits of sentiment analysis for corporations and consumers.

Sentiment analysis is a newer field that has only recently moved from the academic realm to
corporate use. Much of the current published research on the subject was developed by research
facilities strongly associated with companies such as IBM. The sentiment detection of texts has
witnessed a booming interest in recent years (Tang et al., 2009) with "[t]he emergence of new
social media such as blogs, message boards, news, and web content dramatically changing the
ecosystems of corporations" (Cai et al., 2010). The academic contributors to the subject have
combined many specific areas of linguistics, computer science, artificial intelligence, and
psychology. More specifically, it is "a discipline at the crossroads of NLP [natural language
processing] and IR [information retrieval], and as such it shares a number of characteristics
with other tasks such as information extraction and text mining" (Tang et al., 2009). Machine
learning techniques, basic statistical analysis, and linguistic semantic representation are also
well represented in the designs of the field. As with many new fields, sentiment analysis is a
combination of a few novel concepts reapplied to a wide range of specific aspects of other older
fields.

Sentiment analysis is a system of techniques that are organized and applied differently
depending on the designer. Before looking at how scientists and developers are currently
searching for sentiment in text, it is best to understand where they search and why. The internet
is an ever-expanding search space. Searching and analyzing all possible sources of relevant
information would be enormously complex. Companies, scientists, and software developers must
choose a subset of this massive search space to which to apply their software. It is important to
choose a search space that will have the highest concentration of easily accessible relevant
data. This paper will discuss some of the problems that highly unstructured text and noisy,
useless, or irrelevant text can cause. The following graph from Alta Plana's Text Analytics 2009
research study, which surveyed 116 companies that use text analytics software, lists some of the
top areas that companies use as the source of the text. Notice that content generated by general
user discussion in open social settings dominates the list.

The importance of the data generated by the Web 2.0 phenomenon is readily apparent. Cai
(2010) describes this importance: "The widespread availability of consumer generated media (CGM)
such as blogs, message boards, and news articles post great opportunities as well as risks to
today's enterprises." As of 2009, companies had already begun acting on this realization. The
complexity issue is still relevant even when narrowing the search space to a single source of
information. Facebook is a good example of an extraordinarily popular social media platform that
generates a large amount of text that could be analyzed through its API. The search space here
involves "[m]ore than 500 million active users," "over 900 million [Facebook-specific] objects
(pages, groups, events and community pages)," and the "[m]ore than 30 billion pieces of content
(web links, news stories, blog posts, notes, photo albums, etc.) shared each month"
(http://www.facebook.com/press/info.php?statistics, 2010). The useful data is just as plentiful
as the irrelevant. There are endless amounts of both being produced in outlets across the
internet. It is the relevant subjective human opinion that is a rich and useful source for
marketing intelligence, social psychologists, and others interested in extracting and mining
opinions, views, moods, and attitudes (Tang et al., 2009). With this information sentiment
analysis can begin.

The challenge that exists after the search space is established is to locate the relevant data.
After the relevant data is established, it can then be assessed for sentiment. These two stages
are commonly referred to as subjectivity classification and sentiment classification.
"Subjectivity classification is a task to investigate whether a paragraph presents the opinion of
its author or reports facts ... Subjectivity classification can prevent the polarity [i.e.
sentiment] classifier from considering irrelevant or even potentially misleading text" (Tang et
al., 2009). Depending on the application, contextual matching or a similar technique may be
applied to the resulting data that is already deemed subjective. Guaranteeing that the extracted
sections of the original document are contextually relevant ensures that the topics being
discussed in the text are those that are important to the results the designer is examining. This
concept is common in automated advertising displays: "Contextual advertising is a major type of
online advertising, in which ads are placed on Web pages according to their content" (Qiu et al.,
2010). After the process has narrowed the initial data down to the relevant snippets, sentiment
classification can begin.

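The two stages described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the method of Tang et al. or of any cited system: the word lists and the function names (`is_subjective`, `polarity`, `analyze`) are invented for this example, and a real system would learn its cues from labeled data rather than use a fixed list.

```python
import re

# Toy lexicons invented for this sketch; real systems learn these from data.
SUBJECTIVE_CUES = {"love", "hate", "great", "terrible", "think", "feel", "awful", "fun"}
POSITIVE = {"love", "great", "fun"}
NEGATIVE = {"hate", "terrible", "awful"}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def is_subjective(sentence):
    """Stage 1 (subjectivity): keep a sentence only if it has an opinion cue."""
    return any(tok in SUBJECTIVE_CUES for tok in tokenize(sentence))


def polarity(sentence):
    """Stage 2 (sentiment): compare counts of positive and negative words."""
    toks = tokenize(sentence)
    score = sum(t in POSITIVE for t in toks) - sum(t in NEGATIVE for t in toks)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def analyze(sentences):
    """Run both stages: drop factual sentences, then label what remains."""
    return [(s, polarity(s)) for s in sentences if is_subjective(s)]


reviews = [
    "The phone has a 5 inch screen.",   # factual, filtered out in stage 1
    "I love the camera.",
    "The battery life is terrible.",
]
results = analyze(reviews)
```

Filtering first means the polarity step never sees the purely factual sentence, which mirrors the point in the quotation that subjectivity classification keeps irrelevant text away from the polarity classifier.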
Sentiment classification has some variation among designers of each approach but ultimately
serves the same abstract purpose. "Sentiment analysis traditionally emphasizes on classification
of web comments into positive, neutral, and negative categories" (Cai et al., 2010). There are
several variations of this tradition. A more common trend in recent research is to get more
specific in defining the sentiment spectrum. "Sentiment classification includes two kinds of
classification forms, i.e., binary sentiment classification and multi-class sentiment
classification" (Tang et al., 2009). This multi-class sentiment approach will likely be the
standard of the future. Human emotion spans a much more complicated spectrum than the simple
black and white notions of positive and negative. Human beings have the strange capability to
love and to hate something at the same time. Take this simulated example that I may hear from a
roommate that is a new user who just purchased a recent video game: "I hate that I am not
acquiring the same kill to death ratio in the new Call of Duty. The new user interface is quite
frustrating. I love the challenge though. It will be fun to learn a new UI." Here the user
portrays negative and positive sentiments on the same product. This is easy for humans to
decipher but much more complicated for a machine. This and many other problems are being
addressed in current research.

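To make the difficulty concrete, the roommate example above can be scored sentence by sentence. The sketch below is an assumption-laden illustration, not a technique taken from the cited papers: the four-word lexicon and the names `sentence_scores` and `summarize` are hypothetical.

```python
import re

# Tiny hand-made lexicon; an assumption of this sketch.
LEXICON = {"hate": -1, "frustrating": -1, "love": 1, "fun": 1}


def sentence_scores(text):
    """Split text into sentences and sum the lexicon scores of each one."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    scores = []
    for s in sentences:
        words = re.findall(r"[a-z]+", s.lower())
        scores.append((s, sum(LEXICON.get(w, 0) for w in words)))
    return scores


def summarize(scores):
    """Label the text 'mixed' when it has both positive and negative sentences."""
    has_pos = any(v > 0 for _, v in scores)
    has_neg = any(v < 0 for _, v in scores)
    if has_pos and has_neg:
        return "mixed"
    if has_pos:
        return "positive"
    if has_neg:
        return "negative"
    return "neutral"


review = (
    "I hate that I am not acquiring the same kill to death ratio in the new Call of Duty. "
    "The new user interface is quite frustrating. "
    "I love the challenge though. "
    "It will be fun to learn a new UI."
)
scores = sentence_scores(review)
label = summarize(scores)              # per-sentence view of the review
total = sum(v for _, v in scores)      # single document-level score
```

With this lexicon the two negative and two positive sentences cancel, so a single document-level score lands at zero and hides the disagreement, while the per-sentence view reports the review as mixed. That is one reason a finer-grained scheme than a binary label is attractive.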
A few different approaches have been developed to create more accurate results. General
polarity-based sentiment classification is a great step forward from the previous
contextual-only approaches. Cai (2010) mentions that "[s]uch analysis is useful, but it lacks
insights on the drivers behind the sentiments." Cai's group developed a better solution: "To
address this problem, we introduce our sentiment analysis approach which combines a unique
sentiment classification approach with a topic detection approach that discovers terms that are
highly correlated to different sentiment classification categories." This allows results that
cater to the original reasons for the given sentiment. More elaborate designs break the content
down in greater detail, allowing for more specific results.

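One simple way to look for terms that are "highly correlated" with a sentiment category is pointwise mutual information (PMI) between a term and a label. The sketch below uses PMI purely as a stand-in; the source does not say which correlation measure Cai et al. use, and the comments, labels, and function names here are all hypothetical.

```python
from collections import Counter
import math

# Hypothetical labeled comments, invented for illustration.
comments = [
    ("negative", "battery died again"),
    ("negative", "battery drains fast"),
    ("negative", "screen cracked and battery weak"),
    ("positive", "camera is sharp"),
    ("positive", "love the camera and screen"),
]


def term_doc_counts(docs):
    """Count, per sentiment label, how many comments contain each term."""
    counts = {}
    totals = Counter()
    for label, text in docs:
        totals[label] += 1
        counts.setdefault(label, Counter()).update(set(text.split()))
    return counts, totals


def pmi(term, label, counts, totals):
    """Pointwise mutual information between a term and a label, over comments."""
    n = sum(totals.values())
    joint = counts[label][term]
    if joint == 0:
        return float("-inf")
    term_total = sum(c[term] for c in counts.values())
    return math.log2((joint * n) / (term_total * totals[label]))


def top_terms(label, counts, totals, k=2, min_count=2):
    """Rank terms seen in at least min_count comments of this label by PMI."""
    candidates = [t for t, c in counts[label].items() if c >= min_count]
    return sorted(candidates, key=lambda t: (-pmi(t, label, counts, totals), t))[:k]


counts, totals = term_doc_counts(comments)
negative_drivers = top_terms("negative", counts, totals)
positive_drivers = top_terms("positive", counts, totals)
```

On this toy data the only term that recurs in negative comments is "battery" and the only one that recurs in positive comments is "camera", so those surface as the candidate drivers behind each sentiment. The `min_count` threshold is a design choice to keep one-off words from ranking highly.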
After the sentiments are established, each sentiment analysis system will then use the results
in ways appropriate to the application. Qiu (2010) developed an idea titled
Dissatisfaction-oriented Advertising based on Sentiment Analysis, or DASA, that combines
traditional sentiment analysis with basic keyword matching. In this approach the software detects
negative sentiment toward certain products. The advertising on the web page that contains the
text then displays a product that is strong in the attributes that the original text complained
about. The example used in Qiu's (2010) paper is one in which the writer on the forum complains
about the safety of a car. After the comment is posted and a new user loads the forum page, the
advertisements are updated based on the new comment. The new advertisements now include a Volvo
ad that exemplifies new safety features and a history of safe production standards. This process
is shown in the following diagram from the same research paper.
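The general idea of pairing negative sentiment with keyword matching can be sketched as follows. This is a loose illustration of the concept only and should not be read as the DASA algorithm itself: the word lists, the attribute map, the ad inventory (including the made-up brand `ExampleMotors`), and the function names are all assumptions.

```python
import re

# All names and data below are hypothetical, chosen only to illustrate the idea.
NEGATIVE_WORDS = {"unsafe", "worried", "dangerous", "poor", "bad", "terrible"}
ATTRIBUTE_KEYWORDS = {
    "safety": {"safety", "unsafe", "crash", "airbag", "brakes"},
    "mileage": {"mileage", "mpg", "fuel", "gas"},
}
AD_INVENTORY = [
    {"brand": "Volvo", "strengths": {"safety"}},
    {"brand": "ExampleMotors", "strengths": {"mileage"}},
]


def complained_attributes(comment):
    """Return attributes mentioned in sentences that also hold a negative word."""
    found = set()
    for sentence in re.split(r"[.!?]+", comment.lower()):
        words = set(re.findall(r"[a-z]+", sentence))
        if words & NEGATIVE_WORDS:
            for attribute, keywords in ATTRIBUTE_KEYWORDS.items():
                if words & keywords:
                    found.add(attribute)
    return found


def select_ads(comment, inventory=AD_INVENTORY):
    """Pick ads whose advertised strengths cover a complained-about attribute."""
    attributes = complained_attributes(comment)
    return [ad["brand"] for ad in inventory if ad["strengths"] & attributes]


post = "I am worried about the brakes on my car. The color is nice."
ads = select_ads(post)
```

Here the complaint about brakes maps to the safety attribute, so the safety-focused ad is chosen, while a sentence that mentions an attribute without any negative word triggers nothing.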

Sentiment analysis can be applied in many industries. Any company under the scrutiny of public
opinion should be analyzing all relevant data it can obtain. As Nick Bilton of the New York
Times mentions, "When people want to know how the media business will deal with the internet, the
best way to begin to understand the sweeping changes is to recognize that the consumer of
entertainment and information is now in the center." Current applications take this into account
and focus on the subjective user or consumer views of certain areas that the enterprises will
generally be interested in surveying. The most popular and basic use of sentiment analysis
involves "mining text of written reviews from customers for certain products or services, and
classifying the reviews into positive or negative opinions" (Ye et al., 2009). It is this type
of classification that has become one of the foci of recent research endeavors sponsored by
companies that realize the potential value of sentiment analysis on their data (Ye et al.,
2009). Companies with a heavy online presence have a myriad of data that could easily benefit
from this research.

These same companies can choose to use text analytic software in different ways to meet
different goals. Another graph from Alta Plana's Text Analytics 2009 research study shows the
wide array of end goals that companies may be looking to meet when using text analytic software.

The highest use percentage shown above involves branding and reputation management. Most
applications of sentiment analysis in recent research represent a similar trend. The technologies
surrounding text analytics will be desired by many industries and for different applications in
each industry. Taking this into consideration, different algorithms, techniques, and sometimes
just small alterations will be required before sentiment analysis software from one industry can
be applied to another. This may also foreshadow lucrative consulting firms emerging in the text
analytics software industry, similar to those that are currently faring well in the general
management information systems sector.

The massive information systems that corporations already have could integrate aspects of
existing processes with sentiment analysis. Newly refined systems could extend the capabilities
of search engines, classify reviews, summarize reviews, track opinions in online discussions,
analyze survey responses, implement online message sentiment filtering, create email message
classification systems, and support many more techniques yet to be discovered (Tang et al.,
2009). This may result in more efficient communication for the public relations departments and
better products created by the development teams. Companies will be able to navigate through all
available data and find comparisons of specific product features from competitors. "For a product
manufacturer, the comparison enables it to easily gather marketing intelligence and product
benchmarking information" (Tang et al., 2009). Sentiment analysis will allow businesses to use
their preexisting text data in ways that benefit several departments within the traditional
business structure. Businesses only require new software plus the necessary hardware to handle
the new processing techniques and storage of the results.

Marketing companies and advertising branches of businesses are clear beneficiaries of the
conclusions derived from sentiment analysis. Major search engines and email hosts such as Yahoo
and Google, as well as social media companies such as Facebook, have been implementing
contextually relevant advertising to users for years now. The current web landscape demands
relevancy and personalized information for users and potential consumers. "The tradeoff between
financial revenue and market share triggers the emergence of relevant advertising to emphasize
the relevance between ads and Web pages for the sake of consumers" (Qiu et al., 2010). Qiu (2010)
goes on to mention that "[t]argeted advertising is of great importance for internet companies to
gain revenue from both advertisers and consumers. Previous approaches focus only on the topical
relevance while the consumers' attitudes are ignored. These approaches fail to meet the actual
needs of consumers especially when they may have negative attitudes towards the mentioned
topics." Cai (2010) adds that a company's resistance to these new trends could have serious
impact on their competitive market advantages. Leveraging the massive amount of data that is
produced by the consumer voice could catalyze the growth of a company. Conversely, ignoring the
ever-increasing volume of public opinion could result in a company becoming a social outcast. It
is to the pure benefit of companies to implement sentiment analysis if these companies have the
relevant information available for such a process. The branding and marketing aspects of
businesses revolve around consumer psychology. Sentiment analysis could reveal this psychology in
a form that could be used for further analysis and study.

It is important to notice that implementing such technology on the business side also benefits
the consumer. Depending on the industry and the manner in which sentiment analysis is being
applied, a system for presenting the results and organized conclusions from the analysis could be
created. Ye (2009) mentions a relationship here: "With the results of sentiment classification,
consumers would know the necessary information to determine which products to purchase and
sellers would know the response from their customers and the performances of their competitors."
It then turns into a cyclical system that should result in higher quality products over time. It
is an efficient way to crowdsource useful data without the users putting forth any extra effort.
The users could even be unaware that they are improving their future shopping experiences. The
users and creators of the text to be analyzed will silently be benefiting two parties while
expressing their natural opinions.

Sentiment analysis is a useful tool for all users of the internet. Emotional classification and
organization of content will be a beneficial contribution to the vast reservoir of data the
internet holds. The field has made steady progress over the last twenty years but still has much
room to grow and improve. This is an exciting pursuit for those involved. The companies and
researchers supporting the improvement of sentiment analysis will be contributing to an improved
environment for all users. Users should enjoy communicating with a machine that understands the
emotional needs of the users and can offer effective solutions to the users' problems. This is
the basic result of sentiment analysis even in the current form. It will only evolve to learn how
to meet our needs more effectively.

List of references
Bilton, N. (2010, September 13). A Tech World that Centers on the User. New York Times: New
York Edition, p. B1.
Cai, K., Spangler, S., Chen, Y., & Zhang, L. (2010). Leveraging sentiment analysis for topic
detection. Web Intelligence and Agent Systems: An International Journal, 8(2010), 291-302.
Grimes, S. (2009). Text Analytics 2009: User Perspectives on Solutions and Providers. Alta
Plana. Published under the Creative Commons Attribution 3.0 License.
Kho, N. D. (2010). Customer experience and sentiment analysis. KMWorld, February 2010, 10-20.
Li, N., Liang, X., Li, X., Wang, C., & Wu, D. (2009). Network Environment and Financial Risk
Using Machine Learning and Sentiment Analysis. Human and Ecological Risk Assessment, 15,
227-252.
Qiu, G., He, X., Zhang, F., Shi, Y., Bu, J., & Chen, C. (2010). DASA: Dissatisfaction-oriented
Advertising based on Sentiment Analysis. Expert Systems with Applications, 37(2010),
6182-6191.
Tang, H., Tan, S., & Cheng, X. (2009). A survey on sentiment detection of reviews. Expert
Systems with Applications, 36(2009), 10760-10773.
Ye, Q., Zhang, Z., & Law, R. (2009). Sentiment classification of online reviews to travel
destinations by supervised machine learning approaches. Expert Systems with Applications,
36(2009), 6527-6535.
