Business Intelligence: Introduction
Brief overview of course coverage
Business Intelligence overview
BI definitions and concepts
Trends in BI
Data mining from a business perspective
Data warehousing concepts
Data mining techniques and applications
Association rules
Classification and clustering
Numeric prediction
Data mining tools
Using Weka, Matlab (if time permits)
BI and cloud computing: benefits and risks
Cognitive BI
Evaluation components
Tests (2): 30%
Assignments/Presentations: 15%
Term paper/Project: 20%
Final exam: 35%
Reading materials
Will be prescribed as the course progresses.
Will consist mostly of relevant journal articles.
BI: Consultants' view (1)
Howard Dresner (later an analyst at the Gartner Group) popularized the term BI in 1989.
He defined BI as a set of concepts and methodologies to improve decision making in business through the use of facts and fact-based systems.
Business intelligence (BI) is an umbrella term that includes the applications, infrastructure and tools, and best practices that enable access to and analysis of information to improve and optimize decisions and performance. (Gartner)
BI: Consultants' view (2)
Business intelligence (BI) is a set of methodologies, processes, architectures, and technologies that transform raw data into meaningful and useful information. It allows business users to make informed business decisions with real-time data that can put a company ahead of its competitors. Traditionally, core features like reporting and analytics have been the focus of BI technology choices, but as those features get commoditized, a whole new set of possibilities has emerged. Forrester's BI research shows that the technology is evolving and that enterprises on the cutting edge of these new trends can gain competitive advantage in their industries.
(http://www.forrester.com/Topic+Overview+Business+Intelligence//E
RES39218?objectid=RES39218)
BI denotes, on the one hand, an analytic process that transforms internal and external data into information about capabilities, market positions, activities, and goals that the company should pursue in order to stay competitive.
On the other hand, BI stands for information system concepts like Online Analytical Processing (OLAP), Querying and Reporting, or Data Mining that provide different methods for a flexible, goal-driven analysis of business data, provided through a central data pool.
Business Intelligence
Business Intelligence (BI) refers to technologies, applications and practices for the collection, integration, storage, access, analysis, and presentation of business information to help users make better decisions (derived from Marr, 2010; Wixom, Watson, & Werner, 2011).
BI has gained much traction in the IT practitioner community and academia over the past two decades.
Business Intelligence (BI) applications have been dominating the technology priority list of many CIOs.
A representation of BI
BI processes (layers listed from decision making down to data sources):
Making decisions
Data Presentation: Visualisation techniques
Data Mining: Information discovery
Data Exploration: Statistical analysis, Querying & Reporting
Data Warehouses/Data Marts: OLAP, MDA
Data Sources: Paper, files, information providers, databases, etc.
A BI framework
Source: Watson & Wixom, 2007. The current state of BI
A simple example of BI
Source: http://exonous.typepad.com/mis/business_intelligence.jpg
BI and management levels
BI supports decision making at all levels in the organisation.
[Figure: the Strategic, Tactical and Operational management levels plotted against the impact and the frequency of decisions, each axis running from low to high]
Example
For a restaurant (S = strategic, T = tactical, O = operational):
Where could the next 5 restaurants be located? [S]
Is cruise liner catering more attractive than a flight kitchen management opportunity? [S]
What are the right months in the year for encouraging customers to redeem loyalty points? [T]
What menu item should be dropped this week to handle bad weather? [O]
BI vs BA
Answers the questions:
Business Intelligence: What happened? When did it happen? Who is accountable for what happened? How many? How often? Where did it happen?
Business Analytics: Why did it happen? Will it happen again? What will happen if we change x? What else does the data tell that we never thought to ask? What is the best that can happen?
Makes use of:
Business Intelligence: Reporting (KPIs, metrics); automated monitoring/alerting (thresholds); dashboards/scorecards; OLAP; ad hoc query
Business Analytics: Statistical/quantitative analysis; data mining; predictive modeling; design of experiments to extract learning out of business data; multivariate testing
Source: Prasad & Acharya, 2011, page 96
Evolution of BI and BA
The term intelligence has been used by researchers in artificial intelligence since the 1950s.
Business intelligence became a popular term in the business and IT communities only in the 1990s.
In the 2000s, business analytics was introduced to represent the key analytical component in BI.
More recently, big data and big data analytics have been used to describe the data sets and analytical techniques in applications that are so large (from terabytes to exabytes) and complex (from sensor to social media data) that they require advanced and unique data storage, management, analysis, and visualization technologies.
Source: Chen, H., Chiang, R. H. L., & Storey, V. C. 2012. Business intelligence and analytics: From big data to big impact. MIS Quarterly, 36(4): 1165-1188
Types of analytics
Source: Banerjee, Arindam, Bandyopadhyay, Tathagata, & Acharya, Prachi. 2013. Data analytics: Hyped up aspirations or true potential? Vikalpa, 38(4): 1-11.
Gartner's value chain model of analytics
Source: Koch, Rod. 2015. From business intelligence to predictive analytics, Strategic Finance, 97(1): 56-57.
Seven Realities that Jeopardize Business Survival
In Information Revolution, Jim Davis, Gloria J. Miller, and Allan Russell discuss the seven realities that jeopardize business survival.
Each reality illuminates the need for new business models as well as styles of leadership.
Business Reality 1: Business cycles are shrinking
In today's web-enabled economy, speed within all parts of the business model is the great differentiator.
To accommodate changing markets and consumer preferences, product development and testing that used to take years has been shrunk to months or even weeks. Today, the first to market often enjoys the competitive edge.
This shortened cycle challenges managers to make decisions with less time for consideration or analysis. As a result, they must depend on a combination of accurate, actionable information and intuition.
And their decisions must be in alignment with the overall strategy of the company.
Business Reality 2: You can only squeeze so much juice out of an orange
The goal of improving operational efficiency drove a majority of the investment in the last decade.
Initially the returns were high and provided a competitive advantage.
However, now that enterprise resource planning (ERP) software is available, the field has been leveled.
The next step is greater innovation and agility.
Business Reality 3: The rules have changed; there is no more business as usual
The days of following a typical path to business success are over.
The same factors apply: profitability, customer satisfaction, stakeholder value, and competition.
However, the path to success is very different and is fraught with new challenges:
Mergers and acquisitions have hindered agility and cohesiveness.
Productivity advancements have increased expectations from both customers and management.
Advancements in IT have overwhelmed the abilities of some companies to manage and leverage the knowledge.
The technologies that were introduced as the key to success often failed because the human issues were overlooked.
Business Reality 4: The only constant is permanent volatility
This is a common theme but bears repeating: the company that is most agile and adaptable will gain and maintain a competitive advantage.
Instead of just relying on past results to predict the future, companies need to tap into current trends through social networking, Web analysis, and employee feedback.
Business Reality 5: Globalization helps and hurts
Globalization presents many advantages, especially to small companies seeking a worldwide presence.
Any company that is connected to the Web can strategically partner, outsource, or insource with relative ease.
The downside is increased complexity when dealing with international languages, standards, and cultures. Strong communication skills are essential for navigating this terrain.
Business Reality 6: The penalties of not knowing are harsher than ever
In the new era of billion-dollar corporate scandals, personal accountability at the highest levels is not only prudent, it is now legally mandated.
E.g.: The Sarbanes-Oxley Act in the US was designed to systematize ethical behavior.
Indian equivalent?
In addition to the need for strong, honest leadership, information systems to handle this complex business data are essential.
Business Reality 7: Information is not a by-product of business; it is the lifeblood of business
The seventh business reality is a direct result of the first six. Due to shrinking business cycles, level playing fields, changing rules, volatility, globalization, and the cost of ignorance, information has become the lifeblood of many businesses.
Today, accurate, accessible, actionable information is necessary to compete in the global economy.
There are strong pressures to achieve more results while spending less time and money.
Companies need up-to-the-minute information about their customers, suppliers, competitors, and markets.
Business Intelligence has the ultimate goal of getting the right information to the right people at the right time through the right channel (Rud, 2009)
The business value of BI
Experience working with and talking to business and IT leaders at major companies in a variety of industries suggests that these companies are still data rich but information poor.
In other words, these enterprises lack the kind of actionable information and analytical tools needed to improve profits and performance (Williams & Williams, 2007).
Business intelligence (BI) is a response to this need.
Examples of companies that demonstrate the true potential of BI (1)
Western Digital, a manufacturer of computer hard disk drives with annual sales of more than $3 billion, uses BI to better manage its inventory, supply chains, product lifecycles, and customer relationships. BI enabled the company to reduce operating costs by 50%.
Capital One, a global financial services firm with more than 50 million customer accounts, uses BI to analyze and improve the profitability of its product lines as well as the effectiveness of its business processes and marketing programs.
Examples of companies that demonstrate the true potential of BI (2)
Continental Airlines, a U.S. airline company that was near bankruptcy in the 1990s, invested $30 million in BI to improve its business processes and customer service. In the following six years, Continental reaped a staggering $500 million return on its BI investment, for a return on investment (ROI) of more than 1,000%.
CompUSA, a major retailer of computer equipment and software, uses BI to analyze its sales trends. The company earned an ROI of more than $6 million in the first phase of the project.
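As a rough check of the Continental figure, the ROI arithmetic can be sketched in a few lines of Python. The helper name roi_percent is hypothetical, and the sketch assumes that the $500 million is the gross return and that ROI is defined as net gain divided by cost; the slide states neither.

```python
def roi_percent(gain: float, cost: float) -> float:
    """ROI as a percentage, assuming ROI = (gain - cost) / cost."""
    return (gain - cost) / cost * 100.0


# Figures quoted for Continental Airlines: $30 million invested,
# $500 million returned over the following six years (assumed gross).
continental_roi = roi_percent(gain=500e6, cost=30e6)
print(f"Continental ROI: {continental_roi:.0f}%")
```

Under these assumptions the result is consistent with the "more than 1,000%" claim on the slide.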
The enterprises that are capable of transforming data into information and knowledge can use them to make quicker and more effective decisions and thus to achieve a competitive advantage.
'Business intelligence' includes mathematical models and analysis methodologies that systematically exploit the available data to retrieve information and knowledge useful in supporting complex decision-making processes.
A business intelligence environment offers decision makers information and knowledge derived from data processing, through the application of mathematical models and algorithms.
In some instances, these may merely consist of the calculation of totals and percentages, while more fully developed analyses make use of advanced models for optimization, inductive learning and prediction.
The advent of low-cost data storage technologies and the wide availability of Internet connections have made it easier for individuals and organizations to access large amounts of data.
Such data are often heterogeneous in origin, content and representation.
Advances in technology over the last two decades have enabled companies to obtain, organize, analyze, store, and retrieve huge amounts of data.
Some examples: commercial, financial and administrative transactions, web navigation paths, emails, texts and hypertexts, the results of clinical tests, etc.
Decision making in organizations
In complex organizations, public or private, decisions are made on a continual basis.
Decisions:
More or less critical
Have long-term or short-term effects
May involve people and roles at various hierarchical levels
The ability of these knowledge workers to make decisions, both as individuals and as a community, is one of the primary factors that influence the performance and competitive strength of a given organization.
Examples of highly complex decision-making processes in rapidly changing conditions
Source: Vercellis, 2009
Retention in the mobile phone industry
The marketing manager of a mobile phone company realizes that a large number of customers are discontinuing their service, leaving her company in favor of some competing provider. As can be imagined, low customer loyalty, also known as customer attrition or churn, is a critical factor for many companies operating in service industries. Suppose that the marketing manager can rely on a budget adequate to pursue a customer retention campaign aimed at 2000 individuals out of a total customer base of 2 million people. Hence, the question naturally arises of how she should go about choosing those customers to be contacted so as to optimize the effectiveness of the campaign. In other words, how can the probability that each single customer will discontinue the service be estimated so as to target the best group of customers and thus reduce churning and maximize customer retention? By knowing these probabilities, the target group can be chosen as the 2000 people having the highest churn likelihood among the customers of high business value. Without the support of advanced mathematical models and data mining techniques, it would be arduous to derive a reliable estimate of the churn probability and to determine the best recipients of a specific marketing campaign.
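The selection step described above (pick the customers with the highest churn likelihood among those of high business value) can be sketched as follows. This is only an illustration: the Customer fields, the min_value threshold and the sample numbers are assumptions, and the churn probabilities are taken as given rather than estimated by a data mining model.

```python
from dataclasses import dataclass


@dataclass
class Customer:
    customer_id: int
    churn_probability: float  # assumed to come from some predictive model
    annual_value: float       # hypothetical measure of business value


def select_campaign_targets(customers, budget, min_value):
    """Return up to `budget` high-value customers, most likely churners first."""
    high_value = [c for c in customers if c.annual_value >= min_value]
    ranked = sorted(high_value, key=lambda c: c.churn_probability, reverse=True)
    return ranked[:budget]


# Toy data: a budget of 2 contacts instead of 2000 out of 2 million.
customers = [
    Customer(1, 0.90, 120.0),   # likely to churn, but low value
    Customer(2, 0.80, 900.0),
    Customer(3, 0.10, 950.0),   # high value, unlikely to churn
    Customer(4, 0.65, 700.0),
    Customer(5, 0.70, 40.0),
]
targets = select_campaign_targets(customers, budget=2, min_value=500.0)
target_ids = [c.customer_id for c in targets]
print(target_ids)
```

The hard part in practice is the estimation of churn_probability, which is where the models discussed in the course come in; the ranking itself is simple once those estimates exist.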
Logistics planning
The logistics manager of a manufacturing company wishes to develop a medium-term logistic-production plan. This is a decision-making process of high complexity which includes, among other choices, the allocation of the demand originating from different market areas to the production sites, the procurement of raw materials and purchased parts from suppliers, the production planning of the plants and the distribution of end products to market areas. In a typical manufacturing company this could well entail tens of facilities, hundreds of suppliers, and thousands of finished goods and components, over a time span of one year divided into weeks. The magnitude and complexity of the problem suggest that advanced optimization models are required to devise the best logistic plan. Optimization models allow highly complex and large-scale problems to be tackled successfully within a business intelligence framework.
Cycle of a business intelligence analysis (1)
Each business intelligence analysis follows its own path according to the application domain, the personal attitude of the decision makers and the available analytical methodologies.
However, it is possible to identify an ideal cyclical path characterizing the evolution of a typical business intelligence analysis, even though differences still exist based upon the peculiarity of each specific context.
Cycle of a business intelligence analysis (2)
[Figure: the cycle Analysis, Insight, Decision, Evaluation]
Analysis
During the analysis phase, it is necessary to recognize and accurately spell out the problem at hand.
Decision makers must then create a mental representation of the phenomenon being analyzed, by identifying the critical factors that are perceived as the most relevant.
The availability of business intelligence methodologies may help already in this stage, by permitting decision makers to rapidly develop various paths of investigation.
For instance, the exploration of data cubes in a multidimensional analysis allows decision makers to modify their hypotheses flexibly and rapidly, until they reach an interpretation scheme that they deem satisfactory.
Thus, the first phase in the business intelligence cycle leads decision makers to ask several questions and to obtain quick responses in an interactive way.
Insight
The second phase allows decision makers to better and more deeply understand the problem at hand, often at a causal level.
For instance, if the analysis carried out in the first phase shows that a large number of customers are discontinuing an insurance policy upon yearly expiration, in the second phase it will be necessary to identify the profile and characteristics shared by such customers.
The information obtained through the analysis phase is then transformed into knowledge during the insight phase.
On the one hand, the extraction of knowledge may occur due to the intuition of the decision makers and therefore be based on their experience and possibly on unstructured information available to them.
On the other hand, inductive learning models may also prove very useful during this stage of analysis, particularly when applied to structured data.
Decision
During the third phase, knowledge obtained as a result of the insight phase is converted into decisions and subsequently into actions.
The availability of business intelligence methodologies allows the analysis and insight phases to be executed more rapidly so that more effective and timely decisions can be made that better suit the strategic priorities of a given organization.
This leads to an overall reduction in the execution time of the analysis-decision-action-revision cycle, and thus to a decision-making process of better quality.
Evaluation
The fourth phase of the business intelligence cycle involves performance measurement and evaluation.
Extensive metrics should then be devised that are not exclusively limited to the financial aspects but also take into account the major performance indicators defined for the different company departments.
BI in action
The ultimate aim of BI is improved business performance.
BI uses information and analyses within the context of business processes to enable decisions and actions that ultimately lead to improved business performance.
Let us consider how the hotel and casino operator Harrah's Entertainment uses BI to improve revenue and profit through customer relationship management.
Harrah's Entertainment
Harrah's runs not only its flagship hotel and casino in Las Vegas, Nevada, but more than two dozen casinos in a dozen other states.
Its BI investment enabled Harrah's to enjoy 16 consecutive quarters of revenue growth. In 2002, it earned a $235 million profit on more than $4 billion in revenue (Loveman, 2003).
That was a startling improvement from Harrah's solid but not spectacular performance only a few years earlier.
Harrah's invested in BI to help it win and consolidate the loyalty of its best customers. Its first effort was the Total Gold program, which was modeled on airline frequent flyer programs.
However, Total Gold was too similar to the customer loyalty programs offered by other casinos to give Harrah's a killer edge. It did, however, prove to be a rich resource of data for Harrah's subsequent BI efforts. In particular, the Total Gold data warehouse provided valuable business information about Harrah's customers.
Total Gold cardholders were spending only 36% of their gaming dollars in Harrah's casinos. Harrah's wanted that percentage to increase.
Twenty-six percent of Harrah's casino customers generated 82% of its revenues.
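A concentration figure such as "26% of customers generate 82% of revenues" is the kind of result a simple query over a data warehouse can produce. The sketch below shows one way to compute it; the function name and the revenue figures are made up for illustration and are not Harrah's data.

```python
def revenue_share_of_top(revenues, top_fraction):
    """Share of total revenue contributed by the top `top_fraction` of customers."""
    ordered = sorted(revenues, reverse=True)
    top_n = max(1, round(len(ordered) * top_fraction))
    return sum(ordered[:top_n]) / sum(ordered)


# Hypothetical revenue per customer (ten customers, total 1000).
revenues = [400, 300, 120, 60, 40, 30, 20, 15, 10, 5]
share = revenue_share_of_top(revenues, top_fraction=0.2)
print(f"Top 20% of customers generate {share:.0%} of revenue")
```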
Those high-value customers were not the people Harrah's expected. Instead of high rollers wearing cowboy boots stepping out of limousines, the customers who brought in the most revenue were dentists, school teachers, office workers, and the like.
They didn't spend huge amounts of money in any one visit, but week in, week out, month after month they stopped at Harrah's after work, in the evenings, or on weekends to relax in the casino or have a meal.
That business information, combined with business analysis, enabled Harrah's both to know who its most valuable customers were and to offer them personalized service.
Harrah's evolved Total Gold into the Total Rewards program, which divided its gaming customers into three levels of service (gold, platinum, and diamond) based on their long-term revenue value to the company.
In addition to identifying its most valuable customers, Harrah's also used BI to analyze what those customers wanted and what measures might win their loyalty.
Diamond-level cardholders would seldom if ever have to wait in line for anything, whether to check into the hotel, get their cars parked, or be seated in one of Harrah's restaurants. If they called to reserve a room, they might qualify for special low rates based on predictions from BI about their probable value as casino customers.
Platinum-level cardholders received a slightly lower level of service, while gold-level cardholders were essentially flying coach.
Harrah's succeeded in structuring its services to motivate customers to try to qualify for higher-level Total Rewards cards.
BI from the data warehouse even provided insight about how Harrah's should arrange the floor plans in its casinos and how to make slot machines look more attractive.
Real-time analytics enabled on-the-spot personalized service for valued customers, such as an instant grant of $100 credit to a loyal customer who'd hit a losing streak.
All these factors helped motivate customers to come to Harrah's and stay there to spend their gaming dollars.
And this program would not have been possible without BI techniques applied to data warehouse information.
The combination of business information and business analysis is used by Harrah's and many other successful organizations to make more structured and repeatable business decisions about the features and targeted recipients of direct marketing offers.
Because motivating and retaining its most valuable casino customers is a key driver of profits, Harrah's has refined its customer relationship management process, a core business process.
The process explicitly embeds the use of the above-described business information and business analyses so that business decisions about whom to target with what measures are fact-based, analytically rigorous, and repeatable.
These decisions are implemented through actions from Harrah's front door to its casinos, restaurants, rooms, and telephone services.
Those actions have improved Harrah's business performance, resulting in increased profit.
http://isites.harvard.edu/fs/docs/icb.topic1227012.files/Harra
h%20Entertainment.pdf
Forms of digital data
Unstructured
Data which does not conform to a data model or is not in a form that can easily be used by a computer program
About 80-90% of the data of an organisation is in this form
Examples: memos, chat rooms, ppts, images, videos, letters, research reports, white papers, body of an email, etc.
Semi-structured
Data which does not conform to a data model but has some structure
However, it is not in a form that can easily be used by a computer program
Examples: emails, XML, markup languages like HTML, etc.
Metadata for this data is available, but not sufficient
Structured
Data which is in an organised form (e.g. in rows and columns) and can easily be used by a computer program
Data stored in databases are examples
Example of data sources
Think about a hospital
What are the different kinds of data generated and stored?
Ponder the following aspects
Where is each type of data present?
How is it stored?
How is the desired information extracted from it?
How important is the information provided by it?
How can this information augment the services provided?
A snapshot of structured data
GoodLife Healthcare
Patient Index Card
Patient ID: <>
Date: <>
Patient Name: <>
Patient age: <>
Body Temperature: <>
Nurse Name: <>
Blood pressure: <>
Characteristics of structured data
Conforms to a data model
Data is stored in the form of rows and columns
Similar entities are grouped
Data resides in fixed fields within a record or file
Attributes in a group are the same
Definition, format and meaning of data is explicitly known
Structured data
Summary
Consists of fully described data sets
Has clearly defined categories and subcategories.
Is placed neatly in rows and columns.
Goes into the records and hence the database is regulated by a well-defined structure.
Can be indexed easily either by the DBMS itself or manually
Sources
Databases (e.g. Access)
Spreadsheets
SQL
OLTP systems
Ease with structured data
Storage
Update and delete
Scalability
Security
Ease of working with structured data
Retrieving information
BI operations
Indexing and searching
Mining data
Unstructured data: A scenario
Dr. Ben, Dr. Stanley and Dr. Mark work at the medical facility of GoodLife. Over the past few days, Dr. Ben and Dr. Stanley had been exchanging long emails about a particular case of gastrointestinal problem. Dr. Stanley has chanced upon a particular combination of drugs that has successfully cured gastrointestinal disorders in his patients. He has written an email about this combination of drugs to Dr. Ben.
Dr. Mark has a patient in the GoodLife emergency unit with quite a similar case of the gastrointestinal disorder whose cure Dr. Stanley has chanced upon. Dr. Mark has already tried regular drugs but with no luck. The information he wants is tucked away in the email conversation between two other GoodLife doctors, Dr. Ben and Dr. Stanley. Dr. Mark would have accessed the solution with a few mouse clicks had the storage and analysis of unstructured data been undertaken by GoodLife.
As in the case at GoodLife, 80 to 85% of data in any organisation is unstructured and is growing at an alarming rate. An enormous amount of knowledge is buried in this data. In the above scenario, Dr. Stanley's email to Dr. Ben had not been recorded in the medical system database because it was in an unstructured format.
Characteristics of unstructured data
Does not conform to any data model
Data cannot be stored in the form of rows and columns as in a database
Has no easily identifiable structure
Not in any particular format or sequence
Does not follow any rules or semantics
Not easily usable by a program
Sources of unstructured data
Web pages
Memos
Videos (MPEG, etc.)
Images (JPEG, GIF, etc.)
Body of an email
Word documents
PowerPoint presentations
Chats
Reports
White papers
Surveys
Broadly 2 categories of unstructured data
1. Bitmap objects
2. Textual objects
A lot of unstructured data is also noisy text, e.g. chats, emails, SMS texts.
The language of noisy text differs significantly from the standard form of language.
How to manage unstructured data
Indexing
Helps in searching and retrieval
On the basis of some value in the data, an index is defined, which is nothing but an identifier that represents the larger record in the data set.
Tags/Metadata
Using metadata, data in a document etc. can be tagged
This enables search and retrieval
This is difficult with unstructured data
Classification/Taxonomy
Taxonomy is classifying data on the basis of the relationships that exist between data
Data can be arranged in groups and placed in hierarchies based on the taxonomy prevalent in an organisation
This is difficult with unstructured data
Content Accessible (Addressable) Storage (CAS)
It stores data based on their metadata
Assigns a unique name to every object stored in it
The object is retrieved based on its content and not its location
Used extensively to store emails etc.
http://www.emc.com/collateral/hardware/datasheet/c931emccenteracasds.pdf
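The idea that an object is retrieved by its content rather than its location can be illustrated with a toy in-memory store. This is a minimal sketch of the principle only, not how any particular CAS product works; the class name and the use of SHA-256 as the unique name are assumptions made for the example.

```python
import hashlib


class ContentAddressedStore:
    """Toy store in which the address of an object is a digest of its content."""

    def __init__(self):
        self._objects = {}

    def put(self, data: bytes) -> str:
        # The unique name is derived from the content itself.
        address = hashlib.sha256(data).hexdigest()
        self._objects[address] = data
        return address

    def get(self, address: str) -> bytes:
        return self._objects[address]


store = ContentAddressedStore()
email_body = b"Combination of drugs that worked for the gastrointestinal case"
address = store.put(email_body)
duplicate_address = store.put(email_body)  # same content, same address

print(address == duplicate_address)
print(store.get(address) == email_body)
```

One consequence of this design is that identical content is stored only once, which is part of why the approach suits archives of emails and attachments.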
Challenges of storing unstructured data
Storage space
Indexing and searching
Scalability
Update and delete
Retrieving information
Security
Solutions to storage challenges of unstructured data
Possible solutions:
Change formats
New hardware
CAS
Storing in XML format
RDBMS/BLOBs
BLOB
A binary large object (BLOB) is a data type that can store binary objects or data.
Binary large objects are used in databases to store binary data such as images, multimedia files and executable software code.
Binary large objects are supported by most database software. Generally, database software classifies binary large objects into two types: semi-structured data and unstructured data. XML files are categorized as semi-structured data, whereas images and multimedia data are unstructured data types. Both of these kinds of BLOB are generally not interpretable by the database.
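As a small illustration of the point that the database stores a BLOB without interpreting it, the sketch below uses Python's built-in sqlite3 module. The table name, column names and file name are hypothetical; the eight bytes are simply the standard PNG file signature standing in for a real image.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE attachments (name TEXT PRIMARY KEY, content BLOB)")

# Stand-in for an image file: the 8-byte PNG signature.
image_bytes = bytes([0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A])
conn.execute(
    "INSERT INTO attachments (name, content) VALUES (?, ?)",
    ("scan.png", image_bytes),
)

# The database can report the storage type and length,
# but it does not know what the bytes mean.
stored_bytes, storage_type, size = conn.execute(
    "SELECT content, typeof(content), length(content) "
    "FROM attachments WHERE name = ?",
    ("scan.png",),
).fetchone()
print(storage_type, size)
conn.close()
```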
A possible solution for unstructured data
UIMA: Unstructured Information Management Architecture
Is an open-source platform from IBM which integrates different kinds of analysis engines to provide a complete solution for knowledge discovery from unstructured data.
http://www01.ibm.com/software/ecm/content
analytics/uima.html
UIMA
[Figure: UIMA flow. Unstructured data such as chat, email, images etc. is acquired from various sources; Analysis: the data is subjected to semantic analysis and transformed into structured information; Delivery: structured information access, query and presentation to users]
Characteristics of semi-structured data
Does not conform to a data model, but contains tags and elements (metadata)
Data cannot be stored in the form of rows and columns as in a database
Similar entities are grouped
The tags and elements describe the data stored
Attributes in a group may not be the same
Not sufficient metadata
For example, two addresses may or may not contain the same number of properties/attributes:
Address 1
<house number><street name><area name><city>
Address 2
<house number><street name><city>
On the other hand, an email follows a standard format such as:
To: <Name>
From: <Name>
Subject: <Text>
CC: <Name>
Body: <Text, Graphics, Images etc.>
Though the above email tags give us some metadata, the body of the email has no fixed format. Neither does it convey the meaning of the data it contains.
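The split between the tagged part of an email and its free-form body can be seen with Python's standard email package. The addresses and text below are invented for illustration (loosely following the GoodLife scenario); only the header names come from the format shown above.

```python
from email import message_from_string

raw = (
    "To: ben@goodlife.example\n"
    "From: stanley@goodlife.example\n"
    "Subject: Drug combination\n"
    "Cc: mark@goodlife.example\n"
    "\n"
    "Free text about the case goes here.\n"
)

msg = message_from_string(raw)

# The headers are the metadata: named fields a program can read directly.
headers = {name: msg[name] for name in ("To", "From", "Subject", "Cc")}

# The body is just text; nothing tells the program what it means.
body = msg.get_payload()

print(headers["Subject"])
print(body)
```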
Let us see with an example how entities can have different sets of attributes. The attributes may also have different data types and most certainly can have different sizes. For example, names and email IDs of different people can be stored in more than one way, as shown below.
Name: Sebastian Thomas
Email: sebthom@ymail.com, sebastiant@dcs.xyz.uk
First name: Mark
Second Name: Antony
Email: MarkAntony@tmail.com
Name: Julius Caesar
Email: julius@rmail.com
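One way to see what "different sets of attributes" means for a program is to try mapping the three records above onto a single fixed row layout. The sketch below is an assumption-laden illustration: the target columns (first name, last name, email, alternate email) and the rule of splitting a full name at the first space are choices made for the example, not something the slides prescribe.

```python
records = [
    {"Name": "Sebastian Thomas",
     "Email": "sebthom@ymail.com, sebastiant@dcs.xyz.uk"},
    {"First name": "Mark", "Second Name": "Antony",
     "Email": "MarkAntony@tmail.com"},
    {"Name": "Julius Caesar", "Email": "julius@rmail.com"},
]


def to_row(record):
    """Map one irregular record to (first, last, email, alternate_email)."""
    if "Name" in record:
        # Assumption: the first space separates first and last name.
        first, _, last = record["Name"].partition(" ")
    else:
        first, last = record["First name"], record["Second Name"]
    emails = [e.strip() for e in record["Email"].split(",")]
    alternate = emails[1] if len(emails) > 1 else None
    return (first, last, emails[0], alternate)


rows = [to_row(r) for r in records]
for row in rows:
    print(row)
```

Every record needs its own handling before it fits the fixed columns, which is the extra work that semi-structured data imposes compared with structured data.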
Dr. Marianne's case: The blood report is semi-structured
ABC Healthcare
Blood Test Report
Date: <>
Department: <>
Attending doctor: <>
Patient Name: <>
Patient Age: <>
Hemoglobin content: <>
RBC count: <>
WBC count: <>
Platelet count: <>
Diagnosis: <notes>
Conclusion: <notes>
How to manage semi-structured data
Schemas
Can be used to describe the structure of the data
Schemas define the constraints on the structure, content of the document, etc.
Graph-based data models
Can be used to describe data
This is a schema-less approach, also known as self-describing, as data is presented in such a way that it explains itself
Tree-like structure: vertices contain an object or entity and leaves contain data
XML
Widely used to store and exchange semi-structured data
It allows the user to define tags to store data in hierarchical or nested forms
Schemas in XML are not tightly coupled to data
Challenges of storing semi-structured data
Storage cost
Distinction between schema and data
RDBMS
Irregular and partial structure
Evolving schemas
Implicit structure
Solutions to storage challenges of semi-structured data
Possible solutions:
XML
XML allows tags and attributes to be defined to store data. Data can be stored in a hierarchical/nested structure.
OEM
Data can be stored and exchanged in the form of a graph where entities are represented as objects, which are vertices in the graph.
RDBMS
Semi-structured data can be stored in a relational database by mapping the data to a relational schema, which is then mapped to a table.
Special-purpose DBMS
Databases which are specifically designed to store semi-structured data
OEM: Object Exchange Model
OEM (Object Exchange Model) structures data in the form of graphs.
An OEM data graph is a rooted, labeled, directed graph. Its edge labels map to strings. Only its leaf nodes have labels which map to data values. There is no ordering of edges leaving a node.
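A rooted, labeled, directed graph of this kind can be sketched with a small node class. The class and function names are hypothetical, the sample data reuses names from the earlier slides, and the traversal assumes the graph has no cycles (a simplification; a general graph traversal would need to track visited nodes).

```python
from dataclasses import dataclass, field


@dataclass
class OemNode:
    value: object = None                       # only leaf nodes carry a data value
    edges: list = field(default_factory=list)  # (label, child) pairs; order carries no meaning

    def add(self, label, child):
        self.edges.append((label, child))
        return child


def find(node, label):
    """Collect the values of all leaves reached through an edge named `label`."""
    found = []
    for edge_label, child in node.edges:
        if edge_label == label and not child.edges:
            found.append(child.value)
        found.extend(find(child, label))  # assumes an acyclic graph
    return found


root = OemNode()
person = root.add("person", OemNode())
person.add("name", OemNode("Julius Caesar"))
person.add("email", OemNode("julius@rmail.com"))

other = root.add("person", OemNode())
other.add("firstname", OemNode("Mark"))   # a different attribute set is fine
other.add("email", OemNode("MarkAntony@tmail.com"))

emails = find(root, "email")
print(emails)
```

Because the labels travel with the data, no schema has to be declared up front, which is the self-describing property mentioned earlier.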
XML
XML is an open-standard markup language written in plain text.
It is independent of hardware and software
It is designed to store and transport data over the internet
It allows data to be stored in hierarchical or nested fashion
The user can define tags to store data. XML has no predefined tags.
It also enables separation of content (eXtensible Markup Language) from presentation (eXtensible Stylesheet Language)
XML is emerging as the solution for semi-structured data management
Examples of XML
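As one illustration of user-defined tags and nested storage, the sketch below parses a small XML document with Python's standard xml.etree.ElementTree module. The tag names (patients, patient, name, email) and the id values are invented for the example; the names and email addresses reuse those from the earlier slides.

```python
import xml.etree.ElementTree as ET

document = """
<patients>
  <patient id="P001">
    <name>Sebastian Thomas</name>
    <email>sebthom@ymail.com</email>
    <email>sebastiant@dcs.xyz.uk</email>
  </patient>
  <patient id="P002">
    <name>Julius Caesar</name>
    <email>julius@rmail.com</email>
  </patient>
</patients>
"""

root = ET.fromstring(document)

# Entities of the same kind may carry a different number of <email> elements.
emails_by_patient = {
    patient.get("id"): [email.text for email in patient.findall("email")]
    for patient in root.findall("patient")
}
print(emails_by_patient)
```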
Difference between semi-structured and structured data
Semi-structured:
Ajay Thakur; AjayT@yahoo.com, athakur@ymail.com
Neerja Gulati; ngulati@nmail.com
firstname: Sanjay, lastname: Burman; sanjayb@kmail.com
Structured (columns: Name, email ID, Alternate email ID):
Ajay, Thakur; AjayT@yahoo.com; athakur@ymail.com
Neerja, Gulati; ngulati@nmail.com; (none)
Sanjay, Burman; sanjayb@kmail.com; (none)