Business Intelligence: Introduction

Business Intelligence

Brief overview of course coverage

Business Intelligence overview
BI definitions and concepts
Trends in BI
Data mining from a business perspective
Data warehousing concepts
Data mining techniques and applications
Association rules
Classification and clustering
Numeric prediction
Data mining tools
Using Weka, Matlab (if time permits)
BI and cloud computing: benefits and risks
Cognitive BI

Evaluation components

Tests (2): 30%
Assignments/Presentations: 15%
Term paper/Project: 20%
Final Exam: 35%

Reading materials
Will be prescribed as the course progresses.
Will consist mostly of relevant journal articles.

BI: Consultants' view (1)
Howard Dresner of Gartner Group coined the term BI in 1989.
He defined BI as a set of concepts and methodologies to improve decision making in business through the use of facts and fact-based systems.

Business intelligence (BI) is an umbrella term that includes the applications, infrastructure and tools, and best practices that enable access to and analysis of information to improve and optimize decisions and performance. (Gartner)

BI: Consultants' view (2)
Business intelligence (BI) is a set of methodologies, processes, architectures, and technologies that transform raw data into meaningful and useful information. It allows business users to make informed business decisions with real-time data that can put a company ahead of its competitors. Traditionally, core features like reporting and analytics have been the focus of BI technology choices, but as those features get commoditized, a whole new set of possibilities has emerged. Forrester's BI research shows that the technology is evolving and that enterprises on the cutting edge of these new trends can gain competitive advantage in their industries.

(http://www.forrester.com/Topic+Overview+Business+Intelligence//E
RES39218?objectid=RES39218)

BI denotes, on the one hand, an analytic process that transforms internal and external data into information about capabilities, market positions, activities, and goals that the company should pursue in order to stay competitive.
On the other hand, BI stands for information system concepts like Online Analytical Processing (OLAP), Querying and Reporting, or Data Mining that provide different methods for a flexible, goal-driven analysis of business data, provided through a central data pool.

Business Intelligence
Business Intelligence (BI) refers to technologies, applications and practices for the collection, integration, storage, access, analysis, and presentation of business information to help users make better decisions (derived from Marr, 2010; Wixom, Watson, & Werner, 2011).

BI has gained much traction in the IT practitioner community and academia over the past two decades.
Business Intelligence (BI) applications have been dominating the technology priority list of many CIOs.

A representation of BI

BI processes
Making decisions
Data Presentation: Visualisation techniques
Data Mining: Information discovery
Data Exploration: Statistical analysis, Querying & Reporting
Data Warehouses / Data Marts: OLAP, MDA
Data Sources: Paper, files, information providers, databases, etc.

A BI framework

Source: Watson & Wixom, 2007. The current state of BI

A simple example of BI

Source: http://exonous.typepad.com/mis/business_intelligence.jpg

BI and management levels
BI supports decision making at all levels in the organisation.
[Figure: the Strategic, Tactical and Operational management levels positioned against two axes, Impact and Frequency, each running from Low to High.]

Example
For a restaurant ([S] = strategic, [T] = tactical, [O] = operational):
Where could the next 5 restaurants be? [S]
Is cruise liner catering a more attractive opportunity than flight kitchen management? [S]
What are the right months in the year for encouraging customers to redeem loyalty points? [T]
What menu item should be dropped this week to handle bad weather? [O]

BI vs BA

Business Intelligence
Answers the questions:
What happened?
When did it happen?
Who is accountable for what happened?
How many?
How often?
Where did it happen?
Makes use of:
Reporting (KPIs, metrics)
Automated monitoring/alerting (thresholds)
Dashboards/Scorecards
OLAP
Ad hoc query

Business Analytics
Answers the questions:
Why did it happen?
Will it happen again?
What will happen if we change x?
What else does the data tell that we never thought to ask?
What is the best that can happen?
Makes use of:
Statistical/Quantitative analysis
Data mining
Predictive modeling
Design of experiments to extract learning out of business data
Multivariate testing

Source: Prasad & Acharya, 2011, page 96
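
The BI/BA contrast can be sketched in a few lines of Python. The monthly sales figures are made up for illustration, and the helper name `fit_line` is hypothetical. The first part answers a BI-style question (what happened, how many?) by aggregation; the second answers a BA-style question (what is likely to happen next?) with a simple least-squares trend line. This is a minimal sketch under those assumptions, not a recommended forecasting method.

```python
# Hypothetical monthly unit sales (illustrative data, not from the source).
months = [1, 2, 3, 4]
sales = [100, 110, 120, 130]

# BI-style question: "What happened? How many?" -> descriptive aggregation.
total_sales = sum(sales)
best_month = months[sales.index(max(sales))]


# BA-style question: "What will happen next?" -> fit a straight line
# sales = intercept + slope * month by ordinary least squares.
def fit_line(xs, ys):
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(months, sales)
forecast_month_5 = intercept + slope * 5

print("Total sales so far:", total_sales)
print("Best month so far:", best_month)
print("Trend forecast for month 5:", forecast_month_5)
```

The descriptive part only summarises recorded facts, while the trend line extrapolates beyond them, which is the distinction the two columns draw.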

Evolution of BI and BA
The term intelligence has been used by researchers in artificial intelligence since the 1950s.
Business intelligence became a popular term in the business and IT communities only in the 1990s.
In the 2000s, business analytics was introduced to represent the key analytical component in BI.
More recently, big data and big data analytics have been used to describe the data sets and analytical techniques in applications that are so large (from terabytes to exabytes) and complex (from sensor to social media data) that they require advanced and unique data storage, management, analysis, and visualization technologies.
Source: Chen, H., Chiang, R. H. L., & Storey, V. C. 2012. Business intelligence and analytics: From big data to big impact. MIS Quarterly, 36(4): 1165-1188.

Types of analytics

Source: Banerjee, Arindam, Bandyopadhyay, Tathagata, & Acharya, Prachi. 2013. Data analytics: Hyped up aspirations or true potential? Vikalpa, 38(4): 1-11.

Gartner's value chain model of analytics

Source: Koch, Rod. 2015. From business intelligence to predictive analytics. Strategic Finance, 97(1): 56-57.

Seven Realities that Jeopardize Business Survival
In Information Revolution, Jim Davis, Gloria J. Miller, and Allan Russell discuss the seven realities that jeopardize business survival.
Each reality illuminates the need for new business models as well as styles of leadership.

Business Reality 1: Business cycles are shrinking
In today's web-enabled economy, speed within all parts of the business model is the great differentiator.
To accommodate changing markets and consumer preferences, product development and testing that used to take years has been shrunk to months or even weeks. Today, the first to market often enjoys the competitive edge.
This shortened cycle challenges managers to make decisions with less time for consideration or analysis. As a result, they must depend on a combination of accurate, actionable information and intuition.

And their decisions must be in alignment with the overall strategy of the company.

Business Reality 2: You can only squeeze so much juice out of an orange
The goal of improving operational efficiency drove a majority of the investment in the last decade.
Initially the returns were high and provided a competitive advantage.
However, now that enterprise resource planning (ERP) software is available, the field has been leveled.
The next step is greater innovation and agility.

Business Reality 3: The rules have changed; there is no more business as usual
The days of following a typical path to business success are over.
The same factors apply: profitability, customer satisfaction, stakeholder value, and competition.
However, the path to success is very different and is fraught with new challenges:
Mergers and acquisitions have hindered agility and cohesiveness.
Productivity advancements have increased expectations from both customers and management.
Advancements in IT have overwhelmed the abilities of some companies to manage and leverage the knowledge.
The technologies that were introduced as the key to success often failed because the human issues were overlooked.

Business Reality 4: The only constant is permanent volatility

This is a common theme but bears repeating: the company that is most agile and adaptable will gain and maintain a competitive advantage.
Instead of just relying on past results to predict the future, companies need to tap into current trends through social networking, Web analysis, and employee feedback.

Business Reality 5: Globalization helps and hurts

Globalization presents many advantages, especially to small companies seeking a worldwide presence.

Any company that is connected to the Web can strategically partner, outsource, or insource with relative ease.
The downside is increased complexity when dealing with international languages, standards, and cultures. Strong communication skills are essential for navigating this terrain.

Business Reality 6: The penalties of not knowing are harsher than ever
In the new era of billion-dollar corporate scandals, personal accountability at the highest levels is not only prudent, it is now legally mandated.
E.g.: The Sarbanes-Oxley Act in the US was designed to systematize ethical behavior.
Indian equivalent?
In addition to the need for strong, honest leadership, information systems to handle this complex business data are essential.

Business Reality 7: Information is not a byproduct of business; it is the lifeblood of business
The seventh business reality is a direct result of the first six. Due to shrinking business cycles, level playing fields, changing rules, volatility, globalization, and the cost of ignorance, information has become the lifeblood of many businesses.
Today, accurate, accessible, actionable information is necessary to compete in the global economy.
There are strong pressures to achieve more results while spending less time and money.
Companies need up-to-the-minute information about their customers, suppliers, competitors, and markets.

Business Intelligence has the ultimate goal of getting the right information to the right people at the right time through the right channel (Rud, 2009).

The business value of BI
Experience working with and talking to business and IT leaders at major companies in a variety of industries suggests that these companies are still data-rich but information-poor.
In other words, these enterprises lack the kind of actionable information and analytical tools needed to improve profits and performance (Williams & Williams, 2007).
Business intelligence (BI) is a response to this need.

Examples of companies that demonstrate the true potential of BI (1)
Western Digital, a manufacturer of computer hard disk drives with annual sales of more than $3 billion, uses BI to better manage its inventory, supply chains, product lifecycles, and customer relationships. BI enabled the company to reduce operating costs by 50%.

Capital One, a global financial services firm with more than 50 million customer accounts, uses BI to analyze and improve the profitability of its product lines as well as the effectiveness of its business processes and marketing programs.

Examples of companies that demonstrate the true potential of BI (2)
Continental Airlines, a U.S. airline company that was near bankruptcy in the 1990s, invested $30 million in BI to improve its business processes and customer service. In the following six years, Continental reaped a staggering $500 million return on its BI investment for a return on investment (ROI) of more than 1,000%.
CompUSA, a major retailer of computer equipment and software, uses BI to analyze its sales trends. The company earned an ROI of more than $6 million in the first phase of the project.
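
The Continental Airlines figure can be checked with a short calculation. The sketch below assumes the common definition of ROI as net gain divided by the amount invested, and assumes the $500 million is the gross return; the source does not state which definition it uses, and the function name `roi_percent` is hypothetical.

```python
def roi_percent(total_return, investment):
    """Net return on investment as a percentage of the amount invested."""
    return (total_return - investment) / investment * 100


# Figures quoted for Continental Airlines, in millions of dollars.
investment = 30
total_return = 500

continental_roi = roi_percent(total_return, investment)
print(f"ROI: {continental_roi:.0f}%")
```

Under these assumptions the result is roughly 1,567%, consistent with the "more than 1,000%" quoted on the slide.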

The enterprises that are capable of transforming data into information and knowledge can use them to make quicker and more effective decisions and thus to achieve a competitive advantage.

'Business intelligence' includes mathematical models and analysis methodologies that systematically exploit the available data to retrieve information and knowledge useful in supporting complex decision-making processes.

A business intelligence environment offers decision makers information and knowledge derived from data processing, through the application of mathematical models and algorithms.

In some instances, these may merely consist of the calculation of totals and percentages, while more fully developed analyses make use of advanced models for optimization, inductive learning and prediction.

The advent of low-cost data storage technologies and the wide availability of Internet connections have made it easier for individuals and organizations to access large amounts of data.
Such data are often heterogeneous in origin, content and representation.
Advances in technology over the last two decades have enabled companies to obtain, organize, analyze, store, and retrieve huge amounts of data.

Some examples: commercial, financial and administrative transactions, web navigation paths, emails, texts and hypertexts, the results of clinical tests, etc.

Decision making in organizations
In complex organizations, public or private, decisions are made on a continual basis.
Decisions:
More or less critical
Have long-term or short-term effects
May involve people and roles at various hierarchical levels

The ability of these knowledge workers to make decisions, both as individuals and as a community, is one of the primary factors that influence the performance and competitive strength of a given organization.

Examples of highly complex decision-making processes in rapidly changing conditions

Source: Vercellis, 2009

Retention in the mobile phone industry
The marketing manager of a mobile phone company realizes that a large number of customers are discontinuing their service, leaving her company in favor of some competing provider. As can be imagined, low customer loyalty, also known as customer attrition or churn, is a critical factor for many companies operating in service industries. Suppose that the marketing manager can rely on a budget adequate to pursue a customer retention campaign aimed at 2000 individuals out of a total customer base of 2 million people. Hence, the question naturally arises of how she should go about choosing those customers to be contacted so as to optimize the effectiveness of the campaign. In other words, how can the probability that each single customer will discontinue the service be estimated so as to target the best group of customers and thus reduce churning and maximize customer retention? By knowing these probabilities, the target group can be chosen as the 2000 people having the highest churn likelihood among the customers of high business value. Without the support of advanced mathematical models and data mining techniques, it would be arduous to derive a reliable estimate of the churn probability and to determine the best recipients of a specific marketing campaign.
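
Once churn probabilities are available, the selection step described in the retention example is straightforward to express in code. The sketch below is illustrative only: the `Customer` fields, the `min_value` threshold that defines "high business value", and the sample data are assumptions, and the churn probabilities are taken as given rather than estimated by a model.

```python
from dataclasses import dataclass


@dataclass
class Customer:
    customer_id: int
    churn_probability: float  # assumed output of some churn model
    business_value: float     # hypothetical measure, e.g. yearly revenue


def select_campaign_targets(customers, budget, min_value):
    """Pick the `budget` customers most likely to churn among high-value ones."""
    high_value = [c for c in customers if c.business_value >= min_value]
    ranked = sorted(high_value, key=lambda c: c.churn_probability, reverse=True)
    return ranked[:budget]


customers = [
    Customer(1, 0.90, 50.0),
    Customer(2, 0.80, 400.0),
    Customer(3, 0.10, 900.0),
    Customer(4, 0.65, 350.0),
    Customer(5, 0.70, 120.0),
]
targets = select_campaign_targets(customers, budget=2, min_value=300.0)
print([c.customer_id for c in targets])
```

In the scenario above, `budget` would be 2000 and the list would hold 2 million customers; the hard part, estimating `churn_probability`, is where data mining techniques come in.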

Logistics planning
The logistics manager of a manufacturing company wishes to develop a medium-term logistic production plan. This is a decision-making process of high complexity which includes, among other choices, the allocation of the demand originating from different market areas to the production sites, the procurement of raw materials and purchased parts from suppliers, the production planning of the plants and the distribution of end products to market areas. In a typical manufacturing company this could well entail tens of facilities, hundreds of suppliers, and thousands of finished goods and components, over a time span of one year divided into weeks. The magnitude and complexity of the problem suggest that advanced optimization models are required to devise the best logistic plan. Optimization models allow highly complex and large-scale problems to be tackled successfully within a business intelligence framework.
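
To give a feel for what "allocating demand to production sites" means as an optimization problem, here is a deliberately tiny sketch. All names and numbers (markets, plants, capacities, unit costs) are hypothetical, and the method is plain exhaustive search, which only works for toy sizes; problems of the scale described above would need linear or integer programming solvers instead.

```python
from itertools import product

# Hypothetical toy data: weekly demand per market area and capacity per plant.
demand = {"North": 40, "South": 30, "East": 20}
capacity = {"PlantA": 60, "PlantB": 50}
# Hypothetical cost of shipping one unit from a plant to a market area.
unit_cost = {
    ("PlantA", "North"): 2, ("PlantA", "South"): 5, ("PlantA", "East"): 4,
    ("PlantB", "North"): 6, ("PlantB", "South"): 3, ("PlantB", "East"): 3,
}


def best_allocation(demand, capacity, unit_cost):
    """Try every market-to-plant assignment and keep the cheapest feasible one."""
    markets = list(demand)
    plants = list(capacity)
    best_plan, best_cost = None, float("inf")
    for choice in product(plants, repeat=len(markets)):
        load = {p: 0 for p in plants}
        for market, plant in zip(markets, choice):
            load[plant] += demand[market]
        if any(load[p] > capacity[p] for p in plants):
            continue  # this assignment exceeds a plant's capacity
        cost = sum(unit_cost[(plant, market)] * demand[market]
                   for market, plant in zip(markets, choice))
        if cost < best_cost:
            best_plan, best_cost = dict(zip(markets, choice)), cost
    return best_plan, best_cost


plan, cost = best_allocation(demand, capacity, unit_cost)
print(plan, cost)
```

With three markets and two plants there are only eight assignments to check; the number grows exponentially with the number of markets, which is why dedicated optimization models are needed in practice.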

Cycle of a business intelligence analysis (1)
Each business intelligence analysis follows its own path according to the application domain, the personal attitude of the decision makers and the available analytical methodologies.
However, it is possible to identify an ideal cyclical path characterizing the evolution of a typical business intelligence analysis, even though differences still exist based upon the peculiarity of each specific context.

Cycle of a business intelligence analysis (2)
[Figure: cyclical path Analysis -> Insight -> Decision -> Evaluation -> Analysis]

Analysis
During the analysis phase, it is necessary to recognize and accurately spell out the problem at hand.

Decision makers must then create a mental representation of the phenomenon being analyzed, by identifying the critical factors that are perceived as the most relevant.

The availability of business intelligence methodologies may help already in this stage, by permitting decision makers to rapidly develop various paths of investigation.
For instance, the exploration of data cubes in a multidimensional analysis allows decision makers to modify their hypotheses flexibly and rapidly, until they reach an interpretation scheme that they deem satisfactory.
Thus, the first phase in the business intelligence cycle leads decision makers to ask several questions and to obtain quick responses in an interactive way.
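
The kind of data cube exploration mentioned in the analysis phase can be illustrated with a small in-memory example. The fact records, the dimension names and the helper functions `roll_up` and `slice_cube` are hypothetical; real OLAP tools provide these operations over a data warehouse, so this is only a sketch of the idea.

```python
from collections import defaultdict

# Hypothetical fact records: (region, product, quarter, revenue).
facts = [
    ("North", "Prepaid", "Q1", 120),
    ("North", "Postpaid", "Q1", 200),
    ("South", "Prepaid", "Q1", 90),
    ("North", "Prepaid", "Q2", 150),
    ("South", "Postpaid", "Q2", 170),
]
DIMENSIONS = ("region", "product", "quarter")


def roll_up(facts, keep):
    """Aggregate revenue over the dimensions named in `keep`."""
    idx = [DIMENSIONS.index(d) for d in keep]
    totals = defaultdict(int)
    for *dims, revenue in facts:
        totals[tuple(dims[i] for i in idx)] += revenue
    return dict(totals)


def slice_cube(facts, dimension, value):
    """Keep only the facts where `dimension` equals `value`."""
    i = DIMENSIONS.index(dimension)
    return [f for f in facts if f[i] == value]


# First question: revenue by region. Follow-up: how does North evolve by quarter?
by_region = roll_up(facts, keep=("region",))
north_by_quarter = roll_up(slice_cube(facts, "region", "North"), keep=("quarter",))
print(by_region)
print(north_by_quarter)
```

Each call answers one question, and the next call refines it, which mirrors the interactive question-and-response loop the slide describes.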

Insight
The second phase allows decision makers to better and more deeply understand the problem at hand, often at a causal level.
For instance, if the analysis carried out in the first phase shows that a large number of customers are discontinuing an insurance policy upon yearly expiration, in the second phase it will be necessary to identify the profile and characteristics shared by such customers.
The information obtained through the analysis phase is then transformed into knowledge during the insight phase.
On the one hand, the extraction of knowledge may occur due to the intuition of the decision makers and therefore be based on their experience and possibly on unstructured information available to them.
On the other hand, inductive learning models may also prove very useful during this stage of analysis, particularly when applied to structured data.

Decision
During the third phase, knowledge obtained as a result of the insight phase is converted into decisions and subsequently into actions.

The availability of business intelligence methodologies allows the analysis and insight phases to be executed more rapidly so that more effective and timely decisions can be made that better suit the strategic priorities of a given organization.
This leads to an overall reduction in the execution time of the analysis-decision-action-revision cycle, and thus to a decision-making process of better quality.

Evaluation

The fourth phase of the business intelligence cycle involves performance measurement and evaluation.

Extensive metrics should then be devised that are not exclusively limited to the financial aspects but also take into account the major performance indicators defined for the different company departments.

BI in action
The ultimate aim of BI is improved business performance.
BI uses information and analyses within the context of business processes to enable decisions and actions that ultimately lead to improved business performance.
Let us consider how the hotel and casino operator Harrah's Entertainment uses BI to improve revenue and profit through customer relationship management.

Harrah's Entertainment
Harrah's runs not only its flagship hotel and casino in Las Vegas, Nevada, but more than two dozen casinos in a dozen other states.
Its BI investment enabled Harrah's to enjoy 16 consecutive quarters of revenue growth. In 2002, it earned a $235 million profit on more than $4 billion in revenue (Loveman, 2003).
That was a startling improvement from Harrah's solid but not spectacular performance only a few years earlier.
Harrah's invested in BI to help it win and consolidate the loyalty of its best customers. Its first effort was the Total Gold program, which was modeled on airline frequent-flyer programs.

However, Total Gold was too similar to the customer loyalty programs offered by other casinos to give Harrah's a killer edge, but it did prove to be a rich resource of data for Harrah's subsequent BI efforts. In particular, the Total Gold data warehouse provided valuable business information about Harrah's customers.
Total Gold cardholders were spending only 36% of their gaming dollars in Harrah's casinos. Harrah's wanted that percentage to increase.
Twenty-six percent of Harrah's casino customers generated 82% of its revenues.
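
A finding such as "26% of customers generated 82% of revenues" comes from ranking customers by revenue and measuring how much the top group contributes. The sketch below shows that calculation on made-up numbers; the revenue figures and the function name `revenue_share_of_top` are hypothetical and are not Harrah's data.

```python
def revenue_share_of_top(revenues, top_fraction):
    """Share of total revenue contributed by the top `top_fraction` of customers."""
    ranked = sorted(revenues, reverse=True)
    top_n = max(1, round(len(ranked) * top_fraction))
    return sum(ranked[:top_n]) / sum(ranked)


# Hypothetical revenue per customer (illustrative only).
revenues = [900, 700, 50, 40, 40, 30, 30, 20, 10, 10]
share = revenue_share_of_top(revenues, top_fraction=0.2)
print(f"Top 20% of customers generate {share:.0%} of revenue")
```

Running the same calculation for several values of `top_fraction` shows how concentrated revenue is, which is the starting point for deciding which customers deserve targeted offers.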

Those high-value customers were not the people Harrah's expected. Instead of high rollers wearing cowboy boots stepping out of limousines, the customers who brought in the most revenue were dentists, schoolteachers, office workers, and the like.
They didn't spend huge amounts of money in any one visit, but week in, week out, month after month they stopped at Harrah's after work, in the evenings, or on weekends to relax in the casino or have a meal.

That business information, combined with business analysis, enabled Harrah's both to know who its most valuable customers were and to offer them personalized service.
Harrah's evolved Total Gold into the Total Rewards program, which divided its gaming customers into three levels of service (gold, platinum, and diamond) based on their long-term revenue value to the company.
In addition to identifying its most valuable customers, Harrah's also used BI to analyze what those customers wanted and what measures might win their loyalty.

Diamond-level cardholders would seldom if ever have to wait in line for anything, whether to check into the hotel, get their cars parked, or be seated in one of Harrah's restaurants. If they called to reserve a room, they might qualify for special low rates based on predictions from BI about their probable value as casino customers.
Platinum-level cardholders received a slightly lower level of service, while gold-level cardholders were essentially flying coach.
Harrah's succeeded in structuring its services to motivate customers to try to qualify for higher-level Total Rewards cards.

BI from the data warehouse even provided insight about how Harrah's should arrange the floor plans in its casinos and how to make slot machines look more attractive.
Real-time analytics enabled on-the-spot personalized service for valued customers, such as an instant grant of $100 credit to a loyal customer who'd hit a losing streak.
All these factors helped motivate customers to come to Harrah's and stay there to spend their gaming dollars.
And this program would not have been possible without BI techniques applied to data warehouse information.

The combination of business information and business analysis is used by Harrah's and many other successful organizations to make more structured and repeatable business decisions about the features and targeted recipients of direct marketing offers.
Because motivating and retaining its most valuable casino customers is a key driver of profits, Harrah's has refined its customer relationship management process, a core business process.

The process explicitly embeds the use of the above-described business information and business analyses so that business decisions about whom to target with what measures are fact-based, analytically rigorous, and repeatable.
These decisions are implemented through actions from Harrah's front door to its casinos, restaurants, rooms, and telephone services.
Those actions have improved Harrah's business performance, resulting in increased profit.

http://isites.harvard.edu/fs/docs/icb.topic1227012.files/Harrah%20Entertainment.pdf

Forms of digital data

Unstructured
Data which does not conform to a data model or is not in a form that can easily be used by a computer program.
About 80-90% of the data of an organisation is in this form.
Examples: memos, chat rooms, PPTs, images, videos, letters, research reports, white papers, body of an email, etc.

Semi-structured
Data which does not conform to a data model but has some structure.
However, it is not in a form that can easily be used by a computer program.
Examples: emails, XML, markup languages like HTML, etc.
Metadata for this data is available, but not sufficient.

Structured
Data which is in an organised form (e.g. in rows and columns) and can easily be used by a computer program.
Data stored in databases are examples.
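
The three forms can be compared side by side in a short script. Everything in it (field names, tag names, patient details) is invented for illustration; the point is only how much a program can do with each form using Python's standard library.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Structured: rows and columns with a fixed set of fields (hypothetical data).
structured_text = "patient_id,name,age\n101,Asha,34\n102,Ravi,51\n"
rows = list(csv.DictReader(io.StringIO(structured_text)))
ages = [int(row["age"]) for row in rows]

# Semi-structured: tags describe the data, but records may differ in shape.
semi_structured_text = """
<patients>
  <patient id="101"><name>Asha</name><phone>555-0101</phone></patient>
  <patient id="102"><name>Ravi</name></patient>
</patients>
""".strip()
root = ET.fromstring(semi_structured_text)
phones = {p.get("id"): p.findtext("phone") for p in root.findall("patient")}

# Unstructured: free text; without further analysis a program can only
# search it crudely, for example by keyword.
unstructured_text = "Patient complained of headache; advised rest and fluids."
mentions_headache = "headache" in unstructured_text.lower()

print(ages, phones, mentions_headache)
```

Note that the second patient has no `phone` element, so the lookup yields `None`: the tags help, but the program cannot rely on every record having the same attributes.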

Example of data sources
Think about a hospital.
What are the different kinds of data generated and stored?

Ponder the following aspects

Where is each type of data present?
How is it stored?
How is the desired information extracted from it?
How important is the information provided by it?
How can this information augment the services provided?

A snapshot of structured data
GoodLife Healthcare
Patient Index Card
Patient ID: <>
Date: <>
Patient Name: <>
Patient Age: <>
Body Temperature: <>
Nurse Name: <>
Blood Pressure: <>

Characteristics of structured data
Conforms to a data model
Data is stored in the form of rows and columns
Similar entities are grouped
Data resides in fixed fields within a record or file
Attributes in a group are the same
Definition, format and meaning of data is explicitly known

Structured data

Summary
Consists of fully described data sets.
Has clearly defined categories and subcategories.
Is placed neatly in rows and columns.
Goes into the records, and hence the database is regulated by a well-defined structure.
Can be indexed easily either by the DBMS itself or manually.

Sources
Databases (e.g. Access)
Spreadsheets
SQL
OLTP systems
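
As an illustration of these properties, the Patient Index Card shown earlier can be stored as a table with fixed, typed fields. The sketch below uses Python's built-in `sqlite3` module with an in-memory database; the column names, the index name and the sample rows are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Fixed fields within a record: every row has the same, explicitly defined columns.
conn.execute(
    """CREATE TABLE patient_index_card (
           patient_id INTEGER PRIMARY KEY,
           visit_date TEXT NOT NULL,
           patient_name TEXT NOT NULL,
           patient_age INTEGER,
           body_temperature REAL,
           nurse_name TEXT,
           blood_pressure TEXT
       )"""
)
# The DBMS can index a column to speed up searching and retrieval.
conn.execute("CREATE INDEX idx_patient_name ON patient_index_card (patient_name)")

conn.executemany(
    "INSERT INTO patient_index_card VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        (1, "2015-01-10", "Asha", 34, 98.6, "Meena", "120/80"),
        (2, "2015-01-10", "Ravi", 51, 101.2, "Meena", "140/90"),
    ],
)

# Because the meaning of each field is known, a precise query is easy to write.
fever_cases = conn.execute(
    "SELECT patient_name FROM patient_index_card WHERE body_temperature > ?",
    (100.0,),
).fetchall()
print(fever_cases)
conn.close()
```

Storage, update, retrieval and indexing all follow from the fixed structure, which is why structured data is comparatively easy to work with.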

Ease with structured data
Storage
Update and delete
Scalability
Security

Ease of working with structured data
Retrieving information
BI operations
Indexing and searching
Mining data

Unstructured data: A scenario

Dr. Ben, Dr. Stanley and Dr. Mark work at the medical facility of GoodLife. Over the past few days, Dr. Ben and Dr. Stanley had been exchanging long emails about a particular case of gastrointestinal problem. Dr. Stanley has chanced upon a particular combination of drugs that has successfully cured gastrointestinal disorders in his patients. He has written an email about this combination of drugs to Dr. Ben.
Dr. Mark has a patient in the GoodLife emergency unit with quite a similar case of gastrointestinal disorder whose cure Dr. Stanley has chanced upon. Dr. Mark has already tried regular drugs but with no luck. The information he wants is tucked away in the email conversation between two other GoodLife doctors, Dr. Ben and Dr. Stanley. Dr. Mark would have accessed the solution with a few mouse clicks had the storage and analysis of unstructured data been undertaken by GoodLife.
As in the case at GoodLife, 80 to 85% of data in any organisation is unstructured and is growing at an alarming rate. An enormous amount of knowledge is buried in this data. In the above scenario, Dr. Stanley's email to Dr. Ben had not been successfully updated into the medical system database as it fell in the unstructured format.

Characteristics of unstructured data
Does not conform to any data model
Data cannot be stored in the form of rows and columns as in a database
Has no easily identifiable structure
Not in any particular format or sequence
Does not follow any rules or semantics
Not easily usable by a program

Sources of unstructured data

Web pages
Memos
Videos (MPEG, etc.)
Images (JPEG, GIF, etc.)
Body of an email
Word documents
PowerPoint presentations
Chats
Reports
White papers
Surveys

Broadly 2 categories of unstructured data
1. Bitmap objects
2. Textual objects
A lot of unstructured data is also noisy text, e.g. chats, emails, SMS texts.
The language of noisy text differs significantly from the standard form of language.

How to manage unstructured data

Indexing
Helps in searching and retrieval.
On the basis of some value in the data, an index is defined, which is nothing but an identifier that represents the larger record in the data set.

Tags/Metadata
Using metadata, data in a document etc. can be tagged.
This enables search and retrieval.
This is difficult with unstructured data.

Classification/Taxonomy
Taxonomy is classifying data on the basis of the relationships that exist between data.
Data can be arranged in groups and placed in hierarchies based on the taxonomy prevalent in an organisation.
This is difficult with unstructured data.

Content Addressable Storage (CAS)
It stores data based on their metadata.
Assigns a unique name to every object stored in it.
The object is retrieved based on its content and not its location.
Used extensively to store emails etc.

http://www.emc.com/collateral/hardware/datasheet/c931emccenteracasds.pdf
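
The idea behind content addressable storage, combined with metadata tags, can be sketched in a few lines. This is a toy in-memory model, not the interface of any CAS product: the class `ContentAddressableStore`, its methods and the use of SHA-256 as the content address are all assumptions made for illustration.

```python
import hashlib


class ContentAddressableStore:
    """Toy in-memory store: the address of an object is a hash of its content."""

    def __init__(self):
        self._objects = {}
        self._metadata = {}

    def put(self, content: bytes, **metadata) -> str:
        address = hashlib.sha256(content).hexdigest()
        self._objects[address] = content      # identical content is stored once
        self._metadata[address] = metadata    # tags used later for searching
        return address

    def get(self, address: str) -> bytes:
        return self._objects[address]

    def find(self, **criteria):
        """Return addresses whose metadata matches every given tag."""
        return [a for a, m in self._metadata.items()
                if all(m.get(k) == v for k, v in criteria.items())]


store = ContentAddressableStore()
addr = store.put(b"Drug combination X+Y worked for GI disorder.",
                 sender="Dr. Stanley", topic="gastrointestinal")
same = store.put(b"Drug combination X+Y worked for GI disorder.",
                 sender="Dr. Stanley", topic="gastrointestinal")
print(addr == same, store.find(topic="gastrointestinal") == [addr])
```

Because the address is derived from the content, storing the same email twice yields the same address, and the metadata tags give a way to find the object without knowing where it is kept.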

Challenges of storing unstructured data
Storage space
Indexing and searching
Scalability
Update and delete
Retrieving information
Security

Solutions to storage challenges of unstructured data
Possible solutions:
Change formats
New hardware
CAS
Storing in XML format
RDBMS/BLOBs

BLOB
A binary large object (BLOB) is a data type that can store binary objects or data.
Binary large objects are used in databases to store binary data such as images, multimedia files and executable software code.
Binary large objects are supported by most database software. Generally, database software classifies binary large objects into two types: semi-structured data and unstructured data. XML files are categorized as semi-structured data, whereas images and multimedia data are unstructured data types. Both of these kinds of BLOB are generally not interpretable by the database.
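
A minimal sketch of storing binary data in a BLOB column, using Python's built-in `sqlite3` module. The table and column names are hypothetical, and the "image" is just a few stand-in bytes rather than a real file.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE scans (scan_id INTEGER PRIMARY KEY, label TEXT, image BLOB)"
)

# Stand-in bytes for an image file; a real application would read them from disk.
fake_image = bytes([0x89, 0x50, 0x4E, 0x47]) + b"\x00" * 16
conn.execute(
    "INSERT INTO scans (label, image) VALUES (?, ?)", ("chest x-ray", fake_image)
)

# The database returns the bytes unchanged; it does not interpret their content.
label, stored = conn.execute("SELECT label, image FROM scans").fetchone()
print(label, len(stored), stored == fake_image)
conn.close()
```

The database can store and return the object, but a query cannot look inside it, which is why the `label` column (metadata) is kept alongside the BLOB.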

A possible solution for unstructured data
UIMA: Unstructured Information Management Architecture
It is an open source platform from IBM which integrates different kinds of analysis engines to provide a complete solution for knowledge discovery from unstructured data.
http://www01.ibm.com/software/ecm/content
analytics/uima.html

UIMA
[Figure: UIMA processing flow]
Unstructured data such as chat, email, images etc. is acquired from various sources.
Analysis: the data is subjected to semantic analysis and transformed into structured information.
Delivery: structured information access, query and presentation.
Users

Semi-structured data: A scenario

Dr. Marianne of GoodLife Healthcare usually gets a blood test done for migraine patients visiting her. It is her observation that patients diagnosed with migraine usually have a high platelet count. She makes a note of this in the diagnosis and conclusion section in the blood test report of the patients. One day, another GoodLife doctor, Dr. Brian, searches the database when he is unable to find the cause of migraine in one of his patients, but with no luck. The answer he is looking for is nestled in the vast hoards of data.

Characteristics of semi-structured data
Does not conform to a data model, but contains tags and elements (metadata)
Data cannot be stored in the form of rows and columns as in a database
Similar entities are grouped
The tags and elements describe the data stored
Attributes in a group may not be the same
Not sufficient metadata

For example, two addresses may or may not contain the same number of properties/attributes:
Address 1
<house number><street name><area name><city>
Address 2
<house number><street name><city>
On the other hand, an email follows a standard format such as:
To: <Name>
From: <Name>
Subject: <Text>
CC: <Name>
Body: <Text, Graphics, Images etc.>

Though the above email tags give us some metadata, the body of the email has no fixed format, nor does it convey the meaning of the data it contains.
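
The split between the tagged part of an email and its free-form body can be seen by parsing a message with Python's standard `email` package. The addresses and the message text below are invented for the example.

```python
from email import message_from_string

raw_email = """\
To: ben@goodlife.example
From: stanley@goodlife.example
Subject: GI case follow-up
Cc: mark@goodlife.example

Ben, the combination we discussed worked again for my patient.
"""

message = message_from_string(raw_email)

# The headers are the tagged part: each has a name and a value.
headers = {name: message[name] for name in ("To", "From", "Subject", "Cc")}

# The body is free text: the format says nothing about what it means.
body = message.get_payload()

print(headers["Subject"])
print(body.strip())
```

A program can reliably filter on `From` or `Subject`, but finding the drug combination mentioned in the body would require text analysis.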

Let us see with an example how entities can have different sets of attributes. The attributes may also have different data types and most certainly can have different sizes. For example, names and email IDs of different people can be stored in more than one way, as shown below.

Name: Sebastian Thomas
Email: sebthom@ymail.com, sebastiant@dcs.xyz.uk

First name: Mark
Second name: Antony
Email: MarkAntony@tmail.com

Name: Julius Caesar
Email: julius@rmail.com

Dr. Marianne's case: The blood report is semi-structured

ABC Healthcare
Blood Test Report
Date: <>
Department: <>
Attending doctor: <>
Patient Name: <>
Patient Age: <>
Hemoglobin content: <>
RBC count: <>
WBC count: <>
Platelet count: <>
Diagnosis: <notes>
Conclusion: <notes>

How to manage semi-structured data

Schemas
Can be used to describe the structure of the data.
Schemas define the constraints on the structure, content of the document etc.

Graph-based data models
Can be used to describe data.
This is a schema-less approach, also known as self-describing, as data is presented in such a way that it explains itself.
Tree-like structure: vertices contain object or entity and leaves contain data.

XML
Widely used to store and exchange semi-structured data.
It allows the user to define tags to store data in hierarchical or nested forms.
Schemas in XML are not tightly coupled to data.

Challenges of storing semi-structured data
Storage cost
Distinction between schema and data
RDBMS
Irregular and partial structure
Evolving schemas
Implicit structure

Solutions to storage challenges of semi-structured data
Possible solutions:

XML
XML allows tags and attributes to be defined to store data. Data can be stored in a hierarchical/nested structure.

OEM
Data can be stored and exchanged in the form of a graph where entities are represented as objects, which are vertices in the graph.

RDBMS
Semi-structured data can be stored in a relational database by mapping the data to a relational schema, which is then mapped to a table.

Special purpose DBMS
Databases which are specifically designed to store semi-structured data.

OEM: Object Exchange Model
The Object Exchange Model (OEM) structures data in the form of graphs.
An OEM data graph is a rooted, labeled, directed graph. Its edges are labeled with strings. Only its leaf nodes carry data values. There is no ordering of edges leaving a node.
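
An OEM-style graph can be approximated with nested Python dictionaries. The representation below is an assumption made for illustration (it is not a standard OEM encoding): interior nodes are dicts mapping edge labels to lists of child nodes, and leaves are plain values. Python lists are ordered, whereas OEM imposes no order on the edges leaving a node, so the order here should be treated as incidental. The data reuses the name and email example given earlier.

```python
# Each node is either a dict of outgoing labeled edges or a leaf data value.
# Edge labels are strings; a label may lead to several children (a list).
oem_root = {
    "person": [
        {"name": ["Sebastian Thomas"],
         "email": ["sebthom@ymail.com", "sebastiant@dcs.xyz.uk"]},
        {"firstname": ["Mark"], "secondname": ["Antony"],
         "email": ["MarkAntony@tmail.com"]},
    ]
}


def follow(node, path):
    """Return all nodes reached from `node` by following the edge labels in `path`."""
    current = [node]
    for label in path:
        next_nodes = []
        for n in current:
            if isinstance(n, dict):          # leaves have no outgoing edges
                next_nodes.extend(n.get(label, []))
        current = next_nodes
    return current


print(follow(oem_root, ["person", "email"]))
print(follow(oem_root, ["person", "name"]))
```

The two `person` objects have different edge labels, yet the same path query works on both; objects lacking a label simply contribute nothing, which is how a graph model tolerates irregular structure.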

XML

XML is an open-standard markup language written in plain text.
It is independent of hardware and software.
It is designed to store and transport data over the internet.
It allows data to be stored in hierarchical or nested fashion.
The user can define tags to store data. XML has no predefined tags.
It also enables separation of content (eXtensible Markup Language) from presentation (eXtensible Stylesheet Language).
XML is emerging as the solution for semi-structured data management.

Examples of XML
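
The slide's own XML examples are not reproduced in this text, so the following is a hypothetical document in the spirit of the blood test report discussed earlier. Tag names, attribute names and values are all invented; the script parses the document with Python's standard `xml.etree.ElementTree` module.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the tag names are user-defined, not part of XML itself.
document = """
<bloodTestReport>
  <patient><name>Asha</name><age>34</age></patient>
  <results>
    <plateletCount unit="per microlitre">460000</plateletCount>
  </results>
  <diagnosis>Migraine; platelet count on the high side.</diagnosis>
</bloodTestReport>
""".strip()

root = ET.fromstring(document)

# Nested elements are reached by path; attributes are read with .get().
platelets = int(root.findtext("results/plateletCount"))
unit = root.find("results/plateletCount").get("unit")
diagnosis = root.findtext("diagnosis")
print(platelets, unit, diagnosis)
```

The numeric result is easy to extract because it sits in its own tagged element, whereas the diagnosis note remains free text inside a tag, which is the situation Dr. Brian runs into in the scenario.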

Difference between semi-structured and structured data

Semi-structured
Name                                  emailID
Ajay Thakur                           AjayT@yahoo.com, athakur@ymail.com
Neerja Gulati                         ngulati@nmail.com
firstname: Sanjay lastname: Burman    sanjayb@kmail.com

Structured
FirstName    LastName    emailID              AlternateemailID
Ajay         Thakur      AjayT@yahoo.com      athakur@ymail.com
Neerja       Gulati      ngulati@nmail.com
Sanjay       Burman      sanjayb@kmail.com
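
Moving from the semi-structured form to the structured one means mapping every irregular record onto the same fixed set of columns. A minimal sketch, assuming the records arrive as Python dictionaries with the keys shown in the comparison above; the function name `to_structured` is hypothetical, and splitting a full name at the first space is a simplification that would not hold for all names.

```python
# Records as they might arrive: same information, different attribute sets.
semi_structured = [
    {"Name": "Ajay Thakur", "emailID": "AjayT@yahoo.com, athakur@ymail.com"},
    {"Name": "Neerja Gulati", "emailID": "ngulati@nmail.com"},
    {"firstname": "Sanjay", "lastname": "Burman", "emailID": "sanjayb@kmail.com"},
]


def to_structured(record):
    """Map one irregular record onto the fixed four-column layout."""
    if "Name" in record:
        # Simplification: treat the first word as the first name.
        first, _, last = record["Name"].partition(" ")
    else:
        first, last = record["firstname"], record["lastname"]
    emails = [e.strip() for e in record["emailID"].split(",")]
    return {
        "FirstName": first,
        "LastName": last,
        "emailID": emails[0],
        "AlternateemailID": emails[1] if len(emails) > 1 else None,
    }


structured = [to_structured(r) for r in semi_structured]
for row in structured:
    print(row)
```

After the mapping every row has the same four attributes, with a missing alternate email represented explicitly as `None` rather than by the absence of a field.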
