Chapter BI4

Chapter four
Business Intelligence
Lecture outline
• Databases to Improve Business Performance and Decision Making
• Business Intelligence
• Data Warehouses
• Association Rule Mining
• Classification
• Clustering
• Others
Using Databases to Improve Business
Performance and Decision Making
• Databases provide information to help the company
run the business more efficiently
• Databases help managers and employees make better
decisions
• Tools for analyzing and accessing vast quantities of
data:
  • Data warehousing
  • Multidimensional data analysis
  • Data mining
Using Databases to Improve Business
Performance and Decision Making
• Businesses use their databases to keep track of
basic transactions, such as paying suppliers,
processing orders, serving customers, and paying
employees.
• If a company wants to know which product is the
most popular or who is its most profitable
customer, the answer lies in the data.
A Good Data Warehouse
is a pre-requisite for
Business Decision Making
WHY & WHAT
DATA WAREHOUSE
MOTIVATION
“We are drowning in information,
but starving for knowledge.”
John Naisbitt
A producer wants to know...
• Which are our lowest/highest margin customers?
• Who are my customers and what products are they buying?
• What is the most effective distribution channel?
• What product promotions have the biggest impact on revenue?
• Which customers are most likely to go to the competition?
• What impact will new products/services have on revenue and margins?
Data, Data... Everywhere, yet ...
I can’t find the data I need
– data is scattered over the network
– many versions, subtle differences
I can’t get the data I need
– need an expert to get the data
I can’t understand the data I found
– available data poorly documented
I can’t use the data I found
– results are unexpected
– data needs to be transformed from one form to another
Data Mining
works with
Data Warehouse Data
• Data Warehouse provides the Enterprise with memory
• Data Mining provides the Enterprise with intelligence
• Data Mining helps to extract and analyze such information
What Is Data Mining? A Definition
Knowledge Discovery in
Databases
The non-trivial extraction of
implicit, previously
unknown and potentially
useful knowledge from
data in large data
repositories
Alternative names
• Knowledge Discovery
(mining) in Databases
(KDD)
• Knowledge extraction
• Data/pattern analysis
• Business Intelligence, etc.
Problem Behind...
Heterogeneous Information Sources
“Heterogeneities are everywhere”
Sources: Personal Databases, Scientific Databases, Medical data, World Wide Web
They have:
• Different interfaces
• Different data representations
• Duplicate and inconsistent information
Problem
Data Management in Large Enterprises
• Application-driven development of
operational systems resulted in vertical
fragmentation of informational systems.
Sales Planning Suppliers
Stock Mngmt Debt Mngmt Inventory Mngmt
Sales Administration Finance Manufacturing ...

Ultimate Goal
Unified Access to Data
[Figure: an Integration System providing access to the World Wide Web, Digital Libraries, Scientific Databases, and Personal Databases]
• Collects and combines information
• Provides integrated view, uniform user interface
• Supports sharing
Best Solution
The Warehousing Approach
• An approach where the information is integrated in advance and stored in a warehouse for direct querying and analysis
[Figure: Sources, each with an Extractor/Monitor, feed an Integration System (with Metadata), which loads the Data Warehouse queried by Clients]

Data Warehousing -- a process
• It is a technique for assembling and managing data from various sources for the purpose of answering business questions, thus making decisions that were not previously possible
• Process of constructing (and using) a data warehouse
Data warehouse cont’d
What is a data warehouse?
Huge database system that stores and manages data
required to analyze historical and current transactions
• Quick and efficient way to access large amounts of data
• Often uses a process called data mining to find patterns and relationships among data
• Uses multidimensional databases
Components of a Data Warehouse
Why a Data Warehouse?
The Warehousing Approach
• Information integrated in advance
• Stored in warehouse for direct querying and analysis
[Figure: Clients, Data Warehouse, Integration System with Metadata, Extractor/Monitor components, Sources]
Business intelligence and data mining
• Once data have been captured and organized in
data warehouses, they are available for further
analysis.
• A series of tools enables users to analyze these data
to see new patterns, relationships, and insights that
are useful for guiding decision making.
BI cont’d
Definition
• According to Adelman et al. (2002), BI is a term that encompasses a broad range of analytical software and solutions for gathering, consolidating, analyzing and providing access to information in a way that is supposed to let an enterprise's users make better business decisions.
• Stackowiak et al. (2007) define business intelligence as the process of taking large amounts of data, analyzing that data, and presenting a high-level set of reports that condense the essence of that data into the basis of business actions, enabling management to make fundamental daily business decisions.
• Business intelligence is also described as a “business management term used to describe applications and technologies which are used to gather, provide access to, and analyze data and information about an enterprise, in order to help them make better informed business decisions.”
Cont’d
• These tools for consolidating, analyzing, and
providing access to vast amounts of data to help
users make better business decisions are often
referred to as business intelligence (BI).
• Business intelligence provides firms with the
capability to amass information; develop
knowledge about customers, competitors, and
internal operations; and change decision-making
behavior to achieve higher profitability and other
business goals.
BI cont’d
• How business intelligence works

BI cont’d
Traditional BI systems consist of a back-end database, a front-end user interface,
software that processes the information to produce the business intelligence itself,
and a reporting system.
The capabilities of BI include decision support, online analytical processing,
statistical analysis, forecasting, and data mining.
Table 3.1 Current BI Techniques
TECHNIQUE: DESCRIPTION
Predictive modeling: Predict value for a specific data item attribute
Association, correlation, causality analysis: Identify relationships between attributes
Classification: Determine to which class a data item belongs
Clustering and outlier analysis: Partition a set into classes, whereby items with similar characteristics are grouped together
Model visualization: Making discovered knowledge easily understood using charts, plots, histograms, ...
Application areas of BI
Manufacturers, electronic commerce businesses, banking,
telecommunication providers, airlines, retailers, health
systems, financial services, bioinformatics, and hotels use
BI for customer support, market research, segmenting,
product profitability, inventory and distribution analysis,
statistical analysis, fraud detection, etc.
BI cont’d
Why BI?
• Customers are the most critical aspect of a company's success.
• It is very important that firms have information on their preferences. Firms must quickly adapt to their changing demands.
• Business Intelligence enables firms to gather information on the trends in the marketplace and come up with innovative products or services in anticipation of customers' changing demands.
• With BI, firms can identify their most profitable customers and the underlying reasons for those customers’ loyalty
• Analyze click-stream data to improve e-commerce strategies
• Determine what combinations of products and service lines customers are likely to purchase and when
• Etc.
Data mining
• Data mining is more discovery driven.
• Data mining provides insights into corporate data that
cannot be obtained with OLAP by finding hidden
patterns and relationships in large databases and
inferring rules from them to predict future behavior.
• The patterns and rules are used to guide decision
making and forecast the effect of those decisions.
• The types of information obtainable from data
mining are listed in Table 4.1.
BI cont’d
• Data Mining
• Finds hidden patterns and relationships in large databases
and infers rules from them to predict future behavior
• Types of information obtainable from data mining
• Associations: Occurrences linked to single event
• Classifications: Patterns describing a group an item belongs to
• Clusters: Discovering as yet unclassified groupings
• Forecasting: Uses series of values to forecast future values
Basic Concepts: Association Rules
An association rule is a rule described in the form X → Y, with the interestingness measures support and confidence, where X and Y are simple or complex statements.
A simple statement is a statement formed from a single attribute (say age, buy, or sex) and a value, related by a relational operator.
TID  ITEMSET
1    Computer, printer, scanner, antivirus
2    Computer, printer, scanner
3    Computer, antivirus
4    Antivirus, scanner
Buy(X, “Computer”) → Buy(X, “Printer”) [sup = 50%, conf = 66.67%]
This means that a person X who buys a computer also buys a printer. The support of 50% says that half of all transactions in the data set contain both a computer and a printer. The confidence of 66.67% says that, of the transactions that contain a computer, 66.67% also contain a printer.
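The support and confidence figures for the computer/printer rule can be checked with a few lines of Python. This is a minimal sketch, not part of the original slides: the function names `support` and `confidence` are illustrative assumptions, and the transactions are copied from the TID/ITEMSET table.

```python
# Transactions copied from the TID/ITEMSET table in the slide.
transactions = [
    {"computer", "printer", "scanner", "antivirus"},
    {"computer", "printer", "scanner"},
    {"computer", "antivirus"},
    {"antivirus", "scanner"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Support of antecedent and consequent together, divided by support of the antecedent."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


sup = support({"computer", "printer"}, transactions)
conf = confidence({"computer"}, {"printer"}, transactions)
print(f"support = {sup:.2%}, confidence = {conf:.2%}")
```

Two of the four transactions contain both items (support 2/4), and two of the three computer transactions also contain a printer (confidence 2/3).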
Market Basket Analysis…
• Analyzes customer buying habits by finding associations between
different items that customers place in their “shopping baskets”
• The discovery of interesting correlations can help retailers develop
marketing strategies by gaining insight into which items are frequently
purchased together by customers.
• This information leads to increased sales by helping retailers do
selective marketing and plan their shelf space.
Basket data: Retail organizations, e.g., supermarkets, collect and store
massive amounts of sales data, called basket data. A record consists of
• transaction date
• items bought
Or, basket data may consist of items bought by a customer over a
period.
BI continued
• Association Rules
  • Market Baskets
  • Frequent Itemsets
  • A-priori Algorithm
• The Market-Basket Model
  • A large set of items, e.g., things sold in a supermarket.
  • A large set of baskets, each of which is a small set of the items, e.g., the things one customer buys on one day.
BI continued
Example
Items = {milk, coke, pepsi, beer, juice}.
Support threshold = 3 baskets.
B1 = {m, c, b} B2 = {m, p, j}
B3 = {m, b} B4 = {c, j}
B5 = {m, p, b} B6 = {m, c, b, j}
B7 = {c, b, j} B8 = {b, c}
Frequent itemsets: {m}, {c}, {b}, {j},
{m, b}, {c, b}, {j, c}.
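The frequent itemsets listed for the eight baskets can be reproduced with a level-wise search in the spirit of the A-priori algorithm named in the outline. The sketch below is an illustration under assumptions rather than the slides' own procedure: the function name `frequent_itemsets` and the `min_count` parameter are hypothetical, and an itemset is treated as frequent when it appears in at least `min_count` baskets.

```python
from itertools import combinations

# Baskets B1..B8 from the example (m = milk, c = coke, p = pepsi, b = beer, j = juice).
baskets = [
    {"m", "c", "b"}, {"m", "p", "j"},
    {"m", "b"}, {"c", "j"},
    {"m", "p", "b"}, {"m", "c", "b", "j"},
    {"c", "b", "j"}, {"b", "c"},
]


def frequent_itemsets(baskets, min_count):
    """Level-wise search: size-k candidates are built only from frequent size-(k-1) sets."""
    def count(itemset):
        return sum(1 for basket in baskets if itemset <= basket)

    items = sorted(set().union(*baskets))
    level = [frozenset([i]) for i in items if count(frozenset([i])) >= min_count]
    result = {}
    k = 1
    while level:
        for itemset in level:
            result[itemset] = count(itemset)
        k += 1
        previous = set(level)
        # Join step: unions of two frequent sets that have exactly k items.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in previous for sub in combinations(c, k - 1))
        }
        level = [c for c in candidates if count(c) >= min_count]
    return result


found = frequent_itemsets(baskets, min_count=3)
for itemset, n in sorted(found.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), n)
```

The prune step is what keeps the search small: {m, c, b} is never counted because its subset {m, c} appears in only two baskets.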
Why Association Rule Mining
• Do you buy a printer when you buy a computer while visiting ABC company?
• Do you often use Google Drive when you use Gmail?
• Do you order tea when you order bread?
• Given a database of transactions, where each transaction is a list of items purchased by a customer in a visit
• Find all rules that correlate the presence of one set of items (itemset) with that of another set of items
Why Association Rule Mining
• Support
  • Simplest question: find sets of items that appear “frequently” in the baskets.
  • Support for itemset I = the number of baskets containing all items in I.
  • Given a support threshold s, sets of items that appear in at least s baskets are called frequent itemsets.
Association mining from frequent patterns
• The rule A → B holds in the transaction set D with support s, where s is the percentage of transactions in D that contain A ∪ B (i.e., the union of itemsets A and B, or say, both A and B).
• This is taken to be the probability P(A ∪ B) = (number of transactions containing both A and B) / (total number of transactions in D)
• Support shows the probability that all the predicates in A and B are fulfilled together.

Association mining from frequent patterns
• The rule A → B has confidence c in the transaction set D, where c is the percentage of transactions in D containing A that also contain B.
• This is taken to be the conditional probability P(B|A) = support(A ∪ B) / support(A)
• Confidence measures how often the predicates in B are fulfilled when the predicates in A are fulfilled.
• I.e.,
support(A → B) = P(A ∪ B)
confidence(A → B) = P(B|A)
Association Rule - Basic Concepts
• Association rule form:
  • Antecedent → Consequent [support, confidence]
• Examples:
  • buys(x, “computer”) → buys(x, “financial Mgt. software”) [0.5%, 60%]
  • age(x, “30..39”) ^ income(x, “50000”) → buys(x, “car”) [1%, 75%]
  • buys(x, “shoe”) → buys(x, “sock”) [60%, 80%]
  • major(x, “MBA”) ^ takes(x, “Managerial Economics”) → grade(x, “A”) [50%, 75%]
Example
B1 = {m, c, b} B2 = {m, p, j}
B3 = {m, b} B4 = {c, j}
B5 = {m, p, b} B6 = {m, c, b, j}
B7 = {c, b, j} B8 = {b, c}
An association rule: {m, b} → c. Confidence = 2/4 = 50%.
• Support count (σ)
  • Frequency of occurrence of an itemset
  • E.g. σ({I1, I2, I3}) = 2
• Support
  • Fraction of transactions that contain an itemset
  • E.g. s({I1, I2, I3}) = 2/5
• Frequent itemset
  • An itemset whose support is greater than or equal to a minsup threshold
TID  Items
1    I1, I2
2    I1, I2, I3, I4
3    I5, I2, I3, I6
4    I1, I5, I2, I3
5    I1, I5, I2, I6
A frequent (or large) itemset is an itemset whose number of occurrences is above a threshold s. The notation L is used to indicate a large or frequent itemset, and I is used to indicate a specific target itemset.
Classification
• Classification recognizes patterns that describe the group to which an item belongs by examining existing items that have been classified and by inferring a set of rules.
• For example, businesses such as credit card or telephone companies worry about the loss of steady customers.
• Classification helps discover the characteristics of customers who are likely to leave and can provide a model to help managers predict who those customers are so that the managers can devise special campaigns to retain such customers.
Classification Task
• Classification Task: A Two-Step Process
  • Model Construction
  • Model Usage
• Model construction: describing a set of predetermined classes
  • Each tuple/sample is assumed to belong to a predefined class, as determined by the class label attribute.
  • The set of tuples used for model construction is the training set.
  • The model is represented as classification rules, decision trees, or mathematical formulae.
Classification Task
Model Construction:
Training Data → Classification Algorithms → Classifier (Model)
Training Data
NAME      RANK            YEARS  TENURED
Damail    Assistant Prof  5      no
Yordanos  Assistant Prof  9      yes
Zuriash   Professor       12     yes
Moha      Associate Prof  8      yes
Dawit     Assistant Prof  7      no
Aman      Associate Prof  8      yes
Classifier (Model)
IF rank = ‘professor’ OR years > 7
THEN tenured = ‘yes’
PART 1: General Introduction - Classification Task
Model Usage in Prediction:
The classifier is first checked against testing data, then applied to unseen data.
Testing Data
NAME    RANK            YEARS  TENURED
Kedir   Assistant Prof  4      no
Abebe   Associate Prof  8      yes
Kebede  Professor       9      yes
Alima   Assistant Prof  9      yes
Unseen Data
(Merga, Professor, 8): Tenured?
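The model-usage step can be illustrated by coding the rule from the model-construction slide and applying it to the testing data and to the unseen tuple. This is a small sketch for illustration only; the function name `predict_tenured` is a hypothetical name, and the rule and data are taken from the slides.

```python
def predict_tenured(rank, years):
    """Rule learned in model construction: professor OR more than 7 years means tenured."""
    return "yes" if rank == "Professor" or years > 7 else "no"


# Testing data from the slide: (name, rank, years, known label).
test_data = [
    ("Kedir", "Assistant Prof", 4, "no"),
    ("Abebe", "Associate Prof", 8, "yes"),
    ("Kebede", "Professor", 9, "yes"),
    ("Alima", "Assistant Prof", 9, "yes"),
]

# Step 1: estimate accuracy by comparing predictions with the known labels.
correct = sum(
    predict_tenured(rank, years) == label for _, rank, years, label in test_data
)
accuracy = correct / len(test_data)
print(f"accuracy on testing data: {accuracy:.0%}")

# Step 2: if the accuracy is acceptable, classify the unseen tuple.
merga = predict_tenured("Professor", 8)
print("Merga, Professor, 8 ->", merga)
```

On these four test tuples the rule agrees with every known label, so it would then be used to answer the question for Merga.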
Bayes’ Theorem: for prediction
• Let X be a data sample (“evidence”) whose class label is unknown
• Let H be a hypothesis that X belongs to class C
• Classification is to determine P(H|X), known as the posterior probability: the probability that the hypothesis holds given the observed data sample X.
• P(H) (prior or a priori probability): the initial probability of H.
  • E.g., X will buy a computer, regardless of age, income, …
• P(X): the probability that the sample data is observed. It is the prior probability of X, also called the marginal probability.
• P(X|H), known as the likelihood, is the probability of observing the sample X given that the hypothesis H holds.
Given training data X, the posterior probability of a hypothesis H, P(H|X), follows Bayes’ theorem:
P(H|X) = P(X|H) P(H) / P(X)
Naïve Bayesian Classifier: Training Dataset
Classes:
C1: buys_computer = ‘yes’
C2: buys_computer = ‘no’
A data sample X:
X = (age <= 30, income = medium, student = yes, credit_rating = fair)
Task: classify X using the Bayesian classifier.
age     income  student  credit_rating  buys_computer
<=30    high    no       fair           no
<=30    high    no       excellent      no
31…40   high    no       fair           yes
>40     medium  no       fair           yes
>40     low     yes      fair           yes
>40     low     yes      excellent      no
31…40   low     yes      excellent      yes
<=30    medium  no       fair           no
<=30    low     yes      fair           yes
>40     medium  yes      fair           yes
<=30    medium  yes      excellent      yes
31…40   medium  no       excellent      yes
31…40   high    yes      fair           yes
>40     medium  no       excellent      no
Naïve Bayesian Classifier: An Example
We need to maximize P(X|Ci)P(Ci), for i=1,2.
P(Ci): P(buys_Comp= “yes”) = 9/14 = 0.643

P(buys_Comp= “no”) = 5/14= 0.357
Compute P(X|Ci) for each class:
P(age = “<=30” | buys_Comp= “yes”) = 2/9 = 0.222

P(age = “<= 30” | buys_Comp= “no”) = 3/5 = 0.6
P(income = “medium” | buys_Comp= “yes”) = 4/9 = 0.444

P(income = “medium” | buys_Comp= “no”) = 2/5 = 0.4
P(student = “yes” | buys_Comp= “yes”) = 6/9 = 0.667

P(student = “yes” | buys_Comp= “no”) = 1/5 = 0.2
P(credit_rating = “fair” | buys_Comp= “yes”) = 6/9 = 0.667

P(credit_rating = “fair” | buys_Comp= “no”) = 2/5 = 0.4
Naïve Bayesian Classifier: An Example (1)
• X = (age <= 30, income = medium, student = yes, credit_rating = fair)
• P(X|Ci):
P(X|buys_Comp = “yes”) = 0.222 x 0.444 x 0.667 x 0.667 = 0.044
P(X|buys_Comp = “no”) = 0.6 x 0.4 x 0.2 x 0.4 = 0.019
• P(X|Ci) * P(Ci):
P(X|buys_Comp = “yes”) * P(buys_Comp = “yes”) = 0.028
P(X|buys_Comp = “no”) * P(buys_Comp = “no”) = 0.007
• Therefore, X belongs to class (“buys_Comp = yes”)
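The hand calculation can be cross-checked by counting directly over the 14 training tuples. The sketch below is one possible way to do this and is not taken from the slides: the function name `naive_bayes_scores` is an assumption, the age range is written as "31...40" in ASCII, and probabilities are plain relative frequencies with no smoothing.

```python
from collections import Counter

# Training tuples: (age, income, student, credit_rating, buys_computer).
rows = [
    ("<=30", "high", "no", "fair", "no"),
    ("<=30", "high", "no", "excellent", "no"),
    ("31...40", "high", "no", "fair", "yes"),
    (">40", "medium", "no", "fair", "yes"),
    (">40", "low", "yes", "fair", "yes"),
    (">40", "low", "yes", "excellent", "no"),
    ("31...40", "low", "yes", "excellent", "yes"),
    ("<=30", "medium", "no", "fair", "no"),
    ("<=30", "low", "yes", "fair", "yes"),
    (">40", "medium", "yes", "fair", "yes"),
    ("<=30", "medium", "yes", "excellent", "yes"),
    ("31...40", "medium", "no", "excellent", "yes"),
    ("31...40", "high", "yes", "fair", "yes"),
    (">40", "medium", "no", "excellent", "no"),
]


def naive_bayes_scores(rows, x):
    """Return {class: P(x | class) * P(class)} using relative frequencies."""
    class_counts = Counter(row[-1] for row in rows)
    scores = {}
    for label, n in class_counts.items():
        score = n / len(rows)  # prior P(Ci)
        for i, value in enumerate(x):
            matches = sum(1 for row in rows if row[-1] == label and row[i] == value)
            score *= matches / n  # conditional P(x_i | Ci)
        scores[label] = score
    return scores


x = ("<=30", "medium", "yes", "fair")
scores = naive_bayes_scores(rows, x)
prediction = max(scores, key=scores.get)
print({label: round(score, 3) for label, score in scores.items()}, "->", prediction)
```

The two scores should round to the 0.028 and 0.007 obtained by hand, so the sample is assigned to the "yes" class.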
Naive Bayesian Classifier Example 2: play tennis?
Outlook   Temperature  Humidity  Windy  Class
sunny     hot          high      false  N
sunny     hot          high      true   N
overcast  hot          high      false  P
rain      mild         high      false  P
rain      cool         normal    false  P
rain      cool         normal    true   N
overcast  cool         normal    true   P
sunny     mild         high      false  N
sunny     cool         normal    false  P
rain      mild         normal    false  P
sunny     mild         normal    true   P
overcast  mild         high      true   P
overcast  hot          normal    false  P
rain      mild         high      true   N
Naive Bayesian Classifier Example
Class P (9 samples)
Outlook   Temperature  Humidity  Windy  Class
overcast  hot          high      false  P
rain      mild         high      false  P
rain      cool         normal    false  P
overcast  cool         normal    true   P
sunny     cool         normal    false  P
rain      mild         normal    false  P
sunny     mild         normal    true   P
overcast  mild         high      true   P
overcast  hot          normal    false  P
Class N (5 samples)
Outlook   Temperature  Humidity  Windy  Class
sunny     hot          high      false  N
sunny     hot          high      true   N
rain      cool         normal    true   N
sunny     mild         high      false  N
rain      mild         high      true   N
• Given the training set, we compute the probabilities:
Outlook      P    N
sunny        2/9  3/5
overcast     4/9  0
rain         3/9  2/5
Temperature  P    N
hot          2/9  2/5
mild         4/9  2/5
cool         3/9  1/5
Humidity     P    N
high         3/9  4/5
normal       6/9  1/5
Windy        P    N
true         3/9  3/5
false        6/9  2/5
• We also have the class probabilities
  • Prob(P) = 9/14
  • Prob(N) = 5/14
• To classify a new sample X:
  • outlook = sunny
  • temperature = cool
  • humidity = high
  • windy = false
• Prob(P|X) ∝ Prob(P)*Prob(sunny|P)*Prob(cool|P)*Prob(high|P)*Prob(false|P) = 9/14*2/9*3/9*3/9*6/9 ≈ 0.0106
• Prob(N|X) ∝ Prob(N)*Prob(sunny|N)*Prob(cool|N)*Prob(high|N)*Prob(false|N) = 5/14*3/5*1/5*4/5*2/5 ≈ 0.0137
• Since 0.0137 > 0.0106, X takes class label N
What is Cluster Analysis?
Finding groups of objects such that the objects in a group will be similar (or
related) to one another and different from (or unrelated to) the objects in other
groups
• Intra-cluster distances are minimized
• Inter-cluster distances are maximized
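The two distance criteria can be made concrete by measuring them on a small grouping. The sketch below uses hypothetical 2-D points that are not from the slides, already split into two clusters for illustration, and compares the average distance within clusters to the average distance between them. The function names are assumptions, and `math.dist` requires Python 3.8 or later.

```python
from itertools import combinations, product
from math import dist

# Hypothetical 2-D points, already grouped into two clusters for illustration.
cluster_a = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0)]
cluster_b = [(8.0, 8.0), (9.0, 8.5), (8.5, 9.5)]


def mean_intra_distance(cluster):
    """Average Euclidean distance between all pairs of points inside one cluster."""
    pairs = list(combinations(cluster, 2))
    return sum(dist(p, q) for p, q in pairs) / len(pairs)


def mean_inter_distance(first, second):
    """Average Euclidean distance between points taken from two different clusters."""
    pairs = list(product(first, second))
    return sum(dist(p, q) for p, q in pairs) / len(pairs)


intra = (mean_intra_distance(cluster_a) + mean_intra_distance(cluster_b)) / 2
inter = mean_inter_distance(cluster_a, cluster_b)
print(f"mean intra-cluster distance: {intra:.2f}")
print(f"mean inter-cluster distance: {inter:.2f}")
```

A good clustering is one where the first number is small relative to the second; moving a point from one group to the other in this example would raise the intra-cluster average.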
Clustering
• Clustering works in a manner similar to classification when no groups have yet been defined.
• A data mining tool can discover different groupings within data, such as finding affinity groups for bank cards or partitioning a database into groups of customers based on demographics and types of personal investments.
Exercise
• The following sample dataset is taken from the ABC spare-part shop database. Consider 60% and 80% for minimum support and confidence, respectively.
  1. Find the frequent itemsets at each level.
  2. Generate strong rules.
  3. Based on your findings, give advice to salesmen or managers to improve the business.
ID  Spare part type
1   Tyre, Inner tube, Seatbelts, Brake, Fuel Line, Fuel Filter
2   Tyre, Inner tube, Seatbelts, Fuel Line, Fuel Filter, Fuel Tank
3   Fuel Line, Fuel Filter, Fuel Tank
4   Fuel Line, Fuel Filter, Fuel Tank, Foot Brake
5   Fuel Line, Fuel Filter, Airbags
What is Machine Learning?
Machine learning allows computers to learn and infer from data.
For successful BI: Align Business and IT for the Long Haul
