
Lecture-3

Introduction and Background

1
What is a Data Warehouse ?
It is a blend of many technologies, the basic
concept being:

 Take all data from different operational systems.

 If necessary, add relevant data from industry.

 Transform all data and bring into a uniform format.

 Integrate all data as a single entity.
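The steps above (take, transform, integrate) can be sketched in a few lines. This is only an illustration: the two source systems, their field names, and the gender and date conventions are hypothetical and not from the lecture.

```python
from datetime import date

# Hypothetical rows from two operational systems with different conventions.
billing_rows = [{"cust": "17", "gender": "M", "joined": "2001-03-15"}]
crm_rows = [{"customer_id": 42, "sex": "female", "join_date": "15/04/2002"}]

# Assumed mapping used to bring gender codes into one uniform format.
GENDER_MAP = {"m": "M", "male": "M", "f": "F", "female": "F"}


def from_billing(row):
    """Transform a billing-system row (ISO dates, short gender codes)."""
    return {
        "customer_id": int(row["cust"]),
        "gender": GENDER_MAP[row["gender"].lower()],
        "joined": date.fromisoformat(row["joined"]),
    }


def from_crm(row):
    """Transform a CRM-system row (DD/MM/YYYY dates, spelled-out gender)."""
    day, month, year = (int(part) for part in row["join_date"].split("/"))
    return {
        "customer_id": int(row["customer_id"]),
        "gender": GENDER_MAP[row["sex"].lower()],
        "joined": date(year, month, day),
    }


def integrate(billing, crm):
    """Combine both sources into a single list in the uniform format."""
    unified = [from_billing(r) for r in billing] + [from_crm(r) for r in crm]
    return sorted(unified, key=lambda r: r["customer_id"])


warehouse_rows = integrate(billing_rows, crm_rows)
```

After the call, every row in `warehouse_rows` has the same keys and value types regardless of which system it came from.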

2
What is a Data Warehouse ?
(Cont…)
It is a blend of many technologies, the basic
concept being:

 Store data in a format supporting easy access for decision support.

 Create performance enhancing indices.
   B-Tree vs. bitmapped/cluster

 Implement performance enhancement joins.
   Nested loop vs. hash/sort-merge

 Run ad-hoc queries with low selectivity.

3
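To make the "nested loop vs. hash" contrast concrete, here is a minimal sketch of both join strategies over in-memory lists of dictionaries. The function names and sample rows are assumptions made for illustration; a real database engine implements these joins internally and far more efficiently.

```python
def nested_loop_join(left, right, left_key, right_key):
    """Compare every left row with every right row: O(len(left) * len(right))."""
    out = []
    for l_row in left:
        for r_row in right:
            if l_row[left_key] == r_row[right_key]:
                out.append({**l_row, **r_row})
    return out


def hash_join(left, right, left_key, right_key):
    """Build a hash table on one input, then probe it with the other."""
    buckets = {}
    for r_row in right:
        buckets.setdefault(r_row[right_key], []).append(r_row)
    out = []
    for l_row in left:
        for r_row in buckets.get(l_row[left_key], []):
            out.append({**l_row, **r_row})
    return out


# Hypothetical sample data.
customers = [{"cust_id": 1, "age": 35}, {"cust_id": 2, "age": 52}]
transactions = [
    {"customer_id": 1, "balance": 100},
    {"customer_id": 2, "balance": 250},
    {"customer_id": 1, "balance": 75},
]

nl_result = nested_loop_join(customers, transactions, "cust_id", "customer_id")
hj_result = hash_join(customers, transactions, "cust_id", "customer_id")
```

Both functions return the same rows; the difference is the amount of work, which matters when a query touches a large share of the table.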


How is it Different?
 Fundamentally different
[Figure: request cycle between business user and IT people]
 Business user needs info
 User requests IT people
 IT people do system analysis and design
 IT people create reports
 IT people send reports to business user
 Business user may get answers
 Answers result in more questions

4
How is it Different?
 Different patterns of hardware utilization

[Figure: hardware utilization, on a scale from 0% to 100%, for Operational vs. DWH systems]

Analogy: bus service vs. train

5
How is it Different?
 Combines operational and historical data.
 Data entry is not done into a DWH; OLTP or ERP systems are the source systems.
 OLTP systems don’t keep history; you can’t get a balance statement more than a year old.
 A DWH keeps historical data, even of bygone customers. Why?
 In the context of a bank, we want to know why the customer left.
 What were the events that led to his/her leaving? Why?
 Customer retention.

6
How much history?

 Depends on:
 Industry.
 Cost of storing historical data.

 Economic value of historical data.

7
How much history?
 Industries and history
 Telecom calls are far more numerous than bank transactions: 18 months.
 Retailers are interested in analyzing yearly seasonal patterns: 65 weeks.
 Insurance companies want to do actuarial analysis, using the historical data to predict risk: 7 years.
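As a small worked example of these horizons, the sketch below computes the oldest date a warehouse would keep for each industry. The function names and the day-clamping rule for month arithmetic are assumptions; the periods (18 months, 65 weeks, 7 years) are the ones listed above.

```python
import calendar
from datetime import date, timedelta


def months_back(d, months):
    """Return the date `months` calendar months before `d`, clamping the day."""
    total = d.year * 12 + (d.month - 1) - months
    year, month_index = divmod(total, 12)
    month = month_index + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, last_day))


def retention_cutoff(industry, today):
    """Oldest date to keep, using the history periods from the slide."""
    if industry == "telecom":
        return months_back(today, 18)
    if industry == "retail":
        return today - timedelta(weeks=65)
    if industry == "insurance":
        return months_back(today, 7 * 12)
    raise ValueError(f"unknown industry: {industry}")


reference_day = date(2024, 6, 30)  # arbitrary example date
cutoffs = {
    name: retention_cutoff(name, reference_day)
    for name in ("telecom", "retail", "insurance")
}
```

Rows older than the cutoff would be candidates for archiving or deletion, depending on the storage cost and the economic value of the data.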

8
How much history?

Economic value of data vs. storage cost

Is a Data Warehouse a complete repository of data?

9
How is it Different?
 Usually (but not always) periodic or batch updates rather than real-time.
 The boundary is blurring for active data warehousing.
 For an ATM, if updates are not done in real-time, there is a lot of real trouble.
 A DWH is for strategic decision making based on historical data. It won’t hurt if the transactions of the last hour/day are absent.

10
How is it Different?

 Rate of update depends on:


 volume of data,
 nature of business,
 cost of keeping historical data,
 benefit of keeping historical data.

11
How is it Different?
 Starts with a 6x12 availability requirement ... but 7x24 usually becomes the goal.
 Decision makers typically don’t work 24 hrs a day and 7 days a week. An ATM system does.
 Once decision makers start using the DWH and start reaping the benefits, they start liking it…
 They start using the DWH more often, until they want it available 100% of the time.

12
How is it Different?
 Starts with a 6x12 availability requirement ... but 7x24 usually becomes the goal.
 For businesses across the globe, 50% of the world may be sleeping at any one time, but the businesses are up 100% of the time.
 100% availability is not a trivial task; loading strategies, refresh rates, etc. need to be taken into account.

13
How is it Different?
 Does not follow the traditional development model

Requirements  Program

Classical SDLC

 Requirements gathering
 Analysis
 Design
 Programming
 Testing
 Integration
 Implementation
14
How is it Different?
 Does not follow the traditional development model

DWH  Program  Requirements

DWH SDLC (CLDS)

 Implement warehouse
 Integrate data
 Test for bias
 Program w.r.t. data
 Design DSS system
 Analyze results
 Understand requirement
15
Data Warehouse Vs. OLTP

OLTP (On Line Transaction Processing)


Select tx_date, balance from tx_table
Where account_ID = 23876;

16
Data Warehouse Vs. OLTP

DWH
Select balance, age, sal, gender
from customer_table, tx_table
Where age between 30 and 40
and Education = 'graduate'
and customer_table.CustID = tx_table.Customer_ID;
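The DWH query can be tried against a small in-memory SQLite database. In this sketch the column types, the sample rows, the `tx_id` column, and the `ORDER BY` clause are assumptions added so that the result is reproducible; the range and join predicates are written in standard SQL form (`BETWEEN 30 AND 40`, `table.column`).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer_table (
    CustID INTEGER PRIMARY KEY, age INTEGER, sal INTEGER,
    gender TEXT, Education TEXT
);
CREATE TABLE tx_table (
    tx_id INTEGER PRIMARY KEY, Customer_ID INTEGER,
    account_ID INTEGER, tx_date TEXT, balance INTEGER
);
-- Hypothetical sample data.
INSERT INTO customer_table VALUES
    (1, 35, 50000, 'F', 'graduate'),
    (2, 52, 70000, 'M', 'graduate'),
    (3, 31, 40000, 'M', 'matric');
INSERT INTO tx_table VALUES
    (10, 1, 23876, '2004-01-05', 1200),
    (11, 1, 23876, '2004-02-05', 900),
    (12, 2, 11111, '2004-01-09', 300),
    (13, 3, 22222, '2004-01-11', 450);
""")

# Analytical query: filters on non-key attributes and joins two tables.
dwh_rows = conn.execute("""
    SELECT balance, age, sal, gender
    FROM customer_table, tx_table
    WHERE age BETWEEN 30 AND 40
      AND Education = 'graduate'
      AND customer_table.CustID = tx_table.Customer_ID
    ORDER BY tx_id
""").fetchall()
```

Only customer 1 is both a graduate and aged 30 to 40 in the sample data, so `dwh_rows` holds that customer's two transactions.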

17
Data Warehouse Vs. OLTP

OLTP                              DWH
Primary key used                  Primary key NOT used
No concept of primary index       Primary index used
Few rows returned                 Many rows returned
May use a single table            Uses multiple tables
High selectivity of query         Low selectivity of query
Indexing on primary key (unique)  Indexing on primary index (non-unique)
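The "few rows vs. many rows" contrast can be measured on synthetic data. The sketch below is an illustration under assumed data: 1000 hypothetical customers with three transactions each, ages spread evenly from 20 to 69, and account numbers of the form 23000 + customer number. It counts how many transaction rows each style of query returns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer_table (CustID INTEGER PRIMARY KEY, age INTEGER)")
conn.execute(
    "CREATE TABLE tx_table (tx_id INTEGER PRIMARY KEY, Customer_ID INTEGER, "
    "account_ID INTEGER, balance INTEGER)"
)

# Synthetic data: ages cycle through 20..69, three transactions per customer.
customers = [(i, 20 + i % 50) for i in range(1, 1001)]
txs = [
    (3 * (i - 1) + k, i, 23000 + i, 100 * k)
    for i in range(1, 1001)
    for k in range(1, 4)
]
conn.executemany("INSERT INTO customer_table VALUES (?, ?)", customers)
conn.executemany("INSERT INTO tx_table VALUES (?, ?, ?, ?)", txs)

total_rows = conn.execute("SELECT COUNT(*) FROM tx_table").fetchone()[0]

# OLTP style: one account, very few rows.
oltp_count = conn.execute(
    "SELECT COUNT(*) FROM tx_table WHERE account_ID = 23876"
).fetchone()[0]

# DWH style: a range predicate on a non-key column plus a join, many rows.
dwh_count = conn.execute(
    "SELECT COUNT(*) FROM customer_table, tx_table "
    "WHERE age BETWEEN 30 AND 40 "
    "AND customer_table.CustID = tx_table.Customer_ID"
).fetchone()[0]

oltp_fraction = oltp_count / total_rows
dwh_fraction = dwh_count / total_rows
```

In the table's terminology, the OLTP lookup is highly selective (it returns a tiny fraction of the rows), while the DWH query has low selectivity (it returns a large fraction).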

18
