DWH Spring 2011 Lecture Slides Week6&7

You might also like

Download as ppt, pdf, or txt
Download as ppt, pdf, or txt
You are on page 1of 18

Dr.

Abdul Basit Siddiqui


FUIEMS
(Lecture Slides Weeks # 6&7)
The need for ER modeling?
Problems with early COBOLian data processing
systems.

Data redundancies

From flat file to Table, each entity ultimately


becomes a Table in the physical schema.

Simple O(n2) Join to work with Tables

FUIEMS Data Warehousing - Spring2011


Why ER Modeling has been so successful?

Coupled with normalization drives out all the


redundancy out of the database.

Change (or add or delete) the data at just one


point.

Can be used with indexing for very fast access.

Resulted in success of OLTP systems.

FUIEMS Data Warehousing - Spring2011


Need for DM: Un-answered Qs
 Lets have a look at a typical ER data model first.

 Some Observations:
 All tables look-alike, as a consequence it is difficult to identify:

 Which table is more important ?

 Which is the largest?

 Which tables contain numerical measurements of the


business?

 Which table contain nearly static descriptive attributes?

FUIEMS Data Warehousing - Spring2011


Need for DM: Complexity of Representation
 Many topologies for the same ER diagram, all appearing
different.
 Very hard to visualize and remember.
12
7 6
3 12 7
11 4 8
8
9
1 10
10 9 11
6 1

 A large number 3
of possible 2 5
connections to 4any two (or
2 5
more) tables

FUIEMS Data Warehousing - Spring2011


Need for DM: The Paradox
 The Paradox: Trying to make information accessible using tables
resulted in an inability to query them!

 ER and Normalization result in large number of tables which are:


 Hard to understand by the users (DB programmers)

 Hard to navigate optimally by DBMS software

 Real value of ER is in using tables individually or in pairs

 Too complex for queries that span multiple tables with a large
number of records

FUIEMS Data Warehousing - Spring2011


ER vs. DM
ER DM
 Constituted to optimize  Constituted to optimize DSS
OLTP performance. query performance.
 Models the micro  Models the macro
relationships among data relationships among data
elements. elements with an overall
deterministic strategy.
A wild variability of the  All dimensions serve as equal
structure of ER models. entry points to the fact table.
 Changes in users' querying
 Very vulnerable to changes in habits can be accommodated
the user's querying habits, by automatic SQL generators.
because such schemas are
asymmetrical.

FUIEMS Data Warehousing - Spring2011


How to simplify a ER data model?

Two general methods:

 De-Normalization

 Dimensional Modeling (DM)

FUIEMS Data Warehousing - Spring2011


What is DM?

A simpler logical model optimized for decision


support.
Inherently dimensional in nature, with a single
central fact table and a set of smaller
dimensional tables.
Multi-part key for the fact table
Dimensional tables with a single-part PK.
Keys are usually system generated

FUIEMS Data Warehousing - Spring2011


What is DM?
Results in a star like structure, called star schema
or star join.

All relationships mandatory M-1.

Single path between any two levels.

Supports ROLAP operations.

FUIEMS Data Warehousing - Spring2011


Dimensions have Hierarchies

Items

Books Cloths

Fiction Text Men Women

Engg Medical

Analysts tend to look at the data through


dimension at a particular “level” in the
hierarchy

FUIEMS Data Warehousing - Spring2011


The two Schemas

Star
Snow-flake

FUIEMS Data Warehousing - Spring2011


“Simplified” 3NF (Retail)
CITY DISTRICT M DIVISION PROVINCE
1 district
1 1
zone M division
M DISTRICT DIVISION
ZONE CITY
1
store M week
1
STORE # STREET ZONE ... DATE WEEK
1 M
sale_header quarter
M M
RECEIPT # STORE # DATE ... MONTH QTR
1 1
M M
1
WEEK MONTH
M sale_detail month 1
RECEIPT # ITEM # ... $
YEAR QTR
1 M M
1 year
ITEM # CATEGORY
ITEM # SUPPLIER
item_x_cat M
1 item_x_splir
CATEGORY DEPT
cat_x_dept
FUIEMS Data Warehousing - Spring2011
Vastly Simplified Star Schema
Product Dim
Geography Dim
1 ITEM#
STORE# 1
Fact Table CATEGORY
ZONE
RECEIPT#
DEPT
CITY
STORE#
M SUPPLIER
DISTRICT
ITEM# M
DIVISION
DATE Time Dim
M
PROVINCE . DATE
. 1
facts . WEEK

Sale Rs. MONTH

QUARTER

YEAR
FUIEMS Data Warehousing - Spring2011
The Benefit of Simplicity

Beauty lies in close


correspondence with the
business, evident even to
business users.

FUIEMS Data Warehousing - Spring2011


Features of Star Schema
Dimensional hierarchies are collapsed into a single table
for each dimension. Loss of Information?

A single fact table created with a single header from the


detail records, resulting in:

A vastly simplified physical data model!

Fewer tables (thousands of tables in some ERP systems).

Fewer joins resulting in high performance.

Some requirement of additional space.

FUIEMS Data Warehousing - Spring2011


Quantifying space requirement
Quantifying use of additional space using star schema

There are about 10 million mobile phone users in Pakistan.


Say the top company has half of them = 500,000

Number of days in 1 year = 365


Number of calls recorded each day = 250,000 (assumed)
Maximum number of records in fact table = 91 billion rows
Assuming a relatively small header size = 128 bytes
Fact table storage used = 11 Tera bytes
Average length of city name = 8 characters  8 bytes
Total number of cities with telephone access = 170 (1 byte)
Space used for city name in fact table using Star = 8 x 0.091 = 0.728 TB
Space used for city code using snow-flake = 1x 0.091= 0.091 TB
Additional space used  0.637 Tera byte i.e. about 5.8%

FUIEMS Data Warehousing - Spring2011

You might also like