Professional Documents
Culture Documents
Data Warehousing - C03 - DM
Data Warehousing - C03 - DM
Data Warehousing - C03 - DM
Lecture-3
Dimensional Modeling (DM)
1
Dimensional Modeling (DM)
2
What is DM?
A simpler logical model optimized for decision
support.
Inherently dimensional in nature, with a single
central fact table and a set of smaller
dimensional tables.
3
What is DM?
4
Dimensions have Hierarchies
Items
Books Cloths
Engg Medical
5
The two Schemas
Star
Snow-flake
6
“Simplified” 3NF (Retail)
CITY DISTRICT M DIVISION PROVINCE
1 district BACK
1 1
zone M division
M DISTRICT DIVISION
ZONE CITY
1
store M week
1
STORE # STREET ZONE ... DATE WEEK
1 M
sale_header quarter
M M
RECEIPT # STORE # DATE ... MONTH ✞TR
1 1
M M
1
WEEK MONTH
M sale_detail month 1
RECEIPT # ITEM # ... $
YEAR ✞TR
1 M M
1 year
ITEM # CATEGORY
ITEM # SUPPLIER
item_x_cat M
1 item_x_splir
CATEGORY DEPT
cat_x_dept 7
Vastly Simplified Star Schema
Product Dim
Geography Dim
1 ITEM#
STORE# 1
Fact Table CATEGORY
ZONE
RECEIPT#
DEPT
CITY
STORE#
M SUPPLIER
DISTRICT
ITEM# M
DIVISION
DATE Time Dim
M
PROVINCE . DATE
. 1
facts . WEEK
QUARTER
YEAR
8
Features of Star Schema
10
The Process of Dimensional Modeling
Four Step Method from ER to DM
11
Step-1: Choose the Business Process
• A business process is a major operational
process in an organization.
• Typically supported by a legacy system
(database) or an OLTP.
– Examples: Orders, Invoices, Inventory etc.
12
Step-1: Separating the Process
Star-1
Snow-flake
Star-2
13
Step-2: Choosing the Grain
• Grain is the fundamental, atomic level of data to be
represented.
• Typical grains
– Individual Transactions
– Daily aggregates (snapshots)
– Monthly aggregates
15
The case FOR data aggregation
17
The case AGAINST data aggregation
• Aggregation can hide crucial facts.
–The average of 100 & 100 is same as 150 & 50
18
Aggregation hides crucial facts Example
19
Aggregation hides crucial facts chart
250
Z1 Z2 Z3 Z4
200
150
100
50
0
Week-1 Week-2 Week-3 Week-4
21
Step 3: Choose Facts
22
Step 4: Choose Dimensions
23
Step-4: How to Identify a Dimension?
• The single valued attributes during recording of a
transaction are dimensions.
Fact Table
Calendar_Date
Time_of_Day
Dim Account _No
ATM_Location
Transaction_Type
Transaction_Rs
25
Step-4: Can Dimensions be Multi-valued?
26
Step-4: Dimensions & Grain
🞤 Several grains are possible as per business
requirement.
28
Additive vs. Non-Additive facts
• Additive facts are easy to Month Crates of
Bottles Sold
work with
May 14
– Summing the fact value
gives meaningful results Jun. 20
– Additive facts: Jul. 24
• Quantity sold TOTAL 58
• Total Rs. sales
Month % discount
– Non-additive facts:
• Averages (average sales price, unit May 10
price) Jun. 8
• Percentages (% discount)
• Ratios (gross margin) Jul. 6
• Count of distinct products sold TOTAL 24% ← Incorrect!
29
Classification of Aggregation Functions
• How hard to compute aggregate from sub-
aggregates?
– Three classes of aggregates:
• Distributive
– Compute aggregate directly from sub-aggregates
– Examples: MIN, MAX ,COUNT, SUM
• Algebraic
– Compute aggregate from constant-sized summary of subgroup
– Examples: STDDEV, AVERAGE
– For AVERAGE, summary data for each group is SUM, COUNT
• Holistic
– Require unbounded amount of information about each subgroup
– Examples: MEDIAN, COUNT DISTINCT
– Usually impractical for a data warehouses!
30
Not recording Facts
32
Fact-less Fact Tables - Example
Examples:
– Department/Student mapping fact table
34
OLTP & Slowly Changing Dimensions
38
Handling Slowly Changing Dimensions
39
Handling Slowly Changing Dimensions
• Option-1: Overwrite History
– Example: Code for a city, product entered incorrectly
41
Handling Slowly Changing Dimensions
42
Pros and Cons of Handling
• Option-1: Overwrite existing value
+ Simple to implement
- No tracking of history