Professional Documents
Culture Documents
Dimen Modeling
Dimen Modeling
Krzysztof Dembczyński
Intelligent Decision Support Systems Laboratory (IDSS)
Poznań University of Technology, Poland
1 / 48
Review of the Previous Lecture
2 / 48
Outline
1 Motivation
3 Dimensional Modeling
4 Summary
3 / 48
Outline
1 Motivation
3 Dimensional Modeling
4 Summary
4 / 48
Motivation
5 / 48
Motivation
5 / 48
Motivation
5 / 48
Motivation
5 / 48
Motivation
5 / 48
Motivation
5 / 48
Motivation
5 / 48
Motivation
5 / 48
Example
6 / 48
Motivation
7 / 48
Outline
1 Motivation
3 Dimensional Modeling
4 Summary
8 / 48
Conceptual schemes of data warehouses
9 / 48
Three basic conceptual schemes
• Star schema,
• Snowflake schema,
• Fact constellations.
10 / 48
Star schema
Grade
id (PK)
student (FK)
academic year (FK)
instructor (FK)
lecture (FK)
Grade
Student Lecture
id (PK) id (PK)
first name name
last name type
city
country
secondary school
11 / 48
Star schema
12 / 48
Star schema
12 / 48
Star schema
12 / 48
Star schema
12 / 48
Star schema
12 / 48
Star schema
• Dimension tables
12 / 48
Star schema
• Dimension tables
I Represent information about dimensions (student, academic year, etc.).
12 / 48
Star schema
• Dimension tables
I Represent information about dimensions (student, academic year, etc.).
I Each dimension has a set of descriptive attributes.
12 / 48
Fact table
13 / 48
Fact table
13 / 48
Fact table
13 / 48
Fact table
13 / 48
Dimension tables
14 / 48
Dimension tables
14 / 48
Dimension tables
14 / 48
Dimension tables
14 / 48
Dimension tables
14 / 48
Dimension tables
14 / 48
Fact table vs. Dimension tables
• Fact table:
15 / 48
Fact table vs. Dimension tables
• Fact table:
I narrow,
15 / 48
Fact table vs. Dimension tables
• Fact table:
I narrow,
I big (many rows),
15 / 48
Fact table vs. Dimension tables
• Fact table:
I narrow,
I big (many rows),
15 / 48
Fact table vs. Dimension tables
• Fact table:
I narrow,
I big (many rows),
15 / 48
Fact table vs. Dimension tables
• Fact table:
I narrow,
I big (many rows),
• Dimension table
15 / 48
Fact table vs. Dimension tables
• Fact table:
I narrow,
I big (many rows),
• Dimension table
I wide,
15 / 48
Fact table vs. Dimension tables
• Fact table:
I narrow,
I big (many rows),
• Dimension table
I wide,
15 / 48
Fact table vs. Dimension tables
• Fact table:
I narrow,
I big (many rows),
• Dimension table
I wide,
15 / 48
Fact table vs. Dimension tables
• Fact table:
I narrow,
I big (many rows),
• Dimension table
I wide,
15 / 48
Dimension hierarchies
Semester Course
Instructor Student
Institute City
Faculty Country
16 / 48
Snowflake schema
17 / 48
Normalization
18 / 48
Denormalization
19 / 48
Fact constellation schema
20 / 48
Fact constellation schema
20 / 48
Fact constellation schema
20 / 48
Outline
1 Motivation
3 Dimensional Modeling
4 Summary
21 / 48
Dimensional modeling
22 / 48
Dimensional modeling
22 / 48
Dimensional modeling
22 / 48
Dimensional modeling
22 / 48
Dimensional modeling
22 / 48
Business process
23 / 48
Business process
23 / 48
Business process
23 / 48
Business process
23 / 48
Grain of the business process
24 / 48
Grain of the business process
24 / 48
Grain of the business process
24 / 48
Grain of the business process
24 / 48
Dimensions describing the business process
25 / 48
Dimensions describing the business process
25 / 48
Dimensions describing the business process
25 / 48
Dimensions describing the business process
25 / 48
Numeric measures
26 / 48
Numeric measures
26 / 48
Numeric measures
26 / 48
Numeric measures
26 / 48
Numeric measures
26 / 48
Numeric measures
26 / 48
Dimensional modeling techniques
27 / 48
Derived measures vs. Calculated measures
28 / 48
Derived measures vs. Calculated measures
28 / 48
Derived measures vs. Calculated measures
28 / 48
Describing dimension
29 / 48
Describing dimension
29 / 48
Date dimension
30 / 48
Date dimension
30 / 48
Date dimension
30 / 48
Date dimension
30 / 48
Date dimension
30 / 48
Date dimension
31 / 48
Date dimension
32 / 48
Date dimension
32 / 48
Date dimension
32 / 48
Date dimension
32 / 48
Date dimension
32 / 48
Surrogate keys
33 / 48
Surrogate keys
33 / 48
Surrogate keys
33 / 48
Surrogate keys
33 / 48
Surrogate keys
33 / 48
Surrogate keys
33 / 48
Surrogate keys
33 / 48
Degenerate dimensions
34 / 48
Degenerate dimensions
34 / 48
Degenerate dimensions
34 / 48
Degenerate dimensions
34 / 48
Degenerate dimensions
34 / 48
Degenerate dimensions
34 / 48
Degenerate dimensions
34 / 48
Degenerate dimensions
35 / 48
Degenerate dimensions
35 / 48
Degenerate dimensions
35 / 48
Role-playing dimensions
36 / 48
Role-playing dimensions
36 / 48
Junk dimension
• Some columns may not fit to any of the dimensions, but creating a
separate dimension for each of them would increase significantly the
size of fact table (a foreign key would be required for each such
column).
37 / 48
Junk dimension
• Some columns may not fit to any of the dimensions, but creating a
separate dimension for each of them would increase significantly the
size of fact table (a foreign key would be required for each such
column).
I Example: consider such attributes as payment method (cash and
credit card) and packing type (box, paper, or none).
37 / 48
Junk dimension
• Some columns may not fit to any of the dimensions, but creating a
separate dimension for each of them would increase significantly the
size of fact table (a foreign key would be required for each such
column).
I Example: consider such attributes as payment method (cash and
credit card) and packing type (box, paper, or none).
• A possible solution is to create a single junk dimension combining
these columns together.
37 / 48
Junk dimension
• Some columns may not fit to any of the dimensions, but creating a
separate dimension for each of them would increase significantly the
size of fact table (a foreign key would be required for each such
column).
I Example: consider such attributes as payment method (cash and
credit card) and packing type (box, paper, or none).
• A possible solution is to create a single junk dimension combining
these columns together.
• This dimension does not need to be the Cartesian product of all the
attributes’ possible values, but should only contain the combination of
values that actually occur in the source data.
37 / 48
Slowly changing dimensions
38 / 48
Slowly changing dimensions
38 / 48
Slowly changing dimensions
38 / 48
Slowly changing dimensions
38 / 48
Slowly changing dimensions
38 / 48
Slowly changing dimensions
38 / 48
Slowly changing dimensions
38 / 48
Slowly changing dimensions
38 / 48
Slowly changing dimensions
39 / 48
Slowly changing dimensions
39 / 48
Slowly changing dimensions
39 / 48
Slowly changing dimensions
39 / 48
Slowly changing dimensions
• Examples
39 / 48
Slowly changing dimensions
• Examples
I Wrong name of the street (correct approach),
39 / 48
Slowly changing dimensions
• Examples
I Wrong name of the street (correct approach),
I Change of customer address:
39 / 48
Slowly changing dimensions
40 / 48
Slowly changing dimensions
40 / 48
Slowly changing dimensions
40 / 48
Slowly changing dimensions
40 / 48
Slowly changing dimensions
40 / 48
Slowly changing dimensions
• Examples:
40 / 48
Slowly changing dimensions
• Examples:
I Nowak moves to Warsaw:
40 / 48
Slowly changing dimensions
• Examples:
I Nowak moves to Warsaw:
40 / 48
Slowly changing dimensions
• Examples:
I Nowak moves to Warsaw:
40 / 48
Slowly changing dimensions
41 / 48
Slowly changing dimensions
41 / 48
Slowly changing dimensions
41 / 48
Slowly changing dimensions
• Examples:
41 / 48
Slowly changing dimensions
• Examples:
I Administrative reform in Poland:
41 / 48
Slowly changing dimensions
• Examples:
I Administrative reform in Poland:
41 / 48
Mini dimension
42 / 48
Mini dimension
42 / 48
Mini dimension
42 / 48
Mini dimension
42 / 48
Factless fact tables
• Fact tables take the advantage of sparsity (much less data to store, if
events are rare).
43 / 48
Factless fact tables
• Fact tables take the advantage of sparsity (much less data to store, if
events are rare).
• Fact tables do not contain rows for non-events: no rows for products
that did not sell.
43 / 48
Factless fact tables
• Fact tables take the advantage of sparsity (much less data to store, if
events are rare).
• Fact tables do not contain rows for non-events: no rows for products
that did not sell.
• Factless Fact Tables:
43 / 48
Factless fact tables
• Fact tables take the advantage of sparsity (much less data to store, if
events are rare).
• Fact tables do not contain rows for non-events: no rows for products
that did not sell.
• Factless Fact Tables:
I Fact table without numeric measure columns.
43 / 48
Factless fact tables
• Fact tables take the advantage of sparsity (much less data to store, if
events are rare).
• Fact tables do not contain rows for non-events: no rows for products
that did not sell.
• Factless Fact Tables:
I Fact table without numeric measure columns.
I Dummy measure is included that always has value 1.
43 / 48
Factless fact tables
• Fact tables take the advantage of sparsity (much less data to store, if
events are rare).
• Fact tables do not contain rows for non-events: no rows for products
that did not sell.
• Factless Fact Tables:
I Fact table without numeric measure columns.
I Dummy measure is included that always has value 1.
I Describe relationships between dimensions.
43 / 48
Factless fact tables
• Fact tables take the advantage of sparsity (much less data to store, if
events are rare).
• Fact tables do not contain rows for non-events: no rows for products
that did not sell.
• Factless Fact Tables:
I Fact table without numeric measure columns.
I Dummy measure is included that always has value 1.
I Describe relationships between dimensions.
days?
43 / 48
Data warehouse bus architecture
Common dimensions
Warehouse
Promotion
Contract
Product
Shipper
Vendor
Store
Date
Business process
Retail sales * * * *
Retail inventory * * *
Retail deliveries * * *
Warehouse inventory * * * *
Warehouse deliveries * * * *
Purchase orders * * * * * *
...
44 / 48
Some other problems
45 / 48
Outline
1 Motivation
3 Dimensional Modeling
4 Summary
46 / 48
Summary
47 / 48
Bibliography
• R. Kimball and M. Ross. The Data Warehouse Toolkit: The Definitive Guide to
Dimensional Modeling, 3rd Edition.
John Wiley & Sons, 2013
• http://www.kimballgroup.com/
48 / 48