Lecture 04 DMS
Data Warehouse Design
Lecture 4: Data Warehouse & Dimensional Modeling
11/4/2023
Lecture Agenda
• Data warehouse
• Dimensional Modeling
• Star schema
• Four steps to build a dimension model
• Dimension tables & fact tables
• Allows business users to quickly access critical data from multiple sources, all in one place.
• Provides consistent information on cross-functional activities and supports ad-hoc reporting and queries.
• Integrates many sources of data, which reduces the query load on the production system.
• Restructures and integrates data so that it is easier for users to report on and analyze.
• Stores a large amount of historical data, so users can analyze different time periods and trends to make predictions.
Data Lake
[Diagram not reproduced. Recoverable labels: operational system, flat files, metadata, raw data, summary data, sales, inventory.]
No. | OLTP (relational model) | OLAP (dimensional model)
1 | Complex: different tables and relationships | Simple: each dimension table has a direct relationship with the fact table
2 | Flexible | Rigid
3 | Normalization is common | Repetition is allowed
4 | Data is updated frequently | Minimum number of joins: each dimension is reached by a single join to the fact table
5 | Table fields store the actual data | Dimensions and measures store the actual data
6 | Fundamental business tasks | Planning, problem solving, decision making
Dimensional Modeling
Star schema
Tables in a star schema are divided into two types:
• Dimension tables: describe business entities.
• Fact tables: store observations or events.
Comment:
• Dimension tables contain a relatively small number of rows.
• Fact tables can contain a very large number of rows and continue to grow over time.
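The split between the two table types can be sketched with Python's built-in sqlite3 module. This is a minimal illustration, not a schema from the lecture: the table names dim_product and fact_sales, their columns, and the sample rows are all assumptions.

```python
import sqlite3

# In-memory database; all table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension table: describes a business entity (few rows).
    CREATE TABLE dim_product (
        product_key  INTEGER PRIMARY KEY,
        product_name TEXT,
        category     TEXT
    );
    -- Fact table: one row per observed event (many rows, keeps growing).
    CREATE TABLE fact_sales (
        product_key INTEGER REFERENCES dim_product(product_key),
        quantity    INTEGER,
        amount      REAL
    );
""")
conn.executemany(
    "INSERT INTO dim_product VALUES (?, ?, ?)",
    [(1, "Milk", "Dairy"), (2, "Cheese", "Dairy"), (3, "Bread", "Bakery")],
)
conn.executemany(
    "INSERT INTO fact_sales VALUES (?, ?, ?)",
    [(1, 2, 3.0), (2, 1, 4.5), (3, 3, 6.0), (1, 1, 1.5)],
)

# A typical star-schema query: a single join from the fact table to a
# dimension, then aggregation by a dimension attribute.
rows = conn.execute("""
    SELECT d.category, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_product AS d ON d.product_key = f.product_key
    GROUP BY d.category
    ORDER BY d.category
""").fetchall()
print(rows)
```

The fact table holds only keys and numeric measures, while the descriptive text lives in the dimension table and is reached through one join.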
Example
Terms to Remember
Dimensional Modeling
1. Select the business process
2. Declare the grain
3. Identify the dimensions
4. Identify the facts
The first step in the design is to decide what business process to model by
combining an understanding of the business requirements with an understanding
of the available source data.
Business processes are the operational activities performed by your
organization.
Business process events generate or capture performance metrics that
translate into facts in a fact table.
Most fact tables focus on the results of a single business process.
Grain
Declaring the grain means specifying exactly what an individual fact table row
represents. It answers the question, ‘How do you describe a single row in the
fact table?’
For example:
• One row per bank account each month.
• One row per scan of an individual product on a customer’s sales transaction.
-> Declaring the grain is a critical step that can’t be taken lightly.
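One way to make a declared grain concrete is to check that no two fact rows share the same grain key. The sketch below uses the "one row per bank account each month" example; the function name grain_violations, the column names, and the sample data are hypothetical.

```python
from collections import Counter


def grain_violations(rows, grain_columns):
    """Return the grain-key values that occur in more than one row."""
    counts = Counter(tuple(row[c] for c in grain_columns) for row in rows)
    return [key for key, n in counts.items() if n > 1]


# Declared grain: one row per bank account each month (sample data is made up).
balances = [
    {"account_id": "A1", "month": "2023-10", "balance": 500.0},
    {"account_id": "A1", "month": "2023-11", "balance": 650.0},
    {"account_id": "A2", "month": "2023-11", "balance": 120.0},
    {"account_id": "A2", "month": "2023-11", "balance": 125.0},  # breaks the grain
]

violations = grain_violations(balances, ["account_id", "month"])
print(violations)
```

An empty result means every row matches the declared grain. Any key that is returned marks rows that describe the same account and month twice.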
Dimensions provide the “who, what, where, when, why, and how” context
surrounding a business process event.
A dimension should be single valued when associated with a given fact row.
Dimension tables are the “soul” of the data model.
Facts are the measurements that result from a business process event and are
almost always numeric.
A single fact table row has a one-to-one relationship to a measurement event,
as described by the fact table’s grain.
Use Case
After the business process has been identified, the design team faces a serious
decision about the granularity: what level of data detail should be made available
in the dimensional model?
In our case study, the most granular data is an individual product on a POS
transaction.
After the grain of the fact table has been chosen, the choice of dimensions is
straightforward.
You can ask whether other dimensions can be attributed to the POS
measurements, such as the date of the sale, the store where the sale occurred,
the promotion under which the product is sold, the cashier who handled the
sale, and potentially the method of payment.
The fourth and final step in the design is to make a careful determination of
which facts will appear in the fact table.
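The outcome of the four steps for the POS case study can be sketched as a single record type. The dimension keys follow the dimensions listed above. The two facts (sales_quantity and sales_amount), the class name PosSalesFact, and the sample values are assumptions made for illustration, since the slides do not list the chosen facts.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class PosSalesFact:
    """Grain: one row per individual product on a POS transaction."""
    # Dimension keys (the context of the event)
    date_key: int
    product_key: int
    store_key: int
    promotion_key: int
    cashier_key: int
    payment_method_key: int
    # Facts (numeric measurements; assumed for this sketch)
    sales_quantity: int
    sales_amount: float


def total_by(facts, dimension_key):
    """Sum sales_amount grouped by one of the dimension keys."""
    totals = defaultdict(float)
    for fact in facts:
        totals[getattr(fact, dimension_key)] += fact.sales_amount
    return dict(totals)


facts = [
    PosSalesFact(20231104, 1, 10, 0, 7, 1, 2, 3.0),
    PosSalesFact(20231104, 3, 10, 5, 7, 1, 1, 2.0),
    PosSalesFact(20231104, 1, 11, 0, 8, 2, 4, 6.0),
]

by_store = total_by(facts, "store_key")
print(by_store)
```

Because every row is at the same grain, the same facts can be summed along any dimension key by changing one argument.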
More details
Dimensional Modeling
Dimensional Modeling
Snowflake dimensions
Role-Playing Dimensions
Database Design
© Foreign Trade University. All rights reserved.
THANK YOU !
Surrogate Keys
• Type 1 SCD: always reflects the latest values. When a change is detected in the
source data, the dimension table row is simply overwritten, so no history is kept.
• Type 2 SCD: always versions the changes. Each change adds a new row, and
effective date, end date, and validity flag fields mark which version is current.
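The difference between the two types can be sketched in plain Python. This is a simplified illustration under stated assumptions: the dimension is a list of dictionaries, and the function names and the columns customer_id, city, effective_date, end_date, and is_current are hypothetical. A real Type 2 dimension would also give each version its own surrogate key, which is left out here.

```python
from datetime import date


def apply_type1(dimension_rows, natural_key, changes):
    """Type 1: overwrite the attributes in place; no history is kept."""
    for row in dimension_rows:
        if row["customer_id"] == natural_key:
            row.update(changes)


def apply_type2(dimension_rows, natural_key, changes, change_date):
    """Type 2: close the current version and append a new current version."""
    for row in dimension_rows:
        if row["customer_id"] == natural_key and row["is_current"]:
            # Close the old version.
            row["end_date"] = change_date
            row["is_current"] = False
            # Copy it, apply the changes, and open it as the new version.
            new_row = {**row, **changes,
                       "effective_date": change_date,
                       "end_date": None,
                       "is_current": True}
            dimension_rows.append(new_row)
            break


# Type 1: the old city is lost.
type1_dim = [{"customer_id": "C1", "city": "Hanoi"}]
apply_type1(type1_dim, "C1", {"city": "Da Nang"})

# Type 2: the old city survives as a closed version.
type2_dim = [{"customer_id": "C1", "city": "Hanoi",
              "effective_date": date(2023, 1, 1),
              "end_date": None, "is_current": True}]
apply_type2(type2_dim, "C1", {"city": "Da Nang"}, date(2023, 11, 4))

for row in type2_dim:
    print(row)
```

After the Type 2 change the dimension holds two rows for the same customer, and only the newer one has is_current set to True.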
Junk Dimensions