
11/4/2023

Data Science for Economics and Business

DATABASE MANAGEMENT SYSTEM

© Foreign Trade University. All rights reserved.

Data Warehouse Design
Lecture 4: Data Warehouse & Dimensional Modeling


Lecture Agenda

• Data warehouse
• Dimensional Modeling
• Star schema
• Four steps to build a dimensional model
• Dimension tables & fact tables


Section 1: Data Warehouse


What is a Data Warehouse?

• Data warehousing (DW) is the process of collecting and managing data from varied sources to provide meaningful business insights.
• A data warehouse is typically used to connect and analyze business data from heterogeneous sources.
• The data warehouse is the core of the BI system, which is built for data analysis and reporting.


Advantages of a Data Warehouse

• Allows business users to quickly access critical data from multiple sources, all in one place.
• Provides consistent information on various cross-functional activities and supports ad hoc reporting and queries.
• Integrates many sources of data, which reduces stress on the production system.
• Restructures and integrates the data so that it is easier for users to report on and analyze.
• Stores a large amount of historical data, which helps users analyze different time periods and trends to make future predictions.


Data Warehouse vs Data Lake


Data Lake
• Scalable, distributed file storage
• Flexible schema-on-read semantics
• Big data technology compatibility

Data Warehouse
• Relational schema modeling
• SQL-based querying
• Proven basis for reporting and analytics


Data Warehouse Layer


[Diagram: data flows left to right through five layers: Data Sources (operational systems, flat files) → Staging area → Warehouse (metadata, raw data, summary data) → Data Marts (Purchasing, Sales, Inventory) → Users]


General Data Platform Architecture


Section 2: Dimensional Modeling


Relational vs. Multi-dimensional Design

| # | Relational Database | Multi-Dimensional Cube |
|---|---|---|
| 1 | Complex: different tables and relationships | Simple: each dimension table has a direct relationship with the fact table |
| 2 | Flexible | Rigid |
| 3 | Normalization common | Repetition allowed |
| 4 | OLTP: data updated frequently | OLAP: minimum number of joins, since each dimension reaches the fact table through a single join |
| 5 | Table fields store actual data | Dimensions and measures store actual data |
| 6 | Fundamental business tasks | Planning, problem solving, decision making |


What is Dimensional Modeling?

The process of thinking through and designing the data model, including tables and their relationships.
Output: A diagram in which tables are connected to each other.

The process of designing and building the data model.


Why Dimensional Modeling?

• Deliver data that’s understandable for business users
• Deliver fast query performance


Convert the Story to a Data Model


List your big questions:
1. What is my Total Sales for a Selected Year and Region?
2. How is my Total Sales doing Year Over Year?
3. How are my Units trending for various States in my region?
4. How is my Sales doing by Channel, Device, Category for selected Year?
5. Which categories are performing best to worst by Total Sales ?

What are you measuring?
• Units
• Total Sales
• Gross Profit
These are Measures, which live in Fact tables.

How are you describing or slicing?
• By Time (Year, Month)
• By Geography (Region, State or City)
• By Campaign (Channel or Device)
These are Attributes, which live in Dimension tables.

© 2021 Microsoft. All rights reserved.
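
The split between measures and attributes can be sketched in a few lines of Python. This is only an illustration with made-up rows: `units` and `total_sales` play the role of measures, while `year` and `region` are the attributes used to slice them. The helper name `total_by` and all values are assumptions, not data from the lecture.

```python
from collections import defaultdict

# Hypothetical sales rows: attributes describe, measures quantify.
sales_rows = [
    {"year": 2022, "region": "North", "units": 10, "total_sales": 250.0},
    {"year": 2022, "region": "South", "units": 4, "total_sales": 100.0},
    {"year": 2023, "region": "North", "units": 12, "total_sales": 330.0},
    {"year": 2023, "region": "North", "units": 3, "total_sales": 70.0},
]


def total_by(rows, attributes, measure):
    """Sum one measure, grouped by the given attribute names."""
    totals = defaultdict(float)
    for row in rows:
        key = tuple(row[name] for name in attributes)
        totals[key] += row[measure]
    return dict(totals)


# Question 1: Total Sales for a selected Year and Region.
sales_by_year_region = total_by(sales_rows, ["year", "region"], "total_sales")
print(sales_by_year_region[(2023, "North")])
```

The same helper answers the other questions by changing only the attribute list, which is the point of keeping measures and attributes apart.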


Dimensional Modeling

Section 2.1 Star Schema


Star schema


What is the Star Schema Model?

Tables in a star schema model are divided into two types:
• Dimension tables: describe business entities
• Fact tables: store observations or events

Comment:
• Dimension tables contain a relatively small number of rows
• Fact tables can contain a very large number of rows and continue to grow over time
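
As a rough illustration of the two table types, the sketch below builds a tiny star schema in an in-memory SQLite database using Python's standard `sqlite3` module. The table names (`dim_date`, `dim_region`, `fact_sales`), columns, and rows are all hypothetical; the lecture does not prescribe them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: few rows, descriptive columns (assumed layout).
CREATE TABLE dim_date   (date_key INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
CREATE TABLE dim_region (region_key INTEGER PRIMARY KEY, region TEXT, state TEXT);

-- Fact table: one row per event, dimension keys plus numeric measures.
CREATE TABLE fact_sales (
    date_key    INTEGER REFERENCES dim_date(date_key),
    region_key  INTEGER REFERENCES dim_region(region_key),
    units       INTEGER,
    total_sales REAL
);

INSERT INTO dim_date VALUES (20220115, 2022, 1), (20230115, 2023, 1), (20230220, 2023, 2);
INSERT INTO dim_region VALUES (1, 'North', 'State A'), (2, 'South', 'State B');
INSERT INTO fact_sales VALUES
    (20230115, 1, 10, 200.0),
    (20230220, 1, 5, 100.0),
    (20230220, 2, 2, 40.0),
    (20220115, 1, 7, 140.0);
""")

# Dimensions filter and group; the fact table is summarized.
star_query = """
SELECT d.year, r.region, SUM(f.total_sales) AS total_sales
FROM fact_sales AS f
JOIN dim_date   AS d ON d.date_key = f.date_key
JOIN dim_region AS r ON r.region_key = f.region_key
WHERE d.year = 2023
GROUP BY d.year, r.region
ORDER BY r.region
"""
result = conn.execute(star_query).fetchall()
print(result)
```

Each dimension is one join away from the fact table, which is what gives the schema its star shape.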


Example


Terms to Remember

• Fact table → Grain
• Dimension table
• Relationship between fact and dimension table → Cardinality

→ Dimension tables support filtering and grouping
→ Fact tables support summarization
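
To make the term cardinality concrete, here is a small sketch that inspects how often a key appears on each side of a relationship. In a typical star schema the dimension side is "one" and the fact side is "many". The function name `relationship_cardinality` and the sample rows are assumptions made for this example.

```python
from collections import Counter

# Hypothetical tables sharing the key column "product_key".
dim_product = [
    {"product_key": 1, "name": "Milk"},
    {"product_key": 2, "name": "Bread"},
]
fact_sales = [
    {"product_key": 1, "units": 3},
    {"product_key": 1, "units": 1},
    {"product_key": 2, "units": 5},
]


def relationship_cardinality(dim_rows, fact_rows, key):
    """Describe the relationship as '<dimension side>-to-<fact side>'."""
    def side(rows):
        counts = Counter(row[key] for row in rows)
        return "one" if all(n == 1 for n in counts.values()) else "many"

    return f"{side(dim_rows)}-to-{side(fact_rows)}"


cardinality = relationship_cardinality(dim_product, fact_sales, "product_key")
print(cardinality)
```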


Dimensional Modeling

Section 2.2 Design Process


Four steps to build a dimensional model

1. Select the Business Process
2. Declare the Grain
3. Identify the Dimensions
4. Identify the Facts


Business Process in Dimensional Model

The first step in the design is to decide what business process to model by combining an understanding of the business requirements with an understanding of the available source data.
• Business processes are the operational activities performed by your organization.
• Business process events generate or capture performance metrics that translate into facts in a fact table.
• Most fact tables focus on the results of a single business process.
• Choosing the process is important because it defines a specific design target and allows the grain, dimensions, and facts to be declared.


Grain

Declaring the grain means specifying exactly what an individual fact table row represents. It answers the question, ‘How do you describe a single row in the fact table?’

For example:
• One row per bank account each month.
• One row per scan of an individual product on a customer’s sales transaction.

→ Declaring the grain is a critical step that can’t be taken lightly.
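
One practical consequence of a declared grain is that it can be checked: no two fact rows should describe the same thing. The sketch below assumes a grain of one row per product per transaction and reports any combination that appears more than once. The column names and the helper `grain_violations` are hypothetical.

```python
from collections import Counter

# Assumed grain: one row per product per POS transaction.
GRAIN = ("transaction_id", "product_key")

fact_rows = [
    {"transaction_id": "T1", "product_key": 1, "quantity": 2},
    {"transaction_id": "T1", "product_key": 2, "quantity": 1},
    {"transaction_id": "T2", "product_key": 1, "quantity": 4},
]


def grain_violations(rows, grain):
    """Return the grain-column combinations that occur in more than one row."""
    counts = Counter(tuple(row[column] for column in grain) for row in rows)
    return [key for key, n in counts.items() if n > 1]


violations = grain_violations(fact_rows, GRAIN)
print(violations)

# A second row for the same transaction and product breaks the declared grain.
broken_rows = fact_rows + [{"transaction_id": "T1", "product_key": 1, "quantity": 9}]
print(grain_violations(broken_rows, GRAIN))
```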


Dimensions for Descriptive Context

• Dimensions provide the “who, what, where, when, why, and how” context surrounding a business process event.
• A dimension should be single valued when associated with a given fact row.
• Dimension tables are the “SOUL” of the data model.


Facts for Measurements

• Facts are the measurements that result from a business process event and are almost always numeric.
• A single fact table row has a one-to-one relationship to a measurement event as described by the fact table’s grain.


Use Case

Imagine you work in the headquarters of a large grocery chain. The business has 100 grocery stores spread across five states. Each store has a full complement of departments, including grocery, frozen foods, dairy, meat, produce, and bakery. Each store has approximately 60,000 individual products, called stock keeping units (SKUs), on its shelves.

Sample cash register receipt



Step 1 : Select the Business Process

Management wants to better understand customer purchases as captured by the POS system. Which business process should you model?

Point-of-sale (POS) system


Step 2: Declare the Grain

After the business process has been identified, the design team faces a serious decision about granularity: what level of data detail should be made available in the dimensional model?

In our case study, the most granular data is an individual product on a POS
transaction.


Step 3: Identify the Dimensions

After the grain of the fact table has been chosen, the choice of dimensions is
straightforward.
You can ask whether other dimensions can be attributed to the POS
measurements, such as the date of the sale, the store where the sale occurred,
the promotion under which the product is sold, the cashier who handled the
sale, and potentially the method of payment.


Step 4: Identify the Facts

The fourth and final step in the design is to make a careful determination of
which facts will appear in the fact table.
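
Putting the four steps together for the grocery use case, one possible fact table could look like the sketch below. The dimensions follow the candidates named in Step 3 plus the product from the grain. The two facts (`sales_quantity`, `extended_sales_amount`) and every table and column name are assumptions chosen for illustration, since the slides do not list them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_date           (date_key INTEGER PRIMARY KEY, full_date TEXT);
CREATE TABLE dim_product        (product_key INTEGER PRIMARY KEY, sku TEXT, department TEXT);
CREATE TABLE dim_store          (store_key INTEGER PRIMARY KEY, store_name TEXT, state TEXT);
CREATE TABLE dim_promotion      (promotion_key INTEGER PRIMARY KEY, promotion_name TEXT);
CREATE TABLE dim_cashier        (cashier_key INTEGER PRIMARY KEY, cashier_name TEXT);
CREATE TABLE dim_payment_method (payment_method_key INTEGER PRIMARY KEY, description TEXT);

-- Grain: one row per individual product on a POS transaction.
CREATE TABLE fact_retail_sales (
    date_key               INTEGER NOT NULL REFERENCES dim_date(date_key),
    product_key            INTEGER NOT NULL REFERENCES dim_product(product_key),
    store_key              INTEGER NOT NULL REFERENCES dim_store(store_key),
    promotion_key          INTEGER NOT NULL REFERENCES dim_promotion(promotion_key),
    cashier_key            INTEGER NOT NULL REFERENCES dim_cashier(cashier_key),
    payment_method_key     INTEGER NOT NULL REFERENCES dim_payment_method(payment_method_key),
    pos_transaction_number TEXT    NOT NULL,
    sales_quantity         INTEGER NOT NULL,  -- assumed fact
    extended_sales_amount  REAL    NOT NULL   -- assumed fact
);
""")

# List the fact table's columns: dimension keys first, then the measures.
fact_columns = [row[1] for row in conn.execute("PRAGMA table_info(fact_retail_sales)")]
print(fact_columns)
```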


Step 4: Identify the Facts

More details


Dimensional Modeling

Section 2.3 Basic Fact Table Techniques


Basic Fact Table Techniques

• Fact table structure
• Types of fact measures:
o Additive
o Semi-additive
o Non-additive
• Types of fact tables:
o Transaction fact table
o Periodic snapshot fact table
o Accumulating snapshot fact table
o Factless fact table
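
The three types of fact measures differ in how safely they can be summed. The sketch below uses invented sales rows and account-balance snapshots to show one example of each; all names and numbers are assumptions.

```python
# Hypothetical transaction rows and daily balance snapshots.
sales = [
    {"day": 1, "store": "S1", "amount": 100.0, "quantity": 4},
    {"day": 1, "store": "S2", "amount": 50.0, "quantity": 1},
    {"day": 2, "store": "S1", "amount": 30.0, "quantity": 2},
]
balances = [
    {"day": 1, "account": "A", "balance": 500.0},
    {"day": 1, "account": "B", "balance": 200.0},
    {"day": 2, "account": "A", "balance": 450.0},
    {"day": 2, "account": "B", "balance": 250.0},
]

# Additive: a sales amount can be summed across every dimension.
total_amount = sum(row["amount"] for row in sales)


# Semi-additive: balances add up across accounts for one day,
# but summing them across days would count the same money twice.
def total_balance_on(day):
    return sum(row["balance"] for row in balances if row["day"] == day)


closing_balance = total_balance_on(2)

# Non-additive: a unit price is a ratio, so it is recomputed from
# additive parts instead of being summed or averaged row by row.
average_unit_price = total_amount / sum(row["quantity"] for row in sales)

print(total_amount, closing_balance, round(average_unit_price, 2))
```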


Factless Fact Tables

• Do not include any measure column
• Store only the relationships between dimensions → good practice for defining a many-to-many relationship between dimensions
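
A factless fact table can be pictured as a list of key combinations with nothing to sum. The example below is a hypothetical "promotion coverage" table recording which product was on which promotion in which store; analysis is done by counting or listing rows. All names and keys are made up.

```python
# Each row holds only dimension keys; there is no measure column.
fact_promotion_coverage = [
    {"product_key": 1, "store_key": 10, "promotion_key": 100},
    {"product_key": 2, "store_key": 10, "promotion_key": 100},
    {"product_key": 1, "store_key": 11, "promotion_key": 101},
]


def products_on_promotion(rows, promotion_key):
    """List the products linked to a promotion (a many-to-many relationship)."""
    return sorted({row["product_key"] for row in rows if row["promotion_key"] == promotion_key})


# Counting rows replaces summing a measure.
rows_per_store = {}
for row in fact_promotion_coverage:
    rows_per_store[row["store_key"]] = rows_per_store.get(row["store_key"], 0) + 1

print(products_on_promotion(fact_promotion_coverage, 100))
print(rows_per_store)
```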


Dimensional Modeling

Section 2.4 Basic Dimension Table Techniques


Basic Dimension Table Techniques

• Dimension table structure
• Surrogate key in dimension table
• Types of dimension:
o Snowflake dimensions
o Role-playing dimensions
o (optional, self-study) Slowly changing dimensions
o (optional, self-study) Junk dimensions


Snowflake dimensions

• A snowflake dimension is a set of normalized tables for a single business entity


Should we use Snowflake dimensions?

• Longer relationship filter propagation chains → less efficient than filters applied within a single table
• More tables in the Fields pane → less intuitive experience
• Impossible to create a hierarchy that spans the tables → cannot use the drill-down feature across them
• More tables need to be loaded → less efficient from a storage and performance perspective


De-normalize Snowflake dimensions!
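
De-normalizing a snowflake dimension means folding the chain of normalized tables into one wide dimension table. The sketch below assumes a hypothetical product → subcategory → category chain; the function `denormalize` and all sample values are illustrative only.

```python
# Normalized (snowflaked) tables for one business entity: the product.
category = {1: {"category": "Dairy"}, 2: {"category": "Bakery"}}
subcategory = {
    10: {"subcategory": "Milk", "category_id": 1},
    20: {"subcategory": "Bread", "category_id": 2},
}
product = [
    {"product_key": 100, "product": "Whole milk 1L", "subcategory_id": 10},
    {"product_key": 200, "product": "Baguette", "subcategory_id": 20},
]


def denormalize(products, subcategories, categories):
    """Flatten the chain into one row per product with all descriptive columns."""
    flat = []
    for item in products:
        sub = subcategories[item["subcategory_id"]]
        cat = categories[sub["category_id"]]
        flat.append({
            "product_key": item["product_key"],
            "product": item["product"],
            "subcategory": sub["subcategory"],
            "category": cat["category"],
        })
    return flat


dim_product = denormalize(product, subcategory, category)
print(dim_product[0])
```

The flattened table repeats category names across rows, which is the accepted trade-off: some redundancy in exchange for a single table that supports hierarchies and direct filtering.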


Role-Playing Dimensions

• A role-playing dimension is a dimension that can filter related facts differently
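
A common example of a role-playing dimension is a date table that a fact table references twice, once as the order date and once as the ship date. The sketch below shows this with Python's `sqlite3` module; the tables, columns, and rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, full_date TEXT, month TEXT);
CREATE TABLE fact_orders (
    order_id       INTEGER,
    order_date_key INTEGER REFERENCES dim_date(date_key),
    ship_date_key  INTEGER REFERENCES dim_date(date_key),
    amount         REAL
);
INSERT INTO dim_date VALUES
    (20231030, '2023-10-30', '2023-10'),
    (20231102, '2023-11-02', '2023-11');
INSERT INTO fact_orders VALUES
    (1, 20231030, 20231102, 120.0),
    (2, 20231102, 20231102, 80.0);
""")

ROLE_COLUMNS = ("order_date_key", "ship_date_key")


def amount_by_month(role_column):
    """Sum order amounts by month, with dim_date playing the chosen role."""
    if role_column not in ROLE_COLUMNS:
        raise ValueError(f"unknown role column: {role_column}")
    query = f"""
        SELECT d.month, SUM(f.amount)
        FROM fact_orders AS f
        JOIN dim_date AS d ON d.date_key = f.{role_column}
        GROUP BY d.month
        ORDER BY d.month
    """
    return conn.execute(query).fetchall()


by_order_month = amount_by_month("order_date_key")
by_ship_month = amount_by_month("ship_date_key")
print(by_order_month)
print(by_ship_month)
```

The same dimension rows produce different groupings depending on which key column carries the join, which is what "filtering related facts differently" means in practice.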


Role-Playing Dimensions


Database Design

It’s time for your questions


THANK YOU !


Surrogate Keys

• A surrogate key is a unique identifier that you add to a table to support star schema modeling. By definition, it's not defined or stored in the source data.
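
A minimal sketch of surrogate key assignment follows. It assumes the source system identifies customers by a natural key called `customer_code`, and it generates a sequential integer `customer_key` while loading the dimension. Both names, and the helper `build_dimension`, are hypothetical.

```python
from itertools import count


def build_dimension(source_rows, natural_key, surrogate_name):
    """Assign a new integer surrogate key to each distinct natural key."""
    next_key = count(start=1)
    key_map = {}
    dimension = []
    for row in source_rows:
        natural_value = row[natural_key]
        if natural_value not in key_map:
            key_map[natural_value] = next(next_key)
            dimension.append({surrogate_name: key_map[natural_value], **row})
    return dimension, key_map


source_customers = [
    {"customer_code": "C-001", "name": "An"},
    {"customer_code": "C-002", "name": "Binh"},
    {"customer_code": "C-001", "name": "An"},  # duplicate in the source extract
]
dim_customer, customer_keys = build_dimension(source_customers, "customer_code", "customer_key")

# Fact rows then carry the surrogate key instead of the source identifier.
source_sales = [{"customer_code": "C-002", "amount": 75.0}]
fact_sales = [
    {"customer_key": customer_keys[sale["customer_code"]], "amount": sale["amount"]}
    for sale in source_sales
]
print(dim_customer)
print(fact_sales)
```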


Star Schema: Slowly changing dimensions

• A slowly changing dimension (SCD) is a dimension table whose values change over time
• Why does a dimension table change? Because business entity values change over time, but “SLOWLY” and in an ad hoc manner
• Rapidly changing? Consider storing that data in a fact table instead!


Types of Slowly Changing Dimensions

• Type 1 SCD: always reflects the latest values. When changes in the source data are detected, the dimension table data is simply overwritten.
• Type 2 SCD: keeps every version of the values by using effective date, end date, and valid-flag fields.
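
The difference between the two types can be sketched with a one-row customer dimension. The column names (`effective_date`, `end_date`, `is_current`) are one possible reading of the effective date, end date, and valid fields mentioned above; they, the helper functions, and the sample customer are assumptions.

```python
from datetime import date


def apply_type1(dimension, customer_code, changes):
    """Type 1: overwrite the attributes in place; no history is kept."""
    for row in dimension:
        if row["customer_code"] == customer_code:
            row.update(changes)


def apply_type2(dimension, customer_code, changes, change_date):
    """Type 2: close the current version and append a new current version."""
    current = next(
        row for row in dimension
        if row["customer_code"] == customer_code and row["is_current"]
    )
    new_version = {
        **current,
        **changes,
        "customer_key": max(row["customer_key"] for row in dimension) + 1,
        "effective_date": change_date,
        "end_date": None,
        "is_current": True,
    }
    current["end_date"] = change_date
    current["is_current"] = False
    dimension.append(new_version)


def new_customer_dim():
    return [{
        "customer_key": 1, "customer_code": "C-001", "city": "Hanoi",
        "effective_date": date(2020, 1, 1), "end_date": None, "is_current": True,
    }]


type1_dim = new_customer_dim()
apply_type1(type1_dim, "C-001", {"city": "Da Nang"})

type2_dim = new_customer_dim()
apply_type2(type2_dim, "C-001", {"city": "Da Nang"}, date(2023, 11, 4))

print(len(type1_dim), len(type2_dim))
```

After the Type 2 change, fact rows loaded earlier still point at `customer_key` 1 (the Hanoi version), while new fact rows use key 2, so both current and past context stay queryable.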


Types of Slowly Changing Dimensions

A good model should support versioning and allow both current and past data to be queried from the fact table.


Junk Dimensions
