Part 1-3 Dimensional Modeling
1.3 DIMENSIONAL MODELING
PART 1.3
Objectives
By the end of this part, you should be able to:
▪ Explain the main concept of dimensional modeling.
▪ Define and describe the components of a dimensional model (data marts, dimensions, facts, …).
▪ Differentiate between a star schema and a snowflake schema.
▪ Explain the concept of the DW bus architecture.
▪ Design a dimensional model.
Dimensional Modeling – Overall Perspective
Dimensional modeling is a technique used for the logical design of data warehouses.
▪ It differs from ER modeling.
▪ ER is conceptual, not logical, modeling.
▪ Some books misunderstand ER modeling.
o The industry sometimes refers to 3NF models as ER models.
Fact tables
▪ Contain measurements of the business (numerical)
▪ Contain value data gleaned from the source
▪ Contain foreign keys
▪ Have one tuple per recorded fact
Dimension tables
▪ Each is one of a set of companion tables to the fact table
▪ Hierarchical
▪ Must be joined to the fact table to produce results
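The two table types can be illustrated with a minimal sketch using Python's built-in sqlite3 module. The table names (`product_dim`, `sales_fact`), the columns, and the sample rows are illustrative assumptions, not taken from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension table: descriptive attributes, one row per product.
    CREATE TABLE product_dim (
        prod_id   INTEGER PRIMARY KEY,
        prod_name TEXT,
        category  TEXT
    );
    -- Fact table: numeric measurements plus foreign keys, one row per fact.
    CREATE TABLE sales_fact (
        prod_id INTEGER REFERENCES product_dim(prod_id),
        amount  REAL
    );
    INSERT INTO product_dim VALUES (1, 'Corn flakes', 'Breakfast foods'),
                                   (2, 'Apples', 'Produce');
    INSERT INTO sales_fact VALUES (1, 10.0), (1, 5.0), (2, 7.5);
""")

# The fact table alone holds only keys and numbers; joining it to the
# dimension supplies the labels that make the result readable.
rows = conn.execute("""
    SELECT d.category, SUM(f.amount)
    FROM sales_fact AS f
    JOIN product_dim AS d ON d.prod_id = f.prod_id
    GROUP BY d.category
    ORDER BY d.category
""").fetchall()
print(rows)
```

The query returns one total per product category, which is only possible because the fact rows were joined to the dimension.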
[Figure: star schema with a central Sales fact table and four dimension tables]
Sales (FACT TABLE): Region_ID, Prod_ID, T_ID, Ch_ID, Amount, …
Time: T_ID, Hour, Day, Month, …
Product: Prod_ID, Prod_name, …
Region: Region_ID, Region_name, Subregion_name, city, …
Channel: Ch_ID, C_Name, …
[Figure: data cube with three dimensions: TIME (Q1, Q2, Q3), REGION (Northwest: Washington, Oregon; West: California, Arizona), and PRODUCT (breakfast foods, produce). Visible cell values: Washington 130, 70; Oregon 120, 57; California 170, 70; Arizona 110, 77.]
Sample Reports Based on Dimensional Models
Region      Sales
Northwest   $1,557M
West        $1,771M
Region      Quarter   Sales
Northwest   Q1        $377M
Northwest   Q2        $370M
Northwest   Q3        $425M
Northwest   Q4        $385M
West        Q1        $427M
West        Q2        $437M
West        Q3        $473M
West        Q4        $434M
Region      Quarter   Product Line      Sales
Northwest   Q2        Breakfast Foods   $225M
Northwest   Q2        Produce           $145M
Northwest   Q3        Breakfast Foods   $275M
Northwest   Q3        Produce           $150M
Dimensions: Time, Region, Product
Facts: Sales
Drilling-down; Rolling-up
▪ Roughly equivalent to SQL GROUP BY clauses, together with ORDER BY
▪ All columns in dimension tables can become report headers
▪ Nice and easy if hierarchies are compatible across groups
▪ Not so easy if they are not
▪ If hierarchies are incompatible, all of the info needs to be included in the dimension tables, and may come from multiple logical OLTP tables
▪ Performance-wise, this should not have much of an impact
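Rolling up and drilling down can be sketched as grouping the same fact rows by fewer or more dimension columns. The figures below are the quarterly sales from the sample reports; the helper name `roll_up` is an assumption made for illustration.

```python
from collections import defaultdict

# (region, quarter, sales in $M), copied from the quarterly sample report.
quarterly_sales = [
    ("Northwest", "Q1", 377), ("Northwest", "Q2", 370),
    ("Northwest", "Q3", 425), ("Northwest", "Q4", 385),
    ("West", "Q1", 427), ("West", "Q2", 437),
    ("West", "Q3", 473), ("West", "Q4", 434),
]

def roll_up(rows, key_indexes):
    """Sum the last column, grouped by the columns in key_indexes."""
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[i] for i in key_indexes)
        totals[key] += row[-1]
    return dict(totals)

by_region = roll_up(quarterly_sales, [0])             # rolled up to region
by_region_quarter = roll_up(quarterly_sales, [0, 1])  # drilled down to quarter
print(by_region)
```

Grouping by region alone reproduces the totals of the first sample report ($1,557M and $1,771M); adding the quarter to the grouping key drills back down to the second report.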
Multiple hierarchies
STORE: Store_ID, S_Name, S_city, S_state, S_zip_code, S_country, S_sales region, S_sales zone, S_distribution center, S_distribution region
What kind of analysis can be performed here?
[Figure: star schema]
Fact table: Amount, Quantity, Cust_id, Prod_id, Store_id, Sale_date
PRODUCT: Prod_id, Prod_name, Desc, …
TIME: Sale_date
CUSTOMER: Cust_id, Last_name, First_name, Address, …
STORE: Store_id, Name, City, State, Country
[Figure: star schema around the SALES fact table (Amount, Quantity, Cust_id, Prod_id, Store_id, Sale_date), annotated with the analysis each dimension supports]
PRODUCT (Prod_id, Prod_name, Desc, Category, …): analyze purchasing patterns by product and groups of products
TIME (Sale_date)
STORE (Store_id, Name, City, State, Country): categorize transactions by store, including location info
CUSTOMER (Cust_id, Last_name, First_name, Address, …): analyze purchases by customer, such as purchasing frequency and purchases by customer location
Dimension tables are used to guide the selection of rows from the fact table: how many X products were sold in Y timeframe by Z stores to customers living in W cities?
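The "X products, Y timeframe, Z stores, W cities" question can be sketched as a single query in which each dimension contributes one filter. This is a minimal sketch using Python's sqlite3 module. The sample rows are invented, and a `city` column on the customer table is an assumption (the slide only shows Address).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product  (prod_id INTEGER PRIMARY KEY, prod_name TEXT);
    CREATE TABLE store    (store_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE customer (cust_id INTEGER PRIMARY KEY, city TEXT);
    CREATE TABLE sales (
        cust_id INTEGER, prod_id INTEGER, store_id INTEGER,
        sale_date TEXT, quantity INTEGER
    );
    INSERT INTO product  VALUES (1, 'Widget'), (2, 'Gadget');
    INSERT INTO store    VALUES (10, 'Downtown'), (20, 'Airport');
    INSERT INTO customer VALUES (100, 'Portland'), (200, 'Seattle');
    INSERT INTO sales VALUES
        (100, 1, 10, '2024-01-15', 3),
        (200, 1, 10, '2024-01-20', 4),
        (100, 1, 20, '2024-01-25', 5),
        (100, 2, 10, '2024-01-30', 6),
        (100, 1, 10, '2024-03-01', 7);
""")

# Each dimension narrows the set of fact rows that get summed.
query = """
    SELECT COALESCE(SUM(s.quantity), 0)
    FROM sales s
    JOIN product  p  ON p.prod_id   = s.prod_id
    JOIN store    st ON st.store_id = s.store_id
    JOIN customer c  ON c.cust_id   = s.cust_id
    WHERE p.prod_name = ?                -- X: product
      AND s.sale_date BETWEEN ? AND ?    -- Y: timeframe
      AND st.name = ?                    -- Z: store
      AND c.city = ?                     -- W: customer city
"""
units = conn.execute(
    query, ("Widget", "2024-01-01", "2024-01-31", "Downtown", "Portland")
).fetchone()[0]
print(units)
```

Only one of the five fact rows satisfies all four filters, so the query counts the units from that row alone.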
Multi-dimensional models take advantage of inherent relationships in data to populate multidimensional matrices called data cubes
❑ Data cubes may be called hyper-cubes if they have >3 dimensions
❑ These cubes may be queried directly
❑ Changing from one dimension to another is called pivoting
Multi-dimensional models readily lend themselves to hierarchical views
❑ Roll Up
  ❑ Goes higher up in the hierarchy
  ❑ May aggregate data
❑ Drill Down
  ❑ Goes finer (lower) in the hierarchy
➢ Various Multidimensional Schemas
❑ Star Schema
▪ A fact table with a single table for each dimension
❑ Snowflake schema
▪ Variation of the star schema
▪ Dimensional tables are organized into a hierarchy by normalizing them
▪ This allows the finest granularity in queries
❑ Fact Constellation
▪ Set of fact tables that share some dimension tables
▪ Limits the possible queries for the warehouse
Snowflake Schema
[Figure: snowflake schema in which dimension tables are normalized into further tables]
FACT TABLE: amount, qty, cust_id, prod_id, store_id, sale_date
Product: prod_id, prod_name, desc
Prod_category: category_id, description
Time: sale_date
Sale_period: period_id, from_date, to_date
Customer: cust_id, last_name, first_name, Address, category_id
Cust_category: category_id, description
Store: store_id, name, city, state, Country, region_id
Region: region_id, description
[Figure: the Store dimension modeled two ways]
Snowflaked version
SALES: Amount, Quantity, Cust_id, Prod_id, Store_id, Sale_date
Store: Store_id, Name, City, State, Country, Community_id
Normalized tables chained from Store, carrying the keys community_id, territory_id, metro_id, and region_id, each with a description
vs.
Star Schema
SALES: Amount, Quantity, Cust_id, Prod_id, Store_id, Sale_date
Store: Store_id, Name, City, State, Country, Community, Metro, Territory, Region
➢ Snowflaking is OK sometimes
➢ However, keep in mind that snowflaking may drastically affect performance
[Figure: SALES fact table with Product, Time, Customer, and Store dimensions plus a Transaction table]
SALES: Amount, Qty, Cust_id, Prod_id, Store_id, Sale_date
Product: Prod_id, Prod_name, Desc, Category
Time: Sale_date
Customer: Cust_id, Last_name, First_name, Address, …
Transaction: Tran_id, Store_id, Cust_id, Date, …
Store: Store_id, Name, City, State, Country
Compared with a snowflake schema, a star schema is:
▪ Easier to understand and navigate
▪ Better performing, due to fewer joins
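The difference in join count between a denormalized and a normalized dimension can be sketched with sqlite3. The table names (`store_star`, `store_snow`, `region`) and the rows are assumptions for illustration. The same question, sales by region, needs one join in the star layout and two in the snowflaked layout.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (store_id INTEGER, amount INTEGER);
    INSERT INTO sales VALUES (1, 100), (2, 50), (3, 25);

    -- Star: region is a plain column of the store dimension.
    CREATE TABLE store_star (
        store_id INTEGER PRIMARY KEY, name TEXT, region TEXT
    );
    INSERT INTO store_star VALUES
        (1, 'A', 'West'), (2, 'B', 'West'), (3, 'C', 'Northwest');

    -- Snowflake: region is normalized into its own table.
    CREATE TABLE region (region_id INTEGER PRIMARY KEY, description TEXT);
    CREATE TABLE store_snow (
        store_id INTEGER PRIMARY KEY, name TEXT, region_id INTEGER
    );
    INSERT INTO region VALUES (1, 'West'), (2, 'Northwest');
    INSERT INTO store_snow VALUES (1, 'A', 1), (2, 'B', 1), (3, 'C', 2);
""")

star_sql = """
    SELECT st.region, SUM(s.amount) FROM sales s
    JOIN store_star st ON st.store_id = s.store_id
    GROUP BY st.region ORDER BY st.region
"""
snowflake_sql = """
    SELECT r.description, SUM(s.amount) FROM sales s
    JOIN store_snow st ON st.store_id = s.store_id
    JOIN region r ON r.region_id = st.region_id
    GROUP BY r.description ORDER BY r.description
"""
star_result = conn.execute(star_sql).fetchall()
snowflake_result = conn.execute(snowflake_sql).fetchall()
print(star_result == snowflake_result)
```

Both queries return the same totals; the snowflaked version simply pays for an extra join at every level of the hierarchy it has to climb.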
The Data Warehouse Bus Architecture
[Figure: the PURCHASE ORDERS, INVENTORY, and SALES data marts plugged into a shared bus of dimensions: Date, Item, Store]
[Figure: bus matrix. Rows are the data marts (Sales, Inventory, Purchase Order); columns are the dimensions (Date, Item, Store, Distr. Center, Vendor, Shipper, Promo); an X marks each dimension a data mart uses.]
Conformed Dimension Options:
1. Identical Dimensions
Sales Data Mart
  Sales (Sales Facts): Cust_id, Prod_id, Store_id, Sale_date
  Product: Prod_id, Prod_name, Brand, Desc, Category
Inventory Data Mart
  Inventory (Inventory Facts): Cust_id, Prod_id, Store_id, Sale_date
  Product: Prod_id, Prod_name, Brand, Desc, Category
Conformed Dimension Options:
2. Subset of base dimensions
Sales Data Mart
  SALES (Sales Facts): Cust_id, Prod_id, Store_id, Sale_date
  PRODUCT: Prod_id, Prod_name, Brand, Desc, Category
Purchase Analysis Data Mart
  INVENTORY (Inventory Facts): Cust_id, Prod_id, Store_id, Sale_date
  BRAND: Brand_id, Brand, Desc, Category
Advantages:
▪ A single dimension table can be used against multiple fact tables
▪ User interfaces and data content are consistent whenever the dimension is used
▪ There is a consistent interpretation of attributes and, therefore, rollups across data marts are possible.
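A rollup across data marts can be sketched with one shared product dimension applied to two fact tables. This is a minimal sketch in plain Python; the sample products, the measures, and the helper name `total_by_category` are assumptions for illustration.

```python
# One shared (conformed) product dimension, keyed by Prod_id.
product_dim = {
    1: {"Prod_name": "Corn flakes", "Category": "Breakfast foods"},
    2: {"Prod_name": "Apples", "Category": "Produce"},
}

# Fact rows from two different data marts: (Prod_id, measure).
sales_facts = [(1, 20), (2, 5), (1, 10)]
inventory_facts = [(1, 100), (2, 40)]

def total_by_category(facts):
    """Roll a fact table up to Category using the shared dimension."""
    totals = {}
    for prod_id, value in facts:
        category = product_dim[prod_id]["Category"]
        totals[category] = totals.get(category, 0) + value
    return totals

sold = total_by_category(sales_facts)
on_hand = total_by_category(inventory_facts)

# Because both marts use the same Category values, the two rollups
# line up row by row: (units sold, units on hand).
combined = {
    category: (sold.get(category, 0), on_hand.get(category, 0))
    for category in sorted(set(sold) | set(on_hand))
}
print(combined)
```

If each mart kept its own product table with differently spelled categories, the final step would have no reliable key on which to align the two results.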
Building Dimensional Models
The Matrix Method
Step 1 – Identify data marts based on business process
Step 2 – Declare the grain
Example: data marts
▪ Customer Billing Statement
▪ Trouble Reports
▪ Sales Forecast
Step 2 – Declare the grain
❑ Clearly define what a fact table record is.
❑ In general, as low or as granular as possible
Individual transaction
✓ each sales transaction is a fact record
✓ each insurance claim is a fact record
✓ each ATM transaction is a fact record
Snapshot
✓ each daily product sales total in each store is a fact record
✓ each monthly account snapshot is a fact record
Individual line item
✓ each line item on each order is a fact record
✓ each coverage in each insurance policy is a fact record
Step 2 – Declare the grain
❑ Granularity
✓ We must have the most atomic items we will need in the DW
✓ Ask “what is a fact record, exactly?”
✓ These items cannot be “drilled down” any more
✓ The finer the granularity, the more flexible the data mart
✓ They can then be combined for aggregates
✓ Data Mining is typically less effective on aggregated data
✓ Typical granularity levels are going to be:
▪ Individual transactions
▪ Higher-level snapshots (for snapshots, we have to determine time period)
▪ Line items from reports
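The point that atomic records can be combined into aggregates, but not the reverse, can be sketched as follows. The transaction rows are invented for illustration.

```python
from collections import Counter

# Atomic grain: one row per sales transaction
# (date, store, product, amount).
transactions = [
    ("2024-05-01", "S1", "P1", 4),
    ("2024-05-01", "S1", "P1", 6),
    ("2024-05-01", "S1", "P2", 3),
    ("2024-05-02", "S1", "P1", 5),
]

# Coarser grain: daily product sales total in each store.
daily_totals = Counter()
for date, store, product, amount in transactions:
    daily_totals[(date, store, product)] += amount

print(dict(daily_totals))
```

The daily snapshot is derived from the transactions in one pass. Going the other way is impossible: once only the daily total of 10 is stored, the individual sales of 4 and 6 can no longer be recovered, which is why the grain is declared as low as the analysis will ever need.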
Step 2 – Declare the grain: Example
Back to the “large telephone company” example. The grain could be:
▪ individual line item on each monthly customer bill
Step 3 – Identify dimensions
▪ Choose dimensions
▪ Many times, the grain is stated in terms of primary dimensions.
▪ Straightforward once granularity is decided
For example, “Daily inventory levels of individual stock items in a distribution center”
✓ time dimension – daily (when)
✓ stock item (what)
✓ location – distribution center (where)
✓ who
✓ how
Step 3 – Identify dimensions
▪ Good analogy: decorating the set of measurements (facts) with dimensions
▪ Measures as single-valued descriptions – descriptive data
✓ daily time dimension for daily measurements.
▪ The granularity of the dimension cannot be lower (finer) than the overall fact table granularity
✓ Example: when the fact table gives (monthly sales), the time dimension must be (monthly); it cannot be (weekly) or (daily).
▪ However, it can be of “less quality”
✓ Example: the “delivery mode” dimension is limited to (air) and (land), while the fact table contains data for (air), (land), and (sea).
Step 3 – Identify dimensions
▪ Make a list of all the descriptive attributes. Dimensional descriptive attributes can be used to populate the dimension.
▪ For example, the attributes in the “time” dimension may include:
✓ calendar year
✓ calendar quarter
✓ calendar month
✓ calendar week
✓ calendar day
✓ fiscal year
✓ fiscal quarter
✓ fiscal month
✓ fiscal week
✓ fiscal day
✓ etc.
Step 3 – Identify dimensions: Example
Data Mart                    Customer  Provider  Service  Time  Rate  Calling Party  Called Party
Customer Billing Statement   X         X         X        X     X
Trouble Reports              X                   X        X
Sales Forecast                                   X        X           X              X
Step 3 – Identify dimensions
How to describe attributes of dimensions?
➢ Well, this should have been done correctly in OLTP DB design, but it may not have.
➢ Ensure that attributes are:
❑ Descriptive words
❑ Complete
❑ Cleaned data
❑ Indexed
❑ Documented
➢ Time dimension
❑ Figure out what atomicity you need
➢ Name/address
❑ Make a standard and stick with it
Step 4 – Identify Facts
Step 4.1 – Choose the data mart
Start with a single-source data mart
Step 4.2 – Declare the grain
Determine the grain of a single data mart from Step 2
Example: Billing Statement
Billing Statement (fact table): cust_id, prov_id, service_id, Time_id, Rate_id, Amount, Quantity, Minutes
Time: Time_id, T_day, T_month, T_year
Provider: prov_id, prov_name, Prov_service, Prov_category
Customer: cust_id, last_name, first_name, address, …
Service: Service_id, Service_name, Service_catg
Rate: Rate_id, Rate_charge, Rate_desc, …
Example: Scheduled Service & Installed Orders, and Trouble Reports
Scheduled Service & Installed Orders (fact table): cust_id, prov_id, service_id, Time_id
Trouble Reports (fact table): cust_id, service_id, Time_id
Time: Time_id, T_day, T_month, T_year
Provider: prov_id, prov_name, Prov_service, Prov_category
Example: Sales Forecast
Sales Forecast (fact table): Called_id, Calling_id, service_id, Time_id, Amount, Quantity, Minutes
Time: Time_id, T_day, T_month, T_year
Called Party: Called_id, Called_prov, Called_service, Called_category
Calling Party: Calling_id, Calling_prov, Calling_service, Calling_category, …
Provider: prov_id, prov_name, Prov_service, Prov_category
Service: Service_id, Service_name, Service_catg
Building Dimensional Models: Special Type of Fact Tables
Fact-less Fact Table
❑ Used to describe events and coverage
❑ Table consists of all dimension table PKs, with a dummy field which will always have a value
❑ A record will only exist in this table when an event occurred (sale, attendance, …)
Fact-less Fact Table – Example 1
Facts in Attendance fact-less tables
✓ Which classes were the most heavily attended?
✓ Which classes were the most consistently attended?
✓ Which teachers taught the most students?
Attendance (fact table): time_key, student_key, course_key, teacher_key, attendance = 1
Time: time_key, date, day_of_week, week_number, month, …
Student: student_key, name, address, major, …
Course: course_key, Name, course_number, dept, description, …
Teacher: teacher_key, name, dept, …
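Two of the attendance questions can be answered by simply counting rows, as in this minimal sketch. The key values are invented; each tuple stands for one row of the Attendance table.

```python
from collections import Counter

# Each tuple is one attendance event:
# (time_key, student_key, course_key, teacher_key).
# No measure column is needed; the row's existence is the fact
# (attendance = 1).
attendance = [
    (1, 101, "DB", "T1"),
    (1, 102, "DB", "T1"),
    (1, 101, "OS", "T2"),
    (2, 101, "DB", "T1"),
    (2, 103, "DB", "T1"),
]

# Which classes were the most heavily attended? Count rows per course.
attendance_per_course = Counter(course for _, _, course, _ in attendance)

# Which teachers taught the most students? Count distinct students.
students_per_teacher = {}
for _, student, _, teacher in attendance:
    students_per_teacher.setdefault(teacher, set()).add(student)
distinct_students = {t: len(s) for t, s in students_per_teacher.items()}

print(attendance_per_course.most_common(1), distinct_students)
```

Summing the dummy `attendance = 1` column would give the same counts as counting rows, which is why the column carries no information of its own.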
Fact-less Fact Table – Example 2
Facts in Coverage fact-less tables
✓ Which products were on promotion but didn’t sell?
Coverage (fact table): time_key, product_key, store_key, promotion_key
Time: time_key, date, day_of_week, week_number, month, …
Product: product_key, name, SKU, brand, …
Store: store_key, Name, address, region, …
Promotion: promotion_key, promotion_name, type, price, …
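The "promoted but didn't sell" question can be sketched as a set difference between the coverage table and a sales fact table. The sales table and all key values here are assumptions for illustration; the slide shows only the coverage side.

```python
# Coverage rows: (time_key, product_key, store_key, promotion_key).
coverage = {
    (1, "P1", "S1", "PROMO_A"),
    (1, "P2", "S1", "PROMO_A"),
    (1, "P3", "S2", "PROMO_B"),
}

# Sales fact rows reduced to the shared keys:
# (time_key, product_key, store_key).
sales = {
    (1, "P1", "S1"),
    (1, "P3", "S1"),  # P3 sold, but in a store where it was not promoted
}

# Keep coverage rows that have no matching sale on the same
# day, product, and store.
promoted_not_sold = sorted(
    (t, p, s) for (t, p, s, _promo) in coverage if (t, p, s) not in sales
)
print(promoted_not_sold)
```

The sales table alone cannot answer this question, because it has no rows for things that did not happen; the coverage table records what was on offer.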
Building Dimensional Models: Special Types of Dimension Tables
Dimension Evolution
❑ Data warehouses contain history, so accounting for changes should be one of the analyst's most important responsibilities.
❑ When a dimension's attributes evolve over time, the dimension is called a slowly changing dimension (SCD). Examples:
✓ people change names, education levels, income, marital status, number of children
✓ sales regions are realigned
✓ moves
Large dimensions
Conundrum: the larger the dimension, the more likely it is to change
❑ Maybe break it into separate dimension tables to manage/minimize change
❑ This is a relational approach!
Degenerate dimensions
❑ Typically an order/invoice/ticket number
❑ Has no associated dimension tables
Junk dimensions
❑ Flags and the like that don’t seem to fit into any other specific dimension table
❑ Your book suggests that these all be thrown into a single dimension table
❑ You will end up with twice as many possible rows for every yes/no flag added
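How the size of a junk dimension grows with the number of flags can be sketched by enumerating every combination of flag values. The flag names are hypothetical, and building all combinations up front is only one possible way to populate such a table.

```python
from itertools import product

# Hypothetical yes/no flags that fit no other dimension.
flags = ["gift_wrapped", "paid_by_card", "rush_order"]

# One junk-dimension row per combination of flag values,
# each with its own integer key.
junk_dimension = [
    {"junk_key": key, **dict(zip(flags, values))}
    for key, values in enumerate(
        product([False, True], repeat=len(flags)), start=1
    )
]

print(len(junk_dimension))
```

With n yes/no flags the table has 2**n rows, so three flags give eight rows and a fourth would give sixteen. The fact table then carries a single `junk_key` instead of one column per flag.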
Keys
❑ Dimension table keys usually correspond to the original logical table keys
❑ Recommendation: use surrogate keys
✓ do not use operational keys
✓ integer, non-meaningful, sequence number
✓ advantages
▪ isolate data warehouse keys from operational changes
▪ improve performance
▪ support integration from multiple sources
▪ enable tracking of dimension changes
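The last advantage can be sketched as follows. This is a minimal illustration that assumes one common approach, adding a new dimension row with a new surrogate key whenever an attribute changes; the slides do not prescribe this method, and the names (`add_version`, `cust_key`, `source_id`) and sample data are hypothetical.

```python
from itertools import count

next_key = count(1)   # integer, non-meaningful sequence numbers
customer_dim = []     # rows of the customer dimension

def add_version(operational_id, name, city):
    """Append a new dimension row and return its surrogate key."""
    key = next(next_key)
    customer_dim.append(
        {"cust_key": key, "source_id": operational_id,
         "name": name, "city": city}
    )
    return key

k1 = add_version("C-042", "Ann Lee", "Portland")
sales_fact = [{"cust_key": k1, "amount": 50}]

# The customer moves: the operational key stays the same, but the
# warehouse gets a new row with a new surrogate key.
k2 = add_version("C-042", "Ann Lee", "Seattle")
sales_fact.append({"cust_key": k2, "amount": 70})

# Each sale is still attributed to the city the customer lived in
# at the time of the sale.
city_by_key = {row["cust_key"]: row["city"] for row in customer_dim}
sales_by_city = {}
for fact in sales_fact:
    city = city_by_key[fact["cust_key"]]
    sales_by_city[city] = sales_by_city.get(city, 0) + fact["amount"]

print(sales_by_city)
```

Had the operational key `C-042` been used as the dimension key, the move would have forced an overwrite of the city, and the earlier sale would have been reassigned to the new location.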
Keys
❑ Fact table keys are compound and made up of all the associated dimension tables to which they relate
❑ The compound fact table key contains all of the foreign keys into the dimension tables
❑ Be careful when using time as part of a key
✓ Granularity
✓ Nulls
❑ Avoid “smart keys”
✓ Keys made out of multiple attributes in the dimension table
✓ This also means that we don’t necessarily want to combine the dimension tables
Misc Design Tips
❑ Labels that identify marts, dimensions, attributes, and facts will probably be the labels that are displayed to users, so choose these labels carefully
❑ An attribute lives in one and only one dimension; a fact may be in multiple fact tables
❑ A single field in the underlying source data can have one or more logical columns associated with it. For example, the product attribute field may translate to product code, product short description, etc.
❑ Every fact should have a default aggregation rule
Go to the case studies of DW logical design