
Dimensional Modeling

E-BIZ Practice
Tata Consultancy Services, India

• What is a Dimension?
• What is a Fact?
• What is Dimensional Modeling?
• Data Warehouse Schemas


What is a Dimension?

Subject / Dimension    Question answered
Product                What was sold?
Customer               Whom was it sold to?
Time                   When was it sold?
Geography              Where was it sold?

• Dimensions put measures in perspective
• They are the what, when and where qualifiers of the measures
• Dimensions could be products, customers, time, geography etc.
Dimensional Hierarchy
Geography Dimension

World Level       World
Continent Level   America, Europe, Asia
Country Level     USA, Canada, Argentina
State Level       FL, GA, VA, CA, WA
City Level        Miami, Tampa, Orlando, Naples

• Parent Relation: links each member to its parent at the level above
• Dimension Member / Business Entity: an individual value at a level (e.g. Naples)
• Attributes: Population, Tourist’s Place
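
The parent relation in the Geography hierarchy can be sketched as a child-to-parent map. This is only an illustration: `PARENT` and `path_to_root` are hypothetical names, and the assignment of each member to a parent (for example, the four cities to FL) is an assumption read off the diagram.

```python
# Child -> parent map for part of the Geography dimension (assumed from the diagram).
PARENT = {
    "America": "World", "Europe": "World", "Asia": "World",
    "USA": "America", "Canada": "America", "Argentina": "America",
    "FL": "USA", "GA": "USA", "VA": "USA", "CA": "USA", "WA": "USA",
    "Miami": "FL", "Tampa": "FL", "Orlando": "FL", "Naples": "FL",
}

def path_to_root(member):
    """Return the member followed by its ancestors, up to the World level."""
    path = [member]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path

print(path_to_root("Miami"))  # ['Miami', 'FL', 'USA', 'America', 'World']
```

Walking the map upward is what lets a measure recorded at City level be rolled up to State, Country or Continent level.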
Slowly Changing Dimension (SCD)

Various data elements in a dimension undergo changes (e.g. changes in attributes or hierarchical structures), and these changes need to be captured for analysis.
Slowly Changing Dimension (SCD) - Solutions

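Slowly changing dimensions are classified into three different types:

• TYPE I (overwrite the old value)
• TYPE II (add a new row for each change)
• TYPE III (add a column that holds an earlier value)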
Slowly Changing Dimensions Type I

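Before the change

Source
Emp id  Name   Email
1001    Shane  Shane@xyz.com

Target
Emp id  Name   Email
1001    Shane  Shane@xyz.com

After the change (the old value Shane@xyz.com is overwritten in the target)

Source
Emp id  Name   Email
1001    Shane  Shane@abc.co.in

Target
Emp id  Name   Email
1001    Shane  Shane@abc.co.in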
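
A Type I change can be sketched in a few lines. This is a minimal illustration, not a full load routine: the dictionary standing in for the target table and the `apply_type1` helper are assumptions.

```python
# A dict keyed by employee id stands in for the target dimension table.
target = {1001: {"name": "Shane", "email": "Shane@xyz.com"}}

def apply_type1(target, emp_id, name, email):
    """Overwrite the row in place; the old value is not kept anywhere."""
    target[emp_id] = {"name": name, "email": email}

apply_type1(target, 1001, "Shane", "Shane@abc.co.in")
print(target[1001]["email"])  # Shane@abc.co.in
```

Because the row is overwritten, no history of Shane@xyz.com remains in the target.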
Slowly Changing Dimensions Type II

Source
Emp Id  Name   Email
10      Shane  Shane@xyz.com

Target
Primary Key  Emp Id  Name   Email          Version No
1000         10      Shane  Shane@xyz.com  0
Slowly Changing Dimensions Type II (Versioning)

After the first change

Source
Emp Id  Name   Email
10      Shane  Shane@abc.co.in

Target
Primary Key  Emp Id  Name   Email            Version No
1000         10      Shane  Shane@xyz.com    0
1001         10      Shane  Shane@abc.co.in  1

After the second change

Source
Emp Id  Name   Email
10      Shane  Shane@abc.com

Target
Primary Key  Emp Id  Name   Email            Version No
1000         10      Shane  Shane@xyz.com    0
1001         10      Shane  Shane@abc.co.in  1
1002         10      Shane  Shane@abc.com    2
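
The versioning approach can be sketched as follows. The list standing in for the target table, the key sequence starting at 1000 and the `apply_type2_version` helper are assumptions made for illustration.

```python
from itertools import count

target = []              # rows of the target dimension table
next_key = count(1000)   # surrogate primary keys, assumed to start at 1000

def apply_type2_version(emp_id, name, email):
    """Insert a new version row whenever the incoming email differs."""
    history = [row for row in target if row["emp_id"] == emp_id]
    if history and history[-1]["email"] == email:
        return  # nothing changed, keep the current version
    target.append({
        "pk": next(next_key),
        "emp_id": emp_id,
        "name": name,
        "email": email,
        "version": len(history),  # 0 for the first row, then 1, 2, ...
    })

for email in ["Shane@xyz.com", "Shane@abc.co.in", "Shane@abc.com"]:
    apply_type2_version(10, "Shane", email)

for row in target:
    print(row["pk"], row["emp_id"], row["name"], row["email"], row["version"])
```

Every change adds a row, so the full history stays queryable and the highest version number identifies the current record.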
Slowly Changing Dimensions Type II - Flag
Source
Emp Id  Name   Email
10      Shane  Shane@xyz.com

Target
Primary_Key  Emp Id  Name   Email          Current_Flag
1000         10      Shane  Shane@xyz.com  Y
Slowly Changing Dimensions Type II (Flag)

After the first change

Source
Emp Id  Name   Email
10      Shane  Shane@abc.co.in

Target
Primary_Key  Emp Id  Name   Email            Current_Flag
1000         10      Shane  Shane@xyz.com    N
1001         10      Shane  Shane@abc.co.in  Y

After the second change

Source
Emp Id  Name   Email
10      Shane  Shane@abc.com

Target
Primary_Key  Emp Id  Name   Email            Current_Flag
1000         10      Shane  Shane@xyz.com    N
1001         10      Shane  Shane@abc.co.in  N
1002         10      Shane  Shane@abc.com    Y
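
A minimal sketch of the flag approach, under the assumption that exactly one row per employee carries `Y` at any time. The helper name `apply_type2_flag` and the simple key sequence are illustrative only.

```python
def apply_type2_flag(target, emp_id, name, email):
    """Mark the current row 'N' and append the new value as the 'Y' row."""
    for row in target:
        if row["emp_id"] == emp_id and row["current_flag"] == "Y":
            if row["email"] == email:
                return  # no change in the source
            row["current_flag"] = "N"
    target.append({
        "pk": 1000 + len(target),  # assumed key sequence: 1000, 1001, ...
        "emp_id": emp_id,
        "name": name,
        "email": email,
        "current_flag": "Y",
    })

flag_target = []
for email in ["Shane@xyz.com", "Shane@abc.co.in", "Shane@abc.com"]:
    apply_type2_flag(flag_target, 10, "Shane", email)

for row in flag_target:
    print(row["pk"], row["email"], row["current_flag"])
```

Filtering on `current_flag == "Y"` returns the current view of the dimension without comparing version numbers.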
Slowly Changing Dimensions Type II – Effective Date

Source
Emp Id  Name   Email
10      Shane  Shane@xyz.com

Target
Primary Key  Emp Id  Name   Email          Begin Date  End Date
1000         10      Shane  Shane@xyz.com  01/01/06
Slowly Changing Dimensions Type II – Effective Date
Source
Emp Id  Name   Email
10      Shane  Shane@abc.co.in

Target
Primary Key  Emp Id  Name   Email            Begin Date  End Date
1000         10      Shane  Shane@xyz.com    01/01/06    02/01/06
1001         10      Shane  Shane@abc.co.in  03/01/06
Slowly Changing Dimensions Type II – Effective Date
Source
Emp Id  Name   Email
10      Shane  Shane@abc.com

Target
Primary Key  Emp Id  Name   Email            Begin Date  End Date
1000         10      Shane  Shane@xyz.com    01/01/06    02/01/06
1001         10      Shane  Shane@abc.co.in  03/01/06    05/01/06
1002         10      Shane  Shane@abc.com    06/01/06
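
The effective-date approach can be sketched as below. Two assumptions are made here: the dates in the tables are read as dd/mm/yy, and a row is closed with the day before the new row begins. The helper name `apply_type2_dates` and the key sequence are illustrative.

```python
from datetime import date, timedelta

def apply_type2_dates(target, emp_id, name, email, change_date):
    """Close the open row (end date is None) and append a new open row."""
    current = next(
        (r for r in target if r["emp_id"] == emp_id and r["end"] is None),
        None,
    )
    if current is not None:
        if current["email"] == email:
            return  # no change in the source
        current["end"] = change_date - timedelta(days=1)
    target.append({
        "pk": 1000 + len(target),  # assumed key sequence: 1000, 1001, ...
        "emp_id": emp_id,
        "name": name,
        "email": email,
        "begin": change_date,
        "end": None,               # open-ended: this is the current row
    })

dated_target = []
apply_type2_dates(dated_target, 10, "Shane", "Shane@xyz.com", date(2006, 1, 1))
apply_type2_dates(dated_target, 10, "Shane", "Shane@abc.co.in", date(2006, 1, 3))
apply_type2_dates(dated_target, 10, "Shane", "Shane@abc.com", date(2006, 1, 6))

for row in dated_target:
    print(row["pk"], row["email"], row["begin"], row["end"])
```

With begin and end dates on every row, a fact can be matched to the dimension row that was valid on the date of the transaction.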
Slowly Changing Dimensions Type III
Source
Emp Id  Name   Email
10      Shane  Shane@xyz.com

Target
Primary Key  Emp Id  Name   Email          Prev Email  Effective Date
1000         10      Shane  Shane@xyz.com              01/01/06
Slowly Changing Dimensions Type III
Source
Emp Id  Name   Email
10      Shane  Shane@abc.co.in

Target
Primary Key  Emp Id  Name   Email            Prev Email     Effective Date
1000         10      Shane  Shane@abc.co.in  Shane@xyz.com  01/02/06
Slowly Changing Dimensions Type III
Source
Emp Id  Name   Email
10      Shane  Shane@abc.com

Target
Primary Key  Emp Id  Name   Email          Prev Email     Effective Date
1000         10      Shane  Shane@abc.com  Shane@xyz.com  01/03/06
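
A Type III change keeps a single row and stores one earlier value in an extra column. In the tables above the extra column still holds Shane@xyz.com after the second change, so this sketch fills it only on the first change; a variant that always stores the immediately preceding value is also common. The field name `prev_email` and the helper `apply_type3` are assumptions.

```python
def apply_type3(row, new_email, effective_date):
    """Update the row in place, keeping one earlier value in prev_email."""
    if row["email"] == new_email:
        return  # no change in the source
    if row["prev_email"] is None:
        row["prev_email"] = row["email"]  # first change: remember the old value
    row["email"] = new_email
    row["effective_date"] = effective_date

emp_row = {
    "pk": 1000, "emp_id": 10, "name": "Shane",
    "email": "Shane@xyz.com", "prev_email": None, "effective_date": "01/01/06",
}
apply_type3(emp_row, "Shane@abc.co.in", "01/02/06")
apply_type3(emp_row, "Shane@abc.com", "01/03/06")
print(emp_row["email"], emp_row["prev_email"], emp_row["effective_date"])
```

Only limited history survives: the intermediate value Shane@abc.co.in cannot be recovered from the target row.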
What is a Fact?

Fact / Measure: Revenue, Cost, No. of Accounts

• Facts or Measures are the Key Performance Indicators of an enterprise
• Factual data about the subject area
• Generally numeric and summarized
Types of Facts

Value Based Classification

• Numeric Facts
• Non-numeric Facts (e.g. Comments in fact tables)

Summary Based Classification

• Additive (can be added along all dimensions)
• Semi Additive (can be added along some dimensions but not others, most often not along the Time dimension)
• Non Additive (cannot be added along any dimension)
Additive Facts
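Date  Store  Product  Sales_Amount

• Sales_Amount is additive: it can be summed across Date, Store and Product

Semi Additive Facts and Non Additive Facts

Date  Account  Current_Balance  Profit_Margin

• Current_Balance is semi additive: it can be summed across Account but not across Date
• Profit_Margin is non additive: it is a ratio and cannot be summed along any dimension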
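
The difference between summing across accounts and summing across dates can be shown with a few hypothetical balance rows. The sample values and variable names below are made up for illustration.

```python
from collections import defaultdict

# (date, account, current_balance): hypothetical sample rows
balances = [
    ("2006-01-01", "A", 100), ("2006-01-01", "B", 50),
    ("2006-01-02", "A", 120), ("2006-01-02", "B", 50),
]

# Across accounts on one date, adding balances is meaningful.
total_by_date = defaultdict(int)
for day, account, balance in balances:
    total_by_date[day] += balance

# Across dates, adding would double count; take the latest balance instead.
latest_by_account = {}
for day, account, balance in sorted(balances):
    latest_by_account[account] = balance

print(dict(total_by_date))   # {'2006-01-01': 150, '2006-01-02': 170}
print(latest_by_account)     # {'A': 120, 'B': 50}
```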
Types of Fact Tables

• Transaction Tables
• Snapshot Tables
• Summary Tables
Conformed Dimensions

• Conformed dimensions are those which are consistent across Data Marts.
• Essential for integrating the Data Marts into an Enterprise Data Warehouse.
Junk Dimensions

A junk dimension is a convenient grouping of flags and indicators
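
One common way to build a junk dimension is to enumerate every combination of the flags and give each combination one key. The flag names and values below are hypothetical; this is a sketch of the idea, not a prescribed design.

```python
from itertools import product

# Hypothetical low-cardinality flags and indicators from a transaction record.
flags = {
    "is_online": ["Y", "N"],
    "is_gift": ["Y", "N"],
    "payment_type": ["Cash", "Card"],
}

# One junk dimension row per combination, each with its own integer key.
junk_dim = [
    dict(zip(flags, combo), junk_key=key)
    for key, combo in enumerate(product(*flags.values()), start=1)
]

print(len(junk_dim))  # 8
print(junk_dim[0])
```

The fact table then carries a single `junk_key` column instead of three separate flag columns.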
Factless Facts
• Factless Facts are Fact tables without Facts (measures)
• They are used for event tracking, for example tracking student attendance
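
For the attendance example, a factless fact table holds only dimension keys, and analysis is done by counting rows. The keys and sample rows below are hypothetical.

```python
from collections import Counter

# Each row records that an event happened; there is no measure column.
# Columns: (date_key, student_key, class_key)
attendance_fact = [
    ("2006-01-02", "S1", "Math"),
    ("2006-01-02", "S2", "Math"),
    ("2006-01-02", "S1", "Physics"),
    ("2006-01-03", "S1", "Math"),
]

# Counting rows along a dimension replaces summing a measure.
attendance_per_class = Counter(class_key for _, _, class_key in attendance_fact)
attendance_per_date = Counter(date_key for date_key, _, _ in attendance_fact)

print(attendance_per_class["Math"])       # 3
print(attendance_per_date["2006-01-03"])  # 1
```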
Surrogate Keys

• Joins between fact and dimension tables should be based on surrogate keys
• Surrogate keys should not be composed of natural keys glued together
• Users should not obtain any information by looking at these keys
• These keys should be simple integers
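
A minimal sketch of surrogate key assignment, assuming keys are handed out as a plain integer sequence. The class name `SurrogateKeyMap` and the sample natural keys are hypothetical.

```python
class SurrogateKeyMap:
    """Assign a meaningless integer key to each distinct natural key."""

    def __init__(self, start=1):
        self._next = start
        self._keys = {}

    def key_for(self, natural_key):
        # Reuse the existing key if this natural key has been seen before.
        if natural_key not in self._keys:
            self._keys[natural_key] = self._next
            self._next += 1
        return self._keys[natural_key]

keys = SurrogateKeyMap(start=1000)
print(keys.key_for("EMP-10"))  # 1000
print(keys.key_for("EMP-11"))  # 1001
print(keys.key_for("EMP-10"))  # 1000 again: same natural key, same surrogate
```

The integer carries no business meaning, so the natural key can change in the source system without affecting the joins to the fact table.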


Dimensional Modeling
STEP 1
• Identify Subjects (Dimensions)
• Identify Hierarchies of a Dimension
• Identify Attributes of levels in Hierarchies
• Define Grain

Example: hierarchies of the Customer dimension
Country > State > City > Customer
Industry Segment > Industry Type > Customer
Fin. Class > Customer

Contd.
Dimensional Modeling
STEP 2
• Identify the Facts
• Group the Facts in a logical set

Financial Transactions     Non-Financial Transactions
Trans. Amount              No. of Cheques Cleared
No. of Bonds               No. of Visits to a Branch
No. of Transactions        No. of DEMAT Transactions
Service Cost
...                        ...

Contd.
Dimensional Modeling
STEP 3
• Link the Group of Facts to the Dimensions that participate in the Facts

Fact group: Financial Transactions
Linked dimensions: Customer, Product, Time, Organization, Channel
Dimensional Modeling
STEP 4
• Define Granularity for each Group of Facts

Fact group: Financial Transactions

Dimension      Grain
Customer       Customer
Product        Scheme
Time           Day-Hour
Organization   Branch
Channel        Channel
Data Warehouse Schemas

Star Schema
• A Group of Facts connected to Multiple Dimensions

Fact: Financial Transactions
Dimensions: Channel, Time, Organization, Customer, Product
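
A star schema can be sketched with an in-memory SQLite database. Only two of the five dimensions are included to keep the example short, and all table names, column names and sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, customer_name TEXT);
CREATE TABLE dim_channel  (channel_key  INTEGER PRIMARY KEY, channel_name  TEXT);
-- The fact table holds surrogate keys to each dimension plus the measure.
CREATE TABLE fact_financial_txn (
    customer_key INTEGER REFERENCES dim_customer(customer_key),
    channel_key  INTEGER REFERENCES dim_channel(channel_key),
    trans_amount REAL
);
INSERT INTO dim_customer VALUES (1, 'Shane'), (2, 'Asha');
INSERT INTO dim_channel  VALUES (1, 'Branch'), (2, 'ATM');
INSERT INTO fact_financial_txn VALUES (1, 1, 100.0), (1, 2, 40.0), (2, 2, 65.0);
""")

# Join the fact to one dimension and total the additive measure per channel.
rows = conn.execute("""
    SELECT ch.channel_name, SUM(f.trans_amount)
    FROM fact_financial_txn AS f
    JOIN dim_channel AS ch ON ch.channel_key = f.channel_key
    GROUP BY ch.channel_name
    ORDER BY ch.channel_name
""").fetchall()

print(rows)  # [('ATM', 105.0), ('Branch', 100.0)]
```

Each query joins the central fact table directly to the dimensions it needs, which is the single-hop join pattern that gives the star schema its name.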

Contd.
Data Warehouse Schemas

Snow-flake Schema (= Extended Star Schema)
• A Group of Facts connected to Dimensions, which are split across multiple hierarchies and attributes

Fact: Financial Transactions
Dimensions: Time, Product, Organization, Channel, Customer
Customer is split further into: Segment, Geography
Contd.
Data Warehouse Schemas

Galaxy Schema
• Multiple Groups of Facts linked by a few common dimensions

Facts: Fact1, Fact2, Fact3
Dimensions: Dimension1 to Dimension7, some of which are shared between the facts
QUESTIONS ???
