DW Life Cycle
Data Warehouse Defined
• Modeling techniques
– E-R Modeling
– Dimensional Modeling
Implementation and modeling styles
• Poor Performance
• Tend to be very complex and difficult to
navigate.
Dimensional Modeling
• Must identify
– Business process to be supported
– Grain (level of detail)
– Dimensions
– Facts
Conventions used in Dimensional modeling
• Facts
• Measures (Variables)
• Dimensions
– Dimension members
– Dimension hierarchies
Facts
• A fact is a collection of related data items,
consisting of measures and context data.
• Each fact typically represents a business
item, a business transaction, or an event
that can be used in analyzing the
business or business process.
• Facts are measured, “continuously valued”,
rapidly changing information; they can be
calculated and/or derived.
Fact Table
• verbose, descriptive
• complete
• no misspellings, impossible values
• indexed
• equally available
• documented (metadata to explain the origin
and interpretation of each attribute)
Dimensional model
• Visualise a dimensional model as a CUBE (a hypercube,
because there can be more than 3 dimensions)
• Operations for OLAP
– Drill Down: moves to a higher level of detail
– Roll Up: moves to a summarized level of data
(The navigation path is determined by hierarchies within
dimensions.)
– Slice: cuts through the cube so that users can focus on
specific perspectives
– Dice: rotates the cube to another perspective (changes the
dimension)
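The roll-up and slice operations described above can be sketched on a tiny in-memory cube. This is only an illustration in plain Python, not a real OLAP engine: the cell values, the month-to-quarter hierarchy and the helper names `roll_up` and `slice_cube` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical cube: (month, region, product) -> sales amount.
cube = {
    ("Jan", "East", "Tents"): 10,
    ("Feb", "East", "Tents"): 12,
    ("Apr", "East", "Canoes"): 7,
    ("Jan", "West", "Tents"): 5,
    ("Apr", "West", "Canoes"): 9,
}

# Assumed hierarchy within the Time dimension: month -> quarter.
MONTH_TO_QUARTER = {"Jan": "Q1", "Feb": "Q1", "Apr": "Q2"}


def roll_up(cells):
    """Roll up the Time dimension from month to quarter by summing."""
    out = defaultdict(int)
    for (month, region, product), value in cells.items():
        out[(MONTH_TO_QUARTER[month], region, product)] += value
    return dict(out)


def slice_cube(cells, region):
    """Slice: keep only the cells for one member of the Region dimension."""
    return {key: value for key, value in cells.items() if key[1] == region}


by_quarter = roll_up(cube)            # summarized level
east_only = slice_cube(cube, "East")  # one perspective of the cube
```

Drilling down is the reverse navigation: starting from `by_quarter` and returning to the monthly cells in `cube`, following the same hierarchy.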
Drill down …. Roll up
Slice and Dice
Dimensions
• A dimension is a collection of members or units of the
same type of views.
• Dimensions determine the contextual background for the facts.
• They are the parameters over which we want to perform
OLAP (e.g. Time, Location/region, Customers)
• A member is a distinct name that determines a data item’s
position (e.g. Time - Month, Quarter)
• Hierarchies arrange the members of a dimension into levels
Hierarchies
• Non Additive
– Numeric measures that cannot be added across
any dimensions
– Intensity measures, which are averaged across all
dimensions, e.g. room temperature
– Textual facts - AVOID THEM
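A short sketch of why an intensity measure such as room temperature is non-additive. The readings are hypothetical; the point is only that a sum across dimensions has no meaning, while an average does.

```python
from statistics import mean

# Hypothetical readings: (room, hour) -> temperature in degrees Celsius.
readings = {
    ("Room A", 9): 20.0,
    ("Room A", 10): 22.0,
    ("Room B", 9): 18.0,
    ("Room B", 10): 20.0,
}

# Summing an intensity measure gives a number with no physical meaning.
meaningless_total = sum(readings.values())

# Averaging is the aggregation that makes sense for this kind of measure.
average_temperature = mean(readings.values())
```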
Common structures for Data Marts: Denormalize!
• Star
– Single fact table surrounded by denormalized
dimension tables
– The fact table primary key is the composite of the
foreign keys (primary keys of dimension tables)
– Fact table contains transaction type information.
– Many star schemas in a data mart
– Easily understood by end users, more disk storage
required
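The star structure described above can be sketched with Python's built-in `sqlite3` module. All table names, column names and rows are hypothetical and chosen only for illustration: one fact table whose primary key is the composite of its foreign keys, surrounded by denormalized dimension tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Denormalized dimensions: category lives directly in dim_product.
CREATE TABLE dim_time     (time_key INTEGER PRIMARY KEY, quarter TEXT);
CREATE TABLE dim_product  (product_key INTEGER PRIMARY KEY,
                           product_name TEXT, category TEXT);
CREATE TABLE dim_district (district_key INTEGER PRIMARY KEY, city TEXT);

-- Fact table: the primary key is the composite of the foreign keys.
CREATE TABLE fact_sales (
    time_key     INTEGER REFERENCES dim_time(time_key),
    product_key  INTEGER REFERENCES dim_product(product_key),
    district_key INTEGER REFERENCES dim_district(district_key),
    units        INTEGER,
    PRIMARY KEY (time_key, product_key, district_key)
);

INSERT INTO dim_time VALUES (1, 'Q1'), (2, 'Q2');
INSERT INTO dim_product VALUES (1, 'Tents', 'Camping'), (2, 'Canoes', 'Boating');
INSERT INTO dim_district VALUES (1, 'Boston'), (2, 'New York');
INSERT INTO fact_sales VALUES (1, 1, 1, 3), (2, 1, 1, 4), (1, 2, 2, 5);
""")

# A typical star query: one join per dimension that is needed.
units_by_quarter = conn.execute("""
    SELECT t.quarter, SUM(f.units)
    FROM fact_sales f
    JOIN dim_time t ON t.time_key = f.time_key
    GROUP BY t.quarter
    ORDER BY t.quarter
""").fetchall()
```

Each dimension is reached with a single join from the fact table, which is part of why this layout is easy for end users to read.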
Example of Star Schema
Common structures for Data Marts: Denormalize!
• Snowflake
– Single fact table surrounded by normalized dimension
tables
– Normalizes dimension table to save data storage
space.
– Used when dimensions become very large
– Less intuitive, slower performance due to joins
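A minimal snowflake sketch, again using `sqlite3` with hypothetical table names and rows. The product dimension is normalized so that each category name is stored once, and the price is an extra join whenever a query needs the category.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Normalized dimension: category is split out of dim_product.
CREATE TABLE dim_category (category_key INTEGER PRIMARY KEY, category_name TEXT);
CREATE TABLE dim_product (
    product_key  INTEGER PRIMARY KEY,
    product_name TEXT,
    category_key INTEGER REFERENCES dim_category(category_key)
);
CREATE TABLE fact_sales (
    product_key INTEGER REFERENCES dim_product(product_key),
    units       INTEGER
);

INSERT INTO dim_category VALUES (1, 'Camping'), (2, 'Boating');
INSERT INTO dim_product VALUES (1, 'Tents', 1), (2, 'Canoes', 2),
                               (3, 'Sleeping bags', 1);
INSERT INTO fact_sales VALUES (1, 3), (2, 5), (3, 2);
""")

# Two joins are needed to reach the category name from the fact table.
units_by_category = conn.execute("""
    SELECT c.category_name, SUM(f.units)
    FROM fact_sales f
    JOIN dim_product p  ON p.product_key = f.product_key
    JOIN dim_category c ON c.category_key = p.category_key
    GROUP BY c.category_name
    ORDER BY c.category_name
""").fetchall()
```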
• Primary Keys
– uniquely identify a record
• Foreign Keys
– the primary key of another table, referenced from this table
• Surrogate Keys
– system-generated key for dimensions
– key on its own has no meaning
– integer key, less space
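One way to picture surrogate key assignment is a lookup that hands out the next integer the first time a source-system identifier is seen. The class name `SurrogateKeyMap` and the customer identifiers are hypothetical; a real warehouse would usually keep this mapping in a table rather than in memory.

```python
from itertools import count


class SurrogateKeyMap:
    """Assigns a meaningless integer key to each source-system identifier."""

    def __init__(self):
        self._next = count(1)  # system-generated sequence: 1, 2, 3, ...
        self._keys = {}

    def key_for(self, source_id):
        # Reuse the existing surrogate if this identifier was seen before.
        if source_id not in self._keys:
            self._keys[source_id] = next(self._next)
        return self._keys[source_id]


customers = SurrogateKeyMap()
k1 = customers.key_for("CUST-0042")
k2 = customers.key_for("CUST-0099")
k3 = customers.key_for("CUST-0042")  # same customer, same surrogate key
```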
More Keys …
• Smart Keys
– primary key built out of various attributes of
the dimension
– AVOID THEM!
– Join to Fact table should be on single
surrogate key
• Production Keys
– DO NOT USE production-defined attributes as keys
– The business may reuse or change them; the DW
cannot!
Basic Dimensional Modeling Techniques
• Slowly changing Dimensions
• Rapidly changing Small Dimensions
• Large Dimensions
• Rapidly changing Large Dimensions
• Degenerate Dimensions
• Junk Dimensions
Slowly Changing Dimensions
12 Steps :
[Figure: Source System 1, Source System 2, Source System 3 -> E/T/V/L -> Staging Area -> E/T/V/L -> Data warehouse]
Extraction
– Loading Frequency
– Optimized Loading
• Indexing
• Partitioning
– Aggregation
• Sum
• Average
• Max
– Update Strategy
– Error Handling
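The aggregation step listed above (sum, average, max) can be sketched as a summary built from detail rows during a load. The rows and the `summary` layout are hypothetical.

```python
from collections import defaultdict

# Hypothetical detail rows extracted from a source system: (product, amount).
detail_rows = [("Tents", 120.0), ("Tents", 80.0), ("Canoes", 300.0)]

# Group the detail amounts by product.
groups = defaultdict(list)
for product, amount in detail_rows:
    groups[product].append(amount)

# One summary row per product with the three aggregates named in the list.
summary = {
    product: {
        "sum": sum(amounts),
        "average": sum(amounts) / len(amounts),
        "max": max(amounts),
    }
    for product, amounts in groups.items()
}
```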
Synopsis
• Staging Area
– optional
– to cleanse the source data
– Accepts data from different sources
– Data model is required at staging area
– Multiple data models may be required for
parking different sources and for transformed
data to be pushed out to warehouse
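A small sketch of what cleansing in a staging area might look like: rows from different sources are accepted, normalized, and either staged or rejected. The field names, the rules in `cleanse` and the sample rows are assumptions made for illustration only.

```python
def cleanse(row):
    """Return a cleaned copy of a row, or None if the row must be rejected."""
    name = row.get("customer", "").strip().title()
    try:
        amount = float(row.get("amount", ""))
    except ValueError:
        return None  # impossible value: amount is not a number
    if not name or amount < 0:
        return None  # missing customer or negative amount
    return {"customer": name, "amount": amount}


# Hypothetical rows arriving from two different source systems.
source_1 = [{"customer": "  alice smith ", "amount": "10.5"}]
source_2 = [
    {"customer": "BOB JONES", "amount": "abc"},
    {"customer": "", "amount": "3"},
]

staged, rejected = [], []
for row in source_1 + source_2:
    cleaned = cleanse(row)
    if cleaned is None:
        rejected.append(row)  # kept aside for error handling
    else:
        staged.append(cleaned)  # ready to be pushed to the warehouse
```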
ODS - Some Clarity
[Figure: architecture diagram with the labels Operational Data, External data, Load Manager, Warehouse Manager, Query Manager, Detailed Information, Summary information, Meta Data, OLAP]
Data Warehouse Architecture - 2
Data Warehouse Architecture - 3
Data Warehouse Architecture - 4
DW Architecture
• Global Architecture
– related to scope of data access and storage
– does not mean centralized
– can be physically centralized or distributed
– enterprise view of data
– time-consuming & costly to implement
Global Architecture
DW Architecture
• Independent Architecture
– stand-alone
– controlled by a department
– minimal integration
– no global view
– very fast to implement
DW Architecture
• Interconnected Architecture
– distributed
– integrated and interconnected
– gives a global view of enterprise
– more complexity
• who manages / controls data
• another tier in architecture to share common data
between multiple data marts
• have a data sharing schema across data marts
Independent & Interconnected Architecture
Types of Data Warehouse
Enterprise
Data Warehouse
Solution
• Share a uniform architecture to allow them
to be fused coherently
Classical Architectures
[Figures: architecture diagrams built from the components Source Data (Operational Data, External Data), Staging Area, Data Warehouse and Data Marts]
OLAP focuses on
• Data transformed into information that
meets the end-user’s analytical requirements
• Data modeling and computation processes
that are consistent
OLTP and the DW provide the source data,
whereas OLAP turns that data into
information.
OLAP - Functionality
• OLAP functionality is characterized by
– Dynamic multi-dimensional analysis of consolidated
enterprise data supporting end user analytical and
navigational activities.
– Calculations and modeling applied across dimensions,
through hierarchies and/or across members
– Trend analysis over sequential time periods
– Slicing subsets for on-screen viewing
– Drill-down to deeper levels of consolidation
– Reach-through to underlying detail data
– Rotation to new dimensional comparisons in the viewing area
OLAP - Functionality
[Figure: cube of Actuals with axes including Accounting and Time, seen from a Dept. Mgr. View and an Accounting Dir. View]
Dimensions
The Ability to display Cubes or dimensions
Hierarchies
Formulas and Links
OLAP - Features
• Multidimensional data storage
• Dimensions & Variables
• Summarized data
• Calculation Support
Expense
Multidimensional Analysis
Exception and Trend Reporting
• Which expenses are 5% or more below budget and
represent more than 2% of total expenses?
• Display all exp.. lines where the trend over the last
6 months is negative.
• How has expense mix changed over the past 52 weeks?
Expense     Division A   Division B   Division C
Labor          120          115          123
Supplies        60           75           73
Travel          92           87          106
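Part of the first question above ("represent more than X% of total expenses") can be sketched directly on the expense figures from the slide. The mapping of the three numbers in each row to Divisions A, B and C is assumed from the slide layout, the function name `lines_above_share` is hypothetical, and the budget comparison is left out because the slide gives no budget figures.

```python
# Expense figures from the slide; column-to-division mapping is assumed.
expenses = {
    "Labor":    {"Division A": 120, "Division B": 115, "Division C": 123},
    "Supplies": {"Division A": 60,  "Division B": 75,  "Division C": 73},
    "Travel":   {"Division A": 92,  "Division B": 87,  "Division C": 106},
}


def lines_above_share(data, threshold):
    """Return the expense lines whose share of total expenses exceeds threshold."""
    line_totals = {line: sum(by_div.values()) for line, by_div in data.items()}
    grand_total = sum(line_totals.values())
    return sorted(
        line for line, total in line_totals.items()
        if total / grand_total > threshold
    )


over_two_percent = lines_above_share(expenses, 0.02)
over_thirty_percent = lines_above_share(expenses, 0.30)
```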
Multidimensional Analysis
Modeling, Forecasting, etc.
• What is the lag factor of expenses when adding new
employees?
• What-if I add three more employees in Division A Group?
• Project next quarter’s expenses based on the last 12 months.
Expense     Division A   Division B   Division C
Labor          120          115          123
Supplies        60           75           73
Travel          92           87          106
Fundamental Data Model is the same
• Offset addressing
• Better performance
[Figure: cube labelled KEYS and DIMENSIONS, with dimensions District (Boston, New York, Philadelphia), Product (Tents, Canoes, Sportswear, Footwear) and Quarter (Q1, Q2, Q3, Q4)]
Derived Measures
UNITS * PRICE = SALES
[Figure: three cubes (UNITS, PRICE, SALES) over District (Boston, New York, Philadelphia), Product (Tents, Canoes, Racquets, Sportswear, Footwear) and Quarter (Q1 to Q4). For Tents, UNITS of 3, 4, 1 multiplied by PRICE of 2, 2, 3 give SALES of 6, 8, 3.]
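The derived measure UNITS * PRICE = SALES can be checked cell by cell with the Tents values shown in the figure. Which quarter each of the three values belongs to is not recoverable from the slide, so the quarter labels used here are an assumption.

```python
# Tents values from the figure; the quarter labels are assumed.
units = {"Q1": 3, "Q2": 4, "Q3": 1}
price = {"Q1": 2, "Q2": 2, "Q3": 3}

# SALES is not stored independently: it is derived cell by cell.
sales = {quarter: units[quarter] * price[quarter] for quarter in units}
```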
Data Storage
[Figure: the cube over District (Boston, New York, Philadelphia), Product (Tents, Canoes, Racquets, Sportswear, Footwear) and Quarter (Q1 to Q4) stored as a Data Page, with one Sales cell per product and quarter]
Sample of Built-In Functions
Example :
Functional ROLAP Vs MOLAP
Tactical Strategic