Data Warehouse Implementation

Data Warehouse Introduction
Spring 2016
Zafar Heydari, Farid
Outlines About Lecturer
OLAP vs OLTP
Why Data Warehouse
Designing Data Warehouse
Designing Dimensions
Designing Fact Tables

OLAP vs. OLTP
ROLAP
OLAP
MOLAP
OLTP SSIS
OLTP- Online Transactional Processing
 Designed to solve everyday work transactions (i.e. sales, customer care, manufacturing)
 Points where the data is captured and recorded in the company
 Very efficient in the management of specific information : ERP, CRM, HRM, SCM, Email, Others
Disadvantages Advantages
• Difficult to manage historical • Reduces paper trails

information • Simple & effective
• High cost of maintenance and • Highly accurate Advantages
gathering distributed data
• Need technical skills and
resources
OLAP – Online Analytical Processing
 Designed specifically to obtain information, analyze and solve business problems

 Specific analytical information is added to the data
 They use a different database technology, optimized to extract information.
 Analytical systems unify all of the company’s data in one system: the Data Warehouse
What’s in Cube?
OLAP Cubes
 They let you analyze all of the information available in the Data
Warehouse
 Each cube stores a set of specific information, and contains different
“measures” and “dimensions”
 Measures
numbers (amounts, quantities, percentages)
 Dimensions
contain attributes to filter and order information
Samples of Dimension & MeasuresDimension
Sales Stock / Inventory Supply Orders
• Dimensions: • Dimensions: • Dimensions: • Dimensions:

• Date / Time • Date / Time • Date / Time • Date / Time
• Customer • Store / • Supplier • Product
• Branch / Branch • Product • Customer
Store • Product • Contract • Salesperson
• Product • Contract • Terms of sale
• Measures: • Measures: terms • Measures:
• Quantity • Quantity • Type of • Quantity
• Price • Price transactions • Amount
• Discount • Cost • Measures: • Discount
• Profit • Quantity
• Amount
• Cost
Why is it multidimensional?
A dimension is one of the “edges” of your business
It is called multidimensional because you can see the

information from different “edges” at the same time
 Customers
 Suppliers
 Employees
 Products
 Places (Factories, Branches,…)
 Time
 Transaction Types ( Invoices, Orders, Quotes,…)
OLAP vs. OLTP
OLTP DB of Sales
 Highly Normalized
 Lots of Joins
 Archived Data from 1/1/1
 Low Performance
 Multiple Data Bases
 Different Naming &
definitions
Why Data Warehouses?
 Simplified Reporting
 Archived Data
 Different Data Sources
 Consistent Central Hub for Data Analysis

 OLTP is highly Normalized design
 OLAP/DW are highly DE-Normalized design
Designing Data Warehouse (1)
FACT Tables
(Numbers)
Production De-Normalized
Environment Dimension
Dimension
(Attributes)
DIM Sub-Category
Prototype Normalized DIM Region

DIM Category
Environment Dimension DIM Country
Designing Data Warehouse (2)
Star Schema : Focuses on an area of OLTP data
Central Fact table contains Metric data for Analysis
Dimensions contain attributes to describe fact data
Snowflake Schema extends De-normalized Star Schema

Designing Dimensions (1)
 Dimension tables contain:
 Attributes : to filter/slice Fact
data
 Labels: to get context of the
report
 Keys: Primary Keys(Surrogate
Keys) & Natural Keys (OLTP
Key)
 Lineage Key: Track where &
when the data comes (Master
lineage data)
 SCD: Slowly Changing Dimensions

 How do we deal with updates
 Determine historical aspects of dimension
data
 Type 1: Don’t maintain history
 Type 2: Save history by adding a new
record
 Type 3: Only Save Previous Value (Less
Popular)
 Sample : Department name :
Marketing Sales  Marketing
 Natural, Durable, and Supernatural Keys

 employee number (natural key)
 durable key/ durable supernatural key: persist key, independent of the original business process
 Flags and Indicators as Textual Attributes

 Cryptic abbreviations, true/false flags, and operational indicators should be supplemented in
dimension tables with full text meaningful words
 Null Attributes in Dimensions

 a given dimension row has not been fully populated,
 or when there are attributes that are not applicable to all the dimension’s rows.
 In both cases, we recommend substituting a descriptive string, such as Unknown or Not Applicable
 Calendar Date Dimensions

 Dim Date has different attributes such as week
number, month name, fiscal period, and national
holiday indicator, …
 To facilitate partitioning, the primary key of a date
dimension can be more meaningful, such as an
integer representing YYYYMMDD
 the date dimension table needs a special row to
represent unknown or to-be-determined dates
 a separate date/time stamp can be added to the
fact table. The date/time stamp is not a foreign key
to a dimension table
 Role-Playing Dimensions
 A single physical dimension can be referenced multiple times in a fact table
 For instance: A fact table can have several dates, each of which is represented by a foreign
key to the date dimension. It is essential that each foreign key refers to a separate view of
the date dimension so that the references are independent.
 These separate dimension views (with unique attribute column names) are called roles.
 Hierarchies:
 De-normalized Flattened Dimensions (Star)
 Multiple Hierarchies in Dimensions (Snow flake)
 Which Data levels (Star/ Snowflake?) : Depends on usability in all fact tables
Star Sample
Snowflake Sample
Hybrid Sample
Designing Fact Tables (1)
 Fact tables contain:

 Key: Primary Keys , Dimension
Keys
 Measure: Calculated Numbers
 Lineage: Track where & when
the data comes (Master
lineage data)
 Supportive labels
 Measures Additivity:
 Additive: All-dimensions
 Non-Additive: No dimensions
(Unit Cost)
 Semi-Additive: Some
dimensions (All except time)
(Units Balance)
 Fact-less Table:
represent Many-to-Many
relationships in Data
Warehouse.
 Nulls in Fact Tables:

 The aggregate functions (SUM, COUNT, MIN, MAX, and AVG) all do
the “right thing” with null facts.
 Nulls must be avoided in the fact table’s foreign keys
 The associated dimension table must have a default row (and

surrogate key) representing the unknown or not applicable
condition.
 Conformed Facts
 If the same measurement appears in separate fact
tables:
 If the separate fact definitions are consistent, the
conformed facts should be identically named
 but if they are incompatible, they should be differently

named to alert the business users and BI applications
 Periodic Snapshot Fact Tables

 A row in a periodic snapshot fact table summarizes many
measurement events occurring over a standard period, such as
a day, a week, or a month.
 The grain is the period, not the individual transaction.
 if no activity takes place during the period, a row is typically
inserted in the fact table containing a zero or null for each fact.
 Accumulating Snapshot Fact Tables

 A row in an accumulating snapshot
fact table summarizes the
measurement events occurring at
predictable steps between the
beginning and the end of a
process.
 Samples: Pipeline or workflow
processes, such as order
fulfillment or claim processing
 There is a date foreign key in the
fact table for each critical
milestone in the process.
 As pipeline progress occurs, the
accumulating fact table row is
revisited and updated.
 Consolidated Fact Tables

 to combine facts from multiple processes together into
a single consolidated fact table.
(Sales actuals can be consolidated with sales forecasts)
 Header/Line Fact Tables

 parent/child schemas), all the header-level dimension foreign keys and degenerate
dimensions should be included on the line-level fact table.
Managing Fact Data
 Indexing Fact Tables:

 Clustered on 1 small numeric PK
 Non-clustered on FK
 Column Store on Large Tables
Lineage Key
 Partitioning on Large table
P-Schema
 Simplified maintenance
 Improve performance P-Function
 Lineage
 Directly in Fact tables
 Centralized Lineage Dimension
 Meta data about columns
Kimball Dimensional Modeling Techniques
 Gather Business Requirements and Data Realities

 Collaborative Dimensional Modeling Workshops
 Four-Step Dimensional Design Process:
 1. Select the business process.
 2. Declare the grain. (atomic grain)
 3. Identify the dimensions.
 who, what, where, when, why, and how OF CONTECT OF BUSINESS
PROCESS EVENT
 4. Identify the facts.
Exercise 1
Answer 1
Any Question?
Spring 2016
Zafar Heydari, Farid

Data Warehouse Implementation

Uploaded by

Copyright:

Available Formats

You might also like

Data Warehouse Implementation

Uploaded by

Document Information

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Data Warehouse Implementation

Uploaded by

Copyright:

Available Formats

Data Warehouse Introduction

Why Data Warehouse

Designing Data Warehouse

Designing Fact Tables

• Difﬁcult to manage historical • Reduces paper trails

 Designed speciﬁcally to obtain information, analyze and solve business problems

Sales Stock / Inventory Supply Orders

• Dimensions: • Dimensions: • Dimensions: • Dimensions:

A dimension is one of the “edges” of your business

It is called multidimensional because you can see the

 Different Data Sources

 Consistent Central Hub for Data Analysis

Prototype Normalized DIM Region

Star Schema : Focuses on an area of OLTP data

Central Fact table contains Metric data for Analysis

Dimensions contain attributes to describe fact data

Snowflake Schema extends De-normalized Star Schema

 SCD: Slowly Changing Dimensions

 Natural, Durable, and Supernatural Keys

 Flags and Indicators as Textual Attributes

 Null Attributes in Dimensions

 Calendar Date Dimensions

 Fact tables contain:

 Nulls in Fact Tables:

 Nulls must be avoided in the fact table’s foreign keys

 The associated dimension table must have a default row (and

 but if they are incompatible, they should be differently

 Periodic Snapshot Fact Tables

 Accumulating Snapshot Fact Tables

 Consolidated Fact Tables

 Header/Line Fact Tables

 Indexing Fact Tables:

 Gather Business Requirements and Data Realities

You might also like