
An Introduction to Data Warehousing

Lnt Infotech Use Only


Objectives

• Data Warehouse Overview
• Data Warehouse, OLTP & ODS
• Data Warehouse Architecture
• Data Models in Data Warehousing
• Slowly changing dimensions
• Surrogate Keys



A producer wants to know…

• Which are our lowest/highest margin customers?
• Who are my customers and what products are they buying?
• What is the most effective distribution channel?
• What product promotions have the biggest impact on revenue?
• Which customers are most likely to go to the competition?
• What impact will new products/services have on revenue and margins?

Data, Data everywhere, yet ...

• I can’t find the data I need
– data is scattered over the network
– many versions, subtle differences
• I can’t get the data I need
– need an expert to get the data
• I can’t understand the data I found
– available data poorly documented
• I can’t use the data I found
– results are unexpected
– data needs to be transformed from one form to another


What are the users saying...

• Data should be integrated across the enterprise
• Summary data has a real value to the organization
• Historical data holds the key to understanding data over time
• What-if capabilities are required


Data Warehouse

• A data warehouse is a
– Subject-oriented
– Integrated
– Time-variant
– Non-volatile
collection of data that is used primarily in organizational decision making

-- Bill Inmon, Building the Data Warehouse, 1996


What is Data Warehousing

A process of transforming data into information and making it available to users in a timely enough manner to make a difference

[Forrester Research, April 1996]

[Figure: Data → Information]

Need for Data Warehousing

• Better business intelligence for end-users
• Reduction in time to locate, access, and analyze information
• Consolidation of disparate information sources
• Strategic advantage over competitors
• Faster time-to-market for products and services
• Replacement of older, less-responsive decision support systems
• Reduction in demand on IS to generate reports


Evolution of Data Warehousing

1960 - 1985 : MIS Era

• Unfriendly
• Slow
• Dependent on IS programmers
• Inflexible
• Analysis limited to defined reports

Focus on Reporting
Evolution of Data Warehousing

1985 - 1990 : Querying Era

• Ad hoc, unstructured access to corporate data

• SQL as interface not scalable

• Cannot handle complex analysis


Focus on Online Querying


Evolution of Data Warehousing

1990 - 20xx : Analysis Era

• Trend Analysis
• What If ?
• Moving Averages
• Cross Dimensional Comparisons
• Statistical profiles
• Automated pattern and rule discovery

Focus on Online Analysis

Data Warehousing Concepts and Terms

Some terms that are important for understanding data warehousing concepts:

Operational Data: The data that is used to run a business. It is what is typically stored, retrieved and updated by an Online Transaction Processing (OLTP) system. Operational data is usually stored in a relational database, but may be stored in legacy, hierarchical or flat file formats as well.

Informational Data: Data stored in a format that makes analysis much easier. Analysis can take the form of decision support (queries), report generation, executive information systems, and more in-depth statistical analysis. Informational data is created from the wealth of operational data that exists in the business, and it is what makes up a data warehouse.


OLTP Systems Vs Data Warehouse

Remember: between OLTP and Data Warehouse systems
• users are different
• data content is different
• data structures are different
• hardware is different

Understanding The Differences Is The Key


Capacity Planning

[Figure: processing power plotted against time of day]

Processing Load Peaks During the Beginning and End of Day


OLTP Vs Data Warehouse

Characteristic | OLTP | Data Warehouse
Orientation | Transaction | Analysis
Data Content | Current values | Summarized, archived, derived; usually historical values
Data Structure | Optimized for transactions; highly normalized | Optimized for complex queries; often de-normalized


OLTP Vs Data Warehouse

Characteristic | OLTP | Data Warehouse
Data Access | Record at a time | Data set at a time
Access Frequency | Read/Update/Delete | Read/Aggregate
Concurrent Users | Many | Few
Data Stability | Dynamic | Static until refreshed
Data Organization | By application | By subject
Usage | Predictable, repetitive | Ad hoc, heuristic
Support | Day-to-day operations | Managerial needs
Response time | Few seconds | Several seconds to minutes
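
The difference between record-at-a-time access and set-at-a-time access can be sketched with Python's built-in sqlite3 module. This is only an illustration: the `orders` table, its columns and the sample values are assumptions and do not come from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table used only for this illustration.
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, region TEXT, amount REAL, status TEXT)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [(1, "north", 100.0, "open"), (2, "north", 250.0, "open"), (3, "south", 75.0, "open")],
)

# OLTP-style access: read and update one record at a time, located by its key.
conn.execute("UPDATE orders SET status = 'shipped' WHERE order_id = ?", (2,))
one_row = conn.execute("SELECT status FROM orders WHERE order_id = ?", (2,)).fetchone()

# Warehouse-style access: read a whole set of rows and aggregate it.
totals = conn.execute(
    "SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY region"
).fetchall()

print(one_row)
print(totals)
```

In practice the two workloads would run on separately configured systems; a single in-memory database is used here only to keep the sketch short.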



Do we need a separate database ?

• OLTP and data warehousing require two very differently configured systems
• Isolation of Production System from Business Intelligence System
• Significant and highly variable resource demands of the data warehouse
• Cost of disk space no longer a concern
• Production systems not designed for query processing


Why Separate Data Warehouse?

• Performance
– Special data organization, access methods, and implementation methods are needed to support multidimensional views and operations typical of OLAP
– Complex OLAP queries would degrade performance for operational transactions
– Concurrency control and recovery modes of OLTP are not compatible with OLAP analysis


Why Separate Data Warehouse?

• Function
– Missing data: decision support requires historical data, which operational DBs do not typically maintain
– Data consolidation: decision support requires consolidation (aggregation, summarization) of data from heterogeneous sources: operational DBs, external sources
– Data quality: different sources typically use inconsistent data representations, codes and formats, which have to be reconciled


Operational Data Store - Definition

[Figure: sources A and B feed the ODS, which feeds the Data Warehouse; the diagram carries the labels Operational and DSS]


Operational Data Store - Definition

A subject oriented, integrated, volatile, current valued data store containing only corporate detailed data

• Subject oriented: "Can I see credit report from Accounts, Sales from marketing and open order report from order entry for this customer?"
• Integrated: data from multiple sources is integrated for a subject
• Volatile: identical queries may give different results at different times; supports analysis requiring current data
• Current valued: data is stored only for the current period; old data is either archived or moved to the Data Warehouse

Operational Data Store

• The ODS applies only to the world of operational systems.
• The ODS contains current valued and near current valued data.
• The ODS contains almost exclusively detail data.
• The ODS requires a full function, update, record oriented environment.


Different kinds of Information Needs

• Current: Is this medicine available in stock?
• Recent: What are the tests this patient has completed so far?
• Historical: Has the incidence of Tuberculosis increased in the last 5 years in the Southern region?

OLTP Vs ODS Vs DWH

Characteristic | OLTP | ODS | Data Warehouse
Audience | Operating personnel | Analysts | Managers and analysts
Data access | Individual records, transaction driven | Individual records, transaction or analysis driven | Set of records, analysis driven
Data content | Current, real-time | Current and near-current | Historical
Data structure | Detailed | Detailed and lightly summarized | Detailed and summarized
Data organization | Functional | Subject-oriented | Subject-oriented
Type of data | Homogeneous | Homogeneous | Vast supply of very heterogeneous data


OLTP Vs ODS Vs DWH

Characteristic | OLTP | ODS | Data Warehouse
Data redundancy | Non-redundant within system; unmanaged redundancy among systems | Somewhat redundant with operational databases | Managed redundancy
Data update | Field by field | Field by field | Controlled batch
Database size | Moderate | Moderate | Large to very large
Development methodology | Requirements driven, structured | Data driven, somewhat evolutionary | Data driven, evolutionary
Philosophy | Support day-to-day operation | Support day-to-day decisions & operational activities | Support managing the enterprise


Logical Transformation of operational data

Order processing
• 2 second response time
• Last 6 months orders
→ feeds the data warehouse with daily closed orders

Product Price/inventory
• 10 second response time
• Last 10 price changes
• Last 20 inventory transactions
→ feeds the data warehouse with weekly product price/inventory

Marketing
• 30 second response time
• Last 2 years programs
→ feeds the data warehouse with weekly marketing programs

Data Warehouse
• Last 5 years data
• Response time 2 seconds to 60 minutes
• Data is not modified

• Different performance requirements
• Combine data from multiple applications
• Data is mostly non-volatile
• Data saved for a long time period

Figure 3. Reasons for moving data outside the operations systems

Logical Transformation of operational data

Source applications and their data:
• Order processing: Customer orders, Product price, Available Inventory
• Product Price/inventory: Product price, Product Inventory, Product Price changes
• Marketing: Customer Profile, Product price, Marketing programs

Data Warehouse entities: Customers, Products, Orders, Product Inventory, Product Price

• No data model restrictions of the source application
• Data warehouse model has business entities

Figure 5. Data warehouse entities align with the business structure

Logical Transformation of operational data

[Figure: in the Order Processing System an order moves through the states Open, Backorder, Shipped and Closed, and only daily closed orders are sent to the Data Warehouse as Orders (Closed); inventory moves up and down, and weekly inventory snapshots are stored in the Data Warehouse as Inventory snapshot 1, Inventory snapshot 2]

• Operational state information is not carried to the data warehouse
• Data is transferred to the data warehouse after all state changes

Figure 6. Transformation of the operational state information
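
The two points of Figure 6 can be sketched in a few lines of Python: only orders that have reached their final state are carried over, and inventory is stored as dated snapshots. The record layout, function names, dates and quantities are hypothetical and chosen only for illustration.

```python
from datetime import date

# Hypothetical operational order records with their current state.
operational_orders = [
    {"order_id": 1, "status": "Closed", "amount": 120.0},
    {"order_id": 2, "status": "Open", "amount": 80.0},
    {"order_id": 3, "status": "Shipped", "amount": 45.0},
    {"order_id": 4, "status": "Closed", "amount": 300.0},
]


def extract_closed_orders(orders):
    """Return warehouse rows for closed orders, dropping the state column."""
    return [
        {"order_id": o["order_id"], "amount": o["amount"]}
        for o in orders
        if o["status"] == "Closed"
    ]


def take_inventory_snapshot(inventory, snapshot_date):
    """Copy current stock levels into dated snapshot rows."""
    return [
        {"snapshot_date": snapshot_date, "product": product, "on_hand": qty}
        for product, qty in sorted(inventory.items())
    ]


warehouse_orders = extract_closed_orders(operational_orders)
snapshot_1 = take_inventory_snapshot({"widget": 40, "axon": 12}, date(2024, 1, 7))
snapshot_2 = take_inventory_snapshot({"widget": 35, "axon": 20}, date(2024, 1, 14))
print(warehouse_orders)
```

Each snapshot is appended rather than updated, which matches the non-volatile nature of warehouse data described earlier.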


Advantages of Data Warehouse

• Time saving: A warehouse enables employees to shift their time from collecting information to analyzing it, which helps the company make better business decisions.
• Efficiency: A DW provides, in one central repository, all the metrics necessary to support decision making through queries and reports.
• Complete documentation: A typical DW objective is to store all the information, including history.


Advantages of Data Warehouse

• Data Integration: the primary goal of every DW is to integrate data, because
a) the lack of integration is a primary deficiency in current decision support systems
b) data content in one file is at a different level of granularity than that in another file
c) the same data in one file is updated at a different time period than that in another file


Limitation of Data Warehouse

• High cost of building and ongoing maintenance ($3 - 5 million).
• Complexity: a DW has to integrate all the data and transaction system databases, and hence requires more time to design and build (an average DW requires approx. 3 years to implement).
• The answer to these limitations is Data Marts


Data Marts

• Subject or Application Oriented Business View of Warehouse
– Quick Solution to a specific Business Problem
– Finance, Marketing, Sales etc.
– Smaller amount of data used for Analytic Processing

A Logical Subset of The Complete Data Warehouse


Data Marts

[Figure: Marketing Data Mart, Finance Data Mart and Sales Data Mart drawn on top of the current level of detail (Data Warehouse)]
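
One way to picture a data mart as a logical subset of the warehouse is a summarized view over a single subject area. The sketch below uses Python's sqlite3 module; the `warehouse_facts` table, the `sales_mart` view and all sample rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical detailed warehouse table covering several subject areas.
conn.execute(
    "CREATE TABLE warehouse_facts (subject TEXT, month TEXT, product TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO warehouse_facts VALUES (?, ?, ?, ?)",
    [
        ("sales", "2024-01", "widget", 100.0),
        ("sales", "2024-01", "widget", 50.0),
        ("sales", "2024-02", "axon", 70.0),
        ("finance", "2024-01", "n/a", 999.0),
    ],
)

# The sales data mart: one subject only, summarized by month and product.
conn.execute(
    """
    CREATE VIEW sales_mart AS
    SELECT month, product, SUM(amount) AS total_amount
    FROM warehouse_facts
    WHERE subject = 'sales'
    GROUP BY month, product
    """
)
mart_rows = conn.execute("SELECT * FROM sales_mart ORDER BY month, product").fetchall()
print(mart_rows)
```

A physical data mart would usually be a separate, periodically loaded database; a view is used here only because it makes the subset relationship easy to see.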
Data Mart Appeal

• What is the appeal of the Data Mart?
• Why do departments find it convenient to do their decision support processing in their own data mart?
• What is wrong with the data warehouse as a basis for standard decision support processing?

There are several factors leading to the popularity of the data mart.


Data Mart Appeal

As data warehouses grow:
– The competition to get inside the data warehouse grows fierce. More and more departmental decision support processing is done inside the data warehouse, to the point where resource consumption becomes a real problem
– Data becomes harder to customize
– The cost of doing processing in the data warehouse increases as the volume of data increases

The department can build the data mart on its own budget, thereby making all the technological decisions it wants

Summary of Data Mart Appeal

• While the DW was designed to manage the bulk supply of data from its suppliers (i.e. operational systems), and to handle the organization and storage of this data, the “retail stores” or “Data Marts” could be focused on packaging and presenting selections of data to end-users, often to meet specialized needs.


Data Warehouse and Data Mart

Characteristic | Data Warehouse | Data Marts
Scope | Application neutral; centralized, shared; enterprise | Specific application requirement; department; business process oriented
Data Perspective | Historical detailed data; some summary | Detailed (some history); summarized
Subjects | Multiple subject areas | Single partial subject; multiple partial subjects


Data Warehouse and Data Mart

Characteristic | Data Warehouse | Data Marts
Data Sources | Many; operational/external data | Few; operational, external data
Implementation Time Frame | 9-18 months for first stage; multiple stage implementation | 4-12 months
Characteristics | Flexible, extensible; durable/strategic; data orientation | Restrictive, non-extensible; short life/tactical; project orientation


Data Warehouses or Data Marts

For companies interested in changing their corporate cultures or integrating separate departments, an enterprise-wide approach makes sense.

Companies that want a quick solution to a specific business problem are better served by a standalone data mart.

Some companies opt to build a warehouse incrementally, data mart by data mart.


Warehouse or Mart First ?

Data Warehouse first | Data Mart first
Expensive | Relatively cheap
Long development cycle | Delivered in < 6 months
Change management is difficult | Easy to manage change
Difficult to obtain continuous corporate support | Can lead to independent and incompatible marts
Technical challenges in building large databases | Cleansing, transformation, modeling techniques may be incompatible


Data Warehousing Model

[Figure: operational data, distributed data and external market data are loaded through ETL into the Data Warehouse; the diagram also shows Data Marts, Data Mining, OLAP Tools and DSS Tools]

Typical Data Warehouse Architecture

[Figure: Operational Systems/Data → Data Preparation (Select, Extract, Transform, Integrate, Maintain) → Data Warehouse with Metadata → Data Marts → Middleware/API → EIS/DSS, Query Tools, OLAP/ROLAP, Web Browsers, Data Mining]

Multi-tiered Data Warehouse without ODS


Typical Data Warehouse Architecture

[Figure: Operational Systems/Data → Data Preparation (Select, Extract, Transform, Integrate, Maintain) → ODS with Metadata → Data Preparation (Select, Extract, Transform, Load) → Data Warehouse with Metadata → Data Marts]

Multi-tiered Data Warehouse with ODS


Application of Data Warehousing

• OLAP

• Data Mining



Commonly used Terms in OLAP

• Measure: A numeric figure that tells about the business.
• Dimension: A category of information that describes the measure, e.g. the time dimension.
• Attribute: A unique level within a dimension, e.g. Month is an attribute within the time dimension.
• Hierarchy: The specification of levels that represents the relationship between different attributes within a dimension. For example, one possible hierarchy in the Time dimension is
Year -- Quarter -- Month -- Day



OLAP : Online Analytical Processing

OLAP is a common use of a data warehouse that involves real-time access to and analysis of multi-dimensional data such as sales information.
The term OLAP was coined to contrast with OLTP (Online Transaction Processing).
Key characteristics of OLAP include
• Large data volumes
• Drill down along many dimensions
• Dynamic viewing and analysis of the data from a wide variety of perspectives and through complex formulae


Online Analytical Processing

OLAP EXAMPLE:
An example OLAP database may be comprised of sales data which has
been aggregated by region, product type, and sales channel. A typical
OLAP query might access a multi-year sales database in order to find
all product sales in each region for each product type.
After reviewing the results, an analyst might further refine the query to
find sales volume for each sales channel within region/product
classifications.
As a last step the analyst might want to perform year-to-year or quarter-
to-quarter comparison for each sales channel. This whole process must
be carried out on-line with rapid response time so that the analysis
process is undisturbed.
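
The three steps of this example map naturally onto three GROUP BY queries. The following sketch runs them with Python's sqlite3 module; the `sales` table layout and the sample rows are assumptions for illustration, not data from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical multi-year sales table.
conn.execute(
    "CREATE TABLE sales (year INTEGER, region TEXT, product_type TEXT, channel TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?, ?)",
    [
        (1994, "east", "fruit", "retail", 100.0),
        (1994, "east", "fruit", "web", 40.0),
        (1995, "east", "fruit", "retail", 120.0),
        (1995, "east", "fruit", "web", 90.0),
        (1995, "west", "dairy", "retail", 60.0),
    ],
)


def query(sql):
    """Run a read-only query and return all rows."""
    return conn.execute(sql).fetchall()


# Step 1: all product sales in each region for each product type.
by_region_type = query(
    "SELECT region, product_type, SUM(amount) FROM sales "
    "GROUP BY region, product_type ORDER BY region, product_type"
)
# Step 2: refine to sales channel within each region/product classification.
by_channel = query(
    "SELECT region, product_type, channel, SUM(amount) FROM sales "
    "GROUP BY region, product_type, channel ORDER BY region, product_type, channel"
)
# Step 3: year-to-year comparison for each sales channel.
by_channel_year = query(
    "SELECT channel, year, SUM(amount) FROM sales "
    "GROUP BY channel, year ORDER BY channel, year"
)
print(by_region_type)
```

A dedicated OLAP server would typically answer such queries from pre-aggregated structures; plain SQL is used here only to make the logic of each step visible.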



Introduction to Cubes

[Figure: a Sales cube with three dimensions: Location (Atlanta, Denver, Detroit), Product (Grapes, Cherries, Melons, Apples, Pears) and Time (Q1, Q2, Q3, Q4)]

Online Analytical Processing

OLAP database servers support common analytical operations including consolidation, drill-down, and “slicing and dicing”.
Consolidation: Involves the aggregation of data such as simple roll-ups; for example, sales offices can be rolled up to districts and districts rolled up to regions.
Drill-Down: OLAP database servers can also go in the reverse direction and automatically display the detail data that makes up consolidated data. This is called drill-down. Consolidation and drill-down are an inherent property of OLAP servers.
Slicing and Dicing: Refers to the ability to look at the database from different viewpoints. One slice of the sales database might show all sales of a product type within regions. Another slice might show all sales by sales channel within each product type. Slicing and dicing is often performed along a time axis in order to analyze trends and find patterns.
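
A minimal Python sketch of these operations is given below. It treats consolidation as grouping by fewer columns, drill-down as grouping by more columns, and a slice as fixing one dimension to a single value. The row layout, helper names and figures are hypothetical.

```python
from collections import defaultdict

# Hypothetical detail rows: (region, district, office, quarter, sales).
rows = [
    ("east", "d1", "office_a", "Q1", 10.0),
    ("east", "d1", "office_b", "Q1", 20.0),
    ("east", "d2", "office_c", "Q1", 5.0),
    ("east", "d1", "office_a", "Q2", 7.0),
]
COLUMNS = ("region", "district", "office", "quarter")


def consolidate(data, by):
    """Roll sales up to the given columns (consolidation)."""
    idx = [COLUMNS.index(name) for name in by]
    totals = defaultdict(float)
    for row in data:
        totals[tuple(row[i] for i in idx)] += row[-1]
    return dict(totals)


def slice_rows(data, column, value):
    """Keep only the rows where one dimension has a fixed value (a slice)."""
    i = COLUMNS.index(column)
    return [row for row in data if row[i] == value]


q1 = slice_rows(rows, "quarter", "Q1")
by_region = consolidate(q1, ["region"])                         # fully rolled up
by_district = consolidate(q1, ["region", "district"])           # drill down one level
by_office = consolidate(q1, ["region", "district", "office"])   # drill down to detail
print(by_region, by_district, by_office)
```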
Data Mining

Data Mining is also called “Knowledge Discovery in Databases (KDD)”.

Data Mining also refers to “using a variety of techniques to identify nuggets of information or decision-making knowledge in bodies of data, and extracting these in such a way that they can be put to use in areas such as decision support, prediction, forecasting and estimation. The data is often voluminous but, as it stands, of low value as no direct use can be made of it; it is the hidden information in the data that is useful.”


Applications of Data Mining

Data mining has varied fields of application, some of which are listed below:
RETAIL/MARKETING
• Identify buying patterns from customers
• Find associations among customer demographic characteristics
• Predict response to mailing campaigns
BANKING
• Detect patterns of fraudulent credit card use
• Identify loyal customers
• Determine credit card spending by customer groups
• Find hidden correlations between different financial indicators


Who uses Data Warehouse

• Managers use sales data to improve forecasting & planning for brands, product lines & business areas.
• Retail purchasing managers use a DW to track fast-moving lines & ensure an adequate supply of high-demand products.
• Financial analysts use warehouses to manage currency & exchange exposures, oversee cash flow & monitor capital expenditures.


Questions



Introduction to Data Modeling


Objectives

• At the end of this lesson, you will know:
– Data Modeling for Data Warehouse
– What are dimensions and facts
– Star Schema and Snowflake Schemas
– Factless Tables
– Some modeling tools


Data Modeling for Data Warehouse

• How to structure the data in your data warehouse?
• Process that produces abstract data models for one or more database components of the data warehouse
• Modeling for Warehouse is different from that for Operational database
– Dimensional Modeling, Star Schema Modeling or Fact/Dimension Modeling


Modeling Techniques

• Entity-Relationship Modeling
– Traditional modeling technique
– Technique of choice for OLTP
– Suited for corporate data warehouse
• Dimensional Modeling
– Analyzing business measures in the specific business
context
– Helps visualize very abstract business questions
– End users can easily understand and navigate the data
structure



Entity-Relationship Modeling - Basic Concepts

• The ER modeling technique is a discipline used to illuminate the microscopic relationships among data elements.
• The highest art form of ER modeling is to remove all redundancy in the data.
• This has created databases that cannot be queried!


An Order Processing ER Model

[Figure: ER diagram linked by foreign keys (FK). Entities: City, Sales District, Sales Region, Sales Country, Salesrep table, Order Header, Customer Table, Order Details, Item Table, Product Brand, Product Category]

Entity-Relationship Modeling - Basic Concepts

• Entity
– Object that can be observed and classified by its properties
and characteristics
– Business definition with a clear boundary
– Characterized by a noun
– Example
• Product
• Employee

Entity-Relationship Modeling - Basic Concepts

• Relationship
– Relationship between entities - structural interaction and
association
– described by a verb
– Cardinality
• 1-1
• 1-M
• M-M
– Example : Books belong to Printed Media

Entity-Relationship Modeling - Basic Concepts

• Attributes
– Characteristics and properties of entities
– Example :
• Book Id, Description, book category are attributes of entity
“Book”
– Attribute name should be unique and self-explanatory
– Primary Key, Foreign Key, Constraints are defined on
Attributes



Entity-Relationship Modeling – Why Not ?

• End users cannot understand or remember an ER model.
• There is no graphical user interface (GUI) that takes a general ER model and makes it usable by end users.
• Software cannot usefully query a general ER model.
• Use of the ER modeling technique defeats the basic allure of data warehousing, namely intuitive and high-performance retrieval of data.


Dimensional Modeling - Basic Concepts

• Represents the data in a standard, intuitive framework that allows for high-performance access
• Schema designed to process large, complex, ad hoc and data-intensive queries
• No concern for concurrency, locking and insert/update/delete performance
• Every dimensional model is composed of one table with a multipart key, called the fact table, and a set of smaller tables called dimension tables.
• This characteristic "star-like" structure is often called a star join.
Star Schema Architecture

Star Schema Example

Star Schema with Sample Data


Star Schema Architecture

[Figure: star schema with a central Sales Fact table joined to three dimension tables]

Sales Fact: time_key, product_key, store_key, dollars_sold, units_sold, dollars_cost
Time Dimension: time_key, day_of_week, month, quarter, year, holiday_flag
Product Dimension: product_key, description, brand, category
Store Dimension: store_key, store_name, address, floor_plan_type
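
The schema in this figure could be declared roughly as follows, here using Python's sqlite3 module. The table and column names follow the slide; the column types, the constraints and the sample rows are assumptions added for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript(
    """
    CREATE TABLE time_dim (
        time_key INTEGER PRIMARY KEY,
        day_of_week TEXT, month INTEGER, quarter INTEGER, year INTEGER, holiday_flag INTEGER
    );
    CREATE TABLE product_dim (
        product_key INTEGER PRIMARY KEY,
        description TEXT, brand TEXT, category TEXT
    );
    CREATE TABLE store_dim (
        store_key INTEGER PRIMARY KEY,
        store_name TEXT, address TEXT, floor_plan_type TEXT
    );
    -- The fact table's multipart key is made of one foreign key per dimension.
    CREATE TABLE sales_fact (
        time_key INTEGER NOT NULL REFERENCES time_dim (time_key),
        product_key INTEGER NOT NULL REFERENCES product_dim (product_key),
        store_key INTEGER NOT NULL REFERENCES store_dim (store_key),
        dollars_sold REAL, units_sold INTEGER, dollars_cost REAL,
        PRIMARY KEY (time_key, product_key, store_key)
    );
    """
)

# Hypothetical sample rows.
conn.execute("INSERT INTO time_dim VALUES (1, 'Monday', 1, 1, 1995, 0)")
conn.execute("INSERT INTO product_dim VALUES (1, 'Axon 12oz', 'Axon', 'Snacks')")
conn.execute("INSERT INTO store_dim VALUES (1, 'Main St', '1 Main St', 'compact')")
conn.execute("INSERT INTO sales_fact VALUES (1, 1, 1, 780.0, 263, 500.0)")

# A fact row that points at a missing dimension row is rejected.
try:
    conn.execute("INSERT INTO sales_fact VALUES (1, 99, 1, 10.0, 1, 5.0)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
fact_count = conn.execute("SELECT COUNT(*) FROM sales_fact").fetchone()[0]
print(orphan_rejected, fact_count)
```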
Star Schema Architecture

• The previous example shows a STAR schema.
• The reason for this name is that the query takes on the shape of a star.
• The fact table is the body of the star and the dimension tables are the points of the star.
• In the star schema design, a single object (the fact table) sits in the middle and is radially connected to the surrounding tables (dimension tables), so the design looks like a star.


Star Schema Architecture

FACT TABLES
• The fact table is where the numerical measurements of the business are stored.
• Typically represents a business transaction, or an event that can be used in analyzing a business process
• Sparse
• Access control to sensitive information is maintained in fact tables
• These tables can be very large: as much as several billion rows.

Star Schema Architecture

Dimension Tables
• The dimension tables are where the textual descriptions of the dimensions of the business are stored.
• Dimension tables are designed especially for selection and grouping.
• There is no access control on these tables; all users can view this information.
• These tables are much smaller than the fact tables and may contain 10,000 rows of data.


Star Schema Architecture

• Dimension Tables
– Each dimension table has a single-part primary key that corresponds exactly to one of the components of the multipart key in the fact table.
– Dimension tables most often contain descriptive textual information
– Determine contextual background for facts
– Examples: Time, Location/Region, Customers


Star Schema Architecture

• The database consists of a single fact table and a single table for each dimension.
• Each tuple in the fact table contains a pointer (foreign key) to each of the dimension tables.
• Each dimension table consists of columns that correspond to attributes of the dimension.


Star Schema Architecture

• A key role for dimension table attributes is to serve as the source of constraints in a query or to serve as row headers in the user’s answer set.
• For example, a typical answer set returned from a query looks like this:

Brand | Dollar Sales | Unit Sales
Axon | 780 | 263
Framis | 1044 | 509
Widget | 213 | 444
Zapper | 95 | 39


Star Schema Architecture

• This query seeks to find all the product brands (collections of individual products) that were sold in the first quarter of 1995, and to present the total dollar sales as well as the number of units.
• Thus attributes of both the product and time dimensions are used: the product dimension provides the row headers (product brands) and the time dimension provides the constraint (first quarter of 1995).
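
A query of this kind could look like the sketch below, which uses Python's sqlite3 module against a cut-down star schema. The reduced table definitions and the individual fact rows are assumptions; they are chosen so that two of the brand totals match the answer set shown earlier.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE time_dim (time_key INTEGER PRIMARY KEY, quarter INTEGER, year INTEGER);
    CREATE TABLE product_dim (product_key INTEGER PRIMARY KEY, brand TEXT);
    CREATE TABLE sales_fact (time_key INTEGER, product_key INTEGER,
                             dollars_sold REAL, units_sold INTEGER);
    INSERT INTO time_dim VALUES (1, 1, 1995), (2, 2, 1995);
    INSERT INTO product_dim VALUES (10, 'Axon'), (11, 'Axon'), (20, 'Zapper');
    INSERT INTO sales_fact VALUES
        (1, 10, 500.0, 200), (1, 11, 280.0, 63), (1, 20, 95.0, 39), (2, 20, 40.0, 10);
    """
)

answer_set = conn.execute(
    """
    SELECT p.brand, SUM(f.dollars_sold) AS dollar_sales, SUM(f.units_sold) AS unit_sales
    FROM sales_fact AS f
    JOIN product_dim AS p ON p.product_key = f.product_key
    JOIN time_dim AS t ON t.time_key = f.time_key
    WHERE t.quarter = 1 AND t.year = 1995      -- constraint from the time dimension
    GROUP BY p.brand                           -- row headers from the product dimension
    ORDER BY p.brand
    """
).fetchall()
print(answer_set)
```

The second-quarter row for Zapper is filtered out by the time constraint, so it does not contribute to the totals.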



Components of a Star Schema

[Figure: Sales_Fact in the center, joined to five dimension tables]

Sales_Fact: TimeKey, EmployeeKey, ProductKey, CustomerKey, ShipperKey (the dimensional keys, which together form the multipart key); RequiredDate, ... (measures)
Employee_Dim: EmployeeKey, EmployeeID, ...
Time_Dim: TimeKey, TheDate, ...
Product_Dim: ProductKey, ProductID, ...
Shipper_Dim: ShipperKey, ShipperID, ...
Customer_Dim: CustomerKey, CustomerID, ...

Fact Table & Dimension Tables

• Fact Tables
– Numerical measurements of the business are stored in fact tables.
• Dimension Tables
– Dimensions are attributes about facts.


Dimension Hierarchies

• For each dimension, the set of associated attributes can be structured as a hierarchy

[Figure: hierarchies store → sType; store → city → region; customer → city → state → country]


Dimension Hierarchies

store
storeId | cityId | tId | mgr
s5 | sfo | t1 | joe
s7 | sfo | t2 | fred
s9 | la | t1 | nancy

sType
tId | size | location
t1 | small | downtown
t2 | large | suburbs

city
cityId | pop | regId
sfo | 1M | north
la | 5M | south

region
regId | name
north | cold region
south | warm region
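
Walking these hierarchies can be sketched in plain Python. The rows below are copied from the sample data on this slide; the dictionary layout and the helper function names are assumptions made for illustration.

```python
# Sample rows copied from the slide; table and column names follow the slide.
store = [
    {"storeId": "s5", "cityId": "sfo", "tId": "t1", "mgr": "joe"},
    {"storeId": "s7", "cityId": "sfo", "tId": "t2", "mgr": "fred"},
    {"storeId": "s9", "cityId": "la", "tId": "t1", "mgr": "nancy"},
]
city = {"sfo": {"pop": "1M", "regId": "north"}, "la": {"pop": "5M", "regId": "south"}}
region = {"north": "cold region", "south": "warm region"}
s_type = {
    "t1": {"size": "small", "location": "downtown"},
    "t2": {"size": "large", "location": "suburbs"},
}


def store_to_region(store_row):
    """Walk the store -> city -> region hierarchy for one store."""
    return city[store_row["cityId"]]["regId"]


def stores_by_region(stores):
    """Group store ids by the region at the top of the hierarchy."""
    grouped = {}
    for row in stores:
        grouped.setdefault(store_to_region(row), []).append(row["storeId"])
    return grouped


grouped = stores_by_region(store)
region_name = region[store_to_region(store[0])]
# The store -> sType hierarchy is independent of the geographic one.
small_stores = [s["storeId"] for s in store if s_type[s["tId"]]["size"] == "small"]
print(grouped, region_name, small_stores)
```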



Snowflake Schema

Snowflake schema: A refinement of star schema where the dimensional hierarchy is represented explicitly by normalizing the dimension tables


Snowflake Schema

• Dimension tables are normalized by decomposing at the attribute level
• Each dimension has one key for each level of the dimension’s hierarchy
• Good performance when queries involve aggregation
• Complicated maintenance and metadata, explosion in number of tables
• Makes user representation more complex and intricate


Snowflake schema - Example

[Figure: a fact table surrounded by dimension tables]


Using a Snowflake Schema

[Figure: Sales_Fact joined to a snowflaked product dimension]

Sales_Fact: TimeKey, EmployeeKey, ProductKey, CustomerKey, ShipperKey, RequiredDate, ...
Product_Dim: ProductKey, Product Name, Product Size, Product Brand ID
Product_Brand_ID: Product Brand, Product Category ID
Product_Category_ID: Product Category, Product Category ID
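
Querying a snowflaked dimension means following the chain of normalized tables. The sketch below, using Python's sqlite3 module, aggregates units by product category through product, brand and category tables. The table names, column names and sample rows are simplified assumptions rather than the exact layout in the figure.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE product_category (category_id INTEGER PRIMARY KEY, category TEXT);
    CREATE TABLE product_brand (brand_id INTEGER PRIMARY KEY, brand TEXT,
                                category_id INTEGER REFERENCES product_category (category_id));
    CREATE TABLE product_dim (product_key INTEGER PRIMARY KEY, product_name TEXT,
                              product_size TEXT,
                              brand_id INTEGER REFERENCES product_brand (brand_id));
    CREATE TABLE sales_fact (product_key INTEGER REFERENCES product_dim (product_key),
                             units_sold INTEGER);
    INSERT INTO product_category VALUES (1, 'Fruit'), (2, 'Dairy');
    INSERT INTO product_brand VALUES (1, 'Axon', 1), (2, 'Framis', 1), (3, 'Zapper', 2);
    INSERT INTO product_dim VALUES (100, 'Apple', 'small', 1), (101, 'Pear', 'large', 2),
                                   (102, 'Milk', 'large', 3);
    INSERT INTO sales_fact VALUES (100, 5), (101, 7), (102, 2), (100, 1);
    """
)

# Each level of the product hierarchy adds one join.
units_by_category = conn.execute(
    """
    SELECT c.category, SUM(f.units_sold)
    FROM sales_fact AS f
    JOIN product_dim AS p ON p.product_key = f.product_key
    JOIN product_brand AS b ON b.brand_id = p.brand_id
    JOIN product_category AS c ON c.category_id = b.category_id
    GROUP BY c.category
    ORDER BY c.category
    """
).fetchall()
print(units_by_category)
```

In a star schema the brand and category would be plain columns of the product dimension, and the same question would need only one join.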
Conformed Dimensions

• A conformed dimension means the same thing with every possible fact table that it can be joined with
• Conformed dimensions are most essential for
– the Bus Architecture
– the integrated function of the Data Warehouse
• Some common dimensions are:
– Customer
– Product
– Location
– Time


Surrogate Keys

• All tables (facts and dimensions) should use Data Warehouse generated surrogate keys rather than production keys
– Production keys sometimes get reused
– In case of mergers/acquisitions, surrogate keys protect you from different key formats
– Production systems may change to generalize key definitions
– Using surrogate keys will be faster
– Surrogate keys can handle Slowly Changing Dimensions well
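
A surrogate key generator can be as simple as a counter plus a lookup table, as in the Python sketch below. The class name, method name and the example source systems are hypothetical; a real warehouse would normally persist the mapping in a database table.

```python
import itertools


class SurrogateKeyGenerator:
    """Assign warehouse-owned integer keys to (source system, production key) pairs."""

    def __init__(self):
        self._next_key = itertools.count(1)
        self._lookup = {}

    def key_for(self, source_system, production_key):
        """Return the existing surrogate key, or allocate the next one."""
        natural = (source_system, production_key)
        if natural not in self._lookup:
            self._lookup[natural] = next(self._next_key)
        return self._lookup[natural]


keys = SurrogateKeyGenerator()
k1 = keys.key_for("billing", "C-001")
k2 = keys.key_for("acquired_co", 1)      # a different key format, e.g. after a merger
k3 = keys.key_for("billing", "C-001")    # same production key, same surrogate key
print(k1, k2, k3)
```

Because the warehouse key is independent of the production key format, text keys from one system and integer keys from another end up in the same compact integer key space.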



Slowly Changing Dimensions

Certain kinds of dimension attribute changes need to be handled differently in the Data Warehouse
• Type I - Overwrite
– e.g. Name correction, description changes
• Type II - Partition History
– e.g. Packing change, customer movement
– Create a new dimension record with a new surrogate key
• Type III - Organizational changes
– e.g. Sales force reorganization
– Show sales broken down by new and old organizations
– Need to create an old and a new field
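
The three change types can be contrasted in a small Python sketch operating on dictionary rows. The row layout, field names (including the `current` flag) and sample values are hypothetical and serve only to show how each type treats history.

```python
import itertools

_next_key = itertools.count(1)


def new_row(customer_id, name, city, region):
    """Build a dimension row with a fresh surrogate key."""
    return {"customer_key": next(_next_key), "customer_id": customer_id, "name": name,
            "city": city, "region": region, "old_region": None, "current": True}


def scd_type1(row, name):
    """Type I: overwrite in place; no history is kept."""
    row["name"] = name


def scd_type2(dimension, row, city):
    """Type II: expire the old row and add a new row with a new surrogate key."""
    row["current"] = False
    replacement = new_row(row["customer_id"], row["name"], city, row["region"])
    dimension.append(replacement)
    return replacement


def scd_type3(row, region):
    """Type III: keep the previous value in an 'old' field next to the new one."""
    row["old_region"] = row["region"]
    row["region"] = region


dimension = [new_row("C-001", "Jon Smith", "Denver", "West")]
scd_type1(dimension[0], "John Smith")                   # name correction
moved = scd_type2(dimension, dimension[0], "Atlanta")   # customer movement
scd_type3(moved, "South-East")                          # sales force reorganization
print(dimension)
```

After these calls the dimension holds two rows for the same customer: facts recorded before the move keep pointing at surrogate key 1, later facts at surrogate key 2.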



Factless Fact Tables

• For event tracking, e.g. attendance

[Figure: a factless attendance fact table containing only the keys Date_Key, Student_Key, Course_Key, Teacher_Key and Facility_Key, each joined to its dimension: Date, Student, Course, Teacher, Facility]
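
Since a factless fact table has no measure columns, analysis is done by counting rows. The sketch below shows this with Python's sqlite3 module; the `attendance_fact` table name, the integer key values and the sample rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Only dimension keys, no measures: each row records that an event happened.
    CREATE TABLE attendance_fact (
        date_key INTEGER, student_key INTEGER, course_key INTEGER,
        teacher_key INTEGER, facility_key INTEGER
    );
    INSERT INTO attendance_fact VALUES
        (20240108, 1, 10, 100, 7),
        (20240108, 2, 10, 100, 7),
        (20240109, 1, 11, 101, 8);
    """
)

# Counting rows answers questions such as "how many attendances per course?".
attendance_per_course = conn.execute(
    "SELECT course_key, COUNT(*) FROM attendance_fact "
    "GROUP BY course_key ORDER BY course_key"
).fetchall()
print(attendance_per_course)
```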



Examples of Data Modeling Tools

• ERWIN
– Supports Data Warehouse design as a modeling technique
• Powersoft WarehouseArchitect
– Module of Power Designer specifically for DW Modeling
• Oracle Designer
– Can be extended for Warehouse modeling
• Others like Infomodeler, Silverrun are also used



Questions
