
Introduction to Data Warehousing

Concepts
Topics Covered

– What is a data warehouse
– Definition of a data warehouse
– Why organizations use data warehousing
– OLTP vs. OLAP
– Dimensional Modeling
– Dimensions and Measures
– Types of data warehouses
– Data warehouse schemas and other basics

December 9, 2021
What is a data warehouse?

– A data warehouse is a database designed and optimized for querying and data analysis.

– A data warehouse holds a collection of historical data from the different operations of a
company.

Definition of a data warehouse

A data warehouse is a:

– Subject-oriented
– Integrated
– Non-volatile
– Time-variant
– Accessible

store of data obtained from a variety of sources and made available to end users in a form
that they can understand and use in a business context.

Definition of a data warehouse

Subject-oriented
– The data in the data warehouse is organized so that all the data elements relating to
the same real-world event or object are linked together.

Time-variant
– The changes to the data in the data warehouse are tracked and recorded so that
reports can be produced showing changes over time.

Non-volatile
– Data in the data warehouse is never overwritten or deleted; once committed, the data
is static, read-only, and retained for future reporting.

Integrated
– The data warehouse contains data from most or all of an organization's operational
systems and this data is made consistent.
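
The non-volatile and time-variant properties can be illustrated with a small sketch. It is only an illustration under stated assumptions: the in-memory list `warehouse_rows`, the helper names, and the sample customer are hypothetical and do not come from any particular product.

```python
from datetime import date

# Hypothetical append-only store: rows are only ever added, never updated or deleted.
warehouse_rows = []

def load_snapshot(customer_id, city, load_date):
    """Append a new version of the record instead of overwriting the old one."""
    warehouse_rows.append(
        {"customer_id": customer_id, "city": city, "load_date": load_date}
    )

def city_as_of(customer_id, as_of):
    """Return the city that was recorded for the customer on or before `as_of`."""
    versions = [
        row for row in warehouse_rows
        if row["customer_id"] == customer_id and row["load_date"] <= as_of
    ]
    if not versions:
        return None
    return max(versions, key=lambda row: row["load_date"])["city"]

# Two loads for the same (made-up) customer; the first version is retained.
load_snapshot(42, "Chicago", date(2020, 1, 1))
load_snapshot(42, "Toronto", date(2021, 6, 1))

print(city_as_of(42, date(2020, 12, 31)))
print(city_as_of(42, date(2021, 12, 31)))
```

Because older versions are kept, a report can be produced for any past date.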

Why do organizations use data warehousing?

A competitive business environment creates a need for complex analysis of an ever-increasing
volume of business data.
Hence data warehousing is used:
– to turn vast volumes of business data into meaningful management information
– to give users online access to this information

Organizations need information that is:

– Holistic in its coverage of the business
– Selected and enriched
– Easily accessible
– Easily understandable
– Of high quality
– Directly applicable to the decision situation

OLTP Vs. OLAP

There are two basic data processing models:

OLTP: Online Transaction Processing

The main aim of OLTP is reliable and efficient processing of a large number of
transactions and ensuring data consistency.

OLAP: Online Analytical Processing

The main aim of OLAP is efficient multidimensional processing of large data volumes.
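
The contrast between the two models can be sketched with Python's built-in sqlite3 module. This is a minimal illustration, not a benchmark; the `orders` table and its values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, region TEXT, amount INTEGER)"
)

# OLTP style: many short transactions, each touching one row by primary key.
# `with conn` commits on success and rolls back on error.
with conn:
    conn.execute("INSERT INTO orders VALUES (1, 'East', 100)")
with conn:
    conn.execute("INSERT INTO orders VALUES (2, 'West', 250)")
with conn:
    conn.execute("INSERT INTO orders VALUES (3, 'East', 50)")
with conn:
    conn.execute("UPDATE orders SET amount = 120 WHERE order_id = 1")

# OLAP style: one read-only query that scans many rows and summarizes them.
totals_by_region = dict(
    conn.execute("SELECT region, SUM(amount) FROM orders GROUP BY region")
)
print(totals_by_region)
```

The OLTP statements each change a single row and must stay consistent; the OLAP query reads the whole table and returns a summary.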

OLTP Vs. OLAP

                     OLTP                           OLAP

users                clerk, IT professional         knowledge worker
function             day-to-day operations          decision support
DB design            application-oriented           subject-oriented
data                 current, up-to-date,           historical, summarized,
                     detailed, flat relational,     multidimensional,
                     isolated                       integrated, consolidated
usage                repetitive                     ad hoc
access               read/write,                    lots of scans
                     index/hash on primary key
unit of work         short, simple transaction      complex query
# records accessed   tens                           millions
# users              thousands                      hundreds
DB size              100 MB to GB                   100 GB to TB
metric               transaction throughput         query throughput, response time

OLTP Vs. OLAP
                              OLTP               OLAP

Indexes                       Few                Many
Joins                         Many               Some
Duplicated data               Normalized DBMS    Denormalized DBMS
Derived data and aggregates   Rare               Common

Dimensional Modeling

Dimensional modeling is an approach to database design that differs from the normalized
design typical of OLTP systems.

Features of dimensional modeling are:

– Highly denormalized schema
– Data is contained in two types of tables: dimension tables and fact tables
– Dimension tables usually have a large number of columns and a smaller number of rows.
– Fact tables usually have a smaller number of columns and a large number of rows.

Dimensions (who, what, when, where)

– Dimensions are the context of measurements.
– A dimension is a subject area of a business against which the facts are measured.
– For example, a sales summary in a fact table can be viewed by the Region dimension (sales by
country, state, or city) or by the Time dimension (monthly or yearly sales).

For Example

Location Dimension - Table Schema

Field Name      Type

Dim_Id          Integer(4)
Loc_Code        Varchar(4)
Name            Varchar(50)
State_Name      Varchar(20)
Country_Name    Varchar(20)

For Example

Location Dimension - Table Data

Dim_Id   Loc_Code   Name           State_Name         Country_Name

1001     IL01       Chicago Loop   Illinois           USA
1002     IL02       Brooklyn       New York           USA
1003     MX01       Mexico City    Distrito Federal   Mexico
1004     TO01       Toronto        Ontario            Canada
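
As a rough sketch, the location dimension above could be created and queried with Python's built-in sqlite3 module. The table name `location_dim` is an assumption made for illustration; the columns and rows follow the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Table name `location_dim` is hypothetical; columns follow the schema slide.
conn.execute("""
    CREATE TABLE location_dim (
        Dim_Id       INTEGER PRIMARY KEY,
        Loc_Code     VARCHAR(4),
        Name         VARCHAR(50),
        State_Name   VARCHAR(20),
        Country_Name VARCHAR(20)
    )
""")

location_rows = [
    (1001, "IL01", "Chicago Loop", "Illinois", "USA"),
    (1002, "IL02", "Brooklyn", "New York", "USA"),
    (1003, "MX01", "Mexico City", "Distrito Federal", "Mexico"),
    (1004, "TO01", "Toronto", "Ontario", "Canada"),
]
conn.executemany("INSERT INTO location_dim VALUES (?, ?, ?, ?, ?)", location_rows)

# A dimension gives context: here, every location that belongs to one country.
usa_locations = [
    row[0]
    for row in conn.execute(
        "SELECT Name FROM location_dim WHERE Country_Name = ? ORDER BY Dim_Id",
        ("USA",),
    )
]
print(usa_locations)
```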

Measures (metrics and measurements)
Measures are summarized numeric data about the actual business process.

Features of measures:

– Measures are usually additive (like total sales). However, they can be semi-additive
(like balances) or non-additive (like unit price).
– Measures are aggregated (rolled up) on the basis of the dimensions.
– Facts are an overall summary of the measures related to a business area, i.e. fact
tables contain measures.
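
The difference between additive, semi-additive, and non-additive measures can be sketched as follows. The daily figures are invented for illustration, and taking the closing balance is only one common way to roll up a semi-additive measure over time.

```python
# Hypothetical daily rows for one account and one product.
daily = [
    {"day": 1, "sales": 100, "units": 10, "unit_price": 10, "balance": 500},
    {"day": 2, "sales": 60, "units": 5, "unit_price": 12, "balance": 560},
    {"day": 3, "sales": 40, "units": 4, "unit_price": 10, "balance": 600},
]

# Additive: summing sales across days gives a meaningful total.
total_sales = sum(row["sales"] for row in daily)

# Semi-additive: a balance must not be summed over time;
# one common choice is to take the value on the last day of the period.
closing_balance = max(daily, key=lambda row: row["day"])["balance"]

# Non-additive: unit prices are never summed; an average price is
# recomputed from the additive components (sales and units) instead.
average_unit_price = total_sales / sum(row["units"] for row in daily)

print(total_sales, closing_balance, round(average_unit_price, 2))
```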

For Example

Monthly Sales Fact - Table Schema

Field Name    Type

TM_Dim_Id     Integer(4)
PR_Dim_Id     Integer(4)
LOC_Dim_Id    Integer(4)
Sales         Integer(4)
Tax           Integer(4)

For Example

Monthly Sales Fact - Table Data

TM_Dim_Id   PR_Dim_Id   LOC_Dim_Id   Sales       Tax

1001        1001        1003         89513383    8900
1002        1002        1001         25468926    2512
1003        1001        1003         777215631   7796
1001        1004        1001         65894001    6574
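
To show how measures are rolled up along a dimension, the sketch below joins the fact rows above to the location dimension and totals Sales by country. The table names `monthly_sales_fact` and `location_dim` are assumptions; only the two locations referenced by the fact rows are loaded.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE location_dim (
        Dim_Id INTEGER PRIMARY KEY, Name TEXT, Country_Name TEXT
    );
    CREATE TABLE monthly_sales_fact (
        TM_Dim_Id INTEGER, PR_Dim_Id INTEGER, LOC_Dim_Id INTEGER,
        Sales INTEGER, Tax INTEGER
    );
""")

# Dimension rows referenced by the fact table (taken from the slides).
conn.executemany("INSERT INTO location_dim VALUES (?, ?, ?)", [
    (1001, "Chicago Loop", "USA"),
    (1003, "Mexico City", "Mexico"),
])

# Fact rows exactly as listed in the table data slide.
conn.executemany("INSERT INTO monthly_sales_fact VALUES (?, ?, ?, ?, ?)", [
    (1001, 1001, 1003, 89513383, 8900),
    (1002, 1002, 1001, 25468926, 2512),
    (1003, 1001, 1003, 777215631, 7796),
    (1001, 1004, 1001, 65894001, 6574),
])

# Roll the Sales measure up to the country level of the location dimension.
sales_by_country = dict(conn.execute("""
    SELECT d.Country_Name, SUM(f.Sales)
    FROM monthly_sales_fact AS f
    JOIN location_dim AS d ON d.Dim_Id = f.LOC_Dim_Id
    GROUP BY d.Country_Name
"""))
print(sales_by_country)
```

The fact table stores only keys and numbers; the descriptive context comes from the dimension at query time.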

Types of data warehouses…
Data warehouse without staging area

[Diagram: Data Sources (operational systems, flat files) feed the Data Warehouse (metadata
repository, summary data, raw data), which serves Users (analysis, reporting, data mining).]

Types of data warehouses…
Data warehouse with staging area

[Diagram: Data Sources (operational systems, flat files) feed a Staging Area, which loads the
Data Warehouse (metadata repository, summary data, raw data), which serves Users (analysis,
reporting, data mining).]

Types of data warehouses…
Data warehouse with staging area and data marts

[Diagram: Data Sources (operational systems, flat files) feed a Staging Area, which loads the
Data Warehouse (metadata repository, summary data, raw data); the warehouse feeds Data Marts
(purchasing, sales, inventory), which serve Users (analysis, reporting, data mining).]
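
The flow through a staging area, a warehouse, and data marts can be sketched in plain Python. Everything here is a simplified assumption for illustration: the sample rows, the cleaning rules, and the function names are hypothetical, and a real warehouse would use database tables rather than lists.

```python
# Hypothetical raw extracts from two sources: an operational system and a flat file.
operational_rows = [
    {"dept": "sales", "amount": "120"},
    {"dept": "purchasing", "amount": "80"},
]
flat_file_rows = [
    {"dept": "SALES ", "amount": "30"},
    {"dept": "inventory", "amount": "15"},
]

def stage(*sources):
    """Staging area: collect raw rows from all sources and clean them before loading."""
    staged = []
    for source in sources:
        for row in source:
            staged.append(
                {"dept": row["dept"].strip().lower(), "amount": int(row["amount"])}
            )
    return staged

def load_warehouse(staged_rows):
    """Warehouse: keep the cleaned detail rows plus summary data by department."""
    summary = {}
    for row in staged_rows:
        summary[row["dept"]] = summary.get(row["dept"], 0) + row["amount"]
    return {"raw": list(staged_rows), "summary": summary}

def build_data_mart(warehouse, dept):
    """Data mart: the subset of warehouse rows for one line of business."""
    return [row for row in warehouse["raw"] if row["dept"] == dept]

warehouse = load_warehouse(stage(operational_rows, flat_file_rows))
sales_mart = build_data_mart(warehouse, "sales")

print(warehouse["summary"])
print(sales_mart)
```

Without a staging area, the cleaning step would have to happen inside the warehouse itself; without data marts, every user would query the full warehouse.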

Data warehouse schemas and other basics

Three basic conceptual schemas are:

– Star schema: a single object (the fact table) in the middle, connected to a number of
dimension tables

– Snowflake schema: a refinement of the star schema in which the dimensional hierarchy is
represented explicitly by normalizing the dimension tables

– Fact constellation: multiple fact tables share dimension tables

Data warehouse schemas and other basics
Star Schema

[Diagram: the Sales Fact table at the center, linked to the Time, Customer, Product, and Store
dimensions.]

Data warehouse schemas and other basics
Star Schema

Sales Fact Table
– Keys: Date ID, Product ID, Store ID, Customer ID
– Measurements: Unit Sales, Dollar Sales

Date: Date ID, Month, Year
Product: Product ID, Prod Name, Prod Desc, Category, QOH
Store: Store ID, City, State, Country, Region
Customer: Customer ID, Cust Name, Cust City, Cust Country
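
A star schema along these lines might be sketched with sqlite3 as shown below. The table names, the shortened column lists, and the sample rows are assumptions for illustration; the Customer dimension is omitted to keep the example short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension tables: one denormalized table per dimension.
    CREATE TABLE date_dim    (date_id INTEGER PRIMARY KEY, month TEXT, year INTEGER);
    CREATE TABLE product_dim (product_id INTEGER PRIMARY KEY, prod_name TEXT, category TEXT);
    CREATE TABLE store_dim   (store_id INTEGER PRIMARY KEY, city TEXT, country TEXT);

    -- Fact table: foreign keys to each dimension plus the measurements.
    CREATE TABLE sales_fact (
        date_id      INTEGER REFERENCES date_dim(date_id),
        product_id   INTEGER REFERENCES product_dim(product_id),
        store_id     INTEGER REFERENCES store_dim(store_id),
        unit_sales   INTEGER,
        dollar_sales INTEGER
    );

    -- Made-up sample data.
    INSERT INTO date_dim VALUES (1, 'Jan', 2021), (2, 'Feb', 2021);
    INSERT INTO product_dim VALUES (10, 'Widget', 'Tools'), (11, 'Gadget', 'Toys');
    INSERT INTO store_dim VALUES (100, 'Chicago', 'USA'), (101, 'Toronto', 'Canada');
    INSERT INTO sales_fact VALUES
        (1, 10, 100, 5, 50), (1, 11, 101, 2, 40),
        (2, 10, 100, 3, 30), (2, 10, 101, 1, 10);
""")

# In a star schema every dimension is exactly one join away from the fact table.
sales_by_month_and_category = conn.execute("""
    SELECT d.month, p.category, SUM(f.dollar_sales)
    FROM sales_fact AS f
    JOIN date_dim AS d ON d.date_id = f.date_id
    JOIN product_dim AS p ON p.product_id = f.product_id
    GROUP BY d.month, p.category
    ORDER BY d.month, p.category
""").fetchall()
print(sales_by_month_and_category)
```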

Data warehouse schemas and other basics
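Snowflake Schema

[Diagram: the Sales Fact table at the center, linked to the Time, Customer, Product, and Store
dimensions. Hierarchies are split into separate tables: Time, Quarter, Year; Product, Sub Cat,
Category; Store, City, State.]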

Data warehouse schemas and other basics
Snowflake Schema

Sales Fact Table
– Keys: Date ID, Product ID, Store ID, Customer ID
– Measurements: Unit Sales, Dollar Sales

Date: Date ID, Date, Month ID
Month: Month ID, Month, Year ID
Year: Year ID, Year

Product: Product ID, Product desc, Sub cat ID
Sub Cat: Sub cat ID, Sub cat, Cat ID
Category: Cat ID, Category

Store: Store ID, Store, City ID
City: City ID, City, State ID
State: State ID, State, Country ID
Country: Country ID, Country

Customer: Customer ID, Cust Name, Cust City, Cust Country
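
The effect of normalizing a dimension can be sketched for the product hierarchy alone (Product, Sub Cat, Category). Table names and sample rows are hypothetical; the point is that reaching the category level now takes a chain of joins instead of one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- The product hierarchy is normalized into three linked tables.
    CREATE TABLE category (cat_id INTEGER PRIMARY KEY, category TEXT);
    CREATE TABLE sub_cat  (sub_cat_id INTEGER PRIMARY KEY, sub_cat TEXT,
                           cat_id INTEGER REFERENCES category(cat_id));
    CREATE TABLE product  (product_id INTEGER PRIMARY KEY, product_desc TEXT,
                           sub_cat_id INTEGER REFERENCES sub_cat(sub_cat_id));
    CREATE TABLE sales_fact (product_id INTEGER REFERENCES product(product_id),
                             unit_sales INTEGER, dollar_sales INTEGER);

    -- Made-up sample data.
    INSERT INTO category VALUES (1, 'Food'), (2, 'Drink');
    INSERT INTO sub_cat VALUES (10, 'Snacks', 1), (11, 'Bakery', 1), (20, 'Juice', 2);
    INSERT INTO product VALUES (100, 'Chips', 10), (101, 'Bread', 11), (200, 'Apple juice', 20);
    INSERT INTO sales_fact VALUES (100, 4, 8), (101, 2, 6), (200, 3, 9), (100, 1, 2);
""")

# Rolling up to category walks the hierarchy: fact -> product -> sub_cat -> category.
sales_by_category = dict(conn.execute("""
    SELECT c.category, SUM(f.dollar_sales)
    FROM sales_fact AS f
    JOIN product AS p ON p.product_id = f.product_id
    JOIN sub_cat AS s ON s.sub_cat_id = p.sub_cat_id
    JOIN category AS c ON c.cat_id = s.cat_id
    GROUP BY c.category
"""))
print(sales_by_category)
```

Compared with a star schema, each category name is stored once rather than repeated on every product row, at the cost of extra joins at query time.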

Data warehouse schemas and other basics
Fact Constellation

[Diagram: two fact tables, Sales Fact and Forecast Fact, sharing dimension tables: Time, Store,
Product, and Customer.]

Data warehouse schemas and other basics
Fact Constellation

Sales Fact Table
– Keys: Date ID, Product ID, Store ID, Customer ID
– Measurements: Unit Sales, Dollar Sales

Forecast Fact Table
– Keys: Date ID, Month ID, Product ID, Customer ID
– Measurements: Fcst_Weight_net, Fcst_Turnover

Date: Date ID, Month, Year
Product: Product ID, Prod Name, Prod Desc, Category, QOH
Store: Store ID, City, State, Country, Region
Customer: Customer ID, Cust Name, Cust City, Cust Country
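
A fact constellation lets two fact tables be compared through a dimension they share. The sketch below uses a single shared product dimension with hypothetical table names and sample rows; each fact table is aggregated on its own before the results are combined, which avoids double counting when the tables have different numbers of rows per product.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- One shared dimension and two fact tables that both reference it.
    CREATE TABLE product_dim   (product_id INTEGER PRIMARY KEY, prod_name TEXT);
    CREATE TABLE sales_fact    (product_id INTEGER, dollar_sales INTEGER);
    CREATE TABLE forecast_fact (product_id INTEGER, fcst_turnover INTEGER);

    -- Made-up sample data.
    INSERT INTO product_dim VALUES (1, 'Widget'), (2, 'Gadget');
    INSERT INTO sales_fact VALUES (1, 70), (1, 50), (2, 30);
    INSERT INTO forecast_fact VALUES (1, 100), (2, 40);
""")

# Aggregate each fact table separately, then line the totals up by product.
actual_vs_forecast = conn.execute("""
    SELECT p.prod_name, s.actual, f.forecast
    FROM product_dim AS p
    JOIN (SELECT product_id, SUM(dollar_sales) AS actual
          FROM sales_fact GROUP BY product_id) AS s
      ON s.product_id = p.product_id
    JOIN (SELECT product_id, SUM(fcst_turnover) AS forecast
          FROM forecast_fact GROUP BY product_id) AS f
      ON f.product_id = p.product_id
    ORDER BY p.product_id
""").fetchall()
print(actual_vs_forecast)
```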

Thank You
