DW Training
What is ETL?
[Figure: ETL flow. Operational systems → Filters and Extractors → Cleanser (cleaning rules: Rule 1, Rule 2, Rule 3) → Transformation Engine (transformation rules: Rule 1, Rule 2, Rule 3) → Integrator → Loader → Warehouse. Errors found during cleansing and integration go to an error view, where they are checked and corrected.]
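The flow in the figure can be sketched as a chain of small functions. This is a minimal, in-memory sketch under stated assumptions: the record fields (`emp_id`, `first_name`, `last_name`), the single cleansing rule, and all function names are hypothetical, and a Python list stands in for the warehouse.

```python
from dataclasses import dataclass

@dataclass
class Record:
    # Hypothetical target record; field names are illustrative assumptions.
    emp_id: int
    first_name: str
    last_name: str

def extract(source_rows):
    """Filter/extract: keep only source rows that carry an employee id."""
    return [r for r in source_rows if r.get("emp_id") is not None]

def clean(rows):
    """Cleanser: trim names; send rows with no last name to an error list."""
    good, errors = [], []
    for r in rows:
        last = (r.get("last_name") or "").strip()
        if not last:
            errors.append(r)  # stands in for the "error view" to check and correct
            continue
        first = (r.get("first_name") or "").strip()
        good.append({**r, "first_name": first, "last_name": last})
    return good, errors

def transform(rows):
    """Transformation engine: map cleaned dicts onto the target record type."""
    return [Record(int(r["emp_id"]), r["first_name"], r["last_name"]) for r in rows]

def integrate(*batches):
    """Integrator: merge batches from several sources, one record per emp_id."""
    merged = {}
    for batch in batches:
        for rec in batch:
            merged.setdefault(rec.emp_id, rec)  # first source wins on conflicts
    return list(merged.values())

def load(records, warehouse):
    """Loader: append records to an in-memory list standing in for the warehouse."""
    warehouse.extend(records)
    return len(records)
```

Keeping each stage a separate function mirrors the figure: rejected rows leave the pipeline at the cleanser instead of reaching the loader.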
Operational Data - Challenges
• Context
Extraction
[Figure: Filters extract data from several source systems (Oracle, Sybase, and flat files): data from 80 tables, 30 tables, and 50 tables, and, through one filter, data from 10 tables where Date < 10/12/99.]
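A filter of this kind can be expressed as a table selection plus a row predicate. A minimal sketch, assuming each source table is a list of dicts with a hypothetical `date` column; the table names and the cutoff date used in the test are illustrative, not taken from the systems on the slide.

```python
from datetime import date

def before(cutoff: date):
    """Row filter equivalent to 'WHERE Date < cutoff' (column name 'date' is hypothetical)."""
    return lambda row: row["date"] < cutoff

def extract(tables, wanted, row_filter=lambda row: True):
    """Pull only the wanted tables, keeping the rows that pass the filter."""
    return {name: [row for row in tables[name] if row_filter(row)] for name in wanted}
```

Passing the predicate in keeps the extractor generic: the same function can pull whole tables or only a date-bounded slice.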
Source record (Emp id, Last Name, First Name): 10001, Jones, Indiana
Rule: Name = Concat(First Name, Last Name)
Target (Name): Indiana Jones; Sherlock Homes
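The Concat rule above can be sketched as a small mapping function. This sketch assumes source rows are dicts with hypothetical keys `emp_id`, `first_name`, and `last_name`; it joins the parts with a space so the result matches the target value "Indiana Jones" (a literal concatenation would give "IndianaJones"). Carrying `emp_id` into the target row is also an assumption.

```python
def full_name(first_name, last_name):
    """Merge first and last name into one field, skipping empty parts."""
    parts = (p.strip() for p in (first_name, last_name) if p and p.strip())
    return " ".join(parts)

def transform_employee(src):
    """Map a source row with separate name columns to a target row with one Name."""
    # Hypothetical keys; keeping emp_id in the target is an assumption.
    return {"emp_id": src["emp_id"], "name": full_name(src["first_name"], src["last_name"])}
```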
Loading
[Figure: Data moves from the source to the warehouse either by a direct load or through a staging area, where raw data is cleaned, transformed, and integrated before the clean, transformed, and integrated data is loaded.]
Volume of ETL in a Data warehouse
[Figure: Source OLTP systems → Enterprise Data Warehouse → Data Marts, with Metadata and System Monitoring spanning every stage. Annotation: "60 to 80% of the work is here."]
• Design, Mapping
• Extract, Scrub, Transform
• Load, Index, Aggregation
• Replication, Data Set Distribution
• Access & Analysis, Resource Scheduling & Distribution
Factors Influencing ETL Architecture
Extraction Types
Extraction
• Full Extract
• Periodic/Incremental Extract
Full Extract
[Figure: A full extract copies the complete data set from the source system into the data mart, which currently holds existing data.]
[Figure, continued: after the full extract, the data mart holds the new data in place of the existing data.]
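A full extract can be sketched as a truncate-and-reload of the target. Minimal sketch, assuming the data mart is an in-memory list of dict rows; the function name is hypothetical. In a database, the same step would usually run inside one transaction so that readers never see an empty mart.

```python
def full_extract_load(source_rows, data_mart):
    """Full extract: discard the mart's existing rows and reload a complete copy of the source."""
    data_mart.clear()                                   # existing data is dropped
    data_mart.extend(dict(row) for row in source_rows)  # new data replaces it
    return len(data_mart)
```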
Incremental Extract
[Figure: An incremental extract moves only incremental data from the source system to the data mart, which already holds existing data.]
[Figure, continued: the incremental data carries both new data and changed data from the source system to the data mart.]
[Figure, continued: the changed data from the incremental extract is used to update the existing data in the data mart.]
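An incremental extract needs some way to tell which rows changed. A minimal sketch, assuming each source row carries a hypothetical `updated_at` timestamp (a watermark compared against the time of the previous extract) and a hypothetical `id` key; changed rows overwrite their existing counterparts and unseen keys are appended. Timestamp comparison is only one change-detection technique, and the slides do not say which one they use.

```python
from datetime import datetime

def incremental_extract(source_rows, last_extracted_at: datetime):
    """Pick only rows changed since the previous extract (watermark on 'updated_at')."""
    return [r for r in source_rows if r["updated_at"] > last_extracted_at]

def apply_changes(data_mart, changed_rows, key="id"):
    """Update existing rows in place and append new ones (an upsert keyed on `key`)."""
    index = {row[key]: i for i, row in enumerate(data_mart)}
    for row in changed_rows:
        if row[key] in index:
            data_mart[index[row[key]]] = dict(row)  # changed data updates existing data
        else:
            index[row[key]] = len(data_mart)
            data_mart.append(dict(row))             # new data is added
    return data_mart
```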
Transformation
Data Transformation
• Conversions
• Classifications
• Splitting of fields
• Merging of fields
Structural Transformations
• Additive
[Figure: Two examples of moving OLTP data into the data warehouse. Orders arrive every two minutes and are aggregated before loading; daily productivity figures are averaged before loading.]
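Both examples can be sketched in a few lines. Minimal sketch, assuming orders arrive as hypothetical `(timestamp, amount)` pairs: amounts are summed per calendar day, which is an additive aggregation because daily totals can themselves be summed into monthly totals. An average is not additive in that sense, so the productivity averaging is kept as a separate function.

```python
from collections import defaultdict
from datetime import date, datetime

def daily_order_totals(orders: list[tuple[datetime, float]]) -> dict[date, float]:
    """Additive aggregation: sum the order amounts that fall on each calendar day."""
    totals: dict[date, float] = defaultdict(float)
    for ts, amount in orders:
        totals[ts.date()] += amount
    return dict(totals)

def average(figures: list[float]) -> float:
    """Collapse one day's productivity figures into a single average."""
    if not figures:
        raise ValueError("cannot average an empty list of figures")
    return sum(figures) / len(figures)
```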
Format transformation
• Data type conversion: the string “32” in the source schema becomes the number 32 in the target schema
• Splitting transformation: the date string “15-10-1992” in the source schema is split into 15, 10, and 1992 in the target schema
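Both format transformations can be sketched as small functions. Minimal sketch, assuming the source date strings always use day-month-year order with `-` separators, as in “15-10-1992”; the function names are hypothetical.

```python
from datetime import date

def to_int(text):
    """Data type conversion: the source string "32" becomes the integer 32."""
    return int(text.strip())

def split_date(text):
    """Splitting: a 'DD-MM-YYYY' string becomes separate day, month, and year fields."""
    day, month, year = (int(part) for part in text.split("-"))
    date(year, month, day)  # validation only: raises ValueError for impossible dates
    return day, month, year
```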
Simple Conversions
• Revenue in Rupees (source schema) → Revenue in Dollars (target schema): multiply by 1/43, so Rs. 10000 becomes $232.56
• Production in Pounds (source schema) → Production in Kilograms (target schema): multiply by 0.4536, so 1000 lbs. becomes 453.6 kg
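The two conversions reduce to multiplying by a constant and rounding. Minimal sketch, assuming the factors shown on the slide (43 rupees per dollar and 0.4536 kg per pound) and rounding to two decimal places; the constant and function names are hypothetical, and in practice an exchange rate would likely come from a dated rate table rather than a fixed constant.

```python
RUPEES_PER_DOLLAR = 43   # factor from the slide; real exchange rates change over time
KG_PER_POUND = 0.4536    # factor from the slide

def rupees_to_dollars(rupees):
    """Revenue in Rupees -> Revenue in Dollars (multiply by 1/43), rounded to cents."""
    return round(rupees / RUPEES_PER_DOLLAR, 2)

def pounds_to_kg(pounds):
    """Production in Pounds -> Production in Kilograms, rounded to two decimals."""
    return round(pounds * KG_PER_POUND, 2)
```

For currency, `decimal.Decimal` avoids binary floating-point rounding surprises; floats are used here only to keep the sketch short.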
Classification
Name              Age
John Black         27
Richard Wayne      53
Jennifer Goldman   45
Helmut Koch        37
Anna Ludwig        32
Shito Maketha      28
Tracy Withman      39
Ada Zhesky         25
David Rosenberg    33
Pankaj Sharma      29
Zhu Ling           44
George Kurtz       27
Rita Hartman       34

Grouping by age:
Age Group   Frequency
20-25       1
26-30       4
31-35       3
36-40       2
41-45       2
46-50       0
51-55       1
56-60       0
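The grouping can be sketched as a lookup into fixed age bands followed by a count. Minimal sketch, assuming the bands shown above (the first band, 20-25, spans six ages; the others span five); the ages are copied from the Name/Age table, and the names `BANDS`, `band_label`, and `frequency_table` are hypothetical.

```python
from collections import Counter

# Age bands as listed on the slide; the first band is one year wider than the rest.
BANDS = [(20, 25), (26, 30), (31, 35), (36, 40), (41, 45), (46, 50), (51, 55), (56, 60)]

def band_label(age):
    """Classify one age into its band label, e.g. 27 -> '26-30'."""
    for low, high in BANDS:
        if low <= age <= high:
            return f"{low}-{high}"
    raise ValueError(f"age {age} is outside the defined bands")

def frequency_table(ages):
    """Count how many ages fall into each band, listing empty bands as 0."""
    counts = Counter(band_label(a) for a in ages)
    return {f"{lo}-{hi}": counts.get(f"{lo}-{hi}", 0) for lo, hi in BANDS}

# Ages from the Name/Age table, in the same order as the names.
AGES = [27, 53, 45, 37, 32, 28, 39, 25, 33, 29, 44, 27, 34]
```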
Data Consistency Transformations
Target gender codes:
• Male → M
• Female → F
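A consistency transformation of this kind is commonly a lookup table from every known source encoding to one target code. Minimal sketch: the source encodings in `GENDER_MAP` are hypothetical, since the slide shows only the target codes, and unknown values raise an error so they can be routed for correction rather than silently guessed.

```python
# Hypothetical source encodings: the slide lists only the target codes M and F.
GENDER_MAP = {
    "m": "M", "male": "M", "1": "M",
    "f": "F", "female": "F", "2": "F",
}

def standardize_gender(value):
    """Map any known source encoding to the target code M or F."""
    key = str(value).strip().lower()
    if key not in GENDER_MAP:
        # Unknown codes are rejected so they can be checked and corrected upstream.
        raise ValueError(f"unrecognized gender code: {value!r}")
    return GENDER_MAP[key]
```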
Reconciliation of Duplicated data
Joseph R Smith
123 Maine St.
MA - 70127
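Duplicate detection can be sketched as building a normalized match key for each record and keeping one record per key. Minimal sketch, assuming records are dicts with hypothetical `name`, `street`, and `region` fields; the normalization rules (lowercasing, dropping punctuation, treating "Street" as "St") are illustrative. This exact-key approach only catches formatting differences; dedicated matching tools generally add fuzzy comparison as well.

```python
import re

def _norm(text):
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return " ".join(text.split())

def match_key(rec):
    """Normalized (name, street, region) tuple; 'street' is treated as 'st'."""
    street = re.sub(r"\bstreet\b", "st", _norm(rec["street"]))
    return (_norm(rec["name"]), street, _norm(rec["region"]))

def reconcile(records):
    """Keep the first record seen for each match key and drop later duplicates."""
    survivors = {}
    for rec in records:
        survivors.setdefault(match_key(rec), rec)
    return list(survivors.values())
```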
Data Aggregation - Design Requirements
• Aggregates must be stored in their own fact tables, and each aggregation level should have its own fact table
• The base fact table and all of its related aggregate fact tables must be associated together as a family of schemas
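The two requirements can be sketched with SQLite: one base fact table plus a separate aggregate fact table per aggregation level, all rebuilt from the base table in a single transaction. Minimal sketch; the table and column names are hypothetical, and dates are stored as ISO `YYYY-MM-DD` text so that `substr` can derive the month and year.

```python
import sqlite3

def create_schema(conn):
    """Base fact table plus one aggregate fact table per aggregation level."""
    conn.executescript("""
        CREATE TABLE sales_fact         (sale_date TEXT, product_id INTEGER, amount REAL);
        CREATE TABLE sales_fact_monthly (month TEXT, product_id INTEGER, amount REAL);
        CREATE TABLE sales_fact_yearly  (year TEXT, product_id INTEGER, amount REAL);
    """)

def refresh_aggregates(conn):
    """Rebuild every aggregate fact table from the base fact table in one transaction."""
    with conn:  # commits on success, rolls back on error, so the family stays consistent
        # Fixed (table, prefix length) pairs: 7 chars -> 'YYYY-MM', 4 chars -> 'YYYY'.
        for table, width in (("sales_fact_monthly", 7), ("sales_fact_yearly", 4)):
            conn.execute(f"DELETE FROM {table}")
            conn.execute(
                f"INSERT INTO {table} "
                f"SELECT substr(sale_date, 1, {width}), product_id, SUM(amount) "
                f"FROM sales_fact GROUP BY substr(sale_date, 1, {width}), product_id"
            )
```

Because every aggregate table shares the `product_id` key with the base table, queries can switch between levels of the family without changing their joins.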
Loading
Types of Data Warehouse Loading
Types of Data Warehouse Updates
New Data and Point-in-Time Data Insert
[Figure: Source data, either new data or a point-in-time snapshot (e.g., monthly), is added to the existing data.]
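The two insert styles differ in what gets appended. Minimal sketch, assuming the warehouse is an in-memory list of dict rows: a new-data insert appends only rows that did not exist before, while a point-in-time insert appends a full copy of the source stamped with a hypothetical `snapshot_date` column, so earlier snapshots stay queryable.

```python
from datetime import date

def insert_new_data(warehouse, new_rows):
    """New-data insert: append rows that were not in the warehouse before."""
    warehouse.extend(dict(row) for row in new_rows)

def insert_snapshot(warehouse, source_rows, as_of: date):
    """Point-in-time insert: append a full copy of the source stamped with its snapshot date."""
    warehouse.extend({**row, "snapshot_date": as_of} for row in source_rows)
```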
Changed Data Insert
[Figure: Changed data from the source is added to the existing data.]
Change of Dimension values
ETL - Approach in a nutshell