Dwi - Lecture - 9 - Etl
Data Warehouses
Organization
[Figure: Source, Staging area, and Data access area, with Extract, Load, and Access steps]
• Loading refers to the process of depositing the information into a data storage system.
ETL
• ETL:
Data extraction
get data from multiple, heterogeneous, and external sources
Data cleaning
detect errors in the data and rectify them when possible
Data transformation
convert data from legacy or host format to warehouse format
Load
sort, summarize, consolidate, compute views, check integrity, and
• Change monitoring is
directly connected with
• Stream processing
Before the ETL
Have a plan
The ETL process must be
figured out logically and
Extract
• Batch
Transformation
Data Flow
• Major Tasks
Data cleansing
Fill in missing values, smooth noisy data, identify or remove outliers,
and resolve inconsistencies
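The cleansing tasks listed above can be illustrated with a minimal sketch that uses only the Python standard library. The function names, the median fill, the interquartile-range rule, and the `COUNTRY_ALIASES` mapping are all assumptions chosen for illustration; the slides do not prescribe a particular technique, and smoothing of noisy data is not shown.

```python
from statistics import median, quantiles

# Hypothetical mapping of inconsistent spellings to one canonical code.
COUNTRY_ALIASES = {"usa": "US", "u.s.": "US", "united states": "US"}


def fill_missing(values):
    """Replace None with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]


def remove_outliers(values, k=1.5):
    """Keep only values inside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if low <= v <= high]


def normalize_country(raw):
    """Resolve inconsistent spellings; unknown values are only upper-cased."""
    key = raw.strip().lower()
    return COUNTRY_ALIASES.get(key, raw.strip().upper())


amounts = [10.0, 12.0, None, 11.0, 500.0, 9.0]
filled = fill_missing(amounts)      # None becomes the median, 11.0
cleaned = remove_outliers(filled)   # 500.0 falls outside the IQR fence
```

Whether an outlier should be dropped, capped, or merely flagged depends on the warehouse's rules; dropping is used here only because it is the shortest to show.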
Clean
• Quality Screens
a set of quality screens that act as diagnostic filters in the
data flow pipelines – each quality screen is a test:
If the test against the data is successful, nothing happens and the
screen has no side effects.
But if the test fails, then it must drop an error event row into the
error event schema and choose to either halt the process, send the
offending data into suspension, or merely tag the data.
three categories of data quality screens:
column screens
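The behaviour of a quality screen described above might be sketched as follows. This is an illustrative outline, not the implementation from any particular ETL tool: the `Screen` class, the `Action` enum, the `run_screens` function, and the layout of the error event rows are hypothetical. Both example screens are simple column-level checks.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class Action(Enum):
    HALT = "halt"        # stop the whole process
    SUSPEND = "suspend"  # set the offending row aside
    TAG = "tag"          # let the row through, but mark it


class ScreenHalt(Exception):
    """Raised when a failing screen asks to stop the whole load."""


@dataclass
class Screen:
    name: str
    test: Callable[[dict], bool]  # True means the row passes
    on_fail: Action


def run_screens(rows, screens):
    passed, suspended, error_events = [], [], []
    for row_id, original in enumerate(rows):
        row = dict(original)
        keep = True
        for screen in screens:
            if screen.test(row):
                continue  # success: the screen has no side effects
            # Failure: always record an error event row first.
            error_events.append(
                {"row_id": row_id, "screen": screen.name,
                 "action": screen.on_fail.value}
            )
            if screen.on_fail is Action.HALT:
                raise ScreenHalt(f"{screen.name} failed on row {row_id}")
            if screen.on_fail is Action.SUSPEND:
                keep = False
                break
            row.setdefault("quality_tags", []).append(screen.name)
        (passed if keep else suspended).append(row)
    return passed, suspended, error_events


screens = [
    Screen("qty_not_null", lambda r: r["qty"] is not None, Action.SUSPEND),
    Screen("price_non_negative", lambda r: r["price"] >= 0, Action.TAG),
]
rows = [
    {"qty": 5, "price": 2.0},
    {"qty": None, "price": 2.0},
    {"qty": 3, "price": -1.0},
]
passed, suspended, error_events = run_screens(rows, screens)
```

In this run the first row passes untouched, the second is suspended, and the third is delivered with a tag, so two error event rows are recorded.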
• Audit dimension
a special dimension that is assembled in the back room by
the ETL system for each fact table
contains the metadata context at the moment when a specific
fact table row is created
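One way to picture the audit dimension is as a small metadata row created once per load and referenced by every fact row written in that load. The sketch below assumes hypothetical column names (`audit_key`, `load_timestamp`, `etl_version`, `source_system`, `quality_flag`); the actual contents of an audit dimension depend on the warehouse design.

```python
from datetime import datetime, timezone
from itertools import count

# Hypothetical in-memory key generator; a real system would use a
# database sequence or similar.
_audit_keys = count(1)


def make_audit_row(etl_version, source_system, quality_flag):
    """Capture the metadata context at the moment of the load."""
    return {
        "audit_key": next(_audit_keys),
        "load_timestamp": datetime.now(timezone.utc).isoformat(),
        "etl_version": etl_version,
        "source_system": source_system,
        "quality_flag": quality_flag,
    }


def attach_audit(fact_rows, audit_row):
    """Give every fact row a foreign key to the audit dimension row."""
    return [{**fact, "audit_key": audit_row["audit_key"]} for fact in fact_rows]


audit = make_audit_row("1.4.2", "orders_db", "passed_all_screens")
facts = attach_audit(
    [{"order_id": 1, "amount": 20.0}, {"order_id": 2, "amount": 35.5}],
    audit,
)
```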
Data Flow
• Structural conflicts
when the same concepts are modelled using different
structures.
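As a small illustration of a structural conflict, suppose one source stores a customer name as a single string while another stores it as two fields. The source layouts and function names below are invented for the example; the point is only that both structures are mapped to one target structure during transformation.

```python
def from_source_a(record):
    """Source A (hypothetical): one 'full_name' field."""
    # Naive split on the first space; real names need more careful rules.
    first, _, last = record["full_name"].strip().partition(" ")
    return {"first_name": first, "last_name": last}


def from_source_b(record):
    """Source B (hypothetical): separate 'first' and 'last' fields."""
    return {
        "first_name": record["first"].strip(),
        "last_name": record["last"].strip(),
    }


a = from_source_a({"full_name": "Ada Lovelace"})
b = from_source_b({"first": "Ada ", "last": "Lovelace"})
```

After conversion both records have the same structure and can be compared or merged.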
Data delivery from the ETL system includes:
Loading flat and snowflaked dimensions (Loading subdimensions)
Generating time dimensions
Conforming dimensions and conforming facts
Loading text facts in dimensions
Running the surrogate key pipeline for fact tables
Loading fundamental fact table grains
Loading and updating aggregations
Staging the delivered data to disk
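Of the delivery steps listed above, the surrogate key pipeline lends itself to a short sketch: each natural key in an incoming fact row is swapped for the surrogate key of the matching dimension row. The names `build_lookup`, `apply_surrogate_keys`, and `UNKNOWN_KEY`, as well as the choice to map unmatched keys to a reserved "unknown" member, are assumptions made for illustration.

```python
# Assumed convention: key 0 is reserved for an "unknown" dimension member.
UNKNOWN_KEY = 0


def build_lookup(dimension_rows, natural_key, surrogate_key):
    """Map each natural key to its surrogate key."""
    return {row[natural_key]: row[surrogate_key] for row in dimension_rows}


def apply_surrogate_keys(fact_rows, lookups):
    """Replace natural key columns with surrogate key columns.

    lookups maps a natural key column name to a pair
    (surrogate column name, lookup dictionary).
    """
    delivered = []
    for fact in fact_rows:
        row = dict(fact)
        for natural_col, (surrogate_col, lookup) in lookups.items():
            natural_value = row.pop(natural_col)
            row[surrogate_col] = lookup.get(natural_value, UNKNOWN_KEY)
        delivered.append(row)
    return delivered


product_dim = [
    {"product_key": 1, "sku": "A-100"},
    {"product_key": 2, "sku": "B-200"},
]
lookups = {"sku": ("product_key", build_lookup(product_dim, "sku", "product_key"))}
incoming = [{"sku": "B-200", "qty": 3}, {"sku": "Z-999", "qty": 1}]
fact_rows = apply_surrogate_keys(incoming, lookups)
```

Routing unmatched keys to an "unknown" member keeps the load running; rejecting or suspending such rows is an equally valid design choice.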
• Inmon W., Building the Data Warehouse, John Wiley & Sons, New York 2002
• https://www.youtube.com/watch?v=PXjSoMdFoJg