Major Steps in ETL



Name: Ashay Subhash Chaudhari
Div: TY-CO-A, Roll no: 05
Guided by: Prof. D. R. Patil
What is ETL?
Definition: ETL (Extract, Transform, Load) is the data
warehousing process of pulling data out of source systems
and putting it into a data warehouse
Extraction of data from source systems
Source systems can be RDBMSs or files
Data is extracted from these source systems
The main objective of this step is to retrieve all
required data from the source systems
The extraction step should be designed so that it
does not negatively affect the source systems
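The extraction step can be sketched as follows. This is only an illustration under stated assumptions: the in-memory SQLite database, the CSV text, the `customers` table, its columns and the helper names `extract_from_rdbms` and `extract_from_file` are all hypothetical and do not come from the slides.

```python
import csv
import io
import sqlite3


def extract_from_rdbms(conn, query):
    """Run a read-only query and return the rows as dictionaries."""
    conn.row_factory = sqlite3.Row
    cursor = conn.execute(query)
    return [dict(row) for row in cursor]


def extract_from_file(handle):
    """Read delimited text (CSV) and return the rows as dictionaries."""
    return list(csv.DictReader(handle))


# Hypothetical RDBMS source, built in memory for the demonstration.
source_db = sqlite3.connect(":memory:")
source_db.execute("CREATE TABLE customers (id INTEGER, full_name TEXT, gender TEXT)")
source_db.execute("INSERT INTO customers VALUES (1, 'Asha Rao', 'female')")
source_db.commit()

# Hypothetical file source, held in a string instead of a file on disk.
source_file = io.StringIO("id,full_name,gender\n2,Ravi Kumar,male\n")

# Select only the required columns so the source does as little work as possible.
db_rows = extract_from_rdbms(source_db, "SELECT id, full_name, gender FROM customers")
file_rows = extract_from_file(source_file)
extracted = db_rows + file_rows
```

Note that the file source yields every value as text (the id is '2', not 2), which is one reason a transformation step is needed before loading.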
Data Transformation
This step includes cleaning, filtering, validating and
applying rules to the extracted data
The main objective of this step is to bring the extracted
data into a clean, general format so that it can be
loaded into the target database
Data is extracted from different sources, each
having its own format
E.g. dates from two sources may use different formats,
dd/mm/yyyy and yyyy/mm/dd
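The date example can be sketched in a few lines. The source names `source_a` and `source_b`, the helper `normalize_date` and the choice of ISO 8601 (yyyy-mm-dd) as the general format are assumptions made for illustration.

```python
from datetime import datetime

# Assumed mapping from source name to the date format that source uses.
SOURCE_DATE_FORMATS = {
    "source_a": "%d/%m/%Y",  # dd/mm/yyyy
    "source_b": "%Y/%m/%d",  # yyyy/mm/dd
}


def normalize_date(value, source):
    """Parse a date string in the source's format and return ISO 8601 text."""
    parsed = datetime.strptime(value, SOURCE_DATE_FORMATS[source])
    return parsed.date().isoformat()


print(normalize_date("25/12/2023", "source_a"))  # 2023-12-25
print(normalize_date("2023/12/25", "source_b"))  # 2023-12-25
```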
Other operations carried out in transformation are:
Cleaning (mapping ‘male’ to ‘M’ and ‘female’ to ‘F’)
Filtering (selecting only certain columns to load)
Enrichment (e.g. first and last name instead of full name)
Splitting (one column into multiple columns)
Joining (gathering data from multiple sources)
In some cases there can also be rich data
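The transformation operations listed in this section might look like this in code. It is a minimal sketch: the row layout, the `city_by_id` lookup standing in for a second source, and the function names are hypothetical.

```python
def clean_gender(value):
    """Cleaning: map 'male'/'female' to 'M'/'F'; leave other values unchanged."""
    return {"male": "M", "female": "F"}.get(value.strip().lower(), value)


def split_name(full_name):
    """Splitting: break one full-name column into first and last name."""
    first, _, last = full_name.strip().partition(" ")
    return first, last


def transform(row, city_by_id):
    """Apply cleaning, splitting, joining and filtering to one extracted row."""
    first, last = split_name(row["full_name"])
    return {
        # Filtering: only the columns listed here are passed on to loading.
        "id": row["id"],
        "first_name": first,
        "last_name": last,
        "gender": clean_gender(row["gender"]),
        # Joining: look up a value that comes from a second source.
        "city": city_by_id.get(row["id"]),
    }


row = {"id": 1, "full_name": "Asha Rao", "gender": "female", "notes": "unused"}
cities = {1: "Pune"}  # hypothetical second source, keyed by customer id
print(transform(row, cities))
```

The `notes` column is dropped because it is not among the selected columns, which is the filtering step in its simplest form.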
Loading
Extracted and transformed data is of no use until it is
loaded into the target database
In this step the extracted and transformed data is
loaded into the target database
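To load data efficiently, the indexes of the target
database need care: they speed up later queries but slow
down bulk inserts, so they are often dropped before the
load and rebuilt once it completes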
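A minimal sketch of the loading step, assuming an in-memory SQLite database as the target. The `customers` table, the index name and the `load` helper are hypothetical. Building the index after the batch insert rather than before it is a design choice of this sketch, made because each inserted row otherwise also has to update the index.

```python
import sqlite3


def load(conn, rows):
    """Insert transformed rows in one batch, then build the index."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS customers "
        "(id INTEGER, first_name TEXT, last_name TEXT, gender TEXT)"
    )
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT INTO customers VALUES (:id, :first_name, :last_name, :gender)",
            rows,
        )
        conn.execute(
            "CREATE INDEX IF NOT EXISTS idx_customers_last_name "
            "ON customers (last_name)"
        )


target = sqlite3.connect(":memory:")
load(target, [
    {"id": 1, "first_name": "Asha", "last_name": "Rao", "gender": "F"},
    {"id": 2, "first_name": "Ravi", "last_name": "Kumar", "gender": "M"},
])
print(target.execute("SELECT COUNT(*) FROM customers").fetchone()[0])  # 2
```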
ETL Process Can Run in Parallel
The data extraction step takes time, so the 2nd step,
transformation, can take place simultaneously
Transformation prepares data for the 3rd step, loading
As soon as some data is ready it can be loaded without
waiting for the previous steps to complete
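One way to picture the three steps running in parallel is a pipeline of threads connected by queues, so each record moves on as soon as it is ready. This is a sketch, not a production design: the sample records, the list used as a stand-in warehouse and the `DONE` sentinel are assumptions.

```python
import queue
import threading

DONE = object()  # sentinel marking the end of the stream


def extract(out_q):
    for record in ["  alpha", "beta  ", " gamma "]:  # stand-in for a slow source
        out_q.put(record)
    out_q.put(DONE)


def transform(in_q, out_q):
    while (record := in_q.get()) is not DONE:
        out_q.put(record.strip().upper())
    out_q.put(DONE)


def load(in_q, target):
    while (record := in_q.get()) is not DONE:
        target.append(record)


extracted_q = queue.Queue(maxsize=2)    # small buffers keep the stages in step
transformed_q = queue.Queue(maxsize=2)
warehouse = []  # hypothetical target, a list standing in for a database

stages = [
    threading.Thread(target=extract, args=(extracted_q,)),
    threading.Thread(target=transform, args=(extracted_q, transformed_q)),
    threading.Thread(target=load, args=(transformed_q, warehouse)),
]
for stage in stages:
    stage.start()
for stage in stages:
    stage.join()

print(warehouse)  # ['ALPHA', 'BETA', 'GAMMA']
```

Because each stage is a single thread and the queues are first-in, first-out, the records reach the target in the order they were extracted.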