
Data Transformation Process Functions

Lesson 8: ETL Requirements and Steps
Introduction:
• An ETL project begins with one of the toughest challenges: surrounding the requirements.
• This means gathering in one place all the known requirements, realities, and constraints affecting the ETL system.
• It is essential to lay them on the table before launching a data warehouse project.
• The requirements are mostly things you must live with and adapt your system to.
• You can make your own decisions, exercise your judgment, and leverage your creativity, but the requirements are just what they're named: they are required.

Source: https://www.informationweek.com/software/information-management/surrounding-the-etl-requirements/d/d-id/1028110
ETL Requirements
1. Business Needs
• The information requirements of the data warehouse's end users
• What users need in order to make informed business decisions
• Used to identify the extended set of information sources that the ETL team must bring into the data warehouse
• Require understanding and constantly examining the business needs
• Maintain a dialog among the ETL team, data warehouse architects, business analysts, and end users
• Business needs must be continually re-examined and discussed
2. Compliance
• Provide proof that the reported numbers are accurate, complete, and have not been tampered with
• Comply with regulatory reporting requirements
• Financial reporting is a common example
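One common way to support the "untampered" requirement is to reconcile row counts and a checksum between what was extracted and what was loaded. The sketch below rests on assumptions: rows are Python dicts, the fingerprint is a SHA-256 hash over JSON-serialized rows, and the names `batch_fingerprint` and `reconcile` are hypothetical. It is not a complete compliance control; it only shows whether two batches are identical.

```python
import hashlib
import json


def batch_fingerprint(rows):
    """Return (row_count, SHA-256 hex digest) for a batch of dict rows.

    Rows are serialized with sorted keys so the digest does not depend on
    dictionary key order; row order DOES affect the digest.
    """
    digest = hashlib.sha256()
    for row in rows:
        digest.update(json.dumps(row, sort_keys=True).encode("utf-8"))
        digest.update(b"\n")  # row separator
    return len(rows), digest.hexdigest()


def reconcile(source_rows, loaded_rows):
    """True only if the loaded batch matches the extracted batch exactly."""
    return batch_fingerprint(source_rows) == batch_fingerprint(loaded_rows)


# Hypothetical sample batches.
extracted = [{"invoice": 1, "amount": 100.0}, {"invoice": 2, "amount": 250.5}]
loaded = [{"amount": 100.0, "invoice": 1}, {"invoice": 2, "amount": 250.5}]
tampered = [{"invoice": 1, "amount": 100.0}, {"invoice": 2, "amount": 205.5}]

print(reconcile(extracted, loaded))
print(reconcile(extracted, tampered))
```

Storing each batch's fingerprint alongside the load gives auditors something concrete to re-check later.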

3. Data Quality via Data Profiling
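• A necessary precursor to designing any kind of system that uses the data
• Employs analytic methods for examining data
• Can process very large amounts of data
• Uncovers all sorts of issues that need to be addressed
• Determines whether the source data is complete and reliable
• An examination of the quality, scope, and context of a data source
• A clean, well-maintained source may need only minimal transformation and human intervention
• If the source data is too flawed to support the business objectives, profiling may justify canceling the project rather than sacrificing quality
• Prepares the business sponsors for realistic development schedules and the limitations of the source data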
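A minimal column-profiling sketch, assuming the source table has already been read into a list of Python dicts. The function names and the statistics chosen (null count, distinct count, min/max, most frequent values) are illustrative assumptions, not the output of any particular profiling tool.

```python
from collections import Counter


def profile_column(values):
    """Summarize one column: rows, missing values, distinct values, min/max, top values.

    None and empty strings are both counted as missing.
    """
    present = [v for v in values if v is not None and v != ""]
    counts = Counter(present)
    return {
        "rows": len(values),
        "nulls": len(values) - len(present),
        "distinct": len(counts),
        "min": min(present) if present else None,
        "max": max(present) if present else None,
        "top": counts.most_common(3),
    }


def profile_table(rows):
    """Profile every column that appears in a list of dict rows."""
    columns = sorted({key for row in rows for key in row})
    return {col: profile_column([row.get(col) for row in rows]) for col in columns}


# Hypothetical extract from a customer source system.
customers = [
    {"id": 1, "state": "CA", "zip": "94105"},
    {"id": 2, "state": "ca", "zip": ""},
    {"id": 3, "state": "NY", "zip": None},
    {"id": 3, "state": "NY", "zip": "10001"},
]

for column, stats in profile_table(customers).items():
    print(column, stats)
```

On this sample the profile reports only 3 distinct ids across 4 rows (a duplicate key), 2 missing zip codes, and "CA" and "ca" as separate values (inconsistent coding): the kinds of issues profiling should surface before design begins.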

4. Security
• Access to the data should be restricted
• Access is controlled at the final application delivery point
• Users must be authenticated
• Security must extend to physical backups
• Access is protected by passwords

5. Data Integration
• Aims to make all systems work together
• Takes the form of conforming dimensions and conforming facts in the data warehouse
• Using these shared attributes, reports can be generated across separate databases
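The idea of conforming dimensions can be sketched with two hypothetical source systems that identify the same customers with different local codes. Each code is mapped to a shared (conformed) customer key, so measures from both systems can be grouped by the same attribute and compared in one report. All names, codes, and figures below are assumptions for illustration.

```python
from collections import defaultdict

# Hypothetical conformed customer dimension shared by every fact table.
conformed_customer = {
    101: {"name": "Acme Corp", "region": "West"},
    102: {"name": "Globex", "region": "East"},
}

# Hypothetical mappings from each source system's local code to the conformed key.
crm_to_conformed = {"C-01": 101, "C-02": 102}
support_to_conformed = {"ACME": 101, "GLBX": 102}

# (source customer code, measure) pairs extracted from two separate databases.
sales_facts = [("C-01", 500.0), ("C-02", 300.0), ("C-01", 200.0)]  # CRM sales amounts
ticket_facts = [("ACME", 3), ("GLBX", 1), ("GLBX", 4)]             # support ticket counts


def totals_by_region(facts, key_map):
    """Sum a measure by the conformed 'region' attribute."""
    totals = defaultdict(float)
    for source_code, measure in facts:
        customer = conformed_customer[key_map[source_code]]
        totals[customer["region"]] += measure
    return dict(totals)


sales = totals_by_region(sales_facts, crm_to_conformed)
tickets = totals_by_region(ticket_facts, support_to_conformed)

# Drill across: both results use the same region labels, so they line up row by row.
for region in sorted({c["region"] for c in conformed_customer.values()}):
    print(region, sales.get(region, 0.0), tickets.get(region, 0.0))
```

Without the conformed key, the two systems' customer codes would have nothing in common to join on.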

6. Data Latency
• Describes how quickly the data must be delivered to the end users

7. Archiving and Lineage
• Every data warehouse needs various copies of old data, either for comparison with new data to generate change-capture records or for reprocessing
• Keep the data indefinitely on some form of permanent media
• All staged data should be archived
• It is less of a headache to read the data back in from permanent media than to reprocess it through the ETL system later
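Keeping the previous extract is what makes snapshot-comparison change capture possible: the archived copy is compared with the new extract to produce insert, update, and delete records. This is a minimal sketch of that one technique (others exist, such as reading database transaction logs), assuming both snapshots fit in memory as lists of dicts that share a hypothetical natural key column `id`.

```python
def change_capture(old_snapshot, new_snapshot, key="id"):
    """Compare two full extracts and emit change records.

    Returns a list of (change_type, row) tuples. For inserts and updates the
    new row is returned; for deletes, the archived (old) row is returned.
    """
    old_by_key = {row[key]: row for row in old_snapshot}
    new_by_key = {row[key]: row for row in new_snapshot}
    changes = []
    for k, row in new_by_key.items():
        if k not in old_by_key:
            changes.append(("insert", row))
        elif row != old_by_key[k]:
            changes.append(("update", row))
    for k, row in old_by_key.items():
        if k not in new_by_key:
            changes.append(("delete", row))
    return changes


# Hypothetical archived extract (yesterday) and fresh extract (today).
yesterday = [
    {"id": 1, "name": "Acme", "city": "Denver"},
    {"id": 2, "name": "Globex", "city": "Austin"},
    {"id": 3, "name": "Initech", "city": "Dallas"},
]
today = [
    {"id": 1, "name": "Acme", "city": "Boulder"},
    {"id": 2, "name": "Globex", "city": "Austin"},
    {"id": 4, "name": "Umbrella", "city": "Raleigh"},
]

for change_type, row in change_capture(yesterday, today):
    print(change_type, row)
```

If yesterday's extract had not been archived, the delete of id 3 and the city change for id 1 could not be detected this way.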
8. End-User Delivery Interfaces
• As far as the ETL team can control it, the delivered data should make the end-user applications simple and fast
• The ETL team needs to work closely with the end-user application developers to determine the exact requirements for the final data handoff

9. Available Skills
• The resources available to build and manage the system
• In-house programming skills to hand-code the ETL system
• Or, alternatively, a vendor's ETL package

10. Legacy Licenses
• Existing legacy licenses that management expects you to use can dictate major design decisions

Major Steps in ETL
1. Determine all the target data needed in the data warehouse.
2. Determine all the data sources, both internal and external.
3. Prepare data mapping for target data elements from sources.
4. Establish comprehensive data extraction rules.
5. Determine data transformation and cleansing rules.
6. Plan for aggregate tables.
7. Organize data staging area and test tools.
8. Write procedures for all data loads.
9. ETL for dimension tables (descriptive attributes that give context to the measurements).
10. ETL for fact tables (numeric measurements, from which differences and ratios can be calculated).
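Steps 3, 5, 6, 9, and 10 can be illustrated together in a small sketch. It assumes hypothetical product and sales extracts held in memory, a hypothetical source-to-target mapping, a single cleansing rule, surrogate keys assigned sequentially, and a fact table that stores the surrogate key plus a derived sales amount. A real load would also handle changed or late-arriving dimension rows, which this sketch ignores.

```python
# Hypothetical source-to-target mapping (step 3): target column -> source column.
PRODUCT_MAPPING = {"product_code": "sku", "product_name": "desc", "category": "cat"}

# Hypothetical extracts from a source system.
source_products = [
    {"sku": "P1", "desc": " widget ", "cat": "tools"},
    {"sku": "P2", "desc": "Gadget", "cat": "TOOLS"},
]
source_sales = [
    {"sku": "P1", "qty": 3, "price": 2.5},
    {"sku": "P2", "qty": 1, "price": 10.0},
    {"sku": "P1", "qty": 2, "price": 2.5},
]


def cleanse(text):
    """Example cleansing rule (step 5): trim whitespace and normalize case."""
    return text.strip().title()


def load_product_dimension(rows, mapping):
    """Step 9: map, cleanse, and assign a surrogate key to each product."""
    dimension, key_lookup = [], {}
    for surrogate_key, row in enumerate(rows, start=1):
        target = {tgt: row[src] for tgt, src in mapping.items()}
        target["product_name"] = cleanse(target["product_name"])
        target["category"] = cleanse(target["category"])
        target["product_key"] = surrogate_key
        dimension.append(target)
        key_lookup[row["sku"]] = surrogate_key  # natural key -> surrogate key
    return dimension, key_lookup


def load_sales_facts(rows, key_lookup):
    """Step 10: replace natural keys with surrogate keys and derive a measure."""
    return [
        {
            "product_key": key_lookup[row["sku"]],
            "quantity": row["qty"],
            "sales_amount": row["qty"] * row["price"],  # derived measure
        }
        for row in rows
    ]


def aggregate_by_product(facts):
    """Step 6: a simple aggregate table of total sales per product key."""
    totals = {}
    for fact in facts:
        key = fact["product_key"]
        totals[key] = totals.get(key, 0.0) + fact["sales_amount"]
    return totals


product_dim, lookup = load_product_dimension(source_products, PRODUCT_MAPPING)
sales_fact = load_sales_facts(source_sales, lookup)
print(product_dim)
print(aggregate_by_product(sales_fact))
```

The dimension is loaded first so that its key lookup exists when the fact rows arrive, which mirrors the order of steps 9 and 10.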