DWH ETL and Business Intelligence Report
INTRODUCTION
What is ETL Testing?
Data Extraction Testing Examples
Data Transformation Testing Examples
Data Load Testing Examples
What is DWH/BI Infrastructure Testing?
What is BI Application/Report Testing?
Categories of Tests for DWH, ETL and BI Report Testing
INTRODUCTION
In discussions of business intelligence quality assurance, we often find that the terms data warehouse
(DWH) testing and ETL testing are used interchangeably, as though they were one and the same.
A data warehouse can be defined as a collection of data that may include all of an organization’s data. Data
warehouses emerged as senior management placed greater emphasis on data and on data-driven decision
making (business intelligence). They combine historical collections of online transaction processing (OLTP)
data with continuously updated current data for analysis and forecasting, in support of management
decisions. Since many organizational decisions depend on the data warehouse, the data should be of the
highest quality.
To ensure that organizations make sound, accurate decisions, testing should be planned and executed
efficiently so that erroneous data is not loaded into the database, where it would ultimately distort senior
management’s decision-making process.
This article summarizes three testing strategies often associated with business intelligence quality
assurance: ETL testing, data warehouse/BI infrastructure testing, and BI application/report testing. In doing
so, it aims to clarify the distinctions among these three primary categories of testing. Considerable research
has shown that these strategies increase productivity and help ensure accurate data flow into the final
warehouse and BI reports. Following these approaches is a reliable way to prevent data-integrity issues from
undermining the business value of the data warehouse.
What is ETL Testing?
ETL testing is a sub-component of overall DWH testing. A data warehouse is essentially built using data
extractions, data transformations, and data loads. ETL processes extract data from sources, transform the
data according to BI reporting requirements, and then load the data into a target data warehouse. Figure 1
shows the general components involved in the ETL process.
After selecting data from the sources, ETL procedures resolve problems in the data, convert the data into a
common model appropriate for research and analysis, and write the data to staging and cleansing areas and
finally to the target data warehouse. Among the four components presented in Figure 1, the design and
implementation of the ETL process requires the largest effort in the development life cycle. ETL processes
present many challenges, such as extracting data from multiple heterogeneous sources that use different
data models, detecting and fixing a wide variety of data errors, and transforming the data into the formats
required by the target data warehouse.
Figure 1: ETL testing for data staging, data cleansing, and DWH loads
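The ETL sequence described above (select from the sources, cleanse, convert to a common model, write to staging, then load the target) can be sketched in a few lines of Python with the standard-library sqlite3 module. This is a minimal illustration under stated assumptions, not the author's method: the source rows, the cleansing rules, the date conversion, and the table names stg_customer and dim_customer are all hypothetical.

```python
import sqlite3

# Hypothetical source rows: (customer_id, name, signup_date as "DD/MM/YYYY").
SOURCE_ROWS = [
    (1, "  Alice ", "05/01/2023"),
    (2, "Bob", "17/02/2023"),
    (3, None, "01/03/2023"),    # missing name: rejected during cleansing
    (4, "Dana", "not a date"),  # malformed date: rejected during cleansing
]


def extract(rows):
    """Extract: select raw records from the (simulated) source system."""
    return list(rows)


def cleanse(rows):
    """Cleanse: trim whitespace and reject records that fail basic rules."""
    clean, rejected = [], []
    for cid, name, date in rows:
        parts = date.split("/")
        if name is None or len(parts) != 3 or not all(p.isdigit() for p in parts):
            rejected.append((cid, name, date))
            continue
        clean.append((cid, name.strip(), date))
    return clean, rejected


def transform(rows):
    """Transform: convert dates to the warehouse's common model (ISO 8601)."""
    out = []
    for cid, name, date in rows:
        day, month, year = date.split("/")
        out.append((cid, name, f"{year}-{month}-{day}"))
    return out


def load(conn, rows):
    """Load: write to a staging table first, then copy into the target table."""
    conn.execute("CREATE TABLE IF NOT EXISTS stg_customer (id INTEGER, name TEXT, signup_date TEXT)")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS dim_customer (id INTEGER PRIMARY KEY, name TEXT, signup_date TEXT)"
    )
    conn.executemany("INSERT INTO stg_customer VALUES (?, ?, ?)", rows)
    conn.execute("INSERT INTO dim_customer SELECT id, name, signup_date FROM stg_customer")
    conn.commit()


conn = sqlite3.connect(":memory:")
clean, rejected = cleanse(extract(SOURCE_ROWS))
load(conn, transform(clean))
print(conn.execute("SELECT * FROM dim_customer ORDER BY id").fetchall())
# -> [(1, 'Alice', '2023-01-05'), (2, 'Bob', '2023-02-17')]
print(len(rejected), "record(s) rejected")  # -> 2 record(s) rejected
```

Keeping the rejected records, rather than silently dropping them, gives testers something concrete to reconcile against the source later.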
A data warehouse holds data gathered and integrated from different sources and stores the large number of
records needed for long-term analysis. Data warehouse implementations use various data models (such as
dimensional or normalized models) and technologies (such as a DBMS, a Data Warehouse Appliance (DWA), or
cloud data warehouse appliances).
ETL testing includes different types of tests for each of its three processes: extract, transform, and load.
Data Extraction Testing Examples
• Data extraction code is granted security access to each source system
• Extract audit logs are updated and time stamps are recorded
• Data can be extracted from each required source field
• All extraction logic for each source system works as required
• Data flows from each source to the extraction destination completely and accurately
• All extractions are completed within the expected timeframe
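The completeness, accuracy, and timeframe checks in the list above lend themselves to automation. The following is one possible sketch, not a prescribed approach: it assumes the source and the extraction destination are reachable through the same database connection (here an in-memory SQLite database), that the first column of each table is a unique key, and that a per-row SHA-256 hash is an acceptable accuracy test. The table names, the injected defects, and the 5-second limit are hypothetical.

```python
import hashlib
import sqlite3
import time


def row_digest(row):
    """Hash a row's values so source and destination rows can be compared cheaply."""
    return hashlib.sha256("|".join(map(str, row)).encode("utf-8")).hexdigest()


def reconcile(conn, source_table, dest_table):
    """Compare a source table with its extraction destination.

    Assumes the first column of each table is a unique key. Table names are
    interpolated into SQL, so they must come from trusted configuration.
    Returns (missing, unexpected, mismatched) lists of keys.
    """
    src = {r[0]: row_digest(r) for r in conn.execute(f"SELECT * FROM {source_table}")}
    dst = {r[0]: row_digest(r) for r in conn.execute(f"SELECT * FROM {dest_table}")}
    missing = sorted(src.keys() - dst.keys())     # completeness: never extracted
    unexpected = sorted(dst.keys() - src.keys())  # rows with no source counterpart
    mismatched = sorted(k for k in src.keys() & dst.keys() if src[k] != dst[k])  # accuracy
    return missing, unexpected, mismatched


def timed(fn, max_seconds):
    """Run fn and report whether it finished within the allowed timeframe."""
    start = time.perf_counter()
    fn()
    elapsed = time.perf_counter() - start
    return elapsed <= max_seconds, elapsed


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE src_orders (id INTEGER, amount REAL);
    CREATE TABLE ext_orders (id INTEGER, amount REAL);
    INSERT INTO src_orders VALUES (1, 10.0), (2, 25.5), (3, 7.25);
""")


def extract_orders():
    """Hypothetical extract job with two injected defects, to show the checks firing."""
    conn.execute("DELETE FROM ext_orders")
    conn.execute("INSERT INTO ext_orders SELECT * FROM src_orders WHERE id < 3")  # drops id 3
    conn.execute("UPDATE ext_orders SET amount = 99.0 WHERE id = 2")              # corrupts id 2
    conn.commit()


within_time, elapsed = timed(extract_orders, max_seconds=5.0)
missing, unexpected, mismatched = reconcile(conn, "src_orders", "ext_orders")
print(within_time, missing, unexpected, mismatched)  # True [3] [] [2]
```

For very large tables, the same idea is usually applied to row counts and aggregate checksums computed inside each database rather than pulling every row into the test process.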
What is DWH/BI Infrastructure Testing?
DWH/BI infrastructure testing includes, for example, the use of tools to profile data sources for format and
content issues, checks for missing source data or records, DWH security checks, and so on. These categories
of testing can be considered “DWH infrastructure verifications.”
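Profiling a data source for format and content issues can start with simple per-field statistics. The sketch below is illustrative only (dedicated profiling tools go much further); the records, the field names, and the regular-expression format rules are hypothetical assumptions.

```python
import re
from collections import Counter

# Hypothetical source extract: each record maps field name -> raw value.
RECORDS = [
    {"id": "1", "email": "a@example.com", "zip": "10001"},
    {"id": "2", "email": "", "zip": "1000"},
    {"id": "3", "email": "c@example", "zip": "30301"},
    {"id": "3", "email": "d@example.com", "zip": None},
]

# Hypothetical format rules: each non-empty value must fully match its pattern.
RULES = {
    "id": r"\d+",
    "email": r"[^@\s]+@[^@\s]+\.[A-Za-z]{2,}",
    "zip": r"\d{5}",
}


def profile(records, rules, key="id"):
    """Count missing values and format violations per field, and list duplicate keys."""
    report = {}
    for field, pattern in rules.items():
        values = [r.get(field) for r in records]
        missing = sum(1 for v in values if v in (None, ""))
        bad_format = sum(
            1 for v in values if v not in (None, "") and not re.fullmatch(pattern, v)
        )
        report[field] = {"missing": missing, "bad_format": bad_format}
    key_counts = Counter(r.get(key) for r in records)
    report["duplicate_keys"] = sorted(k for k, n in key_counts.items() if n > 1)
    return report


for name, stats in profile(RECORDS, RULES).items():
    print(name, stats)
```

Here the profile would flag one missing and one malformed value in both email and zip, plus the duplicated key "3".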
“DWH/BI infrastructure” generally consists of:
DWH/BI infrastructure components must be tested for (among other things) scalability, security, reliability,
and performance (e.g., with load and stress tests). DWH/BI infrastructure as a whole supports data
warehouse data movement as shown in Figure 2.
Figure 2: The data warehouse infrastructure supports all DWH, ETL, and BI functions
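Performance testing of the infrastructure can begin with a simple concurrent load test. The sketch below simulates several report users issuing the same aggregate query at once and summarizes the latency distribution. It is a toy under stated assumptions: SQLite on a temporary file stands in for the warehouse DBMS, and the sales table, the query, the user counts, and the one-second p95 target are all hypothetical. A real load or stress test would run against a production-like platform with a dedicated load-testing tool.

```python
import os
import sqlite3
import statistics
import tempfile
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical report query used as the workload.
QUERY = "SELECT region, SUM(amount) FROM sales GROUP BY region"


def build_db(path, rows=50_000):
    """Create a throwaway SQLite database with a synthetic sales table."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?)",
        ((f"R{i % 10}", float(i % 100)) for i in range(rows)),
    )
    conn.commit()
    conn.close()


def timed_query(path):
    """One simulated user request: open a connection, run the query, return seconds."""
    conn = sqlite3.connect(path)
    try:
        start = time.perf_counter()
        conn.execute(QUERY).fetchall()
        return time.perf_counter() - start
    finally:
        conn.close()


def load_test(path, users=8, requests_per_user=5):
    """Issue users * requests_per_user queries through a pool of `users` threads."""
    with ThreadPoolExecutor(max_workers=users) as pool:
        latencies = list(pool.map(timed_query, [path] * (users * requests_per_user)))
    return {
        "requests": len(latencies),
        "median_s": statistics.median(latencies),
        "p95_s": statistics.quantiles(latencies, n=20)[-1],  # 95th percentile cut point
    }


SLA_P95_SECONDS = 1.0  # hypothetical performance target

with tempfile.TemporaryDirectory() as tmp:
    db_path = os.path.join(tmp, "dwh_loadtest.sqlite")
    build_db(db_path)
    result = load_test(db_path)

print(result, "PASS" if result["p95_s"] <= SLA_P95_SECONDS else "FAIL")
```

Raising `users` step by step until latency degrades or errors appear turns the same harness into a crude stress test.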
In short, the data warehouse infrastructure supports the data warehousing environment through a wide range
of technologies.
What is BI Application/Report Testing?
End-user reporting is a major component of any business intelligence project. The report code may execute
aggregate SQL queries against the data stored in data marts and/or the operational DW tables, then display
the results in the required format (either in a web browser or in a client application interface).
For each type of report, there are several types of tests to be considered:
• Verify cross-field and cross-report values
• Verify cross-references within reports
• Verify initialization of reports
• Verify input from user options and the related output
• Verify SQL queries used to extract data for reports
• Verify internal and user-defined sorts
• Verify that no report fields contain invalid data
• Verify maximum and minimum field values
• Verify valid merging of data
• …and much more
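Several of the checks in the list above (report SQL, cross-report values, invalid fields, minimum and maximum values, and sorts) can be scripted against a report's query. The following is a minimal sketch, assuming the report is driven by a single SQL statement that can be run directly; the fact_sales table, the REPORT_SQL query, and the value bounds are hypothetical. The key idea is to recompute the expected results by a different path (here, plain Python over the detail rows) rather than rerunning the same SQL.

```python
import math
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE fact_sales (region TEXT, product TEXT, amount REAL);
    INSERT INTO fact_sales VALUES
        ('East', 'A', 100.0), ('East', 'B', 50.0),
        ('West', 'A', 70.0),  ('West', 'B', 30.0), ('West', 'C', 20.0);
""")

# Hypothetical SQL behind a "sales by region" report, sorted by total descending.
REPORT_SQL = """
    SELECT region, SUM(amount) AS total
    FROM fact_sales
    GROUP BY region
    ORDER BY total DESC
"""


def check_report(conn, min_total=0.0, max_total=1_000_000.0):
    """Run the report query and return (rows, list of failure messages)."""
    rows = conn.execute(REPORT_SQL).fetchall()
    failures = []

    # No invalid (NULL) report fields.
    if any(v is None for row in rows for v in row):
        failures.append("report contains NULL fields")

    # Recompute each region's total independently from the detail rows.
    expected = {}
    for region, _product, amount in conn.execute("SELECT region, product, amount FROM fact_sales"):
        expected[region] = expected.get(region, 0.0) + amount
    report = dict(rows)
    if report.keys() != expected.keys() or any(
        not math.isclose(report[k], expected[k]) for k in expected
    ):
        failures.append(f"totals differ: report={report} expected={expected}")

    # Cross-report check: region totals must add up to the grand total.
    grand = conn.execute("SELECT SUM(amount) FROM fact_sales").fetchone()[0]
    if not math.isclose(sum(report.values()), grand):
        failures.append("region totals do not add up to the grand total")

    # Minimum and maximum field values.
    for region, total in rows:
        if not (min_total <= total <= max_total):
            failures.append(f"{region} total {total} outside [{min_total}, {max_total}]")

    # Sort order.
    totals = [t for _, t in rows]
    if totals != sorted(totals, reverse=True):
        failures.append("report is not sorted by total descending")
    return rows, failures


rows, failures = check_report(conn)
print(rows)      # [('East', 150.0), ('West', 120.0)]
print(failures)  # []
```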
Categories of Tests for DWH, ETL and BI Report Testing
Table 1: Summary of testing to be considered for DWH, ETL, and BI tests
DWH Tests:
• ETL Testing
• DWH Infrastructure Tests
BI Application/Report Tests
Wayne Yaddow is an independent DW/BI QA consultant with 12 years of experience leading data
migration/integration/DW ETL testing projects at businesses such as J.P. Morgan Chase, Credit
Suisse, Standard and Poor’s, AIG, and IBM. Wayne has written extensively on the subject and has offered
tutorials on it. Wayne co-authored the book Testing the Data Warehouse and continues to
write widely published DW QA articles. He continues to lead ETL testing and coaching projects as an
independent consultant. Wayne studied computer science at Technical University in Germany. You
can contact Wayne at wyaddow@gmail.com.