
DWH and Business Intelligence Reports

What Should Be Tested?

INTRODUCTION
What is ETL Testing?
Data Extraction Testing Examples
Data Transformation Testing Examples
Data Load Testing Examples
What is DWH/BI Infrastructure Testing?
What is BI Application/Report Testing?
Categories of Tests for DWH, ETL and BI Report Testing

INTRODUCTION
When referring to business intelligence quality assurance, we often discover that the terms data warehouse
(DWH) testing and ETL testing are used interchangeably as though they are one and the same.

A data warehouse can be defined as a collection of data that may include all of an organization’s data. Data
warehouses emerged as senior management placed greater focus on data and on data-driven decision making
(business intelligence). They combine historical collections of online transaction processing (OLTP) data
with continuously updated current data for analysis and forecasting, in order to support management
decisions. Since many organizational decisions depend on the data warehouse, the data should be of the
highest quality.

To ensure that organizations make smart, accurate decisions, testing should be planned and executed
efficiently so that erroneous data is not pumped into the database, where it would ultimately distort senior
management’s decision-making process.

This article summarizes three testing strategies often associated with business intelligence quality
assurance: ETL testing, data warehouse/BI infrastructure testing, and BI application/report testing. In doing
so, it aims to clarify the distinctions among these three primary categories of testing. Applied together,
these strategies help increase productivity and ensure accurate data flow into the final warehouse and BI
reports. Following these approaches is a reliable way to prevent data-integrity issues from undermining the
business value of the data warehouse.

What is ETL Testing?

ETL testing is a sub-component of overall DWH testing. A data warehouse is essentially built using data
extractions, data transformations, and data loads. ETL processes extract data from sources, transform the
data according to BI reporting requirements, then load the data into a target data warehouse. Figure 1 shows
the general components involved in the ETL process.

After selecting data from the sources, ETL procedures resolve problems in the data, convert the data into a
common model appropriate for research and analysis, and write it to staging and cleansing areas, then
finally to the target data warehouse. Among the four components presented in Figure 1, the design and
implementation of the ETL process requires the largest effort in the development life cycle. ETL processes
present many challenges, such as extracting data from multiple heterogeneous sources that use different
data models, detecting and fixing a wide variety of data errors, and transforming the data into formats
that match the requirements of the target data warehouse.

Figure 1: ETL testing for data staging, data cleansing, and DWH loads

A data warehouse holds data gathered and integrated from different sources and stores the large volumes of
records needed for long-term analysis. Data warehouse implementations use various data models (such as
dimensional or normalized models) and technologies (such as DBMSs, Data Warehouse Appliances (DWAs), and
cloud data warehouse appliances).

ETL testing includes different types of tests for each of its three processes (extract, transform, load).

Data Extraction Testing Examples
• Data extraction code is granted security access to each source system
• Extract audit logs and time stamps are updated
• Data can be extracted from each required source field
• All extraction logic for each source system works as required
• Data moved from source to extraction destination is complete and accurate
• All extractions are completed within the expected timeframe
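The completeness and accuracy checks above are often automated as a key-based reconciliation between a source system and the extraction destination. The following is a minimal Python sketch, assuming both sides can be fetched as lists of dictionaries that share a unique business key; the `reconcile` function and the sample rows are hypothetical.

```python
from typing import Any, Dict, Iterable, List

Row = Dict[str, Any]


def reconcile(source: Iterable[Row], extracted: Iterable[Row], key: str) -> Dict[str, List[Any]]:
    """Compare source rows with extracted rows on a unique business key.

    Returns the keys that are missing from the extract, unexpected in it,
    or present on both sides but with differing field values.
    """
    src = {row[key]: row for row in source}
    ext = {row[key]: row for row in extracted}
    missing = sorted(k for k in src if k not in ext)
    unexpected = sorted(k for k in ext if k not in src)
    mismatched = sorted(k for k in src if k in ext and src[k] != ext[k])
    return {"missing": missing, "unexpected": unexpected, "mismatched": mismatched}


if __name__ == "__main__":
    source = [
        {"id": 1, "name": "Acme", "balance": 100.0},
        {"id": 2, "name": "Globex", "balance": 250.5},
        {"id": 3, "name": "Initech", "balance": 75.0},
    ]
    extracted = [
        {"id": 1, "name": "Acme", "balance": 100.0},
        {"id": 2, "name": "Globex", "balance": 205.5},  # value corrupted in transit
    ]
    print(reconcile(source, extracted, key="id"))
```

Because rows are indexed by key, duplicate keys would be silently collapsed; a separate duplicate-record check is still needed.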

Data Transformation Testing Examples

• Transformation processes transform the data according to the expected rules and logic
• One-time transformations for historical initial loads work
• Detailed and aggregated data sets are created successfully
• Transaction audit logs and time stamps are recorded
• There is no data loss or corruption of data during transformations
• Transformations are completed within the expected timeframe
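One common way to verify that transformations follow "the expected rules and logic" without loss is to re-implement the documented rule independently and compare its result with the ETL output row by row. The sketch below assumes a hypothetical mapping rule (trim and concatenate names, upper-case the country code, convert cents to dollars); the rule, field names, and functions are illustrative only, not part of any specific ETL tool.

```python
from decimal import Decimal
from typing import Any, Dict, List

Row = Dict[str, Any]


def expected_transform(src: Row) -> Row:
    """Independent re-implementation of a HYPOTHETICAL mapping rule:
    full_name = trimmed first + " " + trimmed last, country upper-cased,
    and amount_cents converted to a Decimal dollar amount."""
    return {
        "customer_id": src["customer_id"],
        "full_name": f"{src['first'].strip()} {src['last'].strip()}",
        "country": src["country"].strip().upper(),
        "amount": Decimal(src["amount_cents"]) / 100,
    }


def check_transformation(source: List[Row], target: List[Row]) -> List[str]:
    """Compare the ETL output with the expected result of the rule.
    Returns failure messages; an empty list means every row matched."""
    failures: List[str] = []
    # A row-count difference signals data loss (or duplication) in the step.
    if len(source) != len(target):
        failures.append(f"row count: source={len(source)} target={len(target)}")
    actual = {row["customer_id"]: row for row in target}
    for src in source:
        exp = expected_transform(src)
        got = actual.get(exp["customer_id"])
        if got is None:
            failures.append(f"{exp['customer_id']}: missing from target")
        elif got != exp:
            failures.append(f"{exp['customer_id']}: expected {exp}, got {got}")
    return failures
```

Using `Decimal` rather than `float` for the converted amount avoids false mismatches caused by binary rounding.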

Data Load Testing Examples

• There is no data loss or corruption during the loading process
• All transformations during loading work as expected
• Data sets move from staging to the loading destination without data loss
• Incremental data loads work with change data capture
• Transaction audit logs and time stamps are recorded
• Loads are completed within the expected timeframe
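For incremental loads, one check is that every source row changed since the previous load's high-water mark reached the target with its current values. The sketch below assumes timestamp-based change data capture (an `updated_at` column) and uses hypothetical `src_orders` and `dwh_orders` tables in an in-memory SQLite database; log-based CDC tools would need a different check.

```python
import sqlite3
from typing import List, Tuple


def build_demo_db() -> sqlite3.Connection:
    """Create hypothetical source and target tables in memory.
    ISO-8601 timestamps stored as TEXT compare correctly as strings."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE src_orders (id INTEGER, status TEXT, updated_at TEXT);
        CREATE TABLE dwh_orders (id INTEGER, status TEXT, updated_at TEXT);
        INSERT INTO src_orders VALUES
            (1, 'NEW',     '2024-01-01T09:00'),
            (2, 'SHIPPED', '2024-01-03T14:30'),
            (3, 'NEW',     '2024-01-04T08:15');
        -- Target after the incremental run: order 3 was never picked up.
        INSERT INTO dwh_orders VALUES
            (1, 'NEW',     '2024-01-01T09:00'),
            (2, 'SHIPPED', '2024-01-03T14:30');
    """)
    return conn


def changed_rows_not_loaded(conn: sqlite3.Connection, watermark: str) -> List[Tuple]:
    """Return source rows changed after `watermark` (the previous load's
    high-water mark) whose current version is missing from the target."""
    return conn.execute(
        """
        SELECT s.id, s.status, s.updated_at
        FROM src_orders AS s
        LEFT JOIN dwh_orders AS t
          ON t.id = s.id AND t.status = s.status AND t.updated_at = s.updated_at
        WHERE s.updated_at > ? AND t.id IS NULL
        ORDER BY s.id
        """,
        (watermark,),
    ).fetchall()


if __name__ == "__main__":
    conn = build_demo_db()
    print(changed_rows_not_loaded(conn, "2024-01-02T00:00"))
```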

What is DWH/BI Infrastructure Testing?

Several aspects of DWH development and testing are not usually handled by the ETL tool or by the stored
procedures used in the ETL process; therefore, they are tested independently of the ETL tests.

Examples include using tools to profile data sources for format and content issues, checking for missing
source data or records, and verifying DWH security. These categories of testing can be considered “DWH
infrastructure verifications.”
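Dedicated profiling tools do this at scale, but the core of a source-column profile is simple: counts, nulls, distinct values, ranges, and values that break the expected format. The following Python sketch is a minimal illustration under stated assumptions: `profile_column` is hypothetical, empty strings are treated as nulls, and the expected format is supplied as a regular expression.

```python
import re
from typing import Any, Dict, List, Optional


def profile_column(values: List[Optional[str]], pattern: str) -> Dict[str, Any]:
    """Profile one source column: row and null counts, distinct values,
    min/max, and the distinct values that do not fully match `pattern`.
    Empty strings are counted as nulls here (an assumption)."""
    non_null = [v for v in values if v not in (None, "")]
    regex = re.compile(pattern)
    return {
        "rows": len(values),
        "nulls": len(values) - len(non_null),
        "distinct": len(set(non_null)),
        # String comparison is lexicographic, so mixed formats distort min/max.
        "min": min(non_null) if non_null else None,
        "max": max(non_null) if non_null else None,
        "bad_format": sorted({v for v in non_null if not regex.fullmatch(v)}),
    }


if __name__ == "__main__":
    order_dates = ["2024-01-05", "2024-01-05", None, "", "05/01/2024", "2023-12-31"]
    print(profile_column(order_dates, r"\d{4}-\d{2}-\d{2}"))
```

In this sample, the badly formatted `05/01/2024` both appears under `bad_format` and sorts as the column minimum, which is exactly the kind of format issue profiling is meant to surface before the data is loaded.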

“DWH/BI infrastructure” generally consists of:

• Hardware components including storage and memory
• Operating systems
• Utilities that support ETLs and BI applications
• Change data capture (CDC) operations
• Network and network software
• OLTP databases
• Data cleansing tools
• Metadata application servers
• OLAP data for BI reports
• Automated testing tools
• Database management systems (DBMSs)
• …more

DWH/BI infrastructure components must be tested for (among other things) scalability, security, reliability,
and performance (e.g., with load and stress tests). DWH/BI infrastructure as a whole supports data
warehouse data movement as shown in Figure 2.

Figure 2: The data warehouse infrastructure supports all DWH, ETL, and BI Functions

In short, the data warehouse infrastructure brings together many technologies to support the data
warehousing environment.

What is BI Application/Report Testing?

Front-end BI applications are often desktop, web, and/or mobile applications and reports. They include
analysis and decision support tools, and online analytical processing (OLAP) report generators. These
applications make it easy for end-users to construct complex queries for requesting information from data
warehouses—without requiring sophisticated programming skills.

End-user reporting is a major component of any business intelligence project. The report code may execute
aggregate SQL queries against the data stored in data marts and/or the operational DW tables, then display
the results in the required format (either in a web browser or in a client application interface).
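Because report figures are usually aggregates, a direct way to test them is to recompute each aggregate from the DWH tables and compare it with what the report displays. The sketch below assumes a hypothetical `fact_sales` table in SQLite and assumes the report's totals have already been captured (for example, from an export) into a dictionary; how that capture happens depends on the BI tool and is not shown.

```python
import sqlite3
from typing import Dict, List, Optional, Tuple


def build_demo_db() -> sqlite3.Connection:
    """Hypothetical fact table standing in for a data mart."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE fact_sales (region TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO fact_sales VALUES (?, ?)",
        [("East", 100.10), ("East", 49.90), ("West", 80.00)],
    )
    return conn


def report_mismatches(
    conn: sqlite3.Connection,
    report: Dict[str, float],
    tolerance: float = 0.005,
) -> List[Tuple[str, Optional[float], Optional[float]]]:
    """Recompute revenue per region from the fact table and compare it with
    the totals shown on the report. Returns (region, expected, reported)
    for every region that differs or appears on only one side."""
    expected = dict(
        conn.execute(
            "SELECT region, ROUND(SUM(amount), 2) FROM fact_sales GROUP BY region"
        ).fetchall()
    )
    mismatches = []
    for region in sorted(set(expected) | set(report)):
        exp, got = expected.get(region), report.get(region)
        if exp is None or got is None or abs(exp - got) > tolerance:
            mismatches.append((region, exp, got))
    return mismatches
```

The small tolerance absorbs display rounding; regions present on only one side are reported as well, since a missing or extra report row is as much a defect as a wrong total.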

For each type of report, there are several types of tests to be considered:
• Verify cross-field and cross-report values
• Verify cross-references within reports
• Verify initialization of reports
• Verify input from user options and the related output
• Verify SQL queries used to extract data for reports
• Verify internal and user-defined sorts
• Verify that no report fields contain invalid data
• Verify maximum and minimum field values
• Verify valid merging of data
• … much more
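Several of the report checks above (sorts, minimum and maximum field values, invalid field data) can be expressed as assertions over the rows a report returns. The following is a minimal Python sketch, assuming the report rows are available as dictionaries; `check_report_rows`, the field names, and the 0 to 100 bound on a margin percentage are hypothetical.

```python
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]


def check_report_rows(
    rows: List[Row],
    sort_key: str,
    descending: bool,
    bounds: Dict[str, Tuple[float, float]],
) -> List[str]:
    """Check that report rows follow the expected sort order and that each
    bounded field lies within its [min, max] range. Returns failure messages."""
    failures = []
    keys = [row[sort_key] for row in rows]
    if keys != sorted(keys, reverse=descending):
        order = "descending" if descending else "ascending"
        failures.append(f"rows not sorted by {sort_key} ({order})")
    for i, row in enumerate(rows):
        for field, (low, high) in bounds.items():
            value = row.get(field)
            # A missing value is treated as a failure as well.
            if value is None or not low <= value <= high:
                failures.append(f"row {i}: {field}={value!r} outside [{low}, {high}]")
    return failures


if __name__ == "__main__":
    rows = [
        {"region": "East", "revenue": 150.0, "margin_pct": 12.5},
        {"region": "West", "revenue": 80.0, "margin_pct": 104.0},
    ]
    print(check_report_rows(rows, "revenue", True, {"margin_pct": (0, 100)}))
```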

Categories of Tests for DWH, ETL and BI Report Testing

The following table lists categories of tests that should be considered for DWH and BI report testing.
From this list, those planning DWH/ETL/BI tests can select and prioritize the types of testing they
should perform during each testing phase of a project.

Table 1: Summary of testing to be considered for DWH, ETL, and BI tests

Test groups:
• DWH Tests: ETL Testing and DWH Infrastructure Tests
• BI Application/Report Tests

ETL Testing

1. Column data boundaries defined
2. Column data constraints applied
3. Column data lengths
4. Column data types
5. Column data populated where defined as not null
6. Column default values defined and applied
7. Column name correctness
8. Column-to-column testing, source to target
9. Concatenated data from multiple fields
10. Data aggregation and calculation rules applied
11. Data boundary tests (i.e., data within bounds)
12. Data filtering, source to target
13. Data formats
14. Data granularity: precision, trimming
15. Data inserts, updates, deletes
16. Data profiling on source and target data
17. Data selection from sources
18. Data sorted as defined
19. Data transformation rules and logic
20. Data transformations, cleaning, aggregations
21. Data value boundaries
22. Date/time format and values
23. Domain integrity maintained
24. Domain ranges defined
25. Duplicate records and duplicate field data
26. DWH schema verifications
27. ETL data lookups function correctly
28. ETL errors/anomalies logged
29. ETL procedures ordered correctly
30. Incremental data load testing
31. Indices defined appropriately
32. Initial data load testing
33. Negative column values where positive expected
34. Numeric column precisions as defined
35. Primary & foreign keys assigned
36. Primary key to child key relationships
37. Reference data verifications
38. Referential integrity as required
39. Rejected records handled as required
40. Source and target metadata verifications
41. Source to target data copied with no changes
42. Source to target data mappings are correct
43. Source to target record counts as expected
44. Source to target data is filtered correctly
45. Surrogate keys applied where required
46. Target column data truncations
47. Target columns loaded
48. Treatment of “change data capture” (CDC)
49. Treatment of “slowly changing dimensions” (SCDs)
50. Triggers: updates, inserts, deletes as defined

BI Report Testing

1. Data aggregation and calculation rules
2. Data boundary tests (data within bounds)
3. Data filtering
4. Data formats
5. Data lengths
6. Data transformation rules applied
7. Data types
8. Data value sorting
9. Date/time formats
10. Default values
11. Derived data
12. Domain ranges
13. Drill-up and drill-down display correct data
14. Exported data
15. Field data boundaries
16. Field data constraints
17. Field data traceable back to DWH
18. Field data truncations/trimming
19. Field names
20. Filtered data fields (ranges, IDs, etc.)
21. Graphed data
22. Min, max, avg, sum values
23. Null and not null fields
24. Numeric field precisions
25. Performance tests
26. Recovery tests
27. Regression tests
28. Report data match DWH/data mart
29. Report field default values
30. Report formats comply with requirements
31. Security tests
32. Security: validate data access controls
33. Summary field data
34. Usability tests
35. User Help tests
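Many Table 1 items translate directly into SQL queries that should return no rows. The sketch below covers duplicate keys (item 25) and orphaned child rows, i.e. primary-to-child key relationships and referential integrity (items 36 and 38), using a hypothetical `dim_customer`/`fact_sales` star-schema fragment in SQLite; table and column names are illustrative assumptions.

```python
import sqlite3
from typing import Dict, List, Tuple

# Each check returns offending rows; an empty result means the check passed.
CHECKS: Dict[str, str] = {
    # Item 25: duplicate records / duplicate key values in a dimension.
    "duplicate_customer_keys": """
        SELECT customer_key, COUNT(*) FROM dim_customer
        GROUP BY customer_key HAVING COUNT(*) > 1
        ORDER BY customer_key
    """,
    # Items 36 and 38: every fact row must reference an existing dimension row.
    "orphan_sales_rows": """
        SELECT f.sale_id, f.customer_key
        FROM fact_sales AS f
        LEFT JOIN dim_customer AS d ON d.customer_key = f.customer_key
        WHERE d.customer_key IS NULL
        ORDER BY f.sale_id
    """,
}


def run_checks(conn: sqlite3.Connection) -> Dict[str, List[Tuple]]:
    """Run every check and collect the offending rows by check name."""
    return {name: conn.execute(sql).fetchall() for name, sql in CHECKS.items()}


def build_demo_db() -> sqlite3.Connection:
    """Hypothetical schema with no PK/FK constraints declared, so that the
    defects below can be loaded and then detected by the checks."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE dim_customer (customer_key INTEGER, name TEXT);
        CREATE TABLE fact_sales (sale_id INTEGER, customer_key INTEGER);
        INSERT INTO dim_customer VALUES (1, 'Acme'), (2, 'Globex'), (2, 'Globex Corp');
        INSERT INTO fact_sales VALUES (10, 1), (11, 2), (12, 3);
    """)
    return conn


if __name__ == "__main__":
    print(run_checks(build_demo_db()))
```

Keeping each check as a named query makes it easy to add further Table 1 items (for example, negative values where positive are expected) without changing the runner.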

DWH Infrastructure Testing

1. Change data capture (CDC)
2. Data access
3. Data indices definition & optimization
4. DWH job procedures
5. End-to-end DWH process
6. Error logging tests – ETL and other
7. ETL recoverability tests
8. Forced error tests
9. Installation tests
10. Operational readiness testing
11. Performance tests
12. Profiling of data sources
13. Regression tests
14. Scalability tests
15. Security tests
16. Stored procedure tests
17. Stress tests
18. System tests
19. User acceptance tests
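Forced error tests and error logging tests deliberately inject known-bad input and confirm that the process rejects it, logs it, and still loads the valid data. The Python sketch below illustrates the idea with a hypothetical load step (`load_with_rejects`), a hypothetical rejection rule (missing id, or missing or negative amount), and an in-memory log handler used only by the test; a real ETL tool would have its own reject and logging mechanisms.

```python
import logging
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]
log = logging.getLogger("etl.load")  # hypothetical logger name


class ListHandler(logging.Handler):
    """Collects log records in memory so a test can assert on them."""

    def __init__(self) -> None:
        super().__init__()
        self.records: List[logging.LogRecord] = []

    def emit(self, record: logging.LogRecord) -> None:
        self.records.append(record)


def load_with_rejects(rows: List[Row]) -> Tuple[List[Row], List[Row]]:
    """Hypothetical load step: rows with no id, or a missing or negative
    amount, are logged and set aside instead of being loaded."""
    loaded: List[Row] = []
    rejected: List[Row] = []
    for row in rows:
        amount = row.get("amount")
        if row.get("id") is None or amount is None or amount < 0:
            log.warning("rejected row: %r", row)
            rejected.append(row)
        else:
            loaded.append(row)
    return loaded, rejected


def forced_error_test() -> None:
    """Inject one known-bad row and verify it is rejected and logged while
    the valid row still loads."""
    handler = ListHandler()
    log.addHandler(handler)
    try:
        good = {"id": 1, "amount": 10.0}
        bad = {"id": 2, "amount": -5.0}  # deliberately injected error
        loaded, rejected = load_with_rejects([good, bad])
    finally:
        log.removeHandler(handler)
    assert loaded == [good], loaded
    assert rejected == [bad], rejected
    assert len(handler.records) == 1
    assert handler.records[0].levelno == logging.WARNING


if __name__ == "__main__":
    forced_error_test()
    print("forced error test passed")
```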

About the Author

Wayne Yaddow is an independent DW/BI QA consultant with 12 years of experience leading data
migration/integration/DW ETL testing projects at businesses such as J.P. Morgan Chase, Credit
Suisse, Standard and Poor’s, AIG, and IBM. Wayne has written extensively and offered tutorials on
the subject. He co-authored the book Testing the Data Warehouse and continues to write widely
published DW QA articles and to lead ETL testing and coaching projects as an independent
consultant. Wayne studied computer science at Technical University in Germany. You can contact
Wayne at wyaddow@gmail.com.
