Data Warehouse Concepts
Avinash Kanumuru
Diya Jana
Debyajit Majumder
© 2009 Wipro Ltd -- Confidential
Content [contd…]
6 Metadata Management
7 OLAP
An Overview
Understanding What is a Data Warehouse
Data Warehouse def. by WH Inmon
Data Warehouse Architecture
What makes a Data Warehouse
Source Tables: Real-time, volatile data held in relational databases for
transaction processing (OLTP). Sources can be any relational databases or flat files.
ETL Tools: Extract, cleanse, transform (aggregate, join) and load the data from
sources to targets.
Maintenance and Administration Tools: Authorize and monitor access to the data,
set up users, and schedule jobs to run during off-peak periods.
Modeling Tools: Used to design the data warehouse for high performance using
dimensional data modeling techniques, and to map source files to target files.
Databases: Target databases and data marts, which are part of the data warehouse.
These are structured for analysis and reporting purposes.
End-user tools for analysis and reporting: Produce reports and analyze the data in the
target tables. Different types of querying, data mining, and OLAP tools are used for this
purpose.
Data Modeling
Effective way of using a Data Warehouse
Terms used in Dimensional Data Model
Fact Table: sale
orderId  date    custId  prodId  storeId  qty  amt
o100     1/7/97  53      p1      c1       1    12
o102     2/7/97  53      p2      c1       2    11
105      3/8/97  111     p1      c3       5    50
Dimension Table: customer
custId  name   address    city
53      joe    10 main    sfo
81      fred   12 main    sfo
111     sally  80 willow  la
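A minimal sketch of how a fact table is typically queried through a dimension table, using the sale and customer rows from this slide. The in-memory SQLite database, the column types, and the variable names are assumptions made for illustration only.

```python
import sqlite3

# In-memory database holding the slide's fact and dimension rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (custId INTEGER PRIMARY KEY, name TEXT, address TEXT, city TEXT);
CREATE TABLE sale (orderId TEXT, date TEXT, custId INTEGER, prodId TEXT,
                   storeId TEXT, qty INTEGER, amt INTEGER);
""")
conn.executemany("INSERT INTO customer VALUES (?, ?, ?, ?)", [
    (53, "joe", "10 main", "sfo"),
    (81, "fred", "12 main", "sfo"),
    (111, "sally", "80 willow", "la"),
])
conn.executemany("INSERT INTO sale VALUES (?, ?, ?, ?, ?, ?, ?)", [
    ("o100", "1/7/97", 53, "p1", "c1", 1, 12),
    ("o102", "2/7/97", 53, "p2", "c1", 2, 11),
    ("105", "3/8/97", 111, "p1", "c3", 5, 50),
])

# Join the fact table to the dimension on custId and aggregate the measure.
amount_by_city = conn.execute("""
    SELECT c.city, SUM(s.amt)
    FROM sale AS s
    JOIN customer AS c ON s.custId = c.custId
    GROUP BY c.city
    ORDER BY c.city
""").fetchall()
print(amount_by_city)
```

The fact table carries the measures (qty, amt) and foreign keys; the dimension supplies the descriptive attribute (city) used for grouping.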
Snowflake Schema
Fact Table: store
storeId  cityId  tId  mgr
s5       sfo     t1   joe
s7       sfo     t2   fred
s9       la      t1   nancy
Dimension Table: sType
tId  size   location
t1   small  downtown
t2   large  suburbs
Dimension Table: city
cityId  pop  regId
sfo     1M   north
la      5M   south
Dimension Table: region
regId  name
north  cold region
south  warm region
The star and snowflake schemas are most commonly
found in dimensional data warehouses and data
marts, where speed of data retrieval is more
important than the efficiency of data manipulations.
As such, the tables in these schemas are not
heavily normalized, and are frequently designed at a
level of normalization short of third normal form.
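As a rough illustration of what the extra normalization in a snowflake schema means for queries, the sketch below walks the store, city, and region tables from this slide through a chain of joins. The SQLite setup and the names db and stores_by_region are assumptions for the example.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE region (regId TEXT PRIMARY KEY, name TEXT);
CREATE TABLE city (cityId TEXT PRIMARY KEY, pop TEXT, regId TEXT);
CREATE TABLE store (storeId TEXT PRIMARY KEY, cityId TEXT, tId TEXT, mgr TEXT);
""")
db.executemany("INSERT INTO region VALUES (?, ?)",
               [("north", "cold region"), ("south", "warm region")])
db.executemany("INSERT INTO city VALUES (?, ?, ?)",
               [("sfo", "1M", "north"), ("la", "5M", "south")])
db.executemany("INSERT INTO store VALUES (?, ?, ?, ?)",
               [("s5", "sfo", "t1", "joe"), ("s7", "sfo", "t2", "fred"),
                ("s9", "la", "t1", "nancy")])

# Reaching the region name takes two joins: store -> city -> region.
stores_by_region = db.execute("""
    SELECT r.name, COUNT(*)
    FROM store AS st
    JOIN city AS c ON st.cityId = c.cityId
    JOIN region AS r ON c.regId = r.regId
    GROUP BY r.name
    ORDER BY r.name
""").fetchall()
print(stores_by_region)
```

In a star schema the region name would usually be stored directly on the store dimension, trading redundancy for one fewer join.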
Overview of Data Cleansing
Six Steps To Data Quality
Understand Information Flow In Organization
– Identify authoritative data sources
– Interview Employees & Customers
Clean & Load Data
– Use data cleansing tools to clean data at the source
– Load only clean data into the data warehouse
Data Quality Solution
Customized Programs
Strengths
– Address specific needs
– No large one-time investment
Limitations
– Large numbers of custom programs in different environments are difficult to
manage
– Minor alterations demand coding effort
Data Quality Assessment Tools
Strength
– Provide automated assessment
Limitation
– No measure of data accuracy
Tools In The Market
Business Rule Discovery Tools
– Integrity Data Reengineering Tool from Vality Technology
– Trillium Software System from Harte-Hanks Data Technologies
– Migration Architect from DB Star
Data Reengineering & Cleansing Tools
– Carlton Pureview from Oracle
– ETI-Extract from Evolutionary Technologies
– PowerMart from Informatica Corp
– Sagent Data Mart from Sagent Technology
Data Quality Assessment Tools
– Migration Architect, Evoke Axio from Evoke Software
– Wizrule from Wizsoft
Name & Address Cleansing Tools
– Centrus Suite from Sagent
– I.d.centric from First Logic
Data Extraction, Transformation, Load
[Figure: web visitors' browsers and the Internet produce web server logs and e-commerce transaction data; these, together with external data (demographics, household, webographics, income), flat files, RDBMS, and other OLTP systems, pass through scheduled extraction into a staging area (clean, transform, match, merge) with a metadata repository, then through scheduled loading into the enterprise data warehouse. Stages: Data Collection, Data Extraction, Data Transformation, Data Loading, Data Storage & Integration]
ETL Architecture
Why ETL?
Companies have valuable data spread throughout their networks that
needs to be moved from one place to another.
The data lies in all sorts of heterogeneous systems, and therefore in all sorts
of formats.
To solve the problem, companies use extract, transform and load (ETL)
software.
The data used in ETL processes can come from any source:
a mainframe application, an ERP application, a CRM tool, a flat file, or
an Excel spreadsheet.
Major components involved in ETL Processing
Design manager
Lets developers define source-to-target mappings, transformations, process flows, and jobs.
Metadata management
Provides a repository to define, document, and manage information about the ETL design and runtime
processes.
Extract
The process of reading data from a source database.
Transform
The process of converting the extracted data into the form required by the target database.
Load
The process of writing the data into the target database.
Transport services
ETL tools use network and file protocols to move data between
source and target systems, and in-memory protocols to move data
between ETL run-time components.
Administration and operation
ETL utilities let administrators schedule, run, and monitor ETL jobs, log
all events, manage errors, recover from failures, and reconcile outputs
with source systems.
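The extract, transform, and load steps can be sketched in a few lines. This is only an illustration under stated assumptions: the orders and order_fact tables, their columns, and the conversion rule (trim the key, store the amount in cents) are hypothetical and are not taken from any ETL tool.

```python
import sqlite3

def extract(source):
    """Extract: read every row from the hypothetical `orders` source table."""
    return source.execute("SELECT order_id, amount_text FROM orders").fetchall()

def transform(rows):
    """Transform: trim the key and convert the text amount to integer cents."""
    return [(order_id.strip(), round(float(amount_text) * 100))
            for order_id, amount_text in rows]

def load(target, rows):
    """Load: write the converted rows into the hypothetical target table."""
    target.executemany("INSERT INTO order_fact VALUES (?, ?)", rows)
    target.commit()

# Source system with untidy values.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE orders (order_id TEXT, amount_text TEXT)")
source.executemany("INSERT INTO orders VALUES (?, ?)",
                   [(" o100 ", "12.50"), ("o102", "11")])

# Target warehouse table.
target = sqlite3.connect(":memory:")
target.execute("CREATE TABLE order_fact (order_id TEXT, amount_cents INTEGER)")

load(target, transform(extract(source)))
loaded = target.execute(
    "SELECT order_id, amount_cents FROM order_fact ORDER BY order_id").fetchall()
print(loaded)
```

Keeping the three steps as separate functions mirrors the component split above and lets each step be scheduled, logged, and tested on its own.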
ETL Tools
Metadata Management
Metadata is Information...
That describes the WHAT, WHEN, WHO, WHERE, HOW of the data warehouse
About the data being captured and loaded into the Warehouse
Documented in IT tools that improve both business and technical understanding of data
and data-related processes
Importance Of Metadata
Locating information
How much time is spent looking for information?
How often is information found?
What poor decisions were made based on incomplete information?
How much money was lost or earned as a result?
Interpreting information
How many times have businesses needed to rework or recall products?
What impact does it have on the bottom line?
How many mistakes were due to misinterpretation of existing documentation?
How much interpretation results from too much metadata?
How much time is spent trying to determine if any of the metadata is accurate?
Integrating information
How do the various data perspectives connect together?
How much time is spent trying to figure that out?
How much does the inefficiency and lack of metadata affect decision making?
Requirements for DW Metadata Management
Enable DW users to identify and invoke pre-built queries against the data stores
Design and enhance new data models and schemas for the data warehouse
Consumers of Metadata
Technical Users
• Warehouse administrator
• Application developer
Business Users - Business metadata
• Meanings
• Definitions
• Business Rules
Software Tools
• Used in DW life-cycle development
• Metadata requirements for each tool must be identified
• The tool-specific metadata should be analysed for inclusion in the enterprise
metadata repository
• Previously captured metadata should be electronically transferred from the
enterprise metadata repository to each individual tool
Trends in the Metadata Management Tools
Metadata Repositories
IBM, Oracle and Microsoft to offer free or near-free basic
repository services
Enable organisations to reuse metadata across technologies
Integrate DB design, data transformation and BI tools from
different vendors
Multi-tool vendors taking a bridged or federated rather than
integrated approach to sharing metadata
Both IBM and Oracle have multiple repositories for different lines
of products, e.g., one for AD and one for DW, with bridges
between them
OLAP
Agenda
OLAP Definition
Distinction between OLTP and OLAP
MDDB Concepts
Implementation Techniques
Architectures
Features
Representative Tools
OLAP: On-Line Analytical Processing
Distinction between OLTP and OLAP
MDDB Concepts
RDBMS v/s MDDB: Increased Complexity...
Relational DBMS
MODEL         COLOR  DEALER   VOL.
MINI VAN      BLUE   Clyde    6
MINI VAN      BLUE   Gleason  3
MINI VAN      BLUE   Carr     2
MINI VAN      RED    Clyde    5
MINI VAN      RED    Gleason  3
MINI VAN      RED    Carr     1
MINI VAN      WHITE  Clyde    3
MINI VAN      WHITE  Gleason  1
MINI VAN      WHITE  Carr     4
SPORTS COUPE  BLUE   Clyde    3
MDDB
[Figure: Sales Volumes cube; the visible MODEL axis shows Mini Van and Coupe]
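To make the contrast concrete, the sketch below loads the relational rows shown on this slide into a nested mapping indexed by MODEL, COLOR, and DEALER. A nested dict is only a stand-in for a multidimensional array, chosen here for illustration; it is not how any particular MDDB product stores data.

```python
from collections import defaultdict

# The relational rows from the slide: (MODEL, COLOR, DEALER, VOL.).
rows = [
    ("MINI VAN", "BLUE", "Clyde", 6),
    ("MINI VAN", "BLUE", "Gleason", 3),
    ("MINI VAN", "BLUE", "Carr", 2),
    ("MINI VAN", "RED", "Clyde", 5),
    ("MINI VAN", "RED", "Gleason", 3),
    ("MINI VAN", "RED", "Carr", 1),
    ("MINI VAN", "WHITE", "Clyde", 3),
    ("MINI VAN", "WHITE", "Gleason", 1),
    ("MINI VAN", "WHITE", "Carr", 4),
    ("SPORTS COUPE", "BLUE", "Clyde", 3),
]

# cube[model][color][dealer] -> sales volume
cube = defaultdict(lambda: defaultdict(dict))
for model, color, dealer, volume in rows:
    cube[model][color][dealer] = volume

# A cell is addressed directly by its coordinates, with no scan or join.
blue_minivans_at_clyde = cube["MINI VAN"]["BLUE"]["Clyde"]
# Summing along one dimension gives a subtotal.
red_minivan_total = sum(cube["MINI VAN"]["RED"].values())
print(blue_minivans_at_clyde, red_minivan_total)
```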
Benefits of MDDB over RDBMS
Ease of Data Presentation & Navigation
– A great deal of information can be gleaned immediately upon direct inspection of
the array.
– The user can view data along presorted dimensions, with data arranged in an
inherently more organized and accessible fashion than that offered by the
relational table.
Storage Space
– Very low space consumption compared to a relational DB.
Performance
– Gives much better query performance.
– A relational DB may give comparable results only through database tuning
(indexing, keys, etc.), which may not be possible for ad hoc queries.
Ease of Maintenance
– No overhead, as data is stored in the same way it is viewed. A relational DB
uses indexes, sophisticated joins, etc., which require considerable storage
and maintenance.
Issues with MDDB
• Sparsity
- Input data in applications are typically sparse
- Increases with increased dimensions
• Data Explosion
- Due to Sparsity
- Due to Summarization
• Performance
- Doesn’t perform better than RDBMS at high data
volumes (>20-30 GB)
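A small worked calculation may help show why sparsity grows with the number of dimensions. The dimension sizes and the count of populated cells below are hypothetical numbers chosen for the example, and the sparsity function is an illustrative helper, not part of any OLAP product.

```python
from math import prod

def sparsity(dimension_sizes, populated_cells):
    """Fraction of cells in the full cube that hold no data."""
    total_cells = prod(dimension_sizes)
    return 1 - populated_cells / total_cells

# Hypothetical cube: 3 models x 3 colors x 3 dealers, 10 populated cells.
three_dims = sparsity([3, 3, 3], 10)

# Adding a hypothetical 12-month dimension multiplies the cell count by 12,
# while the number of recorded facts stays the same.
four_dims = sparsity([3, 3, 3, 12], 10)

print(f"{three_dims:.3f} {four_dims:.3f}")
```

The cube has to reserve or index every possible coordinate, so each added dimension multiplies the empty space unless the engine compresses it.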
Issues with MDDB - Sparsity Example
[Figure: a dense MODEL-by-COLOR sales array (Mini Van, Coupe, Sedan by Blue, Red, White) beside a sparse LAST NAME-by-EMPLOYEE # array (Weld, Kelly, Link, Kranz, Lucas, Weiss)]
OLAP Features
Features of OLAP - Rotation
Sales Volumes
View #1 (MODEL by COLOR)
          Blue  Red  White
Mini Van  6     5    4
Coupe     3     5    5
Sedan     4     3    2
( ROTATE 90° )
View #2 (COLOR by MODEL)
          Mini Van  Coupe  Sedan
Blue      6         3      4
Red       5         5      3
White     4         5      2
[Figure: Sales Volumes cube with dimensions MODEL (Mini Van, Coupe, Sedan), COLOR (Blue, Red, White), and DEALERSHIP (Carr, Gleason, Clyde), shown in successive views, each rotated 90° from the previous one]
MDDB allows the end user to quickly slice in on the exact view of the data required.
[Figure: Sales Volumes cube sliced to MODEL (Mini Van, Coupe), DEALERSHIP (Carr, Clyde), and COLOR (Normal Blue, Metal Blue)]
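For a two-dimensional view, rotation amounts to swapping the row and column dimensions, and a slice amounts to fixing one member of a dimension. The sketch below uses the View #1 sales volumes from the rotation slide; representing the array as a list of lists is an assumption made for illustration.

```python
models = ["Mini Van", "Coupe", "Sedan"]
colors = ["Blue", "Red", "White"]

# View #1: rows are MODEL, columns are COLOR.
view_1 = [
    [6, 5, 4],  # Mini Van
    [3, 5, 5],  # Coupe
    [4, 3, 2],  # Sedan
]

# Rotation: swap the two dimensions so rows are COLOR and columns are MODEL.
view_2 = [list(column) for column in zip(*view_1)]

# Slice: fix MODEL = Coupe and keep the COLOR dimension.
coupe_by_color = dict(zip(colors, view_1[models.index("Coupe")]))

print(view_2)
print(coupe_by_color)
```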
Features of OLAP - Drill Down / Up
[Figure: ORGANIZATION DIMENSION hierarchy; the visible level is REGION (Midwest)]
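Drill down and drill up can be thought of as aggregating the same facts at different depths of a dimension hierarchy. In the sketch below only the Midwest region comes from the slide; the district level, the dealership assignments, and the volumes are hypothetical values invented for illustration.

```python
from collections import defaultdict

# Hypothetical facts: (region, district, dealership, sales volume).
facts = [
    ("Midwest", "Chicago", "Clyde", 6),
    ("Midwest", "Chicago", "Gleason", 3),
    ("Midwest", "St. Louis", "Carr", 2),
]

def aggregate(facts, depth):
    """Total the volume at the given depth of the organization hierarchy."""
    totals = defaultdict(int)
    for *levels, volume in facts:
        totals[tuple(levels[:depth])] += volume
    return dict(totals)

by_region = aggregate(facts, 1)      # drilled up to the top level
by_district = aggregate(facts, 2)    # drilled down one level
by_dealership = aggregate(facts, 3)  # drilled down to the lowest level

print(by_region)
print(by_district)
```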
OLAP Reporting - Drill Down
Implementation Techniques - OLAP Architectures
MOLAP - MDDB storage
[Figure: an OLAP cube feeds the OLAP calculation engine, which serves web browsers, OLAP tools, and OLAP applications]
MOLAP - Features
ROLAP - Standard SQL storage
[Figure: the OLAP calculation engine reaches relational storage through SQL and serves OLAP tools and OLAP applications]
ROLAP - Features
HOLAP - Combination of RDBMS and MDDB
[Figure: the OLAP calculation engine draws on both an OLAP cube and, through SQL, a relational DW, and serves any client: web browsers, OLAP tools, and OLAP applications]
HOLAP - Features
Architecture Comparison
Representative OLAP Tools:
Sample OLAP Applications
Sales Analysis
Financial Analysis
Profitability Analysis
Performance Analysis
Risk Management
Profiling & Segmentation
Scorecard Application
NPA Management
Strategic Planning
Customer Relationship Management (CRM)
Data Warehouse Testing
Data Warehouse Testing Overview
There is an exponentially increasing cost associated with finding
software defects later in the development lifecycle. In data
warehousing, this is compounded because of the additional business
costs of using incorrect data to make critical business decisions.
Difference In Testing Data warehouse and
Transaction System
User-Triggered vs. System triggered
Volume of Test Data
The test data in a transaction system is a very small sample of the
overall production data. A data warehouse typically has a large volume of test
data, as one tries to cover the maximum possible combinations of
dimensions and facts.
Possible scenarios/ Test Cases
In a data warehouse, the permutations and combinations one
can possibly test are virtually unlimited, because the core objective of
a data warehouse is to allow all possible views of the data.
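The growth in possible test combinations can be illustrated by enumerating the members of a few dimensions. The dimension members below reuse names that appear elsewhere in this deck, but treating them as a test matrix, and the added 12-month dimension, are assumptions for the example.

```python
from itertools import product

# Hypothetical test dimensions and their members.
dimensions = {
    "model": ["Mini Van", "Coupe", "Sedan"],
    "color": ["Blue", "Red", "White"],
    "dealership": ["Clyde", "Gleason", "Carr"],
}

# Every combination of one member per dimension is a candidate test view.
combinations = list(product(*dimensions.values()))
count_three_dimensions = len(combinations)

# One more dimension multiplies the number of candidate views.
dimensions["month"] = list(range(1, 13))
count_four_dimensions = len(list(product(*dimensions.values())))

print(count_three_dimensions, count_four_dimensions)
```

Because the count is the product of the dimension sizes, exhaustive testing quickly becomes impractical and test cases have to be sampled.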
Data Warehouse Testing Process
Requirements testing
Unit Testing
Unit testing for data warehouses is white-box testing. It should check the ETL
procedures/mappings/jobs and the reports developed.
Unit testing the ETL procedures:
• Whether the ETLs access and pick up the right data from the right source.
• Whether all data transformations are correct according to the business rules and the data
warehouse is correctly populated with the transformed data.
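A unit test for a single transformation might look like the sketch below. The transform_sale function and its business rule (amount equals quantity times unit price, city codes trimmed and upper-cased) are hypothetical and stand in for whatever mapping is under test.

```python
import unittest

def transform_sale(source_row):
    """Hypothetical rule: amount = qty * unit_price; city is trimmed and upper-cased."""
    return {
        "order_id": source_row["order_id"],
        "city": source_row["city"].strip().upper(),
        "amount": source_row["qty"] * source_row["unit_price"],
    }

class TransformSaleTest(unittest.TestCase):
    def test_transformation_follows_business_rule(self):
        source_row = {"order_id": "o100", "city": " sfo ", "qty": 2, "unit_price": 6}
        expected = {"order_id": "o100", "city": "SFO", "amount": 12}
        self.assertEqual(transform_sale(source_row), expected)

# Run the test case programmatically and keep the result for inspection.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TransformSaleTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Being white-box, the test is written with knowledge of the mapping logic and checks its output for a known input row.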
Integration Testing
Integration testing will involve the following:
Sequence of ETL jobs in a batch.
Initial loading of records into the data warehouse.
Incremental loading of records at a later date to verify the newly
inserted or updated data.
Testing the rejected records that don’t fulfil transformation rules.
Error log generation.
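The initial load, incremental load, rejected-record, and error-log checks can be exercised together in a small harness. Everything here is a sketch under assumptions: the warehouse is modelled as a dict keyed by order_id, and the rule that qty must be a positive integer is a hypothetical transformation rule.

```python
def load_batch(warehouse, batch, error_log):
    """Insert or update valid records by key; reject and log invalid ones."""
    for record in batch:
        qty = record.get("qty")
        # Hypothetical transformation rule: qty must be a positive integer.
        if not isinstance(qty, int) or qty <= 0:
            error_log.append(f"rejected {record.get('order_id')}: invalid qty {qty!r}")
            continue
        warehouse[record["order_id"]] = record

warehouse, error_log = {}, []

# Initial load.
load_batch(warehouse, [
    {"order_id": "o100", "qty": 1},
    {"order_id": "o102", "qty": 2},
], error_log)

# Incremental load: one update, one insert, one record that must be rejected.
load_batch(warehouse, [
    {"order_id": "o102", "qty": 3},
    {"order_id": "o105", "qty": 5},
    {"order_id": "o106", "qty": -1},
], error_log)

print(sorted(warehouse), error_log)
```

An integration test would then assert on three things: the updated row, the newly inserted row, and the presence of the rejected record in the error log rather than in the warehouse.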
Performance Testing
Acceptance testing
Questions
Thank You
Source Tables: These are real-time, volatile data in relational databases for
transaction processing (OLTP). These can be any relational databases or flat files.
ETL Tools: To extract, cleansing, transform (aggregates, joins) and load the data from
sources to target.
Maintenance and Administration Tools: To authorize and monitor access to the data,
set-up users. Scheduling jobs to run on offshore periods.
Modeling Tools: Used for data warehouse design for high-performance using
dimensional data modeling technique, mapping the source and target files.
Databases: Target databases and data marts, which are part of data warehouse.
These are structured for analysis and reporting purposes.
End-user tools for analysis and reporting: get the reports and analyze the data from
target tables. Different types of Querying, Data Mining, OLAP tools are used for this
purpose.
99 ©
© 2009
2009 Wipro
Wipro Ltd
Ltd -- Confidential
Confidential
Data Warehouse Architecture
100 ©
© 2009
2009 Wipro
Wipro Ltd
Ltd -- Confidential
Confidential
Data Modeling
Effective way of using a Data Warehouse
102 ©
© 2009
2009 Wipro
Wipro Ltd
Ltd -- Confidential
Confidential
Terms used in Dimensional Data Model
Fact Table
sale oderId date custId prodId storeId qty amt
o100 1/7/97 53 p1 c1 1 12
o102 2/7/97 53 p2 c1 2 11
105 3/8/97 111 p1 c3 5 50
Dimension Table
customer custId name address city
53 joe 10 main sfo
81 fred 12 main sfo
111 sally 80 willow la
104 ©
© 2009
2009 Wipro
Wipro Ltd
Ltd -- Confidential
Confidential
Snowflake Schema
Dimension Table
sType tId size location
Fact Table t1 small downtown
store storeId cityId tId mgr t2 large suburbs
s5 sfo t1 joe Dimension Table
s7 sfo t2 fred city cityId pop regId
s9 la t1 nancy sfo 1M north
la 5M south
The star and snowflake schema are most commonly region regId name
found in dimensional data warehouses and data north cold region
marts where speed of data retrieval is more south warm region
important than the efficiency of data manipulations.
As such, the tables in these schema are not
normalized much, and are frequently designed at a
level of normalization short of third normal form.
105 ©
© 2009
2009 Wipro
Wipro Ltd
Ltd -- Confidential
Confidential
Overview of Data Cleansing
107 ©
© 2009
2009 Wipro
Wipro Ltd
Ltd -- Confidential
Confidential
Six Steps To Data Quality
Understand
Understand Information
Information Flow
Flow Identify authoritative data sources
In Organization
In Organization
Interview Employees & Customers
Clean & Load Use data cleansing tools to clean data at the source
Data Load only clean data into the data warehouse
108 ©
© 2009
2009 Wipro
Wipro Ltd
Ltd -- Confidential
Confidential
Data Quality Solution
Customized Programs
Strengths:
– Addresses specific needs
– No bulky one time investment
Limitations
– Tons of Custom programs in different environments are difficult to
manage
– Minor alterations demand coding efforts
Data Quality Assessment tools
Strength
– Provide automated assessment
Limitation
– No measure of data accuracy
109 ©
© 2009
2009 Wipro
Wipro Ltd
Ltd -- Confidential
Confidential
Data Quality Solution
110 ©
© 2009
2009 Wipro
Wipro Ltd
Ltd -- Confidential
Confidential
Tools In The Market
Business Rule Discovery Tools
– Integrity Data Reengineering Tool from Vality Technology
– Trillium Software System from Harte -Hanks Data Technologies
– Migration Architect from DB Star
Data Reengineering & Cleansing Tools
– Carlton Pureview from Oracle
– ETI-Extract from Evolutionary Technologies
– PowerMart from Informatica Corp
– Sagent Data Mart from Sagent Technology
Data Quality Assessment Tools
– Migration Architect, Evoke Axio from Evoke Software
– Wizrule from Wizsoft
Name & Address Cleansing Tools
– Centrus Suite from Sagent
– I.d.centric from First Logic
111 ©
© 2009
2009 Wipro
Wipro Ltd
Ltd -- Confidential
Confidential
Data Extraction, Transformation, Load
Visitors
Web
Browsers
External Data –
Demographics,
Household,
The Webographics,
Internet Income
Staging Area
Meta Data
Repository
Web Server Logs Flat Files
& •Clean
E-comm •Transform Enterprise
Transaction Data Scheduled •Match Scheduled Data
RDBMS •Merge
Extraction Loading Warehouse
Other OLTP
Systems
Data Collection Data Extraction Data Transformation Data Loading Data Storage &
Integration
113 ©
© 2009
2009 Wipro
Wipro Ltd
Ltd -- Confidential
Confidential
ETL Architecture
114 ©
© 2009
2009 Wipro
Wipro Ltd
Ltd -- Confidential
Confidential
Why ETL ?
Companies have valuable data lying around throughout their networks that
needs to be moved from one place to another.
The data lies in all sorts of heterogeneous systems,and therefore in all sorts
of formats.
To solve the problem, companies use extract, transform and load (ETL)
software.
The data used in ETL processes can come from any source:
a mainframe application, an ERP application, a CRM tool, a flat file, and
an Excel spreadsheet.
115 ©
© 2009
2009 Wipro
Wipro Ltd
Ltd -- Confidential
Confidential
Major components involved in ETL Processing
116 ©
© 2009
2009 Wipro
Wipro Ltd
Ltd -- Confidential
Confidential
Major components involved in ETL Processing
Design manager
Lets developers define source-to-target mappings, transformations, process flows, and jobs
Meta data management
Provides a repository to define, document, and manage information about the ETL design and runtime
processes
Extract
The process of reading data from a database.
Transform
The process of converting the extracted data
Load
The process of writing the data into the target database.
Transport services
ETL tools use network and file protocols to move data between
source and target systems and in-memory protocols to move data
between ETL run-time components.
Administration and operation
ETL utilities let administrators schedule, run, monitor ETL jobs, log
all events, manage errors, recover from failures, reconcile outputs
with source systems
117 ©
© 2009
2009 Wipro
Wipro Ltd
Ltd -- Confidential
Confidential
ETL Tools
118 ©
© 2009
2009 Wipro
Wipro Ltd
Ltd -- Confidential
Confidential
Metadata Management
Metadata is Information...
That describes the WHAT, WHEN, WHO, WHERE, HOW of the data warehouse
About the data being captured and loaded into the Warehouse
Documented in IT tools that improves both business and technical understanding of data
and data-related processes
120 ©
© 2009
2009 Wipro
Wipro Ltd
Ltd -- Confidential
Confidential
Importance Of Metadata
Locating Information
Time spent in looking for information.
How often information is found?
What poor decisions were made based on the incomplete information?
How much money was lost or earned as a result?
Interpreting information
How many times have businesses needed to rework or recall products?
What impact does it have on the bottom line ?
How many mistakes were due to misinterpretation of existing documentation?
How much interpretation results form too much metadata?
How much time is spent trying to determine if any of the metadata is accurate?
Integrating information
How various data perspectives connect together?
How much time is spent trying to figure out that?
How much does the inefficiency and lack of metadata affect decision making
121 ©
© 2009
2009 Wipro
Wipro Ltd
Ltd -- Confidential
Confidential
Requirements for DW Metadata Management
Enable DW users to identify and invoke pre-built queries against the data stores
Design and enhance new data models and schemas for the data warehouse
122 ©
© 2009
2009 Wipro
Wipro Ltd
Ltd -- Confidential
Confidential
Consumers of Metadata
Technical Users
• Warehouse administrator
• Application developer
Business Users -Business metadata
• Meanings
• Definitions
• Business Rules
Software Tools
• Used in DW life-cycle development
• Metadata requirements for each tool must be identified
• The tool-specific metadata should be analysed for inclusion in the enterprise
metadata repository
• Previously captured metadata should be electronically transferred from the
enterprise metadata repository to each individual tool
123 ©
© 2009
2009 Wipro
Wipro Ltd
Ltd -- Confidential
Confidential
Trends in the Metadata Management Tools
124 ©
© 2009
2009 Wipro
Wipro Ltd
Ltd -- Confidential
Confidential
Trends in the Metadata Management Tools
Metadata Repositories
IBM, Oracle and Microsoft to offer free or near-free basic
repository services
Enable organisations to reuse metadata across technologies
Integrate DB design, data transformation and BI tools from
different vendors
Multi-tool vendors taking a bridged or federated rather than
integrated approach to sharing metadata
Both IBM and Oracle have multiple repositories for different lines
of products — e.g., One for AD and one for DW, with bridges
between them
125 ©
© 2009
2009 Wipro
Wipro Ltd
Ltd -- Confidential
Confidential
Trends in the Metadata Management Tools
126 ©
© 2009
2009 Wipro
Wipro Ltd
Ltd -- Confidential
Confidential
OLAP
127 ©
© 2009
2009 Wipro
Wipro Ltd
Ltd -- Confidential
Confidential
Agenda
OLAP Definition
Distinction between OLTP and OLAP
MDDB Concepts
Implementation Techniques
Architectures
Features
Representative Tools
12/08/21 128
128 ©
© 2009
2009 Wipro
Wipro Ltd
Ltd -- Confidential
Confidential
OLAP: On-Line Analytical Processing
12/08/21 129
129 ©
© 2009
2009 Wipro
Wipro Ltd
Ltd -- Confidential
Confidential
Distinction between OLTP and OLAP
users
12/08/21
130 ©
© 2009
2009 Wipro
Wipro Ltd
Ltd -- Confidential
Confidential
MDDB Concepts
131 ©
© 2009
2009 Wipro
Wipro Ltd
Ltd -- Confidential
Confidential
RDBMS v/s MDDB: Increased Complexity...
Relational DBMS MDDB
MODEL COLOR DEALER VOL.
MINI VAN BLUE Clyde 6
MINI VAN BLUE Gleason 3
MINI VAN BLUE Carr 2
MINI VAN RED Clyde 5 Sales Volumes
MINI VAN RED Gleason 3
MINI VAN RED Carr 1
MINI VAN WHITE Clyde 3
MINI VAN WHITE Gleason 1
M Mini Van
MINI VAN WHITE Carr 4 O
SPORTS COUPE BLUE Clyde 3 D Coupe
132 ©
© 2009
2009 Wipro
Wipro Ltd
Ltd -- Confidential
Confidential
Benefits of MDDB over RDBMS
Ease of Data Presentation & Navigation
– A great deal of information is gleaned immediately upon direct inspection of
the array
– User is able to view data along presorted dimensions with data arranged in an
inherently more organized, and accessible fashion than the one offered by the
relational table.
Storage Space
– Very low Space Consumption compared to Relational DB
Performance
– Gives much better performance.
– Relational DB may give comparable results only through database tuning
(indexing, keys etc), which may not be possible for ad-hoc queries.
Ease of Maintenance
– No overhead as data is stored in the same way it is viewed. In Relational DB,
indexes, sophisticated joins etc. are used which require considerable storage
and maintenance
12/08/21 133
133 ©
© 2009
2009 Wipro
Wipro Ltd
Ltd -- Confidential
Confidential
Issues with MDDB
• Sparsity
- Input data in applications are typically sparse
-Increases with increased dimensions
• Data Explosion
-Due to Sparsity
-Due to Summarization
• Performance
-Doesn’t perform better than RDBMS at high data
volumes (>20-30 GB)
12/08/21 134
134 ©
© 2009
2009 Wipro
Wipro Ltd
Ltd -- Confidential
Confidential
Issues with MDDB - Sparsity Example
WELD M 14 6 5 314
Miini Van A
S
Weld 31
O T
KELLY D Coupe54 3 5 275 Kelly 27
E N
LINK L 03 56 A
Sedan 4 3 2 M Link 56
KRANZ 41 45 E
Blue Red White Kranz 45
LUCUS 33 COLOR41
WEISS 23 19 Lucas 41
Weiss 19
31 41 23 01 14 54 03 12 33
EMPLOYEE #
12/08/21 135
135 ©
© 2009
2009 Wipro
Wipro Ltd
Ltd -- Confidential
Confidential
OLAP Features
12/08/21 136
136 ©
© 2009
2009 Wipro
Wipro Ltd
Ltd -- Confidential
Confidential
Features of OLAP - Rotation
M
Mini Van
6 5 4 C Blue 6 3 4
O O
D Coupe
3 5 5 L Red 5 5 3
E O
L R
Sedan 4 3 2 o
White 4 5 2
( ROTATE 90 ) Mini Van Coupe Sedan
Blue Red White
COLOR MODEL
View #1 View #2
Sales Volumes
M Mini Van
C Blue C Blue
O O O
D Coupe L Red
L Red
E O O
L Sedan
Carr
Gleason
R White
Carr
Gleason
R White
Mini Van
Coupe
Clyde Clyde Sedan
Blue Red White Sedan Coupe Mini Van Carr Gleason Clyde
COLOR o
MODEL o
DEALERSHIP o
( ROTATE 90 ) ( ROTATE 90 ) ( ROTATE 90 )
D D
E E
A A
L Carr L Carr Mini Van
E E M
R Gleason
R Gleason O Coupe
S S D
H Mini Van H Blue E Sedan
Blue
I Clyde Coupe I Clyde Red L Red
White
White
P Sedan P Mini Van Coupe Sedan
White Red Blue Clyde Gleason Carr
COLOR o
MODEL o
DEALERSHIP
( ROTATE 90 ) ( ROTATE 90 )
An MDDB lets the end user quickly slice down to the exact view of the data required.
[Figure: the Sales Volumes cube sliced to MODEL (Mini Van, Coupe), COLOR (Normal Blue, Metal Blue), and DEALERSHIP (Carr, Clyde).]
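Slicing can be sketched as filtering the cube on a subset of members per dimension. The cube below is a plain dictionary keyed by (model, color, dealership); the volumes and the helper `slice_cube` are hypothetical, chosen only to illustrate the operation.

```python
# Hypothetical sales volumes keyed by (model, color, dealership).
cube = {
    ("Mini Van", "Blue", "Clyde"): 6,
    ("Mini Van", "Blue", "Carr"): 2,
    ("Mini Van", "Red", "Clyde"): 5,
    ("Coupe", "Blue", "Clyde"): 3,
    ("Coupe", "Blue", "Carr"): 4,
    ("Sedan", "White", "Gleason"): 1,
}


def slice_cube(cube, models=None, colors=None, dealerships=None):
    """Keep only cells whose coordinates fall inside the requested members.

    A filter left as None places no restriction on that dimension.
    """
    def keep(value, allowed):
        return allowed is None or value in allowed

    return {
        (model, color, dealer): volume
        for (model, color, dealer), volume in cube.items()
        if keep(model, models) and keep(color, colors) and keep(dealer, dealerships)
    }


# Slice: blue Mini Vans and Coupes sold by Carr or Clyde.
blue_slice = slice_cube(
    cube,
    models={"Mini Van", "Coupe"},
    colors={"Blue"},
    dealerships={"Carr", "Clyde"},
)
print(blue_slice)
```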
Features of OLAP - Drill Down / Up
[Figure: drill down / up along the ORGANIZATION dimension, starting at REGION = Midwest.]
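Drill down and drill up move between levels of a dimension hierarchy. The sketch below assumes a two-level organization hierarchy (region, then district); the district names and amounts are hypothetical, and only the Midwest region appears on the slide.

```python
from collections import defaultdict

# Hypothetical detail rows: (region, district, sales amount).
sales = [
    ("Midwest", "Chicago", 120),
    ("Midwest", "St. Louis", 80),
    ("Midwest", "Minneapolis", 60),
    ("East", "Boston", 90),
]


def roll_up(rows):
    """Aggregate district-level rows to region totals (drill up)."""
    totals = defaultdict(int)
    for region, _district, amount in rows:
        totals[region] += amount
    return dict(totals)


def drill_down(rows, region):
    """Expand one region into its district-level detail (drill down)."""
    return {district: amount for r, district, amount in rows if r == region}


by_region = roll_up(sales)
midwest_detail = drill_down(sales, "Midwest")
print(by_region)
print(midwest_detail)
```

The district figures for a region always add up to the region total, which is a useful consistency check when validating drill-down reports.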
OLAP Reporting - Drill Down
Implementation Techniques - OLAP Architectures
MOLAP - MDDB storage
[Figure: MOLAP architecture. An OLAP cube feeds the OLAP calculation engine, which serves web browsers, OLAP tools, and OLAP applications.]
MOLAP - Features
ROLAP - Standard SQL storage
[Figure: ROLAP architecture. A SQL engine and an OLAP calculation layer serve OLAP tools and OLAP applications.]
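In a ROLAP setup the data stays in ordinary SQL tables and each multidimensional question is translated into a SQL aggregate. A minimal sketch using Python's built-in `sqlite3` module; the `sales` table, its rows, and the helper `volume_by` are hypothetical and stand in for a real relational warehouse.

```python
import sqlite3

# Hypothetical relational store holding the sales facts.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (model TEXT, color TEXT, dealer TEXT, volume INTEGER)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        ("Mini Van", "Blue", "Clyde", 6),
        ("Mini Van", "Blue", "Carr", 2),
        ("Mini Van", "Red", "Clyde", 5),
        ("Coupe", "Blue", "Clyde", 3),
    ],
)


def volume_by(dimension):
    """Aggregate volume along one dimension with a plain SQL GROUP BY."""
    allowed = {"model", "color", "dealer"}
    if dimension not in allowed:
        # Column names cannot be bound as parameters, so whitelist them.
        raise ValueError(f"unknown dimension: {dimension}")
    query = (
        f"SELECT {dimension}, SUM(volume) FROM sales "
        f"GROUP BY {dimension} ORDER BY {dimension}"
    )
    return dict(conn.execute(query).fetchall())


by_model = volume_by("model")
by_color = volume_by("color")
print(by_model, by_color)
```

Every view is computed at query time, so no cube has to be rebuilt when the data changes, at the cost of running an aggregate query for each request.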
ROLAP - Features
HOLAP - Combination of RDBMS and MDDB
[Figure: HOLAP architecture. An OLAP cube and a relational DW sit behind the SQL engine and OLAP calculation layer, serving any client: web browsers, OLAP tools, and OLAP applications.]
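The slide describes HOLAP only as a combination of RDBMS and MDDB storage. One common arrangement, assumed here for illustration, keeps precomputed summaries in a cube and leaves detail rows in relational tables. The table, the data, and the `query` helper below are hypothetical.

```python
import sqlite3

# Relational side: detail rows stay in a SQL table (hypothetical data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (model TEXT, dealer TEXT, volume INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("Mini Van", "Clyde", 6), ("Mini Van", "Carr", 2), ("Coupe", "Clyde", 3)],
)

# Multidimensional side: per-model summaries are precomputed once.
summary_cube = dict(
    conn.execute("SELECT model, SUM(volume) FROM sales GROUP BY model").fetchall()
)


def query(model, dealer=None):
    """Answer from the cube when a summary suffices, otherwise from SQL."""
    if dealer is None:
        return summary_cube.get(model, 0)
    row = conn.execute(
        "SELECT SUM(volume) FROM sales WHERE model = ? AND dealer = ?",
        (model, dealer),
    ).fetchone()
    # SUM over zero rows yields NULL (None in Python); report it as 0.
    return row[0] or 0


print(query("Mini Van"))          # summary level, served by the cube
print(query("Mini Van", "Carr"))  # detail level, served by SQL
```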
HOLAP - Features
Architecture Comparison
Representative OLAP Tools:
Sample OLAP Applications
Sales Analysis
Financial Analysis
Profitability Analysis
Performance Analysis
Risk Management
Profiling & Segmentation
Scorecard Application
NPA Management
Strategic Planning
Customer Relationship Management (CRM)
Data Warehouse Testing
Data Warehouse Testing Overview
The cost associated with finding software defects increases
exponentially the later they surface in the development lifecycle. In data
warehousing, this is compounded by the additional business
cost of using incorrect data to make critical business decisions.
Difference in Testing Data Warehouse and
Transaction System
User-Triggered vs. System-Triggered
Volume of Test Data
The test data in a transaction system is a very small sample of the
overall production data. A data warehouse typically needs a large volume of
test data, because one tries to cover as many combinations of
dimensions and facts as possible.
Possible Scenarios / Test Cases
For a data warehouse, the number of permutations and combinations one
can possibly test is virtually unlimited, because the core objective of
a data warehouse is to allow all possible views of the data.
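The growth in test combinations can be made concrete by counting one member per dimension. The sketch below uses a small hypothetical sales mart with four dimensions; real warehouses have far more members per dimension, so the count grows much faster.

```python
from itertools import product
from math import prod

# Hypothetical dimension members for a small sales mart.
dimensions = {
    "model": ["Mini Van", "Coupe", "Sedan"],
    "color": ["Blue", "Red", "White"],
    "dealer": ["Clyde", "Gleason", "Carr"],
    "month": list(range(1, 13)),
}

# Every combination of one member per dimension is a distinct cell to cover.
combination_count = prod(len(members) for members in dimensions.values())

# Enumerating them is feasible here; the count multiplies with each dimension.
first_cases = list(product(*dimensions.values()))[:3]

print(combination_count)
print(first_cases)
```

Even this toy example yields several hundred distinct combinations, which is why warehouse test data tends to be selected by coverage rules rather than enumerated exhaustively.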
Data Warehouse Testing Process
Requirements testing
Unit Testing
Unit testing for data warehouses is white-box testing. It should check the ETL
procedures/mappings/jobs and the reports developed.
Unit testing the ETL procedures:
• Whether the ETLs access and pick up the right data from the right source.
• Whether all data transformations are correct according to the business rules and the data
warehouse is correctly populated with the transformed data.
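A white-box unit test of a single transformation might look like the sketch below. The function `transform_sale` and its business rule (amount equals quantity times unit price) are hypothetical; a real ETL mapping would be tested against the project's own rules.

```python
import unittest


def transform_sale(row):
    """Hypothetical rule: trim the order id, cast ids, amount = qty * unit price."""
    return {
        "order_id": row["order_id"].strip(),
        "cust_id": int(row["cust_id"]),
        "amount": round(int(row["qty"]) * float(row["unit_price"]), 2),
    }


class TransformSaleTest(unittest.TestCase):
    def test_amount_follows_business_rule(self):
        source = {"order_id": " o100 ", "cust_id": "53", "qty": "2", "unit_price": "5.50"}
        self.assertEqual(
            transform_sale(source),
            {"order_id": "o100", "cust_id": 53, "amount": 11.0},
        )

    def test_bad_quantity_is_rejected(self):
        source = {"order_id": "o101", "cust_id": "53", "qty": "two", "unit_price": "5.50"}
        with self.assertRaises(ValueError):
            transform_sale(source)


# Run the suite programmatically so the outcome can be inspected.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TransformSaleTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

The first test checks that the right data is produced from a known source row; the second checks that a row violating the rule fails loudly instead of loading bad data.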
Integration Testing
Integration testing involves the following:
Sequence of ETL jobs in a batch.
Initial loading of records into the data warehouse.
Incremental loading of records at a later date to verify the newly
inserted or updated data.
Testing the rejected records that don’t fulfil the transformation rules.
Error log generation.
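The checks listed above can be sketched as a small end-to-end run: an initial load, a later incremental load, and verification of rejected rows and the error log. The loader `load_batch`, its rule (quantity must be a positive integer), and the sample rows are hypothetical.

```python
def load_batch(warehouse, batch, error_log):
    """Upsert valid rows by key; route rows that break the rule to the error log.

    Hypothetical transformation rule: qty must be a positive integer.
    """
    for row in batch:
        qty = row.get("qty")
        if not isinstance(qty, int) or qty <= 0:
            error_log.append(f"rejected {row.get('order_id')}: invalid qty {qty!r}")
            continue
        # Insert a new key or update an existing one.
        warehouse[row["order_id"]] = row


warehouse, error_log = {}, []

# Initial load.
load_batch(
    warehouse,
    [{"order_id": "o100", "qty": 1}, {"order_id": "o102", "qty": 2}],
    error_log,
)

# Incremental load at a later date: one update, one insert, one reject.
load_batch(
    warehouse,
    [
        {"order_id": "o102", "qty": 3},
        {"order_id": "o105", "qty": 5},
        {"order_id": "o106", "qty": -1},
    ],
    error_log,
)

print(sorted(warehouse))
print(error_log)
```

An integration test would then assert on the final warehouse contents and on the error log together, since both are outputs of the batch.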
Performance Testing
Acceptance testing
Questions
Thank You
Components of Warehouse
Source Tables: These are real-time, volatile data in relational databases for
transaction processing (OLTP). These can be any relational databases or flat files.
ETL Tools: To extract, cleansing, transform (aggregates, joins) and load the data from
sources to target.
Maintenance and Administration Tools: To authorize and monitor access to the data,
set-up users. Scheduling jobs to run on offshore periods.
Modeling Tools: Used for data warehouse design for high-performance using
dimensional data modeling technique, mapping the source and target files.
Databases: Target databases and data marts, which are part of data warehouse.
These are structured for analysis and reporting purposes.
End-user tools for analysis and reporting: get the reports and analyze the data from
target tables. Different types of Querying, Data Mining, OLAP tools are used for this
purpose.
239 ©
© 2009
2009 Wipro
Wipro Ltd
Ltd -- Confidential
Confidential
Data Warehouse Architecture
240 ©
© 2009
2009 Wipro
Wipro Ltd
Ltd -- Confidential
Confidential
Data Modeling
Effective way of using a Data Warehouse
242 ©
© 2009
2009 Wipro
Wipro Ltd
Ltd -- Confidential
Confidential
Terms used in Dimensional Data Model
Fact Table
sale oderId date custId prodId storeId qty amt
o100 1/7/97 53 p1 c1 1 12
o102 2/7/97 53 p2 c1 2 11
105 3/8/97 111 p1 c3 5 50
Dimension Table
customer custId name address city
53 joe 10 main sfo
81 fred 12 main sfo
111 sally 80 willow la
244 ©
© 2009
2009 Wipro
Wipro Ltd
Ltd -- Confidential
Confidential
Snowflake Schema
Dimension Table
sType tId size location
Fact Table t1 small downtown
store storeId cityId tId mgr t2 large suburbs
s5 sfo t1 joe Dimension Table
s7 sfo t2 fred city cityId pop regId
s9 la t1 nancy sfo 1M north
la 5M south
The star and snowflake schema are most commonly region regId name
found in dimensional data warehouses and data north cold region
marts where speed of data retrieval is more south warm region
important than the efficiency of data manipulations.
As such, the tables in these schema are not
normalized much, and are frequently designed at a
level of normalization short of third normal form.
245 ©
© 2009
2009 Wipro
Wipro Ltd
Ltd -- Confidential
Confidential
Overview of Data Cleansing
247 ©
© 2009
2009 Wipro
Wipro Ltd
Ltd -- Confidential
Confidential
Six Steps To Data Quality
Understand
Understand Information
Information Flow
Flow Identify authoritative data sources
In Organization
In Organization
Interview Employees & Customers
Clean & Load Use data cleansing tools to clean data at the source
Data Load only clean data into the data warehouse
248 ©
© 2009
2009 Wipro
Wipro Ltd
Ltd -- Confidential
Confidential
Data Quality Solution
Customized Programs
Strengths:
– Addresses specific needs
– No bulky one time investment
Limitations
– Tons of Custom programs in different environments are difficult to
manage
– Minor alterations demand coding efforts
Data Quality Assessment tools
Strength
– Provide automated assessment
Limitation
– No measure of data accuracy
249 ©
© 2009
2009 Wipro
Wipro Ltd
Ltd -- Confidential
Confidential
Data Quality Solution
250 ©
© 2009
2009 Wipro
Wipro Ltd
Ltd -- Confidential
Confidential
Tools In The Market
Business Rule Discovery Tools
– Integrity Data Reengineering Tool from Vality Technology
– Trillium Software System from Harte -Hanks Data Technologies
– Migration Architect from DB Star
Data Reengineering & Cleansing Tools
– Carlton Pureview from Oracle
– ETI-Extract from Evolutionary Technologies
– PowerMart from Informatica Corp
– Sagent Data Mart from Sagent Technology
Data Quality Assessment Tools
– Migration Architect, Evoke Axio from Evoke Software
– Wizrule from Wizsoft
Name & Address Cleansing Tools
– Centrus Suite from Sagent
– I.d.centric from First Logic
251 ©
© 2009
2009 Wipro
Wipro Ltd
Ltd -- Confidential
Confidential
Data Extraction, Transformation, Load
Visitors
Web
Browsers
External Data –
Demographics,
Household,
The Webographics,
Internet Income
Staging Area
Meta Data
Repository
Web Server Logs Flat Files
& •Clean
E-comm •Transform Enterprise
Transaction Data Scheduled •Match Scheduled Data
RDBMS •Merge
Extraction Loading Warehouse
Other OLTP
Systems
Data Collection Data Extraction Data Transformation Data Loading Data Storage &
Integration
253 ©
© 2009
2009 Wipro
Wipro Ltd
Ltd -- Confidential
Confidential
ETL Architecture
254 ©
© 2009
2009 Wipro
Wipro Ltd
Ltd -- Confidential
Confidential
Why ETL ?
Companies have valuable data lying around throughout their networks that
needs to be moved from one place to another.
The data lies in all sorts of heterogeneous systems,and therefore in all sorts
of formats.
To solve the problem, companies use extract, transform and load (ETL)
software.
The data used in ETL processes can come from any source:
a mainframe application, an ERP application, a CRM tool, a flat file, and
an Excel spreadsheet.
255 ©
© 2009
2009 Wipro
Wipro Ltd
Ltd -- Confidential
Confidential
Major components involved in ETL Processing
256 ©
© 2009
2009 Wipro
Wipro Ltd
Ltd -- Confidential
Confidential
Major components involved in ETL Processing
Design manager
Lets developers define source-to-target mappings, transformations, process flows, and jobs
Meta data management
Provides a repository to define, document, and manage information about the ETL design and runtime
processes
Extract
The process of reading data from a database.
Transform
The process of converting the extracted data
Load
The process of writing the data into the target database.
Transport services
ETL tools use network and file protocols to move data between
source and target systems and in-memory protocols to move data
between ETL run-time components.
Administration and operation
ETL utilities let administrators schedule, run, monitor ETL jobs, log
all events, manage errors, recover from failures, reconcile outputs
with source systems
257 ©
© 2009
2009 Wipro
Wipro Ltd
Ltd -- Confidential
Confidential
ETL Tools
258 ©
© 2009
2009 Wipro
Wipro Ltd
Ltd -- Confidential
Confidential
Metadata Management
Metadata is Information...
That describes the WHAT, WHEN, WHO, WHERE, HOW of the data warehouse
About the data being captured and loaded into the Warehouse
Documented in IT tools that improves both business and technical understanding of data
and data-related processes
260 ©
© 2009
2009 Wipro
Wipro Ltd
Ltd -- Confidential
Confidential
Importance Of Metadata
Locating Information
Time spent in looking for information.
How often information is found?
What poor decisions were made based on the incomplete information?
How much money was lost or earned as a result?
Interpreting information
How many times have businesses needed to rework or recall products?
What impact does it have on the bottom line ?
How many mistakes were due to misinterpretation of existing documentation?
How much interpretation results form too much metadata?
How much time is spent trying to determine if any of the metadata is accurate?
Integrating information
How various data perspectives connect together?
How much time is spent trying to figure out that?
How much does the inefficiency and lack of metadata affect decision making
261 ©
© 2009
2009 Wipro
Wipro Ltd
Ltd -- Confidential
Confidential
Requirements for DW Metadata Management
Enable DW users to identify and invoke pre-built queries against the data stores
Design and enhance new data models and schemas for the data warehouse
262 ©
© 2009
2009 Wipro
Wipro Ltd
Ltd -- Confidential
Confidential
Consumers of Metadata
Technical Users
• Warehouse administrator
• Application developer
Business Users -Business metadata
• Meanings
• Definitions
• Business Rules
Software Tools
• Used in DW life-cycle development
• Metadata requirements for each tool must be identified
• The tool-specific metadata should be analysed for inclusion in the enterprise
metadata repository
• Previously captured metadata should be electronically transferred from the
enterprise metadata repository to each individual tool
263 ©
© 2009
2009 Wipro
Wipro Ltd
Ltd -- Confidential
Confidential
Trends in the Metadata Management Tools
264 ©
© 2009
2009 Wipro
Wipro Ltd
Ltd -- Confidential
Confidential
Trends in the Metadata Management Tools
Metadata Repositories
IBM, Oracle and Microsoft to offer free or near-free basic
repository services
Enable organisations to reuse metadata across technologies
Integrate DB design, data transformation and BI tools from
different vendors
Multi-tool vendors taking a bridged or federated rather than
integrated approach to sharing metadata
Both IBM and Oracle have multiple repositories for different lines
of products — e.g., One for AD and one for DW, with bridges
between them
265 ©
© 2009
2009 Wipro
Wipro Ltd
Ltd -- Confidential
Confidential
Trends in the Metadata Management Tools
266 ©
© 2009
2009 Wipro
Wipro Ltd
Ltd -- Confidential
Confidential
OLAP
267 ©
© 2009
2009 Wipro
Wipro Ltd
Ltd -- Confidential
Confidential
Agenda
OLAP Definition
Distinction between OLTP and OLAP
MDDB Concepts
Implementation Techniques
Architectures
Features
Representative Tools
12/08/21 268
268 ©
© 2009
2009 Wipro
Wipro Ltd
Ltd -- Confidential
Confidential
OLAP: On-Line Analytical Processing
12/08/21 269
269 ©
© 2009
2009 Wipro
Wipro Ltd
Ltd -- Confidential
Confidential
Distinction between OLTP and OLAP
users
12/08/21
270 ©
© 2009
2009 Wipro
Wipro Ltd
Ltd -- Confidential
Confidential
MDDB Concepts
271 ©
© 2009
2009 Wipro
Wipro Ltd
Ltd -- Confidential
Confidential
RDBMS v/s MDDB: Increased Complexity...
Relational DBMS:

MODEL         COLOR  DEALER   VOL.
MINI VAN      BLUE   Clyde    6
MINI VAN      BLUE   Gleason  3
MINI VAN      BLUE   Carr     2
MINI VAN      RED    Clyde    5
MINI VAN      RED    Gleason  3
MINI VAN      RED    Carr     1
MINI VAN      WHITE  Clyde    3
MINI VAN      WHITE  Gleason  1
MINI VAN      WHITE  Carr     4
SPORTS COUPE  BLUE   Clyde    3

MDDB:

[Figure: Sales Volumes array with a MODEL axis (Mini Van, Coupe).]
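The contrast between the two storage styles can be sketched in Python. The rows are the ones listed on the slide; the dictionary keyed by (model, color, dealer) and the name `build_cube` are assumptions made for illustration only, not how any particular MDDB product stores its arrays.

```python
# Relational rows as listed on the slide: (model, color, dealer, volume).
ROWS = [
    ("MINI VAN", "BLUE", "Clyde", 6),
    ("MINI VAN", "BLUE", "Gleason", 3),
    ("MINI VAN", "BLUE", "Carr", 2),
    ("MINI VAN", "RED", "Clyde", 5),
    ("MINI VAN", "RED", "Gleason", 3),
    ("MINI VAN", "RED", "Carr", 1),
    ("MINI VAN", "WHITE", "Clyde", 3),
    ("MINI VAN", "WHITE", "Gleason", 1),
    ("MINI VAN", "WHITE", "Carr", 4),
    ("SPORTS COUPE", "BLUE", "Clyde", 3),
]


def build_cube(rows):
    """Index each sales volume by its (model, color, dealer) coordinates."""
    return {(model, color, dealer): vol for model, color, dealer, vol in rows}


cube = build_cube(ROWS)

# Relational style: scan every row and filter to answer a question.
scan_total = sum(vol for model, color, _dealer, vol in ROWS
                 if model == "MINI VAN" and color == "BLUE")

# Multidimensional style: address a cell directly by its coordinates.
cell = cube[("MINI VAN", "RED", "Clyde")]

print(scan_total, cell)
```

The point of the sketch is only the access pattern: the table is filtered row by row, while the array is addressed by position along each dimension.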
Benefits of MDDB over RDBMS
Ease of Data Presentation & Navigation
– A great deal of information can be gleaned immediately by direct inspection of
the array.
– The user views data along presorted dimensions, arranged in an inherently more
organized and accessible fashion than a relational table offers.
Storage Space
– Very low space consumption compared to a relational DB.
Performance
– Gives much better query performance than a relational DB.
– A relational DB may give comparable results only through database tuning
(indexing, keys, etc.), which may not be possible for ad-hoc queries.
Ease of Maintenance
– No overhead, as data is stored in the same way it is viewed. A relational DB
relies on indexes, sophisticated joins, etc., which require considerable storage
and maintenance.
Issues with MDDB
• Sparsity
  - Input data in applications is typically sparse
  - Increases with the number of dimensions
• Data Explosion
  - Due to sparsity
  - Due to summarization
• Performance
  - Does not perform better than an RDBMS at high data volumes (>20-30 GB)
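A small arithmetic sketch of why sparsity grows with the number of dimensions. The dimension sizes and the count of 50 populated cells are hypothetical figures chosen for illustration; `cube_density` is not part of any OLAP product.

```python
from math import prod


def cube_density(dimension_sizes, populated_cells):
    """Return (total cells, fraction of cells that hold data)."""
    total = prod(dimension_sizes)
    return total, populated_cells / total


# Hypothetical: the same 50 facts spread over cubes with 2, 3 and 4 dimensions,
# each dimension having 10 members.
results = [cube_density(sizes, 50)
           for sizes in [(10, 10), (10, 10, 10), (10, 10, 10, 10)]]

for total, density in results:
    print(f"{total:>6} cells, {density:.1%} populated")
```

Each added dimension multiplies the number of cells the array must address, while the number of real facts stays the same, so the populated fraction falls sharply.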
Issues with MDDB - Sparsity Example
[Figure: a sparse two-dimensional array of LAST NAME (Weld, Kelly, Link, Kranz, Lucas, Weiss) against EMPLOYEE #, in which most cells are empty, shown beside the dense MODEL by COLOR Sales Volumes array (Mini Van, Coupe, Sedan by Blue, Red, White).]
OLAP Features
Features of OLAP - Rotation
View #1 (MODEL by COLOR):

          Blue  Red  White
Mini Van   6     5    4
Coupe      3     5    5
Sedan      4     3    2

( ROTATE 90° )

View #2 (COLOR by MODEL):

          Mini Van  Coupe  Sedan
Blue         6        3      4
Red          5        5      3
White        4        5      2

[Figure: Sales Volumes cube with MODEL (Mini Van, Coupe, Sedan), COLOR (Blue, Red, White) and DEALERSHIP (Clyde, Gleason, Carr) axes, shown in successive views, each rotated 90° from the previous one.]
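The two-dimensional rotation from View #1 to View #2 amounts to swapping the row and column dimensions. A minimal Python sketch, using the View #1 values from the slide (MODEL rows by COLOR columns); the function name `rotate` is an illustrative assumption:

```python
MODELS = ["Mini Van", "Coupe", "Sedan"]
COLORS = ["Blue", "Red", "White"]

# View #1 from the slide: one row per model, one column per color.
VIEW_1 = [
    [6, 5, 4],  # Mini Van
    [3, 5, 5],  # Coupe
    [4, 3, 2],  # Sedan
]


def rotate(view):
    """Swap the row and column dimensions (a transpose)."""
    return [list(column) for column in zip(*view)]


# View #2: one row per color, one column per model.
view_2 = rotate(VIEW_1)

for color, row in zip(COLORS, view_2):
    print(color, row)
```

No data is recomputed; only the orientation of the axes changes, which is why rotation is cheap in an array-based store.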
An MDDB allows the end user to quickly slice in on the exact view of the data required.
[Figure: Sales Volumes cube sliced to MODEL (Mini Van, Coupe), DEALERSHIP (Carr, Clyde) and COLOR (Normal Blue, Metal Blue).]
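Slicing can be sketched as fixing one or more dimensions and keeping only the matching cells. The four cells below reuse volumes from the relational table shown earlier in the deck; the dictionary layout and the helper `slice_cube` are assumptions for illustration.

```python
# Cells keyed by (model, color, dealer); volumes taken from the earlier table.
CUBE = {
    ("Mini Van", "Blue", "Clyde"): 6,
    ("Mini Van", "Blue", "Carr"): 2,
    ("Mini Van", "Red", "Clyde"): 5,
    ("Coupe", "Blue", "Clyde"): 3,
}


def slice_cube(cube, model=None, color=None, dealer=None):
    """Keep the cells that match every dimension value that was given."""
    wanted = (model, color, dealer)
    return {
        key: vol
        for key, vol in cube.items()
        if all(w is None or w == k for w, k in zip(wanted, key))
    }


# Slice: blue vehicles sold by Clyde, across all models.
blue_clyde = slice_cube(CUBE, color="Blue", dealer="Clyde")
print(blue_clyde, sum(blue_clyde.values()))
```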
Features of OLAP - Drill Down / Up
[Figure: ORGANIZATION DIMENSION hierarchy, with REGION level (Midwest).]
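Drilling up and down can be sketched as aggregating the same facts at different levels of a dimension hierarchy. Only the REGION level and the value "Midwest" come from the slide; the district level, the other names and all sales figures below are hypothetical.

```python
from collections import defaultdict

# Hypothetical facts: (region, district, sales).
FACTS = [
    ("Midwest", "Chicago", 120),
    ("Midwest", "St. Louis", 80),
    ("East", "Boston", 95),
]


def roll_up(facts, level):
    """Aggregate sales to a hierarchy level: 0 = region, 1 = district."""
    totals = defaultdict(int)
    for row in facts:
        totals[row[:level + 1]] += row[-1]
    return dict(totals)


by_region = roll_up(FACTS, 0)    # drill up: one total per region
by_district = roll_up(FACTS, 1)  # drill down: one total per district

# Drill down into the Midwest region only.
midwest_detail = {k: v for k, v in by_district.items() if k[0] == "Midwest"}
print(by_region, midwest_detail)
```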
OLAP Reporting - Drill Down
Implementation Techniques - OLAP Architectures
MOLAP - MDDB storage
[Figure: MOLAP architecture. Components shown: OLAP Cube, OLAP Calculation Engine, and clients (Web Browser, OLAP Tools, OLAP Applications).]
MOLAP - Features
ROLAP - Standard SQL storage
[Figure: ROLAP architecture. Components shown: SQL access to relational storage, OLAP Calculation Engine, and clients (OLAP Tools, OLAP Applications).]
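Because ROLAP keeps the data in standard SQL storage, an analytical view is produced by a query at request time rather than read from a prebuilt cube. A minimal sketch using Python's built-in `sqlite3` module; the table name `sales` and its column names are assumptions, and the rows are the Mini Van volumes from the relational table shown earlier in the deck.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (model TEXT, color TEXT, dealer TEXT, vol INTEGER)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        ("MINI VAN", "BLUE", "Clyde", 6),
        ("MINI VAN", "BLUE", "Gleason", 3),
        ("MINI VAN", "BLUE", "Carr", 2),
        ("MINI VAN", "RED", "Clyde", 5),
        ("MINI VAN", "RED", "Gleason", 3),
        ("MINI VAN", "RED", "Carr", 1),
        ("MINI VAN", "WHITE", "Clyde", 3),
        ("MINI VAN", "WHITE", "Gleason", 1),
        ("MINI VAN", "WHITE", "Carr", 4),
    ],
)

# The aggregate is computed by SQL when the user asks for it.
rows = conn.execute(
    "SELECT color, SUM(vol) FROM sales "
    "WHERE model = ? GROUP BY color ORDER BY color",
    ("MINI VAN",),
).fetchall()
conn.close()

print(rows)
```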
ROLAP - Features
HOLAP - Combination of RDBMS and MDDB
[Figure: HOLAP architecture. Components shown: OLAP Cube, Relational DW accessed through SQL, OLAP Calculation Engine, and clients (Any Client, Web Browser, OLAP Tools, OLAP Applications).]
HOLAP - Features
Architecture Comparison
Representative OLAP Tools:
Sample OLAP Applications
Sales Analysis
Financial Analysis
Profitability Analysis
Performance Analysis
Risk Management
Profiling & Segmentation
Scorecard Application
NPA Management
Strategic Planning
Customer Relationship Management (CRM)
Data Warehouse Testing
Data Warehouse Testing Overview
The cost of a software defect rises exponentially the later it is found
in the development lifecycle. In data warehousing, this is compounded
by the additional business cost of using incorrect data to make
critical business decisions.
Difference In Testing Data warehouse and
Transaction System
User-Triggered vs. System-Triggered
Volume of Test Data
The test data in a transaction system is a very small sample of the
overall production data. A data warehouse typically needs a large volume
of test data, because one tries to cover as many combinations of
dimensions and facts as possible.
Possible Scenarios / Test Cases
In a data warehouse, the permutations and combinations one can test are
virtually unlimited, because the core objective of a data warehouse is to
allow all possible views of the data.
Data Warehouse Testing Process
Requirements testing
Unit Testing
Unit testing for data warehouses is white-box testing. It should check the ETL
procedures/mappings/jobs and the reports developed.
Unit testing the ETL procedures:
• Whether the ETLs access and pick up the right data from the right source.
• Whether all data transformations are correct according to the business rules and
the data warehouse is correctly populated with the transformed data.
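A unit check of one ETL transformation might look like the following sketch. The business rule (trim and upper-case the customer name, derive amount as qty times unit price), the field names and the sample rows are all hypothetical; a real mapping would be tested against its own specification.

```python
def transform(source_row):
    """Hypothetical rule: clean the name and derive amount = qty * unit_price."""
    return {
        "cust_name": source_row["cust_name"].strip().upper(),
        "qty": source_row["qty"],
        "amount": source_row["qty"] * source_row["unit_price"],
    }


def check_transformation(source_rows, target_rows):
    """Return the source rows whose expected output is missing from the target."""
    return [row for row in source_rows if transform(row) not in target_rows]


# Hypothetical source extract and the rows found in the warehouse after the load.
source = [
    {"cust_name": " alice ", "qty": 2, "unit_price": 12},
    {"cust_name": "bob", "qty": 1, "unit_price": 30},
]
target = [
    {"cust_name": "ALICE", "qty": 2, "amount": 24},
    {"cust_name": "BOB", "qty": 1, "amount": 35},  # wrong amount loaded
]

mismatches = check_transformation(source, target)
print(mismatches)
```

The check recomputes the expected target row from the source, independently of the ETL tool, and reports any source row whose expected result is not in the warehouse.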
Integration Testing
Integration testing involves the following:
Sequence of ETL jobs in a batch.
Initial loading of records into the data warehouse.
Incremental loading of records at a later date, to verify the newly
inserted or updated data.
Testing the rejected records that don’t fulfil the transformation rules.
Error log generation.
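The initial load, incremental load, rejected records and error log can be exercised together in a small sketch. The `load` function, the record layout and the rejection rule (a missing or negative quantity) are hypothetical stand-ins for a real ETL batch.

```python
def load(batch, warehouse, error_log):
    """Upsert valid records by id; log and skip records that break the rule."""
    for record in batch:
        if record.get("qty") is None or record["qty"] < 0:
            error_log.append(f"rejected {record['id']}: invalid qty")
            continue
        warehouse[record["id"]] = record


warehouse, error_log = {}, []

# Initial load: one valid record and one that must be rejected.
initial = [{"id": 1, "qty": 5}, {"id": 2, "qty": -1}]
load(initial, warehouse, error_log)

# Incremental load at a later date: one update and one new record.
incremental = [{"id": 1, "qty": 7}, {"id": 3, "qty": 2}]
load(incremental, warehouse, error_log)

print(sorted(warehouse), warehouse[1]["qty"], error_log)
```

An integration test would then assert that updated keys carry the new values, that rejected keys never reach the warehouse, and that every rejection has a matching error log entry.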
Performance Testing
Acceptance testing
Questions
Thank You