Designing your BI Architecture

Exploiting your Data Warehouse
David Cope EDW Architect Asia Pacific

The Analytical Evolution

Easy Mining and Alphablox enable insights to be delivered throughout the enterprise.
IBM Differentiator

Action

Business Value

Discovering previously unknown and unsuspected information.

Ad Hoc Analysis: Empowering analysts to test hypotheses for better decision making.

Query and OLAP: Static, repetitive queries about past results.

Decision Empowerment

IBM DB2 Warehouse Software

Modeling and design
Embedded analytics: data mining and visualization, in-line analytics
Performance optimization: data partitioning, workload control, deep compression
Data movement and transformation
Database management
Administration and control
IBM DB2 Warehouse

Cube: Cube dimension, Cube hierarchy, Cube level, Cube facts

Cube Model: Dimension, Hierarchy, Level, Attribute, Join, Facts, Measure

Star schema: fact table joined to dimension tables

Relational tables in DB2
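
The cube model above maps dimensions, hierarchies, and measures onto a fact table joined to dimension tables. As a rough illustration only, the sketch below builds a tiny star schema and rolls a measure up to different hierarchy levels. It uses Python's built-in SQLite as a stand-in for DB2; the table names, columns, and data are all hypothetical and are not taken from the product.

```python
import sqlite3

# In-memory SQLite database used purely as a stand-in for DB2 (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE time_dim   (time_id INTEGER PRIMARY KEY, year INTEGER, quarter TEXT);
CREATE TABLE store_dim  (store_id INTEGER PRIMARY KEY, region TEXT, city TEXT);
CREATE TABLE sales_fact (time_id INTEGER, store_id INTEGER, amount REAL);
INSERT INTO time_dim   VALUES (1, 2024, 'Q1'), (2, 2024, 'Q2');
INSERT INTO store_dim  VALUES (10, 'East', 'Sydney'), (20, 'South', 'Melbourne');
INSERT INTO sales_fact VALUES (1, 10, 100.0), (1, 20, 50.0), (2, 10, 70.0), (2, 20, 30.0);
""")

# Hypothetical dimension levels that a query may group by.
ALLOWED_LEVELS = {"year", "quarter", "region", "city"}


def rollup(levels):
    """Sum the 'amount' measure at the requested dimension levels."""
    if not levels or not set(levels) <= ALLOWED_LEVELS:
        raise ValueError("levels must be a non-empty subset of ALLOWED_LEVELS")
    cols = ", ".join(levels)
    sql = f"""
        SELECT {cols}, SUM(f.amount)
        FROM sales_fact f
        JOIN time_dim  t ON f.time_id  = t.time_id
        JOIN store_dim s ON f.store_id = s.store_id
        GROUP BY {cols}
        ORDER BY {cols}
    """
    return conn.execute(sql).fetchall()


print(rollup(["region"]))
print(rollup(["quarter"]))
```

Each call is one join of the fact table to its dimensions followed by a GROUP BY at the chosen level. A summary table could answer this kind of query without touching the fact table.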


Model-Based Optimization
Administrator Model
OLAP Metadata

Catalog Tables

Base Tables

Advisor inputs: model information, time and space constraints, query types, statistics, data samples

Benefits: smart aggregate selection, smart index selection, SQL generation, DB2 exploitation


Performance Advisor

OLAP Metadata Interchange

metadata bridge
OLAP Metadata
RDBMS Metadata

DB2 Alphablox

DB2 Data Warehouse


QMF for Windows

Model & ETL tool metadata

BI tool metadata

QlikTech ArcPlan

Platform for Customized Analytic Applications and Inline Analytics
Pre-built components (Blox) for analytic functionality
Allows you to create customized analytic components that are embedded into existing business processes and web applications

For end-users: A web application, portal, or dashboard with embedded analytics in an easy-to-use interactive interface
For application developers: A J2EE application for analysis-oriented interaction, and a set of analytic-focused extensions to the application server
Alphablox with DWE: SQL generated by DWE Design Studio can be pasted into Alphablox pages for warehouse-based embedded analytics

Alphablox Architecture
Web Browser: DHTML-based client, similar to AJAX (XMLHttpRequest)
Alphablox on WebLogic, WebSphere, or Tomcat
UI Model, Calculations, Bookmarks, Alerts, Comments
GridBlox, ChartBlox, DataBlox, PresentBlox

OLAP Essbase / MSAS / SAP BW

Alphablox Cubing Engine ROLAP

Relational Databases



Relational Cubing Engine & OLAP Optimization

Application Server Tier
Relational Cubing Engine
Relational Cube cubelets Cube Definition Metadata Import
DB2 Cube Views DB2 MQTs Star Schema
OLAP Metadata

Database Server Tier

Dimension Data Retrieval

DB2 Alphablox Server

DataBlox


Fact Data Retrieval

DB2 Alphablox Application

PresentBlox, GridBlox, ChartBlox

Customer Tier

HTTP Server


Versatile Architecture Support

BI Applications and Tools


DB2 Warehouse supports versatile analytics architectures. Analytics can be directed against:

External Marts
Internal Marts
Virtual Marts

DWE Easy Mining: Mining without a Statistician

Realize the benefits of mining by enabling analysts, rather than relying on statisticians, for your data mining needs

Reporting Tool

DB2 Data Warehouse Edition


Two Types of Data Mining: Discovery & Predictive

Discovery
Automatically find trends and patterns
Answer unasked questions
Relatively undirected analysis
Tool reports on findings
In a word: Easier
Useful for non-statisticians

Predictive
Specific question
Probability associated with outcomes
Directed analysis
Iterative process: Train, Test, Apply
Apply model in database at customer touch points


DWE Easy Mining Algorithms

Business Discovery Methods: finding useful patterns and relationships

Associations: Which item affinities (rules) are in my data? [Beer] => [Diapers] (single transaction)

Sequences: Which sequential patterns are in my data? [Love] => [Marriage] => [Baby Products] (sequential)

DWE Enterprise Data Warehouse

Extracted Information Data Selected Warehouse Data Assimilated Information

Clustering: Which interesting groups are in my data? Customer profiles, store profiles

Predictive Methods: predicting values

Classification: How to predict categorical values in my data? Will the patient be cured, harmed, or unaffected by treatment?

Regression: How to predict numerical values in my data?



Transform Mine

How likely is a customer to respond to the promotion? How much will each customer spend this year?
Score data directly in DB2, scalable and real time

Statistician & Data Mining Workbench

How to Recognize a Data Mining Need

What do my customers look like?
Which customers should I target in a promotion?
Which products should I use for the promotion?
How should I lay out my new stores?
Which products should I replenish in anticipation of a promotion?
Which of my customers are most likely to churn?
How can I improve customer loyalty?
What is the most likely item that a customer will purchase next?
Who is most likely to have another heart attack?
What is the likelihood of a part failure?
When one part fails, what other part(s) are most likely to fail soon?
How can I identify high-potential prospects (lead generation)?
How can I detect potential fraud?

High Level view of the Data Mining Process

Business Problem

A minor miracle occurs

Data Warehouse
Extract & Transform data
Build Model
Validate, Refine



The Data Mining Process

This is an iterative process!
Business Problem
Data Warehouse

Discover & Interpret Information
Apply Results

Revise Data & Refine Model

Select Data



Select Transform

Mine Visualize

Report Score data Embed in application

Data Preparation


Data Mining

Discovery technique to find associations or affinities among items (or conditions, outcomes, etc.) in a single transaction.
Constructs statements (rules) that quantify the relationships among items that tend to occur together in transactions

In a supermarket, Cola is bought in 20% of all purchases. Cola is bought in 60% of the purchases involving Orange juice. 3.7% of all purchases involve both Cola and Orange juice.
The rule [Orange juice] => [Cola] has the following properties:
Support = 3.7%: Cola and OJ are present together in 3.7% of all baskets.
Confidence = 60%: Cola is present in 60% of the baskets containing OJ.
Lift = 60% / 20% = 3: Cola is 3 times as likely to be in the basket when OJ is also present.

Given the item(s) purchased (rule body), what item (rule head) is most likely to be purchased as well?

Common uses
Promotional or cross-sell offers, Disease management, Part failure
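
The support, confidence, and lift figures in the supermarket example can be computed directly from basket data. The sketch below is a minimal illustration of those three definitions, not the algorithm DWE uses; the function name `rule_metrics` and the ten toy baskets are hypothetical.

```python
def rule_metrics(baskets, body, head):
    """Return (support, confidence, lift) for the rule body => head.

    baskets: list of sets of item names, one set per transaction.
    """
    body, head = set(body), set(head)
    n_total = len(baskets)
    n_body = sum(1 for b in baskets if body <= b)           # baskets with the rule body
    n_head = sum(1 for b in baskets if head <= b)           # baskets with the rule head
    n_both = sum(1 for b in baskets if (body | head) <= b)  # baskets with both

    support = n_both / n_total               # share of all baskets containing both
    confidence = n_both / n_body             # share of body baskets that contain the head
    lift = confidence / (n_head / n_total)   # confidence relative to the head's base rate
    return support, confidence, lift


# Hypothetical toy data: 10 baskets.
baskets = [
    {"orange juice", "cola"},
    {"orange juice", "cola", "bread"},
    {"orange juice", "milk"},
    {"cola", "chips"},
    {"bread", "milk"},
    {"milk"},
    {"bread"},
    {"chips"},
    {"bread", "chips"},
    {"milk", "chips"},
]

support, confidence, lift = rule_metrics(baskets, {"orange juice"}, {"cola"})
print(f"support={support:.2f} confidence={confidence:.2f} lift={lift:.2f}")
```

A lift above 1 means the head item appears more often alongside the body than it does overall, which is what makes a rule interesting for cross-sell offers.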

Discovery technique to find affinities among items (or conditions, outcomes, etc.) across multiple transactions over time.
Quantifies relationships (sequences) to identify the most likely item in the next transaction

100% of the customers who get C will get X at a later time

67% of the customers who get B will get X at a later time



Given the item(s) purchased previously (rule body), what item (rule head) is most likely to be purchased in a subsequent transaction within a certain time frame?

Common uses
Fraud detection, Promotional offers, Disease management, Part failure
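
One way to read a sequence rule is as the share of customers who, having acquired the rule body, acquire the rule head in some later transaction. The sketch below computes that share. It is an illustration only: the function name and the three customer histories are hypothetical, chosen so that the results match the 100% and 67% figures quoted above, and the time-frame constraint is left out for brevity.

```python
def sequence_confidence(histories, body, head):
    """Share of customers with `body` who get `head` in a later transaction.

    histories: dict mapping customer id -> list of item sets in time order.
    """
    n_body = 0
    n_followed = 0
    for visits in histories.values():
        # Position of the first transaction that contains the body item.
        first = next((i for i, items in enumerate(visits) if body in items), None)
        if first is None:
            continue
        n_body += 1
        # Only transactions strictly after that one count as "later".
        if any(head in items for items in visits[first + 1:]):
            n_followed += 1
    return n_followed / n_body if n_body else 0.0


# Hypothetical customer histories (one list of transactions per customer).
histories = {
    "cust1": [{"G", "B"}, {"C"}, {"X"}],
    "cust2": [{"B"}, {"A"}, {"Y"}],
    "cust3": [{"C"}, {"B"}, {"X"}],
}

print(sequence_confidence(histories, "C", "X"))
print(round(sequence_confidence(histories, "B", "X"), 2))
```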

Discovery technique to find clusters having distinct behaviors and characteristics
Gain insights into customers, stores, insurance claims, etc.
Generate distinct behavioral/demographic profiles
Understand the most important attributes of each cluster

Create a model to assign individuals to best-fit clusters

Apply model to assign new individuals or re-assign existing individuals
Design business actions tailored to different characteristic profiles

Apply model to assign each record to its best-fit cluster
Apply appropriate business action for each record based on its assigned cluster

Common uses
Customer segmentation, store profiling, deviation detection
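
The build-then-apply pattern for clustering can be sketched with a small k-means routine. K-means is swapped in here only because it is compact; the source does not say which clustering algorithm DWE uses. The customer attributes (visits per year, annual spend) and all values are hypothetical, and the features are left unscaled to keep the sketch short.

```python
def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def best_fit(record, centroids):
    """Index of the centroid closest to the record (its best-fit cluster)."""
    return min(range(len(centroids)), key=lambda c: squared_distance(record, centroids[c]))


def kmeans(points, k, iterations=20):
    """Plain k-means, seeded deterministically with the first k points."""
    centroids = [list(p) for p in points[:k]]
    assignment = []
    for _ in range(iterations):
        # Assign every point to its nearest centroid.
        assignment = [best_fit(p, centroids) for p in points]
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assignment


# Hypothetical customers: (visits per year, annual spend).
customers = [(30, 900), (2, 40), (28, 1000), (3, 60), (32, 950), (1, 30)]

centroids, assignment = kmeans(customers, k=2)
print(assignment)

# Apply the model: assign a new customer to the best-fit cluster.
print(best_fit((29, 980), centroids))
```

The centroids act as the cluster profiles, and `best_fit` is the step that would be repeated whenever new or changed customer records need assigning.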

Prediction technique to classify individuals by outcome
Classify by a categorical class variable (e.g., YES-NO-MAYBE response)
Understand the most important factors (predictors) leading to each outcome

Create a model to classify individuals according to expected outcome
Design business action based on most important predictors

Apply model to predict the outcome for each individual
New prospects (expected behavior)
Existing individuals (changes in behavior)
Identify target individuals for business action

Common uses
Customer attrition (churn), Part failure
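
The train, test, and apply cycle for classification can be sketched with a one-rule classifier (a decision stump). This is a deliberately simple stand-in, not the product's classification algorithm. The predictors (months since last purchase, number of complaints), the labels, and all data are hypothetical.

```python
def train_stump(rows, labels):
    """Find the single (feature, threshold) rule with the fewest training errors.

    Returns (feature_index, threshold, label_if_above, label_otherwise).
    """
    classes = sorted(set(labels))
    best = None
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            for above in classes:
                for otherwise in classes:
                    errors = sum(
                        (above if r[f] > t else otherwise) != y
                        for r, y in zip(rows, labels)
                    )
                    if best is None or errors < best[0]:
                        best = (errors, f, t, above, otherwise)
    return best[1:]


def predict(stump, row):
    f, t, above, otherwise = stump
    return above if row[f] > t else otherwise


def accuracy(stump, rows, labels):
    return sum(predict(stump, r) == y for r, y in zip(rows, labels)) / len(rows)


# Train: hypothetical (months since last purchase, complaints) -> outcome.
train_rows = [(1, 0), (2, 1), (8, 0), (9, 2), (3, 0), (7, 1)]
train_labels = ["stay", "stay", "churn", "churn", "stay", "churn"]
stump = train_stump(train_rows, train_labels)

# Test: check the rule on held-out records.
test_rows = [(10, 0), (1, 1)]
test_labels = ["churn", "stay"]
print(accuracy(stump, test_rows, test_labels))

# Apply: score a new individual.
print(predict(stump, (6, 0)))
```

The feature the stump picks is a crude indication of the most important predictor, which is the part of the model that would inform the business action.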

Set of predictive techniques to predict a dependent variable
Predict continuous value or binary numeric value
Continuous: e.g., revenue (prediction represents amount of revenue)
Binary: e.g., 0=No, 1=Yes (prediction represents probability of Yes)
Understand the most important predictors of the dependent variable
Transform regression, linear regression, polynomial regression

Create a model to predict the dependent variable
Design business action (e.g., predict likelihood of default for a loan application, in real time)

Apply model to generate a prediction for each individual (e.g., probability of part failure)
Identify target individuals for business action

Common uses
Predict revenue/cost/profitability, Predict risk of loan default
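
For the continuous case, the simplest of the techniques named above is linear regression with one predictor. The sketch below fits a line by ordinary least squares and then applies it to a new individual. The predictor (store visits), the dependent variable (spend), and the data points are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    slope, intercept = model
    return slope * x + intercept


# Hypothetical training data: store visits vs. spend per customer.
visits = [1, 2, 3, 4, 5]
spend = [30, 50, 70, 90, 110]

model = fit_line(visits, spend)
print(model)

# Apply the model to a customer with 6 visits.
print(predict(model, 6))
```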

Data exploration
DWE enables you to explore the data.
Check data quality (prior to performing ETL for data preparation) and gain a general understanding of the data

Design Studio provides four tools to inspect data:

Table sampling
Univariate distributions
Bivariate distributions
Multivariate distributions

All these tools are accessible by right-clicking on a table/view/alias/nickname in the database explorer:
-> Data for table sampling/editing
-> Value Distributions for multivariate/univariate/bivariate distributions
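
To make the inspection tools concrete, the sketch below shows what a table sample, a univariate distribution, a bivariate distribution, and a basic data-quality check amount to on a handful of records. It does not use or imitate the Design Studio interface; the function names, column names, and records are all hypothetical.

```python
from collections import Counter

# Hypothetical customer records; None marks a missing value.
customers = [
    {"gender": "F", "segment": "VIP", "age": 34},
    {"gender": "F", "segment": "Regular", "age": None},
    {"gender": "M", "segment": "Regular", "age": 51},
    {"gender": "F", "segment": "VIP", "age": 29},
]


def sample(rows, n):
    """Table sampling: look at the first n rows."""
    return rows[:n]


def univariate(rows, column):
    """Count of each value in one column."""
    return Counter(r[column] for r in rows)


def bivariate(rows, col_a, col_b):
    """Count of each value pair across two columns."""
    return Counter((r[col_a], r[col_b]) for r in rows)


def missing(rows, column):
    """Data-quality check: number of rows with no value in the column."""
    return sum(1 for r in rows if r[column] is None)


print(sample(customers, 2))
print(univariate(customers, "gender"))
print(bivariate(customers, "gender", "segment"))
print(missing(customers, "age"))
```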

Leveraging Mining and Alphablox: DWE Miningblox

Create web applications that provide access to DWE Data Mining
Extends the DB2 Alphablox API with mining-specific functionality
With Miningblox, you can perform the following tasks:
Select input data
Process input data
Display mining results graphically in a Web browser, for example, the characteristics of a customer segment
Administer or manage mining runs
Typically, a web application using Miningblox tags is integrated into a business application or an intranet portal.


Why use Miningblox?

Provide access to Data Mining for a group of business analysts.
Create a Miningblox web application that provides access to mining functionality through the Web browser, with no need to install software on the client machines.
Analysts can execute mining runs and view results in a customized web application without extensive knowledge of mining software.
With the Miningblox Application wizard in the DWE Design Studio, you can create Web applications by selecting sample templates, or you can extend Alphablox applications with mining functionality.


Deployment through Alphablox application example

MBA application console
MBA execution
MBA completion
MBA results report


Case Study: Retail Department Store

Analytics with Data Mining and Alphablox

Retail Department Store Chain

Business requirements
Perform a data mining POC (really a pilot project) to support the original DWE decision, ensure success, and highlight DWE capabilities for further uptake
Define business problem
Boost storewide sales (across other departments) based on women's shoes

Define analytical approach and ETL procedure

Extract all transactions of customers who have purchased women's shoes
Transform transactional data into one record per customer, for customer segmentation
Perform market basket analysis (MBA) for high-potential customers who have purchased women's shoes
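
The extract and transform steps above can be sketched as follows: keep every transaction belonging to a customer who bought from the anchor department, then collapse those transactions into one record per customer for segmentation. This is an illustration only; the field names, the derived attributes (visits, spend, departments), and the sample transactions are hypothetical and do not reflect the actual ETL flow used in the engagement.

```python
def customer_records(transactions, anchor="women's shoes"):
    """Build one summary record per customer who bought from the anchor department."""
    # Extract: customers with at least one purchase in the anchor department.
    buyers = {t["customer"] for t in transactions if t["department"] == anchor}

    # Transform: accumulate all of their transactions, across every department.
    working = {}
    for t in transactions:
        if t["customer"] not in buyers:
            continue
        rec = working.setdefault(
            t["customer"], {"visits": set(), "spend": 0.0, "departments": set()}
        )
        rec["visits"].add(t["date"])
        rec["spend"] += t["amount"]
        rec["departments"].add(t["department"])

    return {
        customer: {
            "visits": len(rec["visits"]),
            "spend": rec["spend"],
            "departments": len(rec["departments"]),
        }
        for customer, rec in working.items()
    }


# Hypothetical transactions.
transactions = [
    {"customer": "A", "date": "2024-01-05", "department": "women's shoes", "amount": 80.0},
    {"customer": "A", "date": "2024-01-05", "department": "handbags", "amount": 120.0},
    {"customer": "A", "date": "2024-02-10", "department": "cosmetics", "amount": 40.0},
    {"customer": "B", "date": "2024-01-07", "department": "menswear", "amount": 60.0},
    {"customer": "C", "date": "2024-03-01", "department": "women's shoes", "amount": 95.0},
]

records = customer_records(transactions)
print(records)
```

The one-record-per-customer output is the shape a clustering step expects, while the untransformed transactions of the selected customers remain the input for market basket analysis.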

Engagement sponsored by IT with limited access to business users (LOB)


Solution Overview
Prepare data for mining by:
Pulling transactions for women's shoe customers
Creating data for customer segmentation

Analytical Dashboard
Alphablox Heat Maps / Other Visualization Data Mining Visualizer/ Alphablox

Use DB2 Mining to perform:

Clustering: identify high-potential customer segments
Market Basket Analysis for high-potential segments: identify associated items, identify next-most-likely purchases

Cubing Engine

Data Mining API

Deploy mining results in Alphablox

Integrate data mining information into the dashboard and as part of the guided analysis

Build a dashboard in Alphablox:

Provide critical information and metrics in an Alphablox dashboard to merchandising and marketing. Integrate powerful visualization to make it easier to identify problem areas

DB2 Data Warehouse

Mining Models & Services Clustering Associations & Sequences Scoring Services


Business Scenario for Mining

Business requirements for POC
Focus on customers who have purchased women's shoes in the past 12 months
Boost storewide sales (across other departments) based on women's shoes
Increase wallet share from high-potential customers

Business questions to be answered

What do my women's shoes customers look like?
Which of these customers should I target in a promotion?
Which products should I use for the promotion?
Which products should I replenish in anticipation of a promotion?
How can I improve customer loyalty?
What is the most likely item that a women's shoes customer will purchase next?


Step 1: Identify High-Potential Shoe Customers


Result: 16 Distinct Clusters Created


Cluster 1: Those who Act Like VIPs

Frequent Shoppers VIPs Active Shoppers Respond to Discounts

Big Spenders

High Returns

High Potential Customers!


Cluster 6: Frequent Good Shoppers

Shop Here 30 days/yr

Above-Avg Purchases

Above-Avg Spending

Respond to Discounts

Average Returns

High Potential Customers!


Step 2: Identify Associated Items for Clusters 1 & 6

Extracted transactions for those clusters of customers
Performed market basket analysis and interpreted results
Associations (items purchased together in one visit)


Identify Purchased Together for Clusters 1 & 6


Results: Associations for Clusters 1 & 6


Step 3: Identify Next Likely Purchase for Clusters 1 & 6

Extracted transactions for those clusters of customers
Performed market basket analysis and interpreted results
Sequences (next most likely purchase in a future visit)


Identify Next Likely Purchases for Clusters 1 & 6


Results: Sequences for Customers in Clusters 1 & 6


Results and Future Ideas

Deployment of customer segmentation and MBA
End-user application with Alphablox
Create & refresh mining models
Identify high-potential customer segments
Refresh assignment of each customer to best-fit cluster
Target selected customer segments for promotions
Batch scoring to identify best offer(s) for each customer/segment
Merchandising now has a view of their customers, not just products

Future ideas
Score a customer at checkout register in real time
MBA scoring (associations, sequences)
Focused MBA scoring for known customers, based on best-fit cluster
Make an offer to induce customers to visit other departments before leaving the store


