Solving Business Problems With Oracle Data Mining
Purpose
This tutorial shows you how to use Oracle Data Mining to solve business problems.
Time to Complete
Approximately 2 hours
Topics
This tutorial covers the following topics:
Overview
Scenario
Prerequisites
Summary
Overview
Data Mining is sometimes called Knowledge Discovery; its goal is to provide actionable
information that cannot be found by other means. This information can improve the functioning
of your business. For example, suppose a marketing campaign results in a 2% positive
response. If Data Mining can help focus the campaign on the people most likely to
respond, raising the response rate to 3%, then the number of responses rises by 50%, and
revenue rises by the same proportion if each response is worth about the same amount.
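The arithmetic behind this example can be checked with a short sketch. The 2% and 3% response rates come from the text; the campaign size of 10,000 contacts and the function name are assumptions made only for illustration.

```python
def relative_increase(baseline_rate: float, improved_rate: float) -> float:
    """Return the relative increase of improved_rate over baseline_rate."""
    return (improved_rate - baseline_rate) / baseline_rate


contacts = 10_000  # hypothetical campaign size, not from the tutorial

baseline_responders = contacts * 0.02  # untargeted campaign
targeted_responders = contacts * 0.03  # campaign focused by a model

increase = relative_increase(0.02, 0.03)
print(f"{baseline_responders:.0f} -> {targeted_responders:.0f} responders "
      f"({increase:.0%} more)")
```

The 50% figure describes responses. It carries over to revenue only if the average revenue per response is unchanged.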
Data Mining can be divided into two types of "Learning":
Supervised Learning: The goal is to predict a value for a particular characteristic, or
attribute, that describes some behavior. The attribute being predicted is called the
Target Attribute.
Unsupervised Learning: The goal is to discover relationships and patterns rather than to
determine a particular value. Here, there is no Target Attribute.
Oracle Data Mining (ODM) is powerful data mining software embedded in the Oracle
Database that enables you to discover new insights hidden in your data. Oracle Data
Mining helps businesses to target their best customers, find and prevent fraud, discover
the most influential attributes that affect Key Performance Indicators (KPIs), and find
valuable new information hidden in the data. Oracle Data Mining helps technical
professionals find patterns in their data, identify key attributes, discover new clusters
and associations, and uncover valuable insights.
Oracle Data Mining enables companies to:
KNOW MORE: Leverage your data and discover valuable new information and insights that
were previously hidden.
DO MORE: Build applications that automate the extraction and dissemination of new
information and insights.
SPEND LESS: Oracle Data Mining is significantly less expensive compared to traditional
approaches and, as a component of your investment in Oracle technology, significantly
reduces your total cost of ownership.
Oracle Data Mining enables you to go beyond standard query and reporting tools and
Online Analytical Processing (OLAP). Query and reporting and OLAP tools can tell you
who your top customers are, which products have sold the most, and where you are
incurring the highest costs. With Oracle Data Mining, you can implement strategies to:
Anticipate and prevent customer attrition
Traditional business intelligence (BI) tools such as reports, interactive query and
reporting, and Online Analytical Processing (OLAP), only report on what has happened
in the past. Oracle Data Mining (ODM) allows you to go beyond traditional BI and
reporting to mine your data and build advanced data mining applications. ODM enables
you to discover new insights, segments and associations, make more accurate
predictions, find the variables that most influence your business, and in general, extract
more information from your data. For example, by analyzing the profiles of your best
customers, ODM enables you to build data mining models and integrated applications to
identify customers who are likely to become your best customers in the future. These
customers may not represent your most valuable customers today, but may match
profiles of your current best customers. Moreover, with ODM you can do more and
transform a predictive model into a regular production application that distributes lists
of your most promising customers to your Sales force every Monday morning. Knowing
the “strategic value” of your customers (which are likely to become profitable
customers in the future and which are not), predicting which customers are likely to
churn or to respond to a marketing offer, and integrating this information into
your operations are the keys to proactively managing your business.
The phases of solving a business problem using Oracle Data Mining are as follows:
Problem Definition in terms of Data Mining and Business Goals
Deployment.
Note: This tutorial is not intended as a comprehensive course on Oracle Data Mining. It
illustrates the techniques required to carry out some common data mining operations. If
you want more background information about some of the topics, see the Oracle Data
Mining Concepts Guide.
Scenario
An electronics store chain wants to distribute a discount card to its customers, but only
to those customers who are expected to increase their buying (and thus the company’s
revenue) because of this card. A test campaign was run on a sample of customers and
the results were compiled into a table containing the customer demographics,
purchasing patterns, and a measure of revenue produced by each customer.
A dataset describing the customers and results in the test campaign is used to create a
model that can be applied to all customers to predict the revenue level expected from
each customer who uses a discount affinity card. The Target attribute to be predicted
is AFFINITY_CARD, with values 0 (low revenue) and 1 (high revenue). The distinct
target values (0 and 1 in this case) are sometimes called the target “classes”; thus the
prediction of a target value for each customer is called Classification.
Prerequisites
Before starting this tutorial, you should:
1. Perform the Installing Oracle Database 10g on Windows tutorial.
2. Perform the Installing Oracle Data Miner 10g Release 2 tutorial.
3. You may want to confirm that the sample tables and views exist in your user schema. Expand
Data Sources > dmuser1.
This displays both tables and views. Click Tables to list the tables; click Views to list the
views.
Preparing Data for Data Mining
The data used in the Data Mining process usually has to be collected from various
locations. Also, some transformation of the data is usually required to prepare the data
for Data Mining operations. Oracle Data Mining has powerful utilities for sampling,
recoding, discretizing and creating new transformed attributes.
In this section, you perform the following tasks:
Display Data and Statistics
3. Right-click MINING_DATA_BUILD_V to expose a menu with more options. You have access
to Transformation wizards and other tools for data exploration.
4. To see a statistical summary, you can click one of the two selections depending on the data
format type. Click Show Summary Single-Record.
Your output should match the image below:
For each numerical attribute, the maximum and minimum values, as well as the average and
variance, are shown. These statistics are calculated on a sample. The sample size in this
example is 999. You can change the sample size by adjusting ODM Preferences, available on the
Tools pull-down menu.
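As a rough illustration of what a sample-based summary involves, the following sketch computes the same four statistics on a random sample of at most 999 values. It is not Oracle Data Miner's implementation: the function name, the made-up age values, and the use of the sample variance (rather than the population variance) are all assumptions.

```python
import random
import statistics


def summarize_sample(values, sample_size=999, seed=0):
    """Summarize a numeric attribute using a random sample of its values."""
    values = list(values)
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    if len(values) > sample_size:
        sample = rng.sample(values, sample_size)
    else:
        sample = values
    return {
        "sample_size": len(sample),
        "min": min(sample),
        "max": max(sample),
        "average": statistics.mean(sample),
        # Sample variance; which variance the tool reports is an assumption.
        "variance": statistics.variance(sample),
    }


# Made-up ages between 17 and 86, standing in for a numeric column.
ages = [17 + (i * 7) % 70 for i in range(1500)]
summary = summarize_sample(ages)
print(summary)
```

Because the statistics come from a sample, they can differ slightly from the values for the full table, and they can change when the sample size changes.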
Use Data Transformation wizard
The Mining Activity Guides assist you in joining data from disparate sources into one
view or table. Also, they carry out transformations that are required by a particular
algorithm. However, there are transforms that typically are completed on a standalone
basis using one of the following Data Transformation wizards.
Filter
Recode
Derive field
In this tutorial, you learn how to use the Filter Single-Record Transformation wizard.
Suppose you want to concentrate on customers between the ages of 21 and 35. You can
filter the data to include only those people. Oracle Data Miner provides a filtering
transformation to define a subset of the data based upon attribute values. Perform the
following steps:
1. Click Data > Transform > Filter Single-Record.
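The effect of such a filter can be sketched outside the wizard as a simple predicate on an attribute value. In this sketch, CUST_ID is the identifier used in this tutorial; the AGE attribute name, the sample rows, and the choice of inclusive bounds are assumptions for illustration.

```python
def filter_by_age(rows, low=21, high=35):
    """Keep only rows whose AGE lies between low and high, inclusive."""
    return [row for row in rows if low <= row["AGE"] <= high]


# Made-up customer rows; only CUST_ID is a name taken from the tutorial.
customers = [
    {"CUST_ID": 101, "AGE": 19},
    {"CUST_ID": 102, "AGE": 21},
    {"CUST_ID": 103, "AGE": 28},
    {"CUST_ID": 104, "AGE": 35},
    {"CUST_ID": 105, "AGE": 36},
]

selected = filter_by_age(customers)
selected_ids = [row["CUST_ID"] for row in selected]
print(selected_ids)
```

The filtered subset, not the original data, would then be the input to any later mining step.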
Using the Mining Activity Guides
When the data mining problem has been defined and the source data identified, there are
two phases remaining in the Data Mining Process: Build/Evaluate models and deploy
the results. Oracle Data Miner contains Activity Guides for the purpose of carrying out
these phases with the minimum of required intervention. Moreover, the implicit and
explicit choices and settings used in the Build activity can be passed on seamlessly to
the Apply activity. Therefore, many operations usually required are hidden or
eliminated.
The Build Activity Wizard allows you to:
Identify supplemental data to add to the case table (the basic source data)
Adjust the Activity settings manually, rather than accept automatic settings
This example uses the Decision Tree Classification algorithm to create, evaluate, and
apply a predictive model classifying customers as fitting the profile of either a high-
revenue or low-revenue customer. You need to perform the following tasks:
1. Build a Predictive Model using a Mining Activity
2. Apply a Predictive Model using a Mining Activity
4. On the Select Mining Type page, accept the default of Classification Function Type and the
Decision Tree Algorithm. Click Next.
5. To select all the data to be mined from the case table or view, ensure that the box Join additional
data with case table is not checked. If the box is checked, you can specify data from other tables
and schemas to be joined with the case table. This example assumes that all the necessary data is
in the case table MINING_DATA_BUILD_V. From the Unique Identifier drop down list, select
CUST_ID. Then click Next.
6. You can specify the attribute to be predicted for a data mining problem. Click the
AFFINITY_CARD radio button to indicate the target and then click Next.
Note: The Target Attribute to be predicted in this tutorial is AFFINITY_CARD.
7. Some evaluation results depend upon your choice of Preferred Target Value. In this example, you
want to identify the high-revenue customers (Target Value = 1). Select 1 for the Preferred Target
Value and click Next.
8. Enter a descriptive name for the activity and click Next.
9. The last step of the Activity Wizard is displayed. You can display the settings in the Activity.
Click Advanced Settings.
10. The parameters have default settings that ensure a good result. The Build parameters determine
how the branches of the decision tree are constructed. Click OK.
11. Ensure that the Run upon finish check box is selected. Click Finish.
When the Activity Wizard is completed, the steps appropriate to the chosen activity are
displayed. If you choose Run Upon Finish, the steps are executed to completion in sequence and
a check appears on the right side of each step as it is completed.
12. To view the structure of the tree, click Result in the Build step.
13. Your results might differ from the display due to the small size of the sample dataset. The default
view shows all nodes and the attribute values used to determine splits. You can highlight a node
to show the rule for a record to be included in that node. The predicted value is the target value of
the majority of records in that node. Confidence is the percentage of records in the node having
the predicted target value. Cases is the actual number of cases in the source data satisfying the
rule for that node. Support is the percentage of cases in the source data satisfying the rule for that
node.
14. You may want to view only the terminal nodes (known as Leaves) and eliminate the intermediate
nodes. Click the Show Leaves Only check box.
15. A decision tree is sensitive to missing values when applied to new data. However, if the splitting
attribute is missing, the ODM Decision Tree Algorithm provides an alternative attribute, known
as a surrogate, to be used in its place. If both the splitting attribute and its surrogate are missing, the
predicted value is determined at the parent node of the tree. To view the surrogate, highlight a
node and click the Surrogate radio button.
Close the Build Results window.
16. To evaluate the Decision Tree Model, click Result in the Test Metrics step.
17. Click the Accuracy tab.
18. Click the More Detail button to see the Confusion Matrix.
19. The Confusion Matrix is calculated by applying the model to the hold-out sample from the test
campaign. The values of AFFINITY_CARD are known and are represented by the rows; the
columns are the predictions made by the Classification Model. The upper right and lower left
cells count the prediction errors. Most business problem solutions attempt to minimize
one type of error. For example, the lower left cell contains the number of “missed opportunities”,
that is, cases of high-revenue customers who were not correctly identified by the model. Click the
ROC tab.
20. To explore possible changes in the model's parameters, you can slide the red line back and forth
by clicking the arrow buttons at the bottom right of the graph and observe the change in the
Confusion Matrix until the values indicate a good solution to the problem. Click Lift.
21. You can view two graphs showing different interpretations of the lift calculations. With the
Cumulative Lift radio button selected, the graph indicates how much better the model does than a
naive guess. In this example, the Decision Tree finds about 2.5 times more positive cases than
chance in the first quantile. Select the Cumulative Positive Cases radio button.
22. The graph indicates how many cases must be analysed in order to find the desired number of
positive cases, that is, high-revenue customers.
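The node statistics defined in the Build results steps (predicted value, confidence, cases, and support) can be reproduced by hand for a single rule. The sketch below follows those definitions directly. The AGE attribute, the rule AGE > 30, and the ten rows are made up for illustration and do not come from the tutorial's data.

```python
from collections import Counter


def node_statistics(rows, rule, target="AFFINITY_CARD"):
    """Compute predicted value, confidence, cases, and support for one node.

    rule is a function that returns True when a row belongs to the node.
    """
    in_node = [row for row in rows if rule(row)]
    if not in_node:
        raise ValueError("no rows satisfy the rule for this node")
    counts = Counter(row[target] for row in in_node)
    predicted, majority_count = counts.most_common(1)[0]
    return {
        "predicted": predicted,                              # majority target value
        "cases": len(in_node),                               # rows satisfying the rule
        "confidence": 100.0 * majority_count / len(in_node), # % of node with prediction
        "support": 100.0 * len(in_node) / len(rows),         # % of all rows in the node
    }


# Made-up source data: (AGE, AFFINITY_CARD) pairs.
pairs = [(25, 0), (28, 0), (29, 1), (33, 1), (35, 1),
         (40, 1), (45, 0), (50, 1), (22, 0), (31, 1)]
rows = [{"AGE": age, "AFFINITY_CARD": card} for age, card in pairs]

stats = node_statistics(rows, rule=lambda row: row["AGE"] > 30)
print(stats)
```

In this made-up data, six of the ten rows satisfy the rule and five of those six have target value 1, so the node predicts 1 with a support of 60% and a confidence of about 83%.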
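The evaluation steps rely on two calculations that are easy to sketch: a confusion matrix (rows are actual values, columns are predictions, as described for the Accuracy tab) and cumulative lift. The lift function uses a common definition, the positive rate among the highest-scored cases divided by the overall positive rate; whether Oracle Data Miner computes it in exactly this way is an assumption. The scores, the 0.5 threshold, and the ten hold-out cases are made up.

```python
def confusion_matrix(actual, predicted, classes=(0, 1)):
    """Return matrix[actual_value][predicted_value] = number of cases."""
    matrix = {a: {p: 0 for p in classes} for a in classes}
    for a, p in zip(actual, predicted):
        matrix[a][p] += 1
    return matrix


def cumulative_lift(actual, scores, quantiles=5, positive=1):
    """Cumulative lift per quantile, with cases ranked by descending score."""
    ranked = sorted(zip(scores, actual), key=lambda pair: pair[0], reverse=True)
    total = len(ranked)
    overall_rate = sum(1 for _, a in ranked if a == positive) / total
    lifts = []
    for q in range(1, quantiles + 1):
        top = ranked[: round(total * q / quantiles)]
        top_rate = sum(1 for _, a in top if a == positive) / len(top)
        lifts.append(top_rate / overall_rate)
    return lifts


# Made-up hold-out sample: known AFFINITY_CARD values and model scores.
actual = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
predicted = [1 if s >= 0.5 else 0 for s in scores]  # hypothetical threshold

matrix = confusion_matrix(actual, predicted)
missed_opportunities = matrix[1][0]  # lower left: actual 1, predicted 0
false_alarms = matrix[0][1]          # upper right: actual 0, predicted 1
lifts = cumulative_lift(actual, scores)
print(matrix, lifts)
```

Moving the threshold changes which cases are predicted as 1 and therefore shifts counts between the cells of the matrix, which is the trade-off explored on the ROC tab. The cumulative lift of the final quantile is always 1, because at that point every case has been included.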
Apply a Predictive Model using a Mining Activity
When a model is applied to new data, the input data must be prepared and transformed
in exactly the same way that the source data for the Build activity was prepared. The
Apply activity is based on a Build activity. This Build activity passes to the Apply
activity whatever knowledge is required to prepare the input data appropriately. Perform
the following steps:
1. Click Activity > Apply.
2. Click Next.
3. Expand Classification and select the Build Activity that you just built. Then click Next.
4. Click the Select link to choose the input data for the Apply procedure.
5. Expand DMUSER1 and select MINING_DATA_APPLY_V. Then click OK.
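The requirement stated at the start of this section, that apply data be prepared exactly as the build data was, can be illustrated with a small sketch. Here the transformation parameters are learned once from the build data and then reused unchanged on the apply data. Min-max normalization is chosen only as an example of a transformation; it is not claimed to be what the Decision Tree activity performs, and the function names and values are hypothetical.

```python
def fit_min_max(build_values):
    """Learn normalization parameters from the build data only."""
    return {"low": min(build_values), "high": max(build_values)}


def apply_min_max(values, params):
    """Normalize values with parameters learned at build time."""
    span = params["high"] - params["low"]
    if span == 0:
        return [0.0 for _ in values]
    return [(value - params["low"]) / span for value in values]


# Made-up values of a numeric attribute in the build and apply data.
build_ages = [20, 30, 40, 60]
apply_ages = [30, 50]

params = fit_min_max(build_ages)              # stored with the build
prepared = apply_min_max(apply_ages, params)  # reused at apply time
print(prepared)
```

Recomputing the parameters from the apply data instead would map the same age to a different normalized value than the model saw during the build, which is the mismatch that basing the Apply activity on a Build activity avoids.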