Decision Trees and Decision Modeling
Data Mining Methods According to Response and Predictor Types

                        Continuous response    Categorical response     No response
Continuous predictors   Linear regression      Logistic regression      Principal components
                        Neural nets            Neural nets              Cluster analysis
                        k-nearest neighbors    Discriminant analysis
                                               k-nearest neighbors
Categorical predictors  Linear regression      Neural nets              Association rules
                        Neural nets            Classification trees
                        Regression trees       Logistic regression
                                               Naïve Bayes
Types of Data Mining Methods

Supervised
• Algorithms used in classification and prediction
  • Classification: models discrete-valued functions
  • Prediction: models continuous-valued functions
• Require data in which the value of the outcome of interest (e.g., purchase or no purchase) is known
• Use training data, validation data, and test data

Unsupervised
• Algorithms where there is no outcome variable to predict or classify
• Include association rules, data reduction methods, and clustering techniques
Supervised Learning Process: Two Steps
1. Learning (training): learn a model from the training data (induction).
2. Testing: apply the learned model to unseen test data to assess model accuracy (deduction).
Source: Bing Liu, Web Data Mining, 2nd ed., Fig. 3.1., p. 66, Springer, 2011.
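The two-step process can be sketched in Python with scikit-learn and the iris data set; both are illustrative assumptions, since the course itself uses SPSS Modeler:

```python
# A minimal sketch of the two-step supervised learning process.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)

# Hold out unseen test data before any learning takes place.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# Step 1: Learning (training) -- induce a model from the training data.
model = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# Step 2: Testing -- apply the model to unseen data to assess accuracy.
acc = accuracy_score(y_test, model.predict(X_test))
print(f"test accuracy: {acc:.3f}")
```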
Supervised Learning: Classification
The task of learning a target function f that maps each attribute set x to one of the predefined class labels y.

Input: attribute set (x) → Classification model → Output: class label (y)
A Simple Decision Tree for Buying a Car
• Each node specifies a test (question) of an attribute.
• Each branch from a node is a possible value of this attribute.
Source: IBM SPSS Modeler 14.2 Modeling Nodes, 2011, Figure 6-2, p. 115.
Decision Tree Goals
1. Accurate classification (minimize error)

A tree consists of a root node, internal nodes, and leaf nodes.
• Each internal node has exactly one incoming edge and two or more outgoing edges.
• Each node specifies a test of an attribute.
• Each branch from a node is a possible value of this attribute.
Decision trees seek to create a set of leaf nodes that are as "pure" as possible!
Source: http://gautam.lis.illinois.edu/monkmiddleware/public/analytics/decisiontree.html
What do we mean by "pure"?

Stopping criteria
• When to stop building the tree
Pruning (a generalization method)
• Pre-pruning versus post-pruning
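The slides do not fix a particular purity measure; two common impurity functions, Gini and entropy, can be sketched as follows (both are 0 for a perfectly pure leaf):

```python
from collections import Counter
import math

def gini(labels):
    """Gini impurity: 0 for a pure node, up to 0.5 for a 50/50 binary mix."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def entropy(labels):
    """Entropy in bits: 0 for a pure node, 1 for a 50/50 binary mix."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

print(gini(["Yes"] * 6))               # pure leaf -> 0.0
print(gini(["Yes"] * 3 + ["No"] * 3))  # maximally mixed -> 0.5
```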
Data Set
Class attribute: indicates whether a loan application was approved (Yes) in the past or not (No).
Source: Bing Liu, Web Data Mining, 2nd ed., Table 3.1., p. 64, Springer, 2011. Chapter 3 can be downloaded from:
http://www.springer.com/cda/content/document/cda_downloaddocument/9783642194597-c3.pdf?SGWID=0-0-45-1229046-p174122057

Loan Application Training Data Set
Source: Bing Liu, Web Data Mining, 2nd ed., Fig. 3.5., p. 71, Springer, 2011.
(Figure: a decision tree node containing 9 Yes and 6 No training examples.)
Source: Bing Liu, Web Data Mining, 2nd ed., Fig. 3.2., p. 67, Springer, 2011.
Using the tree to predict
IF Own_house = false AND Has_job = false THEN Class = No
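The extracted rule can be applied directly in code. `predict_loan` is a hypothetical helper, and the "Yes" branch is an assumption based on the remaining paths of the tree in Liu's Fig. 3.5 (owning a house, or having a job, leads to approval):

```python
def predict_loan(applicant):
    """Apply the rule extracted from the tree: reject only when the
    applicant neither owns a house nor has a job. Returning 'Yes'
    otherwise is an assumption about the tree's other paths."""
    if not applicant["Own_house"] and not applicant["Has_job"]:
        return "No"
    return "Yes"

print(predict_loan({"Own_house": False, "Has_job": False}))  # -> No
print(predict_loan({"Own_house": True, "Has_job": False}))   # -> Yes
```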
C5.0 Tree Built with SPSS Modeler
Robustness
• Ability to make reasonably accurate predictions given noisy data or data with missing and erroneous values
Scalability
• Ability to construct a prediction model from a rather large amount of data
Interpretability
• Level of understanding and insight provided by the model
Compactness of the model
• Size of the tree, or the number of rules
Judging Classification Performance: Evaluation Methods

Accuracy Measures
The following terms are defined for a 2 × 2 confusion matrix:
Source: Kohavi & Provost, Glossary of Terms, Machine Learning, 30, 271-274 (1998), http://robotics.stanford.edu/~ronnyk/glossary.html
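Since the glossary's confusion-matrix figure did not survive conversion, here is a sketch of the usual measures derived from the four cells (true/false positives and negatives); the counts below are illustrative:

```python
def confusion_measures(tp, fp, fn, tn):
    """Standard measures from a 2 x 2 confusion matrix."""
    return {
        "accuracy":  (tp + tn) / (tp + fp + fn + tn),
        "precision": tp / (tp + fp),
        "recall":    tp / (tp + fn),  # a.k.a. true positive rate, sensitivity
        "fpr":       fp / (fp + tn),  # false positive rate
    }

m = confusion_measures(tp=11, fp=2, fn=1, tn=10)
print(m)
```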
Evaluating Performance When One Class Is More Important
• Data sets with imbalanced class distributions are quite common in many real applications.
• In many such cases it is more important to identify members of one class:
  • Tax fraud
  • Credit card fraud
  • Credit default
  • Response to a promotional offer
  • Detecting electronic network intrusion
  • Predicting delayed flights
• In such cases, we are willing to tolerate greater overall error in return for better identifying the important class for further attention.
Judging Classification Performance: Different Kinds of Wrong Predictions
Source: Figure 5.7 in Shmueli et al. (2010), Data Mining for Business Intelligence, p. 104.
Classification Confusion Matrices for Various Cutoff Values

Cutoff Value = 0.75
                  Predicted Class
Actual Class        0     1
           0       11     1
           1        5     7
True Positive Rate: 7/12 = .5833
False Positive Rate: 1/12 = .0833

Cutoff Value = 0.50
                  Predicted Class
Actual Class        0     1
           0       10     2
           1        1    11
True Positive Rate: 11/12 = .9167
False Positive Rate: 2/12 = .1667

Cutoff Value = 0.25
                  Predicted Class
Actual Class        0     1
           0        8     4
           1        1    11
True Positive Rate: 11/12 = .9167
False Positive Rate: 4/12 = .3333
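The mechanism behind these matrices can be sketched in code: a single set of predicted probabilities produces a different confusion matrix at each cutoff. The probabilities below are invented for illustration, not the slide's underlying data:

```python
def confusion(probs, actual, cutoff):
    """Count tp, fn, fp, tn when class 1 is predicted for p >= cutoff."""
    pred = [1 if p >= cutoff else 0 for p in probs]
    tp = sum(a == 1 and q == 1 for a, q in zip(actual, pred))
    fn = sum(a == 1 and q == 0 for a, q in zip(actual, pred))
    fp = sum(a == 0 and q == 1 for a, q in zip(actual, pred))
    tn = sum(a == 0 and q == 0 for a, q in zip(actual, pred))
    return tp, fn, fp, tn

probs  = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
actual = [1,   1,   1,   0,   1,   0,   0,   0]
for cutoff in (0.75, 0.50, 0.25):
    tp, fn, fp, tn = confusion(probs, actual, cutoff)
    print(cutoff, "TPR:", tp / (tp + fn), "FPR:", fp / (fp + tn))
```

Lowering the cutoff catches more of class 1 (higher TPR) at the cost of more false positives, exactly the trade-off shown in the three matrices above.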
The Class Imbalance Problem
Down-sampling (undersampling)
• Reduces the number of samples to improve balance across the classes
• Eliminates negative examples, i.e., majority-class cases

Remedying the Imbalance Problem with SPSS Modeler's Balance Node (Record Ops)
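A naive version of down-sampling can be sketched as follows (illustrative only; the course uses SPSS Modeler's Balance node, and the function name here is made up):

```python
import random
from collections import Counter

def downsample(rows, labels, seed=0):
    """Keep all minority-class rows; randomly sample every other
    class down to the minority-class size."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = min(counts.values())
    kept = []
    for cls in counts:
        idx = [i for i, y in enumerate(labels) if y == cls]
        kept.extend(idx if len(idx) == target else rng.sample(idx, target))
    return [rows[i] for i in kept], [labels[i] for i in kept]

rows = list(range(100))
labels = ["No"] * 90 + ["Yes"] * 10        # 9:1 imbalance
bal_rows, bal_labels = downsample(rows, labels)
print(Counter(bal_labels))                 # balanced 10/10
```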
Model Overfitting
• The model fits the training data too well: good accuracy on the training data but poor accuracy on test data.
• Symptoms: the tree is too large (too deep, with too many branches), and some branches may reflect anomalies in the training data due to noise or outliers.
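Overfitting is easy to demonstrate (sketched with scikit-learn and synthetic data, both assumptions): a fully grown tree memorizes noisy training data, while a depth limit, a pre-pruning control, narrows the train/test gap:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Noisy synthetic data (flip_y injects 20% label noise).
X, y = make_classification(n_samples=400, n_features=20, flip_y=0.2,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

full = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
shallow = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_tr, y_tr)

# The full tree is near-perfect on training data but worse on test data.
print("full:   ", full.score(X_tr, y_tr), full.score(X_te, y_te))
print("shallow:", shallow.score(X_tr, y_tr), shallow.score(X_te, y_te))
```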