Classification Trees

Trees and Rules
Goal: Classify or predict an outcome based on a set of predictors.
The output is a set of rules, displayed as a tree diagram.
Also called CART (Classification and Regression Trees).
[Scatter plot: Income ($0–$80,000, x-axis) vs. Age (0–100, y-axis) for 100 beer drinkers, with each point labeled Regular beer or Light beer.]
A Classification Tree for the Beer Drinkers

[Tree diagram: the root node splits on Age; its two child nodes (29 and 31 records) split on Income at $34,375 and $41,173; deeper splits use Income ($39,180) and Age (51.5). Annotations mark each splitting value, the # records in each node, and the leaf/terminal nodes, which are labeled with their majority class (e.g. Regular).]
Method Settings
1. Determining splits/partitions
Entropy Measure

Entropy of a node = − Σ (i = 1 to K) p_i × log2(p_i)

K = number of classes
p_i = the proportion of the node's records in class i

0 ≤ Entropy ≤ log2(K)
[Plot: Entropy (0–1, y-axis) as a function of p_i (0–1, x-axis) for K = 2; entropy is 0 at p_i = 0 or 1 and reaches its maximum of 1 at p_i = 0.5.]
Entropy: Example

Assume we have K = 2 classes (buy, not buy).
Then the maximum Entropy is log2(2) = 1.
Ex. 2 (Pure Node): p1 = 1, p2 = 0
Entropy = −[1×log2(1) + 0×log2(0)] = 0
(by convention, 0×log2(0) is taken to be 0)
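The entropy of a node can be computed directly from its class counts. A minimal sketch (the function name and the count-based interface are illustrative, not from the slides):

```python
import math

def entropy(counts):
    """Entropy of a node given its class counts: -sum p_i * log2(p_i)."""
    n = sum(counts)
    result = 0.0
    for c in counts:
        if c > 0:  # by convention, 0 * log2(0) = 0
            p = c / n
            result -= p * math.log2(p)
    return result

print(entropy([1, 1]))   # 50/50 split, maximum for K = 2 → 1.0
print(entropy([10, 0]))  # pure node → 0.0
```

The guard `if c > 0` implements the 0×log2(0) = 0 convention from the pure-node example above.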
Splitting the 100 beer drinkers

Node 1: 25 Regular, 18 Light
Node 2: 25 Regular, 32 Light
Gini Index

GI = 1 − Σ (i = 1 to K) p_i²

K = number of classes
p_i = the proportion of the node's records in class i

0 ≤ GI ≤ (K − 1) / K

Applying GI to the same split of the 100 beer drinkers:
Node 1: 25 Regular, 18 Light
Node 2: 25 Regular, 32 Light
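The Gini index of each child node, and the record-weighted impurity of the whole split, can be sketched as follows (function names are illustrative; the counts are the split shown above):

```python
def gini(counts):
    """Gini index of a node: 1 - sum p_i^2, given its class counts."""
    n = sum(counts)
    return 1.0 - sum((c / n) ** 2 for c in counts)

def split_impurity(nodes):
    """Record-weighted average Gini over the child nodes of a split."""
    total = sum(sum(counts) for counts in nodes)
    return sum(sum(counts) / total * gini(counts) for counts in nodes)

# The split of the 100 beer drinkers:
left, right = [25, 18], [25, 32]
print(round(gini(left), 4))               # → 0.4867
print(round(gini(right), 4))              # → 0.4925
print(round(split_impurity([left, right]), 4))
```

A split-selection algorithm would compare this weighted impurity across candidate splits and pick the smallest.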
Predictors and Rules

Informative predictors appear in the tree's splits, so trees can be used for variable selection: here, Age and Income.

Each path from the root to a leaf corresponds to an AND rule, e.g.:
IF Age > 42.5 AND Income < $41,173 THEN Regular

[Tree diagram (shown again): root splits on Age at 42.5; child nodes with 29 and 31 records split on Income at $34,375 and $41,173; deeper splits on Income ($39,180) and Age (51.5) lead to the leaves.]
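The "tree as a set of AND rules" idea can be made concrete by walking a tree and emitting one rule per leaf. A minimal sketch, using a hypothetical nested-dict tree whose thresholds are borrowed from the slide's example (structure and leaf labels are illustrative):

```python
def extract_rules(node, path=()):
    """Walk a tree stored as nested dicts, yielding (conditions, class)
    for each leaf; each conditions tuple is one AND rule."""
    if "class" in node:  # leaf/terminal node
        yield (path, node["class"])
        return
    var, val = node["split"]
    yield from extract_rules(node["left"],  path + (f"{var} <= {val}",))
    yield from extract_rules(node["right"], path + (f"{var} > {val}",))

# Hypothetical fragment shaped like the slide's example tree:
tree = {"split": ("Age", 42.5),
        "left":  {"class": "Light"},
        "right": {"split": ("Income", 41173),
                  "left":  {"class": "Regular"},
                  "right": {"class": "Light"}}}

for conds, label in extract_rules(tree):
    print("IF " + " AND ".join(conds) + f" THEN {label}")
```

One of the printed rules corresponds to the slide's example: IF Age > 42.5 AND Income <= 41173 THEN Regular.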
Avoiding Over-fitting

Partitioning is done based on the training set.
As we go down the tree, splitting is based on less and less data.
Larger trees therefore lead to higher prediction variance.
Avoiding Over-fitting – cont.

How will the tree perform on new data?
The error rate of a tree = the proportion of misclassified records.

[Plot: error rate vs. # splits. Error on the training data decreases steadily as splits are added, while error on unseen (validation) data first falls and then rises.]
Pruning

CART lets the tree grow to full extent, then prunes it back.
The idea is to find the point at which the validation error begins to rise.
Generate successively smaller trees by pruning leaves.
At each pruning stage, multiple trees are possible.
Use cost complexity to choose the best tree at that stage.
Cost Complexity

CC(T) = Err(T) + α × L(T)

CC(T) = cost complexity of a tree
Err(T) = proportion of misclassified records
L(T) = number of leaves (terminal nodes)
α = penalty factor attached to tree size (set by user)
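Choosing among the candidate trees at a pruning stage then reduces to minimizing CC(T). A small sketch with hypothetical (error rate, leaf count) pairs for the candidate subtrees:

```python
def cost_complexity(err, n_leaves, alpha):
    """CC(T) = Err(T) + alpha * L(T)."""
    return err + alpha * n_leaves

# Hypothetical candidate subtrees at one pruning stage:
# (proportion misclassified, number of leaves)
candidates = [(0.02, 12), (0.03, 8), (0.05, 4), (0.10, 1)]

alpha = 0.01
best = min(candidates, key=lambda t: cost_complexity(t[0], t[1], alpha))
print(best)  # → (0.05, 4): CC = 0.05 + 0.01 * 4 = 0.09, the smallest
```

Note the trade-off α controls: with α = 0 the largest (lowest-error) tree always wins, while a large α drives selection toward small trees.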
[Tree diagram from a second example: the full tree splits on Age, Income, CCAvg, and Education, with leaf classes 0/1; several nodes are marked "Sub Tree beneath", indicating branches that are candidates for pruning.]
Classification Confusion Matrices (Using Full Tree)

Training Data
                Predicted 1   Predicted 0
Actual 1                235             0
Actual 0                  0          2265

Validation Data
                Predicted 1   Predicted 0
Actual 1                128            15
Actual 0                 17          1340

Test Data
                Predicted 1   Predicted 0
Actual 1                 88            14
Actual 0                  8           890
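The error rate Err(T) used in the cost-complexity formula can be read off each matrix as (off-diagonal counts) / (total counts). A minimal sketch (the dict-keyed matrix representation is illustrative):

```python
def error_rate(matrix):
    """Proportion misclassified, given a confusion matrix
    stored as {(actual, predicted): count}."""
    total = sum(matrix.values())
    wrong = sum(c for (a, p), c in matrix.items() if a != p)
    return wrong / total

# Validation-data matrix from the table above:
validation = {(1, 1): 128, (1, 0): 15, (0, 1): 17, (0, 0): 1340}
print(round(error_rate(validation), 4))  # (15 + 17) / 1500 → 0.0213
```

The training matrix above has an error rate of exactly 0, a typical sign of an over-fitted full tree.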