Decision Trees
In decision analysis, a decision tree can be used to visually and explicitly represent decisions and decision making. As the name suggests, it uses a tree-like model of decisions.
When to use Decision Trees
▪ Problem characteristics:
▪ Instances can be described by attribute-value pairs
▪ The target function is discrete-valued
▪ A disjunctive hypothesis may be required
▪ The training data may be noisy
▪ Decision trees are robust to errors in the training data
▪ The training data may contain missing attribute values
▪ Typical classification problems:
▪ Equipment or medical diagnosis
▪ Credit risk analysis
▪ Several tasks in natural language processing
Top-down induction of Decision Trees
▪ ID3 (Quinlan, 1986) is a basic algorithm for learning DTs
▪ Given a training set of examples, the algorithm for building a DT performs a search in the space of decision trees
▪ The construction of the tree is top-down, and the algorithm is greedy
▪ The fundamental question is "which attribute should be tested next? Which question gives us more information?"
▪ Select the best attribute
▪ A descendant node is then created for each possible value of this attribute, and the examples are partitioned according to this value
▪ The process is repeated for each successor node until all the examples are classified correctly or there are no attributes left (a sketch of the procedure is given below)
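A minimal sketch of this greedy, recursive procedure, assuming examples are represented as dicts with a "label" key and a best_attribute helper implementing the selection criterion discussed next (the names and representation are illustrative, not from the slides):

from collections import Counter

def id3(examples, attributes, best_attribute):
    # If all examples share the same label, return a leaf with that label.
    labels = [e["label"] for e in examples]
    if len(set(labels)) == 1:
        return labels[0]
    # If no attributes are left to test, return the majority label.
    if not attributes:
        return Counter(labels).most_common(1)[0][0]
    # Greedy step: pick the attribute that gives the most information.
    a = best_attribute(examples, attributes)
    tree = {a: {}}
    # One descendant per observed value of the chosen attribute,
    # with the examples partitioned according to that value.
    for value in set(e[a] for e in examples):
        subset = [e for e in examples if e[a] == value]
        remaining = [attr for attr in attributes if attr != a]
        tree[a][value] = id3(subset, remaining, best_attribute)
    return tree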
Which attribute is the best classifier?
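For reference, the two standard quantities used to answer this question (and used in the example below) are entropy and information gain:

Entropy(S) = − Σ_c p_c log₂ p_c

Gain(S, A) = Entropy(S) − Σ_{v ∈ Values(A)} (|S_v| / |S|) Entropy(S_v)

where p_c is the proportion of examples in S belonging to class c, and S_v is the subset of S for which attribute A takes value v.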
Example: expected information gain
▪ Let
▪ Values(Wind) = {Weak, Strong}
▪ S = [9+, 5−]
▪ S_Weak = [6+, 2−]
▪ S_Strong = [3+, 3−]
▪ Information gain due to knowing Wind (verified numerically in the sketch below):
Gain(S, Wind) = Entropy(S) − 8/14 Entropy(S_Weak) − 6/14 Entropy(S_Strong)
= 0.94 − 8/14 × 0.811 − 6/14 × 1.00
= 0.048
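A small Python check of this computation, using the counts listed above (the helper name is illustrative):

import math

def entropy(pos, neg):
    # Binary entropy of a node with pos positive and neg negative examples.
    total = pos + neg
    return -sum(c / total * math.log2(c / total) for c in (pos, neg) if c)

# S = [9+, 5-], S_Weak = [6+, 2-], S_Strong = [3+, 3-]
gain_wind = entropy(9, 5) - 8/14 * entropy(6, 2) - 6/14 * entropy(3, 3)
print(round(gain_wind, 3))   # 0.048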
First step: which attribute to test at the root?
After first step
Second step
▪ Working on the Outlook = Sunny node (these gains are re-checked in the sketch below):
Gain(S_Sunny, Humidity) = 0.970 − 3/5 × 0.0 − 2/5 × 0.0 = 0.970
Gain(S_Sunny, Wind) = 0.970 − 2/5 × 1.0 − 3/5 × 0.918 = 0.019
Gain(S_Sunny, Temp.) = 0.970 − 2/5 × 0.0 − 2/5 × 1.0 − 1/5 × 0.0 = 0.570
▪ Humidity provides the best prediction for the target
▪ Let's grow the tree:
▪ add to the tree a successor for each possible value of Humidity
▪ partition the training samples according to the value of Humidity
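The same kind of check for the Sunny subset ([2+, 3−] overall); the per-value class counts below are assumptions taken from the standard PlayTennis data, consistent with the weights shown above:

import math

def entropy(pos, neg):
    total = pos + neg
    return -sum(c / total * math.log2(c / total) for c in (pos, neg) if c)

e_sunny = entropy(2, 3)                                                    # 0.970
gain_humidity = e_sunny - 3/5 * entropy(0, 3) - 2/5 * entropy(2, 0)        # 0.970
gain_wind = e_sunny - 2/5 * entropy(1, 1) - 3/5 * entropy(1, 2)            # 0.019
gain_temp = e_sunny - 2/5 * entropy(0, 2) - 2/5 * entropy(1, 1) - 1/5 * entropy(1, 0)   # 0.570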
Second and third steps
Another Splitting Criterion: the GINI Index
The Gini index (Gini impurity) measures how likely it is that a randomly chosen element of a node would be classified incorrectly if it were labelled at random according to the class distribution at that node. If all the elements belong to a single class, the node is called pure. The GINI index for a given node t, and the weighted GINI of a split of that node into k partitions, are:

GINI(t) = 1 − Σ_j [p(j | t)]²

GINI_split = Σ_{i=1..k} (n_i / n) GINI(i)

where p(j | t) is the relative frequency of class j at node t, n_i is the number of records in partition i, and n is the total number of records at the node being split.
Note: the "information gain" in this slide is the weighted GINI index
The Gini index is a metric that measures how often a randomly chosen element would be incorrectly identified; an attribute with a lower Gini index should therefore be preferred (a small sketch follows below).
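A minimal sketch of both quantities, applied to the same kind of class counts used earlier in the information-gain example (the counts here are illustrative):

def gini(counts):
    # Gini impurity of a node: 1 - sum_j p(j|t)^2.
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)

def gini_split(partitions):
    # Weighted Gini of a split: sum_i (n_i / n) * GINI(i).
    n = sum(sum(p) for p in partitions)
    return sum(sum(p) / n * gini(p) for p in partitions)

print(round(gini([9, 5]), 3))                  # 0.459 for a [9+, 5-] node
print(round(gini_split([[6, 2], [3, 3]]), 3))  # 0.429 after splitting on Wind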
How to Specify a Test Condition?
Depends on the attribute type:
– Nominal
– Ordinal
– Continuous
Binary split: divide the values into two subsets, e.g. for CarType: {Sports, Luxury} vs. {Family}, or {Family, Luxury} vs. {Sports} (all such splits are enumerated in the sketch below).
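A quick sketch of enumerating every such two-subset partition of a nominal attribute (the attribute values are taken from the CarType example above):

from itertools import combinations

values = ["Family", "Sports", "Luxury"]

# Every non-trivial way of dividing the values into two subsets.
splits = []
for r in range(1, len(values)):
    for left in combinations(values, r):
        right = tuple(v for v in values if v not in left)
        if (right, left) not in splits:   # skip mirrored duplicates
            splits.append((left, right))

for left, right in splits:
    print(set(left), "vs", set(right))
# {'Family'} vs {'Sports', 'Luxury'}
# {'Sports'} vs {'Family', 'Luxury'}
# {'Luxury'} vs {'Family', 'Sports'}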
• Overfitting:
– Given a model space H, a specific model h ∈ H is said to overfit the training data if there exists some alternative model h′ ∈ H, such that h has smaller error than h′ over the training examples, but h′ has smaller error than h over the entire distribution of instances
• Underfitting:
– The model is too simple, so that both the training and test errors are large
Detecting Overfitting
[Figure: training vs. test error as model complexity grows, with the underfitting and overfitting regions marked]
Overfitting in Decision Tree Learning
Overfitting results in decision trees that are more complex than necessary:
– Tree growth went too far
– The number of instances per node gets smaller as we build the tree (e.g., some leaves end up matching only a single example); a sketch of how this can be detected follows below
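One way to observe this in practice, sketched here under the assumption that scikit-learn is available (the dataset and depths are illustrative): grow trees of increasing depth and compare training accuracy against held-out accuracy; once the tree grows too far, the training score keeps rising while the held-out score stalls or drops.

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic, slightly noisy classification data.
X, y = make_classification(n_samples=500, n_features=20, n_informative=5,
                           flip_y=0.1, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)

for depth in (1, 2, 4, 8, 16, None):
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0)
    tree.fit(X_tr, y_tr)
    # A widening gap between the two scores is the signature of overfitting.
    print(depth, round(tree.score(X_tr, y_tr), 3), round(tree.score(X_te, y_te), 3))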