c18 Decision Tree Learning 2013
Introduction
Decision tree learning is one of the most widely used and practical methods for inductive inference.
It is a method for approximating discrete-valued target functions, in which the learned function is represented by a decision tree.
Decision tree learning is robust to noisy data and capable of learning disjunctive expressions.
Decision tree representation
Decision trees classify instances by sorting them down the tree from the root to some leaf node, which provides the classification of the instance.
Each node in the tree specifies a test of some attribute of the instance, and each branch descending from that node corresponds to one of the possible values for this attribute.
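This sorting procedure can be sketched in a few lines of Python. The `classify` helper and the nested-dict tree encoding are illustrative choices, not from the original notes; the example tree is the PlayTennis tree built later in this document.

```python
def classify(tree, instance):
    """Sort an instance down the tree from the root to a leaf."""
    if not isinstance(tree, dict):      # a leaf holds the classification
        return tree
    attribute = tree["attribute"]       # the test at this node
    value = instance[attribute]         # follow the matching branch
    return classify(tree["branches"][value], instance)

# The complete PlayTennis tree derived later in these notes.
tree = {"attribute": "Outlook", "branches": {
    "Sunny": {"attribute": "Humidity",
              "branches": {"High": "No", "Normal": "Yes"}},
    "Overcast": "Yes",
    "Rain": {"attribute": "Wind",
             "branches": {"Strong": "No", "Weak": "Yes"}},
}}

print(classify(tree, {"Outlook": "Sunny", "Humidity": "Normal"}))  # Yes
```

Each call tests exactly one attribute and descends one level, so classification cost is bounded by the depth of the tree.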
Decision Tree Template
Drawn top-to-bottom or left-to-right:
Top (or left-most) node = Root Node
Descendant node(s) = Child Node(s)
Bottom (or right-most) node(s) = Leaf Node(s)
For the Wind
S=[9+,5-]
E=0.940
Wind
Weak Strong
[6+, 2-] [3+, 3-]
E=0.811 E=1.0
Gain(S,Wind)
=0.940 - (8/14)*0.811 - (6/14)*1.0
=0.048
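The entropy and gain arithmetic above can be checked with a short Python sketch. The `entropy` helper is my own naming, not from the notes; the counts are the ones used in the calculation.

```python
from math import log2

def entropy(pos, neg):
    """Entropy of a sample with pos positive and neg negative examples."""
    total = pos + neg
    e = 0.0
    for count in (pos, neg):
        if count:                     # 0*log(0) is taken as 0
            p = count / total
            e -= p * log2(p)
    return e

# S = [9+, 5-]; Wind splits S into Weak = [6+, 2-] and Strong = [3+, 3-]
e_s = entropy(9, 5)
gain_wind = e_s - (8/14) * entropy(6, 2) - (6/14) * entropy(3, 3)
print(round(e_s, 3), round(gain_wind, 3))  # 0.94 0.048
```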
For the Temperature
S=[9+,5-]
E=0.940
Temperature
Hot Mild Cool
[2+, 2-] [4+, 2-] [3+, 1-]
E=1.0 E=0.918 E=0.811
Gain(S,Temperature)
=0.940 - (4/14)*1.0 - (6/14)*0.918 - (4/14)*0.811
=0.029
For the Humidity
S=[9+,5-]
E=0.940
Humidity
High Normal
[3+, 4-] [6+, 1-]
E=0.985 E=0.592
Gain(S,Humidity)
=0.940 - (7/14)*0.985 - (7/14)*0.592
=0.151
For the Outlook
S=[9+,5-]
E=0.940
Outlook
Sunny Overcast Rain
[2+, 3-] [4+, 0-] [3+, 2-]
E=0.971 E=0.0 E=0.971
Gain(S,Outlook)
=0.940 - (5/14)*0.971 - (4/14)*0.0 - (5/14)*0.971
=0.247
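Putting the four root-level computations together, we can verify that Outlook has the largest gain. The `splits` table layout is mine; the per-value counts are the ones used in the calculations above.

```python
from math import log2

def entropy(pos, neg):
    """Entropy of a sample with pos positive and neg negative examples."""
    total = pos + neg
    return -sum((c / total) * log2(c / total) for c in (pos, neg) if c)

# (pos, neg) counts for each value of each attribute over S = [9+, 5-]
splits = {
    "Wind":        [(6, 2), (3, 3)],          # Weak, Strong
    "Temperature": [(2, 2), (4, 2), (3, 1)],  # Hot, Mild, Cool
    "Humidity":    [(3, 4), (6, 1)],          # High, Normal
    "Outlook":     [(2, 3), (4, 0), (3, 2)],  # Sunny, Overcast, Rain
}

e_s, n = entropy(9, 5), 14
gains = {a: e_s - sum(((p + q) / n) * entropy(p, q) for p, q in values)
         for a, values in splits.items()}
print(max(gains, key=gains.get))  # Outlook
```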
Next
Outlook is the winner for the root.
Now that we have discovered the root of our decision tree, we must recursively find the nodes that should go below Sunny, Overcast, and Rain.
For the Rain ^ Humidity
S=[3+,2-]
E=0.971
Rain
Humidity
High Normal
[1+, 1-] [2+, 1-]
E=1.0 E=0.918
Gain(S,Rain^Humidity)
=0.971 - (2/5)*1.0 - (3/5)*0.918
=0.02
For the Rain ^ Temperature
S=[3+,2-] Rain
E=0.971
Temperature
Mild Cool
[2+, 1-] [1+, 1-]
E=0.918 E=1.0
Gain(S,Rain ^ Temperature)
=0.971 - (3/5)*0.918 - (2/5)*1.0
=0.02
For the Rain ^ Wind
S=[3+,2-] Rain
E=0.971
Wind
Weak Strong
[3+, 0-] [0+, 2-]
E=0.0 E=0.0
Gain(S,Rain^Wind)
=0.971 - (3/5)*0.0 - (2/5)*0.0
=0.971
Next
Wind is the winner for the Rain branch.
Overcast has only Yes examples, so it becomes a leaf.
Outlook
  Sunny → (still to be expanded)
  Overcast → Yes [D3,D7,D12,D13]
  Rain → Wind
    Strong → No [D6,D14]
    Weak → Yes [D4,D5,D10]
Then
For the Sunny branch, we consider Temperature and Humidity.
For the Sunny ^ Temperature
S=[2+,3-] Sunny
E=0.971
Temperature
Hot Mild Cool
[0+, 2-] [1+, 1-] [1+, 0-]
E=0.0 E=1.0 E=0.0
Gain(S,Sunny ^ Temperature)
=0.971 - (2/5)*0.0 - (2/5)*1.0 - (1/5)*0.0
=0.571
For the Sunny ^ Humidity
S=[2+,3-] Sunny
E=0.971
Humidity
High Normal
[0+, 3-] [2+, 0-]
E=0.0 E=0.0
Gain(S,Sunny ^ Humidity)
=0.971 - (3/5)*0.0 - (2/5)*0.0
=0.971
Then
For the Sunny branch, the winner is Humidity.
Complete tree
Then here is the complete tree:
Outlook
  Sunny → Humidity
    High → No
    Normal → Yes
  Overcast → Yes
  Rain → Wind
    Strong → No
    Weak → Yes
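The whole construction above can be reproduced end to end by a recursive ID3 sketch. The training table is assumed to be Mitchell's standard PlayTennis data (it matches the day labels D1..D14 and the example counts used in these notes); the function names and nested-dict tree encoding are mine.

```python
from collections import Counter
from math import log2

# Assumed data: Mitchell's PlayTennis table (Outlook, Temperature,
# Humidity, Wind, PlayTennis), consistent with the counts in these notes.
DATA = [
    ("Sunny", "Hot", "High", "Weak", "No"),           # D1
    ("Sunny", "Hot", "High", "Strong", "No"),         # D2
    ("Overcast", "Hot", "High", "Weak", "Yes"),       # D3
    ("Rain", "Mild", "High", "Weak", "Yes"),          # D4
    ("Rain", "Cool", "Normal", "Weak", "Yes"),        # D5
    ("Rain", "Cool", "Normal", "Strong", "No"),       # D6
    ("Overcast", "Cool", "Normal", "Strong", "Yes"),  # D7
    ("Sunny", "Mild", "High", "Weak", "No"),          # D8
    ("Sunny", "Cool", "Normal", "Weak", "Yes"),       # D9
    ("Rain", "Mild", "Normal", "Weak", "Yes"),        # D10
    ("Sunny", "Mild", "Normal", "Strong", "Yes"),     # D11
    ("Overcast", "Mild", "High", "Strong", "Yes"),    # D12
    ("Overcast", "Hot", "Normal", "Weak", "Yes"),     # D13
    ("Rain", "Mild", "High", "Strong", "No"),         # D14
]
ATTRS = {"Outlook": 0, "Temperature": 1, "Humidity": 2, "Wind": 3}

def entropy(examples):
    n = len(examples)
    counts = Counter(e[-1] for e in examples).values()
    return -sum((c / n) * log2(c / n) for c in counts)

def gain(examples, idx):
    """Information gain of splitting examples on attribute column idx."""
    n = len(examples)
    remainder = 0.0
    for v in {e[idx] for e in examples}:
        sub = [e for e in examples if e[idx] == v]
        remainder += len(sub) / n * entropy(sub)
    return entropy(examples) - remainder

def id3(examples, attrs):
    labels = {e[-1] for e in examples}
    if len(labels) == 1:                 # pure node -> leaf
        return labels.pop()
    if not attrs:                        # attributes exhausted -> majority leaf
        return Counter(e[-1] for e in examples).most_common(1)[0][0]
    best = max(attrs, key=lambda a: gain(examples, ATTRS[a]))
    rest = [a for a in attrs if a != best]
    return {best: {v: id3([e for e in examples if e[ATTRS[best]] == v], rest)
                   for v in sorted({e[ATTRS[best]] for e in examples})}}

tree = id3(DATA, list(ATTRS))
print(tree)
```

Running this reproduces the tree above: Outlook at the root, Humidity under Sunny, a Yes leaf under Overcast, and Wind under Rain.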
No backtracking
ID3 never backtracks to reconsider earlier attribute choices, so its hill-climbing search can converge to a locally optimal solution (a local minimum).
Reduced-Error Pruning
Consider each decision node for pruning: replace its subtree with a leaf labeled with the most common classification, and keep the change only if the pruned tree performs no worse on a separate validation set.
Rule Post-Pruning
Convert tree to equivalent set of rules
Prune each rule by removing any preconditions that result in improving
its estimated accuracy
Sort the pruned rules by their estimated accuracy, and consider them in this sequence when classifying subsequent instances
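The first step, converting the tree to rules, can be sketched as follows. The nested-dict tree encoding and the `tree_to_rules` name are illustrative; the pruning and accuracy-estimation steps are omitted here.

```python
def tree_to_rules(tree, path=()):
    """Yield one rule per root-to-leaf path: (preconditions, class)."""
    if not isinstance(tree, dict):        # a leaf ends one rule
        return [(path, tree)]
    (attr, branches), = tree.items()      # node = {attribute: {value: subtree}}
    rules = []
    for value, subtree in branches.items():
        rules += tree_to_rules(subtree, path + ((attr, value),))
    return rules

tree = {"Outlook": {"Sunny": {"Humidity": {"High": "No", "Normal": "Yes"}},
                    "Overcast": "Yes",
                    "Rain": {"Wind": {"Strong": "No", "Weak": "Yes"}}}}

for preconditions, label in tree_to_rules(tree):
    conds = " AND ".join(f"{a}={v}" for a, v in preconditions)
    print(f"IF {conds} THEN PlayTennis={label}")
```

The PlayTennis tree yields five rules, one per leaf; each rule can then be pruned independently, which is more flexible than pruning the tree itself.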
Consider attribute costs:
In medical diagnosis, an attribute such as BloodTest may cost 150 dollars to measure.
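One cost-sensitive remedy (Tan's measure, discussed in Mitchell's chapter on decision trees) divides the squared information gain by the attribute's cost, so an expensive test must buy proportionally more information to be selected. The numbers below are made up purely for illustration.

```python
def tan_measure(gain, cost):
    """Tan's cost-sensitive selection measure: Gain^2 / Cost."""
    return gain ** 2 / cost

# Illustrative (hypothetical) numbers: a cheap Temperature reading vs.
# a 150-dollar BloodTest with a higher raw information gain.
cheap = tan_measure(0.2, 1.0)      # modest gain, negligible cost
pricey = tan_measure(0.4, 150.0)   # higher raw gain, but very expensive
print(cheap > pricey)  # the cheap attribute wins once cost is considered
```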