
Decision Tree Learning

Introduction
Decision tree learning is one of the most
widely used and practical methods for
inductive inference
Decision tree learning is a method for
approximating discrete-valued target
functions, in which the learned function is
represented by a decision tree
Decision tree learning is robust to noisy data
and capable of learning disjunctive
expressions
Decision tree representation
Decision trees classify instances by sorting
them down the tree from the root to some
leaf node, which provides the classification of
the instance
Each node in the tree specifies a test of some
attribute of the instance, and each branch
descending from that node corresponds to
one of the possible values for this attribute
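A minimal sketch of this sorting-down process, assuming the tree is stored as a nested dict; the attribute names, the toy tree, and the classify helper below are illustrative choices, not something prescribed by the slides:

```python
def classify(tree, instance):
    """Sort an instance down the tree: at each internal node test one
    attribute and follow the branch for the instance's value of it,
    until a leaf is reached; the leaf is the classification."""
    while isinstance(tree, dict):
        attribute = next(iter(tree))          # the attribute tested at this node
        tree = tree[attribute][instance[attribute]]
    return tree

# A toy two-level tree: test Sky first, then (for cloudy days) Windy
toy_tree = {"Sky": {"Clear": "PlayOutside",
                    "Cloudy": {"Windy": {"Yes": "StayIn", "No": "PlayOutside"}}}}
print(classify(toy_tree, {"Sky": "Cloudy", "Windy": "Yes"}))  # StayIn
```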
Decision Tree Template
Drawn top-to-bottom or left-to-right
Top (or left-most) node = Root Node
Descendent node(s) = Child Node(s)
Bottom (or right-most) node(s) = Leaf Node(s)
Unique path from root to each leaf = Rule

Decision Tree for PlayTennis
When to Consider Decision
Trees
Instances describable by attribute-value pairs
Target function is discrete valued
Disjunctive hypothesis may be required
Possibly noisy training data

Examples (Classification problems):


Equipment or medical diagnosis
Credit risk analysis
Top-Down Induction of
Decision Trees
Entropy (1)
Entropy measures the impurity of a collection of examples S:
Entropy(S) = -p(+)*log2 p(+) - p(-)*log2 p(-)
where p(+) is the proportion of positive examples and p(-) is the
proportion of negative examples in S
If the data are mixed (both classes in similar proportions), entropy is high
If the data are mostly one class, entropy is low
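The binary entropy above is easy to compute directly; a minimal sketch in Python (the function name and example calls are mine; the numbers come from the slides that follow):

```python
import math

def entropy(pos, neg):
    """Entropy of a collection with pos positive and neg negative examples."""
    total = pos + neg
    if pos == 0 or neg == 0:
        return 0.0          # a pure collection has zero entropy
    p_pos, p_neg = pos / total, neg / total
    return -p_pos * math.log2(p_pos) - p_neg * math.log2(p_neg)

print(round(entropy(9, 5), 3))    # 0.94   (the PlayTennis sample [9+, 5-])
print(round(entropy(29, 35), 4))  # 0.9937 (the [29+, 35-] example below)
```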


Entropy (2)
Information Gain

Information gain measures how well an attribute separates the examples;
the higher the gain, the better the split.


An Example
Consider a sample S = [29+, 35-] split by a boolean attribute A1 into
True = [21+, 5-] and False = [8+, 30-].

The entropy of S is computed as follows:
E(S) = -29/(29+35)*log2(29/(29+35)) - 35/(35+29)*log2(35/(35+29))
     = 0.9937

The entropy of the True branch:
E(TRUE) = -21/(21+5)*log2(21/(21+5)) - 5/(5+21)*log2(5/(5+21))
        = 0.7063

The entropy of the False branch:
E(FALSE) = -8/(8+30)*log2(8/(8+30)) - 30/(30+8)*log2(30/(30+8))
         = 0.7426

Compute Information Gain
Gain(Sample, Attribute) or Gain(S,A) is the expected
reduction in entropy due to sorting S on attribute A:
Gain(S,A) = Entropy(S) - Σ_{v ∈ Values(A)} |Sv|/|S| * Entropy(Sv)
So, for the previous example, the information gain is
calculated as:
Gain(S,A1) = E(S) - (21+5)/(29+35) * E(TRUE) - (8+30)/(29+35) * E(FALSE)
           = 0.9937 - 26/64 * 0.7063 - 38/64 * 0.7426
           = 0.266
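The same calculation as a runnable sketch; entropy is as in the earlier sketch, and the gain helper name is mine:

```python
import math

def entropy(pos, neg):
    total = pos + neg
    if pos == 0 or neg == 0:
        return 0.0
    return -(pos/total) * math.log2(pos/total) - (neg/total) * math.log2(neg/total)

def gain(parent, branches):
    """Entropy of the parent collection minus the size-weighted entropy of
    the branches; parent and each branch are (positive, negative) counts."""
    total = sum(p + n for p, n in branches)
    remainder = sum((p + n) / total * entropy(p, n) for p, n in branches)
    return entropy(*parent) - remainder

# S = [29+, 35-] split by A1 into True = [21+, 5-] and False = [8+, 30-]
print(round(gain((29, 35), [(21, 5), (8, 30)]), 3))  # 0.266
```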
Training Examples
For the target concept PlayTennis

Day Outlook Temperature Humidity Wind PlayTennis

D1 Sunny Hot High Weak No


D2 Sunny Hot High Strong No
D3 Overcast Hot High Weak Yes
D4 Rain Mild High Weak Yes
D5 Rain Cool Normal Weak Yes
D6 Rain Cool Normal Strong No
D7 Overcast Cool Normal Weak Yes
D8 Sunny Mild High Weak No
D9 Sunny Cool Normal Weak Yes
D10 Rain Mild Normal Strong Yes
D11 Sunny Mild Normal Strong Yes
D12 Overcast Mild High Strong Yes
D13 Overcast Hot Normal Weak Yes
D14 Rain Mild High Strong No
To Build a Decision Tree
We want to build a decision tree for the tennis
matches
The schedule of matches depends on the weather
(Outlook, Temperature, Humidity, and Wind)
We apply what we know to build a decision tree
based on this table
Example
Calculating the information gains for each of
the weather attributes (a cross-checking sketch
follows this list):
For the Wind
For the Temperature
For the Humidity
For the Outlook
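The four gains below can be cross-checked by counting directly from the table; a sketch in Python (DATA, ATTRS, and the helper names are mine):

```python
import math
from collections import Counter

# The 14 PlayTennis examples: (Outlook, Temperature, Humidity, Wind, PlayTennis)
DATA = [
    ("Sunny", "Hot", "High", "Weak", "No"),        ("Sunny", "Hot", "High", "Strong", "No"),
    ("Overcast", "Hot", "High", "Weak", "Yes"),    ("Rain", "Mild", "High", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Weak", "Yes"),     ("Rain", "Cool", "Normal", "Strong", "No"),
    ("Overcast", "Cool", "Normal", "Weak", "Yes"), ("Sunny", "Mild", "High", "Weak", "No"),
    ("Sunny", "Cool", "Normal", "Weak", "Yes"),    ("Rain", "Mild", "Normal", "Strong", "Yes"),
    ("Sunny", "Mild", "Normal", "Strong", "Yes"),  ("Overcast", "Mild", "High", "Strong", "Yes"),
    ("Overcast", "Hot", "Normal", "Weak", "Yes"),  ("Rain", "Mild", "High", "Strong", "No"),
]
ATTRS = ["Outlook", "Temperature", "Humidity", "Wind"]

def entropy(labels):
    return -sum(c/len(labels) * math.log2(c/len(labels)) for c in Counter(labels).values())

def gain(rows, attr_index):
    """Entropy of the whole set minus the size-weighted entropy of each value's subset."""
    by_value = {}
    for r in rows:
        by_value.setdefault(r[attr_index], []).append(r[-1])
    remainder = sum(len(v)/len(rows) * entropy(v) for v in by_value.values())
    return entropy([r[-1] for r in rows]) - remainder

for i, name in enumerate(ATTRS):
    print(name, round(gain(DATA, i), 3))
# Outlook 0.247, Temperature 0.029, Humidity 0.152, Wind 0.048 (matching the slides up to rounding)
```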
For the Wind
S=[9+,5-]
E = -(9/14)*log2(9/14) - (5/14)*log2(5/14) = 0.940
Wind

Weak Strong

[6+, 2-] [3+, 3-]

E=0.811 E=1.0

Gain(S,Wind):
=0.940 - (8/14)*0.811 - (6/14)*1.0
=0.048
For the Temperature
S=[9+,5-]
E=0.940
Temperature

Hot Mild Cool

[2+, 2-] [4+, 2-] [3+, 1-]


E=1.0 E=0.918 E=0.811

Gain(S,Temperature)
=0.940 - (4/14)*1.0 - (6/14)*0.918 - (4/14)*0.811
=0.029
For the Humidity
S=[9+,5-]
E=0.940
Humidity

High Normal

[3+, 4-] [6+, 1-]


E=0.985 E=0.592

Gain(S,Humidity)
=0.940 - (7/14)*0.985 - (7/14)*0.592
=0.151
For the Outlook
S=[9+,5-]
E=0.940
Outlook

Sunny Overcast Rain

[2+, 3-] [4+, 0-] [3+, 2-]


E=0.971 E=0.0 E=0.971

Gain(S,Outlook)
=0.940 - (5/14)*0.971 - (4/14)*0.0 - (5/14)*0.971
=0.247
Next
Outlook is the winner for the root.
Now that we have discovered the root
of our decision tree, we must
recursively find the nodes that should
go below Sunny, Overcast, and Rain
(a recursive sketch follows).
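A minimal sketch of this recursion (ID3-style), assuming DATA, ATTRS, and the gain(rows, attr_index) helper from the earlier table sketch are in scope; build_tree is my name for it:

```python
from collections import Counter
# Assumes DATA, ATTRS and gain(rows, attr_index) from the earlier table sketch.

def build_tree(rows, attr_indices):
    """Pick the attribute with the highest gain, split on it, and recurse
    until a branch is pure (or no attributes are left)."""
    labels = [r[-1] for r in rows]
    if len(set(labels)) == 1:                    # pure -> leaf
        return labels[0]
    if not attr_indices:                         # nothing left to test -> majority leaf
        return Counter(labels).most_common(1)[0][0]
    best = max(attr_indices, key=lambda i: gain(rows, i))
    subtree = {}
    for value in sorted({r[best] for r in rows}):
        subset = [r for r in rows if r[best] == value]
        subtree[value] = build_tree(subset, [i for i in attr_indices if i != best])
    return {ATTRS[best]: subtree}

print(build_tree(DATA, list(range(len(ATTRS)))))
# {'Outlook': {'Overcast': 'Yes',
#              'Rain':  {'Wind': {'Strong': 'No', 'Weak': 'Yes'}},
#              'Sunny': {'Humidity': {'High': 'No', 'Normal': 'Yes'}}}}
```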
For the Rain ^ Humidity
S=[3+,2-]
E=0.971
Rain

Humidity

High Normal
[1+, 1-] [2+, 1-]
E=1.0 E=0.918

Gain(S,Rain^Humidity)
=0.971 - (2/5)*1.0 - (3/5)*0.918
=0.02
For the Rain ^ Temperature
S=[3+,2-] Rain
E=0.971
Temperature

Hot Mild Cool

[0+, 0-] [2+, 1-] [1+, 1-]


E=0.918 E=1.0

Gain(S,Rain ^ Temperature)
=0.971 - (3/5)*0.918 - (2/5)*1.0
=0.02
For the Rain ^ Wind
S=[3+,2-] Rain
E=0.971
Wind

Weak Strong
[3+, 0-] [0+, 2-]
E=0.0 E=0.0

Gain(S,Rain^Wind)
=0.971 - (3/5)*0.0 - (2/5)*0.0
=0.971
Next
Wind is the winner for the Rain branch.
Overcast has only Yes examples.
Outlook

Sunny Overcast Rain

Yes Wind
[D3,D7,D12,D13]

Strong Weak

No Yes
[D6,D14] [D4,D5,D10]
Then
For the Sunny branch, we consider
Temperature and Humidity.
For the Sunny ^ Temperature
S=[2+,3-] Sunny
E=0.971
Temperature

Hot Mild Cool

[0+, 2-] [1+, 1-] [1+, 0-]


E=0.0 E=1.0 E=0.0

Gain(S,Sunny ^ Temperature)
=0.971 - (2/5)*0.0 - (2/5)*1.0 - (1/5)*0.0
=0.571
For the Sunny ^ Humidity
S=[2+,3-] Sunny
E=0.971
Humidity

High Normal
[0+, 3-] [2+, 0-]
E=0.0 E=0.0

Gain(S,Sunny ^ Humidity)
=0.971 - (3/5)*0.0 - (2/5)*0.0
=0.971
Then
For the Sunny branch, the winner is
Humidity (cross-checked in the sketch below).
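A short sketch confirming the choice, reusing DATA, ATTRS, and gain(rows, attr_index) from the earlier table sketch:

```python
sunny = [r for r in DATA if r[0] == "Sunny"]      # days D1, D2, D8, D9, D11
for i, name in enumerate(ATTRS[1:], start=1):     # Outlook itself is already used
    print(name, round(gain(sunny, i), 3))
# Temperature 0.571, Humidity 0.971, Wind 0.02 -> Humidity wins
```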

Complete tree
Then here is the complete tree:
Outlook

Sunny Overcast Rain

Humidity Yes Wind


[D3,D7,D12,D13]

High Normal Strong Weak

No Yes No Yes

[D1,D2] [D8,D9,D11] [D6,D14] [D4,D5,D10]
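The finished tree maps directly onto a nested dict, which can be fed to the classify helper sketched after the representation slide; the name TREE is mine:

```python
# The complete PlayTennis tree above, written as a nested dict
TREE = {"Outlook": {
    "Sunny":    {"Humidity": {"High": "No", "Normal": "Yes"}},
    "Overcast": "Yes",
    "Rain":     {"Wind": {"Strong": "No", "Weak": "Yes"}},
}}
# e.g. classify(TREE, {"Outlook": "Rain", "Wind": "Strong"}) -> "No"  (day D14)
```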


Decision Tree Induction
Hypothesis Space Search by ID3
Hypothesis space is complete
Target function surely in there
Only outputs a single hypothesis
No backtracking
Local minima
Statistically-based search choices
Robust to noisy data
Inductive bias: prefer shortest tree
C4.5
C4.5 is an algorithm used to generate a decision tree
developed by Ross Quinlan
C4.5 made a number of improvements to ID3. Some
of these are:
Handling both continuous and discrete attributes
Handling training data with missing attribute values
Handling attributes with differing costs
Pruning trees after creation
Quinlan went on to create C5.0 and See5 (C5.0 for
Unix/Linux, See5 for Windows) which he markets
commercially.
The Over-fitting Issue
Over-fitting is caused by creating
decision rules that work accurately on
the training set but are based on an
insufficient quantity of samples.
As a result, these decision rules may
not work well in more general cases.
Overfitting in Decision Trees

Reduced-Error Pruning
Split the data into a training set and a validation set; repeatedly prune the
node whose removal most improves accuracy on the validation set, and stop
when further pruning hurts.
Rule Post-Pruning
Convert tree to equivalent set of rules (sketched below)
Prune each rule by removing any preconditions that result in improving
its estimated accuracy
Sort the pruned rules by their estimated accuracy, and consider them
in this sequence when classifying subsequent instances

Perhaps most frequently used method
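The first step (one rule per root-to-leaf path) can be sketched as follows; pruning would then drop any precondition whose removal improves estimated accuracy. It assumes the nested-dict TREE from the sketch after the complete tree; tree_to_rules is my name:

```python
def tree_to_rules(tree, conditions=()):
    """Return one (preconditions, label) rule per root-to-leaf path."""
    if not isinstance(tree, dict):                       # leaf
        return [(list(conditions), tree)]
    attribute = next(iter(tree))
    rules = []
    for value, subtree in tree[attribute].items():
        rules += tree_to_rules(subtree, conditions + ((attribute, value),))
    return rules

for preconditions, label in tree_to_rules(TREE):
    print(" AND ".join(f"{a}={v}" for a, v in preconditions), "->", label)
# e.g. Outlook=Sunny AND Humidity=High -> No
```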


Continuous Valued Attributes
Create a discrete attribute to test the continuous one

There are two candidate thresholds


The information gain can be computed for
each of the candidate attributes,
Temperature>54 and Temperature>85, and the
best can be selected (Temperature>54)
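The candidate thresholds arise as midpoints where the class label changes along the sorted continuous values. A sketch using example values consistent with the two thresholds named above (the specific numbers 40, 48, 60, 72, 80, 90 and their labels are an assumption of mine):

```python
import math
from collections import Counter

temps  = [40, 48, 60, 72, 80, 90]
labels = ["No", "No", "Yes", "Yes", "Yes", "No"]   # PlayTennis for each temperature

def entropy(ls):
    return -sum(c/len(ls) * math.log2(c/len(ls)) for c in Counter(ls).values())

# Candidate thresholds: midpoints between adjacent values whose label changes
candidates = [(a + b) / 2 for a, b, la, lb in zip(temps, temps[1:], labels, labels[1:]) if la != lb]
print(candidates)                                   # [54.0, 85.0]

for t in candidates:                                # gain of the boolean test Temperature > t
    above = [l for x, l in zip(temps, labels) if x > t]
    below = [l for x, l in zip(temps, labels) if x <= t]
    g = entropy(labels) - len(above)/len(labels)*entropy(above) - len(below)/len(labels)*entropy(below)
    print(t, round(g, 3))                           # Temperature > 54 has the higher gain
```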
Attributes with many Values
Problems:
If an attribute has many values, Gain will select it
Imagine using the attribute Date. It would have the highest
information gain of any of the attributes, but the resulting
decision tree would not be useful.
Missing Attribute Values
Attributes with Costs
Consider:
Medical diagnosis: BloodTest has a cost of 150 dollars
How can we learn a consistent tree with low
expected cost?
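One published heuristic, attributed to Tan and Schlimmer, divides the squared gain by the attribute's cost so that informative but cheap attributes are preferred. A sketch assuming DATA, ATTRS, and gain(rows, attr_index) from the earlier table sketch; the costs here are made-up numbers:

```python
# Illustrative measurement costs per attribute (invented for this sketch)
COSTS = {"Outlook": 1.0, "Temperature": 5.0, "Humidity": 3.0, "Wind": 1.0}

def cost_sensitive_score(rows, attr_index):
    """Tan & Schlimmer-style heuristic: Gain(S,A)^2 / Cost(A)."""
    g = gain(rows, attr_index)
    return g * g / COSTS[ATTRS[attr_index]]

best = max(range(len(ATTRS)), key=lambda i: cost_sensitive_score(DATA, i))
print(ATTRS[best])   # Outlook still wins with these particular costs
```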
Decision Tree Advantages
1. Easy to understand
2. Map nicely to a set of business rules
3. Applied to real problems
4. Make no prior assumptions about the data
5. Able to process both numerical and categorical data
Decision Tree Disadvantages
1. Output attribute must be categorical
2. Limited to one output attribute
3. Decision tree algorithms are unstable
4. Trees created from numeric datasets can be complex
Conclusion
Decision Tree Learning:
Simple to understand and interpret
Requires little data preparation
Able to handle both numerical and categorical
data
Uses a white box model
Possible to validate a model using statistical
tests
Robust; performs well with large data in a
short time
