11 Decision Tree
If my parents are visiting, we'll go to the cinema. If not, then if it's sunny I'll play tennis; but if it's windy and I'm rich, I'll go shopping; if it's windy and I'm poor, I'll go to the cinema; and if it's rainy, I'll stay in.
The leaves of the tree hold the final decisions (categories). To read the tree, follow a path from the root node down to a leaf, treating each branch taken as a condition. In our example: if no_parents and sunny_day, then play_tennis.
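As an illustration (not from the notes; the function name and string values are our own), the example tree can be written out as the nested if-statements it encodes:

def weekend_decision(parents_visiting, weather, money):
    """A hypothetical encoding of the example tree as nested if-statements.
    Attribute values are strings such as "sunny"/"windy"/"rainy" and "rich"/"poor"."""
    if parents_visiting:
        return "cinema"
    if weather == "sunny":
        return "play_tennis"
    if weather == "windy":
        return "shopping" if money == "rich" else "cinema"
    if weather == "rainy":
        return "stay_in"
    raise ValueError("unexpected weather value: " + weather)

# Reading one path from root to leaf: no parents and a sunny day -> play tennis.
print(weekend_decision(parents_visiting=False, weather="sunny", money="rich"))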
Remember: we need examples that have been put into categories, and we also need attributes for the examples. Attributes describe the examples (background knowledge), and each attribute takes only a finite set of values.
The key question in decision tree learning is which nodes to put in which positions, including the root node and the leaf nodes.
ID3
The ID3 algorithm decides which attribute to test at each node by choosing the attribute with the highest information gain.
In order to define information gain precisely, we begin by defining a measure commonly used in information theory, called entropy, which characterizes the (im)purity of an arbitrary collection of examples.
We want a notion of impurity in data. Imagine a set of boxes with balls in them: if all the balls are in one box, the arrangement is nicely ordered, so it scores low for entropy. A box holding very few of the examples contributes little, and a box holding almost all of the examples also contributes little, so such arrangements score low for entropy.
Entropy - Formulae
Given a set of examples S, for a binary categorisation into positive and negative examples:
Entropy(S) = -p+ log2(p+) - p- log2(p-)
where p+ and p- are the proportions of positive and negative examples in S.
For examples falling into categorisations c1 to cn:
Entropy(S) = sum over i = 1..n of -pi log2(pi)
where pi is the proportion of examples in S belonging to category ci.
Entropy - Explanation
When pi is near to 0, the term -pi log2(pi) is close to 0 (pi shrinks faster than log2(pi) grows); when pi is near to 1, log2(pi) is close to 0, so the term is again close to 0. A category therefore contributes most to the entropy when its proportion pi lies somewhere in between.
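As a concrete illustration (a sketch of our own, not part of the notes; the function name is ours), the entropy of a collection of examples can be computed directly from their category labels:

import math
from collections import Counter

def entropy(labels):
    """Entropy(S) = sum over categories ci of -pi * log2(pi),
    where `labels` lists the categorisation of each example in S."""
    total = len(labels)
    proportions = [count / total for count in Counter(labels).values()]
    return sum(-p * math.log2(p) for p in proportions)

print(entropy(["cinema"] * 10))               # 0.0: all examples in one category
print(entropy(["pos", "pos", "neg", "neg"]))  # 1.0: an even binary split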
Information Gain
Given a set of examples S and an attribute A: A can take values v1, ..., vm. Let Sv = {examples in S which take value v for attribute A}. Calculate:
Gain(S, A) = Entropy(S) - sum over v in {v1, ..., vm} of (|Sv|/|S|) * Entropy(Sv)
This estimates the reduction in entropy we would get if we knew the value of attribute A for the examples in S.
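As a companion sketch (names are ours), information gain can be computed by reusing the entropy function defined in the previous snippet; examples are assumed to be dicts of attribute-value pairs:

def information_gain(examples, attribute, category="decision"):
    """Gain(S, A) = Entropy(S) - sum over values v of |Sv|/|S| * Entropy(Sv).

    `examples` is a list of dicts mapping attribute names (and the
    category key) to their values; entropy() is the helper above."""
    labels = [e[category] for e in examples]
    result = entropy(labels)                       # Entropy(S)
    for value in {e[attribute] for e in examples}:
        sv = [e[category] for e in examples if e[attribute] == value]
        result -= (len(sv) / len(examples)) * entropy(sv)
    return result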
Suppose S contains four examples: one positive example and three negative examples. The positive example is s1, and the negative examples are s2, s3 and s4. And suppose attribute A can take the values v1, v2 and v3: say s1 and s2 take value v2, s3 takes value v3, and s4 takes value v1.
Recall that
Entropy(S) = -p+ log2(p+) - p- log2(p-)
Hence
Entropy(S) = -(1/4) log2(1/4) - (3/4) log2(3/4) = 0.811
With the assignments above, Sv1 = {s4}, Sv2 = {s1, s2} and Sv3 = {s3}. Now,
(|Sv1|/|S|) * Entropy(Sv1) = (1/4) * (-(0/1) log2(0/1) - (1/1) log2(1/1)) = (1/4) * (0 - 0) = 0
Similarly, (|Sv2|/|S|) * Entropy(Sv2) = (2/4) * 1 = 0.5 and (|Sv3|/|S|) * Entropy(Sv3) = (1/4) * 0 = 0.
Final Calculation
So, we add up the three calculations and subtract them from the overall entropy of S. The final answer for information gain is:
Gain(S, A) = 0.811 - (0 + 0.5 + 0) = 0.311
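For anyone who wants to check the arithmetic, a few lines of Python (our own sketch) reproduce both numbers:

import math

# Entropy of S: one positive and three negative examples.
entropy_s = -(1/4) * math.log2(1/4) - (3/4) * math.log2(3/4)
print(round(entropy_s, 3))                # 0.811

# Weighted entropies of Sv1 = {s4}, Sv2 = {s1, s2}, Sv3 = {s3}.
weighted_sum = (1/4) * 0.0 + (2/4) * 1.0 + (1/4) * 0.0
print(round(entropy_s - weighted_sum, 3)) # 0.311, the information gain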
The full ID3 algorithm takes a set of examples, S, described by a set of attributes Ai and categorised into categories cj:
1. Choose the root node to be the attribute A such that A scores highest for information gain relative to S.
2. For each value v that A can take, draw a branch from the node and calculate Sv, the examples reaching that branch.
3. For each branch:
- If Sv is empty, find the default category (which contains the most examples from S), then put that category as a leaf node in the tree.
- If every example in Sv belongs to the same category, then put that category as a leaf node in the tree.
- Otherwise, remove A from the attributes which can be put into nodes, replace S with Sv, find the new attribute A scoring best for Gain(S, A), put it at a new node, and start again at part 2.
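The following self-contained Python sketch (our own; function names, the dict-based representation of examples and the handling of exhausted attributes are our choices) implements the steps above:

import math
from collections import Counter

def entropy(labels):
    """Entropy of a list of categorisations."""
    total = len(labels)
    return sum(-(c / total) * math.log2(c / total)
               for c in Counter(labels).values())

def gain(examples, attribute, category="decision"):
    """Gain(S, A) = Entropy(S) - sum over v of |Sv|/|S| * Entropy(Sv)."""
    labels = [e[category] for e in examples]
    g = entropy(labels)
    for value in {e[attribute] for e in examples}:
        sv = [e[category] for e in examples if e[attribute] == value]
        g -= (len(sv) / len(examples)) * entropy(sv)
    return g

def id3(examples, attributes, category="decision"):
    """Build a decision tree as nested dicts {attribute: {value: subtree}};
    leaves are category names."""
    labels = [e[category] for e in examples]
    # All examples in one category: that category becomes a leaf node.
    if len(set(labels)) == 1:
        return labels[0]
    # No attributes left to test: fall back to the default (most common) category.
    if not attributes:
        return Counter(labels).most_common(1)[0][0]
    # Part 1: choose the attribute scoring highest for information gain.
    best = max(attributes, key=lambda a: gain(examples, a, category))
    remaining = [a for a in attributes if a != best]
    tree = {best: {}}
    # Parts 2 and 3: one branch per observed value of the chosen attribute.
    # (A value never seen in `examples` would give an empty Sv and take the
    # default category; such branches simply do not appear in this sketch.)
    for value in {e[best] for e in examples}:
        sv = [e for e in examples if e[best] == value]
        tree[best][value] = id3(sv, remaining, category)
    return tree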
Explanatory Diagram
A Worked Example
Weekend   Weather   Parents   Money   Decision (Category)
W1        Sunny     Yes       Rich    Cinema
W2        Sunny     No        Rich    Tennis
W3        Windy     Yes       Rich    Cinema
W4        Rainy     Yes       Poor    Cinema
W5        Rainy     No        Rich    Stay in
W6        Rainy     Yes       Poor    Cinema
W7        Windy     No        Poor    Cinema
W8        Windy     No        Rich    Shopping
W9        Windy     Yes       Rich    Cinema
W10       Sunny     No        Rich    Tennis
First, the entropy of the whole set: the categories are cinema (6 examples), tennis (2), shopping (1) and stay in (1), so
Entropy(S) = -(6/10) log2(6/10) - (2/10) log2(2/10) - (1/10) log2(1/10) - (1/10) log2(1/10) = 1.571 (see notes)
For all the attributes we currently have available, we calculate the information gain:
Gain(S, weather) = 0.7
Gain(S, parents) = 0.61
Gain(S, money) = 0.2816
Because weather gives us the biggest information gain, we put it at the root node of the tree.
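To check these numbers, the entropy and gain helpers from the ID3 sketch above can be run on the table (the WEEKENDS name and the dict keys are our own encoding of the data):

WEEKENDS = [  # the table above, one dict per weekend W1..W10
    {"weather": "Sunny", "parents": "Yes", "money": "Rich", "decision": "Cinema"},
    {"weather": "Sunny", "parents": "No",  "money": "Rich", "decision": "Tennis"},
    {"weather": "Windy", "parents": "Yes", "money": "Rich", "decision": "Cinema"},
    {"weather": "Rainy", "parents": "Yes", "money": "Poor", "decision": "Cinema"},
    {"weather": "Rainy", "parents": "No",  "money": "Rich", "decision": "Stay in"},
    {"weather": "Rainy", "parents": "Yes", "money": "Poor", "decision": "Cinema"},
    {"weather": "Windy", "parents": "No",  "money": "Poor", "decision": "Cinema"},
    {"weather": "Windy", "parents": "No",  "money": "Rich", "decision": "Shopping"},
    {"weather": "Windy", "parents": "Yes", "money": "Rich", "decision": "Cinema"},
    {"weather": "Sunny", "parents": "No",  "money": "Rich", "decision": "Tennis"},
]

print(round(entropy([w["decision"] for w in WEEKENDS]), 3))  # 1.571
for attribute in ("weather", "parents", "money"):
    print(attribute, round(gain(WEEKENDS, attribute), 4))
# weather ~0.70, parents ~0.61, money ~0.28: weather goes at the root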
Having put weather at the root, we consider the branch for weather = sunny. In particular, we look at the examples which take the value prescribed by the branch: Ssunny = {W1, W2, W10}, whose categorisations are cinema, tennis and tennis. What does the algorithm say? Ssunny is neither empty nor all of one category, so we choose a new attribute for this node. It cannot be weather, of course, as we've already used that:
Gain(Ssunny, parents) = 0.918
Gain(Ssunny, money) = 0
So parents is the attribute tested below the sunny branch.
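Continuing the same sketch (it reuses WEEKENDS and the gain helper from the previous block), the sunny branch comes out as in the notes:

sunny = [w for w in WEEKENDS if w["weather"] == "Sunny"]   # W1, W2, W10
print(round(gain(sunny, "parents"), 3))   # 0.918
print(round(gain(sunny, "money"), 3))     # 0.0 - all three weekends are Rich
# id3(WEEKENDS, ["weather", "parents", "money"]) builds the whole tree this way.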
Avoiding Overfitting
Decision trees can be grown until they classify the training examples perfectly, but such trees may overfit the training data and generalise poorly to unseen examples.
There are two standard avoidance methods:
Method 1: stop growing the tree before it reaches perfection on the training data.
Method 2: grow the tree to perfection, then prune it back afterwards.
Most practical systems use the second avoidance method: post-pruning has generally been found more successful in practice.
To summarise the requirements: background concepts describe the examples in terms of attribute-value pairs, and each attribute takes only a finite number of values. The concept to be learned (the target function) has discrete output values. Decision tree learning is well suited to categorisation problems of this kind.