Algorithms: Decision Trees
Tilani Gunawardena
Decision Tree
• A decision tree builds classification or regression models in the form of a tree structure.
• It breaks down a dataset into smaller and smaller subsets while, at the same time, an associated decision tree is incrementally developed.
• The final result is a tree with decision nodes and leaf nodes.
– A decision node (e.g. Outlook) has two or more branches (e.g. Sunny, Overcast and Rainy).
– A leaf node holds a classification (e.g. Play=Yes or Play=No).
– The topmost decision node in a tree, corresponding to the best predictor, is called the root node.
• Decision trees can handle both categorical and numerical data.
Decision Tree Learning Algorithms
• ID3 (Iterative Dichotomiser 3)
• C4.5 (successor of ID3)
• CART (Classification And Regression Tree)
• CHAID (CHi-squared Automatic Interaction Detector): performs multi-level splits when computing classification trees
• MARS: extends decision trees to handle numerical data better
How it works
• The core algorithm for building decision trees, called ID3 and developed by J.R. Quinlan, employs a top-down, greedy search through the space of possible branches with no backtracking.
• ID3 uses Entropy and Information Gain to construct a decision tree.
DIVIDE-AND-CONQUER (CONSTRUCTING DECISION TREES)
• Divide-and-conquer approach (strategy: top-down)
– First: select an attribute for the root node and create a branch for each possible attribute value.
– Then: split the instances into subsets, one for each branch extending from the node.
– Finally: repeat recursively for each branch, using only the instances that reach that branch.
• Stop if all instances have the same class (see the sketch below).
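A minimal Python sketch of this divide-and-conquer strategy (assuming each instance is a dict of attribute → value and that the attribute-selection criterion, e.g. the information gain introduced below, is passed in as a function; all names here are illustrative, not from the slides):

```python
def build_tree(rows, attributes, target, choose_attribute):
    """Top-down, divide-and-conquer construction of a decision tree.

    rows: list of dicts mapping attribute name -> value
    choose_attribute: selection criterion (e.g. information gain)
    """
    labels = [r[target] for r in rows]
    # Stop if all instances that reach this node have the same class
    # (or no attributes are left); return a leaf with the majority class.
    if len(set(labels)) == 1 or not attributes:
        return max(set(labels), key=labels.count)
    # First: select an attribute for this node.
    best = choose_attribute(rows, attributes, target)
    tree = {best: {}}
    # Then: split the instances into subsets, one per branch (attribute value),
    # and finally recurse on each subset with the remaining attributes.
    for value in set(r[best] for r in rows):
        subset = [r for r in rows if r[best] == value]
        remaining = [a for a in attributes if a != best]
        tree[best][value] = build_tree(subset, remaining, target, choose_attribute)
    return tree
```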
Which attribute to select?
• Entropy = −Σᵢ pᵢ log₂ pᵢ
Entropy
• A decision tree is built top-down from a root node and involves partitioning the data into subsets that contain instances with similar (homogeneous) values.
• The ID3 algorithm uses entropy to calculate the homogeneity of a sample.
• If the sample is completely homogeneous the entropy is zero, and if the sample is equally divided it has an entropy of one (see the quick check below).
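A quick numeric check of these two extreme cases (a sketch; the entropy helper below is ours, written directly from the formula −Σᵢ pᵢ log₂ pᵢ):

```python
import math

def entropy(probabilities):
    """Entropy = -sum(p_i * log2(p_i)), with 0 * log2(0) treated as 0."""
    return sum(-p * math.log2(p) for p in probabilities if p > 0)

print(entropy([1.0]))        # completely homogeneous sample -> 0.0
print(entropy([0.5, 0.5]))   # equally divided 2-class sample -> 1.0
```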
2-Class Cases: Minimum Impurity
• What is the entropy of a group in which all examples belong to the same class?
– entropy = −1 log₂(1) = 0
Information Gain
• We want to determine which attribute in a given set of training feature vectors is most useful for discriminating between the classes to be learned.
Calculating Information Gain
Information Gain = entropy(parent) − [weighted average entropy(children)]
gain(population) = info([14,16]) − info([13,4],[1,12])

Entire population (30 instances):
parent entropy = −(14/30) log₂(14/30) − (16/30) log₂(16/30) = 0.996

Child node 1 (17 instances):
child entropy = −(13/17) log₂(13/17) − (4/17) log₂(4/17) = 0.787

Child node 2 (13 instances):
child entropy = −(1/13) log₂(1/13) − (12/13) log₂(12/13) = 0.391

(Weighted) average entropy of children = (17/30) × 0.787 + (13/30) × 0.391 = 0.615

Information Gain = 0.996 − 0.615 = 0.38
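These numbers can be reproduced directly from the class counts shown above (a small sketch, not part of the slides):

```python
import math

def entropy(counts):
    """Entropy of a node from its class counts, e.g. [14, 16]."""
    total = sum(counts)
    return sum(-(c / total) * math.log2(c / total) for c in counts if c)

parent = [14, 16]                  # entire population: 30 instances
children = [[13, 4], [1, 12]]      # 17 and 13 instances

parent_entropy = entropy(parent)                                          # ≈ 0.996
avg_children = sum(sum(c) / sum(parent) * entropy(c) for c in children)  # ≈ 0.615
print(round(parent_entropy - avg_children, 2))                            # 0.38
```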
Outlook = Sunny:
info([2,3]) = entropy(2/5, 3/5) = −2/5 log₂(2/5) − 3/5 log₂(3/5) = 0.971 bits

Outlook = Overcast:
info([4,0]) = entropy(1, 0) = −1 log₂(1) − 0 log₂(0) = 0 bits

Outlook = Rainy:
info([2,3]) = entropy(3/5, 2/5) = −3/5 log₂(3/5) − 2/5 log₂(2/5) = 0.971 bits

Note: log(0) is normally undefined, but we evaluate 0 × log(0) as zero.
Humidity = Normal:
info([6,1]) = entropy(6/7, 1/7) = −6/7 log₂(6/7) − 1/7 log₂(1/7) = 0.191 + 0.401 = 0.592 bits
Outlook: info(nodes) = info([2,3],[4,0],[3,2]) = 0.693 bits; gain = 0.940 − 0.693 = 0.247 bits
Temperature: info(nodes) = info([2,2],[4,2],[3,1]) = 0.911 bits; gain = 0.940 − 0.911 = 0.029 bits
Humidity: info(nodes) = info([3,4],[6,1]) = 0.788 bits; gain = 0.940 − 0.788 = 0.152 bits
Windy: info(nodes) = info([6,2],[3,3]) = 0.892 bits; gain = 0.940 − 0.892 = 0.048 bits
• Information gain tells us how important a given attribute of the feature vectors is.
• We will use it to decide the ordering of attributes in the nodes of a decision tree.
• Constructing a decision tree is all about finding the attribute that returns the highest information gain (i.e. the most homogeneous branches), as in the check below.
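The root-level gains above can be checked from the class counts per attribute value in the weather data that follows (a sketch; the count lists are read off the table, the helper names are ours):

```python
import math

def entropy(counts):
    total = sum(counts)
    return sum(-(c / total) * math.log2(c / total) for c in counts if c)

def gain(parent, children):
    """Information gain = entropy(parent) - weighted average entropy(children)."""
    n = sum(parent)
    return entropy(parent) - sum(sum(c) / n * entropy(c) for c in children)

parent = [9, 5]  # 14 instances: 9 Play=Yes, 5 Play=No
print(round(gain(parent, [[2, 3], [4, 0], [3, 2]]), 3))  # Outlook     -> 0.247
print(round(gain(parent, [[2, 2], [4, 2], [3, 1]]), 3))  # Temperature -> 0.029
print(round(gain(parent, [[3, 4], [6, 1]]), 3))          # Humidity    -> 0.152
print(round(gain(parent, [[6, 2], [3, 3]]), 3))          # Windy       -> 0.048
```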
Continuing to split
• gain(Outlook) =0.247 bits
• gain(Temperature ) = 0.029 bits
• gain(Humidity ) = 0.152 bits
• gain(Windy ) = 0.048 bits
Full dataset:

Outlook Temp Humidity Windy Play
Sunny Hot High False No
Sunny Hot High True No
Overcast Hot High False Yes
Rainy Mild High False Yes
Rainy Cool Normal False Yes
Rainy Cool Normal True No
Overcast Cool Normal True Yes
Sunny Mild High False No
Sunny Cool Normal False Yes
Rainy Mild Normal False Yes
Sunny Mild Normal True Yes
Overcast Mild High True Yes
Overcast Hot Normal False Yes
Rainy Mild High True No

Outlook = Sunny subset:

Outlook Temp Humidity Windy Play
Sunny Hot High False No
Sunny Hot High True No
Sunny Mild High False No
Sunny Cool Normal False Yes
Sunny Mild Normal True Yes

Same subset with the Outlook column removed:

Temp Humidity Windy Play
Hot High False No
Hot High True No
Mild High False No
Cool Normal False Yes
Mild Normal True Yes
[Figure: the Sunny subset split on Temperature, Humidity and Windy, showing the class distribution (Yes/No) in each branch.]
On the Outlook = Sunny subset (2 Yes, 3 No; info([3,2]) = 0.971 bits):

Temperature = Hot:
info([2,0]) = entropy(1, 0) = −1 log₂(1) − 0 log₂(0) = 0 bits
Temperature = Mild:
info([1,1]) = entropy(1/2, 1/2) = −1/2 log₂(1/2) − 1/2 log₂(1/2) = 0.5 + 0.5 = 1 bit
Temperature = Cool:
info([1,0]) = entropy(1, 0) = 0 bits

Windy = False:
info([2,1]) = entropy(2/3, 1/3) = −2/3 log₂(2/3) − 1/3 log₂(1/3) = 0.918 bits
Windy = True:
info([1,1]) = entropy(1/2, 1/2) = 1 bit
Expected information for the attribute:
info([2,1],[1,1]) = (3/5) × 0.918 + (2/5) × 1 = 0.951 bits

Humidity = High:
info([3,0]) = entropy(1, 0) = 0 bits
Humidity = Normal:
info([2,0]) = entropy(1, 0) = 0 bits

gain(Temperature) = 0.571 bits
gain(Humidity) = 0.971 bits
gain(Windy) = info([3,2]) − info([2,1],[1,1]) = 0.971 − 0.951 = 0.020 bits
Final decision tree
Note: not all leaves need to be pure; sometimes identical
instances have different classes
⇒ Splitting stops when data can’t be split any further
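For comparison, the same weather data can be handed to an off-the-shelf learner; the sketch below uses scikit-learn's DecisionTreeClassifier with criterion="entropy" (note this is CART-style binary splitting rather than ID3, and the one-hot encoding of the categorical attributes is our own choice):

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

data = pd.DataFrame(
    [["Sunny", "Hot", "High", False, "No"],
     ["Sunny", "Hot", "High", True, "No"],
     ["Overcast", "Hot", "High", False, "Yes"],
     ["Rainy", "Mild", "High", False, "Yes"],
     ["Rainy", "Cool", "Normal", False, "Yes"],
     ["Rainy", "Cool", "Normal", True, "No"],
     ["Overcast", "Cool", "Normal", True, "Yes"],
     ["Sunny", "Mild", "High", False, "No"],
     ["Sunny", "Cool", "Normal", False, "Yes"],
     ["Rainy", "Mild", "Normal", False, "Yes"],
     ["Sunny", "Mild", "Normal", True, "Yes"],
     ["Overcast", "Mild", "High", True, "Yes"],
     ["Overcast", "Hot", "Normal", False, "Yes"],
     ["Rainy", "Mild", "High", True, "No"]],
    columns=["Outlook", "Temp", "Humidity", "Windy", "Play"])

X = pd.get_dummies(data.drop(columns="Play"))   # one-hot encode the categorical attributes
y = data["Play"]
clf = DecisionTreeClassifier(criterion="entropy").fit(X, y)
print(export_text(clf, feature_names=list(X.columns)))   # text rendering of the learned tree
```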
• Simplification of computation:
Highly-branching attributes
Person Hair Length Weight Age Class
Homer 0” 250 36 M
Marge 10” 150 34 F
Bart 2” 90 10 M
Lisa 6” 78 8 F
Maggie 4” 20 1 F
Abe 1” 170 70 M
Selma 8” 160 41 F
Otto 10” 180 38 M
Krusty 6” 200 45 M
Comic 8” 290 38 ?
Entropy(4F,5M) = −(4/9) log₂(4/9) − (5/9) log₂(5/9) = 0.9911
[Figures: candidate splits on the numeric attributes, e.g. Hair Length ≤ 2?, Age ≤ 2?, Age ≤ 40?, with the class counts and entropy of each branch; pure branches such as (4F,0M) or (0F,1M) have entropy 0.]
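For numeric attributes such as Hair Length or Age, candidate binary splits of the form "attribute ≤ threshold?" can be scored with the same information-gain idea; a minimal sketch (the tuples are re-typed from the table above as plain numbers, and the function names are ours):

```python
import math

people = [  # (hair_length, weight, age, class) for the 9 labelled characters above
    (0, 250, 36, "M"), (10, 150, 34, "F"), (2, 90, 10, "M"),
    (6, 78, 8, "F"), (4, 20, 1, "F"), (1, 170, 70, "M"),
    (8, 160, 41, "F"), (10, 180, 38, "M"), (6, 200, 45, "M"),
]

def entropy(labels):
    n = len(labels)
    return sum(-(labels.count(c) / n) * math.log2(labels.count(c) / n)
               for c in set(labels))

def gain_for_threshold(feature_index, threshold):
    """Information gain of the binary split 'feature <= threshold?'."""
    labels = [c for *_, c in people]
    left = [c for *f, c in people if f[feature_index] <= threshold]
    right = [c for *f, c in people if f[feature_index] > threshold]
    remainder = (len(left) / len(people)) * entropy(left) \
              + (len(right) / len(people)) * entropy(right)
    return entropy(labels) - remainder

print(gain_for_threshold(0, 2))    # Hair Length <= 2?
print(gain_for_threshold(2, 40))   # Age <= 40?
```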
• Features/Attributes:
– Type: Italian, French, Thai
– Environment: Fancy, Classical
– Occupied?

Occupied Type Rainy Hungry Gf/friend Happiness (Class)
T Pizza T T T T
F Thai T T T F
T Thai F T T F
F Other F T T F
T Other F T T T
Example of C4.5 algorithm
Outlook = Sunny subset:
Outlook Temp Humidity Windy Play
Sunny Hot High False No
Sunny Hot High True No
Sunny Mild High False No
Sunny Cool Normal False Yes
Sunny Mild Normal True Yes

Outlook = Rainy subset:
Outlook Temp Humidity Windy Play
Rainy Mild High False Yes
Rainy Cool Normal False Yes
Rainy Cool Normal True No
Rainy Mild Normal False Yes
Rainy Mild High True No
Outlook
Gain(Temperature)=0.971-0.8=0.171
Gain(Windy)=0.971-0.951=0.020
Gain(Humidity)=0.971-0.551=0.420
Outlook
O T H W P Sunny Rainy
S H H F Y
Overcast
S H H T N Humidity Windy
S M H F N
High Yes
Normal
Temperature
Temperature Yes No Yes
Hot
Hot Mild
Mild
O T H W P
Yes No S H H F Y O T H W P
No S H H T N S M H F N
Windy
False True
No No
Yes
Outlook
Sunny Rainy
Overcast
Humidity Windy
High Yes
Normal
Temperature Yes No Yes
Hot
Mild
Windy
No
O T H W P True False
S H H T N O T H W P
No Yes S H H F Y
Decision Tree Cont.
Example 2:
X = {X1, X2, X3, X4}

D:
X1 X2 X3 X4 C
F F F F P
F F T T P
F T F T P
T T T F P
T F F F N
T T T T N
T T T F N

Entropy(D) = entropy(4/7, 3/7) = 0.98

Gain(X1) = 0.98 − 0.46 = 0.52
Gain(X2) = 0.98 − 0.97 = 0.01
Gain(X3) = 0.01
Gain(X4) = 0.01

Splitting on X1 (the highest gain) gives two subsets:

X1 = F subset (all P):
X1 X2 X3 X4 C
F F F F P
F F T T P
F T F T P

X1 = T subset:
X1 X2 X3 X4 C
T T T F P
T F F F N
T T T T N
T T T F N
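A quick check of these gains from the table (a sketch; attribute indices 0-3 correspond to X1-X4):

```python
import math

D = [  # (X1, X2, X3, X4, C) rows from the table above
    ("F", "F", "F", "F", "P"), ("F", "F", "T", "T", "P"), ("F", "T", "F", "T", "P"),
    ("T", "T", "T", "F", "P"), ("T", "F", "F", "F", "N"), ("T", "T", "T", "T", "N"),
    ("T", "T", "T", "F", "N"),
]

def entropy(labels):
    n = len(labels)
    return sum(-(labels.count(v) / n) * math.log2(labels.count(v) / n)
               for v in set(labels))

def gain(attr):
    """Information gain of splitting D on attribute index attr (0-3)."""
    labels = [row[4] for row in D]
    remainder = sum(
        (len(subset) / len(D)) * entropy([r[4] for r in subset])
        for value in ("T", "F")
        for subset in [[r for r in D if r[attr] == value]])
    return entropy(labels) - remainder

for i in range(4):
    # Gain(X1) ≈ 0.52 is by far the largest; Gain(X2)-Gain(X4) are equal and near zero.
    print(f"Gain(X{i + 1}) = {gain(i):.2f}")
```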
X = {X1, X2, X3}

X1 X2 X3 X4 C
T T T F P
F F F F P
T F F F N
F F T T P
T T T T N
F T F T P
T T T F N

All attributes have the same information gain. Break ties arbitrarily: choose X2.
Example 3
Outlook Temp Humidity Windy Play
Sunny Hot High False Yes
Sunny Hot High True No
(the weather dataset used earlier, but with the first instance's Play value changed to Yes)
Outlook is again chosen as the root. On the Outlook = Sunny branch:
Gain(Temperature) = 0.971 − 0.8 = 0.171
Gain(Windy) = 0.971 − 0.951 = 0.020
Gain(Humidity) = 0.971 − 0.551 = 0.420
[Tree under construction: Outlook at the root; Sunny → Humidity (Normal → Yes; High → Temperature (Mild → No; Hot → still mixed: S H H F Y and S H H T N)); Overcast → Yes; Rainy → Windy (False → Yes, True → No).]
[Final tree for Example 3: as above, except the mixed Temperature = Hot branch is split once more on Windy (True → No: S H H T N; False → Yes: S H H F Y).]