Professional Documents
Culture Documents
Decision Trees
Decision Trees
Agenda
The Problem
INPUT Chennai to Bangalore
Long distance?
YES NO
Public Transport
Available Take A car
Chennai to Bangalore
Public Transport
Available? Long distance? Take A car
YES NO
YES NO
YES NO YES NO
Yes & NO
Tea or Coffee
Decision trees help us decide the
splitting condition
Time of the
Hours of sleep COFFEE/TEA
Day
7 3pm TEA
9 6pm TEA
5 10am COFFEE
6 11am COFFEE
9 3pm TEA
IS IT AFTER 4PM
TEA
OR
NO YES I WILL HAVE
TEA COFFEE
NO SHOULD I
SLEEP FOR
YES
MORE THAN
6 HOURS?
YES NO
CH
DECISION
AN
YES NO
79 tea 79 tea
100 Is it before 4PM? 21 coffee 100 Sleep>6hours
21 coffee
YES
70 30 NO YES
60 40 NO
1. GINI IMPURITY
2. ENTROPY & INFORMATION GAIN
1. GINI IMPURITY
2. ENTROPY & INFORMATION GAIN
GINI IMPURITY
60 tea 20 tea
0 coffee 20 coffee
pure impure
79 tea
100 Is it before 4PM? 21 coffee
YES
70 30 NO
Calculating
Gini impurity for a 54 tea 25 tea
subnode 16 coffee 5 coffee
Calculate for sub-nodes
Gini impurity of a subnode = 1 - (probability of tea)^2 - (probability of coffee)^2
79 tea
100 Is it before 4PM? 21 coffee Calculating
Gini impurity for a
subnode
YES
70 30 NO
54 tea 25 tea
16 coffee 5 coffee
Calculate for split
Gini impurity of a subnode = 1 - (probability of tea)^2 - (probability of coffee)^2
79 tea
100 Is it before 4PM? 21 coffee
YES
70 30 NO
54 tea 25 tea
16 coffee 5 coffee
Gini impurity of a SPLIT = weighted average of the gini impurities of its sub branch
Gini impurity of split
0.35(70/100)+0.277(30/100)=0.328
79 tea
100 Sleep>6hours
21 coffee
YES
60 40 NO
60 tea 19 tea
0 coffee 21 coffee
Gini impurity of a SPLIT = weighted average of the gini impurities of its sub branch
Gini impurity of split
0(60/100)+0.498(40/100)=0.199
79 tea 79 tea
100 Is it before 4PM? 21 coffee 100 Sleep>6hours
21 coffee
YES
70 30 NO YES
60 40 NO
79 tea
100 Is it before 4PM? 21 coffee
YES
70 30 NO
79 tea Calculating
100 Is it before 4PM? 21 coffee Gini impurity for a
Calculating subnode
Gini impurity for a
subnode
YES
70 30 NO
SPLITTING CONDITION
ENTROPY
YES NO
CH
Public Transport
Take A car
Available NODE
2. ENTROPY & INFORMATION GAIN
BR
YES NO
-( )
Calculate for sub-nodes
79 tea
100 Is it before 4PM? 21 coffee
YES
70 30 NO
54 tea 25 tea
16 coffee 5 coffee
Calculate for split
79 tea
100 Is it before 4PM? 21 coffee
YES
70 30 NO
54 tea 25 tea
16 coffee 5 coffee
entropy of split
0.775(70/100)+0.438(30/100)=0.673
79 tea
100 Sleep>6hours
21 coffee
YES
60 40 NO
60 tea 19 tea
0 coffee 21 coffee
60 tea 19 tea
0 0 coffee 21 coffee
79 tea
100 Sleep>6hours
21 coffee
YES
60 40 NO
60 tea 19 tea
0 0 coffee 21 coffee
entropy of split
0(60/100)+0.998(40/100)=0.399
entropy of previous split
0.775(70/100)+0.438(30/100)=0.673
INFORMATION GAIN
How do we think of this?
54 tea 25 tea
16 coffee 5 coffee
entropy of split
0.775(70/100)+0.438(30/100)=0.673
PRUNING
PrePruning
PostPruning
Setting a maximum depth to a tree
Reduces the size of decision trees by removing
sections of
the tree that are non-critical and redundant to
classify instances
We first make the decision tree to a large depth.
Suppose a split is giving us a gain of say -10 (loss
of 10) and then the next split on that gives us a
gain of 20.