
CSC354 Machine Learning
Dr Muhammad Sharjeel

Lecture 04: Decision Trees
Ø The general aim of a Decision Tree (DT) is to build a model that can predict the
class (or value) of a target variable by learning decision rules inferred from
prior (training) data
Ø In a DT, each node represents a feature (attribute), each link (branch) a decision
(rule), and each leaf an outcome
Ø Belongs to the family of supervised learning algorithms
Ø Can be used to solve both regression and classification problems
Ø A transparent algorithm: its decisions can be read and understood

Ø Algorithm pseudocode (a runnable sketch follows below)
1. Place the best attribute of the dataset (complete training set) at the root of
the tree
2. Split the training set into subsets in such a way that each subset contains
data with the same value for an attribute
3. Repeat steps 1 and 2 on each subset until leaf nodes are reached in all the
branches of the tree
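A minimal Python sketch of this recursion, under stated assumptions: rows are dicts mapping attribute names to values, and best_attribute is a hypothetical helper standing in for the attribute-selection step (information gain, defined on the following slides).

```python
# Sketch of the recursive construction described above; assumes a helper
# best_attribute(rows, attributes, target) that returns the attribute with
# the highest information gain (see the later slides).
from collections import Counter

def build_tree(rows, attributes, target):
    labels = [row[target] for row in rows]
    if len(set(labels)) == 1:                 # pure subset -> leaf node
        return labels[0]
    if not attributes:                        # nothing left to split on -> majority leaf
        return Counter(labels).most_common(1)[0][0]
    best = best_attribute(rows, attributes, target)   # step 1: pick the best attribute
    tree = {best: {}}
    remaining = [a for a in attributes if a != best]
    for value in {row[best] for row in rows}:         # step 2: one subset per value
        subset = [row for row in rows if row[best] == value]
        tree[best][value] = build_tree(subset, remaining, target)  # step 3: recurse
    return tree
```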

Ø Three implementations used to create DTs
Ø ID3
Ø C4.5
Ø CART

Ø ID3 (Iterative Dichotomiser) uses information gain as its metric
Ø Dichotomisation means dividing something into two completely opposite things
Ø ID3 iteratively divides the attributes into two groups (the dominant attribute vs
the others) to construct a tree
Ø Dominant attributes are selected based on information gain
Ø Performs a top-down, greedy search through the space of possible decision trees
Ø Top-down means it starts building the tree from the root
Ø Greedy means that at each iteration it selects the best feature at the present
moment to create a node

Ø To create a DT using ID3
Ø Shortlist a root node among all the nodes (nodes are the features/attributes of the dataset)
Ø Determine the attribute that best classifies the training data and use it as the root
Ø Repeat the process for each branch

Ø Which attribute (node) best classifies the training data?
Ø The most dominant attribute is the one with the highest information gain
Ø Information gain measures the reduction in entropy
Ø The entropy of a dataset is the measure of disorder in the target attribute
Ø In other words, it is the uncertainty in the dataset
Ø Entropy measures
Ø How well a given attribute/feature separates (or classifies) the target classes
Ø The attribute with the highest information gain is selected as the best one
Ø Entropy(S) = – ∑ p(i) . log2 p(i), summed over the classes i in S
Ø Gain(S, A) = Entropy(S) – ∑ (|Sv|/|S|) . Entropy(Sv), summed over the values v of attribute A
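These two formulas translate directly into code; a small sketch (the helper names are our own, and rows are dicts mapping attribute names to values):

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy(S) = - sum p(i) * log2 p(i) over the class proportions in S."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, attribute, target):
    """Gain(S, A) = Entropy(S) - sum (|Sv|/|S|) * Entropy(Sv) over values v of A."""
    n = len(rows)
    gain = entropy([row[target] for row in rows])
    for value in {row[attribute] for row in rows}:
        subset = [row[target] for row in rows if row[attribute] == value]
        gain -= (len(subset) / n) * entropy(subset)
    return gain

# The PlayGolf target on the next slide has 9 Yes and 5 No:
print(round(entropy(["Yes"] * 9 + ["No"] * 5), 3))  # 0.94
```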

Ø Compute the entropy [Entropy(S)] for the entire dataset
Ø For each attribute/feature:
Ø Calculate entropy [Entropy(A)] for each value of the attribute
Ø Calculate average information entropy (IE) for the attribute
Ø Calculate information gain (IG) for the attribute
Ø Pick the highest gain attribute
Ø Repeat until the complete tree is formed

Ø Example dataset, 14 instances, 4 input attributes
No. Outlook Temperature Humidity Wind PlayGolf
1 Sunny Hot High Weak No
2 Sunny Hot High Strong No
3 Overcast Hot High Weak Yes
4 Rain Mild High Weak Yes
5 Rain Cool Normal Weak Yes
6 Rain Cool Normal Strong No
7 Overcast Cool Normal Strong Yes
8 Sunny Mild High Weak No
9 Sunny Cool Normal Weak Yes
10 Rain Mild Normal Weak Yes
11 Sunny Mild Normal Strong Yes
12 Overcast Mild High Strong Yes
13 Overcast Hot Normal Weak Yes
14 Rain Mild High Strong No

Ø Compute the entropy [Entropy(S)] for the entire dataset
Ø Entropy(S) = – p(Yes) . log2p(Yes) – p(No) . log2p(No)
Ø Entropy(S) = – (9/14) . log2(9/14) – (5/14) . log2(5/14) = 0.940

Ø For each attribute/feature: (let's say, Outlook)
Ø Calculate entropy [Entropy(A)] for each value of the attribute, i.e., in case of
Outlook: 'Sunny', 'Rain', 'Overcast'

PlayGolf labels within each Outlook subset:

Sunny:    No, No, No, Yes, Yes
Rain:     Yes, Yes, No, Yes, No
Overcast: Yes, Yes, Yes, Yes

Outlook    Positive   Negative   Entropy
Sunny      2          3          0.971
Rain       3          2          0.971
Overcast   4          0          0

Ø For each attribute/feature:
Ø Calculate average information entropy (IE) for the attribute (i.e., Outlook)
Ø Entropy(IE)[Outlook] = (5/14)×0.971 + (5/14)×0.971 + (4/14)×0
Ø Entropy(IE)[Outlook] = 0.693

Ø Calculate information gain (IG) for the attribute (i.e., Outlook)
Ø IG(Outlook) = 0.940 – 0.693 = 0.247
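These numbers can be reproduced with the helpers sketched earlier; the pairs below are just the Outlook and PlayGolf columns of the 14-row table:

```python
# Outlook/PlayGolf pairs from the example dataset.
pairs = [("Sunny", "No"), ("Sunny", "No"), ("Overcast", "Yes"), ("Rain", "Yes"),
         ("Rain", "Yes"), ("Rain", "No"), ("Overcast", "Yes"), ("Sunny", "No"),
         ("Sunny", "Yes"), ("Rain", "Yes"), ("Sunny", "Yes"), ("Overcast", "Yes"),
         ("Overcast", "Yes"), ("Rain", "No")]
rows = [{"Outlook": o, "PlayGolf": p} for o, p in pairs]
print(round(information_gain(rows, "Outlook", "PlayGolf"), 3))  # ~0.247
```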

Ø Pick the highest gain attribute, in this case, Outlook

Attribute      Gain
Outlook        0.247
Temperature    0.029
Humidity       0.152
Wind           0.048

Ø Outlook therefore becomes the root node of the tree

Ø Outlook (Overcast) only contains examples of 'Yes'
Ø Outlook (Sunny, Rain) contains both 'Yes' and 'No' examples

Outlook
├─ Sunny → ?
├─ Overcast → Yes
└─ Rain → ?

Ø Repeat until the complete tree is formed

Ø Outlook (Overcast) only contains examples of 'Yes', so that branch becomes a 'Yes' leaf
Ø The Sunny and Rain branches are grown from their respective subsets:

Sunny subset:
Outlook Temperature Humidity Wind PlayGolf
Sunny Hot High Weak No
Sunny Hot High Strong No
Sunny Mild High Weak No
Sunny Cool Normal Weak Yes
Sunny Mild Normal Strong Yes

Rain subset:
Outlook Temperature Humidity Wind PlayGolf
Rain Mild High Weak Yes
Rain Cool Normal Weak Yes
Rain Cool Normal Strong No
Rain Mild Normal Weak Yes
Rain Mild High Strong No

Ø Outlook (Sunny)
Outlook Temperature Humidity Wind PlayGolf
Sunny Hot High Weak No
Sunny Hot High Strong No
Sunny Mild High Weak No
Sunny Cool Normal Weak Yes
Sunny Mild Normal Strong Yes

Ø Entropy(S) = 0.971
Ø Entropy(A)[Temperature](Cool) = 0
Ø Entropy(A)[Temperature](Hot) = 0
Ø Entropy(A)[Temperature](Mild) = 1
Ø IE(Temperature) = 0.4
Ø IG(Temperature) = 0.571

Ø Outlook (Sunny)
Outlook Temperature Humidity Wind PlayGolf
Sunny Hot High Weak No
Sunny Hot High Strong No
Sunny Mild High Weak No
Sunny Cool Normal Weak Yes
Sunny Mild Normal Strong Yes

Ø Entropy(S) = 0.971
Ø Entropy(A)[Humidity](High) = 0
Ø Entropy(A)[Humidity](Normal) = 0
Ø IE(Humidity) = 0
Ø IG(Humidity) = 0.971

Ø Outlook (Sunny)
Outlook Temperature Humidity Wind PlayGolf
Sunny Hot High Weak No
Sunny Hot High Strong No
Sunny Mild High Weak No
Sunny Cool Normal Weak Yes
Sunny Mild Normal Strong Yes

Ø Entropy(S) = 0.971
Ø Entropy(A)[Wind](Strong) = 1
Ø Entropy(A)[Wind](Weak) = 0.918
Ø IE(Wind) = 0.951
Ø IG(Wind) = 0.020

Ø Pick the highest gain attribute, in this case, Humidity

Outlook
├─ Sunny → Humidity
│   ├─ Normal → Yes
│   └─ High → No
├─ Overcast → Yes
└─ Rain → ?

Ø Outlook (Rain)
Outlook Temperature Humidity Wind PlayGolf
Rain Mild High Weak Yes
Rain Cool Normal Weak Yes
Rain Cool Normal Strong No
Rain Mild Normal Weak Yes
Rain Mild High Strong No

Ø Entropy(S) = 0.971
Ø Entropy(A)[Temperature](Cool) = 1
Ø Entropy(A)[Temperature](Mild) = 0.918
Ø IE(Temperature) = 0.951
Ø IG(Temperature) = 0.020

Ø Outlook (Rain)
Outlook Temperature Humidity Wind PlayGolf
Rain Mild High Weak Yes
Rain Cool Normal Weak Yes
Rain Cool Normal Strong No
Rain Mild Normal Weak Yes
Rain Mild High Strong No

Ø Entropy(S) = 0.971
Ø Entropy(A)[Humidity](High) = 1
Ø Entropy(A)[Humidity](Normal) = 0.918
Ø IE(Humidity) = 0.951
Ø IG(Humidity) = 0.020

Ø Outlook (Rain)
Outlook Temperature Humidity Wind PlayGolf
Rain Mild High Weak Yes
Rain Cool Normal Weak Yes
Rain Cool Normal Strong No
Rain Mild Normal Weak Yes
Rain Mild High Strong No

Ø Entropy(S) = 0.971
Ø Entropy(A)[Wind](Weak) = 0
Ø Entropy(A)[Wind](Strong) = 0
Ø IE(Wind) = 0
Ø IG(Wind) = 0.971

Ø Pick the highest gain attribute, in this case, Wind

Outlook
├─ Sunny → Humidity
│   ├─ Normal → Yes
│   └─ High → No
├─ Overcast → Yes
└─ Rain → Wind
    ├─ Weak → Yes
    └─ Strong → No

Ø Use the final DT (ID3) to classify an unseen example
Ø Outlook = Sunny, Temperature = Cool, Humidity = High, Wind = Strong
Ø Output = No
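One way to read off such classifications programmatically is to encode the finished tree as a nested dict; this representation is our own sketch, not part of ID3 itself:

```python
# The final ID3 tree as a nested dict: inner nodes map an attribute to its
# branches, leaves are class labels.
tree = {"Outlook": {
    "Overcast": "Yes",
    "Sunny": {"Humidity": {"Normal": "Yes", "High": "No"}},
    "Rain":  {"Wind": {"Weak": "Yes", "Strong": "No"}},
}}

def classify(tree, example):
    while isinstance(tree, dict):                # descend until a leaf label
        attribute = next(iter(tree))             # attribute tested at this node
        tree = tree[attribute][example[attribute]]  # follow the matching branch
    return tree

print(classify(tree, {"Outlook": "Sunny", "Temperature": "Cool",
                      "Humidity": "High", "Wind": "Strong"}))  # -> 'No'
```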

Ø Shortcomings of ID3
Ø Information gain measures the reduction in entropy achieved by splitting on a
particular attribute
Ø It is biased towards attributes with a large number of distinct values, which
might lead to overfitting
Ø It keeps growing deeper and deeper (builds many branches) to reduce the
training error, which results in an increased test error
Ø Overfitting: the model fits the training data well but fails to generalize
Ø Underfitting: the model is too simple to find the patterns in the data

Ø Improving ID3
Ø Pruning is a mechanism that reduces the size and complexity of a DT by
removing unnecessary nodes
Ø Pre-pruning stops the tree construction a bit early
Ø Do not split a node if its goodness measure is below a threshold value
Ø Post-pruning: once a DT is complete, cross-validation is performed to
test whether expanding a node makes an improvement (see the sketch below)
Ø If it shows an improvement, continue expanding the node
Ø If it shows a reduction in accuracy, the node is converted to a leaf node
Ø To overcome the problems of information gain, the information gain ratio is used
(C4.5)
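As a hands-on illustration, scikit-learn (assumed installed) exposes cost-complexity post-pruning through the ccp_alpha parameter; note that its DecisionTreeClassifier implements a CART-style tree rather than ID3, so this is an analogy for the pruning idea, not the algorithm above:

```python
# Hedged sketch: larger ccp_alpha prunes more aggressively; cross-validation
# picks the pruning level that generalizes best, mirroring post-pruning.
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
for alpha in (0.0, 0.01, 0.05):
    clf = DecisionTreeClassifier(ccp_alpha=alpha, random_state=0)
    score = cross_val_score(clf, X, y, cv=5).mean()
    print(f"ccp_alpha={alpha}: mean CV accuracy = {score:.3f}")
```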

Ø C4.5 is an improved version of ID3
Ø Creates more generalized models
Ø Works with continuous data
Ø Can handle missing data
Ø Avoids overfitting
Ø Also known as J48 (an implementation of C4.5 release 8)
Ø Uses the information gain ratio as the metric to split the dataset
Ø Information gain (used in ID3) tends to prefer attributes with more categories
Ø Such attributes tend to have lower entropy
Ø This results in overfitting
Ø The gain ratio mitigates this issue by penalising attributes with more categories
Ø It uses split information (or intrinsic information)

Ø Information gain ratio
Ø GainRatio(A) = Gain(A) / SplitInfo(A)
Ø Split information
Ø SplitInfo(A) = – ∑ (|Dj|/|D|) . log2(|Dj|/|D|), summed over the partitions Dj of D
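Continuing the earlier sketch, the gain ratio is a small wrapper over the information_gain helper defined on the ID3 slides plus a split-information term:

```python
import math
from collections import Counter

def split_info(rows, attribute):
    """SplitInfo(A) = - sum (|Dj|/|D|) * log2(|Dj|/|D|) over the partitions Dj."""
    n = len(rows)
    counts = Counter(row[attribute] for row in rows).values()
    return -sum((c / n) * math.log2(c / n) for c in counts)

def gain_ratio(rows, attribute, target):
    # information_gain is the helper from the earlier ID3 sketch.
    return information_gain(rows, attribute, target) / split_info(rows, attribute)
```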

Ø Example dataset: the same 14 instances and 4 input attributes used for ID3 (see the PlayGolf table above)

Ø Split information for Outlook attribute
Ø Sunny = 5, Overcast = 4, Rain = 5
Ø SplitInfo(Outlook) = - (5/14).log2(5/14) - (4/14).log2(4/14) - (5/14).log2(5/14) = 1.577
Ø GainRatio(Outlook) = 0.247/1.577 = 0.156

Ø Entropy of the whole dataset, Outlook attribute entropy, and information gain of
Outlook already calculated (ID3)
Ø Entropy(S) = 0.940
Ø Entropy(IE)[Outlook] = 0.693
Ø IG(Outlook) = 0.940 – 0.693 = 0.247

Ø Gain ratio for Temperature attribute
Ø Hot = 4, Mild = 6, Cool = 4
Ø SplitInfo(Temperature) = - (4/14).log2(4/14) - (6/14).log2(6/14) - (4/14).log2(4/14) = 1.556
Ø GainRatio(Temperature) = 0.029/1.556 = 0.018
Ø Gain ratio for Humidity attribute
Ø High = 7, Normal = 7
Ø SplitInfo(Humidity) = - (7/14).log2(7/14) - (7/14).log2(7/14) = 1
Ø GainRatio(Humidity) = 0.152/1 = 0.152
Ø Gain ratio for Wind attribute
Ø Weak = 8, Strong = 6
Ø SplitInfo(Wind) = - (8/14).log2(8/14) - (6/14).log2(6/14) = 0.985
Ø GainRatio(Wind) = 0.048/0.985 = 0.048

Ø The gain ratio of Outlook is the highest, so it becomes the root node
Ø As in ID3, the Overcast branch is a pure 'Yes' leaf; the Sunny and Rain branches
are grown from the same subsets shown earlier

Ø Outlook (Sunny)
Outlook Temperature Humidity Wind PlayGolf
Sunny Hot High Weak No
Sunny Hot High Strong No
Sunny Mild High Weak No
Sunny Cool Normal Weak Yes
Sunny Mild Normal Strong Yes

Ø GainRatio(Temperature) = 0.571/1.521 = 0.375
Ø GainRatio(Humidity) = 0.971/0.971 = 1
Ø GainRatio(Wind) = 0.020/0.971 = 0.021

Ø Outlook (Rain)
Outlook Temperature Humidity Wind PlayGolf
Rain Mild High Weak Yes
Rain Cool Normal Weak Yes
Rain Cool Normal Strong No
Rain Mild Normal Weak Yes
Rain Mild High Strong No

Ø GainRatio(Temperature) = 0.020/0.971 = 0.020


Ø GainRatio(Humidity) = 0.020/0.971 = 0.020
Ø GainRatio(Wind) = 0.971/0.971 = 1

Ø Final DT using C4.5

Outlook
├─ Sunny → Humidity
│   ├─ Normal → Yes
│   └─ High → No
├─ Overcast → Yes
└─ Rain → Wind
    ├─ Weak → Yes
    └─ Strong → No

Ø Use the final DT (C4.5) to classify an unseen example
Ø Outlook = Rain, Temperature = Cool, Humidity = High, Wind = Weak
Ø Output = Yes

Ø Some drawbacks of C4.5
Ø Split information is higher for multi-valued attributes (more outcomes)
Ø It tends to prefer unbalanced splits in which one partition is much smaller than the others
Ø Classification And Regression Tree (CART) uses the gini index as its metric
Ø If a dataset D contains examples from n classes, the gini index is defined as
Ø Gini(D) = 1 – ∑ (pi)², for i = 1 to n (number of classes)
Ø It creates a binary tree
Ø For a binary split of D into D1 and D2 (multi-valued attributes are grouped into
two subsets), the gini index is
Ø GiniA(D) = (|D1|/|D|).Gini(D1) + (|D2|/|D|).Gini(D2)
Ø Reduction in impurity
Ø Gini(A) = Gini(D) – GiniA(D)
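These formulas also translate directly; a minimal sketch using our own helper names:

```python
from collections import Counter

def gini(labels):
    """Gini(D) = 1 - sum p(i)^2 over the class proportions in D."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def gini_split(left, right):
    """Weighted gini of a binary split: (|D1|/|D|)*Gini(D1) + (|D2|/|D|)*Gini(D2)."""
    n = len(left) + len(right)
    return (len(left) / n) * gini(left) + (len(right) / n) * gini(right)

# The PlayGolf target on the next slide has 9 Yes and 5 No:
print(round(gini(["Yes"] * 9 + ["No"] * 5), 3))  # 0.459
```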

Ø Example dataset: again the same 14-instance PlayGolf table (see above)

Ø Total 14 examples, 9 positive, 5 negative
Ø Gini(D) = 1 – (9/14)² – (5/14)² = 0.459
Ø Compute the gini index of each attribute
Ø Let's start with Outlook (Sunny, Overcast, Rain)
Ø As the attribute has three values, it has 6 possible (non-empty, proper) subsets
Ø {(Sunny, Overcast), (Overcast, Rain), (Sunny, Rain), (Sunny), (Overcast), (Rain)}
Ø The empty set and the full set are not used, leaving 3 candidate binary splits
Ø Gini(S,O | R) = (9/14) x [1 - (6/9)² - (3/9)²] + (5/14) x [1 - (3/5)² - (2/5)²] = 0.457
Ø Gini(O,R | S) = (9/14) x [1 - (7/9)² - (2/9)²] + (5/14) x [1 - (2/5)² - (3/5)²] = 0.393
Ø Gini(S,R | O) = (10/14) x [1 - (5/10)² - (5/10)²] + (4/14) x [1 - (4/4)² - (0/4)²] = 0.357
Ø Gini(A) = 0.459 – 0.357 = 0.102
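The three candidate splits can be checked with the gini helpers sketched above; the pairs are the Outlook and PlayGolf columns of the dataset:

```python
pairs = [("Sunny", "No"), ("Sunny", "No"), ("Overcast", "Yes"), ("Rain", "Yes"),
         ("Rain", "Yes"), ("Rain", "No"), ("Overcast", "Yes"), ("Sunny", "No"),
         ("Sunny", "Yes"), ("Rain", "Yes"), ("Sunny", "Yes"), ("Overcast", "Yes"),
         ("Overcast", "Yes"), ("Rain", "No")]
for group in ({"Sunny", "Overcast"}, {"Overcast", "Rain"}, {"Sunny", "Rain"}):
    left = [label for value, label in pairs if value in group]
    right = [label for value, label in pairs if value not in group]
    print(sorted(group), round(gini_split(left, right), 3))
# {Sunny, Overcast} vs {Rain}     -> 0.457
# {Overcast, Rain}  vs {Sunny}    -> 0.393
# {Sunny, Rain}     vs {Overcast} -> 0.357
```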

Ø Next attribute: Temperature (Hot, Mild, Cool)
Ø As the attribute has three values, it again yields 3 candidate binary splits from 6 subsets
Ø {(Hot, Mild), (Hot, Cool), (Mild, Cool), (Hot), (Mild), (Cool)}
Ø The empty set and the full set are not used
Ø Gini(H,M | C) = (10/14) x [1 - (6/10)² - (4/10)²] + (4/14) x [1 - (3/4)² - (1/4)²] = 0.450
Ø Gini(H,C | M) = (8/14) x [1 - (5/8)² - (3/8)²] + (6/14) x [1 - (4/6)² - (2/6)²] = 0.458
Ø Gini(M,C | H) = (10/14) x [1 - (7/10)² - (3/10)²] + (4/14) x [1 - (2/4)² - (2/4)²] = 0.442
Ø Gini(A) = 0.459 – 0.442 = 0.016

Ø Next attribute: Humidity (High, Normal)
Ø The attribute has only two values, so there is a single binary split
Ø Gini(H | N) = (7/14) x [1 - (6/7)² - (1/7)²] + (7/14) x [1 - (3/7)² - (4/7)²] = 0.367
Ø Gini(A) = 0.459 – 0.367 = 0.092

Ø Next attribute: Wind (Weak, Strong)
Ø Gini(W | S) = (8/14) x [1 - (6/8)² - (2/8)²] + (6/14) x [1 - (3/6)² - (3/6)²] = 0.428
Ø Gini(A) = 0.459 – 0.428 = 0.030

Ø The attribute with the largest impurity reduction [Gini(A)] is Outlook, hence it is chosen as the root node
Ø Within Outlook, the split [(Sunny, Rain), Overcast] [Gini(S,R | O)] has the lowest gini index

Ø Partial DT using CART

Outlook
├─ Sunny, Rain → ?
└─ Overcast → Yes

Sunny/Rain subset:
Outlook Temperature Humidity Wind PlayGolf
Sunny Hot High Weak No
Sunny Hot High Strong No
Rain Mild High Weak Yes
Rain Cool Normal Weak Yes
Rain Cool Normal Strong No
Sunny Mild High Weak No
Sunny Cool Normal Weak Yes
Rain Mild Normal Weak Yes
Sunny Mild Normal Strong Yes
Rain Mild High Strong No

Ø Calculate the gini index for the following subset Outlook (Sunny, Rain)

Outlook Temperature Humidity Wind PlayGolf
Sunny Hot High Weak No
Sunny Hot High Strong No
Rain Mild High Weak Yes
Rain Cool Normal Weak Yes
Rain Cool Normal Strong No
Sunny Mild High Weak No
Sunny Cool Normal Weak Yes
Rain Mild Normal Weak Yes
Sunny Mild Normal Strong Yes
Rain Mild High Strong No

Ø C4.5 with continuous (numeric) data
Ø Example dataset: 14 instances, 4 input attributes, 2 of them with continuous values
No. Outlook Temperature Humidity Wind PlayGolf
1 Sunny 85 85 Weak No
2 Sunny 80 90 Strong No
3 Overcast 83 78 Weak Yes
4 Rain 70 96 Weak Yes
5 Rain 68 80 Weak Yes
6 Rain 65 70 Strong No
7 Overcast 64 65 Strong Yes
8 Sunny 72 95 Weak No
9 Sunny 69 70 Weak Yes
10 Rain 75 80 Weak Yes
11 Sunny 75 70 Strong Yes
12 Overcast 72 90 Strong Yes
13 Overcast 81 75 Weak Yes
14 Rain 71 80 Strong No

Ø Outlook and Wind are nominal attributes
Ø Gain ratio for Wind = 0.048
Ø Gain ratio for Outlook = 0.156
Ø Humidity and Temperature are continuous attributes
Ø Convert continuous values to nominal ones
Ø Perform binary split based on a threshold value
Ø Threshold should be a value which offers maximum gain for an attribute

Ø Separate the dataset into two parts
Ø Instances less than or equal to the threshold
Ø Instances greater than the threshold
Ø How? (a sketch of this search follows below)
Ø Sort the attribute values in ascending order
Ø Calculate the gain ratio for a binary split at every value
Ø The value which maximizes the gain is chosen as the threshold (separator)
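A sketch of that search for a numeric attribute, reusing the entropy helper from the ID3 slides; the gain computed here is plain information gain, and C4.5 would additionally divide by the split information:

```python
def best_threshold(pairs):
    """pairs: (numeric value, class label) tuples. Returns (threshold, gain)
    for the binary split value <= t vs value > t that maximizes the gain."""
    values = sorted({v for v, _ in pairs})
    labels = [label for _, label in pairs]
    base, n = entropy(labels), len(pairs)
    best = (None, -1.0)
    for t in values[:-1]:                        # the largest value cannot split
        left = [l for v, l in pairs if v <= t]
        right = [l for v, l in pairs if v > t]
        gain = base - (len(left) / n) * entropy(left) \
                    - (len(right) / n) * entropy(right)
        if gain > best[1]:
            best = (t, gain)
    return best
```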

Ø Sort the Humidity values smallest to largest
Humidity PlayGolf
65 Yes
70 No
70 Yes
70 Yes
75 Yes
78 Yes
80 Yes
80 Yes
80 No
85 No
90 No
90 Yes
95 No
96 Yes

Ø Humidity (65)
Ø Entropy(Humidity<=65) = -(0/1).log2(0/1) – (1/1).log2(1/1) = 0 (taking 0.log2(0) = 0 by convention)
Ø Entropy(Humidity>65) = -(5/13).log2(5/13) – (8/13).log2(8/13) = 0.961
Ø Gain(Humidity<=,> 65) = 0.940 – (1/14).0 – (13/14).(0.961) = 0.048
Ø SplitInfo(Humidity<=,> 65) = -(1/14).log2(1/14) -(13/14).log2(13/14) = 0.371
Ø GainRatio(Humidity<=,> 65) = 0.126
Ø Humidity (70)
Ø Entropy(Humidity<=70) = – (1/4).log2(1/4) – (3/4).log2(3/4) = 0.811
Ø Entropy(Humidity>70) = – (4/10).log2(4/10) – (6/10).log2(6/10) = 0.970
Ø Gain(Humidity<=,> 70) = 0.940 – (4/14).(0.811) – (10/14).(0.970) = 0.014
Ø SplitInfo(Humidity<=,> 70) = -(4/14).log2(4/14) -(10/14).log2(10/14) = 0.863
Ø GainRatio(Humidity<=,> 70) = 0.016

Ø GainRatio(Humidity<=,> 75) = 0.047
Ø GainRatio(Humidity <=,> 78) = 0.090
Ø GainRatio(Humidity <=,> 80) = 0.107
Ø GainRatio(Humidity <=,> 85) = 0.027
Ø GainRatio(Humidity <=,> 90) = 0.016
Ø GainRatio(Humidity <=,> 95) = 0.128

Ø No gain ratio is calculated for Humidity (96) because no instance can be greater
than this value (one side of the split would be empty)
Ø Gain is maximum when the threshold is Humidity (80)

Ø Apply the same process to Temperature, as its values are continuous too
Ø Gain is maximum when the threshold is Temperature (83)
Ø GainRatio(Temperature <=, > 83) = 0.305
Ø Gain ratios for all the attributes are summarized in the following table

Attribute            GainRatio
Wind                 0.048
Outlook              0.156
Humidity <=, >       0.107
Temperature <=, >    0.305

Ø Temperature will be the root node as it has the highest gain ratio value
Ø Can you build the complete DT?

Thanks
