Machine Learning
Dr Muhammad Sharjeel
04
Decision Trees
Ø The general motive of a Decision Tree (DT) is to create a training model that can
predict the class (or value) of the target variable by learning decision rules
inferred from prior data (training data)
Ø In a DT, each node represents a feature (attribute), each link (branch) a
decision (rule), and each leaf an outcome
Ø Belongs to the family of supervised learning algorithms
Ø Can be used to solve both regression and classification problems
Ø A transparent algorithm, meaning its decisions can be read and understood
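Ø The points above can be sketched in code. This is a minimal illustration, assuming scikit-learn (the lecture does not prescribe a library) and an assumed numeric encoding of a small slice of the PlayGolf data; printing the learned rules with `export_text` illustrates the transparency claim.

```python
# Minimal sketch: train a classification tree and print its readable rules.
# The encoding below is an assumption, not part of the lecture:
# Outlook 0=Sunny, 1=Overcast, 2=Rain; Humidity 0=Normal, 1=High.
from sklearn.tree import DecisionTreeClassifier, export_text

X = [[0, 1], [0, 1], [1, 1], [2, 1], [2, 0], [1, 0], [0, 0]]
y = ["No", "No", "Yes", "Yes", "Yes", "Yes", "Yes"]

clf = DecisionTreeClassifier(criterion="entropy", random_state=0).fit(X, y)
print(export_text(clf, feature_names=["Outlook", "Humidity"]))
```

The printed output is a plain if/else rule list, which is what makes a DT "transparent" compared to, say, a neural network.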
Outlook    Yes  No  Entropy
Sunny       2    3   0.971
Rainy       3    2   0.971
Overcast    4    0   0
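Ø A minimal sketch of how the Entropy column above is computed, using Entropy = -Σ pᵢ log₂ pᵢ over the Yes/No counts; `entropy` is a hypothetical helper, not part of the lecture:

```python
from math import log2

def entropy(counts):
    """Entropy of a class distribution given per-class counts."""
    total = sum(counts)
    # Skip zero counts: lim p->0 of p*log2(p) is 0
    return sum(-(c / total) * log2(c / total) for c in counts if c > 0)

print(round(entropy([2, 3]), 3))  # Sunny: 2 Yes, 3 No -> 0.971
print(round(entropy([3, 2]), 3))  # Rainy: 3 Yes, 2 No -> 0.971
print(round(entropy([4, 0]), 3))  # Overcast: pure subset -> 0.0
```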
Attribute      Gain
Outlook        0.247
Temperature    0.029
Humidity       0.152
Wind           0.048

Ø Outlook has the highest gain and is therefore selected as the root node
[Figure: partial tree with Outlook at the root; Sunny → ?, Overcast → Yes, Rain → ?]
Outlook  Temperature  Humidity  Wind    PlayGolf
Sunny    Hot          High      Weak    No
Sunny    Hot          High      Strong  No
Sunny    Mild         High      Weak    No
Sunny    Cool         Normal    Weak    Yes
Sunny    Mild         Normal    Strong  Yes

Outlook  Temperature  Humidity  Wind    PlayGolf
Rain     Mild         High      Weak    Yes
Rain     Cool         Normal    Weak    Yes
Rain     Cool         Normal    Strong  No
Rain     Mild         Normal    Weak    Yes
Rain     Mild         High      Strong  No
Ø Entropy(S) = 0.971 (Sunny subset)
Ø Entropy(Temperature = Cool) = 0
Ø Entropy(Temperature = Hot) = 0
Ø Entropy(Temperature = Mild) = 1
Ø IE(Temperature) = 0.4
Ø IG(Temperature) = 0.571
Ø Entropy(S) = 0.971
Ø Entropy(Humidity = High) = 0
Ø Entropy(Humidity = Normal) = 0
Ø IE(Humidity) = 0
Ø IG(Humidity) = 0.971
Ø Entropy(S) = 0.971
Ø Entropy(Wind = Strong) = 1
Ø Entropy(Wind = Weak) = 0.918
Ø IE(Wind) = 0.951
Ø IG(Wind) = 0.020
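Ø The three information gains above (Temperature 0.571, Humidity 0.971, Wind 0.020) can be reproduced on the Sunny-branch rows from the table; `info_gain` is a hypothetical helper, not part of the lecture:

```python
from math import log2

def entropy(labels):
    """Entropy of a list of class labels."""
    n = len(labels)
    return sum(-(labels.count(v) / n) * log2(labels.count(v) / n)
               for v in set(labels))

def info_gain(rows, labels, attr_index):
    """IG = Entropy(S) - weighted entropy after splitting on one attribute."""
    total = len(rows)
    ie = 0.0
    for v in set(r[attr_index] for r in rows):
        subset = [labels[i] for i, r in enumerate(rows) if r[attr_index] == v]
        ie += len(subset) / total * entropy(subset)
    return entropy(labels) - ie

# Sunny branch of the dataset: (Temperature, Humidity, Wind) -> PlayGolf
rows = [("Hot", "High", "Weak"), ("Hot", "High", "Strong"),
        ("Mild", "High", "Weak"), ("Cool", "Normal", "Weak"),
        ("Mild", "Normal", "Strong")]
labels = ["No", "No", "No", "Yes", "Yes"]

for i, name in enumerate(["Temperature", "Humidity", "Wind"]):
    print(name, round(info_gain(rows, labels, i), 3))
```

Humidity wins with IG = 0.971, which is why the Sunny branch is split on Humidity next.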
[Figure: tree after splitting the Sunny branch on Humidity; Outlook → Sunny → Humidity (Normal → Yes, High → No), Overcast → Yes, Rain → ?]
Ø Entropy(S) = 0.971 (Rain subset)
Ø Entropy(Temperature = Cool) = 1
Ø Entropy(Temperature = Mild) = 0.918
Ø IE(Temperature) = 0.951
Ø IG(Temperature) = 0.020
Ø Entropy(S) = 0.971
Ø Entropy(Humidity = High) = 1
Ø Entropy(Humidity = Normal) = 0.918
Ø IE(Humidity) = 0.951
Ø IG(Humidity) = 0.020
Ø Entropy(S) = 0.971
Ø Entropy(Wind = Weak) = 0
Ø Entropy(Wind = Strong) = 0
Ø IE(Wind) = 0
Ø IG(Wind) = 0.971
[Figure: complete tree; Outlook at the root, Sunny → Humidity (Normal → Yes, High → No), Overcast → Yes, Rain → Wind (Weak → Yes, Strong → No)]
Ø The entropy of the whole dataset, the Outlook attribute entropy, and the
information gain of Outlook were already calculated (ID3)
Ø Entropy(S) = 0.940
Ø IE(Outlook) = 0.693
Ø IG(Outlook) = 0.940 - 0.693 = 0.247
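Ø A quick sketch checking these whole-dataset numbers in code (the full dataset has 9 Yes and 5 No; the small difference at the third decimal of IE(Outlook) is rounding):

```python
from math import log2

def entropy(counts):
    """Entropy of a class distribution given per-class counts."""
    total = sum(counts)
    return sum(-(c / total) * log2(c / total) for c in counts if c > 0)

h_s = entropy([9, 5])  # full dataset: 9 Yes, 5 No

# Outlook partitions: Sunny (2 Yes, 3 No), Overcast (4 Yes, 0 No), Rain (3 Yes, 2 No)
ie_outlook = (5 / 14) * entropy([2, 3]) \
           + (4 / 14) * entropy([4, 0]) \
           + (5 / 14) * entropy([3, 2])

print(round(h_s, 3))                # 0.94
print(round(ie_outlook, 4))         # ~0.6935 (reported as 0.693 on the slide)
print(round(h_s - ie_outlook, 3))   # 0.247
```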
Outlook  Temperature  Humidity  Wind    PlayGolf
Sunny    Hot          High      Weak    No
Sunny    Hot          High      Strong  No
Sunny    Mild         High      Weak    No
Sunny    Cool         Normal    Weak    Yes
Sunny    Mild         Normal    Strong  Yes

Outlook  Temperature  Humidity  Wind    PlayGolf
Rain     Mild         High      Weak    Yes
Rain     Cool         Normal    Weak    Yes
Rain     Cool         Normal    Strong  No
Rain     Mild         Normal    Weak    Yes
Rain     Mild         High      Strong  No
[Figure: tree split on Outlook, with Yes/No leaves]
Ø The attribute with the lowest Gini index is Outlook; hence, it will be chosen as the root node
Ø Within Outlook, the binary split [(Sunny, Rain), Overcast] [Gini(S,R), O] has the lowest Gini index
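Ø A sketch of the weighted Gini index for the three candidate binary groupings of Outlook, confirming that [(Sunny, Rain), Overcast] is the lowest; `gini` and `weighted_gini` are hypothetical helpers, not part of the lecture:

```python
def gini(counts):
    """Gini impurity of a class distribution: 1 - sum(p_i^2)."""
    total = sum(counts)
    return 1 - sum((c / total) ** 2 for c in counts)

def weighted_gini(groups):
    """Weighted Gini index of a candidate split ([Yes, No] counts per group)."""
    total = sum(sum(g) for g in groups)
    return sum(sum(g) / total * gini(g) for g in groups)

# Yes/No counts per Outlook value: Sunny (2, 3), Overcast (4, 0), Rain (3, 2)
splits = {
    "[(Sunny, Rain), Overcast]": [[5, 5], [4, 0]],
    "[(Sunny, Overcast), Rain]": [[6, 3], [3, 2]],
    "[(Rain, Overcast), Sunny]": [[7, 2], [2, 3]],
}
for name, groups in splits.items():
    print(name, round(weighted_gini(groups), 3))
```

The grouping that isolates the pure Overcast subset scores lowest, so CART picks it.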
Ø No gain ratio is calculated for the Humidity threshold 96 because it cannot be
greater than this value
Ø Gain is maximum when the threshold is equal to Humidity (80)
Ø Temperature will be the root node as it has the highest gain ratio value
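Ø A sketch of the gain-ratio formula itself (C4.5): gain ratio = information gain / split information, where split information is the entropy of the subset sizes produced by the split. Since the full Humidity/Temperature threshold data is not shown above, it is computed here for Outlook using the ID3 numbers:

```python
from math import log2

def entropy(counts):
    """Entropy of a distribution given per-group counts."""
    total = sum(counts)
    return sum(-(c / total) * log2(c / total) for c in counts if c > 0)

# Split information = entropy of the subset sizes (Sunny / Overcast / Rain)
split_info = entropy([5, 4, 5])   # ~1.577

# Information gain of Outlook on the full dataset (from the ID3 slides)
ig_outlook = entropy([9, 5]) - (5 / 14) * entropy([2, 3]) \
                             - (5 / 14) * entropy([3, 2])

print(round(ig_outlook / split_info, 3))  # gain ratio of Outlook
```

Dividing by split information penalises attributes with many values, which is the point of gain ratio over plain information gain.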
Ø Can you build the complete DT?
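Ø One possible answer, as a minimal recursive ID3 sketch. It assumes the four classic Overcast rows of the PlayGolf dataset (all PlayGolf = Yes) to complete the 14 examples; `id3` and `best_attribute` are hypothetical helpers, not part of the lecture:

```python
from math import log2

def entropy(labels):
    n = len(labels)
    return sum(-(labels.count(v) / n) * log2(labels.count(v) / n)
               for v in set(labels))

def best_attribute(rows, labels, attrs):
    """Pick the attribute index with the highest information gain."""
    def ig(a):
        ie = sum((len(sub) / len(rows)) * entropy(sub)
                 for v in set(r[a] for r in rows)
                 for sub in [[labels[i] for i, r in enumerate(rows)
                              if r[a] == v]])
        return entropy(labels) - ie
    return max(attrs, key=ig)

def id3(rows, labels, attrs, names):
    if len(set(labels)) == 1:      # pure node -> leaf
        return labels[0]
    if not attrs:                  # no attributes left -> majority vote
        return max(set(labels), key=labels.count)
    a = best_attribute(rows, labels, attrs)
    branches = {}
    for v in set(r[a] for r in rows):
        idx = [i for i, r in enumerate(rows) if r[a] == v]
        branches[v] = id3([rows[i] for i in idx], [labels[i] for i in idx],
                          [b for b in attrs if b != a], names)
    return {names[a]: branches}

# 14 rows: the 10 shown above plus 4 assumed Overcast rows (all Yes)
data = [
    ("Sunny", "Hot", "High", "Weak", "No"),
    ("Sunny", "Hot", "High", "Strong", "No"),
    ("Sunny", "Mild", "High", "Weak", "No"),
    ("Sunny", "Cool", "Normal", "Weak", "Yes"),
    ("Sunny", "Mild", "Normal", "Strong", "Yes"),
    ("Rain", "Mild", "High", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Strong", "No"),
    ("Rain", "Mild", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Strong", "No"),
    ("Overcast", "Hot", "High", "Weak", "Yes"),
    ("Overcast", "Cool", "Normal", "Strong", "Yes"),
    ("Overcast", "Mild", "High", "Strong", "Yes"),
    ("Overcast", "Hot", "Normal", "Weak", "Yes"),
]
rows = [r[:4] for r in data]
labels = [r[4] for r in data]
names = ["Outlook", "Temperature", "Humidity", "Wind"]

tree = id3(rows, labels, [0, 1, 2, 3], names)
print(tree)
```

The result matches the slides: Outlook at the root, the Sunny branch split on Humidity, the Overcast branch a pure Yes leaf, and the Rain branch split on Wind.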