Decision Tree Learning
Example decision tree for the "Cheat" classification task:

Refund?
  Yes -> NO
  No  -> MarSt?
           Single, Divorced -> TaxInc?
                                 < 80K -> NO
                                 > 80K -> YES
           Married          -> NO
Apply Model to Test Data

Test data: Refund = No, MarSt = Married. Starting at the root, Refund = No routes to the MarSt node; MarSt = Married reaches a leaf labelled NO, so assign Cheat = "No".
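A minimal sketch of this walk in Python, assuming the tree is encoded as nested dicts; the encoding, the classify helper, and the exact test-record values are illustrative assumptions, not code from the slides:

```python
def classify(node, record):
    """Walk the tree until a leaf (a plain string) is reached."""
    while isinstance(node, dict):
        value = record[node["attr"]]
        if "threshold" in node:                 # numeric split, e.g. TaxInc
            key = "<" if value < node["threshold"] else ">="
        else:                                   # categorical split
            key = value
        node = node["branches"][key]
    return node

# The "Cheat" tree from the slide, with >= standing in for the "> 80K" branch.
tax = {"attr": "TaxInc", "threshold": 80_000,
       "branches": {"<": "NO", ">=": "YES"}}
tree = {"attr": "Refund",
        "branches": {"Yes": "NO",
                     "No": {"attr": "MarSt",
                            "branches": {"Single": tax, "Divorced": tax,
                                         "Married": "NO"}}}}

record = {"Refund": "No", "MarSt": "Married", "TaxInc": 80_000}  # assumed values
print(classify(tree, record))   # -> NO, i.e. assign Cheat = "No"
```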
Decision Tree for PlayTennis

Outlook?
  Sunny    -> Humidity?
                High   -> No
                Normal -> Yes
  Overcast -> Yes
  Rain     -> Wind?
                Strong -> No
                Weak   -> Yes
Top-Down Induction of Decision Trees (ID3)

Main loop: choose the "best" decision attribute A for the next node, create a branch for each value of A, sort the training examples down the branches, and recurse on each branch whose examples are not yet perfectly classified.
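Under the assumption that training examples are dicts of categorical attribute values plus a "label" key, this loop can be sketched as follows (all function names are illustrative, not from the slides):

```python
import math
from collections import Counter

def entropy(examples):
    counts = Counter(ex["label"] for ex in examples)
    n = len(examples)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def gain(examples, attr):
    """Expected entropy reduction from splitting examples on attr."""
    n = len(examples)
    rem = 0.0
    for v in {ex[attr] for ex in examples}:
        subset = [ex for ex in examples if ex[attr] == v]
        rem += len(subset) / n * entropy(subset)
    return entropy(examples) - rem

def id3(examples, attrs):
    labels = {ex["label"] for ex in examples}
    if len(labels) == 1:                        # pure node -> leaf
        return labels.pop()
    if not attrs:                               # attributes exhausted -> majority leaf
        return Counter(ex["label"] for ex in examples).most_common(1)[0][0]
    best = max(attrs, key=lambda a: gain(examples, a))
    rest = [a for a in attrs if a != best]
    return {"attr": best,
            "branches": {v: id3([ex for ex in examples if ex[best] == v], rest)
                         for v in {ex[best] for ex in examples}}}
```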
Which attribute is best?

[Figure: two candidate attributes splitting the same set of examples; the following slides introduce entropy and information gain to answer the question.]
Entropy

Entropy(S) = -p+ log2(p+) - p- log2(p-), where p+ and p- are the proportions of positive and negative examples in S. For S = [9+, 5-]: Entropy(S) = -(9/14) log2(9/14) - (5/14) log2(5/14) = 0.940.
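A quick numeric check of the value quoted for S = [9+, 5-] (the helper name is an assumption):

```python
import math

def entropy(pos, neg):
    """Entropy of a set with pos positive and neg negative examples."""
    total = pos + neg
    e = 0.0
    for c in (pos, neg):
        p = c / total
        if p > 0:                 # define 0 * log2(0) = 0
            e -= p * math.log2(p)
    return e

print(entropy(9, 5))   # ~0.940, matching the slides
```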
Information Gain

Gain(S, A) = Entropy(S) - sum over v in Values(A) of (|Sv|/|S|) * Entropy(Sv): the expected reduction in entropy from partitioning S on attribute A.
Information Gain

S = [9+, 5-], Entropy(S) = 0.940.

Splitting on Humidity:
  Gain(S, Humidity) = 0.940 - (7/14)*0.985 - (7/14)*0.592 = 0.151

Splitting on Wind:
  Gain(S, Wind) = 0.940 - (8/14)*0.811 - (6/14)*1.0 = 0.048

Humidity provides greater information gain than Wind with respect to the target classification.
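The same arithmetic as a short sketch, plugging in the subset fractions and subset entropies quoted above (the function name is an assumption):

```python
def gain(parent_entropy, splits):
    """splits: (fraction_of_examples, subset_entropy) pairs, one per branch."""
    return parent_entropy - sum(f * e for f, e in splits)

print(gain(0.940, [(7/14, 0.985), (7/14, 0.592)]))  # ~0.151 (Humidity)
print(gain(0.940, [(8/14, 0.811), (6/14, 1.0)]))    # ~0.048 (Wind)
```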
Selecting the Next Attribute

S = [9+, 5-], Entropy(S) = 0.940.

Splitting on Outlook (Sunny, Overcast, Rain):
  Gain(S, Outlook) = 0.940 - (5/14)*0.971 - (4/14)*0.0 - (5/14)*0.971 = 0.247
Selecting the Next Attribute

Comparing all four attributes: Gain(S, Outlook) = 0.247, Gain(S, Humidity) = 0.151, Gain(S, Wind) = 0.048, Gain(S, Temperature) = 0.029. Outlook has the highest information gain, so it is selected as the root.
ID3 Algorithm

The root Outlook partitions the full sample [D1, D2, ..., D14] = [9+, 5-] into three branches. The Overcast branch is pure (all positive) and becomes a Yes leaf; the Sunny and Rain branches are still mixed (marked "?") and are expanded recursively using the remaining attributes.
ID3 Algorithm

The finished tree:

Outlook?
  Sunny    -> Humidity?  (High -> No, Normal -> Yes)
  Overcast -> Yes
  Rain     -> Wind?      (Strong -> No, Weak -> Yes)
Avoid Overfitting

A hypothesis h overfits the training data if some alternative h' fits the training data worse but performs better over the whole distribution of instances. Remedies include stopping growth before the tree fits the training data perfectly, or growing the full tree and then post-pruning it (e.g., reduced-error pruning, next slide).
Effect of Reduced Error Pruning

[Figure: accuracy as a function of tree size, on training and test data, with and without reduced-error pruning.]
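A minimal sketch of reduced-error pruning over the nested-dict trees used in the earlier sketches. The representation, the helper names, and the use of the validation-set majority class for the replacement leaf are simplifying assumptions, not the slides' procedure:

```python
from collections import Counter

def classify(node, record):
    while isinstance(node, dict):
        node = node["branches"][record[node["attr"]]]
    return node

def errors(node, data):
    """Number of validation examples the (sub)tree misclassifies."""
    return sum(classify(node, ex) != ex["label"] for ex in data)

def prune(node, data):
    """Bottom-up: replace a node by a leaf whenever that does not
    increase error on the validation examples reaching it."""
    if not isinstance(node, dict) or not data:
        return node
    for value, child in node["branches"].items():
        subset = [ex for ex in data if ex[node["attr"]] == value]
        node["branches"][value] = prune(child, subset)
    majority = Counter(ex["label"] for ex in data).most_common(1)[0][0]
    if errors(majority, data) <= errors(node, data):
        return majority            # pruning does not hurt -> keep the leaf
    return node
```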
Converting a Tree to Rules

Each root-to-leaf path of the PlayTennis tree becomes one rule:

IF Outlook = Sunny AND Humidity = High THEN PlayTennis = No
IF Outlook = Sunny AND Humidity = Normal THEN PlayTennis = Yes
IF Outlook = Overcast THEN PlayTennis = Yes
IF Outlook = Rain AND Wind = Strong THEN PlayTennis = No
IF Outlook = Rain AND Wind = Weak THEN PlayTennis = Yes
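A small sketch of the conversion, reading one rule per root-to-leaf path out of a dict-encoded tree (the encoding carries over from the earlier sketches):

```python
def to_rules(node, conditions=()):
    if not isinstance(node, dict):             # leaf: emit a finished rule
        lhs = " AND ".join(f"{a} = {v}" for a, v in conditions) or "TRUE"
        return [f"IF {lhs} THEN PlayTennis = {node}"]
    rules = []
    for value, child in node["branches"].items():
        rules += to_rules(child, conditions + ((node["attr"], value),))
    return rules

tree = {"attr": "Outlook", "branches": {
    "Sunny":    {"attr": "Humidity", "branches": {"High": "No", "Normal": "Yes"}},
    "Overcast": "Yes",
    "Rain":     {"attr": "Wind", "branches": {"Strong": "No", "Weak": "Yes"}}}}

for rule in to_rules(tree):
    print(rule)   # five rules, one per path
```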
Unknown Attribute Values

If an example is missing a value for attribute A at node n, common strategies are to assign it the most common value of A among the other examples at n (optionally restricted to examples with the same target value), or to split the example fractionally across A's branches in proportion to the observed values.
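A tiny sketch of the fractional-split strategy, assuming examples are dicts that may carry a "weight" field (an illustrative convention, not from the slides):

```python
from collections import Counter

def split_with_missing(examples, attr):
    """Route examples by attr; an example missing attr goes down every
    branch, weighted by how common each value is at this node."""
    known = [ex for ex in examples if ex.get(attr) is not None]
    assert known, "at least one example must have a value for attr"
    counts = Counter(ex[attr] for ex in known)
    total = sum(counts.values())
    branches = {v: [] for v in counts}
    for ex in examples:
        if ex.get(attr) is not None:
            branches[ex[attr]].append(ex)
        else:
            for v, c in counts.items():
                branches[v].append(dict(ex, weight=ex.get("weight", 1.0) * c / total))
    return branches
```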
Cross-Validation

Partition the data into k folds; train on k-1 folds and evaluate on the held-out fold, rotating so that each fold is used once for testing, and average the k accuracies.
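A minimal sketch of the procedure; train and accuracy are placeholder callables (for instance, the id3 function and an accuracy helper from the earlier sketches), not functions defined by the slides:

```python
def k_fold_cv(data, k, train, accuracy):
    folds = [data[i::k] for i in range(k)]          # simple striped split
    scores = []
    for i in range(k):
        test = folds[i]
        training = [ex for j, fold in enumerate(folds) if j != i for ex in fold]
        model = train(training)
        scores.append(accuracy(model, test))
    return sum(scores) / k                          # mean held-out accuracy
```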