AiML 4-3
ID3 Worked Example Copyright © 2019, Oracle and/or its affiliates. All rights reserved.
ID3: A Worked Example
• Recall the example of determining whether to play
outdoor sport based on weather conditions.
• The ID3 algorithm is long, but each round repeats two calculations:
– Calculate entropy
– Calculate gain
• Try to keep track of the steps, although they may be
difficult to follow at first.
Play Outdoor Sport
Num Outlook Temperature Humidity Wind Play?
1 Sunny Hot High Weak No
2 Sunny Hot High Strong No
3 Overcast Hot High Weak Yes
4 Rain Mild High Weak Yes
5 Rain Cold Normal Weak Yes
6 Rain Cold Normal Strong No
7 Overcast Cold Normal Strong Yes
8 Sunny Mild High Weak No
9 Sunny Cold Normal Weak Yes
10 Rain Mild Normal Weak Yes
11 Sunny Mild Normal Strong Yes
12 Overcast Mild High Strong Yes
13 Overcast Hot Normal Weak Yes
14 Rain Mild High Strong No
Play Classification
• First use the classification PLAY to calculate entropy on
the outcome.
• This classification (outcome) has two possible values –
yes or no.
• Therefore, the entropy equation will have 2 terms, one for each value.
• There are 9 Yes and 5 No values in the 14 rows of the data set.
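A quick way to confirm these counts is to tally the Play? column directly. The sketch below is illustrative only: the tuple layout and the name ROWS are assumptions made for this example, and the values are copied from the Play Outdoor Sport table.

```python
from collections import Counter

# The 14 rows of the Play Outdoor Sport table, in row order 1..14.
# Columns: Outlook, Temperature, Humidity, Wind, Play?
ROWS = [
    ("Sunny", "Hot", "High", "Weak", "No"),
    ("Sunny", "Hot", "High", "Strong", "No"),
    ("Overcast", "Hot", "High", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Weak", "Yes"),
    ("Rain", "Cold", "Normal", "Weak", "Yes"),
    ("Rain", "Cold", "Normal", "Strong", "No"),
    ("Overcast", "Cold", "Normal", "Strong", "Yes"),
    ("Sunny", "Mild", "High", "Weak", "No"),
    ("Sunny", "Cold", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "Normal", "Weak", "Yes"),
    ("Sunny", "Mild", "Normal", "Strong", "Yes"),
    ("Overcast", "Mild", "High", "Strong", "Yes"),
    ("Overcast", "Hot", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Strong", "No"),
]

# Tally the outcome column (the last field of each row).
play_counts = Counter(row[-1] for row in ROWS)
print(play_counts["Yes"], play_counts["No"], len(ROWS))  # prints: 9 5 14
```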
Play Entropy
• Entropy (Play): 9 Yes and 5 No outcomes:
– $\mathrm{Entropy}(S) = -p_{yes} \log_2 p_{yes} - p_{no} \log_2 p_{no}$
– $\mathrm{Entropy}(S) = -\frac{9}{14} \log_2 \frac{9}{14} - \frac{5}{14} \log_2 \frac{5}{14}$
– $\mathrm{Entropy}(S) = -(0.643 \times (-0.637)) - (0.357 \times (-1.485))$
– $\mathrm{Entropy}(S) = -(-0.410) - (-0.530)$
– $\mathrm{Entropy}(S) = 0.410 + 0.530$
– $\mathrm{Entropy}(S) = 0.94$
Play Entropy
• Check your answer with an online entropy calculator
such as:
https://planetcalc.com/2476/
• Add the 2 probabilities (9/14 and 5/14), and then click
calculate.
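As an alternative to the online calculator, the entropy can be checked with a few lines of Python. This is a minimal sketch; the helper name entropy and its list-of-counts argument are assumptions for illustration.

```python
import math


def entropy(counts):
    """Shannon entropy in bits for a list of class counts."""
    total = sum(counts)
    result = 0.0
    for count in counts:
        if count == 0:
            continue  # a class that never occurs contributes 0
        p = count / total
        result -= p * math.log2(p)
    return result


# 9 Yes and 5 No outcomes out of 14 rows.
play_entropy = entropy([9, 5])
print(round(play_entropy, 3))  # prints: 0.94
```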
Information Gain
• Information gain measures how much entropy is
reduced when partitioning on an attribute (A).
• The higher the gain, the better the attribute classifies the data.
• Comparing the gains shows which attribute is the best choice to split on.
$\mathrm{Gain}(S, A) = \mathrm{Entropy}(S) - \sum_{v \in \mathrm{Values}(A)} \frac{|S_v|}{|S|} \mathrm{Entropy}(S_v)$
Information Gain
• Information gain will be calculated for each of the
weather attributes:
– Outlook
– Temperature
– Humidity
– Wind
• First, calculate entropy for each attribute value in that
column, and then calculate gain for that attribute.
Outlook Attribute
Outlook  Positive outcomes (yes)  Negative outcomes (no)  Total
Sunny 2 3 5
Overcast 4 0 4
Rain 3 2 5
• Calculate:
– Entropy (Sunny)
– Entropy(Overcast)
– Entropy(Rain)
Outlook - Entropy(Sunny)
• Entropy (Sunny): 2 Yes and 3 No outcomes
– $\mathrm{Entropy}(\mathrm{Sunny}) = -p_{yes} \log_2 p_{yes} - p_{no} \log_2 p_{no}$
– $\mathrm{Entropy}([2+, 3-]) = -\frac{2}{5} \log_2 \frac{2}{5} - \frac{3}{5} \log_2 \frac{3}{5}$
– $\mathrm{Entropy}(\mathrm{Sunny}) = -(0.4 \times (-1.322)) - (0.6 \times (-0.737))$
– $\mathrm{Entropy}(\mathrm{Sunny}) = 0.529 + 0.442 = 0.971$
Outlook - Entropy (Overcast)
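• Entropy (Overcast): 4 Yes and 0 No outcomes
– $\mathrm{Entropy}(\mathrm{Overcast}) = -p_{yes} \log_2 p_{yes} - p_{no} \log_2 p_{no}$
– $\mathrm{Entropy}([4+, 0-]) = -\frac{4}{4} \log_2 \frac{4}{4} - \frac{0}{4} \log_2 \frac{0}{4}$
– $\mathrm{Entropy}(\mathrm{Overcast}) = -\frac{4}{4} \log_2 \frac{4}{4} - 0$ (the $0 \log_2 0$ term is taken to be 0)
– $\mathrm{Entropy}(\mathrm{Overcast}) = -(1 \times 0) - 0$
– $\mathrm{Entropy}(\mathrm{Overcast}) = 0$
• This is expected: every Overcast row has the same outcome, so the result is always known.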
Outlook − Entropy (Rain)
• Entropy (Rain): 3 Yes and 2 No outcomes
– $\mathrm{Entropy}(\mathrm{Rain}) = -p_{yes} \log_2 p_{yes} - p_{no} \log_2 p_{no}$
– $\mathrm{Entropy}([3+, 2-]) = -\frac{3}{5} \log_2 \frac{3}{5} - \frac{2}{5} \log_2 \frac{2}{5}$
– $\mathrm{Entropy}(\mathrm{Rain}) = -(0.6 \times (-0.737)) - (0.4 \times (-1.322))$
– $\mathrm{Entropy}(\mathrm{Rain}) = 0.442 + 0.529 = 0.971$
Gain
Previously calculated from Play and Outlook:
• Play (Yes/No): Entropy(S) = 0.94
• Entropy(Sunny) = 0.971
• Entropy(Overcast) = 0
• Entropy(Rain) = 0.971
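The three Outlook entropies can be reproduced from the yes/no counts in the Outlook table. A minimal sketch, assuming a hypothetical entropy(yes, no) helper that treats a zero count as contributing 0:

```python
import math


def entropy(yes, no):
    """Entropy in bits of a two-class split given its yes/no counts."""
    total = yes + no
    result = 0.0
    for count in (yes, no):
        if count == 0:
            continue  # convention: 0 * log2(0) is taken to be 0
        p = count / total
        result -= p * math.log2(p)
    return result


# (yes, no) counts for each Outlook value, copied from the Outlook table.
outlook_counts = {"Sunny": (2, 3), "Overcast": (4, 0), "Rain": (3, 2)}
outlook_entropy = {
    value: entropy(yes, no) for value, (yes, no) in outlook_counts.items()
}
for value, h in outlook_entropy.items():
    print(f"{value}: {h:.3f}")
```

Sunny and Rain come out equal because entropy depends only on the proportions, not on which class is the larger one.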
Gain
Row counts needed from Outlook to calculate gain:
• There are 5 Sunny rows out of 14.
• There are 4 Overcast rows out of 14.
• There are 5 Rain rows out of 14.
Gain (S, Outlook)
$\mathrm{Gain}(S, \mathrm{Outlook}) = 0.94 - \sum_{v \in \mathrm{Values}(\mathrm{Outlook})} \frac{|S_v|}{|S|} \mathrm{Entropy}(S_v)$
$\mathrm{Gain}(S, \mathrm{Outlook}) = 0.94 - \frac{5}{14} \mathrm{Entropy}(S_{Sunny}) - \frac{4}{14} \mathrm{Entropy}(S_{Overcast}) - \frac{5}{14} \mathrm{Entropy}(S_{Rain})$
$\mathrm{Gain}(S, \mathrm{Outlook}) = 0.94 - 0.347 - 0 - 0.347 = 0.246$
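The weighted sum for Outlook can be sketched in code as follows. The helper names entropy and gain are assumptions for illustration. At full precision the result is about 0.2467; the 0.246 used in this lesson comes from rounding the intermediate values.

```python
import math


def entropy(yes, no):
    """Entropy in bits of a two-class split given its yes/no counts."""
    total = yes + no
    result = 0.0
    for count in (yes, no):
        if count > 0:  # a zero count contributes nothing
            p = count / total
            result -= p * math.log2(p)
    return result


def gain(parent, partitions):
    """Information gain; parent and each partition are (yes, no) pairs."""
    total = sum(parent)
    # Weight each partition's entropy by its share of the parent rows.
    remainder = sum(
        (yes + no) / total * entropy(yes, no) for yes, no in partitions
    )
    return entropy(*parent) - remainder


# Parent: 9 Yes / 5 No. Partitions: Sunny, Overcast, Rain.
outlook_gain = gain((9, 5), [(2, 3), (4, 0), (3, 2)])
print(f"{outlook_gain:.4f}")
```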
Gain
• Now repeat for temperature, humidity, and wind.
• Beginning with temperature, calculate entropy, and
then gain.
Temperature  Positive outcomes (yes)  Negative outcomes (no)  Total
cold 3 1 4
mild 4 2 6
hot 2 2 4
Temperature - Entropy (Cold)
– $\mathrm{Entropy}(\mathrm{cold}) = -p_{yes} \log_2 p_{yes} - p_{no} \log_2 p_{no}$
– $\mathrm{Entropy}([3+, 1-]) = -\frac{3}{4} \log_2 \frac{3}{4} - \frac{1}{4} \log_2 \frac{1}{4}$
– $\mathrm{Entropy}(\mathrm{cold}) = -(0.75 \times (-0.415)) - (0.25 \times (-2.0))$
– $\mathrm{Entropy}(\mathrm{cold}) = 0.311 + 0.5 = 0.811$
Temperature - Entropy (Mild)
– $\mathrm{Entropy}(\mathrm{mild}) = -p_{yes} \log_2 p_{yes} - p_{no} \log_2 p_{no}$
– $\mathrm{Entropy}([4+, 2-]) = -\frac{4}{6} \log_2 \frac{4}{6} - \frac{2}{6} \log_2 \frac{2}{6}$
– $\mathrm{Entropy}(\mathrm{mild}) = -(0.667 \times (-0.584)) - (0.333 \times (-1.586))$
– $\mathrm{Entropy}(\mathrm{mild}) = 0.390 + 0.528 = 0.918$
Temperature - Entropy (Hot)
– $\mathrm{Entropy}(\mathrm{hot}) = -p_{yes} \log_2 p_{yes} - p_{no} \log_2 p_{no}$
– $\mathrm{Entropy}([2+, 2-]) = -\frac{2}{4} \log_2 \frac{2}{4} - \frac{2}{4} \log_2 \frac{2}{4}$
– $\mathrm{Entropy}(\mathrm{hot}) = -(0.5 \times (-1.0)) - (0.5 \times (-1.0))$
– $\mathrm{Entropy}(\mathrm{hot}) = 0.5 + 0.5 = 1.0$
Gain (S, Temperature)
$\mathrm{Gain}(S, \mathrm{Temperature}) = 0.94 - \frac{|S_{cold}|}{14} \mathrm{Entropy}(S_{cold}) - \frac{|S_{mild}|}{14} \mathrm{Entropy}(S_{mild}) - \frac{|S_{hot}|}{14} \mathrm{Entropy}(S_{hot})$
$\mathrm{Gain}(S, \mathrm{Temperature}) = 0.94 - \frac{4}{14} \mathrm{Entropy}(S_{cold}) - \frac{6}{14} \mathrm{Entropy}(S_{mild}) - \frac{4}{14} \mathrm{Entropy}(S_{hot})$
$\mathrm{Gain}(S, \mathrm{Temperature}) = 0.94 - 0.286 \times 0.811 - 0.429 \times 0.918 - 0.286 \times 1.0 = 0.028$
Humidity - Entropy (High)
– $\mathrm{Entropy}(\mathrm{high}) = -p_{yes} \log_2 p_{yes} - p_{no} \log_2 p_{no}$
– $\mathrm{Entropy}([3+, 4-]) = -\frac{3}{7} \log_2 \frac{3}{7} - \frac{4}{7} \log_2 \frac{4}{7}$
– $\mathrm{Entropy}(\mathrm{high}) = -(0.429 \times (-1.221)) - (0.571 \times (-0.808))$
– $\mathrm{Entropy}(\mathrm{high}) = 0.524 + 0.461 = 0.985$
Humidity - Entropy (Normal)
– $\mathrm{Entropy}(\mathrm{normal}) = -p_{yes} \log_2 p_{yes} - p_{no} \log_2 p_{no}$
– $\mathrm{Entropy}([6+, 1-]) = -\frac{6}{7} \log_2 \frac{6}{7} - \frac{1}{7} \log_2 \frac{1}{7}$
– $\mathrm{Entropy}(\mathrm{normal}) = -(0.857 \times (-0.223)) - (0.143 \times (-2.806))$
– $\mathrm{Entropy}(\mathrm{normal}) = 0.191 + 0.401 = 0.592$
Gain (S, Humidity)
$\mathrm{Gain}(S, \mathrm{Humidity}) = 0.94 - \frac{|S_{high}|}{14} \mathrm{Entropy}(S_{high}) - \frac{|S_{normal}|}{14} \mathrm{Entropy}(S_{normal})$
$\mathrm{Gain}(S, \mathrm{Humidity}) = 0.94 - \frac{7}{14} \mathrm{Entropy}(S_{high}) - \frac{7}{14} \mathrm{Entropy}(S_{normal})$
$\mathrm{Gain}(S, \mathrm{Humidity}) = 0.94 - 0.493 - 0.296 = 0.151$
Wind - Entropy (Weak)
– $\mathrm{Entropy}(\mathrm{weak}) = -p_{yes} \log_2 p_{yes} - p_{no} \log_2 p_{no}$
– $\mathrm{Entropy}([6+, 2-]) = -\frac{6}{8} \log_2 \frac{6}{8} - \frac{2}{8} \log_2 \frac{2}{8}$
– $\mathrm{Entropy}(\mathrm{weak}) = -(0.75 \times (-0.415)) - (0.25 \times (-2.0))$
– $\mathrm{Entropy}(\mathrm{weak}) = 0.311 + 0.5 = 0.811$
Wind - Entropy (Strong)
– $\mathrm{Entropy}(\mathrm{strong}) = -p_{yes} \log_2 p_{yes} - p_{no} \log_2 p_{no}$
– $\mathrm{Entropy}([3+, 3-]) = -\frac{3}{6} \log_2 \frac{3}{6} - \frac{3}{6} \log_2 \frac{3}{6}$
– $\mathrm{Entropy}(\mathrm{strong}) = -(0.5 \times (-1.0)) - (0.5 \times (-1.0))$
– $\mathrm{Entropy}(\mathrm{strong}) = 0.5 + 0.5 = 1.0$
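Gain (S, Wind)
$\mathrm{Gain}(S, \mathrm{Wind}) = 0.94 - \frac{|S_{weak}|}{14} \mathrm{Entropy}(S_{weak}) - \frac{|S_{strong}|}{14} \mathrm{Entropy}(S_{strong})$
$\mathrm{Gain}(S, \mathrm{Wind}) = 0.94 - \frac{8}{14} \times 0.811 - \frac{6}{14} \times 1.0$
$\mathrm{Gain}(S, \mathrm{Wind}) = 0.94 - 0.463 - 0.429 = 0.048$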
Gain
• Calculations for gain:
– Gain (S, Outlook)=0.246
– Gain (S, Temperature)=0.028
– Gain (S, Humidity)=0.151
– Gain (S, Wind)=0.048
• Outlook has the highest gain, and will be the best
choice for the root of the tree.
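The four gains can also be computed directly from the table. The sketch below is one possible implementation, with assumed names (ROWS, ATTRIBUTES, entropy, gain). Because it does not round intermediate values, the third decimal place can differ slightly from the figures above (for example, about 0.029 for Temperature), but the ranking is the same.

```python
import math
from collections import Counter

ATTRIBUTES = ["Outlook", "Temperature", "Humidity", "Wind"]
# Rows 1..14 of the table; the last field is the Play? outcome.
ROWS = [
    ("Sunny", "Hot", "High", "Weak", "No"),
    ("Sunny", "Hot", "High", "Strong", "No"),
    ("Overcast", "Hot", "High", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Weak", "Yes"),
    ("Rain", "Cold", "Normal", "Weak", "Yes"),
    ("Rain", "Cold", "Normal", "Strong", "No"),
    ("Overcast", "Cold", "Normal", "Strong", "Yes"),
    ("Sunny", "Mild", "High", "Weak", "No"),
    ("Sunny", "Cold", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "Normal", "Weak", "Yes"),
    ("Sunny", "Mild", "Normal", "Strong", "Yes"),
    ("Overcast", "Mild", "High", "Strong", "Yes"),
    ("Overcast", "Hot", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Strong", "No"),
]


def entropy(rows):
    """Entropy in bits of the Play? column for the given rows."""
    counts = Counter(row[-1] for row in rows)
    result = 0.0
    for count in counts.values():
        p = count / len(rows)
        result -= p * math.log2(p)
    return result


def gain(rows, column):
    """Information gain from partitioning rows on one column index."""
    subsets = {}
    for row in rows:
        subsets.setdefault(row[column], []).append(row)
    remainder = sum(len(s) / len(rows) * entropy(s) for s in subsets.values())
    return entropy(rows) - remainder


gains = {name: gain(ROWS, i) for i, name in enumerate(ATTRIBUTES)}
best = max(gains, key=gains.get)
for name, value in gains.items():
    print(f"Gain(S, {name}) = {value:.3f}")
print("Best attribute:", best)
```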
Splitting Data
Notice which rows will be passed down each branch:
Outlook (1,2,3,4,5,6,7,8,9,10,11,12,13,14)
  Sunny: 1,2,8,9,11
  Overcast: 3,7,12,13
  Rain: 4,5,6,10,14
Round 2
• The rows are passed to the next level based on the three Outlook values: Sunny, Overcast, or Rain.
• Now begin the process again for each of the 3 data sets.
• This is what the ID3 algorithm does: it repeats the same process on each subset.
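Passing rows down the tree amounts to grouping row numbers by their Outlook value. A small sketch, using an assumed list OUTLOOK that holds the Outlook column in row order:

```python
# Outlook value for rows 1..14, copied from the Play Outdoor Sport table.
OUTLOOK = [
    "Sunny", "Sunny", "Overcast", "Rain", "Rain", "Rain", "Overcast",
    "Sunny", "Sunny", "Rain", "Sunny", "Overcast", "Overcast", "Rain",
]

# Group the 1-based row numbers by the branch they follow.
branches = {}
for row_number, value in enumerate(OUTLOOK, start=1):
    branches.setdefault(value, []).append(row_number)

for value, row_numbers in branches.items():
    print(value, row_numbers)
```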
Sunny Branch
• First, check the Sunny branch.
• Remember that ID3 tests whether a table is empty.
• It is not empty, so do not add a default categorization.
• ID3 also tests whether the outcomes are all the same.
• The outcomes are mixed, so do not add a categorization yet.
Sunny Data
• Rows 1, 2, 8, 9, and 11 produce the following table:
row Outlook Temperature Humidity Wind Play?
1 Sunny Hot High Weak No
2 Sunny Hot High Strong No
8 Sunny Mild High Weak No
9 Sunny Cold Normal Weak Yes
11 Sunny Mild Normal Strong Yes
ID3
• There are 2 Yes and 3 No outcomes for Play.
• The ID3 algorithm says:
If all examples are positive, return the single-node tree Root, with label = +.
If all examples are negative, return the single-node tree Root, with label = -.
• Neither case applies here, so the Sunny branch must be split further.
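The two rules quoted above can be sketched as a small check. The function name leaf_label is hypothetical, and the empty-table case is assumed to be handled separately by the caller.

```python
def leaf_label(outcomes):
    """Return "+" or "-" when every outcome agrees, otherwise None."""
    if not outcomes:
        return None  # empty table: assumed to be handled by the caller
    if all(outcome == "Yes" for outcome in outcomes):
        return "+"
    if all(outcome == "No" for outcome in outcomes):
        return "-"
    return None  # mixed outcomes: this branch needs another split


# Play? outcomes for the Sunny rows 1, 2, 8, 9, 11.
sunny_outcomes = ["No", "No", "No", "Yes", "Yes"]
print(leaf_label(sunny_outcomes))  # prints: None
```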
Outlook Sunny
• Begin the process again, using the "Sunny" Outlook.
(Rain will be next. Overcast needs no further work because all of its outcomes are "Yes".)
Outlook Sunny
• This process will be like the process used to calculate
gains in the previous exercise (when Outlook was
chosen as the root).
• The Outlook column is eliminated because it has already been used higher up this branch (at the root).
Attribute Selection
• Which attribute should be chosen under the Sunny
branch node?
• Within the Sunny rows, Play has 2 Yes and 3 No outcomes:
– $\mathrm{Entropy}(S) = -p_{yes} \log_2 p_{yes} - p_{no} \log_2 p_{no}$
– $\mathrm{Entropy}(S) = -\frac{2}{5} \log_2 \frac{2}{5} - \frac{3}{5} \log_2 \frac{3}{5}$
– $\mathrm{Entropy}(S) = -(0.4 \times (-1.322)) - (0.6 \times (-0.737))$
– $\mathrm{Entropy}(S) = 0.529 + 0.442 = 0.971$
Calculate Gain
• First calculate entropy for Temperature.
Temperature  Positive outcomes (yes)  Negative outcomes (no)  Total
hot 0 2 2
mild 1 1 2
cold 1 0 1
• Can you guess what the entropy values will be?
• The outcome is always negative when it is hot, so the entropy will be 0.
• The outcome is always positive when it is cold, so the entropy for cold will also be 0.
• For mild there is 1 positive and 1 negative outcome out of a total of 2, so the entropy will be 1.
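Gain (S, Temperature)
$\mathrm{Gain}(S, \mathrm{Temperature}) = 0.971 - \frac{|S_{hot}|}{5} \mathrm{Entropy}(S_{hot}) - \frac{|S_{mild}|}{5} \mathrm{Entropy}(S_{mild}) - \frac{|S_{cold}|}{5} \mathrm{Entropy}(S_{cold})$
$\mathrm{Gain}(S, \mathrm{Temperature}) = 0.971 - \frac{2}{5} \times 0 - \frac{2}{5} \times 1 - \frac{1}{5} \times 0 = 0.571$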
Gain (S, Humidity)
Humidity  Positive outcomes (yes)  Negative outcomes (no)  Total
high 0 3 3
normal 2 0 2
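Gain (S, Humidity)
$\mathrm{Gain}(S, \mathrm{Humidity}) = 0.971 - \frac{|S_{high}|}{5} \mathrm{Entropy}(S_{high}) - \frac{|S_{normal}|}{5} \mathrm{Entropy}(S_{normal})$
$\mathrm{Gain}(S, \mathrm{Humidity}) = 0.971 - \frac{3}{5} \times 0 - \frac{2}{5} \times 0 = 0.971$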
Finally for Wind
• Strong has 1 Yes and 1 No outcome, so its entropy is 1.
Entropy (Weak)
$\mathrm{Entropy}(\mathrm{weak}) = -p_{yes} \log_2 p_{yes} - p_{no} \log_2 p_{no}$
$\mathrm{Entropy}(\mathrm{weak}) = -\frac{1}{3} \log_2 \frac{1}{3} - \frac{2}{3} \log_2 \frac{2}{3}$
$\mathrm{Entropy}(\mathrm{weak}) = 0.528 + 0.390 = 0.918$
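Gain (S, Wind)
$\mathrm{Gain}(S, \mathrm{Wind}) = 0.971 - \frac{3}{5} \mathrm{Entropy}(S_{weak}) - \frac{2}{5} \mathrm{Entropy}(S_{strong})$
$\mathrm{Gain}(S, \mathrm{Wind}) = 0.971 - 0.6 \times 0.918 - 0.4 \times 1 = 0.020$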
Attribute Selection
• That gives us:
– Gain(S, Wind) = 0.020
– Gain(S, Humidity) = 0.971
– Gain(S, Temperature) = 0.571
• Humidity has the highest gain, so it will be chosen at this branch.
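These Sunny-branch gains can be reproduced from the five Sunny rows. A minimal sketch with assumed names (SUNNY_ROWS, ATTRIBUTES, entropy, gain); Outlook is left out because it was already used at the root.

```python
import math
from collections import Counter

ATTRIBUTES = ["Temperature", "Humidity", "Wind"]
# Rows 1, 2, 8, 9, 11 without the Outlook column; last field is Play?.
SUNNY_ROWS = [
    ("Hot", "High", "Weak", "No"),
    ("Hot", "High", "Strong", "No"),
    ("Mild", "High", "Weak", "No"),
    ("Cold", "Normal", "Weak", "Yes"),
    ("Mild", "Normal", "Strong", "Yes"),
]


def entropy(rows):
    """Entropy in bits of the Play? column for the given rows."""
    counts = Counter(row[-1] for row in rows)
    result = 0.0
    for count in counts.values():
        p = count / len(rows)
        result -= p * math.log2(p)
    return result


def gain(rows, column):
    """Information gain from partitioning rows on one column index."""
    subsets = {}
    for row in rows:
        subsets.setdefault(row[column], []).append(row)
    remainder = sum(len(s) / len(rows) * entropy(s) for s in subsets.values())
    return entropy(rows) - remainder


sunny_gains = {name: gain(SUNNY_ROWS, i) for i, name in enumerate(ATTRIBUTES)}
sunny_best = max(sunny_gains, key=sunny_gains.get)
for name, value in sunny_gains.items():
    print(f"Gain(Sunny, {name}) = {value:.3f}")
print("Best attribute under Sunny:", sunny_best)
```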
Data Flow
Outlook (1,2,3,4,5,6,7,8,9,10,11,12,13,14)
  Sunny -> Humidity (1,2,8,9,11)
    high: 1,2,8
    normal: 9,11
Round 3
• Again, start the process with the new tables as done
previously.
• First look at the data that has reached humidity high.
• Notice that the result is always No.
• The algorithm says if all the results (negative in this
example) are equal, then this path is complete.
Row Temperature Humidity Wind Play?
1 Hot High Weak No
2 Hot High Strong No
8 Mild High Weak No
Humidity Normal
• Notice that the same is true for Humidity Normal.
• Both results are Yes, so this path is complete.
Row Temperature Humidity Wind Play?
9 Cold Normal Weak Yes
11 Mild Normal Strong Yes
Current Diagram
Outlook (1,2,3,4,5,6,7,8,9,10,11,12,13,14)
  Sunny -> Humidity (1,2,8,9,11)
    high -> No (1,2,8)
    normal -> Yes (9,11)
  Overcast (3,7,12,13)
  Rain (4,5,6,10,14)
Overcast
• As all rows under Outlook –> Sunny are now complete,
move on to Outlook –> Overcast.
Num Outlook Temperature Humidity Wind Play?
3 Overcast Hot High Weak Yes
7 Overcast Cold Normal Strong Yes
12 Overcast Mild High Strong Yes
13 Overcast Hot Normal Weak Yes
Diagram with Overcast Complete
Outlook (1,2,3,4,5,6,7,8,9,10,11,12,13,14)
  Sunny -> Humidity
    high -> No (1,2,8)
    normal -> Yes (9,11)
  Overcast -> Yes (3,7,12,13)
  Rain (4,5,6,10,14)
Complete for Outlook - Rain
• The last steps are to complete the process for rain.
• Find the best gain from the remaining data.
• You should find that Wind produces the highest gain.
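The same calculation for the Rain branch can be sketched as follows. The names RAIN_ROWS, ATTRIBUTES, entropy, and gain are illustrative assumptions. The rows are 4, 5, 6, 10, and 14 from the original table.

```python
import math
from collections import Counter

ATTRIBUTES = ["Temperature", "Humidity", "Wind"]
# Rows 4, 5, 6, 10, 14 without the Outlook column; last field is Play?.
RAIN_ROWS = [
    ("Mild", "High", "Weak", "Yes"),
    ("Cold", "Normal", "Weak", "Yes"),
    ("Cold", "Normal", "Strong", "No"),
    ("Mild", "Normal", "Weak", "Yes"),
    ("Mild", "High", "Strong", "No"),
]


def entropy(rows):
    """Entropy in bits of the Play? column for the given rows."""
    counts = Counter(row[-1] for row in rows)
    result = 0.0
    for count in counts.values():
        p = count / len(rows)
        result -= p * math.log2(p)
    return result


def gain(rows, column):
    """Information gain from partitioning rows on one column index."""
    subsets = {}
    for row in rows:
        subsets.setdefault(row[column], []).append(row)
    remainder = sum(len(s) / len(rows) * entropy(s) for s in subsets.values())
    return entropy(rows) - remainder


rain_gains = {name: gain(RAIN_ROWS, i) for i, name in enumerate(ATTRIBUTES)}
rain_best = max(rain_gains, key=rain_gains.get)
for name, value in rain_gains.items():
    print(f"Gain(Rain, {name}) = {value:.3f}")
print("Best attribute under Rain:", rain_best)
```

Wind separates the Rain rows perfectly, so its gain equals the full entropy of the Rain subset.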
Wind
Split the Rain rows into strong and weak. Both subsets produce a definite result (strong is always No, weak is always Yes), so the branch is finished.
Final Tree
Outlook (1,2,3,4,5,6,7,8,9,10,11,12,13,14)
  Sunny -> Humidity
    high -> No
    normal -> Yes
  Overcast -> Yes
  Rain -> Wind
    strong -> No
    weak -> Yes
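The whole walkthrough can be condensed into a short recursive sketch. It is one possible rendering of ID3, not a reference implementation: the names (ROWS, ATTRIBUTES, entropy, gain, id3) and the nested-dictionary tree format are assumptions for illustration. Because branches are only created for attribute values that actually occur in the rows, the empty-table test never fires in this version.

```python
import math
from collections import Counter

ATTRIBUTES = ["Outlook", "Temperature", "Humidity", "Wind"]
# Rows 1..14 of the table; the last field is the Play? outcome.
ROWS = [
    ("Sunny", "Hot", "High", "Weak", "No"),
    ("Sunny", "Hot", "High", "Strong", "No"),
    ("Overcast", "Hot", "High", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Weak", "Yes"),
    ("Rain", "Cold", "Normal", "Weak", "Yes"),
    ("Rain", "Cold", "Normal", "Strong", "No"),
    ("Overcast", "Cold", "Normal", "Strong", "Yes"),
    ("Sunny", "Mild", "High", "Weak", "No"),
    ("Sunny", "Cold", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "Normal", "Weak", "Yes"),
    ("Sunny", "Mild", "Normal", "Strong", "Yes"),
    ("Overcast", "Mild", "High", "Strong", "Yes"),
    ("Overcast", "Hot", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Strong", "No"),
]


def entropy(rows):
    """Entropy in bits of the Play? column for the given rows."""
    counts = Counter(row[-1] for row in rows)
    result = 0.0
    for count in counts.values():
        p = count / len(rows)
        result -= p * math.log2(p)
    return result


def gain(rows, column):
    """Information gain from partitioning rows on one column index."""
    subsets = {}
    for row in rows:
        subsets.setdefault(row[column], []).append(row)
    remainder = sum(len(s) / len(rows) * entropy(s) for s in subsets.values())
    return entropy(rows) - remainder


def id3(rows, columns):
    """Build a nested-dict decision tree; leaves are "Yes" or "No"."""
    outcomes = [row[-1] for row in rows]
    if len(set(outcomes)) == 1:
        return outcomes[0]  # all outcomes agree: this path is complete
    if not columns:
        # Attributes exhausted: fall back to the most common outcome.
        return Counter(outcomes).most_common(1)[0][0]
    best = max(columns, key=lambda column: gain(rows, column))
    remaining = [column for column in columns if column != best]
    branches = {}
    for value in sorted({row[best] for row in rows}):
        subset = [row for row in rows if row[best] == value]
        branches[value] = id3(subset, remaining)
    return {ATTRIBUTES[best]: branches}


tree = id3(ROWS, [0, 1, 2, 3])
print(tree)
```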
Scenario Missed
• Every row from the original data is now categorized.
• The only ID3 scenario not covered above is when the attributes are exhausted before the data is fully classified.
• For example, consider the following data:
Outlook Play?
Overcast Yes
Overcast Yes
Overcast Yes
Overcast No
Choose Classification
Outlook Play?
Overcast Yes
Overcast Yes
Overcast Yes
Overcast No
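One common way to resolve this case is to label the leaf with the most frequent outcome. That choice is an assumption here, since the rule is not spelled out in the text above; the sketch simply counts the outcomes for the four Overcast rows.

```python
from collections import Counter

# Play? outcomes for the four Overcast rows in the example table.
outcomes = ["Yes", "Yes", "Yes", "No"]

# most_common(1) returns a one-item list holding (label, count).
label, votes = Counter(outcomes).most_common(1)[0]
print(label, votes)  # prints: Yes 3
```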
Overview
• On first reading, the ID3 algorithm may seem complicated, but there are few decisions to make.
• Work through some of the steps manually to get a better understanding of the process.
• Other worked versions of the play sport example are available online, explained in different ways.
• Search for these to help your understanding.
Summary
In this lesson, you should have learned how to:
• Calculate entropy
• Calculate gain
• Manually work through the ID3 algorithm