
Artificial Intelligence with Machine Learning in Java

4-3
ID3 Worked Example

Copyright © 2019, Oracle and/or its affiliates. All rights reserved.


Objectives
This lesson covers the following objectives:
• Calculate entropy
• Calculate gain
• Manually work through the ID3 algorithm

ID3: A Worked Example
• Recall the example of determining whether to play
outdoor sport based on weather conditions.
• The ID3 algorithm is long, but it repeats two calculations:
– Calculate entropy
– Calculate gain
• Try to keep track of the steps, although they may be
difficult to follow at first.

Play Outdoor Sport
Num Outlook Temperature Humidity Wind Play?
1 Sunny Hot High Weak No
2 Sunny Hot High Strong No
3 Overcast Hot High Weak Yes
4 Rain Mild High Weak Yes
5 Rain Cold Normal Weak Yes
6 Rain Cold Normal Strong No
7 Overcast Cold Normal Strong Yes
8 Sunny Mild High Weak No
9 Sunny Cold Normal Weak Yes
10 Rain Mild Normal Weak Yes
11 Sunny Mild Normal Strong Yes
12 Overcast Mild High Strong Yes
13 Overcast Hot Normal Weak Yes
14 Rain Mild High Strong No

Play Classification
• First, use the classification Play to calculate the entropy of
the outcome.
• This classification (outcome) has two possible values:
Yes or No.
• Therefore, the entropy equation will have two terms, one for each value.
• There are 9 Yes and 5 No values in the 14 rows of the data
set.

Play Entropy
• Entropy (Play): 9 Yes and 5 No outcomes:
– $\mathrm{Entropy}(S) = -p_{yes} \log_2 p_{yes} - p_{no} \log_2 p_{no}$
– $\mathrm{Entropy}(S) = -\frac{9}{14} \log_2 \frac{9}{14} - \frac{5}{14} \log_2 \frac{5}{14}$
– $\mathrm{Entropy}(S) = -(0.643)(-0.637) - (0.357)(-1.485)$
– $\mathrm{Entropy}(S) = -(-0.410) - (-0.530)$
– $\mathrm{Entropy}(S) = 0.410 + 0.530$
– $\mathrm{Entropy}(S) = 0.94$

Play Entropy
• Check your answer with an online entropy calculator
such as:
https://planetcalc.com/2476/
• Enter the two probabilities (9/14 and 5/14), and then click
calculate.
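The same check can also be done in code. The following is a minimal sketch, assuming a two-class (Yes/No) outcome given as raw counts; the class name `EntropySketch` and the method `entropy` are illustrative names and not part of the course material.

```java
// Minimal sketch: entropy of a two-class (Yes/No) outcome from raw counts.
// EntropySketch and entropy(...) are illustrative names, not course code.
class EntropySketch {

    // Log base 2, built from the natural logarithm.
    static double log2(double x) {
        return Math.log(x) / Math.log(2);
    }

    // Entropy(S) = -p_yes * log2(p_yes) - p_no * log2(p_no).
    // A class with zero rows contributes 0 to the sum.
    static double entropy(int yes, int no) {
        int total = yes + no;
        if (total == 0) {
            return 0.0;
        }
        double result = 0.0;
        for (int count : new int[] {yes, no}) {
            if (count > 0) {
                double p = (double) count / total;
                result -= p * log2(p);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // 9 Yes and 5 No rows, as in the Play column; the value is about 0.940.
        System.out.println("Entropy(Play) = " + entropy(9, 5));
    }
}
```

Working from counts rather than pre-rounded probabilities avoids the small rounding differences that appear when the steps are done by hand.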

Information Gain
• Information gain measures how much entropy is
reduced when partitioning on an attribute (A).
• The higher the gain, the better the attribute classifies the data.
• Comparing the gains shows which attribute is the
best choice to split on.

$\mathrm{Gain}(S, A) = \mathrm{Entropy}(S) - \sum_{v \in \mathrm{Values}(A)} \frac{|S_v|}{|S|} \, \mathrm{Entropy}(S_v)$

Information Gain
• Information gain will be calculated for each of the
weather attributes:
– Outlook
– Temperature
– Humidity
– Wind
• First, calculate entropy for each attribute value in that
column, and then calculate gain for that attribute.

Outlook Attribute
Outlook    Positive (Yes)   Negative (No)   Total
Sunny      2                3               5
Overcast   4                0               4
Rain       3                2               5
• Calculate:
– Entropy (Sunny)
– Entropy(Overcast)
– Entropy(Rain)

Outlook - Entropy (Sunny)
• Entropy (Sunny): 2 Yes and 3 No outcomes
– $\mathrm{Entropy}(Sunny) = -p_{yes} \log_2 p_{yes} - p_{no} \log_2 p_{no}$
– $\mathrm{Entropy}([2+, 3-]) = -\frac{2}{5} \log_2 \frac{2}{5} - \frac{3}{5} \log_2 \frac{3}{5}$
– $\mathrm{Entropy}(Sunny) = -(0.4)(-1.322) - (0.6)(-0.737)$
– $\mathrm{Entropy}(Sunny) = 0.529 + 0.442 = 0.971$

Outlook - Entropy (Overcast)
• Entropy (Overcast): 4 Yes and 0 No outcomes
– $\mathrm{Entropy}(Overcast) = -p_{yes} \log_2 p_{yes} - p_{no} \log_2 p_{no}$
– $\mathrm{Entropy}([4+, 0-]) = -\frac{4}{4} \log_2 \frac{4}{4} - \frac{0}{4} \log_2 \frac{0}{4}$
– $\mathrm{Entropy}(Overcast) = -\frac{4}{4} \log_2 \frac{4}{4} - 0$ (a term with probability 0 is taken to be 0)
– $\mathrm{Entropy}(Overcast) = -(1)(0) - 0$
– $\mathrm{Entropy}(Overcast) = 0$
• This is expected: every Overcast row has the outcome Yes, so the result is always known.
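The zero-probability term needs care in code. The sketch below is an illustration only (the class `ZeroLogSketch` and method `term` are hypothetical names): in Java, `Math.log(0.0)` is negative infinity, and multiplying it by 0 gives NaN, so the 0 case is handled explicitly.

```java
// Minimal sketch of one entropy term, -p * log2(p), with p = 0 defined as 0.
// ZeroLogSketch and term(...) are illustrative names, not course code.
class ZeroLogSketch {

    static double term(double p) {
        if (p == 0.0) {
            return 0.0; // convention: 0 * log2(0) is treated as 0
        }
        return -p * (Math.log(p) / Math.log(2));
    }

    public static void main(String[] args) {
        // Without the guard, 0 * log(0) is 0 * -Infinity, which is NaN in Java.
        System.out.println(0.0 * Math.log(0.0)); // prints NaN

        // Overcast: p_yes = 4/4 and p_no = 0/4, so both terms are zero.
        System.out.println(term(4.0 / 4.0) + term(0.0 / 4.0));
    }
}
```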

Outlook - Entropy (Rain)
• Entropy (Rain): 3 Yes and 2 No outcomes
– $\mathrm{Entropy}(Rain) = -p_{yes} \log_2 p_{yes} - p_{no} \log_2 p_{no}$
– $\mathrm{Entropy}([3+, 2-]) = -\frac{3}{5} \log_2 \frac{3}{5} - \frac{2}{5} \log_2 \frac{2}{5}$
– $\mathrm{Entropy}(Rain) = -(0.6)(-0.737) - (0.4)(-1.322)$
– $\mathrm{Entropy}(Rain) = 0.442 + 0.529 = 0.971$

Gain
Previously calculated from Play and Outlook:
• Play (Yes/No): Entropy(S) = 0.94
• Entropy(Sunny) = 0.971
• Entropy(Overcast) = 0
• Entropy(Rain) = 0.971

Gain
Counts needed from Outlook to calculate gain:
• There are 5 Sunny rows out of 14.
• There are 4 Overcast rows out of 14.
• There are 5 Rain rows out of 14.

Gain (S, Outlook)
$\mathrm{Gain}(S, Outlook) = 0.94 - \sum_{v \in \mathrm{Values}(Outlook)} \frac{|S_v|}{|S|} \, \mathrm{Entropy}(S_v)$

$\mathrm{Gain}(S, Outlook) = 0.94 - \frac{|S_{Sunny}|}{14} \mathrm{Entropy}(S_{Sunny}) - \frac{|S_{Overcast}|}{14} \mathrm{Entropy}(S_{Overcast}) - \frac{|S_{Rain}|}{14} \mathrm{Entropy}(S_{Rain})$

$\mathrm{Gain}(S, Outlook) = 0.94 - \frac{5}{14} \mathrm{Entropy}(S_{Sunny}) - \frac{4}{14} \mathrm{Entropy}(S_{Overcast}) - \frac{5}{14} \mathrm{Entropy}(S_{Rain})$

$\mathrm{Gain}(S, Outlook) = 0.94 - 0.357 \cdot \mathrm{Entropy}(S_{Sunny}) - 0.286 \cdot \mathrm{Entropy}(S_{Overcast}) - 0.357 \cdot \mathrm{Entropy}(S_{Rain})$

$\mathrm{Gain}(S, Outlook) = 0.94 - 0.357 \cdot 0.971 - 0.286 \cdot 0 - 0.357 \cdot 0.971$

$\mathrm{Gain}(S, Outlook) = 0.94 - 0.347 - 0 - 0.347$

$\mathrm{Gain}(S, Outlook) = 0.246$
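The Outlook calculation can be reproduced from the Yes/No counts of each attribute value. This is a minimal sketch under that assumption; `GainSketch`, `entropy`, and `gain` are illustrative names. At full precision the result is about 0.247, while the hand calculation gives 0.246 because the intermediate values were rounded to three decimals.

```java
// Minimal sketch: information gain from per-value Yes/No counts.
// GainSketch, entropy(...) and gain(...) are illustrative names, not course code.
class GainSketch {

    static double entropy(int yes, int no) {
        double e = 0.0;
        for (int count : new int[] {yes, no}) {
            if (count > 0) { // a zero count contributes nothing
                double p = (double) count / (yes + no);
                e -= p * Math.log(p) / Math.log(2);
            }
        }
        return e;
    }

    // counts[i] = {yes, no} for the i-th value of the attribute.
    static double gain(int[][] counts) {
        int totalYes = 0;
        int totalNo = 0;
        for (int[] c : counts) {
            totalYes += c[0];
            totalNo += c[1];
        }
        int total = totalYes + totalNo;
        if (total == 0) {
            return 0.0;
        }
        // Weighted sum of the subset entropies: sum of |S_v|/|S| * Entropy(S_v).
        double remainder = 0.0;
        for (int[] c : counts) {
            remainder += (double) (c[0] + c[1]) / total * entropy(c[0], c[1]);
        }
        return entropy(totalYes, totalNo) - remainder;
    }

    public static void main(String[] args) {
        // Outlook: Sunny {2, 3}, Overcast {4, 0}, Rain {3, 2}.
        int[][] outlook = {{2, 3}, {4, 0}, {3, 2}};
        System.out.println("Gain(S, Outlook) = " + gain(outlook));
    }
}
```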

Gain
• Now repeat for temperature, humidity, and wind.
• Beginning with temperature, calculate entropy, and
then gain.

Temperature   Positive (Yes)   Negative (No)   Total
cold          3                1               4
mild          4                2               6
hot           2                2               4

Temperature - Entropy (Cold)
– $\mathrm{Entropy}(cold) = -p_{yes} \log_2 p_{yes} - p_{no} \log_2 p_{no}$
– $\mathrm{Entropy}([3+, 1-]) = -\frac{3}{4} \log_2 \frac{3}{4} - \frac{1}{4} \log_2 \frac{1}{4}$
– $\mathrm{Entropy}(cold) = -(0.75)(-0.415) - (0.25)(-2.0)$
– $\mathrm{Entropy}(cold) = 0.311 + 0.5 = 0.811$

Temperature - Entropy (Mild)
– $\mathrm{Entropy}(mild) = -p_{yes} \log_2 p_{yes} - p_{no} \log_2 p_{no}$
– $\mathrm{Entropy}([4+, 2-]) = -\frac{4}{6} \log_2 \frac{4}{6} - \frac{2}{6} \log_2 \frac{2}{6}$
– $\mathrm{Entropy}(mild) = -(0.667)(-0.584) - (0.333)(-1.586)$
– $\mathrm{Entropy}(mild) = 0.390 + 0.528 = 0.918$

Temperature - Entropy (Hot)
– $\mathrm{Entropy}(hot) = -p_{yes} \log_2 p_{yes} - p_{no} \log_2 p_{no}$
– $\mathrm{Entropy}([2+, 2-]) = -\frac{2}{4} \log_2 \frac{2}{4} - \frac{2}{4} \log_2 \frac{2}{4}$
– $\mathrm{Entropy}(hot) = -(0.5)(-1.0) - (0.5)(-1.0)$
– $\mathrm{Entropy}(hot) = 0.5 + 0.5 = 1.0$

Gain (S, Temperature)
$\mathrm{Gain}(S, temperature) = 0.94 - \frac{|S_{cold}|}{14} \mathrm{Entropy}(S_{cold}) - \frac{|S_{mild}|}{14} \mathrm{Entropy}(S_{mild}) - \frac{|S_{hot}|}{14} \mathrm{Entropy}(S_{hot})$

$\mathrm{Gain}(S, temperature) = 0.94 - \frac{4}{14} \mathrm{Entropy}(S_{cold}) - \frac{6}{14} \mathrm{Entropy}(S_{mild}) - \frac{4}{14} \mathrm{Entropy}(S_{hot})$

$\mathrm{Gain}(S, temperature) = 0.94 - 0.286 \cdot \mathrm{Entropy}(S_{cold}) - 0.429 \cdot \mathrm{Entropy}(S_{mild}) - 0.286 \cdot \mathrm{Entropy}(S_{hot})$

$\mathrm{Gain}(S, temperature) = 0.94 - 0.286 \cdot 0.811 - 0.429 \cdot 0.918 - 0.286 \cdot 1.0$

$\mathrm{Gain}(S, temperature) = 0.94 - 0.232 - 0.394 - 0.286$

$\mathrm{Gain}(S, temperature) = 0.028$

Humidity - Entropy (High)
– $\mathrm{Entropy}(high) = -p_{yes} \log_2 p_{yes} - p_{no} \log_2 p_{no}$
– $\mathrm{Entropy}([3+, 4-]) = -\frac{3}{7} \log_2 \frac{3}{7} - \frac{4}{7} \log_2 \frac{4}{7}$
– $\mathrm{Entropy}(high) = -(0.429)(-1.221) - (0.571)(-0.808)$
– $\mathrm{Entropy}(high) = 0.524 + 0.461 = 0.985$

Humidity   Positive (Yes)   Negative (No)   Total
high       3                4               7
normal     6                1               7

Humidity - Entropy (Normal)
– $\mathrm{Entropy}(normal) = -p_{yes} \log_2 p_{yes} - p_{no} \log_2 p_{no}$
– $\mathrm{Entropy}([6+, 1-]) = -\frac{6}{7} \log_2 \frac{6}{7} - \frac{1}{7} \log_2 \frac{1}{7}$
– $\mathrm{Entropy}(normal) = -(0.857)(-0.223) - (0.143)(-2.806)$
– $\mathrm{Entropy}(normal) = 0.191 + 0.401 = 0.592$

Gain (S, Humidity)
$\mathrm{Gain}(S, humidity) = 0.94 - \frac{|S_{high}|}{14} \mathrm{Entropy}(S_{high}) - \frac{|S_{normal}|}{14} \mathrm{Entropy}(S_{normal})$

$\mathrm{Gain}(S, humidity) = 0.94 - \frac{7}{14} \mathrm{Entropy}(S_{high}) - \frac{7}{14} \mathrm{Entropy}(S_{normal})$

$\mathrm{Gain}(S, humidity) = 0.94 - 0.5 \cdot 0.985 - 0.5 \cdot 0.592$

$\mathrm{Gain}(S, humidity) = 0.94 - 0.493 - 0.296$

$\mathrm{Gain}(S, humidity) = 0.151$

Wind - Entropy (Weak)
– $\mathrm{Entropy}(weak) = -p_{yes} \log_2 p_{yes} - p_{no} \log_2 p_{no}$
– $\mathrm{Entropy}([6+, 2-]) = -\frac{6}{8} \log_2 \frac{6}{8} - \frac{2}{8} \log_2 \frac{2}{8}$
– $\mathrm{Entropy}(weak) = -(0.75)(-0.415) - (0.25)(-2.0)$
– $\mathrm{Entropy}(weak) = 0.311 + 0.5 = 0.811$

Wind     Positive (Yes)   Negative (No)   Total
weak     6                2               8
strong   3                3               6

Wind - Entropy (Strong)
– $\mathrm{Entropy}(strong) = -p_{yes} \log_2 p_{yes} - p_{no} \log_2 p_{no}$
– $\mathrm{Entropy}([3+, 3-]) = -\frac{3}{6} \log_2 \frac{3}{6} - \frac{3}{6} \log_2 \frac{3}{6}$
– $\mathrm{Entropy}(strong) = -(0.5)(-1.0) - (0.5)(-1.0)$
– $\mathrm{Entropy}(strong) = 0.5 + 0.5 = 1.0$

Gain (S, Wind)
$\mathrm{Gain}(S, wind) = 0.94 - \frac{|S_{weak}|}{14} \mathrm{Entropy}(S_{weak}) - \frac{|S_{strong}|}{14} \mathrm{Entropy}(S_{strong})$

$\mathrm{Gain}(S, wind) = 0.94 - \frac{8}{14} \mathrm{Entropy}(S_{weak}) - \frac{6}{14} \mathrm{Entropy}(S_{strong})$

$\mathrm{Gain}(S, wind) = 0.94 - 0.571 \cdot 0.811 - 0.429 \cdot 1.0$

$\mathrm{Gain}(S, wind) = 0.94 - 0.463 - 0.429$

$\mathrm{Gain}(S, wind) = 0.048$

Gain
• Calculations for gain:
– Gain (S, Outlook)=0.246
– Gain (S, Temperature)=0.028
– Gain (S, Humidity)=0.151
– Gain (S, Wind)=0.048
• Outlook has the highest gain, and will be the best
choice for the root of the tree.
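The comparison of the four gains can be sketched in code as follows. This is an illustration, not the course's implementation: the class `BestAttributeSketch` and its members are hypothetical names, and the data array is the 14-row table with Play? as the last column. Because the code does not round intermediate values, its gains can differ from the hand-calculated ones in the third decimal place, but the ranking is the same.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Minimal sketch: compute Gain(S, A) for every attribute and pick the largest.
// All names are illustrative assumptions, not course code.
class BestAttributeSketch {
    static final String[] NAMES = {"Outlook", "Temperature", "Humidity", "Wind"};
    static final String[][] DATA = {
        {"Sunny", "Hot", "High", "Weak", "No"}, {"Sunny", "Hot", "High", "Strong", "No"},
        {"Overcast", "Hot", "High", "Weak", "Yes"}, {"Rain", "Mild", "High", "Weak", "Yes"},
        {"Rain", "Cold", "Normal", "Weak", "Yes"}, {"Rain", "Cold", "Normal", "Strong", "No"},
        {"Overcast", "Cold", "Normal", "Strong", "Yes"}, {"Sunny", "Mild", "High", "Weak", "No"},
        {"Sunny", "Cold", "Normal", "Weak", "Yes"}, {"Rain", "Mild", "Normal", "Weak", "Yes"},
        {"Sunny", "Mild", "Normal", "Strong", "Yes"}, {"Overcast", "Mild", "High", "Strong", "Yes"},
        {"Overcast", "Hot", "Normal", "Weak", "Yes"}, {"Rain", "Mild", "High", "Strong", "No"}
    };

    static double entropy(int yes, int no) {
        double e = 0.0;
        for (int count : new int[] {yes, no}) {
            if (count > 0) { // a zero count contributes nothing
                double p = (double) count / (yes + no);
                e -= p * Math.log(p) / Math.log(2);
            }
        }
        return e;
    }

    // Gain(S, A) for the attribute stored in column col.
    static double gain(String[][] rows, int col) {
        Map<String, int[]> counts = new LinkedHashMap<>(); // value -> {yes, no}
        int yes = 0;
        for (String[] row : rows) {
            int[] c = counts.computeIfAbsent(row[col], k -> new int[2]);
            if (row[row.length - 1].equals("Yes")) {
                c[0]++;
                yes++;
            } else {
                c[1]++;
            }
        }
        double remainder = 0.0;
        for (int[] c : counts.values()) {
            remainder += (double) (c[0] + c[1]) / rows.length * entropy(c[0], c[1]);
        }
        return entropy(yes, rows.length - yes) - remainder;
    }

    // Index of the attribute with the highest gain (first one wins a tie).
    static int bestAttribute(String[][] rows) {
        int best = 0;
        for (int col = 1; col < NAMES.length; col++) {
            if (gain(rows, col) > gain(rows, best)) {
                best = col;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        for (int col = 0; col < NAMES.length; col++) {
            System.out.println("Gain(S, " + NAMES[col] + ") = " + gain(DATA, col));
        }
        System.out.println("Root: " + NAMES[bestAttribute(DATA)]);
    }
}
```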

Splitting Data
Notice which rows will be passed down:

Outlook (rows 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14)
• Sunny: rows 1, 2, 8, 9, 11
• Overcast: rows 3, 7, 12, 13
• Rain: rows 4, 5, 6, 10, 14
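The split on Outlook can be sketched as grouping row numbers by attribute value. The class `SplitSketch` and method `split` are illustrative names; the array lists the Outlook value of rows 1 to 14 in table order.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Minimal sketch: group row numbers by the value of one attribute.
// SplitSketch and split(...) are illustrative names, not course code.
class SplitSketch {
    // Outlook value of rows 1..14, in table order.
    static final String[] OUTLOOK = {
        "Sunny", "Sunny", "Overcast", "Rain", "Rain", "Rain", "Overcast",
        "Sunny", "Sunny", "Rain", "Sunny", "Overcast", "Overcast", "Rain"
    };

    static Map<String, List<Integer>> split(String[] values) {
        Map<String, List<Integer>> groups = new LinkedHashMap<>();
        for (int i = 0; i < values.length; i++) {
            // Row numbers in the table start at 1, array indexes at 0.
            groups.computeIfAbsent(values[i], k -> new ArrayList<>()).add(i + 1);
        }
        return groups;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, List<Integer>> e : split(OUTLOOK).entrySet()) {
            System.out.println(e.getKey() + ": " + e.getValue());
        }
    }
}
```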

Round 2
• The rows are passed to the next level based on the
three choices of Sunny, Overcast or Rain.
• Now begin the process again for each of the three data sets
(Sunny, Overcast, Rain).
• This is what the ID3 algorithm does!
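The repeat-on-each-subset idea can be sketched as a recursive method. This is a simplified illustration under stated assumptions, not the course's implementation: all names (`Id3Sketch`, `Node`, `build`, and so on) are hypothetical, rows are string arrays with Play? as the last column, and ties between equal gains go to the first attribute.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Minimal recursive ID3 sketch; all names here are illustrative assumptions.
class Id3Sketch {
    static final String[] NAMES = {"Outlook", "Temperature", "Humidity", "Wind"};
    static final String[][] DATA = {
        {"Sunny", "Hot", "High", "Weak", "No"}, {"Sunny", "Hot", "High", "Strong", "No"},
        {"Overcast", "Hot", "High", "Weak", "Yes"}, {"Rain", "Mild", "High", "Weak", "Yes"},
        {"Rain", "Cold", "Normal", "Weak", "Yes"}, {"Rain", "Cold", "Normal", "Strong", "No"},
        {"Overcast", "Cold", "Normal", "Strong", "Yes"}, {"Sunny", "Mild", "High", "Weak", "No"},
        {"Sunny", "Cold", "Normal", "Weak", "Yes"}, {"Rain", "Mild", "Normal", "Weak", "Yes"},
        {"Sunny", "Mild", "Normal", "Strong", "Yes"}, {"Overcast", "Mild", "High", "Strong", "Yes"},
        {"Overcast", "Hot", "Normal", "Weak", "Yes"}, {"Rain", "Mild", "High", "Strong", "No"}
    };

    // A node is a leaf when label is non-null; otherwise it tests one attribute column.
    static class Node {
        String label;
        int attribute = -1;
        Map<String, Node> children = new LinkedHashMap<>();
    }

    // Count how often each outcome (last column) occurs.
    static Map<String, Integer> outcomeCounts(List<String[]> rows) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String[] r : rows) {
            counts.merge(r[r.length - 1], 1, Integer::sum);
        }
        return counts;
    }

    static double entropy(List<String[]> rows) {
        double e = 0.0;
        for (int c : outcomeCounts(rows).values()) {
            double p = (double) c / rows.size();
            e -= p * Math.log(p) / Math.log(2);
        }
        return e;
    }

    // Group rows by their value in column col.
    static Map<String, List<String[]>> split(List<String[]> rows, int col) {
        Map<String, List<String[]>> parts = new LinkedHashMap<>();
        for (String[] r : rows) {
            parts.computeIfAbsent(r[col], k -> new ArrayList<>()).add(r);
        }
        return parts;
    }

    static double gain(List<String[]> rows, int col) {
        double remainder = 0.0;
        for (List<String[]> part : split(rows, col).values()) {
            remainder += (double) part.size() / rows.size() * entropy(part);
        }
        return entropy(rows) - remainder;
    }

    static String mostCommon(List<String[]> rows) {
        String best = null;
        int bestCount = 0;
        for (Map.Entry<String, Integer> e : outcomeCounts(rows).entrySet()) {
            if (e.getValue() > bestCount) {
                best = e.getKey();
                bestCount = e.getValue();
            }
        }
        return best;
    }

    static Node build(List<String[]> rows, List<Integer> attributes) {
        Node node = new Node();
        // Leaf: all outcomes agree (entropy 0) or no attributes are left to test.
        if (entropy(rows) == 0.0 || attributes.isEmpty()) {
            node.label = mostCommon(rows);
            return node;
        }
        int best = attributes.get(0);
        for (int a : attributes) {
            if (gain(rows, a) > gain(rows, best)) best = a;
        }
        node.attribute = best;
        List<Integer> remaining = new ArrayList<>(attributes);
        remaining.remove(Integer.valueOf(best)); // an attribute is used once per branch
        for (Map.Entry<String, List<String[]>> e : split(rows, best).entrySet()) {
            node.children.put(e.getKey(), build(e.getValue(), remaining));
        }
        return node;
    }

    public static void main(String[] args) {
        Node root = build(Arrays.asList(DATA), Arrays.asList(0, 1, 2, 3));
        System.out.println("Root attribute: " + NAMES[root.attribute]);
    }
}
```

The sketch only creates branches for attribute values that actually occur in the rows it is given, so the empty-table check described in the lesson never triggers here; a fuller implementation would handle that case with a default classification.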

Sunny Branch
• First, check the branch Sunny.
• Remember that ID3 tests to see if a table is empty.
• It is not empty; therefore, do not add a default
categorization.
• ID3 also tests if the outcomes are all the same.
• The outcomes are mixed, so do not add a
categorization leaf.

Sunny Data
• Rows 1, 2, 8, 9, and 11 produce the following table:
row Outlook Temperature Humidity Wind Play?
1 Sunny Hot High Weak No
2 Sunny Hot High Strong No
8 Sunny Mild High Weak No
9 Sunny Cold Normal Weak Yes
11 Sunny Mild Normal Strong Yes

ID3
• There are 2 Yes and 3 No outcomes for Play.
• The ID3 algorithm says:
If all examples are positive, Return the single-node tree Root, with
label = +.
If all examples are negative, Return the single-node tree Root,
with label = -.
• Since they are not all the same, do not add a
categorization leaf node.
• Instead, add an attribute node.
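The all-positive or all-negative test can be sketched as a small helper. `LeafCheckSketch` and `pureLabel` are hypothetical names; the helper returns the shared outcome when every row agrees and `null` when the outcomes are mixed or the list is empty.

```java
import java.util.Arrays;
import java.util.List;

// Minimal sketch of the "all examples have the same outcome" check.
// LeafCheckSketch and pureLabel(...) are illustrative names, not course code.
class LeafCheckSketch {

    // Returns the shared outcome if all outcomes are identical, otherwise null.
    static String pureLabel(List<String> outcomes) {
        if (outcomes.isEmpty()) {
            return null;
        }
        String first = outcomes.get(0);
        for (String outcome : outcomes) {
            if (!outcome.equals(first)) {
                return null; // mixed outcomes: an attribute node is needed
            }
        }
        return first;
    }

    public static void main(String[] args) {
        // Sunny rows 1, 2, 8, 9, 11: mixed, so no leaf is added.
        System.out.println(pureLabel(Arrays.asList("No", "No", "No", "Yes", "Yes")));
        // Overcast rows 3, 7, 12, 13: all Yes, so this becomes a leaf.
        System.out.println(pureLabel(Arrays.asList("Yes", "Yes", "Yes", "Yes")));
    }
}
```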

Outlook Sunny
• Begin the process again, using the “Sunny” Outlook.
(Rain will be next. Overcast needs no further splitting
because all of its outcomes are “Yes”.)

Outlook Sunny
• This process is like the one used to calculate
gains in the previous exercise (when Outlook was
chosen as the root).
• The Outlook column is eliminated since it has already been
used (at the root) on this branch.
• We are now left with the Temperature, Humidity, and Wind
columns for rows 1, 2, 8, 9, and 11.

Attribute Selection
• Which attribute should be chosen under the Sunny
branch node?
• Play has 2 Yes and 3 No outcomes:
– $\mathrm{Entropy}(S) = -p_{yes} \log_2 p_{yes} - p_{no} \log_2 p_{no}$
– $\mathrm{Entropy}(S) = -\frac{2}{5} \log_2 \frac{2}{5} - \frac{3}{5} \log_2 \frac{3}{5}$
– $\mathrm{Entropy}(S) = -(0.4)(-1.322) - (0.6)(-0.737)$
– $\mathrm{Entropy}(S) = 0.529 + 0.442 = 0.971$

Calculate Gain
• First calculate entropy for Temperature. Can you guess what the
entropy values will be?
Temperature   Positive (Yes)   Negative (No)   Total
hot           0                2               2
mild          1                1               2
cold          1                0               1
• It is always negative when it is hot, so the entropy will
be 0.
• It is always positive when it is cold, so the entropy for
cold will also be 0.
• For mild there is 1 positive and 1 negative outcome out of a total of
2, so the entropy will be 1.
Gain (S, Temperature)
$\mathrm{Gain}(S, temperature) = 0.971 - \frac{|S_{hot}|}{5} \mathrm{Entropy}(S_{hot}) - \frac{|S_{mild}|}{5} \mathrm{Entropy}(S_{mild}) - \frac{|S_{cold}|}{5} \mathrm{Entropy}(S_{cold})$

$\mathrm{Gain}(S, temperature) = 0.971 - \frac{2}{5} \cdot 0 - \frac{2}{5} \cdot 1 - \frac{1}{5} \cdot 0$

$\mathrm{Gain}(S, temperature) = 0.971 - 0 - 0.4 - 0$

$\mathrm{Gain}(S, temperature) = 0.571$

Gain (S, Humidity)
Humidity   Positive (Yes)   Negative (No)   Total
high       0                3               3
normal     2                0               2

Again, both of these have a definite outcome: high is
always negative and normal is always positive, so they
both have an entropy of 0.

Gain (S, Humidity)
$\mathrm{Gain}(S, humidity) = 0.971 - \frac{|S_{high}|}{5} \mathrm{Entropy}(S_{high}) - \frac{|S_{normal}|}{5} \mathrm{Entropy}(S_{normal})$

$\mathrm{Gain}(S, humidity) = 0.971 - \frac{3}{5} \cdot 0 - \frac{2}{5} \cdot 0$

$\mathrm{Gain}(S, humidity) = 0.971 - 0 - 0$

$\mathrm{Gain}(S, humidity) = 0.971$

Finally for Wind
• Strong has 1 positive and 1 negative outcome, so its entropy is 1.

Wind     Positive (Yes)   Negative (No)   Total
weak     1                2               3
strong   1                1               2

Entropy (Weak)
$\mathrm{Entropy}(weak) = -p_{yes} \log_2 p_{yes} - p_{no} \log_2 p_{no}$

$\mathrm{Entropy}(weak) = -\frac{1}{3} \log_2 \frac{1}{3} - \frac{2}{3} \log_2 \frac{2}{3}$

$\mathrm{Entropy}(weak) = -(0.333)(-1.586) - (0.667)(-0.584)$

$\mathrm{Entropy}(weak) = 0.528 + 0.390 = 0.918$

Gain (S, Wind)
$\mathrm{Gain}(S, wind) = 0.971 - \frac{3}{5} \mathrm{Entropy}(S_{weak}) - \frac{2}{5} \mathrm{Entropy}(S_{strong})$

$\mathrm{Gain}(S, wind) = 0.971 - 0.6 \cdot 0.918 - 0.4 \cdot 1$

$\mathrm{Gain}(S, wind) = 0.971 - 0.551 - 0.4$

$\mathrm{Gain}(S, wind) = 0.020$

Attribute Selection
• Starting from the Sunny entropy of 0.971, that gives us
– $\mathrm{Gain}(S, wind) = 0.020$
– $\mathrm{Gain}(S, humidity) = 0.971$
– $\mathrm{Gain}(S, temperature) = 0.571$
• Humidity has the highest gain, so it will be chosen at this
branch.

Data Flow
Outlook (rows 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14)
• Sunny -> Humidity (rows 1, 2, 8, 9, 11)
– high: rows 1, 2, 8
– normal: rows 9, 11
• Overcast: rows 3, 7, 12, 13
• Rain: rows 4, 5, 6, 10, 14

Round 3
• Again, start the process with the new tables as done
previously.
• First look at the data that has reached humidity high.
• Notice that the result is always No.
• The algorithm says if all the results (negative in this
example) are equal, then this path is complete.
Row Temperature Humidity Wind Play?
1 Hot High Weak No
2 Hot High Strong No
8 Mild High Weak No

Humidity Normal
• Notice that Humidity Normal behaves the same way: every result is Yes.
• Both results are equal, so this path is complete.
Row Temperature Humidity Wind Play?
9 Cold Normal Weak Yes
11 Mild Normal Strong Yes

• Now add these classifications to the diagram.

Current Diagram
Outlook (rows 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14)
• Sunny -> Humidity (rows 1, 2, 8, 9, 11)
– high -> No (rows 1, 2, 8)
– normal -> Yes (rows 9, 11)
• Overcast: rows 3, 7, 12, 13
• Rain: rows 4, 5, 6, 10, 14

Overcast
• As all rows under Outlook -> Sunny are now complete,
move on to Outlook -> Overcast.
Num Outlook Temperature Humidity Wind Play?
3 Overcast Hot High Weak Yes
7 Overcast Cold Normal Strong Yes
12 Overcast Mild High Strong Yes
13 Overcast Hot Normal Weak Yes

• Notice that the result is always Yes.
• So according to ID3 this branch is also finished.

Diagram with Overcast Complete
Outlook (rows 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14)
• Sunny -> Humidity (rows 1, 2, 8, 9, 11)
– high -> No (rows 1, 2, 8)
– normal -> Yes (rows 9, 11)
• Overcast -> Yes (rows 3, 7, 12, 13)
• Rain: rows 4, 5, 6, 10, 14

Complete for Outlook - Rain
• The last steps are to complete the process for rain.
• Find the best gain from the remaining data.
• You should find that Wind produces the highest gain.

Wind
Split the Rain rows into weak and strong. Both groups produce
a definite result, so the branch is finished.

Num Outlook Temperature Humidity Wind Play?
4 Rain Mild High Weak Yes
5 Rain Cold Normal Weak Yes
10 Rain Mild Normal Weak Yes
6 Rain Cold Normal Strong No
14 Rain Mild High Strong No

Final Tree
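Outlook (rows 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14)
• Sunny -> Humidity (rows 1, 2, 8, 9, 11)
– high -> No (rows 1, 2, 8)
– normal -> Yes (rows 9, 11)
• Overcast -> Yes (rows 3, 7, 12, 13)
• Rain -> Wind (rows 4, 5, 6, 10, 14)
– strong -> No (rows 6, 14)
– weak -> Yes (rows 4, 5, 10)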
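The final tree can be read as a set of nested conditions. The sketch below hard-codes that tree for illustration; `FinalTreeSketch` and `classify` are hypothetical names. Temperature is not a parameter because the tree never tests it.

```java
// Minimal sketch: the final decision tree written as nested conditions.
// FinalTreeSketch and classify(...) are illustrative names, not course code.
class FinalTreeSketch {

    static String classify(String outlook, String humidity, String wind) {
        switch (outlook) {
            case "Sunny":
                // Sunny rows are decided by Humidity.
                return humidity.equals("High") ? "No" : "Yes";
            case "Overcast":
                // Every Overcast row has the outcome Yes.
                return "Yes";
            case "Rain":
                // Rain rows are decided by Wind.
                return wind.equals("Strong") ? "No" : "Yes";
            default:
                throw new IllegalArgumentException("Unknown outlook: " + outlook);
        }
    }

    public static void main(String[] args) {
        // Row 6 of the table: Rain, Normal humidity, Strong wind.
        System.out.println(classify("Rain", "Normal", "Strong"));
        // Row 9 of the table: Sunny, Normal humidity, Weak wind.
        System.out.println(classify("Sunny", "Normal", "Weak"));
    }
}
```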

Scenario Missed
• Every row from the original data is now categorized.
• The only ID3 scenario not covered above is when the
attributes are exhausted before the data is fully
classified.
• For example, use the following information:
Outlook Play?
Overcast Yes
Overcast Yes
Overcast Yes
Overcast No

Choose Classification
• These rows cannot be classified with 100% certainty, and no
attributes remain to split on.
• In this scenario, ID3 chooses the outcome that is most
common.
• Overcast is classified as Yes.
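Choosing the most common outcome can be sketched as a simple count. `MajoritySketch` and `mostCommon` are hypothetical names, and the tie-breaking rule (the first outcome to reach the highest count wins) is an arbitrary choice made for this illustration.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Minimal sketch: pick the most common outcome when no attributes remain.
// MajoritySketch and mostCommon(...) are illustrative names, not course code.
class MajoritySketch {

    static String mostCommon(List<String> outcomes) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        String best = null;
        int bestCount = 0;
        for (String outcome : outcomes) {
            int n = counts.merge(outcome, 1, Integer::sum);
            if (n > bestCount) { // on a tie, the earlier leader is kept
                best = outcome;
                bestCount = n;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // The four Overcast rows from the example: three Yes and one No.
        System.out.println(mostCommon(Arrays.asList("Yes", "Yes", "Yes", "No")));
    }
}
```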

Overview
• On first reading, the ID3 algorithm may seem
complicated, but there are few decisions to make.
• Work through some of the steps manually to get a
better understanding of the process.
• There are also other walk-throughs of the play sport
example online that explain it in different ways.
• Search for these to help your understanding.

Summary
In this lesson, you should have learned how to:
• Calculate entropy
• Calculate gain
• Manually work through the ID3 algorithm
