Solution 10 Decision Trees


Data Science & Advanced Topics Chair of Data Science

WS 2022/23 Prof. Michael Granitzer


Julian Stier & Saber Zerhoudi ◦ julian.stier@uni-passau.de

Sheet 10
Decision Trees
Learning Goals
• Decision Trees
• Impurity functions
• The ID3 algorithm

1. Draw a decision tree for each of the following three boolean functions. (All variables are
drawn from the set {FALSE, TRUE}; you may use {0, 1} instead.)
A. x1 ∧ ¬x2    B. x1 XOR x2    C. x1 ∨ (x2 ∧ x3)

Solution:

• A (x1 ∧ ¬x2)

        x1
      1/  \0
      x2    f
    1/  \0
    f     t

• B (x1 XOR x2)

          x1
       0/    \1
       x2     x2
     1/  \0  1/  \0
     t    f  f    t

• C (x1 ∨ (x2 ∧ x3))

        x1
      1/  \0
      t    x2
         0/  \1
         f    x3
            0/  \1
            f     t
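These trees can be checked mechanically against the truth tables of the three functions. A minimal Python sketch (the tree_* functions below are simply the drawings above transcribed as nested conditionals):

from itertools import product

# Target boolean functions from the exercise
f_a = lambda x1, x2, x3: x1 and not x2        # A: x1 AND NOT x2
f_b = lambda x1, x2, x3: x1 != x2             # B: x1 XOR x2
f_c = lambda x1, x2, x3: x1 or (x2 and x3)    # C: x1 OR (x2 AND x3)

# The decision trees drawn above, written as nested conditionals
def tree_a(x1, x2, x3):
    return (not x2) if x1 else False

def tree_b(x1, x2, x3):
    return (not x2) if x1 else x2

def tree_c(x1, x2, x3):
    return True if x1 else (x3 if x2 else False)

# Compare each tree against its target function on all 2^3 assignments
for f, tree in [(f_a, tree_a), (f_b, tree_b), (f_c, tree_c)]:
    assert all(bool(f(*x)) == bool(tree(*x)) for x in product([False, True], repeat=3))
print("all three trees agree with their boolean functions")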

2. Consider the following illustration of two possible splittings of an example set D with
four classes C = {c1, c2, c3, c4}.

(a) Compute the drop in impurity ∆ι for the left and the right split, respectively, using the
misclassification rate ι_misclass as well as the Gini impurity ι_Gini. (Note: This will yield 4
values: one for every combination of “split” and “impurity measure”.)

Solution: We need to determine for four classes c ∈ {1, 2, 3, 4} and two possible
attribute values a ∈ {1, 2} which of the given splits we should choose.

We estimate p(a = 1) as |D1|/|D| = 0.5 and p(a = 2) as |D2|/|D| = 0.5. You could e.g.
imagine the dataset to contain 20,000 examples, with both D1 and D2 containing 10,000
examples each.

Left split, class distributions p(c|a):

 a \ c    1     2     3     4
   1     1/2   1/2    0     0
   2      0     0    1/2   1/2

Marginal class distribution p(c) = Σ_a p(c|a) · p(a):

   c      1     2     3     4
  p(c)   1/4   1/4   1/4   1/4

∆ι_mis(D, {D1, D2}) = ι_mis(D) − Σ_a p(a) · ι_mis(Da)
                    = (1 − max_c p(c)) − Σ_a p(a) · (1 − max_c p(c|a))
                    = 3/4 − (1/4 + 1/4) = 1/4

∆ι_Gini(D, {D1, D2}) = ι_Gini(D) − Σ_a p(a) · ι_Gini(Da)
                     = (1 − Σ_c p(c)²) − Σ_a p(a) · (1 − Σ_c p(c|a)²)
                     = 3/4 − (1/4 + 1/4) = 1/4

Right split, class distributions p(c|a):

 a \ c    1     2     3     4
   1     1/4   1/4   1/2    0
   2     1/4   1/4    0    1/2

Marginal class distribution p(c) = Σ_a p(c|a) · p(a):

   c      1     2     3     4
  p(c)   1/4   1/4   1/4   1/4

∆ι_mis(D, {D1, D2}) = ι_mis(D) − Σ_a p(a) · ι_mis(Da)
                    = (1 − max_c p(c)) − Σ_a p(a) · (1 − max_c p(c|a))
                    = 3/4 − (1/4 + 1/4) = 1/4

∆ι_Gini(D, {D1, D2}) = ι_Gini(D) − Σ_a p(a) · ι_Gini(Da)
                     = (1 − Σ_c p(c)²) − Σ_a p(a) · (1 − Σ_c p(c|a)²)
                     = 3/4 − (5/16 + 5/16) = 1/8
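The four values can be reproduced numerically. The sketch below assumes the class distributions p(c|a) as read off from the two tables above and p(a = 1) = p(a = 2) = 0.5:

def impurity_mis(p):
    """Misclassification rate: 1 - max_c p(c)."""
    return 1 - max(p)

def impurity_gini(p):
    """Gini impurity: 1 - sum_c p(c)^2."""
    return 1 - sum(q * q for q in p)

def impurity_drop(impurity, parent, children, weights):
    """Delta iota = iota(D) - sum_a p(a) * iota(D_a)."""
    return impurity(parent) - sum(w * impurity(c) for w, c in zip(weights, children))

parent  = [0.25, 0.25, 0.25, 0.25]              # p(c), identical for both splits
left    = [[0.5, 0.5, 0.0, 0.0], [0.0, 0.0, 0.5, 0.5]]
right   = [[0.25, 0.25, 0.5, 0.0], [0.25, 0.25, 0.0, 0.5]]
weights = [0.5, 0.5]                            # p(a = 1) = p(a = 2) = 0.5

for name, split in [("left", left), ("right", right)]:
    for mname, measure in [("misclass", impurity_mis), ("gini", impurity_gini)]:
        print(name, mname, impurity_drop(measure, parent, split, weights))
# left  misclass 0.25, left  gini 0.25
# right misclass 0.25, right gini 0.125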

(b) Explain the result.

Solution: The left split is preferable because it has a higher chance of leading to pure
nodes in fewer subsequent splits. The misclassification rate only pays attention to the most
frequent class (the max_c term), regardless of whether the remaining probability mass is
concentrated in a single other class or scattered across several. The Gini criterion takes
the probability mass of all classes into account and prefers situations in which the mass
is concentrated on only a few dominating classes. According to the Gini criterion we would
therefore choose the left split over the right split; according to the misclassification rate
neither split is preferable over the other.
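The difference can be seen directly on the impurity of a single child node of each split (the child distributions are taken from the tables in part (a)):

# One child node of the left split vs. one child node of the right split
p_left_child  = [0.5, 0.5, 0.0, 0.0]
p_right_child = [0.25, 0.25, 0.5, 0.0]

misclass = lambda p: 1 - max(p)
gini     = lambda p: 1 - sum(q * q for q in p)

print(misclass(p_left_child), misclass(p_right_child))  # 0.5 0.5   -> no difference
print(gini(p_left_child), gini(p_right_child))          # 0.5 0.625 -> left child is purer under Gini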

3. Given the following training set of dog data:

Color Fur Size Class


brown ragged small well-behaved
black ragged big dangerous
black smooth big dangerous
black curly small well-behaved
white curly small well-behaved
white smooth small dangerous
red ragged big well-behaved

Use the ID3 algorithm to determine a decision tree, where the attributes are to be chosen
with the maximum average information gain iGain:

    iGain(D, A) ≡ H(D) − Σ_{a∈A} (|Da| / |D|) · H(Da)    with    H(D) = −p⊕ log2(p⊕) − p⊖ log2(p⊖)
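A direct translation of this definition into Python might look as follows (the names entropy and info_gain are illustrative choices; examples are assumed to be (attribute-dict, class) pairs, and 0 · log2(0) is treated as 0):

from collections import Counter
from math import log2

def entropy(labels):
    """H(D) = -sum_c p(c) * log2(p(c)); classes with p(c) = 0 simply contribute nothing."""
    n = len(labels)
    return -sum((k / n) * log2(k / n) for k in Counter(labels).values())

def info_gain(examples, attribute):
    """iGain(D, A) = H(D) - sum_{a in A} |D_a|/|D| * H(D_a)."""
    labels = [cls for _, cls in examples]
    n = len(examples)
    partitions = {}
    for features, cls in examples:
        partitions.setdefault(features[attribute], []).append(cls)
    remainder = sum(len(part) / n * entropy(part) for part in partitions.values())
    return entropy(labels) - remainder

Since H(D) is the same constant for every attribute A, maximizing iGain(D, A) is equivalent to minimizing the conditional entropy H(D|A) = Σ_{a∈A} |Da|/|D| · H(Da); the solution below therefore only compares the H(D|A) values.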

Solution: Determine the values of H(D| Attribute ):

• Attribute Color: m = 7 (Number of objects)

Color well-behaved dangerous Probability


brown 1 0 P (brown) = 1/7
black 1 2 P (black) = 3/7
white 1 1 P (white) = 2/7
red 1 0 P (red) = 1/7

H(D|Color) = −[ 1/7 · (1·log2(1) + 0·log2(0)) + 3/7 · (1/3·log2(1/3) + 2/3·log2(2/3))
              + 2/7 · (1/2·log2(1/2) + 1/2·log2(1/2)) + 1/7 · (1·log2(1) + 0·log2(0)) ]
           = −[ 0 + 1/7 · (log2(1/3) + 2·log2(2/3)) + 1/7 · (log2(1/2) + log2(1/2)) + 0 ]
           = −[ 1/7 · (log2(1/3) + 2·log2(2/3)) + 2/7 · log2(1/2) ] ≈ 0.679

• Attribute Fur:

Fur well-behaved dangerous Probability


ragged 2 1 P (ragged) = 3/7
smooth 0 2 P (smooth) = 2/7
curly 2 0 P (curly) = 2/7

H(D|Fur) = −[ 3/7 · (2/3·log2(2/3) + 1/3·log2(1/3)) + 2/7 · (0·log2(0) + 1·log2(1))
            + 2/7 · (1·log2(1) + 0·log2(0)) ]
         = −3/7 · (2/3·log2(2/3) + 1/3·log2(1/3)) = −1/7 · (2·log2(2/3) + log2(1/3))
         ≈ 0.394

• Attribute Size:

Size well-behaved dangerous Probability


small 3 1 P (small) = 4/7
big 1 2 P (big) = 3/7

H(D|Size) = −[ 4/7 · (3/4·log2(3/4) + 1/4·log2(1/4)) + 3/7 · (1/3·log2(1/3) + 2/3·log2(2/3)) ]
          = −[ 1/7 · (3·log2(3/4) + log2(1/4)) + 1/7 · (log2(1/3) + 2·log2(2/3)) ]
          ≈ 0.857

H(D|A) minimal for A = Fur ⇒ Choose Attribute Fur.
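The three conditional entropies can be double-checked with a short, self-contained script over the dog table above (indices 0, 1, 2 correspond to Color, Fur, Size):

from collections import Counter
from math import log2

dogs = [  # (Color, Fur, Size, Class), exactly as listed in the table
    ("brown", "ragged", "small", "well-behaved"),
    ("black", "ragged", "big",   "dangerous"),
    ("black", "smooth", "big",   "dangerous"),
    ("black", "curly",  "small", "well-behaved"),
    ("white", "curly",  "small", "well-behaved"),
    ("white", "smooth", "small", "dangerous"),
    ("red",   "ragged", "big",   "well-behaved"),
]

def entropy(labels):
    n = len(labels)
    return -sum((k / n) * log2(k / n) for k in Counter(labels).values())

def conditional_entropy(rows, attr_index):
    """H(D|A) = sum_a |D_a|/|D| * H(D_a)."""
    groups = {}
    for row in rows:
        groups.setdefault(row[attr_index], []).append(row[-1])
    return sum(len(g) / len(rows) * entropy(g) for g in groups.values())

for name, idx in [("Color", 0), ("Fur", 1), ("Size", 2)]:
    print(name, round(conditional_entropy(dogs, idx), 3))
# Color 0.679, Fur 0.394, Size 0.857 -> Fur minimizes H(D|A)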

Nodes fur=smooth and fur=curly are already pure. Node fur=ragged is not pure yet, so we
have to evaluate the impurity reduction of all remaining attributes (all except Fur) on the
examples in node fur=ragged. Since both classes still occur there, the corresponding subtree
is computed recursively.

• Attribute Color: m = 3 (Number of objects)

Color well-behaved dangerous Probability


brown 1 0 P (brown) = 1/3
black 0 1 P (black) = 1/3
red 1 0 P (red) = 1/3

H(D|Color) = −[ 1/3 · (1·log2(1) + 0·log2(0)) + 1/3 · (0·log2(0) + 1·log2(1))
              + 1/3 · (1·log2(1) + 0·log2(0)) ]
           = 0

• Attribute Size:

Size well-behaved dangerous Probability


small 1 0 P (small) = 1/3
big 1 1 P (big) = 2/3

H(D|Size) = −[ 1/3 · (1·log2(1) + 0·log2(0)) + 2/3 · (1/2·log2(1/2) + 1/2·log2(1/2)) ]
          = −2/3 · log2(1/2) = 2/3
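Both values can be verified in a couple of lines (the distributions are read off the two tables for the fur=ragged node):

from math import log2

# fur=ragged node: 3 examples (brown/small/well-behaved, black/big/dangerous, red/big/well-behaved)
# Color splits them into three pure single-example nodes:
h_color = 3 * (1/3) * 0.0
# Size: {small} is pure, {big} contains one example of each class (entropy 1 bit):
h_size = (1/3) * 0.0 + (2/3) * -(0.5 * log2(0.5) + 0.5 * log2(0.5))
print(h_color, h_size)   # 0.0 and 0.666... = 2/3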

H(D|A) minimal for A = Color ⇒ choose attribute Color. Resulting tree:

    Fur
    ├─ ragged → Color
    │            ├─ brown → well-behaved
    │            ├─ black → dangerous
    │            └─ red   → well-behaved
    ├─ smooth → dangerous
    └─ curly  → well-behaved
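For completeness, a compact recursive ID3 sketch that reproduces this tree from the dog data (the function name id3 and the nested-dict tree representation are illustrative choices, not prescribed by the sheet):

from collections import Counter
from math import log2

def entropy(labels):
    n = len(labels)
    return -sum((k / n) * log2(k / n) for k in Counter(labels).values())

def id3(rows, attributes):
    """rows: list of (feature_dict, label); returns a class label or a nested dict."""
    labels = [label for _, label in rows]
    if len(set(labels)) == 1:                       # node is pure
        return labels[0]
    if not attributes:                              # no attribute left: majority vote
        return Counter(labels).most_common(1)[0][0]
    def h_cond(attr):                               # H(D|A); minimizing it maximizes iGain
        groups = {}
        for features, label in rows:
            groups.setdefault(features[attr], []).append(label)
        return sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    best = min(attributes, key=h_cond)
    branches = {}
    for value in {features[best] for features, _ in rows}:
        subset = [(f, l) for f, l in rows if f[best] == value]
        branches[value] = id3(subset, [a for a in attributes if a != best])
    return {best: branches}

dogs = [({"Color": c, "Fur": f, "Size": s}, y) for c, f, s, y in [
    ("brown", "ragged", "small", "well-behaved"), ("black", "ragged", "big", "dangerous"),
    ("black", "smooth", "big", "dangerous"),      ("black", "curly", "small", "well-behaved"),
    ("white", "curly", "small", "well-behaved"),  ("white", "smooth", "small", "dangerous"),
    ("red", "ragged", "big", "well-behaved"),
]]
print(id3(dogs, ["Color", "Fur", "Size"]))
# {'Fur': {'ragged': {'Color': {'brown': 'well-behaved', 'black': 'dangerous', 'red': 'well-behaved'}},
#          'smooth': 'dangerous', 'curly': 'well-behaved'}}   (dict key order may vary)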
