Lecture 2: Decision Trees
Unpredictability
Overfitting

Atsuto Maki
Autumn 2020
1 Decision Trees
The representation
Training
2 Unpredictability
Entropy
Information gain
Gini impurity
3 Overfitting
Overfitting
Occam’s principle
Training and validation set approach
Extensions
1 Decision Trees
Basic Idea: Test the attributes (features) sequentially
= Ask questions about the target/status sequentially

Useful also (but not limited to) when nominal data are involved, e.g. in medical diagnosis, credit risk analysis etc.

Each leaf node bears a category label, and the test pattern is assigned the category of the leaf node it reaches.
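As a toy sketch (not from the slides; the attribute names and risk categories are invented for illustration), such a tree is just a sequence of attribute tests ending in a category label at a leaf:

```python
# Hypothetical credit-risk tree: each "if" tests one attribute,
# and each return is the category label of a leaf node.
def classify(applicant):
    if applicant["income"] == "low":       # test attribute 1
        if applicant["has_debt"]:          # test attribute 2
            return "high risk"             # leaf: category label
        return "medium risk"
    return "low risk"

print(classify({"income": "low", "has_debt": False}))   # medium risk
```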
2 Unpredictability
How many (and what) questions will you ask me to get the number x as rapidly as possible?
(With yes/no questions that halve the remaining candidates, about log2 N questions suffice when x is one of N possibilities.)
Entropy

log2 (1/p_i)    (p_i : probability for event i)

For N equally likely outcomes, p_i = 1/N, so log2 (1/p_i) = log2 N: exactly the number of yes/no questions above.
Entropy

Entropy = − Σ_i p_i log2 p_i

Example (two equally likely outcomes):
Entropy = −0.5 log2 0.5 − 0.5 log2 0.5    (each term: log2 0.5 = −1)
= 0.5 + 0.5 = 1
Example (a fair die: six outcomes, p_i = 1/6 each):
Entropy = − Σ_i p_i log2 p_i = 6 × (−(1/6) log2 (1/6)) = −log2 (1/6) = log2 6 ≈ 2.58
Example (a loaded die: p = 0.1 for five faces, p = 0.5 for one):
Entropy = − Σ_i p_i log2 p_i = −5 · 0.1 log2 0.1 − 0.5 log2 0.5 ≈ 2.16
A fair die is more unpredictable (2.58 bits) than a loaded one (2.16 bits).
Entropy
Unpredictability of a dataset

A dataset of 100 examples with a 58/42 class split:
− (58/100) log2 (58/100) − (42/100) log2 (42/100) = 0.981

A dataset of 100 examples with a 97/3 class split:
− (97/100) log2 (97/100) − (3/100) log2 (3/100) = 0.194
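A small Python helper (my own sketch, not part of the lecture) reproduces the numbers above:

```python
from math import log2

def entropy(probs):
    """Entropy in bits: -sum_i p_i log2 p_i (terms with p = 0 contribute 0)."""
    return -sum(p * log2(p) for p in probs if p > 0)

print(entropy([0.5, 0.5]))            # two equal outcomes: 1.0
print(entropy([1/6] * 6))             # fair die: ~2.58
print(entropy([0.1] * 5 + [0.5]))     # loaded die: ~2.16
print(entropy([0.58, 0.42]))          # 58/42 dataset: ~0.981
print(entropy([0.97, 0.03]))          # 97/3 dataset: ~0.194
```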
Information gain

Ask about attribute A for a data set S that has entropy Ent(S), and get subsets S_v according to the value v of A:

Gain = Ent(S) − Σ_{v ∈ Values(A)} (|S_v| / |S|) Ent(S_v)

The first term is the entropy before the split; the second is the weighted sum of the subset entropies after the split.
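In code, the gain formula might look as follows (a sketch reusing the entropy helper above; each subset S_v is summarized by its class counts):

```python
def gain(parent_counts, subset_counts):
    """Information gain of a split.
    parent_counts: class counts in S; subset_counts: one list of
    class counts per value v of the attribute A."""
    n = sum(parent_counts)
    ent = lambda counts: entropy([c / sum(counts) for c in counts])
    weighted_after = sum(sum(sv) / n * ent(sv) for sv in subset_counts)
    return ent(parent_counts) - weighted_after
```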
Example: choosing among four binary attributes A, B, C, D (values • / ◦) on a dataset of 25 samples, 12 of them positive (the data table from the slide is omitted here):

A = •: 3/6 positive → entropy 1.0
A = ◦: 9/19 positive → entropy 0.9980
Expected: (6/25) · 1.0 + (19/25) · 0.9980 ≈ 0.9985

B = •: 9/11 positive → entropy 0.684
B = ◦: 3/14 positive → entropy 0.750
Expected: (11/25) · 0.684 + (14/25) · 0.750 ≈ 0.721

C = •: 6/12 positive → entropy 1.0
C = ◦: 6/13 positive → entropy 0.9957
Expected: (12/25) · 1.0 + (13/25) · 0.9957 ≈ 0.9977

D = •: 3/5 positive → entropy 0.9710
D = ◦: 9/20 positive → entropy 0.9928
Expected: (5/25) · 0.9710 + (20/25) · 0.9928 ≈ 0.9884

Attribute B yields the lowest expected entropy, i.e. the largest information gain.
[Figure: a decision tree of sequential yes/no tests, with attribute D tested at the lowest level and leaves labeled + / −]
Greedy approach to choose a question:
Choose the attribute which tells us most about the answer
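The expected entropies from the example above can be reproduced with the entropy helper (a sketch; the counts are read off the slides):

```python
# (positives, subset size) for attribute values • and ◦
splits = {"A": [(3, 6), (9, 19)], "B": [(9, 11), (3, 14)],
          "C": [(6, 12), (6, 13)], "D": [(3, 5), (9, 20)]}

for name, subsets in splits.items():
    expected = sum(n / 25 * entropy([p / n, (n - p) / n])
                   for p, n in subsets)
    print(name, round(expected, 4))
# A 0.9985, B 0.7208, C 0.9978, D 0.9884 -> the greedy choice is B
```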
Gini impurity

Gini = Σ_i p_i (1 − p_i) = 1 − Σ_i p_i²
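As a counterpart to the entropy helper (again my own sketch, not from the slides):

```python
def gini(probs):
    """Gini impurity: 1 - sum_i p_i^2, equivalently sum_i p_i (1 - p_i)."""
    return 1.0 - sum(p * p for p in probs)

print(gini([0.5, 0.5]))   # 0.5, the maximum for two classes
print(gini([1.0, 0.0]))   # 0.0, a pure node
```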
3 Overfitting
Overfitting

When the learned model is overly specialized to the training samples.
Training and validation set approach

Separate the available data into two sets of examples:
Training set T : to form the learned model
Validation set V : to evaluate the accuracy of this model
The motivations:
The training may be misled by random errors, but the validation set is unlikely to exhibit the same random fluctuations.
The validation set provides a safety check against overfitting to the spurious characteristics of the training set.
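A minimal sketch of the split, assuming the data sit in NumPy arrays X and y (the 70/30 ratio here is an arbitrary choice for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))            # toy features
y = rng.integers(0, 2, size=100)         # toy binary labels

idx = rng.permutation(len(X))            # shuffle before splitting
n_train = int(0.7 * len(X))
train_idx, val_idx = idx[:n_train], idx[n_train:]

X_train, y_train = X[train_idx], y[train_idx]   # T: form the learned model
X_val, y_val = X[val_idx], y[val_idx]           # V: evaluate its accuracy
```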
Reduced-Error Pruning

Avoid overfitting:
Stop growing when a data split is not statistically significant
Grow the full tree, then post-prune (e.g. reduced-error pruning)
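A minimal sketch of reduced-error pruning, assuming a hypothetical Node structure in which every node stores the majority label of the training samples that reach it (so an internal node can act as a leaf once its test is removed):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    label: int                          # majority class of samples at this node
    feature: Optional[int] = None       # attribute tested; None means leaf
    left: Optional["Node"] = None       # subtree for attribute value 0
    right: Optional["Node"] = None      # subtree for attribute value 1

def predict(node, x):
    if node.feature is None:            # leaf: return its category label
        return node.label
    child = node.left if x[node.feature] == 0 else node.right
    return predict(child, x)

def accuracy(root, X, y):
    return sum(predict(root, xi) == yi for xi, yi in zip(X, y)) / len(y)

def reduced_error_prune(node, root, X_val, y_val):
    """Bottom-up: turn a subtree into a leaf (its majority label)
    whenever that does not hurt accuracy on the validation set."""
    if node.feature is None:
        return
    reduced_error_prune(node.left, root, X_val, y_val)
    reduced_error_prune(node.right, root, X_val, y_val)
    before = accuracy(root, X_val, y_val)
    saved = node.feature
    node.feature = None                 # tentatively prune: node acts as leaf
    if accuracy(root, X_val, y_val) < before:
        node.feature = saved            # pruning hurt accuracy: undo

# Toy usage: a stump testing feature 0, pruned against a validation set.
tree = Node(label=1, feature=0, left=Node(label=0), right=Node(label=1))
reduced_error_prune(tree, tree, X_val=[[0], [1]], y_val=[1, 1])
print(tree.feature)   # None: replacing the stump by the leaf "1" did better
```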
A collection of trees (Ensemble learning: in Lecture 10)
Bootstrap aggregating (bagging)
Decision Forests
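For orientation only, a sketch using scikit-learn (an outside library, not part of this lecture; assumes sklearn is installed):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)

# Bootstrap aggregating: many trees, each fit on a bootstrap resample.
bagging = BaggingClassifier(DecisionTreeClassifier(), n_estimators=25,
                            random_state=0).fit(X_tr, y_tr)

# Random forest: bagging plus a random feature subset at each split.
forest = RandomForestClassifier(n_estimators=25, random_state=0).fit(X_tr, y_tr)

print(bagging.score(X_val, y_val), forest.score(X_val, y_val))
```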