ML Decision Trees
Machine Learning
Mario Martin
Decision trees: table of contents
Toy example: “PlayTennis” dataset
Decision tree for “PlayTennis”
[Figure: decision tree with root node Outlook; the Sunny branch tests Humidity (High: No, Normal: Yes), Overcast predicts Yes, and the Rain branch tests Wind (Strong: No, Weak: Yes)]
DT can represent complex rules
(Outlook=Sunny ∧ Humidity=Normal)
∨ (Outlook=Overcast)
∨ (Outlook=Rain ∧ Wind=Weak)
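The disjunction above translates directly into code; a minimal sketch (the function name `play_tennis` is illustrative):

```python
def play_tennis(outlook, humidity, wind):
    """Evaluate the disjunction represented by the PlayTennis decision tree."""
    return ((outlook == "Sunny" and humidity == "Normal")
            or outlook == "Overcast"
            or (outlook == "Rain" and wind == "Weak"))
```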
Learning Decision Trees
“Top-Down” induction of decision trees: ID3
ID3
[Figure: ID3 grows the tree top-down. Splitting first on Outlook partitions the examples; the Sunny branch is then split on Humidity (High: [D1-,D2-,D8-] → No, Normal: [D9+,D11+] → Yes) and the Rain branch on Wind (Strong: [D6-,D14-] → No, Weak: [D4+,D5+,D10+] → Yes); the Overcast branch is already pure and becomes a Yes leaf]
How to choose the best attribute?
What attribute is best?
Information theory
Binary answer case
Entropy
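For a binary-labeled set, entropy can be computed directly from the positive/negative counts; a minimal sketch:

```python
import math

def entropy(pos, neg):
    """Entropy in bits of a set with `pos` positive and `neg` negative
    examples: -p*log2(p) - n*log2(n)."""
    if pos == 0 or neg == 0:
        return 0.0  # a pure set carries no uncertainty
    total = pos + neg
    p, n = pos / total, neg / total
    return -p * math.log2(p) - n * math.log2(n)
```

For the full PlayTennis sample S = [9+,5-], entropy(9, 5) ≈ 0.940, the value used throughout these slides.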
Information gain
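Information gain is the entropy of the parent set minus the weighted entropies of the subsets a split induces; a minimal sketch, with sets represented as (pos, neg) count pairs:

```python
import math

def entropy(pos, neg):
    """Entropy in bits of a binary-labeled set given its class counts."""
    if pos == 0 or neg == 0:
        return 0.0
    total = pos + neg
    p, n = pos / total, neg / total
    return -p * math.log2(p) - n * math.log2(n)

def information_gain(parent, subsets):
    """G(S, A) = E(S) - sum_v |S_v|/|S| * E(S_v)."""
    total = sum(p + n for p, n in subsets)
    remainder = sum((p + n) / total * entropy(p, n) for p, n in subsets)
    return entropy(*parent) - remainder
```

Splitting S = [9+,5-] on Outlook induces the subsets [2+,3-] (Sunny), [4+,0-] (Overcast) and [3+,2-] (Rain), giving a gain of about 0.247.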
Which is the best attribute to choose?
Choosing the attribute
[Figure: three candidate splits of S = [9+,5-] (E = 0.940): Humidity, Wind, and Outlook (branches Sunny, Overcast, Rain)]
G(S, Outlook) = 0.940 - (5/14)*0.971 - (4/14)*0.0 - (5/14)*0.971 = 0.247
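The computation above can be checked numerically; a short sketch comparing the three candidate attributes (the subset counts follow the standard PlayTennis table):

```python
import math

def entropy(pos, neg):
    if pos == 0 or neg == 0:
        return 0.0
    total = pos + neg
    p, n = pos / total, neg / total
    return -p * math.log2(p) - n * math.log2(n)

def gain(parent, subsets):
    total = sum(p + n for p, n in subsets)
    return entropy(*parent) - sum((p + n) / total * entropy(p, n)
                                  for p, n in subsets)

S = (9, 5)  # full sample, E(S) ~ 0.940
splits = {
    "Outlook":  [(2, 3), (4, 0), (3, 2)],   # Sunny / Overcast / Rain
    "Humidity": [(3, 4), (6, 1)],           # High / Normal
    "Wind":     [(6, 2), (3, 3)],           # Weak / Strong
}
gains = {a: gain(S, subs) for a, subs in splits.items()}
print(max(gains, key=gains.get), round(gains["Outlook"], 3))
```

Outlook wins with gain ≈ 0.247, so it becomes the root test.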
Choosing next attribute
[Figure: after choosing Outlook as the root, the full sample [D1,D2,…,D14] = [9+,5-] is partitioned and the gain computation is repeated on each branch]
ID3 Algorithm
[Figure: the resulting decision tree for “PlayTennis”, with root Outlook and No/Yes leaves]
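The ID3 recursion can be sketched compactly (attribute selection by information gain; the dict-based example representation is illustrative, not from the slides):

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy in bits of a list of class labels."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def id3(examples, labels, attributes):
    """examples: list of dicts attr -> value.
    Returns a class label (leaf) or a pair (attribute, {value: subtree})."""
    if len(set(labels)) == 1:               # pure node: make a leaf
        return labels[0]
    if not attributes:                      # no tests left: majority leaf
        return Counter(labels).most_common(1)[0][0]

    def gain(a):
        rem = 0.0
        for v in {ex[a] for ex in examples}:
            sub = [l for ex, l in zip(examples, labels) if ex[a] == v]
            rem += len(sub) / len(labels) * entropy(sub)
        return entropy(labels) - rem

    best = max(attributes, key=gain)        # highest information gain
    branches = {}
    for v in {ex[best] for ex in examples}:
        sub_ex = [ex for ex in examples if ex[best] == v]
        sub_lb = [l for ex, l in zip(examples, labels) if ex[best] == v]
        branches[v] = id3(sub_ex, sub_lb,
                          [a for a in attributes if a != best])
    return (best, branches)
```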
[Possible issues while building the tree]
Overfitting in DTs
Decision trees tend to overfit the data, since we can keep partitioning the examples until we reach leaves where all predictions are correct.
As we descend the tree, the decisions are taken with less and less data.
We can regularize the learning by controlling the parameters that define the complexity of the tree:
- Minimum number of examples in a leaf
- Number of nodes/leaves in the tree
- Maximum depth of the branches of the tree
- A threshold on the split heuristic
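These regularization knobs map directly onto hyperparameters in common libraries; a hedged sketch using scikit-learn (assuming it is available; the dataset is synthetic and only for illustration):

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic data just for illustration
X, y = make_classification(n_samples=400, n_features=10, random_state=0)

unpruned = DecisionTreeClassifier(random_state=0).fit(X, y)
regularized = DecisionTreeClassifier(
    min_samples_leaf=10,          # minimum number of examples in a leaf
    max_leaf_nodes=16,            # cap on the number of leaves in the tree
    max_depth=4,                  # maximum depth of the branches
    min_impurity_decrease=0.01,   # threshold on the split heuristic
    random_state=0,
).fit(X, y)

print(unpruned.get_depth(), regularized.get_depth())
```

The unconstrained tree grows until its leaves are pure, while the regularized one stays shallow.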
Examples of trees
Overfitting in DTs
[Figure: classification error as a function of the number of nodes; the training-set error keeps decreasing while the test-set error eventually rises]
Attributes
Categorical Variables
Attributes with numeric values
Consider each value as a candidate threshold for the split. For each candidate split, compute the information gain. (Note: there are at most n-1 possible splits.)
Turn the split with the highest IG into the boolean test for the node. The data will be split into [A[x] < vi] and [A[x] >= vi].
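The threshold search can be sketched as follows (this variant uses midpoints between consecutive distinct sorted values as candidate thresholds, a common refinement of "each value as threshold"):

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def best_numeric_split(values, labels):
    """Return (threshold, gain) maximizing information gain over the
    at most n-1 candidate splits [A[x] < t] vs [A[x] >= t]."""
    pairs = sorted(zip(values, labels))
    base = entropy(labels)
    best_t, best_g = None, -1.0
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no boundary between equal values
        t = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [l for v, l in pairs if v < t]
        right = [l for v, l in pairs if v >= t]
        g = base - (len(left) * entropy(left)
                    + len(right) * entropy(right)) / len(pairs)
        if g > best_g:
            best_t, best_g = t, g
    return best_t, best_g
```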
Decision Trees for regression
Similar to classification, but minimize the sum of squared errors relative to the average target value in each induced partition.
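The same split search works for regression by replacing information gain with the SSE criterion described above; a minimal sketch:

```python
def sse(ys):
    """Sum of squared errors relative to the mean target of the partition."""
    m = sum(ys) / len(ys)
    return sum((y - m) ** 2 for y in ys)

def best_regression_split(xs, ys):
    """Return (threshold, total_sse) minimizing the summed SSE
    of the two induced partitions."""
    pairs = sorted(zip(xs, ys))
    best_t, best_cost = None, float("inf")
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue
        t = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for x, y in pairs if x < t]
        right = [y for x, y in pairs if x >= t]
        cost = sse(left) + sse(right)
        if cost < best_cost:
            best_t, best_cost = t, cost
    return best_t, best_cost
```

The leaf prediction is then simply the mean target value of the examples that reach it.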
Advantages/Problems with DTs
Advantages:
- Their computational complexity is low
- They perform automatic feature selection: only the attributes relevant for the task are used
- They can be interpreted more easily than other models
Drawbacks:
- They have large variance: slight changes in the sample can result in a completely different model
- They are sensitive to noise and misclassifications (regularization helps)
- They approximate concepts by partitions that are parallel to the attribute axes, so they can return complex models if the classification borders are oblique