Lesson 5
AI AND MACHINE
LEARNING
UNDERSTANDING DECISION TREES
• What is a decision tree
• How a decision tree works
• Applications of decision trees
WHAT IS A DECISION TREE
▪ We want to predict whether the risk level of an individual catching COVID is High or Low (the target)
[Figure: decision tree diagram labelling the root node, an intermediate node and a leaf node]
Longer answer
Entropy is a measure of the disorder in the target feature of the dataset.
Information Gain is the reduction in entropy obtained by splitting the dataset on a feature.
Hence the selected feature is the one that produces the most ordered grouping of target values (e.g. the biggest homogeneous group of instances formed).
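Written as a formula, the usual textbook definition of information gain is shown below. The symbols A (the feature being tested), v (one of its values) and S_v (the rows of S where A takes the value v) are notation introduced here for illustration and do not appear in the slides.

```latex
\[
\mathrm{Gain}(S, A) \;=\; \mathrm{Entropy}(S) \;-\; \sum_{v \,\in\, \mathrm{Values}(A)} \frac{|S_v|}{|S|}\,\mathrm{Entropy}(S_v)
\]
```

The feature with the largest gain is the one chosen for the split.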
INTRO TO AI & ML | DIP IN BORDER SECURITY | SINGAPORE POLYTECHNIC
HOW A DECISION TREE WORKS
COVID RISK ANALYSIS
▪ If we use only the Above 12 feature as the root node, we will have
  ▪ 4 of 4 (100%) [see box] where Above 12 is N and Risk is Low
  ▪ a mixed set for Risk when Above 12 is Y
▪ If we use only the Vaxxed feature as the root node, we will have
  ▪ 5 of 5 (100%) [see box] where Vaxxed is Y and Risk is Low
  ▪ a mixed set for Risk when Vaxxed is N
▪ For the next feature considered, we will have
  ▪ no homogeneous split (no red box)
  ▪ 3 of 4 for (N and Low)
  ▪ 6 of 7 for (Y and High)
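The comparison in the bullets above can be sketched as a small purity check. This is only an illustration: the branch counts are copied from the bullets, the minority counts (1 of 4 and 1 of 7) are inferred from the target having just two classes, and `feature_3`, `purity` and `is_homogeneous` are hypothetical names, since the slide text does not name the last feature.

```python
# Class counts per branch, copied from the bullets above.
# "feature_3" is a placeholder: the slide text does not name that feature.
branches = {
    ("Above 12", "N"): {"Low": 4, "High": 0},
    ("Vaxxed", "Y"): {"Low": 5, "High": 0},
    ("feature_3", "N"): {"Low": 3, "High": 1},  # 3 of 4 are Low
    ("feature_3", "Y"): {"Low": 1, "High": 6},  # 6 of 7 are High
}


def purity(counts):
    """Return (majority class, fraction of rows in that class) for one branch."""
    total = sum(counts.values())
    majority = max(counts, key=counts.get)
    return majority, counts[majority] / total


def is_homogeneous(counts):
    """A branch is homogeneous when every row belongs to a single class."""
    return purity(counts)[1] == 1.0


for (feature, value), counts in branches.items():
    majority, fraction = purity(counts)
    label = "homogeneous" if is_homogeneous(counts) else "mixed"
    print(f"{feature} = {value}: {majority} {fraction:.0%} ({label})")
```

Only the Above 12 = N and Vaxxed = Y branches come out as homogeneous, which matches the boxed groups described above.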
Entropy(S) = - ∑ pᵢ * log₂(pᵢ) ; i = 1 to n
n = number of classes in the target feature (e.g. High / Low, n = 2)
pᵢ = probability of class i, i.e. the ratio of the number of rows with class i in the target feature to the total number of rows in the set S being measured
(e.g. 5/7 for Low Risk among the rows where Above 12 is Y in our example)
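As a rough worked example, the formula can be evaluated in a few lines of Python. The counts are taken from the Above 12 split in this lesson (4 of 4 Low when N, 5 of 7 Low when Y). The 2 High rows in the Y subset are inferred from the target having only two classes, and the function names are illustrative.

```python
from math import log2


def entropy(counts):
    """Entropy(S) = -sum(p_i * log2(p_i)), skipping classes with zero rows."""
    total = sum(counts)
    return sum(-(c / total) * log2(c / total) for c in counts if c > 0)


def information_gain(parent_counts, child_counts_list):
    """Parent entropy minus the size-weighted entropy of the child subsets."""
    total = sum(parent_counts)
    weighted = sum(sum(child) / total * entropy(child) for child in child_counts_list)
    return entropy(parent_counts) - weighted


# Counts are [Low, High].
above_12_n = [4, 0]   # 4 of 4 Low: a homogeneous subset
above_12_y = [5, 2]   # 5 of 7 Low, so 2 of 7 High (inferred)
parent = [9, 2]       # all rows before the split: 4 + 5 Low, 0 + 2 High

print(entropy(above_12_n))            # 0.0 (no disorder)
print(round(entropy(above_12_y), 3))  # 0.863
print(round(information_gain(parent, [above_12_n, above_12_y]), 3))  # 0.135
```

A homogeneous subset has zero entropy, while the mixed subset stays close to the maximum of 1 for a two-class target.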
2 types of pruning
▪ Pre-pruning (forward pruning)
▪ Post-pruning (backward pruning)
IMPROVING THE DECISION TREE
Consideration - Problematic
▪ Features that individually contribute little to the decision may have a significant impact when combined
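A small sketch can make this consideration concrete. The four-row dataset below is hypothetical (it is not from the lesson): Risk is High only when exactly one of two features is 1. Each feature on its own gives zero information gain, so a tree that stops growing when no single split helps would give up here, yet the two features together separate the classes perfectly.

```python
from math import log2


def entropy(labels):
    """Entropy of a list of class labels."""
    total = len(labels)
    probs = [labels.count(c) / total for c in set(labels)]
    return sum(-p * log2(p) for p in probs)


def information_gain(rows, labels, feature_index):
    """Entropy reduction from splitting the rows on one feature."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[feature_index], []).append(label)
    weighted = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - weighted


# Hypothetical toy data: the label is High only when exactly one feature is 1.
rows = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = ["Low", "High", "High", "Low"]

gain_a = information_gain(rows, labels, 0)
gain_b = information_gain(rows, labels, 1)

# Splitting on both features together puts each row in its own pure group.
combined = {}
for row, label in zip(rows, labels):
    combined.setdefault(row, []).append(label)
gain_both = entropy(labels) - sum(
    len(g) / len(labels) * entropy(g) for g in combined.values()
)

print(gain_a, gain_b, gain_both)  # 0.0 0.0 1.0
```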
POST-PRUNING
▪ More commonly used; pruning occurs after the building process (i.e. once the full decision tree has been built)
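One common way to post-prune is reduced-error pruning: starting from the leaves, a subtree is replaced by a single leaf whenever that does not hurt accuracy on held-out validation rows. The sketch below assumes that technique. The slides do not say which post-pruning method is intended, and the tree layout, the `majority` field and the validation rows are all hypothetical.

```python
def predict(tree, row):
    """Follow the branches until a leaf (a plain class label) is reached."""
    while isinstance(tree, dict):
        tree = tree["branches"].get(row[tree["feature"]], tree["majority"])
    return tree


def accuracy(tree, rows):
    """Fraction of rows whose Risk label the tree predicts correctly."""
    return sum(predict(tree, r) == r["Risk"] for r in rows) / len(rows)


def prune(tree, validation_rows):
    """Bottom-up reduced-error pruning; returns the (possibly smaller) tree."""
    if not isinstance(tree, dict) or not validation_rows:
        return tree  # a leaf, or no evidence to judge this subtree
    # Prune the children first, each with the validation rows that reach it.
    for value, subtree in tree["branches"].items():
        reaching = [r for r in validation_rows if r[tree["feature"]] == value]
        tree["branches"][value] = prune(subtree, reaching)
    # Replace this node by its majority-class leaf if that is no worse.
    leaf = tree["majority"]
    if accuracy(leaf, validation_rows) >= accuracy(tree, validation_rows):
        return leaf
    return tree


# Hypothetical fully grown tree: the Above 12 = N branch has been over-split.
full_tree = {
    "feature": "Above 12", "majority": "Low",
    "branches": {
        "N": {"feature": "Vaxxed", "majority": "Low",
              "branches": {"Y": "Low", "N": "High"}},
        "Y": {"feature": "Vaxxed", "majority": "High",
              "branches": {"Y": "Low", "N": "High"}},
    },
}

# Hypothetical validation rows, kept aside while the tree was built.
validation_rows = [
    {"Above 12": "N", "Vaxxed": "N", "Risk": "Low"},
    {"Above 12": "N", "Vaxxed": "Y", "Risk": "Low"},
    {"Above 12": "Y", "Vaxxed": "Y", "Risk": "Low"},
    {"Above 12": "Y", "Vaxxed": "N", "Risk": "High"},
]

pruned = prune(full_tree, validation_rows)
print(pruned["branches"]["N"])  # Low
```

In this toy case the Above 12 = N subtree collapses to a single Low leaf, while the Above 12 = Y subtree is kept because removing it would lower validation accuracy.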
▪ Advantages of decision trees
▪ Easy to interpret
▪ Easy to prepare
▪ Less data cleaning required
▪ Disadvantages of decision trees
▪ Unstable (small changes in the data can produce a very different tree)
▪ Less effective at predicting the outcome of a continuous variable