
INTRODUCTION TO AI AND MACHINE LEARNING
UNDERSTANDING DECISION TREES
• What is a decision tree
• How a decision tree works
• Applications of decision trees
WHAT IS A DECISION TREE

▪ An inductive learning task
▪ Uses particular facts to make more generalized conclusions
▪ A predictive model based on a branching series of Boolean tests
▪ A structure of nodes (shown as boxes) and edges (shown as arrows),
built from a dataset
▪ Each node is either a decision node (makes a decision) or a leaf node
(represents an outcome)
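This node-and-edge structure can be sketched in a few lines of Python (an illustrative sketch; the class names, feature names and the tiny example tree are our own, not from the slides):

```python
# Minimal sketch of a decision-tree structure: each node is either a
# decision node (tests one feature) or a leaf node (holds an outcome).

class Leaf:
    """Leaf node: represents an outcome."""
    def __init__(self, outcome):
        self.outcome = outcome

class Decision:
    """Decision node: tests a feature; edges map feature values to child nodes."""
    def __init__(self, feature, children):
        self.feature = feature
        self.children = children  # e.g. {"Y": <node>, "N": <node>}

def predict(node, instance):
    """Follow the edges from the root until a leaf is reached."""
    while isinstance(node, Decision):
        node = node.children[instance[node.feature]]
    return node.outcome

# Hypothetical one-split tree: Risk is Low if Vaxxed, otherwise High.
tree = Decision("Vaxxed", {"Y": Leaf("Low"), "N": Leaf("High")})
print(predict(tree, {"Vaxxed": "Y"}))  # Low
```

Classifying an instance is just a walk from the root, taking the edge that matches the instance's value at each decision node.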

INTRO TO AI & ML | DIP IN BORDER SECURITY | SINGAPORE POLYTECHNIC


WHAT IS A DECISION TREE
PREDICT COVID RISK
▪ We use fictitious COVID-19 data, shown on the left, as an
over-simplified analysis example

▪ We want to predict whether the risk level of catching COVID
(for an individual) is High or Low (the target)



WHAT IS A DECISION TREE
PREDICT COVID RISK

(Diagram: the example tree, with its root node, intermediate nodes and leaf nodes labelled)



WHAT IS A DECISION TREE
PREDICT COVID RISK

▪ The decision tree is generated from the data collected
▪ We use the term instances for the rows of the data collected
▪ We use the term attributes for the columns of the data collected
▪ In data mining software, an attribute (a characteristic of the data) is
called a feature, as it is a label of that characteristic
▪ Not all attributes collected are used in each path of the decision tree
▪ e.g. Vaccinated (Vaxxed) alone reduces the risk to Low
▪ Some attributes may not appear in the decision tree at all
▪ e.g. Community spread (Community) is not used in this decision tree
HOW A DECISION TREE WORKS
ITERATIVE DICHOTOMISER 3 (ID3)

▪ Iterative (repeated) Dichotomiser (divider) 3, an algorithm
invented by Ross Quinlan in 1975

▪ Top-down greedy approach
▪ Starts from the top
▪ In each iteration, selects the best attribute at that moment to
create a node



HOW A DECISION TREE WORKS
ID3 PROCESS

Steps for ID3

▪ Choose the best feature to split the remaining data points
(instances) and make that feature a decision node
▪ Repeat the process recursively for each child
▪ Stop when:
▪ All the instances have the same target feature value
▪ No more attributes remain (all have been used)
▪ No more instances are available
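The steps above can be sketched as a toy Python implementation (illustrative only; the dataset, feature names and `gain` scoring are our own assumptions, and the "no more instances" case never arises here because each branch value comes from the remaining rows):

```python
import math
from collections import Counter

def entropy(labels):
    """Disorder of a list of target values: -sum(p_i * log2(p_i))."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gain(rows, target, feature):
    """Reduction in entropy achieved by splitting the rows on one feature."""
    before = entropy([r[target] for r in rows])
    after = 0.0
    for value in {r[feature] for r in rows}:
        subset = [r[target] for r in rows if r[feature] == value]
        after += (len(subset) / len(rows)) * entropy(subset)
    return before - after

def id3(rows, target, features):
    """Recursive ID3: pick the best feature, split, repeat for each child."""
    labels = [r[target] for r in rows]
    if len(set(labels)) == 1:        # stop: all instances share one target value
        return labels[0]
    if not features:                 # stop: all attributes used -> majority label
        return Counter(labels).most_common(1)[0][0]
    best = max(features, key=lambda f: gain(rows, target, f))
    rest = [f for f in features if f != best]
    return (best, {v: id3([r for r in rows if r[best] == v], target, rest)
                   for v in {r[best] for r in rows}})

# Hypothetical instances: Vaxxed separates Risk perfectly, so ID3 picks it first.
rows = [
    {"Vaxxed": "Y", "Above12": "Y", "Risk": "Low"},
    {"Vaxxed": "Y", "Above12": "N", "Risk": "Low"},
    {"Vaxxed": "N", "Above12": "Y", "Risk": "High"},
    {"Vaxxed": "N", "Above12": "N", "Risk": "High"},
]
print(id3(rows, "Risk", ["Vaxxed", "Above12"]))  # root splits on 'Vaxxed'
```

Each recursive call works on a smaller subset of the instances and a smaller list of features, so one of the three stopping conditions is always reached.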



HOW A DECISION TREE WORKS
ID3 PROCESS

How does ID3 select the best feature?

Short answer – select the feature with the highest Information Gain (Gain)

Longer answer –
Entropy is the measure of disorder in the target feature of the dataset.
Information Gain is the calculated reduction in entropy.
Hence the selected feature produces the most ordered grouping of the
target feature (e.g. the biggest homogeneous group of instances formed).
HOW A DECISION TREE WORKS
COVID RISK ANALYSIS
▪ Using only the Above 12 feature as the root node

▪ We get:
▪ 4 of 4 (100%) [see box] where Above 12 is N AND Risk is Low
▪ A mixed set for Risk when Above 12 is Y



HOW A DECISION TREE WORKS
COVID RISK ANALYSIS
▪ Using only the Vaxxed feature as the root node

▪ We get:
▪ 5 of 5 (100%) [see box] where Vaxxed is Y AND Risk is Low
▪ A mixed set for Risk when Vaxxed is N



HOW A DECISION TREE WORKS
COVID RISK ANALYSIS
▪ Using only the Community feature as the root node

▪ We get:
▪ No homogeneous split (no red box)
▪ 3 of 4 for (N and Low)
▪ 6 of 7 for (Y and High)



HOW A DECISION TREE WORKS
COVID RISK ANALYSIS
▪ The best feature for the root node is Vaxxed, which generates 5
instances in a homogeneous partition – the highest Gain

▪ This is a simplified comparison, as we have ignored the mixed sets.
However, they do contribute to the calculation.



HOW A DECISION TREE WORKS
ENTROPY AND GAIN
▪ In data mining software (e.g. Orange3), the decision tree can be
built as an induced binary tree

▪ In binary classification, every split has only 2 possible classes
(or outcomes) (e.g. Yes / No, True / False, 1 / 0)

▪ We also assume the target feature has 2 possible classes, with an
equal number of each class
▪ e.g. 5 × True AND 5 × False in a dataset of 10 instances

Source: https://towardsdatascience.com/decision-trees-for-classification-id3-algorithm-explained-89df76e72df1


HOW A DECISION TREE WORKS
ENTROPY AND GAIN
▪ Entropy, the measure of disorder in the target feature of the dataset
(Risk in this case), is calculated as:

Entropy(S) = - ∑ pᵢ * log₂(pᵢ) ; i = 1 to n

n = number of classes in the target feature (e.g. High / Low, n = 2)
pᵢ = probability of class i, i.e. the ratio of the "number of rows with
class i in the target feature" to the "total number of rows" in the dataset
(e.g. 5/7 for Low Risk among the Above 12 = Y instances in our example)
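The formula can be checked with a short Python sketch (illustrative; the label lists are our own examples):

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy(S) = -sum over classes i of p_i * log2(p_i)."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

# A 50/50 split of the target is maximally disordered for n = 2 classes:
print(entropy(["High", "Low"]))  # 1.0

# The slide's example: 5 Low and 2 High among the Above 12 = Y instances,
# so p(Low) = 5/7 and p(High) = 2/7.
print(round(entropy(["Low"] * 5 + ["High"] * 2), 3))  # 0.863
```

A homogeneous set (all one class) has entropy 0, which is why the 5-of-5 Vaxxed partition is as ordered as a partition can be.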



HOW A DECISION TREE WORKS
ENTROPY AND GAIN
▪ Information Gain (IG) for a feature A is calculated as:

IG(S, A) = Entropy(S) - ∑ ((|Sᵥ| / |S|) * Entropy(Sᵥ))

▪ Sᵥ = the set of rows in S for which feature A has value v
▪ |S| = number of rows in S
▪ |Sᵥ| = number of rows in Sᵥ

Since we are using an application, the calculation is for your
information only.
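For reference, the calculation can be sketched in Python (the mini dataset is illustrative, not the slide data):

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, target, feature):
    """IG(S, A) = Entropy(S) - sum over values v of (|Sv| / |S|) * Entropy(Sv)."""
    total = entropy([r[target] for r in rows])
    remainder = 0.0
    for value in {r[feature] for r in rows}:   # each Sv is one partition of S
        sv = [r[target] for r in rows if r[feature] == value]
        remainder += (len(sv) / len(rows)) * entropy(sv)
    return total - remainder

# Illustrative rows: Vaxxed splits Risk into two homogeneous partitions,
# so its gain equals the full entropy of the target (the maximum possible).
rows = [
    {"Vaxxed": "Y", "Risk": "Low"},
    {"Vaxxed": "Y", "Risk": "Low"},
    {"Vaxxed": "N", "Risk": "High"},
    {"Vaxxed": "N", "Risk": "High"},
]
print(information_gain(rows, "Risk", "Vaxxed"))  # 1.0
```

ID3 computes this gain for every remaining feature and splits on the one with the highest value.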
IMPROVING THE DECISION TREE

Pruning is a technique used to reduce the number of features used in
the tree, to:
▪ Prevent overfitting
▪ Reduce computation time

2 types of pruning:
▪ Pre-pruning (forward pruning)
▪ Post-pruning (backward pruning)
IMPROVING THE DECISION TREE

The decision tree could over-learn all the data, including its errors.
e.g. A leaf node with only 1 instance (see bottom) could be considered
overfitting.



IMPROVING THE DECISION TREE
PRE-PRUNING
Pre-pruning
▪ Used during the building process to stop adding features
▪ Typically based on the reduction in information gain

Consideration – problematic because:
▪ Features that individually contribute little to the decision may have
a significant impact when combined
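A minimal pre-pruning sketch in Python (illustrative; the `min_gain` threshold and the dataset are our own assumptions, not the slide's tool or data):

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gain(rows, target, feature):
    before = entropy([r[target] for r in rows])
    after = 0.0
    for value in {r[feature] for r in rows}:
        sv = [r[target] for r in rows if r[feature] == value]
        after += (len(sv) / len(rows)) * entropy(sv)
    return before - after

def build(rows, target, features, min_gain=0.1):
    """Grow a tree, but pre-prune: stop adding features once the best
    available information gain falls below min_gain."""
    labels = [r[target] for r in rows]
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not features:
        return majority
    best = max(features, key=lambda f: gain(rows, target, f))
    if gain(rows, target, best) < min_gain:   # pre-pruning condition
        return majority
    rest = [f for f in features if f != best]
    return (best, {v: build([r for r in rows if r[best] == v], target, rest, min_gain)
                   for v in {r[best] for r in rows}})

# Community's gain here is only ~0.08, so with min_gain=0.1 the split is
# pruned away and a majority-class leaf is returned instead.
rows = [
    {"Community": "Y", "Risk": "High"}, {"Community": "Y", "Risk": "High"},
    {"Community": "Y", "Risk": "Low"},  {"Community": "N", "Risk": "High"},
    {"Community": "N", "Risk": "Low"},  {"Community": "N", "Risk": "Low"},
]
print(build(rows, "Risk", ["Community"], min_gain=0.1))  # a single leaf label
print(build(rows, "Risk", ["Community"], min_gain=0.0))  # a full decision node
```

This also illustrates the caveat above: a threshold on one feature's gain can discard a feature that would have mattered in combination with a later split.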
IMPROVING THE DECISION TREE
POST-PRUNING
Post-pruning
▪ More commonly used; pruning occurs after the building process
(once the full decision tree is built)

2 techniques, using algorithms based on either reduced error or
reduced cost complexity:
▪ Subtree replacement
▪ Subtree raising
IMPROVING THE DECISION TREE
POST-PRUNING | SUBTREE REPLACEMENT

Replacing a subtree with a leaf node generalizes the decision tree
but could reduce accuracy.

Used for very large trees.



IMPROVING THE DECISION TREE
POST-PRUNING | SUBTREE RAISING

Very time-consuming to verify, as it is not based on data but is
likely guided by domain knowledge.



IMPROVING THE DECISION TREE
ERROR PROPAGATION

▪ Decision trees work by a series of decisions. If the decision at
one feature is wrong:

▪ Subsequent decisions will be wrong

▪ The path taken will be affected, which could also be wrong



NOTES ON DECISION TREES
ADVANTAGES / DISADVANTAGES

▪ Advantages
▪ Easy to interpret
▪ Easy to prepare
▪ Less data cleaning required

▪ Disadvantages
▪ Unstable nature
▪ Less effective at predicting the outcome of a continuous variable



APPLICATIONS OF DECISION TREES

▪ Decision trees can be used for:
▪ Categorical variables
▪ Distinct categories with no in-between
▪ Continuous variables
▪ A range of values such as age or weight

▪ Suitable for handling non-linear datasets effectively in real-life
situations



END OF LESSON 5
