Decision Tree


Design of Intelligent Systems

CS-715
MS (Computer Science)

Decision Tree (ID3)


BY: Bushra Saif
Introduction
• Classification in machine learning is a two-step process: a learning step and a
prediction step. In the learning step, the model is developed from the given
training data. In the prediction step, the model is used to predict the response
for given data. (A minimal illustration in code follows below.)
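As a rough sketch of these two steps (not from the slides; it uses scikit-learn's DecisionTreeClassifier, which is not the ID3 variant described later but illustrates the same learn-then-predict workflow, and the toy data is invented purely for illustration):

# Learning step: fit a model on training data; prediction step: query it.
# The feature encoding and labels below are made up for this sketch.
from sklearn.tree import DecisionTreeClassifier

# training data: [outlook_is_sunny, windy] -> play (1) / don't play (0)
X_train = [[1, 0], [1, 1], [0, 0], [0, 1]]
y_train = [1, 0, 1, 0]

model = DecisionTreeClassifier()      # learning step
model.fit(X_train, y_train)

print(model.predict([[1, 0]]))        # prediction step, e.g. [1]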

• Decision trees provide high efficiency and easy interpretation. These two
benefits make this simple algorithm popular in machine learning.

• The Decision Tree algorithm belongs to the family of supervised learning
algorithms. Unlike many other supervised learning algorithms, it can be used for
solving both regression and classification problems.

Introduction
• It is a tree-structured classifier, where internal nodes represent
the features of a dataset, branches represent the decision rules
and each leaf node represents the outcome.

• In a decision tree, there are two types of nodes: decision nodes and leaf
nodes. Decision nodes are used to make decisions and have multiple branches,
whereas leaf nodes are the outputs of those decisions and do not contain any
further branches.

• The decisions or tests are performed on the basis of the features of the given
dataset. The tree is a graphical representation of all the possible solutions to
a problem/decision under the given conditions. (A small code sketch of one
possible representation follows below.)
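One hedged way to picture the two node types in code (this nested-dict representation and the particular splits are my own illustration in the play-tennis flavour used later in these slides, not something taken from them): a decision node stores a feature and its branches, while a leaf node stores only an outcome.

# Hypothetical nested-dict representation of a small decision tree.
# Decision nodes are dicts with a feature and branches; leaf nodes are plain outcomes.
tree = {
    "feature": "outlook",                                   # decision node (root)
    "branches": {
        "sunny": {
            "feature": "humidity",                          # decision node
            "branches": {"high": "no", "normal": "yes"},    # leaf nodes
        },
        "overcast": "yes",                                  # leaf node
        "rainy": {
            "feature": "wind",
            "branches": {"strong": "no", "weak": "yes"},
        },
    },
}

def classify(node, example):
    # Follow branches until a leaf (a plain outcome) is reached.
    while isinstance(node, dict):
        node = node["branches"][example[node["feature"]]]
    return node

print(classify(tree, {"outlook": "sunny", "humidity": "normal", "wind": "weak"}))  # -> "yes"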
Introduction
• The goal of using a decision tree is to create a training model that can be
used to predict the class or value of the target variable by learning simple
decision rules inferred from prior (training) data.


• It is called a decision tree because, similar to a tree, it starts with a root
node, which expands into further branches and constructs a tree-like structure.

Introduction
• Decision trees start with a root node, which acts as a starting point (at the
top), and is followed by splits that produce branches. The
statistical/mathematical term for these branches is edges. The branches then
link to leaves, also known as nodes, which form decision points. A final
categorization is produced when a leaf does not generate any new branches; such
a leaf is known as a terminal node.

Introduction
• Suppose there is a candidate who has a job offer and wants to decide whether
he should accept the offer or not. To solve this problem, the decision tree
starts with the root node (the Salary attribute, chosen by an attribute
selection measure, ASM). The root node splits further into the next decision
node (distance from the office) and one leaf node, based on the corresponding
labels.

Introduction
• Real-life examples include picking a scholarship recipient, assessing
an applicant for a home loan, predicting e-commerce sales, or
selecting the right job applicant. When a customer or applicant
queries why they weren’t selected for a particular scholarship, home
loan, job, etc., you can pass them the decision tree and let them see
the decision-making process for themselves.

Algorithmic steps
Step 1: It begins with the original set S as the root node.
Step 2: Find the best attribute in the dataset using an Attribute Selection
Measure (ASM). On each iteration, the algorithm goes through every unused
attribute of the set S and calculates the entropy H and the information
gain IG of this attribute.
Step 3: It then selects the attribute which has the smallest entropy
(equivalently, the largest information gain).
Step 4: The set S is then split by the selected attribute to produce
subsets of the data.
Step 5: The algorithm continues to recurse on each subset, considering
only attributes never selected before.
(A small code sketch of this attribute-selection step is given below.)
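A compact Python sketch of the selection step described above (my own illustration of the idea, not code from the slides; dataset rows are assumed to be dicts mapping attribute names to values, with the class label under a "play" key):

import math
from collections import Counter

def entropy(rows, target="play"):
    # H(S) = -sum over classes c of p_c * log2(p_c)
    counts = Counter(row[target] for row in rows)
    total = len(rows)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

def information_gain(rows, attribute, target="play"):
    # IG(S, A) = H(S) - sum over values v of A of |S_v|/|S| * H(S_v)
    total = len(rows)
    remainder = 0.0
    for value in {row[attribute] for row in rows}:
        subset = [row for row in rows if row[attribute] == value]
        remainder += (len(subset) / total) * entropy(subset, target)
    return entropy(rows, target) - remainder

def best_attribute(rows, attributes, target="play"):
    # Step 3: choose the attribute with the largest information gain
    return max(attributes, key=lambda a: information_gain(rows, a, target))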
Attribute selection Measure
Entropy:
• Entropy is a measure of the randomness in the information being
processed. The higher the entropy, the harder it is to draw any
conclusions from that information. Flipping a coin is an example
of an action that provides information that is random.
• Entropy values range from 0 to 1 (for a two-class problem); the lower the
entropy, the more confidently conclusions can be drawn (see the formula below).
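The slide states the range but not the formula; the usual textbook definition of entropy for a set S with classes c is

H(S) = -\sum_{c} p_c \log_2 p_c

where p_c is the proportion of examples in S belonging to class c. H(S) is 0 when every example has the same class and reaches 1 for an even two-class split.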

Attribute selection Measure

Information Gain:
• Information gain or IG is a statistical property that measures
how well a given attribute separates the training examples
according to their target classification.
• Information gain is used to decide which feature to split on at
each step in building the tree.
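The formula is not shown on the slide, but the standard definition, using the entropy H above, is

IG(S, A) = H(S) - \sum_{v \in Values(A)} \frac{|S_v|}{|S|} \, H(S_v)

where S_v is the subset of S for which attribute A takes the value v.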
Attribute selection Measure
• It is the reduction in uncertainty (entropy) given some feature, and it is
also the deciding factor for which attribute should be selected as a decision
node or the root node.

• We decided to make the first split on the basis of outlook. We could have
based our first decision on humidity or wind, but we chose outlook. Why?
Attribute selection Measure
• Because splitting on outlook reduces the randomness in the outcome (whether
to play or not) more than splitting on humidity or wind would.

Practical Example using ID3
• Let's illustrate this with the help of an example. Suppose we want to play
tennis on a particular day, say Saturday. How will we decide whether to play or
not? We go out and check whether it is hot or cold, check the speed of the wind
and the humidity, and see how the weather is, i.e., whether it is sunny, cloudy,
or rainy.

• We take all these factors into account to decide whether to play or not.

• So, we record all these factors for the last ten days and form a lookup table
like the one below.

Practical Example using ID3

[Lookup table of outlook, temperature, humidity, and wind for the last ten days,
together with the play/don't-play decision: not reproduced in this text version.]
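As a rough stand-in for that lookup table (the actual ten days of data are not reproduced here, so the rows below are invented and simplified, with temperature omitted), the helper functions sketched after the algorithmic steps can be applied like this:

# Hypothetical weather log; assumes information_gain and best_attribute
# from the earlier sketch are already defined.
days = [
    {"outlook": "sunny",    "humidity": "high",   "wind": "weak",   "play": "no"},
    {"outlook": "sunny",    "humidity": "high",   "wind": "strong", "play": "no"},
    {"outlook": "overcast", "humidity": "high",   "wind": "weak",   "play": "yes"},
    {"outlook": "rainy",    "humidity": "normal", "wind": "weak",   "play": "yes"},
    {"outlook": "rainy",    "humidity": "normal", "wind": "strong", "play": "no"},
    {"outlook": "overcast", "humidity": "normal", "wind": "strong", "play": "yes"},
    {"outlook": "sunny",    "humidity": "normal", "wind": "weak",   "play": "yes"},
    {"outlook": "rainy",    "humidity": "high",   "wind": "weak",   "play": "yes"},
]

for attr in ("outlook", "humidity", "wind"):
    print(attr, round(information_gain(days, attr), 3))

# With these invented rows, outlook happens to get the highest gain,
# matching the slides' choice of root attribute.
print("root attribute:", best_attribute(days, ["outlook", "humidity", "wind"]))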
