Decision Tree
CS-715
MS (Computer Science)
Introduction
• A decision tree is a tree-structured classifier in which internal nodes represent the features of a dataset, branches represent the decision rules, and each leaf node represents the outcome.
• A decision tree contains two types of nodes: decision nodes and leaf nodes. Decision nodes are used to make decisions and have multiple branches, whereas leaf nodes are the outputs of those decisions and do not contain any further branches. A minimal sketch of this structure follows below.
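The two node types map naturally onto a small data structure. This is a minimal Python sketch; the class and field names are illustrative choices, not from the slides:

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

@dataclass
class Node:
    # A decision node tests an attribute and has one branch per attribute value;
    # a leaf node carries the final outcome and has no further branches.
    attribute: Optional[str] = None                       # set on decision nodes
    branches: Dict[str, "Node"] = field(default_factory=dict)
    prediction: Optional[str] = None                      # set on leaf nodes
```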
Introduction
• Decision trees start with a root node, which acts as the starting point (at the top) and is followed by splits that produce branches; the statistical/mathematical term for these branches is edges. The branches then link to leaves, also known as nodes, which form decision points. A final categorization is produced when a leaf does not generate any new branches, resulting in what is known as a terminal node.
Introduction
• Suppose a candidate has a job offer and wants to decide whether to accept it or not. To solve this problem, the decision tree starts with a root node (the Salary attribute, chosen by an attribute selection measure, or ASM). The root node splits further into the next decision node (distance from the office) and one leaf node, based on the corresponding labels. As a sketch, this could look like the rules below.
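Written as nested rules, the offer decision might look like the following; the thresholds are made up purely for illustration:

```python
def accept_offer(salary, distance_km):
    # Root node (chosen by ASM): is the salary acceptable?
    if salary < 50_000:           # hypothetical threshold
        return "Decline offer"    # leaf node
    # Next decision node: distance from the office.
    if distance_km > 30:          # hypothetical threshold
        return "Decline offer"    # leaf node
    return "Accept offer"         # leaf node
```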
Introduction
• Real-life examples include picking a scholarship recipient, assessing
an applicant for a home loan, predicting e-commerce sales, or
selecting the right job applicant. When a customer or applicant
queries why they weren’t selected for a particular scholarship, home
loan, job, etc., you can pass them the decision tree and let them see
the decision-making process for themselves.
Algorithmic steps
Step 1: Begin with the original set S as the root node.
Step 2: Find the best attribute in the dataset using an Attribute Selection Measure (ASM). On each iteration, the algorithm iterates through every unused attribute of the set S and calculates the entropy H and the information gain IG of that attribute.
Step 3: Select the attribute with the smallest entropy or the largest information gain.
Step 4: Split the set S on the selected attribute to produce subsets of the data.
Step 5: Recurse on each subset, considering only attributes never selected before (a runnable sketch of these steps follows below).
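A compact Python sketch of these steps, assuming categorical attributes and rows stored as dicts; the function names and the nested-dict tree representation are illustrative choices, not prescribed by the slides:

```python
import math
from collections import Counter

def entropy(labels):
    # H(S) = -sum(p * log2(p)) over the class proportions in S.
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())

def information_gain(rows, labels, attr):
    # IG(S, A) = H(S) minus the weighted entropy of the subsets A splits S into.
    total = len(labels)
    parts = {}
    for row, label in zip(rows, labels):
        parts.setdefault(row[attr], []).append(label)
    remainder = sum(len(p) / total * entropy(p) for p in parts.values())
    return entropy(labels) - remainder

def id3(rows, labels, attrs):
    # Leaf: every example has the same class, or no unused attributes remain.
    if len(set(labels)) == 1 or not attrs:
        return Counter(labels).most_common(1)[0][0]
    # Steps 2-3: pick the unused attribute with the largest information gain.
    best = max(attrs, key=lambda a: information_gain(rows, labels, a))
    tree = {best: {}}
    # Steps 4-5: split S on the chosen attribute and recurse on each subset.
    for value in {row[best] for row in rows}:
        subset = [i for i, row in enumerate(rows) if row[best] == value]
        tree[best][value] = id3([rows[i] for i in subset],
                                [labels[i] for i in subset],
                                attrs - {best})
    return tree
```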
Attribute Selection Measure
Entropy:
• Entropy is a measure of the randomness in the information being processed. The higher the entropy, the harder it is to draw any conclusions from that information. Flipping a fair coin is an example of an action whose outcome is random.
• For a two-class problem, entropy values range from 0 to 1: the lower the entropy, the purer and more predictable the set (see the quick check below).
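Numerically, entropy is H = -Σ p·log2(p) over the class proportions p. A quick check of the coin-flip intuition, using nothing beyond that formula:

```python
import math

# A fair coin (50/50 split) is maximally uncertain: entropy = 1.0.
print(-sum(p * math.log2(p) for p in [0.5, 0.5]))   # 1.0

# A 90/10 split is much easier to predict, so entropy is lower.
print(-sum(p * math.log2(p) for p in [0.9, 0.1]))   # ~0.469
```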
Attribute Selection Measure
Information Gain:
• Information gain or IG is a statistical property that measures
how well a given attribute separates the training examples
according to their target classification.
• Information gain is used to decide which feature to split on at
each step in building the tree.
Attribute Selection Measure
• It is the reduction of uncertainty given some feature, and it is also the deciding factor for which attribute should be selected as a decision node or the root node (a small worked example follows).
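A small worked example of this reduction in uncertainty, with made-up counts: 10 examples split 5 "yes" / 5 "no", divided by a hypothetical feature into a pure subset of 4 and a mixed subset of 6:

```python
import math

def H(probs):
    # Entropy from a list of class proportions.
    return -sum(p * math.log2(p) for p in probs if p > 0)

parent = H([5/10, 5/10])                               # 1.0 before the split
# Weighted entropy after the split: a pure subset (4 yes)
# and a mixed subset (1 yes / 5 no).
children = (4/10) * H([1.0]) + (6/10) * H([1/6, 5/6])
print(parent - children)                               # IG ~= 0.61
```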
Practical Example using ID3
• Let's illustrate this with the help of an example. Assume we want to play tennis on a particular day, say Saturday. How will you decide whether to play or not? You go out and check whether it is hot or cold, check the speed of the wind and the humidity, and check what the weather is like, i.e. whether it is sunny, cloudy, or rainy.
• You take all these factors into account to decide whether you want to play or not.
• So, you record all these factors for the last ten days and form a lookup table like the one below.
Practical Example using ID3
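The slides' ten-day lookup table did not survive extraction, so as a stand-in, here is a tiny run of the id3 sketch from the algorithmic-steps section on four invented days (the rows and labels below are made up for demonstration only, not the slides' data):

```python
rows = [
    {"Outlook": "Sunny",  "Wind": "Weak"},
    {"Outlook": "Sunny",  "Wind": "Strong"},
    {"Outlook": "Cloudy", "Wind": "Weak"},
    {"Outlook": "Rainy",  "Wind": "Strong"},
]
labels = ["No", "No", "Yes", "No"]   # whether tennis was played

tree = id3(rows, labels, {"Outlook", "Wind"})
print(tree)   # {'Outlook': {'Sunny': 'No', 'Cloudy': 'Yes', 'Rainy': 'No'}}
```

Here Outlook has the larger information gain (splitting on it yields three pure subsets), so it becomes the root node and every branch ends directly in a leaf.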