
Classification: Basic Concepts, Decision Trees, and Decision Rules
Data Classification and Prediction
- Data classification
  - classification
  - prediction
- Methods of classification
  - decision tree induction
  - forward and back-propagation (neural networks)
  - Bayesian classification
  - association rule mining
Classification: Definition
- Given a collection of records (the training set)
  - Each record contains a set of attributes; one of the attributes is the class.
- Find a model that expresses the class attribute as a function of the values of the other attributes.
- Goal: previously unseen records should be assigned a class as accurately as possible.
  - A test set is used to determine the accuracy of the model. Usually, the given data set is divided into training and test sets, with the training set used to build the model and the test set used to validate it.
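In practice this split is usually one library call; below is a minimal sketch, assuming pandas and scikit-learn are available. The column names echo the table on the next slide and are otherwise illustrative.

```python
# A minimal sketch of the training/test split described above, assuming pandas
# and scikit-learn are available; the tiny table here is only illustrative.
import pandas as pd
from sklearn.model_selection import train_test_split

data = pd.DataFrame({
    "Attrib1": ["Yes", "No", "No", "Yes", "No", "No"],
    "Attrib2": ["Large", "Medium", "Small", "Medium", "Large", "Medium"],
    "Attrib3": [125, 100, 70, 120, 95, 60],
    "Class":   ["No", "No", "No", "No", "Yes", "No"],
})

X = data.drop(columns="Class")   # the other attributes
y = data["Class"]                # the class attribute

# The training set is used to build the model, the test set to validate it.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=1/3, random_state=0)
```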
Illustrating Classification Task
[Figure: the classification workflow. The training set of labeled records is fed to a learning algorithm, which induces a model; the model is then applied to the test set of unlabeled records (deduction) to predict their classes.]

Training Set
Tid  Attrib1  Attrib2  Attrib3  Class
1    Yes      Large    125K     No
2    No       Medium   100K     No
3    No       Small    70K      No
4    Yes      Medium   120K     No
5    No       Large    95K      Yes
6    No       Medium   60K      No
7    Yes      Large    220K     No
8    No       Small    85K      Yes
9    No       Medium   75K      No
10   No       Small    90K      Yes

Test Set
Tid  Attrib1  Attrib2  Attrib3  Class
11   No       Small    55K      ?
12   Yes      Medium   80K      ?
13   Yes      Large    110K     ?
14   No       Small    95K      ?
15   No       Large    67K      ?
Examples of Classification Task
- Predicting tumor cells as benign or malignant
- Classifying credit card transactions as legitimate or fraudulent
- Classifying secondary structures of protein as alpha-helix, beta-sheet, or random coil
- Categorizing news stories as finance, weather, entertainment, sports, etc.
Classification Techniques
- Decision Tree-based Methods
- Rule-based Methods / Decision Rules
- Neural Networks (NN)
  - Artificial Neural Networks (ANN)
  - Recurrent Neural Networks (RNN)
- Association Rules (Apriori Algorithm)
- Naïve Bayes and Bayesian Belief Networks
- Memory-based reasoning (LSTM)
- Support Vector Machines (SVM)
Description of Decision Rules or Trees
- Natural appeal for users
- Presentation forms
  - graphically: decision trees (ID3: Iterative Dichotomiser 3)
  - "if-then" statements: decision rules
What They Look Like
- Works like a flow chart
- Looks like an upside-down tree
- Nodes
  - appear as rectangles or circles
  - represent a test or decision
- Lines or branches: represent the outcome of a test
- Circles: terminal (leaf) nodes
- Top or starting node: root node
- Internal nodes: rectangles
Example of a Decision Tree

Training Data:
Tid  Refund  Marital Status  Taxable Income  Cheat
1    Yes     Single          125K            No
2    No      Married         100K            No
3    No      Single          70K             No
4    Yes     Married         120K            No
5    No      Divorced        95K             Yes
6    No      Married         60K             No
7    Yes     Divorced        220K            No
8    No      Single          85K             Yes
9    No      Married         75K             No
10   No      Single          90K             Yes

Model: Decision Tree (splitting attributes Refund, MarSt, TaxInc):
- Refund = Yes -> Cheat = NO
- Refund = No  -> test MarSt
  - MarSt = Married            -> Cheat = NO
  - MarSt = Single or Divorced -> test TaxInc
    - TaxInc < 80K -> Cheat = NO
    - TaxInc > 80K -> Cheat = YES
Another Example of Decision Tree

Model: an alternative decision tree over the same training data, splitting on MarSt first:
- MarSt = Married            -> Cheat = NO
- MarSt = Single or Divorced -> test Refund
  - Refund = Yes -> Cheat = NO
  - Refund = No  -> test TaxInc
    - TaxInc < 80K -> Cheat = NO
    - TaxInc > 80K -> Cheat = YES

There could be more than one tree that fits the same data!
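Either tree can be held in memory as a small nested structure. Below is a minimal sketch, not from the slides, using a plain Python dict: internal nodes hold a test attribute and one branch per outcome, leaves hold the class label. The Taxable Income test is encoded here as a categorical "<80K"/">80K" outcome purely for this sketch.

```python
# A minimal sketch of the first example tree (Refund -> MarSt -> TaxInc) as a
# nested dict; leaves are plain strings, internal nodes are dicts.
tree = {
    "test": "Refund",
    "branches": {
        "Yes": "No",                       # leaf: Cheat = No
        "No": {
            "test": "MarSt",
            "branches": {
                "Married": "No",           # leaf
                "Single":   {"test": "TaxInc", "branches": {"<80K": "No", ">80K": "Yes"}},
                "Divorced": {"test": "TaxInc", "branches": {"<80K": "No", ">80K": "Yes"}},
            },
        },
    },
}

def classify(node, record):
    """Follow exactly one root-to-leaf path for the given record."""
    while isinstance(node, dict):
        node = node["branches"][record[node["test"]]]
    return node

print(classify(tree, {"Refund": "No", "MarSt": "Single", "TaxInc": ">80K"}))   # Yes
```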
Decision Tree Classification Task

[Figure: the same workflow as before, with a decision tree as the model. The training set is fed to a tree-induction algorithm (induction) to learn the decision tree; the tree is then applied to the test set (deduction) to predict the class of each unlabeled record.]
How They Work
- Decision rules partition the sample of data
  - A terminal node (leaf) indicates the class assignment
  - The tree partitions the samples into mutually exclusive groups, one group for each terminal node
- All paths
  - start at the root node
  - end at a leaf
- Each path represents a decision rule
  - the joining (AND) of all the tests along that path
  - separate paths that result in the same class are disjunctions (ORs)
- All paths are mutually exclusive
  - for any one case, only one path will be followed
  - false decisions on the left branch, true decisions on the right branch
Apply Model to Test Data

Test Data:
Refund  Marital Status  Taxable Income  Cheat
No      Married         80K             ?

Start from the root of the tree and follow the branch that matches the record at each test node:
- Refund = No -> take the "No" branch to the MarSt node
- Marital Status = Married -> take the "Married" branch, which is a leaf
- Leaf reached -> assign Cheat = "No"
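Below is a minimal sketch of this deduction step: the example tree written as plain if/else tests, with a print for each decision taken along the path. The record layout (a dict with "Refund", "MarSt", "TaxInc" keys) is an assumption of the sketch.

```python
# A minimal sketch of walking the example tree for one test record.
def apply_model(record):
    if record["Refund"] == "Yes":
        return "No"                                   # leaf under Refund = Yes
    print("Refund = No -> test Marital Status")
    if record["MarSt"] == "Married":
        return "No"                                   # leaf under MarSt = Married
    print("MarSt = Single/Divorced -> test Taxable Income")
    return "No" if record["TaxInc"] < 80 else "Yes"   # leaves under the TaxInc test

test_record = {"Refund": "No", "MarSt": "Married", "TaxInc": 80}
print("Cheat =", apply_model(test_record))            # Cheat = No
```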
Entropy
- S is a sample of training examples
- p is the number of positive examples in S, n is the number of negative examples in S
- Calculate
  I(p, n) = -(p/(p+n)) log2(p/(p+n)) - (n/(p+n)) log2(n/(p+n))
- For an attribute A that splits S into subsets S_1, ..., S_v (with p_i positive and n_i negative examples in S_i), calculate
  E(A) = sum_{i=1..v} ((p_i + n_i)/(p + n)) * I(p_i, n_i)
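Below is a minimal sketch of I(p, n) in Python, using only the standard library; the two test values simply show the extremes of the measure.

```python
# Entropy of a sample with p positive and n negative examples.
import math

def I(p, n):
    total = p + n
    # 0 * log2(0) is taken as 0, so zero counts are skipped.
    return sum(-(c / total) * math.log2(c / total) for c in (p, n) if c > 0)

print(I(5, 5))    # 1.0  -> a 50/50 sample is maximally impure
print(I(0, 10))   # 0.0  -> a pure sample carries no uncertainty
```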
Splitting Based on INFO...
- Information Gain:

  GAIN_split = Entropy(p) - sum_{i=1..k} (n_i / n) * Entropy(i)

  where the parent node p is split into k partitions, n_i is the number of records in partition i, and n is the number of records at the parent.
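A minimal sketch of this formula in Python, with the class counts supplied directly; the (3, 7) parent and the Refund partitions used here come from the worked example that follows.

```python
# GAIN_split for one candidate split, given (positive, negative) counts.
import math

def I(p, n):
    """Entropy of a node with p positive and n negative records."""
    total = p + n
    return sum(-(c / total) * math.log2(c / total) for c in (p, n) if c > 0)

def gain_split(parent_counts, partitions):
    """partitions is a list of (p_i, n_i) pairs; weights are (p_i + n_i) / n."""
    n_total = sum(parent_counts)
    weighted = sum(((p + n) / n_total) * I(p, n) for p, n in partitions)
    return I(*parent_counts) - weighted

# Splitting the "Cheat" data on Refund: Yes -> (0, 3), No -> (3, 4).
print(round(gain_split((3, 7), [(0, 3), (3, 4)]), 2))   # 0.19
```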
Building Decision Tree

Step 1
- Calculate the entropy of the "Class" attribute.
- The class attribute is "Cheat".

Refund  Marital Status  Taxable Inc.  Cheat
Yes     Single          125K          No
No      Married         100K          No
No      Single          70K           No
Yes     Married         120K          No
No      Divorced        95K           Yes
No      Married         60K           No
Yes     Divorced        220K          No
No      Single          85K           Yes
No      Married         75K           No
No      Single          90K           Yes

p = 3, n = 7, p + n = 10

Entropy(Cheat) = -(p/(p+n)) log2(p/(p+n)) - (n/(p+n)) log2(n/(p+n))
               = -(3/10) log2(3/10) - (7/10) log2(7/10)
               = 0.88
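A quick self-contained check of this Step-1 value (3 "Yes" and 7 "No" records in the Cheat column):

```python
# Verifying Entropy(Cheat) for p = 3, n = 7.
import math

p, n = 3, 7
entropy_cheat = -(p / (p + n)) * math.log2(p / (p + n)) - (n / (p + n)) * math.log2(n / (p + n))
print(round(entropy_cheat, 2))   # 0.88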
Building Decision Tree

Step 2
- Calculate the entropy of the other attributes: Refund, Marital Status and Taxable Income (using the same training data as Step 1).

Entropy (Refund):
        pi  ni  I(pi, ni)
Yes     0   3   0
No      3   4   0.98

I(0,3) = -(0/3) log2(0/3) - (3/3) log2(3/3) = 0
I(3,4) = -(3/7) log2(3/7) - (4/7) log2(4/7) = 0.98

E(Refund) = (3/10) x 0 + (7/10) x 0.98 = 0.686

Gain(Refund) = E(Cheat) - E(Refund) = 0.88 - 0.686 = 0.19
Building Decision Tree

Entropy (Marital Status):
          pi  ni  I(pi, ni)
Single    2   2   1
Married   0   4   0
Divorced  1   1   1

I(2,2) = -(2/4) log2(2/4) - (2/4) log2(2/4) = 1
I(0,4) = -(0/4) log2(0/4) - (4/4) log2(4/4) = 0
I(1,1) = -(1/2) log2(1/2) - (1/2) log2(1/2) = 1

E(Marital Status) = (4/10) x 1 + (4/10) x 0 + (2/10) x 1 = 0.6

Gain(Marital Status) = E(Cheat) - E(Marital Status) = 0.88 - 0.6 = 0.28
Building Decision Tree

Entropy (Taxable Inc.):
       pi  ni  I(pi, ni)
>80K   3   4   0.98
<80K   0   3   0

I(3,4) = -(3/7) log2(3/7) - (4/7) log2(4/7) = 0.98
I(0,3) = -(0/3) log2(0/3) - (3/3) log2(3/3) = 0

E(Taxable Inc.) = (7/10) x 0.98 + (3/10) x 0 = 0.686

Gain(Taxable Inc.) = E(Cheat) - E(Taxable Inc.) = 0.88 - 0.686 = 0.19
Building Decision Tree

Which attribute has the largest Gain?
- Gain(Refund) = 0.19
- Gain(Marital Status) = 0.28 (the largest)
- Gain(Taxable Inc.) = 0.19
Building Decision Tree

Step 2 (continued)
- Split the table according to Marital Status, the attribute with the largest gain.
- Partial tree so far: MarSt = Married -> Cheat = NO; MarSt = Single or Divorced -> still to be resolved.

Marital Status  Refund  Taxable Inc.  Cheat
Married         No      100K          No
Married         Yes     120K          No
Married         No      60K           No
Married         No      75K           No
Single          Yes     125K          No
Single          No      70K           No
Single          No      85K           Yes
Single          No      90K           Yes
Divorced        No      95K           Yes
Divorced        Yes     220K          No
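As a cross-check of the split just made, below is a minimal sketch that recomputes the three gains on the full 10-record table and confirms that Marital Status is the best first split; the list-of-dicts layout is only for this sketch.

```python
# Recomputing Gain(Refund), Gain(Marital Status) and Gain(Taxable Inc.).
import math
from collections import Counter

records = [
    {"Refund": "Yes", "MarSt": "Single",   "TaxInc": 125, "Cheat": "No"},
    {"Refund": "No",  "MarSt": "Married",  "TaxInc": 100, "Cheat": "No"},
    {"Refund": "No",  "MarSt": "Single",   "TaxInc": 70,  "Cheat": "No"},
    {"Refund": "Yes", "MarSt": "Married",  "TaxInc": 120, "Cheat": "No"},
    {"Refund": "No",  "MarSt": "Divorced", "TaxInc": 95,  "Cheat": "Yes"},
    {"Refund": "No",  "MarSt": "Married",  "TaxInc": 60,  "Cheat": "No"},
    {"Refund": "Yes", "MarSt": "Divorced", "TaxInc": 220, "Cheat": "No"},
    {"Refund": "No",  "MarSt": "Single",   "TaxInc": 85,  "Cheat": "Yes"},
    {"Refund": "No",  "MarSt": "Married",  "TaxInc": 75,  "Cheat": "No"},
    {"Refund": "No",  "MarSt": "Single",   "TaxInc": 90,  "Cheat": "Yes"},
]

def set_entropy(labels):
    total = len(labels)
    return sum(-(c / total) * math.log2(c / total) for c in Counter(labels).values())

def gain(value_of):
    parent = set_entropy([r["Cheat"] for r in records])
    groups = {}
    for r in records:
        groups.setdefault(value_of(r), []).append(r["Cheat"])
    weighted = sum(len(g) / len(records) * set_entropy(g) for g in groups.values())
    return parent - weighted

gains = {
    "Refund": gain(lambda r: r["Refund"]),
    "Marital Status": gain(lambda r: r["MarSt"]),
    "Taxable Inc.": gain(lambda r: "<80K" if r["TaxInc"] < 80 else ">80K"),
}
for name, g in gains.items():
    print(name, round(g, 2))                        # 0.19, 0.28, 0.19
print("best split:", max(gains, key=gains.get))     # Marital Status
```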
Building Decision Tree

Step 3
- Calculate the entropy within the subset Marital Status = "Single".

Marital Status  Refund  Taxable Inc.  Cheat
Single          Yes     125K          No
Single          No      70K           No
Single          No      85K           Yes
Single          No      90K           Yes
Divorced        No      95K           Yes
Divorced        Yes     220K          No

Entropy (MS=Single => Refund):
      pi  ni  I(pi, ni)
Yes   0   1   0
No    2   1   0.91

I(0,1) = -(0/1) log2(0/1) - (1/1) log2(1/1) = 0
I(2,1) = -(2/3) log2(2/3) - (1/3) log2(1/3) = 0.91

E(MS=Single => Refund) = (1/10) x 0 + (3/10) x 0.91 = 0.27

Gain(MS=Single => Refund) = E(Cheat) - E(MS=Single => Refund) = 0.88 - 0.27 = 0.60
Building Decision Tree

Step 3 (continued)
- Calculate the entropy within the subset Marital Status = "Single".

Entropy (MS=Single => Taxable Inc.):
       pi  ni  I(pi, ni)
<80K   0   1   0
>80K   2   1   0.91

I(0,1) = -(0/1) log2(0/1) - (1/1) log2(1/1) = 0
I(2,1) = -(2/3) log2(2/3) - (1/3) log2(1/3) = 0.91

E(MS=Single => Taxable Inc.) = (1/10) x 0 + (3/10) x 0.91 = 0.27

Gain(MS=Single => Taxable Inc.) = E(Cheat) - E(MS=Single => Taxable Inc.) = 0.88 - 0.27 = 0.60
Building Decision Tree

Final tree:
- MarSt = Married            -> Cheat = NO
- MarSt = Single or Divorced -> test Refund
  - Refund = Yes -> Cheat = NO
  - Refund = No  -> test TaxInc
    - TaxInc < 80K  -> Cheat = NO
    - TaxInc >= 80K -> Cheat = YES

Decision Rules:
- If Marital St. = Married then Cheat = No
- If Marital St. = (Single or Divorced) and Refund = Yes then Cheat = No
- If Marital St. = (Single or Divorced) and Refund = No and TaxInc < 80K then Cheat = No
- If Marital St. = (Single or Divorced) and Refund = No and TaxInc >= 80K then Cheat = Yes
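Written out in code, the four rules collapse into one small function; a minimal sketch, assuming the record is a dict with "MarSt", "Refund" and "TaxInc" keys.

```python
# The four decision rules above, as plain if/else tests.
def classify_cheat(record):
    if record["MarSt"] == "Married":
        return "No"
    if record["Refund"] == "Yes":          # Single or Divorced from here on
        return "No"
    if record["TaxInc"] < 80:
        return "No"
    return "Yes"

print(classify_cheat({"MarSt": "Divorced", "Refund": "No", "TaxInc": 95}))   # Yes
print(classify_cheat({"MarSt": "Married",  "Refund": "No", "TaxInc": 80}))   # No
```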
Building Decision Tree

Step 1
- Calculate the entropy of the "Class" attribute.

Age      Income  Cartype  Class
Adult    low     Family   High
Adult    high    Sports   High
Younger  low     Sports   Low
Old      low     Family   Low
Younger  high    Truck    High
Old      high    Truck    High

p = 4, n = 2, p + n = 6

Entropy(Class) = -(p/(p+n)) log2(p/(p+n)) - (n/(p+n)) log2(n/(p+n))
               = -(4/6) log2(4/6) - (2/6) log2(2/6)
               = 0.92
Building Decision Tree

Step 2
- Entropy of Age? Gain of Age?
- Entropy of Income? Gain of Income?
- Entropy of Cartype? Gain of Cartype?

Which attribute's Gain is larger?
- Split the table according to the larger Gain.
- Calculate Entropy and Gain if needed.
- Build the tree.
Building Decision Tree

Step 2
- Entropy of Age = 0.667,     Gain of Age = 0.256
- Entropy of Income = 0.459,  Gain of Income = 0.46
- Entropy of Cartype = 0.667, Gain of Cartype = 0.256

Which attribute's Gain is larger?
- Split the table according to the larger Gain (Income).
- Calculate Entropy and Gain for the remaining attributes if needed.
- Build the tree.

Income  Age      Cartype  Class
low     Adult    Family   High
low     Younger  Sports   Low
low     Old      Family   Low
high    Adult    Sports   High
high    Younger  Truck    High
high    Old      Truck    High
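The same induction can be reproduced with a library; below is a minimal sketch, not from the slides, that fits this table with scikit-learn's entropy-based decision tree and prints the learned splits. The one-hot encoding of the categorical attributes is an assumption of the sketch; the first split should come out on Income, matching the gains above.

```python
# Fitting the Age/Income/Cartype table with an entropy-based decision tree.
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

data = pd.DataFrame({
    "Age":     ["Adult", "Adult", "Younger", "Old", "Younger", "Old"],
    "Income":  ["low", "high", "low", "low", "high", "high"],
    "Cartype": ["Family", "Sports", "Sports", "Family", "Truck", "Truck"],
    "Class":   ["High", "High", "Low", "Low", "High", "High"],
})

X = pd.get_dummies(data[["Age", "Income", "Cartype"]])   # one-hot encode categoricals
y = data["Class"]

tree = DecisionTreeClassifier(criterion="entropy", random_state=0).fit(X, y)
print(export_text(tree, feature_names=list(X.columns)))
```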
Building Decision Tree

Final tree:
- Income = High -> Class = High
- Income = Low  -> test Age
  - Age = Adult          -> Class = High
  - Age = Younger or Old -> Class = Low

Decision Rules:
- If Income = High then Class = High
- If Income = Low and Age = Adult then Class = High
- If Income = Low and (Age = Younger or Old) then Class = Low
Building Decision Tree
- Try yourself:
  https://kindsonthegenius.com/blog/2018/04/how-to-build-a-decision-tree-for-classification-step-by-step-procedure-using-entropy-and-gain.html
Under-fitting and Over-fitting

Producing a model that does not perform well even on the training data is called under-fitting. Typically, when this happens we decide our model is not good enough and keep looking for a better one.

Producing a model that performs well on the training data but generalizes poorly to new data is called over-fitting. This can involve learning the noise in the data.
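Below is a minimal sketch, not from the slides, of how over-fitting typically shows up with decision trees, assuming scikit-learn is available: an unconstrained tree memorizes the training data, while a depth-limited tree usually has a smaller gap between training and test accuracy.

```python
# Comparing an unconstrained tree with a depth-limited tree on noisy data.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_features=10, flip_y=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for depth in (None, 3):   # None = grow until the leaves are pure (prone to over-fitting)
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0).fit(X_train, y_train)
    print("max_depth =", depth,
          "train acc =", round(tree.score(X_train, y_train), 2),
          "test acc =", round(tree.score(X_test, y_test), 2))
```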
Underfitting and Overfitting

[Figure: three fits of the same data - under-fitting, just right, over-fitting]
- Under-fitting: a very small model, too few samples
- Over-fitting: a very large model, too many samples