
Classification: Basic Concepts, Decision Trees, and Decision Rules
Data Classification and Prediction
- Data classification
  - classification
  - prediction
- Methods of classification
  - decision tree induction
  - forward and back-propagation (neural networks)
  - Bayesian classification
  - association rule mining
Classification: Definition
- Given a collection of records (the training set)
  - Each record contains a set of attributes; one of the attributes is the class.
- Find a model that expresses the class attribute as a function of the values of the other attributes.
- Goal: previously unseen records should be assigned a class as accurately as possible.
  - A test set is used to determine the accuracy of the model. Usually, the given data set is divided into training and test sets, with the training set used to build the model and the test set used to validate it.
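In practice this split is usually one library call; below is a minimal sketch, assuming pandas and scikit-learn are available. The column names echo the table on the next slide and are otherwise illustrative.

```python
# A minimal sketch of the training/test split described above, assuming pandas
# and scikit-learn are available; the tiny table here is only illustrative.
import pandas as pd
from sklearn.model_selection import train_test_split

data = pd.DataFrame({
    "Attrib1": ["Yes", "No", "No", "Yes", "No", "No"],
    "Attrib2": ["Large", "Medium", "Small", "Medium", "Large", "Medium"],
    "Attrib3": [125, 100, 70, 120, 95, 60],
    "Class":   ["No", "No", "No", "No", "Yes", "No"],
})

X = data.drop(columns="Class")   # the other attributes
y = data["Class"]                # the class attribute

# The training set is used to build the model, the test set to validate it.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=1/3, random_state=0)
```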
Illustrating Classification Task
[Figure: the classification workflow. The training set of labeled records is fed to a learning algorithm, which induces a model; the model is then applied to the test set of unlabeled records (deduction) to predict their classes.]

Training Set
Tid  Attrib1  Attrib2  Attrib3  Class
1    Yes      Large    125K     No
2    No       Medium   100K     No
3    No       Small    70K      No
4    Yes      Medium   120K     No
5    No       Large    95K      Yes
6    No       Medium   60K      No
7    Yes      Large    220K     No
8    No       Small    85K      Yes
9    No       Medium   75K      No
10   No       Small    90K      Yes

Test Set
Tid  Attrib1  Attrib2  Attrib3  Class
11   No       Small    55K      ?
12   Yes      Medium   80K      ?
13   Yes      Large    110K     ?
14   No       Small    95K      ?
15   No       Large    67K      ?
Examples of Classification Task
- Predicting tumor cells as benign or malignant
- Classifying credit card transactions as legitimate or fraudulent
- Classifying secondary structures of protein as alpha-helix, beta-sheet, or random coil
- Categorizing news stories as finance, weather, entertainment, sports, etc.
Classification Techniques
- Decision Tree-based Methods
- Rule-based Methods / Decision Rules
- Neural Networks (NN)
  - Artificial Neural Networks (ANN)
  - Recurrent Neural Networks (RNN)
- Association Rules (Apriori Algorithm)
- Naïve Bayes and Bayesian Belief Networks
- Memory-based reasoning (LSTM)
- Support Vector Machines (SVM)
Description of Decision Rules or Trees
- Natural appeal for users
- Presentation forms
  - graphically: decision trees (ID3: Iterative Dichotomiser 3)
  - "if-then" statements: decision rules
What They Look Like
- Works like a flow chart
- Looks like an upside-down tree
- Nodes
  - appear as rectangles or circles
  - represent a test or decision
- Lines or branches: represent the outcome of a test
- Circles: terminal (leaf) nodes
- Top or starting node: root node
- Internal nodes: rectangles
Example of a Decision Tree

Training Data:
Tid  Refund  Marital Status  Taxable Income  Cheat
1    Yes     Single          125K            No
2    No      Married         100K            No
3    No      Single          70K             No
4    Yes     Married         120K            No
5    No      Divorced        95K             Yes
6    No      Married         60K             No
7    Yes     Divorced        220K            No
8    No      Single          85K             Yes
9    No      Married         75K             No
10   No      Single          90K             Yes

Model: Decision Tree (splitting attributes Refund, MarSt, TaxInc):
- Refund = Yes -> Cheat = NO
- Refund = No  -> test MarSt
  - MarSt = Married            -> Cheat = NO
  - MarSt = Single or Divorced -> test TaxInc
    - TaxInc < 80K -> Cheat = NO
    - TaxInc > 80K -> Cheat = YES
Another Example of Decision Tree

Model: an alternative decision tree over the same training data, splitting on MarSt first:
- MarSt = Married            -> Cheat = NO
- MarSt = Single or Divorced -> test Refund
  - Refund = Yes -> Cheat = NO
  - Refund = No  -> test TaxInc
    - TaxInc < 80K -> Cheat = NO
    - TaxInc > 80K -> Cheat = YES

There could be more than one tree that fits the same data!
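Either tree can be held in memory as a small nested structure. Below is a minimal sketch, not from the slides, using a plain Python dict: internal nodes hold a test attribute and one branch per outcome, leaves hold the class label. The Taxable Income test is encoded here as a categorical "<80K"/">80K" outcome purely for this sketch.

```python
# A minimal sketch of the first example tree (Refund -> MarSt -> TaxInc) as a
# nested dict; leaves are plain strings, internal nodes are dicts.
tree = {
    "test": "Refund",
    "branches": {
        "Yes": "No",                       # leaf: Cheat = No
        "No": {
            "test": "MarSt",
            "branches": {
                "Married": "No",           # leaf
                "Single":   {"test": "TaxInc", "branches": {"<80K": "No", ">80K": "Yes"}},
                "Divorced": {"test": "TaxInc", "branches": {"<80K": "No", ">80K": "Yes"}},
            },
        },
    },
}

def classify(node, record):
    """Follow exactly one root-to-leaf path for the given record."""
    while isinstance(node, dict):
        node = node["branches"][record[node["test"]]]
    return node

print(classify(tree, {"Refund": "No", "MarSt": "Single", "TaxInc": ">80K"}))   # Yes
```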
Decision Tree Classification Task

[Figure: the same workflow as before, with a decision tree as the model. The training set is fed to a tree-induction algorithm (induction) to learn the decision tree; the tree is then applied to the test set (deduction) to predict the class of each unlabeled record.]
How They Work
- Decision rules partition the sample of data
  - A terminal node (leaf) indicates the class assignment
  - The tree partitions the samples into mutually exclusive groups, one group for each terminal node
- All paths
  - start at the root node
  - end at a leaf
- Each path represents a decision rule
  - the joining (AND) of all the tests along that path
  - separate paths that result in the same class are disjunctions (ORs)
- All paths are mutually exclusive
  - for any one case, only one path will be followed
  - false decisions on the left branch, true decisions on the right branch
Apply Model to Test Data

Test Data:
Refund  Marital Status  Taxable Income  Cheat
No      Married         80K             ?

Start from the root of the tree and follow the branch that matches the record at each test node:
- Refund = No -> take the "No" branch to the MarSt node
- Marital Status = Married -> take the "Married" branch, which is a leaf
- Leaf reached -> assign Cheat = "No"
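Below is a minimal sketch of this deduction step: the example tree written as plain if/else tests, with a print for each decision taken along the path. The record layout (a dict with "Refund", "MarSt", "TaxInc" keys) is an assumption of the sketch.

```python
# A minimal sketch of walking the example tree for one test record.
def apply_model(record):
    if record["Refund"] == "Yes":
        return "No"                                   # leaf under Refund = Yes
    print("Refund = No -> test Marital Status")
    if record["MarSt"] == "Married":
        return "No"                                   # leaf under MarSt = Married
    print("MarSt = Single/Divorced -> test Taxable Income")
    return "No" if record["TaxInc"] < 80 else "Yes"   # leaves under the TaxInc test

test_record = {"Refund": "No", "MarSt": "Married", "TaxInc": 80}
print("Cheat =", apply_model(test_record))            # Cheat = No
```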
Entropy
- S is a sample of training examples
- p is the number of positive examples in S, n is the number of negative examples in S
- Calculate
  I(p, n) = -(p/(p+n)) log2(p/(p+n)) - (n/(p+n)) log2(n/(p+n))
- For an attribute A that splits S into subsets S_1, ..., S_v (with p_i positive and n_i negative examples in S_i), calculate
  E(A) = sum_{i=1..v} ((p_i + n_i)/(p + n)) * I(p_i, n_i)
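Below is a minimal sketch of I(p, n) in Python, using only the standard library; the two test values simply show the extremes of the measure.

```python
# Entropy of a sample with p positive and n negative examples.
import math

def I(p, n):
    total = p + n
    # 0 * log2(0) is taken as 0, so zero counts are skipped.
    return sum(-(c / total) * math.log2(c / total) for c in (p, n) if c > 0)

print(I(5, 5))    # 1.0  -> a 50/50 sample is maximally impure
print(I(0, 10))   # 0.0  -> a pure sample carries no uncertainty
```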
Splitting Based on INFO...
- Information Gain:

  GAIN_split = Entropy(p) - sum_{i=1..k} (n_i / n) * Entropy(i)

  where the parent node p is split into k partitions, n_i is the number of records in partition i, and n is the number of records at the parent.
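A minimal sketch of this formula in Python, with the class counts supplied directly; the (3, 7) parent and the Refund partitions used here come from the worked example that follows.

```python
# GAIN_split for one candidate split, given (positive, negative) counts.
import math

def I(p, n):
    """Entropy of a node with p positive and n negative records."""
    total = p + n
    return sum(-(c / total) * math.log2(c / total) for c in (p, n) if c > 0)

def gain_split(parent_counts, partitions):
    """partitions is a list of (p_i, n_i) pairs; weights are (p_i + n_i) / n."""
    n_total = sum(parent_counts)
    weighted = sum(((p + n) / n_total) * I(p, n) for p, n in partitions)
    return I(*parent_counts) - weighted

# Splitting the "Cheat" data on Refund: Yes -> (0, 3), No -> (3, 4).
print(round(gain_split((3, 7), [(0, 3), (3, 4)]), 2))   # 0.19
```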
Building Decision Tree

Step 1
- Calculate the entropy of the "Class" attribute.
- The class attribute is "Cheat".

Refund  Marital Status  Taxable Inc.  Cheat
Yes     Single          125K          No
No      Married         100K          No
No      Single          70K           No
Yes     Married         120K          No
No      Divorced        95K           Yes
No      Married         60K           No
Yes     Divorced        220K          No
No      Single          85K           Yes
No      Married         75K           No
No      Single          90K           Yes

p = 3, n = 7, p + n = 10

Entropy(Cheat) = -(p/(p+n)) log2(p/(p+n)) - (n/(p+n)) log2(n/(p+n))
               = -(3/10) log2(3/10) - (7/10) log2(7/10)
               = 0.88
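A quick self-contained check of this Step-1 value (3 "Yes" and 7 "No" records in the Cheat column):

```python
# Verifying Entropy(Cheat) for p = 3, n = 7.
import math

p, n = 3, 7
entropy_cheat = -(p / (p + n)) * math.log2(p / (p + n)) - (n / (p + n)) * math.log2(n / (p + n))
print(round(entropy_cheat, 2))   # 0.88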
Building Decision Tree

Step 2
- Calculate the entropy of the other attributes: Refund, Marital Status and Taxable Income (using the same training data as Step 1).

Entropy (Refund):
        pi  ni  I(pi, ni)
Yes     0   3   0
No      3   4   0.98

I(0,3) = -(0/3) log2(0/3) - (3/3) log2(3/3) = 0
I(3,4) = -(3/7) log2(3/7) - (4/7) log2(4/7) = 0.98

E(Refund) = (3/10) x 0 + (7/10) x 0.98 = 0.686

Gain(Refund) = E(Cheat) - E(Refund) = 0.88 - 0.686 = 0.19
Building Decision Tree

Entropy (Marital Status):
          pi  ni  I(pi, ni)
Single    2   2   1
Married   0   4   0
Divorced  1   1   1

I(2,2) = -(2/4) log2(2/4) - (2/4) log2(2/4) = 1
I(0,4) = -(0/4) log2(0/4) - (4/4) log2(4/4) = 0
I(1,1) = -(1/2) log2(1/2) - (1/2) log2(1/2) = 1

E(Marital Status) = (4/10) x 1 + (4/10) x 0 + (2/10) x 1 = 0.6

Gain(Marital Status) = E(Cheat) - E(Marital Status) = 0.88 - 0.6 = 0.28
Building Decision Tree

Entropy (Taxable Inc.):
       pi  ni  I(pi, ni)
>80K   3   4   0.98
<80K   0   3   0

I(3,4) = -(3/7) log2(3/7) - (4/7) log2(4/7) = 0.98
I(0,3) = -(0/3) log2(0/3) - (3/3) log2(3/3) = 0

E(Taxable Inc.) = (7/10) x 0.98 + (3/10) x 0 = 0.686

Gain(Taxable Inc.) = E(Cheat) - E(Taxable Inc.) = 0.88 - 0.686 = 0.19
Building Decision Tree

Which attribute has the largest Gain?
- Gain(Refund) = 0.19
- Gain(Marital Status) = 0.28 (the largest)
- Gain(Taxable Inc.) = 0.19
Building Decision Tree

Step 2 (continued)
- Split the table according to Marital Status, the attribute with the largest gain.
- Partial tree so far: MarSt = Married -> Cheat = NO; MarSt = Single or Divorced -> still to be resolved.

Marital Status  Refund  Taxable Inc.  Cheat
Married         No      100K          No
Married         Yes     120K          No
Married         No      60K           No
Married         No      75K           No
Single          Yes     125K          No
Single          No      70K           No
Single          No      85K           Yes
Single          No      90K           Yes
Divorced        No      95K           Yes
Divorced        Yes     220K          No
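As a cross-check of the split just made, below is a minimal sketch that recomputes the three gains on the full 10-record table and confirms that Marital Status is the best first split; the list-of-dicts layout is only for this sketch.

```python
# Recomputing Gain(Refund), Gain(Marital Status) and Gain(Taxable Inc.).
import math
from collections import Counter

records = [
    {"Refund": "Yes", "MarSt": "Single",   "TaxInc": 125, "Cheat": "No"},
    {"Refund": "No",  "MarSt": "Married",  "TaxInc": 100, "Cheat": "No"},
    {"Refund": "No",  "MarSt": "Single",   "TaxInc": 70,  "Cheat": "No"},
    {"Refund": "Yes", "MarSt": "Married",  "TaxInc": 120, "Cheat": "No"},
    {"Refund": "No",  "MarSt": "Divorced", "TaxInc": 95,  "Cheat": "Yes"},
    {"Refund": "No",  "MarSt": "Married",  "TaxInc": 60,  "Cheat": "No"},
    {"Refund": "Yes", "MarSt": "Divorced", "TaxInc": 220, "Cheat": "No"},
    {"Refund": "No",  "MarSt": "Single",   "TaxInc": 85,  "Cheat": "Yes"},
    {"Refund": "No",  "MarSt": "Married",  "TaxInc": 75,  "Cheat": "No"},
    {"Refund": "No",  "MarSt": "Single",   "TaxInc": 90,  "Cheat": "Yes"},
]

def set_entropy(labels):
    total = len(labels)
    return sum(-(c / total) * math.log2(c / total) for c in Counter(labels).values())

def gain(value_of):
    parent = set_entropy([r["Cheat"] for r in records])
    groups = {}
    for r in records:
        groups.setdefault(value_of(r), []).append(r["Cheat"])
    weighted = sum(len(g) / len(records) * set_entropy(g) for g in groups.values())
    return parent - weighted

gains = {
    "Refund": gain(lambda r: r["Refund"]),
    "Marital Status": gain(lambda r: r["MarSt"]),
    "Taxable Inc.": gain(lambda r: "<80K" if r["TaxInc"] < 80 else ">80K"),
}
for name, g in gains.items():
    print(name, round(g, 2))                        # 0.19, 0.28, 0.19
print("best split:", max(gains, key=gains.get))     # Marital Status
```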
Building Decision Tree

Step 3
- Calculate the entropy within the subset Marital Status = "Single".

Marital Status  Refund  Taxable Inc.  Cheat
Single          Yes     125K          No
Single          No      70K           No
Single          No      85K           Yes
Single          No      90K           Yes
Divorced        No      95K           Yes
Divorced        Yes     220K          No

Entropy (MS=Single => Refund):
      pi  ni  I(pi, ni)
Yes   0   1   0
No    2   1   0.91

I(0,1) = -(0/1) log2(0/1) - (1/1) log2(1/1) = 0
I(2,1) = -(2/3) log2(2/3) - (1/3) log2(1/3) = 0.91

E(MS=Single => Refund) = (1/10) x 0 + (3/10) x 0.91 = 0.27

Gain(MS=Single => Refund) = E(Cheat) - E(MS=Single => Refund) = 0.88 - 0.27 = 0.60
Building Decision Tree

Step 3 (continued)
- Calculate the entropy within the subset Marital Status = "Single".

Entropy (MS=Single => Taxable Inc.):
       pi  ni  I(pi, ni)
<80K   0   1   0
>80K   2   1   0.91

I(0,1) = -(0/1) log2(0/1) - (1/1) log2(1/1) = 0
I(2,1) = -(2/3) log2(2/3) - (1/3) log2(1/3) = 0.91

E(MS=Single => Taxable Inc.) = (1/10) x 0 + (3/10) x 0.91 = 0.27

Gain(MS=Single => Taxable Inc.) = E(Cheat) - E(MS=Single => Taxable Inc.) = 0.88 - 0.27 = 0.60
Building Decision Tree

Final tree:
- MarSt = Married            -> Cheat = NO
- MarSt = Single or Divorced -> test Refund
  - Refund = Yes -> Cheat = NO
  - Refund = No  -> test TaxInc
    - TaxInc < 80K  -> Cheat = NO
    - TaxInc >= 80K -> Cheat = YES

Decision Rules:
- If Marital St. = Married then Cheat = No
- If Marital St. = (Single or Divorced) and Refund = Yes then Cheat = No
- If Marital St. = (Single or Divorced) and Refund = No and TaxInc < 80K then Cheat = No
- If Marital St. = (Single or Divorced) and Refund = No and TaxInc >= 80K then Cheat = Yes
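Written out in code, the four rules collapse into one small function; a minimal sketch, assuming the record is a dict with "MarSt", "Refund" and "TaxInc" keys.

```python
# The four decision rules above, as plain if/else tests.
def classify_cheat(record):
    if record["MarSt"] == "Married":
        return "No"
    if record["Refund"] == "Yes":          # Single or Divorced from here on
        return "No"
    if record["TaxInc"] < 80:
        return "No"
    return "Yes"

print(classify_cheat({"MarSt": "Divorced", "Refund": "No", "TaxInc": 95}))   # Yes
print(classify_cheat({"MarSt": "Married",  "Refund": "No", "TaxInc": 80}))   # No
```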
Building Decision Tree

Step 1
- Calculate the entropy of the "Class" attribute.

Age      Income  Cartype  Class
Adult    low     Family   High
Adult    high    Sports   High
Younger  low     Sports   Low
Old      low     Family   Low
Younger  high    Truck    High
Old      high    Truck    High

p = 4, n = 2, p + n = 6

Entropy(Class) = -(p/(p+n)) log2(p/(p+n)) - (n/(p+n)) log2(n/(p+n))
               = -(4/6) log2(4/6) - (2/6) log2(2/6)
               = 0.92
Building Decision Tree

Step 2
- Entropy of Age? Gain of Age?
- Entropy of Income? Gain of Income?
- Entropy of Cartype? Gain of Cartype?

Which attribute's Gain is larger?
- Split the table according to the larger Gain.
- Calculate Entropy and Gain if needed.
- Build the tree.
Building Decision Tree

Step 2
- Entropy of Age = 0.667,     Gain of Age = 0.256
- Entropy of Income = 0.459,  Gain of Income = 0.46
- Entropy of Cartype = 0.667, Gain of Cartype = 0.256

Which attribute's Gain is larger?
- Split the table according to the larger Gain (Income).
- Calculate Entropy and Gain for the remaining attributes if needed.
- Build the tree.

Income  Age      Cartype  Class
low     Adult    Family   High
low     Younger  Sports   Low
low     Old      Family   Low
high    Adult    Sports   High
high    Younger  Truck    High
high    Old      Truck    High
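The same induction can be reproduced with a library; below is a minimal sketch, not from the slides, that fits this table with scikit-learn's entropy-based decision tree and prints the learned splits. The one-hot encoding of the categorical attributes is an assumption of the sketch; the first split should come out on Income, matching the gains above.

```python
# Fitting the Age/Income/Cartype table with an entropy-based decision tree.
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

data = pd.DataFrame({
    "Age":     ["Adult", "Adult", "Younger", "Old", "Younger", "Old"],
    "Income":  ["low", "high", "low", "low", "high", "high"],
    "Cartype": ["Family", "Sports", "Sports", "Family", "Truck", "Truck"],
    "Class":   ["High", "High", "Low", "Low", "High", "High"],
})

X = pd.get_dummies(data[["Age", "Income", "Cartype"]])   # one-hot encode categoricals
y = data["Class"]

tree = DecisionTreeClassifier(criterion="entropy", random_state=0).fit(X, y)
print(export_text(tree, feature_names=list(X.columns)))
```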
Building Decision Tree

Final tree:
- Income = High -> Class = High
- Income = Low  -> test Age
  - Age = Adult          -> Class = High
  - Age = Younger or Old -> Class = Low

Decision Rules:
- If Income = High then Class = High
- If Income = Low and Age = Adult then Class = High
- If Income = Low and (Age = Younger or Old) then Class = Low
Building Decision Tree
- Try yourself:
  https://kindsonthegenius.com/blog/2018/04/how-to-build-a-decision-tree-for-classification-step-by-step-procedure-using-entropy-and-gain.html
Under-fitting and Over-fitting

Producing a model that does not perform well even on the training data is called under-fitting. Typically, when this happens we decide our model is not good enough and keep looking for a better one.

Producing a model that performs well on the training data but generalizes poorly to new data is called over-fitting. This can involve learning the noise in the data.
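Below is a minimal sketch, not from the slides, of how over-fitting typically shows up with decision trees, assuming scikit-learn is available: an unconstrained tree memorizes the training data, while a depth-limited tree usually has a smaller gap between training and test accuracy.

```python
# Comparing an unconstrained tree with a depth-limited tree on noisy data.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_features=10, flip_y=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for depth in (None, 3):   # None = grow until the leaves are pure (prone to over-fitting)
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0).fit(X_train, y_train)
    print("max_depth =", depth,
          "train acc =", round(tree.score(X_train, y_train), 2),
          "test acc =", round(tree.score(X_test, y_test), 2))
```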
Underfitting and Overfitting

[Figure: three fits of the same data - under-fitting, just right, over-fitting]
- Under-fitting: a very small model, too few samples
- Over-fitting: a very large model, too many samples