
Name: Syed Rehanuddin Quadri

Subject: Data Mining

Create the Naive Bayes classifier and a decision tree for the attached data set.
Height    Weight   Foot Size   Income       Gender
6'        180      12          <50k         male
5'11"     190      11          >80k         male
5'7"      170      12          50k to 80k   male
5'11"     165      10          <50k         male
5'        100      6           <50k         female
5'6"      150      8           <50k         female
5'5"      130      7           50k to 80k   female
5'9"      150      9           >80k         female
6'3"      230      13          >80k         male
6'3"      180      10          >80k         female
5'5"      190      10          <50k         male
6'1"      195      10          50k to 80k   female

Formulae:
Information: $I(s_1, s_2, \ldots, s_m) = -\sum_{i=1}^{m} \frac{s_i}{s} \log_2 \frac{s_i}{s}$

Entropy: $E(A) = \sum_{j=1}^{v} \frac{s_{1j} + \cdots + s_{mj}}{s} \, I(s_{1j}, \ldots, s_{mj})$

Information gained: $\mathrm{Gain}(A) = I(s_1, s_2, \ldots, s_m) - E(A)$
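
The three formulae can be checked numerically with a short Python sketch. It assumes each partition is given as a tuple of class counts (here male, female); the function names `info`, `entropy`, and `gain` are illustrative choices, not part of the assignment. The example uses the Weight counts tabulated further down.

```python
from math import log2

def info(counts):
    """I(s_1, ..., s_m) for a list of class counts."""
    total = sum(counts)
    # -p*log2(p) is written as p*log2(1/p); zero counts contribute nothing.
    return sum((c / total) * log2(total / c) for c in counts if c > 0)

def entropy(partitions):
    """E(A): weighted average of info() over the subsets induced by attribute A."""
    total = sum(sum(p) for p in partitions)
    return sum((sum(p) / total) * info(p) for p in partitions)

def gain(partitions):
    """Gain(A) = I(s_1, ..., s_m) - E(A)."""
    class_totals = [sum(col) for col in zip(*partitions)]
    return info(class_totals) - entropy(partitions)

# (male, female) counts for the Weight ranges <160, 160 to 179, >179
weight = [(0, 4), (2, 0), (4, 2)]
print(round(info([6, 6]), 3))      # information of the whole data set
print(round(entropy(weight), 3))   # E(Weight)
print(round(gain(weight), 3))      # Gain(Weight)
```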
Height:
Range            Male   Female   Information
<5'6"            1      2        0.918
5'6" to 5'11"    3      2        0.971
>5'11"           2      2        1
Total            6      6        1
Entropy                          0.967

Information Gain (H) = 1 - 0.967 = 0.033

Weight:
Range         Male   Female   Information
<160          0      4        0
160 to 179    2      0        0
>179          4      2        0.918
Total         6      6        1
Entropy                       0.459

Information Gain (W) = 1 - 0.459 = 0.541
Income:
Range         Male   Female   Information
<50k          3      2        0.971
50k to 80k    1      2        0.918
>80k          2      2        1
Total         6      6        1
Entropy                       0.967

Information Gain (I) = 1 - 0.967 = 0.033
Foot size:
Range      Male   Female   Information
<9         0      3        0
9 to 11    3      3        1
>11        3      0        0
Total      6      6        1
Entropy                    0.5

Information Gain (F) = 1 - 0.5 = 0.5
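
The four gains can be recomputed directly from the raw data set with the sketch below. It assumes heights are converted to inches, that Gender is the class label, and that the bin boundaries follow the range labels in the tables (for example, 5'6" and 5'11" both fall in the middle Height range). The names `ROWS`, `ATTRIBUTES`, `info`, and `gain` are illustrative.

```python
from collections import Counter, defaultdict
from math import log2

# (height in inches, weight, foot size, income, gender), transcribed from the data set
ROWS = [
    (72, 180, 12, "<50k", "male"),         (71, 190, 11, ">80k", "male"),
    (67, 170, 12, "50k to 80k", "male"),   (71, 165, 10, "<50k", "male"),
    (60, 100, 6, "<50k", "female"),        (66, 150, 8, "<50k", "female"),
    (65, 130, 7, "50k to 80k", "female"),  (69, 150, 9, ">80k", "female"),
    (75, 230, 13, ">80k", "male"),         (75, 180, 10, ">80k", "female"),
    (65, 190, 10, "<50k", "male"),         (73, 195, 10, "50k to 80k", "female"),
]

# Assumed binning rules, read off the range labels in the tables
ATTRIBUTES = {
    "Height": lambda r: "<5'6" if r[0] < 66 else "5'6 to 5'11" if r[0] <= 71 else ">5'11",
    "Weight": lambda r: "<160" if r[1] < 160 else "160 to 179" if r[1] <= 179 else ">179",
    "Foot size": lambda r: "<9" if r[2] < 9 else "9 to 11" if r[2] <= 11 else ">11",
    "Income": lambda r: r[3],
}

def info(labels):
    """Expected information of a list of class labels."""
    n = len(labels)
    return sum((c / n) * log2(n / c) for c in Counter(labels).values())

def gain(rows, key):
    """Information gain of splitting rows by key(row); the class is the last field."""
    groups = defaultdict(list)
    for row in rows:
        groups[key(row)].append(row[-1])
    expected = sum(len(g) / len(rows) * info(g) for g in groups.values())
    return info([row[-1] for row in rows]) - expected

gains = {name: round(gain(ROWS, key), 3) for name, key in ATTRIBUTES.items()}
for name, value in sorted(gains.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {value}")
```

Under these assumptions the ranking is Weight, Foot size, then Height and Income tied, which agrees with the gains computed by hand.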

Decision tree:
A decision tree builds a classification or regression model in the form of a tree structure. It breaks the dataset down
into progressively smaller subsets while the associated decision tree is developed step by step. The final result is a
tree with decision nodes and leaf nodes.
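
The recursive partitioning described above can be sketched in Python as follows. This is a minimal ID3-style illustration that picks the attribute with the highest information gain at each step; it is not necessarily the procedure used to draw the tree in this write-up. The function names, the nested-dictionary tree, and the four toy rows are hypothetical.

```python
from collections import Counter
from math import log2

def info(labels):
    """Expected information of a list of class labels."""
    n = len(labels)
    return sum((c / n) * log2(n / c) for c in Counter(labels).values())

def best_attribute(rows, attributes, target):
    """Return the attribute with the highest information gain."""
    def gain(attr):
        expected = 0.0
        for value in {r[attr] for r in rows}:
            subset = [r[target] for r in rows if r[attr] == value]
            expected += len(subset) / len(rows) * info(subset)
        return info([r[target] for r in rows]) - expected
    return max(attributes, key=gain)

def build_tree(rows, attributes, target):
    labels = [r[target] for r in rows]
    # Leaf node: the subset is pure or no attributes remain, so use the majority class.
    if len(set(labels)) == 1 or not attributes:
        return Counter(labels).most_common(1)[0][0]
    attr = best_attribute(rows, attributes, target)
    remaining = [a for a in attributes if a != attr]
    # Decision node: one branch per observed value of the chosen attribute.
    return {attr: {
        value: build_tree([r for r in rows if r[attr] == value], remaining, target)
        for value in sorted({r[attr] for r in rows})
    }}

# Hypothetical toy rows, for illustration only (not the assignment data)
toy = [
    {"weight": "light", "income": "low",  "gender": "female"},
    {"weight": "light", "income": "high", "gender": "female"},
    {"weight": "heavy", "income": "low",  "gender": "male"},
    {"weight": "heavy", "income": "high", "gender": "male"},
]
tree = build_tree(toy, ["weight", "income"], "gender")
print(tree)
```

In the toy rows, weight separates the two classes completely while income carries no information, so the sketch stops after a single decision node.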
From the given data set we can get the following decision tree.


Income
    <50k
        Gender
            Male
                Weight
                Height
                Foot size
            Female
                Weight
                Height
                Foot size
    50k to 80k
        Gender
            Male
                Weight
                Height
                Foot size
            Female
                Weight
                Height
                Foot size
    >80k
        Gender
            Male
                Weight
                Height
                Foot size
            Female
                Weight
                Height
                Foot size
