
1. Show the steps of the decision tree induction algorithm for the following data. The concept to be learnt is Grades (the Grades values are the class labels).
- If two or more attributes have the same Information Gain, pick any of them.
- Since the best possible entropy is zero, if the average entropy of an attribute is zero, there is no need to check other candidate attributes.

#   Grades   Hardworking   Intelligent   Unlucky
1   Good     Yes           Yes           No
2   Bad      No            Yes           Yes
3   Bad      Yes           No            Yes
4   Good     Yes           Yes           No
5   Good     Yes           Yes           No
6   Bad      No            Yes           No
7   Bad      Yes           No            No
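For readers who want to reproduce the calculations, here is one possible encoding of this table in Python (the variable name `data` and the list-of-dicts representation are illustrative assumptions, not part of the exercise):

```python
# The seven training samples; attribute names follow the table above.
data = [
    {"Grades": "Good", "Hardworking": "Yes", "Intelligent": "Yes", "Unlucky": "No"},
    {"Grades": "Bad",  "Hardworking": "No",  "Intelligent": "Yes", "Unlucky": "Yes"},
    {"Grades": "Bad",  "Hardworking": "Yes", "Intelligent": "No",  "Unlucky": "Yes"},
    {"Grades": "Good", "Hardworking": "Yes", "Intelligent": "Yes", "Unlucky": "No"},
    {"Grades": "Good", "Hardworking": "Yes", "Intelligent": "Yes", "Unlucky": "No"},
    {"Grades": "Bad",  "Hardworking": "No",  "Intelligent": "Yes", "Unlucky": "No"},
    {"Grades": "Bad",  "Hardworking": "Yes", "Intelligent": "No",  "Unlucky": "No"},
]
```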

The entropy of the data:

There are 3 samples of Good Grades and 4 samples of Bad Grades.
Hence the entropy of the data is {- (3/7) log2 (3/7) - (4/7) log2 (4/7)} =
0.524 + 0.461 = 0.985
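This arithmetic can be checked with a minimal Python helper (a sketch, assuming the `data` list from above; the function name `entropy` is my own):

```python
import math

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    # Only classes that actually occur are iterated, so p is never 0.
    return -sum((labels.count(c) / n) * math.log2(labels.count(c) / n)
                for c in set(labels))

print(entropy([s["Grades"] for s in data]))  # ~0.985
```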

First we calculate which of the three attributes will be best for the root node.

Attribute “Hardworking”
Branch Hardworking = Yes (5 samples fulfill this condition)
There are 3 samples of Good Grades and 2 samples of Bad Grades.
Entropy is {- (3/5) log2 (3/5) - (2/5) log2 (2/5)} = 0.442 + 0.53 = 0.972
Branch Hardworking = No (2 samples fulfill this condition)
Both of these 2 samples are of Bad Grades.
Entropy is {- (0/2) log2 (0/2) - (2/2) log2 (2/2)} = 0 (taking 0 log2 0 = 0, by convention)
Average Entropy of Attribute “Hardworking” = 5/7 * 0.972 + 2/7 * 0 = 0.694
Information Gain of Attribute “Hardworking” = 0.985 – 0.694 = 0.291
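The same split computation can be scripted. A sketch, reusing the `data` list and `entropy` helper from above (the function name `info_gain` is my own):

```python
def info_gain(samples, attribute, target="Grades"):
    """Information gain of splitting `samples` on `attribute`."""
    base = entropy([s[target] for s in samples])  # entropy before the split
    avg = 0.0
    for value in set(s[attribute] for s in samples):
        branch = [s[target] for s in samples if s[attribute] == value]
        avg += len(branch) / len(samples) * entropy(branch)  # weighted branch entropy
    return base - avg

print(info_gain(data, "Hardworking"))  # ~0.2917 (0.291 above, with intermediate rounding)
```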

Attribute “Intelligent”
Branch Intelligent = Yes (5 samples fulfill this condition)
There are 3 samples of Good Grades and 2 samples of Bad Grades.
Entropy is {- (3/5) log2 (3/5) - (2/5) log2 (2/5)} = 0.442 + 0.53 = 0.972
Branch Intelligent = No (2 samples fulfill this condition)
Both of these 2 samples are of Bad Grades.
Entropy is {- (0/2) log2 (0/2) - (2/2) log2 (2/2)} = 0
Average Entropy of Attribute “Intelligent” = 5/7 * 0.972 + 2/7 * 0 = 0.694
Information Gain of Attribute “Intelligent” = 0.985 – 0.694 = 0.291

Attribute “Unlucky”
Branch Unlucky = Yes (2 samples fulfill this condition)
Both of these 2 samples are of Bad Grades.
Entropy is {- (0/2) log2 (0/2) - (2/2) log2 (2/2)} = 0
Branch Unlucky = No (5 samples fulfill this condition)
There are 3 samples of Good Grades and 2 samples of Bad Grades.
Entropy is {- (3/5) log2 (3/5) - (2/5) log2 (2/5)} = 0.442 + 0.53 = 0.972
Average Entropy of Attribute “Unlucky” = 2/7 * 0 + 5/7 * 0.972 = 0.694
Information Gain of Attribute “Unlucky” = 0.985 – 0.694 = 0.291

Since all three Information Gains are equal, we may select any one of these attributes.
Suppose we choose the attribute “Hardworking” as the root node.
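Applying the `info_gain` sketch from above to all three attributes confirms the three-way tie:

```python
for attr in ["Hardworking", "Intelligent", "Unlucky"]:
    print(attr, info_gain(data, attr))
# each prints ~0.2917 (0.291 above, with intermediate rounding)
```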

---------------------------------------

The samples in the branch “Hardworking” = No have homogeneous class labels, hence we do not grow this branch.

The branch “Hardworking” = Yes has to be grown. We test the two remaining attributes, “Intelligent” and “Unlucky”.

The entropy of the samples for the branch “Hardworking” = Yes:

There are a total of 5 samples: 3 samples of Good Grades and 2 samples of Bad Grades.
Entropy is {- (3/5) log2 (3/5) - (2/5) log2 (2/5)} = 0.442 + 0.53 = 0.972

Attribute “Intelligent”
Branch Intelligent = Yes (3 samples fulfill this condition)
All the 3 samples are of Good Grades. Hence Entropy = 0
Branch Intelligent = No (2 samples fulfill this condition)
Both of these 2 samples are of Bad Grades. Hence Entropy = 0
Average Entropy of Attribute “Intelligent” = 3/5 * 0 + 2/5 * 0 = 0
Information Gain of Attribute “Intelligent” = 0.972 – 0 = 0.972
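This can again be checked with the earlier `info_gain` sketch, restricted to the Hardworking = Yes subset:

```python
subset = [s for s in data if s["Hardworking"] == "Yes"]
print(info_gain(subset, "Intelligent"))  # ~0.971 (0.972 above, with rounding)
```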

Since an average entropy of zero cannot be improved upon, there is no need to test the other attribute “Unlucky”. However, just for the sake of practice, we do the calculations as follows:

Attribute “Unlucky”
Branch Unlucky = Yes (1 sample fulfills this condition)
The sample is of Bad Grades. Hence Entropy = 0
Branch Unlucky = No (4 samples fulfill this condition)
Three of them are of Good Grades and one is of Bad Grades.
Entropy is {- (3/4) log2 (3/4) - (1/4) log2 (1/4)} = 0.311 + 0.5 = 0.811
Average Entropy of Attribute “Unlucky” = 1/5 * 0 + 4/5 * 0.811 = 0.6488
Information Gain of Attribute “Unlucky” = 0.972 - 0.6488 = 0.323

Since the Information Gain of “Intelligent” (0.972) is the higher of the two, we choose “Intelligent” for this node. Both of its branches are homogeneous, so the tree is complete.
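Putting the whole procedure together, here is one possible compact recursive sketch of the induction algorithm (again reusing `data` and `info_gain`; the function name `id3` and the tree-as-nested-dict representation are illustrative choices, not the only way to implement it):

```python
def id3(samples, attributes, target="Grades"):
    labels = [s[target] for s in samples]
    if len(set(labels)) == 1:        # homogeneous branch: stop and emit a leaf
        return labels[0]
    if not attributes:               # no attributes left: fall back to majority vote
        return max(set(labels), key=labels.count)
    # Pick the attribute with the highest information gain
    # (max() keeps the first maximum on ties, matching "pick any of them").
    best = max(attributes, key=lambda a: info_gain(samples, a, target))
    rest = [a for a in attributes if a != best]
    return {best: {v: id3([s for s in samples if s[best] == v], rest, target)
                   for v in set(s[best] for s in samples)}}

tree = id3(data, ["Hardworking", "Intelligent", "Unlucky"])
print(tree)   # branch order may vary between runs
# {'Hardworking': {'No': 'Bad',
#                  'Yes': {'Intelligent': {'Yes': 'Good', 'No': 'Bad'}}}}
```

The printed tree matches the derivation above: Hardworking is the root, its No branch is a Bad leaf, and its Yes branch splits on Intelligent into a Good leaf (Yes) and a Bad leaf (No).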
