
Name: Areeb Ahmed

Roll no:17271519-157

Assignment 3
------------------------------------------------------------------------------------------------------------------------------------------

Decision tree using Information Gain

Step 1: Calculate entropy of the target:

Entropy (Buys_computer) =Entropy (9, 5)

= - (9/14) * log2 (9/14) - (5/14) * log2 (5/14)

=0.94
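The entropy calculation above can be checked with a few lines of Python. This is a minimal sketch: the function name `entropy` and the convention of passing raw (yes, no) counts are assumptions chosen to mirror the Entropy (9, 5) notation used here.

```python
from math import log2

def entropy(*counts):
    """Shannon entropy in bits of a class distribution given as raw counts."""
    total = sum(counts)
    # Zero counts are skipped, which treats 0 * log2(0) as 0.
    return sum(-(c / total) * log2(c / total) for c in counts if c > 0)

# 9 tuples with buys_computer = yes, 5 with buys_computer = no
target_entropy = entropy(9, 5)
print(round(target_entropy, 2))
```

The same function gives 0 for a pure partition such as Entropy (4, 0).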

Step 2: Calculate the information gain of each attribute:

Entropy (Age, Buys_computer) = P (<=30) * Entropy (2, 3) + P (31...40) * Entropy (4, 0) + P (>40) * Entropy (3, 2)

= 5/14 * 0.97 + 4/14 * 0 + 5/14 * 0.97

= 0.346 + 0 + 0.346

= 0.69

Gain (Age, Buys_computer) = Entropy (Buys_computer) - Entropy (Age, Buys_computer)

= 0.94 – 0.69 = 0.25

Entropy (Income, Buys_computer) = P (high) * Entropy (2, 2) + P (medium) * Entropy (4, 2) + P (low) * Entropy (3, 1)

= 4/14 * 1 + 6/14 * 0.92 + 4/14 * 0.81

= 0.29 + 0.39 + 0.23

= 0.91

Gain (Income, Buys_computer) = Entropy (Buys_computer) - Entropy (Income, Buys_computer)

= 0.94 – 0.91 = 0.03

Entropy (Student, Buys_computer) = P (no) * Entropy (3, 4) + P (yes) * Entropy (6, 1)

= 7/14 * 0.99 + 7/14 * 0.59

= 0.79

Gain (Student, Buys_computer) = Entropy (Buys_computer) - Entropy (Student, Buys_computer)

= 0.94 – 0.79 = 0.15

Entropy (Credit_rating, Buys_computer) = P (fair) * Entropy (6, 2) + P (excellent) * Entropy (3, 3)

= 8/14 * 0.81 + 6/14 * 1

= 0.89

Gain (Credit_rating, Buys_computer) = Entropy (Buys_computer) - Entropy (Credit_rating, Buys_computer)

= 0.94 – 0.89 = 0.05
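The four gain calculations above follow one pattern: subtract the size-weighted entropy of the partitions from the entropy of the target. A minimal sketch of that pattern follows; the helper names `entropy` and `info_gain` and the `splits` dictionary are illustrative assumptions, and the (yes, no) counts are copied from the Entropy (...) terms above. Unrounded results can differ from the hand-rounded figures in the last digit.

```python
from math import log2

def entropy(*counts):
    """Shannon entropy in bits of a class distribution given as raw counts."""
    total = sum(counts)
    return sum(-(c / total) * log2(c / total) for c in counts if c > 0)

def info_gain(parent, partitions):
    """parent: (yes, no) counts; partitions: (yes, no) counts per attribute value."""
    n = sum(parent)
    weighted = sum(sum(p) / n * entropy(*p) for p in partitions)
    return entropy(*parent) - weighted

# (yes, no) counts per attribute value, taken from the calculations above
splits = {
    "Age": [(2, 3), (4, 0), (3, 2)],
    "Income": [(2, 2), (4, 2), (3, 1)],
    "Student": [(3, 4), (6, 1)],
    "Credit_rating": [(6, 2), (3, 3)],
}
gains = {name: info_gain((9, 5), parts) for name, parts in splits.items()}
best = max(gains, key=gains.get)  # attribute with the largest gain

for name, g in gains.items():
    print(f"{name}: {g:.3f}")
print("root:", best)
```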

Age has the largest information gain, so Age is selected as the root node:

Age
  <=30      -> (split further)
  31 to 40  -> Buy = Yes
  >40       -> (split further)

Step 1: Calculate entropy of the target (when age<=30)

Entropy (Buys_computer) =Entropy (2, 3) = 0.97

Step 2: Calculate the information gain of each attribute:

Entropy (Income, Buys_computer) = P (high) * Entropy (0, 2) + P (medium) * Entropy (1, 1) + P (low) * Entropy (1, 0)

= 2/5 * 0 + 2/5 * 1 + 1/5 * 0

= 0.4

Gain (Income, Buys_computer) = Entropy (Buys_computer) - Entropy (Income, Buys_computer)

= 0.97 – 0.4 = 0.57

Entropy (Student, Buys_computer) = P (no) * Entropy (0, 3) + P (yes) * Entropy (2, 0)

= 3/5 * 0 + 2/5 * 0

= 0

Gain (Student, Buys_computer) = Entropy (Buys_computer) - Entropy (Student, Buys_computer)

= 0.97 – 0 = 0.97

Entropy (Credit_rating, Buys_computer) = P (fair) * Entropy (1, 2) + P (excellent) * Entropy (1, 1)

= 3/5 * 0.92 + 2/5 * 1

= 0.95

Gain (Credit_rating, Buys_computer) = Entropy (Buys_computer) - Entropy (Credit_rating, Buys_computer)

= 0.97 – 0.95 = 0.02

Student has the largest information gain, so Student is selected as the node for the age <=30 branch.

Step 1: Calculate entropy of the target (when age >40)

Entropy (Buys_computer) =Entropy (3, 2) = 0.97

Step 2: Calculate the information gain of each attribute:

Entropy (Income, Buys_computer) = P (medium) * Entropy (2, 1) + P (low) * Entropy (1, 1)

= 3/5 * 0.92 + 2/5 * 1

= 0.95

Gain (Income, Buys_computer) = Entropy (Buys_computer) - Entropy (Income, Buys_computer)

= 0.97 – 0.95 = 0.02

Entropy (Student, Buys_computer) = P (no) * Entropy (1, 1) + P (yes) * Entropy (2, 1)

= 2/5 * 1 + 3/5 * 0.92

= 0.95

Gain (Student, Buys_computer) = Entropy (Buys_computer) - Entropy (Student, Buys_computer)

= 0.97 – 0.95 = 0.02

Entropy (Credit_rating, Buys_computer) = P (fair) * Entropy (3, 0) + P (excellent) * Entropy (0, 2)

= 3/5 * 0 + 2/5 * 0

=0

Gain (Credit_rating, Buys_computer) = Entropy (Buys_computer) - Entropy (Credit_rating, Buys_computer)

= 0.97 – 0 = 0.97
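The branch-level selections for age <=30 and age >40 repeat the root calculation on smaller subsets. The sketch below recomputes both under stated assumptions: the helper names and the `branches` dictionary are illustrative, the (yes, no) counts are taken from the Entropy (...) terms above, and the Student partition for age <=30 is taken as (0, 3) and (2, 0), which matches the 3/5 and 2/5 weights used for that branch in the Gini section.

```python
from math import log2

def entropy(*counts):
    """Shannon entropy in bits of a class distribution given as raw counts."""
    total = sum(counts)
    return sum(-(c / total) * log2(c / total) for c in counts if c > 0)

def info_gain(parent, partitions):
    """Entropy of the parent minus the size-weighted entropy of its partitions."""
    n = sum(parent)
    return entropy(*parent) - sum(sum(p) / n * entropy(*p) for p in partitions)

# branch -> ((yes, no) of the branch, {attribute: (yes, no) per value})
branches = {
    "<=30": ((2, 3), {
        "Income": [(0, 2), (1, 1), (1, 0)],
        "Student": [(0, 3), (2, 0)],
        "Credit_rating": [(1, 2), (1, 1)],
    }),
    ">40": ((3, 2), {
        "Income": [(2, 1), (1, 1)],
        "Student": [(1, 1), (2, 1)],
        "Credit_rating": [(3, 0), (0, 2)],
    }),
}

chosen = {}
for branch, (parent, splits) in branches.items():
    gains = {attr: info_gain(parent, parts) for attr, parts in splits.items()}
    chosen[branch] = max(gains, key=gains.get)
    print(branch, {a: round(g, 2) for a, g in gains.items()}, "->", chosen[branch])
```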

Credit_rating has the largest information gain, so Credit_rating is selected as the node for the age >40 branch:

Age
  <=30      -> Student
                 Yes -> Buy = Yes
                 No  -> Buy = No
  31 to 40  -> Buy = Yes
  >40       -> Credit_rating
                 Fair      -> Buy = Yes
                 Excellent -> Buy = No

Rules set of this tree:

1. If age <=30 and student = no then buys_computer = no
2. If age <=30 and student = yes then buys_computer = yes
3. If age = 31 to 40 then buys_computer = yes
4. If age >40 and credit rating = excellent then buys_computer = no
5. If age >40 and credit rating = fair then buys_computer = yes
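A rule set like this maps directly onto nested conditionals. The sketch below is one possible encoding; the function name `buys_computer` and the string labels are assumptions. The leaves under Credit_rating follow the (yes, no) counts computed for the age >40 branch: Entropy (3, 0) for fair means all three tuples buy, and Entropy (0, 2) for excellent means neither does.

```python
def buys_computer(age, student=None, credit_rating=None):
    """Classify one tuple with the Age / Student / Credit_rating tree."""
    if age == "<=30":
        # Student decides this branch: (2, 0) for yes, (0, 3) for no
        return "yes" if student == "yes" else "no"
    if age == "31 to 40":
        # Pure partition, Entropy (4, 0)
        return "yes"
    if age == ">40":
        # Credit_rating decides this branch: (3, 0) for fair, (0, 2) for excellent
        return "yes" if credit_rating == "fair" else "no"
    raise ValueError(f"unknown age group: {age!r}")

print(buys_computer("<=30", student="yes"))
print(buys_computer(">40", credit_rating="excellent"))
```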

Decision tree using Gain ratio


Step 1: Calculate the gain ratio of each attribute:

SplitInfo_age (<=30, 31...40, >40) = - 5/14 * log2 (5/14) - 4/14 * log2 (4/14) - 5/14 * log2 (5/14)

= 1.58

Gain Ratio (Age) = Gain (Age) / SplitInfo_age

= 0.25 / 1.58 = 0.16 (using Gain (Age) = 0.94 – 0.69 = 0.25)

SplitInfo_income (high, medium, low) = - 4/14 * log2 (4/14) - 6/14 * log2 (6/14) - 4/14 * log2 (4/14)

= 1.56

Gain Ratio (Income) = Gain (Income) / SplitInfo_income

= 0.03 / 1.56 = 0.019
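Gain ratio divides the information gain of an attribute (not its weighted entropy) by the split information of that attribute. The sketch below computes it for all four attributes at the root; the helper names `split_info` and `gain_ratio` and the `splits` dictionary are illustrative assumptions, and the (yes, no) counts are those used in the information gain section.

```python
from math import log2

def entropy(*counts):
    """Shannon entropy in bits of a distribution given as raw counts."""
    total = sum(counts)
    return sum(-(c / total) * log2(c / total) for c in counts if c > 0)

def info_gain(parent, partitions):
    n = sum(parent)
    return entropy(*parent) - sum(sum(p) / n * entropy(*p) for p in partitions)

def split_info(partitions):
    # Entropy of the partition sizes themselves, ignoring the class labels
    return entropy(*(sum(p) for p in partitions))

def gain_ratio(parent, partitions):
    return info_gain(parent, partitions) / split_info(partitions)

# (yes, no) counts per attribute value over all 14 tuples
splits = {
    "Age": [(2, 3), (4, 0), (3, 2)],
    "Income": [(2, 2), (4, 2), (3, 1)],
    "Student": [(3, 4), (6, 1)],
    "Credit_rating": [(6, 2), (3, 3)],
}
ratios = {name: gain_ratio((9, 5), parts) for name, parts in splits.items()}
best = max(ratios, key=ratios.get)

for name, r in ratios.items():
    print(f"{name}: {r:.3f}")
print("largest gain ratio:", best)
```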

SplitInfo_student (yes, no) = - 7/14 * log2 (7/14) - 7/14 * log2 (7/14)

= 1

Gain Ratio (Student) = Gain (Student) / SplitInfo_student

= 0.15 / 1 = 0.15

SplitInfo_credit_rating (fair, excellent) = - 8/14 * log2 (8/14) - 6/14 * log2 (6/14)

= 0.99

Gain Ratio (Credit_rating) = Gain (Credit_rating) / SplitInfo_credit_rating

= 0.05 / 0.99 = 0.05

Here, the largest gain ratio attribute is Age: to three decimals its gain ratio is 0.156, against 0.152 for Student.

Selected Age as root node:

Age
  <=30      -> (split further)
  31 to 40  -> Buy = Yes
  >40       -> (split further)

Step 2: Calculate the gain ratio of each attribute (when age <=30):

Entropy (Buys_computer) = Entropy (2, 3) = 0.97

SplitInfo_income (high, medium, low) = - 2/5 * log2 (2/5) - 2/5 * log2 (2/5) - 1/5 * log2 (1/5)

= 1.52

Gain (Income, Buys_computer) = 0.97 – 0.4 = 0.57

Gain Ratio (Income) = 0.57 / 1.52 = 0.375

SplitInfo_student (no, yes) = - 3/5 * log2 (3/5) - 2/5 * log2 (2/5)

= 0.97

Gain (Student, Buys_computer) = 0.97 – 0 = 0.97

Gain Ratio (Student) = 0.97 / 0.97 = 1

SplitInfo_credit_rating (fair, excellent) = - 3/5 * log2 (3/5) - 2/5 * log2 (2/5)

= 0.97

Gain (Credit_rating, Buys_computer) = 0.97 – 0.95 = 0.02

Gain Ratio (Credit_rating) = 0.02 / 0.97 = 0.02

Here, the largest gain ratio attribute is Student.

Selected Student as the node for the age <=30 branch.

Step 3: Calculate the gain ratio of each attribute (when age >40):

Entropy (Buys_computer) = Entropy (3, 2) = 0.97

SplitInfo_income (medium, low) = - 3/5 * log2 (3/5) - 2/5 * log2 (2/5)

= 0.97

Gain (Income, Buys_computer) = 0.97 – 0.95 = 0.02

Gain Ratio (Income) = 0.02 / 0.97 = 0.02

SplitInfo_student (no, yes) = - 2/5 * log2 (2/5) - 3/5 * log2 (3/5)

= 0.97

Gain (Student, Buys_computer) = 0.97 – 0.95 = 0.02

Gain Ratio (Student) = 0.02 / 0.97 = 0.02

SplitInfo_credit_rating (fair, excellent) = - 3/5 * log2 (3/5) - 2/5 * log2 (2/5)

= 0.97

Gain (Credit_rating, Buys_computer) = 0.97 – 0 = 0.97

Gain Ratio (Credit_rating) = 0.97 / 0.97 = 1

Here, the largest gain ratio attribute is Credit_rating.

Selected Credit_rating as the node for the age >40 branch:

Age
  <=30      -> Student
                 Yes -> Buy = Yes
                 No  -> Buy = No
  31 to 40  -> Buy = Yes
  >40       -> Credit_rating
                 Fair      -> Buy = Yes
                 Excellent -> Buy = No

Rules set of this tree:

1. If age <=30 and student = no then buys_computer = no
2. If age <=30 and student = yes then buys_computer = yes
3. If age = 31 to 40 then buys_computer = yes
4. If age >40 and credit rating = excellent then buys_computer = no
5. If age >40 and credit rating = fair then buys_computer = yes

Decision tree using Gini Index


Calculating the Gini Index for Age:

Gini index= 5/14*(1-((2/5) ^2 + (3/5) ^2)) +4/14*(1-((4/4) ^2 + (0/4) ^2)) +5/14*(1-((3/5) ^2 + (2/5) ^2)) =0.34

Calculating the Gini Index for Income:

Gini index= 4/14*(1-((2/4) ^2 + (2/4) ^2)) +6/14*(1-((4/6) ^2 + (2/6) ^2)) +4/14*(1-((3/4) ^2 + (1/4) ^2)) =0.44

Calculating the Gini Index for Student:

Gini index= 7/14*(1-((3/7) ^2 + (4/7) ^2)) +7/14*(1-((6/7) ^2 + (1/7) ^2)) =0.37

Calculating the Gini Index for Credit_rating:

Gini index= 8/14*(1-((6/8) ^2 + (2/8) ^2)) +6/14*(1-((3/6) ^2 + (3/6) ^2)) =0.43
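The four Gini calculations above share one formula: the size-weighted sum of 1 minus the squared class proportions in each partition. A minimal sketch follows; the function names `gini` and `gini_index` and the `splits` dictionary are illustrative assumptions, and the (yes, no) counts are copied from the expressions above. For the Gini index, the attribute with the smallest value is preferred.

```python
def gini(*counts):
    """Gini impurity of a class distribution given as raw counts."""
    total = sum(counts)
    return 1 - sum((c / total) ** 2 for c in counts)

def gini_index(partitions):
    """Size-weighted Gini impurity over the partitions of one attribute."""
    n = sum(sum(p) for p in partitions)
    return sum(sum(p) / n * gini(*p) for p in partitions)

# (yes, no) counts per attribute value over all 14 tuples
splits = {
    "Age": [(2, 3), (4, 0), (3, 2)],
    "Income": [(2, 2), (4, 2), (3, 1)],
    "Student": [(3, 4), (6, 1)],
    "Credit_rating": [(6, 2), (3, 3)],
}
indices = {name: gini_index(parts) for name, parts in splits.items()}
best = min(indices, key=indices.get)  # smallest Gini index wins

for name, value in indices.items():
    print(f"{name}: {value:.2f}")
print("root:", best)
```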

Here, the smallest Gini index is for Age.

Selected Age as root node:

Age
  <=30      -> (split further)
  31 to 40  -> Buy = Yes
  >40       -> (split further)

Now, calculating Gini index when age <=30,

Calculating the Gini Index for Income:

Gini index= 2/5*(1-((0/2) ^2 + (2/2) ^2)) +2/5*(1-((1/2) ^2 + (1/2) ^2)) +1/5*(1-((1/1) ^2 + (0/1) ^2)) =0.2

Calculating the Gini Index for Student:

Gini index= 3/5*(1-((3/3) ^2 + (0/3) ^2)) +2/5*(1-((2/2) ^2 + (0/2) ^2)) =0

Calculating the Gini Index for Credit_rating:

Gini index= 3/5*(1-((1/3) ^2 + (2/3) ^2)) +2/5*(1-((1/2) ^2 + (1/2) ^2)) =0.47

Here, the smallest Gini index is for Student.

Selected Student as the node for the age <=30 branch.

Now, calculating Gini index when age >40:

Calculating the Gini Index for Income:

Gini index= 3/5*(1-((2/3) ^2 + (1/3) ^2)) +2/5*(1-((1/2) ^2 + (1/2) ^2)) =0.47

Calculating the Gini Index for Student:

Gini index= 3/5*(1-((2/3) ^2 + (1/3) ^2)) +2/5*(1-((1/2) ^2 + (1/2) ^2)) =0.47

Calculating the Gini Index for Credit_rating:

Gini index= 3/5*(1-((3/3) ^2 + (0/3) ^2)) +2/5*(1-((2/2) ^2 + (0/2) ^2)) =0

Here, the smallest Gini index is for Credit_rating.

Selected Credit_rating as the node for the age >40 branch:

Age
  <=30      -> Student
                 Yes -> Buy = Yes
                 No  -> Buy = No
  31 to 40  -> Buy = Yes
  >40       -> Credit_rating
                 Fair      -> Buy = Yes
                 Excellent -> Buy = No

Rules set of this tree:

1. If age <=30 and student = no then buys_computer = no
2. If age <=30 and student = yes then buys_computer = yes
3. If age = 31 to 40 then buys_computer = yes
4. If age >40 and credit rating = excellent then buys_computer = no
5. If age >40 and credit rating = fair then buys_computer = yes
