

Assignment (Decision Trees)

Due on November 27
matatora@jmu.edu

Name (print clearly): Constantin Damian


By writing my name, I acknowledge that I am aware and understand the Romanian American
University Honor Code. Therefore, all the answers of this homework are the product of my own
intellectual effort.

Given the table shown below, determine the outcome of the following classification problem:

Age     Job    House  Credit     Loan Approved
Young   False  No     Good       No

Age     Job    House  Credit     Loan Approved
Young   False  No     Fair       No
Young   False  No     Good       No
Young   True   No     Good       Yes
Young   True   Yes    Fair       Yes
Young   False  No     Fair       No
Middle  False  No     Fair       No
Middle  False  No     Good       No
Middle  True   Yes    Good       Yes
Middle  False  Yes    Excellent  Yes
Middle  False  Yes    Excellent  Yes
Old     False  Yes    Excellent  Yes
Old     False  Yes    Good       Yes
Old     True   No     Good       Yes
Old     True   No     Excellent  Yes
Old     False  No     Fair       No

1) Build the decision tree by hand following the method indicated in the lecture or in the
class notes of Decision Trees.
Answer:

I’m starting with the first variable (Age), which has 3 values: Young, Middle, Old.

Variable  Rule              Misses  Total Misses
Age       Young --> No      2/5     5/15
          Middle --> Yes    2/5
          Old --> Yes       1/5

Now, I’ll do the same thing for the other variables:

Variable  Rule              Misses  Total Misses
Job       True --> Yes      0/5     4/15
          False --> No      4/10
House     Yes --> Yes       0/6     3/15
          No --> No         3/9
Credit    Fair --> No       1/5     3/15
          Good --> Yes      2/6
          Excellent --> Yes 0/4
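
The miss counts in the tables above can be cross-checked with a short script. This is only a sketch of the counting procedure, not the method from the lecture: for each value of a variable it takes the majority outcome as the rule and counts the rows that disagree. The names `ROWS`, `rule_misses` and `total_misses` are made up for this illustration.

```python
from collections import Counter, defaultdict

COLUMNS = ["Age", "Job", "House", "Credit", "Loan Approved"]
# The 15 training rows from the assignment table (last field is the outcome).
ROWS = [
    ("Young", "False", "No", "Fair", "No"),
    ("Young", "False", "No", "Good", "No"),
    ("Young", "True", "No", "Good", "Yes"),
    ("Young", "True", "Yes", "Fair", "Yes"),
    ("Young", "False", "No", "Fair", "No"),
    ("Middle", "False", "No", "Fair", "No"),
    ("Middle", "False", "No", "Good", "No"),
    ("Middle", "True", "Yes", "Good", "Yes"),
    ("Middle", "False", "Yes", "Excellent", "Yes"),
    ("Middle", "False", "Yes", "Excellent", "Yes"),
    ("Old", "False", "Yes", "Excellent", "Yes"),
    ("Old", "False", "Yes", "Good", "Yes"),
    ("Old", "True", "No", "Good", "Yes"),
    ("Old", "True", "No", "Excellent", "Yes"),
    ("Old", "False", "No", "Fair", "No"),
]


def rule_misses(rows, col):
    """Map each value of column `col` to (majority outcome, misses, row count)."""
    by_value = defaultdict(Counter)
    for row in rows:
        by_value[row[col]][row[-1]] += 1
    rules = {}
    for value, outcomes in by_value.items():
        majority, hits = outcomes.most_common(1)[0]
        total = sum(outcomes.values())
        rules[value] = (majority, total - hits, total)
    return rules


def total_misses(rows, col):
    """Sum the misses over all values of column `col`."""
    return sum(misses for _, misses, _ in rule_misses(rows, col).values())


for i, name in enumerate(COLUMNS[:-1]):
    print(name, rule_misses(ROWS, i), f"{total_misses(ROWS, i)}/{len(ROWS)}")
```
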

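There is a tie in “total misses” between “House” and “Credit”. They both have a pure
node, but since “House” has the bigger pure node (6 rows against 4), I’m picking “House” as
the first criterion. On the House = No branch, “Job” then splits the remaining 9 rows with no
misses (True --> Yes 0/3, False --> No 0/6), so “Job” is the second criterion.
The row to classify has House = No and Job = False, so the answer is NO.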
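
The tree described above (House first, then Job) can be written as a small function. This is a hand-written sketch of that tree, not output from any package; the function name `classify` and the string encoding of the values are assumptions made for the example.

```python
def classify(house: str, job: str) -> str:
    """Hand-built tree from part 1: split on House first, then on Job."""
    if house == "Yes":
        return "Yes"  # pure node: all 6 training rows with House = Yes are approved
    if job == "True":
        return "Yes"  # House = No, Job = True: all 3 training rows are approved
    return "No"       # House = No, Job = False: all 6 training rows are rejected


# Row to classify: Age = Young, Job = False, House = No, Credit = Good.
# Age and Credit are not used by this tree.
print(classify(house="No", job="False"))
```
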
2) Build the decision tree in R following the procedure indicated in the lecture.
Answer:

I made an Excel file with the table given, then I imported it into RStudio:

Then, I installed the C50 package and created the decision tree with it:
3) Calculate the Information Gain of the Parent Table with respect to its children. No
need to calculate the IG of the grandchildren or great grandchildren. Explain whether the
result obtained by the IG of the Children of the root agrees or disagrees with the results
obtained in part 1 of this homework.
Answer:

First, I calculated the entropy for the Table, Job True/False and House Yes/No:

Since every row with Job = True or House = Yes has Loan Approved = Yes in the main table,
the entropy of those two nodes is 0.
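
As a check on these entropies, here is a minimal sketch that computes the base-2 Shannon entropy of a node from its class counts. The counts (Yes, No) are read off the main table; the helper name `entropy` is an assumption for this example.

```python
from math import log2


def entropy(*counts: int) -> float:
    """Base-2 Shannon entropy of a node, given how many rows fall in each class."""
    total = sum(counts)
    # Empty classes are skipped, since 0 * log2(0) is taken as 0.
    return sum(-c / total * log2(c / total) for c in counts if c > 0)


print("Table        ", round(entropy(9, 6), 4))  # 9 Yes, 6 No
print("Job = True   ", round(entropy(5, 0), 4))  # 5 Yes, 0 No
print("Job = False  ", round(entropy(4, 6), 4))  # 4 Yes, 6 No
print("House = Yes  ", round(entropy(6, 0), 4))  # 6 Yes, 0 No
print("House = No   ", round(entropy(3, 6), 4))  # 3 Yes, 6 No
```
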

Then, using the Information Gain formula, I calculated the information gain of the
parent table with respect to its children:
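
The calculation can be sketched as follows, assuming the usual definition IG = H(parent) minus the size-weighted sum of the children's entropies. The class counts are taken from the main table. Age and Credit are included only for comparison, although the answer above works with Job and House. The names `PARENT`, `SPLITS` and `information_gain` are made up for this illustration.

```python
from math import log2


def entropy(counts):
    """Base-2 Shannon entropy of a node from its (Yes, No) counts."""
    total = sum(counts)
    return sum(-c / total * log2(c / total) for c in counts if c > 0)


def information_gain(parent, children):
    """H(parent) minus the size-weighted entropy of the child nodes."""
    n = sum(parent)
    weighted = sum(sum(child) / n * entropy(child) for child in children)
    return entropy(parent) - weighted


PARENT = (9, 6)  # (Yes, No) over all 15 rows
SPLITS = {
    "Age": [(2, 3), (3, 2), (4, 1)],     # Young, Middle, Old
    "Job": [(5, 0), (4, 6)],             # True, False
    "House": [(6, 0), (3, 6)],           # Yes, No
    "Credit": [(1, 4), (4, 2), (4, 0)],  # Fair, Good, Excellent
}

for name, children in SPLITS.items():
    print(name, round(information_gain(PARENT, children), 3))
```

With these counts, House comes out with the largest gain, followed by Credit, Job and Age.
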

In my opinion, the result obtained by the IG on the children of the root agrees with
part 1 of the homework. IG and entropy are closely related: the IG of a split is the
parent's entropy minus the weighted entropy of its children, so the higher the IG, the lower
the entropy left in the children. A lower entropy means a purer node, and the purer the node,
the less mixed its outcomes are (e.g., if the entropy of a node is 0, the node is 100%
positive or 100% negative; if the entropy is 1, the node is 50% positive and 50% negative).
In our case, the split on House has the highest IG, which matches the choice of House as
the first criterion in part 1. The leaf that gives the answer “NO” in the first exercise
(House = No, Job = False) contains only “No” rows, so its entropy is 0 and “NO” is the
outcome for every row that reaches it.
