Variable   Rule            Misses   Total Misses
Age        Young  -> No    2/5      5/15
           Middle -> Yes   2/5
           Old    -> Yes   1/5
Due on November 27
matatora@jmu.edu
Given the table shown below, determine the outcome of the following classification problem:
1) Build the decision tree by hand following the method indicated in the lecture or in the
class notes of Decision Trees.
Answer:
I’m starting with the first variable (Age), which has three values: Young, Middle, and Old.
There is a tie in total misses between “House” and “Credit”. Both have a pure
node, but since “House” has the larger pure node, I pick “House” as the splitting
criterion, followed by “Job”.
So the answer is NO.
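The hand-built variable selection can be double-checked programmatically. Below is a minimal Python sketch; the dataset itself is an assumption (the classic 15-row loan-approval table, which reproduces the Age miss counts quoted above: 2/5, 2/5, 1/5):

```python
from collections import Counter

# Assumed dataset: the standard 15-row loan-approval table.
# Columns: Age, Job, House, Credit, Loan (class).
DATA = [
    ("young",  "no",  "no",  "fair",      "no"),
    ("young",  "no",  "no",  "good",      "no"),
    ("young",  "yes", "no",  "good",      "yes"),
    ("young",  "yes", "yes", "fair",      "yes"),
    ("young",  "no",  "no",  "fair",      "no"),
    ("middle", "no",  "no",  "fair",      "no"),
    ("middle", "no",  "no",  "good",      "no"),
    ("middle", "yes", "yes", "good",      "yes"),
    ("middle", "no",  "yes", "excellent", "yes"),
    ("middle", "no",  "yes", "excellent", "yes"),
    ("old",    "no",  "yes", "excellent", "yes"),
    ("old",    "no",  "yes", "good",      "yes"),
    ("old",    "yes", "no",  "good",      "yes"),
    ("old",    "yes", "no",  "excellent", "yes"),
    ("old",    "no",  "no",  "fair",      "no"),
]
COLS = {"Age": 0, "Job": 1, "House": 2, "Credit": 3}

def misses(col):
    """Total misses for a variable: for each of its values, predict the
    majority class and count the rows that disagree."""
    total = 0
    for value in {row[col] for row in DATA}:
        classes = Counter(row[4] for row in DATA if row[col] == value)
        total += sum(classes.values()) - max(classes.values())
    return total

for name, col in COLS.items():
    print(name, f"{misses(col)}/{len(DATA)}")
```

On this data the totals come out Age 5/15, Job 4/15, and a 3/15 tie between House and Credit, with House holding the larger pure node, matching the choice above.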
2) Build the decision tree in R following the procedure indicated in the lecture.
Answer:
I made an Excel file from the given table and imported it into RStudio.
Then I installed the C50 package and created the decision tree with it.
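The R screenshots are not reproduced here, but the two-level tree from part 1 (root on House, then Job) can be sketched as a plain function. This is an illustration of the expected tree structure, not the actual C5.0 output:

```python
def classify(house, job):
    # Root splits on House: the House == "yes" branch is pure ("yes").
    if house == "yes":
        return "yes"
    # Otherwise the decision falls through to Job.
    return "yes" if job == "yes" else "no"

print(classify("yes", "no"))  # -> yes
print(classify("no", "yes"))  # -> yes
print(classify("no", "no"))   # -> no
```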
3) Calculate the Information Gain of the parent table with respect to its children. No
need to calculate the IG of the grandchildren or great-grandchildren. Explain whether the
result obtained by the IG of the children of the root agrees or disagrees with the results
obtained in part 1 of this homework.
Answer:
First, I calculated the entropy of the parent table and of the child nodes Job
(True/False) and House (Yes/No). Since the Job = True and House = Yes nodes contain
only “yes” rows of the main table, their entropy is 0.
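As a check, the entropy calculation can be reproduced in Python. The 9 “yes” / 6 “no” split of the parent table is an assumption, taken from the standard 15-row loan dataset behind the quoted miss counts:

```python
from math import log2

def entropy(pos, neg):
    """Shannon entropy (in bits) of a two-class node with pos/neg counts."""
    total = pos + neg
    h = 0.0
    for count in (pos, neg):
        if count:
            p = count / total
            h -= p * log2(p)
    return h

# Parent table: 9 "yes" vs 6 "no" (assumed split).
print(round(entropy(9, 6), 3))       # -> 0.971
# Pure nodes (Job = True and House = Yes contain only "yes"):
print(entropy(5, 0), entropy(6, 0))  # -> 0.0 0.0
```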
Then, using the Information Gain formula, I calculated the information gain of the
parent table with respect to its children.
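The IG computation itself can be sketched in Python. The per-child (yes, no) counts below are assumptions based on the same standard 15-row dataset; House comes out highest, which is consistent with choosing it as the root in part 1:

```python
from math import log2

def entropy(counts):
    """Shannon entropy (in bits) of a node given its class counts."""
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c)

def info_gain(parent, children):
    """IG = H(parent) - weighted average entropy of the child nodes."""
    n = sum(sum(c) for c in children)
    return entropy(parent) - sum(sum(c) / n * entropy(c) for c in children)

# (yes, no) counts per child node; parent is 9 yes / 6 no (assumed).
parent = (9, 6)
print(round(info_gain(parent, [(6, 0), (3, 6)]), 3))          # House
print(round(info_gain(parent, [(5, 0), (4, 6)]), 3))          # Job
print(round(info_gain(parent, [(2, 3), (3, 2), (4, 1)]), 3))  # Age
print(round(info_gain(parent, [(1, 4), (4, 2), (4, 0)]), 3))  # Credit
```

Under these assumptions the gains are roughly House 0.420, Credit 0.363, Job 0.324, Age 0.083.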
In my opinion, the result obtained by the IG of the children of the root agrees with
part 1 of the homework. IG and entropy are closely related: the higher the information
gain of a split, the lower the weighted entropy of its children. Lower entropy means a
purer node, and the purer a node, the less mixed its outcomes are (for example, a node
with entropy 0 is 100% positive or 100% negative, while a node with entropy 1 is split
50/50 between the two classes).
In our case, the instance classified in the first exercise ends in a leaf whose entropy
is 0, so the tree predicts “NO” with certainty, which agrees with the hand-built result.