Pset 2
2. We will use the dataset below to learn a decision tree that predicts
whether people pass machine learning (True or False), based on their previous
GPA (High, Medium, or Low) and whether or not they studied (True or False).
(b) What is the conditional entropy H(Passed | GPA)?
(c) What is the conditional entropy H(Passed | Studied)?
(d) Draw the full decision tree that would be learned for this dataset. You do
not need to show any calculations.
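For parts (b) and (c), recall the conditional entropy
H(Y | X) = sum_x P(X = x) H(Y | X = x). Below is a minimal Python sketch of
that computation; the example rows at the bottom are placeholders, since the
actual dataset table is not reproduced in this copy.

from collections import Counter
from math import log2

def entropy(labels):
    # Shannon entropy (in bits) of a list of labels.
    counts = Counter(labels)
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in counts.values())

def conditional_entropy(xs, ys):
    # H(Y | X) = sum over x of P(X = x) * H(Y | X = x).
    total = len(xs)
    h = 0.0
    for x in set(xs):
        ys_given_x = [y for xi, y in zip(xs, ys) if xi == x]
        h += (len(ys_given_x) / total) * entropy(ys_given_x)
    return h

# Placeholder rows, NOT the actual pset dataset:
gpa = ["High", "High", "Medium", "Low"]
passed = [True, True, False, False]
print(conditional_entropy(gpa, passed))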
3. Using the dataset below, we want to build a decision tree that classifies Y
as T/F given the binary variables A, B, C.
A B C Y
F F F F
T F T T
T T F T
T T T F
(a) Draw the tree that would be learned by the greedy algorithm with zero
training error.
(b) Is this tree optimal (i.e., does it achieve zero training error with
minimal depth)? Explain in at most two sentences. If it is not optimal, draw
the optimal tree as well.
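A quick way to sanity-check part (a) is to compute the information gain of
each candidate root split; here is a minimal sketch, reusing the entropy and
conditional_entropy helpers from the sketch under question 2.

# Information gain of each attribute as the root split for the
# question-3 dataset (reuses entropy/conditional_entropy above).
rows = [  # (A, B, C, Y)
    ("F", "F", "F", "F"),
    ("T", "F", "T", "T"),
    ("T", "T", "F", "T"),
    ("T", "T", "T", "F"),
]
ys = [r[3] for r in rows]
for i, name in enumerate("ABC"):
    xs = [r[i] for r in rows]
    print(name, entropy(ys) - conditional_entropy(xs, ys))

The greedy algorithm takes the attribute with the largest gain at the root,
then recurses on each branch.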
4. Given the following decision tree, show how the new examples in the table
would be classified by filling in the last column of the table. If an example
cannot be classified, enter UNKNOWN in the last column.
5. Consider the problem of binary classification using the Naive Bayes
classifier. You are given two-dimensional features (X1, X2) and the
categorical class-conditional distributions in the tables below. The entries
in the tables correspond to P(X1 = x1 | Ci) and P(X2 = x2 | Ci),
respectively. The two classes (Ci : i = 1, 2) are equally likely.
Given a data point (-1, 1), calculate the following posterior probabilities:
(a) P(C1 | X1 = -1, X2 = 1)
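With equal priors, Bayes' rule plus the naive independence assumption gives
P(C1 | x1, x2) = P(x1 | C1) P(x2 | C1) / [P(x1 | C1) P(x2 | C1) +
P(x1 | C2) P(x2 | C2)]. A minimal sketch of that computation; the table
lookups at the bottom are placeholders, since the class-conditional tables
are not reproduced in this copy.

def nb_posterior_c1(p_x1_c1, p_x2_c1, p_x1_c2, p_x2_c2,
                    prior_c1=0.5, prior_c2=0.5):
    # P(C1 | x1, x2) via Bayes' rule with naive independence.
    joint_c1 = prior_c1 * p_x1_c1 * p_x2_c1
    joint_c2 = prior_c2 * p_x1_c2 * p_x2_c2
    return joint_c1 / (joint_c1 + joint_c2)

# Placeholder values for the point (-1, 1), NOT the actual table entries:
print(nb_posterior_c1(0.2, 0.4, 0.3, 0.1))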
6. Here’s a naive Bayes model with the conditional probability table and the
prior probabilities over classes shown below.
Word type       a      b      c
P(w | y = 1)  5/10   3/10   2/10
P(w | y = 0)  2/10   2/10   6/10
Priors: P(y = 1) = 8/10, P(y = 0) = 2/10
(a) True or False: the naive Bayes model is able to tell us the probability
of seeing the document w = (a, a, b, c) under the model.
(b) If True, what is the probability?
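For part (b), the model marginalizes over the class:
P(w) = sum_y P(y) * prod_i P(w_i | y), treating the document as an ordered
sequence of independent draws (no multinomial coefficient). A small sketch of
the arithmetic, using the table above:

# Marginal probability of the document w = (a, a, b, c).
p_w_given_y1 = {"a": 5/10, "b": 3/10, "c": 2/10}
p_w_given_y0 = {"a": 2/10, "b": 2/10, "c": 6/10}
prior_y1, prior_y0 = 8/10, 2/10

doc = ["a", "a", "b", "c"]
lik1 = lik0 = 1.0
for w in doc:
    lik1 *= p_w_given_y1[w]   # P(doc | y = 1)
    lik0 *= p_w_given_y0[w]   # P(doc | y = 0)
print(prior_y1 * lik1 + prior_y0 * lik0)  # P(w)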
(a) What value of k minimizes the training set error for this dataset? What
is the resulting training error?
(b) Why might using too large a value of k be bad for this dataset? Why might
too small a value of k also be bad?
(c) In Figure 4, sketch the 1-nearest neighbor decision boundary for this
dataset.
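For part (a), note that with k = 1 every training point is its own nearest
neighbor, so the training error is 0 unless identical points carry different
labels. A short sketch that sweeps k and measures training-set error; the
points below are hypothetical, since the actual dataset (Figure 4) is not
reproduced here.

import numpy as np

# Hypothetical points, NOT the Figure 4 data:
X = np.array([[0, 0], [1, 0], [0, 1], [2, 2], [3, 2]])
y = np.array([0, 0, 0, 1, 1])

def knn_train_error(X, y, k):
    errors = 0
    for i in range(len(X)):
        d = np.linalg.norm(X - X[i], axis=1)
        nearest = np.argsort(d)[:k]  # includes the point itself
        vote = np.bincount(y[nearest]).argmax()
        errors += vote != y[i]
    return errors / len(X)

for k in (1, 3, 5):
    print(k, knn_train_error(X, y, k))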
7. A KNN classifier assigns a test instance the majority class associated with
its K nearest training instances. Distance between instances is measured using
Euclidean distance. Suppose we have the following training set of positive (+)
and negative (-) instances and a single test instance (o). All instances are
projected onto a vector space of two real-valued features (X and Y). Answer
the following questions. Assume “unweighted” KNN (every nearest neighbor
contributes equally to the final vote).
Figure 4: Input distribution.
(a) What would be the class assigned to this test instance for K=1?
(b) What would be the class assigned to this test instance for K=3?
(c) What would be the class assigned to this test instance for K=5?
(d) Setting K to a large value seems like a good idea. We get more votes!
Given this particular training set, would you recommend setting K = 11?
Why or why not?
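A minimal sketch of the unweighted vote in parts (a)-(c); the coordinates and
labels are hypothetical stand-ins for the plotted training set, which is not
reproduced here.

import numpy as np
from collections import Counter

# Hypothetical training instances, NOT the actual plotted data:
train = np.array([[1, 1], [2, 2], [3, 1], [5, 5], [6, 4], [5, 6]])
labels = ["+", "+", "-", "-", "+", "-"]
test = np.array([3, 3])

dists = np.linalg.norm(train - test, axis=1)   # Euclidean distances
order = np.argsort(dists)                      # nearest first
for k in (1, 3, 5):
    votes = Counter(labels[i] for i in order[:k])
    print(k, votes.most_common(1)[0][0])       # majority class

For part (d), a useful observation: as K approaches the number of training
instances, the vote degenerates toward the global class majority regardless
of where the test point lies.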