2022 CS244 End Sem Soln
The Z value leaving an area of 0.025 in the upper tail is Z0.025 = 1.96.
Hence the 95% confidence interval is

2.6 - 1.96*(0.3/6) < mu < 2.6 + 1.96*(0.3/6),

i.e., 2.5 < mu < 2.7.
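A minimal Python check of this computation, assuming the sample mean 2.6, population standard deviation 0.3, and sample size 36 (so sqrt(n) = 6) implied by the interval above:

    import math

    x_bar, sigma, n = 2.6, 0.3, 36   # assumed sample mean, population sd, sample size
    z = 1.96                         # z value leaving an area of 0.025 in the upper tail

    half_width = z * sigma / math.sqrt(n)   # 1.96 * 0.3 / 6 ~= 0.098
    print(f"95% CI: {x_bar - half_width:.1f} < mu < {x_bar + half_width:.1f}")
    # -> 95% CI: 2.5 < mu < 2.7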
6. Naive Bayes: Suppose we are given the following dataset, where A, B, C are input binary random variables and y is a binary output whose value we want to predict. How would a naive Bayes classifier predict y given this input: A = 0, B = 0, C = 1? Assume that in case of a tie the classifier always prefers to predict 0 for y. [4]

(A) First, consider building a decision tree by greedily splitting according to information gain. (a) Which features could be at the root of the resulting tree? (b) How many edges are there in the longest path of the resulting tree?

(B) Now, consider building a decision tree with the smallest possible height. (a) Which features could be at the root of the resulting tree? (b) How many edges are there in the longest path of the resulting tree? [2+2+2+1]
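The dataset table itself did not survive extraction, so the following Python sketch only shows how a naive Bayes classifier makes this prediction for an arbitrary binary dataset; the rows in `data` are purely hypothetical placeholders, not the exam's table.

    # Hypothetical placeholder rows (A, B, C, y): replace with the actual
    # exam table before trusting the printed prediction.
    data = [
        (0, 0, 1, 0),
        (0, 1, 1, 0),
        (1, 0, 0, 1),
        (1, 1, 0, 1),
    ]

    def naive_bayes_predict(data, query):
        """Predict y for the feature tuple `query` from empirical counts."""
        scores = {}
        for y in (0, 1):
            rows = [r for r in data if r[-1] == y]
            if not rows:
                scores[y] = 0.0
                continue
            p = len(rows) / len(data)           # prior P(y)
            for j, v in enumerate(query):       # naive factorization: product of P(x_j | y)
                p *= sum(1 for r in rows if r[j] == v) / len(rows)
            scores[y] = p
        return 0 if scores[0] >= scores[1] else 1   # ties go to y = 0

    print(naive_bayes_predict(data, (0, 0, 1)))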
7. Deep Learning: Suppose you are given the predictions of n different experts (or automated learners) on whether a given email message is SPAM (1) or EMAIL (0). Your goal is to output a single prediction per message that is as accurate as possible. For this purpose, you would like to implement a majority-voting mechanism: if more than half of the experts predict SPAM, then your final prediction for that instance should be SPAM; otherwise, the final prediction should be EMAIL. (a) Suggest a neural network that implements majority voting when there are 4 experts overall (named A, B, C, D). Specify the network structure and weights. (b) Explain briefly how to adapt the network structure and weights to the general case of n experts. [3+1]
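One standard construction for part (a) is a single threshold unit with all weights equal to 1 and a bias that makes it fire only on a strict majority. The Python sketch below is one possible such network (the step activation and the exact bias value are choices, not the only valid ones); it also answers part (b), since nothing in it depends on n = 4.

    def step(z):
        return 1 if z >= 0 else 0

    def majority_vote(expert_predictions):
        """A single threshold unit implementing majority voting."""
        n = len(expert_predictions)
        weights = [1.0] * n          # one unit weight per expert
        bias = -(n / 2 + 0.5)        # fires only when more than half predict 1
        z = sum(w * x for w, x in zip(weights, expert_predictions)) + bias
        return step(z)

    # Part (a): four experts A, B, C, D
    print(majority_vote([1, 1, 1, 0]))   # 3 of 4 say SPAM -> 1
    print(majority_vote([1, 1, 0, 0]))   # not a strict majority -> 0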
8. Linear regression: We are interested here in a particular 1-dimensional linear regression problem. The dataset corresponding to this problem has n examples (x1, y1), ..., (xn, yn), where x_i and y_i are real numbers for all i. Let w* = [w0*, w1*]^T be the least squares solution; in other words, w* minimizes

J(w) = (1/n) Σ_{i=1}^{n} (y_i - w0 - w1*x_i)^2.

You can assume for our purposes here that the solution is unique. Find the value of the following expression, with justification [x̄ = (1/n) Σ_{i=1}^{n} x_i]:

(a) (1/n) Σ_{i=1}^{n} (y_i - w0* - w1*·x_i)(x_i - x̄)
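Expression (a) evaluates to 0: the normal equations at the least squares solution force the residuals to be orthogonal to both the constant feature and to x, so they are also orthogonal to (x_i - x̄). A short NumPy illustration on hypothetical random data (the question specifies no concrete dataset):

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.normal(size=50)                     # hypothetical inputs
    y = 3.0 * x + 1.0 + rng.normal(size=50)     # hypothetical targets

    X = np.column_stack([np.ones_like(x), x])   # design matrix [1, x]
    w0, w1 = np.linalg.lstsq(X, y, rcond=None)[0]

    residuals = y - w0 - w1 * x
    # Residuals are orthogonal to the constant feature and to x, so this
    # mean is 0 up to floating point error.
    print(np.mean(residuals * (x - x.mean())))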
10. PageRank: Consider the following diagram, which depicts the connectivity among 4 web pages (nodes 1-4). You need to compute the PageRank of each node. Assume the damping factor is 1. [Hint: Try to avoid the iterative method.] [4]

[Figure: directed connectivity diagram on nodes 1, 2, 3, 4]
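The connectivity diagram did not survive extraction, so the adjacency matrix in the sketch below is a hypothetical 4-node example, not the exam's graph. It shows the non-iterative route the hint points at: with damping factor 1, the rank vector satisfies M r = r, so it is the eigenvector of the column-stochastic link matrix for eigenvalue 1.

    import numpy as np

    # Hypothetical adjacency: adj[i][j] = 1 means page i+1 links to page j+1.
    adj = np.array([
        [0, 1, 1, 0],
        [0, 0, 1, 0],
        [1, 0, 0, 1],
        [0, 1, 0, 0],
    ], dtype=float)

    # Column-stochastic link matrix: column j spreads page j's rank equally
    # over its out-links.
    M = (adj / adj.sum(axis=1, keepdims=True)).T

    # Rank vector = eigenvector for the eigenvalue closest to 1, normalized
    # to sum to 1. No power iteration needed.
    vals, vecs = np.linalg.eig(M)
    r = np.real(vecs[:, np.argmin(np.abs(vals - 1.0))])
    print(r / r.sum())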
11. SVM: Suppose we only have four training examples in two dimensions, as follows: P1 = (0, 0), P2 = (2, 2), P3 = (h, 1), P4 = (0, 3), where 0 ≤ h ≤ 3. The positive examples are P1 and P2, and the other two points are negative examples. (a) How large can h ≥ 0 be so that the training points are still linearly separable? (b) What is the margin achieved by the maximum-margin boundary, as a function of h? (c) Assume that we can only observe the second component of the input vectors. Without the other component, the labeled training points reduce to (0,+), (2,+), (1,−), and (3,−). What is the lowest order p of a polynomial kernel that would allow us to correctly classify these points? [2+2+1]
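For part (c), a degree-p polynomial decision function in one variable can change sign at most p times, and the labels (0,+), (1,−), (2,+), (3,−) alternate three times along the line, so no degree below 3 can work. A quick empirical check of that argument using scikit-learn (assuming the library is available; this illustrates the reasoning, it is not the exam's intended derivation):

    import numpy as np
    from sklearn.svm import SVC

    # 1-D projections from part (c): (0,+), (2,+), (1,-), (3,-)
    X = np.array([[0.0], [2.0], [1.0], [3.0]])
    y = np.array([1, 1, -1, -1])

    # Large C approximates a hard margin; coef0 = 1 gives the full
    # polynomial feature map (1, x, ..., x^p).
    for p in (1, 2, 3):
        clf = SVC(kernel="poly", degree=p, coef0=1.0, C=1e6).fit(X, y)
        print(p, bool((clf.predict(X) == y).all()))   # True first at p = 3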
… and c2 = 13.0. (a) What are the initial cluster assignments? (That is, which examples are in cluster c1 and which examples are in cluster c2?) (b) What are the new cluster centers after making the assignments in (a)? (c) State True or False: K-Means clustering is guaranteed to converge. [1+1+1]
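The start of this problem was lost in extraction; only the second initial center, c2 = 13.0, survives. The Python sketch below runs one k-means assignment and update step on hypothetical 1-D examples and a hypothetical c1, just to show the mechanics behind (a) and (b).

    # Only "c2 = 13.0" survives from the statement; c1 and the examples
    # below are hypothetical placeholders.
    examples = [3.0, 5.0, 6.0, 12.0, 14.0]
    c1, c2 = 5.0, 13.0

    # (a) Initial assignments: each example joins its nearest center.
    cluster1 = [x for x in examples if abs(x - c1) <= abs(x - c2)]
    cluster2 = [x for x in examples if abs(x - c1) > abs(x - c2)]

    # (b) New centers: the mean of each cluster.
    print(cluster1, sum(cluster1) / len(cluster1))   # [3.0, 5.0, 6.0] ~4.67
    print(cluster2, sum(cluster2) / len(cluster2))   # [12.0, 14.0] 13.0

Part (c) is a standard fact: each assignment and update step can only decrease the k-means objective, and there are finitely many possible partitions, so the algorithm is guaranteed to converge (True), though only to a local optimum.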
[Figures: two graphs on nodes 1-4, captioned "Fig: SC" and "Fig: LA"]