
Birla Institute of Technology & Science, Pilani

Work-Integrated Learning Programmes Division


Second Semester 2020-2021

Mid-Semester Test
(EC-2 Regular)

SOLUTION

Course No.       : SS ZG568
Course Title     : APPLIED MACHINE LEARNING
Nature of Exam   : Open Book
Weightage        : 30%
Duration         : 2 Hours
Date of Exam     : Saturday, 06/03/2021 (AN)
No. of Pages     =
No. of Questions =
Note:
1. Please follow all the Instructions to Candidates given on the cover page of the answer book.
2. All parts of a question should be answered consecutively. Each answer should start from a fresh page.
3. Assumptions made if any, should be stated clearly at the beginning of your answer.

Q.1 Set. (A) [1 + 1.5 + 1.5 + 1 + 1 = 6]

(a) When is mini-batch learning preferred over batch learning?
    Answer: When the dataset is large. It works particularly well when GPUs are available.
(b) What is the difference between instance-based and model-based learning? Which one is faster
to train? Which one is faster during inferencing?
    Answer: Instance-based learning stores the training examples and predicts by comparing a new
    input with them; model-based learning fits the parameters of a model and predicts from the
    model alone. Instance-based learning is faster to train. Model-based learning is faster
    during inferencing.
(c) A machine learning algorithm has 3 hyperparameters with h1, h2, h3 number of different
values. How many models do you need to build if you are doing 5-fold cross validation to
choose the best model?
    Answer: 5 × h1 × h2 × h3
(d) When do you use stratified random sampling over simple random sampling?
    Answer: When the training data has a heavy-tailed distribution, so that a simple random
    sample may under-represent some strata.
(e) What is one advantage of using the One-Vs-One strategy in multiclass classification?
    Answer: Each classifier needs to be trained only on the data belonging to the two classes
    it separates.
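
The count in (c) can be checked with a short sketch. The function name and the example grid sizes (3, 4 and 2 values) are illustrative assumptions, not part of the question; the count covers only the cross-validation fits and ignores any final refit on the full training set.

```python
def count_cv_fits(grid_sizes, n_folds=5):
    """Number of model fits for an exhaustive grid search with k-fold CV."""
    n_combinations = 1
    for size in grid_sizes:
        n_combinations *= size  # every combination of hyperparameter values
    return n_folds * n_combinations  # each combination is fitted once per fold


# Hypothetical grid: h1 = 3, h2 = 4, h3 = 2 candidate values.
example_fits = count_cv_fits([3, 4, 2], n_folds=5)
print(example_fits)  # 120
```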

Q.1 Set. (B) [1 + 1.5 + 1.5 + 1 + 1 = 6]


(a) When is mini batch learning preferred over batch learning?
(b) What is the difference between instance-based vs. model based learning? Which one is faster
to train? Which one is faster during inferencing?
(c) A machine learning algorithm has 3 hyperparameters with h1, h2, h3 number of different
values. How many models do you need to build if you are doing 5 fold cross validation to
choose the best model?
(d) When do you use stratified random sampling over simple random sampling?
(e) What is one advantage of using One-Vs-One strategy in multiclass classification?

Q.1 Set. (C) [1 + 1.5 + 1.5 + 1 + 1 = 6]


(a) When is mini batch learning preferred over batch learning?
(b) What is the difference between instance-based vs. model based learning? Which one is faster
to train? Which one is faster during inferencing?
(c) A machine learning algorithm has 3 hyperparameters with h1, h2, h3 number of different
values. How many models do you need to build if you are doing 5 fold cross validation to
choose the best model?
(d) When do you use stratified random sampling over simple random sampling?
(e) What is one advantage of using One-Vs-One strategy in multiclass classification?

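Q.2 Set. (A) You are training a logistic regression classifier to classify the following data:

Input x    Class
-0.5       1
 0.5       0

The logistic regression function is given by y = 1/(1 + exp(-w0 - w1*x)). Assume that initially, at t=0,
(w0, w1) = (0, 0). What will be the values of w0 and w1 after one iteration with learning rate = 1?
[3 + 3 = 6]

y = 1/(1 + exp(-w0 - w1*x))
Initially, at t=0, (w0, w1) = (0, 0), so y = 0.5 for both inputs.
Assuming batch gradient descent on the cross-entropy loss averaged over the two samples:
dL/dw0 = [(0.5 - 1) + (0.5 - 0)]/2 = 0
dL/dw1 = [(0.5 - 1)(-0.5) + (0.5 - 0)(0.5)]/2 = 0.25
With learning rate = 1, w0 (t=1) = 0 and w1 (t=1) = -0.25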
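
The update can be reproduced with a minimal sketch. It assumes one batch gradient-descent step on the cross-entropy loss averaged over the two samples, which is the convention that yields the stated w1 = -0.25 (summing instead of averaging would give -0.5). The function names are illustrative.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def gradient_step(w0, w1, data, lr=1.0):
    """One batch gradient-descent step on the mean cross-entropy loss."""
    m = len(data)
    # Gradient of the mean loss: average of (prediction - label) times the input.
    g0 = sum(sigmoid(w0 + w1 * x) - y for x, y in data) / m
    g1 = sum((sigmoid(w0 + w1 * x) - y) * x for x, y in data) / m
    return w0 - lr * g0, w1 - lr * g1


data = [(-0.5, 1), (0.5, 0)]  # (input x, class) pairs from the question
w0, w1 = gradient_step(0.0, 0.0, data, lr=1.0)
print(w0, w1)  # 0.0 -0.25
```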

Q.2 Set. (B) Design one logistic regression based classifier which will give the best accuracy on the
training data described by the following dataset. Class Y=1 if output of classifier > 0.5, Y=0 if
output of classifier < 0.5. Note that multiple solutions are possible. [4 + 2 = 6]

Input x1   Input x2   Output Class Y
 0          0         0
 0          1         0
 1          0         0
-1          0         0
 0         -1         0
 1          1         1
 1         -1         1
-1          1         1
-1         -1         1

(a) What is the logistic regression equation (specify all parameters) for the best possible classifier
(zero classification error) with least chance of overfitting?
h(x1,x2) = 1/(1 + exp(w0 - w1*x1^2 - w2*x2^2))
w0, w1, w2 need to be chosen such that all points with output class Y=0 are inside the ellipse
given by w1*x1^2 + w2*x2^2 = w0 and all points with Y=1 are outside it. w1 = w2 = 1 and
1.0 < w0 < 2 will achieve zero classification error.
(b) What is the equation of the corresponding decision surface? Draw the decision surface. Note:
classifier output = 0.5 for points on the decision surface.
w1*x1^2 + w2*x2^2 = w0 is the corresponding decision surface. It is elliptical in shape and
centered at (0,0); with w1 = w2 = 1 it is a circle of radius sqrt(w0).
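
As a quick check of the classifier above, the sketch below evaluates h on the nine training points. The value w0 = 1.5 is one arbitrary choice inside the allowed range 1 < w0 < 2, and the variable names are illustrative.

```python
import math


def h(x1, x2, w0=1.5, w1=1.0, w2=1.0):
    """Logistic output on quadratic features, as in the solution to part (a)."""
    return 1.0 / (1.0 + math.exp(w0 - w1 * x1 ** 2 - w2 * x2 ** 2))


# ((x1, x2), Y) rows from the question's table.
dataset = [
    ((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((-1, 0), 0), ((0, -1), 0),
    ((1, 1), 1), ((1, -1), 1), ((-1, 1), 1), ((-1, -1), 1),
]

predictions = [int(h(x1, x2) > 0.5) for (x1, x2), _ in dataset]
errors = sum(p != y for p, (_, y) in zip(predictions, dataset))
print(errors)  # 0
```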

Q.2 Set. (C) You are training a logistic regression classifier to classify the following data:

Input x    Class
-0.5       1
 0.5       0

The logistic regression function is given by y = 1/(1 + exp(-w0 - w1*x)). Assume that initially, at t=0,
(w0, w1) = (0, 0). What will be the values of w0 and w1 after one iteration with learning rate = 1?
[3 + 3 = 6]

y = 1/(1 + exp(-w0 - w1*x))
Initially, at t=0, (w0, w1) = (0, 0)
With learning rate = 1, w0 (t=1) = 0, w1 (t=1) = -0.25

Q.3 Set. (A) First five documents in the following figure are used to train a Naive Bayes classifier.
Calculate Prob( + | Test ), Prob( - | Test ). If needed, use Laplace smoothing. Which class
does the Test document belong to? [6]

Q.3 Set. (B) First five documents in the following figure are used to train a Naive Bayes classifier.
Calculate Prob( + | Test ), Prob( - | Test ). If needed, use Laplace smoothing. Which class
does the Test document belong to? [6]

Q.3 Set. (C) The dataset given below contains 10 training samples for a binary classification problem
with attributes color, type and origin of the car, and the class label theft assigned as yes/no.
Predict the probability of theft of a Test record = <Red, Domestic, SUV car>. If needed, use
Laplace smoothing. [6]

P(theft=yes) = 0.5, P(theft=no) = 0.5

P(yes|test) = P(test|yes) P(yes)/P(test)
            = P(Red|yes) P(Domestic|yes) P(SUV|yes) P(yes)/P(test)
            = (3/5)(2/5)(1/5)(0.5)/P(test) = 0.024/P(test)
P(no|test)  = P(Red|no) P(Domestic|no) P(SUV|no) P(no)/P(test)
            = (2/5)(3/5)(3/5)(0.5)/P(test) = 0.072/P(test)

None of the counts is zero, so Laplace smoothing is not needed. Since P(no|test) > P(yes|test),
theft = no for the test record. Normalizing with P(test) = 0.024 + 0.072 = 0.096 gives
P(yes|test) = 0.25 and P(no|test) = 0.75.
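
The posterior can be normalized explicitly with a small sketch. It uses only the conditional probabilities quoted in the solution (the underlying 10-row table is not reproduced here), and the dictionary layout and function name are illustrative assumptions.

```python
from fractions import Fraction as F

# Priors and per-attribute likelihoods (Red, Domestic, SUV) as quoted in the solution.
prior = {"yes": F(1, 2), "no": F(1, 2)}
likelihood = {
    "yes": [F(3, 5), F(2, 5), F(1, 5)],
    "no": [F(2, 5), F(3, 5), F(3, 5)],
}


def posterior(prior, likelihood):
    """Naive Bayes posterior: normalized product of prior and likelihoods."""
    joint = {}
    for label in prior:
        p = prior[label]
        for term in likelihood[label]:
            p *= term
        joint[label] = p
    evidence = sum(joint.values())  # P(test), summed over both classes
    return {label: joint[label] / evidence for label in joint}


post = posterior(prior, likelihood)
print(post["yes"], post["no"])  # 1/4 3/4
```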

Q.4 Set. (A) Consider the input-output pairs <x,y> of the training data given as <1,1>, <1,2>, <2,2>,
<3,3>, <5,3>, <7,8>, <6,4>, <7,5>, <6,7> and <4,4>. Let h(x) = w*x be the hypothesis in
one parameter w (lines passing through the origin) used for line fitting for the given data,
where w is to be taken as the slope of the line represented by the hypothesis. [2 + 4 = 6]

(a) What is the equation of sum of squared error or loss function E(w)?

(b) What is the shape of E(w)? Calculate the optimal w=w_optimal and minimum value
of total squared error E(w_optimal).

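E(w) = sum_i (y_i - w*x_i)^2 = 226*w^2 - 408*w + 197

At the global minimum, dE/dw = 452*w - 408 = 0, so w_optimal = 204/226 ≈ 0.90 and
E(w_optimal) = 197 - 204^2/226 ≈ 12.86.

E(w) is an upward-opening parabola in w with vertex (0.90, 12.86).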
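
The vertex can be verified numerically with a short sketch. It uses the closed form for a least-squares line through the origin, w = sum(x*y)/sum(x^2); the variable names are illustrative.

```python
# (x, y) training pairs from the question.
data = [(1, 1), (1, 2), (2, 2), (3, 3), (5, 3), (7, 8), (6, 4), (7, 5), (6, 7), (4, 4)]

sxx = sum(x * x for x, _ in data)  # coefficient of w^2 in E(w)
sxy = sum(x * y for x, y in data)  # half the magnitude of the coefficient of w
syy = sum(y * y for _, y in data)  # constant term of E(w)


def E(w):
    """Sum of squared errors for the hypothesis h(x) = w*x."""
    return sum((y - w * x) ** 2 for x, y in data)


w_opt = sxy / sxx  # root of dE/dw = 2*sxx*w - 2*sxy
print(sxx, sxy, syy)  # 226 204 197
print(round(w_opt, 2), round(E(w_opt), 2))  # 0.9 12.86
```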


Q.4 Set. (B) [4 + 2 = 6]

(a) Using regression, determine a, b in y = a × 10^(bx) for the table given below.
Check Q4 SetB.pdf
(b) Find the optimal sum of squares error.

Input x    Output y
0          1.4
2          16
4          160
6          1400
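
The worked solution for this set lives in the referenced PDF and is not reproduced here. As an independent sketch, one common approach is to take log10 of both sides, log10(y) = log10(a) + b*x, and fit a straight line by ordinary least squares. This linearization, and measuring the error in log space, are assumptions and may differ from the referenced solution; the function name is illustrative.

```python
import math

xs = [0, 2, 4, 6]
ys = [1.4, 16, 160, 1400]


def fit_exponential_base10(xs, ys):
    """Fit y = a * 10**(b*x) by least squares on log10(y)."""
    logs = [math.log10(y) for y in ys]
    n = len(xs)
    x_mean = sum(xs) / n
    l_mean = sum(logs) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (l - l_mean) for x, l in zip(xs, logs))
    b = sxy / sxx                    # slope of the fitted line
    a = 10 ** (l_mean - b * x_mean)  # intercept mapped back from log space
    return a, b


a, b = fit_exponential_base10(xs, ys)
# Sum of squared errors measured in log10 space (an assumption about part (b)).
sse_log = sum((math.log10(y) - (math.log10(a) + b * x)) ** 2 for x, y in zip(xs, ys))
print(round(a, 3), round(b, 3))  # 1.497 0.5
```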

Q.4 Set. (C) Consider the following dataset with 4 records. [4 + 2 = 6]

Input x Output y
1 exp(2)
2 exp(4)
3 exp(6.3)
4 exp(9.2)

Assume output y = exp(α x). Using linear regression,

(a) Find the best value of α.

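Taking logarithms, ln(y_i) = α x_i, so the squared error in log space is
J(α) = (α - 2)^2 + (2α - 4)^2 + (3α - 6.3)^2 + (4α - 9.2)^2

Taking the derivative of J(α) w.r.t. α and equating it to 0,

(α - 2) + 2(2α - 4) + 3(3α - 6.3) + 4(4α - 9.2) = 0
or, 30α = 65.7, so α = 65.7/30 = 2.19 ≈ 2.2

(b) Find the optimal total sum of squared errors.

With α ≈ 2.2, the total squared error on the original outputs is
(exp(2.2) - exp(2))^2 + (exp(4.4) - exp(4))^2 + (exp(6.6) - exp(6.3))^2 + (exp(8.8) - exp(9.2))^2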
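
The derivation can be checked with a brief sketch. It solves the same normal equation in log space and then evaluates the squared error both in log space and on the original outputs. Which of the two part (b) intends is an assumption; the solution's expression uses the original outputs with α rounded to 2.2, whereas the sketch keeps the unrounded α.

```python
import math

xs = [1, 2, 3, 4]
log_ys = [2, 4, 6.3, 9.2]  # ln(y_i), read off the table

# Normal equation for a line through the origin: alpha = sum(x*ln y) / sum(x^2).
alpha = sum(x * l for x, l in zip(xs, log_ys)) / sum(x * x for x in xs)

# Squared error in log space, i.e. J(alpha).
sse_log = sum((alpha * x - l) ** 2 for x, l in zip(xs, log_ys))

# Squared error on the original outputs y = exp(ln y).
sse_original = sum((math.exp(alpha * x) - math.exp(l)) ** 2 for x, l in zip(xs, log_ys))

print(round(alpha, 2))  # 2.19
print(round(sse_log, 3))  # 0.447
```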

Q.5 Set. (A) [3 + 1 + 1 + 1 = 6]

(a) Find the equation of the maximum margin SVM classifier for the OR logic gate with the following
truth table.

Input x1   Input x2   Output y
0          0          0
0          1          1
1          0          1
1          1          1

x1 + x2 = 0.5 is the equation of the maximum margin classifier.

(b) Show the support vectors.
(0,0) for y=0; (1,0) and (0,1) for y=1.
(c) Draw the training data points and the maximum margin decision line.
(d) What will be the output for (x1,x2)=(0.4,0.33)? y=1, since 0.4 + 0.33 = 0.73 > 0.5.
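
The answers for the OR gate can be checked with a small sketch that measures each training point's distance to the line x1 + x2 = 0.5 and treats the closest points as support vectors. The helper names are illustrative, and the sketch verifies the stated line rather than solving the SVM optimization.

```python
import math

# OR truth table: (x1, x2) -> y
points = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 1}
w = (1.0, 1.0)  # normal vector of the line x1 + x2 = 0.5
b = -0.5


def signed_distance(p):
    """Signed Euclidean distance from point p to the decision line."""
    return (w[0] * p[0] + w[1] * p[1] + b) / math.hypot(*w)


def predict(p):
    return int(signed_distance(p) > 0)


margin = min(abs(signed_distance(p)) for p in points)
support_vectors = sorted(
    p for p in points if math.isclose(abs(signed_distance(p)), margin)
)
print(support_vectors)  # [(0, 0), (0, 1), (1, 0)]
print(predict((0.4, 0.33)))  # 1
```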

Q.5 Set. (B) [3 + 1 + 1 + 1 = 6]

(a) Find the equation of the maximum margin SVM classifier for the AND logic gate with the following
truth table.

Input x1   Input x2   Output y
0          0          0
0          1          0
1          0          0
1          1          1

x1 + x2 = 1.5 is the equation of the maximum margin classifier.

(b) Show the support vectors.
(1,0) and (0,1) for y=0; (1,1) for y=1.
(c) Draw the training data points with output label and the maximum margin decision line.
(d) What will be the output for (x1,x2)=(0.4,0.67)? y=0, since 0.4 + 0.67 = 1.07 < 1.5.

Q.5 Set. (C) [3 + 1 + 1 + 1 = 6]

(a) Find the equation of the hyperplane using the linear Support Vector Machine method. Positive class
data points: (x1,x2) = {(3, 2), (4, 3), (2, 3), (3, -1)}. Negative class data points: {(1, 0), (1, -1),
(0, 2), (-1, 2)}.

(b) Show the support vectors.
(2,3) and (3,-1) for the +ve class, (1,0) for the -ve class. The blue lines pass through the
support vectors; the red lines do not. Note: the margin for the blue lines is larger than for the
red lines.
(c) Draw the training data points with output label and the maximum margin decision line.
(d) What will be the output for (x1,x2)=(1.0,1.0)? Negative class, since 4*1 + 1*1 = 5 < 7.5, i.e. the
point lies on the negative side of the decision line 4*x1 + x2 = 7.5.
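
A sketch of a consistency check for the decision line 4*x1 + x2 = 7.5 implied by part (d): dividing the equation by 3.5 puts it in the canonical SVM form, where every training point should satisfy y*(w.x + b) >= 1 with equality at the support vectors. This confirms feasibility and identifies the support vectors; it does not by itself prove that the margin is maximal. The scaling factor and variable names are assumptions made for illustration.

```python
import math
from fractions import Fraction as F

positive = [(3, 2), (4, 3), (2, 3), (3, -1)]
negative = [(1, 0), (1, -1), (0, 2), (-1, 2)]

# 4*x1 + x2 - 7.5 = 0 divided by 3.5, kept exact with fractions.
w = (F(8, 7), F(2, 7))
b = F(-15, 7)


def score(p):
    return w[0] * p[0] + w[1] * p[1] + b


labelled = [(p, 1) for p in positive] + [(p, -1) for p in negative]
functional_margins = {p: y * score(p) for p, y in labelled}

feasible = all(m >= 1 for m in functional_margins.values())
support_vectors = sorted(p for p, m in functional_margins.items() if m == 1)
margin_width = 2 / math.hypot(float(w[0]), float(w[1]))  # distance between the two margin lines

print(feasible, support_vectors)  # True [(1, 0), (2, 3), (3, -1)]
```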

*******
