
Birla Institute of Technology & Science, Pilani

Work-Integrated Learning Programmes Division

Second Semester 2020-2021

Mid-Semester Test
(EC-2 Regular)


Course No. : SS ZG568

Nature of Exam : Open Book
Weightage : 30% No. of Pages =
Duration : 2 Hours No. of Questions =
Date of Exam : Saturday, 06/03/2021 (AN)
1. Please follow all the Instructions to Candidates given on the cover page of the answer book.
2. All parts of a question should be answered consecutively. Each answer should start from a fresh page.
3. Assumptions made if any, should be stated clearly at the beginning of your answer.

Q.1Set. (A) [1 + 1.5 + 1.5 + 1 + 1 = 6]

(a) When is mini batch learning preferred over batch learning?
When the dataset is large. It works particularly well when GPUs are available.
(b) What is the difference between instance-based and model-based learning? Which one is faster
to train? Which one is faster during inferencing?
Instance-based learning stores the training examples and predicts by comparing new inputs with
them, so it is faster to train. Model-based learning fits a parameterised model to the data, so it
is faster during inferencing.
(c) A machine learning algorithm has 3 hyperparameters with h1, h2, h3 number of different
values. How many models do you need to build if you are doing 5 fold cross validation to
choose the best model?
5*h1*h2*h3 (one model per fold for each of the h1*h2*h3 hyperparameter combinations).
(d) When do you use stratified random sampling over simple random sampling?
When the training data has a heavy-tailed distribution.
(e) What is one advantage of using One-Vs-One strategy in multiclass classification?
Each classifier needs to be trained only on the data belonging to two classes.
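
The count in part (c) can be illustrated with a short sketch. The hyperparameter values below are hypothetical and chosen only so that h1 = 2, h2 = 3 and h3 = 4; they are not part of the question.

```python
from itertools import product

# Hypothetical candidate values (assumption, not from the exam):
# h1 = 2, h2 = 3, h3 = 4 different values respectively.
grid = {
    "h1_values": [0.01, 0.1],
    "h2_values": [1, 2, 3],
    "h3_values": ["a", "b", "c", "d"],
}
n_folds = 5

# Every combination of hyperparameter values is one candidate configuration.
combinations = list(product(*grid.values()))

# Each configuration is trained once per fold.
models_to_train = n_folds * len(combinations)

print(len(combinations), models_to_train)
```

With these assumed values there are 2*3*4 = 24 configurations, and 5-fold cross validation trains each of them five times.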

Q.1Set. (B) [1 + 1.5 + 1.5 + 1 + 1 = 6]

(a) When is mini batch learning preferred over batch learning?
(b) What is the difference between instance-based vs. model based learning? Which one is faster
to train? Which one is faster during inferencing?
(c) A machine learning algorithm has 3 hyperparameters with h1, h2, h3 number of different
values. How many models do you need to build if you are doing 5 fold cross validation to
choose the best model?
(d) When do you use stratified random sampling over simple random sampling?
(e) What is one advantage of using One-Vs-One strategy in multiclass classification?

Q.1 Set. (C) [1 + 1.5 + 1.5 + 1 + 1 = 6]

(a) When is mini batch learning preferred over batch learning?
(b) What is the difference between instance-based vs. model based learning? Which one is faster
to train? Which one is faster during inferencing?
(c) A machine learning algorithm has 3 hyperparameters with h1, h2, h3 number of different
values. How many models do you need to build if you are doing 5 fold cross validation to
choose the best model?
(d) When do you use stratified random sampling over simple random sampling?
(e) What is one advantage of using One-Vs-One strategy in multiclass classification?

Q.2Set. (A) You are training a logistic regression classifier to classify the following data
Input x Class
-0.5 1
0.5 0
Logistic regression function is given by y = 1/(1 + exp(-w0 - w1*x)). Assume initially at t=0,
(w0, w1) = (0,0). What will be the values of w0 and w1 after one iteration with learning rate = 1?
[3 + 3 = 6]
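y = 1/(1 + exp(-w0 - w1*x))
Initially, at t=0, (w0, w1) = (0,0), so y = 0.5 for both training points.
The stated result is consistent with one gradient descent step on the cross-entropy loss averaged
over the two samples:
dE/dw0 = (1/2)[(0.5 - 1) + (0.5 - 0)] = 0
dE/dw1 = (1/2)[(0.5 - 1)(-0.5) + (0.5 - 0)(0.5)] = 0.25
With learning rate = 1, w0 (t=1) = 0, w1 (t=1) = -0.25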
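
The update can be reproduced with a minimal sketch. It assumes batch gradient descent on the cross-entropy loss averaged over the two samples, which is the choice that reproduces the stated values; the question itself does not name the loss. The function names are illustrative.

```python
import math

def sigmoid(z):
    """Logistic function 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

def gradient_step(w0, w1, data, lr):
    """One batch gradient descent step on the mean cross-entropy loss."""
    n = len(data)
    # Gradient w.r.t. w0 is the mean of (prediction - target).
    g0 = sum(sigmoid(w0 + w1 * x) - t for x, t in data) / n
    # Gradient w.r.t. w1 is the mean of (prediction - target) * x.
    g1 = sum((sigmoid(w0 + w1 * x) - t) * x for x, t in data) / n
    return w0 - lr * g0, w1 - lr * g1

# Training data from the question: (input x, class).
data = [(-0.5, 1), (0.5, 0)]

w0, w1 = gradient_step(0.0, 0.0, data, lr=1.0)
print(w0, w1)
```

If the gradient were summed rather than averaged, the same step would give w1 = -0.5, so the averaging assumption matters here.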

Q.2Set. (B) Design one logistic regression based classifier which will give the best accuracy on training
data described by the following dataset. Class Y=1 if output of classifier >0.5, Y=0 if output
of classifier < 0.5. Note multiple solutions are possible. [4 + 2 = 6]

Input x1 Input x2 Output class
0 0 0
0 1 0
1 0 0
-1 0 0
0 -1 0
1 1 1
1 -1 1
-1 1 1
-1 -1 1

(a) What is the logistic regression equation (specify all parameters) for the best possible classifier
(zero classification error) with least chance of overfitting?
Using the squared inputs as features, y = 1/(1 + exp(-(w1*x1^2 + w2*x2^2 - w0))).
w0, w1, w2 need to be chosen such that all points with output class Y=0 are inside the ellipse
given by w1*x1^2 + w2*x2^2 = w0. w1 = w2 = 1 and 1.0 < w0 < 2 will achieve zero classification
error.
(b) What is the equation of the corresponding decision surface? Draw the decision surface. Note:
classifier output = 0.5 for points on the decision surface.
w1*x1^2 + w2*x2^2 = w0 is the corresponding decision surface. It is elliptical in shape and
centered at (0,0); with w1 = w2 = 1 it is a circle of radius sqrt(w0).
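
As a quick check, the sketch below evaluates such a classifier on the nine training points. It assumes w1 = w2 = 1 and picks w0 = 1.5, one arbitrary value in the allowed range; the function name `predict` is illustrative.

```python
import math

# Training data from the question: ((x1, x2), class).
points = [
    ((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((-1, 0), 0), ((0, -1), 0),
    ((1, 1), 1), ((1, -1), 1), ((-1, 1), 1), ((-1, -1), 1),
]

def predict(x1, x2, w0=1.5, w1=1.0, w2=1.0):
    """Logistic regression on squared features; class 1 if output > 0.5."""
    z = w1 * x1 ** 2 + w2 * x2 ** 2 - w0
    p = 1.0 / (1.0 + math.exp(-z))
    return 1 if p > 0.5 else 0

# Count misclassified training points.
errors = sum(predict(x1, x2) != label for (x1, x2), label in points)
print(errors)
```

Every class-0 point has x1^2 + x2^2 of at most 1 and every class-1 point has x1^2 + x2^2 = 2, so any w0 strictly between 1 and 2 separates them.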

Q.2Set. (C) You are training a logistic regression classifier to classify the following data
Input x Class
-0.5 1
0.5 0

Logistic regression function is given by y = 1/(1 + exp(-w0 - w1*x)). Assume initially at t=0,
(w0, w1) = (0,0). What will be the values of w0 and w1 after one iteration with learning rate = 1?
[3 + 3 = 6]
y = 1/(1+exp(−w0−w1x))
Initially t =0 (w0,w1) = (0,0)
With learning rate = 1, w0 (t=1) = 0, w1 (t=1) = -0.25

Q.3Set. (A) First five documents in the following figure are used to train a Naive Bayes classifier.
Calculate Prob( + | Test ), Prob( - | Test ). If needed, use Laplace smoothing. Which class
does the Test document belong to? [6]
Q.3Set. (B) First five documents in the following figure are used to train a Naive Bayes classifier.
Calculate Prob( + | Test ), Prob( - | Test ). If needed, use Laplace smoothing. Which class
does the Test document belong to? [6]

Q.3Set. (C) The dataset given below contains 10 training samples for a binary classification problem
with attributes color, type, origin of the car and the class label theft assigned as yes/no.
Predict the probability of theft of a Test record = <Red, Domestic, SUV car>. If needed, use
Laplace smoothing. [6]
P(theft=Y)=0.5 P(theft=N)=0.5
P(yes|test) = P(test|yes)P(yes)/P(test) = P(red|yes)P(Domestic|yes)P(SUV|yes)P(yes)/P(test)
P(no|test) = P(test|no)P(no)/P(test) = P(red|no)P(Domestic|no)P(SUV|no)P(no)/P(test)

So, theft =no for the test record

Q.4Set. (A) Consider the input output pairs <x,y> of the training data given as <1,1>, <1,2>, <2,2>,
<3,3>, <5, 3>, <7, 8>, <6,4>, <7,5>, <6,7> and <4,4> . Let h(x) = w*x be the hypothesis in
one parameter w (lines passing through the origin) used for line fitting for the given data,
where w is to be taken as the slope of the line represented by the hypothesis. [2 + 4 = 6]

(a) What is the equation of sum of squared error or loss function E(w)?

(b) What is the shape of E(w)? Calculate the optimal w=w_optimal and minimum value
of total squared error E(w_optimal).

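E(w) = sum_i (y_i - w*x_i)^2 = 226*w^2 - 408*w + 197

At the global minimum, dE/dw = 452*w - 408 = 0, so w_optimal = 204/226 ≈ 0.90 and
E(w_optimal) ≈ 12.86.

E(w) is an upward-opening parabola with vertex (0.90, 12.86).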
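
The vertex can be checked numerically with a short sketch. It assumes E(w) is the plain sum of squared errors with no 1/2 factor, which is the form that matches the quoted minimum.

```python
# Training pairs <x, y> from the question.
xs = [1, 1, 2, 3, 5, 7, 6, 7, 6, 4]
ys = [1, 2, 2, 3, 3, 8, 4, 5, 7, 4]

# E(w) = sum (y - w*x)^2 = sxx*w^2 - 2*sxy*w + syy
sxx = sum(x * x for x in xs)
sxy = sum(x * y for x, y in zip(xs, ys))
syy = sum(y * y for y in ys)

# Setting dE/dw = 2*sxx*w - 2*sxy = 0 gives the optimal slope.
w_optimal = sxy / sxx

# Minimum total squared error, evaluated directly from the residuals.
e_min = sum((y - w_optimal * x) ** 2 for x, y in zip(xs, ys))

print(sxx, sxy, syy, round(w_optimal, 2), round(e_min, 2))
```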

Q.4Set. (B) [4 + 2 = 6]

(a) Using regression, determine a, b in y = a × 10^(bx) for the table given below.
Check Q4 SetB.pdf
(b) Find the optimal sum of squares error.

Input x Output y
0 1.4
2 16
4 160
6 1400
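
One way to approach this is to linearise the model by taking log10 of both sides, log10(y) = log10(a) + b*x, and fit a straight line. The sketch below follows that assumption and reports the squared error in log space; the referenced solution file may measure the error differently.

```python
import math

# Table from the question.
xs = [0, 2, 4, 6]
ys = [1.4, 16, 160, 1400]

# Linearise: log10(y) = log10(a) + b * x
logs = [math.log10(y) for y in ys]
n = len(xs)
x_mean = sum(xs) / n
l_mean = sum(logs) / n

# Ordinary least squares slope and intercept.
b = sum((x - x_mean) * (l - l_mean) for x, l in zip(xs, logs)) / sum(
    (x - x_mean) ** 2 for x in xs
)
log_a = l_mean - b * x_mean
a = 10 ** log_a

# Sum of squared errors measured on log10(y) (an assumption).
sse_log = sum((l - (log_a + b * x)) ** 2 for x, l in zip(xs, logs))

print(round(a, 3), round(b, 3), round(sse_log, 5))
```

Under this linearisation the slope comes out as b = 0.5 and a is close to 1.5.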

Q.4Set. (C) Consider the following dataset with 4 records. [4 + 2 = 6]

Input x Output y
1 exp(2)
2 exp(4)
3 exp(6.3)
4 exp(9.2)

Assume output y = exp(α x). Using linear regression,

(a) Find the best value of α.

ln(y_i) = α x_i
J(α) = (α - 2)^2 + (2α - 4)^2 + (3α - 6.3)^2 + (4α - 9.2)^2

Taking derivative of J(α) w.r.t. α and equating to 0,

(α - 2) + 2(2α - 4) + 3(3α - 6.3) + 4(4α - 9.2) = 0
Or, 30α = 65.7, so α = 65.7/30 = 2.19 ≈ 2.2

(b) Find the optimal total sum of square error.
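
A minimal sketch of both parts, assuming the squared error is measured on ln(y) as in J(α) from part (a):

```python
# Dataset from the question, with outputs given as ln(y).
xs = [1, 2, 3, 4]
log_ys = [2, 4, 6.3, 9.2]

# Least squares through the origin: alpha = sum(x * ln y) / sum(x^2)
alpha = sum(x * l for x, l in zip(xs, log_ys)) / sum(x * x for x in xs)

# Total squared error J(alpha) at the optimum.
sse = sum((l - alpha * x) ** 2 for x, l in zip(xs, log_ys))

print(round(alpha, 2), round(sse, 3))
```

With α = 2.19 the four squared residuals are 0.0361, 0.1444, 0.0729 and 0.1936, which sum to 0.447.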


Q.5Set. (A) [3 + 1 + 1 + 1 = 6]

(a) Find the equation of maximum margin SVM classifier for the OR logic gate with following
truth table

Input x1 Input x2 Output y

0 0 0
0 1 1
1 0 1
1 1 1

x1+x2=0.5 is the equation of maximum margin classifier.

(b) Show the support vectors.
(0,0) for y=0; (1,0) (0,1) for y=1
(c) Draw the training data points and the maximum margin decision line.
(d) What will be the output for (x1,x2)=(0.4,0.33)? y=1, since 0.4+0.33 > 0.5
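
The stated line and support vectors can be checked with a small sketch. It assumes the class labels are mapped to -1/+1 and computes the geometric margin of each training point; the helper names are illustrative and no SVM library is used.

```python
import math

def geometric_margins(w, b, samples):
    """Signed distance of each sample from the line w.x + b = 0."""
    norm = math.hypot(*w)
    return [s * (w[0] * x1 + w[1] * x2 + b) / norm for (x1, x2), s in samples]

# OR gate with output 0 mapped to -1 and output 1 mapped to +1.
or_gate = [((0, 0), -1), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]

# Candidate decision line x1 + x2 = 0.5, written as w.x + b = 0.
w, b = (1.0, 1.0), -0.5

margins = geometric_margins(w, b, or_gate)
min_margin = min(margins)

# Support vectors are the points that attain the smallest margin.
support_vectors = [
    x for (x, _), m in zip(or_gate, margins) if math.isclose(m, min_margin)
]

def classify(x1, x2):
    """Return the gate output (0 or 1) for a new point."""
    return 1 if w[0] * x1 + w[1] * x2 + b > 0 else 0

print(support_vectors, classify(0.4, 0.33))
```

All margins are positive, so the line separates the two classes, and the three closest points lie at the same distance on either side of it.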

Q.5Set. (B) [3 + 1 + 1 + 1 = 6]
(a) Find the equation of maximum margin SVM classifier for the AND logic gate with following
truth table

Input x1 Input x2 Output y

0 0 0
0 1 0
1 0 0
1 1 1

x1+x2=1.5 is the equation of maximum margin classifier.

(b) Show the support vectors
(1,0) (0,1) for y=0, (1,1) for y=1
(c) Draw the training data points with output label and the maximum margin decision line.
(d) What will be the output for (x1,x2)=(0.4,0.67)? y=0, since 0.4+0.67< 1.5

Q.5Set. (C) [3 + 1 + 1 + 1 = 6]

(a) Find the equation for hyper plane using linear Support Vector Machine method. Positive Class
data points: (x1,x2)={(3, 2), (4, 3), (2, 3), (3, -1)} Negative Class data points: {(1, 0), (1, -1),
(0, 2), (-1, 2)}

(b) Show the support vectors

(2,3) and (3,-1) for the positive class, (1,0) for the negative class. The blue lines pass through
the SVs. The red lines do not pass through the SVs. Note: margin for blue lines > margin for red lines.
(c) Draw the training data points with output label and the maximum margin decision line.
(d) What will be the output for (x1,x2)=(1.0,1.0)? Negative class, since 4*1 + 1*1 = 5 < 7.5
(the decision line is 4*x1 + x2 = 7.5).
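
The decision line implied by part (d) can be reconstructed from the support vectors listed in part (b). The sketch below takes those support vectors as given rather than solving the SVM optimisation problem: the two positive support vectors fix the direction of the margin line, and the decision line lies midway between it and the parallel line through the negative support vector.

```python
import math

positives = [(3, 2), (4, 3), (2, 3), (3, -1)]
negatives = [(1, 0), (1, -1), (0, 2), (-1, 2)]

# Support vectors taken from the answer to part (b).
p1, p2 = (2, 3), (3, -1)
n1 = (1, 0)

# Normal vector perpendicular to the line through the two positive SVs.
dx, dy = p2[0] - p1[0], p2[1] - p1[1]
w = (-dy, dx)

# Offsets of the two margin lines w.x = c and of the midline.
c_pos = w[0] * p1[0] + w[1] * p1[1]
c_neg = w[0] * n1[0] + w[1] * n1[1]
c_mid = (c_pos + c_neg) / 2

def score(point):
    """Positive for the positive class, negative for the negative class."""
    return w[0] * point[0] + w[1] * point[1] - c_mid

margin_width = (c_pos - c_neg) / math.hypot(*w)

print(w, c_mid, score((1.0, 1.0)), round(margin_width, 3))
```

This gives the line 4*x1 + x2 = 7.5, and the test point (1.0, 1.0) has a negative score.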

