Machine Learning PYQ 2022 Ans


Machine Learning Exam Solution - May 2022

Serial No of QP: 1262


UPC: 32347607
Course: B. Sc (H) Computer Science, VI Semester
Date of exam: 25th May 2022

Section A (5 marks)
Question 1(i)

Differences between Supervised and Unsupervised learning are:

Supervised Learning | Unsupervised Learning
Supervised learning algorithms are trained using labelled data. | Unsupervised learning algorithms are trained using unlabelled data.
Supervised learning model predicts the output. | Unsupervised learning model finds the hidden patterns in data.
The goal of supervised learning is to train the model so that it can predict the output when it is given new data. | The goal of unsupervised learning is to find the hidden patterns and useful insights from the unknown dataset.
Supervised learning can be categorized into Classification and Regression problems. | Unsupervised learning can be classified into Clustering and Association problems.
It includes various algorithms such as Linear Regression, Logistic Regression, Support Vector Machine, Multi-class Classification, Decision tree, Bayesian Logic, etc. | It includes various algorithms such as K-means clustering and the Apriori algorithm.

Question 1 (ii) (5 marks)

Concept learning is defined as inferring a boolean-valued function from training examples of its
input and output. It describes the process by which experience allows us to partition objects in
the world into classes for the purpose of generalization, discrimination, and inference. Models of
concept learning have adopted one of three contrasting views concerning category
representation. Concept learning can be viewed as the task of searching through a large space of
hypotheses implicitly defined by the hypothesis representation. The goal of this search is to find
the hypothesis that best fits the training examples.
Consider the example task of learning the target concept "days on which Aldo enjoys his favorite
water sport." Each example day is represented by a set of attributes, and one of these
attributes, say EnjoySport, indicates whether or not Aldo enjoys his favorite water sport on that
day. The task is to learn to predict the value of EnjoySport for an arbitrary day, based
on the values of its other attributes.
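
The search through the hypothesis space can be sketched with the Find-S procedure, which is one simple way (not the only one) to find a maximally specific hypothesis consistent with the positive examples. The six attribute values per day used below are assumed from the usual textbook version of the EnjoySport example; they are not given in this answer.

```python
def find_s(examples):
    """Return the most specific conjunctive hypothesis covering all positive examples.

    Each example is (attribute_tuple, label); "?" means "any value is acceptable".
    """
    hypothesis = None
    for attributes, label in examples:
        if not label:
            continue  # Find-S ignores negative examples
        if hypothesis is None:
            hypothesis = list(attributes)  # start from the first positive example
        else:
            # Generalize every attribute that disagrees with this positive example.
            hypothesis = [h if h == a else "?" for h, a in zip(hypothesis, attributes)]
    return hypothesis

# Assumed training days: (Sky, AirTemp, Humidity, Wind, Water, Forecast), EnjoySport
days = [
    (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"), True),
    (("Sunny", "Warm", "High", "Strong", "Warm", "Same"), True),
    (("Rainy", "Cold", "High", "Strong", "Warm", "Change"), False),
    (("Sunny", "Warm", "High", "Strong", "Cool", "Change"), True),
]

learned = find_s(days)
print(learned)
```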

Question 1 (iii) (5 marks - 4 for y and 1 for applying sigmoid. )

Question 1 (iv) (Difference 3 marks)

Overfitting | Underfitting
Training data is modeled too well. | Training data is not modeled well, and the model does not generalize to new data.
Occurs when the model fits the training data so closely that it starts learning from the noise and inaccurate entries in the data set. The model then does not categorize new data correctly, because of too many details and noise. | Cannot capture the underlying trend of the data.
High variance and low bias | High bias and low variance

How does overfitting and underfitting affect model generalization? (2 marks)

Generalization describes a model's ability to react to new data. That is, after being trained on a
training set, a model can digest new data and make accurate predictions. If a model has been
fitted too closely to the training data, it will be unable to generalize. It will make inaccurate
predictions when given new data, making the model useless even though it is able to make
accurate predictions for the training data (overfitting). The inverse is also true. Underfitting
happens when a model has not learned enough from the data. An underfitted model is just as
useless, because it cannot make accurate predictions even on
the training data.

Question 1(v) 5 marks

We can construct a new feature from two Boolean or categorical features by forming their
Cartesian product. For example, if we have one feature Shape with values Circle, Triangle and
Square, and another feature Color with values Red, Green and Blue, then their Cartesian product
would be the feature (Shape,Color) with values (Circle,Red), (Circle,Green), (Circle,Blue),
(Triangle,Red), and so on. The effect that this would have depends on the model being trained.
Constructing Cartesian product features for a naive Bayes classifier means that the two original
features are no longer treated as independent, and so this reduces the strong bias that naive Bayes
models have. This is not the case for tree models, which can already distinguish between all
possible pairs of feature values. On the other hand, a newly introduced Cartesian product feature
may incur a high information gain, so it can possibly affect the model learned.
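
As a minimal sketch of the construction described above, the combined feature can be enumerated with `itertools.product`. The `combine` helper and the two sample instances are hypothetical and only illustrate the transformation.

```python
from itertools import product

shapes = ["Circle", "Triangle", "Square"]
colors = ["Red", "Green", "Blue"]

# All 3 x 3 = 9 values the constructed (Shape, Color) feature can take.
shape_color_values = list(product(shapes, colors))

def combine(shape, color):
    """Return the value of the constructed feature for one instance."""
    return (shape, color)

# Hypothetical instances, used only to show the mapping.
instances = [("Circle", "Red"), ("Square", "Blue")]
combined = [combine(shape, color) for shape, color in instances]

print(len(shape_color_values))
print(combined)
```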

Question 1(vi) 5 marks (2 for formula + 3 for calculation). Give marks whichever logarithm base is taken.
For log base 2, the answer is 2.

A7



Question 2
i)
XW = Y
XT.XW = XTY
W = inv(XT.X)XT.Y

X =
[ 1  1  9 ]
[ 1  2  1 ]
[ 1  3  2 ]
[ 1  4  3 ]
[ 1  5  4 ]

XT =
[ 1  1  1  1  1 ]
[ 1  2  3  4  5 ]
[ 9  1  2  3  4 ]

XT X =
[  5  15   19 ]
[ 15  55   49 ]
[ 19  49  111 ]

inv(XT X) =
[  926/405  -367/810  -31/162 ]
[ -367/810    97/810    2/81  ]
[  -31/162     2/81     5/162 ]

XT Y =
[  69 ]
[ 228 ]
[ 341 ]

W = inv(XT X) XT Y =
[ -8743/810 ]
[  3613/810 ]
[   239/81  ]

( 2 marks for formula + 3 for calculating answer except inverse calculation)
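
The normal-equation steps can be checked numerically with NumPy. This is a sketch that takes X and the product XT Y as read from the matrices in this answer (the vector Y itself is not shown here) and solves the linear system instead of forming the inverse explicitly.

```python
import numpy as np

# Design matrix as it appears in the answer: a column of ones, then two features.
X = np.array([
    [1, 1, 9],
    [1, 2, 1],
    [1, 3, 2],
    [1, 4, 3],
    [1, 5, 4],
], dtype=float)

XtX = X.T @ X
XtY = np.array([69.0, 228.0, 341.0])  # XT Y as given in the answer

# Solving (XT X) W = XT Y is equivalent to W = inv(XT X) XT Y.
W = np.linalg.solve(XtX, XtY)

print(XtX)
print(W)
```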


Correct steps: 5 marks, and 1 mark for the final answer

3 (i) ( 2 marks)
Bayes' theorem for conditional probability (stated mathematically) is as follows:
P(A|B) = (P(B|A) P(A)) / P(B)
Where:
● A and B must be different events
● P(B) ≠ 0
● P(A|B) is a conditional probability: the probability of event A occurring given that B
is true (posterior probability)
● P(B|A) is a conditional probability: the probability of event B occurring given that A
is true
● P(A), P(B): the probabilities of observing A and B respectively without any given
conditions (marginal probability or prior probability)
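
A small numeric illustration of the formula follows. All probabilities in it are made-up values chosen for the example, and P(B) is obtained with the law of total probability.

```python
def posterior(p_a, p_b_given_a, p_b_given_not_a):
    """Return P(A|B) from the prior P(A) and the two likelihoods."""
    # Law of total probability: P(B) = P(B|A) P(A) + P(B|not A) P(not A)
    p_b = p_b_given_a * p_a + p_b_given_not_a * (1.0 - p_a)
    if p_b == 0:
        raise ValueError("P(B) must be non-zero")
    return p_b_given_a * p_a / p_b

# Hypothetical numbers: P(A) = 0.01, P(B|A) = 0.9, P(B|not A) = 0.05
p_a_given_b = posterior(0.01, 0.9, 0.05)
print(round(p_a_given_b, 4))
```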
6 marks for model (prior probability) and 2 marks for final probabilities

4 (i) (4 marks)

Standard / Batch Gradient Descent | Stochastic Gradient Descent
It uses the entire training set to compute the gradient when searching for the optimal solution. | It takes one random training example, updates the parameters, and then moves on to the next random example.
It sums over all the training examples to run a single iteration. | It saves time and computing space while still looking for the optimal solution.
 | The dataset is properly shuffled to avoid pre-existing orders, then partitioned into m examples.
Formula: | Shuffle (reorder) training examples:
Disadvantage: Since it uses the entire dataset, for a large sample size (millions of samples) it will result in high computation time and space. | Disadvantage: Since it takes and iterates one example at a time, it tends to result in more noise than desired.
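
The difference between the two update rules can be sketched for linear regression with squared error. The toy data, learning rates and epoch counts below are assumptions picked so that both variants converge; they are not part of the exam answer.

```python
import numpy as np

def batch_gd(X, y, lr=0.5, epochs=2000):
    """Batch gradient descent: every update uses the gradient over all examples."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        grad = X.T @ (X @ w - y) / len(y)
        w -= lr * grad
    return w

def sgd(X, y, lr=0.1, epochs=500, seed=0):
    """Stochastic gradient descent: shuffle, then update on one example at a time."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for i in rng.permutation(len(y)):  # reshuffle the examples each epoch
            grad = X[i] * (X[i] @ w - y[i])
            w -= lr * grad
    return w

# Noise-free toy data generated from y = 1 + 2x (illustrative only).
x = np.linspace(0.0, 1.0, 20)
X = np.column_stack([np.ones_like(x), x])
y = 1.0 + 2.0 * x

w_batch = batch_gd(X, y)
w_sgd = sgd(X, y)
print(w_batch, w_sgd)
```

Each batch update touches all 20 rows, while each stochastic update touches a single row, which is where the savings in time and memory come from.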


In this question, if the student writes:

Field (4 +ve, 4 -ve)
    IT (4: 2 +ve, 2 -ve)    Business (4: 2 +ve, 2 -ve)

Experience (4 +ve, 4 -ve)
    Coding (4: 2 +ve, 2 -ve)    Administration (4: 2 +ve, 2 -ve)

Splitting on any attribute gives the same information, so any attribute could be
chosen.

then 6 marks should be awarded.
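
The claim that both splits give the same information can be checked with a short entropy calculation. This sketch uses only the class counts listed above; the function names are chosen for illustration.

```python
from math import log2

def entropy(pos, neg):
    """Binary entropy (in bits) of a node with pos positive and neg negative examples."""
    total = pos + neg
    h = 0.0
    for count in (pos, neg):
        if count:
            p = count / total
            h -= p * log2(p)
    return h

def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the child nodes."""
    total = sum(parent)
    weighted = sum((p + n) / total * entropy(p, n) for p, n in children)
    return entropy(*parent) - weighted

# Counts from the marking note: each attribute splits (4+, 4-) into two (2+, 2-) nodes.
gain_field = information_gain((4, 4), [(2, 2), (2, 2)])
gain_experience = information_gain((4, 4), [(2, 2), (2, 2)])
print(gain_field, gain_experience)
```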




Question 5. (10 marks)

Input to h1 = 0.05 * 0.15 + 0.10 * 0.25 + 0.35 = 0.3825


Input to h2 = 0.05 * 0.2 + 0.1 * 0.3 + 0.35 = 0.39

Output(h1) = sigmoid(0.3825) = 0.594 ( 2 marks)

Output(h2) = sigmoid(0.39) = 0.596 (2 marks)

Input to o1 = 0.594* 0.40 +0.596* 0.50 +0.6 = 1.1356


Input to o2 = 0.594* 0.45 + 0.596* 0.55 + 0.6 = 1.1951

Output(o1) = sigmoid(1.1356) = 0.7568 (2 marks)


Output(o2) = sigmoid(1.1951) = 0.7676 (2 marks)

Error = (½)*[(0.11-0.7568)*(0.11-0.7568)+(0.99-0.7676)*(0.99-0.7676)] = 0.234


( 2 marks)
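
The forward pass above can be reproduced in a few lines. The variable names are labels chosen for illustration, and the code carries full precision through each step, so the later values may differ from the hand calculation in the fourth decimal place.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Inputs, biases and targets as used in the worked answer.
i1, i2 = 0.05, 0.10
b_hidden, b_output = 0.35, 0.60
target_o1, target_o2 = 0.11, 0.99

# Hidden layer
net_h1 = i1 * 0.15 + i2 * 0.25 + b_hidden
net_h2 = i1 * 0.20 + i2 * 0.30 + b_hidden
out_h1, out_h2 = sigmoid(net_h1), sigmoid(net_h2)

# Output layer
net_o1 = out_h1 * 0.40 + out_h2 * 0.50 + b_output
net_o2 = out_h1 * 0.45 + out_h2 * 0.55 + b_output
out_o1, out_o2 = sigmoid(net_o1), sigmoid(net_o2)

# Half the sum of squared errors over the two outputs
error = 0.5 * ((target_o1 - out_o1) ** 2 + (target_o2 - out_o2) ** 2)
print(out_h1, out_h2, out_o1, out_o2, error)
```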

Question 6
(4 marks)
(i) K-NN is a classification machine learning algorithm while K-means is a clustering
machine learning algorithm.
(ii) K-NN is a lazy learner while K-means is an eager learner. An eager learner fits a
model in a training step, whereas a lazy learner does not have a training
phase.
(iii) K-NN is a supervised machine learning algorithm while K-means is an unsupervised
machine learning algorithm.
(iv) K-NN performs much better if all of the features have the same scale. Note that K-means is
also distance-based, so in practice it benefits from feature scaling as well.

Give marks if students have written algorithms for knn and k-means
(ii) Steps used by PCA ( 6 marks)

1. Standardize the dataset X so that mean is zero. Let the standardized data
be X’
2. Calculate the covariance matrix C from X’
3. Find the eigenvectors and eigenvalues of the covariance matrix
4. Sort the columns of the eigenvector matrix V and eigenvalue matrix D in order of
decreasing eigenvalue.
5. Select a subset of the eigenvectors as basis vectors W
6. Project the data onto the new basis X_new = X’W
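
The six steps can be sketched with NumPy as follows. The function name `pca`, the random example data and the choice of k = 2 are assumptions for illustration; step 1 is implemented as mean-centering only, matching the "mean is zero" wording.

```python
import numpy as np

def pca(X, k):
    """Project X (n_samples x n_features) onto its top-k principal components."""
    X_centered = X - X.mean(axis=0)                 # step 1: zero-mean data X'
    C = np.cov(X_centered, rowvar=False)            # step 2: covariance matrix
    eigvals, eigvecs = np.linalg.eigh(C)            # step 3: eigen-decomposition (C is symmetric)
    order = np.argsort(eigvals)[::-1]               # step 4: sort by decreasing eigenvalue
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    W = eigvecs[:, :k]                              # step 5: keep the first k eigenvectors
    return X_centered @ W, eigvals                  # step 6: X_new = X'W

# Illustrative data: 50 samples, 3 features with different spreads.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3)) * np.array([5.0, 2.0, 0.5])

X_new, eigvals = pca(X, k=2)
print(X_new.shape, eigvals)
```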

Question 7 (4 marks)

(i)

x | y | x-xmean | y-ymean | (x-xmean)*(y-ymean) | (x-xmean)*(x-xmean)
3 | 1 | -3 | -4.4 | 13.2 | 9
9 | 8 | 3 | 2.6 | 7.8 | 9
11 | 11 | 5 | 5.6 | 28 | 25
5 | 4 | -1 | -1.4 | 1.4 | 1
2 | 3 | -4 | -2.4 | 9.6 | 16
Mean = 6 | Mean = 5.4 | | | Sum = 60 | Sum = 60

b1 = 60/60 = 1

b0 = 5.4 - 1*6 = -0.6
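
The same least-squares calculation can be written out directly from the x and y columns of the table. This is a plain-Python sketch of the formulas used above.

```python
xs = [3, 9, 11, 5, 2]
ys = [1, 8, 11, 4, 3]

x_mean = sum(xs) / len(xs)
y_mean = sum(ys) / len(ys)

# Slope: sum of (x - xmean)(y - ymean) divided by sum of (x - xmean)^2
s_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
s_xx = sum((x - x_mean) ** 2 for x in xs)

b1 = s_xy / s_xx
b0 = y_mean - b1 * x_mean  # intercept
print(b1, b0)
```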

3 marks for correct steps and 1 for correct output


(ii)

Confusion Matrix (4 marks)

n = 1000 | Predicted No | Predicted Yes
Actual No | 780 (TN) | 120 (FP)
Actual Yes | 40 (FN) | 60 (TP)

Precision = TP/(TP+FP) = 60/180 = 1/3 (1 mark)

Recall = TP/(TP+FN) = 60/100 = 0.6 (1 mark)
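
The two metrics follow directly from the four cells of the confusion matrix, as this short sketch shows.

```python
# Counts taken from the confusion matrix above.
tn, fp, fn, tp = 780, 120, 40, 60

precision = tp / (tp + fp)  # share of predicted "Yes" that are truly "Yes"
recall = tp / (tp + fn)     # share of actual "Yes" that were found

print(precision, recall)
```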

Question 8

(i) Regularization is a technique to reduce overfitting by keeping the model from fitting
the given training set too closely. It adds a “penalty” on large coefficient values to the
training objective, which subsequently limits the amount of variance in the model.
(2 marks)

If lambda is zero, the penalty has no effect on the model and the model might be overfitted.
(1 mark)

If lambda tends to infinity, the penalty drives all the coefficients to zero. Hence the
resulting model will be underfitted. (1 mark)
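
The effect of lambda can be illustrated with ridge (L2-penalized) linear regression in closed form. Ridge is an assumption here, since the answer does not name a specific penalty, and the synthetic data and the name `ridge_fit` are for illustration only.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge solution: solve (XT X + lam * I) w = XT y."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)

# Synthetic data (assumed): 30 samples, 3 features, small noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + 0.1 * rng.normal(size=30)

w_no_penalty = ridge_fit(X, y, 0.0)    # lambda = 0: ordinary least squares
w_moderate = ridge_fit(X, y, 10.0)     # coefficients shrink towards zero
w_huge = ridge_fit(X, y, 1e9)          # very large lambda: coefficients are almost zero

print(w_no_penalty, w_moderate, w_huge)
```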

(ii) (4 marks for explanation and diagram , 2 marks for mathematical formulation )

A support vector machine is a machine learning model that learns to separate
two different classes when a set of labelled data is provided to the algorithm as the
training set. The main task of the SVM is to find a hyperplane that is able
to distinguish between the two classes.

There can be many hyperplanes that can do this task, but the objective is to find the
hyperplane with the largest margin, that is, the maximum distance between the hyperplane
and the nearest points of the two classes, so that a new data point that arrives in future
can be classified reliably.
Mathematical formulation: Given a set of (xi, yi), i = 1 to n, the objective is to choose a plane
that maximizes the margin between the +ve and -ve samples.
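
One common way to write this objective is the hard-margin formulation below. It assumes labels y_i in {+1, -1}, linearly separable data, and a separating plane w^T x + b = 0; the symbols w and b are introduced here and do not appear in the answer above.

```latex
% Separating hyperplane: w^\top x + b = 0, with margin width 2 / \lVert w \rVert.
% Maximizing the margin is equivalent to minimizing \lVert w \rVert^2.
\begin{aligned}
\min_{w,\,b}\quad & \tfrac{1}{2}\,\lVert w \rVert^{2} \\
\text{subject to}\quad & y_i \left( w^{\top} x_i + b \right) \ge 1, \qquad i = 1, \dots, n
\end{aligned}
```

The training points that satisfy the constraint with equality are the support vectors.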
