Machine Learning PYQ 2022 Ans
Section A (5 marks)
Question 1(i)
Concept learning is defined as inferring a Boolean-valued function from training examples of its input and output. It describes the process by which experience allows us to partition objects in the world into classes for the purposes of generalization, discrimination, and inference. Models of concept learning have adopted one of three contrasting views concerning category representation. Concept learning can be viewed as the task of searching through a large space of hypotheses implicitly defined by the hypothesis representation. The goal of this search is to find the hypothesis that best fits the training examples.
Consider the example task of learning the target concept "days on which Aldo enjoys his favorite watersport." Given a set of example days, each represented by a set of attributes, one of the attributes, say EnjoySport, can indicate whether or not Aldo enjoys his favorite water sport on a particular day. The task is to learn to predict the value of EnjoySport for an arbitrary day, based on the values of its other attributes.
| Overfitting | Underfitting |
| --- | --- |
| Training data is modeled well. | Training data is not modeled well, and there is no generalization to new data. |
| Occurs when a model is trained on so much data that it starts learning from the noise and inaccurate entries in the data set; the model then fails to categorize data correctly because of too many details and noise. | Cannot capture the underlying trend of the data. |
| High variance and low bias. | High bias and low variance. |
Generalization describes a model's ability to react to new data. That is, after being trained on a training set, a model can digest new data and make accurate predictions. If a model has been trained too well on the training data, it will be unable to generalize: it will make inaccurate predictions when given new data, making the model useless even though it makes accurate predictions on the training data (overfitting). The inverse is also true. Underfitting happens when a model has not been trained enough on the data; an underfit model is just as useless, since it cannot make accurate predictions even on the training data.
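The contrast can be demonstrated with polynomial fits on toy data (five noisy points along y = x; the data below is hypothetical, chosen only to illustrate the idea):

```python
import numpy as np

# Five noisy points along the line y = x (hypothetical toy data).
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([0.1, 1.2, 1.9, 3.2, 3.9])

# Degree-4 polynomial: enough parameters to interpolate all 5 points (overfit risk).
p_over = np.polyfit(x, y, 4)
# Degree-1 polynomial: a simple line, cannot fit the noise exactly.
p_under = np.polyfit(x, y, 1)

train_err_over = np.sum((np.polyval(p_over, x) - y) ** 2)
train_err_under = np.sum((np.polyval(p_under, x) - y) ** 2)
print(train_err_over, train_err_under)  # the flexible model fits the training data better
print(np.polyval(p_over, 5.0), np.polyval(p_under, 5.0))  # predictions at a new point x = 5
```

The degree-4 fit drives training error to essentially zero by also fitting the noise, while the line keeps a small residual; which one predicts better at the unseen point x = 5 is exactly the generalization question.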
We can construct a new feature from two Boolean or categorical features by forming their
Cartesian product. For example, if we have one feature Shape with values Circle, Triangle and
Square, and another feature Color with values Red, Green and Blue, then their Cartesian product
would be the feature (Shape,Color) with values (Circle,Red), (Circle,Green), (Circle,Blue),
(Triangle,Red), and so on. The effect that this would have depends on the model being trained.
Constructing Cartesian product features for a naive Bayes classifier means that the two original
features are no longer treated as independent, and so this reduces the strong bias that naive Bayes
models have. This is not the case for tree models, which can already distinguish between all
possible pairs of feature values. On the other hand, a newly introduced Cartesian product feature
may incur a high information gain, so it can possibly affect the model learned.
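As a small illustration, the combined (Shape, Color) feature from the example above can be enumerated in Python:

```python
from itertools import product

shapes = ["Circle", "Triangle", "Square"]
colors = ["Red", "Green", "Blue"]

# All 3 x 3 = 9 values of the Cartesian-product feature (Shape, Color).
combined_values = list(product(shapes, colors))
print(len(combined_values))  # 9

# Mapping two observed examples onto the combined feature (toy examples):
examples = [("Circle", "Red"), ("Triangle", "Green")]
combined = [f"({s},{c})" for s, c in examples]
print(combined)  # ['(Circle,Red)', '(Triangle,Green)']
```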
Question 1(vi) 5 marks (2 for formula + 3 for calculation). Give marks if any base is taken.
For log base 2, the answer is 2.



Question 2
i)
XW = Y
XT X W = XT Y
W = inv(XT X) XT Y

X =
1 1 9
1 2 1
1 3 2
1 4 3
1 5 4

XT =
1 1 1 1 1
1 2 3 4 5
9 1 2 3 4

XT X =
 5 15  19
15 55  49
19 49 111
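The computation above can be checked with a short NumPy sketch (the target vector Y is not reproduced in this answer, so the final solve is left as a comment):

```python
import numpy as np

# Design matrix X as given: a column of ones plus two feature columns.
X = np.array([[1, 1, 9],
              [1, 2, 1],
              [1, 3, 2],
              [1, 4, 3],
              [1, 5, 4]], dtype=float)

print(X.T @ X)  # [[  5.  15.  19.] [ 15.  55.  49.] [ 19.  49. 111.]]

# With a target vector Y, the normal equations give:
# W = np.linalg.inv(X.T @ X) @ X.T @ Y
```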
| Batch gradient descent | Stochastic gradient descent |
| --- | --- |
| Uses the entire dataset (training set) to compute the gradient at each step while searching for the optimal solution. | Takes one random training example, updates the parameters, then moves on to the next random example. |
| Takes the sum over the whole training set to run a single iteration. | Saves time and computing space while still searching for a good solution. |
| Disadvantage: since it uses the entire dataset, a large sample size (millions of samples) results in high computation time and space. | Disadvantage: since it iterates one example at a time, the updates tend to be noisier than desired. |
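The two update schemes can be sketched on a toy one-parameter regression (data, learning rate, and function names are illustrative):

```python
import numpy as np

# Toy data for y = 2x (a single weight, no bias term).
x = np.array([1.0, 2.0, 3.0, 4.0])
y = 2.0 * x

def batch_gd(w=0.0, lr=0.01, epochs=100):
    # One update per epoch, using the gradient summed over ALL examples.
    for _ in range(epochs):
        grad = np.sum(2 * (w * x - y) * x)
        w -= lr * grad
    return w

def sgd(w=0.0, lr=0.01, epochs=100, seed=0):
    # One update per example, visiting examples in random order.
    rng = np.random.default_rng(seed)
    for _ in range(epochs):
        for i in rng.permutation(len(x)):
            grad = 2 * (w * x[i] - y[i]) * x[i]
            w -= lr * grad
    return w

print(batch_gd())  # approaches 2.0
print(sgd())       # also approaches 2.0, via noisier per-example updates
```

Both converge to w = 2 here; on a large dataset the batch version would pay the full sum on every single update, which is the disadvantage noted above.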


In this question, give marks if the student writes:
Field (4 +ve, 4 -ve)
IT (4: 2 +ve, 2 -ve); Business (4: 2 +ve, 2 -ve)
Experience (4 +ve, 4 -ve)
Coding (4: 2 +ve, 2 -ve); Administration (4: 2 +ve, 2 -ve)
Question 6
(4 marks)
(i) K-NN is a classification machine learning algorithm while K-means is a clustering
machine learning algorithm.
(ii) K-NN is a lazy learner while K-means is an eager learner. An eager learner has a
model-fitting (training) step, but a lazy learner does not have a training phase.
(iii) K-NN is a supervised machine learning algorithm while K-means is an unsupervised
machine learning algorithm.
(iv) K-NN performs much better if all of the data have the same scale, but this is not true
for K-means.
Give marks if students have written algorithms for knn and k-means
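The contrast above can be illustrated with a minimal NumPy sketch (toy 1-D data and function names are illustrative, not from the question paper):

```python
import numpy as np

# Toy 1-D data: two well-separated groups.
X = np.array([[1.0], [1.5], [8.0], [8.5]])
y = np.array([0, 0, 1, 1])  # labels exist -> supervised (k-NN uses them)

def knn_predict(x, X, y, k=1):
    """Lazy learner: no training step, just search the stored data."""
    dists = np.linalg.norm(X - x, axis=1)
    nearest = np.argsort(dists)[:k]
    return np.bincount(y[nearest]).argmax()

print(knn_predict(np.array([2.0]), X, y))  # -> 0 (nearest neighbors are the left group)

def kmeans(X, k=2, iters=10):
    """Eager learner: fits centroids up front; the labels y are never used."""
    centroids = X[:k].copy()
    for _ in range(iters):
        labels = np.argmin(np.linalg.norm(X[:, None] - centroids[None], axis=2), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = X[labels == j].mean(axis=0)
    return centroids

print(np.sort(kmeans(X).ravel()))  # centroids settle near 1.25 and 8.25
```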
(ii) Steps used by PCA ( 6 marks)
1. Standardize the dataset X so that mean is zero. Let the standardized data
be X’
2. Calculate the covariance matrix C from X’
3. Find the eigenvectors and eigenvalues of the covariance matrix
4. Sort the columns of the eigenvector matrix V and eigenvalue matrix D in order of
decreasing eigenvalue.
5. Select a subset of the eigenvectors as basis vectors W
6. Project the data onto the new basis X_new = X’W
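The six steps can be sketched in NumPy (toy data; note that step 1 here only centers the columns to zero mean, without unit-variance scaling):

```python
import numpy as np

# Toy data matrix: 6 samples, 2 features (illustrative values).
X = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9],
              [1.9, 2.2], [3.1, 3.0], [2.3, 2.7]])

# 1. Center the data so each column has zero mean (X').
Xc = X - X.mean(axis=0)

# 2. Covariance matrix C of the centered data.
C = np.cov(Xc, rowvar=False)

# 3-4. Eigenvectors and eigenvalues, sorted by decreasing eigenvalue.
eigvals, eigvecs = np.linalg.eigh(C)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# 5. Keep the top principal component as the new basis W.
W = eigvecs[:, :1]

# 6. Project the centered data onto the new basis.
X_new = Xc @ W
print(X_new.shape)  # (6, 1)
```

The variance of the projected data equals the largest eigenvalue, which is why sorting by decreasing eigenvalue selects the most informative directions first.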
Question 7 (4 marks)
(i)
b1 = 60/60 = 1
Question 8
(i) Regularization is a technique to reduce errors by fitting the function appropriately
on the given training set and avoiding overfitting. It applies a "penalty" to the input
parameters with the larger coefficients, which subsequently limits the amount of
variance in the model. (2 marks)
If lambda is zero, the penalty has no effect on the model, and the model might be
overfitted. (1 mark)
If lambda is infinity, the penalty forces all the coefficients to zero, so the resulting
model will be underfitted. (1 mark)
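The effect of lambda can be illustrated with closed-form ridge regression, w = inv(XᵀX + λI)Xᵀy, the standard L2-regularized least-squares solution (the toy data below is hypothetical):

```python
import numpy as np

# Toy regression data: y = 2x exactly, with an intercept column of ones.
X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0]])
y = np.array([2.0, 4.0, 6.0, 8.0])

def ridge(X, y, lam):
    # Closed-form ridge regression: w = inv(X'X + lam*I) X'y
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)

print(ridge(X, y, 0.0))  # lambda = 0: ordinary least squares, w = [0, 2]
print(ridge(X, y, 1e6))  # huge lambda shrinks all coefficients toward zero
```

With lambda = 0 the penalty vanishes and we recover the unregularized fit; as lambda grows without bound the coefficients are driven to zero, matching the underfitting case described above.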
(ii) (4 marks for explanation and diagram, 2 marks for mathematical formulation)
There can be many hyperplanes that separate the two classes, but the objective is to find
the hyperplane with the highest margin, i.e., the maximum distance between the two
classes, so that a new data point that arrives in the future and is to be classified can
be classified easily.
Mathematical formulation: Given a set of samples (xi, yi), i = 1 to n, with yi in {+1, -1},
the objective is to choose w and b that maximize the margin 2/||w|| between the +ve and
-ve samples (equivalently, minimize ||w||^2 / 2) subject to yi(w . xi + b) >= 1 for all i.