
Mid Term Solution

MCQ
1. (B) P(B|A) decreases

By the product rule for a joint distribution,
P(A, B) = P(A|B)P(B) = P(B|A)P(A).
In P(A, B) = P(B|A)P(A), if P(A) increases, then only a decrease in P(B|A) can produce a decrease in P(A, B).
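As an illustration (the numbers here are chosen for this note, not taken from the exam): if P(A, B) = 0.2 is held fixed while P(A) increases from 0.4 to 0.8, then P(B|A) = P(A, B) / P(A) must fall from 0.5 to 0.25.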

2. (B) and (D)


On smaller datasets, variance is a concern, since even small changes in the training set may change the optimal parameters significantly. Hence, a high-bias / low-variance classifier would be preferred. On the other hand, with a large dataset we have enough points to represent the data distribution accurately, so variance is not of much concern, and one would go for the classifier with low bias even though it has higher variance.

3. (A) L1 norm.

4. (b) P(E), P(H), P(E|H)


This is Bayes' rule:
P(H|E, F) = P(E, F|H) · P(H) / P(E, F)

5. (d) decrease variance


Averaging the predictions of multiple classifiers substantially reduces the variance. Averaging is not specific to decision trees; it can work with many different learning algorithms, but it works particularly well with decision trees.
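As an illustration beyond the original solution, a minimal scikit-learn sketch of this idea (bagging); the dataset, tree settings, and ensemble size are arbitrary assumptions:

    # Sketch: compare a single decision tree with an average (bag) of 50 trees.
    # Dataset and hyperparameters are illustrative choices, not from the exam.
    from sklearn.datasets import make_classification
    from sklearn.ensemble import BaggingClassifier
    from sklearn.model_selection import cross_val_score
    from sklearn.tree import DecisionTreeClassifier

    X, y = make_classification(n_samples=500, n_features=20, random_state=0)
    single_tree = DecisionTreeClassifier(random_state=0)
    bagged_trees = BaggingClassifier(DecisionTreeClassifier(), n_estimators=50,
                                     random_state=0)

    # The ensemble's cross-validated accuracy is typically higher and more
    # stable across folds, reflecting the reduced variance.
    print(cross_val_score(single_tree, X, y, cv=5).mean())
    print(cross_val_score(bagged_trees, X, y, cv=5).mean())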

6. (b) lasso

For feature selection we would prefer lasso, since solving the lasso optimization problem drives some of the coefficients to be exactly zero (depending, of course, on the data), whereas ridge regression reduces the magnitude of the coefficients but does not push them all the way to zero.
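As a quick demonstration (not part of the original solution), a scikit-learn sketch on synthetic data with an arbitrary regularization strength alpha:

    # Sketch: lasso drives some coefficients exactly to zero; ridge only
    # shrinks them. Data and alpha below are illustrative assumptions.
    import numpy as np
    from sklearn.datasets import make_regression
    from sklearn.linear_model import Lasso, Ridge

    X, y = make_regression(n_samples=100, n_features=10, n_informative=3,
                           noise=5.0, random_state=0)
    lasso = Lasso(alpha=1.0).fit(X, y)
    ridge = Ridge(alpha=1.0).fit(X, y)

    print(np.sum(lasso.coef_ == 0))  # typically several exact zeros
    print(np.sum(ridge.coef_ == 0))  # typically no exact zeros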

7. (A)

Take the derivative of the given function with respect to u, set it to zero, and solve for u; we want to minimize the squared error Σᵢ (xᵢ − u)²:

−2(x₁ − u) − 2(x₂ − u) − 2(x₃ − u) − … − 2(xₙ − u) = 0

u = (x₁ + x₂ + … + xₙ) / n = mean(x)
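A quick numeric check of this result (the sample values below are arbitrary):

    # Sketch: the sample mean minimizes the sum of squared errors.
    import numpy as np

    x = np.array([2.0, 5.0, 7.0, 11.0])  # arbitrary sample
    u = x.mean()
    sse = lambda c: np.sum((x - c) ** 2)

    print(sse(u))        # minimum
    print(sse(u + 0.1))  # strictly larger
    print(sse(u - 0.1))  # strictly larger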

8. (B)

λ is a scalar (the eigenvalue); it is the factor by which A scales the eigenvector v.

Given Av = λv, to check whether v is an eigenvector of A², consider A²v = A(Av) = A(λv) = λ(Av) = λ(λv) = λ²v. By the same argument, A⁵⁰v = λ⁵⁰v.
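A quick numeric check with NumPy (the matrix below is an arbitrary example, not from the exam):

    # Sketch: if A v = lam * v, then A^50 v = lam^50 * v.
    import numpy as np

    A = np.array([[2.0, 1.0],
                  [0.0, 3.0]])
    vals, vecs = np.linalg.eig(A)
    lam, v = vals[0], vecs[:, 0]

    lhs = np.linalg.matrix_power(A, 50) @ v
    rhs = lam ** 50 * v
    print(np.allclose(lhs, rhs))  # True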

9. (A)

Counting the multiplications (and additions), the calculation is 64 × 200 + 200 × 10 = 14800.
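The same count in a couple of lines of Python (layer sizes taken from the numbers above):

    # Sketch: multiplications in consecutive layer-to-layer products.
    layers = [64, 200, 10]
    mults = sum(a * b for a, b in zip(layers, layers[1:]))
    print(mults)  # 14800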

10. (A)

(The explanation was given as an image in the original solution sheet and is not reproduced here.)

11. (C)

Take the eigenvectors whose eigenvalues are largest, since they capture more of the spread of the data. So v1 and v3 are the chosen vectors, with eigenvalues λ1 = 6 and λ4 = 6.
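As an illustration (the matrix below is an arbitrary assumption, not the exam's), selecting the top eigenvectors with NumPy:

    # Sketch: keep the eigenvectors with the largest eigenvalues.
    import numpy as np

    cov = np.array([[6.0, 0.0, 0.0],
                    [0.0, 1.0, 0.0],
                    [0.0, 0.0, 6.0]])   # arbitrary symmetric matrix
    vals, vecs = np.linalg.eigh(cov)    # eigenvalues in ascending order
    order = np.argsort(vals)[::-1]      # re-sort descending
    top2 = vecs[:, order[:2]]           # the two principal directions
    print(vals[order[:2]])              # [6. 6.]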

12. (D)

Expected gain: there is a 75% chance a number above 35 comes up (a win) and a 25% chance you lose, so each option's expected gain is (win chance × amount + loss chance × −amount) × number of bets:
A: (0.75 × 1 + 0.25 × −1) × 100 = 50
B: (0.75 × 10 + 0.25 × −10) × 10 = 50
C: (0.75 × 100 + 0.25 × −100) × 1 = 50
All have the same expected gain, so the answer is D.
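A one-loop check of the arithmetic (using the 0.75/0.25 win/loss probabilities as corrected above):

    # Sketch: every option has the same expected gain.
    for amount, bets in [(1, 100), (10, 10), (100, 1)]:
        gain = (0.75 * amount + 0.25 * -amount) * bets
        print(amount, bets, gain)  # 50.0 for each option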

13. (D)

Friends 1 and 2 select the same number with probability ¼ and different numbers with probability ¾. If they selected the same number, friend 3 matches it with probability ¼, so the probability that all three match is ¼ × ¼ = 1/16. If they selected different numbers, friend 3 matches one of them with probability 2/4. So the total probability of a "Yes" (a shared number) is 1/16 + (¾ × 2/4) = 7/16.
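A Monte Carlo check of the event the solution computes (friend 3's number matching friend 1's or friend 2's), assuming each friend picks uniformly from 4 numbers:

    # Sketch: estimate the probability by simulation; expect ~7/16 = 0.4375.
    import numpy as np

    rng = np.random.default_rng(0)
    f1, f2, f3 = rng.integers(0, 4, size=(3, 100_000))
    print(np.mean((f3 == f1) | (f3 == f2)))  # ~0.4375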

14. (C)

15. (D)

All of the listed conditions are part of the random forest training procedure.


Mid Term (Subjective Solution)

1.

The formula for entropy is H = −Σᵢ pᵢ log(pᵢ).

Answer: −(5/8 log(5/8) + 3/8 log(3/8)) = 0.287

(The value 0.287 corresponds to base-10 logarithms; with base-2 logarithms the entropy is 0.954 bits.)
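A quick check of the arithmetic, confirming the base-10 convention:

    # Sketch: entropy of the (5/8, 3/8) split in different log bases.
    import numpy as np

    p = np.array([5/8, 3/8])
    print(-np.sum(p * np.log10(p)))  # ~0.287 (base 10, as stated above)
    print(-np.sum(p * np.log2(p)))   # ~0.954 bits (base 2, for comparison)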

2. 0.63

P(X|T) = P(T|X) · P(X) / P(T)
= (0.99 × 0.05) / (0.99 × 0.05 + 0.03 × 0.95)
= 0.0495 / 0.078
= 0.63
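The same computation in Python for verification:

    # Sketch: Bayes' rule with the numbers given above.
    p_t_given_x = 0.99              # P(T|X)
    p_x = 0.05                      # P(X)
    p_t = p_t_given_x * p_x + 0.03 * (1 - p_x)  # P(T) by total probability
    print(p_t_given_x * p_x / p_t)  # ~0.6346, rounds to 0.63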

3. Give 1 mark each if the final answer matches, even when the formula is missing; if the final answer is wrong but the formula is written correctly, give 0.5 marks each.

Accuracy: (TP + TN) / (TP + FP + TN + FN) = 120/150 = 4/5 = 0.8
Misclassification Rate: 1 − Accuracy = 0.2
True Positive Rate: TP / (TP + FN) = 70/80 = 7/8 = 0.875
False Positive Rate: FP / (FP + TN) = 20/70 = 2/7 ≈ 0.286
Precision: TP / (TP + FP) = 70/90 = 7/9 ≈ 0.778
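For verification, the same metrics computed from the confusion-matrix counts implied by the fractions above (TP = 70, FN = 10, FP = 20, TN = 50):

    # Sketch: classification metrics from the implied counts.
    tp, fn, fp, tn = 70, 10, 20, 50

    accuracy = (tp + tn) / (tp + fp + tn + fn)
    print(accuracy)        # 0.8
    print(1 - accuracy)    # 0.2 (misclassification rate)
    print(tp / (tp + fn))  # 0.875 (true positive rate)
    print(fp / (fp + tn))  # ~0.286 (false positive rate)
    print(tp / (tp + fp))  # ~0.778 (precision)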

4.

Bias measures how far the predicted values are from the actual values. Low bias indicates a model whose predictions are very close to the actual ones. High bias can cause an algorithm to miss the relevant relations between features and target outputs (underfitting).

Variance refers to how much the learned model changes when trained with different training data. For a good model, the variance should be minimized. High variance can cause an algorithm to model the random noise in the training data rather than the intended outputs (overfitting).
