EndSem 2022-23 Solution
Instructions
• Attempt all questions.
• Marks corresponding to each question are indicated in bold within square brackets at the end of
the question.
• You are allowed to carry any handwritten notes or printed material. Laptops are not allowed.
• You should show the reasoning behind each answer in the answer sheet clearly indicating the
problem number. In case of ambiguities in any of the questions, clearly state your assumptions
and attempt the question(s).
BITS F464
Machine Learning 2022-23
Aditya Challa End-Sem Exam

Answer of exercise 1
(c) The cross-entropy loss for the transformed data set D′ is given by

L = Σ_{i=1}^{n} Σ_{l=0}^{L} I(y_i = l) log( Σ_{j=1}^{n} I(y_j = l) p_{ij} )    (4)

[1 + 2 + 2 + 2 + 2 = 9 Marks]
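Equation (4) can be evaluated directly. A minimal sketch, assuming the weights p_{ij} are supplied as an n × n NumPy array P and the labels y_i take values in {0, …, L}; the function name and data layout here are our own, not fixed by the exam:

```python
import numpy as np

def transformed_cross_entropy(y, P):
    """Evaluate Eq. (4): L = sum_i sum_l I(y_i = l) log( sum_j I(y_j = l) p_ij ).

    y : (n,) integer labels taking values in {0, ..., L}
    P : (n, n) array of the weights p_ij  (assumed layout)
    """
    total = 0.0
    for i in range(len(y)):
        l = y[i]                    # only the term with I(y_i = l) = 1 survives
        inner = P[i, y == l].sum()  # sum_j I(y_j = l) * p_ij
        total += np.log(inner)
    return total

# Two samples with different labels and uniform weights p_ij = 0.5:
# each inner sum picks out the single same-label entry, so L = 2 log(0.5).
y = np.array([0, 1])
P = np.full((2, 2), 0.5)
```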
Answer of exercise 2
(c) Prove or disprove: Σ_{i=1}^{k} β_{M,i} (df_i(θ_0)/dθ) ≥ Σ_{i=1}^{k} β_{L,i} (df_i(θ_0)/dθ).

Suppose we reverse the labels, i.e. assign the label +1 if θ < θ_0 and the label −1 if θ > θ_0; then answer the following questions.

(d) Is the quantity Σ_{i=1}^{k} β_{M,i} (df_i(θ_0)/dθ) less than or equal to zero? Justify your answer.

(e) Prove or disprove: Σ_{i=1}^{k} β_{M,i} (df_i(θ_0)/dθ) ≥ Σ_{i=1}^{k} β_{L,i} (df_i(θ_0)/dθ).
For logistic regression, the loss function is the cross-entropy loss, defined as −y_i log(sigmoid(β^T f(θ) + β_0)) − (1 − y_i) log(1 − sigmoid(β^T f(θ) + β_0)). So, for the label y_i = 1, minimizing the cross-entropy amounts to maximizing β^T f(θ) + β_0. Using the same first-order approximation, we should maximize β_L^T ∇f(θ_0). So we must have β_L^T ∇f(θ_0) ≥ β_M^T ∇f(θ_0).

If the labels are reversed, we would get Σ_{i=1}^{k} β_{M,i} (df_i(θ_0)/dθ) ≤ 0. Accordingly, we would get β_L^T ∇f(θ_0) ≤ β_M^T ∇f(θ_0).
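The monotonicity step (for y_i = 1, minimizing the cross-entropy is the same as maximizing z = β^T f(θ) + β_0) can be checked numerically. A minimal sketch; the helper names are our own:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cross_entropy(y, z):
    # -y log sigmoid(z) - (1 - y) log(1 - sigmoid(z))
    p = sigmoid(z)
    return -y * np.log(p) - (1 - y) * np.log(1 - p)

# For y = 1 the loss reduces to -log sigmoid(z), which is strictly
# decreasing in z: pushing z = beta^T f(theta) + beta_0 up pushes the loss down.
zs = np.linspace(-5.0, 5.0, 101)
losses = cross_entropy(1, zs)
assert np.all(np.diff(losses) < 0)
```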
Answer of exercise 4
The main idea is the following construction - Let the neural network be fθ (x). Now for a sample xi , Θ(xi ) denotes
the hidden neurons which has value 0. Replace all the connections to these hidden neurons with weight 0. Then,
the entire network is simply a sequence of linear operators and hence equivalent to a matrix multiplication. Let this
matrix be W . Then, fθ (xi ) = W xi . Now, we can answer the questions.
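The zero-out construction can be verified on a toy network. A minimal sketch, assuming a single hidden ReLU layer with no bias terms; the layer sizes and variable names are our own:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy network f(x) = W2 relu(W1 x).
W1 = rng.normal(size=(5, 3))
W2 = rng.normal(size=(2, 5))

def f(x):
    return W2 @ np.maximum(W1 @ x, 0.0)

x = rng.normal(size=3)

# Theta(x): hidden neurons whose value is 0. Zeroing the weights into the
# inactive neurons (equivalently, scaling rows of W1 by a 0/1 mask) leaves
# a purely linear map, i.e. a single matrix W with f(x) = W x.
mask = (W1 @ x > 0).astype(float)
W = W2 @ (mask[:, None] * W1)
assert np.allclose(f(x), W @ x)
```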
[2 + 2 + 2 = 6 Marks]
Answer of exercise 5
The variance of our estimate depends on the underlying distribution. Take the extreme example where X = 0 with probability 1. Then, irrespective of what θ is, the estimate will always be θ_0. Essentially we have only one sample, (0, θ_0), repeated several times. Any value of θ is then a minimizer, and hence the variance of θ̂ can potentially be infinite.
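The degenerate case can be made concrete. A minimal sketch, assuming a linear model ŷ = θx fit by squared loss; the specific model and loss here are our illustration, not fixed by the exam:

```python
import numpy as np

theta0 = 2.0
# X = 0 with probability 1: every sample is the pair (0, theta0).
x = np.zeros(100)
y = np.full(100, theta0)

def loss(theta):
    # Empirical squared loss of the model y_hat = theta * x.
    return np.mean((y - theta * x) ** 2)

# The loss is constant in theta, so every theta is a minimizer and the
# estimator theta_hat is completely unconstrained by the data.
assert loss(-10.0) == loss(0.0) == loss(10.0)
```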