3. If model A predicts the probability of an instance as Sunny = 0.2, Raining = 0.5, and Cloudy = 0.3,
and the target value is Cloudy, which statement is incorrect?
A. This problem could be a multi-class single-label classification problem
B. This problem could be a multi-class multi-label classification problem
C. Assume Model A is trained for single-label classification and cross-entropy is the loss; the loss value is −log 0.3
D. Assume Model A is trained for multi-label classification and binary cross-entropy is the loss; the loss value for label Sunny is −log 0.2
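Options C and D can be checked numerically (a minimal Python sketch using the standard definitions of cross-entropy and per-label binary cross-entropy): since the target is Cloudy, label Sunny's multi-label target is 0, so its BCE term is −log(1 − 0.2) = −log 0.8 rather than −log 0.2.

```python
import math

# Predictions from question 3; the target class is Cloudy.
p = {"Sunny": 0.2, "Raining": 0.5, "Cloudy": 0.3}

# Single-label cross-entropy uses only the target class probability (option C):
ce = -math.log(p["Cloudy"])             # -log 0.3

# Per-label binary cross-entropy for Sunny, whose multi-label target is 0:
# BCE = -[t*log(p) + (1 - t)*log(1 - p)] with t = 0
bce_sunny = -math.log(1 - p["Sunny"])   # -log 0.8, not -log 0.2
print(ce, bce_sunny)
```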
A. Model parameters should be randomly initialized to break symmetry (i.e., without random initialization, the neurons always have the same values)
B. Data normalization rescales the parameters into a similar scale
C. Early stopping is one approach for reducing overfitting
D. Mini-batch SGD can jump out of local optima and saddle points, and is more stable than SGD.
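Option A can be illustrated with a tiny two-unit network (a minimal numpy sketch; the network, data, and hyperparameters are illustrative assumptions, not from the quiz): hidden units that start with identical weights receive identical gradients and remain exact copies of each other, which random initialization avoids.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 3))          # 8 samples, 3 features
t = rng.normal(size=(8, 1))          # regression targets

def train(W1, w2, steps=100, lr=0.01):
    # One hidden ReLU layer with 2 units, mean-squared-error loss.
    for _ in range(steps):
        h = np.maximum(0.0, X @ W1)  # hidden activations
        y = h @ w2                   # predictions
        dy = 2 * (y - t) / len(X)    # dL/dy for MSE
        dw2 = h.T @ dy
        dh = dy @ w2.T
        dh[h <= 0] = 0.0             # ReLU backward
        dW1 = X.T @ dh
        W1, w2 = W1 - lr * dW1, w2 - lr * dw2
    return W1

# Symmetric start: both hidden units have identical incoming weights.
W_sym = train(np.ones((3, 2)), np.ones((2, 1)))
# Random start breaks the symmetry.
W_rnd = train(rng.normal(size=(3, 2)), rng.normal(size=(2, 1)))

print(np.allclose(W_sym[:, 0], W_sym[:, 1]))  # columns stay identical
print(np.allclose(W_rnd[:, 0], W_rnd[:, 1]))
```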
6. If y = (Az)^T (2x + z), where A is a square matrix, x and z are vectors, and y is a scalar, which one is ∂y/∂x? Since y = 2(Az)^T x + (Az)^T z is linear in x, ∂y/∂x = 2Az.
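This gradient can be verified by central finite differences (a minimal numpy sketch with randomly chosen A, x, and z): because y = 2(Az)^T x + (Az)^T z is linear in x, the numerical gradient should match 2Az.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(3, 3))
x = rng.normal(size=3)
z = rng.normal(size=3)

def y(x):
    # y = (A z)^T (2x + z), a scalar
    return (A @ z) @ (2 * x + z)

# Central finite differences for dy/dx, one coordinate at a time.
eps = 1e-6
grad_fd = np.array([(y(x + eps * e) - y(x - eps * e)) / (2 * eps)
                    for e in np.eye(3)])

print(np.allclose(grad_fd, 2 * A @ z, atol=1e-5))
```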
7. L = (1/2)(w^T x − y)^2. If x = (1, 2)^T, w = (2, 1)^T, y = 0, compute the gradient ∂L/∂w = (4, 8)^T.
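The chain rule gives ∂L/∂w = (w^T x − y)·x, which can be checked directly (a minimal numpy sketch of question 7's numbers):

```python
import numpy as np

# L = 0.5 * (w^T x - y)^2  =>  dL/dw = (w^T x - y) * x
x = np.array([1.0, 2.0])
w = np.array([2.0, 1.0])
y = 0.0

residual = w @ x - y    # 2*1 + 1*2 - 0 = 4
grad = residual * x     # (4, 8)
print(grad)             # [4. 8.]
```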
8. If the input to the ReLU function (y = max(0, x)) is x = (−1, 1)^T and ∂L/∂y = (1, 2)^T, then ∂L/∂x = (0, 2)^T.
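In the ReLU backward pass, the upstream gradient passes through only where the input is positive and is zeroed elsewhere (a minimal numpy sketch of question 8):

```python
import numpy as np

x = np.array([-1.0, 1.0])
dL_dy = np.array([1.0, 2.0])

# ReLU backward: gradient flows only where x > 0.
dL_dx = dL_dy * (x > 0)
print(dL_dx)            # [0. 2.]
```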
9. For a parameter vector w = (0, 1)^T, if the gradients for the first two iterations are dw1 = (1, −1)^T and dw2 = (1, 1)^T respectively, what is the value of w after the two iterations using SGD with momentum as the optimization algorithm (learning rate 0.1, beta = 0.9)? w = (−0.29, 1.09)^T
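Stepping through the two updates confirms the answer (a minimal numpy sketch, assuming the common momentum form v ← β·v + g, w ← w − lr·v with v initialized to zero):

```python
import numpy as np

w = np.array([0.0, 1.0])
lr, beta = 0.1, 0.9
v = np.zeros(2)

for g in [np.array([1.0, -1.0]), np.array([1.0, 1.0])]:
    v = beta * v + g    # accumulate momentum
    w = w - lr * v      # parameter update

# Step 1: v = (1, -1),   w = (-0.1, 1.1)
# Step 2: v = (1.9, 0.1), w approximately (-0.29, 1.09)
print(w)
```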