
NPTEL Online Certification Courses

Indian Institute of Technology Kharagpur

Deep Learning
Assignment- Week 12
TYPE OF QUESTION: MCQ/MSQ
Number of questions: 10 Total marks: 10 × 1 = 10
______________________________________________________________________________

QUESTION 1:
We are given two distributions. The first distribution, P, is a uniform distribution on [-3, 3].
The second distribution, Q, is a Normal distribution with zero mean and unit standard deviation.
What will be KL(Q||P)?

a. 0.5
b. 0.0
c. 1.0
d. ∞

Correct Answer: d

Detailed Solution:

KL(Q||P) = ∫ Q(x) log[Q(x)/P(x)] dx


However, P(x) = 0 for all x > 3 and for all x < -3, while Q(x) is still positive there.
The ratio Q(x)/P(x) is therefore infinite on those regions, so KL(Q||P) goes to infinity.
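
As a rough numerical check (a sketch only; the numpy/scipy usage below is an assumption, not part of the original solution), the integrand blows up as soon as the integration range leaves the support of P:

import numpy as np
from scipy.stats import norm, uniform

P = uniform(loc=-3, scale=6)   # uniform on [-3, 3]
Q = norm(loc=0, scale=1)       # standard Normal

for hi in (3, 4, 5):           # widen the range past the support of P
    x = np.linspace(-hi, hi, 200001)
    dx = x[1] - x[0]
    with np.errstate(divide="ignore"):
        integrand = Q.pdf(x) * np.log(Q.pdf(x) / P.pdf(x))
    print(hi, float(np.sum(integrand) * dx))

For hi = 3 the estimate is finite, but once x leaves [-3, 3] we have P(x) = 0, the integrand becomes infinite, and the printed value is inf.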
______________________________________________________________________________

QUESTION 2:
Which of the following is True regarding the reconstruction loss (realized as the mean squared
error between the input and the predicted signal) of a standard auto-encoder?

a. Such loss is not differentiable and cannot be used for back propagation
b. Such loss tends to form distinct clusters in latent space
c. Such loss cannot be optimized with gradient descent
d. None of the above
Correct Answer: b

Detailed Solution:

MSE-based losses tend to form clusters of similar-category signals in the latent space of
auto-encoders, although in an unsupervised way. The loss is differentiable and can be
optimized with gradient descent, so the remaining options are False.
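
A minimal sketch (assuming PyTorch, with toy dimensions chosen purely for illustration) of why options (a) and (c) are False: the MSE reconstruction loss is differentiable and trains fine with gradient descent:

import torch
import torch.nn as nn

autoencoder = nn.Sequential(
    nn.Linear(32, 8),    # encoder: 32-d signal -> 8-d latent code
    nn.ReLU(),
    nn.Linear(8, 32),    # decoder: 8-d latent code -> 32-d reconstruction
)
opt = torch.optim.SGD(autoencoder.parameters(), lr=0.1)
x = torch.randn(64, 32)  # a batch of unlabeled input signals

for _ in range(100):
    loss = nn.functional.mse_loss(autoencoder(x), x)  # reconstruction MSE
    opt.zero_grad()
    loss.backward()      # gradients flow, so the loss is differentiable
    opt.step()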
______________________________________________________________________________

QUESTION 3:
What can be the maximum value of the KL divergence metric?

a. 1
b. 0
c. ∞
d. 0.5

Correct Answer: c

Detailed Solution:

KL(p||q) is finite only if the support of p (the range of values of x for which p(x) > 0) is
contained within the support of q. However, note that KL divergence can be infinite even if
p(x) and q(x) are nonzero for all x. For example, the KL divergence from a Cauchy distribution
to a Normal distribution is infinite even though both distributions are defined (and positive)
for all real values of x.

Proof sketch: Take Q as the standard Cauchy and P as the standard Normal. For large |x|,
log[Q(x)/P(x)] grows like x²/2, while Q(x) decays only like 1/(πx²), so the integrand
Q(x) log[Q(x)/P(x)] tends to the positive constant 1/(2π) in the tails and the integral
KL(Q||P) diverges.
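
A rough numerical illustration of this (a sketch assuming scipy): the partial integrals below keep growing with the integration range, consistent with an infinite KL divergence:

import numpy as np
from scipy.stats import cauchy, norm

Q, P = cauchy(), norm()   # Q: standard Cauchy, P: standard Normal
for hi in (10, 100, 1000, 10000):
    x = np.linspace(-hi, hi, 400001)
    dx = x[1] - x[0]
    # use logpdf to avoid underflow of the Normal density in the tails
    integrand = Q.pdf(x) * (Q.logpdf(x) - P.logpdf(x))
    print(hi, float(np.sum(integrand) * dx))

The partial integrals grow roughly linearly in hi, because the tail integrand tends to the constant 1/(2π) instead of decaying to zero.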

______________________________________________________________________________

QUESTION 4:
For an auto-encoder, suppose we give an input signal x and reconstruct a signal y. Which one
of the following objective functions can we MINIMIZE to train the parameters of the auto-
encoder with a gradient descent optimizer?

a. L(x, y) = exp(-|x - y|)
b. L(x, y) = -log(|x - y|)
c. L(x, y) = exp(|x - y|)
d. L(x, y) = (x + y)²
Correct Answer: c

Detailed Solution:

Options (a) and (b) INCREASE as the reconstructed signal y approaches the input x, so
minimizing them drives y away from x, and option (d) is minimized at y = -x rather than
y = x. Only option (c) decreases as y approaches x, so minimizing it trains the auto-encoder
to mimic the input signal.
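
A quick numerical sanity check (illustrative values only, numpy assumed): evaluating the four candidate losses while y moves toward x shows that only option (c) decreases:

import numpy as np

x = 2.0
for y in (0.0, 1.0, 1.9, 1.99):   # reconstruction y approaching the input x
    d = abs(x - y)
    print(f"y={y:5.2f}  (a)={np.exp(-d):.3f}  (b)={-np.log(d):7.3f}  "
          f"(c)={np.exp(d):.3f}  (d)={(x + y) ** 2:.3f}")

(a) and (b) grow as y -> x, and (d) is smallest at y = -x, so minimizing any of them pushes y away from x; (c) alone shrinks toward its minimum.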

______________________________________________________________________________

QUESTION 5:
Suppose we have a 2N-dimensional Normal distribution in which we assume all components are
independent of each other. What will be the size (number of elements) of the vector needed
to fully represent the covariance matrix of this distribution?

a. N
b. 2N
c. N/2
d. N/4
Correct Answer: b

Detailed Solution:

Since the components are independent of each other, only the 2N diagonal entries of the
covariance matrix can be nonzero; all off-diagonal elements are equal to 0. So a
2N-dimensional vector is sufficient to represent the covariance matrix. This principle is
used in the VAE architecture.
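
A small sketch of the idea (toy numbers, numpy assumed): a length-2N vector of variances is enough to rebuild the full 2N x 2N covariance matrix:

import numpy as np

N = 4
var = np.random.rand(2 * N)   # 2N per-dimension variances (in a VAE the
                              # encoder typically predicts these as log-variances)
cov = np.diag(var)            # full covariance: variances on the diagonal, 0 elsewhere
print(var.shape, cov.shape)   # (8,) (8, 8)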

____________________________________________________________________________
NPTEL Online Certification Courses
Indian Institute of Technology Kharagpur

QUESTION 6:
What will happen if we do not enforce KL divergence loss in VAE latent code space?

a. The latent code distribution will mimic a zero-mean, unit-variance Normal
distribution
b. Network will learn to form distinctive clusters with high standard deviation for
each cluster
c. Network will learn to form distinctive clusters with low standard deviation for
each cluster
d. None of the above
Correct Answer: c

Detailed Solution:

Without the KL loss, the encoder part of the VAE will try to form well-separated clusters
(by increasing the distance between the mean vectors) and simultaneously reduce the
standard deviation of each cluster, to reduce confusion for the decoder part of the network.
This efficiently reduces the reconstruction loss, which is now the only loss component in
the network. So, without the KL loss, the network reduces to a simple autoencoder.
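
A sketch of the corresponding objective (assuming PyTorch, with hypothetical tensors mu, logvar and x_hat standing in for a VAE forward pass): setting the KL weight to 0 leaves only the reconstruction term, i.e. a plain autoencoder objective:

import torch

def vae_loss(x, x_hat, mu, logvar, kl_weight=1.0):
    recon = torch.nn.functional.mse_loss(x_hat, x, reduction="sum")
    # closed-form KL(N(mu, sigma^2) || N(0, 1)) for a diagonal Gaussian
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + kl_weight * kl  # kl_weight = 0 -> simple autoencoder loss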

____________________________________________________________________________

QUESTION 7:
KL divergence between two discrete distributions, P and Q, is given as KL(Q||P):

a. KL(Q||P) = Σ_x Q(x) log[Q(x)/P(x)]
b. KL(Q||P) = Σ_x P(x) log[P(x)/Q(x)]
c. KL(Q||P) = -Σ_x Q(x) log[Q(x)/P(x)]
d. KL(Q||P) = Σ_x Q(x) log[P(x)/Q(x)]

Correct Answer: a

Detailed Solution:

Follows directly from the formula of KL divergence.
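
Option (a) read directly as code (made-up discrete distributions over the same support, numpy assumed):

import numpy as np

Q = np.array([0.5, 0.3, 0.2])
P = np.array([0.4, 0.4, 0.2])
kl_qp = np.sum(Q * np.log(Q / P))   # KL(Q||P), option (a)
print(kl_qp)                        # ~0.025 nats; note KL(Q||P) != KL(P||Q)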

______________________________________________________________________________

QUESTION 8:
The figure shows latent-vector addition of two concepts, "man without a hat" and "hat".
What is expected from the resultant vector?

a. Hat without man
b. Man with hat
c. Woman with hat
d. Woman without hat
Correct Answer: b

Detailed Solution:

The VAE latent space is expected to support semantic vector arithmetic. The resultant
vector is the vector addition of the two semantic concepts, so the final vector represents
a MAN WITH A HAT.
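
Schematically (the latent codes below are hypothetical placeholders, not outputs of a trained model):

import numpy as np

z_man_without_hat = np.random.randn(128)  # latent code for "man without a hat"
z_hat = np.random.randn(128)              # latent code for "hat"
z_result = z_man_without_hat + z_hat      # decoding z_result should yield "man with hat"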

______________________________________________________________________________

QUESTION 9:
Which one of the following statements is True about the original GAN training?

a. It is desired that the Discriminator loss monotonically goes down
b. It is desired that the Generator loss monotonically goes down
c. It is desired that the Discriminator loss monotonically goes down while the
Generator loss monotonically goes up
d. It is desired that neither the Discriminator loss nor the Generator loss goes
up or down monotonically

Correct Answer: d

Detailed Solution:

Since the GAN game is a zero-sum, non-cooperative game, if one of the players wins, the
opponent loses, and the Nash equilibrium (when the Discriminator fails to distinguish real
samples from fake samples) is not reached. Thus, it is NOT desired that the loss function
of either player decrease monotonically.

____________________________________________________________________________

QUESTION 10:
When the GAN game has converged to its Nash equilibrium (when the Discriminator can do no
better than random guessing at distinguishing fake samples from real samples), what is the
probability (of belonging to the real class) given by the Discriminator to a fake generated
sample?

a. 1
b. 0.5
c. 0
d. 0.25

Correct Answer: b

Detailed Solution:

Nash equilibrium is reached when the generated distribution pg(x) equals the original data
distribution pdata(x). The optimal discriminator is D*(x) = pdata(x) / (pdata(x) + pg(x)),
which equals 0.5 for all x when pg = pdata.
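
A numerical reading of the solution (toy densities over four sample points, illustrative only, numpy assumed):

import numpy as np

p_data = np.array([0.1, 0.4, 0.3, 0.2])  # made-up data distribution
p_g = p_data.copy()                      # converged generator: p_g = p_data
D_star = p_data / (p_data + p_g)         # optimal discriminator
print(D_star)                            # [0.5 0.5 0.5 0.5]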

______________________________________________________________________________

************END*******
