
MATH4267 Solutions JL

1. (a) The conditional PDFs of (X_1, X_2 | Y = 1) and (X_1, X_2 | Y = 0) are:
\[
f_{X_1,X_2|Y=1}(x_1, x_2) = \frac{1}{2\pi} \exp\left( -\frac{1}{2}\left[ (x_1 - 1)^2 + x_2^2 \right] \right)
\]
\[
f_{X_1,X_2|Y=0}(x_1, x_2) = \frac{1}{2\pi} \exp\left( -\frac{1}{2}\left[ (x_1 + 1)^2 + x_2^2 \right] \right)
\]
and we have P(Y = 1) = α. So the unconditional PDF of (X_1, X_2) is given by:
\begin{align}
f_{X_1,X_2}(x_1, x_2) &= P(Y = 1) f_{X_1,X_2|Y=1}(x_1, x_2) + P(Y = 0) f_{X_1,X_2|Y=0}(x_1, x_2) \notag \\
&= \alpha \, \frac{1}{2\pi} \exp\left( -\frac{1}{2}\left[ (x_1 - 1)^2 + x_2^2 \right] \right) + (1 - \alpha) \, \frac{1}{2\pi} \exp\left( -\frac{1}{2}\left[ (x_1 + 1)^2 + x_2^2 \right] \right) \tag{1}
\end{align}

[SEEN SIMILAR]
[3 marks]
(b) The generalisation error of d is:
  
\[
E_{(X_1,X_2),Y}\left[ \mathrm{Loss}(Y, \hat{Y}) \right] = E_{(X_1,X_2),Y}\left[ c(Y, d(X_1, X_2)) \right] \tag{2}
\]

NB: it is fine not to write this out as an integral, but an answer written as an
integral is also acceptable. [SEEN]
[3 marks]


(c) The generalisation error for a zero-one loss is minimised by the function
\[
d_{\min}(x_1, x_2) =
\begin{cases}
1 & \text{if } P(Y = 1 \mid (X_1, X_2) = (x_1, x_2)) > P(Y = 0 \mid (X_1, X_2) = (x_1, x_2)) \\
0 & \text{otherwise}
\end{cases}
\]
Since we have
\begin{align*}
P(Y = 1 \mid (X_1, X_2) = (x_1, x_2)) &= \frac{f_{X_1,X_2|Y=1}(x_1, x_2) P(Y = 1)}{f_{X_1,X_2}(x_1, x_2)} = \frac{\alpha f_{X_1,X_2|Y=1}(x_1, x_2)}{f_{X_1,X_2}(x_1, x_2)} \\
P(Y = 0 \mid (X_1, X_2) = (x_1, x_2)) &= \frac{f_{X_1,X_2|Y=0}(x_1, x_2) P(Y = 0)}{f_{X_1,X_2}(x_1, x_2)} = \frac{(1 - \alpha) f_{X_1,X_2|Y=0}(x_1, x_2)}{f_{X_1,X_2}(x_1, x_2)}
\end{align*}
it follows that d_{\min}(x_1, x_2) in our case is:
\[
d_{\min}(x_1, x_2) =
\begin{cases}
1 & \text{if } \alpha f_{X_1,X_2|Y=1}(x_1, x_2) > (1 - \alpha) f_{X_1,X_2|Y=0}(x_1, x_2) \\
0 & \text{otherwise}
\end{cases}
\tag{3}
\]

[SEEN SIMILAR]
[4 marks]
NB: it is fine to use abbreviations of previously defined functions, rather than
writing them out in full each time.
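NB (illustration only, not required for the solution): the following Python sketch samples from the mixture in (1) and estimates the zero-one generalisation error of d_min in (3) by Monte Carlo; the value of α and the sample size are arbitrary illustrative choices.
```python
import numpy as np

rng = np.random.default_rng(0)
alpha = 0.3        # P(Y = 1); arbitrary illustrative value
n = 100_000        # Monte Carlo sample size

# Sample Y, then (X1, X2) | Y: X1 ~ N(+/-1, 1), X2 ~ N(0, 1)
y = rng.binomial(1, alpha, size=n)
x1 = rng.normal(np.where(y == 1, 1.0, -1.0), 1.0)
x2 = rng.normal(0.0, 1.0, size=n)

def cond_pdf(x1, x2, mean):
    """f_{X1,X2|Y}(x1, x2) with X1 ~ N(mean, 1) and X2 ~ N(0, 1)."""
    return np.exp(-0.5 * ((x1 - mean) ** 2 + x2 ** 2)) / (2 * np.pi)

# Bayes rule (3): predict 1 iff alpha * f(.|Y=1) > (1 - alpha) * f(.|Y=0)
y_hat = (alpha * cond_pdf(x1, x2, 1.0)
         > (1 - alpha) * cond_pdf(x1, x2, -1.0)).astype(int)

# Monte Carlo estimate of the generalisation error (2) under zero-one loss
print("estimated generalisation error:", np.mean(y_hat != y))
```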


2. (a) Given a finite set S, a σ-algebra Σ on S is a collection of subsets of S (that is, Σ ⊆ 2^S) such that:
• S ∈ Σ;
• Σ is closed under complementation, so if A ∈ Σ then (S \ A) ∈ Σ;
• Σ is closed under union/intersection, so if A, B ∈ Σ then (A ∪ B) ∈ Σ (and
(A ∩ B) ∈ Σ).
NB: the concept of countable union or intersection is not relevant here, since S is
finite, but there is no penalty for mentioning it. Only one of union/intersection
closure need be mentioned, but there is no penalty for mentioning both. [SEEN]
[3 marks]
(b) Since the map from X^{(ℓ)} to X^{(ℓ+1)} is a deterministic function, we have S(X^{(ℓ)}) ⊇ S(X^{(ℓ+1)}), from which the statement follows. Equality will hold in each case if the function from X^{(ℓ)} to X^{(ℓ+1)} is one-to-one (it is fine to say bijective, invertible, or injective). [SEEN]
[3 marks]
(c) Suppose that the output of the original network is Y = (Y_1, Y_2, \ldots, Y_m), where there are m output neurons, and the output of the new network is Y' = (Y'_1, Y'_2, \ldots, Y'_{m-1}). Then Y'_1 = Y_1, Y'_2 = Y_2, \ldots, and we can define a deterministic function from Y to Y'. Hence S(Y') ⊆ S(Y). [SEEN SIMILAR]
[4 marks]
(d) Suppose A, B ∈ Σ and we have closure under union. Then by closure under complement, (S \ A) ∈ Σ and (S \ B) ∈ Σ. By the assumed closure under union, we have (S \ A) ∪ (S \ B) ∈ Σ. But the set T = (S \ A) ∪ (S \ B) contains exactly the elements of S not in both A and B, so (A ∩ B) = (S \ T). Since T ∈ Σ, we have S \ T ∈ Σ by closure under complement, so (A ∩ B) ∈ Σ.
Suppose A, B ∈ Σ and we have closure under intersection. Then by closure under complement, (S \ A) ∈ Σ and (S \ B) ∈ Σ. By the assumed closure under intersection, we have (S \ A) ∩ (S \ B) ∈ Σ. But the set T′ = (S \ A) ∩ (S \ B) contains exactly the elements of S in neither A nor B, so (A ∪ B) = (S \ T′). Since T′ ∈ Σ, we have S \ T′ ∈ Σ by closure under complement, so (A ∪ B) ∈ Σ.
NB: it is fine to show just one direction and say ‘The other direction is similar’, as long as the other direction is acknowledged.
[UNSEEN]
[4 marks]
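NB (illustration only): the sketch below closes an arbitrary collection of subsets of a small finite set S under complement and pairwise union, then checks that the result is automatically closed under intersection, as argued in (d). The starting collection is an arbitrary choice.
```python
from itertools import combinations

S = frozenset({1, 2, 3, 4})
sigma = {S, frozenset({1}), frozenset({2, 3})}   # arbitrary start; must contain S

# Repeatedly add complements and pairwise unions until nothing new appears
changed = True
while changed:
    new = {frozenset(S - A) for A in sigma}
    new |= {A | B for A, B in combinations(sigma, 2)}
    changed = not new <= sigma
    sigma |= new

# By the argument above, sigma is now closed under intersection as well
assert all((A & B) in sigma for A, B in combinations(sigma, 2))
print("sigma has", len(sigma), "sets and is closed under intersection")
```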


3. (a) We have
\[
H(X|Y) - H(Y|X) = (H(X, Y) - H(Y)) - (H(X, Y) - H(X)) = H(X) - H(Y) \tag{4}
\]
We can derive this (this is not necessary for a correct answer) as follows:
\begin{align*}
H(X|Y) - H(Y|X) &= -\int f_{X,Y}(x, y) \ln\left( \frac{f_{X,Y}(x, y)}{f_Y(y)} \right) dx\,dy + \int f_{X,Y}(x, y) \ln\left( \frac{f_{X,Y}(x, y)}{f_X(x)} \right) dx\,dy \\
&= \int f_{X,Y}(x, y) \ln\left( \frac{f_{X,Y}(x, y) f_Y(y)}{f_{X,Y}(x, y) f_X(x)} \right) dx\,dy \\
&= \int f_{X,Y}(x, y) \ln\left( \frac{f_Y(y)}{f_X(x)} \right) dx\,dy \\
&= -\int f_{X,Y}(x, y) \ln(f_X(x))\, dx\,dy + \int f_{X,Y}(x, y) \ln(f_Y(y))\, dx\,dy \\
&= -\int \left( \int f_{X,Y}(x, y)\, dy \right) \ln(f_X(x))\, dx + \int \left( \int f_{X,Y}(x, y)\, dx \right) \ln(f_Y(y))\, dy \\
&= -\int f_X(x) \ln(f_X(x))\, dx + \int f_Y(y) \ln(f_Y(y))\, dy \\
&= H(X) - H(Y)
\end{align*}

Since H(X|Y ) − H(Y |X) = H(X) − H(Y ), we have H(X|Y ) = H(Y |X) if and
only if H(X) = H(Y ). [SEEN]
[3 marks]
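NB (not required): the identity in (4) can be checked numerically for a discrete joint distribution, replacing integrals by sums; a minimal sketch with an arbitrary 3 × 4 joint distribution follows.
```python
import numpy as np

rng = np.random.default_rng(1)
p_xy = rng.random((3, 4))      # arbitrary discrete joint distribution p(x, y)
p_xy /= p_xy.sum()
p_x = p_xy.sum(axis=1)
p_y = p_xy.sum(axis=0)

def H(p):
    """Discrete entropy -sum p ln p (ignoring zero-probability cells)."""
    p = p[p > 0]
    return -np.sum(p * np.log(p))

H_xy = H(p_xy.ravel())
H_x_given_y = H_xy - H(p_y)    # H(X|Y) = H(X, Y) - H(Y)
H_y_given_x = H_xy - H(p_x)    # H(Y|X) = H(X, Y) - H(X)

# The two sides of (4) agree to numerical precision
print(H_x_given_y - H_y_given_x, H(p_x) - H(p_y))
```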
(b) The K-L divergence of random variables X and Y (where defined) is defined as
\[
D_{KL}(f_X \| f_Y) = \int f_X(x) \ln\left( \frac{f_X(x)}{f_Y(x)} \right) dx
\]

where fX (x) and fY (x) are the PDFs of X and Y , and the integral is over the
domain of fX and fY .
To show that the K-L divergence is non-negative, we use Jensen’s inequality: since ln(x) is concave, we have, for a probability density p(x) and a positive-valued function q(x),
\[
\int p(x) \ln(q(x))\, dx \leq \ln\left( \int p(x) q(x)\, dx \right)
\]


so
\begin{align*}
D_{KL}(f_X \| f_Y) &= -\int f_X(x) \ln\left( \frac{f_Y(x)}{f_X(x)} \right) dx \\
&\geq -\ln\left( \int f_X(x) \frac{f_Y(x)}{f_X(x)}\, dx \right) \\
&= -\ln\left( \int f_Y(x)\, dx \right) = -\ln(1) = 0
\end{align*}
Alternatively, using the more specific inequality ln(x) ≤ x − 1, we have
\begin{align*}
D_{KL}(f_X \| f_Y) &= -\int f_X(x) \ln\left( \frac{f_Y(x)}{f_X(x)} \right) dx \\
&\geq -\int f_X(x) \left( \frac{f_Y(x)}{f_X(x)} - 1 \right) dx \\
&= -\int f_Y(x)\, dx + \int f_X(x)\, dx \\
&= -1 + 1 = 0
\end{align*}

[SEEN]
[5 marks]
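NB (not required): non-negativity can also be observed numerically; the sketch below evaluates the discrete analogue of D_KL for a few randomly generated probability vectors (the dimension and number of trials are arbitrary).
```python
import numpy as np

rng = np.random.default_rng(2)

def kl(p, q):
    """Discrete analogue of D_KL(p || q) = sum_x p(x) ln(p(x) / q(x))."""
    return np.sum(p * np.log(p / q))

for _ in range(5):
    p = rng.random(10); p /= p.sum()
    q = rng.random(10); q /= q.sum()
    print(kl(p, q))            # always >= 0, with equality only when p == q
```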


(c) We have
\begin{align*}
I(X, Y) &= H(X) + H(Y) - H(X, Y) \\
&= -\int f_X(x) \ln(f_X(x))\, dx - \int f_Y(y) \ln(f_Y(y))\, dy + \int f_{X,Y}(x, y) \ln\left( f_{X,Y}(x, y) \right) dx\,dy \\
&= -\int \left( \int f_{X,Y}(x, y)\, dy \right) \ln(f_X(x))\, dx - \int \left( \int f_{X,Y}(x, y)\, dx \right) \ln(f_Y(y))\, dy \\
&\quad + \int f_{X,Y}(x, y) \ln\left( f_{X,Y}(x, y) \right) dx\,dy \\
&= -\int f_{X,Y}(x, y) \ln(f_X(x))\, dx\,dy - \int f_{X,Y}(x, y) \ln(f_Y(y))\, dx\,dy + \int f_{X,Y}(x, y) \ln\left( f_{X,Y}(x, y) \right) dx\,dy \\
&= \int f_{X,Y}(x, y) \ln\left( \frac{f_{X,Y}(x, y)}{f_X(x) f_Y(y)} \right) dx\,dy \quad \left( = D_{KL}(f_{X,Y}(x, y) \| f_X(x) f_Y(y)) \right) \\
&= \int f_Y(y) \, \frac{f_{X,Y}(x, y)}{f_Y(y)} \ln\left( \frac{f_{X,Y}(x, y)/f_Y(y)}{f_X(x)} \right) dx\,dy \\
&= \int f_Y(y) f_{X|Y=y}(x) \ln\left( \frac{f_{X|Y=y}(x)}{f_X(x)} \right) dx\,dy \\
&= \int f_Y(y) D_{KL}(f_{X|Y=y} \| f_X)\, dy \\
&= E_{y \sim Y}\left[ D_{KL}(f_{X|Y=y} \| f_X) \right] \tag{5}
\end{align*}

as required. [UNSEEN]
[7 marks]
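NB (not required): identity (5) can be checked numerically in the discrete case (sums in place of integrals); a minimal sketch with an arbitrary 4 × 5 joint distribution follows.
```python
import numpy as np

rng = np.random.default_rng(3)
p_xy = rng.random((4, 5))      # arbitrary discrete joint distribution p(x, y)
p_xy /= p_xy.sum()
p_x = p_xy.sum(axis=1)
p_y = p_xy.sum(axis=0)

# I(X, Y) = sum_{x,y} p(x, y) ln( p(x, y) / (p(x) p(y)) )
mi = np.sum(p_xy * np.log(p_xy / np.outer(p_x, p_y)))

# E_{y~Y}[ D_KL( p(. | Y = y) || p(.) ) ]
rhs = 0.0
for j in range(p_xy.shape[1]):
    p_x_given_y = p_xy[:, j] / p_y[j]
    rhs += p_y[j] * np.sum(p_x_given_y * np.log(p_x_given_y / p_x))

print(mi, rhs)                 # the two values agree to numerical precision
```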


4. (a) We have h_i = \phi(W h_{i-1} + U X_i + b). The value of W affects h_i both directly through this equation, and through h_{i-1}. Thus we have for i > 1:
\[
\alpha_i = \frac{dh_i}{dW} = \frac{\partial h_i}{\partial W} + \frac{\partial h_i}{\partial h_{i-1}} \frac{dh_{i-1}}{dW} = \beta_i + \gamma_i \alpha_{i-1}
\]

[SEEN SIMILAR]
[4 marks]
(b) If n = 2 then we have
\[
\alpha_2 = \beta_2 + \gamma_2 \alpha_1 = \beta_2 + \sum_{i=1}^{2-1} \left( \prod_{j=i+1}^{2} \gamma_j \right) \beta_i
\]
Suppose by induction that the formula in the question holds for n = 1, 2, \ldots, n - 1. Then we have
\begin{align*}
\alpha_n &= \beta_n + \gamma_n \alpha_{n-1} \\
&= \beta_n + \gamma_n \left( \beta_{n-1} + \sum_{i=1}^{n-2} \left( \prod_{j=i+1}^{n-1} \gamma_j \right) \beta_i \right) \\
&= \beta_n + \gamma_n \beta_{n-1} + \sum_{i=1}^{n-2} \left( \prod_{j=i+1}^{n} \gamma_j \right) \beta_i \\
&= \beta_n + \sum_{i=n-1}^{n-1} \left( \prod_{j=i+1}^{n} \gamma_j \right) \beta_i + \sum_{i=1}^{n-2} \left( \prod_{j=i+1}^{n} \gamma_j \right) \beta_i \\
&= \beta_n + \sum_{i=1}^{n-1} \left( \prod_{j=i+1}^{n} \gamma_j \right) \beta_i
\end{align*}

which gives the requisite result by induction. [UNSEEN]


[7 marks]
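NB (not required): the closed form in (b) can be checked against the recursion in (a) for scalar β_i and γ_i; the sketch below uses randomly generated scalars as an illustrative simplification of the matrix-valued case.
```python
import numpy as np

rng = np.random.default_rng(4)
n = 6
beta = rng.normal(size=n + 1)    # beta[1..n]; index 0 unused
gamma = rng.normal(size=n + 1)   # gamma[1..n]; index 0 unused

# Recursion from (a): alpha_1 = beta_1, alpha_i = beta_i + gamma_i * alpha_{i-1}
alpha = beta[1]
for i in range(2, n + 1):
    alpha = beta[i] + gamma[i] * alpha

# Closed form from (b): alpha_n = beta_n + sum_{i=1}^{n-1} (prod_{j=i+1}^{n} gamma_j) beta_i
closed = beta[n] + sum(np.prod(gamma[i + 1:n + 1]) * beta[i] for i in range(1, n))

print(alpha, closed)             # agree to numerical precision
```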
(c) The exploding gradient problem can arise when activation functions have gradients which exceed 1. An example could be the scaled sigmoid function:
\[
\phi(x) = (1 + \exp(-5x))^{-1} \tag{6}
\]
[SEEN SIMILAR]
[4 marks]
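NB (not required): a minimal scalar sketch of (c). The setting is deliberately contrived (the input term U X_i is dropped and the bias is chosen so the pre-activation stays at 0, where the scaled sigmoid has slope 5/4), so that γ_i = φ′(0) w = 1.5 > 1 at every step and α_i grows geometrically with depth.
```python
import numpy as np

def phi(x):
    return 1.0 / (1.0 + np.exp(-5.0 * x))   # scaled sigmoid (6); maximum slope 5/4 at x = 0

def dphi(x):
    return 5.0 * phi(x) * (1.0 - phi(x))

w = 1.2
b = -0.5 * w                                 # keeps the pre-activation at exactly 0
h, grad = 0.5, 0.0                           # grad tracks alpha_i = d h_i / d w

for i in range(1, 31):
    pre = w * h + b                          # stays at 0, so h stays at 0.5
    beta = dphi(pre) * h                     # beta_i  = partial h_i / partial w
    gamma = dphi(pre) * w                    # gamma_i = partial h_i / partial h_{i-1} = 1.5
    grad = beta + gamma * grad               # recursion from (a)
    h = phi(pre)
    if i % 10 == 0:
        print(i, grad)                       # grows roughly like 1.5 ** i
```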
