Deep Learning - Theory and Practice
P. Balamurugan
IE 643
Lecture 10
1 Recap
Let B denote the p × n matrix connecting the input layer to the hidden layer.
Let A denote the n × p matrix connecting the hidden layer to the output layer.
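To make the dimensions concrete, here is a minimal NumPy sketch (my own, not from the lecture; the sizes are example values) of the forward pass $\hat{y} = ABx$ of this two-layer linear network:

```python
# A minimal sketch (assumed example sizes) of the two-layer linear network.
import numpy as np

n, p = 4, 2                      # input/output dimension n, hidden dimension p
rng = np.random.default_rng(0)
B = rng.standard_normal((p, n))  # p x n: input layer -> hidden layer
A = rng.standard_normal((n, p))  # n x p: hidden layer -> output layer
x = rng.standard_normal(n)       # one input vector

y_hat = A @ (B @ x)              # forward pass; equals (A @ B) @ x
print(y_hat.shape)               # (n,)
```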
x^i        y^i
0.708333   0.708333
0.583333   0.583333
0.166667   0.166667
0.458333   0.458333
0.875      0.875
Observation: though there are two valleys in the loss surface, they are
symmetric: E depends on a and b only through the product ab, so the surface is
unchanged under (b, a) → (−b, −a) and the valleys are mirror images of each other.
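The symmetry can be checked numerically. The sketch below (my own, using the table above and the error $E(b, a) = \sum_{i=1}^{S} (a b x^i - y^i)^2$ recalled on the next slide) evaluates E on a grid; flipping the signs of both b and a leaves the surface unchanged, which is exactly the mirror symmetry of the two valleys:

```python
# A minimal sketch: evaluate E(b, a) = sum_i (a*b*x_i - y_i)^2 on a grid
# built from the table above, and check the (b, a) -> (-b, -a) symmetry.
import numpy as np

x = np.array([0.708333, 0.583333, 0.166667, 0.458333, 0.875])
y = x.copy()  # in the table, y_i = x_i

def E(b, a):
    return np.sum((a * b * x - y) ** 2)

bs = np.linspace(-2, 2, 201)   # grids symmetric about 0
as_ = np.linspace(-2, 2, 201)
surface = np.array([[E(b, a) for a in as_] for b in bs])

# Reversing both axes maps (b, a) to (-b, -a); the surface is unchanged.
print(np.allclose(surface, surface[::-1, ::-1]))  # True
```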
Error: $E(b, a) = \sum_{i=1}^{S} (a b x^i - y^i)^2$.
When we fix a, or when we fix b, we observe that the graph does not contain
multiple valleys.
Thus when we fix a or when we fix b, the graph looks as if every local optimum
is a global optimum.
Question: Does this behavior extend to higher dimensions? We shall investigate
this next!
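To see the single-valley behavior of the slices, here is a small sketch (same assumed data as above): with a fixed, $E(\cdot, a)$ is a quadratic in b, so each slice is an upward-opening parabola with one valley whenever $a \neq 0$:

```python
# A minimal sketch: for fixed a, E(b, a) is an upward-opening parabola in b.
import numpy as np

x = np.array([0.708333, 0.583333, 0.166667, 0.458333, 0.875])
y = x.copy()

a_fixed = 1.5
bs = np.linspace(-2, 2, 201)
slice_E = np.array([np.sum((a_fixed * b * x - y) ** 2) for b in bs])

# Expanding the square: E(b) = (a^2 sum x_i^2) b^2 - 2 (a sum x_i y_i) b + sum y_i^2.
c2 = a_fixed**2 * np.sum(x**2)
c1 = -2 * a_fixed * np.sum(x * y)
c0 = np.sum(y**2)
print(np.allclose(slice_E, c2 * bs**2 + c1 * bs + c0))  # True
```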
Recall:
Input: training data $D = \{(x^i, y^i)\}_{i=1}^{S}$, with $x^i \in \mathbb{R}^n$, $y^i \in \mathbb{R}^n$, $\forall i \in \{1, \ldots, S\}$.
Error: $E = \sum_{i=1}^{S} \|ABx^i - y^i\|_2^2$. (We write E without arguments since the context is clear!)

Some notations:
Let $X = [x^1 \; x^2 \; \cdots \; x^S]$ and $Y = [y^1 \; y^2 \; \cdots \; y^S]$, i.e. the training inputs and targets stacked as columns.
Note: X and Y are n × S matrices.
For an $m \times n$ matrix $G = (g_{ij})$ and a $p \times q$ matrix $H = (h_{jk})$, define the Kronecker product of G and H as
$G \otimes H = \begin{bmatrix} g_{11} H & \cdots & g_{1n} H \\ \vdots & \ddots & \vdots \\ g_{m1} H & \cdots & g_{mn} H \end{bmatrix}.$
Note: $G \otimes H$ is of size $mp \times nq$.
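A quick sketch of the size claim, using numpy.kron (which builds exactly this block matrix):

```python
# Check that the Kronecker product of an m x n and a p x q matrix is mp x nq.
import numpy as np

m, n, p, q = 2, 3, 4, 5
rng = np.random.default_rng(1)
G = rng.standard_normal((m, n))
H = rng.standard_normal((p, q))

K = np.kron(G, H)                           # block matrix [g_ij * H]
print(K.shape == (m * p, n * q))            # True
print(np.allclose(K[:p, :q], G[0, 0] * H))  # top-left block is g_11 * H
```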
Claim:
With B fixed, E is of the form $\|Mz - c\|_2^2$ in $z = \mathrm{vec}(A)$; similarly, with A fixed, E is of this form in $z = \mathrm{vec}(B)$. (Here $\mathrm{vec}(H)$ stacks the columns of H into a single vector.)
Proof idea: using the identity $\mathrm{vec}(UVW) = (W^\top \otimes U)\,\mathrm{vec}(V)$, we can write:
$E = \sum_{i=1}^{S} \|ABx^i - y^i\|_2^2 = \|ABX - Y\|_F^2 = \|\mathrm{vec}(ABX) - \mathrm{vec}(Y)\|_2^2 = \|((BX)^\top \otimes I_n)\,\mathrm{vec}(A) - \mathrm{vec}(Y)\|_2^2,$
so with B fixed we may take $M = (BX)^\top \otimes I_n$ and $c = \mathrm{vec}(Y)$.
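The rewriting can be verified numerically. A minimal sketch, assuming the column-stacking convention for vec above (the dimensions and variable names are mine, chosen for illustration):

```python
# Check that sum_i ||A B x^i - y^i||^2 equals ||M z - c||^2
# with z = vec(A), M = (B X)^T kron I_n, c = vec(Y), for fixed B.
import numpy as np

rng = np.random.default_rng(2)
n, p, S = 3, 2, 5
A = rng.standard_normal((n, p))
B = rng.standard_normal((p, n))
X = rng.standard_normal((n, S))
Y = rng.standard_normal((n, S))

vec = lambda H: H.flatten(order="F")        # stack columns

E_direct = np.sum((A @ B @ X - Y) ** 2)     # sum_i ||A B x^i - y^i||_2^2

M = np.kron((B @ X).T, np.eye(n))           # (BX)^T kron I_n
z = vec(A)
E_vec = np.sum((M @ z - vec(Y)) ** 2)       # ||M z - c||_2^2

print(np.isclose(E_direct, E_vec))          # True
```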
Consider the function $F(z) = \|Mz - c\|_2^2$. We have the following result:

Convexity of function F
The function $F(z)$ is convex.

Hence every local minimum of F is a global minimum. Moreover,
$\operatorname*{argmin}_z F(z) = \operatorname*{argmin}_z \|Mz - c\|_2^2 = \operatorname*{argmin}_z \; (z^\top M^\top M z - 2 z^\top M^\top c + c^\top c) = \operatorname*{argmin}_z \; (z^\top M^\top M z - 2 z^\top M^\top c),$
since the constant term $c^\top c$ does not change the minimizer.
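Because F is a convex quadratic, its minimizers are exactly the points where the gradient $2M^\top Mz - 2M^\top c$ vanishes, i.e. the solutions of the normal equations $M^\top M z = M^\top c$. A minimal sketch with generic M and c (illustrative values, not the lecture's):

```python
# Minimize F(z) = ||M z - c||^2 by least squares and check the gradient is zero.
import numpy as np

rng = np.random.default_rng(3)
M = rng.standard_normal((8, 4))
c = rng.standard_normal(8)

z_star, *_ = np.linalg.lstsq(M, c, rcond=None)  # least-squares minimizer

grad = 2 * (M.T @ M @ z_star - M.T @ c)         # gradient of F at z_star
print(np.allclose(grad, 0, atol=1e-10))         # ~0 up to numerical error
```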