
Vector Calculus

Contents
• ex11-12, bt11-12
Differentiation of Univariate Functions
Partial Differentiation and Gradients
Gradients of Matrices
Backpropagation
Higher-Order Derivatives
Linearization and Multivariate Taylor Series



The Chain Rule

x f f(x) g (gf)(x)

(gf)(x) = g(f(x))f(x) # gf means g after f


dg dg df
=
dx df dx



Chain rule – Ex
• Use the chain rule to compute the derivative of
h(x) = (2x + 1)^4

x 2() 2x () + 1 2x+ 1 ()4 (2x+1)4

• h can be expressed as the composition h(x) = (g∘f∘u)(x)


u(x) = 2x,   f(u) = u + 1,   g(f) = f^4
h'(x) = g'(f)·f'(u)·u'(x) = 4f^3 · 1 · 2 = 8(2x + 1)^3

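A quick symbolic check of this result (a sketch, assuming SymPy is available):

```python
# Verify the chain-rule answer 8*(2x+1)^3 symbolically.
import sympy as sp

x = sp.symbols('x')
h = (2*x + 1)**4

dh = sp.diff(h, x)                            # differentiate directly
print(sp.simplify(dh - 8*(2*x + 1)**3))       # prints 0: both expressions agree
```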


Partial Derivative
Definition (Partial Derivative). For a function f : ℝ^n → ℝ,  x ↦ f(x),
of n variables x1, . . . , xn, we define the partial derivatives as

∂f/∂xk = lim_{h→0} [ f(x1, …, xk + h, xk+1, …, xn) − f(x1, …, xk, …, xn) ] / h

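In code, this definition translates directly into a finite-difference approximation. Below is a minimal sketch (NumPy); the helper name partial_derivative and the test function are made up for illustration:

```python
# Forward-difference approximation of the partial derivative df/dx_k,
# i.e. the limit above evaluated with a small but finite h.
import numpy as np

def partial_derivative(f, x, k, h=1e-6):
    x = np.asarray(x, dtype=float)
    x_shifted = x.copy()
    x_shifted[k] += h
    return (f(x_shifted) - f(x)) / h

# Example: f(x1, x2) = x1^2 * x2, so df/dx1 = 2*x1*x2 = 12 at (2, 3).
f = lambda x: x[0]**2 * x[1]
print(partial_derivative(f, [2.0, 3.0], k=0))   # approximately 12
```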


Gradients of f : n → 

• We collect all partial derivatives of f in the row vector to


form the gradient of f f f … f
x1 x2 xn
df
• Notation. xf gradf
dx
• Ex. For f : 2 → , f(x1, x2) = x13 – x1x2
f f
• Partial derivatives = 3x12 – x2, = x 1 3 – x1
x1 x2
• The Gradient of f
xf = [3x12x2 – x2 x13 – x1]  12 (1 row, 2 columns)

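A symbolic check of this example (a sketch, assuming SymPy):

```python
# Gradient of f(x1, x2) = x1^3*x2 - x1*x2 as a 1x2 row vector.
import sympy as sp

x1, x2 = sp.symbols('x1 x2')
f = x1**3 * x2 - x1*x2

grad = sp.Matrix([[sp.diff(f, x1), sp.diff(f, x2)]])
print(grad)   # Matrix([[3*x1**2*x2 - x2, x1**3 - x1]])
```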


Gradients of f : n →  x  3 f  1
df
gradf = = xf  1n
dx
Ex. For f(x, y) = (x3 + 2y)2, xf  13
we obtain the partial derivatives
f 
• = 2(x3 + 2y) (x3 + 2y) = 6x2(x3 + 2y)
x x

f 
• = 2(x3 + 2y) (x3 + 2y) = 4(x3 + 2y)
y y
 The gradient of f is [6x2(x3 + 2y) 4(x3 + 2y)]



Gradients/Jacobian of Vector-Valued Functions f : ℝ^n → ℝ^m

• For a vector-valued function f : ℝ^n → ℝ^m,

f(x) = [f1(x)  f2(x)  …  fm(x)]^T,   a vector with m components,
where each fi : ℝ^n → ℝ.

Gradient (or Jacobian) of f:

J = ∇_x f = df/dx = [ ∇_x f1
                         ⋮
                      ∇_x fm ]    Dimension: m×n



Jacobian of f: n  m – size
x3 f4 J  43
x1 x2 x3
f1
f2
f2 x3

f3

fm



Jacobian of f: n  m – Ex

Ex. Find the Jacobian of f: 3  2


f1(x1, x2, x3) = 2x1 + x2x3, f2(x1, x2, x3) = x1x3 - x22

Jacobian of f: J 23

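The matrix itself appears on the slide figure; as a sketch (assuming SymPy), it can be reproduced as follows, with each row being the gradient of one component function:

```python
# Jacobian of f(x) = (2*x1 + x2*x3, x1*x3 - x2^2) with respect to (x1, x2, x3).
import sympy as sp

x1, x2, x3 = sp.symbols('x1 x2 x3')
f = sp.Matrix([2*x1 + x2*x3, x1*x3 - x2**2])

J = f.jacobian([x1, x2, x3])
print(J)   # Matrix([[2, x3, x2], [x3, -2*x2, x1]]), a 2x3 matrix
```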


Gradient of f: n  m – Ex

• We are given f(x) = Ax, f(x) ∈ m, A ∈ mn, x ∈ n. Compute


the gradient ∇xf
fi
• ∇xf = its size is mn
xj
mn

𝑛
fi(x) = 𝑗=1 aij xj
fi
 = aij  ∇xf = A
xj

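A quick numerical confirmation (a sketch using NumPy forward differences; A and x are random values made up for illustration):

```python
# The finite-difference Jacobian of f(x) = A @ x should reproduce A.
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(4, 3))            # A in R^{4x3}
x = rng.normal(size=3)                 # x in R^3
f = lambda x: A @ x

h = 1e-6
J = np.zeros((4, 3))
for j in range(3):
    e = np.zeros(3); e[j] = h
    J[:, j] = (f(x + e) - f(x)) / h    # column j holds df/dx_j

print(np.allclose(J, A, atol=1e-4))    # True
```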


Gradient of f: n  m – Ex2

• Given h :  → , h(t) = (fx)(t)


where f : 2 → , f(x) = exp(x1 + x22),
x1(t) t
x :  →  , x(t) =
2 =
x2(t) sint
dh
Compute , the gradient of h with respect to t.
dt
• Use the chain rule (matrix version) for h = fx
dh d df dx
= (fx) =
dt dt dx dt



Gradient of f: n  m – Ex2
x1
dh
= f f t = f x1 + f x2
dt x1 x2 x2 x1 t x2 t
t
= exp(x1 + x22) + 2x2exp(x1 + x22)cost,
where x1 = t, x2 = sint.

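A symbolic check of this result (a sketch, assuming SymPy):

```python
# Compare direct differentiation of h(t) = exp(t + sin(t)^2)
# with the chain-rule expression derived above.
import sympy as sp

t = sp.symbols('t')
x1, x2 = t, sp.sin(t)
h = sp.exp(x1 + x2**2)

dh_direct = sp.diff(h, t)
dh_chain = sp.exp(x1 + x2**2) * (1 + 2*x2*sp.cos(t))
print(sp.simplify(dh_direct - dh_chain))   # 0
```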


Gradient of f: n  m – Exercise
• y  N, θ  D, Φ  ND
e: D  N, e(θ) = y − Φθ,
L: N  , L(e) = e2 = eTe
dL de dL
Find , , and
de dθ dθ

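A possible worked solution (not shown on the slide), keeping gradients as row vectors as above:

dL/de = 2e^T ∈ ℝ^{1×N},
de/dθ = −Φ ∈ ℝ^{N×D},
dL/dθ = (dL/de)·(de/dθ) = −2e^T·Φ = −2(y − Φθ)^T·Φ ∈ ℝ^{1×D}.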


Gradient of A : m  pq
Approach 1

4×2×3 tensor



Gradient of A : m  pq
Approach 2: Re-shape matrices into vectors

4×2×3 tensor



Gradients of A : m  pq – Ex
• Ex. Consider A: 3  32
𝑥1 − 𝑥2 𝑥1 + 𝑥3
• A(x1, x2, x3) = 𝑥1 2 + 𝑥3 2𝑥1
𝑥3 − 𝑥2 𝑥1 + 𝑥2 + 𝑥3
dA
• The dimension of : (32)3
dx
• Approach 1
1 1 −1 0 0 1
A A A
= 2𝑥1 2 , = 0 0, = 1 0
x 1 x2 x3 (32)3 tensor
0 1 −1 1 1 1

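These partial-derivative matrices can be spot-checked numerically; a sketch (NumPy forward differences, evaluated at a made-up point):

```python
import numpy as np

def A(x):
    x1, x2, x3 = x
    return np.array([[x1 - x2,    x1 + x3],
                     [x1**2 + x3, 2*x1],
                     [x3 - x2,    x1 + x2 + x3]])

x = np.array([1.0, 2.0, 3.0])
h = 1e-6
for i in range(3):
    e = np.zeros(3); e[i] = h
    dA_dxi = (A(x + e) - A(x)) / h            # a 3x2 matrix for each x_i
    print(f"dA/dx{i+1} ≈\n{np.round(dA_dxi, 3)}")
```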


Gradient of f : mn  p – Ex
fM AMN xN
… … … … x1 f: MN  M, f(A) = Ax
fj Aj1 … AjN … fi = Ai1x1 +… + Aikxk +…+ AiNxN
fi Ai1 … AiN xN
fi fi
… … … …  = xk, = 0 (j  i)
Aik Ajk
df
 M(MN)
dA … … …
… … …
0 … 0
x1 … xN
x1 … xN
0 … 0
… … …
… … …



Gradient of Matrices with Respect to Matrices, ℝ^{m×n} → ℝ^{p×q}

For R ∈ ℝ^{M×N} and f : ℝ^{M×N} → ℝ^{N×N} with f(R) = R^T·R = K ∈ ℝ^{N×N},
compute the gradient dK/dR.
The gradient has the dimensions

dK/dR ∈ ℝ^{(N×N)×(M×N)},

and for a single entry Kpq,  dKpq/dR ∈ ℝ^{1×(M×N)}.



Gradient of Matrices with Respect to Matrices, ℝ^{m×n} → ℝ^{p×q}

dK/dR ∈ ℝ^{(N×N)×(M×N)},   K = R^T·R
R = [r1 r2 … rN], where ri is the i-th column of R

dKpq/dR ∈ ℝ^{1×(M×N)},   dKpq/dRij ∈ ℝ

Kpq = rp^T·rq = Σ_{m=1}^{M} Rmp·Rmq

dKpq/dRij = ∂pqij = { Riq    if j = p, p ≠ q
                      Rip    if j = q, p ≠ q
                      2·Riq  if j = p = q
                      0      otherwise }

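The four cases can be spot-checked with forward differences; a sketch (NumPy, with made-up sizes), probing one entry K_pq against one entry R_ij at a time:

```python
import numpy as np

rng = np.random.default_rng(1)
M, N = 4, 3
R = rng.normal(size=(M, N))
K = lambda R: R.T @ R

def dKpq_dRij(p, q, i, j, h=1e-6):
    E = np.zeros((M, N)); E[i, j] = h
    return (K(R + E)[p, q] - K(R)[p, q]) / h

p, q, i = 0, 1, 2
print(dKpq_dRij(p, q, i, j=p), R[i, q])     # j = p != q: equals R_iq
print(dKpq_dRij(p, q, i, j=q), R[i, p])     # j = q != p: equals R_ip
print(dKpq_dRij(p, p, i, j=p), 2*R[i, p])   # j = p = q: equals 2*R_ip
print(dKpq_dRij(p, q, i, j=2), 0.0)         # otherwise: zero
```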


Backpropagation - Introduction

• Probably the single most important algorithm in all of Deep Learning


• In many machine learning applications, we find good model parameters by performing gradient descent, which requires computing the gradient of a learning objective with respect to the parameters of the model. For example, an ANN with a single hidden layer of 150 nodes applied to a 128×128×3 color image already needs at least 128×128×3×150 = 7,372,800 weights.
• The backpropagation algorithm is an efficient way to compute the gradient of an error function with respect to the parameters of the model.



ML Needs Gradients
• Given training data
  {(x1, y1), (x2, y2), …, (xm, ym)}

• Choose decision and cost functions
  ŷi = f_θ(xi)
  C(ŷi, yi)

• Define the goal
  Find θ* that minimizes (1/m)·Σi C(ŷi, yi)

• Train the model with (stochastic) gradient descent to update θ:
  θ(t+1) = θ(t) − γ·(∂C/∂θ(t))(xi, yi)

! The backpropagation algorithm is an efficient way to compute the gradient.

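A minimal (stochastic) gradient-descent loop matching the update rule above; the quadratic cost, toy data, and learning rate γ below are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))            # toy inputs x_i
y = X @ np.array([1.0, -2.0, 0.5])       # toy targets y_i
theta = np.zeros(3)
gamma = 0.1                              # learning rate

for epoch in range(50):
    for i in rng.permutation(len(X)):
        y_hat = X[i] @ theta             # decision function f_theta(x_i)
        grad = (y_hat - y[i]) * X[i]     # dC/dtheta for C = 0.5*(y_hat - y_i)^2
        theta = theta - gamma * grad     # theta(t+1) = theta(t) - gamma * dC/dtheta

print(theta)                             # approaches [1, -2, 0.5]
```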


Epochs
• The backpropagation algorithm consists of many cycles; each cycle is called an epoch and has two phases:

Forward phase:
a(0) → z(1), a(1) → z(2), a(2) → … → C

Backward phase:
∂C/∂θ(1) ← ∂C/∂θ(2) ← … ← ∂C/∂θ(N)



Deep Network (ANN with hidden layers)

Activation equations (matrix version)

Layer (1) = hidden layer
z(1) = W(1)·a(0) + b(1)
a(1) = σ1(z(1))
Layer (2) = output layer
z(2) = W(2)·a(1) + b(2)
a(2) = σ2(z(2))

The cost for example number k:

Ck = (1/2)·Σi (ai(2) − yi)^2 = (1/2)·‖a(2) − y‖^2



Forward phase
For L = 1..N, with a(0) = x:
z(L) = W(L)·a(L−1) + b(L)
a(L) = σL(z(L))

C: cost function (e.g., C = (1/2)·‖a(N) − y‖^2)

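A minimal sketch of the forward phase (NumPy); the logistic-sigmoid activation and the layer sizes below are made-up choices:

```python
import numpy as np

sigma = lambda z: 1.0 / (1.0 + np.exp(-z))

def forward(x, weights, biases):
    """Return the activations a(0), ..., a(N) for one input x."""
    a = [x]
    for W, b in zip(weights, biases):
        z = W @ a[-1] + b        # z(L) = W(L) a(L-1) + b(L)
        a.append(sigma(z))       # a(L) = sigma_L(z(L))
    return a

rng = np.random.default_rng(0)
weights = [rng.normal(size=(4, 3)), rng.normal(size=(2, 4))]   # layers 1 and 2
biases  = [np.zeros(4), np.zeros(2)]

a = forward(rng.normal(size=3), weights, biases)
y = np.array([0.0, 1.0])
C = 0.5 * np.sum((a[-1] - y)**2)   # C = 1/2 * ||a(N) - y||^2
print(C)
```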


Backpropagation
Layer 1, Layer 2, …, Layer K−1, Layer K, …, Layer N−1, Layer N

∂C/∂W(N) = (∂C/∂a(N))·(∂a(N)/∂W(N))
∂C/∂b(N) = (∂C/∂a(N))·(∂a(N)/∂b(N))

∂C/∂W(N−1) = (∂C/∂a(N))·(∂a(N)/∂a(N−1))·(∂a(N−1)/∂W(N−1))
∂C/∂b(N−1) = (∂C/∂a(N))·(∂a(N)/∂a(N−1))·(∂a(N−1)/∂b(N−1))

∂C/∂W(N−2) = (∂C/∂a(N))·(∂a(N)/∂a(N−1))·(∂a(N−1)/∂a(N−2))·(∂a(N−2)/∂W(N−2))
∂C/∂b(N−2) = (∂C/∂a(N))·(∂a(N)/∂a(N−1))·(∂a(N−1)/∂a(N−2))·(∂a(N−2)/∂b(N−2))

Benefit of backpropagation: the terms outside the box are reused from one layer to the next.



Backpropagation
Layer 1, Layer 2, …, Layer K−1, Layer K, …, Layer N−1, Layer N

Activation equations:
z(L) = W(L)·a(L−1) + b(L)
a(L) = σL(z(L))
C: cost function

∂C/∂W(L+1) = (∂C/∂a(N))·(∂a(N)/∂a(N−1))·(∂a(N−1)/∂a(N−2)) … (∂a(L+3)/∂a(L+2))·(∂a(L+2)/∂a(L+1))·(∂a(L+1)/∂W(L+1))

∂C/∂W(L) = (∂C/∂a(N))·(∂a(N)/∂a(N−1))·(∂a(N−1)/∂a(N−2)) … (∂a(L+2)/∂a(L+1))·(∂a(L+1)/∂a(L))·(∂a(L)/∂W(L))

∂C/∂b(L) = (∂C/∂a(N))·(∂a(N)/∂a(N−1))·(∂a(N−1)/∂a(N−2)) … (∂a(L+2)/∂a(L+1))·(∂a(L+1)/∂a(L))·(∂a(L)/∂b(L))

At each layer (L), we need the common factor

e(L) := ∂C/∂a(L) = (∂C/∂a(N))·(∂a(N)/∂a(N−1)) … (∂a(L+2)/∂a(L+1))·(∂a(L+1)/∂a(L)) = e(L+1)·(∂a(L+1)/∂a(L)),

so backpropagation computes e(L+1) (at layer L+1) before computing e(L) (at layer L).
Backpropagation algorithm
For each example in the training examples:
1. Feed forward
2. Backpropagation
   At the output layer (N), compute and store:
   e(N) = ∂C/∂a(N)
   ∂C/∂W(N) = e(N)·(∂a(N)/∂W(N)),   ∂C/∂b(N) = e(N)·(∂a(N)/∂b(N))
   For layer (L) from N−1 down to 1:
   • Compute e(L) using e(L) = e(L+1)·(∂a(L+1)/∂a(L))
   • Compute ∂C/∂W(L) = e(L)·(∂a(L)/∂W(L)),   ∂C/∂b(L) = e(L)·(∂a(L)/∂b(L))

Activation equations:
z(L) = W(L)·a(L−1) + b(L)
a(L) = σL(z(L))
C: cost function

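A compact sketch of this algorithm for the squared-error cost and sigmoid activations (NumPy; layer sizes and inputs are made up). It propagates dC/dz(L), often written delta, which is equivalent to carrying e(L) = dC/da(L) and folding in the activation derivative:

```python
import numpy as np

sigma  = lambda z: 1.0 / (1.0 + np.exp(-z))
dsigma = lambda z: sigma(z) * (1.0 - sigma(z))

def backprop(x, y, weights, biases):
    # 1. Feed forward, storing every z(L) and a(L)
    a, zs = [x], []
    for W, b in zip(weights, biases):
        zs.append(W @ a[-1] + b)
        a.append(sigma(zs[-1]))

    grads_W, grads_b = [], []
    e = a[-1] - y                                 # e(N) = dC/da(N) for C = 0.5*||a(N) - y||^2
    # 2. Walk backwards from layer N to layer 1, reusing e at every step
    for L in reversed(range(len(weights))):
        delta = e * dsigma(zs[L])                 # dC/dz(L)
        grads_W.insert(0, np.outer(delta, a[L]))  # dC/dW(L) = delta * a(L-1)^T
        grads_b.insert(0, delta)                  # dC/db(L) = delta
        e = weights[L].T @ delta                  # dC/da(L-1), passed to the next layer down
    return grads_W, grads_b

rng = np.random.default_rng(0)
weights = [rng.normal(size=(4, 3)), rng.normal(size=(2, 4))]
biases  = [np.zeros(4), np.zeros(2)]
gW, gb = backprop(rng.normal(size=3), np.array([0.0, 1.0]), weights, biases)
print([g.shape for g in gW])   # [(4, 3), (2, 4)]
```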


Higher-order partial derivatives

Consider a function f : ℝ^2 → ℝ of two variables x, y.
Second-order partial derivatives: ∂²f/∂x², ∂²f/∂x∂y, ∂²f/∂y∂x, ∂²f/∂y²

Ex. f : ℝ^2 → ℝ, f(x, y) = x^3·y − 3x·y^2 + 5y
∂f/∂x = 3x^2·y − 3y^2,   ∂f/∂y = x^3 − 6xy + 5
∂²f/∂x² = 6xy,            ∂²f/∂x∂y = 3x^2 − 6y
∂²f/∂y∂x = 3x^2 − 6y,     ∂²f/∂y² = −6x

∂^n f/∂x^n is the n-th partial derivative of f with respect to x.

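The table of second-order partials above can be checked symbolically; a sketch (assuming SymPy), which also shows that the two mixed partials agree:

```python
import sympy as sp

x, y = sp.symbols('x y')
f = x**3*y - 3*x*y**2 + 5*y

print(sp.diff(f, x, x))   # 6*x*y
print(sp.diff(f, x, y))   # 3*x**2 - 6*y
print(sp.diff(f, y, x))   # 3*x**2 - 6*y
print(sp.diff(f, y, y))   # -6*x
```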


The Hessian
• The Hessian is the collection of all second-order partial derivatives.

The Hessian matrix is symmetric for twice continuously differentiable functions, that is,

∂²f/∂x∂y = ∂²f/∂y∂x



Gradient vs Hessian of f : ℝ^n → ℝ
Consider a function f : ℝ^n → ℝ.

Gradient:
∇f = [∂f/∂x1   ∂f/∂x2   …   ∂f/∂xn]
Dimension: 1 × n

Hessian:
∇²f = [ ∂²f/∂x1²      ∂²f/∂x1∂x2   …   ∂²f/∂x1∂xn
        ∂²f/∂x2∂x1    ∂²f/∂x2²     …   ∂²f/∂x2∂xn
        …             …            …   …
        ∂²f/∂xn∂x1    ∂²f/∂xn∂x2   …   ∂²f/∂xn²  ]
Dimension: n × n



Gradient vs Hessian of f : ℝ^n → ℝ^m
Consider a (vector-valued) function f : ℝ^n → ℝ^m.

Gradient: an m × n matrix.
Hessian: an m × (n × n) tensor.

[Figure: for x ∈ ℝ^3 and f ∈ ℝ^2, the gradient has dimension 2 × 3 and the Hessian has dimension 2 × (3 × 3).]



Example
• Compute the Hessian of the function z = f(x, y) = x^2 + 6xy − y^3 and evaluate it at the point (x = 1, y = 2, z = 5).

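One possible way to work the example (a sketch, assuming SymPy):

```python
import sympy as sp

x, y = sp.symbols('x y')
f = x**2 + 6*x*y - y**3

H = sp.hessian(f, (x, y))
print(H)                       # Matrix([[2, 6], [6, -6*y]])
print(H.subs({x: 1, y: 2}))    # Matrix([[2, 6], [6, -12]]) at the point (1, 2)
```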


Taylor series for f : ℝ → ℝ
Taylor polynomials

Approximation problems



Taylor series for f: D  
Consider a function f (smooth at x0)

multivariate Taylor series of f at x0 is defined as

where

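The formula itself sits on the slide figure; for reference, one standard way to write it (with δ := x − x0) is

f(x) = Σ_{k=0}^{∞} (1/k!) · D_x^k f(x0) · δ^k,

where D_x^k f(x0) denotes the k-th (total) derivative of f with respect to x, evaluated at x0.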


Example
Find the Taylor series for the function
f(x, y) = x^2 + 2xy + y^3 at x0 = 1, y0 = 2.



Taylor series of f(x, y) = x^2 + 2xy + y^3



Taylor series of f(x, y) = x^2 + 2xy + y^3




Taylor series of f(x, y) = x^2 + 2xy + y^3

The Taylor series expansion of f at (x0, y0) = (1, 2) is

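The expansion can be reproduced with the sketch below (assuming SymPy); because f is a polynomial, the Taylor series terminates and is exactly equal to f:

```python
# Expand f(1 + dx, 2 + dy) in powers of dx = x - 1 and dy = y - 2.
import sympy as sp

x, y, dx, dy = sp.symbols('x y dx dy')
f = x**2 + 2*x*y + y**3

expansion = sp.expand(f.subs({x: 1 + dx, y: 2 + dy}))
print(expansion)   # equals 13 + 6*dx + 14*dy + dx**2 + 2*dx*dy + 6*dy**2 + dy**3
```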


Summary

Differentiation of Univariate Functions


Partial Differentiation and Gradients
Gradients of Matrices
Backpropagation
Higher-Order Derivatives
Linearization and Multivariate Taylor Series



THANKS

