Chapter 5. Vector Calculus: January 3, 2021
Definition
Let D ⊆ R be a domain, and let f : D → R be a function.
The Taylor polynomial of degree n of f at x_0 ∈ D is defined as

    T_n(x) := \sum_{k=0}^{n} \frac{f^{(k)}(x_0)}{k!} (x - x_0)^k.
Note. The function f(x) = cos(x) is approximated by Taylor polynomials around x_0 = 1.
Example
Consider the function f(x) = x^3 + x - 3 and find the Taylor polynomial T_4 at x_0 = 1.
Answer. We have f(1) = -1, f'(x) = 3x^2 + 1, f''(x) = 6x, f'''(x) = 6, and f^{(4)}(x) = 0, so f'(1) = 4, f''(1) = 6, f'''(1) = 6. Hence

    T_4(x) = -1 + 4(x - 1) + 3(x - 1)^2 + (x - 1)^3.

Note. Some well-known Taylor (Maclaurin) series:

    e^x = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \cdots = \sum_{k=0}^{\infty} \frac{x^k}{k!}

    \sin(x) = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \cdots = \sum_{k=0}^{\infty} \frac{(-1)^k x^{2k+1}}{(2k+1)!}

    \cos(x) = 1 - \frac{x^2}{2!} + \frac{x^4}{4!} - \cdots = \sum_{k=0}^{\infty} \frac{(-1)^k x^{2k}}{(2k)!}

    \frac{1}{1-x} = 1 + x + x^2 + \cdots = \sum_{k=0}^{\infty} x^k   (for |x| < 1)
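These expansions are easy to check numerically. Below is a minimal Python sketch (the helper names `taylor_exp`, `taylor_cos`, and `T4` are ours, chosen for illustration) that evaluates truncated series and the degree-4 Taylor polynomial of x^3 + x - 3 at x_0 = 1, whose coefficients follow from the definition.

```python
import math

def taylor_exp(x, n):
    """Partial sum of the Maclaurin series for exp(x) up to degree n."""
    return sum(x ** k / math.factorial(k) for k in range(n + 1))

def taylor_cos(x, n):
    """Partial sum of the cos series: terms (-1)^k x^(2k) / (2k)! for k = 0..n."""
    return sum((-1) ** k * x ** (2 * k) / math.factorial(2 * k) for k in range(n + 1))

def T4(x):
    """Taylor polynomial of f(x) = x^3 + x - 3 at x0 = 1.
    From the definition: f(1) = -1, f'(1) = 4, f''(1)/2! = 3, f'''(1)/3! = 1."""
    return -1 + 4 * (x - 1) + 3 * (x - 1) ** 2 + (x - 1) ** 3

print(taylor_exp(1.0, 10), math.e)        # partial sum approaches e
print(taylor_cos(0.5, 5), math.cos(0.5))  # partial sum approaches cos(0.5)
print(T4(2.0), 2.0 ** 3 + 2.0 - 3)        # degree-4 Taylor polynomial of a cubic is exact
```

Since f is a cubic, its Taylor polynomial of degree 4 reproduces f exactly, which the last line illustrates.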
Definition
A function f : R^n → R is a rule that assigns to each point x = (x_1, ..., x_n) ∈ R^n a real value f(x) = f(x_1, ..., x_n).
Definition
For a function f : R^n → R, the partial derivatives of f are defined by

    \frac{\partial f}{\partial x_1} := \lim_{h \to 0} \frac{f(x_1 + h, x_2, \ldots, x_n) - f(x_1, x_2, \ldots, x_n)}{h}
    ⋮
    \frac{\partial f}{\partial x_n} := \lim_{h \to 0} \frac{f(x_1, x_2, \ldots, x_n + h) - f(x_1, x_2, \ldots, x_n)}{h}
Note. When finding the partial derivative ∂f/∂x_i, we let only x_i vary and keep the other variables constant.
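The limit definition can be illustrated numerically: a difference quotient in one variable, with the others held fixed, approaches the partial derivative as h shrinks. A small sketch (the function and the point are arbitrary choices for illustration):

```python
# f(x, y) = x^2 * y, so ∂f/∂x = 2xy; at (1, 2) this equals 4.
f = lambda x, y: x ** 2 * y

for h in (1e-1, 1e-3, 1e-5):
    q = (f(1.0 + h, 2.0) - f(1.0, 2.0)) / h  # difference quotient in the x-direction
    print(h, q)  # approaches 4.0 as h -> 0
```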
5.2 Partial Differentiation and Gradients
Example
Find the partial derivatives of the function f(x, y) = \frac{1}{1 + x^2 + 3y^4}.

Answer.

    \frac{\partial f(x, y)}{\partial x} = -\frac{1}{(1 + x^2 + 3y^4)^2} \frac{\partial}{\partial x}(1 + x^2 + 3y^4) = -\frac{2x}{(1 + x^2 + 3y^4)^2}

    \frac{\partial f(x, y)}{\partial y} = -\frac{1}{(1 + x^2 + 3y^4)^2} \frac{\partial}{\partial y}(1 + x^2 + 3y^4) = -\frac{12y^3}{(1 + x^2 + 3y^4)^2}
Definition
For a function f : R^n → R of n variables x_1, ..., x_n, the gradient of f is defined as

    \nabla_x f = \operatorname{grad} f = \frac{df}{dx} = \begin{bmatrix} \frac{\partial f}{\partial x_1} & \cdots & \frac{\partial f}{\partial x_n} \end{bmatrix} \in R^{1 \times n}.
Example
Find the gradient of the function f(x_1, x_2, x_3) = x_1^2 x_2^3 - 4x_2^2 x_3.

Answer.

    \nabla_x f = \begin{bmatrix} \frac{\partial f}{\partial x_1} & \frac{\partial f}{\partial x_2} & \frac{\partial f}{\partial x_3} \end{bmatrix} = \begin{bmatrix} 2x_1 x_2^3 & 3x_1^2 x_2^2 - 8x_2 x_3 & -4x_2^2 \end{bmatrix}.
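A gradient computed by hand can be sanity-checked with central finite differences. The sketch below (helper names `grad_numeric` and `grad_analytic` are ours) compares a numerical gradient of f(x_1, x_2, x_3) = x_1^2 x_2^3 - 4x_2^2 x_3 with the one obtained from the partial derivatives.

```python
def grad_numeric(f, x, h=1e-6):
    """Central-difference gradient of f : R^n -> R at the point x (a list)."""
    g = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        g.append((f(xp) - f(xm)) / (2 * h))
    return g

# f(x1, x2, x3) = x1^2 x2^3 - 4 x2^2 x3 from the example.
f = lambda x: x[0] ** 2 * x[1] ** 3 - 4 * x[1] ** 2 * x[2]
grad_analytic = lambda x: [2 * x[0] * x[1] ** 3,
                           3 * x[0] ** 2 * x[1] ** 2 - 8 * x[1] * x[2],
                           -4 * x[1] ** 2]

x = [1.0, 2.0, 3.0]
print(grad_numeric(f, x))   # close to [16.0, -36.0, -16.0]
print(grad_analytic(x))
```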
Sum rule: \nabla_x [f + g] = \nabla_x f + \nabla_x g.
Product rule: \nabla_x [fg] = g(x) \nabla_x f + f(x) \nabla_x g.
Chain rule:

    f'(t) = \frac{df}{dt} = \frac{\partial f}{\partial x_1} \frac{dx_1}{dt} + \cdots + \frac{\partial f}{\partial x_n} \frac{dx_n}{dt} = \sum_{i=1}^{n} \frac{\partial f}{\partial x_i} \frac{dx_i}{dt}.
Example
Consider f(x_1, x_2) = x_1^2 + x_1 x_2, where x_1 = \sin t, x_2 = \cos t. Find the derivative of f with respect to t.

Answer: We have

    \frac{\partial f}{\partial x_1} = 2x_1 + x_2, \quad \frac{\partial f}{\partial x_2} = x_1,
    \frac{dx_1}{dt} = \cos t, \quad \frac{dx_2}{dt} = -\sin t.

Hence

    \frac{df}{dt} = (2x_1 + x_2)\cos t + x_1(-\sin t)
                  = (2\sin t + \cos t)\cos t - \sin t \sin t
                  = 2\sin t \cos t + \cos^2 t - \sin^2 t
                  = \sin(2t) + \cos(2t).
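The closed form df/dt = sin(2t) + cos(2t) can be checked against a difference quotient; a quick sketch (the evaluation point t = 0.7 is an arbitrary choice):

```python
import math

def f_of_t(t):
    """f(x1(t), x2(t)) with x1 = sin t, x2 = cos t, as in the example."""
    x1, x2 = math.sin(t), math.cos(t)
    return x1 ** 2 + x1 * x2

t, h = 0.7, 1e-6
numeric = (f_of_t(t + h) - f_of_t(t - h)) / (2 * h)   # central difference
analytic = math.sin(2 * t) + math.cos(2 * t)          # chain-rule result
print(numeric, analytic)  # the two values agree closely
```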
Definition
A vector-valued function of n variables x = (x_1, ..., x_n), f : R^n → R^m, is given as

    f(x) = \begin{bmatrix} f_1(x) \\ \vdots \\ f_m(x) \end{bmatrix} = \begin{bmatrix} f_1(x_1, \ldots, x_n) \\ \vdots \\ f_m(x_1, \ldots, x_n) \end{bmatrix},

where each f_i : R^n → R is a function of x.
Definition
The partial derivative of a vector-valued function f : R^n → R^m with respect to x_i, i = 1, ..., n, is given as the vector

    \frac{\partial f}{\partial x_i} = \begin{bmatrix} \frac{\partial f_1}{\partial x_i} \\ \vdots \\ \frac{\partial f_m}{\partial x_i} \end{bmatrix} \in R^m.
Example
Consider a linear transformation (operator) f : R^n → R^m, f(x) = Ax with A ∈ R^{m×n}. Then \frac{\partial f}{\partial x_i} = A_i, the i-th column of A.
Example
Consider a vector-valued function

    f(x, y) = \begin{bmatrix} xy^2 & y^3 & x^2 - y^2 \end{bmatrix}^T.

Then

    \frac{\partial f}{\partial x} = \begin{bmatrix} y^2 \\ 0 \\ 2x \end{bmatrix}, \quad \frac{\partial f}{\partial y} = \begin{bmatrix} 2xy \\ 3y^2 \\ -2y \end{bmatrix}.
Definition
The collection of all first-order partial derivatives of a vector-valued function f : R^n → R^m is called the Jacobian. The Jacobian J is an m × n matrix, which we define and arrange as follows:

    J = \nabla_x f = \frac{df}{dx} = \begin{bmatrix} \frac{\partial f}{\partial x_1} & \cdots & \frac{\partial f}{\partial x_n} \end{bmatrix} = \begin{bmatrix} \frac{\partial f_1}{\partial x_1} & \cdots & \frac{\partial f_1}{\partial x_n} \\ \vdots & & \vdots \\ \frac{\partial f_m}{\partial x_1} & \cdots & \frac{\partial f_m}{\partial x_n} \end{bmatrix}.
Example
If f : R^n → R^m is a linear operator given by f(x) = Ax, then

    \nabla_x f = A.
Example
Consider a vector-valued function f : R^2 → R^3,

    f(x_1, x_2) = \begin{bmatrix} e^{-x_1 + x_2} \\ x_1 x_2^2 \\ \sin(x_1) \end{bmatrix}.

The Jacobian of f is

    J = \nabla_x f = \begin{bmatrix} -e^{-x_1 + x_2} & e^{-x_1 + x_2} \\ x_2^2 & 2x_1 x_2 \\ \cos x_1 & 0 \end{bmatrix} \in R^{3 \times 2}.
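A Jacobian can be verified column by column with finite differences; a minimal sketch (the helper names and the evaluation point are ours) for f(x_1, x_2) = (e^{-x_1+x_2}, x_1 x_2^2, sin x_1):

```python
import math

def f(x1, x2):
    return [math.exp(-x1 + x2), x1 * x2 ** 2, math.sin(x1)]

def jacobian_numeric(x1, x2, h=1e-6):
    """Central-difference 3x2 Jacobian of f at (x1, x2), built column by column."""
    cols = []
    for d1, d2 in ((h, 0.0), (0.0, h)):
        fp, fm = f(x1 + d1, x2 + d2), f(x1 - d1, x2 - d2)
        cols.append([(p - m) / (2 * h) for p, m in zip(fp, fm)])
    return [[cols[j][i] for j in range(2)] for i in range(3)]  # rows x columns

def jacobian_analytic(x1, x2):
    e = math.exp(-x1 + x2)
    return [[-e, e],
            [x2 ** 2, 2 * x1 * x2],
            [math.cos(x1), 0.0]]

print(jacobian_numeric(0.5, 1.2))
print(jacobian_analytic(0.5, 1.2))
```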
Chain rule
Example
Consider a function f(x_1, x_2) = x_1^2 + 2x_1 x_2 with x_1(s, t) = s \cos t and x_2(s, t) = s \sin t. Find the partial derivatives of f with respect to s and t.

Answer.

    \frac{\partial f}{\partial s} = \frac{\partial f}{\partial x_1}\frac{\partial x_1}{\partial s} + \frac{\partial f}{\partial x_2}\frac{\partial x_2}{\partial s}
        = (2x_1 + 2x_2)\cos t + 2x_1 \sin t
        = 2s\cos^2 t + 4s\sin t\cos t
        = 2s\cos^2 t + 2s\sin(2t),

    \frac{\partial f}{\partial t} = \frac{\partial f}{\partial x_1}\frac{\partial x_1}{\partial t} + \frac{\partial f}{\partial x_2}\frac{\partial x_2}{\partial t}
        = (2x_1 + 2x_2)(-s\sin t) + 2x_1(s\cos t)
        = -2s^2\sin t\cos t - 2s^2\sin^2 t + 2s^2\cos^2 t
        = 2s^2\cos(2t) - s^2\sin(2t).
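Substituting x_1 = s cos t and x_2 = s sin t gives ∂f/∂s = 2s cos²t + 2s sin(2t) and ∂f/∂t = 2s² cos(2t) − s² sin(2t); the sketch below checks both against central difference quotients (the point (s, t) = (1.3, 0.4) is an arbitrary choice):

```python
import math

def g(s, t):
    """f(x1(s, t), x2(s, t)) with x1 = s cos t, x2 = s sin t."""
    x1, x2 = s * math.cos(t), s * math.sin(t)
    return x1 ** 2 + 2 * x1 * x2

s, t, h = 1.3, 0.4, 1e-6
df_ds = (g(s + h, t) - g(s - h, t)) / (2 * h)
df_dt = (g(s, t + h) - g(s, t - h)) / (2 * h)
print(df_ds, 2 * s * math.cos(t) ** 2 + 2 * s * math.sin(2 * t))
print(df_dt, 2 * s ** 2 * math.cos(2 * t) - s ** 2 * math.sin(2 * t))
```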
Let A ∈ R^{4×2} and let x ∈ R^3. How do we define dA/dx?
We can identify the space R^{m×n} of m × n matrices with the space R^{mn}. Hence, to compute the gradient of a matrix A ∈ R^{m×n} with respect to a matrix B ∈ R^{p×q}, we can reshape A and B into vectors of lengths mn and pq, respectively. The resulting Jacobian has size mn × pq.
Example
Let f : R^{m×n} → R^{n×n} be given by f(A) = A^T A =: K ∈ R^{n×n}. Find the gradient dK/dA.

Answer. The gradient dK/dA ∈ R^{(n×n)×(m×n)} is a tensor. Moreover,

    \frac{dK_{pq}}{dA} \in R^{1 \times m \times n}.

Denote by A_i the i-th column of A and by K_{pq} the (p, q)-entry of K, p, q = 1, ..., n. Since

    K_{pq} = A_p^T A_q = \sum_{k=1}^{m} A_{kp} A_{kq},

we get

    \frac{\partial K_{pq}}{\partial A_{ij}} = \sum_{k=1}^{m} \frac{\partial}{\partial A_{ij}} (A_{kp} A_{kq}) =
    \begin{cases}
      A_{iq} & \text{if } j = p,\ p \neq q \\
      A_{ip} & \text{if } j = q,\ p \neq q \\
      2A_{iq} & \text{if } j = p = q \\
      0 & \text{otherwise.}
    \end{cases}
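The case analysis for ∂K_pq/∂A_ij can be checked by perturbing single entries of A. A small sketch using 0-based indices (the helper names and the test matrix are ours):

```python
def K_entry(A, p, q):
    """(p, q)-entry of K = A^T A, i.e. sum_k A[k][p] * A[k][q]."""
    return sum(A[k][p] * A[k][q] for k in range(len(A)))

def dK_analytic(A, p, q, i, j):
    """dK_pq / dA_ij from the case analysis above (0-based indices)."""
    if j == p == q:
        return 2 * A[i][q]
    if j == p:
        return A[i][q]
    if j == q:
        return A[i][p]
    return 0.0

def dK_numeric(A, p, q, i, j, h=1e-6):
    """Central-difference estimate: perturb the single entry A_ij."""
    Ap = [row[:] for row in A]
    Am = [row[:] for row in A]
    Ap[i][j] += h
    Am[i][j] -= h
    return (K_entry(Ap, p, q) - K_entry(Am, p, q)) / (2 * h)

A = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # m = 3, n = 2
for p in range(2):
    for q in range(2):
        for i in range(3):
            for j in range(2):
                assert abs(dK_analytic(A, p, q, i, j) - dK_numeric(A, p, q, i, j)) < 1e-5
print("all entries match")
```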
5.6 Backpropagation and Automatic Differentiation
5.6.1 Gradients in a Deep Network
In a deep network, the output y is computed as a composition of many functions,

    y = (f_K \circ f_{K-1} \circ \cdots \circ f_1)(x),

where x are the inputs (e.g., images), y are the observations (e.g., class labels), and every function f_i, i = 1, ..., K, possesses its own parameters.
In the i-th layer:

    f_i(x_{i-1}) = \sigma(A_{i-1} x_{i-1} + b_{i-1}),

where x_{i-1} is the output of layer i − 1 and σ is an activation function (sigmoid, ReLU, tanh, ...).
Training these models requires us to compute the gradient of a loss function L with respect to all model parameters θ_j = {A_j, b_j}, j = 0, ..., K − 1. We write the layer outputs as

    f_0 := x,
    f_i := \sigma_i(A_{i-1} f_{i-1} + b_{i-1}), \quad i = 1, \ldots, K.
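The layer recursion f_i = σ(A_{i−1} f_{i−1} + b_{i−1}) can be sketched as a plain forward pass. The network shape and the weights below are arbitrary illustrations, not values from the text:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, params):
    """Apply f_i = sigmoid(A f + b) layer by layer; params is a list of (A, b) pairs."""
    f = x
    for A, b in params:
        f = [sigmoid(sum(a * v for a, v in zip(row, f)) + bi)
             for row, bi in zip(A, b)]
    return f

# Hypothetical 2-2-1 network with arbitrarily chosen weights.
params = [
    ([[0.5, -0.3], [0.8, 0.2]], [0.1, -0.1]),  # (A0, b0)
    ([[1.0, -1.0]], [0.0]),                    # (A1, b1)
]
y = forward([1.0, 2.0], params)
print(y)  # a single sigmoid output, strictly between 0 and 1
```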
Given a simple graph representing the data flow from inputs x to outputs y:

    x → a → b → y.

To compute the derivative dy/dx:

    \frac{dy}{dx} = \frac{dy}{db} \frac{db}{da} \frac{da}{dx}.
Forward mode: the gradients flow with the data from left to right through the graph.
Reverse mode: gradients are propagated backward through the graph, i.e., reverse to the data flow.
Definition
Reverse mode automatic differentiation is called backpropagation.
Example
Consider the function

    f(x) = \sqrt{x^2 + \exp(x^2)} + \cos\big(x^2 + \exp(x^2)\big).

Using the intermediate variables

    a := x^2, \quad b := \exp(a), \quad c := a + b, \quad d := \sqrt{c}, \quad e := \cos(c), \quad f := d + e,

we get

    \frac{\partial a}{\partial x} = 2x, \quad \frac{\partial b}{\partial a} = \exp(a),
    \frac{\partial c}{\partial a} = \frac{\partial c}{\partial b} = 1, \quad \frac{\partial d}{\partial c} = \frac{1}{2\sqrt{c}},
    \frac{\partial e}{\partial c} = -\sin c, \quad \frac{\partial f}{\partial d} = \frac{\partial f}{\partial e} = 1.

Propagating backward through the graph:

    \frac{\partial f}{\partial c} = \frac{\partial f}{\partial d}\frac{\partial d}{\partial c} + \frac{\partial f}{\partial e}\frac{\partial e}{\partial c} = 1 \cdot \frac{1}{2\sqrt{c}} + 1 \cdot (-\sin c)

    \frac{\partial f}{\partial b} = \frac{\partial f}{\partial c}\frac{\partial c}{\partial b} = \frac{\partial f}{\partial c}

    \frac{\partial f}{\partial a} = \frac{\partial f}{\partial b}\frac{\partial b}{\partial a} + \frac{\partial f}{\partial c}\frac{\partial c}{\partial a} = \frac{\partial f}{\partial b}\exp(a) + \frac{\partial f}{\partial c}

    \frac{\partial f}{\partial x} = \frac{\partial f}{\partial a}\frac{\partial a}{\partial x} = \frac{\partial f}{\partial a} \cdot 2x.
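The forward and reverse passes translate directly into code. The sketch below hand-codes reverse-mode differentiation for this f and checks the gradient against a finite difference (the evaluation point x = 0.8 is an arbitrary choice):

```python
import math

def f_and_grad(x):
    """f(x) = sqrt(x^2 + exp(x^2)) + cos(x^2 + exp(x^2)) and df/dx by reverse mode."""
    # Forward pass: intermediate variables a, b, c, d, e.
    a = x * x
    b = math.exp(a)
    c = a + b
    d = math.sqrt(c)
    e = math.cos(c)
    f = d + e
    # Reverse pass: accumulate df/d(variable) from the output back to x.
    df_dd = 1.0
    df_de = 1.0
    df_dc = df_dd * (1.0 / (2.0 * math.sqrt(c))) + df_de * (-math.sin(c))
    df_db = df_dc
    df_da = df_db * math.exp(a) + df_dc
    df_dx = df_da * 2.0 * x
    return f, df_dx

x, h = 0.8, 1e-6
_, grad = f_and_grad(x)
numeric = (f_and_grad(x + h)[0] - f_and_grad(x - h)[0]) / (2 * h)
print(grad, numeric)  # reverse-mode gradient matches the finite difference
```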