
Chapter 5.

Vector Calculus

January 3, 2021

1 5.1 Differentiation of Univariate Functions


2 5.2 Partial Differentiation and Gradients
3 5.3 Gradients of Vector-Valued Functions
4 5.4 Gradients of Matrices
5 5.6 Backpropagation and Automatic Differentiation
5.6.1 Gradients in a Deep Network
5.6.2 Automatic Differentiation
5.1 Differentiation of Univariate Functions

Definition
Let D be a domain of R, and let f : D → R be a function.
The Taylor polynomial of degree n of a function f at x0 ∈ D is
defined as
T_n(x) := \sum_{k=0}^{n} \frac{f^{(k)}(x_0)}{k!} (x - x_0)^k .

For a smooth function f ∈ C^∞(D), the Taylor series of f at x_0 is
defined as

T_\infty(x) := \sum_{k=0}^{\infty} \frac{f^{(k)}(x_0)}{k!} (x - x_0)^k .

For x0 = 0, we obtain the Maclaurin series.


f is called analytic at x0 if T∞ (x) = f (x).
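
Illustration (added, not part of the original slides): a minimal SymPy sketch that builds T_n directly from the definition above. The helper taylor_poly and the choice f = cos(x), x_0 = 1 are assumptions for demonstration only.

```python
# Minimal sketch (not from the slides): build the degree-n Taylor polynomial
# of a function at x0 with SymPy and compare it to f near the expansion point.
import sympy as sp

x = sp.symbols("x")

def taylor_poly(f, x, x0, n):
    """Return T_n(x) = sum_{k=0}^n f^(k)(x0)/k! * (x - x0)^k."""
    return sum(sp.diff(f, x, k).subs(x, x0) / sp.factorial(k) * (x - x0)**k
               for k in range(n + 1))

f = sp.cos(x)
T4 = taylor_poly(f, x, sp.Integer(1), 4)              # expansion point x0 = 1
print(sp.simplify(T4))
print(float(f.subs(x, 1.2)), float(T4.subs(x, 1.2)))  # close to each other near x0
```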



5.1 Differentiation of Univariate Functions

Note.

1. The Taylor polynomial of degree n is an approximation of a function. For
   n = 1, we obtain the linear approximation.
2. The Taylor polynomial is similar to f in a neighborhood around x_0.
3. If f is a polynomial of degree k ≤ n, then the Taylor polynomial of degree n
   is an exact representation of f.

Figure: the function f(x) = cos(x) is approximated by Taylor polynomials
around x_0 = 1.



5.1 Differentiation of Univariate Functions

Example
Consider the function f(x) = x^3 + x - 3 and find the Taylor polynomial T_4
at x_0 = 1.
Answer. We have

f'(x) = 3x^2 + 1,  f''(x) = 6x,  f'''(x) = 6,  f^{(4)}(x) = 0.

Evaluating them at x_0 = 1, we get

f(1) = -1,  f'(1) = 4,  f''(1) = 6,  f'''(1) = 6,  f^{(4)}(1) = 0.

Thus, the Taylor polynomial T_4 of f at x_0 = 1 is

T_4(x) = -1 + 4(x - 1) + \frac{6}{2!}(x - 1)^2 + \frac{6}{3!}(x - 1)^3
       = -1 + 4(x - 1) + 3(x - 1)^2 + (x - 1)^3
       = f(x).
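
As a quick check (an added illustration, not from the slides), SymPy's series expansion confirms that T_4 reproduces f exactly:

```python
# Verify the worked example with SymPy (illustrative, not from the slides).
import sympy as sp

x = sp.symbols("x")
f = x**3 + x - 3
T4 = f.series(x, 1, 5).removeO()   # expansion around x0 = 1, terms up to degree 4
print(sp.expand(T4))               # x**3 + x - 3
print(sp.simplify(T4 - f))         # 0, so T4 represents f exactly
```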



5.1 Differentiation of Univariate Functions

Maclaurin series of some basic functions


e^x = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \cdots = \sum_{k=0}^{\infty} \frac{x^k}{k!}

\sin(x) = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \cdots = \sum_{k=0}^{\infty} \frac{(-1)^k x^{2k+1}}{(2k+1)!}

\cos(x) = 1 - \frac{x^2}{2!} + \frac{x^4}{4!} - \cdots = \sum_{k=0}^{\infty} \frac{(-1)^k x^{2k}}{(2k)!}

\frac{1}{1-x} = 1 + x + x^2 + \cdots = \sum_{k=0}^{\infty} x^k \quad (|x| < 1).
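
Added illustration (assumed example values, not from the slides): partial sums of the Maclaurin series of e^x converge to the true value.

```python
# Partial sums of the Maclaurin series of exp(x) approach math.exp(x).
import math

def exp_partial_sum(x: float, n: int) -> float:
    """Sum the first n+1 terms of the Maclaurin series of e^x."""
    return sum(x**k / math.factorial(k) for k in range(n + 1))

x = 1.5
for n in (2, 5, 10):
    print(n, exp_partial_sum(x, n))
print("math.exp:", math.exp(x))   # the partial sums approach this value
```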



5.2 Partial Differentiation and Gradients

Definition
A function f : R^n → R is a rule that assigns to each point x = (x_1, ..., x_n) ∈ R^n
a real value f(x) = f(x_1, ..., x_n).

Definition
For a function f : Rn → R, the partial derivatives of f are defined by

\frac{\partial f}{\partial x_1} := \lim_{h \to 0} \frac{f(x_1 + h, x_2, \ldots, x_n) - f(x_1, x_2, \ldots, x_n)}{h},
\quad \vdots
\frac{\partial f}{\partial x_n} := \lim_{h \to 0} \frac{f(x_1, x_2, \ldots, x_n + h) - f(x_1, x_2, \ldots, x_n)}{h}.

Note. When finding the partial derivative ∂f/∂x_i, we let only x_i vary and keep
the other variables constant.
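
Added sketch (not from the slides): the limit definition suggests a simple forward-difference approximation of ∂f/∂x_i. The helper and the example function below are assumptions for illustration.

```python
# Approximate partial derivatives via the limit definition with a small step h.
import numpy as np

def partial_derivative(f, x, i, h=1e-6):
    """Forward-difference estimate of df/dx_i at the point x."""
    x = np.asarray(x, dtype=float)
    e = np.zeros_like(x)
    e[i] = h
    return (f(x + e) - f(x)) / h

f = lambda x: x[0]**2 * np.sin(x[1])            # example function of two variables
print(partial_derivative(f, [1.0, 2.0], 0))     # ~ 2*1*sin(2)
print(partial_derivative(f, [1.0, 2.0], 1))     # ~ 1*cos(2)
```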
5.2 Partial Differentiation and Gradients

Example
Find the partial derivatives of the function f(x, y) = \frac{1}{1 + x^2 + 3y^4}.
Answer.

\frac{\partial f(x, y)}{\partial x} = -\frac{1}{(1 + x^2 + 3y^4)^2} \frac{\partial}{\partial x}(1 + x^2 + 3y^4) = -\frac{2x}{(1 + x^2 + 3y^4)^2},

\frac{\partial f(x, y)}{\partial y} = -\frac{1}{(1 + x^2 + 3y^4)^2} \frac{\partial}{\partial y}(1 + x^2 + 3y^4) = -\frac{12y^3}{(1 + x^2 + 3y^4)^2}.



5.2 Partial Differentiation and Gradients

Definition
For a function f : R^n → R of n variables x_1, ..., x_n, the gradient of f is
defined as

\nabla_x f = \operatorname{grad} f = \frac{df}{dx} = \begin{bmatrix} \frac{\partial f}{\partial x_1} & \cdots & \frac{\partial f}{\partial x_n} \end{bmatrix} \in \mathbb{R}^{1 \times n}.

Example
Find the gradient of the function f(x_1, x_2, x_3) = x_1^2 x_2^3 - 4x_2^2 x_3.
Answer.

\nabla_x f = \begin{bmatrix} \frac{\partial f}{\partial x_1} & \frac{\partial f}{\partial x_2} & \frac{\partial f}{\partial x_3} \end{bmatrix} = \begin{bmatrix} 2x_1 x_2^3 & 3x_1^2 x_2^2 - 8x_2 x_3 & -4x_2^2 \end{bmatrix} \in \mathbb{R}^{1 \times 3}.
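
A numerical spot-check of this gradient (added illustration, not from the slides; the test point is arbitrary):

```python
# Compare the analytic gradient of f(x1, x2, x3) = x1^2 x2^3 - 4 x2^2 x3
# with central finite differences at one point.
import numpy as np

f = lambda x: x[0]**2 * x[1]**3 - 4 * x[1]**2 * x[2]
grad = lambda x: np.array([2 * x[0] * x[1]**3,
                           3 * x[0]**2 * x[1]**2 - 8 * x[1] * x[2],
                           -4 * x[1]**2])

x, h = np.array([1.0, 2.0, -1.0]), 1e-6
fd = np.array([(f(x + h * e) - f(x - h * e)) / (2 * h) for e in np.eye(3)])
print(np.allclose(fd, grad(x), atol=1e-4))   # True
```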


 



5.2 Partial Differentiation and Gradients

Basic rules of Partial Differentiation

For functions f, g : R^n → R of the variable x ∈ R^n:

Sum rule: ∇x [f + g ] = ∇x f + ∇x g .
Product rule: ∇x [fg ] = g (x)∇x f + f (x)∇x g .

Consider a function f : R^n → R of n variables x = (x_1, ..., x_n), where the
x_i(t) are themselves functions of t. Then we have the

Chain rule:  f'(t) = \frac{df}{dt} = \frac{\partial f}{\partial x_1}\frac{dx_1}{dt} + \cdots + \frac{\partial f}{\partial x_n}\frac{dx_n}{dt} = \sum_{i=1}^{n} \frac{\partial f}{\partial x_i}\frac{dx_i}{dt}.



5.2 Partial Differentiation and Gradients

Example
Consider f (x1 , x2 ) = x12 + x1 x2 , where x1 = sin t, x2 = cos t. Find the
derivative of f with respect to t.
Answer. We have

\frac{\partial f}{\partial x_1} = 2x_1 + x_2, \quad \frac{\partial f}{\partial x_2} = x_1, \quad \frac{dx_1}{dt} = \cos t, \quad \frac{dx_2}{dt} = -\sin t.

Hence

\frac{df}{dt} = (2x_1 + x_2)\cos t + x_1(-\sin t)
             = (2\sin t + \cos t)\cos t - \sin t \sin t
             = 2\sin t \cos t + \cos^2 t - \sin^2 t
             = \sin(2t) + \cos(2t).
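
Added numerical check (illustrative, not from the slides) that df/dt = sin(2t) + cos(2t):

```python
# Central-difference estimate of df/dt versus the analytic result.
import numpy as np

def f(t):
    x1, x2 = np.sin(t), np.cos(t)
    return x1**2 + x1 * x2

t, h = 0.7, 1e-6
numeric = (f(t + h) - f(t - h)) / (2 * h)
analytic = np.sin(2 * t) + np.cos(2 * t)
print(numeric, analytic)   # the two values agree closely
```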



5.3 Gradients of Vector-Valued Functions

Definition
A vector-valued function of n variables x = (x_1, ..., x_n), f : R^n → R^m, is
given as

f(x) = \begin{bmatrix} f_1(x) \\ \vdots \\ f_m(x) \end{bmatrix} = \begin{bmatrix} f_1(x_1, \ldots, x_n) \\ \vdots \\ f_m(x_1, \ldots, x_n) \end{bmatrix},

where each f_i : R^n → R is a function of x.

Definition
The partial derivative of a vector-valued function f : R^n → R^m with respect
to x_i, i = 1, ..., n, is given as the vector

\frac{\partial f}{\partial x_i} = \begin{bmatrix} \frac{\partial f_1}{\partial x_i} \\ \vdots \\ \frac{\partial f_m}{\partial x_i} \end{bmatrix} \in \mathbb{R}^m.



5.3 Gradients of Vector-Valued Functions

Example
Consider a linear transformation (operator) f : R^n → R^m,

f(x) = Ax,  where A = [a_{ij}] ∈ R^{m×n}.

It is easy to see that the partial derivatives of f are

\frac{\partial f}{\partial x_j} = A_j = \begin{bmatrix} a_{1j} \\ \vdots \\ a_{mj} \end{bmatrix} \in \mathbb{R}^m, \quad j = 1, \ldots, n.
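
Added illustration (assumed random A and x, not from the slides): for f(x) = Ax, a finite difference in the x_j direction recovers the j-th column of A.

```python
# For a linear map f(x) = A x, df/dx_j equals the j-th column of A.
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(4, 3))
x = rng.normal(size=3)

j, h = 1, 1e-6
e = np.zeros(3); e[j] = h
fd = (A @ (x + e) - A @ x) / h        # finite-difference estimate of df/dx_j
print(np.allclose(fd, A[:, j]))       # True: it matches column j of A
```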



5.3 Gradients of Vector-Valued Functions

Example
Consider a vector-valued function

f(x, y) = \begin{bmatrix} xy^2 & y^3 & x^2 - y^2 \end{bmatrix}^T.

Then

\frac{\partial f}{\partial x} = \begin{bmatrix} y^2 \\ 0 \\ 2x \end{bmatrix}, \qquad \frac{\partial f}{\partial y} = \begin{bmatrix} 2xy \\ 3y^2 \\ -2y \end{bmatrix}.



5.3 Gradients of Vector-Valued Functions

Definition
The collection of all first-order partial derivatives of a vector-valued
function f : R^n → R^m is called the Jacobian. The Jacobian J is an m × n
matrix, which we define and arrange as follows:

J = \nabla_x f = \frac{df}{dx} = \begin{bmatrix} \frac{\partial f}{\partial x_1} & \cdots & \frac{\partial f}{\partial x_n} \end{bmatrix} = \begin{bmatrix} \frac{\partial f_1}{\partial x_1} & \cdots & \frac{\partial f_1}{\partial x_n} \\ \vdots & & \vdots \\ \frac{\partial f_m}{\partial x_1} & \cdots & \frac{\partial f_m}{\partial x_n} \end{bmatrix}.



5.3 Gradients of Vector-Valued Functions

Example
If f : R^n → R^m is a linear operator given by

f(x) = Ax,

where A is a matrix in R^{m×n}, then it is clear that the Jacobian of f is

\nabla_x f = A.

Example
Consider the vector-valued function f : R^2 → R^3,

f(x_1, x_2) = \begin{bmatrix} e^{-x_1 + x_2} \\ x_1 x_2^2 \\ \sin(x_1) \end{bmatrix}.

The Jacobian of f is

J = \nabla_x f = \begin{bmatrix} -e^{-x_1 + x_2} & e^{-x_1 + x_2} \\ x_2^2 & 2x_1 x_2 \\ \cos x_1 & 0 \end{bmatrix}.
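
Added sketch (not from the slides): a finite-difference Jacobian of this example agrees with the analytic J. The helper jacobian_fd and the test point are assumptions for illustration.

```python
# Finite-difference Jacobian of f(x1, x2) = (exp(-x1+x2), x1*x2**2, sin(x1)),
# compared against the analytic Jacobian above.
import numpy as np

def f(x):
    x1, x2 = x
    return np.array([np.exp(-x1 + x2), x1 * x2**2, np.sin(x1)])

def jacobian_fd(f, x, h=1e-6):
    """Approximate the m x n Jacobian column by column."""
    fx = f(x)
    J = np.zeros((fx.size, x.size))
    for j in range(x.size):
        e = np.zeros_like(x); e[j] = h
        J[:, j] = (f(x + e) - fx) / h
    return J

x = np.array([0.5, -1.0])
x1, x2 = x
J_analytic = np.array([[-np.exp(-x1 + x2), np.exp(-x1 + x2)],
                       [x2**2,             2 * x1 * x2     ],
                       [np.cos(x1),        0.0             ]])
print(np.allclose(jacobian_fd(f, x), J_analytic, atol=1e-4))   # True
```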



5.3 Gradients of Vector-Valued Functions

Chain rule

Consider a vector-valued function f : R^n → R^m given by

f(x) = f(x_1, \ldots, x_n) = \begin{bmatrix} f_1(x) & \cdots & f_m(x) \end{bmatrix}^T,

where the x_i = x_i(t_1, \ldots, t_l) are themselves functions of l variables
t = (t_1, ..., t_l). That is, x : R^l → R^n with

x(t) = x(t_1, \ldots, t_l) = \begin{bmatrix} x_1(t) & \cdots & x_n(t) \end{bmatrix}^T.

Then f ∘ x : R^l → R^m is given by (f ∘ x)(t) = f(x(t)), and

\frac{\partial f_j}{\partial t_k} = \sum_{i=1}^{n} \frac{\partial f_j}{\partial x_i}\frac{\partial x_i}{\partial t_k}, \quad \text{for all } j = 1, \ldots, m \text{ and } k = 1, \ldots, l,

or, in matrix form, \nabla_t f = \nabla_x f \, \nabla_t x.


5.3 Gradients of Vector-Valued Functions

Example
Consider the function f(x_1, x_2) = x_1^2 + 2x_1 x_2 with x_1(s, t) = s \cos t and
x_2(s, t) = s \sin t. Find the partial derivatives of f with respect to s and t.
Answer.

\frac{\partial f}{\partial s} = \frac{\partial f}{\partial x_1}\frac{\partial x_1}{\partial s} + \frac{\partial f}{\partial x_2}\frac{\partial x_2}{\partial s}
= (2x_1 + 2x_2)\cos t + 2x_1 \sin t
= (2s\cos t + 2s\sin t)\cos t + 2s\cos t \sin t
= 2s\cos^2 t + 2s\sin(2t),

\frac{\partial f}{\partial t} = \frac{\partial f}{\partial x_1}\frac{\partial x_1}{\partial t} + \frac{\partial f}{\partial x_2}\frac{\partial x_2}{\partial t}
= (2x_1 + 2x_2)(-s\sin t) + 2x_1(s\cos t)
= -(2s\cos t + 2s\sin t)s\sin t + 2s^2\cos^2 t
= 2s^2\cos(2t) - s^2\sin(2t).
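
Added numerical sanity check (illustrative, not from the slides) of the partial derivatives above:

```python
# Compare finite-difference estimates of dF/ds and dF/dt with the closed forms.
import numpy as np

def F(s, t):
    x1, x2 = s * np.cos(t), s * np.sin(t)
    return x1**2 + 2 * x1 * x2

s, t, h = 1.3, 0.4, 1e-6
dF_ds = (F(s + h, t) - F(s - h, t)) / (2 * h)
dF_dt = (F(s, t + h) - F(s, t - h)) / (2 * h)
print(dF_ds, 2 * s * np.cos(t)**2 + 2 * s * np.sin(2 * t))        # match
print(dF_dt, 2 * s**2 * np.cos(2 * t) - s**2 * np.sin(2 * t))     # match
```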



5.4 Gradients of Matrices

Let A ∈ R^{4×2} and let x ∈ R^3. How do we define dA/dx? There are two
equivalent approaches:

1. Compute the partial derivatives ∂A/∂x_1, ∂A/∂x_2, ∂A/∂x_3, each of which is
   a 4 × 2 matrix, and collate them into a 4 × 2 × 3 tensor.
2. Re-shape A into a vector Ã ∈ R^8, then compute the gradient dÃ/dx ∈ R^{8×3},
   and re-shape this gradient to obtain the gradient tensor.
5.4 Gradients of Matrices

The gradient of an m × n matrix A with respect to a p × q matrix B is a
four-dimensional tensor J, whose entries are given as

J_{ijkl} = \frac{\partial A_{ij}}{\partial B_{kl}}.

We can also identify the space R^{m×n} of m × n matrices with the space R^{mn}.
Hence, we can re-shape A and B into vectors of lengths mn and pq, respectively.
The resulting Jacobian then has size mn × pq.
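
Added sketch (assumed example, not from the slides): computing such a Jacobian via the flattening identification, here for the matrix-valued map F(B) = AB with an arbitrary fixed A.

```python
# Jacobian of F: R^{p x q} -> R^{m x n} obtained by flattening output and input,
# so the result is an (m*n) x (p*q) matrix.
import numpy as np

def matrix_jacobian_fd(F, B, h=1e-6):
    """Finite-difference Jacobian of a matrix-valued function, flattened."""
    FB = F(B)
    J = np.zeros((FB.size, B.size))
    for k in range(B.size):
        dB = np.zeros(B.size); dB[k] = h
        J[:, k] = ((F(B + dB.reshape(B.shape)) - FB) / h).ravel()
    return J

A = np.arange(6.0).reshape(2, 3)           # fixed 2 x 3 matrix (illustrative)
F = lambda B: A @ B                        # F(B) = A B, maps R^{3 x 2} -> R^{2 x 2}
B = np.ones((3, 2))
print(matrix_jacobian_fd(F, B).shape)      # (4, 6) = (m*n, p*q)
```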



5.4 Gradients of Matrices

Example
Let f : R^{m×n} → R^{n×n} be given by f(A) = A^T A =: K ∈ R^{n×n}. Find the
gradient dK/dA.
Answer. The gradient dK/dA ∈ R^{(n×n)×(m×n)} is a tensor, and each entry K_{pq}
satisfies

\frac{dK_{pq}}{dA} \in \mathbb{R}^{1 \times m \times n}.

Denote by A_i the i-th column of A and by K_{pq} the (p, q)-entry of K,
p, q = 1, ..., n. Since

K_{pq} = A_p^T A_q = \sum_{k=1}^{m} A_{kp} A_{kq},

we obtain

\frac{\partial K_{pq}}{\partial A_{ij}} = \sum_{k=1}^{m} \frac{\partial}{\partial A_{ij}} (A_{kp} A_{kq}) =
\begin{cases}
A_{iq} & \text{if } j = p,\ p \neq q, \\
A_{ip} & \text{if } j = q,\ p \neq q, \\
2A_{iq} & \text{if } j = p = q, \\
0 & \text{otherwise.}
\end{cases}
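
Added numerical check (illustrative, not from the slides) of the case analysis above, for one choice of indices:

```python
# Verify d(A^T A)_pq / dA_ij against a finite-difference estimate.
import numpy as np

def dK_dA_entry(A, p, q, i, j):
    """Analytic partial derivative of K_pq = (A^T A)_pq with respect to A_ij."""
    if j == p and j == q:
        return 2 * A[i, q]
    if j == p:
        return A[i, q]
    if j == q:
        return A[i, p]
    return 0.0

rng = np.random.default_rng(1)
A = rng.normal(size=(4, 3))
p, q, i, j, h = 0, 2, 1, 0, 1e-6

E = np.zeros_like(A); E[i, j] = h
fd = ((A + E).T @ (A + E) - A.T @ A)[p, q] / h     # finite-difference estimate
print(fd, dK_dA_entry(A, p, q, i, j))              # the two values agree
```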
5.6 Backpropagation and Automatic Differentiation 5.6.1 Gradients in a Deep Network

In deep learning, the function value y is computed as a many-level function
composition

y = (f_K ∘ f_{K-1} ∘ \cdots ∘ f_1)(x) = f_K(f_{K-1}(\cdots (f_1(x)) \cdots)),

where x are the inputs (e.g., images), y are the observations (e.g., class
labels), and every function f_i, i = 1, ..., K, possesses its own parameters.



5.6 Backpropagation and Automatic Differentiation 5.6.1 Gradients in a Deep Network

Given a neural network with multiple layers, the i-th layer computes

f_i(x_{i-1}) = σ(A_{i-1} x_{i-1} + b_{i-1}),

where x_{i-1} is the output of layer i - 1 and σ is an activation function
(sigmoid, ReLU, tanh, ...).
Training such a model requires us to compute the gradient of a loss function L
w.r.t. all model parameters θ_j = {A_j, b_j}, j = 0, ..., K - 1.



5.6 Backpropagation and Automatic Differentiation 5.6.1 Gradients in a Deep Network

Suppose we have inputs x and observations y, and a network structure

f_0 := x,
f_i := σ_i(A_{i-1} f_{i-1} + b_{i-1}), i = 1, ..., K.

We need to find θ = {θ_j} = {A_j, b_j}, j = 0, ..., K - 1, that minimizes the
loss function

L(θ) = ‖y - f_K(θ, x)‖^2.


5.6 Backpropagation and Automatic Differentiation 5.6.1 Gradients in a Deep Network

The gradients of L w.r.t. θ:

\frac{\partial L}{\partial \theta_{K-1}} = \frac{\partial L}{\partial f_K} \frac{\partial f_K}{\partial \theta_{K-1}},
\frac{\partial L}{\partial \theta_{K-2}} = \frac{\partial L}{\partial f_K} \frac{\partial f_K}{\partial f_{K-1}} \frac{\partial f_{K-1}}{\partial \theta_{K-2}}, \quad \cdots
\frac{\partial L}{\partial \theta_i} = \frac{\partial L}{\partial f_K} \frac{\partial f_K}{\partial f_{K-1}} \cdots \frac{\partial f_{i+2}}{\partial f_{i+1}} \frac{\partial f_{i+1}}{\partial \theta_i}.

Most of the computation of ∂L/∂θ_{i+1} can be reused to compute ∂L/∂θ_i.

Gradients are passed backward through the network.
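
Added sketch (an assumed two-layer instance of the structure above, with a tanh hidden layer, a linear output layer, and the squared-error loss; not the slides' exact model): the backward pass reuses the upstream gradient at every step.

```python
# Manual backpropagation for a tiny two-layer network, with a finite-difference check.
import numpy as np

rng = np.random.default_rng(0)
x, y = rng.normal(size=3), rng.normal(size=2)            # input and observation
A0, b0 = rng.normal(size=(4, 3)), rng.normal(size=4)     # theta_0
A1, b1 = rng.normal(size=(2, 4)), rng.normal(size=2)     # theta_1

# Forward pass, storing intermediate layer outputs.
f1 = np.tanh(A0 @ x + b0)
f2 = A1 @ f1 + b1
L = np.sum((y - f2)**2)

# Backward pass: gradients flow from the loss back through the layers,
# and dL/df1 is reused when computing the gradients for theta_0.
dL_df2 = 2 * (f2 - y)
dL_dA1 = np.outer(dL_df2, f1)
dL_db1 = dL_df2
dL_df1 = A1.T @ dL_df2
dL_dz0 = (1 - f1**2) * dL_df1          # tanh'(z) = 1 - tanh(z)^2
dL_dA0 = np.outer(dL_dz0, x)
dL_db0 = dL_dz0

# Finite-difference check of one entry of dL/db0.
h = 1e-6
b0h = b0.copy(); b0h[0] += h
Lh = np.sum((y - (A1 @ np.tanh(A0 @ x + b0h) + b1))**2)
print((Lh - L) / h, dL_db0[0])         # the two values agree closely
```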



5.6 Backpropagation and Automatic Differentiation 5.6.2 Automatic Differentiation

Backpropagation is a special case of a general technique in numerical analysis
called automatic differentiation.
Automatic differentiation refers to a set of techniques to numerically evaluate
the exact gradient of a function by working with intermediate variables and
applying the chain rule.



5.6 Backpropagation and Automatic Differentiation 5.6.2 Automatic Differentiation

Given a simple graph representing the data flow from inputs x to outputs y:

x → a → b → y.

To compute the derivative dy/dx, the chain rule gives dy/dx = (dy/db)(db/da)(da/dx),
which can be evaluated in two orders:

\frac{dy}{dx} = \left(\frac{dy}{db}\frac{db}{da}\right)\frac{da}{dx} \quad \text{(reverse mode)}, \qquad
\frac{dy}{dx} = \frac{dy}{db}\left(\frac{db}{da}\frac{da}{dx}\right) \quad \text{(forward mode)}.

Reverse mode: gradients are propagated backward through the graph, i.e., in
reverse of the data flow.
Forward mode: the gradients flow with the data, from left to right through the
graph.
Definition
Reverse mode automatic differentiation is called backpropagation.



5.6 Backpropagation and Automatic Differentiation 5.6.2 Automatic Differentiation

Example
Consider the function

f(x) = \sqrt{x^2 + \exp(x^2)} + \cos\left(x^2 + \exp(x^2)\right).

Using intermediate variables:

a = x^2, \quad b = \exp(a), \quad c = a + b, \quad d = \sqrt{c}, \quad e = \cos(c), \quad f = d + e.

We get

\frac{\partial a}{\partial x} = 2x, \quad \frac{\partial b}{\partial a} = \exp(a), \quad \frac{\partial c}{\partial a} = \frac{\partial c}{\partial b} = 1, \quad \frac{\partial d}{\partial c} = \frac{1}{2\sqrt{c}}, \quad \frac{\partial e}{\partial c} = -\sin c, \quad \frac{\partial f}{\partial d} = \frac{\partial f}{\partial e} = 1.



5.6 Backpropagation and Automatic Differentiation 5.6.2 Automatic Differentiation

We can compute ∂f/∂x using the backpropagation method:

\frac{\partial f}{\partial c} = \frac{\partial f}{\partial d}\frac{\partial d}{\partial c} + \frac{\partial f}{\partial e}\frac{\partial e}{\partial c} = 1 \cdot \frac{1}{2\sqrt{c}} + 1 \cdot (-\sin c),
\frac{\partial f}{\partial b} = \frac{\partial f}{\partial c}\frac{\partial c}{\partial b} = \frac{\partial f}{\partial c},
\frac{\partial f}{\partial a} = \frac{\partial f}{\partial b}\frac{\partial b}{\partial a} + \frac{\partial f}{\partial c}\frac{\partial c}{\partial a} = \frac{\partial f}{\partial b}\exp(a) + \frac{\partial f}{\partial c},
\frac{\partial f}{\partial x} = \frac{\partial f}{\partial a}\frac{\partial a}{\partial x} = \frac{\partial f}{\partial a} \cdot 2x.
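
Added sketch (illustrative, not from the slides): the same reverse-mode computation in NumPy, checked against a finite-difference estimate of df/dx.

```python
# Forward pass through the intermediate variables, then the backward pass above.
import numpy as np

def f_and_grad(x):
    # Forward pass.
    a = x**2
    b = np.exp(a)
    c = a + b
    d = np.sqrt(c)
    e = np.cos(c)
    f = d + e
    # Backward pass, in the order derived above.
    df_dc = 1.0 / (2 * np.sqrt(c)) - np.sin(c)
    df_db = df_dc
    df_da = df_db * np.exp(a) + df_dc
    df_dx = df_da * 2 * x
    return f, df_dx

x, h = 0.8, 1e-6
f0, g = f_and_grad(x)
fd = (f_and_grad(x + h)[0] - f0) / h
print(g, fd)     # the analytic reverse-mode gradient matches the finite difference
```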

