
Mathematics for Machine Learning

Multivariate Calculus
Formula sheet
Dr Samuel J. Cooper
Prof. David Dye
Dr A. Freddie Page

Definition of a derivative

    f'(x) = \frac{df(x)}{dx} = \lim_{\Delta x \to 0} \frac{f(x + \Delta x) - f(x)}{\Delta x}
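For intuition, the difference quotient can be evaluated numerically; a minimal Python sketch, assuming NumPy, with f = sin and the point x = 1 as illustrative choices:

    import numpy as np

    def difference_quotient(f, x, dx=1e-6):
        # (f(x + dx) - f(x)) / dx, from the limit definition above
        return (f(x + dx) - f(x)) / dx

    # d/dx sin(x) = cos(x); both prints give ~0.540302
    print(difference_quotient(np.sin, 1.0))
    print(np.cos(1.0))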

Derivatives of named functions

    \frac{\partial}{\partial x} \frac{1}{x} = -\frac{1}{x^2}    (1)

    \frac{\partial}{\partial x} \sin x = \cos x    (2)

    \frac{\partial}{\partial x} \cos x = -\sin x    (3)

    \frac{\partial}{\partial x} \exp x = \exp x    (4)
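These four results can be reproduced symbolically; a minimal sketch, assuming SymPy is available:

    import sympy as sp

    x = sp.symbols('x')
    print(sp.diff(1/x, x))        # -1/x**2   (1)
    print(sp.diff(sp.sin(x), x))  # cos(x)    (2)
    print(sp.diff(sp.cos(x), x))  # -sin(x)   (3)
    print(sp.diff(sp.exp(x), x))  # exp(x)    (4)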
Time saving rules

- Sum Rule:

    \frac{d}{dx}(f(x) + g(x)) = \frac{d}{dx}f(x) + \frac{d}{dx}g(x)

- Power Rule:

    f(x) = ax^b \quad \Rightarrow \quad f'(x) = abx^{b-1}

- Product Rule:

    A(x) = f(x)g(x) \quad \Rightarrow \quad A'(x) = f'(x)g(x) + f(x)g'(x)

- Chain Rule:

    If h = h(p) and p = p(m), then

    \frac{dh}{dm} = \frac{dh}{dp} \times \frac{dp}{dm}

- Total derivative:

    For the function f(x, y, z, ...), where each variable is a function of a parameter t, the total derivative is

    \frac{df}{dt} = \frac{\partial f}{\partial x}\frac{dx}{dt} + \frac{\partial f}{\partial y}\frac{dy}{dt} + \frac{\partial f}{\partial z}\frac{dz}{dt} + ...
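These rules can be verified on concrete functions; a minimal SymPy sketch, where the particular f, g, the path (cos t, sin t), and f(X, Y) = X^2 Y are all illustrative assumptions:

    import sympy as sp

    x, m, t, X, Y = sp.symbols('x m t X Y')

    # Product Rule: d/dx (f g) = f' g + f g'
    f, g = sp.sin(x), sp.exp(x)
    assert sp.simplify(sp.diff(f * g, x) - (sp.diff(f, x) * g + f * sp.diff(g, x))) == 0

    # Chain Rule: h(p) = sin(p), p(m) = m**2, so dh/dm = cos(m**2) * 2m
    assert sp.simplify(sp.diff(sp.sin(m**2), m) - sp.cos(m**2) * 2 * m) == 0

    # Total derivative: f(X, Y) = X**2 * Y along X = cos(t), Y = sin(t)
    fXY = X**2 * Y
    via_partials = (sp.diff(fXY, X) * sp.diff(sp.cos(t), t)
                    + sp.diff(fXY, Y) * sp.diff(sp.sin(t), t)).subs({X: sp.cos(t), Y: sp.sin(t)})
    direct = sp.diff(fXY.subs({X: sp.cos(t), Y: sp.sin(t)}), t)
    assert sp.simplify(via_partials - direct) == 0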

Derivative structures

For f = f(x, y, z):

- Jacobian:

    J_f = \left( \frac{\partial f}{\partial x}, \frac{\partial f}{\partial y}, \frac{\partial f}{\partial z} \right)

- Hessian:

    H_f = \begin{pmatrix}
        \frac{\partial^2 f}{\partial x^2} & \frac{\partial^2 f}{\partial x \partial y} & \frac{\partial^2 f}{\partial x \partial z} \\
        \frac{\partial^2 f}{\partial y \partial x} & \frac{\partial^2 f}{\partial y^2} & \frac{\partial^2 f}{\partial y \partial z} \\
        \frac{\partial^2 f}{\partial z \partial x} & \frac{\partial^2 f}{\partial z \partial y} & \frac{\partial^2 f}{\partial z^2}
    \end{pmatrix}
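Both structures are mechanical to compute; a minimal SymPy sketch, with f = x^2 y + sin z as a hypothetical example function:

    import sympy as sp

    x, y, z = sp.symbols('x y z')
    f = x**2 * y + sp.sin(z)           # hypothetical example function
    xyz = sp.Matrix([x, y, z])

    Jf = sp.Matrix([f]).jacobian(xyz)  # row of first partials
    Hf = sp.hessian(f, xyz)            # 3x3 symmetric matrix of second partials
    print(Jf)  # Matrix([[2*x*y, x**2, cos(z)]])
    print(Hf)  # Matrix([[2*y, 2*x, 0], [2*x, 0, 0], [0, 0, -sin(z)]])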

Taylor Series

- Univariate:

    f(x) = f(c) + f'(c)(x - c) + \frac{1}{2}f''(c)(x - c)^2 + ...
         = \sum_{n=0}^{\infty} \frac{f^{(n)}(c)}{n!}(x - c)^n

- Multivariate:

    f(\mathbf{x}) = f(\mathbf{c}) + J_f(\mathbf{c})(\mathbf{x} - \mathbf{c}) + \frac{1}{2}(\mathbf{x} - \mathbf{c})^T H_f(\mathbf{c})(\mathbf{x} - \mathbf{c}) + ...
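A minimal numerical sketch of the univariate series, taking f = exp about c = 0 (so every f^(n)(0) = 1) as an illustrative choice:

    from math import exp, factorial

    def taylor_exp(x, n_terms):
        # sum_{n=0}^{n_terms-1} f^(n)(0)/n! * x**n, with f = exp and f^(n)(0) = 1
        return sum(x**n / factorial(n) for n in range(n_terms))

    print(taylor_exp(1.0, 6))  # ~2.71667 (truncated after the x**5 term)
    print(exp(1.0))            # 2.71828...; the error shrinks as n_terms grows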

Optimization and Vector Calculus

- Newton-Raphson:

    x_{i+1} = x_i - \frac{f(x_i)}{f'(x_i)}
- Grad:

    \nabla f = \begin{pmatrix} \frac{\partial f}{\partial x} \\ \frac{\partial f}{\partial y} \\ \frac{\partial f}{\partial z} \end{pmatrix}

- Directional Gradient:

    \nabla f \cdot \hat{r}

- Gradient Descent:

    s_{n+1} = s_n - \gamma \nabla f

- Lagrange Multipliers \lambda:

    \nabla f = \lambda \nabla g

- Least Squares, \chi^2 minimization:

    \chi^2 = \sum_{i}^{n} \frac{(y_i - y(x_i; a_k))^2}{\sigma_i}

    criterion: \nabla \chi^2 = 0

    a_{next} = a_{cur} - \gamma \nabla \chi^2
             = a_{cur} + \gamma \sum_{i} \frac{(y_i - y(x_i; a_k))}{\sigma_i} \frac{\partial y}{\partial a_k}
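A minimal Python sketch of this descent for a straight-line model y(x; a) = a_0 + a_1 x; the synthetic data, the choice sigma_i = 1, and the step size gamma are illustrative assumptions:

    import numpy as np

    rng = np.random.default_rng(0)
    xs = np.linspace(0.0, 1.0, 20)
    ys = 2.0 + 3.0 * xs + 0.05 * rng.standard_normal(xs.size)  # noisy line, true a = [2, 3]

    a = np.zeros(2)      # parameters [a0, a1]
    gamma = 0.05         # step size; must be small enough for the descent to converge
    for _ in range(2000):
        resid = ys - (a[0] + a[1] * xs)            # y_i - y(x_i; a)
        grad_y = np.stack([np.ones_like(xs), xs])  # dy/da_k, one row per parameter
        a = a + gamma * grad_y @ resid             # a_next = a_cur + gamma * sum_i resid_i * dy/da_k
    print(a)  # close to [2, 3], up to the noise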
