Automatic Differentiation: Simons Summer School Talk, August 21st
Nick McGreivy
PhD Candidate
Program in Plasma Physics
Princeton University
$$x^i(\theta) = \sum_{m=0}^{N_F-1} X^i_{cm}\cos(m\theta) + X^i_{sm}\sin(m\theta)$$

$$y^i(\theta) = \sum_{m=0}^{N_F-1} Y^i_{cm}\cos(m\theta) + Y^i_{sm}\sin(m\theta)$$

$$z^i(\theta) = \sum_{m=0}^{N_F-1} Z^i_{cm}\cos(m\theta) + Z^i_{sm}\sin(m\theta)$$
$$p = \left\{ X^i_{cm},\, X^i_{sm},\, Y^i_{cm},\, Y^i_{sm},\, Z^i_{cm},\, Z^i_{sm} \right\}_{i=1,\,m=0}^{i=N_C,\,m=N_F-1}$$
$$f(p) = \int_S \left(\mathbf{B}(p)\cdot\hat{n}\right)^2 dA + \lambda L(p), \qquad \mathbf{B} = \mathbf{B}_{\text{coils}} + \mathbf{B}_{\text{plasma}}$$

Gradient descent update:

$$p' = p - \eta \nabla f$$
Primitive operations
The functions A, B, C, and D should be understood as "primitive operations". These could be simple primitives like sin, exp, div, etc., but they could also be complicated primitives like fft, odeint, solve, or even custom primitives like nicksfunc.
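As a sketch of what "primitive operation" means in a reverse-mode AD tool, here is a minimal, hypothetical registry of primitives paired with their derivative (VJP) rules. The names vjp_rules and value_and_grad_chain are illustrative only, not from any real AD library:

```python
import math

# Minimal sketch: each primitive stores a forward rule and a VJP rule.
# The registry structure is illustrative, not from a real AD library.
vjp_rules = {
    "sin": (math.sin, lambda x, g: g * math.cos(x)),
    "exp": (math.exp, lambda x, g: g * math.exp(x)),
    "square": (lambda x: x * x, lambda x, g: g * 2.0 * x),
}

def value_and_grad_chain(primitives, x):
    """Evaluate a chain of unary primitives and its derivative:
    a forward pass that stores each input, then a reverse pass
    that applies the stored VJP rules in reverse order."""
    inputs = []
    for name in primitives:
        fwd, _ = vjp_rules[name]
        inputs.append(x)
        x = fwd(x)
    g = 1.0
    for name, xin in zip(reversed(primitives), reversed(inputs)):
        _, vjp = vjp_rules[name]
        g = vjp(xin, g)
    return x, g

# d/dx exp(sin(x^2)) = exp(sin(x^2)) * cos(x^2) * 2x
val, grad = value_and_grad_chain(["square", "sin", "exp"], 1.5)
```

An AD tool only needs to know the derivative rule of each primitive; the chain rule handles their composition automatically.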
Nick McGreivy (PPPL) Stellarator Design with AD Simons Summer School 11 / 21
Order matters!
If F : Rn → Rm takes time O(T ) to compute, then multiplying from right to left takes
time O(nT ) and multiplying from left to right takes time O(mT ).
A Jacobian-vector product (one forward-mode pass) extracts a column of the Jacobian:

$$F'(x)\,v = \begin{bmatrix} \dfrac{\partial y}{\partial x_1} & \cdots & \dfrac{\partial y}{\partial x_n} \end{bmatrix} \begin{bmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{bmatrix} = \dfrac{\partial y}{\partial x_1}$$

A vector-Jacobian product (one reverse-mode pass) extracts a row:

$$v^T F'(x) = \begin{bmatrix} 1 & 0 & \cdots & 0 \end{bmatrix} \begin{bmatrix} \dfrac{\partial y}{\partial x_1} & \cdots & \dfrac{\partial y}{\partial x_n} \end{bmatrix} = \begin{bmatrix} \dfrac{\partial y_1}{\partial x_1} & \cdots & \dfrac{\partial y_1}{\partial x_n} \end{bmatrix}$$
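As a concrete pure-Python sketch (my own illustration, not from the talk): forward mode with dual numbers computes one Jacobian column per pass, so a full Jacobian of F : R^n → R^m costs n passes, matching the O(nT) count above.

```python
import math

class Dual:
    """Dual number a + b*eps with eps^2 = 0: carries a value and one
    directional derivative, giving one JVP per forward pass."""
    def __init__(self, val, dot=0.0):
        self.val, self.dot = val, dot
    def __add__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        return Dual(self.val + o.val, self.dot + o.dot)
    __radd__ = __add__
    def __mul__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        return Dual(self.val * o.val, self.val * o.dot + self.dot * o.val)
    __rmul__ = __mul__

def dsin(d):
    return Dual(math.sin(d.val), math.cos(d.val) * d.dot)

def F(x1, x2):
    # Example map R^2 -> R^2 (made up for illustration).
    return [dsin(x1 * x2), x1 * x1 + x2]

def jacobian_column(j, x):
    """Column j of the Jacobian: seed v = e_j, run F once."""
    seeds = [Dual(xi, 1.0 if i == j else 0.0) for i, xi in enumerate(x)]
    return [yi.dot for yi in F(*seeds)]

col0 = jacobian_column(0, [2.0, 3.0])  # [x2*cos(x1*x2), 2*x1] at (2, 3)
```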
1. Simplicity
Finding and programming analytic derivatives is time-consuming and
error-prone.
3. Effortless gradients
Easy to rapidly prototype new ideas and objectives.
Thank you
For an objective f(p, u) with a constraint g(p, u) = 0 that implicitly defines u(p), the total derivative is

$$\frac{df}{dp} = \frac{\partial f}{\partial p} + \frac{\partial f}{\partial u}\frac{\partial u}{\partial p}$$

Differentiating the constraint gives

$$\frac{dg}{dp} = 0 = \frac{\partial g}{\partial u}\frac{\partial u}{\partial p} + \frac{\partial g}{\partial p} \quad\Rightarrow\quad \frac{\partial u}{\partial p} = -\left(\frac{\partial g}{\partial u}\right)^{-1}\frac{\partial g}{\partial p}$$

so that

$$\frac{df}{dp} = \frac{\partial f}{\partial p} - \frac{\partial f}{\partial u}\left(\frac{\partial g}{\partial u}\right)^{-1}\frac{\partial g}{\partial p} \quad\Rightarrow\quad \lambda^T \equiv -\frac{\partial f}{\partial u}\left(\frac{\partial g}{\partial u}\right)^{-1}$$

$$\frac{df}{dp} = \frac{\partial f}{\partial p} + \lambda^T\frac{\partial g}{\partial p} \quad\text{where}\quad \lambda = -\left(\frac{\partial g}{\partial u}\right)^{-T}\left(\frac{\partial f}{\partial u}\right)^T$$
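A minimal numeric check of the adjoint formula above, in pure Python, on a made-up scalar example: with g(u, p) = u^2 − p and f = u, the constraint defines u(p) = √p, so analytically df/dp = 1/(2√p).

```python
import math

# Constraint g(u, p) = u^2 - p = 0 defines u(p) = sqrt(p);
# objective f(u) = u, so analytically df/dp = 1/(2*sqrt(p)).
p = 4.0
u = math.sqrt(p)            # solve g(u, p) = 0

df_du = 1.0                 # ∂f/∂u
dg_du = 2.0 * u             # ∂g/∂u
dg_dp = -1.0                # ∂g/∂p
df_dp_partial = 0.0         # ∂f/∂p (f has no explicit p dependence)

lam = -df_du / dg_du                  # adjoint: (∂g/∂u)^T λ = -(∂f/∂u)^T
df_dp = df_dp_partial + lam * dg_dp   # df/dp = ∂f/∂p + λ^T ∂g/∂p
# df_dp = 0.25 = 1/(2*sqrt(4)), matching the analytic derivative
```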
The dolfin-adjoint project automatically derives the discrete adjoint and tangent linear
models from a forward model written in the Python interface to FEniCS and Firedrake.
FEniCS enables users to quickly translate scientific models into efficient finite element
code. Firedrake is an automated system for the solution of partial differential equations
using the finite element method (FEM).
Let's use checkpointing to reduce the memory required by the part of the computation in the purple boxes. We place checkpoints to the left of the purple boxes and recompute the functions inside them on the backward pass.
We could compute the derivative dx/da by using automatic differentiation on the entire
computation. This might be very inefficient, both in terms of runtime and memory.
A clever trick
We can use the mathematical structure of the iteration to more efficiently compute the
derivative. The key is that the derivative doesn’t depend on xinit . This means that when
computing the derivative, we can simply rerun the iteration with xinit = xn , and compute
the derivative of a single iteration of the algorithm. This is the trick for applying the
adjoint method to systems of non-linear equations.
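An illustrative scalar sketch of this trick (my own example, not from the talk): solve u^2 = a by Newton iteration, then differentiate a single iteration started from the converged value. For one Newton step Φ(x, a) = x − (x^2 − a)/(2x), holding x fixed at the root gives ∂Φ/∂a = 1/(2x), which at x = √a is exactly d√a/da; for Newton this one-step derivative is exact because ∂Φ/∂x vanishes at the root.

```python
def newton_step(x, a):
    # One Newton iteration for solving x^2 = a.
    return x - (x * x - a) / (2.0 * x)

def solve_sqrt(a, x0=1.0, iters=50):
    x = x0
    for _ in range(iters):
        x = newton_step(x, a)
    return x

a = 9.0
x_star = solve_sqrt(a)   # converged root, ~3.0

# Derivative of a *single* iteration w.r.t. a, with x held at x_star:
# dΦ/da = 1/(2x), which at the fixed point equals d(sqrt(a))/da.
dx_da = 1.0 / (2.0 * x_star)
```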
Use gradient descent:

$$\Omega_{n+1} = \Omega_n - \eta \frac{\partial f}{\partial \Omega} \quad \text{s.t.} \quad A(\Omega)\Phi = b(\Omega)$$
Using the adjoint method, we have

$$\frac{df}{d\Omega} = \frac{\partial f}{\partial \Omega} + \lambda^T\frac{\partial b}{\partial \Omega} - \lambda^T\frac{\partial A}{\partial \Omega}\Phi \quad\text{where}\quad A^T\lambda = \frac{\partial f}{\partial \Phi}$$
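A hypothetical 2x2 instance of this formula in pure Python (the matrices A(Ω), b(Ω), and objective are made up), checked against a finite difference:

```python
# A(Ω) Φ = b(Ω),  f(Φ) = Φ_0 + 2 Φ_1
# df/dΩ = λ^T ∂b/∂Ω - λ^T (∂A/∂Ω) Φ  with  A^T λ = ∂f/∂Φ

def solve2(A, rhs):
    # Cramer's rule for a 2x2 linear system.
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [(rhs[0] * A[1][1] - A[0][1] * rhs[1]) / det,
            (A[0][0] * rhs[1] - rhs[0] * A[1][0]) / det]

def A_of(om):
    return [[2.0 + om, 1.0], [1.0, 3.0]]

def b_of(om):
    return [om * om, 1.0]

def f_of(om):
    phi = solve2(A_of(om), b_of(om))
    return phi[0] + 2.0 * phi[1]

om = 1.5
Am = A_of(om)
phi = solve2(Am, b_of(om))
At = [[Am[0][0], Am[1][0]], [Am[0][1], Am[1][1]]]
lam = solve2(At, [1.0, 2.0])         # solve A^T λ = ∂f/∂Φ
db_dom = [2.0 * om, 0.0]             # ∂b/∂Ω
dA_dom_phi = [phi[0], 0.0]           # (∂A/∂Ω) Φ, since ∂A/∂Ω = [[1,0],[0,0]]
grad = (lam[0] * db_dom[0] + lam[1] * db_dom[1]
        - lam[0] * dA_dom_phi[0] - lam[1] * dA_dom_phi[1])

# Check against a centered finite difference of the full solve:
eps = 1e-6
fd = (f_of(om + eps) - f_of(om - eps)) / (2 * eps)
```

The adjoint gradient costs one extra linear solve (with A^T), independent of the number of parameters Ω.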
Should AD be partially adopted by the stellarator
optimization community? It’s worth considering.
This is an enormously complex question. I’d love to discuss this in depth after the talk or
offline.
Pros:
- Gradient-based optimization of high-dimensional non-convex objective functions has been successful in many domains.
- AD and the adjoint method work together particularly well.
- If N = 50, does a factor of 10-20 increase in computational speed matter?
- Exact derivative needed?
- Much easier to rewrite an existing code we understand than write a new code.

Cons:
- Is there sufficient demand for rewriting STELLOPT with an AD tool? Does our team have the right expertise?
- The stellarator community seems to like FORTRAN. Bad for AD.
- Does the right tool exist?
- Is gradient-free Bayesian optimization better? Bayesian optimization with gradients? Do we have resources to try all the above?
- Memory manageable?
$$x^i(\theta) = \sum_{m=0}^{N_F-1} X^i_{cm}\cos(m\theta) + X^i_{sm}\sin(m\theta)$$

$$y^i(\theta) = \sum_{m=0}^{N_F-1} Y^i_{cm}\cos(m\theta) + Y^i_{sm}\sin(m\theta)$$

$$z^i(\theta) = \sum_{m=0}^{N_F-1} Z^i_{cm}\cos(m\theta) + Z^i_{sm}\sin(m\theta)$$
The multi-filament winding pack surrounds the coil centroid. For the i-th coil, the axes of the winding pack, $v_1^i$ and $v_2^i$, are rotated by an angle $\alpha^i$ relative to the normal $N^i$ and binormal $B^i$ vectors. The normal vector is defined as the component, perpendicular to the tangent, of the vector from the coil center-of-mass to the local point.
$$\alpha^i(\theta) = \frac{N_R\,\theta}{2} + \sum_{m=0}^{N_{FR}-1} A^i_{cm}\cos(m\theta) + A^i_{sm}\sin(m\theta)$$
Once we have $v_1^i$ and $v_2^i$, we can compute the positions of the $N_n \times N_b$ filaments for each coil. We do this using the following formula for the $n$-th and $b$-th filaments, where $n$ runs from 0 to $N_n - 1$ and $b$ runs from 0 to $N_b - 1$. Here, $l_n$ is the spacing between the filaments in the $v_1$ direction, and $l_b$ is the spacing between the filaments in the $v_2$ direction.
$$r^i_{n,b}(\theta) = r^i_{\text{central}} + \left(n - \frac{N_n - 1}{2}\right) l_n\, v_1^i(\theta) + \left(b - \frac{N_b - 1}{2}\right) l_b\, v_2^i(\theta)$$
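This placement formula can be sketched directly in pure Python (the function name and the 2x2 grid values below are my own illustration):

```python
def filament_positions(r_central, v1, v2, Nn, Nb, ln, lb):
    """Place an Nn x Nb grid of filaments around the centroid point
    r_central, offset along the winding-pack axes v1 and v2,
    following r_{n,b} = r_central + (n-(Nn-1)/2) ln v1 + (b-(Nb-1)/2) lb v2."""
    points = []
    for n in range(Nn):
        for b in range(Nb):
            off1 = (n - (Nn - 1) / 2.0) * ln
            off2 = (b - (Nb - 1) / 2.0) * lb
            points.append(tuple(r_central[k] + off1 * v1[k] + off2 * v2[k]
                                for k in range(3)))
    return points

# 2 x 2 winding pack around the origin, axes x and y, spacing 0.1.
# The offsets are symmetric, so the grid is centered on r_central.
pts = filament_positions((0.0, 0.0, 0.0), (1, 0, 0), (0, 1, 0), 2, 2, 0.1, 0.1)
```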
So far we’ve only computed vacuum fields, using the Biot-Savart law.
$$\mathbf{B}(\mathbf{r}) = \frac{\mu_0}{4\pi}\sum_{i=1}^{N_c}\sum_{n=1}^{N_1}\sum_{b=1}^{N_2} I^i_{n,b}\oint \frac{d\mathbf{l}^i_{n,b}\times(\mathbf{r}-\mathbf{r}^i_{n,b})}{|\mathbf{r}-\mathbf{r}^i_{n,b}|^3}$$
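A discretized version of this sum for a single filament, in pure Python (the straight-segment, midpoint-quadrature discretization here is a common but arbitrary choice, not necessarily the one used in the talk):

```python
import math

MU0 = 4e-7 * math.pi  # vacuum permeability

def biot_savart(points, current, r):
    """Magnetic field at r from a closed filament given as a list of
    3D points: straight segments, evaluated at segment midpoints."""
    B = [0.0, 0.0, 0.0]
    for a, b in zip(points, points[1:] + points[:1]):
        dl = [b[k] - a[k] for k in range(3)]
        mid = [(a[k] + b[k]) / 2.0 for k in range(3)]
        d = [r[k] - mid[k] for k in range(3)]
        dist3 = math.sqrt(sum(x * x for x in d)) ** 3
        cross = [dl[1] * d[2] - dl[2] * d[1],
                 dl[2] * d[0] - dl[0] * d[2],
                 dl[0] * d[1] - dl[1] * d[0]]
        for k in range(3):
            B[k] += MU0 / (4 * math.pi) * current * cross[k] / dist3
    return B

# Sanity check: unit circular loop in the xy-plane carrying 1 A;
# the field at the loop center should approach mu0*I/(2R).
loop = [(math.cos(t), math.sin(t), 0.0)
        for t in (2 * math.pi * j / 200 for j in range(200))]
Bz = biot_savart(loop, 1.0, (0.0, 0.0, 0.0))[2]
```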
A simple engineering objective was chosen, namely the total length of the coils:

$$f_{\text{Eng}} \equiv \sum_{i=1}^{N} L_i$$
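For coils discretized as closed polylines, this objective is just a sum of segment lengths (a sketch in pure Python; the closed-curve convention and the square test coil are my own assumptions):

```python
import math

def coil_length(points):
    """Arc length of a closed coil given as a list of 3D points."""
    total = 0.0
    for a, b in zip(points, points[1:] + points[:1]):
        total += math.dist(a, b)
    return total

def f_eng(coils):
    # Total length of all coils: f_Eng = sum_i L_i
    return sum(coil_length(c) for c in coils)

# Unit square "coil" has total length 4.
square = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (0.0, 1.0, 0.0)]
```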
$$f(x, y) = \sin(xy) + x^2/y$$
Step 1 (forward pass):

$$x = 2.0, \quad y = 3.0$$
$$v_1 = xy = 6.0$$
$$v_2 = \sin(v_1) = -0.279$$
$$v_3 = x^2 = 4.0$$
$$v_4 = v_3/y = 1.333$$
$$f = v_2 + v_4 = 1.054$$

Step 2 (reverse pass):

$$\frac{\partial f}{\partial f} = 1.0, \qquad \frac{\partial f}{\partial v_4} = 1.0, \qquad \frac{\partial f}{\partial v_2} = 1.0$$
$$\frac{\partial f}{\partial v_3} = \frac{\partial f}{\partial v_4}\frac{\partial v_4}{\partial v_3} = \frac{1}{y} = 0.333$$
$$\frac{\partial f}{\partial v_1} = \frac{\partial f}{\partial v_2}\frac{\partial v_2}{\partial v_1} = \cos(v_1) = 0.960$$
$$\frac{\partial f}{\partial y} = \frac{\partial f}{\partial v_1}\frac{\partial v_1}{\partial y} + \frac{\partial f}{\partial v_4}\frac{\partial v_4}{\partial y} = 0.960 \cdot 2.0 + 1.0 \cdot (-0.444) = 1.476$$
$$\frac{\partial f}{\partial x} = \frac{\partial f}{\partial v_1}\frac{\partial v_1}{\partial x} + \frac{\partial f}{\partial v_3}\frac{\partial v_3}{\partial x} = 0.960 \cdot 3.0 + 0.333 \cdot 4.0 = 4.214$$

In general, each node's adjoint accumulates contributions from its children:

$$\frac{\partial f}{\partial v_i} = \sum_{j \in \text{children of } i} \frac{\partial f}{\partial v_j}\frac{\partial v_j}{\partial v_i}$$
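The same computation in pure Python, recording the forward pass and then accumulating adjoints in reverse (a sketch of the two-pass procedure, not code from the talk):

```python
import math

x, y = 2.0, 3.0

# Forward pass: record intermediate values.
v1 = x * y          # 6.0
v2 = math.sin(v1)   # -0.279
v3 = x * x          # 4.0
v4 = v3 / y         # 1.333
f = v2 + v4         # 1.054

# Reverse pass: accumulate df/d(node) from the output back to the inputs.
df_dv4 = 1.0
df_dv2 = 1.0
df_dv3 = df_dv4 * (1.0 / y)       # v4 = v3 / y
df_dv1 = df_dv2 * math.cos(v1)    # v2 = sin(v1)
df_dy = df_dv1 * x + df_dv4 * (-v3 / y**2)
df_dx = df_dv1 * y + df_dv3 * 2.0 * x
# df_dy ≈ 1.476 and df_dx ≈ 4.214, matching the worked example above
```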
Biot-Savart:

$$\frac{dB_z}{d\ell_y} = -\frac{\mu_0 I_y}{4\pi}\int_{-\delta/2}^{\delta/2}\frac{dx}{(L+x)^2}$$

$$dB_z \approx -\frac{\mu_0 I_y\, d\ell_y}{4\pi L^2}\int_{-\delta/2}^{\delta/2}\left(1 - \frac{2x}{L} + \frac{3x^2}{L^2}\right)dx \approx dB_{\text{filament}}\left(1 + \frac{\delta^2}{4L^2}\right)$$