SIAMCSE
Weinan E
Peking University
March 1, 2023 1 / 47
Two major purposes of scientific research:
Understanding first principles
Solving practical problems
For practical purposes, the task of finding first principles was essentially accomplished with the establishment of quantum mechanics.
Paul A. M. Dirac (1929): “The underlying physical laws necessary for the mathematical theory
of a large part of physics and the whole of chemistry are thus completely known, and the
difficulty is only that the exact application of these laws leads to equations much too
complicated to be soluble.”
What remains to be done is to solve the mathematical problems that describe these first
principles.
First principles
They provide the first principles for essentially all of natural sciences and engineering.
They are in the form of differential equations, actually mostly PDEs.
Therefore solving PDEs has been a major theme in computational science and engineering.
Three different phases
This has had enormous success!
Many problems remain unsolved from first principles
A common source of difficulty: Too many degrees of freedom
$$-\Delta \psi + V \psi = E \psi$$
Curse of dimensionality (CoD)
Reason: polynomials, piecewise polynomials, wavelets, etc. are no longer effective in high dimension.
too many monomials in high dimension (more than 1 million degree-10 monomials when d = 10).
the mesh always looks too coarse
This is exactly where deep learning can help!
Machine learning can do amazing things
Recognizing images better than average humans
Given a set of “labeled” images (“label” = the content of the image), find an algorithm
that can automatically tell us the content of similar images.
Figure: The CIFAR-10 dataset: each image is assigned a label from the 10 different categories
https://www.cs.toronto.edu/~kriz/cifar.html
AlphaGo: Playing Go game better than the best humans!
https://www.bbc.com/news/technology-35761246
Generating non-existing data: Pictures of FAKE human faces
https://arxiv.org/pdf/1710.10196v3.pdf
In essence, what’s done in all these examples is to solve some standard
mathematical problem.
Dimensionality of the CIFAR-10 problem
Input dimension:
d = 32 × 32 × 3 = 3072
Key observation: in essence, all of these tasks are problems of approximating functions in high dimension.
This is far-reaching, since functions are the most basic objects in mathematics!
Attacking problems in high dimension
Multi-scale modeling
1. Mathematical theory
True for all classical algorithms, e.g. approximating functions using polynomials, or wavelets.
Approximation theory for neural networks
For random feature models, two-layer neural network models and residual neural network models,
$$\|f^* - f_m(\cdot\,; \theta)\|_{L^2(\mu)} \lesssim \frac{\|f^*\|_{\mathcal{B}}}{\sqrt{m}},$$
where the norm $\|\cdot\|_{\mathcal{B}}$ depends on the choice of the model.
“Monte Carlo”-like convergence rate
For multi-layer neural network models,
$$\|f^* - f_m(\cdot\,; \theta)\|_{L^2(\mu)} \lesssim \frac{\|f^*\|_{\mathcal{B}}}{\sqrt{m^{\alpha}}},$$
where $\alpha$ depends on the number of layers but not on $d$.
The picture is far from being complete
2. Using deep learning to solve high dimensional problems in scientific computing
1. Stochastic control
Dynamic model:
$$z_{l+1} = z_l + g_l(z_l, a_l) + \xi_l,$$
where $z_l$ = state, $a_l$ = control, $\xi_l$ = noise.
Objective function:
$$\min_{\{a_l\}_{l=0}^{T-1}} \mathbb{E}_{\{\xi_l\}}\Big\{\sum_{l=0}^{T-1} c_l(z_l, a_l) + c_T(z_T)\Big\}.$$
The standard approach via solving the Bellman equation suffers from CoD!
Why choose this as the first example?
There is a close analogy between stochastic control and ResNet-based deep learning.
Machine learning approximation:
$$a_l(z) = f(z, \theta_l)$$
loss: $\mathbb{E}\,\|W_L z_L - f^*\|^2$ (deep learning) vs. $\mathbb{E}\big\{\sum_l c_l(z_l, f(z_l, \theta_l)) + c_T(z_T)\big\}$ (stochastic control)
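The "train the feedback policy directly" idea can be sketched in a few lines. Below, a scalar linear feedback rule $a_l = \theta_l z_l$ stands in for the network $f(z, \theta_l)$, and the expected cost is minimized by plain gradient descent with finite differences; the dynamics, costs, and all constants are illustrative assumptions, not the authors' setup.

```python
import numpy as np

rng = np.random.default_rng(1)
T, sigma, n_paths = 5, 0.1, 2000
noise = sigma * rng.standard_normal((T, n_paths))  # common random numbers

def expected_cost(theta):
    """Monte Carlo estimate of E{ sum_l c_l(z_l, a_l) + c_T(z_T) } under the
    feedback policy a_l = theta[l] * z_l, with quadratic running costs."""
    z = np.ones(n_paths)                 # all paths start at z_0 = 1
    total = np.zeros(n_paths)
    for l in range(T):
        a = theta[l] * z                 # feedback control
        total += z**2 + a**2             # running cost c_l(z_l, a_l)
        z = z + a + noise[l]             # dynamics z_{l+1} = z_l + g_l + xi_l
    return (total + z**2).mean()         # add terminal cost c_T(z_T)

theta, lr, eps = np.zeros(T), 0.05, 1e-4
before = expected_cost(theta)
for _ in range(100):
    # central finite-difference gradient on the (deterministic) MC objective
    grad = np.array([
        (expected_cost(theta + eps * np.eye(T)[l]) -
         expected_cost(theta - eps * np.eye(T)[l])) / (2 * eps)
        for l in range(T)
    ])
    theta -= lr * grad
print(before, expected_cost(theta))
```

Freezing the noise ("common random numbers") makes the objective a deterministic function of the parameters, so ordinary optimization applies; in the actual method each $\theta_l$ parameterizes a network and the gradient comes from backpropagation.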
Work of Zhang, Long, Hu, E and Han (2022)
2. Nonlinear parabolic PDEs
$$\frac{\partial u}{\partial t} + \frac{1}{2}\Delta u + \mu \cdot \nabla u + f(\nabla u) = 0, \qquad u(T, x) = g(x)$$
Reformulate as a stochastic control problem using backward stochastic differential equations (BSDE, Pardoux and Peng (1990)).
E, Han and Jentzen (Comm Math Stats, 2017); Han, Jentzen and E (PNAS, 2018)
LQG (linear quadratic Gaussian) for d = 100, with the cost $J = \mathbb{E}\big(\int_0^T \|m_t\|_2^2 \, dt + g(X_T)\big)$ and dynamics
$$dX_t = 2\sqrt{\lambda}\, m_t \, dt + \sqrt{2}\, dW_t.$$
Hamilton-Jacobi-Bellman equation:
$$\partial_t u + \Delta u - \lambda \|\nabla u\|_2^2 = 0, \qquad u(T, x) = g(x)$$
Using the Hopf-Cole transform, one obtains the solution:
$$u(t, x) = -\frac{1}{\lambda} \ln \mathbb{E}\Big[\exp\big(-\lambda g(x + \sqrt{2}\, W_{T-t})\big)\Big].$$
[Plot: u(0, (0, . . . , 0)) as a function of λ ∈ [0, 50], with curves for the Deep BSDE solver and Monte Carlo; values range from about 4.0 to 4.7.]
Figure: Optimal cost u(t=0, x=(0, . . . , 0)) for different values of λ.
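The Monte Carlo reference values can be reproduced directly from the Hopf-Cole formula. The sketch below evaluates $u(0, 0)$ for one value of $\lambda$, taking $T = 1$ and $g(x) = \ln((1 + \|x\|^2)/2)$ as in the PNAS example (both assumed here); the sample size is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(3)
d, T, lam, n_samples = 100, 1.0, 1.0, 100000   # T and lambda assumed

def g(x):
    # terminal cost g(x) = ln((1 + ||x||^2) / 2)
    return np.log(0.5 * (1.0 + np.sum(x**2, axis=1)))

# Hopf-Cole: u(0, 0) = -(1/lambda) ln E[ exp(-lambda g(sqrt(2) W_T)) ]
W_T = np.sqrt(T) * rng.standard_normal((n_samples, d))
u0 = -np.log(np.mean(np.exp(-lam * g(np.sqrt(2.0) * W_T)))) / lam
print(u0)
```

Sweeping `lam` over [0, 50] reproduces the reference curve against which the Deep BSDE solver is compared.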
3. Deep learning-based algorithms for multi-scale modeling
DeePMD: Molecular dynamics with ab initio accuracy
New paradigm:
use QM to supply the data
use neural network model to find accurate approximation of V
Behler and Parrinello (2007); Jiequn Han et al. (2017); Linfeng Zhang et al. (2018).
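The "use QM data to fit V" step can be illustrated with a toy regression. The sketch below fits a one-hidden-layer network to energies of a diatomic system, with a Morse potential standing in for the QM data (an assumption, purely for illustration); the real DeePMD model uses symmetry-preserving descriptors of many-atom environments and also fits forces.

```python
import numpy as np

rng = np.random.default_rng(4)

# synthetic stand-in for QM training data: bond length -> energy
r = rng.uniform(0.7, 3.0, 400)[:, None]
V = (1.0 - np.exp(-(r - 1.2)))**2          # Morse-type potential (assumed)

# one-hidden-layer network V_hat(r) = w2 . tanh(w1 r + b1) + b2
m = 32
w1 = rng.standard_normal((1, m)); b1 = np.zeros(m)
w2 = rng.standard_normal((m, 1)) / np.sqrt(m); b2 = np.zeros(1)

lr, n = 0.05, len(r)
for step in range(5000):                   # full-batch gradient descent on MSE
    h = np.tanh(r @ w1 + b1)
    err = h @ w2 + b2 - V
    dh = (err @ w2.T) * (1 - h**2)         # backprop through tanh
    w2 -= lr * (h.T @ err) / n; b2 -= lr * err.mean(0)
    w1 -= lr * (r.T @ dh) / n;  b1 -= lr * dh.mean(0)

rmse = np.sqrt(np.mean((np.tanh(r @ w1 + b1) @ w2 + b2 - V)**2))
print(rmse)
```

The point is only the workflow: expensive quantum calculations produce a modest dataset, and a cheap neural network surrogate of V is then used inside molecular dynamics.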
Accuracy comparable to QM for a wide range of materials and molecules
The hierarchy of physical models
AI for Science: Making first principles truly reliable and useful
What about solving low dimensional PDEs?
Traditional methods: Finite difference, finite elements, etc
Machine learning-based algorithms for PDEs
Weak form (test function): Weak Adversarial Network (WAN, Bao et al. (2020))
PINN and DGM
very general
uses a least squares formulation for both the PDE and the boundary condition,
m = n is no longer necessary
mesh-free, so easier for complex geometry
accuracy cannot be systematically improved, since the optimization problem is highly
non-convex
Bridging classical and ML-based algorithms: The random feature method
Multi-scale random feature models
fine-scale features captured using a partition of unity, with an independent RFM for each component in the partition
macro-scale features captured using a global RFM
$$u_m(x) = u^g(x) + \sum_{k=1}^{M_p} \psi_k(x) \sum_{j=1}^{J_k} u_j^k \varphi_j^k(x)$$
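The ingredients of the RFM can be sketched in one dimension with a single global component (no partition of unity): frozen random inner weights, and outer coefficients determined by least squares over PDE collocation points plus penalized boundary conditions. The test problem, feature scales, and penalty weight below are all illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(5)
m, n_col = 200, 400

# random features phi_j(x) = tanh(w_j x + b_j) with frozen inner weights
w = 4.0 * rng.standard_normal(m)
b = rng.uniform(-4.0, 4.0, m)

def features(x):
    """Return phi(x) and its second derivative phi''(x) at the points x."""
    t = np.tanh(np.outer(x, w) + b)
    return t, -2.0 * t * (1.0 - t**2) * w**2

# solve -u'' = f on (0, 1), u(0) = u(1) = 0, with exact solution sin(pi x)
f = lambda x: np.pi**2 * np.sin(np.pi * x)
x = np.linspace(0.0, 1.0, n_col)
phi, phi_xx = features(x)
phi_bnd, _ = features(np.array([0.0, 1.0]))

# least squares over the interior residual plus penalized boundary rows
A = np.vstack([-phi_xx, 100.0 * phi_bnd])
rhs = np.concatenate([f(x), np.zeros(2)])
coef, *_ = np.linalg.lstsq(A, rhs, rcond=None)

err = np.max(np.abs(phi @ coef - np.sin(np.pi * x)))
print(err)
```

Because the inner weights are never trained, the whole solve is a single linear least squares problem; the multi-scale version above simply repeats this construction once per partition-of-unity component plus once globally.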
Accuracy can be systematically improved
[Plot: errors for two RFM runs (RFM: a, RFM: b) and PINN; the RFM errors decrease systematically from about 10^-2 down to 10^-10, while the PINN error does not show this systematic decrease.]
Easy to handle complex geometry: Stokes flow
$$(u, v)\big|_{\partial\Omega} = \begin{cases} (y(1-y),\, 0) & \text{if } x = 0 \text{ or } x = 1 \\ (0, 0) & \text{otherwise} \end{cases}$$
[Plots: (a) the velocity component u and (b) the velocity component v of the computed Stokes flow on the unit square.]
Two-dimensional elasticity problem with a complex geometry
Figure: Complex domain with a cluster of holes that are nearly touching
[Plots: six panels of the numerical solution over the domain [0, 8] × [0, 8], including a zoomed view near the nearly-touching holes.]
Figure: Numerical solution by the random feature method for the two-dimensional elasticity problem over a complex geometry
Promising but still a lot to be understood
Jingrun Chen et al., Bridging Traditional and Machine Learning-based Algorithms for Solving PDEs: The Random Feature Method, Journal of Machine Learning, vol. 1, no. 3, pp. 268-298.
What is the best way to select the basis functions? (Random basis seems to perform
better, but why?)
PINN/DGM is fully adaptive
RFM uses preselected basis functions.
The optimal solution must be somewhere in-between.
Least squares formulation makes the iterative training process very slow.
What is the best (convenient, robust, accurate) way to describe 3-D geometry?
Summary