Slides: Active Flow Control with Deep Reinforcement Learning
J. Rabault (1), M. Kuchta (1), N. Cerardi (1,2), U. Reglade (1,2), A. Jensen (1), A. Kuhnle (3)
1: University of Oslo, Department of Mathematics
2: Mines ParisTech, CEMEF
3: University of Cambridge, Computer Science Laboratory
MINDS seminar, January 31st, 2019
Deep ANNs: ANNs with several hidden layers, feeding the output of each layer into the next: ’Deep Learning’, LeCun et al., Nature (2015).
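As an illustration, a minimal sketch of such a feed-forward network in plain Python/NumPy; the layer sizes and the tanh activation are arbitrary choices for this sketch, not the ones used later in the talk:

import numpy as np

def init_layer(n_in, n_out, rng):
    # One fully connected layer: small random weights, zero biases.
    return rng.normal(0.0, 0.1, size=(n_in, n_out)), np.zeros(n_out)

def forward(x, layers):
    # Feed the output of each hidden layer into the next (tanh activation);
    # the last layer is left linear.
    *hidden, (W_out, b_out) = layers
    for W, b in hidden:
        x = np.tanh(x @ W + b)
    return x @ W_out + b_out

rng = np.random.default_rng(0)
sizes = [8, 32, 32, 2]  # input, two hidden layers, output
layers = [init_layer(n, m, rng) for n, m in zip(sizes[:-1], sizes[1:])]
y = forward(np.ones(8), layers)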
Then: Q∗(s, a) = max_π Q^π(s, a), and V∗(s) = max_a Q∗(s, a). Finally: π∗(s) = argmax_a Q∗(s, a).
Therefore, knowledge of Q∗ is knowledge of π∗. Q-learning: iterative approximation of Q∗ using the Bellman equation:

Q∗(s, a) = r + γ max_{a'} Q∗(s', a').
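A minimal sketch of the corresponding tabular Q-learning update in Python; the learning rate, discount factor, and the indexing of the Q table are illustrative assumptions:

import numpy as np

def q_learning_step(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    # Move Q(s, a) towards the Bellman target r + γ max_{a'} Q(s', a').
    target = r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (target - Q[s, a])
    return Q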
Value: V(Θ) = E[ Σ_{t=0}^{H} R(s_t, a_t) | π_Θ ] = Σ_τ P(τ, Θ) R(τ)

∇_Θ V(Θ) = Σ_τ ∇_Θ P(τ, Θ) R(τ)
         = Σ_τ P(τ, Θ) (∇_Θ P(τ, Θ) / P(τ, Θ)) R(τ)
         = Σ_τ P(τ, Θ) ∇_Θ log(P(τ, Θ)) R(τ)

This is an expectation, so approximate it by empirical sampling under the policy π_Θ:

∇_Θ V(Θ) ≈ ĝ = (1/M) Σ_{i=1}^{M} ∇_Θ log P(τ^(i), Θ) R(τ^(i))
Understanding the policy gradient
∇_Θ V(Θ) ≈ ĝ = (1/M) Σ_{i=1}^{M} ∇_Θ log P(τ^(i), Θ) R(τ^(i))
The log-probability gradient depends only on the policy! No dynamics model is needed.
This estimator is unbiased:

∇_Θ V(Θ) = E[ĝ],

so we use ∇_Θ V(Θ) ≈ ĝ = (1/M) Σ_{i=1}^{M} ∇_Θ log P(τ^(i), Θ) R(τ^(i)),
with ∇_Θ log P(τ^(i), Θ) = ∇_Θ Σ_{t=0}^{H} log π_Θ(a_t^(i) | s_t^(i)).
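A minimal sketch of this estimator for a linear-softmax policy over discrete actions; the trajectory format and the toy policy parameterization are illustrative assumptions, not the setup used in the flow-control case:

import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def grad_log_pi(theta, s, a):
    # ∇_Θ log π_Θ(a|s) for π_Θ(.|s) = softmax(Θᵀ s), with Θ of shape (dim_s, n_actions).
    probs = softmax(theta.T @ s)
    g = -np.outer(s, probs)
    g[:, a] += s
    return g

def policy_gradient_estimate(theta, trajectories):
    # ĝ = (1/M) Σ_i [ Σ_t ∇_Θ log π_Θ(a_t|s_t) ] R(τ^(i))
    g_hat = np.zeros_like(theta)
    for states, actions, rewards in trajectories:
        R = sum(rewards)
        for s, a in zip(states, actions):
            g_hat += grad_log_pi(theta, s, a) * R
    return g_hat / len(trajectories)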
Proximal Policy Optimization (PPO)
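A minimal sketch of the clipped surrogate loss at the core of PPO (Schulman et al., 2017), L^CLIP(Θ) = E_t[ min( r_t(Θ) Â_t, clip(r_t(Θ), 1−ε, 1+ε) Â_t ) ] with r_t(Θ) = π_Θ(a_t|s_t) / π_Θold(a_t|s_t); the plain-array inputs and ε = 0.2 here are illustrative:

import numpy as np

def ppo_clipped_loss(ratio, advantage, eps=0.2):
    # Clipped surrogate objective; return the negative so that minimizing
    # the loss maximizes the objective.
    surrogate = np.minimum(ratio * advantage,
                           np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantage)
    return -surrogate.mean()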
Add velocity probes (s_t), normal jets (a_t), and reduce the drag coefficient C_D = D / (½ ρ U² L) (r_t).
Learning an active flow control strategy
Simple network: fully connected [512, 512].
Reward: r_t = −⟨C_D⟩_T − 0.2 |⟨C_L⟩_T|.
Learning for 1300 vortex sheddings.
The ANN plays with the flow and observes the effect on the reward: do more of the successful actuations, less of the unsuccessful ones. The weights of the ANN encode its ’understanding’ of the behavior of the system.
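A minimal sketch of how such a reward could be evaluated from force coefficients sampled over one averaging period T; the array inputs and the 0.2 lift penalty mirror the formula above, everything else is an illustrative assumption:

import numpy as np

def reward(cd_samples, cl_samples, cl_penalty=0.2):
    # r_t = -<C_D>_T - 0.2 |<C_L>_T|: penalize mean drag over the period and,
    # more weakly, mean lift, to discourage strategies that induce a large mean lift.
    return -np.mean(cd_samples) - cl_penalty * abs(np.mean(cl_samples))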
Dynamic system:

da_1/dt = σ a_1 − a_2
da_2/dt = σ a_2 + a_1
da_3/dt = −0.1 a_3 − 10 a_4
da_4/dt = −0.1 a_4 + 10 a_3 + b

σ = 0.1 − a_1² − a_2² − a_3² − a_4²
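A minimal sketch of integrating this system with an explicit Euler scheme; the time step, initial condition, and constant forcing b are illustrative assumptions:

import numpy as np

def rhs(a, b):
    # Right-hand side of the four-dimensional system above.
    a1, a2, a3, a4 = a
    sigma = 0.1 - a1**2 - a2**2 - a3**2 - a4**2
    return np.array([sigma * a1 - a2,
                     sigma * a2 + a1,
                     -0.1 * a3 - 10.0 * a4,
                     -0.1 * a4 + 10.0 * a3 + b])

def simulate(a0, b=0.0, dt=1e-3, steps=10_000):
    # Explicit Euler time stepping; fine for illustration, not for accuracy.
    a = np.array(a0, dtype=float)
    history = [a.copy()]
    for _ in range(steps):
        a = a + dt * rhs(a, b)
        history.append(a.copy())
    return np.array(history)

traj = simulate([0.1, 0.0, 0.1, 0.0])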
Cost function:
Do like the brain: low frequency, high latency, highly parallel, very large computational power, very energy efficient.
We need adapted algorithms to take advantage of increases in computing power!
Collaboration with MINDS