

Scalable implicit algorithms for stiff hyperbolic PDE systems

L. Chacón
Oak Ridge National Laboratory

J. N. Shadid, E. Cyr, P. T. Lin, R. Tuminaro, R. Pawlowski


Sandia National Laboratories

DOE Applied Mathematics Program Meeting October 17-19, 2011 Washington, DC

Luis Chacón, chaconl@ornl.gov

Outline

Motivation: the tyranny of scales
Block-factorization preconditioning of hyperbolic PDEs
  Compressible resistive MHD
  Compressible extended MHD
  Incompressible Navier-Stokes and MHD (infinite sound-speed limit)


The tyranny of scales (2006 NSF SBES report)

Figure 1: Time scales in fusion plasmas (FSP report)


Algorithmic challenges in temporal scale-bridging

PDE systems of interest typically have mixed character, with hyperbolic and parabolic components.

Hyperbolic stiffness (linear and dispersive waves): $\Delta t_{\rm fast} \sim \Delta x / v_{\rm fast} \ll \Delta t$
Parabolic stiffness (diffusion): $\Delta t_D \sim \Delta x^2 / D \ll \Delta t$

In both cases the time step of interest satisfies $\Delta t \gg \Delta t_{\rm CFL}$.

In some applications, fast hyperbolic modes carry a lot of energy (e.g., shocks, fast advection of solution structures), and the modeler must follow them. In others, however, fast time scales are parasitic, and carry very little energy.


These are the ones that are usually targeted for scale-bridging.

Bridging the time-scale disparity requires a combination of approaches:
Analytical elimination (e.g., reduced models).
Well-posed numerical discretization (e.g., asymptotic-preserving methods).
Some level of implicitness in the temporal formulation (for stability; accuracy requires care).

Key algorithmic requirement: SCALABILITY, i.e., CPU $\sim O(N/n_p)$.


Algorithmic scalability vs. parallel scalability

"The tyranny of scales will not be simply defeated by building bigger and faster computers" (NSF SBES 2006 report, p. 30)

Optimal algorithm: CPU $\propto N/n_p$

In general: CPU $\propto N^{1+\alpha}/n_p^{1-\beta}$, with
$\alpha \to 0$: algorithmic scalability
$\beta \to 0$: parallel scalability

Much emphasis has been placed on parallel scalability ($\beta$). However, parallel (weak) scalability is limited by the lack of algorithmic scalability:

Weak scaling ($N \propto n_p$ $\Rightarrow$ CPU $\propto n_p^{\alpha+\beta}$) requires $\alpha = \beta = 0$!

Explicit: $\alpha = 1/d$
Implicit (direct): $\alpha = 2 - 2/d$
Implicit (Krylov iterative): $\alpha > 0$ (varies)
Implicit (multilevel): $\alpha \approx 0$

How do multilevel (multigrid) methods work?

MG employs a divide-and-conquer approach to attack error components in the solution.


Oscillatory components of the error are EASY to deal with (if a SMOOTHER exists).
Smooth components are DIFFICULT.

Idea: coarsen grid to make "smooth" components appear oscillatory, and proceed recursively

The SMOOTHER is the make-or-break ingredient of MG! Smoothers are hard to find for hyperbolic systems, but fairly easy for parabolic ones.
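To make the divide-and-conquer idea concrete, here is a minimal Python sketch (not from the talk; grid size, smoothing sweeps, and cycle parameters are illustrative) of a geometric multigrid V-cycle for the 1D Poisson problem $-u'' = f$ with homogeneous Dirichlet boundary conditions, using damped Jacobi as the smoother:

```python
import numpy as np

def residual(u, f, h):
    # r = f + u'' at interior points (boundary values of u are held at 0)
    r = np.zeros_like(u)
    r[1:-1] = f[1:-1] - (2.0 * u[1:-1] - u[:-2] - u[2:]) / h**2
    return r

def smooth(u, f, h, nsweeps, omega=2.0 / 3.0):
    # Damped Jacobi: cheaply damps the OSCILLATORY error components
    for _ in range(nsweeps):
        unew = u.copy()
        unew[1:-1] = (1 - omega) * u[1:-1] + omega * 0.5 * (u[:-2] + u[2:] + h**2 * f[1:-1])
        u = unew
    return u

def v_cycle(u, f, h, nu1=2, nu2=2):
    n = u.size - 1
    if n == 2:                                # coarsest grid: single interior unknown, solve directly
        u[1] = 0.5 * h**2 * f[1]
        return u
    u = smooth(u, f, h, nu1)                  # pre-smoothing
    r = residual(u, f, h)
    rc = np.zeros(n // 2 + 1)                 # full-weighting restriction of the residual
    rc[1:-1] = 0.25 * r[1:-2:2] + 0.5 * r[2:-1:2] + 0.25 * r[3::2]
    ec = v_cycle(np.zeros(n // 2 + 1), rc, 2 * h, nu1, nu2)   # recursive coarse-grid correction
    e = np.zeros_like(u)                      # prolongation by linear interpolation
    e[::2] = ec
    e[1::2] = 0.5 * (ec[:-1] + ec[1:])
    u = smooth(u + e, f, h, nu2)              # post-smoothing
    return u

# Usage: the residual drops by roughly an order of magnitude per V-cycle, independent of n
n, h = 128, 1.0 / 128
x = np.linspace(0.0, 1.0, n + 1)
f = np.sin(np.pi * x)
u = np.zeros(n + 1)
for k in range(8):
    u = v_cycle(u, f, h)
    print(k, np.max(np.abs(residual(u, f, h))))
```

The grid-independent convergence rate per V-cycle is what yields the optimal $O(N)$ cost referred to above.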

Can one make hyperbolic PDEs MG-friendly?


Implicit discretization of hyperbolic PDEs: a case study

$$\partial_t u = \frac{1}{\epsilon}\,\partial_x v, \qquad \partial_t v = \frac{1}{\epsilon}\,\partial_x u$$

$\epsilon$ is a measure of hyperbolic stiffness. Discretize implicitly in time:

$$u^{n+1} = u^n + \frac{\Delta t}{\epsilon}\,\partial_x v^{n+1}, \qquad v^{n+1} = v^n + \frac{\Delta t}{\epsilon}\,\partial_x u^{n+1}$$

Very ill conditioned as $\epsilon \to 0$!

However, if one combines equations:

$$\left[1 - \frac{\Delta t^2}{\epsilon^2}\,\partial^2_{xx}\right] u^{n+1} = u^n + \frac{\Delta t}{\epsilon}\,\partial_x v^n$$

Equation is now well-posed when $\epsilon \to 0$ (i.e., it is asymptotic-preserving)!


Limit system is elliptic/parabolic (MG-friendly!).
Temporally unresolved hyperbolic time scales have been parabolized.

No further manipulation of the PDE than implicit differencing (no terms added to the PDE)! This fact can be exploited to devise optimal solution algorithms (block factorization)!
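The MG-friendliness claim can be checked numerically. The following minimal sketch (illustrative periodic discretization and parameters, not from the talk) applies one damped-Jacobi sweep to a high-frequency error mode for both operators: the parabolized operator $I - (\Delta t/\epsilon)^2\,\partial_{xx}$ damps it, while the coupled stiff system amplifies it, i.e., no simple smoother exists for the latter:

```python
import numpy as np

n, h = 64, 1.0 / 64
dt, eps = 1.0, 1.0e-3                  # dt/eps >> h: strongly stiff regime
c = dt / eps

# Periodic centered first derivative and compact three-point Laplacian
D = (np.eye(n, k=1) - np.eye(n, k=-1)) / (2 * h)
D[0, -1], D[-1, 0] = -1 / (2 * h), 1 / (2 * h)
Lap = (np.eye(n, k=1) + np.eye(n, k=-1) - 2 * np.eye(n)) / h**2
Lap[0, -1] = Lap[-1, 0] = 1 / h**2

A = np.block([[np.eye(n), -c * D], [-c * D, np.eye(n)]])   # coupled implicit system
S = np.eye(n) - c**2 * Lap                                  # Schur-reduced (parabolized) operator

def jacobi_error_amplification(M, e, omega=2.0 / 3.0):
    # One damped-Jacobi sweep acting on the error: e_new = (I - omega * Dinv * M) e
    e_new = e - omega * (M @ e) / np.diag(M)
    return np.linalg.norm(e_new) / np.linalg.norm(e)

x = np.arange(n) * h
hf = np.sin(np.pi * x / (2 * h))       # high-frequency error mode (wavelength 4h)
print("Schur operator :", jacobi_error_amplification(S, hf))                        # ~0.33: smoothed
print("coupled system :", jacobi_error_amplification(A, np.r_[hf, np.zeros(n)]))    # >> 1: amplified
```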


Block-factorization of hyperbolic PDEs

$$u^{n+1} = u^n + \frac{\Delta t}{\epsilon}\,\partial_x v^{n+1}, \qquad v^{n+1} = v^n + \frac{\Delta t}{\epsilon}\,\partial_x u^{n+1}$$

Coupling structure:

$$\begin{pmatrix} I & -\frac{\Delta t}{\epsilon}\,\partial_x \\ -\frac{\Delta t}{\epsilon}\,\partial_x & I \end{pmatrix}\begin{pmatrix} u^{n+1} \\ v^{n+1} \end{pmatrix} = \begin{pmatrix} u^n \\ v^n \end{pmatrix}$$

The 2x2 block matrix can be formally inverted via block factorization:

$$\begin{pmatrix} D_1 & U \\ L & D_2 \end{pmatrix}^{-1} = \begin{pmatrix} I & 0 \\ -D_2^{-1} L & I \end{pmatrix}\begin{pmatrix} \left(D_1 - U D_2^{-1} L\right)^{-1} & 0 \\ 0 & D_2^{-1} \end{pmatrix}\begin{pmatrix} I & -U D_2^{-1} \\ 0 & I \end{pmatrix}$$

Only the inverse of $D_1 - U D_2^{-1} L$ (the Schur complement) is required!

$$D_1 - U D_2^{-1} L = I - \frac{\Delta t^2}{\epsilon^2}\,\partial^2_{xx}$$
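A minimal numerical check (periodic centered differences, illustrative parameters, not from the talk) that the block factorization solves the coupled system using only $D_2^{-1}$ and the Schur complement inverse:

```python
import numpy as np

n, h = 64, 1.0 / 64
dt, eps = 0.1, 1.0e-2
c = dt / eps

Dx = (np.eye(n, k=1) - np.eye(n, k=-1)) / (2 * h)
Dx[0, -1], Dx[-1, 0] = -1 / (2 * h), 1 / (2 * h)     # periodic wrap-around

I = np.eye(n)
D1, D2 = I, I
U = -c * Dx                      # upper off-diagonal block
L = -c * Dx                      # lower off-diagonal block
A = np.block([[D1, U], [L, D2]])

rng = np.random.default_rng(0)
b = rng.standard_normal(2 * n)
bu, bv = b[:n], b[n:]

# Schur-complement solve: S = D1 - U D2^{-1} L = I - c^2 Dx^2
S = D1 - U @ np.linalg.solve(D2, L)
u = np.linalg.solve(S, bu - U @ np.linalg.solve(D2, bv))   # solve reduced (parabolized) system
v = np.linalg.solve(D2, bv - L @ u)                        # back-substitution

x_direct = np.linalg.solve(A, b)
print("max error vs. direct solve:", np.max(np.abs(np.concatenate([u, v]) - x_direct)))
```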


Nonlinear hyperbolic PDEs: JFNK and block factorization preconditioning

Objective: solve the nonlinear system $G(x^{n+1}) = 0$ efficiently (scalably).

Converge nonlinear couplings using the Newton-Raphson method:

$$\left.\frac{\partial G}{\partial x}\right|_k \delta x_k = -G(x_k)$$

Jacobian-free implementation:

$$\left.\frac{\partial G}{\partial x}\right|_k y = J_k\, y = \lim_{\epsilon \to 0}\frac{G(x_k + \epsilon\, y) - G(x_k)}{\epsilon}$$

Krylov method of choice: GMRES (nonsymmetric systems). Right preconditioning: solve the equivalent Jacobian system for $y = P_k\,\delta x$:

$$\left(J_k P_k^{-1}\right)\left(P_k\,\delta x\right) = -G_k$$
Approximations in the preconditioner do not affect the accuracy of the converged solution; only efficiency!

Block-factorization+MG will be our preconditioning strategy.
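Below is a minimal JFNK sketch in Python (an illustrative stand-in problem and preconditioner, not the authors' solver): a matrix-free Jacobian-vector product built from finite differences of $G$, wrapped in right-preconditioned GMRES via SciPy. The preconditioner here is a direct solve with a Laplacian, standing in for the block-factorization+MG preconditioner described next.

```python
import numpy as np
from scipy.sparse import diags
from scipy.sparse.linalg import LinearOperator, gmres, splu

n = 200
A = diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n), format="csc") * n**2   # 1D Laplacian
b = np.ones(n)
P = splu(A)                          # preconditioner: P^{-1} ~ A^{-1} (multigrid in practice)

def G(x):                            # stand-in nonlinear residual, G(x) = 0
    return A @ x + x**3 - b

x = np.zeros(n)
for it in range(20):
    Gx = G(x)
    print(f"Newton it {it}: ||G|| = {np.linalg.norm(Gx):.3e}")
    if np.linalg.norm(Gx) < 1e-9:
        break
    def Jv(y):                       # Jacobian-free product: J y ~ [G(x + eps*y) - G(x)] / eps
        eps = 1e-7 * (1.0 + np.linalg.norm(x)) / max(np.linalg.norm(y), 1e-30)
        return (G(x + eps * y) - Gx) / eps
    # Right preconditioning: solve (J P^{-1}) (P dx) = -G for y = P dx, then dx = P^{-1} y
    JPinv = LinearOperator((n, n), matvec=lambda y: Jv(P.solve(y)))
    y, info = gmres(JPinv, -Gx)
    x = x + P.solve(y)
```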


Implicit resistive MHD solver

L. Chacón, Phys. Plasmas (2008)


Resistive MHD model equations

$$\partial_t \rho + \nabla\cdot(\rho\, v) = 0$$
$$\partial_t B + \nabla\times E = 0$$
$$\partial_t(\rho\, v) + \nabla\cdot\left[\rho\, v v - B B + I\left(p + \frac{B^2}{2}\right)\right] = 0$$
$$\partial_t T + v\cdot\nabla T + (\gamma - 1)\, T\, \nabla\cdot v = 0$$

Plasma is assumed polytropic: $p \sim n^{\gamma}$. Resistive Ohm's law: $E = -v\times B + \eta\, j$.


Resistive MHD Jacobian block structure

The linearized resistive MHD model has the following couplings:

$$\rho = L_\rho(\rho, v), \qquad T = L_T(T, v), \qquad B = L_B(B, v), \qquad v = L_v(v, B, \rho, T)$$

Therefore, the Jacobian of the resistive MHD model has the following coupling structure:

$$J\,\delta x = \begin{pmatrix} D_\rho & 0 & 0 & U_{v\rho} \\ 0 & D_T & 0 & U_{vT} \\ 0 & 0 & D_B & U_{vB} \\ L_{\rho v} & L_{Tv} & L_{Bv} & D_v \end{pmatrix}\begin{pmatrix} \rho \\ T \\ B \\ v \end{pmatrix}$$

Diagonal blocks contain advection-diffusion contributions, and are easy to invert using MG techniques. Off-diagonal blocks $U$ and $L$ contain all hyperbolic couplings.


Block factorization of resistive MHD

We consider the block structure:

$$J\,\delta x = \begin{pmatrix} M & U \\ L & D_v \end{pmatrix}\begin{pmatrix} y \\ v \end{pmatrix}; \qquad y = \begin{pmatrix} \rho \\ T \\ B \end{pmatrix}; \qquad M = \begin{pmatrix} D_\rho & 0 & 0 \\ 0 & D_T & 0 \\ 0 & 0 & D_B \end{pmatrix}$$

$M$ is easy to invert (advection-diffusion, not very stiff, MG-friendly).

Schur complement analysis of the 2x2 block system yields:

$$\begin{pmatrix} M & U \\ L & D_v \end{pmatrix}^{-1} = \begin{pmatrix} I & -M^{-1} U \\ 0 & I \end{pmatrix}\begin{pmatrix} M^{-1} & 0 \\ 0 & P_{\rm Schur}^{-1} \end{pmatrix}\begin{pmatrix} I & 0 \\ -L\, M^{-1} & I \end{pmatrix}$$

$$P_{\rm Schur} = D_v - L\, M^{-1}\, U$$
The EXACT Jacobian inverse only requires $M^{-1}$ and $P_{\rm Schur}^{-1}$.


Physics-based preconditioner (PBP)

3-step EXACT inversion algorithm (a schematic Python sketch is given below):

Predictor:       $\delta y^{*} = M^{-1}\, G_y$
Velocity update: $\delta v = P_{\rm Schur}^{-1}\left[G_v - L\,\delta y^{*}\right]$, with $P_{\rm Schur} = D_v - L\, M^{-1} U$
Corrector:       $\delta y = \delta y^{*} - M^{-1} U\,\delta v$

MG treatment of $P_{\rm Schur}$ is impractical due to $M^{-1}$.

We consider here the small-flow limit, $v \ll v_A$: $M^{-1} \approx \Delta t\, I$ (cheap).

We have extended the formulation to arbitrary flows, based on commutation ideas [1,2] (more expensive, but more robust).

[1] Elman et al., SISC 27, 1651 (2006)
[2] L. Chacón, J. Physics: Conf. Series 125, 012041 (2008)
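A minimal operator-level sketch of the 3-step inversion (small dense stand-in blocks with illustrative sizes; in the actual solver the exact solves with $M$ and $P_{\rm Schur}$ are replaced by one multigrid V-cycle each):

```python
import numpy as np

rng = np.random.default_rng(1)
ny, nv = 12, 4                                              # sizes of the (rho,T,B) and v blocks (illustrative)
M = np.eye(ny) + 0.05 * rng.standard_normal((ny, ny))       # advection-diffusion block (well conditioned)
Dv = np.eye(nv) + 0.05 * rng.standard_normal((nv, nv))
U = rng.standard_normal((ny, nv))                           # hyperbolic couplings (upper block)
L = rng.standard_normal((nv, ny))                           # hyperbolic couplings (lower block)

def apply_preconditioner(Gy, Gv):
    # Step 1 (predictor):       y* = M^{-1} Gy
    y_star = np.linalg.solve(M, Gy)
    # Step 2 (velocity update): dv = P_Schur^{-1} [Gv - L y*],  P_Schur = Dv - L M^{-1} U
    P_schur = Dv - L @ np.linalg.solve(M, U)
    dv = np.linalg.solve(P_schur, Gv - L @ y_star)
    # Step 3 (corrector):       dy = y* - M^{-1} U dv
    dy = y_star - np.linalg.solve(M, U @ dv)
    return dy, dv

# Check: with exact block solves, the 3 steps reproduce the full Jacobian inverse
Gy, Gv = rng.standard_normal(ny), rng.standard_normal(nv)
dy, dv = apply_preconditioner(Gy, Gv)
J = np.block([[M, U], [L, Dv]])
exact = np.linalg.solve(J, np.concatenate([Gy, Gv]))
print("max error:", np.max(np.abs(np.concatenate([dy, dv]) - exact)))
```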



PBP: Small-flow limit

Small-flow approximation: $M^{-1} \approx \Delta t\, I$ in steps 2 & 3 of the Schur algorithm:

$\delta y^{*} = M^{-1}\, G_y$
$\delta v = P_{SI}^{-1}\left[G_v - L\,\delta y^{*}\right]$, with $P_{SI} = D_v - \Delta t\, L\, U$
$\delta y = \delta y^{*} - \Delta t\, U\,\delta v$

where:

$$P_{SI} = \frac{\rho^n\, I}{\Delta t} + \nabla\cdot\left(\rho^n\, v_0\, I + I\, v_0\,\rho^n\right) + \Delta t\, W(B_0, p_0)$$

$$W(B_0, p_0) = B_0\times\nabla\times\nabla\times\left[I\times B_0\right] - j_0\times\nabla\times\left[I\times B_0\right] - \nabla\left[I\cdot\nabla p_0 + \gamma\, p_0\,\nabla\cdot I\right]$$

The operator $W(B_0, p_0)$ is the ideal MHD energy operator, which has real eigenvalues!

$P_{SI}$ is parabolic, and hence block diagonally dominant by construction!

We employ multigrid methods (MG) to approximately invert $P_{SI}$ and $M$: 1 V(4,4) cycle.


PBP: 2D serial performance (tearing mode)

Grid convergence study (Δt = 1.0 τ_A):

N          GMRES/Δt   CPU_exp/CPU   Δt/Δt_CFL
32x32      14         2.43          159
64x64      11.8       5.8           322
128x128    11.2       13.3          667
256x256    11.4       28.5          1429

CPU ∼ O(N): OPTIMAL SCALING!

Δt convergence study (128x128):

Δt     GMRES/Δt   CPU_exp/CPU   Δt/Δt_CFL
0.5    8.0        8.0           380
0.75   9.5        10.0          570
1.0    11.2       12.7          760
1.5    14.6       14.6          1140

CPU ∼ O(Δt^0.6): FAVORABLE SCALING!


PBP: 3D serial performance (island coalescence)

10 time steps, Δt = 0.1, V(3,3) cycles, MG tol = 1e-2

Grid   GMRES/Δt   CPU
16³    5.5        81
32³    7.9        1176
64³    7.0        11135


PBP: 3D parallel performance (island coalescence)

(Weak scaling, 16³ points per processor, Cray XT4; Δt = 0.1 ≫ Δt_CFL)

Key to parallel performance:

Matrix-light multigrid, where only diagonals are stored; residuals are calculated matrix-free.
Operator coarsening via rediscretization: avoids forming/communicating a matrix.

Current limitations: we do not feature a coarse solve beyond the processor skeleton grid. This eventually degrades algorithmic scalability (it only shows at the > 1000-processor level).


Implicit extended MHD solver


Extended (two-fluid, Hall) MHD model equations
$$\partial_t \rho + \nabla\cdot(\rho\, v) = 0$$
$$\partial_t B + \nabla\times E = 0$$
$$\partial_t(\rho\, v) + \nabla\cdot\left[\rho\, v v - B B + I\left(p + \frac{B^2}{2}\right)\right] = 0$$
$$\partial_t T_e + v_e\cdot\nabla T_e + (\gamma - 1)\, T_e\, \nabla\cdot v_e = (\gamma - 1)\,\frac{Q - \nabla\cdot q}{(1 + \cdots)}$$

with $p = p_i + p_e$, $p_e = \rho\, T_e$, and $v_e = v - d_i\, j/\rho$.

Ohm's law (electron EOM): $E = -v\times B + \eta\, j + \frac{d_i}{\rho}\left(j\times B - \nabla p_e - \nabla\cdot\Pi_e\right)$

Ohm's law (ion EOM): $E = -v\times B + \eta\, j + d_i\left[\partial_t v + v\cdot\nabla v + \frac{1}{\rho}\left(\nabla p_i + \nabla\cdot\Pi_i\right)\right]$

Note that EOMi - EOMe = EOM. Admits an energy principle.

This model supports fast dispersive waves: $\omega \propto k^2$.

Extended MHD Jacobian block structure: electron EOM (standard choice)

$$E = -v\times B + \eta\, j + \frac{d_i}{\rho}\left(j\times B - \nabla p_e - \nabla\cdot\Pi_e\right)$$

The linearized induction equation $\partial_t B = -\nabla\times E$ now has the following couplings: $B = L_B(B, v, \rho, T)$.

Jacobian coupling structure:

$$J\,\delta x = \begin{pmatrix} D_\rho & 0 & 0 & U_{v\rho} \\ 0 & D_T & U_{BT} & U_{vT} \\ L_{\rho B} & L_{TB} & D_B & U_{vB} \\ L_{\rho v} & L_{Tv} & L_{Bv} & D_v \end{pmatrix}\begin{pmatrix} \rho \\ T \\ B \\ v \end{pmatrix}$$

We have added off-diagonal couplings to block $M$. The stiffest block is $D_B$. This breaks the approximations in the block-factorization approach. UNSUITABLE!


Extended MHD Jacobian block structure: ion EOM

$$E = -v\times B + \eta\, j + d_i\left[\partial_t v + v\cdot\nabla v + \frac{1}{\rho}\left(\nabla p_i + \nabla\cdot\Pi_i[v]\right)\right] \qquad \text{(ion EOM)}$$

Hall coupling is mainly via $\partial_t v$.

Jacobian coupling structure becomes:

$$J\,\delta x = \begin{pmatrix} D_\rho & 0 & 0 & U_{v\rho} \\ 0 & D_T & 0 & U_{vT} \\ 0 & 0 & D_B & U^R_{vB} + U^H_{vB} \\ L_{\rho v} & L_{Tv} & L_{Bv} & D_v \end{pmatrix}\begin{pmatrix} \rho \\ T \\ B \\ v \end{pmatrix}$$

We can therefore reuse ALL resistive MHD PC framework!


Extended MHD preconditioner

Use the same block factorization approach. The $M$ block contains ion time scales only, so $M^{-1} \approx \Delta t\, I$ is a very good approximation.

Additional block $U^H_{vB}$:

$$P_{SI}\,\delta v = \frac{\rho^n\,\delta v}{\Delta t} + \nabla\cdot\left(\rho\, v_0\,\delta v + \rho\,\delta v\, v_0 + \cdots\right) + \Delta t\, W(B_0, p_0)\,\delta v$$

$$W(B_0, p_0) = B_0\times\nabla\times\nabla\times\left[I\times B_0 - \frac{d_i}{\Delta t}\,\nabla\times I\right] - j_0\times\nabla\times\left[I\times B_0\right] - \nabla\left[I\cdot\nabla p_0 + \gamma\, p_0\,\nabla\cdot I\right]$$

The additional term brings in dispersive waves: $\omega \propto k^2$!

We can show analytically that the additional (Hall) term is amenable to simple damped Jacobi smoothing! We can use classical MG!


On the issue of dissipation in extended MHD

Dispersive waves ($\omega \propto k^2$) require higher-order dissipation.

Resistivity is unable to provide a dissipation scale (resistive damping $\sim \eta\, k^2$ only keeps pace with $\omega \propto k^2$). The dissipation scale is defined by the electron viscosity $\mu_e$, which provides fourth-order dissipation ($\sim \mu_e\, k^4$).

The viscosity coefficient can be determined to provide adequate dissipation of dispersive waves ($\omega \sim v_A\, d_i\, k^2$ vs. $\mu_e\, k^4$):

$$\mu_e > C\,\frac{d_i\, v_A\, k_{\parallel,\max}}{k_{\max}^{3}}$$

In the preconditioner, we deal with them by considering two second-order systems, and solving them coupled within MG.


Extended MHD performance results (2D tearing mode)

d_i = 0.05, μ_e = 2.5×10⁻⁶; 100 time steps, Δt = 1.0, 1 V(4,4) MG cycle

Grid      GMRES/Δt   CPU_exp/CPU   Δt/Δt_exp   Δt/Δt_CFL
32x32     22.3       0.74          135         110
64x64     15.4       10.9          1582        384
128x128   10.6       214           23809       1436
256x256   13.1       3097          370370      5660


2D nonlinear verification: GEM challenge (ion Hall vs. electron Hall)

[Figure: time traces of magnetic, kinetic, and thermal energy over t = 0-45, comparing the ion-Hall and electron-Hall formulations.]


Incompressible Navier-Stokes solver

Cyr, Shadid, Tuminaro, JCP 2011
Elman, Howle, Shadid, Shuttleworth, Tuminaro, JCP 2008


Block preconditioning: CFD example

Consider the discretized Navier-Stokes equations.

Fully coupled Jacobian
Block factorization preconditioner
Coupling in the Schur complement
Required operators: multigrid; PCD, LSC, SIMPLEC

Properties of block factorization:
1. Important coupling in the Schur complement
2. Better targets for AMG, leveraging scalability

Properties of approximate Schur complement:
1. Nearly replicates physical coupling
2. Invertible operators, good for AMG

Brief overview of block preconditioning methods for Navier-Stokes (a taxonomy based on approximate block factorizations, JCP 2008):

Discrete N-S → exact LDU factorization → approximate LDU

Precond. type: Pressure projection. References: Chorin (1967); Temam (1969); Perot (1993); Quarteroni et al. (2000), as solvers.
Precond. type: SIMPLEC (1st-term Neumann series). References: Patankar et al. (1980), as solvers; Pernice and Tocci (2001), as smoothers/MG.
Precond. type: Pressure convection/diffusion. References: Kay, Loghin, Wathen, Silvester, Elman (1999-2006); Elman, Howle, Shadid, Shuttleworth, Tuminaro (2003, 2008).

Now use AMG-type methods on the sub-problems: the momentum block is a transient convection-diffusion operator, and the pressure (Schur complement) block is Poisson-type.
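To make the sub-problem structure concrete, here is a minimal sketch (small dense stand-in blocks, not the Trilinos/ML/Teko implementation behind these results) of a block upper-triangular preconditioner for the saddle-point system $[[F, B^T],[B, C]]$, with the Schur complement replaced by the cheap diagonal approximation $C - B\,\mathrm{diag}(F)^{-1}B^T$ (a SIMPLE-type choice; PCD and LSC are the commuting alternatives named above). All names and sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
nu, npp = 40, 15                                        # velocity / pressure dof counts (illustrative)
F = np.diag(3.0 + rng.random(nu)) + 0.3 * rng.standard_normal((nu, nu))   # convection-diffusion-like
B = rng.standard_normal((npp, nu))                      # discrete divergence
C = 0.1 * np.eye(npp)                                   # stabilization block (nonzero for stabilized FE)
J = np.block([[F, B.T], [B, C]])

S_exact = C - B @ np.linalg.solve(F, B.T)               # exact pressure Schur complement
S_hat = C - B @ (B.T / np.diag(F)[:, None])             # SIMPLE-type: F^{-1} -> diag(F)^{-1}

def apply_block_precond(r, S):
    # Block upper-triangular preconditioner solve: [[F, B^T], [0, S]] z = r
    ru, rp = r[:nu], r[nu:]
    zp = np.linalg.solve(S, rp)                         # pressure block (Poisson-type -> AMG in practice)
    zu = np.linalg.solve(F, ru - B.T @ zp)              # momentum block (convection-diffusion -> AMG)
    return np.concatenate([zu, zp])

# Eigenvalues of J * P^{-1}: all 1 with the exact Schur complement,
# approximately clustered near 1 with the diagonal approximation
for name, S in [("exact Schur   ", S_exact), ("diag(F) Schur ", S_hat)]:
    Pinv = np.column_stack([apply_block_precond(e, S) for e in np.eye(nu + npp)])
    lam = np.abs(np.linalg.eigvals(J @ Pinv))
    print(name, "|lambda| in [%.2f, %.2f]" % (lam.min(), lam.max()))
```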

CFD weak scaling: steady backward-facing step

Fully coupled algebraic:
  AggC: aggressive-coarsening multigrid
  DD: additive Schwarz domain decomposition
Block preconditioners:
  PCD & LSC: commuting Schur complement
  SIMPLEC: physics-based Schur complement

Take home: block preconditioners are competitive with fully coupled multigrid for CFD.
* Paper accepted: E. C. Cyr, J. N. Shadid, R. S. Tuminaro, Stabilization and Scalable Block Preconditioning for the Navier-Stokes Equations, Accepted by J. Comp. Phys., 2011.

Weak scaling of the NK solver with fully-coupled AMG and approximate block factorization preconditioners

Transient Kelvin-Helmholtz instability (Re = 5×10³ shear layer, constant CFL = 2.5)
Quad-core Nehalems with InfiniBand (SNL Red Sky)

Incompressible MHD solver


Incompressible MHD: 2D vector potential formulation

Magnetohydrodynamics (MHD) equations couple fluid flow to Maxwell's equations:
$$\frac{\partial u}{\partial t} + u\cdot\nabla u - \nu\,\nabla^2 u + \nabla p - \frac{1}{\mu_0}\,\nabla\cdot\left(B B - \frac{B^2}{2}\, I\right) = f$$
$$\nabla\cdot u = 0$$
$$\frac{\partial A_z}{\partial t} + u\cdot\nabla A_z - \frac{\eta}{\mu_0}\,\nabla^2 A_z = E_z$$

where $B = \nabla\times A$, $A = (0, 0, A_z)$.

Discretized using a stabilized finite element formulation

Block LU factorization:

$$\begin{pmatrix} F & B^T & Z \\ B & C & 0 \\ Y & 0 & D \end{pmatrix} = \begin{pmatrix} I & 0 & 0 \\ B F^{-1} & I & 0 \\ Y F^{-1} & -Y F^{-1} B^T S^{-1} & I \end{pmatrix}\begin{pmatrix} F & B^T & Z \\ 0 & S & -B F^{-1} Z \\ 0 & 0 & P \end{pmatrix}$$

where $S = C - B F^{-1} B^T$ and $P = D - Y F^{-1}\left(I + B^T S^{-1} B F^{-1}\right) Z$.

$C$ is zero for mixed-interpolation FE and staggered FV methods, nonzero for stabilized FE.
The indefinite system is hard to solve with incomplete factorizations without pivoting.
Block factorization of the 3x3 system leads to nested Schur complements.
Use an operator-splitting approximation to factor the system.

Reduces to two 2x2 systems for the Navier-Stokes and magnetics-velocity blocks; $C$ need not be nonzero or invertible ($C^{-1}$ does not need to exist!).
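A minimal numerical check (small dense stand-in blocks with illustrative sizes; the real blocks are finite element matrices) that the nested-Schur block LU written above is exact:

```python
import numpy as np

rng = np.random.default_rng(3)
nu, npp, na = 10, 4, 6                       # velocity, pressure, A_z dof counts (illustrative)
F = np.diag(3.0 + rng.random(nu)) + 0.3 * rng.standard_normal((nu, nu))
B = rng.standard_normal((npp, nu))           # divergence
Z = rng.standard_normal((nu, na))            # magnetics -> momentum coupling
Y = rng.standard_normal((na, nu))            # velocity -> magnetics coupling
C = 0.1 * np.eye(npp)                        # stabilization (nonzero for stabilized FE)
D = np.diag(3.0 + rng.random(na)) + 0.3 * rng.standard_normal((na, na))

Finv = np.linalg.inv(F)
S = C - B @ Finv @ B.T                                                       # pressure Schur complement
P = D - Y @ Finv @ (np.eye(nu) + B.T @ np.linalg.solve(S, B @ Finv)) @ Z     # nested (magnetics) Schur

A = np.block([[F, B.T, Z], [B, C, np.zeros((npp, na))], [Y, np.zeros((na, npp)), D]])
L = np.block([[np.eye(nu), np.zeros((nu, npp)), np.zeros((nu, na))],
              [B @ Finv, np.eye(npp), np.zeros((npp, na))],
              [Y @ Finv, -Y @ Finv @ B.T @ np.linalg.inv(S), np.eye(na)]])
U = np.block([[F, B.T, Z],
              [np.zeros((npp, nu)), S, -B @ Finv @ Z],
              [np.zeros((na, nu)), np.zeros((na, npp)), P]])

print("||A - L U|| =", np.linalg.norm(A - L @ U))   # ~ machine precision
```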

Transient hydro-magnetic Kelvin-Helmholtz problem (Re = 700, S = 700)

Fully coupled algebraic:
  AggC: aggressive-coarsening multigrid
  DD: additive Schwarz domain decomposition
Block preconditioners:
  Split: new operator-split preconditioner
  SIMPLEC: extreme diagonal approximations

Take home: AggC and Split preconditioners scale algorithmically.
1. SIMPLE preconditioner performance suffers with increased CFL.
2. Run times are for unoptimized code.
3. AggC is not applicable to mixed discretizations; block factorization is.

Driven magnetic reconnection: magnetic island coalescence. Half-domain symmetry on [0,1]x[-1,1] with S = 10⁴.

Fully coupled algebraic:
  AggC: aggressive-coarsening multigrid
  DD: additive Schwarz domain decomposition
Block preconditioners:
  Split: new operator-split preconditioner
  SIMPLEC: extreme diagonal approximations

Take home: the Split preconditioner scales algorithmically.

Initial weak-scaling performance of the AMG V-cycle on leadership-class machines: Cray XE6 and BG/P weak scaling (transport-reaction: drift-diffusion simulations)

Sub-domain smoothers: impact of data locality of the smoother?
BG/P: ILU(2); overlap = 1
BG/P: ILU(0); overlap = 0 [better scaling and faster time to solution than ILU(2), overlap = 1]
Cray XE6: ILU(2); overlap = 1
> 2200x
Steady-state drift-diffusion BJT; TFQMR time per iteration; Cray XE6 2.4 GHz 8-core Magny-Cours. (Paul Lin)

Summary and Conclusions

Stiff hyperbolic PDEs describe many applications of interest to DOE.
In applications where fast time scales are parasitic, an implicit treatment is possible to bridge the time-scale disparity.
A fully implicit solution may only realize its efficiency potential if a suitable scalable algorithmic route is available.
Here, we have identified stiff-wave block-preconditioning (aka physics-based preconditioning) in the context of JFNK methods as a suitable algorithmic pathway.

An important property is that it renders the numerical system suitable for multilevel preconditioning.

We have demonstrated the effectiveness of the approach in incompressible Navier-Stokes, incompressible MHD, and compressible resistive MHD and extended MHD.

In all these applications, the approach is robust and scalable, both algorithmically and in parallel.

