

Scalable implicit algorithms for stiff hyperbolic PDE systems

L. Chacón
Oak Ridge National Laboratory

J. N. Shadid, E. Cyr, P. T. Lin, R. Tuminaro, R. Pawlowski


Sandia National Laboratories

DOE Applied Mathematics Program Meeting October 17-19, 2011 Washington, DC

Luis Chacón, chaconl@ornl.gov

Outline

Motivation: the tyranny of scales
Block-factorization preconditioning of hyperbolic PDEs
  Compressible resistive MHD
  Compressible extended MHD
  Incompressible Navier-Stokes and MHD (infinite sound-speed limit)


The tyranny of scales (2006 NSF SBES report)

Figure 1: Time scales in fusion plasmas (FSP report)


Algorithmic challenges in temporal scale-bridging

PDE systems of interest typically have mixed character, with hyperbolic and parabolic components.

Hyperbolic stiffness (linear and dispersive waves): $\Delta t_{\rm fast} \sim \Delta x / v_{\rm fast} \ll \Delta t$
Parabolic stiffness (diffusion): $\Delta t_D \sim \Delta x^2 / D \ll \Delta t$

In both cases the time step of interest satisfies $\Delta t \gg \Delta t_{\rm CFL}$.

In some applications, fast hyperbolic modes carry a lot of energy (e.g., shocks, fast advection of solution structures), and the modeler must follow them. In others, however, fast time scales are parasitic, and carry very little energy.


These are the ones that are usually targeted for scale-bridging.

Bridging the time-scale disparity requires a combination of approaches:
Analytical elimination (e.g., reduced models).
Well-posed numerical discretization (e.g., asymptotic-preserving methods).
Some level of implicitness in the temporal formulation (for stability; accuracy requires care).

Key algorithmic requirement: SCALABILITY, i.e., CPU $\sim O(N/n_p)$.


Algorithmic scalability vs. parallel scalability

"The tyranny of scales will not be simply defeated by building bigger and faster computers" (NSF SBES 2006 report, p. 30)

Optimal algorithm: CPU $\propto N/n_p$

In general: CPU $\propto N^{1+\alpha}/n_p^{1-\beta}$, with
$\alpha \to 0$: algorithmic scalability
$\beta \to 0$: parallel scalability

Much emphasis has been placed on parallel scalability ($\beta$). However, parallel (weak) scalability is limited by the lack of algorithmic scalability:

Weak scaling ($N \propto n_p$ $\Rightarrow$ CPU $\propto n_p^{\alpha+\beta}$) requires $\alpha = \beta = 0$!

Explicit: $\alpha = 1/d$
Implicit (direct): $\alpha = 2 - 2/d$
Implicit (Krylov iterative): $\alpha > 0$ (varies)
Implicit (multilevel): $\alpha \approx 0$

How do multilevel (multigrid) methods work?

MG employs a divide-and-conquer approach to attack error components in the solution.


Oscillatory components of the error are EASY to deal with (if a SMOOTHER exists).
Smooth components are DIFFICULT.

Idea: coarsen grid to make "smooth" components appear oscillatory, and proceed recursively

The SMOOTHER is the make-or-break ingredient of MG! Smoothers are hard to find for hyperbolic systems, but fairly easy for parabolic ones.
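To make the divide-and-conquer idea concrete, here is a minimal Python sketch (not from the talk; grid size, smoothing sweeps, and cycle parameters are illustrative) of a geometric multigrid V-cycle for the 1D Poisson problem $-u'' = f$ with homogeneous Dirichlet boundary conditions, using damped Jacobi as the smoother:

```python
import numpy as np

def residual(u, f, h):
    # r = f + u'' at interior points (boundary values of u are held at 0)
    r = np.zeros_like(u)
    r[1:-1] = f[1:-1] - (2.0 * u[1:-1] - u[:-2] - u[2:]) / h**2
    return r

def smooth(u, f, h, nsweeps, omega=2.0 / 3.0):
    # Damped Jacobi: cheaply damps the OSCILLATORY error components
    for _ in range(nsweeps):
        unew = u.copy()
        unew[1:-1] = (1 - omega) * u[1:-1] + omega * 0.5 * (u[:-2] + u[2:] + h**2 * f[1:-1])
        u = unew
    return u

def v_cycle(u, f, h, nu1=2, nu2=2):
    n = u.size - 1
    if n == 2:                                # coarsest grid: single interior unknown, solve directly
        u[1] = 0.5 * h**2 * f[1]
        return u
    u = smooth(u, f, h, nu1)                  # pre-smoothing
    r = residual(u, f, h)
    rc = np.zeros(n // 2 + 1)                 # full-weighting restriction of the residual
    rc[1:-1] = 0.25 * r[1:-2:2] + 0.5 * r[2:-1:2] + 0.25 * r[3::2]
    ec = v_cycle(np.zeros(n // 2 + 1), rc, 2 * h, nu1, nu2)   # recursive coarse-grid correction
    e = np.zeros_like(u)                      # prolongation by linear interpolation
    e[::2] = ec
    e[1::2] = 0.5 * (ec[:-1] + ec[1:])
    u = smooth(u + e, f, h, nu2)              # post-smoothing
    return u

# Usage: the residual drops by roughly an order of magnitude per V-cycle, independent of n
n, h = 128, 1.0 / 128
x = np.linspace(0.0, 1.0, n + 1)
f = np.sin(np.pi * x)
u = np.zeros(n + 1)
for k in range(8):
    u = v_cycle(u, f, h)
    print(k, np.max(np.abs(residual(u, f, h))))
```

The grid-independent convergence rate per V-cycle is what yields the optimal $O(N)$ cost referred to above.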

Can one make hyperbolic PDEs MG-friendly?


Implicit discretization of hyperbolic PDEs: a case study

$$\partial_t u = \frac{1}{\epsilon}\,\partial_x v, \qquad \partial_t v = \frac{1}{\epsilon}\,\partial_x u$$

$\epsilon$ is a measure of hyperbolic stiffness. Discretize implicitly in time:

$$u^{n+1} = u^n + \frac{\Delta t}{\epsilon}\,\partial_x v^{n+1}, \qquad v^{n+1} = v^n + \frac{\Delta t}{\epsilon}\,\partial_x u^{n+1}$$

Very ill conditioned as $\epsilon \to 0$!

However, if one combines equations:

$$\left[1 - \frac{\Delta t^2}{\epsilon^2}\,\partial^2_{xx}\right] u^{n+1} = u^n + \frac{\Delta t}{\epsilon}\,\partial_x v^n$$

Equation is now well-posed when $\epsilon \to 0$ (i.e., it is asymptotic-preserving)!


Limit system is elliptic/parabolic (MG-friendly!).
Temporally unresolved hyperbolic time scales have been parabolized.

No further manipulation of the PDE than implicit differencing (no terms added to the PDE)! This fact can be exploited to devise optimal solution algorithms (block factorization)!
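The MG-friendliness claim can be checked numerically. The following minimal sketch (illustrative periodic discretization and parameters, not from the talk) applies one damped-Jacobi sweep to a high-frequency error mode for both operators: the parabolized operator $I - (\Delta t/\epsilon)^2\,\partial_{xx}$ damps it, while the coupled stiff system amplifies it, i.e., no simple smoother exists for the latter:

```python
import numpy as np

n, h = 64, 1.0 / 64
dt, eps = 1.0, 1.0e-3                  # dt/eps >> h: strongly stiff regime
c = dt / eps

# Periodic centered first derivative and compact three-point Laplacian
D = (np.eye(n, k=1) - np.eye(n, k=-1)) / (2 * h)
D[0, -1], D[-1, 0] = -1 / (2 * h), 1 / (2 * h)
Lap = (np.eye(n, k=1) + np.eye(n, k=-1) - 2 * np.eye(n)) / h**2
Lap[0, -1] = Lap[-1, 0] = 1 / h**2

A = np.block([[np.eye(n), -c * D], [-c * D, np.eye(n)]])   # coupled implicit system
S = np.eye(n) - c**2 * Lap                                  # Schur-reduced (parabolized) operator

def jacobi_error_amplification(M, e, omega=2.0 / 3.0):
    # One damped-Jacobi sweep acting on the error: e_new = (I - omega * Dinv * M) e
    e_new = e - omega * (M @ e) / np.diag(M)
    return np.linalg.norm(e_new) / np.linalg.norm(e)

x = np.arange(n) * h
hf = np.sin(np.pi * x / (2 * h))       # high-frequency error mode (wavelength 4h)
print("Schur operator :", jacobi_error_amplification(S, hf))                        # ~0.33: smoothed
print("coupled system :", jacobi_error_amplification(A, np.r_[hf, np.zeros(n)]))    # >> 1: amplified
```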


Block-factorization of hyperbolic PDEs

$$u^{n+1} = u^n + \frac{\Delta t}{\epsilon}\,\partial_x v^{n+1}, \qquad v^{n+1} = v^n + \frac{\Delta t}{\epsilon}\,\partial_x u^{n+1}$$

Coupling structure:

$$\begin{pmatrix} I & -\frac{\Delta t}{\epsilon}\,\partial_x \\ -\frac{\Delta t}{\epsilon}\,\partial_x & I \end{pmatrix}\begin{pmatrix} u^{n+1} \\ v^{n+1} \end{pmatrix} = \begin{pmatrix} u^n \\ v^n \end{pmatrix}$$

The 2x2 block matrix can be formally inverted via block factorization:

$$\begin{pmatrix} D_1 & U \\ L & D_2 \end{pmatrix}^{-1} = \begin{pmatrix} I & 0 \\ -D_2^{-1} L & I \end{pmatrix}\begin{pmatrix} \left(D_1 - U D_2^{-1} L\right)^{-1} & 0 \\ 0 & D_2^{-1} \end{pmatrix}\begin{pmatrix} I & -U D_2^{-1} \\ 0 & I \end{pmatrix}$$

Only the inverse of $D_1 - U D_2^{-1} L$ (the Schur complement) is required!

$$D_1 - U D_2^{-1} L = I - \frac{\Delta t^2}{\epsilon^2}\,\partial^2_{xx}$$
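A minimal numerical check (periodic centered differences, illustrative parameters, not from the talk) that the block factorization solves the coupled system using only $D_2^{-1}$ and the Schur complement inverse:

```python
import numpy as np

n, h = 64, 1.0 / 64
dt, eps = 0.1, 1.0e-2
c = dt / eps

Dx = (np.eye(n, k=1) - np.eye(n, k=-1)) / (2 * h)
Dx[0, -1], Dx[-1, 0] = -1 / (2 * h), 1 / (2 * h)     # periodic wrap-around

I = np.eye(n)
D1, D2 = I, I
U = -c * Dx                      # upper off-diagonal block
L = -c * Dx                      # lower off-diagonal block
A = np.block([[D1, U], [L, D2]])

rng = np.random.default_rng(0)
b = rng.standard_normal(2 * n)
bu, bv = b[:n], b[n:]

# Schur-complement solve: S = D1 - U D2^{-1} L = I - c^2 Dx^2
S = D1 - U @ np.linalg.solve(D2, L)
u = np.linalg.solve(S, bu - U @ np.linalg.solve(D2, bv))   # solve reduced (parabolized) system
v = np.linalg.solve(D2, bv - L @ u)                        # back-substitution

x_direct = np.linalg.solve(A, b)
print("max error vs. direct solve:", np.max(np.abs(np.concatenate([u, v]) - x_direct)))
```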


Nonlinear hyperbolic PDEs: JFNK and block factorization preconditioning

Objective: solve the nonlinear system $G(x^{n+1}) = 0$ efficiently (scalably).

Converge nonlinear couplings using the Newton-Raphson method:

$$\left.\frac{\partial G}{\partial x}\right|_k \delta x_k = -G(x_k)$$

Jacobian-free implementation:

$$\left.\frac{\partial G}{\partial x}\right|_k y = J_k\, y = \lim_{\epsilon \to 0}\frac{G(x_k + \epsilon\, y) - G(x_k)}{\epsilon}$$

Krylov method of choice: GMRES (nonsymmetric systems). Right preconditioning: solve the equivalent Jacobian system for $y = P_k\,\delta x$:

$$\left(J_k P_k^{-1}\right)\left(P_k\,\delta x\right) = -G_k$$
Approximations in the preconditioner do not affect the accuracy of the converged solution; only efficiency!

Block-factorization+MG will be our preconditioning strategy.
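Below is a minimal JFNK sketch in Python (an illustrative stand-in problem and preconditioner, not the authors' solver): a matrix-free Jacobian-vector product built from finite differences of $G$, wrapped in right-preconditioned GMRES via SciPy. The preconditioner here is a direct solve with a Laplacian, standing in for the block-factorization+MG preconditioner described next.

```python
import numpy as np
from scipy.sparse import diags
from scipy.sparse.linalg import LinearOperator, gmres, splu

n = 200
A = diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n), format="csc") * n**2   # 1D Laplacian
b = np.ones(n)
P = splu(A)                          # preconditioner: P^{-1} ~ A^{-1} (multigrid in practice)

def G(x):                            # stand-in nonlinear residual, G(x) = 0
    return A @ x + x**3 - b

x = np.zeros(n)
for it in range(20):
    Gx = G(x)
    print(f"Newton it {it}: ||G|| = {np.linalg.norm(Gx):.3e}")
    if np.linalg.norm(Gx) < 1e-9:
        break
    def Jv(y):                       # Jacobian-free product: J y ~ [G(x + eps*y) - G(x)] / eps
        eps = 1e-7 * (1.0 + np.linalg.norm(x)) / max(np.linalg.norm(y), 1e-30)
        return (G(x + eps * y) - Gx) / eps
    # Right preconditioning: solve (J P^{-1}) (P dx) = -G for y = P dx, then dx = P^{-1} y
    JPinv = LinearOperator((n, n), matvec=lambda y: Jv(P.solve(y)))
    y, info = gmres(JPinv, -Gx)
    x = x + P.solve(y)
```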


Implicit resistive MHD solver

L. Chacón, Phys. Plasmas (2008)


Resistive MHD model equations

$$\partial_t \rho + \nabla\cdot(\rho\, v) = 0$$
$$\partial_t B + \nabla\times E = 0$$
$$\partial_t(\rho\, v) + \nabla\cdot\left[\rho\, v v - B B + I\left(p + \frac{B^2}{2}\right)\right] = 0$$
$$\partial_t T + v\cdot\nabla T + (\gamma - 1)\, T\, \nabla\cdot v = 0$$

Plasma is assumed polytropic: $p \sim n^{\gamma}$. Resistive Ohm's law: $E = -v\times B + \eta\, j$.


Resistive MHD Jacobian block structure

The linearized resistive MHD model has the following couplings:

$$\rho = L_\rho(\rho, v), \qquad T = L_T(T, v), \qquad B = L_B(B, v), \qquad v = L_v(v, B, \rho, T)$$

Therefore, the Jacobian of the resistive MHD model has the following coupling structure:

$$J\,\delta x = \begin{pmatrix} D_\rho & 0 & 0 & U_{v\rho} \\ 0 & D_T & 0 & U_{vT} \\ 0 & 0 & D_B & U_{vB} \\ L_{\rho v} & L_{Tv} & L_{Bv} & D_v \end{pmatrix}\begin{pmatrix} \rho \\ T \\ B \\ v \end{pmatrix}$$

Diagonal blocks contain advection-diffusion contributions, and are easy to invert using MG techniques. Off-diagonal blocks $U$ and $L$ contain all hyperbolic couplings.


Block factorization of resistive MHD

We consider the block structure:

$$J\,\delta x = \begin{pmatrix} M & U \\ L & D_v \end{pmatrix}\begin{pmatrix} y \\ v \end{pmatrix}; \qquad y = \begin{pmatrix} \rho \\ T \\ B \end{pmatrix}; \qquad M = \begin{pmatrix} D_\rho & 0 & 0 \\ 0 & D_T & 0 \\ 0 & 0 & D_B \end{pmatrix}$$

$M$ is easy to invert (advection-diffusion, not very stiff, MG-friendly).

Schur complement analysis of the 2x2 block system yields:

$$\begin{pmatrix} M & U \\ L & D_v \end{pmatrix}^{-1} = \begin{pmatrix} I & -M^{-1} U \\ 0 & I \end{pmatrix}\begin{pmatrix} M^{-1} & 0 \\ 0 & P_{\rm Schur}^{-1} \end{pmatrix}\begin{pmatrix} I & 0 \\ -L\, M^{-1} & I \end{pmatrix}$$

$$P_{\rm Schur} = D_v - L\, M^{-1}\, U$$
The EXACT Jacobian inverse only requires $M^{-1}$ and $P_{\rm Schur}^{-1}$.


Physics-based preconditioner (PBP)

3-step EXACT inversion algorithm (a schematic Python sketch is given below):

Predictor:       $\delta y^{*} = M^{-1}\, G_y$
Velocity update: $\delta v = P_{\rm Schur}^{-1}\left[G_v - L\,\delta y^{*}\right]$, with $P_{\rm Schur} = D_v - L\, M^{-1} U$
Corrector:       $\delta y = \delta y^{*} - M^{-1} U\,\delta v$

MG treatment of $P_{\rm Schur}$ is impractical due to $M^{-1}$.

We consider here the small-flow limit, $v \ll v_A$: $M^{-1} \approx \Delta t\, I$ (cheap).

We have extended the formulation to arbitrary flows, based on commutation ideas [1,2] (more expensive, but more robust).

[1] Elman et al., SISC 27, 1651 (2006)
[2] L. Chacón, J. Physics: Conf. Series 125, 012041 (2008)
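A minimal operator-level sketch of the 3-step inversion (small dense stand-in blocks with illustrative sizes; in the actual solver the exact solves with $M$ and $P_{\rm Schur}$ are replaced by one multigrid V-cycle each):

```python
import numpy as np

rng = np.random.default_rng(1)
ny, nv = 12, 4                                              # sizes of the (rho,T,B) and v blocks (illustrative)
M = np.eye(ny) + 0.05 * rng.standard_normal((ny, ny))       # advection-diffusion block (well conditioned)
Dv = np.eye(nv) + 0.05 * rng.standard_normal((nv, nv))
U = rng.standard_normal((ny, nv))                           # hyperbolic couplings (upper block)
L = rng.standard_normal((nv, ny))                           # hyperbolic couplings (lower block)

def apply_preconditioner(Gy, Gv):
    # Step 1 (predictor):       y* = M^{-1} Gy
    y_star = np.linalg.solve(M, Gy)
    # Step 2 (velocity update): dv = P_Schur^{-1} [Gv - L y*],  P_Schur = Dv - L M^{-1} U
    P_schur = Dv - L @ np.linalg.solve(M, U)
    dv = np.linalg.solve(P_schur, Gv - L @ y_star)
    # Step 3 (corrector):       dy = y* - M^{-1} U dv
    dy = y_star - np.linalg.solve(M, U @ dv)
    return dy, dv

# Check: with exact block solves, the 3 steps reproduce the full Jacobian inverse
Gy, Gv = rng.standard_normal(ny), rng.standard_normal(nv)
dy, dv = apply_preconditioner(Gy, Gv)
J = np.block([[M, U], [L, Dv]])
exact = np.linalg.solve(J, np.concatenate([Gy, Gv]))
print("max error:", np.max(np.abs(np.concatenate([dy, dv]) - exact)))
```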



PBP: Small-flow limit

Small-flow approximation: $M^{-1} \approx \Delta t\, I$ in steps 2 & 3 of the Schur algorithm:

$\delta y^{*} = M^{-1}\, G_y$
$\delta v = P_{SI}^{-1}\left[G_v - L\,\delta y^{*}\right]$, with $P_{SI} = D_v - \Delta t\, L\, U$
$\delta y = \delta y^{*} - \Delta t\, U\,\delta v$

where:

$$P_{SI} = \frac{\rho^n\, I}{\Delta t} + \nabla\cdot\left(\rho^n\, v_0\, I + I\, v_0\,\rho^n\right) + \Delta t\, W(B_0, p_0)$$

$$W(B_0, p_0) = B_0\times\nabla\times\nabla\times\left[I\times B_0\right] - j_0\times\nabla\times\left[I\times B_0\right] - \nabla\left[I\cdot\nabla p_0 + \gamma\, p_0\,\nabla\cdot I\right]$$

The operator $W(B_0, p_0)$ is the ideal MHD energy operator, which has real eigenvalues!

$P_{SI}$ is parabolic, and hence block diagonally dominant by construction!

We employ multigrid methods (MG) to approximately invert $P_{SI}$ and $M$: 1 V(4,4) cycle.


PBP: 2D serial performance (tearing mode)

Grid convergence study (Δt = 1.0 τ_A):

N          GMRES/Δt   CPU_exp/CPU   Δt/Δt_CFL
32x32      14         2.43          159
64x64      11.8       5.8           322
128x128    11.2       13.3          667
256x256    11.4       28.5          1429

CPU ∼ O(N): OPTIMAL SCALING!

Δt convergence study (128x128):

Δt     GMRES/Δt   CPU_exp/CPU   Δt/Δt_CFL
0.5    8.0        8.0           380
0.75   9.5        10.0          570
1.0    11.2       12.7          760
1.5    14.6       14.6          1140

CPU ∼ O(Δt^0.6): FAVORABLE SCALING!


PBP: 3D serial performance (island coalescence)

10 time steps, Δt = 0.1, V(3,3) cycles, MG tol = 1e-2

Grid   GMRES/Δt   CPU
16³    5.5        81
32³    7.9        1176
64³    7.0        11135


PBP: 3D parallel performance (island coalescence)

(Weak scaling, 16³ points per processor, Cray XT4; Δt = 0.1 ≫ Δt_CFL)

Key to parallel performance:

Matrix-light multigrid, where only diagonals are stored; residuals are calculated matrix-free.
Operator coarsening via rediscretization: avoids forming/communicating a matrix.

Current limitations: we do not feature a coarse solve beyond the processor skeleton grid. This eventually degrades algorithmic scalability (it only shows at the > 1000-processor level).


Implicit extended MHD solver


Extended (two-fluid, Hall) MHD model equations
$$\partial_t \rho + \nabla\cdot(\rho\, v) = 0$$
$$\partial_t B + \nabla\times E = 0$$
$$\partial_t(\rho\, v) + \nabla\cdot\left[\rho\, v v - B B + I\left(p + \frac{B^2}{2}\right)\right] = 0$$
$$\partial_t T_e + v_e\cdot\nabla T_e + (\gamma - 1)\, T_e\, \nabla\cdot v_e = (\gamma - 1)\,\frac{Q - \nabla\cdot q}{(1 + \cdots)}$$

with $p = p_i + p_e$, $p_e = \rho\, T_e$, and $v_e = v - d_i\, j/\rho$.

Ohm's law (electron EOM): $E = -v\times B + \eta\, j + \frac{d_i}{\rho}\left(j\times B - \nabla p_e - \nabla\cdot\Pi_e\right)$

Ohm's law (ion EOM): $E = -v\times B + \eta\, j + d_i\left[\partial_t v + v\cdot\nabla v + \frac{1}{\rho}\left(\nabla p_i + \nabla\cdot\Pi_i\right)\right]$

Note that EOMi - EOMe = EOM. Admits an energy principle.

This model supports fast dispersive waves: $\omega \propto k^2$.

Extended MHD Jacobian block structure: electron EOM (standard choice)

$$E = -v\times B + \eta\, j + \frac{d_i}{\rho}\left(j\times B - \nabla p_e - \nabla\cdot\Pi_e\right)$$

The linearized induction equation $\partial_t B = -\nabla\times E$ now has the following couplings: $B = L_B(B, v, \rho, T)$.

Jacobian coupling structure:

$$J\,\delta x = \begin{pmatrix} D_\rho & 0 & 0 & U_{v\rho} \\ 0 & D_T & U_{BT} & U_{vT} \\ L_{\rho B} & L_{TB} & D_B & U_{vB} \\ L_{\rho v} & L_{Tv} & L_{Bv} & D_v \end{pmatrix}\begin{pmatrix} \rho \\ T \\ B \\ v \end{pmatrix}$$

We have added off-diagonal couplings to block $M$. The stiffest block is $D_B$. This breaks the approximations in the block-factorization approach. UNSUITABLE!


Extended MHD Jacobian block structure: ion EOM

$$E = -v\times B + \eta\, j + d_i\left[\partial_t v + v\cdot\nabla v + \frac{1}{\rho}\left(\nabla p_i + \nabla\cdot\Pi_i[v]\right)\right] \qquad \text{(ion EOM)}$$

Hall coupling is mainly via $\partial_t v$.

Jacobian coupling structure becomes:

$$J\,\delta x = \begin{pmatrix} D_\rho & 0 & 0 & U_{v\rho} \\ 0 & D_T & 0 & U_{vT} \\ 0 & 0 & D_B & U^R_{vB} + U^H_{vB} \\ L_{\rho v} & L_{Tv} & L_{Bv} & D_v \end{pmatrix}\begin{pmatrix} \rho \\ T \\ B \\ v \end{pmatrix}$$

We can therefore reuse ALL resistive MHD PC framework!


Extended MHD preconditioner

Use the same block factorization approach. The $M$ block contains ion time scales only, so $M^{-1} \approx \Delta t\, I$ is a very good approximation.

Additional block $U^H_{vB}$:

$$P_{SI}\,\delta v = \frac{\rho^n\,\delta v}{\Delta t} + \nabla\cdot\left(\rho\, v_0\,\delta v + \rho\,\delta v\, v_0 + \cdots\right) + \Delta t\, W(B_0, p_0)\,\delta v$$

$$W(B_0, p_0) = B_0\times\nabla\times\nabla\times\left[I\times B_0 - \frac{d_i}{\Delta t}\,\nabla\times I\right] - j_0\times\nabla\times\left[I\times B_0\right] - \nabla\left[I\cdot\nabla p_0 + \gamma\, p_0\,\nabla\cdot I\right]$$

The additional term brings in dispersive waves: $\omega \propto k^2$!

We can show analytically that the additional (Hall) term is amenable to simple damped Jacobi smoothing! We can use classical MG!


On the issue of dissipation in extended MHD

Dispersive waves ($\omega \propto k^2$) require higher-order dissipation.

Resistivity is unable to provide a dissipation scale (resistive damping $\sim \eta\, k^2$ only keeps pace with $\omega \propto k^2$). The dissipation scale is defined by the electron viscosity $\mu_e$, which provides fourth-order dissipation ($\sim \mu_e\, k^4$).

The viscosity coefficient can be determined to provide adequate dissipation of dispersive waves ($\omega \sim v_A\, d_i\, k^2$ vs. $\mu_e\, k^4$):

$$\mu_e > C\,\frac{d_i\, v_A\, k_{\parallel,\max}}{k_{\max}^{3}}$$

In the preconditioner, we deal with them by considering two second-order systems, and solving them coupled within MG.


Extended MHD performance results (2D tearing mode)

d_i = 0.05, μ_e = 2.5×10⁻⁶; 100 time steps, Δt = 1.0, 1 V(4,4) MG cycle

Grid      GMRES/Δt   CPU_exp/CPU   Δt/Δt_exp   Δt/Δt_CFL
32x32     22.3       0.74          135         110
64x64     15.4       10.9          1582        384
128x128   10.6       214           23809       1436
256x256   13.1       3097          370370      5660


2D nonlinear verification: GEM challenge (ion Hall vs. electron Hall)

[Figure: time traces of magnetic, kinetic, and thermal energy over t = 0-45, comparing the ion-Hall and electron-Hall formulations.]


Incompressible Navier-Stokes solver

Cyr, Shadid, Tuminaro, JCP 2011
Elman, Howle, Shadid, Shuttleworth, Tuminaro, JCP 2008


Block preconditioning: CFD example

Consider the discretized Navier-Stokes equations.

Fully coupled Jacobian
Block factorization preconditioner
Coupling in the Schur complement
Required operators: multigrid; PCD, LSC, SIMPLEC

Properties of block factorization:
1. Important coupling in the Schur complement
2. Better targets for AMG, leveraging scalability

Properties of approximate Schur complement:
1. Nearly replicates physical coupling
2. Invertible operators, good for AMG

Brief overview of block preconditioning methods for Navier-Stokes (a taxonomy based on approximate block factorizations, JCP 2008):

Discrete N-S → exact LDU factorization → approximate LDU

Precond. type: Pressure projection. References: Chorin (1967); Temam (1969); Perot (1993); Quarteroni et al. (2000), as solvers.
Precond. type: SIMPLEC (1st-term Neumann series). References: Patankar et al. (1980), as solvers; Pernice and Tocci (2001), as smoothers/MG.
Precond. type: Pressure convection/diffusion. References: Kay, Loghin, Wathen, Silvester, Elman (1999-2006); Elman, Howle, Shadid, Shuttleworth, Tuminaro (2003, 2008).

Now use AMG-type methods on the sub-problems: the momentum block is a transient convection-diffusion operator, and the pressure (Schur complement) block is Poisson-type.
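To make the sub-problem structure concrete, here is a minimal sketch (small dense stand-in blocks, not the Trilinos/ML/Teko implementation behind these results) of a block upper-triangular preconditioner for the saddle-point system $[[F, B^T],[B, C]]$, with the Schur complement replaced by the cheap diagonal approximation $C - B\,\mathrm{diag}(F)^{-1}B^T$ (a SIMPLE-type choice; PCD and LSC are the commuting alternatives named above). All names and sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
nu, npp = 40, 15                                        # velocity / pressure dof counts (illustrative)
F = np.diag(3.0 + rng.random(nu)) + 0.3 * rng.standard_normal((nu, nu))   # convection-diffusion-like
B = rng.standard_normal((npp, nu))                      # discrete divergence
C = 0.1 * np.eye(npp)                                   # stabilization block (nonzero for stabilized FE)
J = np.block([[F, B.T], [B, C]])

S_exact = C - B @ np.linalg.solve(F, B.T)               # exact pressure Schur complement
S_hat = C - B @ (B.T / np.diag(F)[:, None])             # SIMPLE-type: F^{-1} -> diag(F)^{-1}

def apply_block_precond(r, S):
    # Block upper-triangular preconditioner solve: [[F, B^T], [0, S]] z = r
    ru, rp = r[:nu], r[nu:]
    zp = np.linalg.solve(S, rp)                         # pressure block (Poisson-type -> AMG in practice)
    zu = np.linalg.solve(F, ru - B.T @ zp)              # momentum block (convection-diffusion -> AMG)
    return np.concatenate([zu, zp])

# Eigenvalues of J * P^{-1}: all 1 with the exact Schur complement,
# approximately clustered near 1 with the diagonal approximation
for name, S in [("exact Schur   ", S_exact), ("diag(F) Schur ", S_hat)]:
    Pinv = np.column_stack([apply_block_precond(e, S) for e in np.eye(nu + npp)])
    lam = np.abs(np.linalg.eigvals(J @ Pinv))
    print(name, "|lambda| in [%.2f, %.2f]" % (lam.min(), lam.max()))
```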

CFD weak scaling: steady backward-facing step

Fully coupled algebraic:
  AggC: aggressive-coarsening multigrid
  DD: additive Schwarz domain decomposition
Block preconditioners:
  PCD & LSC: commuting Schur complement
  SIMPLEC: physics-based Schur complement

Take home: block preconditioners are competitive with fully coupled multigrid for CFD.
* Paper accepted: E. C. Cyr, J. N. Shadid, R. S. Tuminaro, Stabilization and Scalable Block Preconditioning for the Navier-Stokes Equations, Accepted by J. Comp. Phys., 2011.

Weak scaling of the NK solver with fully-coupled AMG and approximate block factorization preconditioners

Transient Kelvin-Helmholtz instability (Re = 5×10³ shear layer, constant CFL = 2.5)
Quad-core Nehalems with InfiniBand (SNL Red Sky)

Incompressible MHD solver


Incompressible MHD: 2D vector potential formulation

Magnetohydrodynamics (MHD) equations couple fluid flow to Maxwell's equations:
$$\frac{\partial u}{\partial t} + u\cdot\nabla u - \nu\,\nabla^2 u + \nabla p - \frac{1}{\mu_0}\,\nabla\cdot\left(B B - \frac{B^2}{2}\, I\right) = f$$
$$\nabla\cdot u = 0$$
$$\frac{\partial A_z}{\partial t} + u\cdot\nabla A_z - \frac{\eta}{\mu_0}\,\nabla^2 A_z = E_z$$

where $B = \nabla\times A$, $A = (0, 0, A_z)$.

Discretized using a stabilized finite element formulation

Block LU factorization:

$$\begin{pmatrix} F & B^T & Z \\ B & C & 0 \\ Y & 0 & D \end{pmatrix} = \begin{pmatrix} I & 0 & 0 \\ B F^{-1} & I & 0 \\ Y F^{-1} & -Y F^{-1} B^T S^{-1} & I \end{pmatrix}\begin{pmatrix} F & B^T & Z \\ 0 & S & -B F^{-1} Z \\ 0 & 0 & P \end{pmatrix}$$

where $S = C - B F^{-1} B^T$ and $P = D - Y F^{-1}\left(I + B^T S^{-1} B F^{-1}\right) Z$.

$C$ is zero for mixed-interpolation FE and staggered FV methods, nonzero for stabilized FE.
The indefinite system is hard to solve with incomplete factorizations without pivoting.
Block factorization of the 3x3 system leads to nested Schur complements.
Use an operator-splitting approximation to factor the system.

Reduces to two 2x2 systems for the Navier-Stokes and magnetics-velocity blocks; $C$ need not be nonzero or invertible ($C^{-1}$ does not need to exist!).
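A minimal numerical check (small dense stand-in blocks with illustrative sizes; the real blocks are finite element matrices) that the nested-Schur block LU written above is exact:

```python
import numpy as np

rng = np.random.default_rng(3)
nu, npp, na = 10, 4, 6                       # velocity, pressure, A_z dof counts (illustrative)
F = np.diag(3.0 + rng.random(nu)) + 0.3 * rng.standard_normal((nu, nu))
B = rng.standard_normal((npp, nu))           # divergence
Z = rng.standard_normal((nu, na))            # magnetics -> momentum coupling
Y = rng.standard_normal((na, nu))            # velocity -> magnetics coupling
C = 0.1 * np.eye(npp)                        # stabilization (nonzero for stabilized FE)
D = np.diag(3.0 + rng.random(na)) + 0.3 * rng.standard_normal((na, na))

Finv = np.linalg.inv(F)
S = C - B @ Finv @ B.T                                                       # pressure Schur complement
P = D - Y @ Finv @ (np.eye(nu) + B.T @ np.linalg.solve(S, B @ Finv)) @ Z     # nested (magnetics) Schur

A = np.block([[F, B.T, Z], [B, C, np.zeros((npp, na))], [Y, np.zeros((na, npp)), D]])
L = np.block([[np.eye(nu), np.zeros((nu, npp)), np.zeros((nu, na))],
              [B @ Finv, np.eye(npp), np.zeros((npp, na))],
              [Y @ Finv, -Y @ Finv @ B.T @ np.linalg.inv(S), np.eye(na)]])
U = np.block([[F, B.T, Z],
              [np.zeros((npp, nu)), S, -B @ Finv @ Z],
              [np.zeros((na, nu)), np.zeros((na, npp)), P]])

print("||A - L U|| =", np.linalg.norm(A - L @ U))   # ~ machine precision
```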

Transient hydro-magnetic Kelvin-Helmholtz problem (Re = 700, S = 700)

Fully coupled algebraic:
  AggC: aggressive-coarsening multigrid
  DD: additive Schwarz domain decomposition
Block preconditioners:
  Split: new operator-split preconditioner
  SIMPLEC: extreme diagonal approximations

Take home: AggC and Split preconditioners scale algorithmically.
1. SIMPLE preconditioner performance suffers with increased CFL.
2. Run times are for unoptimized code.
3. AggC is not applicable to mixed discretizations; block factorization is.

Driven magnetic reconnection: magnetic island coalescence. Half-domain symmetry on [0,1]x[-1,1] with S = 10⁴.

Fully coupled algebraic:
  AggC: aggressive-coarsening multigrid
  DD: additive Schwarz domain decomposition
Block preconditioners:
  Split: new operator-split preconditioner
  SIMPLEC: extreme diagonal approximations

Take home: the Split preconditioner scales algorithmically.

Initial weak-scaling performance of the AMG V-cycle on leadership-class machines: Cray XE6 and BG/P weak scaling (transport-reaction: drift-diffusion simulations)

Sub-domain smoothers: impact of data locality of the smoother?
BG/P: ILU(2); overlap = 1
BG/P: ILU(0); overlap = 0 [better scaling and faster time to solution than ILU(2), overlap = 1]
Cray XE6: ILU(2); overlap = 1
> 2200x
Steady-state drift-diffusion BJT; TFQMR time per iteration; Cray XE6 2.4 GHz 8-core Magny-Cours. (Paul Lin)

Summary and Conclusions

Stiff hyperbolic PDEs describe many applications of interest to DOE.
In applications where fast time scales are parasitic, an implicit treatment is possible to bridge the time-scale disparity.
A fully implicit solution may only realize its efficiency potential if a suitable scalable algorithmic route is available.
Here, we have identified stiff-wave block-preconditioning (aka physics-based preconditioning) in the context of JFNK methods as a suitable algorithmic pathway.

An important property is that it renders the numerical system suitable for multilevel preconditioning.

We have demonstrated the effectiveness of the approach in incompressible Navier-Stokes, incompressible MHD, and compressible resistive MHD and extended MHD.

In all these applications, the approach is robust and scalable, both algorithmically and in parallel.

