Scalable Implicit Algorithms for Stiff Hyperbolic PDE Systems
L. Chacón
Oak Ridge National Laboratory
Outline
Motivation: the tyranny of scales
Block-factorization preconditioning of hyperbolic PDEs
Compressible resistive MHD
Compressible extended MHD
Incompressible Navier-Stokes and MHD (infinite sound-speed limit)
PDE systems of interest typically have mixed character, with hyperbolic and parabolic components.
The fast time scales are vastly separated from the dynamical time scales of interest: hyperbolic waves set τ_fast, parabolic terms set Δt_D ~ Δx², and the time steps of interest satisfy Δt ≫ Δt_CFL.
In some applications, fast hyperbolic modes carry a lot of energy (e.g., shocks, fast advection of solution structures), and the modeler must follow them. In others, however, fast time scales are parasitic, and carry very little energy.
These are the ones that are usually targeted for scale-bridging.
Bridging the time-scale disparity requires a combination of approaches:
Analytical elimination (e.g., reduced models).
Well-posed numerical discretization (e.g., asymptotic-preserving methods).
Some level of implicitness in the temporal formulation (for stability; accuracy requires care).
Weak-scaling cost model: CPU ~ O(N^(1+α)/n_p^(1-β)), with N the number of unknowns and n_p the number of processors.

"The tyranny of scales will not be simply defeated by building bigger and faster computers" (NSF SBES 2006 report, p. 30)

Optimal algorithm: CPU ~ N/n_p, which requires α = β = 0!

Much emphasis has been placed on parallel scalability (β). However, parallel (weak) scalability is limited by the lack of algorithmic scalability (α):

Method                  α
Explicit                1/d
Implicit (direct)       2 - 2/d (> 1, varies)
Implicit (multilevel)   ~ 0
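To make the table concrete, a small sketch under the stated assumptions (the cost model CPU ~ N^(1+α), the α values as reconstructed in the table above, d = 3, and illustrative grid sizes):

    # Cost growth under one 2x refinement per dimension, CPU ~ N^(1+alpha).
    # The alpha values follow the table above; d = 3 and the grids are illustrative.
    d = 3
    N_coarse = 64**d                 # unknowns on a 64^3 grid
    N_fine = 128**d                  # unknowns after refining by 2x per dimension

    for method, alpha in [("explicit", 1.0 / d),
                          ("implicit (direct)", 2.0 - 2.0 / d),
                          ("implicit (multilevel)", 0.0)]:
        growth = (N_fine / N_coarse) ** (1.0 + alpha)
        print(f"{method:22s} alpha = {alpha:.2f}  cost grows ~{growth:.0f}x")

    # Only alpha = 0 keeps the cost growth equal to the 8x growth in problem size.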
Oscillatory components of the error are EASY to deal with (if a SMOOTHER exists); smooth components are DIFFICULT.
Idea: coarsen grid to make "smooth" components appear oscillatory, and proceed recursively
The SMOOTHER is the make-or-break of MG! Smoothers are hard to find for hyperbolic systems, but fairly easy for parabolic ones:
Model stiff hyperbolic system (1D):
∂_t u = (1/ε) ∂_x v,
∂_t v = (1/ε) ∂_x u.

Backward-Euler discretization:
u^{n+1} = u^n + (Δt/ε) ∂_x v^{n+1},
v^{n+1} = v^n + (Δt/ε) ∂_x u^{n+1}.

Very ill conditioned as ε → 0!

Eliminating v^{n+1} yields a parabolic equation:
[I - (Δt/ε)² ∂_xx] u^{n+1} = u^n + (Δt/ε) ∂_x v^n,
which remains well behaved as ε → 0 (i.e., it is asymptotic-preserving)!
Temporally unresolved (stiff) waves require no further manipulation of the PDE than implicit differencing (no terms are added to the PDE)! This fact can be exploited to devise optimal solution algorithms (block factorization)!
The coupled system
u^{n+1} = u^n + (Δt/ε) ∂_x v^{n+1},
v^{n+1} = v^n + (Δt/ε) ∂_x u^{n+1}
has the coupling structure:
[ I              -(Δt/ε) ∂_x ] [u^{n+1}]   [u^n]
[ -(Δt/ε) ∂_x    I           ] [v^{n+1}] = [v^n]

Block LU factorization (here D1 = D2 = I and U = L = -(Δt/ε) ∂_x):
[ D1  U  ]   [ I   U D2^{-1} ] [ D1 - U D2^{-1} L   0  ] [ I          0 ]
[ L   D2 ] = [ 0   I         ] [ 0                  D2 ] [ D2^{-1} L  I ]

Only the inverses of D2 and of the Schur complement D1 - U D2^{-1} L are required, and
D1 - U D2^{-1} L = I - (Δt/ε)² ∂_xx
is a parabolic operator.
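A minimal numerical sketch of this block elimination (assuming a 1D periodic grid, centered differences, and an illustrative value of Δt/ε; the variable names are mine, not the talk's): solving the coupled system through the Schur complement requires only a diffusion-like solve, and the Schur operator is symmetric positive definite while the coupled operator has strongly imaginary (wave-like) eigenvalues.

    import numpy as np

    # 1D periodic grid and centered first-derivative operator.
    n, h = 64, 1.0 / 64
    delta = 50.0                        # delta = dt/eps: large for a stiff, unresolved wave

    D = np.zeros((n, n))                # periodic centered d/dx
    for i in range(n):
        D[i, (i + 1) % n] = 1.0 / (2 * h)
        D[i, (i - 1) % n] = -1.0 / (2 * h)

    I = np.eye(n)
    U = -delta * D                      # upper-right block
    Lo = -delta * D                     # lower-left block
    J = np.block([[I, U], [Lo, I]])     # coupled implicit system

    rhs = np.concatenate([np.sin(2 * np.pi * h * np.arange(n)), np.zeros(n)])
    x_direct = np.linalg.solve(J, rhs)

    # Block elimination: only solves with D2 = I and the Schur complement are needed.
    S = I - U @ Lo                      # = I - (dt/eps)^2 d_xx : diffusion-like
    gu, gv = rhs[:n], rhs[n:]
    u = np.linalg.solve(S, gu - U @ gv)     # parabolic (Schur) solve
    v = gv - Lo @ u                         # back-substitution with D2 = I
    print("block-elimination error:", np.linalg.norm(np.concatenate([u, v]) - x_direct))

    # Coupled operator: wave-like spectrum; Schur complement: SPD, diffusion-like.
    print("max |Im(eig(J))|:", np.abs(np.linalg.eigvals(J).imag).max())
    print("S symmetric:", np.allclose(S, S.T), " min eig(S):", np.linalg.eigvalsh(S).min())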
The discretized PDE system yields a nonlinear system G(x^{n+1}) = 0, to be solved efficiently (scalably).

Newton iteration:
(∂G/∂x)|_k δx_k = -G(x_k)

Jacobian-free implementation:
(∂G/∂x)|_k y = J_k y = lim_{ε→0} [G(x_k + ε y) - G(x_k)] / ε

Krylov method of choice: GMRES (nonsymmetric systems). Right preconditioning: solve the equivalent Jacobian system for δy = P_k δx:
J_k P_k^{-1} (P_k δx) = -G_k
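A minimal Jacobian-free Newton-Krylov sketch (a generic SciPy illustration, not the talk's solver; the residual G, perturbation size, and tolerances are arbitrary, and no preconditioner is applied here, whereas the talk right-preconditions GMRES with the physics-based P_k^{-1}):

    import numpy as np
    from scipy.sparse.linalg import LinearOperator, gmres

    def G(x):
        # Stand-in nonlinear residual for the implicit system G(x^{n+1}) = 0.
        return np.array([x[0]**2 + x[1] - 3.0,
                         x[0] + x[1]**2 - 5.0])

    def jfnk_solve(G, x0, newton_tol=1e-10, max_newton=20):
        x = x0.copy()
        for _ in range(max_newton):
            Gx = G(x)
            if np.linalg.norm(Gx) < newton_tol:
                break
            eps = 1e-7 * (1.0 + np.linalg.norm(x))    # finite-difference perturbation

            def jv(y):
                # Matrix-free Jacobian-vector product: J y ~ [G(x + eps*y) - G(x)] / eps
                return (G(x + eps * y) - Gx) / eps

            Jk = LinearOperator((x.size, x.size), matvec=jv, dtype=float)
            dx, info = gmres(Jk, -Gx)                 # GMRES on J dx = -G(x)
            x = x + dx
        return x

    print(jfnk_solve(G, np.array([1.0, 1.0])))        # converges to the root (1, 2)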
Implicit resistive MHD solver
∂_t ρ + ∇·(ρ v) = 0,
∂_t B + ∇×E = 0,
∂_t (ρ v) + ∇·[ρ v v - B B + I (p + B²/2)] = 0,
∂_t T + v·∇T + (γ - 1) T ∇·v = 0,
with p = ρ T and Ohm's law E = -v×B + η j.

Functionally, the system has the dependence structure:
∂_t ρ = L_ρ(ρ, v),
∂_t T = L_T(T, v),
∂_t B = L_B(B, v),
∂_t v = L_v(v, B, ρ, T).
Therefore, the Jacobian of the resistive MHD model has the following coupling structure:
J δx =
[ D_ρ    0      0      U_ρv ] [δρ]
[ 0      D_T    0      U_Tv ] [δT]
[ 0      0      D_B    U_Bv ] [δB]
[ L_vρ   L_vT   L_vB   D_v  ] [δv]
Diagonal blocks contain advection-diffusion contributions, and are easy to invert using MG techniques. Off-diagonal blocks L and U contain the stiff (hyperbolic) couplings.
Writing the Jacobian in 2x2 block form,
J δx = [ M   U  ] [δy]       δy = (δρ, δT, δB),   M = diag(D_ρ, D_T, D_B),
       [ L   D_v] [δv]
yields the block factorization of the inverse:
[ M   U  ]^{-1}   [ I   -M^{-1} U ] [ M^{-1}   0            ] [ I           0 ]
[ L   D_v]      = [ 0    I        ] [ 0        P_Schur^{-1} ] [ -L M^{-1}   I ]
with P_Schur = D_v - L M^{-1} U.

The EXACT Jacobian inverse only requires M^{-1} and P_Schur^{-1}.
The corresponding solution algorithm is:
δy* = M^{-1} G_y  (cheap: M is block diagonal, advection-diffusion),
δv = P_Schur^{-1} [G_v - L δy*],  with P_Schur = D_v - L M^{-1} U,
δy = δy* - M^{-1} U δv.

The exact P_Schur is impractical due to L M^{-1} U (M^{-1} is dense). For temporally unresolved flows (v ≪ v_A), M^{-1} ≈ Δt I is a good approximation (Chacón, J. Phys.: Conf. Series, 2006), which gives the stiff-wave preconditioner:
δy* = M^{-1} G_y,
δv = P_SI^{-1} [G_v - L δy*],  where P_SI = D_v - Δt L U,
δy = δy* - Δt U δv.
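Schematically, one application of the resulting preconditioner can be written as follows (a generic sketch; solve_M and solve_PSI stand in for the multigrid V-cycles on the advection-diffusion blocks and on the stiff-wave operator, and the identity operators in the toy usage are placeholders, not the talk's discrete blocks):

    import numpy as np

    def apply_block_preconditioner(G_y, G_v, solve_M, solve_PSI, L_op, U_op, dt):
        """Approximate J^{-1} applied to (G_y, G_v) via the Schur factorization.

        solve_M(b)   ~ M^{-1} b     (e.g., one MG V-cycle per advection-diffusion block)
        solve_PSI(b) ~ P_SI^{-1} b  with P_SI = D_v - dt * L U  (stiff-wave operator)
        L_op, U_op   : off-diagonal couplings (y -> v and v -> y)
        """
        dy_star = solve_M(G_y)                   # predictor
        dv = solve_PSI(G_v - L_op(dy_star))      # velocity (Schur) solve
        dy = dy_star - dt * U_op(dv)             # corrector, using M^{-1} ~ dt*I
        return dy, dv

    # Toy usage with identity blocks (illustration only).
    def ident(b):
        return b

    dy, dv = apply_block_preconditioner(np.ones(4), np.ones(4), ident, ident, ident, ident, dt=0.1)
    print(dy, dv)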
The stiff-wave operator has the structure
P_SI δv = ρ_0 (δv/Δt + v_0·∇ δv + ...) + Δt² W(B_0, p_0) δv,
where W(B_0, p_0) is the linearized ideal-MHD force operator, built from ∇×(δv×B_0), j_0, ∇p_0, and γ p_0 ∇·δv contributions. Both P_SI and M are approximately inverted with a single V(4,4) multigrid cycle.
PBP (physics-based preconditioning) performance: 2D, serial.

Grid           32x32   64x64   128x128   256x256
Δt             0.5     0.75    1.0       1.5
GMRES/Δt       …       …       …         …
CPU_exp/CPU    8.0     10.0    12.7      14.6
Δt/Δt_CFL      380     570     760       1140
PBP: 3D, serial: 10 time steps, Δt = 0.1, grid-refinement scan.
PBP: 3D, parallel: weak scaling, 16³ grid per processor, Δt = 0.1 ≫ Δt_CFL.
Matrix-light multigrid, where only diagonals are stored; residuals are calculated matrix-free. Operator coarsening via rediscretization: avoids forming/communicating a matrix (a minimal 1D sketch follows below).
Current limitations: we do not feature a coarse-solve beyond the processor skeleton grid. This eventually degrades algorithmic scalability (only shows at > 1000-processor level).
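A minimal 1D sketch of the matrix-light idea mentioned above (my own illustration, not the production solver): the operator is applied matrix-free from its stencil, the damped-Jacobi smoother uses only the (here constant) diagonal, and every coarse level simply rediscretizes the same operator with the doubled mesh spacing instead of forming and coarsening a matrix.

    import numpy as np

    def apply_A(u, h):
        # Matrix-free 1D operator -d2/dx2 with homogeneous Dirichlet BCs.
        up = np.concatenate(([0.0], u, [0.0]))
        return (2.0 * up[1:-1] - up[:-2] - up[2:]) / h**2

    def jacobi(u, f, h, nsweeps, omega=2.0 / 3.0):
        diag = 2.0 / h**2                          # only the diagonal is kept
        for _ in range(nsweeps):
            u = u + omega * (f - apply_A(u, h)) / diag
        return u

    def vcycle(u, f, h):
        # One V(2,2) cycle; every level uses the rediscretized, matrix-free operator.
        u = jacobi(u, f, h, 2)                     # pre-smooth
        if u.size <= 3:                            # coarsest level
            return jacobi(u, f, h, 20)
        r = f - apply_A(u, h)
        rc = 0.25 * (r[0:-2:2] + 2.0 * r[1:-1:2] + r[2::2])   # full-weighting restriction
        ec = vcycle(np.zeros_like(rc), rc, 2.0 * h)            # coarse level: rediscretize with 2h
        e = np.zeros_like(u)                       # linear interpolation of the correction
        e[1:-1:2] = ec
        e[0:-2:2] += 0.5 * ec
        e[2::2] += 0.5 * ec
        return jacobi(u + e, f, h, 2)              # post-smooth

    n = 2**7 - 1                                   # interior points on the fine grid
    h = 1.0 / (n + 1)
    x = np.arange(1, n + 1) * h
    f = np.sin(np.pi * x)
    u = np.zeros(n)
    for it in range(8):
        u = vcycle(u, f, h)
        print(it, np.linalg.norm(f - apply_A(u, h)))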
Implicit extended MHD solver
The extended MHD model keeps the resistive MHD continuity, induction, and momentum equations, and adds an electron temperature equation (with heat flux q and heating source Q) and a generalized Ohm's law. With p = p_i + p_e and electron velocity v_e = v - d_i j/ρ:

Ohm's law (electron EOM): E = -v×B + η j + (d_i/ρ)(j×B - ∇p_e)
Equivalently (ion EOM):   E = -v×B + η j + d_i [∂_t v + v·∇v + (1/ρ)(∇p_i + ∇·Π_i)]

The Hall term introduces dispersive whistler waves, ω ~ k².
With the electron-EOM form of Ohm's law, E = -v×B + η j + (d_i/ρ)(j×B - ∇p_e), the induction equation becomes ∂_t B = L_B(B, v, ρ, T_e): the Jacobian acquires additional off-diagonal couplings between the B and T_e blocks (L_TB, U_BT), and the stiff whistler physics ends up inside D_B, so M is no longer easy to invert with multigrid.
Instead, use the ion EOM to rewrite Ohm's law,
E = -v×B + η j + d_i [∂_t v + v·∇v + (1/ρ)(∇p_i + ∇·Π_i[v])],
which trades the explicit electron physics for ∂_t v terms. The Jacobian then recovers the resistive MHD coupling structure:

J δx =
[ D_ρ    0      0      U_ρv            ] [δρ]
[ 0      D_T    0      U_Tv            ] [δT]
[ 0      0      D_B    U_Bv^R + U_Bv^H ] [δB]
[ L_vρ   L_vT   L_vB   D_v             ] [δv]

Use the same block factorization approach. The M block now contains ion time scales only, so M^{-1} ≈ Δt I remains a very good approximation.
Additional block U_Bv^H: the velocity Schur complement picks up a Hall correction,
P_SI δv = ρ_0 (δv/Δt + v_0·∇ δv + ...) + Δt² W(B_0, p_0) δv,
where W(B_0, p_0) now contains an additional d_i/Δt (whistler) contribution alongside the B_0, j_0, and p_0 terms of the ideal-MHD force operator. The new contribution scales as k² (whistler dispersion). We can show analytically that this additional term is amenable to simple damped Jacobi smoothing!
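A small local-Fourier-analysis sketch in support of that claim (my simplification: a 1D model operator, identity plus a fourth-difference term standing in for the Δt² whistler contribution; ω = 0.6 and the coefficients are illustrative, and the solver's actual operator is the full 3D W(B_0, p_0)). The damped-Jacobi smoothing factor over the high-frequency range stays bounded well below 1 even when the fourth-order term dominates:

    import numpy as np

    # Model operator A = I + c * (fourth difference), with Fourier symbol
    #   A_hat(theta) = 1 + c * (2 sin(theta/2))^4 / h^4,
    # standing in for the dt^2 whistler (k^4-like) contribution in P_SI.
    def smoothing_factor(c, h, omega):
        theta = np.linspace(np.pi / 2, np.pi, 400)            # high-frequency modes
        A_hat = 1.0 + c * (2.0 * np.sin(theta / 2.0))**4 / h**4
        diag = 1.0 + 6.0 * c / h**4                            # diagonal of the [1,-4,6,-4,1] stencil
        return np.max(np.abs(1.0 - omega * A_hat / diag))      # damped-Jacobi error symbol

    h = 1.0 / 64
    for c in [0.0, 1e-8, 1e-6, 1e-4]:                          # increasing "whistler" stiffness
        print(f"c = {c:g}: smoothing factor = {smoothing_factor(c, h, omega=0.6):.3f}")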
Dispersive whistler waves, ω ~ d_i v_A k k_∥, stiffen as k². Resistivity (~ η k²) is unable to provide a dissipation scale relative to the whistler frequency; the dissipation scale must instead be defined by electron viscosity (hyper-resistivity) η_e, which damps as η_e k⁴. Balancing the hyper-resistive damping against the whistler frequency at the grid scale requires
η_e > C d_i v_A k_∥,max / k_max³.
Extended MHD performance (Δt = 1.0, 1 V(4,4) MG cycle), columns corresponding to increasingly refined grids:

CPU_exp/CPU    0.74    10.9    214      3097
Δt/Δt_exp      135     1582    23809    370370
Δt/Δt_CFL      110     384     1436     5660
Figures: magnetic energy and kinetic energy vs. time, comparing ion-Hall and electron-Hall runs.
Block Factorization Preconditioner: Coupling in the Schur Complement
Brief overview of block preconditioning methods for Navier-Stokes (a taxonomy based on approximate block factorizations, JCP 2008). Starting from the discrete N-S system, the exact LDU factorization is approximated in various ways:

Pressure projection: Chorin (1967); Temam (1969); Perot (1993); Quarteroni et al. (2000), as solvers.
SIMPLE-type: Patankar et al. (1980), as solvers; Pernice and Tocci (2001), as smoothers/MG.
1st-term Neumann series (approximate Schur complement): Kay, Loghin, Wathen, Silvester, Elman (1999-2006); Elman, Howle, S., Shuttleworth, Tuminaro (2003, 2008).
Methods compared:
Fully coupled algebraic: AggC (aggressive-coarsening multigrid); DD (additive Schwarz domain decomposition).
Block preconditioners: PCD & LSC (commuting Schur complement); SIMPLEC (physics-based Schur complement).
Take home: block preconditioners are competitive with fully coupled multigrid for CFD.
* Paper accepted: E. C. Cyr, J. N. Shadid, R. S. Tuminaro, Stabilization and Scalable Block Preconditioning for the Navier-Stokes Equations, Accepted by J. Comp. Phys., 2011.
Weak Scaling of NK Solver with Fully-Coupled AMG and Approximate Block Factorization Preconditioners
Transient Kelvin-Helmholtz instability (Re = 5 × 10³ shear layer, constant CFL = 2.5)
Incompressible MHD model (2D, vector-potential form):
ρ (∂_t u + u·∇u) + ∇p - ∇·(μ ∇u) - (1/μ_0) ∇·(B B - (B²/2) I) = f,
∇·u = 0,
∂_t A_z + u·∇A_z - (η/μ_0) ∇²A_z = E_z,
where B = ∇×A, A = (0, 0, A_z).
Block LU Factorization

Discrete system in (u, p, A_z):
    [ F   B^T   Z ]
K = [ B   C     0 ]
    [ Y   0     D ]

Exact block LU factorization: K = L·U with
L = [F  0  0;  B  S  0;  Y  -Y F^{-1} B^T  P],
U = [I  F^{-1}B^T  F^{-1}Z;  0  I  -S^{-1}B F^{-1}Z;  0  0  I],
where S = C - B F^{-1} B^T and P = D - Y F^{-1}(I + B^T S^{-1} B F^{-1}) Z.

C is zero for mixed-interpolation FE and staggered FV methods, nonzero for stabilized FE.
The indefinite system is hard to solve with incomplete factorizations without pivoting.
Block factorization of the 3x3 system leads to nested Schur complements.
Use an operator-splitting approximation to factor: this reduces to two 2x2 systems (Navier-Stokes and magnetics-velocity blocks), and C need not be nonzero or invertible (C^{-1} doesn't need to exist!).
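To illustrate the velocity-pressure piece of such a factorization, here is a generic sketch of a SIMPLEC-style approximate block preconditioner on a toy saddle-point system (my own construction with SciPy, not the Trilinos implementation; the approximate Schur complement replaces F^{-1} by diag(F)^{-1}, and the matrices are synthetic stand-ins):

    import numpy as np
    from scipy import sparse
    from scipy.sparse.linalg import LinearOperator, gmres, splu

    # Toy saddle-point system K = [[F, B^T], [B, C]], with C = 0 (mixed FE / staggered FV case).
    n, m = 200, 60
    rng = np.random.default_rng(0)
    F = sparse.diags([-1.0, 4.0, -1.0], [-1, 0, 1], shape=(n, n)).tocsc()  # stand-in for convection-diffusion
    B = sparse.csc_matrix(rng.standard_normal((m, n)) / np.sqrt(n))        # stand-in for the divergence operator
    C = sparse.csc_matrix((m, m))
    K = sparse.bmat([[F, B.T], [B, C]]).tocsc()

    # Approximate block factorization: exact F solves, Schur complement built with diag(F)^{-1}.
    F_lu = splu(F)
    Dinv = sparse.diags(1.0 / F.diagonal())
    S_hat = (C - B @ Dinv @ B.T).tocsc()
    S_lu = splu(S_hat)

    def apply_prec(r):
        f, g = r[:n], r[n:]
        u_star = F_lu.solve(f)                 # velocity predictor
        p = S_lu.solve(g - B @ u_star)         # approximate pressure Schur solve
        u = u_star - F_lu.solve(B.T @ p)       # velocity correction
        return np.concatenate([u, p])

    M = LinearOperator((n + m, n + m), matvec=apply_prec, dtype=float)
    b = rng.standard_normal(n + m)
    iters = {"plain": 0, "block": 0}

    def counter(key):
        def cb(res):
            iters[key] += 1
        return cb

    gmres(K, b, restart=n + m, maxiter=n + m, callback=counter("plain"), callback_type="pr_norm")
    gmres(K, b, M=M, restart=n + m, maxiter=n + m, callback=counter("block"), callback_type="pr_norm")
    print("GMRES iterations without preconditioner:", iters["plain"])
    print("GMRES iterations with block preconditioner:", iters["block"])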
Methods compared:
Fully coupled algebraic: AggC (aggressive-coarsening multigrid); DD (additive Schwarz domain decomposition).
Block preconditioners: Split (new operator-split preconditioner); SIMPLEC (extreme diagonal approximations).
Take home: AggC and the Split preconditioner scale algorithmically.
1. SIMPLE preconditioner performance suffers with increased CFL.
2. Run times are for unoptimized code.
3. AggC is not applicable to mixed discretizations; block factorization is.
Driven Magnetic Reconnection: magnetic island coalescence. Half-domain symmetry on [0,1]×[-1,1] with S = 10⁴.
Initial weak-scaling performance of the AMG V-cycle on leadership-class machines (Cray XE6 and BG/P); transport-reaction (drift-diffusion) simulations.
BG/P: ILU(0) smoother, overlap = 0 (better scaling and faster time to solution than ILU(2), overlap = 1).
Cray XE6: ILU(2), overlap = 1; steady-state drift-diffusion BJT; TFQMR time per iteration; 2.4 GHz 8-core Magny-Cours (> 2200x). (Paul Lin)
Stiff hyperbolic PDEs describe many applications of interest to DOE. In applications where fast time scales are parasitic, an implicit treatment makes it possible to bridge the time-scale disparity. A fully implicit solution may only realize its efficiency potential if a suitable scalable algorithmic route is available. Here, we have identified stiff-wave block preconditioning (aka physics-based preconditioning) in the context of JFNK methods as a suitable algorithmic pathway.
An important property of this approach is that it renders the numerical system suitable for multilevel preconditioning.
We have demonstrated the effectiveness of the approach in incompressible Navier-Stokes, incompressible MHD, and compressible resistive MHD and extended MHD.
In all these applications, the approach is robust and scalable, both algorithmically and in parallel.