19 - Hermite Integrator For High-Order Mesh-Free Schemes
Abstract
In most mesh-free methods, the calculation of interactions between sample points or
“particles” is the most time-consuming part. When we use mesh-free methods with high
spatial orders, the order of the time integration should also be high. If we use the usual
Runge–Kutta schemes, we need to perform the interaction calculation multiple times per
time step. One way to reduce the number of interaction calculations is to use Hermite
schemes, which use the time derivatives of the right-hand side of differential equations,
since Hermite schemes require a smaller number of interaction calculations than Runge–
Kutta schemes do to achieve the same order. In this paper, we construct a Hermite scheme
for a mesh-free method with high spatial orders. We performed several numerical tests
with fourth-order Hermite schemes and Runge–Kutta schemes. We found that, for both
Hermite and Runge–Kutta schemes, the overall error is determined by the error of spatial
derivatives, for time steps smaller than the stability limit. The calculation cost at the time-
step size of the stability limit is smaller for Hermite schemes. Therefore, we conclude
that Hermite schemes are more efficient than Runge–Kutta schemes and thus useful for
high-order mesh-free methods for Lagrangian hydrodynamics.
Key words: hydrodynamics — galaxies: formation — methods: numerical — planets and satellites: formation
© The Author(s) 2018. Published by Oxford University Press on behalf of the Astronomical Society of Japan. All rights reserved.
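As a rough illustration of the kind of integrator the abstract refers to, the following sketch applies the standard fourth-order Hermite predictor-corrector formulas in PEC form to a toy problem. The harmonic-oscillator right-hand side, the function names, and the step counts are assumptions chosen for this example and are not taken from the paper; the point is only that one evaluation of the right-hand side and its time derivative per step gives fourth-order convergence, whereas the classical fourth-order Runge–Kutta scheme needs four evaluations per step.

```python
import math


def accel_and_jerk(x, v):
    """Toy right-hand side (assumption): harmonic oscillator a = -x, so da/dt = -v."""
    return -x, -v


def hermite_pec_step(x, v, a, j, dt):
    """One fourth-order Hermite step in PEC form (one evaluation per step)."""
    # Predict: Taylor expansion using acceleration a and jerk j.
    xp = x + v * dt + a * dt**2 / 2 + j * dt**3 / 6
    vp = v + a * dt + j * dt**2 / 2
    # Evaluate: acceleration and jerk at the predicted state (the only evaluation).
    a1, j1 = accel_and_jerk(xp, vp)
    # Correct: two-point Hermite formulas using values at both ends of the step.
    v1 = v + (a + a1) * dt / 2 + (j - j1) * dt**2 / 12
    x1 = x + (v + v1) * dt / 2 + (a - a1) * dt**2 / 12
    return x1, v1, a1, j1


def phase_space_error(n_steps, t_end=1.0):
    """Integrate x(0)=1, v(0)=0 to t_end and compare with x=cos(t), v=-sin(t)."""
    dt = t_end / n_steps
    x, v = 1.0, 0.0
    a, j = accel_and_jerk(x, v)
    for _ in range(n_steps):
        x, v, a, j = hermite_pec_step(x, v, a, j, dt)
    return math.hypot(x - math.cos(t_end), v + math.sin(t_end))


err_coarse = phase_space_error(50)
err_fine = phase_space_error(100)
# Halving dt should reduce the error by about 2**4 for a fourth-order scheme.
observed_order = math.log2(err_coarse / err_fine)
print(f"observed order: {observed_order:.2f}")
```

In this toy setting the derivative of the right-hand side is known analytically; in a mesh-free hydrodynamics code it has to be built from spatial derivatives, which is the subject of the sections that follow.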
In this section, we present the derivation of the fourth-order Hermite schemes for CPHSF.

2.1 Hermite scheme
In this section we present the formulation of the fourth-order Hermite schemes (Makino 1991; Makino & Aarseth 1992). Consider a second-order differential equation,

$\frac{dx}{dt} = v$, (1)

In this section, we describe how we calculate high-order time-derivatives for hydrodynamics equations in the Lagrangian view. Our approach is essentially the same as that of Aoki (1997), who derived higher-order time-derivatives for the Eulerian view. Aoki (1997) considered the following equation:

$\frac{\partial f}{\partial t} = \xi_x f$, (10)

where $\xi_x$ is some linear operator. By taking time derivatives of both sides of equation (10), they derived a series of equations;
Publications of the Astronomical Society of Japan (2019), Vol. 71, No. 1 18-3
$\frac{\partial^2 f}{\partial t^2} = \xi_x \xi_x f$, (11)

$\frac{\partial^3 f}{\partial t^3} = \xi_x \xi_x \xi_x f$, (12)

and so on. In this paper, we consider the equation

$\frac{d f}{d t} = \xi_x f$, (13)

where γ is the ratio of specific heat, and P is given by

P = γ P. (27)

For the equation of state for weakly compressible fluid used in subsection 3.2,

$P = c_0^2 (\rho - \rho_{\rm air}) + P_{\rm air}$, (28)

where $\rho_{\rm air}$, $P_{\rm air}$, g, H, and $c_0$ are air density, air pressure, gravity, height of fluid, and sound velocity, respectively. We set
In this section, we compare the original number of arithmetic operations and the additional number of the operations necessary for the Hermite scheme. First, we show how to derive the spatial high-order derivatives of a physical quantity f. Secondly, the original number of arithmetic operations of CPHSF is derived. We call this value $N_{\rm op}$. Note that we assume that $N_{\rm op}$ comprises only the number of operations for the evaluation of the inverse matrix of $B_i$ in equation (35) and interaction calculation between particles. Thirdly, the additional number of arithmetic operations needed for jerk, snap, and crackle is derived. We call this value $N_{\rm add}$. Finally, we compare $N_{\rm op}$ and $N_{\rm add}$. To obtain the number of arithmetic operations, we calculate the number of floating-point operations per particle of CPHSF. If a quantity has been derived, we assume that it will not be unnecessarily recalculated. We assume that the numbers of floating-point operations required to evaluate division and square root are both 20.

First, we show how to derive the spatial high-order derivatives of f. In CPHSF, the mth spatial order derivative of f is given by the following equations:

$\delta_m f = \sum_\alpha [B_i^{-1}]_{m\alpha} \sum_j f_j\, p_{\alpha,ij} W_{ij}$, (35)

$\delta = \left(1, \nabla_x, \nabla_y, \nabla_z, \frac{1}{2}\nabla_x^2, \nabla_x\nabla_y, \ldots, \nabla_y\nabla_z^{n_p-1}, \nabla_z^{n_p}\right)^T$, (36)

$p_{ij} = \left(1, x_{ij}, y_{ij}, z_{ij}, x_{ij}^2, x_{ij}y_{ij}, \ldots, y_{ij}z_{ij}^{n_p-1}, z_{ij}^{n_p}\right)^T$, (37)

$B_i = \sum_j W_{ij}\, p_{ij} \otimes p_{ij}$, (38)

where i and j are indices of particles, m and α are integers, $n_p$ and $W_{ij}$ are the spatial order of the scheme and a kernel function, and $x_{ij}$, $y_{ij}$, and $z_{ij}$ are $x_j - x_i$, $y_j - y_i$, and $z_j - z_i$.

In CPHSF, the total number of floating-point operations per neighbor particle is given by

$N_{\rm op} = N_{\rm int} N_{\rm nb} + N_{\rm inv}$, (39)

where $N_{\rm nb}$ is the number of neighbor particles, and $N_{\rm int}$ and $N_{\rm inv}$ are the numbers of floating-point operations for the interaction calculation between particles and the evaluation of the inverse matrix of $B_i$ in equation (35). The number of floating-point operations for interaction calculation is given by

$N_{\rm int} = N_{\rm dist} + N_{\rm kernel} + N_{\rm sf}$, (40)

where $N_{\rm dist}$ and $N_{\rm kernel}$ are the number of floating-point operations necessary to evaluate the relative distance and the kernel function. The last term, $N_{\rm sf}$, represents the number of floating-point operations for the CPHSF fitting.

In CPHSF, first of all, we evaluate only $|x_{ij}|/h_i$, where $x_{ij}$ is the displacement of particles i and j and $h_i$ is the kernel length of particle i, to search neighbor particles of particle i, and $N_{\rm dist}$ is 22, 45, and 48 for one, two, and three dimensions. Then, we evaluate elements of $B_i$ given by equation (38), the polynomial equation given by equation (36), and the kernel function $W_{ij}$, to calculate equation (35). One interaction calculation between particle i and particle j in $[B_i]_{\alpha\beta}$ is given by $\{[p_{ij}]_\alpha [p_{ij}]_\beta W_{ij}\}$. The number of combinations of $[p_{ij}]_\alpha [p_{ij}]_\beta$ is $n(2n_p, D)$. The parameter $n(n_p, D)$ is the number of bases of a polynomial fitting in equation (35), where D is the number of dimensions, and is given by

$n(n_p, D) = \frac{1}{D!}\prod_{m=0}^{D-1}(n_p + m)$. (41)

For example, if we consider the one-dimensional case, $\{[p_{ij}]_\alpha [p_{ij}]_\beta W_{ij}\}$ is given by $x_{ij}^\alpha x_{ij}^\beta W_{ij}$ and thus $[B_i]_{\alpha_1\beta_1}$ is the same as $[B_i]_{\alpha_2\beta_2}$ with $(\alpha_1 + \beta_1) = (\alpha_2 + \beta_2)$. Therefore, the number of the terms of the form of $\{[p_{ij}]_\alpha [p_{ij}]_\beta W_{ij}\}$ is $n(2n_p, D)$. Since we assume that a quantity which has been derived will not be unnecessarily recalculated, the number of floating-point operations for the evaluation of $\{[p_{ij}]_\alpha [p_{ij}]_\beta W_{ij}\}$ except for $\{[p_{ij}]_0 [p_{ij}]_0 W_{ij}\}$ is 1. For example, if we consider the one-dimensional case, we can get $x_{ij}^m W_{ij}$ by multiplying $x_{ij}^{m-1} W_{ij}$ by $x_{ij}$ and thus the number of floating-point operations is only 1 for the evaluation of $x_{ij}^m W_{ij}$. In addition, the number of floating-point operations for summing each term $\{[p_{ij}]_\alpha [p_{ij}]_\beta W_{ij}\}$ with respect to j is 1. Therefore, the total number of floating-point operations for one interaction calculation in $B_i$ is $2n(2n_p, D) - 1$. One interaction calculation between particle i and particle j in the calculation of equation (35) is given by $W_{ij} f_j p_{ij}$. The number of the terms of the form of $W_{ij} f_j [p_{ij}]_\alpha$ is $n(n_p, D)$. We have density (pressure), energy, and velocity, and thus the number of physical quantities is (D + 2). Therefore, the total number of floating-point operations for one interaction calculation in mth derivatives of density (pressure), energy, and velocity given by equation (35) is $2(D + 2)n(n_p, D)$. Therefore, the number of floating-point operations for the CPHSF fitting is given by

$N_{\rm sf}(n_p, D) = 2n(2n_p, D) + 2(D + 2)n(n_p, D) - 1$. (42)

The number of floating-point operations necessary for the evaluation of the kernel function, $N_{\rm kernel}$, are 33, 35, and 36 for one, two, and three dimensions, respectively. From the above, the total numbers of floating-point operations for the calculation of equation (35) are $[33 + N_{\rm sf}(n_p, 1)]$, $[35 + N_{\rm sf}(n_p, 2)]$, and $[36 + N_{\rm sf}(n_p, 3)]$ for one, two, and three dimensions.

From the above, the total numbers of floating-point operations for one interaction calculation of CPHSF, $N_{\rm int}$, are

$N_{\rm int} \simeq [55 + N_{\rm sf}(n_p, 1)]$, (43)
$N_{\rm int} \simeq [80 + N_{\rm sf}(n_p, 2)]$, (44)

$N_{\rm int} \simeq [84 + N_{\rm sf}(n_p, 3)]$, (45)

for one, two, and three dimensions, respectively. Table 1 shows the summary of the numbers of floating-point operations for one interaction calculation of CPHSF.

Table 1. Numbers of floating-point operations for one interaction calculation of CPHSF.
            D = 1                  D = 2                  D = 3
Ndist       22                     45                     48
Nkernel     33                     35                     36
Nsf         Nsf(np, 1)             Nsf(np, 2)             Nsf(np, 3)
Total       [55 + Nsf(np, 1)]      [80 + Nsf(np, 2)]      [84 + Nsf(np, 3)]

The evaluation of the inverse matrix of $B_i$ also dominates in CPHSF and the number of floating-point operations of it, $N_{\rm inv}$, is $2n(n_p, D)^3/3$. Therefore, the numbers of floating-point operations per particle of CPHSF are

$N_{\rm op} \simeq N_{\rm nb}[55 + N_{\rm sf}(n_p, 1)] + \frac{2}{3}n(n_p, 1)^3$, (46)

$N_{\rm op} \simeq N_{\rm nb}[80 + N_{\rm sf}(n_p, 2)] + \frac{2}{3}n(n_p, 2)^3$, (47)

$N_{\rm op} \simeq N_{\rm nb}[84 + N_{\rm sf}(n_p, 3)] + \frac{2}{3}n(n_p, 3)^3$, (48)

for one, two and three dimensions, respectively.

In the following, we derive $N_{\rm add}$. To derive jerk, snap, and crackle in the Hermite schemes, we need to calculate the second spatial order derivatives of $f_i$ given by equation (35). Here, the values of $\sum_j f_j p_{\alpha,ij} W_{ij}$ and $[B_i^{-1}]_{m\alpha}$ have been calculated in the derivation of the spatial first-order derivative. Therefore, we must calculate only the multiplication of $[B_i^{-1}]_{m\alpha}$ by $\sum_j f_j p_{\alpha,ij} W_{ij}$, and the additional number of calculations for one physical quantity is given by ${}_D{\rm H}_2\, n(n_p, D)$. We have density (pressure), energy, and velocity, and thus the number of physical quantities in a numerical calculation is (D + 2). Therefore, the total additional number of calculations is $N_{\rm add} = (D + 2)\,{}_D{\rm H}_2\, n(n_p, D)$.

Figure 1 shows $N_{\rm op}$ and $N_{\rm add}$ with respect to $n_p$. We assume $N_{\rm nb}$ = 10, 75, and 600 for one, two, and three

Fig. 1. Values of $N_{\rm add}$ and $N_{\rm op}$ plotted against $n_p$. The dashed and solid lines show these values for $N_{\rm add}$ and $N_{\rm op}$. From left to right, the values of D are 1, 2, and 3.

3 Numerical experiments
In this section, we present the result of the Sod shock tube test in subsection 3.1 and that for the test of the surface gravity wave in subsection 3.2. We compare the results of fourth-order Hermite schemes and second- and fourth-order Runge–Kutta schemes in the Sod shock tube test, and the results of a fully implicit Hermite scheme, the implicit fourth-order Runge–Kutta scheme, and the backward-Euler scheme in the surface gravity wave test.

3.1 Sod shock tube
In this section, we present the result of the Sod shock tube test (Sod 1978). We assume that fluid is an ideal gas with γ = 1.4. The computational domain is −0.5 ≤ x < 0.5 with a periodic boundary, and the initial boundaries of the two fluids are at x = −0.5 and 0. In this test, we used equal-mass particles. The initial velocity is given by $v_x = 0$. The density is smoothed by a C5 polynomial, and is given by

$\rho(x) = \begin{cases} \rho_h & -0.25 \le x < -x_0, \\ \frac{\rho_h - \rho_l}{2}\sum_{m=0}^{5} b_m x^{2m+1} + \frac{\rho_h + \rho_l}{2} & -x_0 \le x < x_0, \\ \rho_l & x_0 \le x \le 0.25, \end{cases}$ (49)

where $(b_0, b_1, b_2, b_3, b_4, b_5)$ = (−693/256, 1155/256, −693/128, 495/128, −385/256, 63/256), and $\rho_h$ and $\rho_l$ are the values of initial density in the high- and low-density regions. We used $\rho_h$ = 1 and $\rho_l$ = 0.25. The parameter $x_0$ represents the width of the smoothing region, and we used two values of $x_0$. One is an initial condition with $x_0$ = 0.006, and the other is a smooth initial condition with $x_0$ = 0.03. We set the initial condition for 0.25 ≤ x < 0.5 to mirror that of 0 < x ≤ 0.25, and −0.5 ≤ x ≤ −0.25 as mirroring −0.25 ≤ x ≤ 0. The positions of particles in the smoothing region are determined so that position $x_i$ of particle i satisfies

$\int_{x_{i-1}}^{x_i} \rho(x)\,dx = \frac{1}{2N_h}$, (50)
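The smoothed density profile of equation (49) can be sketched as follows. One assumption is made loudly here: the polynomial is evaluated at the scaled coordinate s = x/x0, because the listed coefficients sum to −1, so the middle branch only joins ρh and ρl continuously at s = ∓1. The function and variable names are illustrative, and the sketch covers only −0.25 ≤ x ≤ 0.25, not the mirrored part of the periodic domain.

```python
from fractions import Fraction

# Coefficients (b0, ..., b5) as listed after equation (49).
B = [Fraction(-693, 256), Fraction(1155, 256), Fraction(-693, 128),
     Fraction(495, 128), Fraction(-385, 256), Fraction(63, 256)]


def smoothed_density(x, x0, rho_h=1.0, rho_l=0.25):
    """Density for -0.25 <= x <= 0.25 following the three branches of equation (49)."""
    if x < -x0:
        return rho_h
    if x >= x0:
        return rho_l
    # Assumption: the polynomial argument is the scaled coordinate s = x / x0.
    s = x / x0
    poly = sum(float(b) * s ** (2 * m + 1) for m, b in enumerate(B))
    return 0.5 * (rho_h - rho_l) * poly + 0.5 * (rho_h + rho_l)


# Exact checks on the coefficients: the polynomial equals -1 at s = 1,
# and its first derivative vanishes there, so the join is smooth.
coeff_sum = sum(B)
slope_at_edge = sum((2 * m + 1) * b for m, b in enumerate(B))

for x in (-0.1, -0.006, 0.0, 0.003, 0.1):
    print(x, smoothed_density(x, x0=0.006))
```

With the values used in the text (ρh = 1, ρl = 0.25), the profile takes the mean value 0.625 at x = 0 under this assumption.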
$p_{ij} = \left(1, x_{ij}, x_{ij}^2, x_{ij}^3, x_{ij}^4, x_{ij}^5\right)^T$. (52)

The kernel function is the fourth-order Wendland function (Wendland 1995). The kernel length is given by

$h_i = \eta \left(\frac{\tilde{m}_i}{\rho_i}\right)^{1/D}$, (54)

$\tilde{m}_i = \rho_{t=0,i} V_{t=0,i}$, (55)

where $\rho_{t=0,i}$ and $V_{t=0,i}$ are the density and geometric volume, respectively, of a particle i at t = 0. We set η = 3.8. We calculated the L1-norm error of density at t = 0.1 to verify the spatial order of the schemes and to compare the accuracy of the schemes;

$\epsilon_\rho = \frac{1}{N_x}\sum_{n=1}^{N_x}\frac{|\rho_n - \rho_n^{\rm hres}|}{\rho_n^{\rm hres}}$, (56)

where $\rho_n^{\rm hres}$ is the result of a high-resolution test in which the number of particles, $N_x$, is 8000 and $dt = 10^{-6}$. When we derived equation (56), we calculated $\rho_n$ of particles rearranged at the same positions as those of the high-resolution test. The time integrator for the high-resolution test is the Hermite scheme of the P(EC)$^2$ form. For the test of the time order of the scheme for the test with $N_x = N_0$, $\rho_{n,t}^{\rm hres}$ is the result of a high-resolution test in which $N_x$ is $N_0$ and $dt = 10^{-6}$. The time integrator for the high-resolution test is the same as that for $\rho_n$. In this case we define the error as

$\epsilon_{\rho,t} = \frac{1}{N_x}\sum_{n=1}^{N_x}\frac{|\rho_n - \rho_{n,t}^{\rm hres}|}{\rho_{n,t}^{\rm hres}}$. (57)

Fig. 3. Same as figure 2, but for $N_x$ = 4000.

We compare results with PEC, PECE, and P(EC)$^2$ forms of Hermite schemes, and Heun’s scheme (hereafter RK2) and the classical fourth-order Runge–Kutta scheme (hereafter RK4). The numbers of particles, $N_x$, are 1000, 2000, and 4000. Calculation codes used in this study were developed using FDPS (Iwasawa et al. 2016).

Figure 2 shows density profiles at t = 0.1 for the tests with $N_x$ = 1000 and $dt \simeq dt_{\rm max}/4$, where $dt_{\rm max}$ is the maximum time-step in the stability region, with the PEC form of the Hermite scheme. Note that the results for all schemes are similar to that for the PEC form of the Hermite scheme. We can see that the shock wave can be captured. However, the post-shock oscillation is strong for $x_0$ = 0.006. Figure 3 is the same as figure 2, but for $N_x$ = 4000. Note that the results are independent of the time-integration scheme used and the results for $N_x$ = 2000 are similar to those for $N_x$ = 4000. We can see that the shock wave can be captured clearly even if the initial condition is not smooth. Therefore, if the initial condition is not smooth, the resolution of time and space should be higher.

Now we check the spatial order of the scheme. We used the sixth-order shape function and then the first and second derivatives are fifth and fourth orders in space. Therefore, if the result converges to an exact solution following the order of the method, the order of the scheme should be larger than or equal to 4, and thus $\epsilon_\rho$ should be given by $\epsilon_\rho \propto N_x^{-m}$ where m is larger than or equal to 4. Figure 4 shows $\epsilon_\rho$ for the P(EC)$^2$ form of the Hermite scheme for runs with $dt = 10^{-6}$ plotted against $N_x^{-1}$. The results are independent of the time-integration scheme used. The value of $\epsilon_\rho$ for runs with $x_0$ = 0.006 is proportional to $N_x^{-4}$. The value of $\epsilon_\rho$ in the large $N_x$ region for runs with $x_0$ = 0.03 is proportional to $N_x^{-1}$ since, in this region, the round-off error dominates the total error. In the other region, $\epsilon_\rho$ is proportional to $N_x^{-4}$. From these results, we can conclude that the spatial order of the scheme is consistent with theoretical expectation.
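The error measure of equation (56) and the check that the error scales as a power of the particle number can be sketched as below. This is a minimal illustration, not the analysis code used for the figures: the function names are hypothetical, the order is estimated with a plain least-squares fit in log-log space, and the error values are synthetic numbers constructed to follow an exact inverse fourth-power law rather than data from the paper.

```python
import math


def relative_l1_error(rho, rho_ref):
    """Mean of |rho_n - rho_ref_n| / rho_ref_n over particles, as in equation (56)."""
    if len(rho) != len(rho_ref):
        raise ValueError("both profiles must be sampled at the same particles")
    return sum(abs(r - ref) / ref for r, ref in zip(rho, rho_ref)) / len(rho)


def convergence_order(n_values, errors):
    """Least-squares estimate of m in error ~ N**(-m), fitted in log-log space."""
    xs = [math.log(n) for n in n_values]
    ys = [math.log(e) for e in errors]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    slope = (sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
             / sum((x - x_mean) ** 2 for x in xs))
    return -slope


# Synthetic errors (made up for illustration) that follow N**-4 exactly.
n_values = [1000, 2000, 4000]
errors = [3.0 * n ** -4.0 for n in n_values]
order = convergence_order(n_values, errors)
print(f"estimated spatial order: {order:.3f}")

# Tiny example of the error norm with a two-particle profile.
example_error = relative_l1_error([1.1, 0.9], [1.0, 1.0])
```

A fitted order close to 4 would correspond to the behaviour described for the runs with x0 = 0.006; a region where round-off error dominates would show up as a much smaller fitted slope.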
Fig. 5. $\epsilon_{\rho,t}$ at t = 0.1 for the tests with $x_0$ = 0.006 plotted against dtic. From left to right, panels show the results for $N_x$ = 1000, 2000, and 4000.
Triangles, squares and crosses show the results for Hermite schemes in PEC, PECE, and P(EC)$^2$ forms, and open and filled circles show the results for
RK4 and RK2. Solid and dashed curves show the theoretical models for the error of second- and fourth-order schemes.
the solution in the limit of dt → 0 to converge to a solution that is different from the exact solution, but the rate of the convergence is the order of the time-integration scheme, since we can regard the space-discretized differential equations as the set of ordinary differential equations. However, in the case of the Hermite scheme, we construct the second time-derivatives of physical quantities from the original equations and high-order spatial derivatives, and these spatial derivatives contain discretization errors. Thus,

$\cdots + \sum_{n=0}^{N_t}\frac{1}{12}\left(\frac{dA_n}{dt} + J_{\epsilon,n} - \frac{dA_{n+1}}{dt} - J_{\epsilon,n+1}\right)\Delta t^2$, (61)

where $N_t$ is given by $N_t = T/\Delta t$. Here, we can assume that

$\frac{dv}{dt} = \lim_{\Delta t = 0} A$. (62)

Therefore, equation (61) becomes

$v_{(t=T)} = v_A(T) + O(\Delta t)^4 + \frac{1}{12}\left(J_{\epsilon,(t=0)} - J_{\epsilon,(t=T)}\right)\Delta t^2$, (63)

where $v_A(T)$ is the analytical solution for the velocity which satisfies equation (62). We can see that the time order of a Hermite scheme is equal to two. From these results, we can conclude that the time orders of the schemes are consistent. The fact that the apparent error order of the Hermite scheme is 2 does not imply it is a second-order scheme, because when we simultaneously shrink the interparticle distance and time step, the error will be $O(dt^4)$ as expected. The second-order behaviour occurs only when the spatial error dominates the total error.

Figure 7 shows errors for tests with $x_0$ = 0.006 plotted

3.2 Surface gravity wave test
The surface gravity wave test is useful for the investigation of the capability of numerical schemes to handle two-dimensional fluid dynamics with high accuracy and small dissipation. The initial condition is the same as those in Antuono et al. (2011) and Yamamoto and Makino (2017), but sound velocity given by equation (29) is 10 times smaller than that of Yamamoto and Makino (2017). We assume that fluid is weakly compressible with an equation of state given by equation (28) with $\rho_{\rm air} = 10^3$ and $P_{\rm air} = 10^5$ and sound velocity given by equation (29) with g = −10 and
$\rho(y) = \rho_{\rm air}\, e^{g(H-y)/c_0^2}$. (64)

Initial velocity is

$v_x = A\,\frac{|g|k}{\omega}\,\frac{\cosh(ky)}{\cosh(kH)}\,\sin(kx)$, (65)

$v_y = -A\,\frac{|g|k}{\omega}\,\frac{\sinh(ky)}{\cosh(kH)}\,\cos(kx)$, (66)
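A direct transcription of equations (64) to (66) into code might look like the sketch below. The function names are illustrative, and the numerical values of A, k, ω, H, and c0 in the usage lines are placeholders chosen only to exercise the functions; the excerpt does not state them, so they should not be read as the paper's parameters. Only g = −10 and ρair = 10³ follow the text.

```python
import math


def initial_density(y, H, c0, g=-10.0, rho_air=1.0e3):
    """Equation (64): rho(y) = rho_air * exp(g (H - y) / c0**2)."""
    return rho_air * math.exp(g * (H - y) / c0**2)


def initial_velocity(x, y, A, k, omega, H, g=-10.0):
    """Equations (65) and (66): velocity components (vx, vy) at position (x, y)."""
    amplitude = A * abs(g) * k / omega
    vx = amplitude * math.cosh(k * y) / math.cosh(k * H) * math.sin(k * x)
    vy = -amplitude * math.sinh(k * y) / math.cosh(k * H) * math.cos(k * x)
    return vx, vy


# Placeholder parameters (assumptions for illustration only).
params = dict(A=0.05, k=2.0 * math.pi, omega=1.0, H=1.0)

# Finite-difference divergence at a sample point: the two terms should cancel,
# since d(vx)/dx and d(vy)/dy are equal and opposite for this velocity field.
x0, y0, eps = 0.3, 0.4, 1e-5
dvx_dx = (initial_velocity(x0 + eps, y0, **params)[0]
          - initial_velocity(x0 - eps, y0, **params)[0]) / (2 * eps)
dvy_dy = (initial_velocity(x0, y0 + eps, **params)[1]
          - initial_velocity(x0, y0 - eps, **params)[1]) / (2 * eps)
divergence = dvx_dx + dvy_dy
print(f"divergence at sample point: {divergence:.2e}")
```

The vanishing divergence is a quick sanity check on the signs and the cosh/sinh pairing in the two components.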
$p_{ij} = \left(1, x_{ij}, y_{ij}, x_{ij}^2, x_{ij}y_{ij}, y_{ij}^2, \ldots, x_{ij}^2y_{ij}^3, x_{ij}y_{ij}^4, y_{ij}^5\right)^T$. (68)
Fig. 11. Results of the surface gravity wave tests with N = 16 × 17; from top to bottom, the snapshots at t = 0, 0.25T, 0.5T, and 0.75T are shown.

the Gauss–Legendre scheme (hereafter IRK4). The numbers of particles, N, are 16 × 17, 32 × 33, and 64 × 65.

Figure 11 shows the time evolution up to t = 0.75T with the implicit Hermite scheme, N = 16 × 17 and $dt \simeq dt_{\rm max}/4$. Figure 12 shows y of the particle initially at (x, y) = (0, 1) with the implicit Hermite scheme, N = 16 × 17 and $dt \simeq dt_{\rm max}/4$. Note that the results are independent of the time-integration scheme used and N.

Now we check the spatial order of the scheme. We used the fifth-order shape function and then the first and second derivatives are fourth and third orders in space. Therefore, if the result converges to an exact solution following the order of the method, the order of the scheme should be larger than or equal to 3, and thus $\epsilon_{v_x}$ should be given by $\epsilon_{v_x} \propto N_x^{-m}$ where m is larger than or equal to 3 and $N_x$ is the number of particles in the x-direction. Figure 13 shows $\epsilon_{v_x}$
Fig. 14. $\epsilon_{v_x,t}$ plotted against dtic. From left to right, panels show the results for $N_x$ = 16, 32, and 64. Crosses and open and filled circles show
the results of the implicit Hermite scheme, IRK4, and IRK1. Dashed, solid and treble-dot–dashed curves show the theoretical models for the error of
second-, first-, and fourth-order schemes.
stable region plotted against $N_x^{-1}$. We can see that the
region of the implicit Hermite scheme, IRK4, and IRK1.
4 Summary
If we use multi-stage integration schemes, such as Runge–Kutta schemes, with mesh-free methods we need to perform the interaction calculation, which is the most expensive part of the calculation, multiple times per time step. We constructed a Hermite scheme for a high-order mesh-free method. The accuracy of fourth-order Hermite schemes is at least similar to that of Runge–Kutta schemes and the region of stability of Hermite schemes is better than those

Acknowledgments
We would like to thank the referee for his or her insightful comments and suggestions. We also thank the editor for his or her assistance. We thank Masaki Iwasawa, Keigo Nitadori and Daisuke Namekata for discussions about Hermite schemes and Runge–Kutta schemes. This research was supported by RIKEN Junior Research Associate Program and MEXT as “Exploratory Challenge on Post-K computer” (Elucidation of the Birth of Exoplanets [Second Earth] and the Environmental Variations of Planets in the Solar System).