Approximation of the invariant measure of stable SDEs
Received 14 January 2023; received in revised form 22 May 2023; accepted 7 June 2023
Available online 13 June 2023
Abstract
We propose two Euler–Maruyama (EM) type numerical schemes in order to approximate the invariant measure of a stochastic differential equation (SDE) driven by an α-stable Lévy process (1 < α < 2): an approximation scheme with α-stable distributed noise and a further scheme with Pareto-distributed noise. Using a discrete version of Duhamel's principle and Bismut's formula in Malliavin calculus, we prove that the error bounds in Wasserstein-1 distance are of the order η^{1−ϵ} and η^{2/α−1}, respectively, where ϵ ∈ (0, 1) is arbitrary and η is the step size of the approximation schemes. For the Pareto-driven scheme, an explicit calculation for the Ornstein–Uhlenbeck α-stable process shows that the rate η^{2/α−1} cannot be improved.
© 2023 Elsevier B.V. All rights reserved.
∗ Corresponding author.
E-mail addresses: chenpengmath@nuaa.edu.cn (P. Chen), dengcs@whu.edu.cn (C.-S. Deng),
rene.schilling@tu-dresden.de (R.L. Schilling), lihuxu@um.edu.mo (L. Xu).
https://doi.org/10.1016/j.spa.2023.06.001
Chen, Deng, Schilling, Xu Stochastic Processes and their Applications 163 (2023) 136–167
1. Introduction
We study the solution (X_t)_{t≥0} of the following stochastic differential equation (SDE) driven by an α-stable Lévy process:

    dX_t = b(X_t) dt + dZ_t,  X_0 = x,  (1.1)

where x ∈ R^d is the starting point, (Z_t)_{t≥0} is a d-dimensional, rotationally invariant α-stable Lévy process with index α ∈ (1, 2), and b : R^d → R^d is a function satisfying Assumption A.
The Euler–Maruyama (EM) scheme of the SDE (1.1), with a step size η ∈ (0, 1), is defined by

    Y_0 = x,  Y_{k+1} = Y_k + ηb(Y_k) + (Z_{(k+1)η} − Z_{kη}),  k = 0, 1, 2, …,  (1.2)
see, e.g. [17,40]. It is easy to see that (Yk )k≥0 is a Markov chain. A drawback of the scheme
(1.2) is that there is no explicit representation for the probability density of the α-stable noise Z_{(k+1)η} − Z_{kη}, α ∈ (1, 2), which makes the numerical simulation complicated and computationally expensive; see the recent monograph [27, Section 1.9] for a detailed discussion of
the difficulties arising in the multivariate stable distribution simulations. See also [5,24,26] for
sampling stable distributed random variables. In contrast, the Pareto distribution has a simple
probability density and thus can be easily sampled by the classical acceptance and rejection
method. Since the stable and the Pareto distribution have the same tail behavior, and inspired
by the stable central limit theorem (see, e.g. [7,13]), we replace the stable noise in (1.2) with
a Pareto distributed noise, and consider the following EM scheme:
Let Z̃_1, Z̃_2, … be an iid sequence of d-dimensional random vectors which are Pareto distributed, i.e.

    Z̃_1 ∼ p(z) = ( α / (σ_{d−1} |z|^{α+d}) ) 1_{(1,∞)}(|z|);  (1.3)

we denote by σ_{d−1} = 2π^{d/2}/Γ(d/2) the surface area of the unit sphere S^{d−1} ⊂ R^d. We will
approximate the SDE (1.1) by the following approximation scheme:

    Ỹ_0 = x,  Ỹ_{k+1} = Ỹ_k + ηb(Ỹ_k) + (η^{1/α}/σ) Z̃_{k+1},  k = 0, 1, 2, …,  (1.4)

where η > 0 is the step size, σ^α = α/(σ_{d−1} C_{d,α}), and

    C_{d,α} = ( |ξ|^{−α} ∫_{R^d \ {0}} (1 − cos⟨ξ, y⟩) dy/|y|^{α+d} )^{−1} = α 2^{α−1} π^{−d/2} Γ((d+α)/2) / Γ(1 − α/2),  (1.5)

see e.g. [4, Example 2.4.d)] and [2, III.18.23]. It is easy to see that (Ỹ_k)_{k≥0} is a Markov chain.
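The scheme (1.4) is straightforward to implement: under the isotropic density (1.3), the radius |Z̃| is a standard Pareto variable on (1, ∞) with P(|Z̃| > r) = r^{−α} (which can even be sampled by direct inversion of the radial CDF), the direction is uniform on the sphere, and σ can be evaluated from the closed form (1.5). A minimal sketch; the function and variable names are illustrative, not from the paper:

```python
import math
import numpy as np

def pareto_noise(alpha, d, size, rng):
    """Sample from the density (1.3): |Z| is Pareto on (1, inf) with tail
    index alpha, and Z/|Z| is uniform on the unit sphere S^{d-1}."""
    r = (1.0 - rng.uniform(size=size)) ** (-1.0 / alpha)  # inverse CDF of the r**-alpha tail
    g = rng.standard_normal((size, d))
    theta = g / np.linalg.norm(g, axis=1, keepdims=True)  # uniform direction
    return r[:, None] * theta

def sigma_const(alpha, d):
    """sigma with sigma**alpha = alpha / (sigma_{d-1} * C_{d,alpha}), using the
    closed forms for the sphere area and for C_{d,alpha} in (1.5)."""
    sigma_d1 = 2.0 * math.pi ** (d / 2.0) / math.gamma(d / 2.0)
    C = (alpha * 2.0 ** (alpha - 1.0) * math.pi ** (-d / 2.0)
         * math.gamma((d + alpha) / 2.0) / math.gamma(1.0 - alpha / 2.0))
    return (alpha / (sigma_d1 * C)) ** (1.0 / alpha)

def em_pareto(b, x0, alpha, eta, n_steps, rng):
    """Scheme (1.4): Y_{k+1} = Y_k + eta*b(Y_k) + (eta**(1/alpha)/sigma)*Z_{k+1}."""
    y = np.asarray(x0, dtype=float)
    scale = eta ** (1.0 / alpha) / sigma_const(alpha, len(y))
    for z in pareto_noise(alpha, len(y), n_steps, rng):
        y = y + eta * b(y) + scale * z
    return y

rng = np.random.default_rng(1)
z = pareto_noise(1.5, 3, 20000, rng)
# drift b(x) = -x is a hypothetical dissipative example, not from the paper
y_end = em_pareto(lambda x: -x, [5.0, -5.0, 5.0], alpha=1.5, eta=1e-2, n_steps=2000, rng=rng)
```

The radial inversion step is the reason no acceptance–rejection is needed in this isotropic special case.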
We aim to study the error bounds in the Wasserstein-1 distance for the above two schemes,
in particular for large time.
The EM approximation of SDEs is a classical research topic, both in probability theory and
in numerical analysis, and over the past decades there have been many contributions, see for
instance [1,9,11,18,30,37] for SDEs driven by a Brownian motion, and [14,15,17,23,31,34] for
SDEs driven by Lévy noise. Most of these papers focus on error bounds of the solution to the
SDE and the EM approximation in a time interval [0, T ] for some finite T > 0; typically, there
appears a constant C T (depending on T ) in the error bounds, which tends to ∞ as T → ∞.
The recent use of Langevin sampling in machine learning has caused a surge of interest in error bounds for the invariant measures of the solution of the SDE and of its EM discretization,
see e.g. [6,16,25,38,44]. We refer the reader to [19,35,39] for discrete schemes for the invariant
measure of SDEs driven by a Brownian motion. Panloup [32] uses certain recursive procedures
to compute the invariant measure of Lévy-driven SDEs, but he does not determine the
convergence rate. To the best of our knowledge, our paper is the first contribution studying
the bound between the invariant measures of solutions to SDEs driven by stable noise and
their EM discretizations.
A further motivation of our research is to show that the EM scheme with Pareto distributed
innovations can indeed be used to approximate the invariant measures of SDEs driven by an α-
stable noise with α ∈ (1, 2). In order to speed up the EM scheme, actual implementations of the discretization (1.4) use iid random variables (Z̃_k)_{k≥1} with Pareto distribution rather than stable
innovations. The advantage of this approach is that the Pareto distribution has an explicitly
given density (see (1.3)) which allows for a much simpler sampling than stable random
variables. We also show that the convergence rate η2/α−1 is optimal for the Ornstein–Uhlenbeck
process on R.
For α = 2, the stable process Z_t is (essentially) a d-dimensional standard Brownian motion, and the convergence rate for the corresponding invariant measure is √η (up to a logarithmic correction), see for instance [12]. Our optimal rate η^{2/α−1} tends to O(1) rather than √η as α ↑ 2; this type of "phase transition" has been observed in many situations, e.g. in the stable law CLT [13,42]. This is due to the fact that α-stable distributions with α ∈ (0, 2) do not have second moments, while the 2-stable distribution is the Gaussian law, which has moments of all orders.
Our approach in proving the main results is via a discrete version of Duhamel's principle and Bismut's formula in Malliavin calculus. More precisely, we split the stochastic process (X_t)_{t≥0} into smaller pieces (X_t)_{(k−1)η≤t≤kη} for k ≥ 1 and replace (X_t)_{(k−1)η≤t≤kη} with Ỹ_k and Y_k, respectively. This procedure is reminiscent of Lindeberg's method for the CLT. In order
to bound the error caused by these replacements, we use the semigroup Pt given by (X t )t≥0
and study its regularity using Malliavin’s calculus for jump processes. In order to bound the
second-order derivative of Pt , we need to adopt the framework of the time-change argument
established in [43] and use the Bismut formula.
1.2. Notation
C_b^k(R^d, R^m) denotes the k-times continuously differentiable functions which are bounded [together with all their derivatives]; ∇f(x) ∈ R^d and ∇²f(x) ∈ R^{d×d} are the gradient and the Hessian.
For v, v_1, v_2, x ∈ R^d, the directional derivatives are given by

    ∇_v f(x) = ⟨∇f(x), v⟩ = lim_{ε→0} ( f(x + εv) − f(x) ) / ε,

    ∇_{v_2}∇_{v_1} f(x) = ⟨∇²f(x), v_1 v_2^⊤⟩_{HS} = lim_{ε→0} ( ∇_{v_1} f(x + εv_2) − ∇_{v_1} f(x) ) / ε,
where ⟨A, B⟩_{HS} := Σ_{i,j=1}^d A_{ij}B_{ij} for A, B ∈ R^{d×d}. The Hilbert–Schmidt norm of a matrix A ∈ R^{d×d} is ∥A∥_{HS} = ( Σ_{i,j=1}^d A_{ij}² )^{1/2}.
The directional derivatives are similarly defined for (sufficiently smooth) vector-valued functions f = (f_1, f_2, …, f_d)^⊤ : R^d → R^d: for v, v_1, v_2, x ∈ R^d, ∇_v f(x) = (∇_v f_1, ∇_v f_2, …, ∇_v f_d)^⊤ and ∇_{v_2}∇_{v_1} f(x) = (∇_{v_2}∇_{v_1} f_1, …, ∇_{v_2}∇_{v_1} f_d)^⊤.
For f ∈ C_b²(R^d, R), we will use the supremum norm and the supremum Hilbert–Schmidt norm

    ∥∇f∥_∞ = sup_{x∈R^d} |∇f(x)|,  ∥∇²f∥_{HS,∞} = sup_{x∈R^d} ∥∇²f(x)∥_{HS}.
where C(µ1 , µ2 ) is the set of all coupling realizations of µ1 , µ2 , i.e. all random variables
with values in R2d with marginals µ1 , µ2 . We also have the following dual description of the
Wasserstein distance
    W_1(µ_1, µ_2) = sup_{h∈Lip(1)} |µ_1(h) − µ_2(h)|,
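In dimension one this dual quantity is easy to evaluate for empirical measures: for two samples of equal size, the optimal coupling pairs the order statistics, so W_1 is the mean absolute difference of the sorted samples (a standard fact, not specific to this paper). A minimal sketch:

```python
import numpy as np

def w1_empirical(x, y):
    """Wasserstein-1 distance between two empirical measures on R with the
    same number of atoms; the optimal coupling is the quantile coupling."""
    x, y = np.sort(np.asarray(x, float)), np.sort(np.asarray(y, float))
    return float(np.mean(np.abs(x - y)))

rng = np.random.default_rng(0)
a = rng.standard_normal(1000)
# translating a measure by c costs exactly |c| in W_1
shifted_cost = w1_empirical(a, a + 2.0)
```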
Remark 1.1. Note that (1.8) immediately implies the following linear growth condition
|b(x) − b(0)| ≤ θ2 |x|, x ∈ Rd . (1.9)
Under Assumption A, we will show that both (X t )t≥0 and (Ỹk )k≥0 are ergodic; we write µ
and µ̃η , respectively, for their invariant measures, see Propositions 1.5 and 1.7. Throughout the
paper the constants C, c1 , c2 , c3 , c4 and λ may depend on θ1 , θ2 , θ3 , K , α, d, |b(0)| and β for
some constant β ∈ [1, α), but we often suppress this in our notation; moreover, the exact values
of the constants may vary from line to line. Our main results are the following two theorems:
Theorem 1.2. Let (X_t)_{t≥0} and (Ỹ_k)_{k≥0} be defined by (1.1) and (1.4) (step size η), and denote by µ and µ̃_η their invariant measures. Under Assumption A, there exists a constant C such that the following two statements hold:

(1) For every N ≥ 2 and step size η < min{1, θ_1/(8θ_2²), 1/θ_1}, one has

    W_1( law(X_{ηN}), law(Ỹ_N) ) ≤ C(1 + |x|) η^{2/α−1}.

(2) For every step size η < min{1, θ_1/θ_2², 1/θ_1}, one has

    W_1( µ, µ̃_η ) ≤ C η^{2/α−1}.
Theorem 1.3. Let (X_t)_{t≥0} and (Y_k)_{k≥0} be defined by (1.1) and (1.2) (step size η), and denote by µ and µ_η their invariant measures. Under Assumption A, for any β ∈ [1, α) there exists a constant C depending on β such that the following two statements hold:

(1) For every N ≥ 2 and step size η < min{1, θ_1/(8θ_2²), 1/θ_1}, one has

    W_1( law(X_{ηN}), law(Y_N) ) ≤ C(1 + |x|^β) η^{1+1/α−1/β}.

(2) For every step size η < min{1, θ_1/θ_2², 1/θ_1}, one has

    W_1( µ, µ_η ) ≤ C η^{1+1/α−1/β}.
Remark 1.4. The rate η^{2/α−1} in the first theorem is optimal for the one-dimensional Ornstein–Uhlenbeck process, see Proposition B.1.
The proofs of Theorems 1.2 and 1.3 are presented in Section 2. In Section 3, we use a time-change argument and the Bismut formula to prove Lemma 2.1, which is the key to the proof of our main results. Appendix A includes the proofs of the propositions in this section for completeness. Finally, in Appendix B, the exact convergence rate η^{2/α−1} is attained for the Ornstein–Uhlenbeck process on R, which shows that the rate in Theorem 1.2 (2) is sharp.
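For concreteness, the stable-noise scheme (1.2) can be simulated in dimension one via the Chambers–Mallows–Stuck representation of a symmetric α-stable variable (cf. the sampling references [5,24,26]). The sketch below is illustrative only: it uses the standard normalization with characteristic function e^{−|t|^α}, which differs from the paper's normalization by a constant scale, and the drift b(x) = −x is a hypothetical example of a dissipative drift:

```python
import numpy as np

def sas_increment(alpha, eta, size, rng):
    """Symmetric alpha-stable increments Z_{(k+1)eta} - Z_{k eta} via the
    Chambers-Mallows-Stuck formula (skewness beta = 0); by self-similarity
    an increment over a step eta is eta**(1/alpha) times a unit variable."""
    V = rng.uniform(-np.pi / 2, np.pi / 2, size)
    W = rng.exponential(1.0, size)
    Z1 = (np.sin(alpha * V) / np.cos(V) ** (1.0 / alpha)
          * (np.cos((1.0 - alpha) * V) / W) ** ((1.0 - alpha) / alpha))
    return eta ** (1.0 / alpha) * Z1

def em_stable(b, x0, alpha, eta, n_steps, rng):
    """Scheme (1.2): Y_{k+1} = Y_k + eta * b(Y_k) + stable increment."""
    y = float(x0)
    path = np.empty(n_steps + 1)
    path[0] = y
    for k, dz in enumerate(sas_increment(alpha, eta, n_steps, rng), 1):
        y = y + eta * b(y) + dz
        path[k] = y
    return path

rng = np.random.default_rng(0)
path = em_stable(lambda x: -x, x0=5.0, alpha=1.5, eta=1e-2, n_steps=5000, rng=rng)
```

After a burn-in period the chain fluctuates around the center of the (heavy-tailed) stationary law.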
Here we collect a few auxiliary properties of (X t )t≥0 and (Yk )k≥0 . The proofs are standard,
but we include them in Appendix A to be self-contained. Recall that Vβ (x) = (1 + |x|2 )β/2 .
Proposition 1.5. Let Assumption A hold and denote by (X t )t≥0 the solution to the SDE (1.1).
Then, (X t )t≥0 admits a unique invariant probability measure µ such that for 1 ≤ β < α
    sup_{|f| ≤ V_β} | E[f(X_t^x)] − µ(f) | ≤ c_1 V_β(x) e^{−c_2 t},  t > 0,  (1.10)
for some constants c1 , c2 > 0. In particular, there exists a constant C > 0 such that
    E|X_t^x|^β ≤ C(1 + |x|^β),  t > 0.  (1.11)
Proposition 1.6. Under Assumption A, there exist constants C > 0 and λ > 0 such that for all t > 0 and x, y ∈ R^d,

    W_1( law(X_t^x), law(X_t^y) ) ≤ C e^{−λt} |x − y|.
Proposition 1.7. Let Assumption A hold and denote by (Y_k)_{k≥0} and (Ỹ_k)_{k≥0} the Markov chains defined by (1.2) and (1.4), respectively. Assume that the step size satisfies η < min{1, θ_1/θ_2², 1/θ_1}. Then

(1) the chain (Y_k)_{k≥0} admits a unique invariant measure µ_η such that for all x ∈ R^d and k > 0,

    sup_{|f|≤V_1} | E f(Y_k^x) − µ_η(f) | ≤ c_1 V_1(x) e^{−c_2 k},  (1.12)
Lemma 1.8. Let Assumption A hold and denote by (Y_k)_{k≥0} and (Ỹ_k)_{k≥0} the Markov chains defined by (1.2) and (1.4), respectively. If the step size satisfies η < min{1, θ_1/(8θ_2²), 1/θ_1}, then there is a constant C > 0, which is independent of η, such that

    E|Y_k^x|^β ≤ C(1 + |x|^β),  (1.14)

    E|Ỹ_k^x| ≤ C(1 + |x|),  (1.15)

hold for any β ∈ [1, α), x ∈ R^d and k > 0.
The first auxiliary lemma is about the regularity of the semigroup induced by (X t )t≥0 .
Lemma 2.1. Let h ∈ Lip(1) and X tx be the solution to the SDE (1.1). For all vectors
v, v1 , v2 ∈ Rd and t ∈ (0, 1], we have
|∇v Pt h(x)| ≤ eθ2 |v| (2.1)
and
|∇v2 ∇v1 Pt h(x)| ≤ Ct −1/α |v1 | |v2 | , (2.2)
for some constant C > 0.
Remark 2.2. Our proof of Lemma 2.1 in Section 3 is based on Norris’ approach to Malliavin
calculus [28] and Bismut’s formula. One of the referees kindly pointed out an elegant alternative
proof using the approach in [29] and the Bismut–Elworthy–Li formula. Although the alternative
proof is shorter, we decided to keep our own approach, since we refer to exactly these
arguments in several other publications. In the arXiv version of the present paper, we will
add the referee’s alternative proof as an appendix.
Using the inequalities (1.11) and (1.14), we can obtain the following estimates:
Lemma 2.3. Let (X_t)_{t≥0} be the solution to the SDE (1.1) and (Y_k)_{k≥0} be the Markov chain defined by (1.2). If the step size satisfies η < min{1, θ_1/(8θ_2²), 1/θ_1}, then the following estimates hold for all t ∈ (0, 1], β ∈ [1, α):

    E|Y_1^x − x|^β ≤ C(1 + |x|^β) η^{β/α},  (2.3)

    E|X_t^x − x|^β ≤ C(1 + |x|^β) t^{β/α},  (2.4)

    E|X_η^x − Y_1^x|^β ≤ C(1 + |x|^β) η^{β+β/α}.  (2.5)
Lemma 2.4. Let α ∈ (1, 2) and f : R^d → R satisfy ∥∇f∥_∞ < ∞ and ∥∇²f∥_{HS,∞} < ∞. For all x, y ∈ R^d one has

    | (−Δ)^{α/2} f(x) − (−Δ)^{α/2} f(y) | ≤ C_{d,α} σ_{d−1} ( 1/(2−α) + 1/(α−1) ) ∥∇²f∥_{HS,∞} |x − y|^{2−α}.
Proof. From the definition of the fractional Laplacian (2.6) and the symmetry of the representing measure we have for any R > 0

    (−Δ)^{α/2} f(x) = C_{d,α} ∫_{S^{d−1}} ∫_0^∞ [ f(x+rθ) − f(x) − r⟨θ, ∇f(x)⟩ 1_{(0,R)}(r) ] dr/r^{α+1} dθ
    = C_{d,α} ∫_{S^{d−1}} ∫_0^R ∫_0^r ⟨θ, ∇f(x+θs) − ∇f(x)⟩ ds dr/r^{α+1} dθ
    + C_{d,α} ∫_{S^{d−1}} ∫_R^∞ ∫_0^r ⟨θ, ∇f(x+θs)⟩ ds dr/r^{α+1} dθ.

Then, for all x, y ∈ R^d,

    | (−Δ)^{α/2} f(x) − (−Δ)^{α/2} f(y) |
    ≤ C_{d,α} ∫_{S^{d−1}} ∫_0^R ∫_0^r | ∇f(x+θs) − ∇f(x) − ∇f(y+θs) + ∇f(y) | ds dr/r^{α+1} dθ
    + C_{d,α} ∫_{S^{d−1}} ∫_R^∞ ∫_0^r | ∇f(x+θs) − ∇f(y+θs) | ds dr/r^{α+1} dθ.

For the first integral we have

    C_{d,α} ∫_{S^{d−1}} ∫_0^R ∫_0^r | ∇f(x+θs) − ∇f(x) − ∇f(y+θs) + ∇f(y) | ds dr/r^{α+1} dθ
    ≤ C_{d,α} ∫_{S^{d−1}} ∫_0^R ∫_0^r [ | ∇f(x+θs) − ∇f(x) | + | ∇f(y+θs) − ∇f(y) | ] ds dr/r^{α+1} dθ
    ≤ 2C_{d,α} ∥∇²f∥_{HS,∞} ∫_{S^{d−1}} ∫_0^R ∫_0^r s ds dr/r^{α+1} dθ = ( C_{d,α} ∥∇²f∥_{HS,∞} σ_{d−1} / (2−α) ) R^{2−α},

and for the second term we get

    C_{d,α} ∫_{S^{d−1}} ∫_R^∞ ∫_0^r | ∇f(x+θs) − ∇f(y+θs) | ds dr/r^{α+1} dθ
    ≤ C_{d,α} ∥∇²f∥_{HS,∞} ∫_{S^{d−1}} ∫_R^∞ ∫_0^r |x − y| ds dr/r^{α+1} dθ
    = ( C_{d,α} ∥∇²f∥_{HS,∞} σ_{d−1} / (α−1) ) |x − y| R^{1−α}.

Hence, the assertion follows upon taking R = |x − y|. □
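To make the final optimization step explicit: the two bounds in the proof give, for every R > 0,

```latex
\bigl|(-\Delta)^{\alpha/2} f(x)-(-\Delta)^{\alpha/2} f(y)\bigr|
\le \frac{C_{d,\alpha}\,\sigma_{d-1}\,\|\nabla^2 f\|_{\mathrm{HS},\infty}}{2-\alpha}\,R^{2-\alpha}
 +\frac{C_{d,\alpha}\,\sigma_{d-1}\,\|\nabla^2 f\|_{\mathrm{HS},\infty}}{\alpha-1}\,|x-y|\,R^{1-\alpha},
```

and the choice R = |x − y| balances the two terms:

```latex
\bigl|(-\Delta)^{\alpha/2} f(x)-(-\Delta)^{\alpha/2} f(y)\bigr|
\le C_{d,\alpha}\,\sigma_{d-1}\Bigl(\tfrac{1}{2-\alpha}+\tfrac{1}{\alpha-1}\Bigr)
 \|\nabla^2 f\|_{\mathrm{HS},\infty}\,|x-y|^{2-\alpha}.
```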
Lemma 2.5. Let (X_t)_{t≥0} and (Ỹ_k)_{k≥0} be defined by (1.1) and (1.4), respectively. There exists a constant C > 0 such that for all x ∈ R^d, η ∈ (0, 1) and f : R^d → R satisfying ∥∇f∥_∞ < ∞ and ∥∇²f∥_{HS,∞} < ∞,

    | P_η f(x) − Q̃_1 f(x) | ≤ C(1 + |x|) ( ∥∇f∥_∞ + ∥∇²f∥_{HS,∞} ) η^{2/α}.
    − E[ f( x + ηb(x) + σ^{−1}η^{1/α} Z̃ ) − f(x + ηb(x)) ].
We can bound J_1 using (1.8) and (2.4) with β = 1:

    |J_1| ≤ ∥∇f∥_∞ E| ∫_0^η b(X_r^x) dr − ηb(x) |
    ≤ ∥∇f∥_∞ ∫_0^η E| b(X_r^x) − b(x) | dr
    ≤ θ_2 ∥∇f∥_∞ ∫_0^η E|X_r^x − x| dr
    ≤ Cθ_2 (1 + |x|) ∥∇f∥_∞ ∫_0^η r^{1/α} dr
    ≤ C(1 + |x|) ∥∇f∥_∞ η^{1+1/α}.
For the first term of J_2 we use Dynkin's formula (see e.g. [8]) to get

    E[ f(x + ηb(x) + Z_η) − f(x + ηb(x)) ] = ∫_0^η E[ (−Δ)^{α/2} f(x + ηb(x) + Z_r) ] dr.
For the second part of J_2 we use that C_{d,α} = ασ_{d−1}^{−1}σ^{−α} and Taylor's formula to see

    E[ f( x + ηb(x) + σ^{−1}η^{1/α} Z̃ ) − f( x + ηb(x) ) ]
    = (η^{1/α}/σ) E[ ∫_0^1 ⟨ ∇f( x + ηb(x) + t σ^{−1}η^{1/α} Z̃ ), Z̃ ⟩ dt ]
    = (η^{1/α}/σ) ∫_{|z|≥1} ∫_0^1 ⟨ ∇f( x + ηb(x) + t σ^{−1}η^{1/α} z ), z ⟩ dt (α dz)/(σ_{d−1}|z|^{α+d})
    = ( αη/(σ_{d−1}σ^α) ) ∫_{|z|≥σ^{−1}η^{1/α}} ∫_0^1 ⟨ ∇f( x + ηb(x) + tz ), z ⟩ dt dz/|z|^{α+d}
    = η (−Δ)^{α/2} f(x + ηb(x)) − R,

where

    R := ηC_{d,α} ∫_{|z|<σ^{−1}η^{1/α}} ∫_0^1 ⟨ ∇f( x + ηb(x) + tz ), z ⟩ dt dz/|z|^{α+d}.
Together, the above estimates yield

    |J_2| ≤ |R| + | ∫_0^η E[ (−Δ)^{α/2} f(x + ηb(x) + Z_r) ] dr − η (−Δ)^{α/2} f(x + ηb(x)) |.
Further, we have

    |R| = ηC_{d,α} | ∫_{|z|<σ^{−1}η^{1/α}} ∫_0^1 ⟨ ∇f(x+ηb(x)+tz) − ∇f(x+ηb(x)), z ⟩ dt dz/|z|^{α+d} |
    ≤ ηC_{d,α} ∫_{|z|<σ^{−1}η^{1/α}} ∫_0^1 | ∇f(x+ηb(x)+tz) − ∇f(x+ηb(x)) | dt dz/|z|^{α+d−1}
    ≤ (1/2) ηC_{d,α} ∥∇²f∥_{HS,∞} ∫_{|z|<σ^{−1}η^{1/α}} dz/|z|^{α+d−2} ≤ C ∥∇²f∥_{HS,∞} η^{2/α}.
By Lemma 2.4, we also have

    | ∫_0^η E[ (−Δ)^{α/2} f(x+ηb(x)+Z_r) ] dr − η (−Δ)^{α/2} f(x+ηb(x)) |
    ≤ ∫_0^η E| (−Δ)^{α/2} f(x+ηb(x)+Z_r) − (−Δ)^{α/2} f(x+ηb(x)) | dr
    ≤ C ∥∇²f∥_{HS,∞} ∫_0^η E[ |Z_r|^{2−α} ] dr
    = C ∥∇²f∥_{HS,∞} ∫_0^η E[ |Z_1|^{2−α} ] r^{2/α−1} dr
    ≤ C E[ |Z_1|^{2−α} ] ∥∇²f∥_{HS,∞} η^{2/α}.
Lemma 2.6. Assume that f satisfies ∥∇f∥_∞ < ∞ and ∥∇²f∥_{HS,∞} < ∞. For any β ∈ [1, 2] and x, y ∈ R^d, we have

    | ∇f(x) − ∇f(y) | ≤ ( 2∥∇f∥_∞ + ∥∇²f∥_{HS,∞} ) |x − y|^{β−1}.
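Lemma 2.6 follows from interpolating the two obvious bounds 2∥∇f∥_∞ and ∥∇²f∥_{HS,∞}|x − y|; a sketch of the reasoning (a standard interpolation argument, possibly different from the authors' proof):

```latex
|\nabla f(x)-\nabla f(y)|
\le \min\bigl(2\|\nabla f\|_\infty,\ \|\nabla^2 f\|_{\mathrm{HS},\infty}|x-y|\bigr)
\le \bigl(2\|\nabla f\|_\infty\bigr)^{2-\beta}\bigl(\|\nabla^2 f\|_{\mathrm{HS},\infty}\bigr)^{\beta-1}|x-y|^{\beta-1}
\le \bigl(2\|\nabla f\|_\infty+\|\nabla^2 f\|_{\mathrm{HS},\infty}\bigr)|x-y|^{\beta-1},
```

using min(a, b) ≤ a^{2−β} b^{β−1} for β ∈ [1, 2] and the weighted AM–GM inequality a^{2−β} b^{β−1} ≤ a + b.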
Lemma 2.7. Let (X_t)_{t≥0} and (Y_k)_{k≥0} be defined by (1.1) and (1.2), respectively. There exists a constant C > 0 such that for all x ∈ R^d, η ∈ (0, 1), β ∈ [1, α) and f : R^d → R satisfying ∥∇f∥_∞ < ∞ and ∥∇²f∥_{HS,∞} < ∞,

    | P_η f(x) − Q_1 f(x) | ≤ C(1 + |x|^β) ( ∥∇f∥_∞ + ∥∇²f∥_{HS,∞} ) η^{2+1/α−1/β}.

    = E⟨ ∇f(x+ηb(x)+Z_η) − ∇f(x+ηb(x)), X_η^x − Y_1^x ⟩
    + E⟨ ∇f(x+ηb(x)), X_η^x − Y_1^x ⟩
    + E ∫_0^1 ⟨ ∇f( Y_1^x + r(X_η^x − Y_1^x) ) − ∇f(Y_1^x), X_η^x − Y_1^x ⟩ dr
    =: I + II + III.
For the first term I we have

    I = E⟨ ∇f(x+ηb(x)+Z_η) − ∇f(x+ηb(x)), X_η^x − Y_1^x ⟩ ( 1_{(0,1]}(|Z_η|) + 1_{(1,∞)}(|Z_η|) ) =: I_1 + I_2.

We need the following estimates for the truncated moment of order λ > α and for the tail of the α-stable random variable Z_η with η ≤ 1:

    P( |Z_η| > 1 ) ≤ cη  and  E[ |Z_η|^λ 1_{(0,1]}(|Z_η|) ] ≤ Cη.

Both estimates follow from a straightforward calculation using the standard estimate q_α(η, x) ≤ Cη/(η^{1/α} + |x|)^{α+d} for the density of Z_η, see e.g. [3, Theorem 2.1]. Since β/(β−1) > α, we can use the Hölder inequality and (2.5) to get

    |I_1| ≤ E[ | ∇f(x+ηb(x)+Z_η) − ∇f(x+ηb(x)) | 1_{(0,1]}(|Z_η|) | X_η^x − Y_1^x | ]
    ≤ ∥∇²f∥_{HS,∞} E[ |Z_η| 1_{(0,1]}(|Z_η|) | X_η^x − Y_1^x | ]
    ≤ ∥∇²f∥_{HS,∞} ( E[ |Z_η|^{β/(β−1)} 1_{(0,1]}(|Z_η|) ] )^{(β−1)/β} ( E| X_η^x − Y_1^x |^β )^{1/β}
    ≤ C(1 + |x|) ∥∇²f∥_{HS,∞} η^{(β−1)/β} η^{1+1/α}
    = C(1 + |x|) ∥∇²f∥_{HS,∞} η^{2+1/α−1/β},

whereas by the Hölder inequality

    |I_2| ≤ E[ | ∇f(x+ηb(x)+Z_η) − ∇f(x+ηb(x)) | 1_{(1,∞)}(|Z_η|) | X_η^x − Y_1^x | ]
    ≤ 2∥∇f∥_∞ E[ 1_{(1,∞)}(|Z_η|) | X_η^x − Y_1^x | ]
    ≤ 2∥∇f∥_∞ ( E 1_{(1,∞)}(|Z_η|) )^{(β−1)/β} ( E| X_η^x − Y_1^x |^β )^{1/β}
    ≤ C(1 + |x|) ∥∇f∥_∞ η^{(β−1)/β} η^{1+1/α}
    = C(1 + |x|) ∥∇f∥_∞ η^{2+1/α−1/β}.

Hence, we have

    |I| ≤ C(1 + |x|) ( ∥∇f∥_∞ + ∥∇²f∥_{HS,∞} ) η^{2+1/α−1/β}.
For II we use Itô's formula and the definitions (1.1), (1.2) of X_η^x and Y_1^x to see

    II = E⟨ ∇f(x+ηb(x)), ∫_0^η [ b(X_s^x) − b(x) ] ds ⟩
    = ⟨ ∇f(x+ηb(x)), ∫_0^η E[ b(X_s^x) − b(x) ] ds ⟩
    = ⟨ ∇f(x+ηb(x)), ∫_0^η ∫_0^s E[ ∇_{b(X_r^x)} b(X_r^x) + (−Δ)^{α/2} b(X_r^x) ] dr ds ⟩
    ≤ C ∥∇f∥_∞ (1 + |x|) η².
In the last inequality we use the estimate (1.9) (for b(X_r^x)) and Lemma 2.4 (for (−Δ)^{α/2} b(X_r^x)), combined with the moment estimate (2.4).
Finally, III is estimated by Lemma 2.6 and (2.5):

    III ≤ C ( ∥∇f∥_∞ + ∥∇²f∥_{HS,∞} ) E| X_η^x − Y_1^x |^β
    ≤ C(1 + |x|^β) ( ∥∇f∥_∞ + ∥∇²f∥_{HS,∞} ) η^{β+β/α}
    ≤ C(1 + |x|^β) ( ∥∇f∥_∞ + ∥∇²f∥_{HS,∞} ) η^{2+1/α−1/β}.
Proof of Theorem 1.2. Thanks to the discrete version of the classical Duhamel principle, it is easy to check that for h ∈ Lip(1)

    P_{Nη} h(x) − Q̃_N h(x) = Σ_{i=1}^{N} Q̃_{i−1} ( P_η − Q̃_1 ) P_{(N−i)η} h(x).  (2.8)
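The identity (2.8) is a pure telescoping-sum fact about any two one-step operators, which can be checked numerically with matrices standing in for P_η and Q̃_1 (random matrices here, purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n, N = 6, 8
P = rng.standard_normal((n, n)) / n   # stand-in for the one-step semigroup P_eta
Q = rng.standard_normal((n, n)) / n   # stand-in for the one-step scheme Q_tilde_1

def mpow(A, k):
    return np.linalg.matrix_power(A, k)

# Discrete Duhamel principle (2.8): P^N - Q^N = sum_i Q^{i-1} (P - Q) P^{N-i};
# the sum telescopes because consecutive terms share the factor Q^i P^{N-i}.
lhs = mpow(P, N) - mpow(Q, N)
rhs = sum(mpow(Q, i - 1) @ (P - Q) @ mpow(P, N - i) for i in range(1, N + 1))
```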
Then we have

    W_1( law(X_{ηN}), law(Ỹ_N) ) = sup_{h∈Lip(1)} | P_{Nη} h(x) − Q̃_N h(x) |
    ≤ Σ_{i=1}^{N−1} sup_{h∈Lip(1)} | Q̃_{i−1} ( P_η − Q̃_1 ) P_{(N−i)η} h(x) |  (2.9)
    + sup_{h∈Lip(1)} | Q̃_{N−1} ( P_η − Q̃_1 ) h(x) |.
First, we bound the last term. By (2.4) with β = 1, (1.4) and (1.8), for h ∈ Lip(1) and η < 1,

    | (P_η − Q̃_1) h(x) | = | E h(X_η^x) − E h(Ỹ_1) |
    ≤ | E h(X_η^x) − h(x) | + | E h(Ỹ_1) − h(x) |
    ≤ E| X_η^x − x | + E| Ỹ_1 − x |
    ≤ C(1 + |x|) η^{1/α} + η|b(x)| + σ^{−1} η^{1/α} E|Z̃_1|
    ≤ C(1 + |x|) η^{1/α} + η^{1/α} ( |b(0)| + θ_2|x| ) + σ^{−1} η^{1/α} E|Z̃_1|
    ≤ C(1 + |x|) η^{1/α}.
Together with (1.15) we get

    sup_{h∈Lip(1)} | Q̃_{N−1} ( P_η − Q̃_1 ) h(x) | ≤ C( 1 + E|Ỹ_{N−1}^x| ) η^{1/α} ≤ C(1 + |x|) η^{2/α−1}.  (2.10)
Next, we bound the first term in (2.9); we distinguish between two cases:

Case 1: N ≤ η^{−1} + 1. By Lemmas 2.5 and 2.1,

    | ( P_η − Q̃_1 ) P_{(N−i)η} h(x) | ≤ C(1 + |x|) ( ∥∇P_{(N−i)η}h∥_∞ + ∥∇²P_{(N−i)η}h∥_{HS,∞} ) η^{2/α},

and

    Σ_{i=1}^{N−1} [ (N−i)η ]^{−1/α} = η^{−1/α} Σ_{i=1}^{N−1} i^{−1/α} ≤ η^{−1/α} ∫_0^{N−1} r^{−1/α} dr
    = ( α/(α−1) ) η^{−1/α} (N−1)^{1−1/α} ≤ ( α/(α−1) ) η^{−1}.
This gives the upper bound

    Σ_{i=1}^{N−1} sup_{h∈Lip(1)} | Q̃_{i−1} ( P_η − Q̃_1 ) P_{(N−i)η} h(x) | ≤ C(1 + |x|) η^{2/α} Σ_{i=1}^{N−1} [ (N−i)η ]^{−1/α}
    ≤ C ( α/(α−1) ) (1 + |x|) η^{2/α−1}.
Case 2: N > η^{−1} + 1. By Proposition 1.6, for any x, y ∈ R^d, there exist constants C > 0 and λ > 0 such that

    | P_t h(x) − P_t h(y) | ≤ C e^{−λt} |x − y|,  h ∈ Lip(1), t ≥ 0.

This implies that

    sup_{h∈Lip(1)} | Q̃_{i−1} ( P_η − Q̃_1 ) P_{(N−i)η} h(x) | = sup_{h∈Lip(1)} | Q̃_{i−1} ( P_η − Q̃_1 ) P_1 P_{(N−i)η−1} h(x) |
    ≤ C e^{−λ[(N−i)η−1]} sup_{g∈Lip(1)} | Q̃_{i−1} ( P_η − Q̃_1 ) P_1 g(x) |,
Observe that

    Σ_{i=1}^{⌊N−η^{−1}⌋} e^{−λ[(N−i)η−1]} = Σ_{i=⌊η^{−1}⌋}^{N−1} e^{−λ(iη−1)} ≤ e^λ ∫_{⌊η^{−1}⌋−1}^{N−1} e^{−ληr} dr
    ≤ e^λ η^{−1} ∫_0^∞ e^{−λr} dr = λ^{−1} e^λ η^{−1}.
Thus, we get

    Σ_{i=1}^{⌊N−η^{−1}⌋} sup_{h∈Lip(1)} | Q̃_{i−1} ( P_η − Q̃_1 ) P_{(N−i)η} h(x) | ≤ C λ^{−1} e^λ (1 + |x|) η^{2/α−1}.

For i ≥ ⌊N−η^{−1}⌋ + 1, by almost the same calculation as in the first case, we find

    Σ_{i=⌊N−η^{−1}⌋+1}^{N−1} [ (N−i)η ]^{−1/α} ≤ ( α/(α−1) ) η^{−1}.
We have just shown estimates for Σ_{i=1}^{⌊N−η^{−1}⌋} … and Σ_{i=⌊N−η^{−1}⌋+1}^{N−1} … . Adding them up, we arrive at

    Σ_{i=1}^{N−1} sup_{h∈Lip(1)} | Q̃_{i−1} ( P_η − Q̃_1 ) P_{(N−i)η} h(x) |
    = ( Σ_{i=1}^{⌊N−η^{−1}⌋} + Σ_{i=⌊N−η^{−1}⌋+1}^{N−1} ) sup_{h∈Lip(1)} | Q̃_{i−1} ( P_η − Q̃_1 ) P_{(N−i)η} h(x) | ≤ C(1 + |x|) η^{2/α−1}.

Substituting this and (2.10) into (2.9), the first assertion of Theorem 1.2 follows.
It remains to prove Part (2). It is easy to see from (1.10) and (1.13) that we have
Proof of Theorem 1.3. In the above proof, replacing Lemma 2.5 and (1.13) with Lemma 2.7
and (1.12), the proof of Theorem 1.3 is similar to that of Theorem 1.2. □
The Jacobian flow is the derivative of X_t^x with respect to the initial value x; the Jacobian flow in direction v ∈ R^d is defined by

    ∇_v X_t^x := lim_{ϵ→0} ( X_t^{x+ϵv} − X_t^x ) / ϵ,  t ≥ 0.

This limit exists and satisfies

    (d/dt) ∇_v X_t^x = ∇_{∇_v X_t^x} b(X_t^x),  ∇_v X_0^x = v.  (3.1)

Similarly, for v_1, v_2 ∈ R^d, we can define ∇_{v_2}∇_{v_1} X_t^x, which satisfies

    (d/dt) ∇_{v_2}∇_{v_1} X_t^x = ∇_{∇_{v_2}∇_{v_1} X_t^x} b(X_t^x) + ∇_{∇_{v_2} X_t^x} ∇_{∇_{v_1} X_t^x} b(X_t^x),  ∇_{v_2}∇_{v_1} X_0^x = 0.  (3.2)
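Since the noise in (1.1) is additive, (3.1) is a pathwise linear ODE driven by the trajectory itself, and an Euler discretization of (3.1) is exactly the Jacobian of the Euler map, so it must agree with a finite difference of the flow. A small numerical sanity check (the drift b below is a hypothetical smooth dissipative example, not from the paper):

```python
import numpy as np

def b(x):
    return -x + 0.1 * np.sin(x)          # illustrative dissipative drift

def grad_b(x):
    return np.diag(-1.0 + 0.1 * np.cos(x))

def flow_and_jacobian(x0, v, T, n_steps):
    """Euler-integrate dx/dt = b(x) together with the variational equation
    (3.1): d/dt (nabla_v x_t) = grad b(x_t) (nabla_v x_t), started at v."""
    dt = T / n_steps
    x = np.array(x0, dtype=float)
    jv = np.array(v, dtype=float)
    for _ in range(n_steps):
        x, jv = x + dt * b(x), jv + dt * (grad_b(x) @ jv)
    return x, jv

x0, v = np.array([1.0, -2.0]), np.array([1.0, 0.5])
xT, jv = flow_and_jacobian(x0, v, T=1.0, n_steps=2000)
eps = 1e-6
xT_eps, _ = flow_and_jacobian(x0 + eps * v, v, T=1.0, n_steps=2000)
fd = (xT_eps - xT) / eps                 # finite-difference directional derivative
```

For a dissipative drift the Jacobian flow is a contraction, mirroring the deterministic bound (3.3).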
We first have the following estimates for ∇_{v_1} X_t^x and ∇_{v_2}∇_{v_1} X_t^x.
Lemma 3.1. For any starting point x ∈ R^d and all directions v_1, v_2 ∈ R^d the following (deterministic) estimates hold:

    | ∇_{v_1} X_t^x | ≤ e^{θ_2} |v_1|,  t ∈ (0, 1],  (3.3)

    | ∇_{v_2}∇_{v_1} X_t^x | ≤ ( θ_3 / (2√(2θ_2)) ) e^{4θ_2} |v_1| |v_2|,  t ∈ (0, 1].  (3.4)
adapted to the filtration (F_t)_{t≥0} with F_t := σ(W_s : 0 ≤ s ≤ t); i.e. u(t) is F_t-measurable for t ≥ 0. Define

    U = ∫_0^• u(s) ds.  (3.5)

For t > 0, let F_t : C([0, t], R^d) → R^m be an F_t-measurable map. If the limit

    D_U F_t(W) = lim_{ϵ→0} ( F_t(W + ϵU) − F_t(W) ) / ϵ

exists in L²((Ω, F, P); R^m), then F_t(W) is said to be (m-dimensional) Malliavin differentiable and
(See also [43].) We will now turn to the SDE driven by a rotationally invariant α-stable Lévy process with α ∈ (1, 2). We can express such drivers as subordinated Brownian motions. More precisely, let {S_t}_{t≥0} be an independent α/2-stable subordinator. Then Z_t := W_{S_t} is a rotationally invariant α-stable Lévy process, see e.g. [36]. This means that we can re-write (1.1) as

    dX_t = b(X_t) dt + dW_{S_t},  X_0 = x.  (3.7)
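This subordination picture also yields a practical sampler for the d-dimensional driver: sample the subordinator by Kanter's/Zolotarev's method for one-sided stable laws (a standard algorithm, not from this paper) and plug it into a Gaussian. The normalization below (Laplace transform e^{−λ^{α/2}} for the subordinator) matches the paper's Z only up to a constant scale:

```python
import numpy as np

def positive_stable(a, size, rng):
    """Kanter/Zolotarev sampler for the one-sided a-stable law (0 < a < 1),
    normalized so that E[exp(-lam * S)] = exp(-lam**a)."""
    U = rng.uniform(0.0, np.pi, size)
    E = rng.exponential(1.0, size)
    A = (np.sin(a * U) ** a * np.sin((1.0 - a) * U) ** (1.0 - a)
         / np.sin(U)) ** (1.0 / (1.0 - a))
    return (A / E) ** ((1.0 - a) / a)

def isotropic_stable_increment(alpha, eta, d, size, rng):
    """Z_eta = W_{S_eta}: Brownian motion evaluated at an independent
    (alpha/2)-stable subordinator; S_eta has the law of eta**(2/alpha) * S_1."""
    S = eta ** (2.0 / alpha) * positive_stable(alpha / 2.0, size, rng)
    return np.sqrt(S)[:, None] * rng.standard_normal((size, d))

rng = np.random.default_rng(0)
S = positive_stable(0.75, 200000, rng)
Z = isotropic_stable_increment(1.5, 0.1, 2, 1000, rng)
```

Conditionally on S, each increment is centered Gaussian with covariance S·I_d, which is exactly the W_{S_t} structure used in (3.7).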
Let W be the space of all continuous functions from [0, ∞) to R^d vanishing at t = 0; we equip W with the topology of locally uniform convergence and with the Wiener measure µ_W; then the coordinate process

    W_t(w) := w_t

is a standard d-dimensional Brownian motion. Let S be the space of all increasing, càdlàg (right continuous with finite left limits) functions from [0, ∞) to [0, ∞) vanishing at t = 0; we equip S with the Skorohod metric and the probability measure µ_S so that for any l ∈ S the coordinate process

    S_t(l) := l_t

is an α/2-stable subordinator. On the product measure space

    (Ω, F, P) := ( W × S, B(W) ⊗ B(S), µ_W × µ_S ),

we define

    L_t(w, l) := w_{l_t}.
The process {L_t}_{t≥0} is a rotationally invariant α-stable Lévy process on (Ω, F, P). We will use the following two natural filtrations associated with the Lévy process L_t and the Brownian motion W_t:

    F_t := σ{ L_s(w, l) : s ≤ t }  and  F_t^W := σ{ W_s(w) : s ≤ t }.

In particular, we can regard the solution X_t^x of the SDE (3.7) as an (F_t)-adapted functional on Ω, and therefore

    E f(X_t^x) = ∫_S ∫_W f( X_t^x(w_l) ) µ_W(dw) µ_S(dl).
Lemma 3.2. Under Assumption A one has, for all functions φ ∈ C_b²(R^d, R), all directions v_1, v_2 ∈ R^d and all x ∈ R^d, t ∈ (0, 1],

    | ∇_{v_1} X_t^{x;l} | ≤ e^{θ_2} |v_1|  (3.9)

and

    | E[ ∇_{∇_{v_2} X_t^{x;l}} ∇_{∇_{v_1} X_t^{x;l}} φ( X_t^{x;l} ) ] |
    ≤ | E[ (1/l_t) ∇_{∇_{v_1} X_t^{x;l}} φ( X_t^{x;l} ) ∫_0^t ⟨ ∇_{v_2} X_s^{x;l}, dW_{l_s} ⟩ ] | + ∥∇φ∥_∞ ( θ_3/√(2θ_2) ) e^{2θ_2} |v_1| |v_2|,

where ∇_{v_i} X_t^{x;l} (i = 1, 2) is determined by the following linear equation:

    (d/dt) ∇_{v_i} X_t^{x;l} = ∇_{∇_{v_i} X_t^{x;l}} b( X_t^{x;l} ),  ∇_{v_i} X_0^{x;l} = v_i.  (3.10)
In order to prove Lemma 3.2, we use a time-change argument to transform the SDE (3.8) into an SDE driven by a standard Brownian motion; this allows us to use Bismut's formula (3.6). For every ϵ ∈ (0, 1) we define

    l_t^ϵ := (1/ϵ) ∫_t^{t+ϵ} l_s ds + ϵt = ∫_0^1 l_{ϵs+t} ds + ϵt.

Since t ↦ l_t is increasing and right continuous, it follows that for each t ≥ 0,

    l_t^ϵ ↓ l_t  as  ϵ ↓ 0.

Moreover, t ↦ l_t^ϵ is absolutely continuous and strictly increasing. Let γ^ϵ be the inverse function of l^ϵ, i.e.

    l^ϵ_{γ_t^ϵ} = t,  t ≥ l_0^ϵ,  and  γ^ϵ_{l_t^ϵ} = t,  t ≥ 0.
By definition, γ_t^ϵ is absolutely continuous on [l_0^ϵ, ∞). Let X_t^{x;l^ϵ} be the solution to the SDE

    dX_t^{x;l^ϵ} = b( X_t^{x;l^ϵ} ) dt + dW_{l_t^ϵ − l_0^ϵ},  X_0^{x;l^ϵ} = x.  (3.11)

Let us now define

    Y_t^{x;l^ϵ} := X^{x;l^ϵ}_{γ_t^ϵ},  t ≥ l_0^ϵ

(γ̇_s^ϵ denotes the derivative in s). Hence, for any vector v ∈ R^d, we have

    ∇_v Y_t^{x;l^ϵ} = v + ∫_{l_0^ϵ}^t ∇_{∇_v Y_s^{x;l^ϵ}} b( Y_s^{x;l^ϵ} ) γ̇_s^ϵ ds,  (3.13)

    D_U Y_t^{x;l^ϵ}(W) = lim_{δ→0} ( Y_t^{x;l^ϵ}(W + δU) − Y_t^{x;l^ϵ}(W) ) / δ.

To simplify notation, we drop the W in D_U Y_t^{x;l^ϵ}(W) and write D_U Y_t^{x;l^ϵ} = D_U Y_t^{x;l^ϵ}(W). By (3.12), it satisfies the equation

    D_U Y_t^{x;l^ϵ} = ∫_{l_0^ϵ}^t ( ∇_{D_U Y_s^{x;l^ϵ}} b( Y_s^{x;l^ϵ} ) γ̇_s^ϵ + u(s) ) ds,
Proof. Recall that θ_1 > 0 and γ̇_s^ϵ ≥ 0. By (3.13) and (1.8), we have for any l_0^ϵ ≤ s ≤ t

    (d/ds) | ∇_{v_i} Y_s^{x;l^ϵ} |² = 2γ̇_s^ϵ ⟨ ∇_{v_i} Y_s^{x;l^ϵ}, ∇_{∇_{v_i} Y_s^{x;l^ϵ}} b( Y_s^{x;l^ϵ} ) ⟩ ≤ 2θ_2 γ̇_s^ϵ | ∇_{v_i} Y_s^{x;l^ϵ} |²,

    + 2γ̇_s^ϵ ⟨ D_{U_2} ∇_{v_1} Y_s^{x;l^ϵ}, ∇_{D_{U_2} Y_s^{x;l^ϵ}} ∇_{∇_{v_1} Y_s^{x;l^ϵ}} b( Y_s^{x;l^ϵ} ) ⟩
    ≤ 2θ_2 γ̇_s^ϵ | D_{U_2} ∇_{v_1} Y_s^{x;l^ϵ} |² + 2θ_3 γ̇_s^ϵ | D_{U_2} ∇_{v_1} Y_s^{x;l^ϵ} | | D_{U_2} Y_s^{x;l^ϵ} | | ∇_{v_1} Y_s^{x;l^ϵ} |.

    | D_{U_2} ∇_{v_1} Y_s^{x;l^ϵ} |² ≤ ( θ_3²/(2θ_2) ) |v_1|² |v_2|² ∫_{l_0^ϵ}^s γ̇_u^ϵ e^{4θ_2 γ_u^ϵ} e^{4θ_2 (γ_s^ϵ − γ_u^ϵ)} du = ( θ_3²/(2θ_2) ) |v_1|² |v_2|² e^{4θ_2 γ_s^ϵ} γ_s^ϵ. □
    = ∥∇φ∥_∞ ( θ_3/√(2θ_2) ) e^{2θ_2 t} √t |v_1| |v_2|
    ≤ ∥∇φ∥_∞ ( θ_3/√(2θ_2) ) e^{2θ_2} |v_1| |v_2|,
and so

    | E[ ∇_{∇_{v_2} X_t^{x;l^ϵ}} ∇_{∇_{v_1} X_t^{x;l^ϵ}} φ( X_t^{x;l^ϵ} ) ] |
    ≤ | ( 1/(l_t^ϵ − l_0^ϵ) ) E[ ∇_{∇_{v_1} X_t^{x;l^ϵ}} φ( X_t^{x;l^ϵ} ) ∫_0^t ⟨ ∇_{v_2} X_s^{x;l^ϵ}, dW_{l_s^ϵ} ⟩ ] | + ∥∇φ∥_∞ ( θ_3/√(2θ_2) ) e^{2θ_2} |v_1| |v_2|.  (3.22)
By the same argument as in the proof of [43, Lemma 2.5], we obtain

    lim_{ϵ→0} ( 1/(l_t^ϵ − l_0^ϵ) ) E[ ∇_{∇_{v_1} X_t^{x;l^ϵ}} φ( X_t^{x;l^ϵ} ) ∫_0^t ⟨ ∇_{v_2} X_s^{x;l^ϵ}, dW_{l_s^ϵ} ⟩ ]
    = ( 1/l_t ) E[ ∇_{∇_{v_1} X_t^{x;l}} φ( X_t^{x;l} ) ∫_0^t ⟨ ∇_{v_2} X_s^{x;l}, dW_{l_s} ⟩ ].  (3.23)
On the other hand, from [43, Lemma 2.2], we know that

    lim_{ϵ→0} E[ ∇_{∇_{v_2} X_t^{x;l^ϵ}} ∇_{∇_{v_1} X_t^{x;l^ϵ}} φ( X_t^{x;l^ϵ} ) ] = E[ ∇_{∇_{v_2} X_t^{x;l}} ∇_{∇_{v_1} X_t^{x;l}} φ( X_t^{x;l} ) ].  (3.24)

Letting ϵ → 0 in (3.22) and using (3.23) and (3.24) completes the proof. □
Because of (3.3), we can use the differentiation theorem for parameter dependent integrals to get

    | ∇_v P_t h(x) | = | ∇_v E[ h(X_t^x) ] | = | E[ ∇_{∇_v X_t^x} h(X_t^x) ] |
where g_ϵ is the density of the normal distribution N(0, ϵ² I_d). It is easy to see that h_ϵ is smooth, lim_{ϵ→0} h_ϵ(x) = h(x), lim_{ϵ→0} ∇h_ϵ(x) = ∇h(x) and |h_ϵ(x)| ≤ C(1 + |x|) for all x ∈ R^d and some C > 0. Moreover, ∥∇h_ϵ∥_∞ ≤ ∥∇h∥_∞ ≤ 1. Using the differentiability theorem for parameter dependent integrals we get

    ∇_{v_2}∇_{v_1} E[ h_ϵ(X_t^x) ] = E[ ∇_{∇_{v_2}∇_{v_1} X_t^x} h_ϵ(X_t^x) ] + E[ ∇_{∇_{v_2} X_t^x} ∇_{∇_{v_1} X_t^x} h_ϵ(X_t^x) ].
    ≤ | E[ (1/S_t) ∇_{∇_{v_1} X_t^x} h_ϵ( X_t^x ) ∫_0^t ⟨ ∇_{v_2} X_s^x, dW_{S_s} ⟩ ] | + ∥∇h_ϵ∥_∞ ( θ_3/√(2θ_2) ) e^{2θ_2} |v_1| |v_2|
    ≤ e^{θ_2} |v_1| E[ (1/S_t) | ∫_0^t ⟨ ∇_{v_2} X_s^x, dW_{S_s} ⟩ | ] + ( θ_3/√(2θ_2) ) e^{2θ_2} |v_1| |v_2|.
    ≤ ( θ_3/(2√(2θ_2)) ) e^{4θ_2} |v_1| |v_2| + e^{θ_2} |v_1| · C e^{θ_2} |v_2| t^{−1/α} + ( θ_3/√(2θ_2) ) e^{2θ_2} |v_1| |v_2|
    ≤ e^{4θ_2} ( θ_3/(2√(2θ_2)) + C t^{−1/α} + θ_3/√(2θ_2) ) |v_1| |v_2|
    ≤ C e^{4θ_2} t^{−1/α} |v_1| |v_2|.
Finally, we can let ϵ → 0 using dominated convergence:

    lim_{ϵ→0} ∇_{v_2}∇_{v_1} E[ h_ϵ(X_t^x) ] = ∇_{v_2}∇_{v_1} E[ h(X_t^x) ],
Acknowledgments
We gratefully thank the associate editor and the two anonymous referees for their very helpful and constructive comments, in particular for pointing out an elegant alternative proof of Lemma 2.1, see also Remark 2.2. The research of L. Xu has been
supported in part by NSFC, China No. 12071499, Macao S.A.R. grant FDCT0090/2019/A2 and
University of Macau grant MYRG2020-00039-FST. R.L. Schilling has been supported through
the joint Polish–German NCN–DFG ‘Beethoven 3’ grant (NCN 2018/31/G/ST1/02252; DFG
SCHI 419/11-1). C.-S. Deng is supported by Natural Science Foundation of Hubei Province
of China (2022CFB129). P. Chen is supported by the NSF of Jiangsu Province, China grant
BK20220867 and the Initial Scientific Research Fund of Young Teachers in Nanjing University
of Aeronautics and Astronautics, China (1008-YAH21111).
with λ_1 = θ_1/2,

    q_1 = θ_1^{1−β} |b(0)|^β + βK + ( C_{d,α} β(3−β) √d σ_{d−1} ) / ( 2(2−α) ) + ( C_{d,α} β σ_{d−1} ) / ( α−β ) + (θ_1/4)^{1−β} ( C_{d,α} σ_{d−1} β ) / ( α−1 ),

and the compact set A_1 = { x ∈ R^d : |x| ≤ ( 4θ_1^{−1} q_1 )^{1/β} }.
Thus, [22, Theorem 5.1] yields that the process (X_t^x)_{t≥0} is ergodic, i.e. there exists a unique invariant probability measure µ such that for all x ∈ R^d and t > 0,
where Pt (x, dz) is the transition function of the process (X tx )t≥0 and ∥ · ∥TV denotes the total
variation norm on the space of signed measures. Furthermore, because of the inequality above
and [22, Theorem 6.1], we have
for suitable constants c1 , c2 > 0. In addition, by Itô’s formula, the integrability of X tx can be
derived directly from the Lyapunov condition (A.3). □
Proof of Proposition 1.7. We show only (1.12), as (1.13) can be proved in the same way.
Write P(x, dy) := P(Y_1 ∈ dy | Y_0 = x). Since V_1(y) ≤ |y| + 1 and
Y1 = x + ηb(x) + Z η ,
we have

    ∫_{R^d} V_1(y) P(x, dy) ≤ ∫_{R^d} ( |y| + 1 ) p( η, y − x − ηb(x) ) dy,

which implies

    ∫_{R^d} V_1(y) P(x, dy) ≤ η^{1/α} E|Z_1| + ( 1 − 2θ_1η + θ_2²η² )^{1/2} |x| + √(2Kη) + η|b(0)| + 1
    ≤ ( 1 − θ_1η/2 ) |x| + η^{1/α} E|Z_1| + √(2Kη) + η|b(0)| + 1,

where the last two inequalities hold because of η < min{ 1, θ_1θ_2^{−2}, θ_1^{−1} }.
    ∫_{R^d} V_1(y) P(x, dy) ≤ λ_2 V_1(x) + q_2 1_{A_2}(x)

with

    λ_2 = 1 − (θ_1/2)η < 1,  q_2 = 1 + (θ_1/2)η + η^{1/α} E|Z_1| + √(2Kη) + η|b(0)|,

and the compact set A_2 = { x ∈ R^d : |x| ≤ (2/θ_1)( 1 + E|Z_1| η^{1/α−1} + √(2Kη^{−1}) ) + 2|b(0)|/θ_1 }.
The proof of irreducibility is standard, see e.g. [20, Appendix A].
We can now use this and [21, Theorem 6.3] to see that the process $(Y_k^x)_{k\ge 0}$ is exponentially ergodic, i.e. there exists a unique invariant probability measure $\mu_\eta$ such that for all $x \in \mathbb{R}^d$ and $k \in \mathbb{N}$,
$$\sup_{|f| \le V_1} \big|\mathbb{E} f(Y_k^x) - \mu_\eta(f)\big| \le c_1 V_1(x)\, e^{-c_2 k}$$
for suitable constants $c_1, c_2 > 0$. □
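The ergodic behaviour of the chain $(Y_k^x)_{k\ge 0}$ is easy to observe numerically. The sketch below is illustrative and not from the paper: it samples the $\alpha$-stable increments $Z_{(k+1)\eta} - Z_{k\eta}$ by the Chambers–Mallows–Stuck method [5] (symmetric case) and runs the Euler–Maruyama chain (1.2) for the dissipative drift $b(x) = -x$; all parameter values are arbitrary choices.

```python
import math
import random

def stable_increment(alpha: float, dt: float, rng: random.Random) -> float:
    """Symmetric alpha-stable increment over a time step dt, via the
    Chambers-Mallows-Stuck representation; it scales as dt**(1/alpha)."""
    u = rng.uniform(-math.pi / 2, math.pi / 2)   # uniform angle
    w = rng.expovariate(1.0)                     # standard exponential
    x = (math.sin(alpha * u) / math.cos(u) ** (1 / alpha)
         * (math.cos((1 - alpha) * u) / w) ** ((1 - alpha) / alpha))
    return dt ** (1 / alpha) * x

def em_chain(b, x0: float, eta: float, n: int, alpha: float, seed: int = 0):
    """Euler-Maruyama chain Y_{k+1} = Y_k + eta*b(Y_k) + stable increment, cf. (1.2)."""
    rng = random.Random(seed)
    path = [x0]
    for _ in range(n):
        y = path[-1]
        path.append(y + eta * b(y) + stable_increment(alpha, eta, rng))
    return path

# Illustrative run: Ornstein-Uhlenbeck drift b(x) = -x, alpha = 1.5.
path = em_chain(lambda x: -x, x0=5.0, eta=0.01, n=5000, alpha=1.5, seed=42)
```

Starting far from the origin, the path contracts toward a heavy-tailed stationary regime with occasional large jumps, consistent with the exponential ergodicity stated above.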
Proof of Lemma 1.8. We show only (1.14), as the inequality (1.15) can be proved in the same way.
Notice that $|y|^\beta \le V_\beta(y)$ and
$$V_\beta(Y_{k+1}) = V_\beta\big(Y_k + \eta b(Y_k) + Z_\eta\big),$$
where $Z_\eta$ is independent of $Y_k$. Since $\nabla V_\beta(x) = \beta x\,(1+|x|^2)^{-\frac{2-\beta}{2}}$, (1.7) implies that
$$\begin{aligned}
\int_0^\eta \big\langle \nabla V_\beta\big(Y_k + r b(Y_k)\big),\, b(Y_k)\big\rangle\, dr
&\le \int_0^\eta \frac{\beta\langle Y_k, b(Y_k)\rangle + \beta r |b(Y_k)|^2}{\big(1 + |Y_k + r b(Y_k)|^2\big)^{\frac{2-\beta}{2}}}\, dr\\
&\le \int_0^\eta \frac{\beta\langle Y_k, b(Y_k) - b(0)\rangle + \beta|b(0)||Y_k| + \beta r |b(Y_k)|^2}{\big(1 + |Y_k + r b(Y_k)|^2\big)^{\frac{2-\beta}{2}}}\, dr \qquad \text{(A.4)}\\
&\le \int_0^\eta \frac{-\theta_1\beta|Y_k|^2 + \beta K + \beta|b(0)||Y_k| + \beta r |b(Y_k)|^2}{\big(1 + |Y_k + r b(Y_k)|^2\big)^{\frac{2-\beta}{2}}}\, dr.
\end{aligned}$$
One can write, by (1.9) and the fact that $r \le \eta \le \min\big\{1, \tfrac{\theta_1}{8\theta_2^2}, \tfrac{1}{\theta_1}\big\}$, that the integrand in the last line satisfies
$$\begin{aligned}
&\le -\frac{\theta_1\beta}{2}\,\frac{|Y_k|^2}{\Big(|Y_k|^2 + \frac{2r|b(0)|^2}{\theta_1} + 2\eta^2|b(0)|^2 + 1 + 2\eta K\Big)^{\frac{2-\beta}{2}}} + \frac{\beta|b(0)|^2}{\theta_1} + 2\beta r|b(0)|^2 + \beta K\\
&\le -\frac{\theta_1\beta}{2}\Big(|Y_k|^2 + \frac{2r|b(0)|^2}{\theta_1} + 2\eta^2|b(0)|^2 + 1 + 2\eta K\Big)^{\beta/2} + C_1\\
&\le -\frac{\theta_1\beta}{2}\, V_\beta(Y_k) + C_1,
\end{aligned}$$
where
$$C_1 := \frac{\theta_1\beta}{2}\Big(\frac{2r|b(0)|^2}{\theta_1} + 2\eta^2|b(0)|^2 + 1 + 2\eta K\Big) + \frac{\beta|b(0)|^2}{\theta_1} + 2\beta r|b(0)|^2 + \beta K.$$
In addition, for any $y \in \mathbb{R}^d$, Itô's formula and the inequality (A.2) imply that
$$\begin{aligned}
\Big|\mathbb{E}\big[V_\beta\big(y + \eta b(y) + Z_\eta\big) - V_\beta\big(y + \eta b(y)\big)\big]\Big|
&= \bigg|\mathbb{E}\Big[\int_0^\eta (-\Delta)^{\alpha/2}\, V_\beta\big(y + \eta b(y) + Z_r\big)\, dr\Big]\bigg|\\
&\le \int_0^\eta \bigg[\frac{C_{d,\alpha}\,\beta(3-\beta)\sqrt{d}\,\sigma_{d-1}}{2(2-\alpha)} + C_{d,\alpha}\beta\sigma_{d-1}\Big(\frac{\mathbb{E}|y + \eta b(y) + Z_r|^{\beta-1}}{\alpha-1} + \frac{1}{\alpha-\beta}\Big)\bigg]\, dr\\
&\le \int_0^\eta C_{d,\alpha}\beta\sigma_{d-1}\bigg[\frac{(3-\beta)\sqrt{d}}{2(2-\alpha)} + \frac{|y|^{\beta-1} + \eta^{\beta-1}|b(y)|^{\beta-1} + \mathbb{E}|Z_r|^{\beta-1}}{\alpha-1} + \frac{1}{\alpha-\beta}\bigg]\, dr.
\end{aligned}$$
$$\begin{aligned}
\Big|\mathbb{E}\big[V_\beta\big(y + \eta b(y) + Z_\eta\big) - V_\beta\big(y + \eta b(y)\big)\big]\Big|
&\le C_{d,\alpha}\beta\sigma_{d-1}\bigg(\frac{(3-\beta)\sqrt{d}\,\eta}{2(2-\alpha)} + \frac{\eta}{\alpha-\beta} + \frac{\big(1 + \theta_2^{\beta-1}\big)\eta|y|^{\beta-1} + \eta|b(0)|^{\beta-1}}{\alpha-1} + \int_0^\eta \frac{\mathbb{E}|Z_r|^{\beta-1}}{\alpha-1}\, dr\bigg)\\
&\le C_{d,\alpha}\beta\sigma_{d-1}\bigg(\frac{(3-\beta)\sqrt{d}\,\eta}{2(2-\alpha)} + \frac{\eta}{\alpha-\beta} + \frac{\big(1 + \theta_2^{\beta-1}\big)\eta|y|^{\beta-1} + \eta|b(0)|^{\beta-1}}{\alpha-1} + \eta\,\frac{\mathbb{E}|Z_1|^{\beta-1}}{\alpha-1}\bigg),
\end{aligned}$$
where the first inequality uses (1.9) and the fact that $0 < \eta < 1$. Since $Z_\eta$ is independent of $Y_k$, we can combine this with (A.4) to derive, for a suitable constant $C_2 > 0$, that
$$\mathbb{E}[V_\beta(Y_{k+1})] \le \Big(1 - \frac{\theta_1}{2}\eta\Big)\,\mathbb{E}[V_\beta(Y_k)] + (C_1 + C_2)\eta,$$
which we can iterate to get
$$\begin{aligned}
\mathbb{E}[V_\beta(Y_{k+1})] &\le \Big(1 - \frac{\theta_1}{2}\eta\Big)^{k+1} V_\beta(x) + (C_1 + C_2)\eta \sum_{j=0}^{k} \Big(1 - \frac{\theta_1}{2}\eta\Big)^{j}\\
&\le V_\beta(x) + \frac{2(C_1 + C_2)}{\theta_1}.
\end{aligned}$$
Using that $V_\beta(x) \le 1 + |x|^\beta$, we finally get (1.14). □
In this section, we assume that $\mu$ is the invariant measure of the Ornstein–Uhlenbeck process on $\mathbb{R}$:
$$dX_t = -X_t\, dt + dZ_t, \qquad X_0 = x, \tag{B.1}$$
where $Z_t$ is a rotationally symmetric $\alpha$-stable Lévy process ($1 < \alpha < 2$), and $\tilde\mu_\eta$ is the invariant measure of
$$\tilde Y_{k+1} = \tilde Y_k - \eta \tilde Y_k + \frac{\eta^{1/\alpha}}{\sigma}\,\tilde Z_{k+1}, \qquad k = 0, 1, 2, \ldots,$$
where $\eta \in (0, 1)$, $\tilde Y_0 = x$, $\sigma = (2 d_\alpha/\alpha)^{-1/\alpha}$ with $d_\alpha = C_{1,\alpha} = \big(2\int_0^\infty \frac{1 - \cos y}{y^{\alpha+1}}\, dy\big)^{-1}$, and the $\tilde Z_j$ are i.i.d. random variables with density
$$p(z) = \frac{\alpha}{2|z|^{\alpha+1}}\,\mathbf{1}_{(1,\infty)}(|z|). \tag{B.2}$$
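The Pareto-driven scheme is straightforward to simulate. The sketch below is illustrative and not from the paper: it samples $\tilde Z_j$ by inverse transform (under (B.2), $\mathbb{P}(|\tilde Z_j| > t) = t^{-\alpha}$ for $t \ge 1$, so $|\tilde Z_j| = U^{-1/\alpha}$ with a symmetric random sign) and iterates the recursion; the closed form used for $\sigma^\alpha$, namely $\alpha\pi/(2\Gamma(1+\alpha)\sin(\pi\alpha/2))$, is the standard evaluation of $\alpha\int_0^\infty (1-\cos u)\,u^{-\alpha-1}\,du$, not taken from the paper.

```python
import math
import random

alpha = 1.5     # stability index in (1, 2); illustrative choice
eta = 0.01      # step size

# sigma^alpha = alpha * int_0^inf (1 - cos u) u^(-alpha-1) du; the integral has
# the standard closed form pi / (2 * Gamma(1+alpha) * sin(pi*alpha/2)).
sigma = (alpha * math.pi
         / (2 * math.gamma(1 + alpha) * math.sin(math.pi * alpha / 2))) ** (1 / alpha)

def pareto_noise(rng: random.Random) -> float:
    """Symmetric Pareto sample with density (B.2): |Z| = U**(-1/alpha), random sign."""
    z = rng.random() ** (-1 / alpha)
    return z if rng.random() < 0.5 else -z

def pareto_em_chain(x0: float, n: int, seed: int = 0):
    """Iterate Y_{k+1} = (1 - eta) * Y_k + eta**(1/alpha)/sigma * Z_{k+1}."""
    rng = random.Random(seed)
    y, path = x0, [x0]
    for _ in range(n):
        y = (1 - eta) * y + eta ** (1 / alpha) / sigma * pareto_noise(rng)
        path.append(y)
    return path

path = pareto_em_chain(x0=2.0, n=10000, seed=7)

# sanity sample of the noise: its support is |z| >= 1
rng_check = random.Random(3)
noise_samples = [pareto_noise(rng_check) for _ in range(1000)]
```

The empirical law of the tail of `path` approximates $\tilde\mu_\eta$, the discrete invariant measure analysed below.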
The solution of (B.1) is $X_t^x = e^{-t}x + \int_0^t e^{s-t}\, dZ_s$, whose characteristic function is
$$\mathbb{E}\, e^{i\xi X_t^x} = e^{i\xi e^{-t}x}\, e^{-\alpha^{-1}|\xi|^\alpha(1 - e^{-\alpha t})} \;\xrightarrow[t\to\infty]{}\; e^{-\alpha^{-1}|\xi|^\alpha} = \mathbb{E}\big[e^{i\xi \alpha^{-1/\alpha} Z_1}\big].$$
Denote by $\varphi(\xi) = \mathbb{E}\big[e^{i\xi \tilde Z_j}\big]$ the characteristic function of the Pareto distribution. Then we have
$$\mathbb{E}\big[e^{i\xi \tilde Y_{k+1}}\big] = e^{i\xi(1-\eta)^{k+1}x} \prod_{i=0}^{k} \mathbb{E}\Big[e^{i\xi \frac{\eta^{1/\alpha}}{\sigma}(1-\eta)^i \tilde Z_{k+1-i}}\Big] = e^{i\xi(1-\eta)^{k+1}x} \prod_{i=0}^{k} \varphi\Big((1-\eta)^i\, \frac{\eta^{1/\alpha}}{\sigma}\, \xi\Big).$$
Letting $k \to \infty$ and denoting by $\tilde Y_\eta$ a random variable with distribution $\tilde\mu_\eta$, we get
$$\mathbb{E}\big[e^{i\xi \tilde Y_\eta}\big] = \prod_{i=0}^{\infty} \varphi\Big((1-\eta)^i\, \frac{\eta^{1/\alpha}}{\sigma}\, \xi\Big). \tag{B.3}$$
For $\xi > 0$, we have
$$\begin{aligned}
1 - \varphi(\xi) &= 2\int_1^\infty \big[1 - \cos(\xi z)\big]\, p(z)\, dz = \alpha \int_1^\infty \big[1 - \cos(\xi z)\big]\, \frac{dz}{z^{\alpha+1}}\\
&= \alpha \xi^\alpha \Big(\int_0^\infty \big[1 - \cos u\big]\, \frac{du}{u^{\alpha+1}} - \int_0^{\xi} \big[1 - \cos u\big]\, \frac{du}{u^{\alpha+1}}\Big)\\
&= \sigma^\alpha \xi^\alpha - \alpha \xi^\alpha \int_0^{\xi} \big[1 - \cos u\big]\, \frac{du}{u^{\alpha+1}}.
\end{aligned}$$
Since $p(z)$ is symmetric, cf. (B.2), we have $\varphi(\xi) = \varphi(-\xi)$, and so
$$\varphi(\xi) = 1 - \sigma^\alpha|\xi|^\alpha + \alpha|\xi|^\alpha \int_0^{|\xi|} \big[1 - \cos u\big]\, \frac{du}{u^{\alpha+1}}$$
for all $\xi \in \mathbb{R}$. Since $c := \inf_{0 < u \le 1} (1 - \cos u)/u^2 > 0$, we get for all $|\xi| \le 1$,
$$\varphi(\xi) \ge 1 - \sigma^\alpha|\xi|^\alpha + \alpha|\xi|^\alpha \int_0^{|\xi|} c\, u^2\, \frac{du}{u^{\alpha+1}} = 1 - \sigma^\alpha|\xi|^\alpha + \frac{c\alpha}{2-\alpha}\,|\xi|^2.$$
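The exact identity for $\varphi$ can be checked numerically. The sketch below is illustrative (the values $\alpha = 3/2$, $\xi = 1/2$ are arbitrary, and the substitution $u = t^2$, which removes the integrable singularity at $u = 0$, is specific to $\alpha = 3/2$): it compares a direct quadrature of $\mathbb{E}\cos(\xi \tilde Z_j)$ with the right-hand side, using the standard closed form $\pi/(2\Gamma(1+\alpha)\sin(\pi\alpha/2))$ for $\int_0^\infty (1-\cos u)\,u^{-\alpha-1}\,du$.

```python
import math

alpha, xi = 1.5, 0.5   # illustrative values; any alpha in (1,2), |xi| <= 1 works

def simpson(f, a, b, n):
    """Composite Simpson rule with n (even) subintervals."""
    h = (b - a) / n
    s = f(a) + f(b) + sum(f(a + i * h) * (4 if i % 2 else 2) for i in range(1, n))
    return s * h / 3

# sigma^alpha = alpha * int_0^inf (1 - cos u) u^(-alpha-1) du (closed form)
sigma_alpha = alpha * math.pi / (2 * math.gamma(1 + alpha) * math.sin(math.pi * alpha / 2))

def one_minus_cos(x):
    # Taylor series near 0 avoids catastrophic cancellation in 1 - cos(x)
    return x * x / 2 - x ** 4 / 24 if abs(x) < 1e-4 else 1 - math.cos(x)

# phi(xi) directly from the Pareto density (B.2), tail truncated at z = 400
phi_direct = alpha * simpson(lambda z: math.cos(xi * z) * z ** (-alpha - 1), 1, 400, 80000)

# phi(xi) from the identity 1 - sigma^alpha*|xi|^alpha + alpha*|xi|^alpha * I(xi),
# I(xi) = int_0^xi (1 - cos u) u^(-alpha-1) du; u = t^2 gives 2(1-cos t^2)/t^4 dt
tail = simpson(lambda t: 2 * one_minus_cos(t * t) / t ** 4 if t > 0 else 1.0,
               0, math.sqrt(xi), 2000)
phi_formula = 1 - sigma_alpha * xi ** alpha + alpha * xi ** alpha * tail
```

Both quadratures agree to within the truncation error, confirming the displayed identity at the chosen point.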
Thus, for $|\xi| \le 1$ and $0 < \eta < 1 \wedge \sigma^\alpha$,
$$\begin{aligned}
\log \varphi\Big((1-\eta)^i\, \frac{\eta^{1/\alpha}}{\sigma}\, \xi\Big)
&\ge \log\bigg(1 - \sigma^\alpha\, \Big|(1-\eta)^i\, \frac{\eta^{1/\alpha}}{\sigma}\, \xi\Big|^\alpha + \frac{c\alpha}{2-\alpha}\, \Big|(1-\eta)^i\, \frac{\eta^{1/\alpha}}{\sigma}\, \xi\Big|^2\bigg)\\
&= \log\bigg(1 - \eta(1-\eta)^{\alpha i}|\xi|^\alpha + \frac{c\alpha}{(2-\alpha)\sigma^2}\, \eta^{2/\alpha}(1-\eta)^{2i}|\xi|^2\bigg).
\end{aligned} \tag{B.4}$$
Observe that
$$\lim_{x \downarrow 0} \frac{\log\Big(1 - x + \frac{c\alpha}{(2-\alpha)\sigma^2}\, x^{2/\alpha}\Big) + x}{x^{2/\alpha}} = \frac{c\alpha}{(2-\alpha)\sigma^2}.$$
Therefore, there is some constant $C = C(\alpha, \sigma) > 0$ such that for small enough $x > 0$,
$$\log\Big(1 - x + \frac{c\alpha}{(2-\alpha)\sigma^2}\, x^{2/\alpha}\Big) \ge -x + C x^{2/\alpha}.$$
If we use this in (B.4), we obtain for all $|\xi| \le 1$ and small enough $\eta > 0$,
$$\log \varphi\Big((1-\eta)^i\, \frac{\eta^{1/\alpha}}{\sigma}\, \xi\Big) \ge -\eta(1-\eta)^{\alpha i}|\xi|^\alpha + C \eta^{2/\alpha}(1-\eta)^{2i}|\xi|^2.$$
Inserting this into (B.3), we see for all $|\xi| \le 1$ and small enough $\eta > 0$,
$$\begin{aligned}
\log \mathbb{E}\big[e^{i\xi \tilde Y_\eta}\big] &= \sum_{i=0}^{\infty} \log \varphi\Big((1-\eta)^i\, \frac{\eta^{1/\alpha}}{\sigma}\, \xi\Big)\\
&\ge -\eta|\xi|^\alpha \sum_{i=0}^{\infty} (1-\eta)^{\alpha i} + C\eta^{2/\alpha}|\xi|^2 \sum_{i=0}^{\infty} (1-\eta)^{2i}\\
&= -|\xi|^\alpha\, \frac{\eta}{1 - (1-\eta)^\alpha} + C|\xi|^2\, \frac{\eta^{2/\alpha}}{1 - (1-\eta)^2}\\
&= -\frac{1}{\alpha}|\xi|^\alpha - |\xi|^\alpha\, \Omega(\eta) + |\xi|^2\, \Omega(\eta^{2/\alpha-1}).
\end{aligned}$$
In the last equality we use that
$$\lim_{\eta \downarrow 0} \frac{1}{\eta}\bigg[\frac{\eta}{1 - (1-\eta)^\alpha} - \frac{1}{\alpha}\bigg] = \frac{\alpha-1}{2\alpha} \qquad\text{and}\qquad \lim_{\eta \downarrow 0} \eta^{-2/\alpha+1}\, \frac{\eta^{2/\alpha}}{1 - (1-\eta)^2} = \frac{1}{2}.$$
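Both limits are elementary Taylor expansions and can be confirmed numerically; the snippet below is a small illustrative check (with the arbitrary choice $\alpha = 3/2$) that evaluates the two ratios at a small $\eta$.

```python
import math

alpha = 1.5   # illustrative stability index in (1, 2)

def first_ratio(eta):
    # (eta / (1 - (1-eta)**alpha) - 1/alpha) / eta -> (alpha - 1) / (2*alpha)
    return (eta / (1 - (1 - eta) ** alpha) - 1 / alpha) / eta

def second_ratio(eta):
    # eta**(1 - 2/alpha) * eta**(2/alpha) / (1 - (1-eta)**2) = 1 / (2 - eta) -> 1/2
    return eta ** (1 - 2 / alpha) * eta ** (2 / alpha) / (1 - (1 - eta) ** 2)
```

For $\eta = 10^{-3}$ the first ratio is already within $O(\eta)$ of $(\alpha-1)/(2\alpha) = 1/6$, and the second equals $1/(2-\eta)$ exactly.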
Here and in the following, the notation $f(\eta) = \Omega(g(\eta))$ as $\eta \downarrow 0$ means that $\lim_{\eta \downarrow 0} f(\eta)/g(\eta)$ is
a positive (finite) constant, where $f$ and $g$ are some positive functions. With the elementary inequality $e^x \ge 1 + x$ for $x \in \mathbb{R}$ we see for all $|\xi| \le 1$ and sufficiently small $\eta > 0$ that
$$\begin{aligned}
\mathbb{E}\big[e^{i\xi \tilde Y_\eta}\big] &\ge \exp\Big(-\frac{1}{\alpha}|\xi|^\alpha - |\xi|^\alpha\, \Omega(\eta) + |\xi|^2\, \Omega(\eta^{2/\alpha-1})\Big)\\
&\ge e^{-|\xi|^\alpha/\alpha}\Big[1 - |\xi|^\alpha\, \Omega(\eta) + |\xi|^2\, \Omega(\eta^{2/\alpha-1})\Big],
\end{aligned}$$
which yields
$$\begin{aligned}
\int_{-1}^{1} \Big(\mathbb{E}\big[e^{i\xi \tilde Y_\eta}\big] - \mathbb{E}\big[e^{i\xi \alpha^{-1/\alpha} Z_1}\big]\Big)\, d\xi
&\ge \int_{-1}^{1} e^{-\alpha^{-1}|\xi|^\alpha}\Big(-|\xi|^\alpha\, \Omega(\eta) + |\xi|^2\, \Omega(\eta^{2/\alpha-1})\Big)\, d\xi\\
&= -\Omega(\eta) + \Omega(\eta^{2/\alpha-1}) = \Omega(\eta^{2/\alpha-1}).
\end{aligned} \tag{B.5}$$
Define
$$h(x) := \frac{1}{M}\Big(\frac{\sin x}{x}\,\mathbf{1}_{\{x \ne 0\}} + \mathbf{1}_{\{x = 0\}}\Big), \qquad x \in \mathbb{R},$$
where
$$M := \sup_{x \in \mathbb{R}\setminus\{0\}} \bigg|\frac{x \cos x - \sin x}{x^2}\bigg| \in (0, \infty).$$
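The normalization by $M$ makes $h$ a 1-Lipschitz function, since $|h'(x)| = |(x\cos x - \sin x)/x^2|/M \le 1$ by the definition of $M$, which is the natural requirement on a Wasserstein-1 test function. A small numerical sketch (illustrative; the grid bounds are ad hoc, justified by $|g(x)| \le (|x|+1)/x^2$ for large $|x|$ and the oddness of $g$):

```python
import math

def g(x):
    """Derivative of sin(x)/x, i.e. (x*cos(x) - sin(x)) / x**2."""
    return (x * math.cos(x) - math.sin(x)) / (x * x)

# Approximate M = sup_{x != 0} |g(x)| by grid search; g is odd and |g| decays
# like 1/|x| for large |x|, so a grid on (0, 50] suffices.
M = max(abs(g(0.001 * i)) for i in range(1, 50001))

def h(x):
    """Normalized sinc-type test function; |h'| = |g|/M <= 1, so h is 1-Lipschitz."""
    return (math.sin(x) / x if x != 0 else 1.0) / M
```

The supremum is attained near $x \approx 2.08$, where the slope of $\sin(x)/x$ is steepest.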
References
[1] V. Bally, D. Talay, The law of the Euler scheme for stochastic differential equations, Probab. Theory Related
Fields 104 (1996) 43–60.
[2] C. Berg, G. Forst, Potential Theory on Locally Compact Abelian Groups, in: Ergebnisse der Mathematik und
ihrer Grenzgebiete. II. Ser., Bd. 87, Springer, Berlin, 1975.
[3] R.M. Blumenthal, R.K. Getoor, Some theorems on stable processes, Trans. Amer. Math. Soc. 95 (1960)
263–273.
[4] B. Böttcher, R.L. Schilling, J. Wang, Lévy-Type Processes: Construction, Approximation and Sample Path
Properties, in: Lecture Notes in Mathematics, Lévy Matters III, vol. 2099, Springer, Cham, 2013.
[5] J.M. Chambers, C.L. Mallows, B. Stuck, A method for simulating stable random variables, J. Amer. Statist.
Assoc. 71 (1976) 340–344.
[6] P. Chen, J. Lu, L. Xu, Approximation to stochastic variance reduced gradient Langevin dynamics by stochastic
delay differential equations, Appl. Math. Optim. 85 (2022) 1–40.
[7] P. Chen, I. Nourdin, L. Xu, X. Yang, Multivariate stable approximation in Wasserstein distance by Stein’s
method, 2019, Preprint arXiv:1911.12917.
[8] P. Chen, L. Xu, Approximation to stable law by the Lindeberg principle, J. Math. Anal. Appl. 480 (2019)
123338.
[9] K. Dareiotis, M. Gerencsér, On the regularisation of the noise for the Euler–Maruyama scheme with irregular
drift, Electron. J. Probab. 25 (2020) 1–18.
[10] C.-S. Deng, R.L. Schilling, Y.-H. Song, Subgeometric rates of convergence for Markov processes under
subordination, Adv. Appl. Probab. 49 (2017) 162–181.
[11] W. Fang, M.B. Giles, Adaptive Euler–Maruyama method for SDEs with non-globally Lipschitz drift, in:
International Conference on Monte Carlo and Quasi-Monte Carlo Methods in Scientific Computing, Springer,
Cham, 2016, pp. 217–234.
[12] X. Fang, Q.-M. Shao, L. Xu, Multivariate approximations in Wasserstein distance by Stein’s method and
Bismut’s formula, Probab. Theory Related Fields 174 (2019) 945–979.
[13] P. Hall, Two-sided bounds on the rate of convergence to a stable law, Z. Wahrscheinlichkeitstheor. Verwandte
Geb. 57 (1981) 349–364.
[14] J. Jacod, The Euler scheme for Lévy driven stochastic differential equations: limit theorems, Ann. Probab. 32
(2004) 1830–1872.
[15] A. Janicki, Z. Michna, A. Weron, Approximation of stochastic differential equations driven by α-stable Lévy
motion, Appl. Math. 24 (1996) 149–168.
[16] X. Jin, G. Pang, L. Xu, X. Xu, An approximation to steady-state of M/Ph/n + M queue, 2021, Preprint
arXiv:2109.03623.
[17] F. Kühn, R.L. Schilling, Strong convergence of the Euler–Maruyama approximation for a class of Lévy-driven
SDEs, Stochastic Process. Appl. 129 (2019) 2654–2680.
[18] V. Lemaire, An adaptive scheme for the approximation of dissipative systems, Stochastic Process. Appl. 117
(2007) 1491–1518.
[19] X. Li, Q. Ma, H. Yang, C. Yuan, The numerical invariant measure of stochastic differential equations with
Markovian switching, SIAM J. Numer. Anal. 56 (2018) 1435–1455.
[20] J. Lu, Y. Tan, L. Xu, Central limit theorem and self-normalized Cramér-type moderate deviation for
Euler–Maruyama scheme, Bernoulli 28 (2020) 937–964.
[21] S.P. Meyn, R.L. Tweedie, Stability of Markovian processes I: Criteria for discrete-time chains, Adv. Appl.
Probab. 24 (1992) 542–574.
[22] S.P. Meyn, R.L. Tweedie, Stability of Markovian processes III: Foster–Lyapunov criteria for continuous-time
processes, Adv. Appl. Probab. 25 (1993) 518–548.
[23] R. Mikulevičius, F. Xu, On the rate of convergence of strong Euler approximation for SDEs driven by Lévy
processes, Stochastics 90 (2018) 569–604.
[24] R. Modarres, J.P. Nolan, A method for simulating stable random vectors, Comput. Statist. 9 (1994) 11–19.
[25] T.H. Nguyen, U. Simsekli, G. Richard, Non-asymptotic analysis of Fractional Langevin Monte Carlo for non-
convex optimization, in: International Conference on Machine Learning, in: Proceedings of Machine Learning
Research, 2019, pp. 4810–4819.
[26] J.P. Nolan, An overview of multivariate stable distributions, 2008, Online: http://hdl.handle.net/1961/auislandora:68717 (accessed January 12, 2023).
[27] J.P. Nolan, Univariate Stable Distributions: Models for Heavy Tailed Data, in: Springer Series in Operations
Research and Financial Engineering, Springer, Cham, 2020.
[28] J. Norris, Simplified Malliavin calculus, in: Séminaire de Probabilités XX 1984/85, Springer, Berlin, 1986,
pp. 101–130.
[29] D. Nualart, The Malliavin Calculus and Related Topics, second ed., Springer, Berlin, 2006.
[30] G. Pagès, F. Panloup, Unadjusted Langevin algorithm with multiplicative noise: Total variation and Wasserstein
bounds, 2020, Preprint arXiv:2012.14310.
[31] O.M. Pamen, D. Taguchi, Strong rate of convergence for the Euler–Maruyama approximation of SDEs with
Hölder continuous drift coefficient, Stochastic Process. Appl. 127 (2017) 2542–2559.
[32] F. Panloup, Recursive computation of the invariant measure of a stochastic differential equation driven by a
Lévy process, Ann. Appl. Probab. 18 (2008) 379–426.
[33] P.E. Protter, Stochastic Integration and Differential Equations, second ed., Springer, Berlin, 2004.
[34] P. Protter, D. Talay, The Euler scheme for Lévy driven stochastic differential equations, Ann. Probab. 25
(1997) 393–423.
[35] J.M. Sanz-Serna, K.C. Zygalakis, Wasserstein distance estimates for the distributions of numerical
approximations to ergodic stochastic differential equations, J. Mach. Learn. Res. 22 (2021) 1–37.
[36] K. Sato, Lévy Processes and Infinitely Divisible Distributions, Cambridge University Press, Cambridge, 1999.
[37] J. Shao, Weak convergence of Euler–Maruyama’s approximation for SDEs under integrability condition, 2018,
Preprint arXiv:1808.07250.
[38] U. Simsekli, L. Sagun, M. Gurbuzbalaban, A tail-index analysis of stochastic gradient noise in deep neural
networks, in: International Conference on Machine Learning, in: Proceedings of Machine Learning Research,
2019, pp. 5827–5837.
[39] D. Talay, Second-order discretization schemes of stochastic differential systems for the computation of the
invariant law, Stochastics 29 (1990) 13–36.
[40] B. Tarami, M. Avaji, Convergence of Euler–Maruyama method for stochastic differential equations driven by
α-stable Lévy motion, J. Math. Ext. 12 (2018) 31–53.
[41] J. Wang, L^p-Wasserstein distance for stochastic differential equations driven by Lévy processes, Bernoulli 22
(2016) 1598–1616.
[42] L. Xu, Approximation of stable law in Wasserstein-1 distance by Stein’s method, Ann. Appl. Probab. 29
(2019) 458–504.
[43] X. Zhang, Derivative formulas and gradient estimates for SDEs driven by α-stable processes, Stochastic
Process. Appl. 123 (2013) 1213–1228.
[44] P. Zhou, J. Feng, C. Ma, C. Xiong, S.C.H. Hoi, Towards theoretically understanding why SGD generalizes
better than Adam in deep learning, Adv. Neural Inf. Process. Syst. 33 (2020) 21285–21296.