

Stochastic Processes and their Applications 163 (2023) 136–167



Approximation of the invariant measure of stable SDEs by an Euler–Maruyama scheme
Peng Chen^a, Chang-Song Deng^{b,∗}, René L. Schilling^c, Lihu Xu^{d,e}
a School of Mathematics, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China
b School of Mathematics and Statistics, Wuhan University, Wuhan 430072, China
c TU Dresden, Fakultät Mathematik, Institut für Mathematische Stochastik, 01062 Dresden, Germany
d Department of Mathematics, Faculty of Science and Technology, University of Macau, Macao Special Administrative
Region of China
e Zhuhai UM Science & Technology Research Institute, Zhuhai, China

Received 14 January 2023; received in revised form 22 May 2023; accepted 7 June 2023
Available online 13 June 2023

Abstract
We propose two Euler–Maruyama (EM) type numerical schemes in order to approximate the invariant
measure of a stochastic differential equation (SDE) driven by an α-stable Lévy process (1 < α < 2): an
approximation scheme with the α-stable distributed noise and a further scheme with Pareto-distributed
noise. Using a discrete version of Duhamel’s principle and Bismut’s formula in Malliavin calculus, we
2
prove that the error bounds in Wasserstein-1 distance are in the order of η1−ϵ and η α −1 , respectively,
where ϵ ∈ (0, 1) is arbitrary and η is the step size of the approximation schemes. For the Pareto-driven
2
scheme, an explicit calculation for Ornstein–Uhlenbeck α-stable process shows that the rate η α −1 cannot
be improved.
© 2023 Elsevier B.V. All rights reserved.

MSC: 60H10; 37M25; 60G51; 60H07; 60H35; 60G52


Keywords: Euler–Maruyama method; Invariant measure; Convergence rate; Wasserstein distance

∗ Corresponding author.
E-mail addresses: chenpengmath@nuaa.edu.cn (P. Chen), dengcs@whu.edu.cn (C.-S. Deng),
rene.schilling@tu-dresden.de (R.L. Schilling), lihuxu@um.edu.mo (L. Xu).

https://doi.org/10.1016/j.spa.2023.06.001
0304-4149/© 2023 Elsevier B.V. All rights reserved.

1. Introduction
We study the solution (X t )t≥0 of the following stochastic differential equation (SDE) driven
by an α-stable Lévy process:
dX_t = b(X_t) dt + dZ_t, X_0 = x, (1.1)
where x ∈ R^d is the starting point, (Z_t)_{t≥0} is a d-dimensional, rotationally invariant α-stable
Lévy process with index α ∈ (1, 2), and b : R^d → R^d is a function satisfying Assumption A.
The Euler–Maruyama (EM) scheme of the SDE (1.1), with a step size η ∈ (0, 1), is defined
by
Y0 = x, Yk+1 = Yk + ηb(Yk ) + (Z (k+1)η − Z kη ), k = 0, 1, 2, . . . , (1.2)
see, e.g. [17,40]. It is easy to see that (Y_k)_{k≥0} is a Markov chain. A drawback of the scheme
(1.2) is that there is no explicit representation for the probability density of the α-stable noise
Z_{(k+1)η} − Z_{kη}, α ∈ (1, 2), which makes the numerical simulation complicated and numerically
expensive; see the very recent monograph [27, Section 1.9] for a detailed discussion of
the difficulties arising in multivariate stable distribution simulations. See also [5,24,26] for
sampling stable distributed random variables. In contrast, the Pareto distribution has a simple
probability density and thus can be easily sampled by the classical acceptance and rejection
method. Since the stable and the Pareto distribution have the same tail behavior, and inspired
by the stable central limit theorem (see, e.g. [7,13]), we replace the stable noise in (1.2) with
a Pareto distributed noise, and consider the following EM scheme:
Let Z̃_1, Z̃_2, … be an iid sequence of d-dimensional random vectors which are Pareto
distributed, i.e.
Z̃_1 ∼ p(z) = (α/(σ_{d−1}|z|^{α+d})) 1_{(1,∞)}(|z|); (1.3)
we denote by σ_{d−1} = 2π^{d/2}/Γ(d/2) the surface area of the unit sphere S^{d−1} ⊂ R^d. We will
approximate the SDE (1.1) by the following approximation scheme:
Ỹ_0 = x, Ỹ_{k+1} = Ỹ_k + ηb(Ỹ_k) + (η^{1/α}/σ) Z̃_{k+1}, k = 0, 1, 2, …, (1.4)
where η > 0 is the step size, σ^α = α/(σ_{d−1}C_{d,α}), and
C_{d,α} = (∫_{R^d∖{0}} |ξ|^{−α}(1 − cos⟨ξ, y⟩) dy/|y|^{α+d})^{−1} = α 2^{α−1} π^{−d/2} Γ((d+α)/2)/Γ(1 − α/2), (1.5)

see e.g. [4, Example 2.4.d)] and [2, III.18.23]. It is easy to see that (Ỹk )k≥0 is a Markov chain.
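The scheme (1.4) is straightforward to implement: under (1.3) the radius |Z̃_1| is Pareto(α) distributed on (1, ∞), so it can be sampled by inverse transform (even without acceptance–rejection), and the direction is uniform on S^{d−1}. The following Python sketch (our illustrative choices: drift b(x) = −x, α = 1.5, d = 2; this is not code from the paper) simulates (1.4):

```python
import numpy as np
from scipy.special import gamma

def pareto_noise(alpha, d, size, rng):
    """Sample `size` vectors with the density (1.3):
    p(z) = alpha / (sigma_{d-1} |z|^{alpha+d}) for |z| > 1."""
    # Radial part: P(|Z| > r) = r^{-alpha}, r > 1  (inverse transform).
    r = rng.uniform(size=size) ** (-1.0 / alpha)
    # Angular part: uniform on the unit sphere S^{d-1}.
    g = rng.standard_normal((size, d))
    theta = g / np.linalg.norm(g, axis=1, keepdims=True)
    return r[:, None] * theta

def em_pareto(b, x0, alpha, eta, n_steps, rng):
    """EM scheme (1.4): Y_{k+1} = Y_k + eta*b(Y_k) + (eta^{1/alpha}/sigma)*Z_{k+1}."""
    d = len(x0)
    sigma_d1 = 2 * np.pi ** (d / 2) / gamma(d / 2)        # area of S^{d-1}
    C = (alpha * 2 ** (alpha - 1) * np.pi ** (-d / 2)
         * gamma((d + alpha) / 2) / gamma(1 - alpha / 2))  # constant (1.5)
    sigma = (alpha / (sigma_d1 * C)) ** (1.0 / alpha)      # sigma^alpha = alpha/(sigma_{d-1} C)
    y = np.array(x0, dtype=float)
    noise = pareto_noise(alpha, d, n_steps, rng)
    path = [y.copy()]
    for k in range(n_steps):
        y = y + eta * b(y) + eta ** (1.0 / alpha) / sigma * noise[k]
        path.append(y.copy())
    return np.array(path)

rng = np.random.default_rng(0)
path = em_pareto(lambda x: -x, x0=[2.0, -1.0], alpha=1.5, eta=1e-2,
                 n_steps=5000, rng=rng)
```

Since α > 1, the Pareto noise has a finite mean, and the dissipative drift keeps the chain stochastically bounded, in line with Lemma 1.8 below.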
We aim to study the error bounds in the Wasserstein-1 distance for the above two schemes,
in particular for large time.

1.1. Motivation, contribution and method

The EM approximation of SDEs is a classical research topic, both in probability theory and
in numerical analysis, and over the past decades there have been many contributions, see for
instance [1,9,11,18,30,37] for SDEs driven by a Brownian motion, and [14,15,17,23,31,34] for
SDEs driven by Lévy noise. Most of these papers focus on error bounds of the solution to the
SDE and the EM approximation in a time interval [0, T ] for some finite T > 0; typically, there
appears a constant C T (depending on T ) in the error bounds, which tends to ∞ as T → ∞.

The recent use of Langevin sampling in machine learning has caused a surge of interest in
error bounds between the invariant measures of the solution of the SDE and of its EM discretization,
see e.g. [6,16,25,38,44]. We refer the reader to [19,35,39] for discrete schemes for the invariant
measure of SDEs driven by a Brownian motion. Panloup [32] uses certain recursive procedures
to compute the invariant measure of Lévy-driven SDEs, but he does not determine the
convergence rate. To the best of our knowledge, our paper is the first contribution studying
the bound between the invariant measures of solutions to SDEs driven by stable noise and
their EM discretizations.
A further motivation of our research is to show that the EM scheme with Pareto distributed
innovations can indeed be used to approximate the invariant measures of SDEs driven by an
α-stable noise with α ∈ (1, 2). In order to speed up the EM scheme, actual implementations of the
discretization (1.4) use iid random variables (Z̃_k)_{k≥1} with a Pareto distribution rather than stable
innovations. The advantage of this approach is that the Pareto distribution has an explicitly
given density (see (1.3)) which allows for a much simpler sampling than stable random
variables. We also show that the convergence rate η2/α−1 is optimal for the Ornstein–Uhlenbeck
process on R.
For α = 2, the stable process Z_t is (essentially) a d-dimensional standard Brownian motion,
and the convergence rate for the corresponding invariant measure is √η (up to a logarithmic
correction), see for instance [12]. Our optimal rate η^{2/α−1} tends to O(1) rather than to √η as
α ↑ 2; this type of "phase transition" has been observed in many situations, e.g. in the stable
law CLT [13,42]. This is due to the fact that α-stable distributions with α ∈ (0, 2) do not have
second moments, while the 2-stable distribution is the Gaussian law, which has moments of all orders.
Our approach to proving the main results is via a discrete version of Duhamel's principle
and Bismut's formula in Malliavin calculus. More precisely, we split the stochastic process
(X_t)_{t≥0} into smaller pieces (X_t)_{(k−1)η≤t≤kη} for k ≥ 1 and replace (X_t)_{(k−1)η≤t≤kη} with Ỹ_k and
Y_k, respectively. This procedure is reminiscent of Lindeberg's method for the CLT. In order
to bound the error caused by these replacements, we use the semigroup Pt given by (X t )t≥0
and study its regularity using Malliavin’s calculus for jump processes. In order to bound the
second-order derivative of Pt , we need to adopt the framework of the time-change argument
established in [43] and use the Bismut formula.

1.2. Notation

Whenever we want to emphasize the starting point X_0 = x for a given x ∈ R^d, we will
write X_t^x instead of X_t; we use this also for Y_k^y and Ỹ_k^y for a given y ∈ R^d. By P_t, Q_k and
Q̃_k we denote the Markov semigroups of X_t, Y_k and Ỹ_k, respectively, i.e.
P_t f(x) = E f(X_t^x), Q_k f(x) = E f(Y_k^x), and Q̃_k f(x) = E f(Ỹ_k^x)
for a bounded measurable function f : R^d → R, x ∈ R^d, t ≥ 0 and k = 0, 1, 2, ….
As usual, C(R^d, R) denotes the continuous functions f : R^d → R, and C^2(R^d, R)
[C_b^2(R^d, R)] are the twice continuously differentiable functions [which are bounded together
with all their derivatives]; ∇f(x) ∈ R^d and ∇^2 f(x) ∈ R^{d×d} are the gradient and the Hessian.
For v, v_1, v_2, x ∈ R^d, the directional derivatives are given by
∇_v f(x) = ⟨∇f(x), v⟩ = lim_{ε→0} (f(x + εv) − f(x))/ε,
∇_{v_2}∇_{v_1} f(x) = ⟨∇^2 f(x), v_1 v_2^⊤⟩_HS = lim_{ε→0} (∇_{v_1} f(x + εv_2) − ∇_{v_1} f(x))/ε,
where ⟨A, B⟩_HS := Σ_{i,j=1}^d A_{ij}B_{ij} for A, B ∈ R^{d×d}. The Hilbert–Schmidt norm of a matrix
A ∈ R^{d×d} is ‖A‖_HS = (Σ_{i,j=1}^d A_{ij}^2)^{1/2}.
The directional derivatives are similarly defined for (sufficiently smooth) vector-valued
functions f = (f_1, f_2, …, f_d)^⊤ : R^d → R^d: for v, v_1, v_2, x ∈ R^d, ∇_v f(x) =
(∇_v f_1, ∇_v f_2, …, ∇_v f_d)^⊤ and ∇_{v_2}∇_{v_1} f(x) = (∇_{v_2}∇_{v_1} f_1, …, ∇_{v_2}∇_{v_1} f_d)^⊤.
For f ∈ C_b^2(R^d, R), we will use the supremum and the supremum Hilbert–Schmidt norms
‖∇f‖_∞ = sup_{x∈R^d} |∇f(x)|, ‖∇^2 f‖_{HS,∞} = sup_{x∈R^d} ‖∇^2 f(x)‖_HS.

The Wasserstein-1 distance between two probability measures µ_1 and µ_2 on R^d is defined
as
W_1(µ_1, µ_2) = inf_{(X,Y)∈C(µ_1,µ_2)} E|X − Y|, (1.6)
where C(µ_1, µ_2) is the set of all couplings of µ_1, µ_2, i.e. all random variables
with values in R^{2d} with marginals µ_1, µ_2. We also have the following dual description of the
Wasserstein distance:
W_1(µ_1, µ_2) = sup_{h∈Lip(1)} |µ_1(h) − µ_2(h)|,
where Lip(1) = {h : R^d → R; |h(y) − h(x)| ≤ |y − x|} and µ_i(h) = ∫_{R^d} h(x) µ_i(dx), i = 1, 2.
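For empirical measures on R with equally many atoms, the infimum in (1.6) is attained by the monotone coupling, i.e. by pairing order statistics; a quick numerical illustration (the sample sizes and distributions here are our own choices):

```python
import numpy as np
from scipy.stats import wasserstein_distance

rng = np.random.default_rng(1)
x = rng.normal(0.0, 1.0, size=4000)
y = rng.normal(0.5, 2.0, size=4000)

# Monotone coupling: pair the order statistics of the two samples.
w1_sorted = np.mean(np.abs(np.sort(x) - np.sort(y)))

# Reference value: scipy computes the same optimal transport cost.
w1_ref = wasserstein_distance(x, y)
assert np.isclose(w1_sorted, w1_ref)
```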
We will frequently need the following weight function
Vβ (x) = (1 + |x|2 )β/2 , x ∈ Rd , β ≥ 0.
Finally, we write ⌊x⌋ for the largest integer which is less than or equal to x ∈ R, and
throughout Cd,α is the constant (1.5).

1.3. Assumptions and main results

Throughout this paper, we make the following assumption:

Assumption A. The function b : R^d → R^d is twice continuously differentiable and there
exist constants θ_1, θ_2 > 0 and θ_3, K ≥ 0 such that
⟨b(x) − b(y), x − y⟩ ≤ −θ_1|x − y|^2 + K ∀x, y ∈ R^d (1.7)
and
|∇_v b(x)| ≤ θ_2|v|, |∇_{v_1}∇_{v_2} b(x)| ≤ θ_3|v_1||v_2| ∀v, v_1, v_2, x ∈ R^d. (1.8)

Remark 1.1. Note that (1.8) immediately implies the following linear growth condition
|b(x) − b(0)| ≤ θ2 |x|, x ∈ Rd . (1.9)
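A concrete drift satisfying Assumption A (our example, not one used in the paper) is b(x) = −2x + sin(x) acting componentwise: it satisfies (1.7) with θ_1 = 1 and K = 0, since (sin x_i − sin y_i)(x_i − y_i) ≤ (x_i − y_i)^2, and (1.8) with θ_2 = 3, θ_3 = 1. A numerical spot check of the dissipativity condition (1.7):

```python
import numpy as np

def b(x):
    # Example drift: componentwise b_i(x) = -2*x_i + sin(x_i).
    return -2.0 * x + np.sin(x)

rng = np.random.default_rng(2)
theta1 = 1.0   # claimed dissipativity constant in (1.7), with K = 0
for _ in range(10000):
    x, y = rng.normal(size=(2, 3)) * 10
    lhs = np.dot(b(x) - b(y), x - y)
    # (1.7): <b(x)-b(y), x-y> <= -theta1*|x-y|^2 (small slack for rounding)
    assert lhs <= -theta1 * np.dot(x - y, x - y) + 1e-9
```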
Under Assumption A, we will show that both (X t )t≥0 and (Ỹk )k≥0 are ergodic; we write µ
and µ̃η , respectively, for their invariant measures, see Propositions 1.5 and 1.7. Throughout the
paper the constants C, c1 , c2 , c3 , c4 and λ may depend on θ1 , θ2 , θ3 , K , α, d, |b(0)| and β for
some constant β ∈ [1, α), but we often suppress this in our notation; moreover, the exact values
of the constants may vary from line to line. Our main results are the following two theorems:

Theorem 1.2. Let (X_t)_{t≥0} and (Ỹ_k)_{k≥0} be defined by (1.1) and (1.4) (step size η), and denote
by µ and µ̃_η their invariant measures. Under Assumption A, there exists a constant C such
that the following two statements hold:
(1) For every N ≥ 2 and step size η < min{1, θ_1/(8θ_2^2), 1/θ_1}, one has
W_1(law(X_{ηN}), law(Ỹ_N)) ≤ C(1 + |x|)η^{2/α−1}.
(2) For every step size η < min{1, θ_1/θ_2^2, 1/θ_1}, one has
W_1(µ, µ̃_η) ≤ Cη^{2/α−1}.

Theorem 1.3. Let (X_t)_{t≥0} and (Y_k)_{k≥0} be defined by (1.1) and (1.2) (step size η), and denote
by µ and µ_η their invariant measures. Under Assumption A, for any β ∈ [1, α), there exists
a constant C depending on β such that the following two statements hold:
(1) For every N ≥ 2 and step size η < min{1, θ_1/(8θ_2^2), 1/θ_1}, one has
W_1(law(X_{ηN}), law(Y_N)) ≤ C(1 + |x|^β)η^{1+1/α−1/β}.
(2) For every step size η < min{1, θ_1/θ_2^2, 1/θ_1}, one has
W_1(µ, µ_η) ≤ Cη^{1+1/α−1/β}.

Remark 1.4. The rate η^{2/α−1} in the first theorem is optimal for the one-dimensional
Ornstein–Uhlenbeck process, see Proposition B.1.
The proofs of Theorems 1.2 and 1.3 are presented in Section 2. In Section 3, we use a
time-change argument and the Bismut formula to prove Lemma 2.1, which is the key to the
proof of our main result. Appendix A includes the proofs of the propositions in this section
for completeness. Finally, in Appendix B, the exact convergence rate η^{2/α−1} is attained for
the Ornstein–Uhlenbeck process on R, which shows that the rate in Theorem 1.2 (2) is sharp.

1.4. Auxiliary propositions

Here we collect a few auxiliary properties of (X t )t≥0 and (Yk )k≥0 . The proofs are standard,
but we include them in Appendix A to be self-contained. Recall that Vβ (x) = (1 + |x|2 )β/2 .

Proposition 1.5. Let Assumption A hold and denote by (X_t)_{t≥0} the solution to the SDE (1.1).
Then, (X_t)_{t≥0} admits a unique invariant probability measure µ such that for 1 ≤ β < α
sup_{|f|≤V_β} |E[f(X_t^x)] − µ(f)| ≤ c_1 V_β(x)e^{−c_2 t}, t > 0, (1.10)
for some constants c_1, c_2 > 0. In particular, there exists a constant C > 0 such that
E|X_t^x|^β ≤ C(1 + |x|^β), t > 0. (1.11)

Proposition 1.6. Under Assumption A, there exist constants C > 0 and λ > 0 such that for
every t > 0 and x, y ∈ R^d
W_1(law(X_t^x), law(X_t^y)) ≤ Ce^{−λt}|x − y|.

Proposition 1.7. Let Assumption A hold and denote by (Y_k)_{k≥0} and (Ỹ_k)_{k≥0} the Markov
chains defined by (1.2) and (1.4), respectively. Assume that the step size satisfies η <
min{1, θ_1/θ_2^2, 1/θ_1}. Then
(1) the chain (Y_k)_{k≥0} admits a unique invariant measure µ_η such that for all x ∈ R^d and
k > 0,
sup_{|f|≤V_1} |E f(Y_k^x) − µ_η(f)| ≤ c_1 V_1(x)e^{−c_2 k}, (1.12)
for some constants c_1, c_2 > 0;
(2) the chain (Ỹ_k)_{k≥0} admits a unique invariant measure µ̃_η such that for all x ∈ R^d and
k > 0,
sup_{|f|≤V_1} |E f(Ỹ_k^x) − µ̃_η(f)| ≤ c_3 V_1(x)e^{−c_4 k}, (1.13)
for some constants c_3, c_4 > 0.

Lemma 1.8. Let Assumption A hold and denote by (Y_k)_{k≥0} and (Ỹ_k)_{k≥0} the Markov chains
defined by (1.2) and (1.4), respectively. If the step size satisfies η < min{1, θ_1/(8θ_2^2), 1/θ_1}, then
there is a constant C > 0, independent of η, such that
E|Y_k^x|^β ≤ C(1 + |x|^β), (1.14)
E|Ỹ_k^x| ≤ C(1 + |x|), (1.15)
hold for any β ∈ [1, α), x ∈ R^d and k > 0.

2. Proof of Theorems 1.2 and 1.3


We begin with several auxiliary lemmas which will be used to prove Theorems 1.2 and 1.3.

2.1. Auxiliary lemmas

The first auxiliary lemma is about the regularity of the semigroup induced by (X t )t≥0 .

Lemma 2.1. Let h ∈ Lip(1) and X_t^x be the solution to the SDE (1.1). For all vectors
v, v_1, v_2 ∈ R^d and t ∈ (0, 1], we have
|∇_v P_t h(x)| ≤ e^{θ_2}|v| (2.1)
and
|∇_{v_2}∇_{v_1} P_t h(x)| ≤ Ct^{−1/α}|v_1||v_2|, (2.2)
for some constant C > 0.

Remark 2.2. Our proof of Lemma 2.1 in Section 3 is based on Norris’ approach to Malliavin
calculus [28] and Bismut’s formula. One of the referees kindly pointed out an elegant alternative
proof using the approach in [29] and the Bismut–Elworthy–Li formula. Although the alternative
proof is shorter, we decided to keep our own approach, since we refer to exactly these
arguments in several other publications. In the arXiv version of the present paper, we will
add the referee’s alternative proof as an appendix.

Using the inequalities (1.11) and (1.14), we can obtain the following estimates:

Lemma 2.3. Let (X_t)_{t≥0} be the solution to the SDE (1.1) and (Y_k)_{k≥0} be the Markov chain
defined by (1.2). If the step size satisfies η < min{1, θ_1/(8θ_2^2), 1/θ_1}, then the following estimates
hold for all t ∈ (0, 1], β ∈ [1, α):
E|Y_1^x − x|^β ≤ C(1 + |x|^β)η^{β/α}, (2.3)
E|X_t^x − x|^β ≤ C(1 + |x|^β)t^{β/α}, (2.4)
E|X_η^x − Y_1^x|^β ≤ C(1 + |x|^β)η^{β+β/α}. (2.5)

Proof. The first inequality follows immediately from
E|Y_1^x − x|^β = E|ηb(x) + Z_η|^β ≤ 2[η^β|b(x)|^β + E|Z_η|^β] ≤ C(1 + |x|^β)η^{β/α}.
From the Hölder inequality and (1.11), we obtain
E|X_t^x − x|^β = E|∫_0^t b(X_s^x) ds + Z_t|^β
≤ 2E|∫_0^t b(X_s^x) ds|^β + 2E|Z_t|^β
≤ 2t^{β−1} ∫_0^t E|b(X_s^x)|^β ds + 2E|Z_1|^β t^{β/α}
≤ C(1 + |x|^β)t^{β/α},
which implies the second inequality.
For the last inequality, the Hölder inequality, (1.9) and (2.4) imply
E|X_η^x − Y_1^x|^β = E|∫_0^η [b(X_s^x) − b(x)] ds|^β
≤ η^{β−1} ∫_0^η E|b(X_s^x) − b(x)|^β ds
≤ θ_2^β η^{β−1} ∫_0^η E|X_s^x − x|^β ds
≤ C(1 + |x|^β)η^{β−1} ∫_0^η s^{β/α} ds
≤ C(1 + |x|^β)η^{β+β/α}. □
In order to prove Theorem 1.2, we need the following two lemmas. The first is just an
intermediate step for the proof of the second lemma, which is the key to proving Theorem 1.2.
Notice that the fractional Laplacian operator (−∆)^{α/2} is the infinitesimal generator of the
rotationally invariant α-stable Lévy process (Z_t)_{t≥0}; it is defined as a principal value (p.v.)
integral: for any f ∈ C^2(R^d, R),
(−∆)^{α/2} f(x) = C_{d,α} · p.v. ∫_{R^d} (f(x + y) − f(x)) dy/|y|^{α+d}. (2.6)
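In dimension d = 1 the p.v. integral in (2.6) can be symmetrized to (−∆)^{α/2} f(x) = C_{1,α} ∫_0^∞ (f(x + y) + f(x − y) − 2f(x)) y^{−1−α} dy, which removes the singularity for C^2 functions. A numerical sanity check (our test case f = cos, x = 0, α = 1.5; the operator as defined in (2.6) has Fourier symbol −|ξ|^α, so the expected value is −cos(0) = −1):

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import gamma

def frac_lap_1d(f, x, alpha, cut=400.0):
    """Evaluate (2.6) for d = 1 via the symmetrized p.v. integral.
    The truncation error beyond `cut` is O(cut**(-alpha))."""
    # Constant (1.5) with d = 1.
    C = (alpha * 2 ** (alpha - 1) / np.sqrt(np.pi)
         * gamma((1 + alpha) / 2) / gamma(1 - alpha / 2))
    integrand = lambda y: (f(x + y) + f(x - y) - 2 * f(x)) / y ** (1 + alpha)
    near, _ = quad(integrand, 0.0, 1.0)        # integrable y**(1-alpha) singularity
    far, _ = quad(integrand, 1.0, cut, limit=400)
    return C * (near + far)

val = frac_lap_1d(np.cos, 0.0, 1.5)
```

With the normalization (1.5) this returns approximately −1, confirming that C_{d,α} makes the integral representation consistent with the symbol of the stable generator.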


Lemma 2.4. Let α ∈ (1, 2) and let f : R^d → R satisfy ‖∇f‖_∞ < ∞ and ‖∇^2 f‖_{HS,∞} < ∞.
For all x, y ∈ R^d one has
|(−∆)^{α/2} f(x) − (−∆)^{α/2} f(y)| ≤ (2C_{d,α}‖∇^2 f‖_{HS,∞}σ_{d−1}/((2 − α)(α − 1))) |x − y|^{2−α}. (2.7)

Proof. From the definition of the fractional Laplacian (2.6) and the symmetry of the
representing measure we have for any R > 0
(−∆)^{α/2} f(x) = C_{d,α} ∫_{S^{d−1}} ∫_0^∞ (f(x + rθ) − f(x) − r⟨θ, ∇f(x)⟩1_{(0,R)}(r)) r^{−α−1} dr dθ
= C_{d,α} ∫_{S^{d−1}} ∫_0^R ∫_0^r ⟨θ, ∇f(x + θs) − ∇f(x)⟩ r^{−α−1} ds dr dθ
+ C_{d,α} ∫_{S^{d−1}} ∫_R^∞ ∫_0^r ⟨θ, ∇f(x + θs)⟩ r^{−α−1} ds dr dθ.
Then, for all x, y ∈ R^d,
|(−∆)^{α/2} f(x) − (−∆)^{α/2} f(y)|
≤ C_{d,α} ∫_{S^{d−1}} ∫_0^R ∫_0^r |∇f(x + θs) − ∇f(x) − ∇f(y + θs) + ∇f(y)| r^{−α−1} ds dr dθ
+ C_{d,α} ∫_{S^{d−1}} ∫_R^∞ ∫_0^r |∇f(x + θs) − ∇f(y + θs)| r^{−α−1} ds dr dθ.
For the first integral we have
C_{d,α} ∫_{S^{d−1}} ∫_0^R ∫_0^r |∇f(x + θs) − ∇f(x) − ∇f(y + θs) + ∇f(y)| r^{−α−1} ds dr dθ
≤ C_{d,α} ∫_{S^{d−1}} ∫_0^R ∫_0^r (|∇f(x + θs) − ∇f(x)| + |∇f(y + θs) − ∇f(y)|) r^{−α−1} ds dr dθ
≤ 2C_{d,α}‖∇^2 f‖_{HS,∞} ∫_{S^{d−1}} ∫_0^R ∫_0^r s r^{−α−1} ds dr dθ = (C_{d,α}‖∇^2 f‖_{HS,∞}σ_{d−1}/(2 − α)) R^{2−α},
and for the second term we get
C_{d,α} ∫_{S^{d−1}} ∫_R^∞ ∫_0^r |∇f(x + θs) − ∇f(y + θs)| r^{−α−1} ds dr dθ
≤ C_{d,α}‖∇^2 f‖_{HS,∞} ∫_{S^{d−1}} ∫_R^∞ ∫_0^r |x − y| r^{−α−1} ds dr dθ
= (C_{d,α}‖∇^2 f‖_{HS,∞}σ_{d−1}/(α − 1)) |x − y| R^{1−α}.
Hence, the assertion follows upon taking R = |x − y|. □

Lemma 2.5. Let (X_t)_{t≥0} and (Ỹ_k)_{k≥0} be defined by (1.1) and (1.4), respectively. There exists
a constant C > 0 such that for all x ∈ R^d, η ∈ (0, 1) and f : R^d → R satisfying ‖∇f‖_∞ < ∞
and ‖∇^2 f‖_{HS,∞} < ∞,
|P_η f(x) − Q̃_1 f(x)| ≤ C(1 + |x|)(‖∇f‖_∞ + ‖∇^2 f‖_{HS,∞})η^{2/α}.

Proof. From (1.1) and (1.4), we see
E[f(X_η^x) − f(Ỹ_1)] = E[f(x + ∫_0^η b(X_r^x) dr + Z_η) − f(x + ηb(x) + (η^{1/α}/σ)Z̃_1)] = J_1 + J_2,
where
J_1 := E[f(x + ∫_0^η b(X_r^x) dr + Z_η) − f(x + ηb(x) + Z_η)],
J_2 := E[f(x + ηb(x) + Z_η) − f(x + ηb(x))] − E[f(x + ηb(x) + (η^{1/α}/σ)Z̃_1) − f(x + ηb(x))].
We can bound J_1 using (1.8) and (2.4) with β = 1:
|J_1| ≤ ‖∇f‖_∞ E|∫_0^η b(X_r^x) dr − ηb(x)|
≤ ‖∇f‖_∞ ∫_0^η E|b(X_r^x) − b(x)| dr
≤ θ_2‖∇f‖_∞ ∫_0^η E|X_r^x − x| dr
≤ Cθ_2(1 + |x|)‖∇f‖_∞ ∫_0^η r^{1/α} dr
≤ C(1 + |x|)‖∇f‖_∞ η^{1+1/α}.
For the first term of J_2 we use Dynkin's formula (see e.g. [8]) to get
E[f(x + ηb(x) + Z_η) − f(x + ηb(x))] = ∫_0^η E[(−∆)^{α/2} f(x + ηb(x) + Z_r)] dr.
For the second part of J_2 we use that C_{d,α} = ασ_{d−1}^{−1}σ^{−α} and Taylor's formula to see
E[f(x + ηb(x) + (η^{1/α}/σ)Z̃_1) − f(x + ηb(x))]
= (η^{1/α}/σ) E[∫_0^1 ⟨∇f(x + ηb(x) + t(η^{1/α}/σ)Z̃_1), Z̃_1⟩ dt]
= (η^{1/α}/σ) ∫_{|z|≥1} ∫_0^1 ⟨∇f(x + ηb(x) + t(η^{1/α}/σ)z), z⟩ (α dt dz)/(σ_{d−1}|z|^{α+d})
= (αη/(σ_{d−1}σ^α)) ∫_{|z|≥σ^{−1}η^{1/α}} ∫_0^1 ⟨∇f(x + ηb(x) + tz), z⟩ dt dz/|z|^{α+d}
= η(−∆)^{α/2} f(x + ηb(x)) − R,
where
R := ηC_{d,α} ∫_{|z|<σ^{−1}η^{1/α}} ∫_0^1 ⟨∇f(x + ηb(x) + tz), z⟩ dt dz/|z|^{α+d}.
Together, the above estimates yield
|J_2| ≤ |R| + |∫_0^η E[(−∆)^{α/2} f(x + ηb(x) + Z_r)] dr − η(−∆)^{α/2} f(x + ηb(x))|.

Further, we have (using the symmetry of the representing measure)
|R| = ηC_{d,α} |∫_{|z|<σ^{−1}η^{1/α}} ∫_0^1 ⟨∇f(x + ηb(x) + tz) − ∇f(x + ηb(x)), z⟩ dt dz/|z|^{α+d}|
≤ ηC_{d,α} ∫_{|z|<σ^{−1}η^{1/α}} ∫_0^1 |∇f(x + ηb(x) + tz) − ∇f(x + ηb(x))| dt dz/|z|^{α+d−1}
≤ (1/2) ηC_{d,α}‖∇^2 f‖_{HS,∞} ∫_{|z|<σ^{−1}η^{1/α}} dz/|z|^{α+d−2} ≤ C‖∇^2 f‖_{HS,∞} η^{2/α}.
By Lemma 2.4, we also have
|∫_0^η E[(−∆)^{α/2} f(x + ηb(x) + Z_r)] dr − η(−∆)^{α/2} f(x + ηb(x))|
≤ ∫_0^η E|(−∆)^{α/2} f(x + ηb(x) + Z_r) − (−∆)^{α/2} f(x + ηb(x))| dr
≤ C‖∇^2 f‖_{HS,∞} ∫_0^η E[|Z_r|^{2−α}] dr
= C‖∇^2 f‖_{HS,∞} ∫_0^η E[|Z_1|^{2−α}] r^{2/α−1} dr
≤ CE[|Z_1|^{2−α}] ‖∇^2 f‖_{HS,∞} η^{2/α}.
The proof follows if we combine all estimates. □


In order to prove Theorem 1.3, we need two more lemmas. The first is just an intermediate
step for the proof of the second lemma, which is the key to proving Theorem 1.3.

Lemma 2.6. Assume that f satisfies ‖∇f‖_∞ < ∞ and ‖∇^2 f‖_{HS,∞} < ∞. For any β ∈ [1, 2]
and x, y ∈ R^d, we have
|∇f(x) − ∇f(y)| ≤ (2‖∇f‖_∞ + ‖∇^2 f‖_{HS,∞})|x − y|^{β−1}.
Proof. For |x − y| > 1 we have
|∇f(x) − ∇f(y)| ≤ 2‖∇f‖_∞ ≤ 2‖∇f‖_∞ |x − y|^{β−1},
and for |x − y| ≤ 1 we have
|∇f(x) − ∇f(y)| ≤ ‖∇^2 f‖_{HS,∞}|x − y| ≤ ‖∇^2 f‖_{HS,∞}|x − y|^{β−1}. □

Lemma 2.7. Let (X_t)_{t≥0} and (Y_k)_{k≥0} be defined by (1.1) and (1.2), respectively. There exists
a constant C > 0 such that for all x ∈ R^d, η ∈ (0, 1), β ∈ [1, α) and f : R^d → R satisfying
‖∇f‖_∞ < ∞ and ‖∇^2 f‖_{HS,∞} < ∞,
|P_η f(x) − Q_1 f(x)| ≤ C(1 + |x|^β)(‖∇f‖_∞ + ‖∇^2 f‖_{HS,∞})η^{2+1/α−1/β}.

Proof. We use a Taylor expansion to get
E f(X_η^x) − E f(Y_1^x)
= E⟨∇f(Y_1^x), X_η^x − Y_1^x⟩ + E∫_0^1 ⟨∇f(Y_1^x + r(X_η^x − Y_1^x)) − ∇f(Y_1^x), X_η^x − Y_1^x⟩ dr
= E⟨∇f(x + ηb(x) + Z_η) − ∇f(x + ηb(x)), X_η^x − Y_1^x⟩

+ E⟨∇f(x + ηb(x)), X_η^x − Y_1^x⟩
+ E∫_0^1 ⟨∇f(Y_1^x + r(X_η^x − Y_1^x)) − ∇f(Y_1^x), X_η^x − Y_1^x⟩ dr
=: I + II + III.
For the first term I we have
I = E⟨∇f(x + ηb(x) + Z_η) − ∇f(x + ηb(x)), X_η^x − Y_1^x⟩(1_{(0,1]}(|Z_η|) + 1_{(1,∞)}(|Z_η|))
=: I_1 + I_2.
We need the following estimates for the truncated moment of order λ > α and the tail of the
α-stable random variable Z_η, for η ≤ 1:
P(|Z_η| > 1) ≤ cη and E[|Z_η|^λ 1_{(0,1]}(|Z_η|)] ≤ Cη.
Both estimates follow from a straightforward calculation using the standard estimate q_α(η, x) ≤
Cη/(η^{1/α} + |x|)^{α+d} for the density of Z_η, see e.g. [3, Theorem 2.1]. Since β/(β−1) > α, we can
use the Hölder inequality and (2.5) to get
|I_1| ≤ E[|∇f(x + ηb(x) + Z_η) − ∇f(x + ηb(x))| 1_{(0,1]}(|Z_η|) |X_η^x − Y_1^x|]
≤ ‖∇^2 f‖_{HS,∞} E[|Z_η| 1_{(0,1]}(|Z_η|) |X_η^x − Y_1^x|]
≤ ‖∇^2 f‖_{HS,∞} (E[|Z_η|^{β/(β−1)} 1_{(0,1]}(|Z_η|)])^{(β−1)/β} (E|X_η^x − Y_1^x|^β)^{1/β}
≤ C(1 + |x|)‖∇^2 f‖_{HS,∞} η^{(β−1)/β} η^{1+1/α}
= C(1 + |x|)‖∇^2 f‖_{HS,∞} η^{2+1/α−1/β},
whereas by the Hölder inequality
|I_2| ≤ E[|∇f(x + ηb(x) + Z_η) − ∇f(x + ηb(x))| 1_{(1,∞)}(|Z_η|) |X_η^x − Y_1^x|]
≤ 2‖∇f‖_∞ E[1_{(1,∞)}(|Z_η|) |X_η^x − Y_1^x|]
≤ 2‖∇f‖_∞ (E 1_{(1,∞)}(|Z_η|))^{(β−1)/β} (E|X_η^x − Y_1^x|^β)^{1/β}
≤ C(1 + |x|)‖∇f‖_∞ η^{(β−1)/β} η^{1+1/α}
= C(1 + |x|)‖∇f‖_∞ η^{2+1/α−1/β}.
Hence, we have
|I| ≤ C(1 + |x|)(‖∇f‖_∞ + ‖∇^2 f‖_{HS,∞})η^{2+1/α−1/β}.
For II we use Itô's formula and the definitions (1.1), (1.2) of X_η^x and Y_1^x to see
II = E⟨∇f(x + ηb(x)), ∫_0^η [b(X_s^x) − b(x)] ds⟩
= ⟨∇f(x + ηb(x)), ∫_0^η E[b(X_s^x) − b(x)] ds⟩
= ⟨∇f(x + ηb(x)), ∫_0^η ∫_0^s E[∇_{b(X_r^x)} b(X_r^x) + (−∆)^{α/2} b(X_r^x)] dr ds⟩
≤ C‖∇f‖_∞ (1 + |x|)η^2.

In the last inequality we used the estimate (1.9) (for b(X_r^x)) and Lemma 2.4 (for (−∆)^{α/2} b(X_r^x)),
combined with the moment estimate (2.4).
Finally, III is estimated by Lemma 2.6 and (2.5):
|III| ≤ C(‖∇f‖_∞ + ‖∇^2 f‖_{HS,∞}) E|X_η^x − Y_1^x|^β
≤ C(1 + |x|^β)(‖∇f‖_∞ + ‖∇^2 f‖_{HS,∞})η^{β+β/α}
≤ C(1 + |x|^β)(‖∇f‖_∞ + ‖∇^2 f‖_{HS,∞})η^{2+1/α−1/β}.
This finishes the proof. □

2.2. Proof of Theorems 1.2 and 1.3

Proof of Theorem 1.2. Thanks to the discrete version of the classical Duhamel principle, it
is easy to check that for h ∈ Lip(1)
P_{Nη}h(x) − Q̃_N h(x) = Σ_{i=1}^N Q̃_{i−1}(P_η − Q̃_1)P_{(N−i)η}h(x). (2.8)
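The identity (2.8) is the operator version of the elementary telescoping formula A^N − B^N = Σ_{i=1}^N B^{i−1}(A − B)A^{N−i}, here with A = P_η and B = Q̃_1 acting on functions. The algebraic identity itself is easy to verify with random matrices standing in for the one-step operators:

```python
import numpy as np

rng = np.random.default_rng(3)
N, d = 7, 4
A = 0.5 * rng.normal(size=(d, d))   # stand-in for the one-step semigroup P_eta
B = 0.5 * rng.normal(size=(d, d))   # stand-in for the one-step scheme Q_1

# Telescoping: sum_{i=1}^N B^{i-1} (A - B) A^{N-i} collapses to A^N - B^N.
lhs = np.linalg.matrix_power(A, N) - np.linalg.matrix_power(B, N)
rhs = sum(np.linalg.matrix_power(B, i - 1) @ (A - B) @ np.linalg.matrix_power(A, N - i)
          for i in range(1, N + 1))
assert np.allclose(lhs, rhs)
```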

Then we have
W_1(law(X_{ηN}), law(Ỹ_N)) = sup_{h∈Lip(1)} |P_{Nη}h(x) − Q̃_N h(x)|
≤ Σ_{i=1}^{N−1} sup_{h∈Lip(1)} |Q̃_{i−1}(P_η − Q̃_1)P_{(N−i)η}h(x)| (2.9)
+ sup_{h∈Lip(1)} |Q̃_{N−1}(P_η − Q̃_1)h(x)|.

First, we bound the last term. By (2.4) with β = 1, (1.4) and (1.8), for h ∈ Lip(1) and η < 1,
|(P_η − Q̃_1)h(x)| = |Eh(X_η^x) − Eh(Ỹ_1)|
≤ |Eh(X_η^x) − h(x)| + |Eh(Ỹ_1) − h(x)|
≤ E|X_η^x − x| + E|Ỹ_1 − x|
≤ C(1 + |x|)η^{1/α} + η|b(x)| + σ^{−1}η^{1/α}E|Z̃_1|
≤ C(1 + |x|)η^{1/α} + η^{1/α}(|b(0)| + θ_2|x|) + σ^{−1}η^{1/α}E|Z̃_1|
≤ C(1 + |x|)η^{1/α}.
Together with (1.15) we get
sup_{h∈Lip(1)} |Q̃_{N−1}(P_η − Q̃_1)h(x)| ≤ C(1 + E|Ỹ_{N−1}^x|)η^{1/α} ≤ C(1 + |x|)η^{2/α−1}. (2.10)

Next, we bound the first term in (2.9); we distinguish between two cases.
Case 1: N ≤ η^{−1} + 1. By Lemmas 2.5 and 2.1,
|(P_η − Q̃_1)P_{(N−i)η}h(x)| ≤ C(1 + |x|)(‖∇P_{(N−i)η}h‖_∞ + ‖∇^2 P_{(N−i)η}h‖_{HS,∞})η^{2/α}
≤ C(1 + |x|)[(N − i)η]^{−1/α}η^{2/α}.
Combining this with (1.15), we get
sup_{h∈Lip(1)} |Q̃_{i−1}(P_η − Q̃_1)P_{(N−i)η}h(x)| ≤ C(1 + E|Ỹ_{i−1}^x|)[(N − i)η]^{−1/α}η^{2/α} (2.11)
≤ C(1 + |x|)[(N − i)η]^{−1/α}η^{2/α}.
Since N − 1 ≤ η^{−1},
Σ_{i=1}^{N−1} [(N − i)η]^{−1/α} = η^{−1/α} Σ_{i=1}^{N−1} i^{−1/α} ≤ η^{−1/α} ∫_0^{N−1} r^{−1/α} dr
= (α/(α−1)) η^{−1/α}(N − 1)^{1−1/α} ≤ (α/(α−1)) η^{−1}.
This gives the upper bound
Σ_{i=1}^{N−1} sup_{h∈Lip(1)} |Q̃_{i−1}(P_η − Q̃_1)P_{(N−i)η}h(x)| ≤ C(1 + |x|)η^{2/α} Σ_{i=1}^{N−1} [(N − i)η]^{−1/α}
≤ C(α/(α−1))(1 + |x|)η^{2/α−1}.

Case 2: N > η^{−1} + 1. By Proposition 1.6, there exist constants C > 0 and λ > 0 such that
for all x, y ∈ R^d
|P_t h(x) − P_t h(y)| ≤ Ce^{−λt}|x − y|, h ∈ Lip(1), t ≥ 0.
This implies that, for i ≤ ⌊N − η^{−1}⌋,
sup_{h∈Lip(1)} |Q̃_{i−1}(P_η − Q̃_1)P_{(N−i)η}h(x)| = sup_{h∈Lip(1)} |Q̃_{i−1}(P_η − Q̃_1)P_1 P_{(N−i)η−1}h(x)|
≤ Ce^{−λ[(N−i)η−1]} sup_{g∈Lip(1)} |Q̃_{i−1}(P_η − Q̃_1)P_1 g(x)|.
By Lemmas 2.5 and 2.1,
|(P_η − Q̃_1)P_1 g(x)| ≤ C(1 + |x|)(‖∇P_1 g‖_∞ + ‖∇^2 P_1 g‖_{HS,∞})η^{2/α} ≤ C(1 + |x|)η^{2/α}.
Combining this with (1.15), we get
Σ_{i=1}^{⌊N−η^{−1}⌋} sup_{h∈Lip(1)} |Q̃_{i−1}(P_η − Q̃_1)P_{(N−i)η}h(x)| ≤ Cη^{2/α} Σ_{i=1}^{⌊N−η^{−1}⌋} e^{−λ[(N−i)η−1]}(1 + E|Ỹ_{i−1}^x|)
≤ C(1 + |x|)η^{2/α} Σ_{i=1}^{⌊N−η^{−1}⌋} e^{−λ[(N−i)η−1]}.
Observe that
Σ_{i=1}^{⌊N−η^{−1}⌋} e^{−λ[(N−i)η−1]} = Σ_{i=⌊η^{−1}⌋}^{N−1} e^{−λ(iη−1)} ≤ e^λ ∫_{⌊η^{−1}⌋−1}^{N−1} e^{−ληr} dr
≤ e^λ η^{−1} ∫_0^∞ e^{−λr} dr = λ^{−1}e^λ η^{−1}.


Thus, we get
Σ_{i=1}^{⌊N−η^{−1}⌋} sup_{h∈Lip(1)} |Q̃_{i−1}(P_η − Q̃_1)P_{(N−i)η}h(x)| ≤ Cλ^{−1}e^λ(1 + |x|)η^{2/α−1}.
For i ≥ ⌊N − η^{−1}⌋ + 1, almost the same calculation as in the first case yields
Σ_{i=⌊N−η^{−1}⌋+1}^{N−1} [(N − i)η]^{−1/α} ≤ (α/(α−1)) η^{−1}.

Combining this with (2.11), we obtain
Σ_{i=⌊N−η^{−1}⌋+1}^{N−1} sup_{h∈Lip(1)} |Q̃_{i−1}(P_η − Q̃_1)P_{(N−i)η}h(x)|
≤ C(1 + |x|)η^{2/α} Σ_{i=⌊N−η^{−1}⌋+1}^{N−1} [(N − i)η]^{−1/α} ≤ C(α/(α−1))(1 + |x|)η^{2/α−1}.
We have just shown estimates for Σ_{i=1}^{⌊N−η^{−1}⌋} … and Σ_{i=⌊N−η^{−1}⌋+1}^{N−1} … . Adding them up we
arrive at
Σ_{i=1}^{N−1} sup_{h∈Lip(1)} |Q̃_{i−1}(P_η − Q̃_1)P_{(N−i)η}h(x)|
= (Σ_{i=1}^{⌊N−η^{−1}⌋} + Σ_{i=⌊N−η^{−1}⌋+1}^{N−1}) sup_{h∈Lip(1)} |Q̃_{i−1}(P_η − Q̃_1)P_{(N−i)η}h(x)| ≤ C(1 + |x|)η^{2/α−1}.

Both Case 1 and Case 2 lead to an estimate of the form
Σ_{i=1}^{N−1} sup_{h∈Lip(1)} |Q̃_{i−1}(P_η − Q̃_1)P_{(N−i)η}h(x)| ≤ C(1 + |x|)η^{2/α−1}.
Substituting this and (2.10) into (2.9), the first assertion of Theorem 1.2 follows.
It remains to prove Part (2). It is easy to see from (1.10) and (1.13) that
lim_{k→∞} W_1(µ, law(X_{ηk})) = lim_{k→∞} W_1(law(Ỹ_k), µ̃_η) = 0.
By the triangle inequality and Part (1) with x = 0,
W_1(µ, µ̃_η) ≤ W_1(µ, law(X_{ηN})) + W_1(law(X_{ηN}), law(Ỹ_N)) + W_1(law(Ỹ_N), µ̃_η)
≤ W_1(µ, law(X_{ηN})) + Cη^{2/α−1} + W_1(law(Ỹ_N), µ̃_η).
Letting N → ∞ finishes the proof. □

Proof of Theorem 1.3. Replacing Lemma 2.5 and (1.13) in the above proof with Lemma 2.7
and (1.12), the argument runs along the same lines as that for Theorem 1.2. □

3. Malliavin calculus and the proof of Lemma 2.1


3.1. Jacobian flow associated with the SDE (1.1)

The Jacobian flow is the derivative of X_t^x with respect to the initial value x; the Jacobian
flow in direction v ∈ R^d is defined by
∇_v X_t^x := lim_{ϵ→0} (X_t^{x+ϵv} − X_t^x)/ϵ, t ≥ 0.
This limit exists and satisfies
(d/dt) ∇_v X_t^x = ∇_{∇_v X_t^x} b(X_t^x), ∇_v X_0^x = v. (3.1)
Similarly, for v_1, v_2 ∈ R^d, we can define ∇_{v_2}∇_{v_1} X_t^x, which satisfies
(d/dt) ∇_{v_2}∇_{v_1} X_t^x = ∇_{∇_{v_2}∇_{v_1} X_t^x} b(X_t^x) + ∇_{∇_{v_2} X_t^x}∇_{∇_{v_1} X_t^x} b(X_t^x), ∇_{v_2}∇_{v_1} X_0^x = 0. (3.2)
We first have the following estimates for ∇_{v_1} X_t^x and ∇_{v_2}∇_{v_1} X_t^x.

Lemma 3.1. For any starting point x ∈ R^d and all directions v_1, v_2 ∈ R^d the following
(deterministic) estimates hold:
|∇_{v_1} X_t^x| ≤ e^{θ_2}|v_1|, t ∈ (0, 1], (3.3)
|∇_{v_2}∇_{v_1} X_t^x| ≤ (θ_3/(2√(2θ_2))) e^{4θ_2}|v_1||v_2|, t ∈ (0, 1]. (3.4)

Proof. By (3.1) and (1.8), we have
(d/dt)|∇_{v_1} X_t^x|^2 = 2⟨∇_{v_1} X_t^x, ∇_{∇_{v_1} X_t^x} b(X_t^x)⟩ ≤ 2θ_2|∇_{v_1} X_t^x|^2,
and Gronwall's inequality yields for t ∈ (0, 1]
|∇_{v_1} X_t^x|^2 ≤ e^{2θ_2 t}|v_1|^2 ≤ e^{2θ_2}|v_1|^2.
This proves the first assertion. Writing ζ(t) := ∇_{v_2}∇_{v_1} X_t^x, we see from (3.2), (1.8), (3.3), the
Cauchy–Schwarz inequality, and the elementary estimate 2AB ≤ A^2 + B^2 that
(d/dt)|ζ(t)|^2 = 2⟨ζ(t), ∇_{ζ(t)} b(X_t^x)⟩ + 2⟨ζ(t), ∇_{∇_{v_2} X_t^x}∇_{∇_{v_1} X_t^x} b(X_t^x)⟩
≤ 2θ_2|ζ(t)|^2 + 2θ_3 e^{2θ_2}|v_1||v_2||ζ(t)|
≤ 4θ_2|ζ(t)|^2 + (θ_3^2/(2θ_2)) e^{4θ_2}|v_1|^2|v_2|^2.
Since ζ(0) = 0, we can use again Gronwall's inequality and get for all t ∈ (0, 1]
|ζ(t)|^2 ≤ (θ_3^2/(2θ_2)) e^{4θ_2}|v_1|^2|v_2|^2 ∫_0^t e^{4θ_2(t−s)} ds ≤ (θ_3^2/(8θ_2)) e^{8θ_2}|v_1|^2|v_2|^2. □
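The first bound (3.3) can be checked numerically for a concrete drift (our example b(x) = −2x + sin(x) in d = 1, so θ_2 = 3) by integrating the variational equation (3.1) along an Euler path of the state; since the noise is additive, it enters the Jacobian only through the path X_t, and for this illustration we use Gaussian increments as a convenient stand-in for the stable driver:

```python
import numpy as np

def b(x):
    # Example drift with b'(x) = -2 + cos(x), hence theta2 = sup|b'| = 3.
    return -2.0 * x + np.sin(x)

def db(x):
    return -2.0 + np.cos(x)

rng = np.random.default_rng(4)
theta2, dt, n = 3.0, 1e-3, 1000          # time horizon t = n*dt = 1
x, J = 1.0, 1.0                           # J approximates grad_v X_t with v = 1
for k in range(1, n + 1):
    # Euler step for the variational equation (3.1): dJ/dt = b'(X_t)*J,
    # followed by a step for the state (Gaussian stand-in for the noise).
    J = J + dt * db(x) * J
    x = x + dt * b(x) + np.sqrt(dt) * rng.standard_normal()
    t = k * dt
    assert abs(J) <= np.exp(theta2 * t) + 1e-8   # bound (3.3) with |v| = 1
```

For this dissipative example b'(x) ≤ −1, so the Jacobian in fact decays, comfortably inside the bound e^{θ_2 t}.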

3.2. Bismut’s formula


(See also [28].) Let u ∈ L^2_loc([0, ∞) × (Ω, F, P); R^d), i.e. we have E∫_0^t |u(s)|^2 ds < ∞
for all t > 0. Let {W_t}_{t≥0} be a d-dimensional standard Brownian motion and assume that u is

adapted to the filtration (Ft )t≥0 with Ft := σ (Ws : 0 ≤ s ≤ t); i.e. u(t) is Ft measurable for
t ≥ 0. Define
U = ∫_0^• u(s) ds. (3.5)

For t > 0, let F_t : C([0, t], R^d) → R^m be an F_t measurable map. If the limit
D_U F_t(W) = lim_{ϵ→0} (F_t(W + ϵU) − F_t(W))/ϵ
exists in L^2((Ω, F, P); R^m), then F_t(W) is said to be m-dimensional Malliavin differentiable and
D_U F_t(W) is called the Malliavin derivative of F_t(W) in the direction U.


Let φ ∈ C_b^2(R^d, R) and let F_t(W) and G_t(W) both be d-dimensional Malliavin differentiable
functionals. Then we have the following product and chain rules:
D_U ⟨F_t(W), G_t(W)⟩ = ⟨D_U F_t(W), G_t(W)⟩ + ⟨F_t(W), D_U G_t(W)⟩
and
D_U ∇φ(F_t(W)) = ∇_{D_U F_t(W)} ∇φ(F_t(W)).
The following integration by parts formula is often called Bismut's formula: for a Malliavin
differentiable F_t(W) such that F_t(W), D_U F_t(W) ∈ L^2((Ω, F, P); R), we have
E[D_U F_t(W)] = E[F_t(W) ∫_0^t ⟨u(s), dW_s⟩]. (3.6)
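For the simplest functional F_t(W) = φ(W_t) with u ≡ 1 (so that U_t = t, D_U F_t(W) = tφ′(W_t) and ∫_0^t ⟨u(s), dW_s⟩ = W_t), formula (3.6) reduces to the Gaussian integration by parts t E[φ′(W_t)] = E[φ(W_t)W_t], which can be verified by Monte Carlo (our test function φ = tanh):

```python
import numpy as np

rng = np.random.default_rng(5)
t = 1.0
W = np.sqrt(t) * rng.standard_normal(1_000_000)   # W_t ~ N(0, t)

# E[D_U F_t(W)] = t * E[phi'(W_t)]    with phi = tanh, phi' = sech^2
lhs = t * np.mean(1.0 / np.cosh(W) ** 2)
# E[F_t(W) * int_0^t <u(s), dW_s>] = E[phi(W_t) * W_t]
rhs = np.mean(np.tanh(W) * W)
assert abs(lhs - rhs) < 1e-2
```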

3.3. Time-change method for the SDE (1.1)

(See also [43].) We now turn to the SDE driven by a rotationally invariant α-stable Lévy
process with α ∈ (1, 2). We can express such a driver as a subordinated Brownian motion. More
precisely, let {S_t}_{t≥0} be an independent (α/2)-stable subordinator. Then Z_t := W_{S_t} is a rotationally
invariant α-stable Lévy process, see e.g. [36]. This means that we can re-write (1.1) as
dX_t = b(X_t) dt + dW_{S_t}, X_0 = x. (3.7)
Let $\mathbb W$ be the space of all continuous functions from $[0,\infty)$ to $\mathbb R^d$ vanishing at $t=0$; we equip $\mathbb W$ with the topology of locally uniform convergence and the Wiener measure $\mu_{\mathbb W}$, so that the coordinate process
\[
W_t(w) = w_t
\]
is a standard $d$-dimensional Brownian motion. Let $\mathbb S$ be the space of all increasing, càdlàg (right continuous with finite left limits) functions from $[0,\infty)$ to $[0,\infty)$ vanishing at $t=0$; we equip $\mathbb S$ with the Skorohod metric and the probability measure $\mu_{\mathbb S}$ so that the coordinate process
\[
S_t(l) := l_t
\]
is an $\frac\alpha2$-stable subordinator. On the product measure space
\[
(\Omega,\mathcal F,\mathbb P) := \bigl(\mathbb W\times\mathbb S,\ \mathcal B(\mathbb W)\otimes\mathcal B(\mathbb S),\ \mu_{\mathbb W}\times\mu_{\mathbb S}\bigr),
\]
we define
\[
L_t(w,l) := w_{l_t}.
\]

The process $\{L_t\}_{t\ge0}$ is a rotationally invariant $\alpha$-stable Lévy process on $(\Omega,\mathcal F,\mathbb P)$. We will use the following two natural filtrations associated with the Lévy process $L_t$ and the Brownian motion $W_t$:
\[
\mathcal F_t := \sigma\{L_s(w,l);\ s\le t\}\qquad\text{and}\qquad \mathcal F_t^W := \sigma\{W_s(w);\ s\le t\}.
\]
In particular, we can regard the solution $X_t^x$ of the SDE (3.7) as an $(\mathcal F_t)$-adapted functional on $\Omega$, and therefore
\[
\mathbb E f\bigl(X_t^x\bigr) = \int_{\mathbb S}\int_{\mathbb W} f\bigl(X_t^x(w_l)\bigr)\,\mu_{\mathbb W}(dw)\,\mu_{\mathbb S}(dl).
\]

For every fixed $l\in\mathbb S$, we denote by $X_t^l$ the solution to the SDE
\[
dX_t^l = b(X_t^l)\,dt + dW_{l_t},\qquad X_0^l = x. \tag{3.8}
\]
We will now fix a path $l\in\mathbb S$ and consider the SDE (3.8). Unless otherwise mentioned, all expectations are taken with respect to the Wiener space $(\mathbb W,\mathcal B(\mathbb W),\mu_{\mathbb W})$. First of all, notice that $t\mapsto W_{l_t}$ is a centered Gaussian process with independent increments. In particular, $W_{l_t}$ is a càdlàg $\mathcal F^W_{l_t}$-martingale. Thus, under Assumption A, it is well known that for each $x\in\mathbb R^d$ the SDE (3.8) admits a unique càdlàg $\mathcal F^W_{l_t}$-adapted solution $X_t^{x;l}$, see e.g. [33, p. 249, Theorem 6].
The main aim of this section is to establish the following result:

Lemma 3.2. Under Assumption A one has, for all functions $\varphi\in C_b^2(\mathbb R^d,\mathbb R)$, all directions $v_1,v_2\in\mathbb R^d$ and all $x\in\mathbb R^d$, $t\in(0,1]$,
\[
|\nabla_{v_1}X_t^{x;l}| \le e^{\theta_2}|v_1| \tag{3.9}
\]
and
\[
\Bigl|\mathbb E\bigl[\nabla_{\nabla_{v_2}X_t^{x;l}}\nabla_{\nabla_{v_1}X_t^{x;l}}\varphi\bigl(X_t^{x;l}\bigr)\bigr]\Bigr|
\le \Bigl|\frac1{l_t}\,\mathbb E\Bigl[\nabla_{\nabla_{v_1}X_t^{x;l}}\varphi\bigl(X_t^{x;l}\bigr)\int_0^t\bigl\langle\nabla_{v_2}X_s^{x;l},\,dW_{l_s}\bigr\rangle\Bigr]\Bigr| + \|\nabla\varphi\|_\infty\,\frac{\theta_3}{\sqrt{2\theta_2}}\,e^{2\theta_2}|v_1||v_2|,
\]
where $\nabla_{v_i}X_t^{x;l}$ ($i=1,2$) is determined by the following linear equation:
\[
\frac{d}{dt}\nabla_{v_i}X_t^{x;l} = \nabla_{\nabla_{v_i}X_t^{x;l}}\, b\bigl(X_t^{x;l}\bigr),\qquad \nabla_{v_i}X_0^{x;l} = v_i. \tag{3.10}
\]

In order to prove Lemma 3.2, we use a time-change argument to transform the SDE (3.8) into an SDE driven by a standard Brownian motion; this allows us to use Bismut's formula (3.6). For every $\epsilon\in(0,1)$ we define
\[
l_t^\epsilon := \frac1\epsilon\int_t^{t+\epsilon} l_s\,ds + \epsilon t = \int_0^1 l_{\epsilon s+t}\,ds + \epsilon t.
\]
Since $t\mapsto l_t$ is increasing and right continuous, it follows that for each $t\ge0$,
\[
l_t^\epsilon \downarrow l_t\quad\text{as }\epsilon\downarrow0.
\]
Moreover, $t\mapsto l_t^\epsilon$ is absolutely continuous and strictly increasing. Let $\gamma^\epsilon$ be the inverse function of $l^\epsilon$, i.e.
\[
l^\epsilon_{\gamma_t^\epsilon} = t,\ t\ge l_0^\epsilon\qquad\text{and}\qquad \gamma^\epsilon_{l_t^\epsilon} = t,\ t\ge0.
\]
By definition, $\gamma_t^\epsilon$ is absolutely continuous on $[l_0^\epsilon,\infty)$. Let $X_t^{x;l^\epsilon}$ be the solution to the SDE
\[
dX_t^{x;l^\epsilon} = b\bigl(X_t^{x;l^\epsilon}\bigr)\,dt + dW_{l_t^\epsilon - l_0^\epsilon},\qquad X_0^{x;l^\epsilon} = x. \tag{3.11}
\]
Let us now define
\[
Y_t^{x;l^\epsilon} := X^{x;l^\epsilon}_{\gamma_t^\epsilon},\qquad t\ge l_0^\epsilon.
\]
Changing variables in (3.11) we see that for $t\ge l_0^\epsilon$,
\[
Y_t^{x;l^\epsilon} = x + \int_0^{\gamma_t^\epsilon} b\bigl(X_s^{x;l^\epsilon}\bigr)\,ds + W_{t-l_0^\epsilon} = x + \int_{l_0^\epsilon}^t b\bigl(Y_s^{x;l^\epsilon}\bigr)\dot\gamma_s^\epsilon\,ds + W_{t-l_0^\epsilon} \tag{3.12}
\]
0 l0ϵ

($\dot\gamma_s^\epsilon$ denotes the derivative in $s$). Hence, for any vector $v\in\mathbb R^d$, we have
\[
\nabla_v Y_t^{x;l^\epsilon} = v + \int_{l_0^\epsilon}^t \nabla_{\nabla_v Y_s^{x;l^\epsilon}}\, b\bigl(Y_s^{x;l^\epsilon}\bigr)\dot\gamma_s^\epsilon\,ds, \tag{3.13}
\]
and the differential form can be written as
\[
\frac{d}{dt}\nabla_v Y_t^{x;l^\epsilon} = \nabla b\bigl(Y_t^{x;l^\epsilon}\bigr)\dot\gamma_t^\epsilon\,\nabla_v Y_t^{x;l^\epsilon},\qquad t\ge l_0^\epsilon,
\]
which has a solution of the form
\[
\nabla_v Y_t^{x;l^\epsilon} = J^{x;l^\epsilon}_{l_0^\epsilon,t}\, v, \tag{3.14}
\]
involving a matrix exponential
\[
J^{x;l^\epsilon}_{s,t} = \exp\Bigl[\int_s^t \nabla b\bigl(Y_r^{x;l^\epsilon}\bigr)\dot\gamma_r^\epsilon\,dr\Bigr],\qquad l_0^\epsilon\le s\le t<\infty. \tag{3.15}
\]
It is easy to see that $J^{x;l^\epsilon}_{s,t}\, J^{x;l^\epsilon}_{l_0^\epsilon,s} = J^{x;l^\epsilon}_{l_0^\epsilon,t}$ for all $l_0^\epsilon\le s\le t<\infty$.
0 ,s 0 ,t
Now, we come back to the Malliavin calculus from Section 3.2. Fixing $t\ge l_0^\epsilon$ and $x\in\mathbb R^d$, the solution $Y_t^{x;l^\epsilon}$ is a $d$-dimensional functional of the Brownian motion $\{W_s\}_{l_0^\epsilon\le s\le t}$. Let $U$ be as in Section 3.2. The Malliavin derivative of $Y_t^{x;l^\epsilon}$ in direction $U$ exists in $L^2\bigl((\mathbb W,\mathcal B(\mathbb W),\mu_{\mathbb W});\mathbb R^d\bigr)$ and is given by
\[
D_U Y_t^{x;l^\epsilon}(W) = \lim_{\delta\to0}\frac{Y_t^{x;l^\epsilon}(W+\delta U) - Y_t^{x;l^\epsilon}(W)}{\delta}.
\]
To simplify notation, we drop the $W$ in $D_U Y_t^{x;l^\epsilon}(W)$ and write $D_U Y_t^{x;l^\epsilon} = D_U Y_t^{x;l^\epsilon}(W)$. By (3.12), it satisfies the equation
\[
D_U Y_t^{x;l^\epsilon} = \int_{l_0^\epsilon}^t\Bigl(\nabla_{D_U Y_s^{x;l^\epsilon}}\, b\bigl(Y_s^{x;l^\epsilon}\bigr)\dot\gamma_s^\epsilon + u(s)\Bigr)\,ds;
\]
the differential form of this equation can be written as
\[
\frac{d}{dt} D_U Y_t^{x;l^\epsilon} = \nabla b\bigl(Y_t^{x;l^\epsilon}\bigr)\dot\gamma_t^\epsilon\, D_U Y_t^{x;l^\epsilon} + u(t),\qquad t\ge l_0^\epsilon,
\]
and this equation has a unique solution which is given via the matrix exponential (3.15):
\[
D_U Y_t^{x;l^\epsilon} = \int_{l_0^\epsilon}^t J^{x;l^\epsilon}_{s,t}\, u(s)\,ds. \tag{3.16}
\]
l0ϵ

For a fixed $t>0$ and any $v_1,v_2,x\in\mathbb R^d$, we define $u_i, U_i : [l_0^\epsilon,t]\to\mathbb R^d$ by
\[
u_i(s) := \frac1{t-l_0^\epsilon}\,\nabla_{v_i}Y_s^{x;l^\epsilon},\qquad U_{i;s} := \int_0^s u_i(r)\,dr \tag{3.17}
\]
for $l_0^\epsilon\le s\le t$ and $i=1,2$. Then
\[
D_{U_i} Y_s^{x;l^\epsilon} = \frac{s-l_0^\epsilon}{t-l_0^\epsilon}\,\nabla_{v_i}Y_s^{x;l^\epsilon},\qquad l_0^\epsilon\le s\le t. \tag{3.18}
\]
In addition, (3.13) implies that for $s\in[l_0^\epsilon,t]$
\[
D_{U_2}\nabla_{v_1}Y_s^{x;l^\epsilon} = \int_{l_0^\epsilon}^s\Bigl(\nabla_{D_{U_2}Y_r^{x;l^\epsilon}}\nabla_{\nabla_{v_1}Y_r^{x;l^\epsilon}}\, b\bigl(Y_r^{x;l^\epsilon}\bigr) + \nabla_{D_{U_2}\nabla_{v_1}Y_r^{x;l^\epsilon}}\, b\bigl(Y_r^{x;l^\epsilon}\bigr)\Bigr)\dot\gamma_r^\epsilon\,dr. \tag{3.19}
\]

The following lemma contains the upper bounds on the derivatives.

Lemma 3.3. Let $v_1,v_2,x\in\mathbb R^d$ and $t\in(0,1]$. Then
\[
|\nabla_{v_i}Y_s^{x;l^\epsilon}| \le e^{\theta_2\gamma_s^\epsilon}|v_i| \tag{3.20}
\]
and
\[
|D_{U_2}\nabla_{v_1}Y_s^{x;l^\epsilon}| \le \frac{\theta_3}{\sqrt{2\theta_2}}\,e^{2\theta_2\gamma_s^\epsilon}\sqrt{\gamma_s^\epsilon}\,|v_1||v_2| \tag{3.21}
\]
for $l_0^\epsilon\le s\le t$ and $i=1,2$.

Proof. Recall that $\theta_1>0$ and $\dot\gamma_s^\epsilon\ge0$. By (3.13) and (1.8), we have for any $l_0^\epsilon\le s\le t$
\[
\frac{d}{ds}|\nabla_{v_i}Y_s^{x;l^\epsilon}|^2 = 2\dot\gamma_s^\epsilon\bigl\langle\nabla_{v_i}Y_s^{x;l^\epsilon},\, \nabla_{\nabla_{v_i}Y_s^{x;l^\epsilon}}\, b\bigl(Y_s^{x;l^\epsilon}\bigr)\bigr\rangle \le 2\theta_2\dot\gamma_s^\epsilon\,|\nabla_{v_i}Y_s^{x;l^\epsilon}|^2,
\]
and this implies, because of Gronwall's lemma,
\[
|\nabla_{v_i}Y_s^{x;l^\epsilon}|^2 \le \exp\Bigl[2\theta_2\int_{l_0^\epsilon}^s\dot\gamma_r^\epsilon\,dr\Bigr]|v_i|^2 = e^{2\theta_2(\gamma_s^\epsilon-\gamma^\epsilon_{l_0^\epsilon})}|v_i|^2 = e^{2\theta_2\gamma_s^\epsilon}|v_i|^2.
\]

Using (3.19) and (1.8) we find for any $l_0^\epsilon\le s\le t$
\[
\begin{aligned}
\frac{d}{ds}|D_{U_2}\nabla_{v_1}Y_s^{x;l^\epsilon}|^2
&= 2\dot\gamma_s^\epsilon\bigl\langle D_{U_2}\nabla_{v_1}Y_s^{x;l^\epsilon},\, \nabla_{D_{U_2}\nabla_{v_1}Y_s^{x;l^\epsilon}}\, b\bigl(Y_s^{x;l^\epsilon}\bigr)\bigr\rangle
+ 2\dot\gamma_s^\epsilon\bigl\langle D_{U_2}\nabla_{v_1}Y_s^{x;l^\epsilon},\, \nabla_{D_{U_2}Y_s^{x;l^\epsilon}}\nabla_{\nabla_{v_1}Y_s^{x;l^\epsilon}}\, b\bigl(Y_s^{x;l^\epsilon}\bigr)\bigr\rangle\\
&\le 2\theta_2\dot\gamma_s^\epsilon\,\bigl|D_{U_2}\nabla_{v_1}Y_s^{x;l^\epsilon}\bigr|^2 + 2\theta_3\dot\gamma_s^\epsilon\,\bigl|D_{U_2}\nabla_{v_1}Y_s^{x;l^\epsilon}\bigr|\,\bigl|D_{U_2}Y_s^{x;l^\epsilon}\bigr|\,\bigl|\nabla_{v_1}Y_s^{x;l^\epsilon}\bigr|.
\end{aligned}
\]
Now (3.18) and (3.20) imply
\[
\begin{aligned}
\frac{d}{ds}\bigl|D_{U_2}\nabla_{v_1}Y_s^{x;l^\epsilon}\bigr|^2
&\le 2\theta_2\dot\gamma_s^\epsilon\,\bigl|D_{U_2}\nabla_{v_1}Y_s^{x;l^\epsilon}\bigr|^2 + 2\theta_3\dot\gamma_s^\epsilon\,\bigl|D_{U_2}\nabla_{v_1}Y_s^{x;l^\epsilon}\bigr|\,\bigl|\nabla_{v_2}Y_s^{x;l^\epsilon}\bigr|\,\bigl|\nabla_{v_1}Y_s^{x;l^\epsilon}\bigr|\\
&\le 4\theta_2\dot\gamma_s^\epsilon\,\bigl|D_{U_2}\nabla_{v_1}Y_s^{x;l^\epsilon}\bigr|^2 + \frac{\theta_3^2}{2\theta_2}\dot\gamma_s^\epsilon\,\bigl|\nabla_{v_2}Y_s^{x;l^\epsilon}\bigr|^2\bigl|\nabla_{v_1}Y_s^{x;l^\epsilon}\bigr|^2\\
&\le 4\theta_2\dot\gamma_s^\epsilon\,\bigl|D_{U_2}\nabla_{v_1}Y_s^{x;l^\epsilon}\bigr|^2 + \frac{\theta_3^2}{2\theta_2}\dot\gamma_s^\epsilon\, e^{4\theta_2\gamma_s^\epsilon}|v_1|^2|v_2|^2.
\end{aligned}
\]
Since $D_{U_2}\nabla_{v_1}Y^{x;l^\epsilon}_{l_0^\epsilon} = 0$, we can use Gronwall's inequality to see
\[
\bigl|D_{U_2}\nabla_{v_1}Y_s^{x;l^\epsilon}\bigr|^2 \le \frac{\theta_3^2}{2\theta_2}|v_1|^2|v_2|^2\int_{l_0^\epsilon}^s \dot\gamma_u^\epsilon\, e^{4\theta_2\gamma_u^\epsilon}\, e^{4\theta_2(\gamma_s^\epsilon-\gamma_u^\epsilon)}\,du = \frac{\theta_3^2}{2\theta_2}|v_1|^2|v_2|^2\, e^{4\theta_2\gamma_s^\epsilon}\gamma_s^\epsilon. \qquad\square
\]

Proof of Lemma 3.2. From (1.8) we see that
\[
\frac{d}{dt}\bigl|\nabla_{v_1}X_t^{x;l}\bigr|^2 = 2\bigl\langle\nabla_{v_1}X_t^{x;l},\, \nabla_{\nabla_{v_1}X_t^{x;l}}\, b\bigl(X_t^{x;l}\bigr)\bigr\rangle \le 2\theta_2\bigl|\nabla_{v_1}X_t^{x;l}\bigr|^2.
\]
This yields for all $t\in(0,1]$
\[
\bigl|\nabla_{v_1}X_t^{x;l}\bigr|^2 \le e^{2\theta_2 t}|v_1|^2 \le e^{2\theta_2}|v_1|^2,
\]
i.e. (3.9) holds.


Using (3.18) with $s=t$ and $i=2$, the chain rule and Bismut's formula (3.6), we see that
\[
\begin{aligned}
\mathbb E\bigl[\nabla_{\nabla_{v_2}Y_t^{x;l^\epsilon}}\nabla_{\nabla_{v_1}Y_t^{x;l^\epsilon}}\varphi\bigl(Y_t^{x;l^\epsilon}\bigr)\bigr]
&= \mathbb E\bigl[\nabla_{D_{U_2}Y_t^{x;l^\epsilon}}\nabla_{\nabla_{v_1}Y_t^{x;l^\epsilon}}\varphi\bigl(Y_t^{x;l^\epsilon}\bigr)\bigr]\\
&= \mathbb E\bigl[D_{U_2}\bigl(\nabla_{\nabla_{v_1}Y_t^{x;l^\epsilon}}\varphi\bigl(Y_t^{x;l^\epsilon}\bigr)\bigr)\bigr] - \mathbb E\bigl[\nabla_{D_{U_2}\nabla_{v_1}Y_t^{x;l^\epsilon}}\varphi\bigl(Y_t^{x;l^\epsilon}\bigr)\bigr]\\
&= \frac1{t-l_0^\epsilon}\,\mathbb E\Bigl[\nabla_{\nabla_{v_1}Y_t^{x;l^\epsilon}}\varphi\bigl(Y_t^{x;l^\epsilon}\bigr)\int_0^t\bigl\langle\nabla_{v_2}Y_s^{x;l^\epsilon},\, dW_s\bigr\rangle\Bigr] - \mathbb E\bigl[\nabla_{D_{U_2}\nabla_{v_1}Y_t^{x;l^\epsilon}}\varphi\bigl(Y_t^{x;l^\epsilon}\bigr)\bigr].
\end{aligned}
\]

We can now use the fact that for each $t\ge0$,
\[
Y^{x;l^\epsilon}_{l_t^\epsilon} = X_t^{x;l^\epsilon}\qquad\text{and}\qquad \nabla_v Y^{x;l^\epsilon}_{l_t^\epsilon} = \nabla_v X_t^{x;l^\epsilon}.
\]

Replacing $t$ with $l_t^\epsilon$ in (3.17), this yields
\[
\begin{aligned}
\mathbb E\bigl[\nabla_{\nabla_{v_2}X_t^{x;l^\epsilon}}\nabla_{\nabla_{v_1}X_t^{x;l^\epsilon}}\varphi\bigl(X_t^{x;l^\epsilon}\bigr)\bigr]
&= \frac1{l_t^\epsilon-l_0^\epsilon}\,\mathbb E\Bigl[\nabla_{\nabla_{v_1}X_t^{x;l^\epsilon}}\varphi\bigl(X_t^{x;l^\epsilon}\bigr)\int_0^{l_t^\epsilon}\bigl\langle\nabla_{v_2}Y_s^{x;l^\epsilon},\, dW_s\bigr\rangle\Bigr] - \mathbb E\bigl[\nabla_{D_{U_2}\nabla_{v_1}Y^{x;l^\epsilon}_{l_t^\epsilon}}\varphi\bigl(X_t^{x;l^\epsilon}\bigr)\bigr]\\
&= \frac1{l_t^\epsilon-l_0^\epsilon}\,\mathbb E\Bigl[\nabla_{\nabla_{v_1}X_t^{x;l^\epsilon}}\varphi\bigl(X_t^{x;l^\epsilon}\bigr)\int_0^t\bigl\langle\nabla_{v_2}X_s^{x;l^\epsilon},\, dW_{l_s^\epsilon}\bigr\rangle\Bigr] - \mathbb E\bigl[\nabla_{D_{U_2}\nabla_{v_1}Y^{x;l^\epsilon}_{l_t^\epsilon}}\varphi\bigl(X_t^{x;l^\epsilon}\bigr)\bigr].
\end{aligned}
\]

Since $\gamma^\epsilon_{l_t^\epsilon} = t$, (3.21) implies that for every $t\in(0,1]$
\[
\begin{aligned}
\Bigl|\mathbb E\bigl[\nabla_{D_{U_2}\nabla_{v_1}Y^{x;l^\epsilon}_{l_t^\epsilon}}\varphi\bigl(X_t^{x;l^\epsilon}\bigr)\bigr]\Bigr|
&= \Bigl|\mathbb E\bigl[\bigl\langle\nabla\varphi\bigl(X_t^{x;l^\epsilon}\bigr),\, D_{U_2}\nabla_{v_1}Y^{x;l^\epsilon}_{l_t^\epsilon}\bigr\rangle\bigr]\Bigr|\\
&\le \|\nabla\varphi\|_\infty\,\frac{\theta_3}{\sqrt{2\theta_2}}\,e^{2\theta_2\gamma^\epsilon_{l_t^\epsilon}}\sqrt{\gamma^\epsilon_{l_t^\epsilon}}\,|v_1||v_2|
= \|\nabla\varphi\|_\infty\,\frac{\theta_3}{\sqrt{2\theta_2}}\,e^{2\theta_2 t}\sqrt t\,|v_1||v_2|\\
&\le \|\nabla\varphi\|_\infty\,\frac{\theta_3}{\sqrt{2\theta_2}}\,e^{2\theta_2}|v_1||v_2|,
\end{aligned}
\]

and so
\[
\Bigl|\mathbb E\bigl[\nabla_{\nabla_{v_2}X_t^{x;l^\epsilon}}\nabla_{\nabla_{v_1}X_t^{x;l^\epsilon}}\varphi\bigl(X_t^{x;l^\epsilon}\bigr)\bigr]\Bigr|
\le \Bigl|\frac1{l_t^\epsilon-l_0^\epsilon}\,\mathbb E\Bigl[\nabla_{\nabla_{v_1}X_t^{x;l^\epsilon}}\varphi\bigl(X_t^{x;l^\epsilon}\bigr)\int_0^t\bigl\langle\nabla_{v_2}X_s^{x;l^\epsilon},\, dW_{l_s^\epsilon}\bigr\rangle\Bigr]\Bigr| + \|\nabla\varphi\|_\infty\,\frac{\theta_3}{\sqrt{2\theta_2}}\,e^{2\theta_2}|v_1||v_2|. \tag{3.22}
\]
By the same argument as in the proof of [43, Lemma 2.5], we obtain
\[
\lim_{\epsilon\to0}\frac1{l_t^\epsilon-l_0^\epsilon}\,\mathbb E\Bigl[\nabla_{\nabla_{v_1}X_t^{x;l^\epsilon}}\varphi\bigl(X_t^{x;l^\epsilon}\bigr)\int_0^t\bigl\langle\nabla_{v_2}X_s^{x;l^\epsilon},\, dW_{l_s^\epsilon}\bigr\rangle\Bigr]
= \frac1{l_t}\,\mathbb E\Bigl[\nabla_{\nabla_{v_1}X_t^{x;l}}\varphi\bigl(X_t^{x;l}\bigr)\int_0^t\bigl\langle\nabla_{v_2}X_s^{x;l},\, dW_{l_s}\bigr\rangle\Bigr]. \tag{3.23}
\]
On the other hand, from [43, Lemma 2.2], we know that
\[
\lim_{\epsilon\to0}\mathbb E\bigl[\nabla_{\nabla_{v_2}X_t^{x;l^\epsilon}}\nabla_{\nabla_{v_1}X_t^{x;l^\epsilon}}\varphi\bigl(X_t^{x;l^\epsilon}\bigr)\bigr] = \mathbb E\bigl[\nabla_{\nabla_{v_2}X_t^{x;l}}\nabla_{\nabla_{v_1}X_t^{x;l}}\varphi\bigl(X_t^{x;l}\bigr)\bigr]. \tag{3.24}
\]
Letting $\epsilon\to0$ in (3.22) and using (3.23) and (3.24) completes the proof. □

3.4. Proof of Lemma 2.1

Because of (3.3), we can use the differentiation theorem for parameter dependent integrals to get
\[
|\nabla_v P_t h(x)| = \bigl|\nabla_v\mathbb E[h(X_t^x)]\bigr| = \bigl|\mathbb E\bigl[\nabla_{\nabla_v X_t^x} h(X_t^x)\bigr]\bigr| = \bigl|\mathbb E\bigl[\bigl\langle\nabla h(X_t^x),\, \nabla_v X_t^x\bigr\rangle\bigr]\bigr| \le \|\nabla h\|_\infty\, e^{\theta_2}|v| \le e^{\theta_2}|v|.
\]

In order to see the second inequality, we define for every $\epsilon>0$
\[
h_\epsilon(x) := \int_{\mathbb R^d} g_\epsilon(y)\,h(x-y)\,dy, \tag{3.25}
\]
where $g_\epsilon$ is the density of the normal distribution $N(0,\epsilon^2 I_d)$. It is easy to see that $h_\epsilon$ is smooth, $\lim_{\epsilon\to0}h_\epsilon(x) = h(x)$, $\lim_{\epsilon\to0}\nabla h_\epsilon(x) = \nabla h(x)$ and $|h_\epsilon(x)| \le C(1+|x|)$ for all $x\in\mathbb R^d$ and some $C>0$. Moreover, $\|\nabla h_\epsilon\|_\infty \le \|\nabla h\|_\infty \le 1$. Using the differentiation theorem for parameter dependent integrals we get
\[
\nabla_{v_2}\nabla_{v_1}\mathbb E\bigl[h_\epsilon(X_t^x)\bigr] = \mathbb E\bigl[\nabla_{\nabla_{v_2}\nabla_{v_1}X_t^x} h_\epsilon(X_t^x)\bigr] + \mathbb E\bigl[\nabla_{\nabla_{v_2}X_t^x}\nabla_{\nabla_{v_1}X_t^x} h_\epsilon(X_t^x)\bigr].
\]
From (3.4) we get
\[
\bigl|\mathbb E\bigl[\nabla_{\nabla_{v_2}\nabla_{v_1}X_t^x} h_\epsilon(X_t^x)\bigr]\bigr| = \bigl|\mathbb E\bigl[\bigl\langle\nabla h_\epsilon(X_t^x),\, \nabla_{v_2}\nabla_{v_1}X_t^x\bigr\rangle\bigr]\bigr| \le \frac{\theta_3}{2\sqrt2\,\theta_2}\,e^{4\theta_2}|v_1||v_2|.
\]
It follows from Lemma 3.2, [43, (3.3)] and (3.3) that
\[
\begin{aligned}
\bigl|\mathbb E\bigl[\nabla_{\nabla_{v_2}X_t^x}\nabla_{\nabla_{v_1}X_t^x} h_\epsilon(X_t^x)\bigr]\bigr|
&\le \Bigl|\mathbb E\Bigl[\frac1{S_t}\,\nabla_{\nabla_{v_1}X_t^x} h_\epsilon\bigl(X_t^x\bigr)\int_0^t\bigl\langle\nabla_{v_2}X_s^x,\, dW_{S_s}\bigr\rangle\Bigr]\Bigr| + \|\nabla h_\epsilon\|_\infty\,\frac{\theta_3}{\sqrt{2\theta_2}}\,e^{2\theta_2}|v_1||v_2|\\
&\le e^{\theta_2}|v_1|\,\mathbb E\Bigl[\frac1{S_t}\Bigl|\int_0^t\bigl\langle\nabla_{v_2}X_s^x,\, dW_{S_s}\bigr\rangle\Bigr|\Bigr] + \frac{\theta_3}{\sqrt{2\theta_2}}\,e^{2\theta_2}|v_1||v_2|.
\end{aligned}
\]

The Cauchy–Schwarz inequality, Itô's isometry and (3.9) give
\[
\begin{aligned}
\mathbb E\Bigl[\frac1{S_t}\Bigl|\int_0^t\bigl\langle\nabla_{v_2}X_s^x,\, dW_{S_s}\bigr\rangle\Bigr|\Bigr]
&= \int_{\mathbb S}\frac1{l_t}\,\mathbb E\Bigl|\int_0^t\bigl\langle\nabla_{v_2}X_s^{x;l},\, dW_{l_s}\bigr\rangle\Bigr|\,\mu_{\mathbb S}(dl)\\
&\le \int_{\mathbb S}\frac1{l_t}\Bigl(\mathbb E\int_0^t\bigl|\nabla_{v_2}X_s^{x;l}\bigr|^2\,dl_s\Bigr)^{1/2}\mu_{\mathbb S}(dl)\\
&\le e^{\theta_2}|v_2|\int_{\mathbb S}\frac1{\sqrt{l_t}}\,\mu_{\mathbb S}(dl)
= e^{\theta_2}|v_2|\,\mathbb E\bigl[S_t^{-1/2}\bigr]
\le C e^{\theta_2}|v_2|\,t^{-1/\alpha},
\end{aligned}
\]
where the last inequality is taken from [10, Theorem 2.1 (ii) (c)]. Thus, we have for all
$t \in (0,1]$,
\[
\begin{aligned}
\bigl|\nabla_{v_2}\nabla_{v_1}\mathbb E\bigl[h_\epsilon(X_t^x)\bigr]\bigr|
&\le \bigl|\mathbb E\bigl[\nabla_{\nabla_{v_2}\nabla_{v_1}X_t^x} h_\epsilon(X_t^x)\bigr]\bigr| + \bigl|\mathbb E\bigl[\nabla_{\nabla_{v_2}X_t^x}\nabla_{\nabla_{v_1}X_t^x} h_\epsilon(X_t^x)\bigr]\bigr|\\
&\le \frac{\theta_3}{2\sqrt2\,\theta_2}\,e^{4\theta_2}|v_1||v_2| + e^{\theta_2}|v_1|\cdot Ce^{\theta_2}|v_2|\,t^{-1/\alpha} + \frac{\theta_3}{\sqrt{2\theta_2}}\,e^{2\theta_2}|v_1||v_2|\\
&\le e^{4\theta_2}\Bigl(\frac{\theta_3}{2\sqrt2\,\theta_2} + Ct^{-1/\alpha} + \frac{\theta_3}{\sqrt{2\theta_2}}\Bigr)|v_1||v_2|\\
&\le Ce^{4\theta_2}\,t^{-1/\alpha}\,|v_1||v_2|.
\end{aligned}
\]
Finally, we can let $\epsilon\to0$ using dominated convergence,
\[
\lim_{\epsilon\to0}\nabla_{v_2}\nabla_{v_1}\mathbb E\bigl[h_\epsilon(X_t^x)\bigr] = \nabla_{v_2}\nabla_{v_1}\mathbb E\bigl[h(X_t^x)\bigr],
\]
completing the proof of Lemma 2.1. □

Declaration of competing interest

None.

Acknowledgments

We gratefully thank the associate editor and the two anonymous referees
for their very helpful and constructive comments, in particular for pointing out an elegant
alternative proof for Lemma 2.1, see also Remark 2.2. The research of L. Xu has been
supported in part by NSFC, China No. 12071499, Macao S.A.R. grant FDCT0090/2019/A2 and
University of Macau grant MYRG2020-00039-FST. R.L. Schilling has been supported through
the joint Polish–German NCN–DFG ‘Beethoven 3’ grant (NCN 2018/31/G/ST1/02252; DFG
SCHI 419/11-1). C.-S. Deng is supported by Natural Science Foundation of Hubei Province
of China (2022CFB129). P. Chen is supported by the NSF of Jiangsu Province, China grant
BK20220867 and the Initial Scientific Research Fund of Young Teachers in Nanjing University
of Aeronautics and Astronautics, China (1008-YAH21111).

Appendix A. Proofs of Propositions 1.5–1.7, and Lemma 1.8

Proof of Proposition 1.5. The generator $\mathcal L^\alpha$ of the process $X_t$ is given by
\[
\mathcal L^\alpha f(x) = \bigl\langle b(x), \nabla f(x)\bigr\rangle + (-\Delta)^{\alpha/2} f(x),\qquad f\in C^2(\mathbb R^d,\mathbb R),
\]
where $(-\Delta)^{\alpha/2}$ is the fractional Laplace operator, which is the generator of the rotationally symmetric $\alpha$-stable Lévy process $Z_t$; it is defined as a principal value integral
\[
(-\Delta)^{\alpha/2} f(x) = C_{d,\alpha}\cdot\mathrm{p.v.}\int_{\mathbb R^d}\bigl(f(x+y) - f(x)\bigr)\,\frac{dy}{|y|^{\alpha+d}}. \tag{A.1}
\]
It is not difficult to see that, for all functions from the set
\[
\mathcal D := \Bigl\{f\in C^2(\mathbb R^d,\mathbb R)\ ;\ \int_{|z|\ge1}\bigl|f(x+z) - f(x)\bigr|\,\frac{dz}{|z|^{\alpha+d}} < \infty\Bigr\},
\]
$\mathcal L^\alpha f$ is well-defined; moreover, $\mathcal D\times\mathcal L^\alpha(\mathcal D)$ can be embedded into the full generator $\hat{\mathcal L}^\alpha$, i.e. the set of all pairs of (bounded) Borel functions $(f,g)$ such that $f(X_t) - f(X_0) - \int_0^t g(X_s)\,ds$ is a (local) martingale; see the discussion in [4, pp. 24–26].
Recall that $V_\beta(x) = (1+|x|^2)^{\beta/2}$. It is easy to check that $V_\beta\in\mathcal D(\mathcal L^\alpha)$. Since
\[
\nabla V_\beta(x) = \frac{\beta x}{(1+|x|^2)^{\frac{2-\beta}2}},\qquad
\nabla^2 V_\beta(x) = \frac{\beta I_d}{(1+|x|^2)^{1-\frac\beta2}} + \frac{\beta(\beta-2)\,x x^\top}{(1+|x|^2)^{2-\frac\beta2}}
\]
($I_d$ denotes the $d\times d$ identity matrix), we see that for any $x\in\mathbb R^d$
\[
|\nabla V_\beta(x)| \le \beta|x|^{\beta-1},\qquad \|\nabla^2 V_\beta(x)\|_{\mathrm{HS}} \le \beta(3-\beta)\sqrt d.
\]
Thus, (1.7) and Young's inequality ($AB \le \frac1p A^p + \frac1q B^q$ with $p=\beta$ and $q=\beta/(\beta-1)$) imply
\[
\begin{aligned}
\bigl\langle b(x), \nabla V_\beta(x)\bigr\rangle
&= \frac{\beta\langle b(x)-b(0), x\rangle}{(1+|x|^2)^{\frac{2-\beta}2}} + \frac{\beta\langle b(0), x\rangle}{(1+|x|^2)^{\frac{2-\beta}2}}\\
&\le -\frac{\theta_1\beta|x|^2}{(1+|x|^2)^{\frac{2-\beta}2}} + \frac{\beta K}{(1+|x|^2)^{\frac{2-\beta}2}} + \frac{\beta|b(0)||x|}{(1+|x|^2)^{\frac{2-\beta}2}}\\
&\le -\theta_1\beta V_\beta(x) + \theta_1\beta + \beta K + \beta|b(0)||x|^{\beta-1}\\
&\le -\theta_1 V_\beta(x) + \theta_1\beta + \beta K + \theta_1^{1-\beta}|b(0)|^\beta.
\end{aligned}
\]

Therefore, we see from (A.1) that
\[
\begin{aligned}
(-\Delta)^{\alpha/2} V_\beta(x)
&= C_{d,\alpha}\int \Bigl(V_\beta(x+y) - V_\beta(x) - \bigl\langle\nabla V_\beta(x), y\bigr\rangle\,\mathbb 1_{(0,1)}(|y|)\Bigr)\,\frac{dy}{|y|^{\alpha+d}}\\
&= C_{d,\alpha}\int_{|y|<1}\int_0^1\!\!\int_0^r\bigl\langle\nabla^2 V_\beta(x+sy),\, yy^\top\bigr\rangle_{\mathrm{HS}}\,ds\,dr\,\frac{dy}{|y|^{\alpha+d}}
+ C_{d,\alpha}\int_{|y|\ge1}\int_0^1\bigl\langle\nabla V_\beta(x+ry),\, y\bigr\rangle\,dr\,\frac{dy}{|y|^{\alpha+d}}\\
&\le C_{d,\alpha}\,\frac{\beta(3-\beta)\sqrt d}2\int_{|y|<1}\frac{|y|^2}{|y|^{\alpha+d}}\,dy
+ C_{d,\alpha}\beta\int_{|y|\ge1}\frac{|x|^{\beta-1}|y| + |y|^\beta}{|y|^{\alpha+d}}\,dy\\
&= \frac{C_{d,\alpha}\beta(3-\beta)\sqrt d\,\sigma_{d-1}}{2(2-\alpha)} + C_{d,\alpha}\beta\sigma_{d-1}\Bigl(\frac{|x|^{\beta-1}}{\alpha-1} + \frac1{\alpha-\beta}\Bigr).
\end{aligned} \tag{A.2}
\]

Again by Young’s inequality we get



Cd,α σd−1 β
( )1−β (
⏐(−∆)α/2 Vβ (x)⏐ ≤ Cd,α β(3 − β) dσd−1 + Cd,α βσd−1 + θ1
)
⏐ ⏐
2(2 − α) α−β 4 α−1
θ1
+ Vβ (x).
4
Hence, we have

Lα Vβ (x) ≤ −λ1 Vβ (x) + q1 1 A1 (x), (A.3)

with λ1 = 21 θ1 ,

Cd,α β(3 − β) dσd−1 Cd,α βσd−1
θ1 |b(0)|β
1−β
q 1 = θ1 β + β K + + +
2(2 − α) α−β
Cd,α σd−1 β
( )1−β (
θ1
)
+ ,
4 α−1
{ )1/β }
and the compact set A1 = x ∈ Rd : |x| ≤ 4θ1 −1 q1
(
.
x
Thus, [22, Theorem 5.1] yields that the process $(X_t^x)_{t\ge0}$ is ergodic, i.e. there exists a unique invariant probability measure $\mu$ such that for all $x\in\mathbb R^d$ and $t>0$,
\[
\lim_{t\to\infty}\|P_t(x,\cdot) - \mu\|_{\mathrm{TV}} = 0,
\]
where $P_t(x,dz)$ is the transition function of the process $(X_t^x)_{t\ge0}$ and $\|\cdot\|_{\mathrm{TV}}$ denotes the total variation norm on the space of signed measures. Furthermore, because of the inequality above and [22, Theorem 6.1], we have
\[
\sup_{|f|\le V_\beta}\bigl|\mathbb E[f(X_t^x)] - \mu(f)\bigr| \le c_1 V_\beta(x)\,e^{-c_2 t}
\]
for suitable constants $c_1, c_2 > 0$. In addition, by Itô's formula, the integrability of $X_t^x$ can be derived directly from the Lyapunov condition (A.3). □
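The Young-inequality step used in the drift estimate above, $\beta|b(0)||x|^{\beta-1} \le (\beta-1)\theta_1|x|^\beta + \theta_1^{1-\beta}|b(0)|^\beta$, can be checked pointwise. The sketch below is ours; the values of $\theta_1$, $|b(0)|$ and $\beta$ are illustrative assumptions, not from the paper, and the two sides touch exactly at $|x| = |b(0)|/\theta_1$.

```python
# Grid check of beta*b0*r**(beta-1) <= (beta-1)*theta1*r**beta + theta1**(1-beta)*b0**beta,
# the Young-inequality step used for the Lyapunov drift of V_beta (illustrative constants).
theta1, b0, beta = 0.8, 2.0, 1.4

def lhs(r):
    return beta * b0 * r ** (beta - 1.0)

def rhs(r):
    return (beta - 1.0) * theta1 * r ** beta + theta1 ** (1.0 - beta) * b0 ** beta

# minimum of rhs - lhs over r in (0, 2000]; analytically it is zero, attained at b0/theta1
gap = min(rhs(i * 0.01) - lhs(i * 0.01) for i in range(1, 200_001))
touch = b0 / theta1   # the touching point of the two sides
```

The non-negative `gap` confirms the inequality on the grid; differentiating `rhs - lhs` shows the minimum sits at `touch`, where equality holds.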

Proof of Proposition 1.6. For any $x,y\in\mathbb R^d$, (1.8) implies that
\[
\bigl\langle b(x)-b(y),\, x-y\bigr\rangle \le \theta_2|x-y|^2,
\]
and (1.7) shows for all $|x-y|^2 > 2K/\theta_1$ that
\[
\bigl\langle b(x)-b(y),\, x-y\bigr\rangle \le -\theta_1|x-y|^2 + K \le -\frac{\theta_1}2|x-y|^2.
\]
Hence, we can use [41, Theorem 1.2] with $K_1=\theta_2$, $K_2=\frac{\theta_1}2$ and $L_0=\frac{2K}{\theta_1}$ to get the desired estimate. □

Proof of Proposition 1.7. We show only (1.12), as (1.13) can be proved in the same way. Denote by $P(x,dy) = \mathbb P(Y_1\in dy\mid Y_0 = x)$. Since $V_1(y)\le|y|+1$ and
\[
Y_1 = x + \eta b(x) + Z_\eta,
\]
we have
\[
\begin{aligned}
\int_{\mathbb R^d} V_1(y)\,P(x,dy) &\le \int_{\mathbb R^d}(|y|+1)\,p\bigl(\eta,\, y-x-\eta b(x)\bigr)\,dy\\
&= \int_{\mathbb R^d}\bigl(|z+x+\eta b(x)|+1\bigr)\,p(\eta,z)\,dz\\
&\le \int_{\mathbb R^d}\bigl(|z|+|x+\eta b(x)|+1\bigr)\,p(\eta,z)\,dz\\
&\le \mathbb E|Z_\eta| + \bigl|x+\eta\bigl(b(x)-b(0)\bigr)\bigr| + \eta|b(0)| + 1.
\end{aligned}
\]

By (1.7) and (1.9), we further have
\[
\bigl|x+\eta\bigl(b(x)-b(0)\bigr)\bigr|^2 = |x|^2 + 2\eta\bigl\langle b(x)-b(0),\, x\bigr\rangle + \eta^2\bigl|b(x)-b(0)\bigr|^2 \le \bigl(1-2\theta_1\eta+\theta_2^2\eta^2\bigr)|x|^2 + 2K\eta,
\]
which implies
\[
\begin{aligned}
\int_{\mathbb R^d} V_1(y)\,P(x,dy) &\le \eta^{1/\alpha}\,\mathbb E|Z_1| + \bigl(1-2\theta_1\eta+\theta_2^2\eta^2\bigr)^{1/2}|x| + \sqrt{2K\eta} + \eta|b(0)| + 1\\
&\le (1-\theta_1\eta)\,|x| + \eta^{1/\alpha}\,\mathbb E|Z_1| + \sqrt{2K\eta} + \eta|b(0)| + 1,
\end{aligned}
\]
where the last two inequalities hold because $\eta < \min\bigl\{1,\, \theta_1\theta_2^{-2},\, \theta_1^{-1}\bigr\}$. Hence, we have
\[
\int_{\mathbb R^d} V_1(y)\,P(x,dy) \le \lambda_2 V_1(x) + q_2\mathbb 1_{A_2}(x)
\]
with
\[
\lambda_2 = 1-\frac{\theta_1}2\eta < 1,\qquad q_2 = 1 + \frac{\theta_1}2\eta + \eta^{1/\alpha}\,\mathbb E|Z_1| + \sqrt{2K\eta} + \eta|b(0)|,
\]
and the compact set
\[
A_2 = \Bigl\{x\in\mathbb R^d : |x| \le \frac2{\theta_1}\Bigl(1+\mathbb E|Z_1|\,\eta^{\frac1\alpha-1}\Bigr) + \frac{2\sqrt{2K}}{\theta_1}\,\eta^{-\frac12} + \frac{2|b(0)|}{\theta_1}\Bigr\}.
\]
The proof of irreducibility is standard, see e.g. [20, Appendix A]. We can now use this and [21, Theorem 6.3] to see that the process $(Y_k^x)_{k\ge0}$ is exponentially ergodic, i.e. there exists a unique invariant probability measure $\mu_\eta$ such that for all $x\in\mathbb R^d$ and $k\ge0$,
\[
\sup_{|f|\le V_1}\bigl|\mathbb E f(Y_k) - \mu_\eta(f)\bigr| \le c_1 V_1(x)\,e^{-c_2 k}
\]
for suitable constants $c_1, c_2 > 0$. □

Proof of Lemma 1.8. We show only (1.14), as the inequality (1.15) can be proved in the same way. Notice that $|y|^\beta\le V_\beta(y)$ and
\[
\begin{aligned}
V_\beta(Y_{k+1}) &= V_\beta\bigl(Y_k+\eta b(Y_k)+Z_\eta\bigr)\\
&= V_\beta\bigl(Y_k+\eta b(Y_k)\bigr) + V_\beta\bigl(Y_k+\eta b(Y_k)+Z_\eta\bigr) - V_\beta\bigl(Y_k+\eta b(Y_k)\bigr)\\
&= V_\beta(Y_k) + \int_0^\eta\bigl\langle\nabla V_\beta\bigl(Y_k+r b(Y_k)\bigr),\, b(Y_k)\bigr\rangle\,dr + V_\beta\bigl(Y_k+\eta b(Y_k)+Z_\eta\bigr) - V_\beta\bigl(Y_k+\eta b(Y_k)\bigr),
\end{aligned}
\]

where $Z_\eta$ is independent of $Y_k$. Since $\nabla V_\beta(x) = \beta x(1+|x|^2)^{-\frac{2-\beta}2}$, (1.7) implies that
\[
\begin{aligned}
\int_0^\eta\bigl\langle\nabla V_\beta\bigl(Y_k+rb(Y_k)\bigr),\, b(Y_k)\bigr\rangle\,dr
&= \int_0^\eta\frac{\beta\langle Y_k, b(Y_k)\rangle + \beta r|b(Y_k)|^2}{\bigl(1+|Y_k+rb(Y_k)|^2\bigr)^{\frac{2-\beta}2}}\,dr\\
&\le \int_0^\eta\frac{\beta\langle Y_k, b(Y_k)-b(0)\rangle + \beta|b(0)||Y_k| + \beta r|b(Y_k)|^2}{\bigl(1+|Y_k+rb(Y_k)|^2\bigr)^{\frac{2-\beta}2}}\,dr\\
&\le \int_0^\eta\frac{-\theta_1\beta|Y_k|^2 + \beta K + \beta|b(0)||Y_k| + \beta r|b(Y_k)|^2}{\bigl(1+|Y_k+rb(Y_k)|^2\bigr)^{\frac{2-\beta}2}}\,dr.
\end{aligned} \tag{A.4}
\]

By (1.9) and the fact that $r\le\eta\le\min\bigl\{1,\, \frac{\theta_1}{8\theta_2^2},\, \frac1{\theta_1}\bigr\}$, one can write
\[
\begin{aligned}
-\theta_1\beta|Y_k|^2 + \beta K + \beta|b(0)||Y_k| + \beta r|b(Y_k)|^2
&\le -\frac34\theta_1\beta|Y_k|^2 + 2\eta\theta_2^2\beta|Y_k|^2 + \frac{\beta|b(0)|^2}{\theta_1} + 2\beta r|b(0)|^2 + \beta K\\
&\le -\frac{\theta_1\beta}2|Y_k|^2 + \frac{\beta|b(0)|^2}{\theta_1} + 2\beta r|b(0)|^2 + \beta K,
\end{aligned}
\]
whereas
\[
\begin{aligned}
|Y_k+rb(Y_k)|^2 + 1 &= |Y_k|^2 + 2r\bigl\langle Y_k, b(Y_k)\bigr\rangle + r^2|b(Y_k)|^2 + 1\\
&\le |Y_k|^2 + 2r\bigl\langle Y_k, b(Y_k)-b(0)\bigr\rangle + 2r|b(0)||Y_k| + 2r^2\theta_2^2|Y_k|^2 + 2r^2|b(0)|^2 + 1\\
&\le (1-2r\theta_1)|Y_k|^2 + 2r|b(0)||Y_k| + 2r^2\theta_2^2|Y_k|^2 + 2\eta^2|b(0)|^2 + 1 + 2\eta K\\
&\le |Y_k|^2 + \frac{2r|b(0)|^2}{\theta_1} + 2\eta^2|b(0)|^2 + 1 + 2\eta K.
\end{aligned}
\]
Noting that $\bigl(1+|Y_k+rb(Y_k)|^2\bigr)^{\frac{2-\beta}2}\ge1$, we get
\[
\begin{aligned}
\frac{-\theta_1\beta|Y_k|^2 + \beta K + \beta|b(0)||Y_k| + \beta r|b(Y_k)|^2}{\bigl(1+|Y_k+rb(Y_k)|^2\bigr)^{\frac{2-\beta}2}}
&\le -\frac{\theta_1\beta}2\cdot\frac{|Y_k|^2}{\Bigl(|Y_k|^2 + \frac{2r|b(0)|^2}{\theta_1} + 2\eta^2|b(0)|^2 + 1 + 2\eta K\Bigr)^{\frac{2-\beta}2}} + \frac{\beta|b(0)|^2}{\theta_1} + 2\beta r|b(0)|^2 + \beta K\\
&\le -\frac{\theta_1\beta}2\, V_\beta(Y_k) + \frac{\theta_1\beta}2\Bigl(\frac{2r|b(0)|^2}{\theta_1} + 2\eta^2|b(0)|^2 + 1 + 2\eta K\Bigr)^{\frac\beta2} + \frac{\beta|b(0)|^2}{\theta_1} + 2\beta r|b(0)|^2 + \beta K\\
&\le -\frac{\theta_1\beta}2\, V_\beta(Y_k) + C_1,
\end{aligned}
\]

where
\[
C_1 := \frac{\theta_1\beta}2\Bigl(\frac{2r|b(0)|^2}{\theta_1} + 2\eta^2|b(0)|^2 + 1 + 2\eta K\Bigr) + \frac{\beta|b(0)|^2}{\theta_1} + 2\beta r|b(0)|^2 + \beta K.
\]
Combining this with (A.4), we arrive at
\[
\int_0^\eta\bigl\langle\nabla V_\beta\bigl(Y_k+rb(Y_k)\bigr),\, b(Y_k)\bigr\rangle\,dr \le -\frac{\theta_1\beta}2\,\eta\, V_\beta(Y_k) + C_1\eta.
\]

In addition, for any $y\in\mathbb R^d$, Itô's formula and the inequality (A.2) imply that
\[
\begin{aligned}
\bigl|\mathbb E\bigl[V_\beta\bigl(y+\eta b(y)+Z_\eta\bigr) - V_\beta\bigl(y+\eta b(y)\bigr)\bigr]\bigr|
&= \Bigl|\mathbb E\Bigl[\int_0^\eta(-\Delta)^{\alpha/2}V_\beta\bigl(y+\eta b(y)+Z_r\bigr)\,dr\Bigr]\Bigr|\\
&\le \int_0^\eta\Bigl[\frac{C_{d,\alpha}\beta(3-\beta)\sqrt d\,\sigma_{d-1}}{2(2-\alpha)} + C_{d,\alpha}\beta\sigma_{d-1}\Bigl(\frac{\mathbb E|y+\eta b(y)+Z_r|^{\beta-1}}{\alpha-1} + \frac1{\alpha-\beta}\Bigr)\Bigr]\,dr\\
&\le C_{d,\alpha}\beta\sigma_{d-1}\int_0^\eta\Bigl[\frac{(3-\beta)\sqrt d}{2(2-\alpha)} + \frac1{\alpha-\beta} + \frac{|y|^{\beta-1}+\eta^{\beta-1}|b(y)|^{\beta-1}+\mathbb E|Z_r|^{\beta-1}}{\alpha-1}\Bigr]\,dr.
\end{aligned}
\]
This yields, in turn,
\[
\begin{aligned}
\bigl|\mathbb E\bigl[V_\beta\bigl(y+\eta b(y)+Z_\eta\bigr) - V_\beta\bigl(y+\eta b(y)\bigr)\bigr]\bigr|
&\le C_{d,\alpha}\beta\sigma_{d-1}\Bigl(\frac{(3-\beta)\sqrt d\,\eta}{2(2-\alpha)} + \frac\eta{\alpha-\beta} + \frac{1+\theta_2^{\beta-1}}{\alpha-1}\,\eta|y|^{\beta-1} + \eta|b(0)|^{\beta-1} + \int_0^\eta\frac{\mathbb E|Z_r|^{\beta-1}}{\alpha-1}\,dr\Bigr)\\
&\le C_{d,\alpha}\beta\sigma_{d-1}\Bigl(\frac{(3-\beta)\sqrt d\,\eta}{2(2-\alpha)} + \frac\eta{\alpha-\beta} + \frac{1+\theta_2^{\beta-1}}{\alpha-1}\,\eta|y|^{\beta-1} + \eta|b(0)|^{\beta-1} + \eta\,\frac{\mathbb E|Z_1|^{\beta-1}}{\alpha-1}\Bigr),
\end{aligned}
\]
where the first inequality uses (1.9) and the fact that $0<\eta<1$. Since $Z_\eta$ is independent of $Y_k$, we can derive that

\[
\begin{aligned}
\bigl|\mathbb E\bigl[V_\beta\bigl(Y_k+\eta b(Y_k)+Z_\eta\bigr) - V_\beta\bigl(Y_k+\eta b(Y_k)\bigr)\bigr]\bigr|
&\le C_{d,\alpha}\beta\sigma_{d-1}\Bigl(\frac{(3-\beta)\sqrt d\,\eta}{2(2-\alpha)} + \frac\eta{\alpha-\beta} + \frac{1+\theta_2^{\beta-1}}{\alpha-1}\,\eta\,\mathbb E|Y_k|^{\beta-1} + \eta|b(0)|^{\beta-1} + \eta\,\frac{\mathbb E|Z_1|^{\beta-1}}{\alpha-1}\Bigr)\\
&\le \frac{\theta_1(\beta-1)}2\,\eta\,\mathbb E\bigl[V_\beta(Y_k)\bigr] + C_2\eta,
\end{aligned}
\]

where the last inequality follows from Young's inequality and
\[
C_2 = C_{d,\alpha}\beta\sigma_{d-1}\Bigl(\frac{(3-\beta)\sqrt d}{2(2-\alpha)} + \frac1{\alpha-\beta} + |b(0)|^{\beta-1} + \frac{\mathbb E|Z_1|^{\beta-1}}{\alpha-1}\Bigr) + \Bigl(\frac2{\theta_1}\Bigr)^{\beta-1}\Bigl(\frac{C_{d,\alpha}\sigma_{d-1}\beta\bigl(1+\theta_2^{\beta-1}\bigr)}{\alpha-1}\Bigr)^{\beta}.
\]

Therefore,
\[
\mathbb E\bigl[V_\beta(Y_{k+1})\bigr] \le \Bigl(1-\frac{\theta_1}2\eta\Bigr)\mathbb E\bigl[V_\beta(Y_k)\bigr] + (C_1+C_2)\eta,
\]
which we can iterate to get
\[
\mathbb E\bigl[V_\beta(Y_{k+1})\bigr] \le \Bigl(1-\frac{\theta_1}2\eta\Bigr)^{k+1}V_\beta(x) + (C_1+C_2)\eta\sum_{j=0}^k\Bigl(1-\frac{\theta_1}2\eta\Bigr)^j \le V_\beta(x) + \frac{2(C_1+C_2)}{\theta_1}.
\]
Using that $V_\beta(x)\le1+|x|^\beta$, we finally get
\[
\mathbb E|Y_k^x|^\beta \le \mathbb E\bigl[V_\beta(Y_k^x)\bigr] \le C\bigl(1+|x|^\beta\bigr)
\]
for some constant $C$ which is independent of $\eta$. □
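The last two displays are a standard geometric iteration. The sketch below is ours, with illustrative constants $\theta_1$, $\eta$, $C$ and $a_0 = V_\beta(x)$ that are not from the paper; it iterates $a_{k+1} = (1-\theta_1\eta/2)a_k + C\eta$ and confirms the uniform-in-$k$ bound $a_k \le a_0 + 2C/\theta_1$ as well as convergence to the fixed point $2C/\theta_1$.

```python
# Iterate a_{k+1} = (1 - theta1*eta/2) * a_k + C*eta, a_0 = V_beta(x), and track
# the running maximum; the claimed uniform bound is a_0 + 2C/theta1 (illustrative values).
theta1, eta, C, a0 = 0.8, 0.01, 3.0, 5.0
bound = a0 + 2.0 * C / theta1
limit = 2.0 * C / theta1        # fixed point: C*eta / (theta1*eta/2) = 2C/theta1

a = a0
max_a = a
for _ in range(100_000):
    a = (1.0 - theta1 * eta / 2.0) * a + C * eta
    max_a = max(max_a, a)
```

Since the map is a contraction with factor $1-\theta_1\eta/2 < 1$, the iterates converge monotonically to `limit`, which stays below `bound`.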

Appendix B. Exact rate for the Ornstein–Uhlenbeck process

In this section, we assume that µ is the invariant measure of the Ornstein–Uhlenbeck process
on R:

dX t = −X t dt + dZ t , X 0 = x, (B.1)

where Z t is a rotationally symmetric α-stable Lévy process (1 < α < 2), and µ̃η is the invariant
measure of
η1/α
Ỹk+1 = Ỹk − ηỸk + Z̃ k+1 , k = 0, 1, 2, . . . ,
σ
( ∫ )−1
where η ∈ (0, 1), Ỹ0 = x, σ = 2dαα
( )1/α ∞ 1−cos y
with dα = C1,α = 2 0 y α+1
dy , and Z̃ j are
i.i.d. random variables with density
α
p(z) = 1(1,∞) (|z|). (B.2)
2|z|α+1
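The Pareto noise (B.2) can be sampled by inverse transform, since $\mathbb P(|\tilde Z_j| > z) = z^{-\alpha}$ for $z\ge1$, and the constant $\sigma$ can be obtained by quadrature from $\sigma^\alpha = \alpha\int_0^\infty(1-\cos y)\,y^{-\alpha-1}\,dy$. The sketch below is our illustration (the values of $\alpha$ and $\eta$ are assumptions, not from the paper); it runs the Pareto-driven scheme for many independent chains.

```python
import numpy as np

rng = np.random.default_rng(42)
alpha = 1.5  # illustrative stability index

def pareto_noise(size):
    """Inverse-transform sample of density (B.2): |Z| = U**(-1/alpha) with
    U uniform on (0, 1], combined with an independent fair sign."""
    u = 1.0 - rng.random(size)                 # uniform on (0, 1]
    sign = rng.choice([-1.0, 1.0], size=size)
    return sign * u ** (-1.0 / alpha)

# sigma = (alpha * I)**(1/alpha), I = int_0^inf (1 - cos y)/y**(alpha+1) dy,
# approximated by a truncated trapezoidal rule
y = np.linspace(1e-6, 200.0, 2_000_000)
f = (1.0 - np.cos(y)) / y ** (alpha + 1.0)
integral = float(np.sum((f[:-1] + f[1:]) * 0.5 * np.diff(y)))
sigma = (alpha * integral) ** (1.0 / alpha)

# run the Pareto-driven Euler--Maruyama chains until (1 - eta)**steps is negligible
eta, steps, chains = 0.01, 2000, 10_000
ytilde = np.zeros(chains)
for _ in range(steps):
    ytilde = (1.0 - eta) * ytilde + eta ** (1.0 / alpha) / sigma * pareto_noise(chains)
```

For $\eta = 0.01$ and $\alpha = 1.5$, Proposition B.1 predicts a bias of order $\eta^{2/\alpha-1} = \eta^{1/3} \approx 0.2$, so the empirical characteristic function of `ytilde` can only loosely match the stationary one, $e^{-|\xi|^\alpha/\alpha}$.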

Proposition B.1. For every $x\in\mathbb R$ and $\alpha\in(1,2)$,
\[
0 < \liminf_{\eta\downarrow0}\frac{W_1(\mu,\tilde\mu_\eta)}{\eta^{2/\alpha-1}} \le \limsup_{\eta\downarrow0}\frac{W_1(\mu,\tilde\mu_\eta)}{\eta^{2/\alpha-1}} < \infty.
\]
Proof. Since $X_t = e^{-t}x + e^{-t}\int_0^t e^s\,dZ_s$, we get
\[
\mathbb E\bigl[e^{i\xi X_t}\bigr] = e^{i\xi e^{-t}x}\,\mathbb E\bigl[e^{i\int_0^t\xi e^{-t}e^s\,dZ_s}\bigr] = e^{i\xi e^{-t}x}\,e^{-\int_0^t|\xi e^{-t}e^s|^\alpha\,ds} = e^{i\xi e^{-t}x}\,e^{-\alpha^{-1}|\xi|^\alpha(1-e^{-\alpha t})} \xrightarrow[t\to\infty]{} e^{-\alpha^{-1}|\xi|^\alpha} = \mathbb E\bigl[e^{i\xi\alpha^{-1/\alpha}Z_1}\bigr].
\]
Hence, the invariant measure $\mu$ is given by the law of $\alpha^{-1/\alpha}Z_1$.
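The elementary identity behind the display above, $\int_0^t |\xi e^{s-t}|^\alpha\,ds = \alpha^{-1}|\xi|^\alpha(1-e^{-\alpha t})$, is easy to confirm by quadrature; the sketch below is ours and uses arbitrary test values of $\xi$, $t$ and $\alpha$.

```python
import math

def exponent_quadrature(xi, t, alpha, n=20_000):
    """Midpoint-rule approximation of int_0^t |xi * e^(s - t)|**alpha ds."""
    h = t / n
    return sum(abs(xi * math.exp((i + 0.5) * h - t)) ** alpha for i in range(n)) * h

def exponent_closed_form(xi, t, alpha):
    """Closed form alpha**-1 * |xi|**alpha * (1 - e**(-alpha*t)) from the proof."""
    return abs(xi) ** alpha * (1.0 - math.exp(-alpha * t)) / alpha

alpha = 1.5
errs = [abs(exponent_quadrature(xi, t, alpha) - exponent_closed_form(xi, t, alpha))
        for xi, t in [(1.3, 0.7), (-2.0, 2.5)]]
```

As $t\to\infty$ the closed form tends to $\alpha^{-1}|\xi|^\alpha$, the characteristic exponent of the limit law $\alpha^{-1/\alpha}Z_1$.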


It is easy to see that
k
η1/α ∑
Ỹk+1 = (1 − η)k+1 x + (1 − η)i Z̃ k+1−i .
σ i=0

Denote by ϕ(ξ ) = E eiξ Z̃ j the characteristic function of the Pareto distribution. Then we have
[ ]

k [ 1/α ]
η
[ ] k+1 ∏ i
E eiξ Ỹk+1 = eiξ (1−η) x E eiξ σ (1−η) Z̃ k+1−i
i=0
k
η
( 1/α )
iξ (1−η)k+1 x

=e ϕ (1 − η) ξ .
i

i=0
σ

Letting k → ∞ and denoting by Ỹη a random variable with distribution µ̃η , we get

η
[ ] ∏ ( 1/α )
E eiξ Ỹη = ϕ (1 − η)i ξ . (B.3)
i=0
σ
For $\xi>0$, we have
\[
\begin{aligned}
1-\varphi(\xi) &= 2\int_1^\infty\bigl[1-\cos(\xi z)\bigr]\,p(z)\,dz = \alpha\int_1^\infty\bigl[1-\cos(\xi z)\bigr]\,\frac{dz}{z^{\alpha+1}}\\
&= \alpha\xi^\alpha\Bigl(\int_0^\infty\bigl[1-\cos u\bigr]\,\frac{du}{u^{\alpha+1}} - \int_0^\xi\bigl[1-\cos u\bigr]\,\frac{du}{u^{\alpha+1}}\Bigr)\\
&= \sigma^\alpha\xi^\alpha - \alpha\xi^\alpha\int_0^\xi\bigl[1-\cos u\bigr]\,\frac{du}{u^{\alpha+1}}.
\end{aligned}
\]
Since $p(z)$ is symmetric, cf. (B.2), we have $\varphi(\xi) = \varphi(-\xi)$, and so
\[
\varphi(\xi) = 1 - \sigma^\alpha|\xi|^\alpha + \alpha|\xi|^\alpha\int_0^{|\xi|}\bigl[1-\cos u\bigr]\,\frac{du}{u^{\alpha+1}}
\]
for all $\xi\in\mathbb R$. Since $c := \inf_{0<u\le1}(1-\cos u)/u^2 > 0$, we get for all $|\xi|\le1$,
\[
\varphi(\xi) \ge 1 - \sigma^\alpha|\xi|^\alpha + \alpha|\xi|^\alpha\int_0^{|\xi|} cu^2\,\frac{du}{u^{\alpha+1}} = 1 - \sigma^\alpha|\xi|^\alpha + \frac{c\alpha}{2-\alpha}\,|\xi|^2.
\]
Thus, for $|\xi|\le1$ and $0<\eta<1\wedge\sigma^\alpha$,
\[
\begin{aligned}
\log\varphi\Bigl(\frac{\eta^{1/\alpha}}\sigma(1-\eta)^i\xi\Bigr)
&\ge \log\Bigl(1 - \sigma^\alpha\Bigl|\frac{\eta^{1/\alpha}}\sigma(1-\eta)^i\xi\Bigr|^\alpha + \frac{c\alpha}{2-\alpha}\Bigl|\frac{\eta^{1/\alpha}}\sigma(1-\eta)^i\xi\Bigr|^2\Bigr)\\
&= \log\Bigl(1 - \eta(1-\eta)^{\alpha i}|\xi|^\alpha + \frac{c\alpha}{(2-\alpha)\sigma^2}\,\eta^{2/\alpha}(1-\eta)^{2i}|\xi|^2\Bigr).
\end{aligned} \tag{B.4}
\]

Observe that
\[
\lim_{x\downarrow0}\frac{\log\bigl(1-x+\frac{c\alpha}{(2-\alpha)\sigma^2}x^{2/\alpha}\bigr) + x}{x^{2/\alpha}} = \frac{c\alpha}{(2-\alpha)\sigma^2}.
\]
Therefore, there is some constant $C = C(\alpha,\sigma)>0$ such that for small enough $x>0$
\[
\log\Bigl(1-x+\frac{c\alpha}{(2-\alpha)\sigma^2}\,x^{2/\alpha}\Bigr) \ge -x + Cx^{2/\alpha}.
\]
If we use this in (B.4), we obtain for all $|\xi|\le1$ and small enough $\eta>0$,
\[
\log\varphi\Bigl(\frac{\eta^{1/\alpha}}\sigma(1-\eta)^i\xi\Bigr) \ge -\eta(1-\eta)^{\alpha i}|\xi|^\alpha + C\eta^{2/\alpha}(1-\eta)^{2i}|\xi|^2.
\]
Inserting this into (B.3), we see for all $|\xi|\le1$ and small enough $\eta>0$
\[
\begin{aligned}
\log\mathbb E\bigl[e^{i\xi\tilde Y_\eta}\bigr] &= \sum_{i=0}^\infty\log\varphi\Bigl(\frac{\eta^{1/\alpha}}\sigma(1-\eta)^i\xi\Bigr)\\
&\ge -\eta|\xi|^\alpha\sum_{i=0}^\infty(1-\eta)^{\alpha i} + C\eta^{2/\alpha}|\xi|^2\sum_{i=0}^\infty(1-\eta)^{2i}\\
&= -|\xi|^\alpha\,\frac\eta{1-(1-\eta)^\alpha} + C|\xi|^2\,\frac{\eta^{2/\alpha}}{1-(1-\eta)^2}\\
&= -\frac1\alpha|\xi|^\alpha - |\xi|^\alpha\,\Omega(\eta) + |\xi|^2\,\Omega\bigl(\eta^{2/\alpha-1}\bigr).
\end{aligned}
\]
In the last equality we use that
\[
\lim_{\eta\downarrow0}\eta^{-1}\Bigl[\frac\eta{1-(1-\eta)^\alpha} - \frac1\alpha\Bigr] = \frac{\alpha-1}{2\alpha}\qquad\text{and}\qquad \lim_{\eta\downarrow0}\eta^{-2/\alpha+1}\,\frac{\eta^{2/\alpha}}{1-(1-\eta)^2} = \frac12.
\]
Here and in the following, the notation $f(\eta) = \Omega(g(\eta))$ as $\eta\downarrow0$ means that $\lim_{\eta\downarrow0} f(\eta)/g(\eta)$ is a positive (finite) constant, where $f$ and $g$ are some positive functions. With the elementary inequality $e^x\ge1+x$ for $x\in\mathbb R$ we see for all $|\xi|\le1$ and sufficiently small $\eta>0$ that
\[
\mathbb E\bigl[e^{i\xi\tilde Y_\eta}\bigr] \ge \exp\Bigl[-\frac1\alpha|\xi|^\alpha - |\xi|^\alpha\,\Omega(\eta) + |\xi|^2\,\Omega\bigl(\eta^{2/\alpha-1}\bigr)\Bigr] \ge e^{-|\xi|^\alpha/\alpha}\bigl[1 - |\xi|^\alpha\,\Omega(\eta) + |\xi|^2\,\Omega\bigl(\eta^{2/\alpha-1}\bigr)\bigr],
\]

which yields
\[
\int_{-1}^1\Bigl(\mathbb E\bigl[e^{i\xi\tilde Y_\eta}\bigr] - \mathbb E\bigl[e^{i\xi\alpha^{-1/\alpha}Z_1}\bigr]\Bigr)\,d\xi \ge \int_{-1}^1 e^{-\alpha^{-1}|\xi|^\alpha}\Bigl(-|\xi|^\alpha\,\Omega(\eta) + |\xi|^2\,\Omega\bigl(\eta^{2/\alpha-1}\bigr)\Bigr)\,d\xi = -\Omega(\eta) + \Omega\bigl(\eta^{2/\alpha-1}\bigr) = \Omega\bigl(\eta^{2/\alpha-1}\bigr). \tag{B.5}
\]
Define
\[
h(x) := \frac1M\Bigl(\frac{\sin x}x\,\mathbb 1_{\{x\ne0\}} + \mathbb 1_{\{x=0\}}\Bigr),\qquad x\in\mathbb R,
\]
where
\[
M := \sup_{x\in\mathbb R\setminus\{0\}}\Bigl|\frac{x\cos x - \sin x}{x^2}\Bigr| \in(0,\infty).
\]
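The constant $M$ and the Fourier representation of $h$ used in the final step can be evaluated numerically (our sketch, not part of the paper): $M\approx0.436$, attained near $x\approx2.08$, and $(2M)^{-1}\int_{-1}^1 e^{i\xi x}\,d\xi$ reduces to $\sin(x)/(Mx)$ because only the cosine part survives by symmetry.

```python
import math

def deriv_ratio(x):
    """|x cos x - sin x| / x**2, the quantity whose supremum defines M."""
    return abs(x * math.cos(x) - math.sin(x)) / x**2

# grid evaluation on (0, 100]; the function is even in x and decays like 1/|x|
M = max(deriv_ratio(i * 1e-3) for i in range(1, 100_001))

def h(x):
    return math.sin(x) / (M * x) if x != 0.0 else 1.0 / M

def h_fourier(x, n=20_000):
    """Midpoint rule for (2M)**-1 * int_{-1}^{1} cos(xi * x) d(xi)."""
    step = 2.0 / n
    s = sum(math.cos((-1.0 + (k + 0.5) * step) * x) for k in range(n)) * step
    return s / (2.0 * M)
```

By construction $\sup_x |h'(x)| = \sup_x |x\cos x - \sin x|/(Mx^2) = 1$, which is exactly the $\mathrm{Lip}(1)$ property used below.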

Since $h\in\mathrm{Lip}(1)$ and
\[
h(x) = \frac1{2M}\int_{-1}^1 e^{i\xi x}\,d\xi,\qquad x\in\mathbb R,
\]
it follows from Fubini's theorem and (B.5) that
\[
\begin{aligned}
W_1(\mu,\tilde\mu_\eta) &\ge \bigl|\mathbb E\bigl[h(\tilde Y_\eta)\bigr] - \mathbb E\bigl[h\bigl(\alpha^{-1/\alpha}Z_1\bigr)\bigr]\bigr|\\
&= \Bigl|\int_{\mathbb R}\Bigl(\frac1{2M}\int_{-1}^1 e^{i\xi x}\,d\xi\Bigr)\,\mathbb P(\tilde Y_\eta\in dx) - \int_{\mathbb R}\Bigl(\frac1{2M}\int_{-1}^1 e^{i\xi x}\,d\xi\Bigr)\,\mathbb P\bigl(\alpha^{-1/\alpha}Z_1\in dx\bigr)\Bigr|\\
&= \Bigl|\frac1{2M}\int_{-1}^1\mathbb E\bigl[e^{i\xi\tilde Y_\eta}\bigr]\,d\xi - \frac1{2M}\int_{-1}^1\mathbb E\bigl[e^{i\xi\alpha^{-1/\alpha}Z_1}\bigr]\,d\xi\Bigr|\\
&= \frac1{2M}\Bigl|\int_{-1}^1\Bigl(\mathbb E\bigl[e^{i\xi\tilde Y_\eta}\bigr] - \mathbb E\bigl[e^{i\xi\alpha^{-1/\alpha}Z_1}\bigr]\Bigr)\,d\xi\Bigr|\\
&\ge \Omega\bigl(\eta^{2/\alpha-1}\bigr).
\end{aligned}
\]
Combining this with the upper bound in Theorem 1.2(2) finishes the proof. □

References
[1] V. Bally, D. Talay, The law of the Euler scheme for stochastic differential equations, Probab. Theory Related
Fields 104 (1996) 43–60.
[2] C. Berg, G. Forst, Potential Theory on Locally Compact Abelian Groups, in: Ergebnisse der Mathematik und
ihrer Grenzgebiete. II. Ser., Bd. 87, Springer, Berlin, 1975.
[3] R.M. Blumenthal, R.K. Getoor, Some theorems on stable processes, Trans. Amer. Math. Soc. 95 (1960)
263–273.
[4] B. Böttcher, R.L. Schilling, J. Wang, Lévy-Type Processes: Construction, Approximation and Sample Path
Properties, in: Lecture Notes in Mathematics, Lévy Matters III, vol. 2099, Springer, Cham, 2013.
[5] J.M. Chambers, C.L. Mallows, B. Stuck, A method for simulating stable random variables, J. Amer. Statist.
Assoc. 71 (1976) 340–344.
[6] P. Chen, J. Lu, L. Xu, Approximation to stochastic variance reduced gradient Langevin dynamics by stochastic
delay differential equations, Appl. Math. Optim. 85 (2022) 1–40.
[7] P. Chen, I. Nourdin, L. Xu, X. Yang, Multivariate stable approximation in Wasserstein distance by Stein’s
method, 2019, Preprint arXiv:1911.12917.
[8] P. Chen, L. Xu, Approximation to stable law by the Lindeberg principle, J. Math. Anal. Appl. 480 (2019)
123338.
[9] K. Dareiotis, M. Gerencsér, On the regularisation of the noise for the Euler–Maruyama scheme with irregular drift, Electron. J. Probab. 25 (2020) 1–18.
[10] C.-S. Deng, R.L. Schilling, Y.-H. Song, Subgeometric rates of convergence for Markov processes under
subordination, Adv. Appl. Probab. 49 (2017) 162–181.
[11] W. Fang, M.B. Giles, Adaptive Euler–Maruyama method for SDEs with non-globally Lipschitz drift, in:
International Conference on Monte Carlo and Quasi-Monte Carlo Methods in Scientific Computing, Springer,
Cham, 2016, pp. 217–234.
[12] X. Fang, Q.-M. Shao, L. Xu, Multivariate approximations in Wasserstein distance by Stein’s method and
Bismut’s formula, Probab. Theory Related Fields 174 (2019) 945–979.
[13] P. Hall, Two-sided bounds on the rate of convergence to a stable law, Z. Wahrscheinlichkeitstheor. Verwandte
Geb. 57 (1981) 349–364.
[14] J. Jacod, The Euler scheme for Lévy driven stochastic differential equations: limit theorems, Ann. Probab. 32
(2004) 1830–1872.
[15] A. Janicki, Z. Michna, A. Weron, Approximation of stochastic differential equations driven by α-stable Lévy
motion, Appl. Math. 24 (1996) 149–168.
166
Chen, Deng, Schilling, Xu Stochastic Processes and their Applications 163 (2023) 136–167

[16] X. Jin, G. Pang, L. Xu, X. Xu, An approximation to steady-state of M/Ph/n + M queue, 2021, Preprint arXiv:2109.03623.
[17] F. Kühn, R.L. Schilling, Strong convergence of the Euler–Maruyama approximation for a class of Lévy-driven
SDEs, Stochastic Process. Appl. 129 (2019) 2654–2680.
[18] V. Lemaire, An adaptive scheme for the approximation of dissipative systems, Stochastic Process. Appl. 117
(2007) 1491–1518.
[19] X. Li, Q. Ma, H. Yang, C. Yuan, The numerical invariant measure of stochastic differential equations with
Markovian switching, SIAM J. Numer. Anal. 56 (2018) 1435–1455.
[20] J. Lu, Y. Tan, L. Xu, Central limit theorem and self-normalized Cramér-type moderate deviation for
Euler–Maruyama scheme, Bernoulli 28 (2020) 937–964.
[21] S.P. Meyn, R.L. Tweedie, Stability of Markovian processes I: Criteria for discrete-time chains, Adv. Appl.
Probab. 24 (1992) 542–574.
[22] S.P. Meyn, R.L. Tweedie, Stability of Markovian processes III: Foster–Lyapunov criteria for continuous-time
processes, Adv. Appl. Probab. 25 (1993) 518–548.
[23] R. Mikulevičius, F. Xu, On the rate of convergence of strong Euler approximation for SDEs driven by Lévy
processes, Stochastics 90 (2018) 569–604.
[24] R. Modarres, J.P. Nolan, A method for simulating stable random vectors, Comput. Statist. 9 (1994) 11–19.
[25] T.H. Nguyen, U. Simsekli, G. Richard, Non-asymptotic analysis of Fractional Langevin Monte Carlo for non-
convex optimization, in: International Conference on Machine Learning, in: Proceedings of Machine Learning
Research, 2019, pp. 4810–4819.
[26] J.P. Nolan, An overview of multivariate stable distributions, 2008, Online: http://hdl.handle.net/1961/auislandora:68717 (accessed January 12, 2023).
[27] J.P. Nolan, Univariate Stable Distributions: Models for Heavy Tailed Data, in: Springer Series in Operations
Research and Financial Engineering, Springer, Cham, 2020.
[28] J. Norris, Simplified Malliavin calculus, in: Séminaire de Probabilités XX 1984/85, Springer, Berlin, 1986,
pp. 101–130.
[29] D. Nualart, The Malliavin Calculus and Related Topics, second ed., Springer, Berlin, 2006.
[30] G. Pagès, F. Panloup, Unadjusted Langevin algorithm with multiplicative noise: Total variation and Wasserstein bounds, 2020, Preprint arXiv:2012.14310.
[31] O.M. Pamen, D. Taguchi, Strong rate of convergence for the Euler–Maruyama approximation of SDEs with
Hölder continuous drift coefficient, Stochastic Process. Appl. 127 (2017) 2542–2559.
[32] F. Panloup, Recursive computation of the invariant measure of a stochastic differential equation driven by a
Lévy process, Ann. Appl. Probab. 18 (2008) 379–426.
[33] P.E. Protter, Stochastic Integration and Differential Equations, second ed., Springer, Berlin, 2004.
[34] P. Protter, D. Talay, The Euler scheme for Lévy driven stochastic differential equations, Ann. Probab. 25
(1997) 393–423.
[35] J.M. Sanz-Serna, K.C. Zygalakis, Wasserstein distance estimates for the distributions of numerical
approximations to ergodic stochastic differential equations, J. Mach. Learn. Res. 22 (2021) 1–37.
[36] K. Sato, Lévy Processes and Infinitely Divisible Distributions, Cambridge University Press, Cambridge, 1999.
[37] J. Shao, Weak convergence of Euler–Maruyama’s approximation for SDEs under integrability condition, 2018,
Preprint arXiv:1808.07250.
[38] U. Simsekli, L. Sagun, M. Gurbuzbalaban, A tail-index analysis of stochastic gradient noise in deep neural
networks, in: International Conference on Machine Learning, in: Proceedings of Machine Learning Research,
2019, pp. 5827–5837.
[39] D. Talay, Second-order discretization schemes of stochastic differential systems for the computation of the
invariant law, Stochastics 29 (1990) 13–36.
[40] B. Tarami, M. Avaji, Convergence of Euler–Maruyama method for stochastic differential equations driven by
α-stable Lévy motion, J. Math. Ext. 12 (2018) 31–53.
[41] J. Wang, L p -Wasserstein distance for stochastic differential equations driven by Lévy processes, Bernoulli 22
(2016) 1598–1616.
[42] L. Xu, Approximation of stable law in Wasserstein-1 distance by Stein’s method, Ann. Appl. Probab. 29
(2019) 458–504.
[43] X. Zhang, Derivative formulas and gradient estimates for SDEs driven by α-stable processes, Stochastic
Process. Appl. 123 (2013) 1213–1228.
[44] P. Zhou, J. Feng, C. Ma, C. Xiong, S.C.H. Hoi, Towards theoretically understanding why sgd generalizes
better than adam in deep learning, Adv. Neural Inf. Process. Syst. 33 (2020) 21285–21296.
