Averaging and passage through resonances in two-frequency systems near separatrices∗

Anatoly Neishtadt and Alexey Okunev

Abstract
We study the averaging method for time-periodic perturbations of one-frequency Hamiltonian
systems such that solutions of the perturbed system cross separatrices of the unperturbed
system. The Hamiltonian depends on a parameter that slowly changes for the perturbed
system (so Hamiltonian systems with two and a half degrees of freedom are included in our
class). We prove that for most initial conditions (except a set of measure $O(\sqrt{\varepsilon}|\ln^5 \varepsilon|)$, here
$\varepsilon$ is the small parameter) the evolution of slow variables is described by the averaged system
with accuracy $O(\sqrt{\varepsilon}|\ln \varepsilon|)$ over time $\sim \varepsilon^{-1}$.

arXiv:2108.08540v1 [math.DS] 19 Aug 2021

1 Introduction
Small perturbations of integrable Hamiltonian systems are an important class of dynamical
systems encountered in various applications. Far from the separatrices of the unperturbed
system one can use the action-angle variables of the unperturbed system. The values of action
variables slowly change for the perturbed system and their evolution can be approximately
described by the averaged system that is obtained by averaging the rate of change of the action
variables over all values of the angle variables. For perturbations of one-frequency systems,
the averaged system describes the evolution of the action variables for all initial data with
accuracy O(ε) over times ∼ ε−1 ([1], [2]). Here ε is the small parameter of perturbed system.
For two-frequency systems, the averaged system describes the evolution of most initial data
(except a set with measure $O(\sqrt{\varepsilon})$) with accuracy $O(\sqrt{\varepsilon}|\ln \varepsilon|)$ over times $\sim \varepsilon^{-1}$. This result
is proved in [3] (see also review [4] and references therein), two-frequency systems were earlier
studied in [5] under a condition that prohibits capture into resonances. Resonances between
two frequencies of the problem are possible, and, as these frequencies change along solutions
of the perturbed system, it is possible that the ratio between the frequencies remains near a
resonant value for times $\sim \varepsilon^{-1}$. This phenomenon is called capture into resonance, and there
are examples (e.g., in [4]) when this happens for a set of initial data with measure $\sim \sqrt{\varepsilon}$.
General results that hold for multi-frequency systems are weaker. For example, in [6] it is
shown that for any $\rho > 0$ the measure of the set of initial data for which the averaging
method fails to give accuracy $\rho$ is $o(1)$ as $\varepsilon \to 0$.
The phase space of the unperturbed system is often divided into several domains by separa-
trices. Solutions of the perturbed system can cross separatrices of the unperturbed system and
move between these domains. Effects of separatrix crossing are well studied for one-frequency
systems ([7] and references within). Importantly, separatrix crossing leads to probabilistic
phenomena. For example, the trajectory of a point moving in a double-well potential (unperturbed
system) with small friction (perturbation) eventually crosses a separatrix and remains
bounded in one of the two wells. Trajectories caught in each well are finely mixed in the phase
space when friction is small, thus capture in each well can be considered a "random event"
with definite probability (see the discussion in [8], [7]). One can modify averaged system to
cover trajectories crossing separatrices: when the solution of averaged system in some domain
reaches the separatrices (and thus averaged system in this domain is no longer defined), one
can write averaged system in another domain bounded by the same separatrices and continue
this solution using averaged system in the other domain. When capture in several domains
is possible after separatrix crossing, there are modified averaged systems describing capture
into each such domain. The evolution of most initial data (with exceptional set having measure
$O(\varepsilon^r)$ for any $r > 0$; this set corresponds to solutions that come too close to the saddle
of perturbed system) is described by a solution of modified averaged system with accuracy
$O(\varepsilon|\ln \varepsilon|)$ ([7]).
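The double-well example with small friction can be explored numerically. The sketch below is only an illustration and is not taken from the paper: the Hamiltonian $H = p^2/2 - q^2/2 + q^4/4$, the friction term $-\varepsilon p$, the value of $\varepsilon$, the fixed-step RK4 integrator, and all function names are assumptions made for the example. It integrates the perturbed system until the energy drops well below the separatrix level $H = 0$ and reports which well the trajectory ended up in.

```python
def energy(q, p):
    """Assumed unperturbed Hamiltonian H = p^2/2 - q^2/2 + q^4/4 (H = 0 on the separatrices)."""
    return 0.5 * p * p - 0.5 * q * q + 0.25 * q ** 4


def rhs(q, p, eps):
    """Perturbed vector field: dq/dt = p, dp/dt = q - q^3 - eps * p (small friction)."""
    return p, q - q ** 3 - eps * p


def rk4_step(q, p, eps, dt):
    """One classical fourth-order Runge-Kutta step."""
    k1q, k1p = rhs(q, p, eps)
    k2q, k2p = rhs(q + 0.5 * dt * k1q, p + 0.5 * dt * k1p, eps)
    k3q, k3p = rhs(q + 0.5 * dt * k2q, p + 0.5 * dt * k2p, eps)
    k4q, k4p = rhs(q + dt * k3q, p + dt * k3p, eps)
    q_new = q + dt * (k1q + 2.0 * k2q + 2.0 * k3q + k4q) / 6.0
    p_new = p + dt * (k1p + 2.0 * k2p + 2.0 * k3p + k4p) / 6.0
    return q_new, p_new


def capture_well(q0, p0, eps=0.01, dt=0.01, h_stop=-0.1, max_steps=1_000_000):
    """Integrate until H < h_stop < 0 and return the well containing the trajectory."""
    q, p = q0, p0
    for _ in range(max_steps):
        if energy(q, p) < h_stop:
            return "left" if q < 0 else "right"
        q, p = rk4_step(q, p, eps, dt)
    raise RuntimeError("trajectory was not captured within max_steps")


# Nearby initial momenta outside the figure eight (H > 0) may end in different wells.
for p0 in (0.50, 0.52, 0.54, 0.56):
    print(p0, capture_well(0.0, p0))
```

Since this model is symmetric under $(q, p) \mapsto (-q, -p)$, mirrored initial conditions are captured in opposite wells, which gives a simple consistency check for the integrator.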
Separatrix crossing is much less studied for perturbations of Hamiltonian systems with two
and more frequencies. Separatrix crossing for time-periodic perturbations of one-frequency sys-
tems of special form (periodically forced by a single harmonic and weakly damped strongly
nonlinear oscillator corresponding to a double-well potential) is considered in [9]. The authors
use multiphase averaging between resonances and single-phase averaging near resonances to
obtain formulas for the boundaries of the sets of initial data captured in each well (without
rigorous justification). Stochastic perturbations of time-periodic perturbations of one-frequency
systems were studied in [10].

∗ The work was supported by the Leverhulme Trust (Grant No. RPG-2018-143).

The goal of this paper is to obtain realistic estimates for the accuracy of averaging method
for time-periodic perturbations of one-frequency systems with separatrix crossing. This is
the simplest case when there are both resonances and separatrix crossing. Phase angle
on the closed phase trajectories of the unperturbed system and time are two angle vari-
ables, and resonances between their frequencies are possible. Far from separatrices results
on two-frequency systems mentioned earlier are applicable. We prove that accuracy of averaging
method $O(\sqrt{\varepsilon}|\ln \varepsilon|)$ holds for most initial conditions, the exceptional set has measure
$O(\sqrt{\varepsilon}|\ln^5 \varepsilon|)$. We also prove formulas for "probabilities" of proceeding into different domains
after separatrix crossing similar to the formulas that hold in one-frequency case [7].
Our methods are as follows. We split the phase space into resonant zones near resonances,
non-resonant zones between resonant zones, and a small neighborhood of separatrices (with
width $\sim \varepsilon|\ln^5 \varepsilon|$, it is chosen so that estimates for resonant zones work outside this
neighborhood). In non-resonant zones we use the standard coordinate change used to justify the
averaging method, taking into account that many functions describing this coordinate change
are unbounded near the separatrices.
In the resonant zones we average over time and obtain auxiliary system describing passage
through resonances. This is parallel to the methods used far from separatrices [4], but with
some important differences. It turns out that single step of averaging as in [4] is not sufficient,
and we need to use many steps of averaging method as proposed in [11] instead. To apply this
method, we continue the perturbed system in the action-angle variables near a resonance in
a complex domain. In [4] the auxiliary system is an equation describing the movement of a
particle in a periodic force field with a small perturbation. In our case the unperturbed system
is the same, but the perturbation consists of a Hamiltonian part that grows for resonances
near separatrices (this is why we need some separation from the separatrices) and a non-
Hamiltonian part that decreases for resonances near separatrices. Finally, while far from
separatrices capture only in finitely many resonances is possible, resonant zones such that
capture is possible may accumulate on the separatrices. We show that such resonances $\omega = s_2/s_1$
(here $\omega$ is the frequency of the unperturbed system) can be split into finitely many series
(capture is possible only for finitely many values of $s_2$, for each $s_2$ we have two series, one
with all odd $s_1$ and another one with all even $s_1$) such that in each of these series auxiliary
systems (more precisely, their unperturbed systems) have a limit when $s_1 \to \infty$.
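To see why resonances accumulate on the separatrices, here is a small sketch under stated assumptions. It uses a hypothetical model (not from the paper) in which the period near the separatrices behaves like $T(h) \approx -2\ln h + 4\ln 4$, as for the double well $H = p^2/2 - q^2/2 + q^4/4$ at small $h > 0$; then $\omega(h) = 2\pi/T(h) = s_2/s_1$ gives $h \approx 16\, e^{-\pi s_1/s_2}$. The constant $C_\rho = 1$ and the function names are also assumptions. The code lists the resonances that lie outside a neighborhood of width $\varepsilon|\ln^5 \varepsilon|$.

```python
import math


def resonant_energy(s1, s2):
    """Energy h with omega(h) = s2/s1 under the ASSUMED asymptotics
    T(h) = 2*pi/omega(h) ~ -2*ln(h) + 4*ln(4) (accurate only for small h)."""
    return 16.0 * math.exp(-math.pi * s1 / s2)


def neighborhood_width(eps, c_rho=1.0, rho=5):
    """Width c_rho * eps * |ln eps|^rho of the immediate neighborhood of separatrices."""
    return c_rho * eps * abs(math.log(eps)) ** rho


def resonances_outside(eps, s2=1, s1_max=200):
    """Resonances s2/s1 (coprime s1, s2) whose energy exceeds the neighborhood width."""
    h_star = neighborhood_width(eps)
    return [(s1, resonant_energy(s1, s2))
            for s1 in range(1, s1_max + 1)
            if math.gcd(s1, s2) == 1 and resonant_energy(s1, s2) > h_star]


for s1, h in resonances_outside(1e-12):
    print(s1, h)
```

In this model consecutive resonant energies for fixed $s_2$ form a geometric sequence with ratio $e^{-\pi/s_2}$, so the number of resonances outside the neighborhood grows only logarithmically as $\varepsilon \to 0$.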

2 Statement of results
2.1 Our setting
Consider a Hamiltonian system with one degree of freedom
$$\dot q = \frac{\partial H}{\partial p}, \qquad \dot p = -\frac{\partial H}{\partial q} \tag{2.1}$$
with the Hamiltonian $H(p, q, z)$ depending on a vector parameter $z = (z_1, \dots, z_n) \in \mathbb{R}^n$. We
will call this system the unperturbed system. Denote by $Z_0 \subset \mathbb{R}^n$ an open set of parameters,
we will only consider $z \in Z_0$. We assume that for all $z \in Z_0$ the Hamiltonian $H$ has a saddle
$C(z)$ with two separatrix loops $l_1(z)$ and $l_2(z)$ forming a figure eight and $C$ is a non-degenerate
critical point of $H$. Denote by $B \subset \mathbb{R}^2_{p,q} \times Z_0$ some open neighborhood of $\bigcup_{z \in Z_0} l_1(z) \cup l_2(z)$.

Figure 1: The unperturbed system. (Labels in the figure: the separatrix loops l1 and l2, the saddle C, and the domains B1, B2, B3.)

The separatrices $\bigcup_{z \in Z_0} l_1(z) \cup l_2(z)$ cut $B$ into three open domains: $B_1$ and $B_2$ inside the
separatrix loops and $B_3$ outside the union of the separatrices. Denote
h(p, q, z) = H(p, q, z) − H(C(z), z).
Then h = 0 on the separatrices for all z. We will assume h > 0 in B3 and h < 0 in B1 ∪ B2 .
We will assume that Bi are foliated by the level sets of H, so, with a slight abuse of notation,
we may write (h, z) ∈ Bi . We assume that the Hamiltonian is analytic on some compact set
B̃ with B ⊂ int B̃ and B̃ is also foliated by the level sets of H. Take a compact Z ⊂ Z0 . Then
for small enough $c_h, c_z > 0$ for any $z_* \in Z$ for any $h, z$ with $|h| < c_h$, $\|z - z_*\| < c_z$ we have
$(h, z) \in B$.
Consider the perturbed system
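$$\begin{aligned}
\dot q &= \frac{\partial H}{\partial p} + \varepsilon f_q(p, q, z, \lambda, \varepsilon), \\
\dot p &= -\frac{\partial H}{\partial q} + \varepsilon f_p(p, q, z, \lambda, \varepsilon), \\
\dot z &= \varepsilon f_z(p, q, z, \lambda, \varepsilon), \\
\dot \lambda &= 1.
\end{aligned} \tag{2.2}$$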

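Here $H$ is real-analytic in $\tilde B$ and $f_p, f_q, f_z$ are $C^2$ in $\tilde B \times \mathbb{R}_\lambda \times [-\varepsilon_0, \varepsilon_0]$ for some $\varepsilon_0 > 0$ and
$2\pi$-periodic in $\lambda$. We assume that for $\varepsilon = 0$ the functions $f_p, f_q, f_z$ are real-analytic in $\tilde B \times \mathbb{R}_\lambda$.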
Denote Ai = Bi × [0, 2π]λ , A = B × [0, 2π]λ .

2.2 Averaged system


The rate of change of h along the solutions of the perturbed system is εfh , where
$$f_h = \frac{\partial h}{\partial p} f_p + \frac{\partial h}{\partial q} f_q + \frac{\partial h}{\partial z} f_z.$$
The variables h and z are slow variables of the perturbed system. To approximately track
the evolution of these variables in each domain Bi , the customary procedure is to consider the
averaged system. In each Bi denote by T (h, z) and ω(h, z) the period and the frequency of
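the solution of the unperturbed system (2.1) with given h, z. The averaged system is given
by the equations
$$\dot h = \varepsilon f_{h,0}(h, z), \qquad \dot z = \varepsilon f_{z,0}(h, z), \tag{2.3}$$
where
$$f_{h,0} = \frac{1}{2\pi T} \int_0^{2\pi} \oint f_h|_{\varepsilon=0}\, dt\, d\lambda, \qquad f_{z,0} = \frac{1}{2\pi T} \int_0^{2\pi} \oint f_z|_{\varepsilon=0}\, dt\, d\lambda.$$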
The inner integrals above are taken along the closed trajectory of the unperturbed system
given by h = h, z = z (inside Bi ) and this trajectory is parametrized by the time t. Recall
that l1 and l2 denote the separatrices of the unperturbed system. Denote
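$$\Theta_i(z) = -\frac{1}{2\pi} \int_0^{2\pi} \oint_{l_i} f_h|_{\varepsilon=0}(p(t), q(t), z, \lambda)\, dt\, d\lambda \quad \text{for } i = 1, 2; \qquad \Theta_3 = \Theta_1 + \Theta_2 \tag{2.4}$$
(the separatrices are again parametrized by the time). These integrals converge, see [7, Section 2.2].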
We will assume Θi > 0, i = 1, 2, 3. Note that near separatrices we have T fh,0 ≈ −Θi in
Bi . Thus close to the separatrices h decreases in all Bi along the solutions of averaged system.
Moreover, for some $K_h > 0$ for small enough $|h| > 0$ we have
$$f_{h,0} < -K_h |\ln^{-1} h| < 0. \tag{2.5}$$
Once solution of the averaged system in $B_3$ reaches $h = 0$, one can continue this solution
using the averaged system in $B_1$ or $B_2$. We will say that such "glued" solutions correspond to
capture into $B_1$ or $B_2$, respectively. This is discussed in more detail in [7, Section 2.3].
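As a concrete illustration of the integrals $\Theta_i$ in (2.4), the sketch below evaluates them numerically for a hypothetical example that is not taken from the paper: $H = p^2/2 - q^2/2 + q^4/4$ with no parameter $z$, and the perturbation $f_q = 0$, $f_p = -p\,(1 + 0.5\cos\lambda)$, so that $f_h = \frac{\partial h}{\partial p} f_p = -p^2 (1 + 0.5\cos\lambda)$. The separatrix loops are parametrized by time as $q = \pm\sqrt{2}\,\mathrm{sech}\, t$, $p = \dot q$. For this model the exact value is $\Theta_1 = \Theta_2 = 4/3$ (the area enclosed by each loop). All names and numerical parameters are assumptions.

```python
import math


def separatrix_point(t, sign):
    """Point (p, q) on a separatrix loop of the assumed H = p^2/2 - q^2/2 + q^4/4.

    sign = +1 selects the loop with q > 0, sign = -1 the loop with q < 0.
    """
    sech = 1.0 / math.cosh(t)
    q = sign * math.sqrt(2.0) * sech
    p = -sign * math.sqrt(2.0) * sech * math.tanh(t)  # p = dq/dt
    return p, q


def f_h(p, q, lam):
    """Rate of change of h for the assumed perturbation f_p = -p * (1 + 0.5 cos(lam))."""
    return -p * p * (1.0 + 0.5 * math.cos(lam))


def theta(rate, sign, t_max=20.0, n_t=4000, n_lam=16):
    """Theta = -(1/2pi) * int_0^{2pi} int_loop rate dt dlam.

    Trapezoidal rule in t on [-t_max, t_max], uniform-grid average in lam.
    """
    dt = 2.0 * t_max / n_t
    total = 0.0
    for k in range(n_t + 1):
        p, q = separatrix_point(-t_max + k * dt, sign)
        lam_avg = sum(rate(p, q, 2.0 * math.pi * j / n_lam) for j in range(n_lam)) / n_lam
        weight = 0.5 if k in (0, n_t) else 1.0
        total += weight * lam_avg * dt
    return -total


theta_1 = theta(f_h, +1)
theta_2 = theta(f_h, -1)
theta_3 = theta_1 + theta_2
print(theta_1, theta_2, theta_3)
```

The integrand decays exponentially as $t \to \pm\infty$ because $f_h$ vanishes at the saddle, which is the reason the integrals in (2.4) converge and why truncating to a finite time window is harmless here.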

2.3 Condition B′
Let ϕ be the angle variable (from the pair of action-angle variables) defined in B3 . Pick the
transversal (to the solutions of unperturbed system) ϕ = 0 so that for all z it is a smooth
curve that crosses $l_1$ at some point $a_1(z) \ne C(z)$. It is easy to check that the transversal $\varphi = \pi$
crosses l2 at some point that we denote by a2 (z). Let us define the coordinates t1 , t2 on l1
and l2 as the time (for the unperturbed system) passed after the point a1 and a2 , respectively.
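Given $s = (s_1, s_2) \in \mathbb{Z}^2_{>0}$, set
$$F^*_{s,i}(Q, z) = -(2\pi)^{-1} \int_{l_i} \Big\langle f_h\Big(h{=}0,\ z,\ t_i{=}t_i,\ \lambda{=}t_i - Q + 2\pi \frac{j}{s_2},\ \varepsilon{=}0\Big) \Big\rangle_{j=0,\dots,s_2-1} dt_i \quad \text{for } i = 1, 2;$$
$$F^*_{s,3}(Q, z) = F^*_{s,1}(Q, z) + F^*_{s,2}\Big(Q + 2\pi \frac{\{s_1/2\}}{s_2},\ z\Big).$$
Here $\{\cdot\}$ denotes the fractional part and $\langle \psi(j) \rangle_{j=0,\dots,k-1}$ denotes the average $\frac{\psi(0) + \dots + \psi(k-1)}{k}$.
The functions $F^*_{s,i}(Q)$ are periodic with period $2\pi/s_2$.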
For a function $F(Q, z)$ set $V_F(Q, z) = \int_0^Q F(\tilde Q, z)\, d\tilde Q$. We will need the following
Condition $B'(V)$. All extrema of the function $V(Q)$ are non-degenerate (i.e. for all $Q$ such
that $\frac{\partial V}{\partial Q} = 0$ we have $\frac{\partial^2 V}{\partial Q^2} \ne 0$). Moreover, at different local maxima of $V$ the values of $V$ are
different.
Condition $B'(s, z, i)$. The function $V(Q) = V_{F^*_{s,i}}(Q, z)$ satisfies the condition $B'$ above.

For fixed s this is a codimension one genericity condition on (fp , fq , fz ) and z.


Condition $B'(z, i)$. For any $s = (s_1, s_2)$ condition $B'(s, z, i)$ holds.
The lemma below means that $B'(z, i)$ is also a codimension one genericity condition on
$(f_p, f_q, f_z)$ and $z$, as $V_F$ has no extrema if $F > 0$ and thus satisfies condition $B'(V)$.
Lemma 2.1. Given a uniform bound
$$\|f_p\|_{C^2},\ \|f_q\|_{C^2},\ \|f_z\|_{C^2} \le K, \qquad \frac{\Theta_i}{2\pi} > K^{-1} \ \text{ for } i = 1, 2, 3,$$
there exists $S(K)$ such that $F^*_{s,i} > K^{-1}/2$ if $s_2 > S(K)$.

Proof. We have $f_h(C(z), z) = 0$ by [7, Lemma 2.1]. As $(p(t_1), q(t_1))$ exponentially converges
to $C$ for $t_1 \to +\infty$ and $t_1 \to -\infty$, there exists $T_1$ such that
$$\int_{t_1=-\infty}^{-T_1} \max_\lambda |f_h(h{=}0, t_1, \lambda)|\, dt_1 + \int_{t_1=T_1}^{\infty} \max_\lambda |f_h(h{=}0, t_1, \lambda)|\, dt_1 < K^{-1}/100. \tag{2.6}$$
For $|t_1| < T_1$ (i.e. far from $C$) the transition between coordinates $h, t_1$ and $p, q$ is smooth, so
$f_h(h{=}0, t_1, \lambda)$ is a smooth function of $t_1, \lambda$ with bounded $C^2$-norm. We have
$$\int_{-T_1}^{T_1} \Big\langle f_h\Big(h{=}0, z, t_1, \lambda{=}t_1 - Q + 2\pi \frac{j}{s_2}\Big) \Big\rangle_{j=0,\dots,s_2-1} dt_1 = \int_{-T_1}^{T_1} \langle f_h(h{=}0, z, t_1, \lambda) \rangle_\lambda\, dt_1 + O(s_2^{-2}),$$
as averaging over $j$ gives trapezoidal rule approximation for averaging over $\lambda$. The integral on
the right-hand side is approximately $-\Theta_1$ with error at most $K^{-1}/100$ by (2.6). Take $S$ such
that the $O(s_2^{-2})$ term is less than $K^{-1}/100$, then for $s$ with $s_2 > S$ we have $|2\pi F^*_{s,1} - \Theta_1| < 3K^{-1}/100$.
We can obtain similar estimate for $|2\pi F^*_{s,2} - \Theta_2|$, together they yield the estimate
on $|2\pi F^*_{s,3} - \Theta_3|$.
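The step "averaging over $j$ gives trapezoidal rule approximation for averaging over $\lambda$" can be checked on a toy function. The sketch below is illustrative only: the test function $g(\lambda) = 1 + \cos 3\lambda$ and the function names are assumptions. It shows that the discrete average over $k$ equally spaced phases reproduces the mean value of $g$ once $k$ does not divide the harmonic number, while for $k = 3$ the harmonic is aliased and the discrete average keeps a dependence on $Q$, which is the mechanism behind the $Q$-dependence of $F^*_{s,i}$ for small $s_2$.

```python
import math


def discrete_average(g, Q, k):
    """Average of g over the k equally spaced phases -Q + 2*pi*j/k, j = 0..k-1."""
    return sum(g(-Q + 2.0 * math.pi * j / k) for j in range(k)) / k


def g(lam):
    """Hypothetical 2*pi-periodic test function with mean value 1."""
    return 1.0 + math.cos(3.0 * lam)


Q = 0.3
for k in (1, 3, 4, 5, 8):
    print(k, discrete_average(g, Q, k))
```

For a general $C^2$ function the discrete average is not exact, but its error decays as $k$ grows, which is what the $O(s_2^{-2})$ term in the proof records.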

3 Main theorem
Denote $X = (h, z)$. Take a point $\hat X_0 = (\hat h_0, \hat z_0) \in B_3$ and $\Lambda > 0$.
For $i = 1, 2$ denote by $\hat X_i(\lambda)$ the solution of averaged system (2.3) describing capture from
$B_3$ to $B_i$ with $\hat X_i(0) = \hat X_0$. Denote by $Z_B \subset Z_0$ the set of $z$ that satisfy condition $B'(z, i)$ for
all $i = 1, 2, 3$.
Suppose that
1. $\omega$ decreases along solutions of the averaged system: $\frac{\partial \omega}{\partial z} f_{z,0} + \frac{\partial \omega}{\partial h} f_{h,0} < 0$ in $B_1 \cup B_2 \cup B_3$;
2. there exists $c_B > 0$ such that for any $\lambda \in [0, \varepsilon^{-1}\Lambda]$ the points $\hat X_i(\lambda)$ are in $B$ and at least
$c_B$-far from the border of $B$;
3. $\hat X_1(\varepsilon^{-1}\Lambda) \in B_1$, $\hat X_2(\varepsilon^{-1}\Lambda) \in B_2$. Denote by $\lambda_*$ the time when $\hat X_i$ cross separatrices (i.e.
$\hat h_i(\lambda_*) = 0$) and set $z_* = \hat z_i(\lambda_*)$ (note that $\lambda_*$ and $z_*$ are the same for $i = 1, 2$);
4. $z_* \in Z_B$;
5. For some small enough¹ $\Lambda_1$ for $i = 1, 2$ the solutions $\hat X_i$ satisfy condition B from [4,
Section 2]² for $\lambda \in [0, \lambda_* - \varepsilon^{-1}\Lambda_1]$ and $\lambda \in [\lambda_* + \varepsilon^{-1}\Lambda_1, \varepsilon^{-1}\Lambda]$. This is a genericity
condition, as explained in [4].


Let $B_r(X) \subset \mathbb{R}^{n+1}_{h,z}$ denote the open ball with center $X$ and radius $r$. Denote
$$A_r(X) = B_r(X) \times [0, 2\pi]^2_{\lambda,\varphi} \subset A.$$
Denote by $m$ the Lebesgue measure on $\mathbb{R}^{n+2}_{p,q,z} \times [0, 2\pi]_\lambda$.

¹ It should be so small that if $\lambda$ is $\varepsilon^{-1}\Lambda_1$-close to $\lambda_*$, we have $|\hat h_i(\lambda)| < c_h/2$ and $\|\hat z_i(\lambda) - z_*\| < c_z/2$, where
$c_h, c_z$ should be so small that we can apply Theorem 4.1 below.
² In the notation of [4] these solutions should not cross the set θ.

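Theorem 3.1. There exist $C, r > 0$ such that for any small enough $\varepsilon$ there exists
$$E \subset A_r(\hat X_0) \subset A_3 \quad \text{with} \quad m(E) \le C\sqrt{\varepsilon}|\ln^5 \varepsilon|$$
such that the following holds for any $X_0 \in A_r(\hat X_0) \setminus E$.
Set $\lambda_0 = \lambda(X_0)$. Denote by $X(\lambda)$ the solution of the perturbed system (2.2) with $X(\lambda_0) = X_0$.
Then $X(\lambda_0 + \varepsilon^{-1}\Lambda) \in A_i$ for some $i = 1, 2$. Denote by $\overline X(\lambda)$ the solution of the averaged
system (2.3) describing transition from $B_3$ to $B_i$ with $\overline X(\lambda_0) = (h(X_0), z(X_0))$. Then for any
$\lambda \in [\lambda_0, \lambda_0 + \varepsilon^{-1}\Lambda]$ we have
$$|h(X(\lambda)) - h(\overline X(\lambda))| < C\sqrt{\varepsilon}|\ln \varepsilon|, \qquad \|z(X(\lambda)) - z(\overline X(\lambda))\| < C\sqrt{\varepsilon}|\ln \varepsilon|. \tag{3.1}$$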

This theorem is proved in Section 4, it is reduced to technical Theorem 4.1 below on crossing
a small neighborhood of separatrices.
Let us now discuss "probabilities" of capture in $A_1$ and $A_2$. Given $X_0 = (p_0, q_0, z_0, \lambda_0) \in A_3$
and small $\delta > 0$, denote $I_0 = I(p_0, q_0, z_0)$, $\varphi_0 = \varphi(p_0, q_0, z_0)$, $h_0 = h(p_0, q_0, z_0)$ (here $I, \varphi$
are action-angle variables of unperturbed system in $A_3$). Let us define the set $U^\delta \subset A_3$ by
$$U^\delta = \{(I, z, \varphi, \lambda) : |I - I_0|,\ \|z - z_0\|,\ |\varphi - \varphi_0|,\ |\lambda - \lambda_0| < \delta\}.$$
Solutions of the perturbed system with initial data in $U^\delta \setminus E$ are described by solutions of
averaged system describing transition to $B_1$ (we say that such initial data is captured in $A_1$)
or transition to $B_2$ (we say that such initial data is captured in $A_2$). Denote by $U_1^\delta, U_2^\delta \subset U^\delta \setminus E$
the sets of initial data captured in $A_1$ and $A_2$, respectively.
Proposition 3.2.
$$\lim_{\delta \to 0} \lim_{\varepsilon \to 0} \frac{m(U_i^\delta)}{m(U^\delta)} = \Theta_i(z_*)/\Theta_3(z_*), \qquad i = 1, 2.$$
Here $z_*$ is the value of $z$ when the solution of averaged system with initial data $(h_0, z_0)$ crosses
separatrices.
This proposition is proved in Section 12.
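To make the ratio $\Theta_i/\Theta_3$ concrete, the following sketch evaluates it for a hypothetical asymmetric double well with pure friction; none of this is taken from the paper. The assumptions are: $H = p^2/2 + V(q)$ with $V(q) = -q^2/2 + a q^3/3 + q^4/4$, no parameter $z$, and $f_q = 0$, $f_p = -p$, so that $f_h = -p^2$ and $\Theta_i = \oint_{l_i} p^2\, dt = \oint_{l_i} p\, dq$ is the area enclosed by the loop $l_i$. The loops are labeled "left" and "right" rather than $l_1$, $l_2$, and the midpoint rule is used for the quadrature.

```python
import math


def potential(q, a):
    """Assumed asymmetric double-well potential with a saddle of H at q = 0."""
    return -0.5 * q * q + a * q ** 3 / 3.0 + 0.25 * q ** 4


def loop_area(q_end, a, n=20000):
    """Area enclosed by the separatrix loop between q = 0 and the turning point q_end.

    On the separatrix p = +-sqrt(-2 V(q)), so the area is 2 * int |p| dq (midpoint rule).
    """
    dq = q_end / n  # negative for the left loop
    total = 0.0
    for k in range(n):
        q = (k + 0.5) * dq
        total += math.sqrt(max(0.0, -2.0 * potential(q, a)))
    return 2.0 * total * abs(dq)


def capture_probabilities(a):
    """Return (Theta_left / Theta_3, Theta_right / Theta_3) for the assumed friction model."""
    root = math.sqrt(a * a / 9.0 + 0.5)
    q_right = 2.0 * (-a / 3.0 + root)  # nonzero roots of V(q) = 0
    q_left = 2.0 * (-a / 3.0 - root)
    theta_right = loop_area(q_right, a)
    theta_left = loop_area(q_left, a)
    theta_3 = theta_left + theta_right
    return theta_left / theta_3, theta_right / theta_3


print(capture_probabilities(0.0))
print(capture_probabilities(0.3))
```

In the symmetric case $a = 0$ both probabilities equal $1/2$; for $a > 0$ the left loop encloses a larger area, so capture into the left well is more likely in this model.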
Remark 3.3. Our results are applicable for Hamiltonian systems with two and a half degrees
of freedom, i.e., with the Hamiltonian

H(p, q, y, x, t) = H0 (p, q, y, x) + εH1 (p, q, y, x, t),

where H1 is 2π-periodic in time t. Here pairs of conjugate variables are (q, p) and (ε−1 x, y);
(q, p) are fast variables and (x, y) are slow variables. Indeed, we can take z = (x, y), then
Hamilton’s equations will be of form (2.2).

4 Approaching separatrices and passing through separatrices
In this section we state a technical theorem on crossing a small neighborhood of separatrices
and reduce Theorem 3.1 to this technical theorem. Then we split the technical theorem
into three lemmas on approaching separatrices, crossing separatrices, and moving away from
separatrices.
• For given $X_{init} \in A_3$, denote $\lambda_{init} = \lambda(X_{init})$ and let $X(\lambda)$ be the solution of the
perturbed system (2.2) with initial data $X(\lambda_{init}) = X_{init}$.
• Given $\overline X_0 \in \mathbb{R}^{n+1}_{h,z}$ and $\lambda_0$, denote by $\overline X(\lambda)$ the solution of the averaged system (one
needs to specify in which domain $B_i$ or in which union of these domains) with initial
data $\overline X(\lambda_0) = \overline X_0$.
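Theorem 4.1. Given any $z_* \in Z_B$, for any small enough $c_z, c_h > 0$ for any $C_0, \Lambda > 0$ there
exists $C > 0$ such that for any small enough $\varepsilon$ there exists $E \subset A_3$ with
$$m(E) \le C\sqrt{\varepsilon}|\ln^5 \varepsilon|$$
such that the following holds for any $X_{init} \in A_3 \setminus E$.
Suppose at some time
$$\lambda_0 \in [\lambda_{init}, \lambda_{init} + \varepsilon^{-1}\Lambda]$$
the point $X_0 = X(\lambda_0)$ satisfies
$$X_0 \in A_3, \qquad \|z(X_0) - z_*\| \le c_z, \qquad h(X_0) = c_h.$$
Then there exists $i = 1, 2$ and $\lambda_1 > \lambda_0$ such that
$$X(\lambda_1) \in A_i, \qquad h(X(\lambda_1)) = -c_h.$$
Take any $\overline X_0 \in \mathbb{R}^{n+1}_{h,z}$ with
$$\|\overline X_0 - (h(X_0), z(X_0))\| < C_0\sqrt{\varepsilon}|\ln \varepsilon|$$
and consider the solution $\overline X(\lambda)$ of averaged system corresponding to capture from $B_3$ to $B_i$ with
initial data $\overline X(\lambda_0) = \overline X_0$. Then for any $\lambda \in [\lambda_0, \lambda_1]$ we have
$$|h(X(\lambda)) - h(\overline X(\lambda))| < C\sqrt{\varepsilon}|\ln \varepsilon|, \qquad \|z(X(\lambda)) - z(\overline X(\lambda))\| < C\sqrt{\varepsilon}|\ln \varepsilon|. \tag{4.1}$$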

Let us now prove the main theorem using the technical theorem to cover passage near
separatrices and [4, Theorem 1 and Corollary 3.1] far from separatrices. Let us now state this
result from [4] using our notation.
Theorem 4.2 ([4]). Pick $\hat X_0$ and $\Lambda > 0$. Suppose that solution of the averaged system $\hat X$ with
initial data $\hat X(0) = \hat X_0$ stays far from the separatrices for $\lambda \in [0, \varepsilon^{-1}\Lambda]$ and satisfies certain
conditions (discussed right after the statement of theorem).
Then for small enough $r > 0$ for any small enough $\varepsilon > 0$ there exists $E \subset A_r(\hat X_0)$ with
$m(E) = O(\sqrt{\varepsilon})$ such that for any $X_0 \in A_r(\hat X_0) \setminus E$ we have
$$h(X(\lambda)) - h(\overline X(\lambda)) = O(\sqrt{\varepsilon}|\ln \varepsilon|), \qquad z(X(\lambda)) - z(\overline X(\lambda)) = O(\sqrt{\varepsilon}|\ln \varepsilon|)$$
for $\lambda \in [\lambda_0, \lambda_0 + \varepsilon^{-1}\Lambda]$, where $\lambda_0 = \lambda(X_0)$, $X(\lambda)$ is the solution of perturbed system with
initial data $X(\lambda_0) = X_0$ and $\overline X(\lambda)$ is the solution of averaged system with initial data
$\overline X(\lambda_0) = (h(X_0), z(X_0))$.
For a full statement of the conditions, we refer the reader to [4, Section 2]. When we apply
this theorem below, these conditions are satisfied, the conditions in Section 3 are written for
this purpose.
We will also need the lemma below, it is proved in Section 13.4.
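Lemma 4.3. For any $\Lambda > 0$ there exists $C > 1$ such that the flow $g^\lambda$ of (2.2) satisfies the
following. Take open $A \subset \mathbb{R}^{2+n}_{p,q,z} \times [0, 2\pi]_\lambda$ with $m(A) < \infty$. Then for any $\lambda \in [-\varepsilon^{-1}\Lambda, \varepsilon^{-1}\Lambda]$
we have
$$m(g^\lambda(A)) \le C m(A).$$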

Proof of Theorem 3.1. Take $c_h, c_z > 0$ such that we can apply Theorem 4.1 with these constants.
Recall that $\hat X_i(\lambda) = (\hat h_i(\lambda), \hat z_i(\lambda))$ denotes the solution of averaged system describing
capture in $B_i$ with $\hat X_i(0) = \hat X_0$. Define $\lambda_+$ by $\hat h_1(\lambda_+) = \hat h_2(\lambda_+) = 2c_h/3$ and $\lambda_{-,i}$ by
$\hat h_i(\lambda_{-,i}) = -2c_h/3$, $i = 1, 2$. The number $\Lambda_1$ from the conditions for the main theorem is such
that if $\lambda$ is $\varepsilon^{-1}\Lambda_1$-close to $\lambda_*$, we have $|\hat h_i(\lambda)| < c_h/2$ and $\|\hat z_i(\lambda) - z_*\| < c_z/2$ for $i = 1, 2$.
Thus
$$\lambda_+ < \lambda_* - \varepsilon^{-1}\Lambda_1 < \lambda_* + \varepsilon^{-1}\Lambda_1 < \lambda_{-,i}.$$
This means condition B from [4, Section 2] is satisfied for
$$\hat X_i(\lambda),\ \lambda \in [0, \lambda_+] \qquad \text{and} \qquad \hat X_i(\lambda),\ \lambda \in [\lambda_{-,i}, \varepsilon^{-1}\Lambda].$$
For small enough $r$ solutions $\overline X'(\lambda) = (\overline h'(\lambda), \overline z'(\lambda))$ of averaged system with any initial
condition $\overline X'(0) \in B_r(\hat X_0)$ satisfy
$$\overline h'(\lambda_+) \in [c_h/2, 3c_h/4], \qquad \overline h'_i(\lambda_{-,i}) \in [-3c_h/4, -c_h/2]. \tag{4.2}$$

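By [4, Corollary 3.1] we have (3.1) for $\lambda \in [0, \lambda_+]$, given that $X_0$ is not in some set $E_1$
of measure $\lesssim \sqrt{\varepsilon}$. Together with (4.2) for small $\varepsilon$ this implies $h(X(\lambda_0 + \lambda_+)) < 4c_h/5$. By
continuity we have $h(X(\lambda_0 + \lambda'_+)) = c_h$ for some $\lambda'_+ \in [0, \lambda_+]$. Thus we can apply Theorem 4.1
(with $\lambda_0$ in this theorem equal to $\lambda_0 + \lambda'_+$). This theorem gives (possibly after reducing $r$) a
set $E_2$ of measure $\lesssim \sqrt{\varepsilon}|\ln^5 \varepsilon|$ such that if $X_0 \notin E_1 \cup E_2$, there is $i = 1, 2$ and $\lambda'_{-,i}$ such that
$X(\lambda_0 + \lambda'_{-,i}) \in A_i$, with $h(X(\lambda_0 + \lambda'_{-,i})) = -c_h$ and (3.1) holds for $\lambda \in [\lambda_0 + \lambda_+, \lambda_0 + \lambda'_{-,i}]$.
We have $\overline h_i(\lambda_0 + \lambda'_{-,i}) < -4c_h/5$, by (4.2) this implies $\lambda'_{-,i} > \lambda_{-,i}$ and so (3.1) holds for
$\lambda = \lambda_0 + \lambda_{-,i}$.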
Denote
$$\hat X_{-,1} = \hat X_1(\lambda_{-,1}), \qquad \hat X_{-,2} = \hat X_2(\lambda_{-,2}).$$
By [4, Corollary 3.1] there exist $r_-$ and $E'_{-,i}$, $i = 1, 2$ with $m(E'_{-,i}) = O(\sqrt{\varepsilon})$ such that for
$i = 1, 2$ solutions starting in $A_{r_-}(\hat X_{-,i}) \setminus E'_{-,i}$ are approximated by solutions of the averaged
system (with the same initial $h, z$) with error $O(\sqrt{\varepsilon}\ln \varepsilon)$. Let $E_{-,i}$ be the preimage of $E'_{-,i}$
under the flow of perturbed system over time $\lambda_{-,i}$, we have $m(E_{-,i}) = O(\sqrt{\varepsilon})$ by Lemma 4.3.
We can now write the exceptional set in the current theorem: $E = E_1 \cup E_2 \cup E_{-,1} \cup E_{-,2}$.
Reducing $r$ if needed, we may assume that solutions of averaged system describing capture in
$B_i$ starting in $A_r(\hat X_0)$ at $\lambda = \lambda_0$ are in $A_{r_-/2}(\hat X_{-,i})$ at $\lambda = \lambda_0 + \lambda_{-,i}$ for $i = 1, 2$. Thus $X(\lambda)$
is $O(\sqrt{\varepsilon}|\ln \varepsilon|)$-close to the solution of averaged system with initial data
$$\overline X(\lambda_0 + \lambda_{-,i}) = (h(X(\lambda_0 + \lambda_{-,i})), z(X(\lambda_0 + \lambda_{-,i})))$$
for $\lambda \in [\lambda_0 + \lambda_{-,i}, \lambda_0 + \varepsilon^{-1}\Lambda]$. The difference between this solution of averaged system and
$\overline X_i(\lambda)$ at the moment $\lambda = \lambda_0 + \lambda_{-,i}$ is $O(\sqrt{\varepsilon}|\ln \varepsilon|)$, it stays of the same order, as the dynamics
in slow time takes time $O(1)$ and the averaged system is smooth far from separatrices. Thus we
have (3.1) for $\lambda \in [\lambda_0 + \lambda_{-,i}, \lambda_0 + \varepsilon^{-1}\Lambda]$. Now we have proved (3.1) for all $\lambda \in [\lambda_0, \lambda_0 + \varepsilon^{-1}\Lambda]$,
as required.

Let us now split Theorem 4.1 into a lemma on approaching the separatrices, lemma on
crossing immediate neighborhood of separatrices, and lemma on moving away from the sepa-
ratrices.
• Take $\rho = 5$, suppose we are given $C_\rho > 0$. Denote $h_* = C_\rho\, \varepsilon|\ln^\rho \varepsilon|$.
The immediate neighborhood of separatrices is given by $|h| < h_*(\varepsilon)$.
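Lemma 4.4 (On approaching separatrices). Take any $z_* \in Z_B$, any small enough $c_h, c_z > 0$,
any large enough $C_\rho > 0$. Then for any $C_0, \Lambda > 0$ there exists $C > 0$ such that for any small
enough $\varepsilon$ there exists $E \subset A_3$ with
$$m(E) < C\sqrt{\varepsilon}$$
such that for any $X_{init} \in A_3 \setminus E$ the following holds.
Suppose that at some time
$$\lambda_0 \in [\lambda_{init}, \lambda_{init} + \varepsilon^{-1}\Lambda]$$
the point $X_0 = X(\lambda_0)$ satisfies
$$X_0 \in A_3, \qquad h(X_0) = c_h, \qquad \|z(X_0) - z_*\| \le c_z.$$
Then at some time $\lambda_1 > \lambda_0$ we have
$$X(\lambda_1) \in A_3, \qquad h(X(\lambda_1)) = h_*(\varepsilon).$$
Take any $\overline X_0 \in \mathbb{R}^{n+1}_{h,z}$ with
$$\|\overline X_0 - (h(X_0), z(X_0))\| < C_0\sqrt{\varepsilon}|\ln \varepsilon|$$
and consider the solution $\overline X(\lambda)$ of averaged system in $B_3$ with initial data $\overline X(\lambda_0) = \overline X_0$. Then
for any $\lambda \in [\lambda_0, \lambda_1]$ we have
$$|h(X(\lambda)) - h(\overline X(\lambda))| < C\sqrt{\varepsilon}|\ln \varepsilon|, \qquad \|z(X(\lambda)) - z(\overline X(\lambda))\| < C\sqrt{\varepsilon}|\ln \varepsilon|.$$
Moreover,
$$|h(X(\lambda_1)) - h(\overline X(\lambda_1))| < C\sqrt{\varepsilon}.$$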
This lemma will be proved in Section 7.
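Lemma 4.5 (On moving away from separatrices). Take any $z_* \in Z_B$, any small enough
$c_h, c_z > 0$, any large enough $C_\rho > 0$. Then for any $C_0, \Lambda > 0$ there exists $C > 0$ such that for
any small enough $\varepsilon$ there exists $E \subset A_3$ with
$$m(E) < C\sqrt{\varepsilon}$$
such that for any $X_{init} \in A_3 \setminus E$ the following holds.
Suppose that for $i = 1$ or $i = 2$ at some time
$$\lambda_0 \in [\lambda_{init}, \lambda_{init} + \varepsilon^{-1}\Lambda]$$
the point $X_0 = X(\lambda_0)$ satisfies
$$X_0 \in A_i, \qquad h(X_0) = -h_*(\varepsilon), \qquad \|z(X_0) - z_*\| \le c_z.$$
Take any $\overline X_0 = (\overline h_0, \overline z_0) \in \mathbb{R}^{n+1}_{h,z}$ with
$$|\overline h_0 - h(X_0)| < C_0\sqrt{\varepsilon}, \qquad \|\overline z_0 - z(X_0)\| < C_0\sqrt{\varepsilon}|\ln \varepsilon|$$
and consider the solution $\overline X(\lambda)$ of averaged system in $B_i$ with initial data $\overline X(\lambda_0) = \overline X_0$. Then
for any $\lambda \in [\lambda_0, \lambda_{init} + \varepsilon^{-1}\Lambda]$ we have
$$|h(X(\lambda)) - h(\overline X(\lambda))| < C\sqrt{\varepsilon}|\ln \varepsilon|, \qquad \|z(X(\lambda)) - z(\overline X(\lambda))\| < C\sqrt{\varepsilon}|\ln \varepsilon|.$$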

This lemma is proved similarly to the previous one, thus we omit the proof. In the proofs
of these two lemmas we estimate the difference between solutions of perturbed and averaged
system in the chart $w = (I, z)$, and this distance is $O(\sqrt{\varepsilon}|\ln \varepsilon|)$. This and the estimate
$\frac{\partial h}{\partial w} = O(\ln^{-1} h)$ in Lemma 5.1 below explains why near separatrices (when $\ln h \sim \ln \varepsilon$) the
distance in $h$ is $O(\sqrt{\varepsilon})$, while far from separatrices this distance is $O(\sqrt{\varepsilon}|\ln \varepsilon|)$.

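Lemma 4.6 (On passing separatrices). Take any $z_* \in Z$, any small enough $c_z > 0$, any large
enough $C_\rho > 0$. Then for any $\Lambda > 0$ and $\gamma \in \mathbb{R}$ there exists $C > 0$ such that for any small
enough $\varepsilon > 0$ there exists a set $E \subset A_3$ with
$$m(E) \le C\sqrt{\varepsilon}|\ln^{\rho - \gamma + 1} \varepsilon|$$
such that for any $X_{init} \in A_3 \setminus E$ the following holds.
Suppose that at some time
$$\lambda_0 \in [\lambda_{init}, \lambda_{init} + \varepsilon^{-1}\Lambda]$$
the point $X_0 = X(\lambda_0)$ satisfies
$$X_0 \in A_3, \qquad \|z(X_0) - z_*\| \le c_z, \qquad h(X_0) = h_*(\varepsilon).$$
Then at some time $\lambda_1 > \lambda_0$ with
$$\varepsilon(\lambda_1 - \lambda_0) \le \sqrt{\varepsilon}|\ln^\gamma \varepsilon|$$
we have
$$X(\lambda_1) \in A_1 \cup A_2, \qquad h(X(\lambda_1)) = -h_*(\varepsilon).$$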
This lemma will be proved in Section 11.
Proof of Theorem 4.1. Suppose that $c_z^{(0)}$ is small enough for all three lemmas, take in the
theorem $c_z = c_z^{(0)}/3$. Take $c_h$ in the theorem such that
• it is small enough for all three lemmas;
• while any solution of averaged system describing passage from $B_3$ to $B_1$ or $B_2$ passes
from $h = c_h$ to $h = -2c_h$ the total variation of $z$ is at most $c_z/3$. This can be done, as
$\frac{dz}{dh} = O(\ln h)$ for solutions of averaged system.
Take $C_\rho$ large enough for all three lemmas. Take $\Lambda^{(0)}$ such that solutions of averaged systems
describing capture in $B_1$ and $B_2$ starting with $h = c_h$ reach $h = -2c_h$ after time less than
$\varepsilon^{-1}\Lambda^{(0)}$ passes. We will apply the three lemmas with $\Lambda$ greater than given in the theorem by
$\Lambda^{(0)}$.
Given $C_0$ and $\Lambda$, Lemma 4.4 gives us $C, E$ that we denote by $C^{(1)}, E^{(1)}$. For $X_{init} \in A_3 \setminus E^{(1)}$
the solution $X(\lambda)$ reaches $h = h_*(\varepsilon)$ at some moment that we denote $\lambda^{(1)}$. For $\lambda \in [\lambda_0, \lambda^{(1)}]$
we have (4.1) with $C = C^{(1)}$. Note that $\|z(X(\lambda^{(1)})) - z_*\| < c_z^{(0)}$ for small enough $\varepsilon$ due
to (4.1) and $\|z(\overline X(\lambda^{(1)})) - z_*\| < c_z/3$ (this holds by our choice of $c_h$).
Apply Lemma 4.6 with $\gamma = 1$, $c_z = c_z^{(0)}$, $C_0 = C^{(1)}$, it gives us $C, E$ that we denote by
$C^{(2)}, E^{(2)}$. For $X_{init} \in A_3 \setminus (E^{(1)} \cup E^{(2)})$ the solution $X(\lambda)$ reaches $h = -h_*(\varepsilon)$ at some moment
that we denote $\lambda^{(2)}$. We have $\varepsilon(\lambda^{(2)} - \lambda^{(1)}) \le \sqrt{\varepsilon}|\ln \varepsilon|$. The change of $h, z, \overline h, \overline z$ during this
time is bounded by $C^{(2)}\sqrt{\varepsilon}|\ln \varepsilon|$ for some $C^{(2)} > 0$. Then for $\lambda \in [\lambda^{(1)}, \lambda^{(2)}]$ we have (4.1)
with $C = C^{(1)} + 2C^{(2)}$. As above, we get $\|z(X(\lambda^{(2)})) - z_*\| < c_z^{(0)}$.
Finally, apply Lemma 4.5 with $c_z = c_z^{(0)}$ and $C_0 = C^{(1)} + 2C^{(2)}$, it gives us $C, E$ that we
denote by $C^{(3)}, E^{(3)}$. Set $E$ in the theorem equal to $E^{(1)} \cup E^{(2)} \cup E^{(3)}$. For $X_{init} \in A_3 \setminus E$ the
solution $X(\lambda)$ reaches $h = -c_h$ at some moment that we denote $\lambda^{(3)}$. For $\lambda \in [\lambda^{(2)}, \lambda^{(3)}]$ we
have (4.1) with $C = C^{(3)}$. Take in the theorem $C = C^{(3)} + 2C^{(1)} + 2C^{(2)}$.

5 Analysis of the perturbed system


We now focus on the proof of the lemma on approaching separatrices. Let us consider the
perturbed system in A3 .

5.1 Action-angle and energy-angle variables


In the domain B3 let us consider the action-angle variables I(p, q, z), ϕ(p, q, z) of the unper-
turbed system. We choose the angle variable in such a way that ϕ = 0 is given by an analytic
transversal to one of the separatrices that is far away from the saddle. In the energy-angle
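variables $h, \varphi$ the perturbed system rewrites as
$$\begin{aligned}
\dot h &= \varepsilon f_h(h, z, \varphi, \lambda, \varepsilon), \\
\dot z &= \varepsilon f_z(h, z, \varphi, \lambda, \varepsilon), \\
\dot \varphi &= \omega(h, z) + \varepsilon f_\varphi(h, z, \varphi, \lambda, \varepsilon), \\
\dot \lambda &= 1.
\end{aligned} \tag{5.1}$$
We will denote by $\frac{\partial}{\partial z}$ the partial derivative for fixed $h, \varphi$ and by $\frac{\partial}{\partial z}\big|_{p,q}$ the partial derivative
for fixed $p, q$. We will often use the action $I$ instead of $h$, then the perturbed system rewrites
as
$$\begin{aligned}
\dot I &= \varepsilon f_I(I, z, \varphi, \lambda, \varepsilon), \\
\dot z &= \varepsilon f_z(I, z, \varphi, \lambda, \varepsilon), \\
\dot \varphi &= \omega(I, z) + \varepsilon f_\varphi(I, z, \varphi, \lambda, \varepsilon), \\
\dot \lambda &= 1,
\end{aligned}$$
where $f_I = \frac{\partial I}{\partial h} f_h + \frac{\partial I}{\partial z} f_z$. We will denote $w = (I, z)$ and $f = (f_I, f_z)$. Then the system above
rewrites as
$$\dot w = \varepsilon f(w, \varphi, \lambda, \varepsilon), \qquad \dot \varphi = \omega(w) + \varepsilon f_\varphi(w, \varphi, \lambda, \varepsilon), \qquad \dot \lambda = 1. \tag{5.2}$$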
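Denote
$$f_0 = \langle f(w, \varphi, \lambda, 0) \rangle_{\varphi,\lambda}, \qquad f_{z,0} = \langle f_z(w, \varphi, \lambda, 0) \rangle_{\varphi,\lambda},$$
$$f_{h,0} = \langle f_h(w, \varphi, \lambda, 0) \rangle_{\varphi,\lambda}, \qquad f_{I,0} = \langle f_I(w, \varphi, \lambda, 0) \rangle_{\varphi,\lambda}.$$
Here $\langle \psi(\varphi, \lambda) \rangle$ denotes the average $\frac{1}{4\pi^2} \int_0^{2\pi} \int_0^{2\pi} \psi(\varphi, \lambda)\, d\varphi\, d\lambda$. Note that this gives the same
$f_{h,0}$ and $f_{z,0}$ as the formulas in Section 2.2. The averaged system can be rewritten using $I$
instead of $h$ as follows
$$\dot w = \varepsilon f_0(w). \tag{5.3}$$
As $\frac{\partial \omega}{\partial h} \sim h^{-1}\ln^{-2} h > 0$ (see (5.8) below), (2.5) implies that close enough to separatrices for
some $K_\omega > 0$
$$\frac{\partial \omega}{\partial w} f_0 = \frac{\partial \omega}{\partial h} f_{h,0} + \frac{\partial \omega}{\partial z} f_{z,0} < -K_\omega h^{-1}\ln^{-3} h < 0. \tag{5.4}$$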
This means that near separatrices ω decreases along the solutions of averaged system.
The following estimates on the connection between I and h will be proved in Section 13.3.
Denote v = (h, z).
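Lemma 5.1. We have
$$\frac{\partial h}{\partial I} = \omega, \qquad \Big\|\frac{\partial w}{\partial v}\Big\| = O(\ln h), \qquad \Big\|\frac{\partial h}{\partial w}\Big\| = O(\ln^{-1} h), \qquad \Big\|\frac{\partial f_{I,0}}{\partial w}\Big\|,\ \Big\|\frac{\partial f_{z,0}}{\partial w}\Big\| = O(h^{-1}\ln^{-3} h).$$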

5.2 Complex continuation


Taking a finite subcover, it is easy to prove that there is $K_1 > 0$ such that the function $H$ and
the unperturbed system (2.1) can be continued to the set
$$U_{K_1}(B) = \{v + v',\ v = (p, q, z) \in B,\ v' \in \mathbb{C}^{n+2}_{p,q,z},\ \|v'\| < K_1\},$$
while the functions $f_p|_{\varepsilon=0}$, $f_q|_{\varepsilon=0}$, $f_z|_{\varepsilon=0}$ can be continued to
$$U_{K_1}(A) = \{X + X',\ X = (p, q, z, \lambda) \in A,\ X' \in \mathbb{C}^{n+3}_{p,q,z,\lambda},\ \|X'\| < K_1\}.$$

Let us now discuss analytic continuation of the angle variable near the separatrices. The proofs
of the statements below can be found in Section 13.1.
For given ω̂ define ĥ(z) by the equality ω(ĥ, z) = ω̂.
Lemma 5.2. For any $c_1 > 0$ there exists $c_2 > 0$ such that for any $(h_0, z_0) \in B_3$ with $0 < h_0 < c_2$
the following holds. Set $\hat\omega = \omega(h_0, z_0)$ and $\hat T = T(h_0, z_0)$. Then $\hat h$ is uniquely defined for
all $z$ with $\|z - z_0\| < c_2$ and the period $T(h, z)$ can be continued to
$$\{(h, z) \in \mathbb{C}^{n+1},\ \|z - z_0\| < c_2,\ |h - \hat h(z)| < c_2 |\hat h(z)|\}$$
with $|T(h, z) - \hat T| < c_1 \hat T$. Moreover, in the neighborhood above we have $T = A(h, z)\ln h + B(h, z)$,
where $A$ and $B$ are bounded analytic functions with $A \ne 0$ and $\ln$ is the branch of the
complex logarithm obtained by analytic continuation of the real logarithm.
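The logarithmic form $T = A \ln h + B$ can be observed numerically on a model. The sketch below is an illustration under stated assumptions and is not part of the paper: it takes $H = p^2/2 - q^2/2 + q^4/4$ with real $h > 0$ (the orbit outside the figure eight), for which the period reduces to $T(h) = 2\sqrt{2}\,\pi / \mathrm{AGM}(\sqrt{b}, \sqrt{a + b})$ with $a = 1 + \sqrt{1 + 4h}$, $b = \sqrt{1 + 4h} - 1$, by Gauss's arithmetic-geometric mean formula for $\int_0^{\pi/2} d\theta / \sqrt{\alpha^2\cos^2\theta + \beta^2\sin^2\theta}$. For this model $A \to -2$ and $B \to 4\ln 4$ as $h \to 0$. Function names are hypothetical.

```python
import math


def agm(x, y, iterations=40):
    """Arithmetic-geometric mean of two positive numbers (fixed number of iterations)."""
    for _ in range(iterations):
        x, y = 0.5 * (x + y), math.sqrt(x * y)
    return 0.5 * (x + y)


def period(h):
    """Period T(h) of the orbit H = h > 0 for the assumed H = p^2/2 - q^2/2 + q^4/4."""
    s = math.sqrt(1.0 + 4.0 * h)
    b = 4.0 * h / (1.0 + s)  # equals s - 1, written this way to avoid cancellation
    a_plus_b = 2.0 * s
    return 2.0 * math.sqrt(2.0) * math.pi / agm(math.sqrt(b), math.sqrt(a_plus_b))


def frequency(h):
    """Frequency omega(h) = 2*pi / T(h); it tends to zero like 1/|ln h| as h -> 0."""
    return 2.0 * math.pi / period(h)


for h in (1e-2, 1e-4, 1e-6, 1e-8):
    print(h, period(h), -2.0 * math.log(h) + 4.0 * math.log(4.0))
```

The two printed columns approach each other as $h$ decreases, in line with $T \sim \ln h$ near the separatrices.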
Lemma 5.3. Denote $r(h, z, \varphi) = (p, q)$. Then there is $c \in (0, c_2)$ (here $c_2$ is the constant
from Lemma 5.2) such that for any $z_0$ and any $\hat\omega \in (0, c)$ with $(\hat h(z_0), z_0) \in B_3$ the function
$r(h, z, \varphi)$ can be continued to
$$D = \{(h, z, \varphi) \in \mathbb{C}^{n+2},\ \|z - z_0\| < c,\ |h - \hat h(z)| < c|\hat h(z)|,\ |\mathrm{Im}\, \varphi| < c\hat\omega\}$$
with $(r, z) \in U_{K_1}(B)$ and $r(h, z, \varphi) = r(h, z, \varphi + 2\pi)$.

Let us now consider a resonance given by $\omega = s_2/s_1$ for coprime positive integers $s_1, s_2$. Set
$\hat\omega = s_2/s_1$. Given a function $\psi(h, z)$, we denote $\hat\psi(z) = \psi(\hat h(z), z)$. Take $z_0$ with $(\hat h(z_0), z_0) \in B_3$.
By Lemma 5.3 the system (5.1) can be continued to the complex domain
$$D_0 = \{z, h, \varphi, \lambda \in \mathbb{C}^{n+3} : |z - z_0| < c_{cont},\ |h - \hat h(z)| < c_{cont}|\hat h(z)|,\ |\mathrm{Im}\, \varphi| < c_{cont}\hat\omega,\ |\mathrm{Im}\, \lambda| < c_{cont}\} \tag{5.5}$$
for some ccont > 0. The function T (h, z) also continues in this domain with |T (h, z) − T̂ | <
dcont T̂ , where T̂ = 2π/ω̂ and dcont > 0 can be made as small as needed by reducing ccont , by
Lemma 5.2. The constants ccont and dcont are uniform, they do not depend on s2 , s1 , h0 , z0 .
Finally, we need the following technical lemma.
Lemma 5.4. Consider an analytic function $\psi(p, q, z, \lambda)$ that can be continued to $U_{K_1}(A)$ and satisfies $\psi = 0$ at $C(z)$ for all $z$ and $\lambda$. Rewrite this function in the chart $h, z, \varphi, \lambda$. Then for any $(h, z, \varphi_0, \lambda) \in D_0$ with $\mathrm{Re}\,\varphi_0 \in [0, 2\pi]$ we have
\[ \int_{\varphi=0}^{\varphi_0} \omega^{-1}\psi(h, z, \varphi, \lambda)\,d\varphi = O(1). \]
Here the integral is taken along any path homotopic to the segment $[0, \varphi_0]$.

5.3 Estimates
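Lemma 5.5. We have the following estimates and equalities valid in $D_0$, the constants in $O$-estimates below do not depend on $s_1, s_2, h_0, z_0$.
\[
\begin{aligned}
& T \sim \ln h, \qquad \frac{\partial T}{\partial h} \sim h^{-1}, \qquad \frac{\partial T}{\partial z} = O(\ln h), \\
& \frac{\partial\omega}{\partial h} \sim h^{-1}\ln^{-2}h, \qquad \frac{\partial^2\omega}{\partial h\,\partial z} = O(h^{-1}\ln^{-2}h), \qquad \frac{\partial^2\omega}{\partial h^2} = O(h^{-2}\ln^{-2}h), \\
& \frac{\partial\omega}{\partial I} \sim h^{-1}\ln^{-3}h, \qquad \frac{\partial^2\omega}{\partial I^2} = O(h^{-2}\ln^{-4}h), \\
& \frac{\partial h}{\partial I} = \omega, \qquad \frac{\partial I}{\partial z} = -\frac{1}{2\pi}\int_{t=0}^{T}\left.\frac{\partial H}{\partial z}\right|_{p,q}(h, z, \omega t)\,dt = O(1), \\
& \frac{\partial\hat h}{\partial z} = O(h\ln h).
\end{aligned}
\tag{5.6}
\]
Here and below in this paper we use the expressions $y = O(x)$ and $y \sim x$ for negative or complex $x$ and $y$ as a shorthand for $|y| = O(|x|)$ and $|y| \sim |x|$.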

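Proof. We have $\frac{\partial h}{\partial I} = \omega$ from the Hamiltonian equations in the coordinates $I, \varphi$. The estimates for $T$ and $\omega$ and their derivatives follow from the last part of Lemma 5.2 and the formula for $\frac{\partial h}{\partial I}$. The formula for $\frac{\partial I}{\partial z}$ is well known. It can be found, e.g., in [7, Corollary 3.2], where the estimate $\frac{\partial I}{\partial z} = O(1)$ is proved in the real case. In the complex case this estimate follows from the formula for $\frac{\partial I}{\partial z}$ by Lemma 5.4.
Let us prove the estimate for $\frac{\partial\hat h}{\partial z}$. We have $0 = \frac{\partial\hat\omega}{\partial z} = \widehat{\frac{\partial\omega}{\partial h}}\,\frac{\partial\hat h}{\partial z} + \widehat{\frac{\partial\omega}{\partial z}}$, thus
\[ \frac{\partial\hat h}{\partial z} = -\widehat{\frac{\partial\omega}{\partial z}}\Big/\widehat{\frac{\partial\omega}{\partial h}} = O(h\ln h). \]
Here we use the notation $\hat\psi(z) = \psi(\hat h(z), z)$.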

Let us move on from estimates on the complex continuation of the perturbed system to estimates on the real perturbed system. We will use the notation $O_*$. A precise definition can be found in [12, Table 1]. Roughly speaking, $g = O_*(h^a\ln^b h)$ means that $g h^{-a}\ln^{-b}h = O(1)$ and that $g$ rapidly decreases near the saddle $C$. We will need the following fact ([12, Lemma 11.1]):
\[ \int_0^{2\pi} O_*(h^a\ln^b h)\,d\varphi = O(h^a\ln^{b-1}h). \tag{5.7} \]

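Lemma 5.6. Let $\psi(p, q, z)$ denote any smooth function such as$^3$ $f_h$ or $f_z$. Then we have in the real domain $B_3$
\[
\begin{aligned}
& \frac{\partial\psi}{\partial h} = O_*(h^{-1}\ln^{-1}h), \qquad \left\|\frac{\partial\psi}{\partial z}\right\| = O(1), \\
& f_\varphi = O_*(h^{-1}\ln^{-2}h), \quad f_I = O(\ln h), \quad f_{h,0} \sim \ln^{-1}h, \quad f_{I,0} = O(1), \quad f_{z,0} = O(1), \\
& \left\|\frac{\partial f}{\partial w}\right\| = O_*(h^{-1}\ln^{-1}h), \qquad \frac{\partial\omega}{\partial h} \sim h^{-1}\ln^{-2}h, \qquad \left\|\frac{\partial\omega}{\partial z}\right\| = O(\ln^{-1}h).
\end{aligned}
\tag{5.8}
\]
These estimates hold with $O$-estimates bounded from above by uniform constants that depend only on the system and the domain $B_3$ and $\sim$-estimates bounded from above and from below by uniform constants.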
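Proof. For the proofs of these estimates (except the estimate for $\frac{\partial f}{\partial w}$) see [12, Table 1] and the references within. We have $\frac{\partial f_z}{\partial I} = \omega\frac{\partial f_z}{\partial h} = O_*(h^{-1}\ln^{-2}h)$ and
\[ \frac{\partial f_I}{\partial I} = \frac{\partial}{\partial I}(\omega^{-1}f_h) = \omega^{-1}\frac{\partial f_h}{\partial I} + \frac{\partial}{\partial I}(\omega^{-1})f_h = O_*(h^{-1}\ln^{-1}h). \]
In the same way we get $\frac{\partial f_z}{\partial z} = O(1)$ and $\frac{\partial f_I}{\partial z} = O(\omega^{-1})$. This implies the estimate for $\frac{\partial f}{\partial w}$.

3 We can ignore that $f_h$ and $f_z$ also depend on $\lambda$, as we can just use the estimates on $\psi$ for each fixed value of $\lambda$.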

6 Resonant and non-resonant zones


6.1 Fourier series
Denote by $f_m$, $m = (m_1, m_2) \in \mathbb{Z}^2$, the Fourier coefficients of the vector-valued function $f|_{\varepsilon=0}$:
\[ f(w, \varphi, \lambda, 0) = f_0(w) + \sum_{|m| \ne 0} f_m(w)e^{i(m_1\varphi + m_2\lambda)}. \]

The following lemma will be proved in Subsection 13.2, using the results on complex continu-
ation stated in Subsection 5.2.
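Lemma 6.1. There is $C_F$ such that for any $m \in \mathbb{Z}^2$ we have in $B_3$
\[ \|f_m\| \lesssim \exp\Big(-C_F|m_2| - C_F\frac{|m_1|}{T}\Big). \tag{6.1} \]
Moreover,
\[ \left\|\frac{\partial f_m}{\partial h}\right\| \lesssim |h^{-1}\ln^{-1}h|\exp(-C_F|m_2|), \qquad \left\|\frac{\partial f_m}{\partial z}\right\| \lesssim \exp(-C_F|m_2|). \]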

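For given $N$ that will be chosen in the lemma below let us denote
\[ R_N f(w, \varphi, \lambda) = \sum_{|m| > N} f_m(w)e^{i(m_1\varphi + m_2\lambda)}. \]

Lemma 6.2. There is $N \sim \ln^2\varepsilon$ such that for $h > \varepsilon$ we have $\|R_N f\| < \varepsilon$.

Proof. As $h > \varepsilon$, we have $T \lesssim |\ln\varepsilon|$. Hence, we can take $N \sim \ln^2\varepsilon$ such that $C_F N/T \ge 2|\ln\varepsilon|$, then $\exp\big(-C_F|m_2| - C_F\frac{|m_1|}{T}\big) \le \varepsilon^2$ for any $m$ with $|m| > N$. From this it is easy to obtain $\|R_N f\| < \varepsilon$.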
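The last step of this proof can be checked numerically. The following sketch is a toy computation under simplifying assumptions that are not in the paper: the implied constant in (6.1) is set to 1, the period is taken to be exactly $T = |\ln\varepsilon|$, and $C_F = 1$ by default. With $N$ chosen so that $C_F N/T \ge 2|\ln\varepsilon|$, it sums the bound (6.1) over all $m$ with $|m| = |m_1| + |m_2| > N$ using closed-form geometric series.

```python
import math


def geometric_tail(a: float, start: int) -> float:
    """Sum of exp(-a * j) over all integers j >= start, for a > 0."""
    return math.exp(-a * start) / (1.0 - math.exp(-a))


def fourier_tail(eps: float, c_f: float = 1.0):
    """Return (N, sum over |m1|+|m2| > N of exp(-c_f|m2| - c_f|m1|/T))."""
    log_eps = abs(math.log(eps))
    period = log_eps                                  # toy assumption: T = |ln eps|
    n_cut = math.ceil(2.0 * period * log_eps / c_f)   # ensures c_f * N / T >= 2 |ln eps|
    a = c_f / period
    total = 0.0
    # Rows with |m2| <= N: need |m1| >= N - |m2| + 1, both signs of m1.
    for m2 in range(-n_cut, n_cut + 1):
        first = n_cut - abs(m2) + 1
        total += math.exp(-c_f * abs(m2)) * 2.0 * geometric_tail(a, first)
    # Rows with |m2| > N: every integer m1 is allowed.
    full_m1 = (1.0 + math.exp(-a)) / (1.0 - math.exp(-a))
    total += 2.0 * geometric_tail(c_f, n_cut + 1) * full_m1
    return n_cut, total


if __name__ == "__main__":
    for eps in (1e-3, 1e-6):
        n_cut, tail = fourier_tail(eps)
        print(eps, n_cut, tail, tail < eps)
```

In this toy setting the tail is far below $\varepsilon$, because each term is at most $\varepsilon^2$ and the number of relevant terms grows only like a power of $|\ln\varepsilon|$.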

6.2 Non-resonant zones


A resonance is given by $\omega(h, z) = \xi > 0$, where $\xi = s_2/s_1$ with $s = (s_1, s_2) \in \mathbb{Z}^2_{>0}$; $|s| = |s_1| + |s_2| \le N$ and the numbers $s_1$ and $s_2$ are coprime. Assume that the resonances are enumerated in such a way that $\xi_1 > \xi_2 > \xi_3 > \dots$. Let $s_r$ denote the vector $(s_1, s_2)$ corresponding to $\xi_r$. Take small $k > 0$ and denote by $B_{3,*} \subset B_3$ the set of points $(h, z)$ that satisfy $\|w(h, z) - w(h{=}0, z_*)\| < k$. The value of $k$ is picked so that $B_{3,*}$ is far from $\partial B$. The constants $c_h, c_z$ in Theorem 4.1 are small enough; we will assume that $B_{3,*}$ contains the set given by $h \in (0, 2c_h)$, $\|z - z_*\| < 2c_z$. Let us fix $\gamma = 5$ and consider the sets
\[ \Pi = \{(h, z) \in B_{3,*} : h \ge 2\varepsilon|\ln^\gamma\varepsilon|\}, \qquad \partial\Pi = \{(h, z) \in B_{3,*} : h = 2\varepsilon|\ln^\gamma\varepsilon|\}. \tag{6.2} \]

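Inside $\Pi$ (and also inside the zone $\tilde\Pi$ defined below in Section 8) we have (for any $\alpha > 0$)
\[ |\ln h| < |\ln\varepsilon|, \qquad \varepsilon/h \le |\ln^{-\gamma}\varepsilon| < 1, \qquad \varepsilon h^{-1}|\ln^{-\alpha}h| \le |\ln^{-(\gamma+\alpha)}\varepsilon|. \tag{6.3} \]
For a resonance $s$ denote $b_s = e^{-C_F s_2} < 1$, where $C_F$ is defined in (6.1). Recall that when $s$ is fixed, $\hat h(z)$ denotes the resonant value of $h$ given by $\omega(\hat h(z), z) = s_2/s_1$. Set
\[ \delta_s(z) = \sqrt{\varepsilon b_s\hat h^{-1}\ln^{-4}\hat h} + \varepsilon\hat h^{-1}|\ln^{-3}\hat h|\ln^2\varepsilon. \]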

Take any large enough $C_Z$ (it should be greater than some constant $K_Z$ which will be determined in the proof of Lemma 6.6). For each resonance given by $s = (s_1, s_2)$, define its inner resonant zone
\[ Z(s) = \{(h, z) \in \Pi : |\omega(h, z) - s_2/s_1| \le C_Z\delta_s(z)\}. \]
Set $Z_r = Z(s_r)$. Such a resonant zone has width $\sim\sqrt{\varepsilon h}$ in $h$ if $s_2 \sim 1$ and $h \gtrsim \varepsilon\ln^2\varepsilon$.
The following lemma shows that the value of h stays roughly the same between two neigh-
boring resonances for fixed z.
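Lemma 6.3. Let $\xi_r > \xi_{r+1}$ be neighboring resonances. Fix $z$ and let $\hat h_r > \hat h_{r+1}$ be the corresponding values of $h$. Then if $\hat h_{r+1} \ge \varepsilon|\ln^\gamma\varepsilon|$, we have
\[ \hat h_r < \big(1 + O(|\ln^{-1}\varepsilon|)\big)\hat h_{r+1}. \tag{6.4} \]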
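Proof. As $\omega \sim \ln^{-1}h$, $\hat h_{r+1} \ge \varepsilon|\ln^\gamma\varepsilon|$ implies $\xi_{r+1} \gtrsim |\ln^{-1}\varepsilon|$. As $\xi_r^{-1}$ and $\xi_{r+1}^{-1}$ are two neighboring rational numbers that can be written as $s_1/s_2$ with $|s_1| + |s_2| \le N$, $N \sim \ln^2\varepsilon$, taking fixed $s_2 \approx N\xi_r/3$ and changing $s_1 \approx N/3$ with unit step gives the estimate
\[ |\xi_r^{-1} - \xi_{r+1}^{-1}| \lesssim s_2^{-1} \sim (\xi_r N)^{-1} \lesssim |\ln^{-1}\varepsilon|. \]
Integrating the equality $\frac{\partial}{\partial h}(\omega^{-1}) \sim h^{-1}$, we get
\[ |\ln\hat h_r - \ln\hat h_{r+1}| \sim |\xi_r^{-1} - \xi_{r+1}^{-1}| \lesssim |\ln^{-1}\varepsilon|. \]
Taking the exponent gives (6.4).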

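Lemma 6.4. For any $C_Z > 0$ for all sufficiently small $\varepsilon$ each point $w \in \Pi$ lies inside at most one zone $Z_r$.

Proof. Take two resonances $s$ and $s'$. We have
\[ |s_2/s_1 - s_2'/s_1'| \ge s_1^{-1}(s_1')^{-1} = s_2^{-1}(s_2/s_1)(s_1')^{-1} \gtrsim s_2^{-1}\ln^{-3}\varepsilon \]
and (as $b_s s_2^2 = O(1)$)
\[ \big(\delta_s/|s_2/s_1 - s_2'/s_1'|\big)^2 \lesssim \varepsilon h^{-1}\ln^{-4}h\,\ln^6\varepsilon \lesssim \ln^{2-\gamma}\varepsilon \]
with the last inequality following from (6.3). Similarly, we have $\big(\delta_{s'}/|s_2/s_1 - s_2'/s_1'|\big)^2 \lesssim \ln^{2-\gamma}\varepsilon$. Thus for $\gamma > 2$ and small enough $\varepsilon$ these resonant zones are disjoint.

Remark 6.5. Take an integer $k \sim \ln\varepsilon$, $\frac{s_2'}{s_1'} = \frac{k-1}{k^2}$, $\frac{s_2}{s_1} = \frac{1}{k+1}$. We have
\[ \left|\frac{s_2'}{s_1'} - \frac{s_2}{s_1}\right| = \frac{1}{k^2(k+1)} \sim \ln^{-3}\varepsilon. \]
As $b_s \sim 1$ (and even $a_s \sim 1$ for $a_s$ defined later in Section 8) and $\ln h \sim \omega^{-1} \sim \ln\varepsilon$, we have
\[ \delta_s \gtrsim \sqrt{\varepsilon h^{-1}\ln^{-4}\varepsilon}. \]
To avoid $s_2'/s_1'$ being inside the resonant zone of $s_2/s_1$, we need $h \gtrsim \varepsilon\ln^2\varepsilon$.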
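The arithmetic of Remark 6.5 can be reproduced exactly with rational numbers. The sketch below is only an illustration; the function names `resonances` and `remark_gap` are hypothetical helpers, not notation from the paper. It enumerates all admissible resonances $\xi = s_2/s_1$ with coprime positive $s_1, s_2$ and $s_1 + s_2 \le N$, and evaluates the gap between $(k-1)/k^2$ and $1/(k+1)$.

```python
from fractions import Fraction
from math import gcd


def resonances(n_max: int):
    """All xi = s2/s1 with coprime positive s1, s2 and s1 + s2 <= n_max, in decreasing order."""
    values = {
        Fraction(s2, s1)
        for s1 in range(1, n_max)
        for s2 in range(1, n_max - s1 + 1)
        if gcd(s1, s2) == 1
    }
    return sorted(values, reverse=True)


def remark_gap(k: int) -> Fraction:
    """Signed difference (k-1)/k^2 - 1/(k+1); its absolute value is 1/(k^2 (k+1))."""
    return Fraction(k - 1, k * k) - Fraction(1, k + 1)


if __name__ == "__main__":
    k = 7
    print(abs(remark_gap(k)), Fraction(1, k * k * (k + 1)))
    xs = resonances(k * k + k - 1)      # smallest N admitting s' = (k^2, k - 1)
    i = xs.index(Fraction(1, k + 1))
    print(xs[i], xs[i + 1])             # the next smaller resonance in the list
```

For $k = 7$ and $N = k^2 + k - 1 = 55$ the two fractions are adjacent in the list, since any fraction strictly between them has denominator at least $k^2 + k + 1$.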
Denote by Zr,r+1 ⊂ Π the zone between two neighboring resonance zones Zr and Zr+1 ,
ξr > ξr+1 . For a more formal definition we need to denote by Sz ⊂ Π the set of all w ∈ Π with
given z. The intersection Zr,r+1 ∩ Sz is defined as the segment between Zr ∩ Sz and Zr+1 ∩ Sz
if both are non-empty. If one of these sets is empty, we take the segment between the other
set and an endpoint of the segment Sz . If both these sets are empty, Zr,r+1 ∩ Sz is also empty.

6.3 Lemma on crossing non-resonant zones


It is convenient to use action-angle variables to describe crossing non-resonant zones. We
will denote W = (w, ϕ, λ). Take some initial data Winit = (winit , ϕinit , λinit ) ∈ A3 . Let
W (λ) = (w(λ), ϕ(λ), λ) be the solution of the perturbed system (5.2) with this initial data.
Let us also denote by $\bar w(\lambda)$ some solution of the averaged system (5.3) with $\bar w(\lambda_{init})$ close to $w(\lambda_{init})$. We will use the notation $h(\lambda) = h(w(\lambda))$, $\bar h(\lambda) = h(\bar w(\lambda))$.
Along the solution of the averaged system ω decreases due to (5.4). Until this solution
reaches ∂Π, it passes the zones in the following order: Z1 , Z1,2 , Z2 , Z2,3 , . . . . The evolution of
slow variables given by the perturbed system w(λ) passes the zones more or less in the same
order, but as it oscillates, it can leave and reenter the zones. The lemma below covers the
times from the first entry to Zr,r+1 until the first entry to Zr+1 (or reaching ∂Π).
Lemma 6.6. Fix z∗ ∈ Z, then there exist γ1 , KZ > 0 such that the following holds for any
CZ > KZ for some C6 , C7 , Ct > 0 and any small enough cz , ω0 , ε > 0. Suppose at some time
λ0 > λinit we have
w(λ0 ) ∈ Zr,r+1 , ω(w(λ0 )) < ω0 .
a) Then there exists λ1 > λ0 such that w(λ1 ) lies in ∂Π or on the border between Zr,r+1 and
Zr+1 .
b)
ε(λ1 − λ0 ) ≤ Ct (ξr − ξr+1 )h(λ1 ) ln3 h(λ1 ) .

c) h(λ) ≤ 35 h(λ0 ) for λ ∈ [λ0 , λ1 ].


d) Assume that for some λ01 ∈ [λ0 , λ1 ] we also have

h(λ01 ) > 0.5h(λ01 ), h(λ0 ) < 2h(λ0 ). (6.5)

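Denote $b_{s_r} = e^{-C_F s_2}$, where $\xi_r = s_2/s_1$. Define $b_{s_{r+1}}$ by the same formula with $\xi_r$ replaced by $\xi_{r+1}$. Then for all $\lambda \in [\lambda_0, \lambda_1']$ we have (in the error term below we denote $h = h(\lambda_1)$)
\[ \|w(\lambda) - \bar w(\lambda)\| < e^{C_6(\xi_r - \xi_{r+1})}\|w(\lambda_0) - \bar w(\lambda_0)\| + C_7\big(\sqrt{b_{s_r}} + \sqrt{b_{s_{r+1}}}\big)\sqrt{\varepsilon h}\,|\ln h| + C_7\varepsilon|\ln^{\gamma_1}\varepsilon|. \tag{6.6} \]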
6.4 Outer and middle resonant zones
Suppose we are given CZ , CZ0 with CZ0 > CZ > 0 and numbers D0 (s) for each resonance
s = (s1 , s2 ). Define the outer resonant zone
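\[ Z'(s) = \{(h, z) \in \Pi : |\omega(h, z) - s_2/s_1| \le C_Z'\delta_s(z)\}. \]

Fix some resonant $\hat\omega = s_2/s_1$. Given a function $\psi(h, z)$, denote $\hat\psi(z) = \psi(\hat h(z), z)$. Denote
\[ \alpha(z) = \alpha_s(z) = \sqrt{\varepsilon\Big/\widehat{\frac{\partial\omega}{\partial I}}} \sim \sqrt{\varepsilon\hat h\ln^3\hat h}. \]
Let $D(h, z) = D_s(h, z)$ be defined by
\[ I(h, z) = \hat I(z) + D\,\alpha(z)\sqrt{\hat\omega}. \]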
The middle resonant zone is defined in terms of D as
n o
Z m (s) = (h, z) ∈ Π : |D(h, z)| ≤ D0 (s) .

Set Zrm = Z m (sr ) and Zr0 = Z 0 (sr ).


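Lemma 6.7. Given $K, S_2 > 0$, there exists $K_1 > K$ such that for any $D_{p,0} > K_1$ we can pick $C_Z' > C_Z > K$ and functions $D_p'(s, z)$ such that for small enough $\varepsilon$ for any $s$ we have
• $Z(s) \subset Z^m(s) \subset Z'(s)$,
• $D_p'(s, z) \le D_{p,0}$ and $D_p'(s, z) = D_{p,0}$ if $s_2 \le S_2$,
• $D_p'(s, z) = c_1 D_{p,0}\big(\sqrt{b_s} + \sqrt{\varepsilon\hat h(z)^{-1}}\,\ln^{-1}\hat h(z)\ln^2\varepsilon\big)$ for some $c_1 > 0$ if $s_2 > S_2$.

Proof. Denote
\[ Z(s, C) = \{(h, z) \in \Pi : |\omega(h, z) - s_2/s_1| \le C\delta_s(z)\}. \]
This gives inner resonant zones for $C = C_Z$ and outer resonant zones for $C = C_Z'$. Let $D_s(z, C)$ be the width of this zone in $D$. It is easy to check that $\delta_s(z) = O(\ln^{-1}\varepsilon)$ if $(\hat h_s(z), z) \in \Pi$, so for fixed $z$ the values of $h$ in $Z(s, C)$ differ by $o(h)$. This implies $\frac{\partial\omega}{\partial I} \sim \widehat{\frac{\partial\omega}{\partial I}}$ in this zone (for any $C$ this holds for small enough $\varepsilon$).
As $\gamma \ge 2$, we have $\varepsilon\hat h^{-1}\ln^{-3}h\,\ln^2\varepsilon \lesssim \sqrt{\varepsilon\hat h^{-1}\ln^{-4}h}$. Denote $\tilde\delta_s = \sqrt{\varepsilon h^{-1}\ln^{-4}h}$. We have $\delta_s \le C_1\tilde\delta_s$ for some $C_1 > 0$. We have
\[ D_s(z, C) \sim C\delta_s\Big/\Big(\widehat{\tfrac{\partial\omega}{\partial I}}\,\alpha\sqrt{\hat\omega}\Big) \sim C\delta_s\Big/\sqrt{\varepsilon\,\widehat{\tfrac{\partial\omega}{\partial I}}\,\hat\omega} \sim C\frac{\delta_s}{\tilde\delta_s}. \]
Take $d_0(s) = C_1^{-1}\delta_s/\tilde\delta_s \le 1$ if $s_2 > S_2$ and $d_0(s) = 1$ otherwise. Note that $\delta_s \sim \tilde\delta_s$ if $s_2 \le S_2$ as $b_s \sim 1$ for such $s$. Then for some $C_2 > 1$ we have
\[ D_s(z, C) \in [CC_2^{-1}d_0(s),\ CC_2 d_0(s)] \quad \text{for all } s. \]
Note that the value of $C_2$ does not depend on $K$ and $D_{p,0}$, it only depends on $S_2$. Take
\[ C_Z = D_{p,0}C_2^{-1}, \qquad C_Z' = D_{p,0}C_2, \qquad D_p'(s) = D_{p,0}d_0(s) \]
and $K_1$ so large that $D_{p,0} > K_1$ implies $C_Z > K$. We have
\[ D_s(z, C_Z) \le C_Z C_2 d_0(s) = D_{p,0}d_0(s) = D_p'(s), \qquad D_s(z, C_Z') \ge C_Z' C_2^{-1}d_0(s) = D_{p,0}d_0(s) = D_p'(s). \]
This implies $Z(s) \subset Z^m(s) \subset Z'(s)$.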

6.5 Lemmas on crossing resonant zones


We will call a resonance high-numerator if s2 > S2 and low-numerator otherwise. The constant
S2 is picked in the proof of Lemma 6.8 below. Denote by Br (O) the open ball with center
O and radius r. The lemma below covers crossing middle resonant zones of high-numerator
resonances.
Lemma 6.8. Given z∗ ∈ Z, for any Cs1 > 0 and Dp > 1 for any large enough S2 there
exist Cρ , C > 1 such that for any small enough cz , ω0 , ε0 > 0 for any ω̂ = s2 /s1 ∈ (0, ω0 ) with
s2 > S2 and s1 < Cs1 ln2 ε we have the following for any Dp0 ∈ (0, Dp ]. Take some initial
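condition $X_0 = (p_0, q_0, z_0, \lambda_0)$ with
\[ |I(X_0) - \hat I(z_0)| \le D_p'\alpha(z_0)\hat\omega^{0.5}, \qquad z_0 \in B_{c_z}(z_*), \qquad h(X_0) > C_\rho\varepsilon|\ln^5\varepsilon|. \]
Denote by $X(\lambda)$ the solution of the perturbed system with this initial data. Then this solution crosses the hypersurface
\[ I = \hat I(z) - D_p'\alpha(z)\hat\omega^{0.5} \]
at some time $\lambda_1 > \lambda_0$ with
\[ \varepsilon(\lambda_1 - \lambda_0) \le C D_p'\alpha(z_0)\hat\omega^{0.5} + 2\pi s_1\varepsilon. \]
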
The next lemma covers crossing middle resonant zones of low-numerator resonances.
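Lemma 6.9. Given $z_* \in Z_B$, for any $C_{s_1}, D_{p,0}, \Lambda > 1$ there exist
\[ D_p > D_{p,0}, \qquad C_\rho, C > 1 \]
such that for any small enough $c_z, \omega_0, \varepsilon > 0$ for any $\hat\omega = s_2/s_1 \in (0, \omega_0)$ with $s_1 < C_{s_1}\ln^2\varepsilon$ we have the following. Set
\[ \alpha_* = \max_{B_{c_z}(z_*)}\alpha(z). \]
Then there exists a set $E_s \subset A_3$ with
\[ m(E_s) \le C s_1\hat\omega^{-1}\alpha_* \]
such that the following holds.

Take some initial condition $X_{init} = (p_{init}, q_{init}, z_{init}, \lambda_{init}) \in A_3 \setminus E_s$. Let $X(\lambda)$ be the solution of the perturbed system with this initial data. Suppose that this solution crosses the hypersurface
\[ I = \hat I(z) + D_p\alpha(z)\hat\omega^{0.5} \]
at some time $\lambda_0 > \lambda_{init}$ with $\varepsilon(\lambda_0 - \lambda_{init}) \le \Lambda$, $z(\lambda_0) \in B_{c_z}(z_*)$ and $h(\lambda_0) > C_\rho\varepsilon|\ln^5\varepsilon|$. Then this solution crosses the hypersurface
\[ I = \hat I(z) - D_p\alpha(z)\hat\omega^{0.5} \]
at some time $\lambda_1 > \lambda_0$ with
\[ \varepsilon(\lambda_1 - \lambda_0) \le C\alpha(z_0)|\ln\varepsilon|\hat\omega^{0.5} \le C\alpha_*|\ln\varepsilon|\hat\omega^{0.5}. \]
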
7 Proof of the lemma on approaching separatrices
7.1 Picking constants and excluded set E
Considerations for non-resonant zones give us Cs1 and some bound from below KZ on CZ .
Lemma 6.9 gives us the value of S2 . Lemma 6.7 (with K = KZ and S2 fixed above) gives us
Dp,0 . Apply Lemma 6.9 (with this Dp,0 ) to get Dp . Plug K1 = Dp in Lemma 6.7 to get CZ0 >
CZ > KZ and Dp0 (s, z) ≤ Dp (and Dp0 (s, z) = Dp if s is a low-numerator resonance). Now
resonant (inner, outer, middle) zones and non-resonant zones (zones between inner resonant
zones) are defined. Pick cz , ω0 so that lemmas 6.6, 6.8 and 6.9 hold with this ω0 and with cz
in these lemmas equal to 2cz . Pick Λ > 0 so that any solution of the averaged system under
consideration starting with λ = λ0 crosses h = −ch after time less than ε−1 Λ. Let E1 be the
union of excluded sets Es for all low-numerator resonances provided by Lemma 6.9 and E2 be
the union of all middle resonant zones of all low-numerator resonances. Define the excluded
set E by E = E1 ∪ E2 .

7.2 Passing resonant and non-resonant zones


Consider solutions $X(\lambda)$ of (2.2) and $\bar X(\lambda)$ of (2.3) as in the statement of Lemma 4.4 (then $X(\lambda_{init}) \in A_3 \setminus E$). Denote
\[ w(\lambda) = (I(h(\lambda), z(\lambda)), z(\lambda)) = w(X(\lambda)), \qquad \bar w(\lambda) = (I(\bar h(\lambda), \bar z(\lambda)), \bar z(\lambda)) = w(\bar X(\lambda)). \]
In this subsection we introduce the quantities $d_r$ and $d_{r,r+1}$ that measure how much $w$ and $\bar w$ can deviate from each other when $w(\lambda)$ passes through $Z_r^m$ and $Z_{r,r+1}$, respectively. Denote
\[ h_*(s) = \max_{z \in B_{2c_z}(z_*)}\hat h_s(z), \qquad \alpha_*(s) = \max_{z \in B_{2c_z}(z_*)}\alpha_s(z). \]
For resonant zones, Lemma 6.9 and Lemma 6.8 provide estimates for the time of resonant crossing; let us now estimate how much $w$ can deviate from $\bar w$ using that $\dot w$ and $\dot{\bar w}$ are bounded.
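Lemma 7.1. For some $C_{res} > 1$ the following holds. Suppose that $X_{init} \in A_3 \setminus E$ and at some time $\lambda_0$ with
\[ \varepsilon(\lambda_0 - \lambda_{init}) \le \Lambda, \qquad z(\lambda_0) \in B_{2c_z}(z_*), \qquad \omega(\lambda_0) \le \omega_0 \]
$w(\lambda)$ reaches the border of $Z^m(s)$ given by $I = \hat I(z) + D_p'(s, z)\alpha(z)$. We assume $s_1 < C_{s_1}\ln^2\varepsilon$. Then $w(\lambda)$ exits $Z^m(s)$ at some time $\lambda_1 > \lambda_0$ via the other border $I = \hat I(z) - D_p'(s, z)\alpha(z)$ and we have the estimate
\[ \|w(\lambda_1) - \bar w(\lambda_1)\| \le \|w(\lambda_0) - \bar w(\lambda_0)\| + d(s), \]
where
\[ d(s) = C_{res}\,\alpha_*(s)|\ln\varepsilon|\sqrt{s_1/s_2} \quad \text{for low-numerator resonances} \tag{7.1} \]
and
\[ d(s) = C_{res}\sqrt{b_s}\,\alpha_*(s)\sqrt{s_1/s_2} + C_{res}\varepsilon|\ln^5\varepsilon| \quad \text{for high-numerator resonances.} \]

Proof. Low-numerator resonances. By Lemma 6.9 such $\lambda_1$ exists and
\[ \varepsilon(\lambda_1 - \lambda_0) \lesssim \alpha_*(s)|\ln\varepsilon|\sqrt{s_2/s_1}. \]
As $\dot w$ and $\dot{\bar w}$ are bounded by $O(\varepsilon\hat\omega^{-1}) = O(\varepsilon s_1/s_2)$ (we use Lemma 5.1 to estimate the rate of change of $w$), this gives the estimate (7.1).
High-numerator resonances. We argue in the same way as for low-numerator resonances, but apply Lemma 6.8 instead. We have (we use the formula for $D_p'(s, z)$ from Lemma 6.7 and the estimates $\alpha(z_0) \sim \sqrt{\varepsilon\hat h(z_0)\ln^3\hat h(z_0)}$ and $s_1 \lesssim \ln^2\varepsilon$)
\[ \varepsilon(\lambda_1 - \lambda_0) \lesssim D_p'(s, z_0)\alpha(z(\lambda_0))\hat\omega^{0.5} + \varepsilon s_1 \lesssim \sqrt{b_s}\,\hat\omega^{0.5}\alpha_* + \varepsilon|\ln^5\varepsilon|. \]

We will use the notation $d_r = d(s_r)$ for shorthand.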


Passing non-resonant zones is described by Lemma 6.6. Denote
\[ d_{r,r+1} = C_{nonres}\sqrt{\varepsilon h_*(s_r)}\,|\ln h_*(s_r)|\big(\sqrt{b_{s_r}} + \sqrt{b_{s_{r+1}}}\big) + C_{nonres}\varepsilon|\ln^{\gamma_1}\varepsilon|. \]
Here we pick large enough $C_{nonres}$ so that the last term in (6.6) is bounded by $d_{r,r+1}$ (as long as $z(w(\lambda))$ stays in $B_{2c_z}(z_*)$). This is possible, as for fixed $z$ by Lemma 6.3 the values of $h$ in $Z_{r,r+1}$ are close to each other, so we have in the last term of (6.6) $h \sim \hat h_{s_r}(z) \lesssim h_*(s_r)$.

7.3 Estimating sums over resonant vectors s
Lemma 7.2. For any $a, b \in \mathbb{R}$ and $c, d > 0$ we have (sums are taken for all $s = (s_1, s_2) \in \mathbb{N}^2$)
\[
\begin{aligned}
& \sum_{s:\,s_2 \le S_2} h_*(s)^c s_1^a \lesssim 1, \qquad \sum_s h_*(s)^c b_s^d s_1^a s_2^b \lesssim 1, \\
& \sum_{s:\,s_2 \le S_2} \alpha_*(s)s_1^a \lesssim \sqrt\varepsilon, \qquad \sum_s b_s^d\alpha_*(s)s_1^a s_2^b \lesssim \sqrt\varepsilon.
\end{aligned}
\]

Proof. The period $T$ depends on $h$ and $z$: $T = -a(h, z)\ln h + b(h, z)$ by Lemma 5.2. From this we have
\[ \ln h = -a^{-1}(T - b), \qquad \ln\hat h_s(z) = -a^{-1}(2\pi\hat\omega^{-1} - b). \]
Thus for some $c_1 > 0$ we have
\[ h_*(s) \lesssim e^{-c_1\hat\omega^{-1}}. \tag{7.2} \]
If $s_2 \le S_2$, we also have
\[ h_*(s) \lesssim e^{-c_2 s_1}, \]
where $c_2 = c_1/S_2$. This implies the upper left estimate.
As for any $f$ we have
\[ h_*(s)^{c/2}(s_1/s_2)^f \sim h_*(s)^{c/2}|\ln^f h_*(s)| \lesssim 1, \qquad b_s^{d/2}s_2^f = e^{-dC_F s_2/2}s_2^f \lesssim 1, \]
the upper right estimate follows from
\[ \sum_s h_*(s)^{c/2}b_s^{d/2} = O(1). \tag{7.3} \]
Let us now prove this estimate. Recall the notation $|s| = s_1 + s_2$. Either $\hat\omega^{-1} = s_1/s_2 \gtrsim \sqrt{|s|}$ or $s_2 \gtrsim \sqrt{|s|}$. But $h$ decreases exponentially with the growth of $\hat\omega^{-1}$ by (7.2) and $b_s$ decreases exponentially with the growth of $s_2$. Therefore, we have $h_*(s)^{c/2}b_s^{d/2} = O(e^{-c_3\sqrt{|s|}})$ for some $c_3$, which implies the required estimate (7.3).
Finally, the lower estimates follow from the upper estimates if we take into account
\[ \alpha \sim \sqrt{\varepsilon h_*(s)\ln^3 h_*(s)} \sim \sqrt{\varepsilon h_*(s)s_1^3 s_2^{-3}}. \]

By Lemma 7.2 we have $m(E_1) \le \sum m(E_s) = O(\sqrt\varepsilon)$. As the width of low-numerator resonant zones in $h$ is $\sim\sqrt{\varepsilon h_s}$, the width in $I$ is $\sim\sqrt{\varepsilon h_s}\,s_1/s_2$ and thus $m(Z^m(s)) = O(\sqrt{\varepsilon h_s}\,s_1)$. By Lemma 7.2 we have $m(E_2) \le \sum m(Z^m(s)) = O(\sqrt\varepsilon)$. This implies
\[ m(E) = O(\sqrt\varepsilon). \]

Lemma 7.3.
\[ \sum_r d_r \lesssim \sqrt\varepsilon|\ln\varepsilon|. \]

Proof. The sum over low-numerator resonances is $O(\sqrt\varepsilon|\ln\varepsilon|)$ by the bottom left estimate of Lemma 7.2. The sum of the terms $C_{res}\sqrt{b_s}\,\alpha_*(s)\sqrt{s_1/s_2}$ over high-numerator resonances is $O(\sqrt\varepsilon)$ by the bottom right estimate of Lemma 7.2. Finally, as the number of resonances is bounded by some power of $\ln\varepsilon$, the sum of the terms $C_{res}\varepsilon|\ln^5\varepsilon|$ is also $O(\sqrt\varepsilon)$.

Lemma 7.4.
\[ \sum_r d_{r,r+1} = O(\sqrt\varepsilon). \]

Proof. Firstly, the sum of the terms $\varepsilon|\ln^{\gamma_1}\varepsilon|$ is $O(\sqrt\varepsilon)$, as the total number of resonances is bounded by some power of $\ln\varepsilon$. Secondly, we have $h_*(r+1) \sim h_*(r)$ by Lemma 6.3, thus it is enough to prove
\[ \sum_r \sqrt{h_*(r)}\,|\ln h_*(r)|\sqrt{b_{s_r}} = O(1). \]
This estimate follows from Lemma 7.2, as $|\ln h_*(r)| \sim (s_1/s_2)|_{s=s_r}$.

7.4 End of the proof

Lemma 7.5. For any large enough $C > 0$ there is $\lambda_f$ such that $h(\lambda_f) = C\sqrt\varepsilon$. For all $\lambda \in [\lambda_0, \lambda_f]$ we have $h(\lambda) \ge C\sqrt\varepsilon$ and
\[ \|w(\lambda) - \bar w(\lambda)\| < O(\sqrt\varepsilon|\ln\varepsilon|). \]

Proof. Set $\lambda_f$ equal to the first moment such that $h(\lambda_f) = C\sqrt\varepsilon$ or $|h(\lambda_f) - \bar h(\lambda_f)| > 0.5C\sqrt\varepsilon$ or $\|z - z_*\| = 2c_z$ or $\|w(\lambda_f) - w(h{=}0, z_*)\| = k$ or $\lambda_f = \lambda_{init} + \varepsilon^{-1}\Lambda$. For $\lambda < \lambda_f$ we can apply the last part of Lemma 6.6 and we have $h \sim \bar h$.
Consider the trajectory $w(\lambda)$ for $\lambda \le \lambda_f$. We will apply Lemma 7.1 to cover the moments from the time $w(\lambda)$ first enters $Z_r^m$ until $w(\lambda)$ reaches the border of $Z_r^m$ (with $\omega < \hat\omega(s_r)$). We will apply Lemma 6.6 from this moment until $w(\lambda)$ first reaches $Z_{r+1}^m$. Let us renumerate the resonances in such a way that the point $w(\lambda_0)$ is either in the non-resonant zone $Z_{12}$ or in the high-numerator middle resonant zone $Z_1^m$ (it is not in a low-numerator resonant zone, as low-numerator resonant zones lie in $E$). In the latter case, by Lemma 7.1 this solution enters the non-resonant zone $Z_{12}$ and the difference between $w(\lambda)$ and $\bar w(\lambda)$ accumulated is $\le d_1$. Thus we can assume that $w(\lambda)$ starts in $Z_{12}$ at $t = 0$ with
\[ \delta_1 = \|w - \bar w\| \le d_1 + O(\sqrt\varepsilon|\ln\varepsilon|). \]
Then $w(\lambda)$ passes $Z_{12}$ and approaches the second resonance zone $Z_2^m$; on entering this zone by Lemma 6.6 we have
\[ \|w - \bar w\| \le \delta_1 e^{C_6(\xi_1 - \xi_2)} + d_{1,2}. \]
On leaving $Z_2^m$ into $Z_{2,3}$ we have
\[ \delta_2 = \|w - \bar w\| \le \delta_1 e^{C_6(\xi_1 - \xi_2)} + d_{1,2} + d_2. \]
After passing $Z_{2,3}$ and $Z_3^m$, on exit from $Z_3^m$ we have
\[ \delta_3 = \|w - \bar w\| \le \delta_1 e^{C_6(\xi_1 - \xi_3)} + (d_{1,2} + d_2)e^{C_6(\xi_2 - \xi_3)} + d_{2,3} + d_3. \]
Continuing like this, we get the estimate
\[ \delta_k \le \big(O(\sqrt\varepsilon|\ln\varepsilon|) + d_1 + d_{1,2} + d_2 + \dots + d_{k-1,k} + d_k\big) \times e^{C_6(\xi_1 - \xi_k)}. \tag{7.4} \]
As we have $\xi_i = O(1)$, by Lemma 7.3 and Lemma 7.4 this gives us
\[ \delta_k \lesssim \sqrt\varepsilon|\ln\varepsilon|. \]
This implies that we cannot have $\|z(\lambda_f) - z_*\| = 2c_z$, as $\|z(\lambda_0) - z_*\| \le c_z$. Similarly, we see that $\|w(\lambda_f) - w(h{=}0, z_*)\| \ne k$, as near separatrices the solution of the averaged system is close to $w(h{=}0, z_*)$. From (7.4) and $\frac{\partial h}{\partial w} = O(\ln^{-1}h)$ (this is proved in Lemma 5.1) we have
\[ |h(\lambda) - \bar h(\lambda)| \lesssim \sqrt\varepsilon|\ln\varepsilon|\,|\ln^{-1}h(\lambda)|. \]
This means that for large enough $C$ we cannot have $|h(\lambda_f) - \bar h(\lambda_f)| = 0.5C\sqrt\varepsilon$. This also means $h(\lambda_f) > 0$, thus we cannot have $\lambda_f = \lambda_0 + \varepsilon^{-1}\Lambda$. This leaves just one possibility: $h(\lambda_f) = C\sqrt\varepsilon$.
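The bookkeeping behind (7.4) is a simple recursion: each non-resonant zone multiplies the accumulated error by $e^{C_6(\xi_k - \xi_{k+1})}$ and adds $d_{k,k+1}$, and each resonant zone adds $d_{k+1}$. The sketch below iterates this recursion and compares the result with the closed-form bound of the same shape as (7.4). All numerical values, and the function names, are hypothetical and serve only to illustrate the structure of the argument.

```python
import math


def accumulated_error(delta1, xi, d_nonres, d_res, c6):
    """Iterate delta_{k+1} = delta_k * exp(c6 * (xi_k - xi_{k+1})) + d_{k,k+1} + d_{k+1}.

    xi       : decreasing list of resonant frequencies xi_1 > xi_2 > ...
    d_nonres : d_nonres[k] plays the role of d_{k+1,k+2} (non-resonant zone)
    d_res    : d_res[k] plays the role of d_{k+2} (next resonant zone)
    """
    delta = delta1
    for k in range(len(xi) - 1):
        delta = delta * math.exp(c6 * (xi[k] - xi[k + 1])) + d_nonres[k] + d_res[k]
    return delta


def closed_form_bound(delta1, xi, d_nonres, d_res, c6):
    """Bound of the form (delta_1 + sum of all d's) * exp(c6 * (xi_1 - xi_last))."""
    return (delta1 + sum(d_nonres) + sum(d_res)) * math.exp(c6 * (xi[0] - xi[-1]))


if __name__ == "__main__":
    xi = [0.5, 0.4, 0.3]
    args = (0.1, xi, [0.03, 0.04], [0.01, 0.02], 2.0)
    print(accumulated_error(*args), closed_form_bound(*args))
```

The closed form dominates the recursion because every exponential factor is at least 1 when the $\xi_k$ decrease; since $\xi_1 - \xi_k = O(1)$, the exponential in (7.4) only contributes a bounded factor.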
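Remark 7.6. By Lemma 5.1 we have $\frac{\partial h}{\partial w} = O(\ln^{-1}h)$. Taking this into account, in Lemma 7.5 we have
\[ |h(\lambda_f) - \bar h(\lambda_f)| = O(\sqrt\varepsilon), \qquad \|z(\lambda_f) - \bar z(\lambda_f)\| = O(\sqrt\varepsilon|\ln\varepsilon|). \]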

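Lemma 7.7. Assume that for some $\lambda_f$ we have
\[ h(\lambda_f) = C\sqrt\varepsilon, \qquad |h(\lambda_f) - \bar h(\lambda_f)| = O(\sqrt\varepsilon), \qquad \|z(\lambda_f) - \bar z(\lambda_f)\| = O(\sqrt\varepsilon|\ln\varepsilon|). \]
Then there is $\lambda_\Pi > \lambda_f$ with $h(w(\lambda_\Pi)) = 2\varepsilon|\ln^\gamma\varepsilon|$ such that for all $\lambda \in [\lambda_f, \lambda_\Pi]$
\[ |h(\lambda) - \bar h(\lambda)| < O(\sqrt\varepsilon), \qquad \|z(\lambda) - \bar z(\lambda)\| < O(\sqrt\varepsilon|\ln\varepsilon|). \]

Proof. Denote $v = (h, z)$, $v(\lambda) = v(w(\lambda))$ and $\bar v(\lambda) = v(\bar w(\lambda))$. Let $\lambda_\Pi > \lambda_f$ be the first moment such that $v(\lambda_\Pi) \in \partial\Pi$ or $\|z(\lambda_\Pi) - z_*\| = 2c_z$ or $\|w(\lambda_\Pi) - w(h{=}0, z_*)\| = k$ or $\lambda_\Pi = \lambda_{init} + \varepsilon^{-1}\Lambda$. Consider $v(\lambda)$ for $\lambda \in [\lambda_f, \lambda_\Pi]$.
The solution $v(\lambda)$ subsequently passes non-resonant and resonant zones, and the behaviour in these zones is described by Lemma 6.6 and Lemma 7.1, respectively. The total time $\lambda_{tot}$ between $\lambda_f$ and $\lambda_\Pi$ is split into the time $\lambda_{res}$ spent in resonant zones and the time $\lambda_{nonres}$ spent in non-resonant zones. By Lemma 7.3 we have $\varepsilon\lambda_{res} \lesssim \sqrt\varepsilon|\ln\varepsilon|$. By Lemma 6.6 we have $\varepsilon\lambda_{nonres} \lesssim h_{max}\ln^2\varepsilon$, where $h_{max}$ is the maximum value of $h(\lambda)$ for $\lambda \in [\lambda_f, \lambda_\Pi]$. Indeed, in our domain we have $\ln h \sim \ln\varepsilon$ and the sum of all terms $\xi_r - \xi_{r+1}$ is $O(\ln^{-1}\varepsilon)$. As $h_{max} \lesssim \sqrt\varepsilon$, we have
\[ \varepsilon\lambda_{tot} \lesssim \sqrt\varepsilon\ln^2\varepsilon. \]
As $\|\dot v\|, \|\dot{\bar v}\| \lesssim \varepsilon$, for $\lambda \in [\lambda_f, \lambda_\Pi]$ all the values of $v(\lambda)$ and $\bar v(\lambda)$ differ from each other for different $\lambda$ by at most $O(\sqrt\varepsilon\ln^2\varepsilon)$. Thus we cannot have
\[ \|z(\lambda_\Pi) - z_*\| = 2c_z \quad \text{or} \quad \|w(\lambda_\Pi) - w(h{=}0, z_*)\| = k \quad \text{or} \quad \lambda_\Pi = \lambda_{init} + \varepsilon^{-1}\Lambda \]
(in the last case we would have $h(\lambda_\Pi) = -c_h$ and this is impossible). The remaining possibility is $h(\lambda_\Pi) = 2\varepsilon|\ln^\gamma\varepsilon|$.
The estimate on the change of $z(\lambda)$ allows us to show that $h(\lambda)$ decreases exponentially with the growth of $T(v(\lambda))$. The period $T$ depends on $h$ and $z$: $T = -a(h, z)\ln h + b(h, z)$ by Lemma 5.2. From this we have
\[ \ln h = -a^{-1}(T - b). \]
As $h$ and $z$ change by at most $O(\sqrt\varepsilon\ln^2\varepsilon)$, for some constants $a_0, b_0$ we have
\[ a(h, z) = a_0 + O(\sqrt\varepsilon\ln^2\varepsilon), \qquad b(h, z) = b_0 + O(\sqrt\varepsilon\ln^2\varepsilon), \]
so
\[ h = \big(1 + O(\sqrt\varepsilon|\ln^3\varepsilon|)\big)e^{-a_0^{-1}(T - b_0)}. \]
Now we can get a better estimate for $\lambda_{nonres}$. Denote $\omega_r = \hat\omega(s_r)$, $T_r = 2\pi/\omega_r$, let $h_r$ be the value of $h$ when entering $Z_{r,r+1}$. By Lemma 6.6 we have
\[ \varepsilon\lambda_{nonres} \lesssim \sum_{r > r_0}(\omega_r - \omega_{r+1})h_r|\ln^3 h_r| = (2\pi)^{-1}\sum_{r > r_0}\omega_r\omega_{r+1}(T_{r+1} - T_r)h_r|\ln^3 h_r| \sim \sum_{r > r_0}h_r|\ln h_r|(T_{r+1} - T_r). \]
It is easy to see that $T_{r+1} - T_r$ are bounded, e.g. we can take resonances $\omega = 1/n$, then $T = 2\pi n$ (up to $T = 2\pi N \sim \ln^2\varepsilon$). As $h(T)$ decreases exponentially, we have $\varepsilon\lambda_{nonres} \lesssim h_{r_0+1}|\ln h_{r_0+1}| \lesssim h_{max}|\ln\varepsilon| \sim \sqrt\varepsilon|\ln\varepsilon|$. So we have
\[ \varepsilon\lambda_{tot} \lesssim \sqrt\varepsilon|\ln\varepsilon|. \]
We have $|\dot{\bar h}| \lesssim \varepsilon|\ln^{-1}\varepsilon|$. Similarly, for any $\lambda_1, \lambda_2 \in [\lambda_f, \lambda_\Pi]$ we have
\[ \left|\int_{\lambda_1}^{\lambda_2}\dot h\,d\lambda\right| \lesssim (\lambda_2 - \lambda_1)\varepsilon|\ln^{-1}\varepsilon| + O(\varepsilon), \]
as the integral of $\dot h$ during one period is $O(\varepsilon)$ and $T \sim |\ln\varepsilon|$. Hence,
\[ |h(\lambda) - \bar h(\lambda)| < |h(\lambda_f) - \bar h(\lambda_f)| + O(\varepsilon|\ln^{-1}\varepsilon|)\lambda_{tot} + O(\varepsilon) \le O(\sqrt\varepsilon). \]
As $\dot z, \dot{\bar z} = O(\varepsilon)$, we have
\[ \|z(\lambda) - \bar z(\lambda)\| < \|z(\lambda_f) - \bar z(\lambda_f)\| + O(\varepsilon)\lambda_{tot} \le O(\sqrt\varepsilon|\ln\varepsilon|). \]

Together Lemma 7.5 and Lemma 7.7 imply Lemma 4.4.
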
8 Crossing non-resonant zone: proof
This section is devoted to the proof of Lemma 6.6. Let us define the function
\[ u(w, \varphi, \lambda) = \sum_{1 \le |m| \le N,\ m \in \mathbb{Z}^2}\frac{f_m e^{i(m_1\varphi + m_2\lambda)}}{i(m_1\omega(w) + m_2)}. \]
Given $w$, let $\xi_r = \frac{s_2}{s_1}$ be the nearest resonance to $\omega(w)$. Let us also denote $\Delta = |\omega(w) - \xi_r|$.
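Differentiating the definition of $u$ term by term at fixed $w$ gives $\omega\,\partial u/\partial\varphi + \partial u/\partial\lambda = \sum_{1 \le |m| \le N} f_m e^{i(m_1\varphi + m_2\lambda)}$, so $u$ removes the harmonics up to order $N$ from the first-order terms, while the denominators $m_1\omega + m_2$ become small near resonances. The sketch below checks this identity numerically. It is a toy illustration: the scalar coefficients $f_m = e^{-|m_1| - |m_2|}$, the irrational frequency and the sample point are hypothetical choices, not data from the paper.

```python
import cmath
import math


def modes(n_max: int):
    """All m = (m1, m2) in Z^2 with 1 <= |m1| + |m2| <= n_max."""
    return [
        (m1, m2)
        for m1 in range(-n_max, n_max + 1)
        for m2 in range(-n_max, n_max + 1)
        if 1 <= abs(m1) + abs(m2) <= n_max
    ]


def f_coeff(m1: int, m2: int) -> float:
    """Hypothetical scalar Fourier coefficient with exponential decay."""
    return math.exp(-abs(m1) - abs(m2))


def u_and_derivatives(omega: float, phi: float, lam: float, n_max: int):
    """Return (u, du/dphi, du/dlam) for the truncated series defining u."""
    u = du_dphi = du_dlam = 0j
    for m1, m2 in modes(n_max):
        term = f_coeff(m1, m2) * cmath.exp(1j * (m1 * phi + m2 * lam))
        term /= 1j * (m1 * omega + m2)      # small denominator near resonances
        u += term
        du_dphi += 1j * m1 * term
        du_dlam += 1j * m2 * term
    return u, du_dphi, du_dlam


def truncated_f(phi: float, lam: float, n_max: int) -> complex:
    """Sum of f_m exp(i(m1 phi + m2 lam)) over 1 <= |m| <= n_max."""
    return sum(
        f_coeff(m1, m2) * cmath.exp(1j * (m1 * phi + m2 * lam))
        for m1, m2 in modes(n_max)
    )


if __name__ == "__main__":
    omega = math.sqrt(2.0) / 5.0            # irrational, so no exact resonance
    u, u_phi, u_lam = u_and_derivatives(omega, 0.7, 1.9, 6)
    print(abs(omega * u_phi + u_lam - truncated_f(0.7, 1.9, 6)))
```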
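Lemma 8.1. Denote
\[ a_s(w) = \exp\Big(-C_F s_2 - C_F\frac{s_1}{T(w)}\Big), \qquad b_s = \exp(-C_F s_2), \qquad 0 < a_s < b_s < 1. \]
Then inside $\Pi$ we have (provided that $s_2/s_1$ is the nearest resonance to $\omega(w)$)
\[
\begin{aligned}
& \|u\| \lesssim a_s s_1^{-1}\Delta^{-1} + \ln^2\varepsilon, \\
& \left\|\frac{\partial u}{\partial\lambda}\right\| \lesssim a_s s_2 s_1^{-1}\Delta^{-1} + \ln^2\varepsilon, \qquad \left\|\frac{\partial u}{\partial\varphi}\right\| \lesssim a_s\Delta^{-1} + |\ln^3\varepsilon|, \\
& \left\|\frac{\partial u}{\partial h}\right\| \lesssim b_s s_1^{-1}h^{-1}\ln^{-2}h\,\Delta^{-1}(\Delta^{-1} + |\ln h|) + h^{-1}\ln^{-1}h\,\ln^4\varepsilon, \\
& \left\|\frac{\partial u}{\partial z}\right\| \lesssim b_s s_1^{-1}|\ln^{-1}h|\,\Delta^{-1}(\Delta^{-1} + |\ln h|) + \ln^4\varepsilon.
\end{aligned}
\]
This lemma is proved in Section 13.3.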


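Take small $\tilde k > k$ (here $k$ is from Section 6.2) and denote by $\tilde B_{3,*} \subset B_3$ the set of points $(h, z)$ that satisfy $\|w(h, z) - w(h{=}0, z_*)\| < \tilde k$. The value of $\tilde k$ is picked so that $\tilde B_{3,*}$ is far from $\partial B$. Let us consider the sets
\[ \tilde\Pi = \{(h, z) \in \tilde B_{3,*} : h \ge \varepsilon|\ln^\gamma\varepsilon|\}, \qquad \partial\tilde\Pi = \{(h, z) \in \tilde B_{3,*} : h = \varepsilon|\ln^\gamma\varepsilon|\}, \]
where $\gamma > 2$ is the same as in (6.2). We have $\Pi \subset \tilde\Pi$. The (large enough) constant $C_{\tilde Z} \in (0, C_Z)$ will be chosen later in Lemma 8.5. For each resonance $\xi_r = s_2/s_1$ define the zone $\tilde Z_r$ by the condition
\[ \tilde Z_r = \{(\tilde h, \tilde z) \in \tilde\Pi : |\omega(\tilde h, \tilde z) - \xi_r| \le C_{\tilde Z}\delta_s(\tilde z)\}. \]
Denote by $\tilde Z_{r,r+1}$ the zone between two neighboring zones $\tilde Z_r$ and $\tilde Z_{r+1}$, $\xi_r > \xi_{r+1}$. We have $Z_{r,r+1} \subset \tilde Z_{r,r+1}$. As the zones $\tilde Z$ differ from $Z$ only in the values of constants, the properties of the zones $Z$ discussed in Section 6.2 also hold for the zones $\tilde Z$ for large enough $C_{\tilde Z}$. Let us also note that in $\tilde Z_{r,r+1}$ for $s$ corresponding to the nearest resonance ($\xi_r$ or $\xi_{r+1}$) we have (recall that by Lemma 6.3 we have $\hat h_{\xi_r}, \hat h_{\xi_{r+1}} \in [0.9h, 1.1h]$)
\[ \Delta^{-1} \le 2C_{\tilde Z}^{-1}\sqrt{h\ln^4 h/(\varepsilon b_s)}, \qquad \Delta^{-1} \le 2C_{\tilde Z}^{-1}\varepsilon^{-1}h|\ln^3 h|\ln^{-2}\varepsilon. \tag{8.1} \]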

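Lemma 8.2. There is $C_u > 0$ such that for large enough $C_{\tilde Z}$ the following estimates hold in $\tilde Z_{r,r+1}$ for all $r$ (with $s$ corresponding to the resonance nearest to $\omega(w)$)
• $\|\varepsilon u\| < C_u C_{\tilde Z}^{-1}\sqrt{b_s\varepsilon h}\,\ln^2 h + C_u\varepsilon\ln^2\varepsilon < \min(0.1h, |\ln^{-1}\varepsilon|)$,
• $\left\|\frac{\partial u}{\partial w}\right\|, \left\|\frac{\partial u}{\partial\varphi}\right\|, \left\|\frac{\partial u}{\partial\lambda}\right\| \le 0.1\varepsilon^{-1}$.
Moreover, the coordinate change
\[ U : \tilde w, \varphi, \lambda \mapsto \tilde w + \varepsilon u(\tilde w, \varphi, \lambda), \varphi, \lambda \tag{8.2} \]
is invertible in $(\tilde Z_{r,r+1}) \times [0, 2\pi]^2$.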

Proof. First, let us prove that inside $\tilde Z_{r,r+1}$ we have (for $s$ corresponding to $\xi_r$ or $\xi_{r+1}$)
\[ s_1^{-1} = O(\ln^{-1}h). \tag{8.3} \]
By (6.4) we have $\xi_r, \xi_{r+1} \sim -\ln^{-1}h$, so $s_1^{-1} = s_2^{-1}\xi_{r_1} = O(\ln^{-1}h)$, where $r_1 = r$ or $r_1 = r + 1$.
The estimates on $u$ and its derivatives follow from Lemma 8.1 and from (8.1), (8.3) and (6.3). To estimate the $I$-derivative, we use $\frac{\partial u}{\partial I} = \omega\frac{\partial u}{\partial h}$.
Denote $Y = (\tilde w, \varphi, \lambda)$. From the estimates on the derivatives of $u$ we have $\varepsilon\left\|\frac{\partial u}{\partial Y}\right\| < 0.5$, so the coordinate change $U$ is invertible.

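Lemma 8.3. Take some $\tilde w \in \tilde Z_{r,r+1}$, $\varphi$, $\lambda$. Set $w = \tilde w + \varepsilon u(\tilde w, \varphi, \lambda)$. Let $h = h(w)$, $\tilde h = h(\tilde w)$. Then we have $|h - \tilde h| < \min(0.25\tilde h, |\ln^{-1}\varepsilon|)$.

Proof. For $\alpha \in [0, 1]$ let us denote $w_\alpha = \tilde w + \alpha\varepsilon u(\tilde w, \varphi, \lambda)$, $h_\alpha = h(w_\alpha)$. Take the largest possible $\alpha_0$ such that we have $|h_\alpha - \tilde h| \le 0.25\tilde h$ for all $\alpha \le \alpha_0$. We have $h_{\alpha_0} - \tilde h = \left.\frac{\partial h}{\partial w}\right|_\xi\alpha_0\varepsilon u$ with some $\xi \in [\tilde w, \tilde w + \alpha_0\varepsilon u]$. As we have $h \in [0.75\tilde h, 1.25\tilde h]$ in $[\tilde w, \tilde w + \alpha_0\varepsilon u]$, Lemma 5.1 gives $|h_{\alpha_0} - \tilde h| = O(\ln^{-1}h)\|\varepsilon u\| < O(\ln^{-1}\tilde h)\min(0.1\tilde h, |\ln^{-1}\varepsilon|)$ close enough to the separatrices by Lemma 8.2. Thus we actually have $\alpha_0 = 1$ and the estimate for $h_{\alpha_0}$ gives $|h(w) - \tilde h| < \min(0.25\tilde h, |\ln^{-1}\varepsilon|)$.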

Recall that the zones Zr depend on the constant CZ .


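Lemma 8.4. Given $C_{\tilde Z}$, there is $C_{Z,0} > 0$ such that for any $C_Z > C_{Z,0}$ and for all sufficiently small $\varepsilon$ we have the following. Take the map $U$ defined by (8.2). Then for all $r$
\[ Z_{r,r+1} \times [0, 2\pi]^2 \subset U(\tilde Z_{r,r+1} \times [0, 2\pi]^2). \]

Proof. For $w_0 \in \tilde Z_{r,r+1}$ denote $h_0 = h(w_0)$. Set
\[ R(w_0) = \varepsilon\max_{\varphi,\lambda}\|u(w_0, \varphi, \lambda)\|, \qquad B(w_0) = \{w \in \tilde Z_{r,r+1} : \|w - w_0\| \le R(w_0)\}. \]
Let us note that by Lemma 8.2 we have
\[ \sup_{w_0 \in \cup_r\tilde Z_{r,r+1}}R(w_0) \to 0 \quad \text{for } \varepsilon \to 0. \]
By Lemma 8.3 we also have
\[ |h(w) - h_0| \le 0.25h_0 \quad \text{for } w \in B(w_0). \tag{8.4} \]
As we have $Z_{r,r+1} \subset \tilde Z_{r,r+1}$ (due to $C_Z > C_{\tilde Z}$), it is enough to prove that for any $w_0 \in \partial\tilde Z_{r,r+1}$ the ball $B(w_0)$ does not intersect $Z_{r,r+1}$.
First, we may have $w_0 \in \partial\tilde B_{3,*}$ with $\|w_0 - w(h{=}0, z_*)\| = \tilde k$. Then for $w \in B(w_0)$ we have $\|w - w(h{=}0, z_*)\| = \tilde k + o(1) > k$ for small $\varepsilon$, thus the ball $B(w_0)$ does not intersect $B_{3,*} \supset Z_{r,r+1}$.
Second, we may have $w_0 \in \partial\tilde\Pi$ with $h_0 = \varepsilon|\ln^\gamma\varepsilon|$. By (8.4), for any $(h, z) \in B(w_0)$ we have $h \le 1.5\varepsilon|\ln^\gamma\varepsilon|$, so $B(w_0)$ does not intersect $\Pi \supset Z_{r,r+1}$.
Finally, we may have $|\omega(w_0) - \xi_r| = C_{\tilde Z}\hat\delta_{\xi_r}(z_0)$ (or the same for $r + 1$ instead of $r$, this case is treated in the same way). We will write $\delta$ instead of $\delta_{\xi_r}$ for brevity. By (8.4) we have $h \in [0.75h_0, 1.25h_0]$ in $B(w_0)$. By Lemma 6.3 this implies $\hat h(z(w)) \in [0.5h_0, 1.5h_0]$ in $B(w_0)$. So in $\lesssim$-estimates below we will write $h$ for both $h(w)$ and $\hat h(z(w))$ for $w \in B(w_0)$. This also means $\delta(z) \in [0.5\delta(z_0), 2\delta(z_0)]$ in $B(w_0)$.
Denote $\Delta\omega = |\omega(w) - \omega(w_0)|$. It is enough to prove that there is a constant $c_1 > 0$ (that does not depend on $r$ or $w_0$) such that $\Delta\omega < c_1\delta(z_0)$, then we can take $C_{Z,0} = 2(c_1 + C_{\tilde Z})$ and have
\[ |\omega - \xi_r| < (c_1 + C_{\tilde Z})\delta(z_0) < 2(c_1 + C_{\tilde Z})\delta(z) < C_Z\delta(z) \]
in $B(w_0)$. Hence, $B(w_0)$ does not intersect $Z_{r,r+1}$.
We have $\Delta\omega \le \left\|\frac{\partial\omega}{\partial w}(w_{int})\right\|R(w_0)$ for some $w_{int} \in [w_0, w]$. We have $\left\|\frac{\partial\omega}{\partial w}\right\| \lesssim h_0^{-1}\ln^{-3}h_0$. We will use that
\[ R(w_0) \lesssim \varepsilon b_s s_1^{-1}\delta^{-1}(w_0) + \varepsilon\ln^2\varepsilon \]
by Lemma 8.1. Hence we obtain using (8.3)
\[ \|\Delta\omega\| \lesssim \varepsilon b_s h^{-1}\ln^{-4}h\,\delta^{-1}(w_0) + \varepsilon h^{-1}\ln^{-3}h\,\ln^2\varepsilon. \]
We have $\varepsilon h^{-1}\ln^{-3}h\,\ln^2\varepsilon \lesssim \delta(w_0)$. We also have
\[ \varepsilon b_s h^{-1}\ln^{-4}h\,\delta^{-1}(w_0) \lesssim \delta(w_0), \]
as $\delta(w_0) \ge \sqrt{\varepsilon b_s h^{-1}\ln^{-4}h}$. This completes the proof.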

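Lemma 8.5. For some $C_\beta > 0$ and for large enough value of the constant $C_{\tilde Z}$ defined above the following holds. Inside $\tilde Z_{r,r+1}$ the change of variables $w = \tilde w + \varepsilon u(\tilde w, \varphi, \lambda)$ takes the perturbed system (5.2) to the form
\[
\begin{aligned}
\dot{\tilde w} &= \varepsilon f_0(\tilde w) + \varepsilon^2\beta, \\
\dot\varphi &= \omega(\tilde w + \varepsilon u) + \varepsilon g(\tilde w + \varepsilon u, \varphi, \lambda, \varepsilon), \\
\dot\lambda &= 1,
\end{aligned}
\tag{8.5}
\]
where $\beta$ is a smooth function of $\tilde w, \varphi, \lambda$ that depends on $\varepsilon$ and
\[ \|\beta\| = O(b_s\Delta^{-2}h^{-1}\ln^{-4}h) + O(b_s\Delta^{-1}h^{-1}\ln^{-4}h\,\ln^3\varepsilon) + O(h^{-1}\ln^{-3}h\,\ln^5\varepsilon) \tag{8.6} \]
(here $s$ corresponds to the resonance nearest to $\omega(\tilde w)$). Moreover, for small enough $\varepsilon$ we have
\[ \varepsilon^2\|\beta\| \le C_\beta C_{\tilde Z}^{-2}\varepsilon \tag{8.7} \]
and
\[
\begin{aligned}
& -1.5\varepsilon K_h|\ln^{-1}h| < \frac{dh(\tilde w)}{d\lambda} < -0.5\varepsilon K_h|\ln^{-1}h|, \\
& -1.5\varepsilon K_\omega h^{-1}|\ln^{-3}h| < \frac{d\omega(\tilde w)}{d\lambda} < -0.5\varepsilon K_\omega h^{-1}|\ln^{-3}h|, \\
& 0.5\left|\frac{d\omega(\tilde w)}{d\lambda}\right| > C\left|\frac{d\delta_{s_r}(z(\tilde w))}{d\lambda}\right|,
\end{aligned}
\tag{8.8}
\]
where $\frac{d\tilde w}{d\lambda}$ is given by (8.5).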

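Proof. From [4, Proof of Lemma 7.3] we have
\[ \varepsilon\beta = \Big(\Big(E + \varepsilon\frac{\partial u}{\partial\tilde w}\Big)^{-1} - E\Big)(f_0(\tilde w) + \varepsilon\beta_1) + \varepsilon\beta_1 \]
with
\[ \varepsilon\beta_1 = [f(w, \varphi, \lambda, \varepsilon) - f(w, \varphi, \lambda, 0)] + [f(\tilde w + \varepsilon u, \varphi, \lambda, 0) - f(\tilde w, \varphi, \lambda, 0)] + R_N f - \varepsilon\frac{\partial u}{\partial\varphi}f_\varphi - \frac{\partial u}{\partial\varphi}[\omega(\tilde w + \varepsilon u) - \omega(\tilde w)]. \]
Note that as $w \in \Pi$ we have $|\ln h| < |\ln\varepsilon|$. As $\omega$ is bounded, $1 \lesssim \Delta^{-1} \lesssim \Delta^{-2}$. For a vector function $g(w) = (g_1(w), \dots, g_n(w))$ denote $\big(\frac{\partial g}{\partial w}\big)_{int} = \big(\frac{\partial g_1}{\partial w}(\eta_1), \dots, \frac{\partial g_n}{\partial w}(\eta_n)\big)$, where $\eta_1, \dots, \eta_n \in [w, w + \varepsilon u]$ are some intermediate points. By Lemma 8.3 the values of $h$ for the points $\eta_i$ are in $[0.5h(w), 2h(w)]$. We have the following estimates (using the estimates from Lemma 8.1, Lemma 6.2, (5.8) and (8.3)).
\[
\begin{aligned}
& \varepsilon^{-1}\|f(w, \varphi, \lambda, \varepsilon) - f(w, \varphi, \lambda, 0)\| = O(\ln h), \\
& \varepsilon^{-1}\|f(\tilde w + \varepsilon u, \varphi, \lambda, 0) - f(\tilde w, \varphi, \lambda, 0)\| = \Big\|\Big(\frac{\partial f}{\partial w}\Big)_{int}u\Big\| = O(h^{-1}\ln^{-2}h)b_s\Delta^{-1} + O(h^{-1}\ln^{-1}h\,\ln^2\varepsilon), \\
& \varepsilon^{-1}\|R_N f\| < 1, \\
& \Big\|\frac{\partial u}{\partial\varphi}f_\varphi\Big\| = O(h^{-1}\ln^{-2}h)b_s\Delta^{-1} + O(h^{-1}\ln^{-2}h\,\ln^3\varepsilon), \\
& \varepsilon^{-1}\Big\|\frac{\partial u}{\partial\varphi}[\omega(\tilde w + \varepsilon u) - \omega(\tilde w)]\Big\| = \Big\|\frac{\partial u}{\partial\varphi}\Big(\frac{\partial\omega}{\partial w}\Big)_{int}u\Big\| = O(h^{-1}\ln^{-4}h)\big(b_s^2\Delta^{-2} + b_s\Delta^{-1}|\ln^3\varepsilon|\big) + O(h^{-1}\ln^{-3}h\,\ln^5\varepsilon).
\end{aligned}
\]
This gives (we use $|\ln\varepsilon| \gtrsim |\ln h|$ to simplify the expression below)
\[ \|\beta_1\| \le O(h^{-1}\ln^{-4}h)b_s^2\Delta^{-2} + O(h^{-1}\ln^{-4}h\,\ln^3\varepsilon)b_s\Delta^{-1} + O(h^{-1}\ln^{-3}h\,\ln^5\varepsilon). \]
As $\varepsilon\left\|\frac{\partial u}{\partial w}\right\| \le 0.5$ and $\|f_{w,0}\| = O(1)$, we have the estimate
\[ \|\beta\| \lesssim \|\beta_1\| + \Big\|\frac{\partial u}{\partial\tilde w}\Big\|\,\|f_0\|. \]
As we have the estimate
\[ \Big\|\frac{\partial u}{\partial\tilde w}\Big\| \lesssim b_s h^{-1}\ln^{-4}h\,(\Delta^{-2} + |\ln h|\Delta^{-1}) + h^{-1}\ln^{-2}h\,\ln^4\varepsilon \]
by Lemma 8.1 and (8.3) (we use $\frac{\partial u}{\partial I} = \omega\frac{\partial u}{\partial h}$), this implies (8.6).
From (8.6) and (8.1) we have (8.7), the estimate for the second and third terms of (8.6) uses (6.3). Here we use that $\gamma > 2$ and so $|\ln\varepsilon|^{2-\gamma} < C_{\tilde Z}^{-2}$ for small $\varepsilon$.
As $\frac{\partial h}{\partial w} = O(\ln^{-1}h)$ by Lemma 5.1, for large $C_{\tilde Z}$ we have $\left\|\frac{\partial h}{\partial w}\right\|C_\beta C_{\tilde Z}^{-2} < 0.5K_h|\ln^{-1}h|$. Then (8.7) and (2.5) imply the first part of (8.8). For some $c_1 > 0$ we have $\left\|\frac{\partial\omega}{\partial w}\right\| < c_1 h^{-1}\ln^{-3}h$ in $\tilde B_{3,*}$. For large $C_{\tilde Z}$ we have $C_\beta C_{\tilde Z}^{-2}c_1 < 0.5K_\omega$. This (together with (8.7) and (5.4)) means that the second part of (8.8) also holds.
We have $\frac{\partial\hat h_{\xi_r}}{\partial z} = O(h\ln h)$ by (5.6). This implies $\frac{\partial\delta_{s_r}}{\partial z} = O(\sqrt{\varepsilon/h}\,\ln^3\varepsilon)$. As $\frac{dz}{d\lambda} = O(\varepsilon)$, this means $\frac{d\delta_{s_r}}{d\lambda} = O(\varepsilon^{3/2}h^{-1/2}\ln^3\varepsilon)$. By the estimate on $\frac{d\omega}{d\lambda}$ this implies the last part of (8.8).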

Lemma 8.6. There is a constant c1 > 0 such that the following holds. Assume that a solution
(w̃(λ), ϕ(λ), λ(λ)) of (8.5) stays inside Z̃r,r+1 for λ ∈ [λ1 , λ2 ]. Denote h̃(λ) = h(w̃(λ)). Then
we have h̃(λ2 ) > c1 h̃(λ1 ).

Proof. By (8.8) both $h$ and $\omega$ decrease along our solution and $\frac{d\omega}{dh} \sim h^{-1}\ln^{-2}h$. This gives $\frac{d}{dh}(\omega^{-1}) \sim h^{-1}$. Afterwards this lemma is proved like Lemma 6.3, but we integrate the derivative $\frac{d}{dh}(\omega^{-1})$ along our solution instead of the partial derivative for fixed $z$.

Let us say that a domain D ⊂ Rm is L-approximately convex (L ≥ 1), if any two points
w1 , w2 ∈ D can be connected by a piecewise linear path with length at most L kw1 − w2 k that
lies in D. In such domain any vector-function ψ(w) satisfies the estimate

\[
\|\psi(w_1) - \psi(w_2)\| \le L \max_D \Big\|\frac{\partial\psi}{\partial w}\Big\|\,\|w_1 - w_2\|.
\]

We will need the following lemma. It is well-known for convex domains (with L = 1) and
generalises straightforwardly for L-approximately convex domains, using the estimate above.
Lemma 8.7. Consider two ODEs

ẇ1 = a(w1 ), ẇ2 = a(w2 ) + b(λ)

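defined in some $L$-approximately convex domain $D$. Consider two solutions $w_1(\lambda)$, $w_2(\lambda)$ with
\[
\|w_1(\lambda_0) - w_2(\lambda_0)\| < \delta
\]
that exist and stay in $D$ up to the moment $T$. Assume that in $D$ we have the estimate $L\big\|\frac{\partial a}{\partial w}\big\| \le A$. Then for any $\lambda \in [\lambda_0, T)$ we have the estimate
\[
\|w_2(\lambda) - w_1(\lambda)\| \le e^{A(\lambda - \lambda_0)}\Big(\delta + \int_{\tau=\lambda_0}^{\lambda}\|b(\tau)\|\,d\tau\Big).
\]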
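
The estimate of Lemma 8.7 can be checked numerically in the simplest possible setting. The sketch below is only an illustration under assumptions that are not taken from the paper: a scalar equation on the whole real line (a convex domain, so L = 1), the hypothetical choices a(w) = sin w (hence A = 1) and b(λ) = 0.05 cos 3λ, λ0 = 0, and a hand-written Runge-Kutta integrator. It returns the largest value of |w2 − w1| minus the right-hand side of the lemma along the trajectory; a non-positive value is consistent with the estimate.

```python
import math


def rk4_step(f, w, lam, h):
    """One classical Runge-Kutta step for dw/dlam = f(w, lam)."""
    k1 = f(w, lam)
    k2 = f(w + 0.5 * h * k1, lam + 0.5 * h)
    k3 = f(w + 0.5 * h * k2, lam + 0.5 * h)
    k4 = f(w + h * k3, lam + h)
    return w + h * (k1 + 2 * k2 + 2 * k3 + k4) / 6


def gronwall_check(delta=1e-3, T=2.0, n=2000):
    """Return max over the grid of |w2 - w1| - e^{A lam} (delta + int |b|)."""
    a = math.sin                                # hypothetical a(w); |a'| <= 1, so A = 1
    b = lambda lam: 0.05 * math.cos(3 * lam)    # hypothetical forcing b(lam)
    A = 1.0
    h = T / n
    w1, w2 = 0.3, 0.3 + 0.5 * delta             # initial gap is smaller than delta
    int_b = 0.0                                 # running trapezoidal integral of |b|
    worst = -math.inf
    for i in range(n):
        lam = i * h
        w1 = rk4_step(lambda w, l: a(w), w1, lam, h)
        w2 = rk4_step(lambda w, l: a(w) + b(l), w2, lam, h)
        int_b += 0.5 * h * (abs(b(lam)) + abs(b(lam + h)))
        bound = math.exp(A * (lam + h)) * (delta + int_b)
        worst = max(worst, abs(w2 - w1) - bound)
    return worst


if __name__ == "__main__":
    print("max(gap - bound) =", gronwall_check())
```

In this scalar example the gap stays well below the bound, which reflects that the lemma is a worst-case Gronwall-type estimate.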

Lemma 8.8. There exists L ≥ 1 such that for all h0 > 0 the domain {w ∈ B̃3,∗ : h(w) > h0 }
is L-approximately convex.

Proof. Let us build a path connecting w1 with w2 . We assume I(w1 ) ≤ I(w2 ). Let Imax and
Imin be the maximum and minimum of I(h0 , z), where z lies in the segment connecting z(w1 )
and z(w2 ). If I(w1 ), I(w2 ) > Imax , we can connect w1 and w2 by a segment. Otherwise, we
have I(w1 ) ≤ Imax . Then we connect by a segment w1 with (Imax , z(w1 )), then (Imax , z(w1 ))
with (Imax , z(w2 )) and, finally, (Imax , z(w2 )) with w2 . The length of this path is bounded by
(Imax − Imin ) + kz1 − z2 k + (I(w2 ) − I(w1 )) if I(w2 ) > Imax and by 2(Imax − Imin ) + kz1 − z2 k
otherwise.
Let us prove that for some L0 we have

Imax − Imin ≤ L0 kz1 − z2 k . (8.9)

As $\frac{\partial I}{\partial h} = \omega^{-1}$, for any $z', z'' \in [z(w_1), z(w_2)]$ we have
\[
|I(h_0, z') - I(h_0, z'')| \le |I(0, z') - I(0, z'')| + \int_{0}^{h_0}\Big\|\frac{\partial\omega^{-1}}{\partial z}\Big\|\,\|z' - z''\|\,dh \lesssim \|z' - z''\|.
\]

This gives (8.9). So our domain is approximately convex with L = 2L0 + 2.

Lemma 8.9. There exist C6 , C̃t > 0 and γ1 ∈ R such that the following statement holds.
Assume that for some λ0 > 0 and w̃0 we have

w̃0 ∈ Z̃r,r+1 .

a) Then for any ϕ0 ∈ [0, 2π] there exists λ1 > λ0 such that the solution X̃(λ) of (8.5) with
X̃(λ0 ) = (w̃0 , ϕ0 , λ0 ) is defined for all λ ∈ [λ0 , λ1 ]. This solution satisfies w̃(λ) ∈ Z̃r,r+1 and
w̃(λ1 ) lies on the boundary of Z̃r,r+1 : in ∂ Π̃ or on the border with Z̃r+1 .
b) Denote h̃(λ) = h(w̃(λ)). Then

ε(λ1 − λ0 ) ≤ C̃t (ξr − ξr+1 )h̃(λ1 )| ln3 h̃(λ1 )|. (8.10)

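c) Denote $\bar h(\lambda) = h(\bar w(\lambda))$. Assume that for some $\lambda_{01} \in [\lambda_0, \lambda_1]$ we have
\[
\bar h(\lambda_{01}) > 0.25\,\tilde h(\lambda_{01}), \qquad \bar h(\lambda_0) < 3\,\tilde h(\lambda_0). \tag{8.11}
\]
Then for all $\lambda \in [\lambda_0, \lambda_{01}]$ we have (we use the notation $h = h(\lambda_1)$ in the formula below)
\[
\|\tilde w(\lambda) - \bar w(\lambda)\| < e^{C_6(\xi_r - \xi_{r+1})}\|\tilde w(\lambda_0) - \bar w(\lambda_0)\| + O(\sqrt{\varepsilon h}\,|\ln h|)\big(\sqrt{b_{s_r}} + \sqrt{b_{s_{r+1}}}\big) + O(\varepsilon|\ln^{\gamma_1}\varepsilon|). \tag{8.12}
\]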

Proof. By (8.8) the value of |ω − ξr | − CZ̃r δsr increases with the time, so the solution w̃(λ)
does not cross the border of Z̃r and Z̃r,r+1 . This proves the first statement of the lemma.
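As $\omega$ decreases with the time, we can take $\omega$ as an independent variable. From (8.8) we have
\[
\frac{d\lambda}{d\omega} \sim \varepsilon^{-1}h\ln^3 h, \tag{8.13}
\]
so
\[
\lambda_1 - \lambda_0 = O(\varepsilon^{-1}h\ln^3 h)\,(\omega(\lambda_0) - \omega(\lambda_1)).
\]
As $\omega \in [\xi_{r+1}, \xi_r]$, this gives (8.10).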
Let us now prove the estimate for $\|\tilde w(\lambda) - \bar w(\lambda)\|$. Note that by Lemma 8.6 and (8.11) the values of $\tilde h(\lambda)$ for different $\lambda \in [\lambda_0, \lambda_{01}]$ differ at most by a constant factor between themselves and also with $h(\bar w(\lambda))$. Thus we will simply write $h$ in $O$-estimates (for definiteness, set $h = h(\lambda_1)$). Let us use Lemma 8.7 with $D = \{w \in G : h(w) > c_1 h\}$ for $c_1 > 0$ chosen to cover all the values of $h$ discussed above. This domain is $L$-approximately convex for $L$ chosen in Lemma 8.8. We have $a = (\varepsilon f_{I,0}, \varepsilon f_{z,0})$. By Lemma 5.1 we have $L\big\|\frac{\partial a}{\partial w}\big\| \le A \sim \varepsilon h^{-1}|\ln^{-3}h|$. By (8.10) for some $C_6 > 0$ we have
\[
A(\lambda - \lambda_0) \le C_6(\xi_r - \xi_{r+1}).
\]
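We have $b(\lambda) = \varepsilon^2\beta$. For some $\gamma_1$ we get from (8.6) the following estimate for $\beta$:
\[
\|\beta\| = O(b_s\Delta^{-2}h^{-1}\ln^{-4}h) + O(h^{-1}|\ln^{\gamma_1 - 4}\varepsilon|)(b_s\Delta^{-1} + 1).
\]
Denote $\omega_0 = \omega(\lambda_0)$, $\omega_1 = \omega(\lambda_1)$, we have $\omega_0 > \omega_1$. By (8.13) we have
\[
\int_{\tau=\lambda_0}^{\lambda}\|b(\tau)\|\,d\tau \le \int_{\tau=\lambda_0}^{\lambda_1}\|b(\tau)\|\,d\tau \lesssim \varepsilon h\,|\ln^3 h|\int_{\omega_1}^{\omega_0}\|\beta\|\,d\omega.
\]
We will use (8.1) in the estimates for the integrals below. Clearly,
\[
\int_{\omega_1}^{\omega_0} b_s\Delta^{-2}\,d\omega \le b_{s_r}\Delta^{-1}(\omega_0) + b_{s_{r+1}}\Delta^{-1}(\omega_1) = O\Big(\sqrt{h\ln^4 h/\varepsilon}\Big)\big(\sqrt{b_{s_r}} + \sqrt{b_{s_{r+1}}}\big)
\]
and
\[
\int_{\omega_1}^{\omega_0}(b_s\Delta^{-1} + 1)\,d\omega \le (|\ln\Delta(\omega_0)| + |\ln\Delta(\omega_1)|) + (\omega_0 - \omega_1) = O(\ln\varepsilon).
\]
By the estimate on $\|\beta\|$ this yields
\[
\begin{aligned}
\int_{\omega_1}^{\omega_0}\|\beta\|\,d\omega &= O(\varepsilon^{-1/2}h^{-1/2}\ln^{-2}h)\big(\sqrt{b_{s_r}} + \sqrt{b_{s_{r+1}}}\big) + O(h^{-1}|\ln^{\gamma_1 - 3}\varepsilon|),\\
\int_{\tau=\lambda_0}^{\lambda_1}\|b(\tau)\|\,d\tau &= O(\varepsilon^{1/2}h^{1/2}|\ln h|)\big(\sqrt{b_{s_r}} + \sqrt{b_{s_{r+1}}}\big) + O(\varepsilon|\ln^{\gamma_1}\varepsilon|).
\end{aligned}
\]
Finally, in Lemma 8.7 we take $\delta = \|\tilde w(\lambda_0) - \bar w(\lambda_0)\|$. Then this lemma gives the estimate (8.12).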

Proof of Lemma 6.6. Let us start with fixing the values of the constants CZ̃ , KZ . Take CZ̃ as
needed by Lemma 8.5. Then pick KZ so that Lemma 8.4 holds for all CZ > KZ .
Consider the coordinate change U given by (8.2). By Lemma 8.4 we have
Zr,r+1 × [0, 2π]2 ⊂ U (Z̃r,r+1 × [0, 2π]2 ).
Define w̃(λ) by the formula
(w̃(λ), ϕ(λ), λ(λ)) = U −1 (w(λ), ϕ(λ), λ(λ)) ∈ Z̃r,r+1 × [0, 2π]2 .
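Set
\[
h(\lambda) = h(w(\lambda)), \qquad \tilde h(\lambda) = h(\tilde w(\lambda)), \qquad \bar h(\lambda) = h(\bar w(\lambda)).
\]
By Lemma 8.3 we have
\[
h(\lambda) \in [0.5\,\tilde h(\lambda), 1.5\,\tilde h(\lambda)], \qquad \tilde h(\lambda) \in [(2/3)h(\lambda), 2h(\lambda)]. \tag{8.14}
\]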
Let us apply Lemma 8.9, this lemma gives some moment λ1 that we denote by λ̃1 to avoid
the conflict with λ1 from the current lemma. From Lemma 8.4 and the continuity of w(λ) if
w̃(λ̃1 ) is in ∂ Π̃ or on the border between Z̃r and Z̃r+1 , for some λ1 < λ̃1 the point w(λ1 ) is in
∂Π or on the border between Zr and Zr+1 . We have proved that λ1 exists.
The estimate on λ1 − λ0 follows from the estimate on λ̃1 − λ0 provided by (8.10) and
from (8.14).
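The estimate $h(\lambda) \le \frac{5}{3}h(\lambda_0)$ follows from Lemma 8.3, as $\tilde h(\lambda)$ decreases: we have
\[
h(\lambda) \le \tfrac{5}{4}\tilde h(\lambda) \le \tfrac{5}{4}\tilde h(\lambda_0) \le \tfrac{5}{4}\times\tfrac{4}{3}\,h(\lambda_0).
\]
By (6.5) and (8.14) we have
\[
\bar h(\lambda_{01}) \ge 0.5\,h(\lambda_{01}) \ge 0.25\,\tilde h(\lambda_{01}), \qquad \bar h(\lambda_0) \le 2h(\lambda_0) \le 3\tilde h(\lambda_0).
\]
Hence, we may apply the last part of Lemma 8.9 and obtain an estimate for $\|\tilde w(\lambda) - \bar w(\lambda)\|$ for all $\lambda \in [\lambda_0, \lambda_{01}]$. By the estimate on $\varepsilon u$ in Lemma 8.2 we have
\[
\Big|\,\|\tilde w(\lambda) - \bar w(\lambda)\| - \|w(\lambda) - \bar w(\lambda)\|\,\Big| \lesssim \big(\sqrt{b_{s_r}} + \sqrt{b_{s_{r+1}}}\big)\sqrt{\varepsilon\tilde h(\lambda)\ln^2\tilde h(\lambda)} + \varepsilon\ln^2\varepsilon.
\]
Hence, the estimate from Lemma 8.9 means that for all $\lambda \in [\lambda_0, \lambda_{01}]$
\[
\|w(\lambda) - \bar w(\lambda)\| < e^{C_6(\xi_r - \xi_{r+1})}\|w(\lambda_0) - \bar w(\lambda_0)\| + O\big(\sqrt{b_{s_r}} + \sqrt{b_{s_{r+1}}}\big)\sqrt{\varepsilon h}\,|\ln h| + O(\varepsilon|\ln^{\gamma_1}\varepsilon|).
\]
We have proved the estimate (6.6).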

9 Auxiliary system describing resonance crossing
9.1 Transition to auxiliary system: statement of lemma
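In this subsection we state lemmas on the auxiliary system describing passage through resonances. These lemmas will be proved in the rest of the current section. Denote
\[
\alpha(z) = \sqrt{\varepsilon\Big/\widehat{\frac{\partial\omega}{\partial I}}} \sim \sqrt{\varepsilon\hat h\ln^3\hat h}, \qquad \beta(z) = \sqrt{\varepsilon\,\widehat{\frac{\partial\omega}{\partial I}}} \sim \sqrt{\varepsilon\hat h^{-1}\ln^{-3}\hat h}. \tag{9.1}
\]
We have $\varepsilon = \alpha\beta$. For a given resonance $s = (s_1, s_2)$ denote $\hat\omega = s_2/s_1$ and let $\hat h(z)$ be given by $\omega(\hat h, z) = \hat\omega$.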
Lemma 9.1. There exists dZ > 0 such that for any ω̂ = s2 /s1 and z0 such that (ĥ(z0 ), z0 ) ∈ B
for any z ∈ Cn with kz − z0 k ≤ ω̂dZ we have
|ĥ(z)− ĥ(z0 )| < ĥ(z0 )/10, |α(z)−α(z0 )| < α(z0 )/10, |β(z)−β(z0 )| < β(z0 )/10. (9.2)
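Lemma 9.2. For any $C_{s_1} > 0$ and any large enough $C_\gamma > 0$ there exist $C, C_\rho, c_Z > 0$ such that for any small enough $\varepsilon > 0$, any $z_0$, and any resonance $s = (s_1, s_2)$ with
\[
|s_1| < C_{s_1}\ln^2\varepsilon, \qquad (\hat h(z_0), z_0) \in B, \qquad \hat h(z_0) > C_\rho\,\varepsilon|\ln^5\varepsilon|
\]
after a coordinate change $(p, q) \mapsto (P, Q)$ depending on $\lambda$ (with period $2\pi s_1$) and $z$, and the time and coordinate change given by
\[
\frac{d\tau}{d\lambda} = \hat\omega^{-0.5}\beta(z), \qquad Z = \hat\omega^{-1}(z - z_0) \tag{9.3}
\]
the perturbed system (5.1) can be rewritten as
\[
\begin{aligned}
P' &= -F_s(Q, z_0 + \hat\omega Z) - \hat\omega^{-2}s_1\beta\,\frac{\partial H_7}{\partial Q}(P, Q, Z) + \alpha u_P(P, Q, Z, \tau, \varepsilon),\\
Q' &= P + \hat\omega^{-2}s_1\beta\,\frac{\partial H_7}{\partial P}(P, Q, Z) + \alpha\hat\omega^{-0.5}u_Q(P, Q, Z, \tau, \varepsilon),\\
Z' &= \alpha\hat\omega^{-0.5}u_z(P, Q, Z, \tau, \varepsilon),\\
\tau' &= 1
\end{aligned}
\tag{9.4}
\]
in the domain
\[
D = \Big\{Z, P, Q, \tau \in \mathbb{R}^{n+3} : \|Z\| < c_Z,\ |P| < \frac{\hat\omega^{-0.5}}{4},\ |Q| < \frac{\hat\omega^{-1}C_\gamma}{4}\Big\}.
\]
The values of $\alpha$ and $\beta$ in (9.4) are taken at $z = z_0 + \hat\omega Z$. The function $F_s(Q, z)$ is described by Lemma 9.4 below. We have the estimates
\[
\|H_7\|_{C^2},\ |u_Q|,\ |u_z| < C, \qquad |u_P| < C\hat\omega^{-1}
\]
and
\[
\big|\alpha^{-1}(I - \hat I) - \sqrt{\hat\omega}\,P(J, \gamma, z, t)\big| < C|s_1\hat\omega^{-1}\beta|, \qquad |\gamma - \hat\omega Q(J, \gamma, z, t)| < C|s_1\beta|, \tag{9.5}
\]
where $\gamma = \varphi - (s_2/s_1)\lambda$.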


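Lemma 9.3. Denote $D(P, Q, z, \lambda) = \alpha^{-1}\hat\omega^{-0.5}(I - \hat I)$, then this function is $2\pi s_1$-periodic in $\lambda$ and we have
\[
|D - P|,\ \Big|\frac{\partial D}{\partial Q}\Big| < C|s_1\hat\omega^{-1.5}\beta|, \qquad \Big|\frac{\partial D}{\partial P} - 1\Big| < C|s_1\hat\omega^{-1}\beta|, \qquad \Big\|\frac{\partial D}{\partial z}\Big\| < C|s_1\hat\omega^{-2.5}\beta|. \tag{9.6}
\]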

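Lemma 9.4.
• $F_s(Q, z)$ is $2\pi$-periodic in $Q$.
• $F_s$ can be continued to
\[
D_F = \{Q, z \in [0, 2\pi] \times \mathbb{C}^n : |z - z_0| < 0.5c\},
\]
where $c$ is the constant from Lemma 5.3, it does not depend on $s$. The function $F_s$ is uniquely determined by $s$ and the perturbed system in the action-angle variables (5.1).
• For any $\delta_1 > 0$ there exist $S, \delta_2 > 0$ such that for any $s_2, s_1$ with $s_2 > S$ and $s_2/s_1 < \delta_2$ we have
\[
\|F_s - \Theta_3(z)\|_{C^1} < \delta_1, \qquad \Big\|\frac{\partial F_s}{\partial z} - \frac{\partial\Theta_3}{\partial z}\Big\|_{C^1} < \delta_1 \quad\text{in } D_F
\]
(recall that $\Theta_3$ is given by (2.4)).
• For each $s_2$ there exists a finite set $\mathcal{F}_{s_2}$ such that we have
\[
\min_{\tilde F \in \mathcal{F}_{s_2}}\Big(\|F_s - \tilde F\|_{C^1} + \Big\|\frac{\partial F_s}{\partial z} - \frac{\partial\tilde F}{\partial z}\Big\|_{C^1}\Big) \xrightarrow[s_1 \to \infty]{} 0 \quad\text{in } D_F.
\]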

9.2 Rescaling the action
In this subsection we start the proof of Lemma 9.2, this proof continues till Subsection 9.5.
Lemma 9.1 is obtained as a byproduct in this subsection.
First, we use Lemma 5.3 to continue the perturbed system in the complex domain D0 given
by (5.5), we will use the notation ccont from (5.5). As ω is bounded in B, for small enough dZ
we have $\hat\omega d_Z < c_{cont}$ for all $s$. By (5.6)
\[
\frac{\partial\hat h}{\partial z} = O(h\ln h) \quad\text{in } D_0.
\]
For any $K > 10$, reducing $d_Z$ if needed, we get
\[
|\hat h(z) - \hat h(z_0)| < \hat h(z_0)/K \quad\text{if } |z - z_0| < \hat\omega d_Z.
\]
This clearly implies the first estimate in (9.2), the other two estimates also follow by (9.1). This proves Lemma 9.1.
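Let us continue the proof of Lemma 9.2, we will assume below that $c_Z$ in this lemma satisfies $8c_w \le d_Z$. We will use the notation $\hat\psi(z) = \psi(z, \hat h(z))$. Let us replace the energy variable $h$ by the rescaled action $J$ that will be defined shortly. First, note that the action $I(h, z)$ can be continued to $D_0$ by the formula $I = \int_{\varphi=0}^{2\pi} p\,\frac{\partial q}{\partial\varphi}\,d\varphi$, where $p(h, z, \varphi)$ and $q(h, z, \varphi)$ are defined in $D_0$ by Lemma 5.3. As by (5.6) we have $\frac{\partial\omega}{\partial I} \ne 0$ in $D_0$, the square root $\sqrt{\widehat{\partial\omega/\partial I}}$ is uniquely continued from the real square root. Hence, $\alpha$ and $\beta$ are correctly defined by (9.1) even for complex $z$.
Let us define (in $D_0$) the rescaled action $J$ by the formula
\[
J = \alpha^{-1}(I - \hat I),
\]
then
\[
I = \hat I + \alpha J, \qquad \frac{\partial}{\partial I} = \alpha^{-1}\frac{\partial}{\partial J}.
\]
Denote by $\big(\frac{\partial}{\partial z}\big)_J$ the $z$-derivative for fixed $J, \varphi, \lambda$. Denote by $f_J = \frac{\partial J}{\partial h}f_h + \frac{\partial J}{\partial z}f_z$ the $J$-component of the vector field $f$, here $\frac{\partial J}{\partial h} = \alpha^{-1}\omega^{-1}$. Denote
\[
\operatorname{div}_J f = \Big(\frac{\partial f_z}{\partial z}\Big)_J + \frac{\partial f_\varphi}{\partial\varphi} + \frac{\partial f_J}{\partial J}.
\]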
Lemma 9.5. divJ f = O(1) + a(z)fz with a(z) = O(ω̂ −1 ) in D0 . Thus, divJ f = O(ω̂ −1 ).

Proof. We need the following formula [see page 15 of http://owlnet.rice.edu/ fjones/chap15.pdf]


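for the divergence in curvilinear coordinates. Let $\tilde x_i$ be curvilinear coordinates and $x_i$ be cartesian coordinates, let $\tilde F_i$ and $F_i$ be components of a vector field $F$ in these coordinates. Let $D$ be the Jacobian of the map $T$ given by $x = T(\tilde x)$. Then
\[
\sum_i\frac{\partial F_i}{\partial x_i} = \sum_i\frac{\partial\tilde F_i}{\partial\tilde x_i} + D^{-1}\sum_i\frac{\partial D}{\partial\tilde x_i}\tilde F_i.
\]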
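Let us apply this formula to coordinate systems $\tilde x = (J, \varphi, z)$ and $x = (p, q, z)$ (for fixed $\lambda$). The map $(I, \varphi) \mapsto (p, q)$ is volume-preserving for any fixed value of $z$, so for fixed $z$ the map $(J, \varphi) \mapsto (p, q)$ has Jacobian equal to $\alpha(z)$. Thus, we have $D = \alpha(z)$ and
\[
\frac{\partial f_p}{\partial p} + \frac{\partial f_q}{\partial q} + \Big(\frac{\partial f_z}{\partial z}\Big)_{p,q} = \operatorname{div}_J f + \alpha(z)^{-1}\frac{\partial\alpha}{\partial z}f_z.
\]
Using that $\frac{\partial\hat h}{\partial z} = O(h\ln h)$ by (5.6), we get $\frac{\partial}{\partial z}\widehat{\frac{\partial\omega}{\partial h}} = \frac{\partial^2\omega}{\partial h\,\partial z} + \frac{\partial^2\omega}{\partial h^2}\frac{\partial\hat h}{\partial z} = O(h^{-1}\ln^{-1}h)$ by (5.6). As we can write $\alpha = \varepsilon^{1/2}\hat\omega^{-1/2}\big(\widehat{\frac{\partial\omega}{\partial h}}\big)^{-1/2}$, where only the last multiplier depends on $z$, we have
\[
\frac{\partial\alpha}{\partial z} \sim \varepsilon^{1/2}\hat\omega^{-1/2}\Big(\widehat{\frac{\partial\omega}{\partial h}}\Big)^{-3/2}\frac{\partial}{\partial z}\widehat{\frac{\partial\omega}{\partial h}} = \alpha\,O(\ln h).
\]
Thus $\operatorname{div}_J f = O(1) + a(z)f_z$ with $a(z) = \alpha^{-1}\frac{\partial\alpha}{\partial z} = O(\ln h) = O(\hat\omega^{-1})$.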
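
The curvilinear divergence formula used in this proof can be sanity-checked numerically on a familiar example. The sketch below is an illustration only and makes assumptions that are not in the paper: it uses planar polar coordinates (Jacobian D = r) instead of the coordinates (J, ϕ, z), a hypothetical test field F = (xy, y² + x), and central finite differences.

```python
import math


def F_cart(x, y):
    """Hypothetical test field in cartesian components; its divergence is 3y."""
    return x * y, y * y + x


def F_polar(r, th):
    """Components (r_dot, theta_dot) of the same field in polar coordinates."""
    x, y = r * math.cos(th), r * math.sin(th)
    fx, fy = F_cart(x, y)
    return (x * fx + y * fy) / r, (x * fy - y * fx) / (r * r)


def div_via_formula(r, th, e=1e-5):
    """sum dF~_i/dx~_i + D^{-1} sum (dD/dx~_i) F~_i with D = r."""
    dFr_dr = (F_polar(r + e, th)[0] - F_polar(r - e, th)[0]) / (2 * e)
    dFth_dth = (F_polar(r, th + e)[1] - F_polar(r, th - e)[1]) / (2 * e)
    jac = r          # Jacobian D of (r, theta) -> (x, y)
    djac_dr = 1.0    # dD/dr; dD/dtheta vanishes
    return dFr_dr + dFth_dth + djac_dr * F_polar(r, th)[0] / jac


def div_cartesian(x, y, e=1e-5):
    """Ordinary divergence dFx/dx + dFy/dy by central differences."""
    dfx = (F_cart(x + e, y)[0] - F_cart(x - e, y)[0]) / (2 * e)
    dfy = (F_cart(x, y + e)[1] - F_cart(x, y - e)[1]) / (2 * e)
    return dfx + dfy


if __name__ == "__main__":
    r, th = 1.3, 0.7
    x, y = r * math.cos(th), r * math.sin(th)
    print(div_via_formula(r, th), div_cartesian(x, y), 3 * y)
```

The extra term D⁻¹(∂D/∂r)F̃_r = F̃_r/r plays the role that α(z)⁻¹(∂α/∂z)f_z plays in the proof.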

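Let us prove some estimates that will be used later. We have $\frac{\partial\hat I}{\partial z} = \widehat{\frac{\partial I}{\partial z}} + \widehat{\frac{\partial I}{\partial h}}\frac{\partial\hat h}{\partial z} = \widehat{\frac{\partial I}{\partial z}} + O(h\ln^2 h)$. As $\frac{\partial^2 I}{\partial z\,\partial h} \sim \frac{\partial T}{\partial z} = O(\ln h)$, we have $\frac{\partial}{\partial z}(I - \hat I) = O(h - \hat h)\ln h + O(h\ln^2 h) = O(h\ln^2 h)$ in $D_0$. Thus we have in $D_0$ (as $I - \hat I = O(h\ln h)$)
\[
\begin{aligned}
\frac{\partial J}{\partial z} &= \alpha^{-1}\frac{\partial}{\partial z}(I - \hat I) - \alpha^{-2}\frac{\partial\alpha}{\partial z}(I - \hat I) = O(h\ln^2 h)\,\alpha^{-1} + O(\ln h)\,\alpha^{-1}(I - \hat I)\\
&= O(h\ln^2 h)\,\alpha^{-1} = O(\ln^{-1}h)\,\beta^{-1}.
\end{aligned}
\tag{9.7}
\]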

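Denote
\[
\begin{aligned}
D_{1,0} &= \big\{z, J, \varphi, \lambda \in \mathbb{C}^{n+3} : |z - z_0| < 4c_w\hat\omega,\ |J| < c_{cont,J}\,\beta^{-1}\ln^{-2}\hat h,\ |\mathrm{Im}\,\varphi| < c_{cont}\hat\omega,\ |\mathrm{Im}\,\lambda| < c_{cont}\big\},\\
D_1 &= \Big\{z, J, \varphi, \lambda \in \mathbb{C}^{n+3} : |z - z_0| < 2c_w\hat\omega,\ |J| < \frac{c_{cont,J}}{2}\beta^{-1}\ln^{-2}\hat h,\ |\mathrm{Im}\,\varphi| < \frac{c_{cont}}{2}\hat\omega,\ |\mathrm{Im}\,\lambda| < \frac{c_{cont}}{2}\Big\}.
\end{aligned}
\]
Here the constant $c_{cont,J} > 0$ is chosen so that the image of $D_{1,0}$ under the map $J, z, \varphi, \lambda \mapsto h, z, \varphi, \lambda$ lies in $D_0$. We will use the domain $D_1$ below, the domain $D_{1,0}$ is needed to obtain Cauchy estimates in $D_1$. The width of these domains in $J$ is $O(\alpha^{-1}\hat h\ln\hat h) = O\big(\sqrt{\varepsilon^{-1}\hat h\ln^{-1}\hat h}\big) = O(\beta^{-1}\ln^{-2}\hat h)$.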
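We have $\big(\frac{\partial f_z}{\partial z}\big)_J = O(\ln h)$ in $D_1$ by Cauchy formula. However, we have weaker estimates $O_*(h^{-1})$ for $\frac{\partial f_\varphi}{\partial\varphi}$ and $\frac{\partial f_J}{\partial J}$ (this is not used later, so we skip the proof). Let us set
\[
H_1(J, z, \varphi, \lambda) = -\int_{\psi=0}^{\varphi}\big(f_J(J, z, \psi, \lambda, \varepsilon{=}0) - \langle f_J(J, z, \varphi, \lambda, \varepsilon{=}0)\rangle_\varphi\big)\,d\psi. \tag{9.8}
\]
We have $f_J = \frac{\partial J}{\partial h}f_h|_{\varepsilon=0} + \frac{\partial J}{\partial z}f_z|_{\varepsilon=0} = O(\ln h)\,\alpha^{-1} + O(\ln^{-1}h)\,\beta^{-1} = O(\ln h)\,\alpha^{-1}$. By the estimates above we have $\int_{\psi=0}^{\varphi}\frac{\partial J}{\partial z}f_z\,d\psi = O(\beta^{-1}\ln^{-1}h) = O(\alpha^{-1})$. As $\frac{\partial J}{\partial h} = \omega^{-1}\alpha^{-1}$, we have $\int_0^{\varphi}\frac{\partial J}{\partial h}f_h\,d\psi = \alpha^{-1}\int_0^{\omega^{-1}\varphi}f_h\,dt = O(\alpha^{-1})$ by Lemma 5.4. Thus we have $\int_0^{\varphi}f_J\,d\psi = O(\alpha^{-1})$ for any $\varphi$. This implies
\[
\langle f_J\rangle_\varphi = O(\alpha^{-1}), \qquad H_1 = O(\alpha^{-1}) \quad\text{in } D_1. \tag{9.9}
\]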

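We can compute
\[
\begin{aligned}
\frac{\partial H_1}{\partial J} &= -\int_{\psi=0}^{\varphi}\Big(\frac{\partial f_J}{\partial J} - \frac{\partial}{\partial J}\langle f_J\rangle_\varphi\Big)d\psi\\
&= f_\varphi - f_\varphi|_{\varphi=0} + \int_{\psi=0}^{\varphi}\Big(\Big(\frac{\partial f_z}{\partial z}\Big)_J - \operatorname{div}_J f + \frac{\partial}{\partial J}\langle f_J\rangle_\varphi\Big)d\psi.
\end{aligned}
\]
Hence, we have
\[
f_J = -\frac{\partial H_1}{\partial\varphi} + \tilde f_J(J, z, \lambda) + \check f_J(J, z, \varphi, \lambda, \varepsilon), \qquad f_\varphi = \frac{\partial H_1}{\partial J} + \tilde f_\varphi(J, z, \varphi, \lambda, \varepsilon),
\]
where
\[
\begin{aligned}
\tilde f_J &= \langle f_J|_{\varepsilon=0}\rangle_\varphi,\\
\tilde f_\varphi &= \Big(f_\varphi|_{\varphi=0} - \int_{\psi=0}^{\varphi}\Big(\Big(\frac{\partial f_z}{\partial z}\Big)_J - \operatorname{div}_J f + \frac{\partial}{\partial J}\langle f_J\rangle_\varphi\Big)d\psi\Big)\Big|_{\varepsilon=0} + (f_\varphi - f_\varphi|_{\varepsilon=0}),\\
\check f_J &= f_J - f_J|_{\varepsilon=0}.
\end{aligned}
\tag{9.10}
\]
By (9.9) we have $\tilde f_J = O(\alpha^{-1})$. We also have
\[
\frac{\partial}{\partial J}\langle f_J\rangle_\varphi = \Big\langle\operatorname{div}_J f - \Big(\frac{\partial f_z}{\partial z}\Big)_J\Big\rangle_\varphi = O(\ln h). \tag{9.11}
\]
Note that $\tilde f_\varphi$ is clearly $2\pi$-periodic. Let us also set $\tilde f_z = f_z$.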
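Lemma 9.6.
\[
f_J|_{\varepsilon=0} - f_J = O(\ln h)\,\beta, \qquad f_\varphi|_{\varepsilon=0} - f_\varphi = O(\varepsilon h^{-1}\ln^{-2}h).
\]
Proof. Denote $\vec f = (f_p, f_q, f_z)$. The maps $\vec f \mapsto f_J$ and $\vec f \mapsto f_\varphi$ are linear. As $\vec f|_{\varepsilon=0} - \vec f = O(\varepsilon)$, the lemma follows from the estimates $f_J = O(\ln h)\,\alpha^{-1}$ and $f_\varphi = O(h^{-1}\ln^{-2}h)$ that hold for any $\vec f$ (recall that $\varepsilon\alpha^{-1} = \beta$).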

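As $\varphi = 0$ is given by a transversal to one of the separatrices that is separated from the saddle, the time $t = \omega^{-1}\varphi$ is a smooth function of the coordinates $p, q, z$ near $\varphi = 0$ (we treat values of $\varphi$ near $2\pi$ as negative values near 0). Therefore, from $\varphi = \omega t$ we get
\[
\frac{\partial\varphi}{\partial q}\Big|_{\varphi=0} = \omega\frac{\partial t}{\partial q}\Big|_{t=0} = O(\ln^{-1}h), \qquad \frac{\partial\varphi}{\partial p}\Big|_{\varphi=0} = \omega\frac{\partial t}{\partial p}\Big|_{t=0} = O(\ln^{-1}h), \qquad \frac{\partial\varphi}{\partial z}\Big|_{\varphi=0} = \omega\frac{\partial t}{\partial z}\Big|_{t=0} = O(\ln^{-1}h).
\]
Hence, $f_\varphi|_{\varphi=0} = O(\ln^{-1}h)$. We have by Lemma 9.5 (we use the notation $a(z)$ from this lemma, with $a(z) = O(\ln h)$)
\[
\int_{\psi=0}^{\varphi}\big(\operatorname{div}_J f - \langle\operatorname{div}_J f\rangle_\varphi\big)\,d\psi = O(1) + a(z)\int_{\psi=0}^{\varphi}\big((f_z - f_z(C)) - \langle f_z - f_z(C)\rangle_\varphi\big)\,d\psi.
\]
This is $O(1)$, as the integral above can be estimated by Lemma 5.4. By (9.10) this implies $\tilde f_\varphi = O(1)$ (note that $f_\varphi|_{\varepsilon=0} - f_\varphi = O(1)$ by Lemma 9.6, as $h \gtrsim \varepsilon\ln^5\varepsilon$).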

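The system (5.1) rewrites in $D_1$ (we use the notation $\omega(J, z) = \omega(h(J, z), z)$) as
\[
\begin{aligned}
\dot J &= -\varepsilon\frac{\partial H_1}{\partial\varphi} + \varepsilon\tilde f_J(J, z, \lambda) + \varepsilon\check f_J(J, z, \varphi, \lambda, \varepsilon),\\
\dot z &= \varepsilon\tilde f_z(J, z, \varphi, \lambda, \varepsilon),\\
\dot\varphi &= \omega(J, z) + \varepsilon\frac{\partial H_1}{\partial J} + \varepsilon\tilde f_\varphi(J, z, \varphi, \lambda, \varepsilon),\\
\dot\lambda &= 1.
\end{aligned}
\tag{9.12}
\]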

9.3 Transition to resonant phase


It will be convenient to use new angle variables γ, µ given by

γ = ϕ − (s2 /s1 )λ, µ = λ/s1 ; ϕ = γ + s2 µ, λ = s1 µ. (9.13)


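Note that $\frac{\partial}{\partial\varphi} = \frac{\partial}{\partial\gamma}$, as both these derivatives are taken for fixed $\lambda$ and $\mu$. After the coordinate change $(J, z, \varphi, \lambda) \mapsto (J, z, \gamma, \mu)$ and the time change $\psi' = \frac{d\psi}{d\mu} = s_1\dot\psi$ the system (9.12) rewrites as
\[
\begin{aligned}
J' &= -s_1\varepsilon\frac{\partial H_1}{\partial\gamma} + s_1\varepsilon\tilde f_J + s_1\varepsilon\check f_J,\\
z' &= s_1\varepsilon\tilde f_z,\\
\gamma' &= s_1\omega - s_2 + s_1\varepsilon\frac{\partial H_1}{\partial J} + s_1\varepsilon\tilde f_\varphi,\\
\mu' &= 1.
\end{aligned}
\]
The coefficients of this system are $2\pi$-periodic in $\gamma$ and $\mu$ and also are invariant under the translation $(\gamma, \mu) \mapsto \big(\gamma - 2\pi\frac{s_2}{s_1},\ \mu + \frac{2\pi}{s_1}\big)$. Set
\[
H_2(J, z, \gamma, \mu) = \alpha\big(H_1(J, z, \gamma + s_2\mu, s_1\mu) - \gamma\tilde f_J(0, z, s_1\mu)\big) + \beta^{-1}\int_0^J\big(\omega(\tilde J, z) - s_2/s_1\big)\,d\tilde J.
\]
Note that $H_2$ is $2\pi$-periodic in $\mu$. Set
\[
g_J(J, z, \gamma, \mu, \varepsilon) = \tilde f_J(J, z, s_1\mu) - \tilde f_J(0, z, s_1\mu) + \check f_J, \qquad g_z(J, z, \gamma, \mu, \varepsilon) = \tilde f_z, \qquad g_\varphi(J, z, \gamma, \mu, \varepsilon) = \tilde f_\varphi.
\]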

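Then our system rewrites as
\[
\begin{aligned}
J' &= -s_1\beta\frac{\partial H_2}{\partial\gamma} + s_1\varepsilon g_J,\\
z' &= s_1\varepsilon g_z,\\
\gamma' &= s_1\beta\frac{\partial H_2}{\partial J} + s_1\varepsilon g_\varphi,\\
\mu' &= 1.
\end{aligned}
\tag{9.14}
\]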

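We will consider the Hamiltonian part of this system (i.e. without the $g_*$ terms) in the domain
\[
D_2 = \Big\{z, J, \gamma, \mu \in \mathbb{C}^{n+3} : |z - z_0| < 2c_w\hat\omega,\ |J| < 1,\ |\mathrm{Im}\,\gamma| < \frac{c_{cont}}{4}\hat\omega,\ |\mathrm{Re}\,\gamma| < C_\gamma,\ |\mathrm{Im}\,\mu| < \frac{c_{cont}}{4s_1}\Big\},
\]
where $C_\gamma > 0$ should satisfy $C_\gamma \gg \hat\omega c_{cont}$. Note that we have $\beta^{-1}\ln^{-3}h > 1$ for small enough $\varepsilon$ given $h > \varepsilon|\ln\varepsilon|^\rho$ for $\rho > 1$. Thus, the image of this domain under the map $J, z, \gamma, \mu \mapsto J, z, \varphi, \lambda$ lies inside $D_1$. Using the estimates (5.6) on $\frac{\partial\omega}{\partial I}$, $\frac{\partial^2\omega}{\partial I^2}$, we can compute $\beta^{-1}(\omega(I{=}\hat I + \alpha J) - s_2/s_1) = J + O(\beta\ln^2 h)$. As $H_1, \tilde f_J = O(\alpha^{-1})$ by (9.9), we have $H_2 = O(1)$ in $D_2$ (also for $\rho > 1$). We also have $g_J = O(\ln h)$; $g_\varphi, g_z = O(1)$ in the real part of $D_2$. Indeed, the estimates on $\tilde f_\varphi$ and $\tilde f_z$ were obtained above and the estimate on $g_J$ follows from (9.11) and Lemma 9.6.
As we have $\alpha H_1 = O(1)$ in $D_1$ by (9.9), we get the Cauchy estimate $\frac{\partial(\alpha H_1)}{\partial J} = O(\beta\ln^2 h)$ in $D_2$. This allows us to separate the main part of $H_2$: in $D_2$ we have
\[
\begin{aligned}
H_2 &= J^2/2 + H_{2,0}(z, \gamma, \mu) + \beta\ln^2 h\,H_{2,1}(J, z, \gamma, \mu), \quad\text{where}\\
H_{2,0} &= \alpha\hat H_1(z, \gamma + s_2\mu, s_1\mu) - \gamma\alpha\hat{\tilde f}_J(z, s_1\mu), \qquad H_{2,1} = O(1).
\end{aligned}
\tag{9.15}
\]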

9.4 Averaging over time


Let us state the following lemma that follows from [11].

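Lemma 9.7. Consider a Hamiltonian system with the Hamiltonian $\varepsilon H(p, q, t)$ periodically depending on time $t$ (with the period $2\pi$) and slow variables $p, q$:
\[
\begin{aligned}
\dot q &= \varepsilon\frac{\partial H}{\partial p}(p, q, t),\\
\dot p &= -\varepsilon\frac{\partial H}{\partial q}(p, q, t),\\
\dot t &= 1.
\end{aligned}
\tag{9.16}
\]
Assume that the Hamiltonian $\varepsilon H(p, q, t)$ is defined for $(p, q)$ in a complex neighborhood $U_\delta$ of some real domain $U \subset \mathbb{R}^2$ of width $\delta > 0$: $U_\delta = U + \{p, q \in \mathbb{C}^2;\ |p|, |q| \le \delta\}$ and $t \in S^1_t = \mathbb{R}/2\pi\mathbb{Z}$ and we have $|H| < C_H$ in $U_\delta \times S^1_t$. We assume $H$ to be analytic in $p, q$ and continuous in $t$. Then there is $C > 0$ depending only on $C_H$ and $\delta$ such that for all $\varepsilon \in (0, C)$ there are new canonical variables $\tilde p(p, q, t)$, $\tilde q(p, q, t)$ with
\[
|\tilde p(p, q, t) - p| \le C^{-1}\varepsilon, \qquad |\tilde q(p, q, t) - q| \le C^{-1}\varepsilon
\]
such that in these coordinates our system is defined in $V = U_{0.5\delta} \times S^1_t$ and is given by the Hamiltonian
\[
\varepsilon\bar H(\tilde p, \tilde q) + \varepsilon\Delta H(\tilde p, \tilde q, t)
\]
with
\[
\big\|\bar H(\tilde p, \tilde q) - \langle H(p, q, t)\rangle_t|_{p=\tilde p, q=\tilde q}\big\|_V \le C^{-1}\varepsilon, \qquad \|\Delta H\|_V \le \exp(-C\varepsilon^{-1}).
\]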
Let us obtain explicit dependence of the estimates in this lemma on the width of the complex
domain where the system is defined.
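Corollary 9.8. Consider the system (9.16) with the Hamiltonian $\varepsilon H$ periodically depending on time $t$ (with the period $2\pi$) and slow variables $p, q$. Assume that the Hamiltonian $\varepsilon H(p, q, t)$ is defined for $(p, q)$ in a complex neighborhood $U_{\delta_p,\delta_q}$ of some real domain $U \subset \mathbb{R}^2$ of width $\delta_p, \delta_q \in (0, 1)$ in $p$ and $q$, respectively:
\[
U_{\delta_p,\delta_q} = U + \{p, q \in \mathbb{C}^2;\ |p| < \delta_p,\ |q| < \delta_q\}
\]
and $t \in S^1_t = \mathbb{R}/2\pi\mathbb{Z}$ and we have $|H| < C_H$ in $U_{\delta_p,\delta_q} \times S^1_t$. We assume $H$ to be analytic in $p, q$ and continuous in $t$.
Then there is $C > 0$ depending only on $C_H$ such that for all $\varepsilon \in (0, C\delta_p\delta_q)$ there are new canonical variables $\tilde p(p, q, t)$, $\tilde q(p, q, t)$ with
\[
|\tilde p(p, q, t) - p| \le C^{-1}\varepsilon\delta_q^{-1}, \qquad |\tilde q(p, q, t) - q| \le C^{-1}\varepsilon\delta_p^{-1}
\]
such that in these coordinates our system is defined in $V = U_{0.5\delta_p, 0.5\delta_q} \times S^1_t$ and is given by the Hamiltonian
\[
\varepsilon\bar H(\tilde p, \tilde q) + \varepsilon\Delta H(\tilde p, \tilde q, t)
\]
with
\[
\big\|\bar H(\tilde p, \tilde q) - \langle H(p, q, t)\rangle_t|_{p=\tilde p, q=\tilde q}\big\|_V \le C^{-1}\varepsilon\delta_p^{-1}\delta_q^{-1}, \qquad \|\Delta H\|_V \le \exp(-C\delta_p\delta_q\varepsilon^{-1}).
\]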

Proof. Let us make a coordinate change $p = \delta_p p'$, $q = \delta_q q'$. The Hamiltonian in the new coordinates is $\varepsilon(\delta_p\delta_q)^{-1}H$. Denote $\varepsilon' = \varepsilon(\delta_p\delta_q)^{-1}$, then the new system is given by the Hamiltonian $\varepsilon' H$. This system is analytic for $(p', q')$ in a complex neighborhood $U'_1$ of some real domain $U'$ with width 1 in both $p'$ and $q'$. The domain $U'$ is large, but the constant $C$ in Lemma 9.7 does not depend on $U'$ and $U'_\delta$, it only depends on the width $\delta$. Lemma 9.7 gives us new coordinates $\tilde p', \tilde q'$ with $|\tilde p' - p'|, |\tilde q' - q'| \le C^{-1}\varepsilon'$. In the coordinates $\tilde p', \tilde q'$ the system is given by the Hamiltonian $\varepsilon'\bar H'(\tilde p', \tilde q') + \varepsilon'\Delta H'(\tilde p', \tilde q', t)$ with
\[
\big\|\bar H'(\tilde p', \tilde q') - \langle H(p, q, t)\rangle_t|_{p=\delta_p\tilde p', q=\delta_q\tilde q'}\big\|_{U'_{0.5}\times S^1_t} \le C^{-1}\varepsilon\delta_p^{-1}\delta_q^{-1}; \qquad \|\Delta H'\|_{U'_{0.5}\times S^1_t} \le \exp(-C\delta_p\delta_q\varepsilon^{-1}).
\]
Set $\tilde p = \delta_p\tilde p'$, $\tilde q = \delta_q\tilde q'$ and let $\bar H(\tilde p, \tilde q)$, $\Delta H(\tilde p, \tilde q)$ be $\bar H'$ and $\Delta H'$ written in these coordinates. It is easy to check that the estimates above imply the estimates in the statement of this corollary.
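
The first step of this proof, that the rescaling p = δ_p p′, q = δ_q q′ turns the Hamiltonian εH into ε(δ_p δ_q)⁻¹H, can be checked numerically by comparing vector fields. The sketch below is only an illustration: the Hamiltonian `H`, the widths and the sample point are hypothetical choices, and derivatives are approximated by central differences.

```python
import math


def H(p, q, t):
    """Hypothetical smooth Hamiltonian, 2*pi-periodic in t."""
    return math.cos(q) * (1 + 0.3 * math.sin(t)) + 0.5 * p * p * math.cos(t)


def grad(h, p, q, t, e=1e-6):
    """Central-difference approximations of (dh/dp, dh/dq)."""
    dh_dp = (h(p + e, q, t) - h(p - e, q, t)) / (2 * e)
    dh_dq = (h(p, q + e, t) - h(p, q - e, t)) / (2 * e)
    return dh_dp, dh_dq


def field_original(p, q, t, eps):
    """(p_dot, q_dot) for the Hamiltonian eps * H in the variables (p, q)."""
    dh_dp, dh_dq = grad(H, p, q, t)
    return -eps * dh_dq, eps * dh_dp


def field_rescaled(pp, qq, t, eps, dp_w, dq_w):
    """(p'_dot, q'_dot) for the Hamiltonian eps/(dp_w*dq_w) * H in (p', q')."""
    h_scaled = lambda a, b, s: H(dp_w * a, dq_w * b, s)
    eps1 = eps / (dp_w * dq_w)
    dh_dp, dh_dq = grad(h_scaled, pp, qq, t)
    return -eps1 * dh_dq, eps1 * dh_dp


if __name__ == "__main__":
    p, q, t, eps, dp_w, dq_w = 0.4, 1.1, 0.3, 0.01, 0.2, 0.05
    orig = field_original(p, q, t, eps)
    resc = field_rescaled(p / dp_w, q / dq_w, t, eps, dp_w, dq_w)
    # Since p = dp_w * p' and q = dq_w * q', the fields should agree after scaling.
    print(orig, (dp_w * resc[0], dq_w * resc[1]))
```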

9.5 After averaging


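Let us recall the system (9.14)
\[
\begin{aligned}
J' &= -s_1\beta\frac{\partial H_2}{\partial\gamma} + s_1\varepsilon g_J,\\
z' &= s_1\varepsilon g_z,\\
\gamma' &= s_1\beta\frac{\partial H_2}{\partial J} + s_1\varepsilon g_\varphi,\\
\mu' &= 1.
\end{aligned}
\]
The Hamiltonian part of this system is defined in
\[
D_2 = \Big\{z, J, \gamma, \mu \in \mathbb{C}^{n+3} : |z - z_0| < 2c_Z\hat\omega,\ |J| < 1,\ |\mathrm{Im}\,\gamma| < \frac{c_{cont}}{4}\hat\omega,\ |\mathrm{Re}\,\gamma| < C_\gamma,\ |\mathrm{Im}\,\mu| < \frac{c_{cont}}{4s_1}\Big\},
\]
and the whole system is defined in the real part of this complex domain. Let us apply Corollary 9.8 to the Hamiltonian part of (9.14), with $t = \mu$, $U = (-0.5, 0.5)_J \times \big(-C_\gamma + \frac{c_{cont}\hat\omega}{4},\ C_\gamma - \frac{c_{cont}\hat\omega}{4}\big)_\gamma$, $\delta_J = 0.5$, $\delta_\gamma = \frac{c_{cont}\hat\omega}{4}$ and $\varepsilon_1 = s_1\beta$. Here we denote by $\varepsilon_1$ the $\varepsilon$ variable used in Corollary 9.8 to distinguish it from $\varepsilon$ in (5.1). This corollary gives new coordinates that we denote $\tilde P, \tilde Q$. Corollary 9.8 is applied separately for different values of $z$, but it is easy to check that the construction in [11] gives $\tilde P(J, \gamma, z, \mu)$ and $\tilde Q(J, \gamma, z, \mu)$ that are analytic in $z$. Let us make a scale transformation
\[
Q = \tilde Q/\hat\omega, \quad P = \tilde P/\sqrt{\hat\omega}; \qquad \tilde Q = \hat\omega Q, \quad \tilde P = \sqrt{\hat\omega}\,P,
\]
this will simplify the main part of our system that will be written later. By Corollary 9.8 we have
\[
\big|J - \sqrt{\hat\omega}\,P(J, \gamma, z, t)\big| = O(s_1\hat\omega^{-1}\beta), \qquad |\gamma - \hat\omega Q(J, \gamma, z, t)| = O(s_1\beta). \tag{9.17}
\]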

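In $\tilde P, \tilde Q$ coordinates the system (9.14) without $g$ is given by the Hamiltonian
\[
\begin{aligned}
\tilde H(\tilde P, \tilde Q, z, \mu) &= s_1\beta\,\tilde H_3(\tilde P, \tilde Q, z) + s_1^2\hat\omega^{-1}\beta^2\,\tilde H_4(\tilde P, \tilde Q, z) + s_1\beta\exp(-C\hat\omega s_1^{-1}\beta^{-1})\,\tilde H_5(\tilde P, \tilde Q, z, \mu),\\
\tilde H_3(\tilde P, \tilde Q, z) &= \langle H_2(J, \gamma, z, \mu)\rangle_\mu|_{J=\tilde P, \gamma=\tilde Q},\\
\tilde H_3, \tilde H_4, \tilde H_5 &= O(1).
\end{aligned}
\]
Denote by $H(P, Q, z, \mu)$ the new Hamiltonian after scaling, we have $H = \hat\omega^{-1.5}\tilde H$. Hence, in the $P, Q$ coordinates this rewrites as
\[
\begin{aligned}
H &= \hat\omega^{-1.5}s_1\beta\,H_3(P, Q, z) + s_1^2\hat\omega^{-2.5}\beta^2\,H_4(P, Q, z) + \hat\omega^{-1.5}s_1\beta\exp(-C\hat\omega s_1^{-1}\beta^{-1})\,H_5(P, Q, z, \mu),\\
H_3(P, Q) &= \langle H_2(J, \gamma, \mu)\rangle_\mu|_{J=\sqrt{\hat\omega}P, \gamma=\hat\omega Q},\\
H_3, H_4, H_5 &= O(1).
\end{aligned}
\]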

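This system is defined in the domain (we reduce this domain a bit to have a shorter formula, taking into account that $C_\gamma \gg c_{cont}\hat\omega$)
\[
D_3 = \Big\{z, P, Q, \mu \in \mathbb{C}^{n+2}\times\mathbb{R} : |z - z_0| < 2c_Z\hat\omega,\ |P| < \frac{\hat\omega^{-0.5}}{2},\ |\mathrm{Re}\,Q| < \frac{C_\gamma\hat\omega^{-1}}{2},\ |\mathrm{Im}\,Q| < \frac{c_{cont}}{8}\Big\}.
\]
Let us also consider the real domain
\[
D_4 = \Big\{z, P, Q, \mu \in \mathbb{R}^{n+3} : |z - z_0| < c_Z\hat\omega,\ |P| < \frac{\hat\omega^{-0.5}}{4},\ |Q| < \frac{C_\gamma\hat\omega^{-1}}{4}\Big\}
\]
and the same domain rewritten using $Z = \hat\omega^{-1}(z - z_0)$ instead of $z$:
\[
D = \Big\{Z, P, Q, \mu \in \mathbb{R}^{n+3} : |Z| < c_Z,\ |P| < \frac{\hat\omega^{-0.5}}{4},\ |Q| < \frac{C_\gamma\hat\omega^{-1}}{4}\Big\}.
\]
We have Cauchy estimates valid in $D$
\[
\|H_i\|_{C^2} = O(1) \quad\text{for } i = 3, 4, 5. \tag{9.18}
\]
Denote $C_{H_5} = \hat\omega^{-1.5}s_1\beta\exp(-C\hat\omega s_1^{-1}\beta^{-1})$. For large enough $C_\rho$ we have
\[
Cs_1\hat\omega^{-1}\beta < |\ln^{-1}\varepsilon|/4 \tag{9.19}
\]
and thus $C_{H_5} = O(\varepsilon^3)$. By (9.18) this means that the corresponding terms in Hamiltonian equations are also $O(\varepsilon^3)$.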
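Let us now include the terms appearing after we reintroduce the terms $s_1\varepsilon g$ in (9.14) rewritten in the new coordinates. Denote
\[
\begin{aligned}
u_P &= \frac{\partial\tilde P}{\partial J}g_J + \frac{\partial\tilde P}{\partial\gamma}g_\varphi + \frac{\partial\tilde P}{\partial z}g_z - C_{H_5}s_1^{-1}\varepsilon^{-1}\hat\omega^{0.5}\frac{\partial H_5}{\partial Q},\\
u_Q &= \frac{\partial\tilde Q}{\partial J}g_J + \frac{\partial\tilde Q}{\partial\gamma}g_\varphi + \frac{\partial\tilde Q}{\partial z}g_z + C_{H_5}s_1^{-1}\varepsilon^{-1}\hat\omega\frac{\partial H_5}{\partial P},\\
u_z &= g_z.
\end{aligned}
\]
From the estimates $g_J = O(\ln h)$, $g_\varphi, g_z = O(1)$ we get $u_P = O(\ln h)$, $u_Q, u_z = O(1)$ (we use that $\frac{\partial(\tilde P - J)}{\partial x} = O(1)$, $\frac{\partial(\tilde Q - \gamma)}{\partial x} = O(\ln^{-1}\varepsilon)$ for $x = J, \gamma, z$ in $D_4$ by the Cauchy formula and (9.17) and (9.19)). Now the system (9.14) rewrites as
\[
\begin{aligned}
P' &= -\hat\omega^{-1.5}s_1\beta\frac{\partial H_3}{\partial Q}(P, Q, z) - \hat\omega^{-2.5}s_1^2\beta^2\frac{\partial H_4}{\partial Q}(P, Q, z) + s_1\varepsilon\hat\omega^{-0.5}u_P(P, Q, z, \mu),\\
Q' &= \hat\omega^{-1.5}s_1\beta\frac{\partial H_3}{\partial P}(P, Q, z) + \hat\omega^{-2.5}s_1^2\beta^2\frac{\partial H_4}{\partial P}(P, Q, z) + s_1\varepsilon\hat\omega^{-1}u_Q(P, Q, z, \mu),\\
z' &= s_1\varepsilon u_z(P, Q, z, \mu),\\
\mu' &= 1.
\end{aligned}
\]
After the time change $\frac{d\tau}{d\mu} = s_1\hat\omega^{-0.5}\beta(z)$, $\frac{d\tau}{d\lambda} = \hat\omega^{-0.5}\beta(z)$ we obtain the system (we recycle $'$ to denote also the derivative with respect to the new time $\tau$)
\[
\begin{aligned}
P' &= -\hat\omega^{-1}\frac{\partial H_3}{\partial Q}(P, Q, z) - \hat\omega^{-2}s_1\beta\frac{\partial H_4}{\partial Q}(P, Q, z) + \alpha u_P(P, Q, z, \tau),\\
Q' &= \hat\omega^{-1}\frac{\partial H_3}{\partial P}(P, Q, z) + \hat\omega^{-2}s_1\beta\frac{\partial H_4}{\partial P}(P, Q, z) + \alpha\hat\omega^{-0.5}u_Q(P, Q, z, \tau),\\
z' &= \alpha\hat\omega^{0.5}u_z(P, Q, z, \tau),\\
\tau' &= 1.
\end{aligned}
\]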

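Let us now use (9.15) to separate the main part of this system. We replace
\[
H_3 = \langle H_2(J, \gamma, z, \mu)\rangle_\mu|_{J=\sqrt{\hat\omega}P, \gamma=\hat\omega Q}
\]
with its main part (corresponding to $J^2/2 + H_{2,0}(z, \gamma, \mu)$ from (9.15)) that we denote $H_6$ and add the remainder to $H_4$ (this sum is denoted $H_7$). We have in $D_3$
\[
H_6(P, Q) = \hat\omega\frac{P^2}{2} + \langle H_{2,0}\rangle_\mu|_{\gamma=\hat\omega Q}, \qquad H_7(P, Q) = H_4(P, Q) + O(s_2^{-1})\langle H_{2,1}\rangle_\mu|_{J=\sqrt{\hat\omega}P, \gamma=\hat\omega Q} = O(1). \tag{9.20}
\]
Denote
\[
F_s = \hat\omega^{-1}\frac{\partial H_6}{\partial Q}. \tag{9.21}
\]
Then the system above rewrites as
\[
\begin{aligned}
P' &= -F_s(Q, z) - \hat\omega^{-2}s_1\beta\frac{\partial H_7}{\partial Q}(P, Q, z) + \alpha u_P(P, Q, z, \tau),\\
Q' &= P + \hat\omega^{-2}s_1\beta\frac{\partial H_7}{\partial P}(P, Q, z) + \alpha\hat\omega^{-0.5}u_Q(P, Q, z, \tau),\\
z' &= \alpha\hat\omega^{0.5}u_z(P, Q, z, \tau),\\
\tau' &= 1.
\end{aligned}
\]
This system is defined in the domain $D_4$; using $Z$ instead of $z$ gives the system (9.4) defined in $D$. As $H_7 = O(1)$ in $D_3$, in $D$ we also have $\|H_7\|_{C^2} = O(1)$ by Cauchy formula (moreover, $\frac{\partial H_7}{\partial P} = O(\sqrt{\hat\omega})$). Thus we have in $D$:
\[
\|H_7\|_{C^2},\ \|u_Q\|_C,\ \|u_z\|_C = O(1), \qquad \|u_P\|_C = O(\ln h).
\]
This completes the proof of Lemma 9.2.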

Proof of Lemma 9.3. As $D$ depends $2\pi$-periodically on $\mu$, it depends $2\pi s_1$-periodically on $\lambda$. As $D = J/\sqrt{\hat\omega}$, by (9.17) we have
\[
|D - P| = O(s_1\hat\omega^{-1.5}\beta) \quad\text{in } D_3.
\]
The estimates of Lemma 9.3 in $D_4 \subset D_3$ follow by Cauchy formula.

9.6 Main part of the Hamiltonian


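In this section we prove Lemma 9.4. By (9.21), (9.20), (9.15) and (9.13) we have
\[
-F_s = -\hat\omega^{-1}\frac{\partial H_6}{\partial Q} = -\alpha\frac{\partial}{\partial\gamma}\big\langle\hat H_1(z, \gamma + s_2\mu, s_1\mu) - \gamma\hat{\tilde f}_J(z, s_1\mu)\big\rangle_\mu\Big|_{\gamma=\hat\omega Q}.
\]
As $\frac{\partial}{\partial\gamma}$ and $\langle\cdot\rangle_\mu$ commute, $\frac{\partial}{\partial\gamma}\big|_{\mu=const} = \frac{\partial}{\partial\varphi}\big|_{\lambda=const}$ and $\tilde f_J = \langle f_J\rangle_\varphi$ by (9.10), this rewrites as
\[
-F_s = \alpha\Big\langle\Big(-\frac{\partial\hat H_1}{\partial\varphi}(z, \varphi, \lambda) + \langle\hat f_J(z, \varphi, \lambda)\rangle_\varphi\Big)\Big|_{\varphi=s_2\mu+\hat\omega Q,\ \lambda=s_1\mu}\Big\rangle_\mu.
\]
From (9.8) we obtain
\[
-F_s = \alpha\langle\hat f_J(z, \varphi{=}s_2\mu + \hat\omega Q, \lambda{=}s_1\mu, \varepsilon{=}0)\rangle_\mu = \alpha\langle\hat f_J(z, \varphi, \lambda{=}\hat\omega^{-1}\varphi - Q, \varepsilon{=}0)\rangle_{\varphi\in[0, 2s_2\pi]}.
\]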

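Let us again use the notation $t = \hat\omega^{-1}\varphi = (s_1/s_2)\varphi$ for the time for the unperturbed system. We have $\alpha\hat f_J = \hat\omega^{-1}\hat f_h + \alpha\widehat{\frac{\partial J}{\partial z}}\hat f_z$. Denote (we reuse the notation $g$ already used in Section 9.3, as $g$ from Section 9.3 is not mentioned in the current section) $g = \hat f_h + \hat\omega\alpha\widehat{\frac{\partial J}{\partial z}}\hat f_z\big|_{\varepsilon=0}$. We have $\alpha\hat f_J = \hat\omega^{-1}g$. We have the estimate (9.7) $\frac{\partial J}{\partial z} = O(h\ln^2 h\,\alpha^{-1})$, thus
\[
g = \hat f_h + O(\hat h\ln\hat h)\hat f_z \quad\text{in } D_0. \tag{9.22}
\]
We will write $g(z, t, \lambda) = g(z, \varphi{=}\omega t, \lambda)$. Let us rewrite
\[
-F_s = \hat\omega^{-1}\langle g(z, \varphi, \lambda{=}\hat\omega^{-1}\varphi - Q)\rangle_{\varphi\in[0, 2s_2\pi]} = (2\pi s_2)^{-1}\int_{t=0}^{s_2T}g(z, t, \lambda{=}t - Q)\,dt. \tag{9.23}
\]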
t=0

Let us denote by $l_0$ and $l_1$ the separatrices, let $l_0$ correspond to $\varphi \approx 0$ and $l_1$ to $\varphi \approx \pi$. Let us split the phase curve of the unperturbed system for given $h, z$ into 2 pieces $\hat l_0$, $\hat l_1$ close to the separatrices. We cut the phase curve by the line $y = x$ (cf. fig. 3). Let us define the coordinates $t_0 = t$, $t_1 = t - 0.5T$ on $\hat l_0$ and $\hat l_1$, respectively. These coordinates are defined up to adding $iT$, $i \in \mathbb{Z}$ and are given by the time passed after crossing the transversals $\varphi = 0$ and $\varphi = \pi$, respectively. One may check that for $h \to 0$ these transversals approach some limit points on the separatrices, so the coordinates $t_0, t_1$ can be continued to the separatrices themselves. Note that $iT = 2\pi\frac{is_1}{s_2}$. We can split the integral above as
\[
\begin{aligned}
-F_s ={}& (2\pi s_2)^{-1}\int_{\hat l_0}\sum_{i=0}^{s_2-1}g\Big(z, t_0{=}t_0, \lambda{=}t_0 + 2\pi\frac{is_1}{s_2} - Q\Big)dt_0\\
&+ (2\pi s_2)^{-1}\int_{\hat l_1}\sum_{i=0}^{s_2-1}g\Big(z, t_1{=}t_1, \lambda{=}t_1 + \pi\frac{(2i+1)s_1}{s_2} - Q\Big)dt_1.
\end{aligned}
\]

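As $s_1$ and $s_2$ are coprime, we have $\{is_1/s_2 \bmod 1\}_{i=0}^{s_2-1} = \{i/s_2 \bmod 1\}_{i=0}^{s_2-1}$ and this rewrites as
\[
\begin{aligned}
F_s ={}& -(2\pi)^{-1}\int_{\hat l_0}\Big\langle g\Big(z, t_0{=}t_0, \lambda{=}t_0 + 2\pi\frac{i}{s_2} - Q\Big)\Big\rangle_{i=0,\dots,s_2-1}dt_0\\
&- (2\pi)^{-1}\int_{\hat l_1}\Big\langle g\Big(z, t_1{=}t_1, \lambda{=}t_1 + \frac{\pi s_1}{s_2} + 2\pi\frac{i}{s_2} - Q\Big)\Big\rangle_{i=0,\dots,s_2-1}dt_1.
\end{aligned}
\]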
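
The reindexing step relies on the elementary fact that multiplication by s1 permutes the residues modulo s2 exactly when s1 and s2 are coprime. A minimal sketch of this check follows; the function names are hypothetical and the tested ranges are arbitrary.

```python
from math import gcd


def residues(s1, s2):
    """Sorted list of i*s1 mod s2 for i = 0, ..., s2 - 1."""
    return sorted((i * s1) % s2 for i in range(s2))


def is_permutation(s1, s2):
    """True when {i*s1 mod s2} coincides with {0, ..., s2 - 1}."""
    return residues(s1, s2) == list(range(s2))


if __name__ == "__main__":
    print(is_permutation(7, 5))   # coprime pair
    print(is_permutation(6, 4))   # common factor 2
```

Dividing the residues by s2 gives the sets of fractional parts compared in the text.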

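Taking the derivative of the expression above yields
\[
\begin{aligned}
\frac{\partial F_s}{\partial Q} ={}& (2\pi)^{-1}\int_{\hat l_0}\Big\langle\frac{\partial g}{\partial\lambda}\Big(z, t_0{=}t_0, \lambda{=}t_0 + 2\pi\frac{i}{s_2} - Q\Big)\Big\rangle_{i=0,\dots,s_2-1}dt_0\\
&+ (2\pi)^{-1}\int_{\hat l_1}\Big\langle\frac{\partial g}{\partial\lambda}\Big(z, t_1{=}t_1, \lambda{=}t_1 + \frac{\pi s_1}{s_2} + 2\pi\frac{i}{s_2} - Q\Big)\Big\rangle_{i=0,\dots,s_2-1}dt_1.
\end{aligned}
\tag{9.24}
\]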

Note that as ω(ĥ, z) = s2 /s1 , we have ĥ → 0 for s2 /s1 → 0. For fixed value t of t0 or t1
by (9.22) and using that the coordinates h, t do not have singularities on the separatrices (i.e.
p(h, t) and q(h, t) are smooth) we have

g(z, t0 =t, λ) = fh (h=0, z, t0 =t, λ) + O(ĥ ln ĥ),


g(z, t1 =t, λ) = fh (h=0, z, t1 =t, λ) + O(ĥ ln ĥ).

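As $f_h(C) = 0$, the values of $\max_\lambda|f_h(h{=}0, z, t_0{=}t, \lambda)|$ and $\max_\lambda|f_h(h{=}0, z, t_1{=}t, \lambda)|$ exponentially decrease when $|t| \to \infty$. Hence, the formulas above imply $C^0$-convergence in
\[
\begin{aligned}
F_s \xrightarrow[s_2/s_1\to 0]{C^1}{}& -(2\pi)^{-1}\int_{l_0}\Big\langle f_h\Big(h{=}0, z, t_0{=}t_0, \lambda{=}t_0 - Q + 2\pi\frac{i}{s_2}, \varepsilon{=}0\Big)\Big\rangle_{i=0,\dots,s_2-1}dt_0\\
&- (2\pi)^{-1}\int_{l_1}\Big\langle f_h\Big(h{=}0, z, t_1{=}t_1, \lambda{=}t_1 - Q + 2\pi\frac{s_1 \bmod 2s_2}{2s_2} + 2\pi\frac{i}{s_2}, \varepsilon{=}0\Big)\Big\rangle_{i=0,\dots,s_2-1}dt_1
\end{aligned}
\tag{9.25}
\]
in the domain
\[
D_{F,0} = \{Q, z \in [0, 2\pi]\times\mathbb{C}^n : \|z - z_0\| < c\}.
\]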

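We can check that $\frac{\partial F_s}{\partial Q}$ converges to the $Q$-derivative of the right-hand side in the same way, using (9.24). Note that $\frac{\partial f_h}{\partial\lambda}(C) = 0$ and, similarly to (9.22), we have $\frac{\partial g}{\partial\lambda} = \frac{\partial\hat f_h}{\partial\lambda} + O(h\ln h)\frac{\partial\hat f_z}{\partial\lambda}$. Finally, by Cauchy formula $\frac{\partial F_s}{\partial z}$ and $\frac{\partial^2 F_s}{\partial z^2}$ converge to $\frac{\partial}{\partial z}$ and $\frac{\partial^2}{\partial z^2}$, respectively, of the right-hand side of (9.25) in $D_F$. Similarly, $\frac{\partial^2 F_s}{\partial z\,\partial Q}$ converges to the $\frac{\partial^2}{\partial z\,\partial Q}$ of the right-hand side of (9.25). This shows that $\frac{\partial F_s}{\partial z}$ converges in $C^1$ to the $z$-derivative of the right-hand side of (9.25).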

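Proof of Lemma 9.4. The first and the second parts of Lemma 9.4 follow from (9.23). Periodicity follows from the fact that $g(\lambda)$ is $2\pi$-periodic. As the right-hand side of (9.23) is defined in $D_0$, we can continue $F_s$ in the domain $D_{F,0}$.
By the estimate for the error of the trapezoidal integration rule we have
\[
\Big\langle f_h\Big(h{=}0, z, t_0{=}t_0, \lambda{=}t_0 + 2\pi\frac{i}{s_2} - Q\Big)\Big\rangle_{i=0}^{s_2-1} = \langle f_h(h{=}0, z, t_0{=}t_0, \lambda)\rangle_\lambda + O(s_2^{-2}).
\]
Together with (9.25) this implies $\|F_s - \Theta_3\|_{C^0} \to 0$ in $D_{F,0}$ for $s_2 \to \infty$, $s_2/s_1 \to 0$. By Cauchy formula this also means $\big\|\frac{\partial F_s}{\partial z} - \frac{\partial\Theta_3}{\partial z}\big\|_{C^0} \to 0$ and $\big\|\frac{\partial^2 F_s}{\partial z^2} - \frac{\partial^2\Theta_3}{\partial z^2}\big\|_{C^0} \to 0$ in $D_F$. We have
\[
\frac{\partial}{\partial Q}\Big\langle f_h\Big(h{=}0, z, t_0{=}t_0, \lambda{=}t_0 + 2\pi\frac{i}{s_2} - Q\Big)\Big\rangle_{i=0}^{s_2-1} = -\Big\langle\frac{\partial f_h}{\partial\lambda}\Big(h{=}0, z, t_0{=}t_0, \lambda{=}t_0 + 2\pi\frac{i}{s_2} - Q\Big)\Big\rangle_{i=0}^{s_2-1}.
\]
Applying the trapezoidal rule argument again yields $\big\|\frac{\partial F_s}{\partial Q}\big\|_{C^0} \to 0$ in $D_{F,0}$ for $s_2 \to \infty$, $s_2/s_1 \to 0$. Cauchy formula implies $\big\|\frac{\partial^2 F_s}{\partial Q\,\partial z}\big\|_{C^0} \to 0$ in $D_F$, this proves the third part of the lemma.
The last part of the lemma follows from (9.25) and $C^1$-convergence of $z$-derivatives in this formula established above. As the right-hand sides of (9.25) depend on $s_1 \bmod 2s_2$ and not on $s_1$, there is a finite set of possible right-hand sides and we take this set as $\mathcal{F}_{s_2}$.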
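
The trapezoidal-rule step in this proof says that the average of a smooth 2π-periodic function over s2 equally spaced points is close to its mean over the period, independently of the shift of the lattice. The sketch below illustrates this for a hypothetical test function exp(cos λ), which is not taken from the paper; a fine lattice of 64 points serves as the reference value. For analytic functions the error decays much faster than the O(s2⁻²) bound used in the proof.

```python
import math


def lattice_average(g, n, shift=0.0):
    """Average of g over the n points shift + 2*pi*i/n, i = 0, ..., n - 1."""
    return sum(g(shift + 2 * math.pi * i / n) for i in range(n)) / n


def g_test(lam):
    """Hypothetical smooth 2*pi-periodic test function."""
    return math.exp(math.cos(lam))


if __name__ == "__main__":
    reference = lattice_average(g_test, 64)
    for n in (2, 4, 8, 16):
        err = abs(lattice_average(g_test, n, shift=0.3) - reference)
        print(n, err)
```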

10 Crossing resonant zones: proofs
10.1 High-numerator resonances: proof
Proof of Lemma 6.8. Let us apply Lemma 9.2 with $z_0$ as in the statement of the lemma and $C_\gamma \ge 8\pi$. By (9.1) and the bound on $s_1$ from the statement of the lemma we have
\[
\hat\omega^{-2}s_1\beta(z_0) \lesssim C_\rho^{-0.5}. \tag{10.1}
\]
This means that for large enough $C_\rho$ the terms containing $H_7$ in (9.4) are small. The terms containing $u$ are also small for small enough $\varepsilon$. For large enough $S_2$ there exists $\delta > 0$ such that $F_s > \delta$ if $s_2 > S_2$ (by Lemma 9.4). This means $P' < -0.5\delta$ in (9.4).
Lemma 9.2 gives the domain $D$. We need (9.2) to hold in this domain. This holds if $c_Z < d_Z$, where $c_Z$ is from Lemma 9.2 and $d_Z$ is from Lemma 9.1. We reduce $c_Z$ if needed so that $c_Z < d_Z$. Define $D(I, z)$ by the equation $I = \hat I(z) + D\alpha(z)\hat\omega^{0.5}$. This also gives the function $D(P, Q, Z, \tau)$ defined in $D$. By (9.5) we have
\[
|D - P| = O(s_1\hat\omega^{-1.5}\beta) \lesssim \hat\omega \quad\text{in } D. \tag{10.2}
\]
Denote by D0 ⊂ D the subdomain given by the additional restriction |P | < Dp + 1.
Denote Y = (P, Q, Z, τ ) and consider a solution Y (τ ) of (9.4) with the initial condition Y0 ,
Y0 = (P0 , Q0 , Z0 , τ0 ), obtained from X0 after the coordinate change of Lemma 9.2. We can
add 2πk to ϕ(X0 ) so that γ = ϕ − (s2 /s1 )λ ∈ [−π, π], then by (9.5) and (10.1) we have
Q0 ∈ [−1.1π ω̂ −1 , 1.1π ω̂ −1 ]. By this estimate on Q0 and (10.2) Y0 ∈ D0 for small enough ω0 .
As $P' < 0$, the solution $Y(\tau)$ cannot leave $\mathcal D_0$ through $P = D_p + 1$. The time $\tau$ required to leave $\mathcal D_0$ without reaching the hypersurface $D = -D_p - 1$ (i.e., via $\|Z\| = c_Z$ or $|Q| = \hat\omega^{-1}C_\gamma/4$) is $\gtrsim \hat\omega^{-1}$ by (9.4). On the other hand, after time $\tau \lesssim D_p$ we will have $P < -D_p - 1$; thus (for small enough $\omega_0$) the solution $Y(\tau)$ leaves $\mathcal D_0$ through $P = -D_p - 1$. Denote the time $\lambda$ when this happens by $\lambda_{out}$. At the moment $\lambda_{out}$ we have $P = -D_p - 1$ and thus $D < -D_p \le -D_p'$. Hence, $\lambda_1 < \lambda_{out}$.
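Let us now obtain an estimate for the time passed before crossing $D = -D_p'$, i.e. for $\lambda_1 - \lambda_0$. Such an estimate can be obtained from (10.2), but we need a better one. Given $\lambda_2 \in [\lambda_0, \lambda_{out} - 2\pi s_1]$, set $\lambda_3 = \lambda_2 + 2\pi s_1$. Denote by $\Delta\tau$ the time $\tau$ between $\lambda_2$ and $\lambda_3$. We have $\Delta\tau \sim s_1\hat\omega^{-0.5}\beta(z_0)$. Let us compare $D(\lambda_3)$ with $D(\lambda_2)$. We have from (9.4)
\[ P(\lambda_3) \le P(\lambda_2) - 0.5\delta\Delta\tau, \quad |P(\lambda_3) - P(\lambda_2)| + |Q(\lambda_3) - Q(\lambda_2)| \lesssim \Delta\tau, \quad \|z(\lambda_3) - z(\lambda_2)\| \lesssim \alpha(z_0)\hat\omega^{-1.5}\Delta\tau. \]
By (10.1) and (9.6) we have $\frac{\partial D}{\partial P} > 0.9$ for small enough $\omega_0$. Together with the estimates above and (9.6), this implies
\[ D(X(\lambda_3)) \le D(X(\lambda_2)) - \Delta\tau\bigl(0.5\delta + O(s_1\hat\omega^{-1.5}\beta(z_0)) + O(s_1\hat\omega^{-4}\varepsilon)\bigr) \le D(X(\lambda_2)) - 0.25\delta\Delta\tau. \]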

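As $\delta \sim 1$, this gives the estimate for $\lambda_1 - \lambda_0$ from the statement of the current lemma:
\[ \lambda_1 - \lambda_0 - 2\pi s_1 \lesssim s_1 D_p'/\Delta\tau \sim D_p'\hat\omega^{0.5}\beta(z_0)^{-1} \sim D_p'\hat\omega^{0.5}\varepsilon^{-1}\alpha(z_0). \]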

10.2 Lemma on model system


In this subsection we state a general lemma that will be later applied to study resonance
crossing described by (9.4). Consider a system
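\[
\begin{aligned}
p' &= -\frac{\partial H}{\partial q}(p, q, w) + v_p(p, q, w, \tau), \\
q' &= \frac{\partial H}{\partial p}(p, q, w) + v_q(p, q, w, \tau), \\
w' &= v_w(p, q, w, \tau), \\
\tau' &= 1,
\end{aligned} \tag{10.3}
\]
where
\[ H = H_0 + \Delta H(p, q, w), \qquad H_0 = p^2/2 + V(q), \]
\[ \|\Delta H\|_{C^2} < \varepsilon_1, \qquad \|v_p\|_{C^0}, \|v_q\|_{C^0}, \|v_w\|_{C^0} < \varepsilon_2, \tag{10.4} \]
and
\[ V(q) = V_c q + V_{per}(q), \tag{10.5} \]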
where $V_c > 0$ is a constant and $V_{per}(q)$ is $2\pi$-periodic. We will call the system (10.3) the perturbed system. We will call this system without the $v$ terms the intermediate system. It is an autonomous Hamiltonian system with the Hamiltonian $H(p, q, w)$. We will call the Hamiltonian system given by $H_0$ the unperturbed system. An analysis of the unperturbed system can be found in [4, Section 9.2]. We will assume that the function $V$ satisfies Condition $B'$ introduced in Subsection 2.3. Note that the saddles of the unperturbed system correspond to the local maxima of $V$. This also holds for the intermediate system if $\varepsilon_1$ is small enough.

Lemma 10.1. Fix $V(q)$ as above. Then for any small enough $\varepsilon_1 > 0$, any large enough $D_{p,1}, D_{p,2} > 0$ with $D_{p,1} < D_{p,2}$, any large enough (compared with $D_{p,2}$) $C_p$, and any large enough (compared with $C_p$) $C_q$, there exists $C > 1$ such that for any $c_Z > 0$ and any $\varepsilon_2 > 0$ with
\[ \varepsilon_2 < \varepsilon_1, \qquad \varepsilon_2(1 + |\ln\varepsilon_2|) < 0.5c_Z C^{-1} \]
the following holds.
Consider the unperturbed system in the domain
\[ \mathcal D = \bigl\{ (w, p, q, \tau) \in \mathbb R^{n+3} : \|w\| \le c_Z,\ p \in [-C_p, C_p],\ q \in [-C_q, C_q] \bigr\}. \]
For any $\Delta H, v_p, v_q, v_w$ that satisfy (10.4) in this domain with $\varepsilon_1, \varepsilon_2$ fixed above, also consider the intermediate and the perturbed systems in this domain. Let

X0 = (p0 , q0 , w0 , τ0 )

denote some initial data with

\[ p_0 \in [D_{p,1}, D_{p,2}], \qquad q_0 \in [-\pi, \pi], \qquad \|w_0\| < 0.5c_Z. \]

If V has no local maxima, set ∆h0 = 1. Otherwise, let Ci (w) be a saddle of the intermediate
system such that H(Ci (w0 ), w0 ) is as close as possible to H(p0 , q0 , w0 ) and set

∆h0 = |H(p0 , q0 , w0 ) − H(Ci (w0 ), w0 )|.

We will assume
∆h0 > 2Cε2 .
Let $X(\tau) = (p(\tau), q(\tau), w(\tau), \tau)$ denote the solution of the perturbed system with initial data $X_0$. Then there exists $\tau_1 > \tau_0$ such that
\[ p(\tau_1) = -C_p, \qquad \tau_1 - \tau_0 < C(1 + |\ln\Delta h_0|), \]
and for any $\tau \in [\tau_0, \tau_1]$ we have
\[ X(\tau) \in \mathcal D, \qquad |H(X(\tau)) - H(X(\tau_0))| < C\varepsilon_2, \qquad \|w(\tau) - w(\tau_0)\| < C\varepsilon_2(1 + |\ln\Delta h_0|). \]
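
The conclusion of Lemma 10.1 can be illustrated numerically in the simplest case, when $V$ has no local maxima (so $\Delta h_0 = 1$). The sketch below is not part of the proof. The potential $V(q) = V_c q + A\sin q$ with $A < V_c$, the perturbation $v = \varepsilon_2(\cos\tau, \sin\tau)$, the choice $\Delta H = 0$, the absence of the variable $w$, and all numerical values are assumptions made only for this illustration. It integrates the perturbed system by the classical Runge-Kutta scheme until $p$ reaches $-C_p$ and records the largest deviation of $H$ from its initial value.

```python
import math

# All constants below are hypothetical and chosen only for illustration.
V_C, AMP = 1.0, 0.5   # V(q) = V_C*q + AMP*sin(q); no local maxima since AMP < V_C
EPS2 = 1e-3           # bound on the perturbation components v_p, v_q
C_P = 3.0             # the solution is followed until p = -C_P


def hamiltonian(p, q):
    """H0 = p^2/2 + V(q) (we take Delta H = 0 in this sketch)."""
    return 0.5 * p * p + V_C * q + AMP * math.sin(q)


def field(p, q, tau):
    """Right-hand side (p', q') of the perturbed system."""
    dp = -(V_C + AMP * math.cos(q)) + EPS2 * math.cos(tau)
    dq = p + EPS2 * math.sin(tau)
    return dp, dq


def rk4_step(p, q, tau, dt):
    """One step of the classical fourth-order Runge-Kutta method."""
    k1 = field(p, q, tau)
    k2 = field(p + 0.5 * dt * k1[0], q + 0.5 * dt * k1[1], tau + 0.5 * dt)
    k3 = field(p + 0.5 * dt * k2[0], q + 0.5 * dt * k2[1], tau + 0.5 * dt)
    k4 = field(p + dt * k3[0], q + dt * k3[1], tau + dt)
    p += dt * (k1[0] + 2.0 * k2[0] + 2.0 * k3[0] + k4[0]) / 6.0
    q += dt * (k1[1] + 2.0 * k2[1] + 2.0 * k3[1] + k4[1]) / 6.0
    return p, q, tau + dt


def run_until_exit(p0=2.0, q0=0.0, dt=1e-3, max_tau=50.0):
    """Integrate until p <= -C_P; return (exit time, final p, max |H - H(0)|)."""
    p, q, tau = p0, q0, 0.0
    h_start = hamiltonian(p, q)
    max_drift = 0.0
    while p > -C_P and tau < max_tau:
        p, q, tau = rk4_step(p, q, tau, dt)
        max_drift = max(max_drift, abs(hamiltonian(p, q) - h_start))
    return tau, p, max_drift


if __name__ == "__main__":
    tau1, p_end, drift = run_until_exit()
    print("exit time:", tau1, "final p:", p_end, "max drift of H / eps2:", drift / EPS2)
```

In this toy setting $p' \le -0.5 + \varepsilon_2$, so the exit time is bounded, and $dH/d\tau = p\,v_p + V'(q)\,v_q$, so the drift of $H$ is of order $\varepsilon_2$, in agreement with the form of the estimates in the lemma. The case with saddles, where the logarithmic factor appears, is not covered by this sketch.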

10.3 Proof of the lemma on model system


Let us state a lemma that will be used to prove Lemma 10.1. Denote by $U$ some neighborhood of $(0, 0, 0)$ with $\operatorname{diam} U < 1$. Consider the Hamiltonian system given in $U$ by the Hamiltonian $H = p^2/2 - aq^2 + \Delta H(p, q, w)$, $a > 0$, and its perturbation given by a vector field $u$:
\[
\begin{aligned}
\dot p &= 2aq - \frac{\partial\Delta H}{\partial q}(p, q, w) + u_p(p, q, w, t), \\
\dot q &= p + \frac{\partial\Delta H}{\partial p}(p, q, w) + u_q(p, q, w, t), \\
\dot w &= u_w(p, q, w, t).
\end{aligned}
\]

Assume that in U we have

\[ \|\Delta H\|_{C^2} < \varepsilon_1, \qquad \|u_p\|_{C^0}, \|u_q\|_{C^0}, \|u_w\|_{C^0} < \varepsilon_2. \tag{10.6} \]
The constants $\varepsilon_1$ and $\varepsilon_2$ are assumed to be small enough compared to $\min(a, a^{-1})$. The Hamiltonian system has a saddle $C(w) = (p_C(w), q_C(w))$ with $p_C, q_C = O(\varepsilon_1)$. Assume $C(w) \in U$ for all values of $w$ encountered in $U$. Denote $h(p, q, w) = H(p, q, w) - H(C(w), w)$.
Lemma 10.2. There exists C1 > 0 such that for any ∆H, up , uq , uw as above the following
holds. Given any initial data (p0 , q0 , w0 , t0 ) in U such that

|h(p0 , q0 , w0 )| ≥ C1 ε2 ,

consider the solution p(t), q(t), w(t), t of the perturbed system starting at p0 , q0 , w0 , t0 . Then
this solution exits U at some time t1 > t0 and for any t ∈ [t0 , t1 ] we have the estimates

|h(p(t), q(t), w(t)) − h(p0 , q0 , w0 )| < 0.5C1 ε2 , t1 − t0 < C1 (1 + |ln |h(p0 , q0 , w0 )||).

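Proof. We will assume $H(p_0, q_0, w_0) > H(p_C(w_0), q_C(w_0), w_0)$; the proof is similar when the opposite inequality holds. All $O$-estimates in this proof will be uniform in $p_0, q_0, w_0, \Delta H, u_p, u_q, u_w$. Denote $\tilde p = p - p_C(w)$, $\tilde q = q - q_C(w)$. Let us denote by $\tilde H$ the Hamiltonian $H - H(p_C(w), q_C(w), w)$ rewritten in the shifted coordinates. We can write $\tilde H = \tilde p^2/2 - a\tilde q^2 + \psi(\tilde p, \tilde q, w)$, where $\psi = O(\varepsilon_1)$. As at the points $(0, 0, w)$ we have $\tilde H = \frac{\partial\tilde H}{\partial\tilde p} = \frac{\partial\tilde H}{\partial\tilde q} = 0$, $\psi$ does not contain constant or linear terms with respect to $\tilde p, \tilde q$. Hence, we have
\[ \tilde H = \tilde p^2/2 - a\tilde q^2 + \psi(\tilde p, \tilde q, w), \qquad \psi, \frac{\partial\psi}{\partial w} = \varepsilon_1 O(\tilde p^2 + \tilde q^2), \qquad \frac{\partial\psi}{\partial\tilde p}, \frac{\partial\psi}{\partial\tilde q} = \varepsilon_1 O(|\tilde p| + |\tilde q|). \tag{10.7} \]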

Denote p̃(t) = p(t) − pC (w(t)), q̃(t) = q(t) − qC (w(t)) and h(t) = H̃(p̃(t), q̃(t), w(t)). Let us
take the largest t2 > t0 such that for any t ∈ [t0 , t2 ) we have

(p(t), q(t), w(t)) ∈ U, h(t) > ε2 .

Then for any t ∈ [t0 , t2 ] at the point (p̃, q̃, w) = (p̃(t), q̃(t), w(t)) we have

h = h(t) = (1 + O(ε1 ))p̃2 /2 − (a + O(ε1 ))q̃ 2 .

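Hence,
\[ |\tilde p| \ge \sqrt{a\tilde q^2 + h} \ge c_1\max(|\tilde q|, \sqrt{\varepsilon_2}) \]
for some $c_1 \in (0, 1)$. Hence, $\tilde p(t)$ has the same sign for all $t \in [t_0, t_2]$. Without loss of generality we will assume it to be positive. We have $\dot{\tilde q} = \frac{\partial\tilde H}{\partial\tilde p} + \varepsilon_2\bigl(u_q - \frac{dq_C}{dw}u_w\bigr) = \tilde p + \varepsilon_1 O(|\tilde p| + |\tilde q|) + O(\varepsilon_2)$. Therefore,
\[ \dot{\tilde q} \ge 0.25c_1(|\tilde p| + |\tilde q|), \qquad \dot{\tilde q} \ge 0.5\sqrt{a\tilde q^2 + h}. \tag{10.8} \]
By (10.7) we have $\bigl|\frac{\partial\tilde H}{\partial\tilde p}\bigr|, \bigl|\frac{\partial\tilde H}{\partial\tilde q}\bigr| \le O(1)(|\tilde p| + |\tilde q|)$ and thus $\dot{\tilde q}^{-1}\frac{\partial\tilde H}{\partial\tilde p}, \dot{\tilde q}^{-1}\frac{\partial\tilde H}{\partial\tilde q} = O(1)$. As $\dot{\tilde q} > 0$, we can use $\tilde q$ as an independent variable instead of $t$. Denote by $'$ the derivative with respect to $\tilde q$. We have
\[ h' = \dot{\tilde q}^{-1}\frac{\partial\tilde H}{\partial q}\Bigl(u_q - \frac{dC_q}{dw}u_w\Bigr) + \dot{\tilde q}^{-1}\frac{\partial\tilde H}{\partial p}\Bigl(u_p - \frac{dC_p}{dw}u_w\Bigr) + \frac{\partial\tilde H}{\partial w}u_w = O(\varepsilon_2). \]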

Thus for t ∈ [t0 , t2 ] we have the estimate |h(t) − h(t0 )| = O(ε2 ) and this estimate does not
depend on t2 . Denote h0 = h(t0 ). We have

h0 = H̃(p̃0 , q̃0 , w0 ) = H(p0 , q0 , w0 ) − H(Cp (w0 ), Cq (w0 ), w0 ) ≥ C1 ε2 .

Hence, for large enough C1 we have h(t2 ) > 0.5h0 > 0 and thus the solution exits U at the
time t2 . This means that t1 exists and t1 = t2 . This also proves the estimate for the change
of H.
Using (10.8), we can estimate (the details are given below)
\[ t_1 - t_0 = \int_{\tilde q(t_0)}^{\tilde q(t_1)} \frac{d\tilde q}{\dot{\tilde q}} \le 2\int (a\tilde q^2 + 0.5h_0)^{-0.5}\,d\tilde q = O(|\ln h_0| + 1). \]
The estimate for the integral above can be obtained by splitting it into two parts, with $|\tilde q| \le \sqrt{h_0}$ and with $|\tilde q| > \sqrt{h_0}$. When $|\tilde q| \le \sqrt{h_0}$, we use $(a\tilde q^2 + 0.5h_0)^{-0.5} = O(h_0^{-0.5})$, so this part is $O(1)$. When $|\tilde q| > \sqrt{h_0}$, we use $(a\tilde q^2 + 0.5h_0)^{-0.5} = O(\tilde q^{-1})$, so this part is $O(|\ln h_0| + 1)$.
This gives the estimate for t1 − t0 and thus completes the proof of the lemma.
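
The logarithmic growth of the integral bounding $t_1 - t_0$ is easy to check numerically. The sketch below is an illustration only: the integration range $|\tilde q| \le 1$ (motivated by $\operatorname{diam} U < 1$), the value $a = 1$ and the sample values of $h_0$ are our assumptions. It compares the closed form of $2\int_{-1}^{1}(a\tilde q^2 + 0.5h_0)^{-0.5}\,d\tilde q$ with $|\ln h_0| + 1$ and cross-checks the closed form with a midpoint rule.

```python
import math


def exit_time_bound(h0, a=1.0, radius=1.0):
    """Closed form of 2 * int_{-radius}^{radius} (a q^2 + 0.5 h0)^(-1/2) dq.

    Uses int dq / sqrt(a q^2 + c) = asinh(q * sqrt(a / c)) / sqrt(a).
    """
    c = 0.5 * h0
    return 2.0 * (2.0 / math.sqrt(a)) * math.asinh(radius * math.sqrt(a / c))


def midpoint_quadrature(h0, a=1.0, radius=1.0, n=200_000):
    """The same quantity computed by the midpoint rule (independent check)."""
    step = 2.0 * radius / n
    total = 0.0
    for k in range(n):
        q = -radius + (k + 0.5) * step
        total += (a * q * q + 0.5 * h0) ** -0.5
    return 2.0 * total * step


if __name__ == "__main__":
    for h0 in (1e-2, 1e-4, 1e-6, 1e-8):
        bound = exit_time_bound(h0)
        # The ratio stays bounded as h0 -> 0, i.e. bound = O(|ln h0| + 1).
        print(h0, bound, bound / (abs(math.log(h0)) + 1.0))
```

For these parameters the ratio of the bound to $|\ln h_0| + 1$ stays between 2 and 2.5, which is the behaviour $O(|\ln h_0| + 1)$ used in the proof.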

Proof of Lemma 10.1. We will consider the case where the unperturbed system has saddles; the other case is much simpler. Let us first consider the unperturbed system in
\[ \bigl\{ (w, p, q, \tau) \in \mathbb R^{n+3} : \|w\| \le c_Z \bigr\}. \]


Note that for $p > 0$ the value of the function $L = p^2/2 + V_{per}(q) = H_0 - V_c q$ decreases along the solutions of the unperturbed system: $\dot L = -pV_c < 0$. For small enough $\varepsilon_1$ (this implies that $\Delta H$ and $v$ are small) and $p > 1$ we also have $\dot L < 0$ along the solutions of the intermediate and perturbed systems. The function $L$ is $2\pi$-periodic in $q$, and its contour lines such that $p > 1$ on the whole contour line provide transversals to solutions of all three systems.
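
The identity $\dot L = -pV_c$ for the unperturbed system can be verified pointwise. The following sketch does this for one sample potential. The value of $V_c$ and the periodic part $V_{per}(q) = \cos q + 0.3\sin 2q$ are hypothetical choices made for the illustration, not taken from the paper.

```python
import math
import random

V_C = 0.7  # hypothetical value of the constant V_c > 0


def dv_per(q):
    """Derivative of the hypothetical periodic part V_per(q) = cos q + 0.3 sin 2q."""
    return -math.sin(q) + 0.6 * math.cos(2.0 * q)


def unperturbed_field(p, q):
    """(p', q') for H0 = p^2/2 + V_c q + V_per(q): p' = -V'(q), q' = p."""
    return -(V_C + dv_per(q)), p


def l_dot(p, q):
    """Derivative of L = p^2/2 + V_per(q) along the unperturbed flow."""
    dp, dq = unperturbed_field(p, q)
    return p * dp + dv_per(q) * dq


def max_identity_error(samples=1000, seed=0):
    """Largest deviation of l_dot from -p * V_c over random sample points."""
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(samples):
        p, q = rng.uniform(-5.0, 5.0), rng.uniform(-10.0, 10.0)
        worst = max(worst, abs(l_dot(p, q) + p * V_C))
    return worst


if __name__ == "__main__":
    print("max |L_dot + p V_c| =", max_identity_error())
```

The periodic terms cancel exactly, so the reported error is at the level of rounding; in particular $\dot L < 0$ whenever $p > 0$.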
Let us take a contour line Dp of L such that p > 1 on this line, let Dp,0 be the maximum of
p on this line. We assume Dp,1 > Dp,0 . Given Dp,1 and Dp,2 , take Cp > Dp,2 such that there
is a contour line of L with p ∈ (Dp,2 , Cp ) on the whole contour line. This guarantees that a
solution (of any of the three systems) starting with p ≤ Dp,2 does not cross the line p = Cp .
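Let us now restrict to the domain
\[ \mathcal D_+ = \bigl\{ (w, p, q, \tau) \in \mathbb R^{n+3} : \|w\| \le c_Z,\ |p| \le C_p \bigr\}. \]
By (10.4) the set
\[ \mathcal H(w, \Delta H) = \bigl\{ H(p, q, w) : (p, q, w, \tau) \in \mathcal D_+,\ q \in [-\pi, \pi] \bigr\} \]
is bounded (uniformly in $w$ and $\Delta H$). We will assume $\varepsilon_1 < 0.25V_c$. Then we have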

H(p, q + 2π) ≥ H(p, q) + πVc .

As $\mathcal H$ is bounded, this means that the values of $|q|$ on the contour lines given by $H(p, q, w) = h_0 + h_1$, $h_0 \in \mathcal H$, $h_1 \in [-1, 1]$ are bounded by some constant (uniformly in $w$ and $\Delta H$); take $C_q$ equal to this constant. This choice of $C_q$ gives the following property that will be used later: if we start in $\mathcal D_+$ with $q \in [-\pi, \pi]$ and the value of $H$ changes by at most 1 (while still being in $\mathcal D_+$), the value of $q$ stays in $(-C_q, C_q)$.

Fixed points of the unperturbed system correspond to extrema of V , with maxima of
V corresponding to saddles and minima of V corresponding to centers. For each saddle
CV,i = (0, qC,i ) of the unperturbed system let us fix a small neighborhood Ui in the space with
coordinates p, q, w (these neighborhoods will respect 2π-periodicity of the unperturbed system,
i.e. neighborhoods of saddles that differ by 2πk will be shifts of each other; this means that
we only need to construct such neighborhoods for saddles with $q_{C,i} \in [0, 2\pi]$) in the following way. First, let us fix preliminary neighborhoods $U_{i,0} = U_{i,0}^{p,q} \times \{w : \|w\| < c_Z\}$ such that there is $c_{sep} > 0$ such that the values of $H_0$ inside different $U_{i,0}$ are separated by at least $c_{sep}$. Now for fixed $i$ let us define $U_i$. Let $q_i = q - q_{C,i}$; we can write $V = V(q_{C,i}) - a_i q_i^2 + O(q_i^3)$. We will now use Lemma 10.2 together with the notation defined there; we add a tilde to expressions from this lemma to distinguish them. Let us apply this lemma to $\tilde q = q_i$, $\tilde p = p$, $\tilde w = w$, $\tilde a = a_i$ and $\tilde U = U_{i,0}$; it gives us $\tilde\varepsilon_1$ such that the lemma can be applied if in $U_{i,0}$ we have $\|\widetilde{\Delta H}\|_{C^2} < \tilde\varepsilon_1$. Now take $U_i \subset U_{i,0}$ such that inside $U_i$ we have $\|V - V(q_{C,i}) + a_i q_i^2\|_{C^2} < 0.5\tilde\varepsilon_1$ and assume

$\varepsilon_1 < 0.5\tilde\varepsilon_1$ (note that this restriction on $\varepsilon_1$ depends only on $V$; this will also hold for further restrictions on $\varepsilon_1$). Denote by $C_i(w)$ the saddle of the intermediate system near $C_{V,i}$. If needed, let us further decrease $\varepsilon_1$ so that we have $C_i \in 0.5U_i$ for any $\Delta H$ that satisfies (10.4); here $0.5U_i$ is the image of $U_i$ under the $w$-dependent homothety with center $C_i(w)$ and ratio 0.5. Take
\[ \widetilde{\Delta H} = \Delta H + V - V(q_{C,i}) + a_i q_i^2 \]
and $\tilde u = v$; we have $H_0 + \Delta H = p^2/2 - a_i q_i^2 + \widetilde{\Delta H} + \mathrm{const}$ in $U_i$. As $\widetilde{\Delta H}$ satisfies (10.6) in $U_i$ (with $\tilde\varepsilon_1$ instead of $\varepsilon_1$), we can apply Lemma 10.2 to describe the movement inside $U_i$ for small enough $\varepsilon_2$. The words "small enough" here give another upper bound on $\varepsilon_1$.
By the choice of Ui the values of H0 in different sets Ui are separated by at least csep . For
small enough ε1 the values of H in different sets Ui are separated by at least csep /2. Take
small enough $h_b \in (0, c_{sep}/8)$ such that for any $i, \Delta H, w$ the set $|H - H(C_i(w), w)| \le 2h_b$ intersects $\partial U_i$ in four disjoint sets near the intersections of the separatrices of $C_i$ with $\partial U_i$.
Let us suppose we are given some initial data. Denote

h(p, q, w) = H(p, q, w) − H(Ci (w), w).

We will assume $h_0 < h_b$; the case $h_0 \ge h_b$ (meaning the initial condition is far from separatrices of the intermediate system) is easier and we omit it. We consider the case $h_0 > 0$; the proof is similar when $h_0 < 0$. Let $Z$ be the stripe given by
\[ Z = \bigl\{ (p, q, w, \tau) \in \mathcal D : h \in [0.5h_0, 2h_b] \bigr\} \]
(cf. Figure 2). For small enough $h_b$ we have $q \in (-C_q, C_q)$ in $Z$. Also, as $2h_b < c_{sep}/4$, the zone $Z$ does not intersect $U_j$ for $j \ne i$.

Figure 2: The zones $Z_i$


We can split Z into a union of three zones Z1− , Z1+ , Z3 far from Ci and two zones Z2− , Z2+
inside Ui . Solutions of the intermediate system visit these zones in the following order: Z1− ,
Z2− , Z3 , Z2+ , Z1+ . Thus, for small ε2 solutions of the perturbed system starting in each zone

can only leave this zone into the next zone or through the boundary of Z, but not into the
previous zone.
Take one of the zones $Z_1^-$, $Z_3$, $Z_1^+$. It is easy to show that solutions of the perturbed system starting at any point inside this zone exit it after time $O(1)$ passes. As $h' = O(\varepsilon_2)$ and $w' = O(\varepsilon_2)$, both $h$ and $w$ change by at most $O(\varepsilon_2)$ while passing this zone.
Now take one of the zones Z2− , Z2+ inside Ui . When selecting Ui above, we have checked
that Lemma 10.2 can be applied to describe the solutions of the perturbed system inside Ui .
By this lemma any orbit starting in our zone leaves this zone after time O(|ln h0 | + 1) passes
and h changes by O(ε2 ). From the estimate on the time spent in this zone we conclude that
w changes by at most O(|ln h0 | + 1)ε2 .
Thus the total time spent in Z and changes in h and w before leaving Z are bounded by
O(| ln h0 | + 1), O(ε2 ) and O(|ln h0 | + 1)ε2 , respectively. Take τ1 to be the moment when the
solution leaves Z. Taking C much greater than the (uniform) constants in these O-estimates,
we obtain the estimates for τ1 − τ0 and the change in h and w from the statement of the
lemma. For large enough $C$ we also have $h(\tau) \in [h_0 - C\varepsilon_2, h_0 + C\varepsilon_2] \subset (0.5h_0, 2h_b)$ and $\|w(\tau) - w_0\| \le C(|\ln\varepsilon_2| + 1)\varepsilon_2 < 0.5c_Z$ (due to the inequalities on $\varepsilon_2$ from the statement of the lemma). Recall that we have $q(\tau) \in (-C_q, C_q)$ in $Z$. This implies
that our solution can only leave Z by crossing one of the lines p = Cp , p = −Cp . However,
the choice of Cp above prohibits crossing p = Cp , thus the solution crosses p = −Cp . This
completes the proof.

10.4 Applying the lemma on model system


In this section we apply Lemma 10.1 to system (9.4) (with P, Q, Z in (9.4) corresponding
to p, q, w in (10.3) in the same order) in such a way that the estimates of this lemma will be
uniform for all resonances close enough to separatrices. We also take care of the fact that the
main part of the Hamiltonian in (9.4) depends on Z by taking its value at Z = 0 as the main
part and considering the difference as part of the Hamiltonian perturbation.
We will use the notation VF introduced in Subsection 2.3. Without the u terms the
system (9.4) is Hamiltonian with the Hamiltonian

\[ H_r(P, Q, Z) = P^2/2 + V(Q, z_0 + \hat\omega Z) + \hat\omega^{-2}s_1\beta H_7, \tag{10.9} \]

where $V = V_{F_s}$.
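Lemma 10.3. Given $z_* \in Z_B$, for any $C_{s_1}$ and any large enough $D_{p,2} > D_{p,1} > 0$ there exist
\[ C_p > D_{p,2}, \qquad C_\rho, C, C_q > 1, \qquad c_z, c_Z, \omega_0, \varepsilon_0 > 0 \]
such that for any $z_0$ with $\|z_0 - z_*\| < c_z$, any $\hat\omega = s_2/s_1 \in (0, \omega_0)$, and any $\varepsilon < \varepsilon_0$ with
\[ |s_1| < C_{s_1}\ln^2\varepsilon, \qquad \hat h(z_0) > C_\rho\,\varepsilon|\ln^5\varepsilon| \]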

we have the following.


1. We can apply Lemma 9.2 and the system (9.4) given by this lemma is defined in the domain
\[ \mathcal D = \bigl\{ (Z, P, Q, \tau) \in \mathbb R^{n+3} : \|Z\| < c_Z,\ |P| < C_p,\ |Q| < \pi\hat\omega^{-1} + C_q \bigr\}, \]
where $Z = \hat\omega^{-1}(z - z_0)$.
2. Set $\varepsilon_2 = C\alpha(z_0)\hat\omega^{-1}$. Let
\[ X_0 = (P_0, Q_0, Z_0, \tau_0) \]
denote some initial data with
\[ P_0 \in [D_{p,1}, D_{p,2}], \qquad Q_0 \in [-\hat\omega^{-1}\pi - 1, \hat\omega^{-1}\pi + 1], \qquad \|Z_0\| < c_Z/2. \]
If $V_{F_s}(Q, z_*)$ (here $F_s$ is from (9.4)) has no local maxima, set $\Delta h_{r,0} = 1$. Otherwise, let $C_i(Z)$ be a saddle of the intermediate system such that $H_r(C_i(Z_0), Z_0)$ is as close as possible to $H_r(P_0, Q_0, Z_0)$ and set
\[ \Delta h_{r,0} = |H_r(P_0, Q_0, Z_0) - H_r(C_i(Z_0), Z_0)|. \]
We assume
\[ \Delta h_{r,0} > 2C\varepsilon_2. \]
Let $X(\tau) = (P(\tau), Q(\tau), Z(\tau), \tau)$ denote the solution of (9.4) with initial data $X_0$. Denote $h_r(\tau) = H_r(P(\tau), Q(\tau), Z(\tau))$. Then there exists $\tau_1 > \tau_0$ such that
\[ P(\tau_1) = -C_p, \qquad \tau_1 - \tau_0 < C(1 + |\ln\Delta h_{r,0}|), \]
and for $\tau \in [\tau_0, \tau_1]$ we have
\[ X(\tau) \in \mathcal D, \qquad |H_r(X(\tau)) - H_r(X(\tau_0))| < C\varepsilon_2, \qquad \|Z(\tau) - Z_0\| < C\varepsilon_2(1 + |\ln\Delta h_{r,0}|). \]

Proof. We apply Lemma 9.2 with $C_\gamma \ge 8\pi$. This lemma gives us the values of the constants (we add a tilde to the names of these constants)
\[ \tilde c_Z, \tilde C, \tilde C_\rho. \]
Fix cZ = c̃Z and assume Cρ ≥ C̃ρ . We apply Lemma 9.2 with z0 as in the current lemma.
For small enough ω0 the conditions on s1 and z0 in Lemma 9.2 are satisfied and the domain
D lies in the domain provided by Lemma 9.2.
Let us now build a finite set $\mathcal F$ such that elements of this set approximate the functions $F_s$ for any $s$ as in the statement of the lemma. Denote $V_\Theta = V_F$, where $F = \Theta_3(z)$. Let us apply Lemma 10.1 to $V_\Theta(Q, z_*)$; denote by $\varepsilon_{1,0}$ the largest value of $\varepsilon_1$ allowed by this lemma. Apply Lemma 9.4 with $z_0$ in that lemma equal to $z_*$ and with $\delta_1 = 0.5\min(\varepsilon_{1,0}, \min_{Z_B}\Theta_3(z))$. This lemma gives $S$, $c$ and a set $\mathcal F_{s_2}$ for each $s_2 \le S$; take
\[ \mathcal F = \{\Theta_3\} \cup \Bigl(\bigcup_{s_2 \le S}\mathcal F_{s_2}\Bigr), \qquad \mathcal F_* = \{F(Q, z_*) : F \in \mathcal F\}. \]
Let us now determine the constants used to define the domain D. We will assume cz < 0.5c
and ω0 cZ < 0.5c, then estimates of Lemma 9.4 are valid for z = z0 + ω̂Z with kZk < cZ . Let
ε1,1 be so small that Lemma 10.1 can be applied to all functions VFi , Fi ∈ F∗ with ε1 = ε1,1 .
Clearly, ε1,1 ≤ ε1,0 . Let us apply Lemma 10.1 to the functions VFi , Fi ∈ F∗ with ε1 = ε1,0
for VΘ and ε1 = ε1,1 for other VFi . We can take the same values of Cp , Cq for all V , fix these
constants.
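Let us now prove that for small enough $c_z$ and $\omega_0$, for any $z_0$ and $s$ we have
\[ \|V_{F_s}(Q, z_0 + \hat\omega Z) - V_{F_s}(Q, z_*)\|_{C^2(Q,Z),\,Q \in [-C_q, C_q]} < 0.25\varepsilon_{1,1} \tag{10.10} \]
(here and thereafter we consider all $Z$ with $\|Z\| < c_Z$ when taking the norm). We use that $\|F_s\|_{C^1(Q,z)}$ and $\bigl\|\frac{\partial F_s}{\partial z}\bigr\|_{C^1(Q,z)}$ are bounded by Lemma 9.4. Denote
\[ z = z_0 + \hat\omega Z, \qquad U(Q, Z) = V_{F_s}(Q, z) - V_{F_s}(Q, z_*). \]
We have (in the formula below $z_{int} \in [z, z_*]$ denotes some intermediate $z$)
\[ U = \int_0^Q \bigl(F_s(\tilde Q, z) - F_s(\tilde Q, z_*)\bigr)\,d\tilde Q = \int_0^Q \frac{\partial F_s}{\partial z}(\tilde Q, z_{int}(\tilde Q))\,(z - z_*)\,d\tilde Q = O(z - z_*) = O(c_z + \hat\omega c_Z). \]
We check in the same way that $\frac{\partial U}{\partial Q}, \frac{\partial^2 U}{\partial Q^2} = O(c_z + \hat\omega c_Z)$. We have
\[ \frac{\partial U}{\partial Z} = \hat\omega\frac{\partial}{\partial z}V_{F_s}(Q, z) = \hat\omega\int_0^Q \frac{\partial F_s}{\partial z}(\tilde Q, z)\,d\tilde Q = O(\hat\omega). \]
We check in the same way that $\frac{\partial^2 U}{\partial Q\,\partial Z}, \frac{\partial^2 U}{\partial Z^2} = O(\hat\omega)$; this proves (10.10).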
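Decompose $Q_0 = Q_{0,1} + \Delta Q$ with $\Delta Q = 2\pi k$, $Q_{0,1} \in [-\pi, \pi]$. To apply Lemma 10.1, we use the shifted variable $\tilde Q$ defined by $Q = \tilde Q + \Delta Q$. Set $\tilde V_F(\tilde Q) = V_F(Q)|_{Q = \tilde Q}$. Given $F_s$ from (9.4), we take $\tilde V_{F_i}(\tilde Q, z_*)$ as $V(\tilde Q)$ in Lemma 10.1, where $F_i$ is the element of $\mathcal F$ closest to $F_s$, and set in Lemma 10.1
\[ \Delta H(P, \tilde Q, Z) = \tilde V_{F_s}(\tilde Q, z_0 + \hat\omega Z) - \tilde V_{F_i}(\tilde Q, z_*) + \hat\omega^{-2}s_1\beta H_7(P, \tilde Q + \Delta Q, Z), \]
\[ (v_p, v_q, v_z) = (\alpha u_P, \alpha\hat\omega^{-0.5}u_Q, \alpha\hat\omega^{-0.5}u_z)|_{Q = \tilde Q + \Delta Q}. \]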
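
The decomposition $Q_0 = Q_{0,1} + 2\pi k$ with $Q_{0,1} \in [-\pi, \pi]$ used above amounts to rounding $Q_0/(2\pi)$ to the nearest integer. A minimal sketch follows; the function name `split_angle` is ours and is used only for this illustration.

```python
import math


def split_angle(q0):
    """Return (q01, k) with q0 = q01 + 2*pi*k, k an integer, q01 in [-pi, pi].

    k is the integer nearest to q0 / (2*pi), so |q0 / (2*pi) - k| <= 1/2
    and hence |q01| <= pi (up to floating-point rounding).
    """
    k = round(q0 / (2.0 * math.pi))
    return q0 - 2.0 * math.pi * k, k


if __name__ == "__main__":
    for q0 in (-20.0, -3.5, 0.0, 3.0, 3.5, 100.0):
        print(q0, split_angle(q0))
```

The shifted variable is then $\tilde Q = Q - 2\pi k$, so the initial value of $\tilde Q$ lies in $[-\pi, \pi]$ as required for $q_0$ in Lemma 10.1.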
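We have
\[
\begin{aligned}
\|\Delta H\|_{C^2(\tilde Q,Z),\,\tilde Q \in [-C_q, C_q]} \le{} & \|V_{F_s}(Q, z) - V_{F_s}(Q, z_*)\|_{C^2(Q,Z),\,Q \in [-C_q, C_q]} \\
& + \|V_{F_i}(Q, z_*) - V_{F_s}(Q, z_*)\|_{C^2(Q),\,Q \in [-C_q, C_q]} \\
& + \|\hat\omega^{-2}s_1\beta H_7\|_{C^2(P,Q,Z),\,Q \in [\Delta Q - C_q, \Delta Q + C_q]}.
\end{aligned}
\]
The first term is bounded by $0.25\varepsilon_{1,1}$ due to (10.10). The second term is bounded by $0.5\varepsilon_{1,0}$ for $V_{F_i} = V_\Theta$ due to item 3 of Lemma 9.4, and by $0.5\varepsilon_{1,1}$ for other $V_{F_i}$ when $\omega_0$ is small enough due to item 4 of the same lemma. Let us now show that for large enough $C_\rho$ the third term is bounded by $0.25\varepsilon_{1,1}$. Indeed, by (9.2) we have $|\beta(z_0 + \hat\omega Z)| \sim \beta(z_0)$ for complex $Z$ with $\|Z\| < 8c_Z$; by the Cauchy formula this means $\frac{\partial\beta}{\partial Z}, \frac{\partial^2\beta}{\partial Z^2} = O(\beta(z_0))$. Together with the bound on $\|H_7\|$ from Lemma 9.2 this implies
\[ \|\hat\omega^{-2}s_1\beta H_7\|_{C^2} = O(\hat\omega^{-2}s_1\beta(z_0)) = O\Bigl(\sqrt{\varepsilon\hat h^{-1}\hat\omega^{-1}s_1^2}\Bigr) = O\Bigl(\sqrt{\varepsilon\hat h^{-1}C_{s_1}^2|\ln^5\varepsilon|}\Bigr) = O\Bigl(\sqrt{C_{s_1}^2/C_\rho}\Bigr). \]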

Thus, k∆HkC 2 (Q,Z) is bounded by ε1,0 if VFi = VΘ and by ε1,1 otherwise. This means
that we can apply Lemma 10.1 with these ∆H and v. Recall that we took ε2 = Cα(z0 )ω̂ −1 .
Clearly, for large enough C we have kvkC 0 < ε2 . We have
q
ε2 = O( εĥ(z0 )| ln5 ĥ|).
It is easy to see that ε2 satisfies the conditions in Lemma 10.1 for small enough ε0 and large
enough Cρ . We can finally apply Lemma 10.1, we assume that C from the current lemma
is greater than the constant C from Lemma 10.1. Then Lemma 10.1 gives us the estimates
stated in the current lemma.

10.5 Low-numerator resonances: proof
In this subsection we prove Lemma 6.9, thus estimating the measure of initial conditions
captured into resonances and time spent in resonance zones for initial conditions that are not
captured. To prove this lemma we first prove auxiliary Lemma 10.4. Recall that m denotes
the Lebesgue measure on A and Br (O) denotes the ball with center O and radius r. Denote
by mz the Lebesgue measure on Rn .
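Lemma 10.4. Given $z_* \in Z_B$, for any $C_{s_1}, D_{p,0} > 0$ there exist
\[ c_z, \omega_0, \varepsilon_0 > 0, \qquad D_p > D_{p,0}, \qquad C, C_\rho, C_B > 1 \]
such that for any $\hat\omega = s_2/s_1 \in (0, \omega_0)$ and $\varepsilon < \varepsilon_0$ with $|s_1| < C_{s_1}\ln^2\varepsilon$ there exist a finite collection of balls $B_i$ with centers $z_i \in B_{c_z}(z_*)$ and equal radii $\hat\omega c_Z$ such that
\[ \bigcup_i B_{\hat\omega c_Z/2}(z_i) \supset B_{c_z}(z_*), \qquad \sum_i m_z(B_i) \le C_B, \]
and a collection of sets
\[ V_i \subset \mathbb R^2_{p,q} \times B_i \times [0, 2\pi]_\lambda \]
such that for any $i$ we have
\[ m(V_i) \le C\alpha_i^2 s_1\hat\omega^{-0.5}m_z(B_i) \]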
(we denote αi = α(zi )).
These collections satisfy the following: for any $i$ and any initial data $p_0, q_0, z_0, \lambda_0$ with
\[ I(p_0, q_0, z_0) = \hat I(z_0) + D_p\alpha(z_0)\hat\omega^{0.5}, \qquad z_0 \in B_{\hat\omega c_Z/2}(z_i), \qquad h(p_0, q_0, z_0) > C_\rho\,\varepsilon|\ln^5\varepsilon| \tag{10.11} \]
at least one of the following holds.
• The solution of the perturbed system (2.2) with this initial data crosses the hypersurface $I = \hat I(z) - D_p\alpha(z)\hat\omega^{0.5}$ at some time $\lambda_1 > \lambda_0$ with
\[ \varepsilon(\lambda_1 - \lambda_0) \le C|\ln\varepsilon|\hat\omega^{0.5}\alpha_i. \]
• There exists $\lambda_1 > \lambda_0$ with
\[ \varepsilon(\lambda_1 - \lambda_0) \ge C^{-1}\hat\omega^{0.5}\alpha_i \]
such that this solution remains in $V_i$ for $\lambda \in [\lambda_0, \lambda_1]$.

Proof. Recall that we define $D(I, z)$ by the equality
\[ I = \hat I(z) + D\alpha(z)\hat\omega^{0.5}. \]
The value of $D_{p,1}$ in Lemma 10.3 should be large enough; we denote by $D_{p,0}'$ a constant such that this lemma can be applied for $D_{p,1} > D_{p,0}'$. The coordinate change of Lemma 9.2 (we do not specify the details yet, this will be done later) gives the variables $P, Q$. By (9.5) we have $D = P + O(s_1\hat\omega^{-1.5}\beta)$. We have
\[ s_1\hat\omega^{-1.5}\beta \lesssim \sqrt{\varepsilon h^{-1}\ln^4\varepsilon} \lesssim |\ln\varepsilon|^{-0.5} \ll 1. \]
Thus we can take $D_p > D_{p,0}$ and $D_{p,2}, D_{p,1} > D_{p,0}'$ with $D_{p,2} > D_{p,1} + 3$ such that
• $P(I, z, \varphi, \lambda) \in [D_{p,1} + 1, D_{p,2} - 1]$ for any $I, z, \varphi, \lambda$ with $D(I, z) = D_p$;
• $P(I, z, \varphi, \lambda) = -D_{p,2}$ implies $D(I, z) < -D_p$.
Let us now apply Lemma 10.3 with Dp,1 and Dp,2 chosen above, it provides us with
constants (we add ˜· to constants provided by this lemma)

C̃p , C̃ρ , C̃, C̃q , c̃z , c̃Z , ω̃0 , ε̃0 .

Set cz = c̃z , cZ = c̃Z , Cρ = 2C̃ρ .


The existence of CB and {Bi } is obvious, we can place the centers zi in the nodes of a
hyper-cubic lattice (with step ∼ ω̂cZ ). Let us now fix some i. Assume that condition (10.11)
is satisfied for some initial condition, else we have nothing to prove. Let us apply Lemma 10.3
with z0 (from that lemma) equal to zi . The bound on ĥ(zi ) required by Lemma 10.3 is satisfied
by (9.2), (10.11) and our choice of Cρ .
We will use the coordinates P, Q provided by Lemma 9.2 applied as a part of statement of
Lemma 10.3. Set
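\[ \varepsilon_{2,i} = 2\tilde C\alpha_i\hat\omega^{-1}; \]
it is greater than the value of $\varepsilon_2$ in Lemma 10.3 (namely $\tilde C\alpha(z_0)\hat\omega^{-1}$). We have $\ln\varepsilon_{2,i} \sim \ln\varepsilon$. Set
\[ \tilde V_{0,i} = \bigl\{ (Z, P, Q, \tau) \in \mathcal D_i : \|Z\| < c_Z,\ P \in [D_{p,1}, D_{p,2}],\ Q \in [-\pi\hat\omega^{-1} - \tilde C_q, \pi\hat\omega^{-1} + \tilde C_q] \bigr\}, \]
where $\mathcal D_i$ denotes the domain $\mathcal D$ from Lemma 10.3. Define $\tilde V_i \subset \tilde V_{0,i}$ by the additional condition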
that the value of the Hamiltonian Hr,i defined by (10.9) is at least 2C̃ε2,i -far from its values

at the saddles of the Hamiltonian system given by Hr,i . Let Vi be the preimage of Ṽi under
the coordinate change of Lemma 9.2.
Take an initial condition satisfying (10.11). Rewriting our initial condition in the new chart gives $P_0, Q_0, Z_0, \tau_0$ (there is some freedom, as we can add $2\pi k$ to $\varphi_0 = \varphi(p_0, q_0)$; this will be resolved later). By the choice of $D_p$ we have $P_0 \in [D_{p,1} + 1, D_{p,2} - 1]$. We can also achieve $Q_0 \in [-\pi\hat\omega^{-1} - 1, \pi\hat\omega^{-1} + 1]$. Indeed, by (9.5) it is sufficient to have $\gamma(\varphi_0, \lambda_0) \in [-\pi, \pi]$ and this can be accomplished by adding $2\pi k$ to $\varphi_0$.
We study the evolution of this initial condition using the system (9.4). As $P', Q', Z' = O(1)$ in (9.4), solutions starting in $\tilde V_{0,i}$ with $\|Z_0\| < 0.5c_Z$ and $P \in [D_{p,1} + 1, D_{p,2} - 1]$ spend in $\tilde V_{0,i}$ time $\tau \gtrsim 1$, thus time $\lambda \gtrsim \varepsilon^{-1}\alpha\hat\omega^{0.5}$. Indeed, by (9.3) we have
\[ \Delta\lambda = \varepsilon^{-1}\alpha\hat\omega^{0.5}\Delta\tau. \]
If during the crossing of $\tilde V_{0,i}$ the solution is outside $\tilde V_i$ at least at one moment, by Lemma 10.3 this solution reaches $P = -\tilde C_p < -D_{p,2}$ at some point. By the choice of $D_p$ this implies $D \le -D_p$; by continuity, at some point before this the solution crosses $D = -D_p$, i.e. $I = \hat I(z) - D_p\alpha(z)\hat\omega^{0.5}$. The estimate on the time $\lambda$ before this moment follows from the estimate $\tau \lesssim |\ln\varepsilon_2| \lesssim |\ln\varepsilon|$ on the time $\tau$ before this moment provided by Lemma 10.3.
Let us consider sections $\tilde V_{i,z}$ of $\tilde V_i$ and sections $V_{i,z}$ of $V_i$ with fixed $z$. We will also consider sections $\tilde V_{i,z,\tau}$ of $\tilde V_i$ with fixed $z$ and $\tau$. In $\tilde V_0$ we have $\frac{\partial H_r}{\partial P} \approx P > D_{p,1}$ and thus $\frac{\partial H_r}{\partial P} > 0.5D_{p,1}$. This means that the measures of intersections of $\tilde V_{i,z,\tau}$ with each segment where only $P$ varies are $O(\varepsilon_{2,i})\,n_{sadd}$, where $n_{sadd}$ is the number of saddles we need to consider. When $Q$ is fixed, $n_{sadd} = O(1)$. Indeed, for fixed $Q$ the range of values of $H_r$ is bounded and so, as $V_c > 0$ (we use the notation $V_c$ from (10.5)), the number of saddles where the value of $H_r$ is close to that range is also bounded. Thus, as the range of values of $Q$ is $\sim \hat\omega^{-1}$, the measures of $\tilde V_{i,z,\tau} \subset \mathbb R^2_{P,Q}$ are $O(\varepsilon_{2,i}\hat\omega^{-1}) = O(\alpha_i\hat\omega^{-2})$.
To estimate the measure of the set $V_{i,z}$, we follow the construction of the coordinate change of Lemma 9.2 (see the proof of this lemma) in the reverse order. The $z$-dependent time change $\mu \to \tau$ does not change the measures of $\tilde V_{i,z,\tau}$ (for the preimages of these sets $\mu$ is fixed instead of $\tau$). As our system is $2\pi$-periodic in $\mu$, integrating by $\mu$ shows that the measures of the preimages of $\tilde V_{i,z}$ in $\mathbb R^2_{P,Q} \times [0, 2\pi]_\mu$ are $O(\alpha_i\hat\omega^{-2})$. Before the scaling $\tilde P, \tilde Q \to P, Q$ the measures of the preimages of $\tilde V_{i,z}$ in $\mathbb R^2_{\tilde P,\tilde Q} \times [0, 2\pi]_\mu$ are $O(\alpha_i\hat\omega^{-0.5})$. The coordinate change of Corollary 9.8 is volume-preserving, so the measure of the preimages of $\tilde V_{i,z}$ in $\mathbb R^2_{J,\gamma} \times [0, 2\pi]_\mu$ is again $O(\alpha_i\hat\omega^{-0.5})$. We project $\gamma$ from $\mathbb R$ to $[0, 2\pi]$; this does not increase the measures. Before the change of angular variables the measures in $\mathbb R_J \times [0, 2\pi]^2_{\varphi,\lambda}$ are $O(\alpha_i s_1\hat\omega^{-0.5})$. Before the scaling of $I$ the measures in $\mathbb R_I \times [0, 2\pi]^2_{\varphi,\lambda}$ are $O(\alpha_i^2 s_1\hat\omega^{-0.5})$.
Finally, moving from the $I, \varphi$-variables back to $p, q$ (this preserves the volume), we get that the measure of the preimages of $\tilde V_{i,z}$ in $\mathbb R^2_{p,q} \times [0, 2\pi]_\lambda$ (i.e. $V_{i,z}$) is $O(\alpha_i^2 s_1\hat\omega^{-0.5})$. Taking the union of these preimages for $z \in B_i$ gives the estimate on the measure of $V_i$. This completes the proof of the lemma.

Proof of Lemma 6.9. Let us apply Lemma 10.4 to our z∗ , Cs1 , Dp,0 , it gives us

cz , ω0 , ε0 , Dp > Dp,0 .

Denote by $M_i$ the set of all initial conditions $X_{init}$ such that the second alternative of Lemma 10.4 holds with this $i$ for some $X(\lambda_0)$ with $\lambda_0$ as above; let $m_i$ be the measure of $M_i$. Set $E_s = \cup_i M_i$. Then the statement of the lemma holds if $X_{init} \notin E_s$; it only remains to estimate the measure of $E_s$.
Denote by $g^\lambda$ the flow given by the perturbed system. Let $\lambda_i \sim \varepsilon^{-1}\alpha_i\hat\omega^{0.5}$ denote the estimate from below on the time spent in $V_i$ claimed in the second alternative. Set
\[ K_1 = \lceil\varepsilon^{-1}\Lambda/\lambda_i\rceil \sim \alpha_i^{-1}\hat\omega^{-0.5}. \]
Then we have
\[ M_i \subset \bigcup_{j=0}^{K_1} g^{-j\lambda_i}(V_i). \]
By Lemma 4.3 there is $K_2 > 1$ such that for any $\lambda \in [0, \varepsilon^{-1}\Lambda]$ the map $g^{-\lambda}$ expands the volume $m$ by a factor of at most $K_2$. Thus we have the estimate
\[ m_i \le K_2 K_1 m(V_i) \lesssim \alpha_i s_1\hat\omega^{-1}m_z(B_i). \]
As $\sum_i m_z(B_i) = O(1)$, this gives
\[ m(E_s) \le \sum_i m_i \lesssim s_1\hat\omega^{-1}\max_i\alpha_i \lesssim s_1\hat\omega^{-1}\alpha_*. \]
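
The power counting behind the bound $m_i \lesssim \alpha_i s_1\hat\omega^{-1}m_z(B_i)$ can be checked mechanically. In the sketch below each quantity is represented, up to constants, as a monomial in $\alpha_i$, $\hat\omega$, $s_1$ and $m_z(B_i)$; the dictionary representation and the symbol names are our own bookkeeping device, not notation from the paper.

```python
from fractions import Fraction as Fr


def multiply(*monomials):
    """Multiply monomials given as {symbol: exponent} dictionaries."""
    result = {}
    for mono in monomials:
        for symbol, power in mono.items():
            result[symbol] = result.get(symbol, Fr(0)) + power
    # Drop symbols whose exponents cancelled.
    return {s: e for s, e in result.items() if e != 0}


# K_1 ~ alpha_i^{-1} * omega^{-1/2}
K1 = {"alpha": Fr(-1), "omega": Fr(-1, 2)}
# m(V_i) <~ alpha_i^2 * s_1 * omega^{-1/2} * m_z(B_i)  (Lemma 10.4)
M_V = {"alpha": Fr(2), "s1": Fr(1), "omega": Fr(-1, 2), "mz": Fr(1)}

if __name__ == "__main__":
    # K_2 is a constant, so m_i <~ K_1 * m(V_i):
    print(multiply(K1, M_V))
```

The product has exponents $1$, $1$, $-1$, $1$ in $\alpha_i$, $s_1$, $\hat\omega$, $m_z(B_i)$, matching the displayed estimate.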

11 Passing separatrices: proof
Proof of Lemma 4.6. Pick $R_Z \sim |\ln^{-1}h_*| \sim |\ln^{-1}\varepsilon|$ such that we have $R_Z < d_Z\,\omega(h, z)$ for all $z \in B_{c_z}(z_*)$ and $h \in [h_*, 4h_*]$, where the constant $d_Z$ is provided by Lemma 9.1. Build a finite collection of balls $B_i$ with centers $z_i \in B_{c_z}(z_*)$ and equal radii $R_Z$ such that for some $C_B > 0$ we have
\[ \bigcup_i B_{R_Z/2}(z_i) \supset B_{c_z}(z_*), \qquad \sum_i m_z(B_i) \le C_B \]
(recall that $m_z$ denotes the Lebesgue measure on $\mathbb R^n$). Fix some $i$ and set
\[ L_i = \bigl\{ X \in B_{R_Z}(z_i) \times \mathbb R^2_{p,q} \times [0, 2\pi]_\lambda : h(X) \in [-h_*, 4h_*] \bigr\}. \]
Let us prove the following alternative: for any $X_0$ with $z_0 \in B_{R_Z/2}(z_i)$ and $h(X_0) = h_*$ (then $X_0 \in L_i$) the corresponding solution of (2.2) either reaches $h = -h_*$ after time at most $\Delta\lambda$ or stays in $L_i$ for time $\Delta\lambda$, where
\[ \Delta\lambda = \varepsilon^{-1}\sqrt{\varepsilon}\,|\ln^\gamma\varepsilon|. \]
To prove this alternative, we need to show that this solution cannot leave $L_i$ through $|z - z_i| = R_Z$ or through $h = 4h_*$. The first part follows from $\dot z = O(\varepsilon)$: changing $z$ by $0.5R_Z$ requires time at least $\sim \varepsilon^{-1}R_Z \gg \Delta\lambda$.
To prove the second part, we find a non-resonant zone $Z_{r,r+1}$ such that we have $(h, z_i) \in Z_{r,r+1}$ for some $h \in [1.4h_*, 1.6h_*]$. We can find such a non-resonant zone, as different resonant zones with $h \sim h_*$ do not intersect and have width $\sim \sqrt{\varepsilon h_*} \ll h_*$ in $h$. By Lemma 9.1 (applied to $s_r$ and $s_{r+1}$) for all $z \in B_{R_Z}(z_i)$ there exists some $h \in [1.2h_*, 1.8h_*]$ with $(h, z) \in Z_{r,r+1}$.
By Lemma 6.6 if at some time we have X(λ) ∈ Zr,r+1 (this must happen before reaching
h = 4h∗ ), we then reach either ∂Π (then h  h∗ ) or the border of Zr,r+1 with Zr+1 (then h is
again less than in $Z_{r,r+1}$ and the solution must again enter $Z_{r,r+1}$ before reaching $h = 4h_*$). The third part of Lemma 6.6 gives the estimate $h \le \frac{5}{3} \times 1.8h_* = 3h_*$, so it is impossible to have $h > 3h_*$ while $z$ stays in $B_{c_z}(z_*)$. The alternative is proved.
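Denote by $g^\lambda$ the flow given by the perturbed system. Set
\[ K_1 = \lceil\varepsilon^{-1}\Lambda/\Delta\lambda\rceil \sim \varepsilon^{-0.5}\ln^{-\gamma}\varepsilon \]
and
\[ C_i = \bigcup_{j=0}^{K_1} g^{-j\Delta\lambda}(L_i). \]
This set covers all $X_{init}$ such that the second part of the alternative can be satisfied for some $\lambda_0$ with $z(X(\lambda_0)) \in B_{R_Z/2}(z_i)$. Indeed, if the second part of the alternative is satisfied, we have $X(\lambda) \in L_i$ for all $\lambda \in [\lambda_0, \lambda_0 + \Delta\lambda]$, thus for some integer $k \ge 0$ we have $X(\lambda_{init} + k\Delta\lambda) \in L_i$ and $X_{init} \in g^{-k\Delta\lambda}(L_i)$.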
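
The covering step relies on the elementary fact that every interval $[\lambda_0, \lambda_0 + \Delta\lambda]$ contains a point of the grid $\lambda_{init} + k\Delta\lambda$ with integer $k \ge 0$. A small sketch of the choice of $k$ follows; the function name and the sample values are hypothetical and serve only as an illustration.

```python
import math


def covering_index(lam0, delta, lam_init=0.0):
    """Smallest integer k >= 0 with lam_init + k*delta >= lam0.

    For lam0 >= lam_init this k also satisfies
    lam_init + k*delta <= lam0 + delta, so the grid point lies in
    [lam0, lam0 + delta] (up to floating-point rounding).
    """
    return math.ceil((lam0 - lam_init) / delta)


if __name__ == "__main__":
    delta = 0.5
    for lam0 in (0.0, 0.3, 1.0, 2.5, 7.75):
        k = covering_index(lam0, delta)
        print(lam0, k, k * delta)
```

Since the solution stays in $L_i$ on the whole interval, it is in $L_i$ at the grid moment $\lambda_{init} + k\Delta\lambda$, which gives $X_{init} \in g^{-k\Delta\lambda}(L_i)$.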
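By Lemma 4.3 there is $K_2 > 1$ such that for any $\lambda \in [0, \varepsilon^{-1}\Lambda]$ the map $g^{-\lambda}$ expands the volume $m$ by a factor of at most $K_2$. As $\Delta h = h_*$ corresponds to $\Delta I \sim h_*\ln\varepsilon$, we have $m(L_i) \sim h_*\ln\varepsilon\,m_z(B_i)$. Thus we have the estimate
\[ m(C_i) \le K_2 K_1 m(L_i) \lesssim m_z(B_i)\,h_*\ln\varepsilon\;\varepsilon^{-0.5}\ln^{-\gamma}\varepsilon \lesssim \sqrt{\varepsilon}\,\ln^{\rho-\gamma+1}\varepsilon\;m_z(B_i). \]
Set $E = \cup_i C_i$. As $\sum_i m_z(B_i) = O(1)$, this gives
\[ m(E) \le \sum_i m(C_i) \lesssim \sqrt{\varepsilon}\,\ln^{\rho-\gamma+1}\varepsilon. \]
Clearly, if $X_0 \notin E$, we have $z_0 \in B_{R_Z/2}(z_i)$, $X_0 \notin C_i$ for some $i$. But then the corresponding solution cannot stay in $L_i$ for time $\Delta\lambda$, thus it crosses $h = -h_*$.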

12 Probabilities
Denote
\[ F(h, z, i) = \frac{1}{2\pi T}\int_0^{2\pi}\oint \operatorname{div} f\,dt\,d\lambda. \]
The inner integral above is taken along the closed trajectory of the unperturbed system given by $h = h$, $z = z$ that lies inside $B_i$; this trajectory is parametrized by the time $t$.
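Lemma 12.1. Under the assumptions of Theorem 3.1, there exists a set $E_1$ with
\[ m(E_1) = O(\sqrt{\varepsilon}\,|\ln^5\varepsilon|) \]
such that if the initial condition is not in $E_1$, for $\lambda \in [\lambda_0, \lambda_0 + \varepsilon^{-1}\Lambda]$ we have
\[ \varepsilon\int_{\lambda_0}^{\lambda}\operatorname{div} f(X(\lambda))\,d\lambda = \varepsilon\int_{\lambda_0}^{\lambda}F(X(\lambda))\,d\lambda + O(\sqrt{\varepsilon}\,|\ln\varepsilon|). \]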

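Proof. Set $\tilde z = (z, z_\rho)$, $f_{\tilde z} = (f_z, \operatorname{div} f)$. Consider the extended perturbed system, where we replace $z$ and $f_z$ by $\tilde z$ and $f_{\tilde z}$. Then along solutions of the extended perturbed system we have $\dot z_\rho = \varepsilon\operatorname{div} f$ and along solutions of the extended averaged system we have $\dot{\bar z}_\rho = F$. Note that the right-hand sides of the extended systems (perturbed and averaged) do not depend on $z_\rho$; the evolution of all variables except $z_\rho$ is given by the initial systems. Apply Theorem 3.1 to the extended system and denote by $E_1$ the excluded set for the extended system. Solutions of the perturbed and averaged systems are $O(\sqrt{\varepsilon}\,|\ln\varepsilon|)$-close to each other. Taking the $z_\rho$-coordinates yields
\[ \varepsilon\int_{\lambda_0}^{\lambda}\operatorname{div} f(X(\lambda))\,d\lambda = z_\rho(\lambda) - z_\rho(\lambda_0) = \bar z_\rho(\lambda) - \bar z_\rho(\lambda_0) + O(\sqrt{\varepsilon}\,|\ln\varepsilon|) = \varepsilon\int_{\lambda_0}^{\lambda}F(X(\lambda))\,d\lambda + O(\sqrt{\varepsilon}\,|\ln\varepsilon|). \]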

Given a vector v0 = (v01 , . . . , v0s ), denote by Cδ (v0 ) the cube with side 2δ and center v0 .
Set
\[ W^\delta(I_0, z_0) = C_\delta(I_0, z_0) \times [0, 2\pi]^2_{\varphi,\lambda} \subset A_3. \]
Let $W_i^\delta \subset W^\delta \cap E$ be the subset of initial conditions such that the solution is captured in $A_i$, $i = 1, 2$.
Lemma 12.2.
\[ m(W_i^\delta) = \int_{W^\delta}\frac{\Theta_i(z_*)}{\Theta_3(z_*)}\,dp_0\,dq_0\,dz_0 + O(\sqrt{\varepsilon}\,|\ln^5\varepsilon|), \qquad i = 1, 2. \]
Here $z_*(p_0, q_0, z_0)$ denotes the value of $z$ when the solution of the averaged system with initial data $(h(p_0, q_0, z_0), z_0)$ crosses the separatrices.

Proof. This lemma is similar to [7, Proposition 2.4] and can be proved in the same way. In [7] the proof of Proposition 2.4 is based on Lemma 5.1 and Lemma 5.2. Lemma 5.1 in [7] is an analogue of Lemma 12.1 above, and we replace the use of Lemma 5.1 in [7] by our Lemma 12.1. Lemma 5.2 in [7] is a statement about the averaged system (in the one-frequency case). One can obtain the averaged system in the two-frequency case as follows: first, average the perturbation over the time λ and obtain a system with one frequency (corresponding to the angle ϕ), and then take the average over ϕ. Thus, the averaged system in the two-frequency case is the same as the averaged system of the one-frequency system obtained after averaging over λ, and the statement of Lemma 5.2 holds for our case. The rest of the proof of Proposition 2.4 can be straightforwardly applied to our case; we just need to change the error terms to account for the difference in the precision of the averaging method and in the measure of the exceptional set between the one-frequency and two-frequency cases.

Recall the notation


U δ (I0 , z0 , ϕ0 , λ0 ) = Cδ (I0 , z0 , ϕ0 , λ0 ).
Lemma 12.3. We have the following for $i = 1, 2$. Suppose that for $(I_0, z_0)$ in some open set $U$ we have
\[ \lim_{\delta \to 0} \lim_{\varepsilon \to 0} \frac{m(W_i^\delta)}{m(W^\delta)} = \psi(I_0, z_0), \tag{12.1} \]
where the function $\psi$ is continuous in $U$ and the limit is uniform for all $(I_0, z_0) \in U$. Then for any $(I_0, z_0) \in U$ and any $\varphi_0, \lambda_0 \in [0, 2\pi]$ we have
\[ \lim_{\delta \to 0} \lim_{\varepsilon \to 0} \frac{m(U_i^\delta)}{m(U^\delta)} = \psi(I_0, z_0). \tag{12.2} \]

Proof. Assume w.l.o.g. that $i = 1$. Suppose we are given some tolerance $\kappa$ and we want to show that for each small enough $\delta$ for small enough $\varepsilon$ we have $|m(U_1^\delta)/m(U^\delta) - \psi| \le \kappa$ for fixed $I_0, z_0$ and all $\varphi_0, \lambda_0$. The value of $\delta$ should be so small that in the set $U_0 = C_\delta(I_0, z_0)$ the values of $\psi$ differ from $\psi(I_0, z_0)$ by at most $\kappa/9$.
Denote $\phi = (\varphi, \lambda)$. Let $m_\phi$, $m_\omega$ and $m_{I,z}$ denote the Lebesgue measure on the space where the corresponding variable(s) is defined: $\mathbb{R}^2$, $\mathbb{R}$ and $\mathbb{R}^{n+1}$, respectively. Set $\Omega_0 = \{\omega(I, z), (I, z) \in U_0\}$. Denote by $\Omega(\nu, \delta_\phi, N) \subset \Omega_0$ the set of all $\omega$ such that the flow $R_\omega^t(\varphi, \lambda) = (\varphi + \omega t, \lambda + t)$ on $[0, 2\pi]^2$ satisfies the following: for any $\phi_1, \phi_2$ there exists $t \in [0, N]$ such that
\[ m_\phi\big( C_{\delta_\phi}(\phi_1) \,\triangle\, R_\omega^t(C_{\delta_\phi}(\phi_2)) \big) \le \nu \delta_\phi^2. \tag{12.3} \]
We have the following
• Ω(ν, δφ , N ) is compact.
• limk→∞ mω (Ω(ν, δφ , k)) = mω (Ω0 ) for each ν > 0.
The first property is straightforward (we can write $\Omega(\nu, \delta_\phi, N)$ as the intersection of sets such that the property above is satisfied for each pair $(\phi_1, \phi_2)$; each such set is compact, as we can pick a convergent subsequence of the values of $t$). To prove the second property, note that if $\omega$ is irrational, we have $\omega \in \Omega(\nu, \delta_\phi, k)$ for large enough $k$, as orbits of the flow $R_\omega^t$ are dense. This gives $m_\omega(\cup_k \Omega(\nu, \delta_\phi, k)) = m_\omega(\Omega_0)$, and our statement follows from the continuity of Lebesgue measure.
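Now set $\delta_\phi = \delta$, $\nu = \kappa/100$ and $\Omega_k = \Omega(\nu, \delta, k)$. Take large enough $k$ so that
\[ m(\{X : \omega(X) \in \Omega_0 \setminus \Omega_k\}) \le (\kappa/9)\, m(W^\delta) \tag{12.4} \]
(this can be done, as $\partial\omega/\partial X$ is non-degenerate). Each point of $\Omega_k$ has a neighborhood (a segment with center at this point) such that (12.3) holds for all $\omega$ in this segment with the same $t$ but $2\nu$ instead of $\nu$:
\[ m_\phi\big( C_\delta(\phi_1) \,\triangle\, R_\omega^t(C_\delta(\phi_2)) \big) \le 2\nu \delta_\phi^2. \tag{12.5} \]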
Pick a finite cover $\alpha$ of $\Omega_k$ by such segments. Let $\{I_j\}$ denote all segments between endpoints of the segments from $\alpha$ that intersect $\Omega_k$; the segments $I_j$ are disjoint from each other and their union covers $\Omega_k$. Denote $K_j = \{(I, z) \in U_0 : \omega(I, z) \in I_j\}$. We can decompose this set (except a subset of small measure) as a disjoint union of cubes $K_{j,l}$ with small side $\delta_w$. We pick $\delta_w$ so small that most of the volume of $K_j$ is covered (except the proportion at most $\kappa/9$) and $|m(W_1^\delta)/m(W^\delta) - \psi| \le \kappa/9$ when $\delta \le \delta_w$ for small enough $\varepsilon$. Let us now apply (12.1) to the set $W^\delta = W^{j,l} = K_{j,l} \times [0, 2\pi]^2_\phi$. We obtain for small enough $\varepsilon$ (in the formula below and thereafter the lower index 1 denotes that we consider initial data from some set that is captured in $A_1$)
\[ \frac{m(W_1^{j,l})}{m(W^{j,l})} \in [\psi(I, z) - \kappa/9,\ \psi(I, z) + \kappa/9] \subset [\psi(I_0, z_0) - 2\kappa/9,\ \psi(I_0, z_0) + 2\kappa/9]. \]
Denote by $W^{j,l,\phi_0}$ the subset of $W^{j,l}$ defined by the condition $|\varphi - \varphi_0|, |\lambda - \lambda_0| < \delta_\phi$, where $(\varphi_0, \lambda_0) = \phi_0$. We have (12.5). Thus for small enough $\varepsilon$ we have
\[ m\big( W^{j,l,\phi_1} \,\triangle\, g_\varepsilon^t(W^{j,l,\phi_2}) \big) \le 3\nu\delta^2\, m_{I,z}(W^{j,l}) \]
for any $\phi_1, \phi_2$. Here $g_\varepsilon^t$ denotes the flow of the perturbed system. The set of points captured in $A_1$ is invariant under $g_\varepsilon^t$, so the estimate above means that $m(W_1^{j,l,\phi})$ is almost the same for all $\phi$, with difference $\le 3\nu\delta_\phi^2\, m_{I,z}(W^{j,l})$. As the average of $m(W_1^{j,l,\phi})/m(W^{j,l,\phi})$ over $\phi$ is $m(W_1^{j,l})/m(W^{j,l})$, this means (given $\nu = \kappa/100$) for all $\phi$
\[ \frac{m(W_1^{j,l,\phi})}{m(W^{j,l,\phi})} \in [\psi(I_0, z_0) - 3\kappa/9,\ \psi(I_0, z_0) + 3\kappa/9]. \]
Denote $K^{j,\phi} = K_j \times C_\delta(\phi)$. Taking sum over $l$ gives
\[ \frac{m(K_1^{j,\phi})}{m(K^{j,\phi})} \in [\psi(I_0, z_0) - 4\kappa/9,\ \psi(I_0, z_0) + 4\kappa/9]. \]
Finally, taking union over $j$ and using (12.4), we get
\[ \frac{m(U_1^\delta)}{m(U^\delta)} \in [\psi(I_0, z_0) - 5\kappa/9,\ \psi(I_0, z_0) + 5\kappa/9]. \]
This estimate holds for any κ when δ and ε are small enough (and ε is small compared with
δ). This completes the proof.
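
The density of the orbits of the flow $R_\omega^t$ for irrational $\omega$, which the proof above relies on, can be illustrated numerically. The following Python sketch is not part of the argument: the helper names `circle_dist` and `closest_return`, the sample frequency $\omega = (\sqrt{5} - 1)/2$, the two sample points and the search bound are all assumptions chosen for illustration. It scans the times at which the $\lambda$-coordinates already match and reports how close the $\varphi$-coordinate gets to its target.

```python
import math

TWO_PI = 2.0 * math.pi

def circle_dist(a, b):
    """Distance between two angles on the circle R / (2*pi*Z)."""
    d = (a - b) % TWO_PI
    return min(d, TWO_PI - d)

def closest_return(omega, phi1, phi2, max_turns):
    """Search t in [0, 2*pi*max_turns) with R^t(phi2) close to phi1.

    R^t(phi, lam) = (phi + omega*t, lam + t) on the torus [0, 2*pi]^2.
    The lam-coordinates coincide exactly at t = t0 + 2*pi*j, so only the
    phi-coordinate has to be scanned over the integer j.
    """
    (p1, l1), (p2, l2) = phi1, phi2
    t0 = (l1 - l2) % TWO_PI
    best_t, best_d = None, float("inf")
    for j in range(max_turns):
        t = t0 + TWO_PI * j
        d = circle_dist(p2 + omega * t, p1)
        if d < best_d:
            best_t, best_d = t, d
    return best_t, best_d

omega = (math.sqrt(5.0) - 1.0) / 2.0   # assumed irrational sample frequency
t, dist = closest_return(omega, (1.0, 2.0), (4.0, 0.5), max_turns=1000)
print(f"best time t = {t:.3f}, remaining distance in phi = {dist:.5f}")
```

Increasing `max_turns` plays the role of increasing $N$ in $\Omega(\nu, \delta_\phi, N)$: a longer time window allows a smaller mismatch between the shifted cube and the target cube.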

Proof of Proposition 3.2. Fix any open set U ⊂ A3 that is separated from the separatrices.
By Lemma 12.2 we have
\[ \lim_{\delta \to 0} \lim_{\varepsilon \to 0} \frac{m(W_i^\delta)}{m(W^\delta)} = \Theta_i(z_*)/\Theta_3(z_*), \]

where z∗ is taken at (I0 , z0 ). This holds uniformly in U. By Lemma 12.3 for any (I0 , z0 ) ∈ U
and any ϕ0 , λ0 ∈ [0, 2π] we have (12.2). Proposition 3.2 follows from this statement.

13 Appendix
13.1 Analytic continuation: proofs

Figure 3: The transversals

Proof of Lemma 5.2. We will use Moser’s normal form [13] in a neighborhood of C. There are
coordinates x and y such that our system can be written as

\[ \dot x = a(h, z)\,x, \qquad \dot y = -a(h, z)\,y, \tag{13.1} \]

where a > 0 for real h, z. Rescaling p, q, x, y if necessary, we may assume that new coordinates
are defined for all (x, y) ∈ C2 with |x|, |y| ≤ 1. Let us consider four transversals Γ1 , . . . , Γ4
given by x = ±1 and y = ±1 as shown in Figure 3. We can split T = T12 + T23 + T34 + T41 ,
where Tij is the time from Γi to Γj . Note that as all paths connecting (h, z) and (h0 , z0 )
are homotopic to each other (we assume c2 < 0.5, then h > 0), the functions T and Tij are
continued as single-valued functions.
The functions $T_{23}$ and $T_{41}$ are holomorphic even for $h = 0$. Let us denote $\tilde h = xy$. As it is a first integral of (13.1), we have $\tilde h = \tilde h(h, z)$. Using (13.1), we can compute $T_{12} = T_{34} = -a^{-1} \ln \tilde h$; the branch of the logarithm here is obtained by analytic continuation of the real logarithm. As $\tilde h = 0$ for $h = 0$, we can write $\tilde h(h, z) = h \tilde h_0(h, z)$ with analytic $\tilde h_0$ and thus $T_{12} = T_{34} = -a^{-1} \ln h - a^{-1} \ln \tilde h_0$. Thus for $z$ close to $z_0$ we have $T = A(h, z) \ln h + B(h, z)$, where $A = -2a^{-1}$ and $B$ are holomorphic, $A \ne 0$.
Let us now prove that $\hat h(z)$ is uniquely defined for any $z$ close to $z_0$. Fix such $z$ and denote $F(h) = T(h) - T_0$ and $F_1(h) = A(h_0, z) \ln h + B(h_0, z) - T_0$. Let $\hat h_1$ be determined by $F_1(\hat h_1) = 0$. Consider the curve $h_\beta = \hat h_1 (1 + \alpha e^{i\beta})$, where $\beta \in [0, 2\pi]$ is a parameter and $\alpha > 0$ is a small fixed number. We have $|F(h_\beta) - F_1(h_\beta)| = O(\alpha |\hat h_1|)(|\ln \hat h_1| + 1)$ and $|F_1(h_\beta)| = |A(h_0, z) \ln(1 + \alpha e^{i\beta})| \gtrsim \alpha$. As $|\hat h_1|$ is small (we have $\hat h_1 = e^{A^{-1}(T_0 - B)}$, here $T_0$ is a large positive number for small $h_0$, $B$ is bounded, and $A$ is close to $A(z_0)$, which is a real negative number), we have $|F_1(h_\beta)| > 2|F(h_\beta) - F_1(h_\beta)|$. Thus, by Rouché's theorem the equation $F(h) = 0$ has a unique solution $\hat h(z)$ in the region bounded by $h_\beta$.
From $|h - \hat h| < c_2 |\hat h|$ we have
\[ \ln h - \ln \hat h = \ln\big(1 + (h - \hat h)/\hat h\big) = O(c_2) \]
and
\[ T(h, z) = O(c_2) + A(h, z) \ln \hat h + B(h, z) = A(\hat h, z) \ln \hat h + B(\hat h, z) + O(c_2)(1 + |\ln \hat h|) = T_0 (1 + O(c_2)), \]
as claimed above.
Finally, the action $I(h, z)$ is defined by $I = \int p\,dq$.
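
The Rouché step in the proof above can be checked numerically on a toy model. The Python sketch below is only an illustration under stated assumptions: the frozen values `A`, `B`, `T0` and the small analytic corrections `0.3 * h` and `0.7 * h` are hypothetical and are not taken from the paper. It computes the explicit root of the model equation $F_1 = 0$, compares $|F_1|$ with $|F - F_1|$ on the curve $h_\beta$, and counts the zeros of $F$ inside the curve by the argument principle.

```python
import cmath
import math

A, B, T0 = -2.0, 0.5, 20.0   # hypothetical frozen values; A is real negative

def F1(h):
    """Model function F1(h) = A ln h + B - T0 with frozen coefficients."""
    return A * cmath.log(h) + B - T0

def F(h):
    """F1 plus small analytic corrections to A and B (assumed for the demo)."""
    return (A + 0.3 * h) * cmath.log(h) + (B + 0.7 * h) - T0

h1 = math.exp((T0 - B) / A)   # explicit root of F1, a small positive number
alpha = 0.1
n = 360
curve = [h1 * (1.0 + alpha * cmath.exp(2j * math.pi * j / n)) for j in range(n)]

min_F1 = min(abs(F1(h)) for h in curve)           # lower bound for |F1| on the curve
max_diff = max(abs(F(h) - F1(h)) for h in curve)  # upper bound for |F - F1|

def winding_number(values):
    """Number of turns of a closed polygonal curve around the origin."""
    total = 0.0
    for j in range(len(values)):
        total += cmath.phase(values[(j + 1) % len(values)] / values[j])
    return round(total / (2.0 * math.pi))

roots_inside = winding_number([F(h) for h in curve])
print(h1, min_F1, max_diff, roots_inside)
```

In this toy model $|F_1|$ dominates $|F - F_1|$ on the curve by several orders of magnitude, and the winding number confirms a single zero of $F$ inside, in line with the uniqueness of $\hat h(z)$.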

Proof of Lemma 5.3. Let us first state some helper statements. The local uniqueness and
existence theorem (see e.g. [14, Theorem 1.1]) claims that in a small neighborhood of any initial
condition y0 , t0 the solution of a complex ODE ẏ = g(y, t) exists and depends holomorphically
on the initial condition. From this we have the following corollary.
Corollary 13.1. For any complex ODE ẏ = g(y, t) for any point y0 there are ∆t > 0, ∆y > 0
such that for any y1 with |y1 − y0 | < ∆y the solution y(t) with y(0) = y1 exists for all complex
t with |t| < ∆t.
By a compactness argument we get

Corollary 13.2. Consider a complex ODE ẏ = g(y, t) in some open domain U . Consider a
compact set V ⊂ U . Then there is ∆t > 0 such that for any y0 ∈ V the solution of our ODE
with the initial condition y(0) = y0 exists and stays in U for all t with |t| < ∆t.
We are now ready to prove Lemma 5.3. We will use the notation of the proof of Lemma 5.2
such as Γi , a and Tij . Given (h, z) ∈ πD (here π is the projection along the ϕ axis), let us
consider a continuous piecewise-linear path $t_s : [0, 1] \to \mathbb{C}$ with
\[ t_0 = 0, \quad t_{1/4} = T_{12}(h, z), \quad t_{1/2} = T_{12} + T_{23}, \quad t_{3/4} = T_{12} + T_{23} + T_{34}, \quad t_1 = T \]
that is linearly interpolated between the points 0, 1/4, 1/2, 3/4, 1. Then if we consider the
cycle r(ts ) lying in the complex solution of the unperturbed system with given h, z starting
with r(t0 ) ∈ Γ1 , the path r(ts ) then crosses Γ2 , Γ3 , Γ4 for s = 1/4, 1/2, 3/4 and returns to the
same point on Γ1 for s = 1.
Let us check that the segment [0, T ] lies in c2 -neighborhood of the path ts with the constant
c2 > 0 that can be made as small as needed by reducing c. Pick α ∈ C, |α| = 1 such that
αT12 = αT34 ∈ R. As T23 and T41 are analytic functions of h, z, we have Im αTij < O(c). This
means that Im αT = O(c) and Im αts = O(c) for all s, so [0, T ] lies in O(c)-neighborhood of
the path ts .
Let us choose a neighborhood V (in the space with coordinates p, q, z) of the union of the
real separatrices of the system for z = z0 such that r(ts ) ∈ V for all considered h, z. We take
any V that contains the set given by x, y ∈ [−1, 1] for all z with |z − z0 | < c2 . This implies
that r(ts ) stays in V for s ∈ [0, 1/4] and s ∈ [1/2, 3/4]. For other values of s the point r(ts )
also stays O(c)-close to the union of the real separatrices of the system for z = z0 , so for small
enough c it lies in V .
Now we apply Corollary 13.2 (with $y = (p, q, z)$, $\dot z = 0$) to the closure of $V$; it gives us some $\Delta t$. Denote by $t = \varphi T/(2\pi)$ the time for the unperturbed system. Let us continue $r(h, z, \varphi(t))$
as a function of t. Clearly, it is defined for t ∈ {ts }. By Corollary 13.2 it can be continued
to ∆t-neighborhood of {ts }. We have proved above that for small enough c we have that
∆t/2-neighborhood of {ts } covers [0, T ]. This means that ∆t-neighborhood of {ts } covers
∆t/2-neighborhood of [0, T ], so r can be continued to the latter neighborhood. Returning to
ϕ, we get that r can be continued to π∆t/|T |-neighborhood of [0, 2π]. As from Lemma 5.2 we
have |T | ∼ T (h0 , z0 ), this estimate proves the lemma.

Proof of Lemma 5.4. We can rewrite
\[ \omega^{-1} \int_{\varphi = 0}^{\varphi_0} \psi(h, z, \varphi, \lambda)\,d\varphi = \int_{t = 0}^{t_0} \psi(h, z, \omega t, \lambda)\,dt, \]
where $t_0 = \omega^{-1}\varphi_0$. Let us again consider the contour $\{t_s\}$ introduced in the proof of Lemma 5.3. As shown in that proof, a $\delta$-neighborhood of $\{t_s\}$ covers $[0, T]$ for some $\delta = O(1)$. As we have $\omega^{-1} \operatorname{Im}(\varphi_0) = O(1)$, we can connect the point $t_0$ with the point $\omega^{-1} \operatorname{Re} \varphi_0 \in [0, T]$ and then with some point $t_{s_0}$ on $\{t_s\}$ by a path of length $O(1)$ on the complex plane of the values of $t$. As $\psi$ is bounded, we have $\int_{t_{s_0}}^{t_0} \psi\,dt = O(1)$. Thus, it is enough to show that
\[ \int_{t = 0}^{t_{s_0}} \psi\,dt = O(1) \]

along the contour {ts , 0 ≤ s ≤ s0 }. This contour is split into up to four parts by the transversals
Γi (cf. Figure 3). For the parts when the solution is far from the saddle, the integral along
these parts is O(1). Let us prove that the integral along the parts near the saddle is also O(1).
For definiteness, let us consider the part between $\Gamma_1$ and $\Gamma_2$ (or some fragment of this part if it is included in our contour only partially). During this part the solution is near the saddle, and we can use the $x, y$ chart (13.1). As $\psi(C) = 0$ ($C$ corresponds to $x = y = 0$), we can write $\psi(x, y, z, \lambda) = x\psi_x + y\psi_y$, where $\psi_x, \psi_y$ are analytic. As $\dot x = ax$, we can write (we omit the limits of integration, as they are not important for the $O(1)$ estimate)
\[ \int x\psi_x\,dt = a^{-1} \int \psi_x\,dx = O(1). \]
Similarly, we have $\int y\psi_y\,dt = O(1)$. Thus, $\int_{t = 0}^{t_{s_0}} \psi\,dt = O(1)$, as required.
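
The key point of the last step is that the passage near the saddle takes a logarithmically long time, while the integral of a function vanishing at the saddle stays bounded. The following Python sketch illustrates this in the simplest real case, under the assumption $\psi_x \equiv 1$ (so the integrand is just $x(t)$); the function name `saddle_passage` and the sample values of `a` and `x0` are hypothetical.

```python
import math

def saddle_passage(a, x0, n=20000):
    """Follow x(t) = x0 * exp(a * t) from x = x0 up to the transversal x = 1.

    Returns the passage time and the integral of x(t) dt over the passage,
    computed with the midpoint rule on n subintervals.
    """
    t_pass = math.log(1.0 / x0) / a     # passage time, grows like |ln x0|
    dt = t_pass / n
    integral = sum(x0 * math.exp(a * (j + 0.5) * dt) for j in range(n)) * dt
    return t_pass, integral

a = 1.5
for x0 in (1e-3, 1e-6, 1e-9):
    t_pass, integral = saddle_passage(a, x0)
    exact = (1.0 - x0) / a              # a^{-1} * integral of dx from x0 to 1
    print(f"x0={x0:.0e}  time={t_pass:7.3f}  integral={integral:.6f}  exact={exact:.6f}")
```

The passage time grows without bound as `x0` decreases, whereas the integral stays close to $a^{-1}$, which is the change of variables $dt = dx/(ax)$ used in the proof.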

13.2 Estimates on Fourier coefficients


Lemma 5.3 can be used to estimate Fourier coefficients near the separatrices. Let ψ(p, q, z, λ)
and ψ0 (p, q, z, λ) be analytic functions in B̃ with ψ0 = 0 at the saddle C(z) for all λ, z.
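Corollary 13.3. There is $K > 0$ such that for any real $(h, z) \in B$, $\lambda \in [0, 2\pi]$, $k \in \mathbb{Z}$ we have
\[ \left| \int_0^{2\pi} \psi(h, \varphi, z, \lambda)\, e^{ik\varphi}\,d\varphi \right| \le K \exp(-c|k|/T(h, z)), \]
\[ \left| \int_0^{2\pi} \psi_0(h, \varphi, z, \lambda)\, e^{ik\varphi}\,d\varphi \right| \le K\, T^{-1}(h, z) \exp(-c|k|/T(h, z)). \]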

Proof. The first inequality is proved exactly like the exponential decay of Fourier coefficients of an analytic function. We can move the contour of integration up (assuming $k > 0$, otherwise down) by adding $ic/T(h, z)$ to $\varphi$. By the periodicity it will not change the integral. But for the new integral $|e^{ik\varphi}| = \exp(-ck/T(h, z))$ while $|\psi(h, \varphi, z, \lambda)|$ is bounded, as $\psi$ is bounded in the compact set $\tilde B$.
The second inequality is proved by the same shift of the contour of integration, but we now take into account that $\psi_0$ is small near $C$. Denote $\tilde\varphi(t) = 2\pi\big(t + \tfrac{ic}{2\pi}\big)/T$, $\varphi(t) = 2\pi t/T$. Rewriting our integral as an integral in $dt$, we get
\[ \omega \int_0^T e^{2\pi i k t/T}\, e^{-kc/T}\, \psi_0(h, \tilde\varphi, z, \lambda)\,dt. \tag{13.2} \]

In a neighborhood $S$ of $C$ given by $|x|, |y| < 1$ we can use (13.1). Solving this system, we get
\[ x(t) = e^{a(t - t_0)} x(t_0), \qquad y(t) = e^{-a(t - t_0)} y(t_0). \]
Here $a > 0$, as $h$ and $z$ are real. Taking $t - t_0 = ic/(2\pi)$, we get $|x(\varphi)| = |x(\tilde\varphi)|$ and $|y(\varphi)| = |y(\tilde\varphi)|$. Let us now show that the integral of $|\psi_0|$ inside $S$ is $O(1)$. Indeed, as $\psi_0(C) = 0$ ($C$ corresponds to $x = y = 0$), we can write $\psi_0(x, y, z, \lambda) = x\psi_x + y\psi_y$, where $\psi_x, \psi_y$ are analytic. Let $[t_1, t_2] \subset [0, T]$ be a segment such that the solution $(x(\varphi(t)), y(\varphi(t)))$ is in $S$ for all $t \in [t_1, t_2]$. Then $(x(\tilde\varphi(t)), y(\tilde\varphi(t)))$ is also in $S$. We may use $x = x(\varphi(t))$ as an independent variable; from $\dot x = ax$ we have $dt = dx/(ax)$. Thus we have
\[ \int_{t_1}^{t_2} |x(\tilde\varphi)\psi_x(\tilde\varphi)|\,dt = \int_{t_1}^{t_2} |\psi_x(\tilde\varphi)|\,|x(\varphi)|\,dt = a^{-1} \int_{x(t_1)}^{x(t_2)} |\psi_x(\tilde\varphi)|\,dx = O(1). \]
Similarly, we have $\int_{t_1}^{t_2} |y\psi_y|\,dt = O(1)$. Thus, $\int_{t_1}^{t_2} |\psi_0|\,dt = O(1)$, as required.
It is clear that the integral of $|\psi_0|$ outside $S$ is also $O(1)$, as the solution only spends time $O(1)$ there. Thus, $\int |\psi_0(\tilde\varphi)|\,dt = O(1)$ and $\int |\psi_0(\tilde\varphi)|\,d\varphi = O(T^{-1})$. Together with (13.2) this yields the second inequality.
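
The contour-shift argument for the first inequality can be illustrated on an explicit analytic function. The Python sketch below is an assumption-laden example, not the paper's setting: the test function `psi` is the Poisson kernel, which is analytic in the strip $|\operatorname{Im} \varphi| < \sigma$ and whose Fourier coefficients are known to equal $e^{-\sigma|k|}$; the half-width `sigma` plays the role of $c/T(h, z)$, and the helper name `fourier_coefficient` is hypothetical.

```python
import math

def fourier_coefficient(func, k, n=256):
    """(1 / 2pi) * integral over [0, 2pi] of func(phi) * exp(-i*k*phi) d(phi).

    Uses the trapezoid rule on n equally spaced nodes, which is very accurate
    for analytic periodic integrands.
    """
    total = 0j
    for j in range(n):
        phi = 2.0 * math.pi * j / n
        total += func(phi) * complex(math.cos(k * phi), -math.sin(k * phi))
    return total / n

sigma = 0.5   # assumed half-width of the strip of analyticity

def psi(phi):
    """Poisson kernel: analytic and 2pi-periodic, with poles at phi = +-i*sigma."""
    return math.sinh(sigma) / (math.cosh(sigma) - math.cos(phi))

coeffs = {k: abs(fourier_coefficient(psi, k)) for k in range(0, 11)}
for k, value in coeffs.items():
    print(k, value, math.exp(-sigma * k))
```

Shrinking `sigma` (a narrower strip, corresponding to a larger period $T$) slows the decay in $k$, which is the dependence on $|k|/T$ in Corollary 13.3.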

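Corollary 13.4. There are $C_1, C_2 > 0$ such that for any $(h, z) \in B$, $k, l \in \mathbb{Z}$ we have
\[ \begin{aligned}
\left| \int_0^{2\pi}\!\!\int_0^{2\pi} \psi(h, \varphi, z, \lambda)\, e^{ik\varphi} e^{il\lambda}\,d\varphi\,d\lambda \right| &\le C_1 \exp\Big( -C_2 |l| - C_2 \frac{|k|}{T} \Big), \\
\left| \int_0^{2\pi}\!\!\int_0^{2\pi} \psi_0(h, \varphi, z, \lambda)\, e^{ik\varphi} e^{il\lambda}\,d\varphi\,d\lambda \right| &\le C_1 T^{-1} \exp\Big( -C_2 |l| - C_2 \frac{|k|}{T} \Big).
\end{aligned} \tag{13.3} \]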
Proof. Let us prove the second part; the first one can be obtained similarly. Denote by $L$ the left-hand side of the second line of (13.3). It is enough to obtain two separate estimates
\[ L \le C_1 T^{-1} \exp(-2C_2 |l|), \qquad L \le C_1 T^{-1} \exp\Big( -2C_2 \frac{|k|}{T} \Big), \]
since $L$ is then bounded by the geometric mean of the two right-hand sides, which is the right-hand side of the second line of (13.3).
Shifting the contour of integration in $\lambda$ by $ci$ (as in the usual argument for the exponential decay of Fourier coefficients of an analytic function that we have already used in Corollary 13.3), we get
\[ L \lesssim \exp(-cl) \int_0^{2\pi}\!\!\int_0^{2\pi} |\psi_0(h, \varphi, z, \lambda + ci)|\,d\varphi\,d\lambda. \]
Arguing as in the proof of Corollary 13.3, we can show that for all $\lambda$ we have $\int_0^{2\pi} |\psi_0(h, \varphi, z, \lambda + ci)|\,d\varphi = O(T^{-1})$. This means $L \lesssim T^{-1} \exp(-cl)$ and thus the first required estimate holds.
Multiplying the estimate from Corollary 13.3 by $e^{il\lambda}$ and integrating over $\lambda$, we obtain the second required estimate.

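Proof of Lemma 6.1. As $\partial I/\partial h = \omega^{-1}$, the Fourier coefficients of $f_I$ can be expressed via the Fourier coefficients of $f_h$: $f_{I,m} = \omega^{-1} f_{h,m}$. Thus the first part follows from Corollary 13.4.
Let us prove the second part. Let us first estimate $\partial f_{h,m}/\partial h$. We have
\[ \left\| \frac{\partial f_{h,m}}{\partial h} \right\| \sim \left\| \int_0^{2\pi}\!\!\int_0^{2\pi} \frac{\partial f_h}{\partial h}\, e^{-i(m_1\varphi + m_2\lambda)}\,d\lambda\,d\varphi \right\| \lesssim \int_0^{2\pi} \left\| \int_0^{2\pi} \frac{\partial f_h}{\partial h}\, e^{-im_2\lambda}\,d\lambda \right\| d\varphi. \]
By (5.8) we have $\partial p/\partial h,\ \partial q/\partial h = O_*(h^{-1} \ln^{-1} h)$. Note that these expressions do not depend on $\lambda$ and $\int_0^{2\pi} O_*(h^{-1} \ln^{-1} h)\,d\varphi = O(h^{-1} \ln^{-2} h)$ by (5.7). As $\frac{\partial f_h}{\partial h} = \frac{\partial f_h}{\partial p} \frac{\partial p}{\partial h} + \frac{\partial f_h}{\partial q} \frac{\partial q}{\partial h}$, we may continue the estimate above as follows.
\[ \left\| \frac{\partial f_{h,m}}{\partial h} \right\| \lesssim h^{-1} \ln^{-2} h\, \max_\varphi \left\| \int_0^{2\pi} \Big( \frac{\partial f_h}{\partial p}\, e^{-im_2\lambda} + \frac{\partial f_h}{\partial q}\, e^{-im_2\lambda} \Big)\,d\lambda \right\| \lesssim h^{-1} \ln^{-2} h\, \exp(-C_F |m_2|), \]
as the Fourier coefficients of smooth functions $\partial f_h/\partial p$ and $\partial f_h/\partial q$ decrease exponentially. As $f_{I,m} = \omega^{-1} f_{h,m}$, this gives
\[ \left\| \frac{\partial f_{I,m}}{\partial h} \right\| \lesssim \left\| \omega^{-1} \frac{\partial f_{h,m}}{\partial h} \right\| + \|f_{h,m}\| \left| \frac{\partial}{\partial h} (\omega^{-1}) \right| \lesssim |h^{-1} \ln^{-1} h|\, \exp(-C_F |m_2|). \]
The estimate $\left\| \partial f_{h,m}/\partial z \right\| \lesssim h^{-1} \ln^{-2} h\, \exp(-C_F |m_2|)$ is obtained in the same way as the estimate for $\partial f_{h,m}/\partial h$.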
13.3 Proof of estimates on u
The following three lemmas will be used to prove Lemma 8.1. Fix a resonance $s_2/s_1$. In all three lemmas we will assume that $s_2/s_1$ is the resonance nearest to $\omega$. We will use the notation $m = (m_1, m_2) \in \mathbb{Z}^2$. In summation over $m$ we will often need to skip the vectors $m$ that are equal to $(\nu s_1, -\nu s_2)$ for some $\nu \in \mathbb{Z}$. This will be denoted by an upper index $(s)$ in the summation symbol.
Lemma 13.5.
\[ \sum^{(s)}_{1 \le |m| \le N} \|f_m\|\, |m_1\omega + m_2|^{-1} (|m_2| + 1) \lesssim |s_1| + |\ln h| \ln |\ln \varepsilon|. \tag{13.4} \]
Moreover, this holds not only for the Fourier coefficients $f_m$, but for any non-negative numbers $\|f_m\|$ with $\|f_m\| \lesssim e^{-C_F |m_2|}$.

Proof. Denote by $S_{m_2}$ the part of the left-hand side of (13.4) with this $m_2$. Note that we have $m_1\omega + m_2 \ne 0$. For fixed $m_2$ let $m_1^+$ ($m_1^-$) be the values of $m_1$ corresponding to the smallest in absolute value positive (negative) value of $m_1\omega + m_2$. Here we consider all integer values of $m_1$, including the ones with $|m_1| > N$, so we may have $|m_1^\pm| > N$. Let $A^+_{m_2}$ ($A^-_{m_2}$) denote the corresponding term in $S_{m_2}$ if it exists (i.e. $1 \le |m| \le N$), or 0 otherwise. Denote by $B_{m_2}$ the sum of all other terms, $S_{m_2} = A^+_{m_2} + A^-_{m_2} + B_{m_2}$.
As in [4, Proof of Lemma 7.1], for all $1 \le |m| \le N$ we have $|m_1\omega + m_2| \ge (4s_1)^{-1}$. This means $A^+_{m_2} \lesssim s_1 (|m_2| + 1) e^{-C_F |m_2|}$ and $\sum_{m_2} A^+_{m_2} \lesssim s_1$. Similarly, $\sum_{m_2} A^-_{m_2} \lesssim s_1$.
For fixed $m_2$ set $k(m_1) = m_1 - m_1^+$ for $m_1 > m_1^+$ and $k(m_1) = m_1^- - m_1$ for $m_1 < m_1^-$. We have $|m_1\omega + m_2| \ge \omega k$, so
\[ B_{m_2} \lesssim 2\omega^{-1} (|m_2| + 1) e^{-C_F |m_2|} \sum_{k=1}^{2N} k^{-1} \lesssim \omega^{-1} \ln |\ln \varepsilon|\, (|m_2| + 1) e^{-C_F |m_2|} \]
and $\sum_{m_2} B_{m_2} \lesssim \omega^{-1} \ln |\ln \varepsilon|$.
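
The splitting of $S_{m_2}$ used in the proof above can be made concrete. The Python sketch below is an illustration only: the helper names `nearest_indices`, `check_lower_bound` and `harmonic`, the sample frequency and the cutoff `N` are assumptions, and $\omega > 0$ is assumed. For each $m_2$ it finds $m_1^\pm$, verifies the lower bound $|m_1\omega + m_2| \ge \omega k$ for all remaining terms, and checks the logarithmic bound on the harmonic sum that produces the factor $\ln|\ln\varepsilon|$ when $N$ is a power of $|\ln \varepsilon|$.

```python
import math

def nearest_indices(omega, m2):
    """Return (m1_plus, m1_minus): the integers m1 giving the smallest positive
    and the smallest-in-absolute-value negative values of m1*omega + m2."""
    ratio = -m2 / omega
    m1_plus = math.floor(ratio) + 1    # smallest integer strictly above -m2/omega
    m1_minus = math.ceil(ratio) - 1    # largest integer strictly below -m2/omega
    return m1_plus, m1_minus

def check_lower_bound(omega, m2, N):
    """Verify |m1*omega + m2| >= omega*k for all m1 other than m1_plus, m1_minus."""
    m1_plus, m1_minus = nearest_indices(omega, m2)
    for m1 in range(-N, N + 1):
        if m1 > m1_plus:
            k = m1 - m1_plus
        elif m1 < m1_minus:
            k = m1_minus - m1
        else:
            continue                   # m1_plus, m1_minus (and m1 = 0 when m2 = 0)
        if abs(m1 * omega + m2) < omega * k - 1e-12:
            return False
    return True

def harmonic(n):
    """Partial sum 1 + 1/2 + ... + 1/n."""
    return sum(1.0 / k for k in range(1, n + 1))

omega = 0.3 * math.sqrt(2.0)   # assumed sample frequency, omega > 0
N = 50
ok = all(check_lower_bound(omega, m2, N) for m2 in range(-N, N + 1))
print(ok, harmonic(2 * N), 1.0 + math.log(2 * N))
```

The bound $\sum_{k \le 2N} k^{-1} \le 1 + \ln(2N)$ is the only place where the size of the cutoff $N$ enters the estimate of $B_{m_2}$.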

Lemma 13.6. For the Fourier coefficients $f_m$ of $f$ we have
\[ \sum^{(s)}_{1 \le |m| \le N} \|f_m\|\, |m_1\omega + m_2|^{-1} |m_1| \lesssim |s_1 \ln h| + \ln^2 h\, \ln |\ln \varepsilon|. \tag{13.5} \]

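Proof. Let us argue as above, adapting the notation $A^\pm_{m_2}$ and $B_{m_2}$ to the corresponding terms in (13.5). We now have (we use $|m_1^+\omega + m_2| \le \omega$, so $|m_1^+ + \omega^{-1} m_2| \le 1$)
\[ A^+_{m_2} \lesssim s_1 |m_1^+|\, e^{-C_F |m_2|} \lesssim s_1 (\omega^{-1} |m_2| + 1)\, e^{-C_F |m_2|} \]
and $\sum_{m_2} A^+_{m_2} \lesssim |s_1 \ln h|$. Similarly, $\sum_{m_2} A^-_{m_2} \lesssim |s_1 \ln h|$.
We also have
\[ \left| \frac{m_1}{m_1\omega + m_2} \right| = \left| \omega^{-1} - \frac{m_2\, \omega^{-1}}{m_1\omega + m_2} \right| \lesssim |\omega^{-1}| + |\omega^{-1}|\, |m_2|\, |m_1\omega + m_2|^{-1}. \]
Thus we can split $B_{m_2} \lesssim B_{1,m_2} + B_{2,m_2}$, where the two summands correspond to the summands $\omega^{-1}$ and $\omega^{-1} |m_2|\, |m_1\omega + m_2|^{-1}$ above (in this order). By the estimate (6.1) for $f_m$ we have $\sum_{|m_1| \le N} \|f_{(m_1, m_2)}\| \lesssim e^{-C_F |m_2|}\, \omega^{-1}$, thus $\sum_{m_2} B_{1,m_2} \lesssim \omega^{-2}$. Denote by $\tilde B$ the terms $B$ in (13.4). The estimate $\sum_{m_2} \tilde B_{m_2} \lesssim \omega^{-1} \ln |\ln \varepsilon|$ from the proof of (13.4) gives $\sum_{m_2} B_{2,m_2} \lesssim \omega^{-2} \ln |\ln \varepsilon|$.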

Lemma 13.7. For the Fourier coefficients $f_m$ of $f$ we have
\[ \sum^{(s)}_{1 \le |m| \le N} \|f_m\|\, |m_1\omega + m_2|^{-2} |m_1| \lesssim |\ln h| \ln^4 \varepsilon. \tag{13.6} \]

Proof. Let us once again argue as in the proof of Lemma 13.5, adapting the notation $A^\pm_{m_2}$ and $B_{m_2}$ to the corresponding terms in (13.6). As in the proof of Lemma 13.6, we have
\[ A^+_{m_2} \lesssim s_1^2 |m_1^+|\, e^{-C_F |m_2|} \lesssim s_1^2\, \omega^{-1} |m_2|\, e^{-C_F |m_2|}. \]
Thus, $\sum_{m_2} A^+_{m_2} \lesssim |\ln h| \ln^4 \varepsilon$ and, similarly, $\sum_{m_2} A^-_{m_2} \lesssim |\ln h| \ln^4 \varepsilon$.
To estimate $B$, let us reuse the notation $k$ from the proof of Lemma 13.5. Recall that $|m_1\omega + m_2| \ge \omega k$. We have
\[ B_{m_2} \lesssim N e^{-C_F |m_2|} \sum_{m_1} |m_1\omega + m_2|^{-2} \lesssim e^{-C_F |m_2|}\, N \omega^{-2} \sum_k k^{-2} \lesssim e^{-C_F |m_2|}\, N \omega^{-2}. \]
This implies $\sum_{m_2} B_{m_2} \lesssim \ln^2 h\, \ln^2 \varepsilon$.

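Proof of Lemma 8.1. We have $u = A + B$ with
\[ A = \sum^{(s)}_{1 \le |m| \le N,\ m \in \mathbb{Z}^2} \frac{f_m\, e^{i(m_1\varphi + m_2\lambda)}}{i(m_1\omega + m_2)}, \qquad B = \sum_{1 \le |\nu s| \le N,\ \nu \in \mathbb{Z}} \frac{f_{\nu s}\, e^{i\nu(s_1\varphi - s_2\lambda)}}{i\nu(s_1\omega - s_2)}. \]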

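We have $s_1 \le N \lesssim \ln^2 \varepsilon$ and $\omega^{-1} \sim \ln h \lesssim \ln \varepsilon$. By (13.4) we get $\|A\| \lesssim \ln^2 \varepsilon$. As $\Delta = |\omega - \xi_s|$, we have $|s_1\omega - s_2| = s_1\Delta$. We also have from (6.1) that $\|f_{\nu s}\| \lesssim a_s^{|\nu|}$, $a_s \le e^{-C_F} < 1$. Hence, $\|B\| \lesssim a_s s_1^{-1} \Delta^{-1}$. This gives the estimate on $\|u\|$.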
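Taking $\partial/\partial\lambda$ or $\partial/\partial\varphi$ of the terms in $A$ multiplies them by $m_2$ or $m_1$, respectively. The estimates on $\partial A/\partial\lambda$ and $\partial A/\partial\varphi$ follow from (13.4) and (13.5), respectively. We also have $\left\| \partial B/\partial\lambda \right\| \lesssim a_s s_2 s_1^{-1} \Delta^{-1}$ and $\left\| \partial B/\partial\varphi \right\| \lesssim a_s \Delta^{-1}$.
Taking $\partial/\partial h$ of $A$ or $B$ creates two terms. In one of them $f_m$ are replaced by $\partial f_m/\partial h$; denote this term by $A_{h,1}$ or $B_{h,1}$. By Lemma 6.1 and Lemma 13.5 we have $A_{h,1} \lesssim h^{-1} \ln^{-1} h\, \ln^2 \varepsilon$. By Lemma 6.1 we also have $B_{h,1} \lesssim b_s s_1^{-1} h^{-1} \ln^{-1} h\, \Delta^{-1}$.
For the second term we have
\[ A_{h,2} = -\frac{\partial\omega}{\partial h} \sum^{(s)}_{1 \le |m| \le N} \frac{f_m m_1\, e^{i(m_1\varphi + m_2\lambda)}}{i(m_1\omega + m_2)^2}, \qquad B_{h,2} = -\frac{\partial\omega}{\partial h} \sum_{1 \le |\nu s| \le N} \frac{f_{\nu s} s_1\, e^{i\nu(s_1\varphi - s_2\lambda)}}{i\nu(s_1\omega - s_2)^2}. \]
We have $\partial\omega/\partial h \sim h^{-1} \ln^{-2} h$. By (13.6) we have $\|A_{h,2}\| \lesssim \left| \frac{\partial\omega}{\partial h} \right| |\ln h| \ln^4 \varepsilon \lesssim h^{-1} \ln^{-1} h\, \ln^4 \varepsilon$. As $\|f_{\nu s}\| \lesssim b_s^{|\nu|}$, we have $\|B_{h,2}\| \lesssim b_s \left| \frac{\partial\omega}{\partial h} \right| s_1^{-1} \Delta^{-2}$.
The estimate on $\partial u/\partial z$ is obtained in the same way.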

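Proof of Lemma 5.1. From the Hamiltonian equations we have $\partial h/\partial I = \omega$. By [7, Corollary 3.2] we have
\[ \frac{\partial I}{\partial z} = O(1), \qquad \frac{\partial^2 I}{\partial z\,\partial h} = O(\ln h), \qquad \frac{\partial^2 I}{\partial z^2} = O(1). \]
As $\partial I/\partial h = \omega^{-1}$, the first estimate implies $\partial w/\partial v = O(\ln h)$. We have $\frac{\partial I}{\partial h} \big( \frac{\partial h}{\partial z} \big)_{I = \mathrm{const}} + \big( \frac{\partial I}{\partial z} \big)_{h = \mathrm{const}} = 0$; this gives $\big( \frac{\partial h}{\partial z} \big)_{I = \mathrm{const}} = O(\ln^{-1} h)$ and $\partial h/\partial w = O(\ln^{-1} h)$.
We have
\[ f_{I,0} = \frac{\partial I}{\partial h} f_{h,0} + \frac{\partial I}{\partial z} f_{z,0}. \]
This rewrites as
\[ f_{I,0} = (2\pi)^{-1} \oint f_h\,dt + \frac{\partial I}{\partial z} f_{z,0}. \tag{13.7} \]
The contour integral is taken along the closed trajectory of the unperturbed system given by the values of $h, z$, and this trajectory is parametrized by the time $t$. By [7, Lemma 3.2] we have
\[ \oint f_h\,dt = O(1), \qquad \frac{\partial}{\partial h} \oint f_h\,dt = O(\ln h), \qquad \frac{\partial}{\partial z} \oint f_h\,dt = O(1). \]
Plugging the first estimate in (13.7) gives $f_{I,0} = O(1)$. From [7, Lemma 3.2] we also have
\[ \frac{\partial f_{h,0}}{\partial h},\ \frac{\partial f_{z,0}}{\partial h} = O(h^{-1} \ln^{-2} h), \qquad \frac{\partial f_{h,0}}{\partial z} = O(\ln^{-1} h), \qquad \frac{\partial f_{z,0}}{\partial z} = O(1). \]
Plugging this in the derivatives of (13.7) gives
\[ \frac{\partial f_{I,0}}{\partial h} = O(h^{-1} \ln^{-2} h), \qquad \frac{\partial f_{I,0}}{\partial z} = O(1). \]
As $\partial h/\partial w = O(\ln^{-1} h)$, we have $\frac{\partial f_{I,0}}{\partial w} = \frac{\partial f_{I,0}}{\partial h} \frac{\partial h}{\partial w} + \frac{\partial f_{I,0}}{\partial z} \frac{\partial z}{\partial w} = O(h^{-1} \ln^{-3} h)$. The estimate for $\partial f_{z,0}/\partial w$ is obtained in the same way.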

13.4 Proof of auxiliary lemmas


Proof of Lemma 4.3. Recall that the divergence of a vector field $v$ with respect to a volume form $\alpha$ is a function $\operatorname{div}_\alpha(v)$ such that $L_v(\alpha) = \operatorname{div}_\alpha(v) \cdot \alpha$ (here $L$ denotes the Lie derivative). Let $\alpha = dp \wedge dq \wedge dz \wedge d\lambda$ be the volume form. Set
\[ b(X, \lambda) = ((g^\lambda)^* \alpha)_X / \alpha_X, \]
i.e. we take the pullback of $\alpha$ by the flow and divide it by $\alpha$ at the point $X$. This gives a number, as the space of $(n+3)$-forms on an $(n+3)$-manifold at a given point is one-dimensional. By the definition of the Lie derivative the number $b(X, \lambda)$ satisfies
\[ \frac{d b(X, \lambda)}{d\lambda} = \operatorname{div} v(g^\lambda(X))\, b(X, \lambda), \]
where $v$ is the right-hand side of (2.2). As the Hamiltonian terms have zero divergence and $\dot\lambda = 1$, we have $\operatorname{div} v = \varepsilon \operatorname{div} f = O(\varepsilon)$. This shows that for $\lambda$ with $|\lambda| < \varepsilon^{-1}\Lambda$ we have $b(X, \lambda) \in (0, C]$ with $C = \exp(\Lambda \max |\operatorname{div} f|)$. Integrating over all $X \in A$ gives the required estimate.
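
The volume estimate above can be checked on a toy system. The Python sketch below is a minimal illustration under stated assumptions, not the system (2.2) of the paper: it uses the weakly damped oscillator $\dot x = y$, $\dot y = -x - \varepsilon y$, whose Hamiltonian part has zero divergence and whose perturbation $f = (0, -y)$ has $\operatorname{div} f = -1$. The function names `rk4_flow_matrix` and `volume_factor` and the values of `eps` and `Lambda` are hypothetical. Since the system is linear, the volume factor $b$ is the determinant of the flow matrix, obtained here by integrating the two basis vectors with the classical Runge-Kutta scheme.

```python
import math

def rk4_flow_matrix(eps, t_end, dt=0.01):
    """Columns of the flow matrix of x' = y, y' = -x - eps*y at time t_end."""
    def rhs(u):
        x, y = u
        return (y, -x - eps * y)

    def step(u):
        k1 = rhs(u)
        k2 = rhs((u[0] + 0.5 * dt * k1[0], u[1] + 0.5 * dt * k1[1]))
        k3 = rhs((u[0] + 0.5 * dt * k2[0], u[1] + 0.5 * dt * k2[1]))
        k4 = rhs((u[0] + dt * k3[0], u[1] + dt * k3[1]))
        return (u[0] + dt * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6.0,
                u[1] + dt * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6.0)

    col1, col2 = (1.0, 0.0), (0.0, 1.0)   # images of the two basis vectors
    for _ in range(round(t_end / dt)):
        col1, col2 = step(col1), step(col2)
    return col1, col2

def volume_factor(eps, t_end):
    """Determinant of the flow matrix, i.e. the factor b for this linear flow."""
    c1, c2 = rk4_flow_matrix(eps, t_end)
    return c1[0] * c2[1] - c2[0] * c1[1]

eps, Lambda = 0.01, 1.0
b = volume_factor(eps, Lambda / eps)       # evolve over time eps^{-1} * Lambda
print(b, math.exp(-Lambda), math.exp(Lambda))
```

For this example $b = \exp(\varepsilon \operatorname{div} f \cdot \varepsilon^{-1}\Lambda) = e^{-\Lambda}$, which lies in $(0, C]$ with $C = \exp(\Lambda \max|\operatorname{div} f|) = e^{\Lambda}$, as in the lemma.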

References
[1] P. Fatou. “Sur le mouvement d’un système soumis à des forces à courte période”. Bulletin de la Société Mathématique de France 56 (1928), pp. 98–139.
[2] N. N. Bogolyubov and Yu. A. Mitropol’skij. Asymptotic Methods in the Theory of Non-Linear Oscillations. Hindustan Publishing Corp., Delhi; Gordon and Breach Science Publishers, New York, 1961.
[3] A.I. Neishtadt. “Passage through resonances in a two-frequency problem”. Akademiia Nauk SSSR Doklady. Vol. 221. 1975, pp. 301–304.
[4] A.I. Neishtadt. “Averaging, passage through resonances, and capture into resonance in two-frequency systems”. Russian Mathematical Surveys 69.5 (2014), p. 771.
[5] V.I. Arnold. “Applicability conditions and an error bound for the averaging method for systems in the process of evolution through a resonance”. Doklady Akademii Nauk. Vol. 161. 1. Russian Academy of Sciences. 1965, pp. 9–12.
[6] D.V. Anosov. “Averaging in systems of ordinary differential equations with rapidly oscillating solutions”. Izvestiya Rossiiskoi Akademii Nauk. Seriya Matematicheskaya 24.5 (1960), pp. 721–742.
[7] A.I. Neishtadt. “Averaging method for systems with separatrix crossing”. Nonlinearity 30.7 (2017), p. 2871.
[8] V.I. Arnold. “Small denominators and problems of stability of motion in classical and celestial mechanics”. Russ. Math. Surv 18.6 (1963), pp. 85–191.
[9] J.D. Brothers and R. Haberman. “Slow passage through a homoclinic orbit with subharmonic resonances”. Studies in Applied Mathematics 101.2 (1998), pp. 211–232.
[10] G. Wolansky. “Limit theorem for a dynamical system in the presence of resonances and homoclinic orbits”. Journal of Differential Equations 83.2 (1990), pp. 300–335.
[11] A.I. Neishtadt. “The separation of motions in systems with rapidly rotating phase”. Journal of Applied Mathematics and Mechanics 48.2 (1984), pp. 133–139.
[12] A.I. Neishtadt and A. Okunev. “On the phase change for perturbations of Hamiltonian systems with separatrix crossing”. arXiv:2003.05828 (2020).
[13] J. Moser. “The analytic invariants of an area-preserving mapping near a hyperbolic fixed point”. Communications on Pure and Applied Mathematics 9.4 (1956), pp. 673–692.
[14] Y. Ilyashenko and S. Yakovenko. Lectures on Analytic Differential Equations. Vol. 86. Graduate Studies in Mathematics. American Mathematical Society, 2008.

Anatoly Neishtadt,
Department of Mathematical Sciences,
Loughborough University, Loughborough LE11 3TU, United Kingdom;
Space Research Institute, Moscow 117997, Russia
E-mail : a.neishtadt@lboro.ac.uk

Alexey Okunev,
Department of Mathematical Sciences,
Loughborough University, Loughborough LE11 3TU, United Kingdom
E-mail : a.okunev@lboro.ac.uk
