Reconstruction RGG
arXiv:2402.09591v1 [cs.LG] 14 Feb 2024
Abstract. Random geometric graphs are random graph models defined on metric spaces. Such
a model is defined by first sampling points from a metric space and then connecting each pair of
sampled points, independently across pairs, with a probability that depends on their distance.
In this work, we show how to efficiently reconstruct the geometry of the underlying space from
the sampled graph under the manifold assumption, i.e., assuming that the underlying space is a
low-dimensional manifold and that the connection probability is a strictly decreasing function of
the Euclidean distance between the points in a given embedding of the manifold in R^N.
Our work complements a large body of work on manifold learning, where the goal is to recover
a manifold from points sampled in the manifold along with their (approximate) distances.
1. Introduction
The manifold assumption in machine learning is a popular assumption postulating that many
models of data arise from distributions over manifolds; see e.g. [DDC23, ADC19, FIL+21, FILN20,
FILN23, ADL23] among many others.
A major problem studied in this area is the inference problem of estimating an unknown manifold
given data sampled from the manifold. Some of the foundational work in this area shows that,
in many situations, given the (possibly noisy) distances between sampled points it is possible to
estimate the unknown manifold.
We are interested in a more difficult problem which arises in the context of random geometric
graphs. In our basic setup, points are again sampled from a manifold but now the data is a graph. In
the graph each sampled point corresponds to a vertex and an edge is included with probability that
depends in a strictly monotone fashion on the (embedded) distance of the corresponding endpoints,
where the inclusions of different potential edges are independent events.
It is clear that this problem is harder than inferring the geometry from (exact) distances. Indeed,
assuming we know the function determining the edge probability as a function of the distance, then
given the sampled points and the exact distances between them we can generate a random graph
with the correct distribution. Thus being able to infer the geometry from the graph allows us to
determine the geometry from the distances.
The main contribution of the current paper is showing that even given only the graph information,
it is possible to estimate the geometry of the underlying manifold.
We now describe the model and the main results in more detail. Suppose M is a d-dimensional
manifold embedded in an ambient Euclidean space and let µ denote a probability measure defined
on M . We construct a random graph G with n vertices as follows. First, generate n i.i.d. random
points X1 , . . . , Xn on M according to µ. We let the vertex set be [n] = {1, . . . , n}. Each pair {v, w}
of vertices is independently connected with a probability that depends on the Euclidean distance
of Xv and Xw ; the closer the points on the manifold are, the more likely the corresponding vertices
on the graph are connected. The probability of connection is determined by a distance-probability
function p, which is a monotone decreasing function from [0, ∞) to [0, 1]. We set the probability of
connecting {v, w} to be p(‖X_v − X_w‖), where ‖ · ‖ is the standard Euclidean norm.
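The sampling procedure above is easy to simulate. The sketch below is an illustrative toy instance, not part of the paper: the manifold M is taken to be the unit circle in R², µ the uniform measure on it, and p(t) = exp(−t²) one possible monotone decreasing distance-probability function.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_rgg(n, p):
    """Sample a random geometric graph on the unit circle in R^2.

    The circle stands in for the manifold M, the uniform measure for mu,
    and p is any monotone decreasing distance-probability function.
    """
    theta = rng.uniform(0, 2 * np.pi, size=n)            # X_1, ..., X_n ~ mu
    X = np.column_stack([np.cos(theta), np.sin(theta)])
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)  # ||X_v - X_w||
    U = rng.uniform(size=(n, n))
    U = np.triu(U, 1)
    U = U + U.T                                          # one uniform per unordered pair
    A = (U < p(D)) & ~np.eye(n, dtype=bool)              # edge iff U_{vw} < p(||X_v - X_w||)
    return X, A

X, A = sample_rgg(200, lambda t: np.exp(-t**2))
```

Only the adjacency matrix A, not the latent positions X, is handed to the reconstruction algorithm.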
Our second main result involves recovering the manifold M together with the measure µ in a
version of Gromov–Hausdorff distance for metric measure spaces. For more background on metric
measure spaces and Gromov–Hausdorff distances, see e.g. [?, Shi16].
Theorem 1.3. Suppose (M, µ, p) satisfies Assumptions 2.3 and 2.4. Then there exists C(M, µ, p) ≥
1 so that the following holds. There exists a deterministic polynomial-in-n time algorithm that takes
G = G(n, M, µ, p) as input, and with probability 1 − n^{−ω(1)}, outputs (Γ̃, ν, d_euc) such that
• Γ̃ is a weighted graph whose vertex set is a subset V′ of the vertices of G,
• ν is a probability measure on V′,
• d_euc defines a metric space on V′,
• there exists a coupling π of µ and ν so that, for two independent copies (X, u), (X′, u′) ∼ π,
P( |d_gd(X, X′) − d_Γ̃(u, u′)| > C(M, µ, p) n^{−c/d} ) ≤ C(M, µ, p) n^{−c/d},
and
P( |‖X − X′‖ − d_euc(u, u′)| > C(M, µ, p) n^{−c/d} ) ≤ C(M, µ, p) n^{−1/4},
• for any vertex u of Γ̃ and t ≥ 0,
µ(B_gd(X_u, t − C(M, µ, p) n^{−c/d})) − C(M, µ, p) n^{−1/4}
≤ ν(B_Γ̃(u, t))
≤ µ(B_gd(X_u, t + C(M, µ, p) n^{−c/d})) + C(M, µ, p) n^{−1/4}.
The theorem shows that Γ̃ is a good approximation to M as a metric measure space. The
coupling π “matches” the two spaces. Under this matching, distances match: the intrinsic distance
on the original manifold matches the graph distance on Γ̃, and the embedded distance on the original
manifold matches the distance d_euc.
The proofs of both theorems essentially use the following result which is the main technical result
of the paper. It shows how to extract a “net of clusters” from the graph.
Theorem 1.4. Suppose (M, µ, p) satisfies Assumptions 2.3 and 2.4. There exist constants C =
C(M, µ, p) ≥ 1, c ∈ (0, 1), and a polynomial-in-n time algorithm buildNet taking G = G(n, M, µ, p)
as input, which with probability 1 − n^{−ω(1)} outputs a collection of pairs {(U_α, u_α)}_{α∈[ℓ]} with u_α ∈
U_α ⊆ [n] such that the output is a Cluster-Net in the following sense:
(1) For each α ∈ [ℓ], |U_α| ≥ n^{1/2}, and for each v ∈ U_α, ‖X_{u_α} − X_v‖ ≤ C n^{−c/d}.
(2) For each p ∈ M, there exists α ∈ [ℓ] such that ‖p − X_{u_α}‖ ≤ 1000√d · C n^{−c/d}.
1.1. Related Works in the Literature. Our work connects to two large bodies of work: Manifold
Learning and Random Geometric Graphs. Due to space limitations, we cannot review many of
the beautiful works in these areas. We will focus on some of the most relevant works. For surveys
on Random Geometric Graphs, see e.g. [Pen03, DDC23].
Perhaps the closest to our work is the work of Araya and De Castro [ADC19]. They consider
random geometric graphs generated from latent points on the sphere Sd−1 with the probability
of having an edge between two latent points given by the value of the probability “link” function
evaluated at the distance between the two points. Similar to our result, their work gives a way
to approximate the distances between the latent points from the random geometric graph. The
setup of [ADC19] crucially relies on the assumption of uniform sampling from the unit sphere. This
allows them to use spectral/harmonic analysis on the sphere to obtain fast algorithms and good
rates of convergence. This also allows them to reconstruct graphs in the sparse regime, which we do
not. See also the follow-up work of Eldan, Mikulincer, and Pieters [EMP22], and [STP12] for
related work in a similar setup. Since we are considering general manifolds, such techniques cannot
be applied. This may also explain why our algorithms for the general case are less efficient and
have worse error rates.
Works of Fefferman, Ivanov, Kurylev, Lassas, Lu, and Narayanan (some with/without Kurylev
and some with/without Lu) [FIK+ 20, FILN20, FIL+ 21, FILN23] consider the problem of manifold
learning as part of a more general manifold extension problem. The main difference in perspective
is that the interest in these lines of work is in finding a bona fide manifold and, moreover, points are
given with (approximate) distances between them. As mentioned earlier, we can leverage this line
of work together with our results to recover manifolds in our setting as well.
There are many other problems that have been studied in random geometric graphs, including
finding the location of points given noisy distances [OMK10, JM13], questions relating to testing
geometry vs. non-geometric random graphs, esp. in high dimensions e.g. [LMSY22, ADL23] and
questions related to geometric block models, e.g. [LS23].
1.2. Some Proof Ideas. Here we outline the thought process behind the construction of the algorithm
buildNet and the proof of Theorem 1.4.
Why clusters? Let us begin by explaining why we want clusters of points. Given two vertices
v, w, the presence or absence of an edge between them is indeed only weakly associated with their
distance. However, if there exists a cluster V surrounding the vertex v, in the sense that for every
u in V,
‖X_v − X_u‖ ≤ η,
for a small η, then it becomes possible to infer the distance from v to w by examining the number
of edges connecting w and V. This is because the quantity
|{u ∈ V : {u, w} is an edge}|
is a sum of independent Bernoulli random variables with parameters p(‖X_w − X_u‖) ≈ p(‖X_w − X_v‖),
once we condition on the realization of {X_u}_{u∈V} ∪ {X_v, X_w} in which V is a cluster. Leveraging the
concentration of the sum of independent Bernoulli random variables, we can, with high probability,
estimate p(‖X_w − X_v‖) within an error margin determined by η and the size of V. Consequently,
this allows for the approximation of ‖X_w − X_v‖ with small error.
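This estimation step can be checked numerically. The following is a minimal sketch under assumed parameters (the link function p(t) = e^{−t} with its closed-form inverse, and the names p, p_inv, eta are our illustrative choices, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(1)

# Assumed link function (monotone decreasing) and its inverse.
p = lambda t: np.exp(-t)
p_inv = lambda q: -np.log(np.clip(q, 1e-12, 1.0))

# A cluster V of m latent points within eta of X_v = 0, and a target point X_w.
m, eta = 4000, 0.01
X_cluster = rng.uniform(-eta / 2, eta / 2, size=(m, 2))
X_w = np.array([0.7, 0.0])                               # true distance 0.7

# Each u in V is adjacent to w independently with probability p(||X_w - X_u||).
edges = rng.uniform(size=m) < p(np.linalg.norm(X_cluster - X_w, axis=1))

# The edge frequency concentrates near p(||X_w - X_v||); inverting p
# recovers the distance up to an error driven by eta and 1/sqrt(|V|).
d_hat = p_inv(edges.mean())
```

With these parameters the estimate d_hat lands close to the true distance 0.7.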
Finding a Cluster using extremal statistics. The question of how to generate a cluster at a given
vertex v is not trivial. It seems intuitively true that, for every other vertex w, the closer X_v and X_w
are, the larger the number of their common neighbors should be. This reasoning suggests that forming a
cluster around v could be as simple as grouping vertices that share many common neighbors with v.
Unfortunately, this intuition does not hold in general. Counterexamples can be constructed based
on the geometry of the embedded manifold M or the properties of the measure µ, demonstrating
scenarios where, despite having X_v = X_{v′} and a non-negligible distance between X_v and X_w, the
expected number of common neighbors of v and w can be larger than that of v and v′.
Therefore, directly extracting a cluster based on this premise is not feasible.
Nevertheless, the underlying intuition is not entirely wrong. It still applies in an extremal
setting. To illustrate, consider the function
(1) K(x, y) := ∫_M p(‖x − z‖) p(‖y − z‖) dµ(z).
Conditioning on X_v = x and X_w = y, K(x, y) represents the probability that a third vertex u
is a common neighbor of both v and w. Consequently, (n − 2)K(x, y) is the expected number of
common neighbors of v and w.
It is worth noting that, simply applying the Cauchy–Schwarz inequality,
(2) K(x, y) ≤ √(K(x, x) K(y, y)) ≤ max{K(x, x), K(y, y)},
which implies that the global maximum of K(x, y) occurs only along the diagonal x = y. Therefore, if
there are sufficiently many vertices, and |N(v) ∩ N(w)| ≃ (n − 2)K(X_v, X_w) is among the largest
numbers of common neighbors over all distinct pairs of vertices, it suggests that (X_v, X_w) might be near a
global maximum of K, and hence close to each other. Stemming from this observation, we can
derive an algorithm generateCluster that takes an induced subgraph G_W of G = G(n, M, µ, p)
and attempts to extract a cluster (V, v) with v ∈ V ⊆ W in the sense that
∀w ∈ V, ‖X_v − X_w‖ ≤ η,
for some small parameter η. The algorithm is simple: it searches for v ∈ W such that there are
many vertices w in W for which |N(v) ∩ N(w) ∩ W|/|W| is within a small gap of
(3) max_{w_1, w_2 ∈ W} K(X_{w_1}, X_{w_2}) ≃ max_{w_1 ∈ W} K(X_{w_1}, X_{w_1}).
In order for the algorithm to have a high success rate, besides requiring that
|N(v) ∩ N(w) ∩ W|/|W| approximates K(X_v, X_w) for v, w ∈ W, we also need
(4) ∀w_0 ∈ W, {X_w}_{w∈W} ∩ B_{‖·‖}(X_{w_0}, ε) is large enough,
for some parameter ε much smaller than η, the radius of the cluster. This is necessary so that the
comparison (3) holds.
With these assumptions, we have a simple algorithm generateCluster that can extract a cluster
(V, v) from a given batch of vertices W. The only drawback is that since we are relying on the
extremal statistics of K, we have no control over v.
Finding the next Cluster. The main algorithm buildNet operates as a recursive algorithm that
partitions the vertex set into batches. Each iteration extracts a new cluster from a newly introduced
batch. Within each iteration, the algorithm generateCluster is employed with minor
adjustments to gain control over the next generated cluster. Consider the scenario where we already
have a collection of clusters {(V_s, v_s)}_{s∈[k]}, and a new batch of vertices, W, is presented. Given the
pre-existing clusters, the distance between each vertex w ∈ W and each cluster center v_s can be
estimated, with the accuracy and likelihood of this estimation depending on the parameter η and
the size of V_s. Specifically, this concerns how well
p^{−1}( |N(w) ∩ V_s| / |V_s| ) ≃ ‖X_w − X_{v_s}‖,
including the associated probability.
Consider W′ ⊆ W as identified through distance-estimate constraints relative to {v_s}. Intuitively,
our aim is for each w ∈ W′ to satisfy d_s ≤ ‖X_w − X_{v_s}‖ ≤ D_s for some s ∈ [k]. The actual
selection criteria for W′ should be defined so that they are observable from the graph, in the following
way:
d_s ≤ p^{−1}( |N(w) ∩ V_s| / |V_s| ) ≤ D_s,
which serves as an approximation of our intended condition. Then, we will apply generateCluster
using the input G_W but restricted to produce a cluster (V, v) within W′.
Specifically, the algorithm seeks a vertex v ∈ W′ for which there are many w ∈ W′ such that
|N(v) ∩ N(w) ∩ W|/|W| is comparable to the maximum of |N(w_1) ∩ N(w_2) ∩ W|/|W| ≃ K(X_{w_1}, X_{w_2})
over distinct w_1, w_2 ∈ W′.
To ensure that the outcome constitutes a valid cluster, W′ must satisfy a condition such as
(4), ensuring that (3) holds when restricting w, w_1, w_2 to W′.
Because W′ is only an approximation, through the imposed graph-observable constraints, of the
set we actually want, it becomes difficult to justify the desired properties of W′ when we have no
control over the shape of the manifold M or the measure µ. For this reason, we will need to
choose both d_s and D_s at the order of some parameter δ, where for every p ∈ M, the manifold
looks like a flat space of dimension d within a ball of radius δ centered at p. Here we crucially rely
on the fact that the underlying metric space of the random geometric graph is locally Euclidean.
Finding the Next Cluster Close to a Given Cluster. Following this discussion, we can now
outline an intermediate step of the buildNet algorithm. Consider a scenario where we are given a
cluster (U, u) and a vertex u′ with the property that ‖X_u − X_{u′}‖ ≈ δ, indicating that the points
are neither too close nor too far apart, but within a distance that allows the manifold M to appear
flat. Our objective is to identify a cluster (V, v) in the vicinity of u′.
This is accomplished through the utilization of d + 1 new batches: W_1, W_2, . . . , W_{d+1}. Starting
with the first batch, we apply generateCluster with input G_{U∪W_1} and a carefully selected subset
W′_1 ⊆ W_1 to derive a cluster (V_1, v_1) positioned at a distance r from (U, u), where r is comparable
to δ. Subsequent applications of generateCluster with inputs G_{U∪V_1∪W_2}, G_{U∪V_1∪V_2∪W_3}, . . . and
corresponding subsets W′_2 ⊆ W_2, W′_3 ⊆ W_3, . . . yield clusters (V_2, v_2), (V_3, v_3), . . . such that each
cluster (V_i, v_i) satisfies
‖X_{v_i} − X_u‖ ≈ r and ∀j ∈ [i − 1], ‖X_{v_i} − X_{v_j}‖ ≈ √2 r.
The distance criteria for selecting these clusters are chosen so that the vectors {X_{v_i} − X_u}_{i∈[d]}
form an approximately orthogonal basis for M locally at X_u. Finally, we can apply
generateCluster with input G_{U∪V_1∪···∪V_d∪W_{d+1}} and a subset W′_{d+1} ⊆ W_{d+1} containing vertices w whose
distances from X_w to the points {X_u, X_{v_1}, . . . , X_{v_d}} closely match those from X_{u′}. Given
the local flatness of M within a δ-radius ball centered at X_u, the cluster (V, v) identified in this
manner will be proximal to u′ in the underlying metric space.
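In locally flat coordinates, distances to such an approximately orthogonal family of anchors pin down a point's position. The toy computation below (exact distances, flat R² with d = 2, and our own choice of anchor coordinates) illustrates the linear-algebraic mechanism, not the paper's algorithm:

```python
import numpy as np

# Trilateration sketch in flat R^2: anchors at X_u = 0 and at two points
# roughly r away along orthogonal directions (here exactly orthogonal).
anchors = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
target = np.array([0.3, 0.4])
dists = np.linalg.norm(anchors - target, axis=1)

# With X_u at the origin, ||x - a||^2 - ||x||^2 = ||a||^2 - 2<a, x> turns
# the (here exact) distances into a linear system for x.
A = 2 * anchors[1:]
b = (anchors[1:] ** 2).sum(axis=1) + dists[0] ** 2 - dists[1:] ** 2
x = np.linalg.solve(A, b)
```

With noisy distances, as produced by the cluster-based estimates, the same system is solved in a least-squares sense.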
While the algorithm is relatively straightforward, the bulk of the technical proof focuses on
meticulously justifying, at various stages, that our chosen subset W′ indeed meets all the prerequisites
for generateCluster to successfully identify a valid cluster. Simultaneously, it is essential to ensure
that the creation of new clusters at each step does not lead to error propagation in the cluster radius η.
We also extend the definition to affine subspaces as follows. For any affine subspaces A1 , A2 ⊆ RN
of dimension d ≥ 1, take any points a1 ∈ A1 and a2 ∈ A2 , and define
d(A1 , A2 ) := d(A1 − a1 , A2 − a2 ).
Remark 2.2. When we consider only linear subspaces, the function
d : Gr(N, d) × Gr(N, d) → R≥0 ,
where Gr(N, d) denotes the Grassmannian of d-spaces in RN , is a distance function. However, one
needs to be careful when extending the definition of d to affine spaces, where d is no longer a
distance function, as d(A, A′ ) vanishes whenever A and A′ are parallel.
2.2. Random Graph Model. This subsection describes the random graph model.
Assumption 2.3 (Manifold and probability measure). Let d be a fixed positive integer. Let M be a
d-dimensional compact complete connected smooth manifold embedded in RN with its Riemannian
metric induced from RN . Define the following.
(i) Let
κ := max_{p∈M} max_{u,v∈T_pM, ‖u‖=‖v‖=1} ‖II_p(u, v)‖,
where II denotes the second fundamental form of M (see Definition A.1 in the appendix).
(ii) Let r_{M,0} be the largest value such that for every point p ∈ M and for every real number
r ∈ (0, r_{M,0}), the intersection B(p, r) ∩ M is connected. Here, we assume r_{M,0} > 0,
and we allow r_{M,0} = ∞.
(iii) Let
(6) r_M := 0.01 · min{1/κ, r_{M,0}}.
Note that since κ is strictly positive, we have r_M < ∞.
Let µ be a probability measure on M and define µ_min : [0, ∞) → [0, 1] by
(7) µ_min(r) := min_{p∈M} µ(B(p, r) ∩ M),
2.2.4. Remark on the choice of ℓp with dependency on M . Let Ma and Mb denote two embedded
manifolds within R2 , characterized by the configurations depicted in Figure 1. It is evident that
an isometry F exists, respecting the intrinsic metric, such that F reflects the protruding segments
of Ma onto the corresponding indented segments of Mb and acts identically elsewhere. A key
observation is the existence of a constant t > 0 ensuring that, for any points p, q ∈ M_a satisfying
‖p − q‖ ≤ t, the equation ‖F(p) − F(q)‖ = ‖p − q‖ holds.
Consider µa and µb as the uniform measures on Ma and Mb , respectively. Notice that µb is also
the pushforward measure of µa under F . Assume a function p : [0, ∞) → [0, 1] that is constant for
x ≥ t. Now, let us couple the two graphs G([n], Ma , µa , p) and G([n], Mb , µb , p): Let X1 , . . . , Xn
represent independent and identically distributed random points on Ma , based on the distribution
µa , and let Ui,j denote independent and identically distributed uniform random variables on [0, 1] for
i, j ∈ [n], with the stipulation that U_{i,j} = U_{j,i}. A graph is then formed with vertex set [n], connecting
vertices i and j if U_{i,j} ≤ p(‖X_i − X_j‖), yielding a graph distribution equivalent to G([n], M_a, µ_a, p).
Analogously, using the same vertex set but connecting i and j if U_{i,j} ≤ p(‖F(X_i) − F(X_j)‖) results
in a graph with the distribution of G([n], M_b, µ_b, p).
Given that p(‖F(X_i) − F(X_j)‖) = p(‖X_i − X_j‖) whether ‖X_i − X_j‖ ≤ t (by virtue of the properties
of M_a and M_b) or ‖X_i − X_j‖ > t (as p is constant beyond t), the resultant random graphs are
indistinguishable. Consequently, it is impossible to recover the Euclidean structure from the graphs.
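A finite toy analog of this coupling can be checked directly. Below, the two manifolds of Figure 1 are replaced by two point configurations that differ by reflecting a far-away cluster (a stand-in for the isometry F), and p is constant beyond t; this is our illustration, not the paper's construction:

```python
import numpy as np

rng = np.random.default_rng(2)
t = 1.0
# Monotone decreasing link function, constant (= 0.5) for distances >= t.
p = lambda D: np.where(D < t, 1 - D / (2 * t), 0.5)

# Configuration a: a cluster near the origin and a far-away cluster.
near = rng.normal(scale=0.05, size=(10, 2))
far = np.array([5.0, 0.0]) + rng.normal(scale=0.05, size=(10, 2))
Xa = np.vstack([near, far])
# "F": reflect the far cluster about the x-axis, fix the near cluster.
Xb = Xa.copy()
Xb[10:, 1] *= -1

def graph(X, U):
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    A = U < p(D)
    np.fill_diagonal(A, False)
    return A

n = len(Xa)
U = rng.uniform(size=(n, n))
U = np.triu(U, 1)
U = U + U.T                      # shared uniforms U_{i,j} = U_{j,i}
Ga, Gb = graph(Xa, U), graph(Xb, U)
```

Since within-cluster distances are preserved by the reflection and all cross distances exceed t, the two sampled graphs coincide edge for edge.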
(14) C_2 ≥ 4 · ‖K‖_∞^{1/4} C_gap^{1/2} d^{1/2} L_p / (ℓ_p^{1/2} c_1^{1/2}),
and
(15) 0 < c_3 ≤ ℓ_p / (C_gap √d).
Let η > 0 be a parameter such that
(16) η ≥ max{ C_2 · ε^{1/2}, (L_p² / (c_3 ℓ_p √d)) · ε }.
Let δ and r be the parameters given by
(17) δ := C_gap √d · η and r := C_gap d² · δ.
Finally we assume that the choice of r satisfies
(18) r_M ≥ 24 C_gap d² r and n ≥ 100.
We remark that this is always true when n is large enough.
Finally, let us notice that
(19) δ = C_gap √d η ≳ ε^{1/2} ≳ n^{−ς/2d},
where a ≳ b means there exists C = poly(L_p, 1/ℓ_p, 1/‖K‖_∞, d, µ_min(r_M/4)) such that a ≥ Cb.
Remark 2.7. In Section 11 below, we discuss practical use of our results in this paper. There we
will be explicit about what the graph observer observes and what the graph observer can compute
to obtain “graph-observer-accessible” versions of parameters. Furthermore, Proposition 11.1 gives
an explicit test, which can be carried out by the graph observer, to check whether the parameters
are feasible.
2.4. Additional Notation and Tools. For each i ∈ V, let
(20) H_i := T_{X_i}M
be the tangent plane of M at the point X_i. Let diam(M) and diam_gd(M) be the diameters of M
with respect to the Euclidean distance and the geodesic distance, respectively.
Let us also recall the standard Hoeffding's inequality for sums of bounded independent random
variables.
Lemma 2.8 (Hoeffding's inequality). For any n ≥ 1, let Y_1, Y_2, . . . , Y_n be independent random
variables such that Y_i ∈ [a_i, b_i] for each i ∈ [n]. Then, for any t > 0, we have
P( | Σ_{i∈[n]} Y_i − E Σ_{i∈[n]} Y_i | ≥ t ) ≤ 2 exp( −2t² / Σ_{i=1}^n (b_i − a_i)² ).
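As a quick numeric sanity check of the bound (our illustration, with Bernoulli(1/2) summands so that every b_i − a_i = 1):

```python
import math
import random

random.seed(0)

# Hoeffding for S_n = Y_1 + ... + Y_n with Y_i ~ Bernoulli(1/2):
# here E S_n = n/2 and P(|S_n - n/2| >= t) <= 2 exp(-2 t^2 / n).
n, t, trials = 1000, 60, 2000
hits = sum(
    abs(sum(random.random() < 0.5 for _ in range(n)) - n / 2) >= t
    for _ in range(trials)
)
empirical_tail = hits / trials
hoeffding_bound = 2 * math.exp(-2 * t**2 / n)
```

The empirical tail frequency stays below the Hoeffding bound, which is exactly the concentration used for the common-neighbor counts throughout the paper.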
‖z − y‖² − ‖z‖² ≥ ((0.5 − 0.02)/1.01) · r_M ‖y‖.
With the estimate ‖z − y‖ + ‖z‖ ≤ 3r_M for the denominator in (24), we conclude
‖z − y‖ − ‖z − 0⃗‖ ≥ (1/20) ‖y‖.
The term “cluster”, which we use throughout this paper, is defined as follows.
Definition 4.2. A cluster is a pair (V, i) with i ∈ V ⊆ V which satisfies the following two properties:
(i) for each j ∈ V, ‖X_j − X_i‖ < η;
(ii) |V| ≥ n^{1−ς}.
Further, we say that (V, i) is a t-cluster for t > 0 if condition (i) is replaced by ‖X_j − X_i‖ < t for
j ∈ V.
The following is the algorithm generateCluster¹ which we use to find clusters in the paper.
¹See Subsection ?? for how the graph observer can practically use the algorithms in this paper. All four
algorithms in this paper are practical.
Algorithm 1: GenerateCluster
Input : V_1 ⊆ V, a subset of vertices of size n.
V_2 ⊆ V, a subset of at least 2 vertices.
Output: W_gc ⊆ V_2, a subset of vertices.
i_gc ∈ W_gc, a vertex.
Step 1. Sort the pairs {i, j} of distinct vertices of V_2 according to |N_{V_1}(i) ∩ N_{V_1}(j)|, from the
largest to the smallest, and return a list {i_1, j_1}, {i_2, j_2}, . . . , {i_L, j_L}, where L = |V_2|(|V_2| − 1)/2.
Step 2. Let m be the largest positive integer such that
|N_{V_1}(i_m) ∩ N_{V_1}(j_m)|/n ≥ |N_{V_1}(i_1) ∩ N_{V_1}(j_1)|/n − (1/2) c_1 η².
Step 3. Consider a graph G_gc with vertex set V_gc := ∪_{k∈[m]} {i_k} ∪ ∪_{k∈[m]} {j_k} and edge
set E_gc := {{i_k, j_k} : k ∈ [m]}.
Step 4. Take a pair (W_gc, i_gc) where i_gc ∈ V_gc maximizes the number of neighbors in G_gc and
W_gc = {j ∈ V_gc : {i_gc, j} ∈ E_gc} ∪ {i_gc}.
return (W_gc, i_gc)
Notice that by construction the output (Wgc , igc ) always satisfies igc ∈ Wgc ⊆ V2 .
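A direct transcription of the four steps might look as follows. This is a sketch, not the authors' code: the names adj, n, and threshold (standing for (1/2)·c₁η²) are our own, and ties in Step 4 are broken arbitrarily.

```python
from itertools import combinations

def generate_cluster(adj, V1, V2, n, threshold):
    """Sketch of Algorithm 1 (GenerateCluster).

    adj: dict mapping each vertex to its set of neighbors in G.
    V1: vertex subset of size n; V2: subset with at least 2 vertices.
    threshold plays the role of (1/2) * c1 * eta^2 in Step 2.
    """
    def common(i, j):  # |N_{V1}(i) ∩ N_{V1}(j)|
        return len(adj[i] & adj[j] & V1)

    # Step 1: sort pairs of V2 by common-neighbor count, largest first.
    pairs = sorted(combinations(sorted(V2), 2), key=lambda e: -common(*e))
    top = common(*pairs[0])
    # Step 2: keep the pairs within the allowed gap below the maximum.
    kept = [e for e in pairs if common(*e) / n >= top / n - threshold]
    # Step 3: form the graph G_gc on the kept pairs.
    deg = {}
    for i, j in kept:
        deg[i] = deg.get(i, 0) + 1
        deg[j] = deg.get(j, 0) + 1
    # Step 4: take a maximum-degree vertex and its closed neighborhood.
    i_gc = max(deg, key=deg.get)
    W_gc = {v for e in kept if i_gc in e for v in e}
    return W_gc, i_gc
```

On a toy graph in which three of the candidate vertices share all their neighbors, the routine returns exactly that triple as the cluster.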
Definition 4.3. A pair (W_1, W_2) of subsets of V is said to be a good pair if the following
conditions are satisfied:
• ∅ ≠ W_2 ⊆ W_1,
• |W_1| = n,
• for every i ∈ W_2, there exists p ∈ M such that
(i) ‖X_i − p‖ < C_gap d (L_p/ℓ_p) ε, and
(ii) {j ∈ W_1 : ‖p − X_j‖ < ε} ⊆ W_2.
Lemma 4.4 (Good pair lemma). Assume that the parameters are feasible. If (W1 , W2 ) is a good
pair and W1 is a sample within the events Ecn (W1 ) and Enet (W1 ), then the output (Wgc , igc ) of
Algorithm 1 with input V1 = W1 and V2 = W2 is a cluster with Wgc ⊆ W2 .
Proof. Before even discussing the output (W_gc, i_gc), let us begin by checking that (W_1, W_2) is an
appropriate input for the algorithm. Namely, we argue why W_2 has at least two elements, and
along the way we introduce a set S which we will use in later parts of this proof.
Since (W_1, W_2) is a good pair, we know that W_2 is nonempty. Thus, Lemma 3.3 implies that
there exists i_0 ∈ W_2 such that
K(X_{i_0}, X_{i_0}) = max_{i,j∈W_2} K(X_i, X_j).
Consider the set
S := { j ∈ W_2 : ‖X_j − X_{i_0}‖ < 2 C_gap d (L_p/ℓ_p) ε }.
Since (W_1, W_2) is a good pair and i_0 ∈ W_2, there exists p ∈ M such that ‖X_{i_0} − p‖ < C_gap d (L_p/ℓ_p) ε,
and
S′ := { j ∈ W_1 : ‖p − X_j‖ < ε } ⊆ W_2.
Using the triangle inequality together with the above inclusion, we obtain
S′ ⊆ S ⊆ W_2.
We know from the event E_net(W_1) that
|S′| ≥ µ_min(ε/3) · (n/2) ≥ 2n^{−ς} · (n/2) = n^{1−ς},
where the second inequality uses (12). This implies that
|W_2| ≥ |S| ≥ |S′| ≥ n^{1−ς}.
In particular, by the feasibility assumption, we know n ≥ 100, and thus |W_2| ≥ 100^{3/4} > 31. Hence,
we have established that (W_1, W_2) is an appropriate input for Algorithm 1.
Now we will show that (W_gc, i_gc) is a cluster, starting with condition (i) from Definition 4.2. For any
pair {s_1, s_2} of distinct elements of S, the event E_cn(W_1) implies that
|N_{W_1}(s_1) ∩ N_{W_1}(s_2)| / n ≥ K(X_{s_1}, X_{s_2}) − n^{−1/2+ς}.
By the definition of S, both ‖X_{s_1} − X_{i_0}‖ and ‖X_{s_2} − X_{i_0}‖ are bounded by 2 C_gap d (L_p/ℓ_p) ε.
Together with the fact that the Lipschitz constant of K is bounded above by L_p √‖K‖_∞ (see Remark 3.4), we have
K(X_{s_1}, X_{s_2}) ≥ K(X_{i_0}, X_{i_0}) − 2 L_p √‖K‖_∞ · 2 C_gap d (L_p/ℓ_p) ε.
Combining the two inequalities above yields
(29) |N_{W_1}(s_1) ∩ N_{W_1}(s_2)| / n ≥ K(X_{i_0}, X_{i_0}) − 2 L_p √‖K‖_∞ · 2 C_gap d (L_p/ℓ_p) ε − n^{−1/2+ς},
for any pair {s_1, s_2} of distinct elements of S. Since S has at least n^{1−ς} > 2 elements and because
S ⊆ W_2, we have the following bound:
(30) max_{{i′,j′} ⊆ W_2} |N_{W_1}(i′) ∩ N_{W_1}(j′)| / n ≥ K(X_{i_0}, X_{i_0}) − 4 √‖K‖_∞ C_gap d (L_p²/ℓ_p) ε − n^{−1/2+ς}.
Now consider the graph G_gc from Algorithm 1. Observe that a pair {i, j} of distinct elements of W_2
appears as an edge in G_gc if and only if
(31) |N_{W_1}(i) ∩ N_{W_1}(j)| / n ≥ max_{{i′,j′} ⊆ W_2} |N_{W_1}(i′) ∩ N_{W_1}(j′)| / n − (1/2) c_1 η².
Consider any pair {i, j} which satisfies the above inequality. We claim that ‖X_i − X_j‖ < η.
Suppose, for the sake of contradiction, that ‖X_i − X_j‖ ≥ η. Then by E_cn(W_1) and Lemma 3.3,
we find
(32) |N_{W_1}(i) ∩ N_{W_1}(j)| / n ≤ K(X_i, X_j) + n^{−1/2+ς} ≤ K(X_{i_0}, X_{i_0}) − c_1 η² + n^{−1/2+ς},
where we also use that r_M ≥ η in the last inequality. This follows from the feasibility assumption:
r_M ≥ 24 C_gap d² r > r > δ > η. Combining (30), (31), and (32), we find the following inequality
4 √‖K‖_∞ C_gap d (L_p²/ℓ_p) ε + 2 n^{−1/2+ς} ≥ (1/2) c_1 η²,
which will be a contradiction if
(33) (1/4) c_1 η² ≥ 4 √‖K‖_∞ C_gap d (L_p²/ℓ_p) ε > 2 n^{−1/2+ς}.
This is precisely the reason why we impose the condition
η ≥ 4 ‖K‖_∞^{1/4} C_gap^{1/2} d^{1/2} L_p ε^{1/2} / (ℓ_p^{1/2} c_1^{1/2}) from (16)
(also (14)) and ε ≥ √‖K‖_∞ L_p n^{−1/2+ς} from (12). With these two conditions and C_gap d (L_p/ℓ_p) > 1,
(33) holds and hence a contradiction follows.
The argument above shows that every edge {i, j} in G_gc satisfies ‖X_i − X_j‖ < η. Since for each
i ∈ W_gc − {i_gc}, the pair {i_gc, i} is an edge in G_gc by construction, we have established (i).
It remains to establish (ii). Consider any pair {i′, j′} of distinct elements of W_2. Within the event
E_cn(W_1), we have
|N_{W_1}(i′) ∩ N_{W_1}(j′)| / n ≤ K(X_{i′}, X_{j′}) + n^{−1/2+ς} ≤ K(X_{i_0}, X_{i_0}) + n^{−1/2+ς}.
This shows that
K(X_{i_0}, X_{i_0}) ≥ max_{{i′,j′} ⊆ W_2} |N_{W_1}(i′) ∩ N_{W_1}(j′)| / n − n^{−1/2+ς}.
Using the above estimate with (29), we find that every pair {s_1, s_2} of distinct elements of S satisfies
|N_{W_1}(s_1) ∩ N_{W_1}(s_2)| / n ≥ max_{{i′,j′} ⊆ W_2} |N_{W_1}(i′) ∩ N_{W_1}(j′)| / n − 4 √‖K‖_∞ C_gap d (L_p²/ℓ_p) ε − 2 n^{−1/2+ς}
> max_{{i′,j′} ⊆ W_2} |N_{W_1}(i′) ∩ N_{W_1}(j′)| / n − (1/2) c_1 η²,
where the last inequality follows from the parameter assumptions. This implies that every pair
{s_1, s_2} of distinct elements of S is an edge in G_gc. Therefore, the maximum degree in G_gc is at least
|S| − 1 ≥ n^{1−ς} − 1, and thus |W_gc| ≥ n^{1−ς}. We have completed the proof.
As a consequence of the lemma above, we can use Algorithm 1 to find the first cluster.
Proposition 4.5. Assume that the parameters are feasible. For any subset W ⊆ V with |W | = n,
within the events Enet (W ) and Ecn (W ), Algorithm 1 with input V1 = W and V2 = W returns a
cluster (Wgc , igc ) with igc ∈ Wgc ⊆ W .
Proof. It is straightforward to check that (W, W ) is a good pair. This proposition therefore follows
immediately by applying Lemma 4.4.
5. Regularity of p_α
In the previous section, we described how we can use Algorithm 1 to generate a cluster. When
we have a collection of clusters (V_α, i_α), indexed by α, we can define a corresponding collection of
functions p_α : M → R, where p_α is an approximation of the function q ↦ p(‖q − X_{i_α}‖) obtained by
averaging p(‖q − X_i‖) for i ∈ V_α. Intuitively, due to the concentration of measure, for any given
i ∈ V, we expect
p(‖X_i − X_{i_α}‖) ≃ p_α(X_i) := Σ_{j∈V_α} p(‖X_i − X_j‖) / |V_α| ≃ |N_i ∩ V_α| / |V_α|,
which allows us to estimate ‖X_i − X_{i_α}‖. The goal of this section is to derive a regularity property of
p_α, stated in Lemma 5.3.
Throughout this section, let us fix some nonempty finite index set A. We shall begin by defining
what we call the cluster event, within which we have a collection {(Vα , iα )}α∈A of clusters with the
desired property.
Definition 5.1. For a collection of pairs {(V_α, i_α)}_{α∈A} with i_α ∈ V_α ⊆ V, the cluster event of
the collection {(V_α, i_α)}_{α∈A} is defined as
E_clu({(V_α, i_α)}_{α∈A}) := { for each α ∈ A, the pair (V_α, i_α) is a cluster }.
The discussion in this section is within the event E_clu({(V_α, i_α)}_{α∈A}).
We also assume that the parameters are feasible throughout this section.
Definition 5.2. For each α ∈ A, define the function p_α : M → R by
p_α(q) := Σ_{j∈V_α} p(‖q − X_j‖) / |V_α|.
The lemma below describes a useful regularity property of the function pα , which we are going
to use in Sections 6 and 7.
Lemma 5.3 (Regularity of p_α). Fix an index α ∈ A and assume the event E_clu((V_α, i_α)). Suppose
q ∈ M satisfies 0.9r ≤ ‖q − X_{i_α}‖ ≤ 2r. Suppose that σ ∈ {−1, +1}, 16η/r ≤ τ ≤ 1, and
u ∈ S^{N−1} = {x ∈ R^N : ‖x‖ = 1} satisfy
⟨u, σ (q − X_{i_α}) / ‖q − X_{i_α}‖⟩ = τ,
where ⟨·, ·⟩ denotes the usual Euclidean inner product in R^N. Then we have the following items.
(a) For every i ∈ V_α,
τ/2 ≤ ⟨u, σ (q − X_i) / ‖q − X_i‖⟩ ≤ 3τ/2.
(b) If σ = +1, then for any 0 ≤ t < 0.1τr,
p_α(q) − 2 L_p τ t ≤ p_α(q + tu) ≤ p_α(q) − (1/4) ℓ_p τ t.
(c) If σ = −1, then for any 0 ≤ t < 0.1τr,
p_α(q) + (1/4) ℓ_p τ t ≤ p_α(q + tu) ≤ p_α(q) + 2 L_p τ t.
Proof. (a) Take an arbitrary i ∈ V_α. First, since ‖X_i − X_{i_α}‖ < η, we have
|⟨u, σ(q − X_i)⟩ − τ ‖q − X_{i_α}‖| = |⟨u, σ(X_{i_α} − X_i)⟩ + ⟨u, σ(q − X_{i_α})⟩ − τ ‖q − X_{i_α}‖|
(34) = |⟨u, σ(X_{i_α} − X_i)⟩| < η.
Together with |‖q − X_{i_α}‖ − ‖q − X_i‖| ≤ ‖X_i − X_{i_α}‖ < η (the triangle inequality), we find
|⟨u, σ (q − X_i)/‖q − X_i‖⟩ − τ|
≤ |⟨u, σ(q − X_i)⟩ − τ ‖q − X_{i_α}‖| / ‖q − X_i‖ + η / ‖q − X_i‖
≤ 2η / ‖q − X_i‖ ≤ 2η / (‖q − X_{i_α}‖ − η) ≤ 4η / r,
where the second inequality uses (34), and the last inequality follows from the coarse estimate
η ≤ r/C_gap ≤ 0.1r (by (17)) and the assumption ‖q − X_{i_α}‖ ≥ 0.9r.
With the assumption that τ ≥ 16η/r, part (a) of the lemma follows.
(b) Now we are ready to prove the second statement. To bound
p_α(q + tu) = Σ_{i∈V_α} p(‖q + tu − X_i‖) / |V_α|,
we first estimate ‖q + tu − X_i‖. Using part (a) with σ = +1,
‖q + tu − X_i‖ ≤ (‖q − X_i‖ + (3τ/2) t) (1 + t² / (2 (‖q − X_i‖ + (3τ/2) t)²))
(35) ≤ ‖q − X_i‖ + (3τ/2) t + t² / ‖q − X_i‖.
Moreover, by the assumption ‖q − X_{i_α}‖ ≥ 0.9r,
t / ‖q − X_i‖ ≤ 0.1τr / (‖q − X_{i_α}‖ − ‖X_{i_α} − X_i‖) ≤ 0.1τr / (0.9r − η) ≤ 0.1τr / (r/2) = 0.2τ.
Substituting the above estimate into (35), we get ‖q + tu − X_i‖ ≤ ‖q − X_i‖ + 1.7τt. To summarize,
we have obtained
(36) ‖q − X_i‖ + 0.5τt ≤ ‖q + tu − X_i‖ ≤ ‖q − X_i‖ + 1.7τt.
Since the above expression is valid for every i ∈ V_α, we have
p_α(q + tu) = Σ_{i∈V_α} p(‖q + tu − X_i‖) / |V_α|
≤ Σ_{i∈V_α} p(‖q − X_i‖ + 0.5τt) / |V_α|    (by (36))
≤ Σ_{i∈V_α} p(‖q − X_i‖) / |V_α| − 0.5 ℓ_p τ t
= p_α(q) − 0.5 ℓ_p τ t,
where we applied (8) in the last inequality, and it is valid since, by the upper bound ‖q − X_{i_α}‖ ≤ 2r,
the parameter assumptions, and the feasibility assumption, we have
‖q − X_i‖ + 0.5τt ≤ ‖q − X_{i_α}‖ + η + (0.5τ)(0.1τr) ≤ 2r + η + r ≤ r_M.
For the lower bound of p_α(q + tu), we use the Lipschitz constant of p:
p_α(q + tu) = Σ_{i∈V_α} p(‖q + tu − X_i‖) / |V_α|
≥ Σ_{i∈V_α} p(‖q − X_i‖ + 1.7τt) / |V_α|    (by (36))
≥ Σ_{i∈V_α} p(‖q − X_i‖) / |V_α| − 1.7 L_p τ t
= p_α(q) − 1.7 L_p τ t.
As a consequence, we have obtained a slightly stronger estimate than the one stated in the lemma.
(c) In the case σ = −1, we can estimate in the same way to get
‖q − X_i‖ − (1/4) τt ≥ ‖q + tu − X_i‖ ≥ ‖q − X_i‖ − (3/2) τt,
for any i ∈ V_α. The statement for this case follows by applying the same estimate for p_α(q + tu).
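The inequalities in part (b) can be sanity-checked numerically. The following sketch fixes concrete admissible values (p(s) = e^{−s} on the relevant range, and our own choices of η, r, τ, t in d = 2) rather than the lemma's full generality:

```python
import numpy as np

rng = np.random.default_rng(3)

# Concrete admissible parameters (our choices, not from the paper):
p = lambda s: np.exp(-s)               # ell_p <= |p'| <= L_p on [0, 3]
ell_p, L_p = np.exp(-3), 1.0
eta, r, tau, t = 0.001, 1.0, 0.5, 0.04  # t < 0.1*tau*r and tau >= 16*eta/r

# Cluster V_alpha around X_{i_alpha} = 0, a point q with 0.9r <= ||q|| <= 2r,
# and a unit vector u with <u, q/||q||> = tau (so sigma = +1).
V = rng.uniform(-eta / 2, eta / 2, size=(500, 2))
q = np.array([1.2, 0.0])
u = np.array([tau, np.sqrt(1 - tau**2)])

p_alpha = lambda x: p(np.linalg.norm(V - x, axis=1)).mean()
# Lemma 5.3(b): the drop lies between (1/4)*ell_p*tau*t and 2*L_p*tau*t.
drop = p_alpha(q) - p_alpha(q + t * u)
```

The observed drop of p_α falls comfortably inside the interval predicted by the lemma.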
Throughout this section, we fix k ∈ [0, d − 1]. Consider a collection of pairs (V_α, i_α) with
i_α ∈ V_α ⊆ V for α ∈ [0, k], and a new batch W of vertices,
W ⊆ V \ ∪_{α=0}^{k} V_α,
such that |W| = n.
The discussion in this section is within the event
$$E_{\mathrm{ortho}} = E_{\mathrm{ortho}}\bigl(\{(V_\alpha, i_\alpha)\}_{\alpha\in[0,k]}, W\bigr) := E_{\mathrm{ao}}(i_0,\dots,i_k) \cap E_{\mathrm{clu}}\bigl(\{(V_\alpha,i_\alpha)\}_{\alpha\in[0,k]}\bigr) \cap E_{\mathrm{navi}}\bigl(\{V_\alpha\}_{\alpha\in[0,k]}, W\bigr) \cap E_{\mathrm{cn}}(W) \cap E_{\mathrm{net}}(W). \tag{38}$$
Consider the subset $W' \subseteq W$ defined as
$$W' := \Bigl\{ i \in W : \forall \alpha \in [k],\ p(\sqrt{2}r + 0.95\delta) \le |N(i)\cap V_\alpha|/|V_\alpha| \le p(\sqrt{2}r - 0.95\delta) \ \text{and}\ p(r + 0.95\delta) \le |N(i)\cap V_0|/|V_0| \le p(r - 0.95\delta) \Bigr\}. \tag{39}$$
Recall that $N(i)$ denotes the set of neighbors of $i$ in $\mathcal{V}$; thus $N(i)\cap V_\alpha$ is the same set as $N_{V_\alpha}(i)$. Observe that, given a collection of vertex sets $\{V_\alpha\}_{\alpha\in[0,k]}$ and a vertex set $W$, we can determine the vertex set $W'$ by examining the structure of the graph $G$. The primary objective of this section is to demonstrate that, conditioned on the event $E_{\mathrm{ortho}}$, it is possible to identify a specific cluster $(W_{\mathrm{gc}}, i_{\mathrm{gc}})$ within $W'$. Furthermore, the index $i_{\mathrm{gc}}$, in conjunction with the sequence of indices $(i_0,\dots,i_k)$, will be shown to satisfy the event $E_{\mathrm{ao}}(i_0,\dots,i_k,i_{\mathrm{gc}})$. The main result of this section is the following:
Proposition 6.3. For $k \in [0, d-1]$, let $(V_0, i_0), \dots, (V_k, i_k)$ be $k+1$ pairs with $i_\alpha \in V_\alpha \subseteq \mathcal{V}$, and let $W \subseteq \mathcal{V}$ be a subset of size $|W| = n$ which is disjoint from $\cup_{\alpha\in[0,k]} V_\alpha$. Let $W'$ be the set defined in (39). Given the occurrence of the event $E_{\mathrm{ortho}}(\{(V_\alpha, i_\alpha)\}, W)$ described in (38), if we run Algorithm 1 with input $(W, W')$, then the output $(W_{\mathrm{gc}}, i_{\mathrm{gc}})$ is a cluster and $(i_0,\dots,i_k,i_{\mathrm{gc}})$ satisfies the event $E_{\mathrm{ao}}(i_0,\dots,i_k,i_{\mathrm{gc}})$. In other words,
$$E_{\mathrm{ortho}}\bigl(\{(V_\alpha, i_\alpha)\}, W\bigr) \subseteq E_{\mathrm{clu}}(W_{\mathrm{gc}}, i_{\mathrm{gc}}) \cap E_{\mathrm{ao}}(i_0,\dots,i_k,i_{\mathrm{gc}}).$$
In this section, we will first prove Proposition 6.3 and then prove the corollary. Before we proceed, let us introduce some additional notation to be used in this section. Without loss of generality, assume
$$X_{i_0} = \vec{0} \in \mathbb{R}^N,$$
and recall from (20) that
$$H_{i_0} = T_{X_{i_0}} M. \tag{40}$$
Let
$$P : \mathbb{R}^N \to H_{i_0}$$
be the orthogonal projection onto $H_{i_0}$. For each point $q \in M$, we write
$$q = q^\top + q^\perp,$$
where $q^\top = Pq$ and $q^\perp = q - q^\top$. We apply Proposition A.4 with $p = \vec{0}$, $H = H_{i_0}$, and $\zeta = 1$. Together with $r_M \le 0.01/\kappa$ from its definition (6), there exists a local inverse of $P$,
$$\varphi : B(\vec{0}, r_M) \cap H_{i_0} \to M,$$
such that
$$P \circ \varphi = \mathrm{Id}_{B(\vec{0}, r_M)\cap H_{i_0}}. \tag{41}$$
This will be obtained by showing that both spaces form only a small angle with $H_{i_0}$.
Lemma 6.8. Consider $i_{W'} \in W'$ and adopt the notation of Lemma 6.7. Recall that $P$ is the orthogonal projection onto $H_{i_0}$ and $u^\perp = u - Pu$ for every $u \in \mathbb{R}^N$. Within the event $E_{\mathrm{ortho}}$, we have
$$\max_{u \in H_Y : \|u\|=1} \|u^\perp\| \le \frac{1}{4C_{\mathrm{gap}}}.$$
Notice that
$$\max_{u \in H_Y : \|u\|=1} \|u^\perp\| = \max_{v \in \mathbb{R}^{[0,k]} : \|v\|=1} \Bigl\| P_{H_{i_0}^\perp} \frac{Yv}{\|Yv\|} \Bigr\| \le \frac{\|P_{H_{i_0}^\perp} Y\|_{\mathrm{op}}}{s_{\min}(Y)}, \tag{55}$$
where $P_{H_{i_0}^\perp}$ is the orthogonal projection onto $H_{i_0}^\perp$, $Y$ is the $N \times (k+1)$ matrix with columns $(Y_0, Y_1, \dots, Y_k)$, and $s_{\min}(Y)$ is the least singular value of $Y$. With $k + 1 \le d \le N$, we have
$$s_{\min}(Y) = \sqrt{s_{\min}(Y^T Y)},$$
and $(Y^T Y)_{\alpha,\beta} = \langle Y_\alpha, Y_\beta\rangle$ for $\alpha, \beta \in [0, k]$.
Lemma 6.9. Consider $i_{W'} \in W'$ and adopt the notation of Lemma 6.7. Within the event $E_{\mathrm{ortho}}$, for $\alpha, \beta \in [0, k]$,
$$\bigl|\langle Y_\alpha, Y_\beta\rangle - r^2 - r^2\,\mathbb{1}(\alpha = \beta \neq 0)\bigr| \le 15r\delta,$$
where $\mathbb{1}(\alpha = \beta \neq 0)$ is the indicator function which equals $1$ when $\alpha = \beta = s$ for some $s \in [k]$.
Proof. By definition,
$$\langle Y_\alpha, Y_\beta\rangle = \langle X_{i_{W'}}, X_{i_{W'}}\rangle - \langle X_{i_\alpha}, X_{i_{W'}}\rangle - \langle X_{i_\beta}, X_{i_{W'}}\rangle + \langle X_{i_\alpha}, X_{i_\beta}\rangle. \tag{56}$$
Now we estimate the four terms on the right-hand side of the above expression. We start with the expression $\langle X_{i_\alpha}, X_{i_\beta}\rangle$. For distinct $\alpha, \beta \in [k]$, using our assumptions $\bigl|\|X_{i_\alpha}\| - r\bigr| \le \delta$ and $\bigl|\|X_{i_\alpha} - X_{i_\beta}\| - \sqrt{2}r\bigr| \le \delta$ from the event $E_{\mathrm{ao}}(i_0,\dots,i_k)$, we find
$$\begin{aligned}
\bigl|\langle X_{i_\alpha}, X_{i_\beta}\rangle\bigr| &= \frac{1}{2}\bigl|\|X_{i_\alpha}\|^2 + \|X_{i_\beta}\|^2 - \|X_{i_\alpha} - X_{i_\beta}\|^2\bigr| \\
&\le \frac{1}{2}\Bigl(\bigl|\|X_{i_\alpha}\|^2 - r^2\bigr| + \bigl|\|X_{i_\beta}\|^2 - r^2\bigr| + \bigl|\|X_{i_\alpha} - X_{i_\beta}\|^2 - 2r^2\bigr|\Bigr) \\
&\le \frac{1}{2}\bigl(2r\delta + \delta^2 + 2r\delta + \delta^2 + 2\sqrt{2}r\delta + \delta^2\bigr) < 4r\delta, 
\end{aligned} \tag{57}$$
where the last inequality relies on the coarse bound $\delta^2 \le 0.1r\delta$, which follows from the parameter assumption (17). Similarly, when $\alpha = \beta \in [k]$, we have
and $\widetilde{B} \in \mathbb{R}^{(k+1)\times(k+1)}$ is a matrix whose entries are uniformly bounded above by $15r\delta$ in absolute value.
Notice that $Q^{-1} = (\tilde q_{ij})_{i,j\in[0,k]}$ is simply the matrix such that, for $i, j \in [0, k]$,
$$\tilde q_{ij} = \begin{cases} 1 & \text{if } i = j, \\ -1 & \text{if } i = 0 \text{ and } j \ge 1, \\ 0 & \text{otherwise.} \end{cases}$$
For example, when $k = 3$,
$$Q = \begin{pmatrix} 1 & 1 & 1 & 1 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \quad\text{and}\quad Q^{-1} = \begin{pmatrix} 1 & -1 & -1 & -1 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}.$$
Thus, we have the following upper bound on the operator norm$^2$ of $Q^{-1}$:
$$\|Q^{-1}\|_{\mathrm{op}} \le \sqrt{\sum_{i,j\in[0,k]} \tilde q_{ij}^2} = \sqrt{2k+1} < \sqrt{2(k+1)}.$$
Therefore,
$$s_{\min}(r^2 Q^T Q) = r^2 \cdot s_{\min}(Q)^2 \ge \frac{r^2}{2(k+1)}.$$
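Since $Q$ is completely explicit, the norm bounds above are easy to check numerically. The following is a small sanity check (illustrative, not part of the paper; `numpy` assumed available), including the footnote's exact value of $\|Q^{-1}\|_{\mathrm{op}}$:

```python
import numpy as np

def make_Q(k):
    # Q has first row all ones and the identity below, as in the k = 3 example.
    Q = np.eye(k + 1)
    Q[0, :] = 1.0
    return Q

for k in range(1, 10):
    Q = make_Q(k)
    Qinv = np.linalg.inv(Q)
    op = np.linalg.norm(Qinv, 2)        # operator (spectral) norm
    frob = np.linalg.norm(Qinv, 'fro')  # Frobenius norm = sqrt(2k+1) here
    smin = np.linalg.svd(Q, compute_uv=False)[-1]
    assert abs(frob - np.sqrt(2 * k + 1)) < 1e-9
    assert op <= frob + 1e-9                        # op norm <= Frobenius norm
    assert smin ** 2 >= 1 / (2 * (k + 1)) - 1e-9    # bound used for s_min(r^2 Q^T Q)
    # exact operator norm from the footnote:
    exact = np.sqrt((k + 2 + np.sqrt(k ** 2 + 4 * k)) / 2)
    assert abs(op - exact) < 1e-9
```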
Since we have the following bound on the operator norm of $\widetilde B$:
$$\|\widetilde B\|_{\mathrm{op}} \le \sqrt{\sum_{i,j\in[0,k]} \widetilde B_{ij}^2} \le (k+1)\cdot 15r\delta = 15(k+1)\frac{r^2}{C_{\mathrm{gap}} d^2} \le \frac{r^2}{4(k+1)},$$
we conclude that
$$s_{\min}(B) \ge \frac{r^2}{2(k+1)} - \frac{r^2}{4(k+1)} = \frac{r^2}{4(k+1)}.$$
Remark 6.11. As an immediate consequence of Lemmas 6.9 and 6.10, within the event $E_{\mathrm{ortho}}$ we have
$$s_{\min}(Y) = \sqrt{s_{\min}(Y^T Y)} \ge \frac{r}{2\sqrt{k+1}}.$$
Now we are ready to prove Lemma 6.8.
Proof of Lemma 6.8. By (55), we have
$$\max_{u\in H_Y:\|u\|=1} \|u^\perp\| \le \frac{\|P_{H_{i_0}^\perp} Y\|_{\mathrm{op}}}{s_{\min}(Y)}. \tag{58}$$
Let us first derive an upper bound for $\|P_{H_{i_0}^\perp} Y\|_{\mathrm{op}}$. Using $\|X_{i_{W'}}\| = \|X_{i_{W'}} - X_{i_0}\| \le r + \delta \le 2r < r_M$ from (52) and the feasibility assumptions (18), we find that $X_{i_{W'}}$ is contained in the image of $\varphi$ by (42). With the properties of $\varphi$ (see (41) and (43)) we obtain
$$\|P_{H_{i_0}^\perp} X_{i_{W'}}\| = \|X_{i_{W'}}^\perp\| \le \kappa\|X_{i_{W'}}^\top\|^2 \le \kappa\|X_{i_{W'}}\|^2 \le 2^{-4}\delta,$$
$^2$Because the matrix $Q$ is particularly nice, we can actually compute this operator norm explicitly. A more careful analysis yields
$$\|Q^{-1}\|_{\mathrm{op}} = \sqrt{\frac{k + 2 + \sqrt{k^2 + 4k}}{2}}.$$
This shows that our upper bound is at most a multiplicative factor of $\sqrt{2}$ away from the actual value, which is good enough for our purpose. Thus, instead of showing tedious calculation details, we decided to simply use this bound.
where we once again use $r_M \le 0.01/\kappa$ from (18). Furthermore, the same estimate also holds for $X_{i_\alpha}$, since $\|X_{i_\alpha}\| = \|X_{i_\alpha} - X_{i_0}\| \le 2r$. Hence, we conclude that
$$\|Y_\alpha^\perp\| = \|(X_{i_{W'}} - X_{i_\alpha})^\perp\| \le \|X_{i_{W'}}^\perp\| + \|X_{i_\alpha}^\perp\| \le 2^{-3}\delta, \tag{59}$$
for any $\alpha \in [0, k]$. Therefore, we conclude that $P_{H_{i_0}^\perp} Y = (P_{H_{i_0}^\perp} Y_0, \dots, P_{H_{i_0}^\perp} Y_k)$ is a matrix with columns whose Euclidean norms are bounded by $2^{-3}\delta$, and hence
$$\|P_{H_{i_0}^\perp} Y\|_{\mathrm{op}} \le \sqrt{\sum_{\alpha\in[0,k]} \|P_{H_{i_0}^\perp} Y_\alpha\|^2} \le 2^{-3}\delta\sqrt{k+1}.$$
Substituting this bound and $s_{\min}(Y) \ge \frac{r}{2\sqrt{k+1}}$ from Remark 6.11 into (58), we conclude that
$$\max_{u\in H_Y:\|u\|=1} \|u^\perp\| \le \frac{\delta(k+1)}{4r} \le \frac{\delta d}{4r} \le \frac{1}{4C_{\mathrm{gap}}},$$
where the last inequality follows from the parameter assumption (17).
Proof of Lemma 6.7. Recall from (52) that $\|X_{i_{W'}} - X_{i_0}\| \le r + \delta \le r_M \le 0.01/\kappa$. Hence, Corollary A.5 gives
$$d(H_{i_0}, H_{i_{W'}}) \le 2\kappa\cdot\|q_0 - p\| \le 2\kappa\cdot 2r \le \frac{0.04}{C_{\mathrm{gap}} d^2} \le \frac{0.04}{C_{\mathrm{gap}}}. \tag{60}$$
Let $\widetilde H_Y$ be a $d$-dimensional subspace containing $H_Y$ that is contained in the span of $H_Y$ and $H_{i_0} \cap H_Y^\perp$. By doing this, we have
$$d(\widetilde H_Y, H_{i_0}) \le \frac{1}{4C_{\mathrm{gap}}}. \tag{61}$$
Using the triangle inequality, (61), and (60), we obtain
6.2.2. Existence of $p$. The goal of Section 6.2.2 is to establish the following proposition:
Proposition 6.12. With the occurrence of the event $E_{\mathrm{ortho}}$, for any $i_{W'} \in W'$, there exists a point $p \in M$ satisfying $\|X_{i_{W'}} - p\| \le 2^8 d\frac{L_p}{\ell_p}\varepsilon$ so that the following holds:
$$\{j \in W : X_j \in B(p, \varepsilon)\} \subseteq W'.$$
Before we proceed, let us set up additional notation. Here we fix $i_{W'} \in W'$. For $\alpha \in [k]$, let
$$\sigma_\alpha := \mathrm{sign}\bigl(p_\alpha(p) - p(\sqrt{2}r)\bigr) \quad\text{and}\quad \sigma_0 := \mathrm{sign}\bigl(p_0(p) - p(r)\bigr). \tag{62}$$
Here, the function $\mathrm{sign} : \mathbb{R} \to \{-1, +1\}$ is defined as
$$\mathrm{sign}(x) := \begin{cases} +1 & \text{if } x \ge 0, \\ -1 & \text{if } x < 0, \end{cases} \tag{63}$$
for each $x \in \mathbb{R}$. In particular, by our convention, sign outputs either $-1$ or $+1$, but never $0$.
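This convention differs from the usual sign function, which maps $0$ to $0$. For concreteness, a one-line version of (63):

```python
def sign(x: float) -> int:
    """Sign convention of (63): +1 for x >= 0 (including 0), -1 otherwise."""
    return 1 if x >= 0 else -1

assert sign(0.0) == 1   # never 0, unlike math.copysign-based variants
assert sign(-2.5) == -1
```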
The key ingredient is the following lemma.
Lemma 6.13. With the occurrence of $E_{\mathrm{ortho}}$, for any $i_{W'} \in W'$, there exists a point $p \in M$ with $\|p - X_{i_{W'}}\| \le 2^8 d\frac{L_p}{\ell_p}\varepsilon$ such that
$$\forall\alpha\in[k],\quad p(\sqrt{2}r + 0.95\delta) + 2L_p\varepsilon \le p_\alpha(p) \le p(\sqrt{2}r - 0.95\delta) - 2L_p\varepsilon,$$
and
$$p(r + 0.95\delta) + 2L_p\varepsilon \le p_0(p) \le p(r - 0.95\delta) - 2L_p\varepsilon.$$
Given the above lemma, let us derive the proof of Proposition 6.12.
Proof of Proposition 6.12. Suppose that $j \in W$ satisfies $\|X_j - p\| < \varepsilon$. Then by using the Lipschitz constant $L_p$ and the triangle inequality, we obtain, for each $\alpha \in [0, k]$,
$$|p_\alpha(X_j) - p_\alpha(p)| \le \sum_{i\in V_\alpha} \frac{|p(\|X_j - X_i\|) - p(\|p - X_i\|)|}{|V_\alpha|} \le \sum_{i\in V_\alpha} \frac{L_p\cdot\bigl|\|X_j - X_i\| - \|p - X_i\|\bigr|}{|V_\alpha|} \le \sum_{i\in V_\alpha} \frac{L_p\cdot\|X_j - p\|}{|V_\alpha|} < L_p\varepsilon. \tag{64}$$
Next, $E_{\mathrm{navi}}$ and the parameter assumptions give, for each $\alpha\in[0,k]$,
$$\Bigl|\frac{|N(j)\cap V_\alpha|}{|V_\alpha|} - p_\alpha(X_j)\Bigr| \le n^{-1/2+\varsigma} < L_p\varepsilon. \tag{65}$$
Combining (64), (65), and the properties of $p$ from Lemma 6.13, we find that for each $\alpha\in[k]$,
$$p(\sqrt{2}r + 0.95\delta) \le \frac{|N(j)\cap V_\alpha|}{|V_\alpha|} \le p(\sqrt{2}r - 0.95\delta),$$
and
$$p(r + 0.95\delta) \le \frac{|N(j)\cap V_0|}{|V_0|} \le p(r - 0.95\delta),$$
whence $j \in W'$.
Proof. Let $\widetilde H_Y$ and $H_{i_{W'}}$ be the subspaces introduced in Lemma 6.7. With the occurrence of the event $E_{\mathrm{ortho}}$, we can apply the lemma to get
$$d(\widetilde H_Y, H_{i_{W'}}) \le \frac{1}{C_{\mathrm{gap}}}.$$
Observe that $X_{i_{W'}} = X_{i_{W'}} - X_{i_0} = Y_0 \in H_Y \subseteq \widetilde H_Y$. By applying Proposition A.4 (with $X_{i_{W'}}$, $\widetilde H_Y$, $P_{\widetilde H_Y}$, and $1 - 1/C_{\mathrm{gap}}$ playing the roles of $p$, $H$, $P_H$, and $\zeta$ in the proposition, respectively), we find that there exists a local inverse map
$$\varphi_Y : B\bigl(X_{i_{W'}}, 0.09(1 - 1/C_{\mathrm{gap}})^2/\kappa\bigr) \cap \widetilde H_Y \to M$$
such that $P_{\widetilde H_Y}(\varphi_Y(x)) = x$ for every point $x \in B\bigl(X_{i_{W'}}, 0.09(1 - 1/C_{\mathrm{gap}})^2/\kappa\bigr) \cap \widetilde H_Y$. Since
$$0.09(1 - 1/C_{\mathrm{gap}})^2/\kappa \ge 0.01/\kappa \overset{(6)}{\ge} r_M,$$
we may restrict the domain of $\varphi_Y$ to $B(X_{i_{W'}}, r_M) \cap \widetilde H_Y$. Furthermore, Proposition A.4(c) gives
$$\|\varphi_Y(x) - X_{i_{W'}}\| \le 2\|x - X_{i_{W'}}\|, \tag{66}$$
for any $x \in B(X_{i_{W'}}, r_M) \cap \widetilde H_Y$.
Let $\varpi \in H_Y$ be the vector satisfying
$$\forall\alpha\in[0,k],\quad \Bigl\langle \sigma_\alpha\frac{Y_\alpha}{\|Y_\alpha\|}, \varpi\Bigr\rangle = 12\frac{L_p}{\ell_p}\varepsilon. \tag{67}$$
Equivalently, $\varpi \in H_Y$ is the vector such that $(Y^T\varpi)_\alpha = \sigma_\alpha\, 12\frac{L_p}{\ell_p}\varepsilon\,\|Y_\alpha\|$ for every $\alpha\in[0,k]$. The existence (and uniqueness) of $\varpi \in H_Y = \mathrm{Im}(Y)$ follows from Remark 6.11, which shows that the matrix $Y$ has full rank. By Lemma 6.10 and the coarse bound $\|Y_\alpha\| \le 2r$ for $\alpha\in[0,k]$ (see (52)),
$$\|\varpi\| \le \Bigl(\frac{r}{2\sqrt{k+1}}\Bigr)^{-1}\|Y^T\varpi\| \le \Bigl(\frac{r}{2\sqrt{k+1}}\Bigr)^{-1}\sqrt{k+1}\cdot 12\frac{L_p}{\ell_p}\varepsilon\cdot 2r \le 48d\frac{L_p}{\ell_p}\varepsilon. \tag{68}$$
We take $p \in M$ to be the following point:
$$p := \varphi_Y(X_{i_{W'}} + \varpi),$$
and claim that $p$ satisfies the desired property.
To begin with, by (66) and (68), $p$ is not too far from $X_{i_{W'}}$:
$$\|p - X_{i_{W'}}\| \le 2\|\varpi\| \le 96d\frac{L_p}{\ell_p}\varepsilon < 2^8 d\frac{L_p}{\ell_p}\varepsilon.$$
Next, note that from (67), since $12\frac{L_p}{\ell_p}\varepsilon > 0$, we know that $\varpi \neq \vec{0}$. Because $\varphi_Y$ is a diffeomorphism, we then have $p \neq X_{i_{W'}}$. Hence, it makes sense to consider $u := \frac{p - X_{i_{W'}}}{\|p - X_{i_{W'}}\|}$. Fix $\alpha \in [0, k]$ and define
$$\tau := \Bigl\langle u, \sigma_\alpha\frac{Y_\alpha}{\|Y_\alpha\|}\Bigr\rangle.$$
Observe that
$$\tau = \Bigl\langle \frac{p - (X_{i_{W'}} + \varpi) + \varpi}{\|p - X_{i_{W'}}\|}, \sigma_\alpha\frac{Y_\alpha}{\|Y_\alpha\|}\Bigr\rangle = \Bigl\langle \frac{\varpi}{\|p - X_{i_{W'}}\|}, \sigma_\alpha\frac{Y_\alpha}{\|Y_\alpha\|}\Bigr\rangle \overset{(67)}{=} \frac{1}{\|p - X_{i_{W'}}\|}\cdot 12\frac{L_p}{\ell_p}\varepsilon, \tag{69}$$
where the middle equality follows from the fact that $Y_\alpha \in \widetilde H_Y$ and $p - (X_{i_{W'}} + \varpi) \perp \widetilde H_Y$ since
6.3. Finding a cluster in $W'$. It remains to prove the main statement of this section.
Proof of Proposition 6.3. Let us check that $(W, W')$ is a good pair. Lemma 6.5 shows that $W' \neq \emptyset$. It is clear by construction that $W' \subseteq W$ and $|W| = n$. Now for any $i \in W'$, by Lemma 6.13 and Proposition 6.12, there exists $p \in M$ with
$$\|X_i - p\| \le 2^8 d\frac{L_p}{\ell_p}\varepsilon < C_{\mathrm{gap}} d\frac{L_p}{\ell_p}\varepsilon,$$
such that
$$\{j \in W : \|X_j - p\| < \varepsilon\} \subseteq W'.$$
This implies that $(W, W')$ is a good pair. Therefore, by Lemma 4.4, $(W_{\mathrm{gc}}, i_{\mathrm{gc}})$ is a cluster. Further, since $i_{\mathrm{gc}} \in W_{\mathrm{gc}} \subseteq W'$, for $\alpha \in [k]$ we have
$$p(\sqrt{2}r + 0.95\delta) \le |N(i_{\mathrm{gc}})\cap V_\alpha|/|V_\alpha| \le p(\sqrt{2}r - 0.95\delta).$$
Within the event $E_{\mathrm{navi}}(\{V_\alpha\}_{\alpha\in[0,k]}, W) \supseteq E_{\mathrm{ortho}}$,
$$\le \bigl|\|X_i - X_{i_\alpha}\| - \|X_{i_{nx}} - X_{i_\alpha}\|\bigr| + \bigl|\|X_{i_{nx}} - X_{i_\alpha}\| - \|X_{i_0} - X_{i_\alpha}\|\bigr| + \bigl|\|X_{i_0} - X_{i_\alpha}\| - r\bigr|$$
Substituting the bound for $\|P_{H_{i_0}^\perp} Z\|_{\mathrm{op}}$ into (74), we conclude
$$d(H_Z, H_{i_0}) \le \frac{r}{24}\cdot\frac{2}{C_{\mathrm{gap}} r} \overset{(17)}{\le} \frac{2^{-3}}{C_{\mathrm{gap}}}.$$
(c) By Corollary A.5, we have
$$d(H_{i_0}, H_i) \le 2\kappa\|X_{i_0} - X_i\| \le 2\kappa\bigl(\underbrace{\|X_{i_0} - X_{i_{nx}}\|}_{E_{\mathrm{dist}}(i_{nx},i_0)} + \|X_{i_{nx}} - X_i\|\bigr) \le 2\kappa(\delta + 0.6\delta) \overset{(6)}{\le} \frac{2\cdot 0.01}{r_M}(\delta + 0.6\delta) \overset{(17),(18)}{<} 2^{-4}/C_{\mathrm{gap}}. \tag{76}$$
By the triangle inequality, we have
$$d(H_Z, H_i) \le d(H_Z, H_{i_0}) + d(H_{i_0}, H_i) \overset{(76),(b)}{\le} 2^{-2}/C_{\mathrm{gap}}.$$
$$\|Y^T u\| = \|Y^T P_{H_Y} u\| \ge \min_{v\in\mathrm{Im}\,Y:\|v\|=1}\|Y^T v\|\cdot\|P_{H_Y} u\| = s_{\min}(Y)\|P_{H_Y} u\| \ge \frac{r}{2}\cdot\frac{2}{3} \ge \frac{r}{3},$$
where the identity $\min_{v\in\mathrm{Im}\,Y:\|v\|=1}\|Y^T v\| = s_{\min}(Y)$ is a standard consequence of the singular value decomposition.
This implies that there exists $\alpha \in [d]$ such that
$$|\langle Y_\alpha, u\rangle| \ge \frac{r}{3\sqrt{d}}. \tag{80}$$
From now on, we fix such an $\alpha \in [d]$. Our plan is to bound $\|X_i - X_{i_{nx}}\|$ from above by $|p_\alpha(X_i) - p_\alpha(X_{i_{nx}})|$ using Lemma 5.3, and to bound the latter term by $\bigl|\frac{|N(i)\cap V_\alpha|}{|V_\alpha|} - \frac{|N(i_{nx})\cap V_\alpha|}{|V_\alpha|}\bigr|$ from the conditions of the events $E_{\mathrm{navi}}(\{V_\alpha\}_{\alpha\in[0,d]}, W)$ and $E_{\mathrm{navi}}(\{V_\alpha\}_{\alpha\in[d]}, \{i_{nx}\})$. Once this is completed, we will see that this provides the desired upper bound for $\|X_i - p\|$ from the definition of $W^\sharp$.
Let us set up the proper parameters and show that they satisfy the constraints from Lemma 5.3.
• First, we have
$$\|Y_\alpha\| \le \|X_{i_{nx}} - X_{i_0}\| + \|X_{i_0} - X_{i_\alpha}\| \le 0.6\delta + (r + \delta) \le 2r,$$
and similarly, $\|Y_\alpha\| \ge \|X_{i_0} - X_{i_\alpha}\| - \|X_{i_{nx}} - X_{i_0}\| \ge r - \delta - 0.6\delta \ge 0.9r$. Hence, $X_{i_{nx}}$ satisfies the condition of $q$ in Lemma 5.3.
• Second, let $\sigma \in \{-1,+1\}$ be the sign of $\langle Y_\alpha, u\rangle$ and define
$$\tau := \Bigl\langle u, \sigma\frac{Y_\alpha}{\|Y_\alpha\|}\Bigr\rangle.$$
We find that
$$\tau = \frac{1}{\|Y_\alpha\|}\cdot|\langle Y_\alpha, u\rangle| \overset{(80)}{\ge} \frac{1}{\|Y_\alpha\|}\cdot\frac{r}{3\sqrt{d}} \ge \frac{1}{6\sqrt{d}} \ge \frac{16\eta}{r}, \tag{81}$$
from the parameter assumptions. Hence, $\tau$ also satisfies the assumption in Lemma 5.3.
• Lastly, let $t := \|X_i - X_{i_{nx}}\|$ so that $X_{i_{nx}} + tu = X_i$. By (78), we have
$$t < 4\delta = \frac{4r}{C_{\mathrm{gap}} d^2} \le \frac{r}{100\sqrt{d}} \le 0.1\tau r,$$
where the last step uses $\tau \ge \frac{1}{10\sqrt{d}}$, which we know from (81). This shows that $t$ satisfies the condition in Lemma 5.3.
Now we can apply Lemma 5.3 to get
$$|p_\alpha(X_i) - p_\alpha(X_{i_{nx}})| = |p_\alpha(X_{i_{nx}} + tu) - p_\alpha(X_{i_{nx}})| \ge \frac{\tau\ell_p}{4}\|X_i - X_{i_{nx}}\| \ge \frac{\ell_p}{24\sqrt{d}}\|X_i - X_{i_{nx}}\|. \tag{82}$$
Within the events $E_{\mathrm{navi}}(\{V_\alpha\}_{\alpha\in[0,d]}, W)$ and $E_{\mathrm{navi}}(\{V_\alpha\}_{\alpha\in[d]}, \{i_{nx}\})$, we have
$$|p_\alpha(X_i) - p_\alpha(X_{i_{nx}})| \le \Bigl|\frac{|N(i)\cap V_\alpha|}{|V_\alpha|} - \frac{|N(i_{nx})\cap V_\alpha|}{|V_\alpha|}\Bigr| + 2n^{-1/2+\varsigma} \le c_3\delta + 2n^{-1/2+\varsigma} \le \frac{25}{24}c_3\delta \le \frac{25}{24}\cdot\frac{\ell_p}{C_{\mathrm{gap}}\sqrt{d}}\delta. \tag{83}$$
Combining (82) and (83), we conclude that $\|X_i - X_{i_{nx}}\| \le \frac{25}{C_{\mathrm{gap}}}\delta < 0.1\delta$.
(b) This follows immediately from part (a) and the triangle inequality:
$$\|X_i - X_{i_0}\| \le \|X_i - X_{i_{nx}}\| + \|X_{i_{nx}} - X_{i_0}\| \le 0.1\delta + 0.6\delta = 0.7\delta.$$
We have finished the proof.
7.3. (W, W ♯ ) is a good pair. We begin this subsection by using the second condition in the
definition of W ♯ to prove the following lemma, which gives a rough upper estimate for kXi0 −Xi k for
each i ∈ W ♯ . Later, we are going to obtain a better estimate for the same quantity in Lemma 7.4(b).
Lemma 7.6. Within the event Enext , we have |W ♯ | ≥ n1−ς .
Proof. Consider the set $S := \{i \in W : \|X_i - X_{i_{nx}}\| < \varepsilon\}$. Within the event $E_{\mathrm{net}}(W)$ (see (28)), we know $|S| \ge n^{1-\varsigma}$. We claim that $S \subseteq W^\sharp$.
Take any $i \in S$. Let us check the two conditions for $i$ to be in $W^\sharp$. First, within the events $E_{\mathrm{navi}}(\{V_\alpha\}_{\alpha\in[0,d]}, W)$ and $E_{\mathrm{navi}}(\{V_\alpha\}_{\alpha\in[d]}, \{i_{nx}\})$, for any $\alpha\in[d]$, we have
$$\Bigl|\frac{|N(i)\cap V_\alpha|}{|V_\alpha|} - \frac{|N(i_{nx})\cap V_\alpha|}{|V_\alpha|}\Bigr| \le \sum_{j\in V_\alpha}\frac{1}{|V_\alpha|}\bigl|p(\|X_i - X_j\|) - p(\|X_j - X_{i_{nx}}\|)\bigr| + 2n^{-1/2+\varsigma} \le L_p\varepsilon + 2n^{-1/2+\varsigma} \le 3L_p\varepsilon \le c_3\delta.$$
For the second condition,
$$\frac{|N(i)\cap V_0|}{|V_0|} \ge \sum_{j\in V_0}\frac{1}{|V_0|}p(\|X_j - X_i\|) - n^{-1/2+\varsigma} \ge \sum_{j\in V_0}\frac{1}{|V_0|}p\bigl(\|X_j - X_{i_0}\| + \|X_{i_0} - X_{i_{nx}}\| + \|X_{i_{nx}} - X_i\|\bigr) - n^{-1/2+\varsigma} \ge \sum_{j\in V_0}\frac{1}{|V_0|}p(\eta + 0.6\delta + \varepsilon) - n^{-1/2+\varsigma} \overset{(16),(17)}{\ge} p(0.8\delta) - n^{-1/2+\varsigma} \overset{(12),(16)}{>} p(2\delta).$$
Hence, $i \in W^\sharp$. The proof is complete.
Lemma 7.7. Given the occurrence of the event $E_{\mathrm{next}}$, for any $i \in W^\sharp$ and any $\alpha\in[d]$, we have
$$|p_\alpha(X_i) - p_\alpha(X_{i_{nx}})| \le c_3\delta + 2L_p\varepsilon.$$
Proof. By the triangle inequality and the parameter assumptions,
$$|p_\alpha(X_i) - p_\alpha(X_{i_{nx}})| \le \underbrace{\Bigl|p_\alpha(X_i) - \frac{|N(i)\cap V_\alpha|}{|V_\alpha|}\Bigr|}_{E_{\mathrm{navi}}(\{(V_\alpha,i_\alpha)\}_{\alpha\in[0,d]},\,W)} + \Bigl|\frac{|N(i)\cap V_\alpha|}{|V_\alpha|} - \frac{|N(i_{nx})\cap V_\alpha|}{|V_\alpha|}\Bigr| + \underbrace{\Bigl|\frac{|N(i_{nx})\cap V_\alpha|}{|V_\alpha|} - p_\alpha(X_{i_{nx}})\Bigr|}_{E_{\mathrm{navi}}(\{(V_\alpha,i_\alpha)\}_{\alpha\in[0,d]},\,\{i_{nx}\})} \le n^{-\frac12+\varsigma} + c_3\delta + n^{-\frac12+\varsigma} \overset{(12)}{\le} c_3\delta + 2L_p\varepsilon,$$
as desired.
The main technical result is the following.
Lemma 7.8. For any $i^\sharp \in W^\sharp$, there exists a point $p \in M$ so that the following holds:
(a) $\|X_{i^\sharp} - p\| \le 2^8\sqrt{d}\frac{L_p}{\ell_p}\varepsilon$;
(b) for every $\alpha\in[d]$, $|p_\alpha(p) - p_\alpha(X_{i_{nx}})| \le c_3\delta - 3L_p\varepsilon$;
(c) $\|p - X_{i_{nx}}\| \le \delta$.
Proof. Let us fix $i^\sharp \in W^\sharp$. Without loss of generality, we make a shift of $\mathbb{R}^N$ and assume
$$X_{i^\sharp} = \vec{0}.$$
Step 1: Finding $p$ through a construction.
By Lemma 7.4(a), we have $\|X_{i^\sharp} - X_{i_{nx}}\| \le 0.1\delta$. Hence, we can apply Lemma 7.3 with $i = i^\sharp$. Let $Z = (Z_1,\dots,Z_d)$ with $Z_\alpha = X_{i^\sharp} - X_{i_\alpha}$ for $\alpha\in[d]$, and let $H_Z = \mathrm{span}(\{Z_\alpha\}_{\alpha\in[d]})$. By Lemma 7.3, we have
$$d(H_Z, H_{i^\sharp}) < 2^{-2}/C_{\mathrm{gap}}. \tag{84}$$
Let $P_{H_Z} : \mathbb{R}^N \to H_Z$ be the orthogonal projection onto $H_Z$. From (84), we can apply Proposition A.4 with $\zeta = 1 - d(H_Z, H_{i^\sharp}) > 1 - \frac{1}{C_{\mathrm{gap}}} > 0.99$ to obtain a map
$$\varphi_{H_Z} : B(X_{i^\sharp}, r_M) \cap H_Z \to M \tag{85}$$
such that $P_{H_Z}(\varphi_{H_Z}(x)) = x$ and
$$\|\varphi_{H_Z}(x) - X_{i^\sharp}\| \le 2\|x - X_{i^\sharp}\|, \tag{86}$$
for any $x \in B(X_{i^\sharp}, r_M) \cap H_Z$.
For each $\alpha\in[d]$, let $\sigma_\alpha := \mathrm{sign}(p_\alpha(X_{i^\sharp}) - p_\alpha(X_{i_{nx}}))$. (Recall the definition of sign from (63).)
Now, using the same argument as in the proof of Lemma 6.13, since the matrix $Z$ has full rank (from Lemma 7.3(a)), there exists a unique $\varpi \in H_Z$ such that
$$\Bigl\langle \sigma_\alpha\frac{Z_\alpha}{\|Z_\alpha\|}, \varpi\Bigr\rangle = 2^5\frac{L_p}{\ell_p}\varepsilon, \tag{87}$$
for every $\alpha\in[d]$. We attempt to set the point $p$ described in the statement of the lemma to be
$$p = \varphi_{H_Z}(X_{i^\sharp} + \varpi). \tag{88}$$
To achieve that, we need to show $X_{i^\sharp} + \varpi$ is in the domain of $\varphi_{H_Z}$, which is equivalent to showing $\|\varpi\| < r_M$ (see (85)).
First, relying on $s_{\min}(Z) \ge \frac{r}{2}$ from Lemma 7.3(a),
$$\|\varpi\| \le \frac{\|Z^T\varpi\|}{s_{\min}(Z)} = \frac{2}{r}\sqrt{\sum_{\alpha\in[d]}\langle Z_\alpha, \varpi\rangle^2} \overset{(87)}{=} \frac{2^6}{r}\cdot\frac{L_p}{\ell_p}\varepsilon\sqrt{\sum_{\alpha\in[d]}\|Z_\alpha\|^2}.$$
Using $\|Z_\alpha\| \le 2r$, we find that
$$\|\varpi\| \le 2^7\sqrt{d}\frac{L_p}{\ell_p}\varepsilon < 0.1\delta < r_M, \tag{89}$$
which shows that $X_{i^\sharp} + \varpi$ is in the domain of $\varphi_{H_Z}$. Hence, the definition of $p$ in (88) is valid.
Step 2: Demonstrating that $p$ possesses the required properties.
First, property (a) follows from (86):
$$\|p - X_{i^\sharp}\| = \|\varphi_{H_Z}(X_{i^\sharp} + \varpi) - X_{i^\sharp}\| \le 2\|\varpi\| \overset{(89)}{\le} 2^8\sqrt{d}\frac{L_p}{\ell_p}\varepsilon. \tag{90}$$
Next, let us estimate $p_\alpha(p) - p_\alpha(X_{i_{nx}})$ for $\alpha\in[d]$. Once again, our tool is Lemma 5.3, and so we have to set up some parameters. From the definition of $\varpi$, we know that $\varpi \neq \vec{0}$, and therefore $p = \varphi_{H_Z}(X_{i^\sharp} + \varpi) \neq X_{i^\sharp}$. Hence, it makes sense to define
$$u := \frac{p - X_{i^\sharp}}{\|p - X_{i^\sharp}\|} \in S^{N-1} \quad\text{and set}\quad \tau := \Bigl\langle u, \sigma_\alpha\frac{Z_\alpha}{\|Z_\alpha\|}\Bigr\rangle.$$
We would like to apply Lemma 5.3 with $p$, $\sigma_\alpha$, $\tau$, $u$, and $\|p - X_{i^\sharp}\|$ playing the roles of $q$, $\sigma$, $\tau$, $u$, and $t$ in the lemma, respectively. Let us check the following items:
(i) $0.9r \le \|Z_\alpha\| \le 2r$;
(ii) $16\eta/r \le \tau$;
(iii) $\|p - X_{i^\sharp}\| < 0.1\tau r$.
Item (i) follows from the triangle inequality and Lemma 7.4(b):
$$\bigl|\|Z_\alpha\| - r\bigr| \le \bigl|\|Z_\alpha\| - \|X_{i_0} - X_{i_\alpha}\|\bigr| + \underbrace{\bigl|\|X_{i_0} - X_{i_\alpha}\| - r\bigr|}_{E_{\mathrm{ao}}(i_0,\dots,i_d)} \le \|X_{i^\sharp} - X_{i_0}\| + \delta \le 1.7\delta < 0.1r.$$
For Item (ii), we have
$$\tau = \Bigl\langle \frac{p - X_{i^\sharp}}{\|p - X_{i^\sharp}\|}, \sigma_\alpha\frac{Z_\alpha}{\|Z_\alpha\|}\Bigr\rangle = \Bigl\langle \frac{\varpi}{\|p - X_{i^\sharp}\|}, \sigma_\alpha\frac{Z_\alpha}{\|Z_\alpha\|}\Bigr\rangle \overset{(87)}{=} \frac{1}{\|p - X_{i^\sharp}\|}\cdot 2^5\frac{L_p}{\ell_p}\varepsilon \overset{(90)}{\ge} \frac{1}{8\sqrt{d}} \ge 16\frac{\eta}{r}.$$
Finally, for Item (iii), we have
$$\frac{\|p - X_{i^\sharp}\|}{\tau} = \frac{\|p - X_{i^\sharp}\|^2}{2^5(L_p/\ell_p)\varepsilon} \overset{(90)}{\le} 2^{11}d\frac{L_p}{\ell_p}\varepsilon \overset{(16),(15)}{\le} 2^{11}d\,\eta \overset{(17)}{\le} \frac{2^{11}}{C_{\mathrm{gap}}^{1.5}d^3}r < 0.1r.$$
Hence, Lemma 5.3 is applicable. For the moment, let us assume $\sigma_\alpha = +1$. In this case, $p_\alpha(X_{i^\sharp}) - p_\alpha(X_{i_{nx}}) \ge 0$. Lemma 5.3(b) gives
$$p_\alpha(X_{i^\sharp}) - 2L_p\cdot 2^5\frac{L_p}{\ell_p}\varepsilon \le p_\alpha(p) \le p_\alpha(X_{i^\sharp}) - \frac{1}{4}\ell_p\cdot 2^5\frac{L_p}{\ell_p}\varepsilon, \tag{91}$$
where we applied $\tau\|p - X_{i^\sharp}\| = 2^5\frac{L_p}{\ell_p}\varepsilon$. We have
$$p_\alpha(p) \overset{(91)}{\ge} p_\alpha(X_{i^\sharp}) - 2^6\frac{L_p^2}{\ell_p}\varepsilon \ge p_\alpha(X_{i_{nx}}) - 2^6\frac{L_p^2}{\ell_p}\varepsilon. \tag{92}$$
Note that Lemma 7.7 implies that
$$p_\alpha(X_{i^\sharp}) \le p_\alpha(X_{i_{nx}}) + c_3\delta + 2L_p\varepsilon. \tag{93}$$
Thus,
$$p_\alpha(p) \overset{(91)}{\le} p_\alpha(X_{i^\sharp}) - 8L_p\varepsilon \overset{(93)}{\le} p_\alpha(X_{i_{nx}}) + c_3\delta - 6L_p\varepsilon. \tag{94}$$
Considering (92) and (94), we conclude that
$$|p_\alpha(p) - p_\alpha(X_{i_{nx}})| \le \max\Bigl\{c_3\delta - 6L_p\varepsilon,\ 2^6\frac{L_p^2}{\ell_p}\varepsilon\Bigr\} < c_3\delta - 3L_p\varepsilon.$$
The proof for the case $\sigma_\alpha = -1$ is analogous to the argument above, where we apply Lemma 5.3(c) instead. We omit the details here.
Finally, it remains to show property (c): $\|p - X_{i_{nx}}\| \le \delta$. This last piece follows from the triangle inequality:
$$\|p - X_{i_{nx}}\| \le \|p - X_{i^\sharp}\| + \|X_{i^\sharp} - X_{i_{nx}}\| \overset{(90)}{\le} 2^8\sqrt{d}\frac{L_p}{\ell_p}\varepsilon + 0.7\delta < \delta.$$
We have finished the proof.
Proof of Proposition 7.2. Let $(W_{\mathrm{gc}}, i_{\mathrm{gc}})$ be the output of Algorithm 1 with input $(W, W^\sharp)$.
Step 1: $(W_{\mathrm{gc}}, i_{\mathrm{gc}})$ is a cluster.
From Lemma 4.4, considering $E_{\mathrm{next}} \subseteq E_{\mathrm{cn}}(W) \cap E_{\mathrm{net}}(W)$, it suffices to show that $(W, W^\sharp)$ is a good pair. We have $W^\sharp \subseteq W$ and $|W| = n$ by construction, and we know $W^\sharp \neq \emptyset$ from Lemma 7.6.
Take an arbitrary point $i^\sharp \in W^\sharp$. Let $p \in M$ be a point obtained from Lemma 7.8. We claim that this point $p$ may play the role of $p$ in Definition 4.3 (when $i^\sharp$ plays the role of $i$). Note that
$$\|X_{i^\sharp} - p\| \le 2^8\sqrt{d}\frac{L_p}{\ell_p}\varepsilon < C_{\mathrm{gap}} d\frac{L_p}{\ell_p}\varepsilon,$$
so we only need to establish the inclusion $\{j \in W : \|p - X_j\| < \varepsilon\} \subseteq W^\sharp$. To that end, take any $j \in W$ with $\|p - X_j\| < \varepsilon$. By the triangle inequality, for each $\alpha\in[d]$, we have
$$\begin{aligned}
\Bigl|\frac{|N(j)\cap V_\alpha|}{|V_\alpha|} - \frac{|N(i_{nx})\cap V_\alpha|}{|V_\alpha|}\Bigr| &\le \Bigl|\frac{|N(j)\cap V_\alpha|}{|V_\alpha|} - p_\alpha(X_j)\Bigr| + |p_\alpha(X_j) - p_\alpha(p)| + |p_\alpha(p) - p_\alpha(X_{i_{nx}})| + \Bigl|p_\alpha(X_{i_{nx}}) - \frac{|N(i_{nx})\cap V_\alpha|}{|V_\alpha|}\Bigr| \\
&\le n^{-1/2+\varsigma} + L_p\varepsilon + c_3\delta - 3L_p\varepsilon + n^{-1/2+\varsigma} \le c_3\delta.
\end{aligned}$$
Next, note that for each $j' \in V_0$, we have
$$\|X_j - X_{j'}\| \le \|X_j - p\| + \|p - X_{i_0}\| + \|X_{i_0} - X_{j'}\| < \varepsilon + \delta + \eta < 1.1\delta.$$
Therefore,
$$\frac{|N(j)\cap V_0|}{|V_0|} \ge \sum_{j'\in V_0}\frac{p(\|X_j - X_{j'}\|)}{|V_0|} - n^{-1/2+\varsigma} \ge p(1.1\delta) - n^{-1/2+\varsigma} > p(2\delta).$$
7.4. Building a Nearby Cluster. Having done our analysis in the previous and current sections, we now present an algorithm to produce a cluster. The overview of the algorithm is as follows. Suppose we have an index $w \in \mathcal{V}$ and a cluster $(V, i)$ such that $0.4\delta \le \|X_w - X_i\| \le 0.6\delta$. The algorithm produces a new cluster whose corresponding points in $M$ are near $X_w$.
Algorithm 2: BuildNearbyCluster
Input: $i_{nx} \in \mathcal{V}$;
$(V, i)$, where $i \in V \subseteq \mathcal{V}$ with $|V| \ge n^{1-\varsigma}$;
$W_1, W_2, \dots, W_d, W_{d+1} \subseteq \mathcal{V}$.
Output: $U \subseteq W_{d+1}$ and $u \in U$.
$(V_0, i_0) \leftarrow (V, i)$;
$k \leftarrow 1$;
while $k \le d$ do
  $W_k' \leftarrow$ the set of all vertices $i \in W_k$ such that
  $$p(\sqrt{2}r + 0.95\delta) \le \frac{|N(i)\cap V_\alpha|}{|V_\alpha|} \le p(\sqrt{2}r - 0.95\delta)$$
  for every $\alpha \in [k-1]$, and such that
  $$p(r + 0.95\delta) \le \frac{|N(i)\cap V_0|}{|V_0|} \le p(r - 0.95\delta);$$
  $(V_k, i_k) \leftarrow$ GenerateCluster($W_k$, $W_k'$);
  $k \leftarrow k + 1$;
end
$W^\sharp \leftarrow$ the set of all vertices $i \in W_{d+1}$ such that
$$\Bigl|\frac{|N(i)\cap V_\alpha|}{|V_\alpha|} - \frac{|N(i_{nx})\cap V_\alpha|}{|V_\alpha|}\Bigr| \le c_3\delta$$
for every $\alpha\in[d]$, and such that
$$\frac{|N(i)\cap V_0|}{|V_0|} \ge p(2\delta);$$
$(U, u) \leftarrow$ GenerateCluster($W_{d+1}$, $W^\sharp$);
return $(U, u)$
which implies $i \notin W^\flat$.
For each $\alpha\in[\ell]$, we define
$$W_\alpha^\natural := \Bigl\{ i \in W : \frac{|N(i)\cap U_\alpha|}{|U_\alpha|} \ge p(0.55\delta) \Bigr\}.$$
Lemma 8.3. Condition on $E_{\mathrm{netCheck}}$. For any $\alpha\in[\ell]$, if $i \in W_\alpha^\natural$, then $\|X_i - X_{u_\alpha}\| < 0.6\delta$.
Proof. This proof is similar to the proof of Lemma 8.2, so we omit the details.
The last piece needed to prove Proposition 8.1 is the following lemma.
Lemma 8.4. Condition on $E_{\mathrm{netCheck}}$. If
$$\Bigl(\bigcup_{\alpha\in[\ell]} W_\alpha^\natural\Bigr) \cap W^\flat = \emptyset,$$
then there is no point $q \in M$ such that for every $\alpha\in[\ell]$, $\|q - X_{u_\alpha}\| \ge \delta$. In other words, $\{X_{u_\alpha}\}_{\alpha\in[\ell]}$ is a $\delta$-net.
Proof. Suppose, for the sake of contradiction, that there exists $q \in M$ such that for every $\alpha\in[\ell]$, $\|q - X_{u_\alpha}\| \ge \delta$. Since $M$ is connected, there exists a path $\gamma : [0,1] \to M$ from $q$ to $X_{u_1}$. The function $f : [0,1]\to\mathbb{R}$ given by
$$f(t) := \min_{\alpha\in[\ell]} \|\gamma(t) - X_{u_\alpha}\|$$
is continuous, with $f(0) \ge \delta$ and $f(1) = 0$. Let $t_0 \in [0,1]$ be the smallest number for which $f(t_0) = 0.5\delta$, and let $q' := \gamma(t_0)$.
Now we have $q'$ such that $\|q' - X_{u_\alpha}\| \ge 0.5\delta$ for every $\alpha\in[\ell]$. Within the event $E_{\mathrm{net}}(W)$, there exists $i \in W$ such that $\|X_i - q'\| < \varepsilon$. By the triangle inequality, for any $j \in U_\alpha$, we have
$$\|X_i - X_j\| \ge \|q' - X_{u_\alpha}\| - \|q' - X_i\| - \|X_j - X_{u_\alpha}\| \ge 0.5\delta - \varepsilon - \eta > 0.48\delta.$$
Therefore, within $E_{\mathrm{navi}}(\{U_\alpha\}_{\alpha\in[\ell]}, W)$, we have
$$\frac{|N(i)\cap U_\alpha|}{|U_\alpha|} \le \sum_{j\in U_\alpha}\frac{p(\|X_i - X_j\|)}{|U_\alpha|} + n^{-1/2+\varsigma} < p(0.48\delta) + n^{-1/2+\varsigma} < p(0.45\delta).$$
Hence, $i \in W^\flat$.
Also, since $f(t_0) = 0.5\delta$, there exists $a\in[\ell]$ such that $\|q' - X_{u_a}\| = 0.5\delta$. By a similar triangle inequality argument as above, we find that for any $j \in U_a$, we have $\|X_i - X_j\| < 0.52\delta$. Therefore, within $E_{\mathrm{navi}}(\{U_\alpha\}_{\alpha\in[\ell]}, W)$, we have
$$\frac{|N(i)\cap U_a|}{|U_a|} \ge \sum_{j\in U_a}\frac{p(\|X_i - X_j\|)}{|U_a|} - n^{-1/2+\varsigma} \ge p(0.52\delta) - n^{-1/2+\varsigma} > p(0.55\delta),$$
then by Lemma 8.4, we know that {Xuα }α∈[ℓ] is a δ-net. Therefore, the proposition follows.
$\ell \leftarrow \ell + 1$;
$(i_{nx}^\ell, (V_0^\ell, i_0^\ell)) \leftarrow$ NetCheck($\{(U_s, u_s)\}_{s\in[\ell-1]}$, $W_0^\ell$);
end
return $\{(U_s, u_s)\}_{s\in[\ell]}$
Associated with this algorithm, we define the following events: for each integer $\ell \ge 2$,
$$E_k^\ell := \begin{cases} E_{\mathrm{netCheck}}\bigl(\{(U_s, u_s)\}_{s\in[\ell-1]}, W_0^\ell\bigr) & k = 0, \\ E_{\mathrm{ortho}}\bigl(\{(V_\alpha^\ell, i_\alpha^\ell)\}_{\alpha\in[0,k-1]}, W_k^\ell\bigr) & k \in [d], \\ E_{\mathrm{next}}\bigl(\{(V_\alpha^\ell, i_\alpha^\ell)\}_{\alpha\in[0,d]}, i_{nx}^\ell, W_{d+1}^\ell\bigr) & k = d+1. \end{cases}$$
(See (96), (38), and (70) for the definitions of these events.)
For these events, there is an ambiguity: $\{(U_s, u_s)\}_{s\in[\ell-1]}$ or $\{(V_\alpha^\ell, i_\alpha^\ell)\}_{\alpha\in[k-1]}$ might be null. In that case, we simply treat the corresponding event as not occurring, i.e., the random sets do not satisfy the property described by the event.
The events $E_k^\ell$ are defined to be the events that the algorithm buildNet is successful at the $(\ell, k)$-th step. For example, given $E_0^\ell$, by Proposition 8.1, applying the NetCheck algorithm with input $\{(U_s, u_s)\}_{s\in[\ell-1]}, W_0^\ell$ either returns $((V_0, i_0), i_{nx})$ with the properties described in Proposition 8.1, or returns null, which also guarantees that $\{X_{u_s}\}_{s\in[\ell]}$ is a $\delta$-net.
Further, for ease of notation, we also define $E_k^1$ to be the trivial event for $k\in[d]$, and
$$E_{d+1}^1 := E_{\mathrm{cn}}(W_{d+1}^1) \cap E_{\mathrm{net}}(W_{d+1}^1),$$
which is the event that guarantees that GenerateCluster($W_{d+1}^1$, $W^1$) returns a cluster $(U_1, u_1)$, by Proposition 4.5.
Further, for $\ell \ge 2$,
$$E_{\mathrm{halt}}^\ell := \bigl\{ \mathrm{NetCheck}(\{(U_s, u_s)\}_{s\in[\ell-1]}, W_0^\ell) = \mathrm{null} \bigr\}.$$
Then, we define the intersection of such events according to the order: for $\ell \ge 2$ and $k\in[d+1]$, we define
$$\Omega_k^\ell = \bigcap_{(\ell',k') \preceq (\ell,k)} E_{k'}^{\ell'}.$$
Notice that we also have the relation that, for $\ell\ge2$,
$$\Omega_1^\ell \subseteq \Omega_0^\ell \cap (E_{\mathrm{halt}}^\ell)^c,$$
since $\Omega_1^\ell$ requires $(V_0^\ell, i_0^\ell)$ to be non-null.
Let us rephrase Theorem ?? here for convenience, in terms of the parameters introduced in Section 2.3 and Algorithm 4.
Theorem 9.1 (Rephrasing of Theorem ??). Let $(M, \mu, p)$ be the manifold, probability measure, and distance-probability function satisfying Assumptions 2.3 and 2.4. Suppose $n$ is sufficiently large so that the parameters are feasible. Then, with probability at least $1 - n^{-\omega(1)}$, applying Algorithm 4 with input $G(\mathcal{V}, M, \mu, p)$ returns a $(\delta,\eta)$-cluster-net $\{(U_s, u_s)\}_{s\in[\ell]}$ with $u_s \in U_s \subseteq \mathcal{V}$ and $\ell \le n^{3\varsigma/4}$, in the sense that
(1) $\forall s\in[\ell]$, $(U_s, u_s)$ is a cluster;
(2) $\{X_{u_s}\}_{s\in[\ell]}$ is a $\delta$-net of $M$.
Let
$$\Omega_{\mathrm{netFound}} \tag{97}$$
denote the event that the output of Algorithm buildNet is a $(\delta,\eta)$-cluster-net with $\ell \le n^{3\varsigma/4}$. Observe that, by Proposition 8.1, we have
$$\Omega_{\mathrm{netFound}} = \bigcup_{2\le\ell\le n^{3\varsigma/4}} \Omega_0^\ell \cap E_{\mathrm{halt}}^\ell,$$
where
$$O_{\{i,j\}} := \Bigl\{ \Bigl|\frac{|N_W(i)\cap N_W(j)|}{n} - K(X_i, X_j)\Bigr| > n^{-\frac12+\varsigma} \Bigr\}.$$
Note that for any $\{i,j\} \in \binom{W}{2}$, we have
$$\Bigl|\frac{|N_W(i)\cap N_W(j)|}{n} - \frac{|N_{W\setminus\{i,j\}}(i)\cap N_{W\setminus\{i,j\}}(j)|}{n-2}\Bigr| = \Bigl(\frac{1}{n-2} - \frac{1}{n}\Bigr)\cdot|N_W(i)\cap N_W(j)| = \frac{2}{n(n-2)}\cdot|N_W(i)\cap N_W(j)| \le \frac{2}{n}.$$
Therefore, by the triangle inequality, the event $O_{\{i,j\}}$ is a subevent of the event
$$O'_{\{i,j\}} := \Bigl\{ \Bigl|\frac{|N_{W\setminus\{i,j\}}(i)\cap N_{W\setminus\{i,j\}}(j)|}{n-2} - K(X_i, X_j)\Bigr| \ge n^{-\frac12+\varsigma} - \frac{2}{n} \Bigr\}.$$
Now, let us fix a pair $\{i,j\} \in \binom{W}{2}$. For each $k \in W\setminus\{i,j\}$, let $Z_k$ be the indicator that $k \in N_W(i)\cap N_W(j)$. From (98), we have
$$Z_k = \mathbb{1}\bigl(U_{i,k} \le p(\|X_i - X_k\|)\bigr)\,\mathbb{1}\bigl(U_{j,k} \le p(\|X_j - X_k\|)\bigr).$$
According to the above expression and the fact that $\{X_i\}_{i\in\mathcal{V}} \cup \{U_{i,j}\}_{i,j\in\mathcal{V}}$ are jointly independent, we know that, conditioned on $X_i = x_i$ and $X_j = x_j$, the $\{Z_k\}_{k\in W\setminus\{i,j\}}$ are i.i.d. Bernoulli random variables with
$$\mathbb{E}\bigl[Z_k \mid X_i = x_i, X_j = x_j\bigr] = \mathbb{E}_{X_k}\bigl[p(\|X_k - x_i\|)p(\|X_k - x_j\|)\bigr] = K(x_i, x_j).$$
Now we apply Hoeffding's inequality:
$$\mathbb{P}\bigl\{O'_{\{i,j\}} \mid X_i = x_i, X_j = x_j\bigr\} = \mathbb{P}\Bigl\{ \Bigl|\sum_{k\in W\setminus\{i,j\}} Z_k - (n-2)K(X_i, X_j)\Bigr| \ge (n-2)\Bigl(n^{-\frac12+\varsigma} - \frac{2}{n}\Bigr) \Bigm| X_i = x_i, X_j = x_j \Bigr\} \le 2\exp\Bigl(-2(n-2)\Bigl(n^{-\frac12+\varsigma} - \frac{2}{n}\Bigr)^2\Bigr) = n^{-\omega(1)}.$$
By Fubini's theorem,
$$\mathbb{P}\{O_{\{i,j\}}\} \le \mathbb{P}\{O'_{\{i,j\}}\} = n^{-\omega(1)}.$$
Now we use the union bound:
$$\mathbb{P}\{(E_{\mathrm{cn}}(W))^c\} \le \sum_{\{i,j\}\in\binom{W}{2}} \mathbb{P}\{O_{\{i,j\}}\} \le n^2\cdot n^{-\omega(1)} = n^{-\omega(1)}.$$
Lemma 9.3. For any $W \subseteq \mathcal{V}$ with $|W| = n$, we have
$$\mathbb{P}\bigl\{(E_{\mathrm{net}}(W))^c\bigr\} \le \exp(-n^{1/2}). \tag{99}$$
Proof. Consider any subset $\mathcal{M} \subseteq M$ such that every pair of distinct points $p, q \in \mathcal{M}$ satisfies $\|p - q\| \ge \frac{\varepsilon}{3}$. Since the collection of open balls $\{B(p, \varepsilon/6)\}_{p\in\mathcal{M}}$ are pairwise disjoint,
$$1 = \mu(M) \ge \mu\Bigl(\bigcup_{p\in\mathcal{M}} B(p, \varepsilon/6)\Bigr) = \sum_{p\in\mathcal{M}} \mu\bigl(B(p, \varepsilon/6)\bigr) \ge |\mathcal{M}|\,\mu_{\min}(\varepsilon/6) \quad\Longrightarrow\quad |\mathcal{M}| \le \frac{1}{\mu_{\min}(\varepsilon/6)} \overset{(12)}{\le} \frac{n^\varsigma}{2}.$$
Now, we start with an empty set $\mathcal{M}$ and keep adding points from $M$ into $\mathcal{M}$, subject to the restriction that any pair of points within $\mathcal{M}$ has distance at least $\frac{\varepsilon}{3}$, until it is no longer possible to proceed. This process must terminate after finitely many steps due to the above bound. The resulting set $\mathcal{M}$ is an $\frac{\varepsilon}{3}$-net; that is, every point $q \in M$ is within distance $\frac{\varepsilon}{3}$ of some point $p \in \mathcal{M}$. From now on, we fix the set $\mathcal{M}$.
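The greedy net construction above is easy to state in code. The sketch below is an illustration, not from the paper, with points in the plane standing in for samples from $M$; it builds a maximal separated set, which is automatically a net at the same scale:

```python
import math

def greedy_net(points, sep):
    """Greedily pick a maximal sep-separated subset of `points`.

    Maximality implies every point of `points` is within `sep`
    of some chosen center, i.e. the result is a sep-net.
    """
    net = []
    for q in points:
        if all(math.dist(q, c) >= sep for c in net):
            net.append(q)
    return net

pts = [(i * 0.1, j * 0.1) for i in range(10) for j in range(10)]
net = greedy_net(pts, 0.3)
# separation: chosen centers are pairwise >= 0.3 apart
assert all(math.dist(a, b) >= 0.3 for a in net for b in net if a != b)
# covering: every point is within 0.3 of some center
assert all(any(math.dist(q, c) < 0.3 for c in net) for q in pts)
```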
For each $q \in \mathcal{M}$, by Hoeffding's inequality,
$$\mathbb{P}\Bigl\{\bigl|\{i\in W : X_i \in B(q, 2\varepsilon/3)\}\bigr| \le \mu_{\min}(2\varepsilon/3)\cdot\frac{n}{2}\Bigr\} \le \mathbb{P}\Bigl\{\bigl|\{i\in W : X_i \in B(q, 2\varepsilon/3)\}\bigr| \le \mathbb{E}\bigl|\{i\in W : X_i \in B(q, 2\varepsilon/3)\}\bigr| - n^{1-\varsigma}\Bigr\} \le \exp\Bigl(-\frac{2n^{2(1-\varsigma)}}{n}\Bigr) = \exp(-2n^{1-2\varsigma}).$$
Taking a union bound, we have
$$\mathbb{P}\Bigl\{\exists q\in\mathcal{M} : \bigl|\{i\in W : X_i \in B(q, 2\varepsilon/3)\}\bigr| \le \mu_{\min}(2\varepsilon/3)\cdot\frac{n}{2}\Bigr\} \le \frac{n^\varsigma}{2}\exp(-2n^{1-2\varsigma}) \le \exp(-n^{1/2}). \tag{100}$$
Within the complement of the event $E_{\mathrm{net}}(W)$, there exists $p \in M$ such that
$$\bigl|\{i\in W : X_i \in B(p, \varepsilon)\}\bigr| < \mu_{\min}(2\varepsilon/3)\cdot\frac{n}{2}.$$
Let $q \in \mathcal{M}$ be a point such that $\|p - q\| < \varepsilon/3$. Then,
$$\bigl|\{i\in W : X_i \in B(q, 2\varepsilon/3)\}\bigr| \le \bigl|\{i\in W : X_i \in B(p, \varepsilon)\}\bigr| < \mu_{\min}(2\varepsilon/3)\cdot\frac{n}{2}.$$
This shows that the event in the estimate (100) contains the event $(E_{\mathrm{net}}(W))^c$, and hence
$$\mathbb{P}\{(E_{\mathrm{net}}(W))^c\} \le \exp(-n^{1/2}).$$
Lemma 9.4. Let $V, W$ be a pair of disjoint subsets of $\mathcal{V}$. For a positive integer $k$, suppose $\{(V_\alpha, i_\alpha)\}_{\alpha\in[0,k]}$ are $k+1$ random pairs with $i_\alpha \in V_\alpha \subseteq V$, which are functions of $(X_V, U_V)$. For any realization $(X_V, U_V) = (x_V, \mathrm{u}_V)$ with $(x_V, \mathrm{u}_V) \in E_{\mathrm{clu}}(\{(V_\alpha, i_\alpha)\}_{\alpha\in[0,k]})$ and any realization $X_W = x_W$,
$$\mathbb{P}\bigl\{(E_{\mathrm{navi}}(\{V_\alpha\}_{\alpha\in[0,k]}, W))^c \bigm| X_V = x_V, U_V = \mathrm{u}_V, X_W = x_W\bigr\} = n^{-\omega(1)}.$$
In particular, the above inequality holds without conditioning on $X_W = x_W$.
Proof. We claim that
$$\mathbb{P}\bigl\{(E_{\mathrm{navi}}(\{V_\alpha\}_{\alpha\in[0,k]}, W))^c \bigm| X_V = x_V, U_V = \mathrm{u}_V\bigr\} \le 2d\cdot n^2\cdot\exp(-2n^\varsigma).$$
Observe that the event $(E_{\mathrm{navi}}(\{V_\alpha\}_{\alpha\in[0,k]}, W))^c$ can be expressed as
$$(E_{\mathrm{navi}}(\{V_\alpha\}_{\alpha\in[0,k]}, W))^c = \bigcup_{i\in W}\bigcup_{\alpha=0}^{k} O_{i,\alpha},$$
where
$$O_{i,\alpha} := \Bigl\{ \Bigl|\frac{|N(i)\cap V_\alpha|}{|V_\alpha|} - \sum_{j\in V_\alpha}\frac{p(\|X_i - X_j\|)}{|V_\alpha|}\Bigr| > n^{-1/2+\varsigma} \Bigr\}.$$
Let us fix a pair $(i, \alpha)$ with $i \in W$ and $\alpha\in[0,k]$. For each $j \in V$, let $Z_j$ be the indicator of the edge $\{i,j\}$, which can be expressed as
$$Z_j := \mathbb{1}\bigl(U_{i,j} \le p(\|X_i - X_j\|)\bigr).$$
With this notation, we have
$$|N(i)\cap V_\alpha| = \sum_{j\in V_\alpha} Z_j.$$
On the other hand, conditioned on $X_V = x_V$, $U_V = \mathrm{u}_V$, and $X_W = x_W$, the $\{Z_j\}_{j\in V}$ are independent Bernoulli random variables with $\mathbb{E}Z_j = p(\|x_i - x_j\|)$ for each $j \in V$. Thus, we can apply Hoeffding's inequality to obtain
$$\mathbb{P}\bigl\{O_{i,\alpha} \bigm| X_V = x_V, U_V = \mathrm{u}_V, X_W = x_W\bigr\} \le 2\exp(-2n^\varsigma),$$
where we rely on the condition $|V_\alpha| \ge n^{1-\varsigma}$ from the event $E_{\mathrm{clu}}(\{(V_\alpha, i_\alpha)\}_{\alpha\in[0,k]})$.
Hence, by the union bound with $|W| \le |\mathcal{V}| \le n^2$, we obtain
$$\mathbb{P}\bigl\{(E_{\mathrm{navi}})^c \bigm| X_V = x_V, U_V = \mathrm{u}_V, X_W = x_W\bigr\} \le n^2\cdot(k+1)\cdot 2\exp(-2n^\varsigma) \le 2d\cdot n^2\cdot\exp(-2n^\varsigma) = n^{-\omega(1)}.$$
The second statement of the lemma simply follows by taking expectation with respect to $X_W$ on both sides of the above inequality.
9.2. Probability Estimates for the Events $E_k^\ell$.
$$= E_{\mathrm{ao}}(i_0^\ell,\dots,i_k^\ell) \cap E_{\mathrm{clu}}\bigl(\{(V_\alpha^\ell, i_\alpha^\ell)\}_{\alpha\in[0,k]}\bigr) \cap E_{\mathrm{navi}}\bigl(\{V_\alpha^\ell\}_{\alpha\in[0,k]}, W_{k+1}^\ell\bigr) \cap E_{\mathrm{cn}}(W_{k+1}^\ell) \cap E_{\mathrm{net}}(W_{k+1}^\ell).$$
By Proposition 6.3, we know that
$$\Omega_k^\ell \subseteq E_k^\ell \subseteq E_{\mathrm{ao}}(i_0^\ell,\dots,i_k^\ell) \cap E_{\mathrm{clu}}(V_k^\ell, i_k^\ell),$$
which in turn implies
$$\Omega_k^\ell \subseteq E_{\mathrm{ao}}(i_0^\ell,\dots,i_k^\ell) \cap E_{\mathrm{clu}}\bigl(\{(V_\alpha^\ell, i_\alpha^\ell)\}_{\alpha\in[0,k]}\bigr). \tag{101}$$
Next, let us fix any realization $y_k^\ell \in \Omega_k^\ell$. Given $y_k^\ell \in E_{\mathrm{clu}}(\{(V_\alpha^\ell, i_\alpha^\ell)\}_{\alpha\in[0,k]})$, we can apply Lemma 9.4 to get
$$\mathbb{P}\bigl\{(E_{\mathrm{navi}}(\{V_\alpha^\ell\}_{\alpha\in[0,k]}, W_{k+1}^\ell))^c \bigm| Y_k^\ell = y_k^\ell\bigr\} \le n^{-\omega(1)},$$
$$Y_k^\ell,\quad X_{W_{k+1}^\ell},\quad U_{W_k^\ell, W_{k+1}^\ell} := (U_{i,j})_{i\in W_k^\ell,\, j\in W_{k+1}^\ell}$$
Proof. We will consider the case $\ell \ge 2$ only; the proof of the case $\ell = 1$ is similar but simpler. Recall the precise definition of $E_0^{\ell+1}$ (see (96)):
$$E_0^{\ell+1} := E_{\mathrm{netCheck}}\bigl(\{(U_s, u_s)\}_{s\in[\ell]}, W_0^{\ell+1}\bigr) = E_{\mathrm{clu}}\bigl(\{(U_s, u_s)\}_{s\in[\ell]}\bigr) \cap E_{\mathrm{rps}}(\{i_s\}_{s\in[\ell]}) \cap E_{\mathrm{navi}}\bigl(\{U_s\}_{s\in[\ell]}, W_0^{\ell+1}\bigr) \cap E_{\mathrm{net}}(W_0^{\ell+1}).$$
Using the union bound, we get
$$\mathbb{P}\{(E_0^{\ell+1})^c \mid \Omega_d^\ell\} \le \mathbb{P}\bigl\{\bigl(E_{\mathrm{clu}}(\{(U_s, u_s)\}_{s\in[\ell]})\bigr)^c \bigm| \Omega_d^\ell\bigr\} + \mathbb{P}\bigl\{\bigl(E_{\mathrm{rps}}(\{i_s\}_{s\in[\ell]})\bigr)^c \bigm| \Omega_d^\ell\bigr\} + \mathbb{P}\bigl\{\bigl(E_{\mathrm{navi}}(\{U_s\}_{s\in[\ell]}, W_0^{\ell+1})\bigr)^c \bigm| \Omega_d^\ell\bigr\} + \mathbb{P}\bigl\{\bigl(E_{\mathrm{net}}(W_0^{\ell+1})\bigr)^c \bigm| \Omega_d^\ell\bigr\}. \tag{102}$$
Recall that
$$\emptyset \neq \Omega_d^\ell \subseteq E_0^\ell \subseteq E_{\mathrm{clu}}\bigl(\{(U_s, u_s)\}_{s\in[\ell-1]}\bigr) \cap E_{\mathrm{rps}}(\{i_s\}_{s\in[\ell-1]}),$$
where from (95) we have
$$E_{\mathrm{rps}}(\{i_\alpha\}_{\alpha\in[\ell]}) := \Bigl\{\forall\{\alpha,\beta\}\in\binom{[\ell]}{2},\ \|X_{i_\alpha} - X_{i_\beta}\| \ge 0.3\delta\Bigr\}. \tag{103}$$
We can replace the first two summands on the right-hand side of (102) to get
$$\mathbb{P}\{(E_0^{\ell+1})^c \mid \Omega_d^\ell\} \le \mathbb{P}\bigl\{\bigl(E_{\mathrm{clu}}(U_\ell, u_\ell)\bigr)^c \bigm| \Omega_d^\ell\bigr\} + \mathbb{P}\bigl\{\exists s\in[\ell],\ \|X_{i_s} - X_{i_\ell}\| < 0.3\delta \bigm| \Omega_d^\ell\bigr\} + \mathbb{P}\bigl\{\bigl(E_{\mathrm{navi}}(\{U_s\}_{s\in[\ell]}, W_0^{\ell+1})\bigr)^c \bigm| \Omega_d^\ell\bigr\} + \mathbb{P}\bigl\{\bigl(E_{\mathrm{net}}(W_0^{\ell+1})\bigr)^c \bigm| \Omega_d^\ell\bigr\}.$$
It remains to show that each of the summands above is $n^{-\omega(1)}$. Next, we apply Proposition 7.2 to get
$$\Omega_{d+1}^\ell \subseteq E_{d+1}^\ell \subseteq E_{\mathrm{clu}}(U_\ell, u_\ell) \cap \{\|X_{u_\ell} - X_{i_{nx}^\ell}\| \le 0.1\delta\}. \tag{104}$$
An immediate consequence is that the first summand is $0$:
$$\mathbb{P}\bigl\{\bigl(E_{\mathrm{clu}}(U_\ell, u_\ell)\bigr)^c \bigm| \Omega_d^\ell\bigr\} = 0.$$
Given that $\Omega_{d+1}^\ell \neq \emptyset$, within the event $\Omega_{d+1}^\ell$,
$$(i_{nx}^\ell, V_0^\ell, i_0^\ell)$$
is a valid output of NetCheck($\{(U_s, u_s)\}_{s\in[\ell-1]}$, $W_0^\ell$). Then, from Proposition 8.1, we have
$$\forall s\in[\ell-1],\quad \|X_{i_{nx}^\ell} - X_{u_s}\| \ge 0.4\delta.$$
Using the above estimate, for each $s\in[\ell-1]$, we can apply the triangle inequality to get
$$\|X_{u_\ell} - X_{u_s}\| \ge \|X_{i_{nx}^\ell} - X_{u_s}\| - \|X_{u_\ell} - X_{i_{nx}^\ell}\| \overset{(104)}{\ge} 0.4\delta - 0.1\delta \ge 0.3\delta,$$
which in turn implies that the second summand is also $0$:
$$\mathbb{P}\bigl\{\exists s\in[\ell],\ \|X_{i_s} - X_{i_\ell}\| < 0.3\delta \bigm| \Omega_d^\ell\bigr\} = 0.$$
Finally, the estimate
$$\mathbb{P}\bigl\{\bigl(E_{\mathrm{navi}}(\{U_s\}_{s\in[\ell]}, W_0^{\ell+1})\bigr)^c \bigm| \Omega_d^\ell\bigr\} + \mathbb{P}\bigl\{\bigl(E_{\mathrm{net}}(W_0^{\ell+1})\bigr)^c \bigm| \Omega_d^\ell\bigr\} = n^{-\omega(1)}$$
follows from the same argument as shown in the proof of Lemma 9.5. We omit the details here. Therefore, the lemma follows.
Lemma 9.7. For ℓ ≥ 2, if P{Ω0ℓ and NetCheck({(Us, us)}s∈[ℓ−1], W0ℓ) ≠ null} > 0, then
P{(E1ℓ+1)c | Ω0ℓ and NetCheck({(Us, us)}s∈[ℓ−1], W0ℓ) ≠ null} = n−ω(1).
Proof. Within the events Ω0ℓ and NetCheck({(Us, us)}s∈[ℓ−1], W0ℓ) ≠ null,
(iℓnx, V0ℓ, i0ℓ)
is a valid output of NetCheck({(Us, us)}s∈[ℓ−1], W0ℓ).
Recall that
E1ℓ = Eortho((Vαℓ, iαℓ), W1ℓ) = Eao(i0ℓ) ∩ Eclu((V0, i0)) ∩ Enavi(V0, W1ℓ) ∩ Ecn(W1ℓ) ∩ Enet(W1ℓ),
where Eao(i0ℓ) is a trivial event. Since (V0ℓ, i0ℓ) ∈ {(Us, us)}s∈[ℓ−1], we automatically have
Ω0ℓ ∩ {NetCheck({(Us, us)}s∈[ℓ−1], W0ℓ) ≠ null} ⊆ Eclu(V0ℓ, i0ℓ).
Hence,
P{(E1ℓ+1)c | Ω0ℓ and NetCheck({(Us, us)}s∈[ℓ−1], W0ℓ) ≠ null}
= P{(Enavi(V0, W1ℓ))c ∪ (Ecn(W1ℓ))c ∪ (Enet(W1ℓ))c | Ω0ℓ and NetCheck({(Us, us)}s∈[ℓ−1], W0ℓ) ≠ null}
(note that the conditioning event is an event of Y0ℓ)
≤ P{(Enavi(V0, W1ℓ))c | Ω0ℓ and NetCheck({(Us, us)}s∈[ℓ−1], W0ℓ) ≠ null} + P{(Ecn(W1ℓ))c} + P{(Enet(W1ℓ))c}
≤ P{(Enavi(V0, W1ℓ))c | Ω0ℓ and NetCheck({(Us, us)}s∈[ℓ−1], W0ℓ) ≠ null} + n−ω(1),
where in the last inequality we applied Lemma 9.2 and Lemma 9.3.
Finally, the argument to show
P{(Enavi(V0, W1ℓ))c | Ω0ℓ and NetCheck({(Us, us)}s∈[ℓ−1], W0ℓ) ≠ null} = n−ω(1)
is precisely the same as shown in the proof of Lemma 9.5; we omit the details here. The lemma follows.
9.3. Proof of Theorem 1.4.
Proof. For simplicity, let us denote the event described by the theorem by ΩnetFound. Further, for ℓ ≥ 2, define
Ehaltℓ := {NetCheck({(Us, us)}s∈[ℓ−1], W0ℓ) = null}.
Given that W0ℓ does not exist for ℓ > ⌈nς⌉, we have
Ehaltℓ = ∅ for all ℓ > ⌈nς⌉.
Indeed, we claim that
Ω0ℓ+1 = ∅ for all ℓ ≥ n3ς/4.
(The +1 plays no substantial role here; it is just convenient to have it for the proof later.) Let us prove this claim: Fix ℓ ≥ n3ς/4 and assume Ω0ℓ+1 ≠ ∅. First of all, this implies that Ω0ℓ is also non-empty. Within the event
Ω0ℓ ⊆ E0ℓ ⊆ Erps(u0, u1, . . . , uℓ),
we can apply a standard volumetric argument to show that
1 = µ(M) ≥ µ(⋃s∈[ℓ] B(Xus, 0.15δ)) ≥ Σs∈[ℓ] µ(B(Xus, 0.15δ)) ≥ µmin(0.15δ) · ℓ ≥ cµ(0.15δ)d ℓ,
which gives an upper bound on ℓ that contradicts ℓ ≥ n3ς/4, so the claim follows.
Next, as a corollary of Lemma 9.5, Lemma 9.7, and Lemma 9.6, we have that, for ℓ ≥ 2,
P{(Ω0ℓ+1)c | Ω0ℓ ∩ (Ehaltℓ)c} = n−ω(1)
⇒ P{Ω0ℓ ∩ (Ehaltℓ)c} ≤ P{Ω0ℓ+1} + n−ω(1),
which is again due to Ω0ℓ+1 ⊆ Ω0ℓ ∩ (Ehaltℓ)c. Let us formulate this inequality in an inductive form:
(105) P{Ω0ℓ ∩ (Ehaltℓ)c} ≤ P{Ω0ℓ+1} + n−ω(1)
≤ P{Ω0ℓ+1 ∩ (Ehaltℓ+1)c} + P{Ω0ℓ+1 ∩ Ehaltℓ+1} + n−ω(1).
Now, applying the above inequality recursively, we have
1 = P{Ω02} + P{(Ω01)c}
= P{Ω02 ∩ Ehalt2} + P{Ω02 ∩ (Ehalt2)c} + n−ω(1)
(106) ≤ P{Ω02 ∩ Ehalt2} + P{Ω03} + 2n−ω(1)
≤ Σs∈[2,ℓ] P{Ω0s ∩ Ehalts} + P{Ω0ℓ+1} + n3ς/4 · n−ω(1) = P{ΩnetFound} + 0 + n−ω(1),
where in the last step we take ℓ = ⌈n3ς/4⌉, so that Ω0ℓ+1 = ∅ by the claim above.
10.1. Partition of V and distance estimate. To avoid dependences on the random sets {(Us, us)}s∈[ℓ], which are the output of buildNet, we double the size of our vertex set from (11):
|V| = 2n · (d + 2) · ⌈nς ⌉,
and partition it into two disjoint subsets of size n · (d + 2) · ⌈nς ⌉:
V = V1 ⊔ V2 .
Such a modification does not affect the statements of the theorems, as discussed previously for the adjustment that appeared in (11). A second modification is the following: we apply Algorithm buildNet with input GV1 instead of GV. Let ΩnetFound be the event stated in Theorem 9.1. Conditioned on ΩnetFound, the algorithm returns a (δ, η)-cluster-net {(Us, us)}s∈[ℓ] with us ∈ Us ⊆ V1 and ℓ ≤ n3ς/4. The discussion of this section is based on the above description and is conditioned on ΩnetFound.
Next, we partition V2 into (d + 2) · ⌈nς⌉ subsets of size n, indexed from 0 to (d + 2) · ⌈nς⌉ − 1:
V2 = ⋃s∈[0,(d+2)·⌈nς⌉−1] Vs.
For each Vs with s ∈ [ℓ], we want to extract Us′ ⊆ Vs so that (Us′ , us ) is a 3η-cluster. (See
Definition 4.2). This is possible if Enavi (Us , Vs ) holds.
Let us denote
ps(x) := Σu∈Us p(‖Xu − x‖) / |Us|,
and define
Us′ := {v ∈ Vs : |N(v) ∩ Us| / |Us| ≥ p(1.5η)}.
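As an aside, the extraction rule for Us′ is a simple thresholding step. The following sketch illustrates it on a toy adjacency structure; the connection function p and the parameter η are hypothetical stand-ins chosen for illustration, not values from the paper.

```python
import math

# Hypothetical decreasing connection function and parameter (not from the paper).
p = lambda x: math.exp(-x)
eta = 0.1

def extract_cluster(adj, U_s, V_s):
    """Return U_s' = the vertices of V_s whose neighbor fraction into U_s
    is at least p(1.5*eta), mirroring the thresholding rule above."""
    threshold = p(1.5 * eta)
    return {v for v in V_s
            if len(adj[v] & U_s) / len(U_s) >= threshold}

# Toy example: U_s = {0,1,2}; vertex 10 sees all of U_s, vertex 11 sees none.
adj = {10: {0, 1, 2}, 11: set()}
print(extract_cluster(adj, {0, 1, 2}, {10, 11}))  # {10}
```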
Let
(107) Ω′netFound := ΩnetFound ∩ ⋂s∈[ℓ] (Enavi(Us, Vs) ∩ Enet(Vs)).
It is important to note that applying the algorithm buildNet to GV1 only reveals (XV1, UV1). Moreover, for each s ∈ [ℓ], to extract the set Us′, the additional information we need to reveal is UUs,Vs and XVs. In fact, for each s ∈ [ℓ], the edges UUs′,Vrest remain hidden.
Our next step involves estimating the distance from each vertex to the cluster and the distance
between clusters, which relies on part of the remaining unrevealed edges. The primary goal in this
subsection is the following:
Proposition 10.2. Consider the event
(109) Ωdist := Ω′netFound ∩ ⋂s∈[ℓ] Enavi(Us′, Vrest),
which is an event of
Y = (XV1, UV1, {XVs, UUs,Vs}s∈[ℓ]).
Then, conditioned on Ωdist, for each v ∉ ⋃s∈[ℓ](Us ∪ Us′),
|p−1(|N(v) ∩ Us′| / |Us′|) − ‖Xv − Xus‖| ≤ 0.01δ.
And if v ∈ Us ∪ Us′, then
‖Xv − Xus‖ ≤ 3η ≤ 0.01δ.
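To see why the estimator p−1(|N(v) ∩ Us′|/|Us′|) works, one can replay the idealized situation numerically: when all cluster points sit essentially at Xus, the neighbor fraction concentrates around p(‖Xv − Xus‖), so inverting p recovers the distance. The choice p(x) = e−x below is a hypothetical decreasing connection function used only for illustration.

```python
import math, random

random.seed(0)
p = lambda x: math.exp(-x)           # hypothetical decreasing connection function
p_inv = lambda y: -math.log(y)

true_dist = 1.3                      # plays the role of ‖Xv − Xus‖
cluster_size = 20000                 # plays the role of |Us'|

# Each cluster point is a neighbor of v independently with probability p(true_dist).
neighbors = sum(random.random() < p(true_dist) for _ in range(cluster_size))
estimate = p_inv(neighbors / cluster_size)
print(abs(estimate - true_dist))     # small, by concentration of the fraction
```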
Further, for any sample y ∈ Ω′netFound,
P{Ωcdist | Y = y} = n−ω(1).
In particular,
P{Ωcdist | Ω′netFound} = n−ω(1).
In other words, given Ωdist, by observing the graph G, one can estimate the distance from Xus to every other Xv with accuracy 0.01δ.
Proof. Probability Estimate: Given the definition of Ωdist and ℓ ≤ n3ς/4 from Theorem 9.1, it suffices to prove
(110) ∀s ∈ [ℓ], P{(Enavi(Us′, Vrest))c | Y = y} = n−ω(1).
Since the conditioning might not seem straightforward, let us go over it carefully once. First, let us fix a realization
(XV1, UV1) = (xV1, uV1) ∈ ΩnetFound.
Then, the set {(Us, us)}s∈[ℓ] is determined (as a function of (xV1, uV1)). Now we fix s ∈ [ℓ] and a realization of
(UUs,Vs, XVs) = (uUs,Vs, xVs) ∈ Enavi(Us, Vs) ∩ Enet(Vs).
Similarly, the set Us′ is determined (as a function of (xV1, uV1, uUs,Vs, xVs)). Now, let us fix a realization of Y = y as described in the statement of the Proposition. As a sanity check, the random variables UUs′,V\Us have not been revealed yet.
Now, we apply Lemma 9.4 to get
PUUs′,V\Us{(Enavi(Us′, V \ Us))c | Y = y} = n−ω(1),
which in turn implies
P{(Enavi(Us′, V \ Us))c | Ω′netFound} = n−ω(1),
following Fubini’s theorem. With ℓ ≤ n3ς/4 from Theorem 9.1, (110) follows.
Distance Estimate: For the second statement, let us first notice that (Us′, us) is a 3η-cluster from Lemma 10.1. Thus, for v ∈ V \ Us,
|N(v) ∩ Us′| / |Us′| ≤ ps(Xv) + n−1/2+ς ≤ p(‖Xv − Xus‖ − 3η) + n−1/2+ς
⇒ |N(v) ∩ Us′| / |Us′| ≤ p(‖Xv − Xus‖ − 3η − ℓp−1 n−1/2+ς).
Now, we apply p−1 on both sides and rely on the fact that p is monotone decreasing to get
p−1(|N(v) ∩ Us′| / |Us′|) ≥ ‖Xv − Xus‖ − 3η − ℓp−1 n−1/2+ς ≥ ‖Xv − Xus‖ − 0.01δ.
The bound on the other side can be derived by the same argument; we omit the proof here.
10.2. Construction of the weighted graph Γ and the metric deuc.
Definition 10.3 (Graph Γ(G, r)). Given the event Ωdist ⊆ ΩnetFound and a parameter r > 0, we construct a weighted graph Γ(G, r) with weights w(u, v) in the following way, starting with the edgeless graph on V:
• First, for 1 ≤ i < j ≤ ℓ, if
p−1(|N(uj) ∩ Ui′| / |Ui′|) ≤ r,
then we connect the edge {ui, uj} and assign it the weight
w(ui, uj) = p−1(|N(uj) ∩ Ui′| / |Ui′|) + 0.04δ.
(Intuitively, we would want to define w(ui, uj) = p−1(|N(uj) ∩ Ui′| / |Ui′|). However, a constant term 0.04δ is added to this weight to discourage overly long paths from appearing as shortest paths. This modification avoids accumulating gaps when comparing the shortest-path distance with the underlying geodesic distance.)
• Second, for v ∉ {us}s∈[ℓ],
– if v ∈ ⋃s∈[ℓ](Us ∪ Us′), then there exists a unique sv ∈ [ℓ] such that v ∈ Usv ∪ Usv′. Here, we connect the edge {v, usv} with the weight
w(v, usv) = δ.
– if v ∉ ⋃s∈[ℓ](Us ∪ Us′), then let
sv := argmins∈[ℓ] p−1(|N(v) ∩ Us′| / |Us′|).
(In the event of a tie, we choose sv to be the index with the smaller value.) Then, we connect the edge {v, usv} with the weight
w(v, usv) = δ.
Further, let dΓ(G,r)(v, w) denote the path distance of Γ(G, r). That is,
dΓ(G,r)(v, w) = min{Σi=0k−1 w(vi, vi+1) : v0 = v, vk = w, and each {vi, vi+1} is an edge in Γ(G, r)}.
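The path distance dΓ(G,r) above is the usual shortest-path distance in a weighted graph, so it is computable with Dijkstra's algorithm. The sketch below is generic: the toy graph and the weight δ = 1.0 are made up for illustration (in the construction above, hub–hub weights would come from p−1 plus 0.04δ, and leaf–hub weights would equal δ).

```python
import heapq

def dijkstra(adj, source):
    """Shortest-path (path-distance) from source in a weighted graph.
    adj: {vertex: {neighbor: weight}} with symmetric entries."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, v = heapq.heappop(heap)
        if d > dist.get(v, float('inf')):
            continue  # stale heap entry
        for w, weight in adj[v].items():
            nd = d + weight
            if nd < dist.get(w, float('inf')):
                dist[w] = nd
                heapq.heappush(heap, (nd, w))
    return dist

# Toy graph in the spirit of Γ: hub vertices u1, u2 joined by a weighted edge,
# and leaves v, w attached to their hubs with weight delta = 1.0 (made up).
adj = {'v': {'u1': 1.0}, 'u1': {'v': 1.0, 'u2': 2.5},
       'u2': {'u1': 2.5, 'w': 1.0}, 'w': {'u2': 1.0}}
print(dijkstra(adj, 'v')['w'])  # 4.5 = 1.0 + 2.5 + 1.0
```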
An immediate analogue in the case when r = ∞ is the following:
Lemma 10.6. Condition on Ωdist. If r = ∞, then the following holds.
• For all 1 ≤ i < j ≤ ℓ,
(113) ‖Xui − Xuj‖ + 0.03δ ≤ weuc(ui, uj) ≤ ‖Xui − Xuj‖ + 0.05δ.
• For each v ∉ ⋃s∈[ℓ](Us ∪ Us′),
(114) ‖Xv − Xusv‖ ≤ 1.02δ.
We omit the proof here since it is an analogue of that of Lemma 10.5.
10.3. Proof of Theorem 1.1. Let us reformulate Theorem 1.1 in the context of Γ and Ωdist discussed in the previous subsection.
Theorem 10.7. Given the event Ωdist, which happens with probability 1 − n−ω(1), for all v, w ∈ V we have
|dΓ(v, w) − dgd(Xv, Xw)| ≤ C diamgd(M) rM δ2/3,
where C ≥ 1 is a universal constant, and
|deuc(v, w) − ‖Xv − Xw‖| ≤ 4δ.
Before we proceed to the proof of the theorem, we need a comparison between the Euclidean distance and the geodesic distance:
Lemma 10.8. For p, q ∈ M with ‖p − q‖ ≤ rM,
dgd(p, q) ≤ ‖p − q‖(1 + κ‖p − q‖2/2).
Proof. We fix p, q as described in the lemma. Consider Proposition A.4 with H = Tp M.
By the definition of rM, we know that B(p, rM) ∩ M is connected. By (e) from Proposition A.4, we know that the orthogonal projection P from M to Tp M is invertible when restricted to B(p, rM) ∩ M; denote its inverse by φ. Then, there exists v ∈ Tp M ∩ B(p, rM) such that φ(v) = q. Now, we apply (d) from the Proposition with v to get
dgd(p, q) ≤ ‖p − q‖(1 + κ‖p − q‖2/2).
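Lemma 10.8 can be sanity-checked on the unit circle (where κ = 1): the geodesic distance between two points at angle θ apart is θ, and the Euclidean distance is the chord 2 sin(θ/2). The check below is only a numerical illustration of the inequality, not part of the proof.

```python
import math

kappa = 1.0  # curvature bound for the unit circle
for k in range(1, 101):
    theta = 0.01 * k                      # geodesic distance d_gd(p, q)
    chord = 2 * math.sin(theta / 2)       # Euclidean distance between p and q
    # Lemma 10.8: geodesic distance is at most chord * (1 + kappa*chord^2/2).
    assert theta <= chord * (1 + kappa * chord ** 2 / 2)
print("Lemma 10.8 bound holds on the unit circle for theta in (0, 1]")
```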
10.4. Proof for the Geodesic Distance Setting. Let us begin with the proof of Theorem 10.7 in the geodesic setting. In particular, we split the comparison of dΓ and dgd into two parts:
Lemma 10.9. Condition on Ωdist . With Γ = Γ(G, 2δ1/3 ), for every pair of vertices v, w ∈ V,
dΓ (v, w) ≤ dgd (Xv , Xw )(1 + 20δ2/3 ) ≤ dgd (Xv , Xw ) + 20diamgd (M )δ2/3 .
Proof. Fix v, w ∈ V.
Case: ‖Xv − Xw‖ ≤ δ1/3. Assume that both v, w ∉ ⋃s∈[ℓ] Us. Then,
‖Xusv − Xusw‖ ≤ ‖Xv − Xusv‖ + ‖Xv − Xw‖ + ‖Xw − Xusw‖ ≤ 1.02δ + δ1/3 + 1.02δ ≤ 1.1δ1/3,
where the second-to-last inequality follows from Lemma 10.5. In particular, according to the event Ωdist, we have
max{p−1(|N(usv) ∩ Usw′| / |Usw′|), p−1(|N(usw) ∩ Usv′| / |Usv′|)} ≤ ‖Xusv − Xusw‖ + 0.01δ ≤ 1.2δ1/3.
Hence, by (111), {usv, usw} is an edge with weight
w(usv, usw) ≤ ‖Xusv − Xusw‖ + 0.05δ.
Combining these together, we have
dΓ(v, w) ≤ w(v, usv) + w(usv, usw) + w(usw, w)
≤ δ + ‖Xusv − Xusw‖ + 0.05δ + δ
≤ 3δ + ‖Xv − Xusv‖ + ‖Xv − Xw‖ + ‖Xw − Xusw‖
≤ 6δ + ‖Xv − Xw‖ (by (112))
≤ dgd(Xv, Xw) + 6δ.
Therefore, we conclude that
dΓ(v, w) ≤ dgd(Xv, Xw) + 6δ.
When v or w is contained in Us ∪ Us′, the proof is similar; we omit the repetition here.
Lemma 10.10. Suppose n is large enough so that δ1/3 < 0.1rM . Condition on Ωdist . With
Γ = Γ(G, 2δ1/3 ), for every pair of vertices v, w ∈ V,
dΓ (v, w) ≥ dgd (Xv , Xw ) − Cdiamgd (M )rM δ2/3 ,
for some universal constant C ≥ 1.
Proof. Let
(v = v0, v1, v2, . . . , vt = w)
be a shortest path in Γ connecting v and w. That is,
dΓ(v, w) = Σi∈[t] w(vi−1, vi).
The main technical part is to show that t is at most of order δ−1/3. Let us assume that v, w ∉ {us}s∈[ℓ].
We will first derive two claims about properties of the path.
Claim 1: {vi}i∈[t−1] ⊆ {us}s∈[ℓ]. Let us prove this by contradiction. Given that v has only one neighbor usv, we have v1 = usv ∈ {us}s∈[ℓ]. For the same reason, vt−1 ∈ {us}s∈[ℓ] as well. Suppose vi ∉ {us}s∈[ℓ] for some 1 < i < t − 1. For the same reason, we have
vi−1 = vi+1 = usvi,
which makes (v0, v1, . . . , vi−1, vi+1, . . . , vt) a shorter path, and this is a contradiction.
which is strictly greater than the length of (ui , uj ). Thus, a contradiction follows and the claim
holds.
Claim 2: In general, for any distinct v, w ∈ V, the shortest path in Γeuc connecting v and w is
(v, usv, usw, w) if v, w ∉ {us}s∈[ℓ],
(v, usv, w) if v ∉ {us}s∈[ℓ] and w ∈ {us}s∈[ℓ],
(v, usw, w) if v ∈ {us}s∈[ℓ] and w ∉ {us}s∈[ℓ],
(v, w) if v, w ∈ {us}s∈[ℓ].
The last case was shown in the first claim. It suffices to show the first case, as the second and third cases are similar. Consider the shortest path (v = v0, v1, . . . , vt = w) in Γeuc connecting v and w. Then, immediately, we have v1 = usv and vt−1 = usw, since v and w are only connected to usv and usw, respectively. Next, (usv = v1, v2, . . . , vt−1 = usw) must also be the shortest path in Γeuc connecting usv and usw, as a subpath of (v0, v1, . . . , vt). From the first claim, we know (usv = v1, v2, . . . , vt−1 = usw) = (usv, usw). Hence, the first case follows. The claim holds.
Finally, by Lemma 10.6, for v, w ∉ {us}s∈[ℓ],
deuc(v, w) = dΓeuc(v, w) ≤ δ + ‖Xusv − Xusw‖ + 0.05δ + δ ≤ ‖Xusv − Xusw‖ + 2δ
≤ ‖Xv − Xw‖ + ‖Xv − Xusv‖ + ‖Xw − Xusw‖ + 2δ ≤ ‖Xv − Xw‖ + 4δ.
Also, the lower bound can be established in the same way:
deuc(v, w) = dΓeuc(v, w) ≥ w(usv, usw) ≥ ‖Xusv − Xusw‖ + 0.03δ
≥ ‖Xv − Xw‖ − ‖Xv − Xusv‖ − ‖Xw − Xusw‖ + 0.03δ ≥ ‖Xv − Xw‖ − 2δ.
The same estimates when v ∈ {us}s∈[ℓ] or w ∈ {us}s∈[ℓ] can be established in the same way; we omit the proof here. Therefore, the theorem follows.
10.6. Gromov–Hausdorff Distance: Proof of Theorem 1.3. Recall that Ω′netFound is an event of
Y := (XV1, UV1, {XVs, UUs,Vs}s∈[ℓ]).
The discussion in this subsection is always conditioned on a sample Y = y ∈ Ω′netFound.
It is worth making a few remarks:
• Recall that V0 ⊆ V2 is a vertex set of size n, and Ω′netFound is not an event of (XV0, UV0,⋃s∈[ℓ]Us′).
• While we have not conditioned on Ωdist ⊆ Ω′netFound, the graph Γ(G, r) is well-defined for any r ≥ 0.
• For v ∈ V0, recall the definition of sv from the definition of Γ(G, r) (regardless of the choice of r):
sv = argmaxs∈[ℓ] |N(v) ∩ Us′| / |Us′|.
Now, we define a (random) measure ν on {us}s∈[ℓ] as follows:
ν({us}) = |{v ∈ V0 : sv = s}| / |V0|.
Fix any v ∈ V0. The pairs (Xw, U{w},⋃s∈[ℓ]Us′) for w ∈ V0 \ {v} are i.i.d. copies of (Xv, U{v},⋃s∈[ℓ]Us′), which in turn implies that the conditional random variables {sw}w∈V0 are i.i.d. copies. Hence, ν is the empirical measure of the measure ν0 defined by
ν0({us}) = P{sv = s}.
Before we proceed, let us emphasize that both ν and ν0 are random measures, due to the fact that {(Us, us)}s∈[ℓ] are random. However, to avoid the complexity of notation, we will not explicitly write the dependence on the sample y ∈ Ω′netFound.
Lemma 10.11. Fix any sample y ∈ Ω′netFound. With probability 1 − n−ω(1) (on (XV0, UV0,⋃s∈[ℓ]Us′)), the total variation distance of ν and ν0 is bounded by n−1/4. That is, for any U ⊆ {us}s∈[ℓ],
|ν0(U) − ν(U)| < n−1/4.
We denote the above event by Ων,ν0 ⊆ Ω′netFound.
Proof. Notice that nν({us}) is the sum of n i.i.d. Bernoulli random variables with success probability ν0({us}). For each s ∈ [ℓ], we can apply Hoeffding’s inequality: for t ≥ 0,
P{|ν({us}) − ν0({us})| ≥ t} ≤ 2 exp(−2(tn)2/n) = 2 exp(−2t2n).
By taking t = log(n)/√n and applying the union bound, we have
P{∃s ∈ [ℓ], |ν({us}) − ν0({us})| ≥ log(n)/√n} = n−ω(1),
where we relied on ℓ ≤ n3ς/4 from ΩnetFound ⊇ Ω′netFound.
Finally, for U′ ⊆ {us}s∈[ℓ],
|ν(U′) − ν0(U′)| ≤ Σs∈[ℓ] log(n)/√n ≤ (log(n)/√n) · ℓ ≤ log(n) n−1/2 n3ς/4 < n−1/4.
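The argument above (a per-atom Hoeffding bound plus a union bound over the atoms) can be replayed in a small simulation. The measure ν0 below is an arbitrary illustrative choice; Hoeffding's inequality predicts per-atom deviations of order n−1/2.

```python
import random
from collections import Counter

random.seed(1)
nu0 = {'u1': 0.5, 'u2': 0.3, 'u3': 0.15, 'u4': 0.05}   # hypothetical nu_0
n = 10000

# Draw n i.i.d. samples from nu0 and form the empirical measure nu.
atoms, probs = zip(*nu0.items())
samples = random.choices(atoms, weights=probs, k=n)
counts = Counter(samples)
nu = {a: counts[a] / n for a in atoms}

# Per-atom deviation, as controlled by Hoeffding's inequality:
max_dev = max(abs(nu[a] - nu0[a]) for a in atoms)
print(max_dev)  # typically of order n**-0.5 ≈ 0.01
```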
Proof of Theorem 1.3. Recall the event Ω′netFound introduced in the last section, which happens with probability 1 − n−ω(1). Now we fix a realization y ∈ Ω′netFound. We will show that the statements of the Theorem hold within this realization.
We define the graph Γ̃ to be the induced subgraph of Γ with vertex set V′ = {us}s∈[ℓ].
Notice that there is a natural coupling πµ,ν0 of µ and ν0 in the generation of the random graph Γ(G): Fix v ∈ V0 and consider (Xv, U{v},⋃s∈[ℓ]Us′). We have Xv ∼ µ and sv = sv(Xv, U{v},⋃s∈[ℓ]Us′) ∼ ν0. Now, let v′ be another vertex of V0. Then, (X, u) = (X(v), usv) and (X′, u′) = (X(v′), usv′) are two independent copies of random pairs with distribution πµ,ν0.
Consider the event Ωdist ⊆ Ω′netFound introduced in the last section. Within the event, we have ‖Xv − Xusv‖ ≤ 1.01δ. Further, from Proposition 10.2,
Next, given that the total variation distance between ν0 and ν is at most n−1/4 from Lemma 10.11, we also know there exists a coupling πν0,ν such that for (u, w) ∼ πν0,ν,
We can further couple (µ, ν0, ν) together to get a measure πµ,ν0,ν, with the marginal distributions corresponding to (µ, ν0) and (ν0, ν) being πµ,ν0 and πν0,ν, respectively. While such a coupling is not unique, for any πµ,ν0,ν, the marginal distribution corresponding to (µ, ν) is a probability measure π such that, for a pair of independent copies (X, u, w), (X′, u′, w′) ∼ π,
P{|dgd(X, X′) − dΓ(w, w′)| ≥ 1.01δ} ≤ P{|dgd(X, X′) − dΓ(u, u′)| ≥ 1.01δ} + P{u ≠ w} + P{u′ ≠ w′}
≤ n−ω(1) + 2n−1/4,
and the first statement about the coupling in the geodesic distance setting follows.
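The coupling πν0,ν invoked above is the standard maximal coupling of two discrete measures, under which the mismatch probability P{u ≠ w} equals the total variation distance. A minimal sketch (the two measures below are arbitrary illustrative values):

```python
# Maximal coupling of two finite measures: mass min(nu0[a], nu[a]) stays on the
# diagonal (u = w); the leftover mass is paired off arbitrarily, so that
# P{u != w} = 1 - sum_a min(nu0[a], nu[a]) = d_TV(nu0, nu).
nu0 = {'u1': 0.5, 'u2': 0.3, 'u3': 0.2}
nu  = {'u1': 0.4, 'u2': 0.35, 'u3': 0.25}

overlap = sum(min(nu0[a], nu[a]) for a in nu0)
tv = sum(abs(nu0[a] - nu[a]) for a in nu0) / 2
mismatch = 1 - overlap                      # P{u != w} under the maximal coupling
print(mismatch, tv)                         # both equal 0.1
```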
Now we move to the proof of the second statement. As in the derivation of (116), we can apply Theorem 10.7 to show that, given the event Ωdist ⊆ Ω′netFound,
∀ũ ∈ {us}s∈[ℓ], |dgd(X, Xũ) − dΓ(u, ũ)| ≤ C′ diamgd(M) rM δ2/3.
Thus, for any ũ ∈ {us}s∈[ℓ] and t > 0, we have
P{X ∈ Bgd(Xũ, t − C′ diamgd(M) rM δ2/3) | Y = y} − P{Ωcdist | Y = y}
≤ P{u ∈ BΓ(ũ, t)}
≤ P{X ∈ Bgd(Xũ, t + C′ diamgd(M) rM δ2/3) | Y = y} + P{Ωcdist | Y = y}.
Now, if we combine the above inequality with (115) and
|P{w ∈ BΓ(ũ, t)} − P{u ∈ BΓ(ũ, t)}| < n−1/4
from the total variation distance estimate of Lemma 10.11, the second statement follows.
It remains to prove these two statements for the coupling with respect to the Euclidean distance. However, the proof is identical to the one in the geodesic distance setting, with the difference being that we replace (116) by the corresponding estimate from Theorem 10.7 for the Euclidean distance setting. We omit the proof here.
(v) ε◦ := max{6 · (2/C◦)1/d n−ς/d, n−1/2+ς/(p(D◦) · Lp)},
(vi) c◦1 := (1/800)(ℓ◦p)2 C◦ (r◦M/4)d,
(vii) C◦2 := 4 · Cgap1/2 d1/2 Lp / ((ℓ◦p)1/2 (c◦1)1/2),
(viii) c◦3 := ℓ◦p / (Cgap √d),
(ix) η◦ := max{C◦2 · (ε◦)1/2, (L2p / (c◦3 ℓ◦p √d)) · ε◦},
(x) δ◦ := Cgap √d · η◦, and
(xi) r◦ := Cgap d2 · δ◦.
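Since every parameter in the list is an explicit function of quantities the graph observer can compute, the whole chain can be evaluated mechanically. The sketch below uses made-up input values (C°, Lp, ℓ°p, Cgap, p(D°), and r°M are placeholders, not values from the paper) and also evaluates the observer's test (119), which need not pass for such toy inputs.

```python
import math

# Hypothetical inputs (placeholders; the paper computes these from the graph).
n, varsigma, d = 10**6, 0.1, 2          # varsigma plays the role of ς
C_circ, L_p, ell_p, C_gap = 1.0, 1.0, 0.5, 2.0
p_D, r_M = 0.5, 0.05                    # p(D°) and r°M

eps = max(6 * (2 / C_circ) ** (1 / d) * n ** (-varsigma / d),
          n ** (-0.5 + varsigma) / (p_D * L_p))                          # (v)
c1  = (ell_p ** 2) * C_circ * (r_M / 4) ** d / 800                       # (vi)
C2  = 4 * math.sqrt(C_gap) * math.sqrt(d) * L_p / math.sqrt(ell_p * c1)  # (vii)
c3  = ell_p / (C_gap * math.sqrt(d))                                     # (viii)
eta = max(C2 * math.sqrt(eps),
          (L_p ** 2) / (c3 * ell_p * math.sqrt(d)) * eps)                # (ix)
delta = C_gap * math.sqrt(d) * eta                                       # (x)
r   = C_gap * d ** 2 * delta                                             # (xi)

params = [eps, c1, C2, c3, eta, delta, r]
print(all(x > 0 for x in params), r > delta > eta > eps)  # positivity and ordering
print(r_M >= 24 * C_gap * d ** 2 * r and n >= 100)        # the test (119)
```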
It is not hard to see that all the eleven numbers computed above are strictly positive (and finite). We note from the definition of r◦M that
(117) r◦M = 0.01 · min{1/κ◦, r◦M,0} ≤ 0.01 · min{1/κ, rM,0} = rM.
Moreover, we deduce from Corollary A.6 in the appendix that
D◦ ≥ diam(M) > 2rM,
and therefore, by the definition of ℓ◦p, we obtain
(118) ℓ◦p = minx∈[0,D◦] |p′(x)| ≤ minx∈[0,2rM] |p′(x)| = ℓp.
Hence, both parameters r◦M and ℓ◦p, which the graph observer can compute exactly, are lower bounds for their corresponding variables.
Proposition 11.1. We have the following items.
(a) The parameters ε◦, c◦1, C◦2, c◦3, η◦, δ◦, r◦ can be computed exactly by the graph observer.
(b) (“the test”) The two inequalities
(119) r◦M ≥ 24 Cgap d2 r◦ and n ≥ 100
can be checked by the graph observer.
(c) If both inequalities in part (b) hold, then the parameters ε◦, c◦1, C◦2, c◦3, η◦, δ◦, r◦ may play the roles of ε, c1, C2, c3, η, δ, r, respectively, throughout the paper, as they satisfy the parameter assumptions (12)–(17), and furthermore, these parameters are feasible (see (18)).
Proof. The first two parts, (a) and (b), follow by design. Let us prove (c). Suppose that both inequalities in (119) hold. We now check the parameter assumptions. First, we have
rM ≥ r◦M ≥ 24 Cgap d2 r◦ > r◦ > δ◦ > η◦ > ε◦ > ε◦/6,
where the first inequality is (117) and the second is (119). This shows that ε◦/6 ∈ [0, rM], and thus by the definitions of C◦ and ε◦, we have
µmin(ε◦/6) ≥ C◦ · (ε◦/6)d ≥ C◦ · (2/C◦) · n−ς = 2n−ς.
We also have
ε◦ ≥ n−1/2+ς/(p(D◦)Lp) ≥ (1/√(‖K‖∞ Lp)) · n−1/2+ς,
where in the last inequality, we use the bound from Remark 3.5. This shows that (12) is satisfied.
Second, (13) is satisfied, since
0 < c◦1 = (1/800)(ℓ◦p)2 C◦(r◦M/4)d ≤ (1/800) ℓ2p C◦(r◦M/4)d ≤ (1/800) ℓ2p µmin(r◦M/4),
where we use the definition of C◦, noting that r◦M/4 ∈ [0, rM].
Third, (14) is satisfied, since
C◦2 = 4 · Cgap1/2 d1/2 Lp / ((ℓ◦p)1/2 (c◦1)1/2) ≥ 4 · ‖K‖∞1/4 Cgap1/2 d1/2 Lp / ((ℓp)1/2 (c◦1)1/2),
where we use the observation that ‖K‖∞ ≤ 1 from the definition of K (see Definition 3.1), together with ℓ◦p ≤ ℓp.
Next, (15) and (16) follow from the definitions of c◦3 and η◦ and the inequality ℓ◦p ≤ ℓp from (118). Finally, (17) is immediate from the definitions of δ◦ and r◦.
Having shown that ε◦, c◦1, C◦2, c◦3, η◦, δ◦, r◦ may play the roles of ε, c1, C2, c3, η, δ, r, we proceed to show that the parameters are feasible. This final step is easy. From (117), we know that rM ≥ r◦M. Thus, with r◦ playing the role of r, if both inequalities in (119) hold, then
rM ≥ r◦M ≥ 24 Cgap d2 r◦,
and n ≥ 100, implying that the parameters are feasible.
Acknowledgments
Han Huang and Pakawut Jiradilok were supported by Elchanan Mossel’s Vannevar Bush Faculty
Fellowship ONR-N00014-20-1-2826 and by Elchanan Mossel’s Simons Investigator award (622132).
Elchanan Mossel was partially supported by Bush Faculty Fellowship ONR-N00014-20-1-2826, Si-
mons Investigator award (622132), ARO MURI W911NF1910217 and NSF award CCF 1918421.
We would like to thank Anna Brandenberger, Tristan Collins, Dan Mikulincer, Ankur Moitra, Greg
Parker, and Konstantin Tikhomirov for helpful discussions.
References
[ADC19] Ernesto Araya and Yohann De Castro. Latent distance estimation for random geometric graphs. Advances
in Neural Information Processing Systems, 32, 2019.
[ADL23] Caelan Atamanchuk, Luc Devroye, and Gábor Lugosi. A note on estimating the dimension from a random
geometric graph. arXiv:2311.13059, 2023.
[DDC23] Quentin Duchemin and Yohann De Castro. Random geometric graph: some recent developments and
perspectives. In High dimensional probability IX—the ethereal volume, volume 80 of Progr. Probab.,
pages 347–392. Birkhäuser/Springer, Cham, [2023] ©2023.
[EMP22] Ronen Eldan, Dan Mikulincer, and Hester Pieters. Community detection and percolation of information
in a geometric setting. Combin. Probab. Comput., 31(6):1048–1069, 2022.
66
[FIK+ 20] Charles Fefferman, Sergei Ivanov, Yaroslav Kurylev, Matti Lassas, and Hariharan Narayanan. Recon-
struction and interpolation of manifolds. I: The geometric Whitney problem. Found. Comput. Math.,
20(5):1035–1133, 2020.
[FIL+ 21] Charles Fefferman, Sergei Ivanov, Matti Lassas, Jinpeng Lu, and Hariharan Narayanan. Reconstruction
and interpolation of manifolds II: Inverse problems for Riemannian manifolds with partial distance data.
arXiv:2111.14528, 2021.
[FILN20] Charles Fefferman, Sergei Ivanov, Matti Lassas, and Hariharan Narayanan. Reconstruction of a Riemann-
ian manifold from noisy intrinsic distances. SIAM J. Math. Data Sci., 2(3):770–808, 2020.
[FILN23] Charles Fefferman, Sergei Ivanov, Matti Lassas, and Hariharan Narayanan. Fitting a manifold to data in
the presence of large noise. arXiv:2312.10598, 2023.
[GLZ15] Chao Gao, Yu Lu, and Harrison H. Zhou. Rate-optimal graphon estimation. Ann. Statist., 43(6):2624–
2652, 2015.
[HMST23] Keaton Hamm, Caroline Moosmüller, Bernhard Schmitzer, and Matthew Thorpe. Manifold learning in
Wasserstein space. arXiv:2311.08549, 2023.
[JM13] Adel Javanmard and Andrea Montanari. Localization from incomplete noisy distance measurements.
Found. Comput. Math., 13(3):297–345, 2013.
[LMSY22] Siqi Liu, Sidhanth Mohanty, Tselil Schramm, and Elizabeth Yang. Testing thresholds for high-dimensional
sparse random geometric graphs. In STOC ’22—Proceedings of the 54th Annual ACM SIGACT
Symposium on Theory of Computing, pages 672–677. ACM, New York, [2022] ©2022.
[LS23] Shuangping Li and Tselil Schramm. Spectral clustering in the Gaussian mixture block model.
arXiv:2305.00979, 2023.
[MWZ23] Cheng Mao, Alexander S. Wein, and Shenduo Zhang. Detection-recovery gap for planted dense cycles.
arXiv:2302.06737, 2023.
[OMK10] Sewoong Oh, Andrea Montanari, and Amin Karbasi. Sensor network localization from local connectiv-
ity: Performance analysis for the mds-map algorithm. In 2010 IEEE Information Theory Workshop on
Information Theory (ITW 2010, Cairo), pages 1–5. IEEE, 2010.
[Pen03] Mathew Penrose. Random geometric graphs, volume 5 of Oxford Studies in Probability. Oxford University
Press, Oxford, 2003.
[Shi16] Takashi Shioya. Metric measure geometry, volume 25 of IRMA Lectures in Mathematics and Theoretical
Physics. EMS Publishing House, Zürich, 2016. Gromov’s theory of convergence and concentration of
metrics and measures.
[STP12] Daniel L. Sussman, Minh Tang, and Carey E. Priebe. Universally consistent latent position estimation
and vertex classification for random dot product graphs. arXiv:1207.6745, 2012.
[TSP13] Minh Tang, Daniel L. Sussman, and Carey E. Priebe. Universally consistent vertex classification for latent
positions graphs. Ann. Statist., 41(3):1406–1430, 2013.
[YSLY23] Zhigang Yao, Jiaji Su, Bingjie Li, and Shing-Tung Yau. Manifold fitting. arXiv:2304.07680, 2023.
(121) d(H0, Ht) = ‖G(t)‖op.
Notice that G(0) ∈ R(N−d)×d is the zero matrix. Consider the first-degree Taylor approximation G(t) = A · t + B(t), where A, B(t) ∈ R(N−d)×d are matrices and each entry (B(t))ij is of order O(t2). Therefore, for each (i, j) ∈ [N − d] × [d], there exists ̺̃ij > 0 such that for any t with |t| < ̺̃ij, we have
|(B(t))ij| < (ϑ/N) · |t|.
Now we take ̺ := min{̺̃ij : (i, j) ∈ [N − d] × [d]} > 0. Thus, for any t ∈ R with |t| < ̺, we have
‖B(t)‖op ≤ √((N − d)d) · (ϑ/N) · |t| < ϑ · |t|.
Therefore, for any v ∈ Rd, we have by the triangle inequality that
‖(G(t))(v)‖ = ‖(A · t + B(t))(v)‖ ≤ ‖Atv‖ + ‖(B(t))(v)‖ ≤ (‖A‖op + ϑ) · |t| · ‖v‖,
whence
(122) d(H0, Ht) = ‖G(t)‖op ≤ (‖A‖op + ϑ) · |t|
for any t ∈ (−̺, ̺), where the equality is (121).
Now let α = [α1, . . . , αd]T ∈ Sd−1 be a unit vector such that ‖Aα‖ = ‖A‖op. Then
Y(t) := Σi=1d αi Xi(t)
is a smooth unit vector field along the curve γ in M, defined in an open neighborhood of 0. Observe that
Aα = G′(0)α = PH0⊥(Dγ′(t) Ỹ)|t=0,
where Ỹ is the vector field along γ in M given by Ỹγ(t) := Y(t). Therefore, we have
(123) ‖A‖op = ‖Aα‖ = ‖II(γ′, Ỹ)γ(0)‖ ≤ κ.
Note that the inequality above is valid since both γ′(0) and Ỹγ(0) = Y(0) are unit vectors. We finish by combining (122) and (123).
Lemma A.3. For any interior points a, b ∈ I, we have d(Ha , Hb ) ≤ κ · |b − a|.
Proof. Without loss of generality, assume a ≤ b. Take an arbitrary ϑ > 0. We define a non-decreasing sequence {ai}i≥0 of real numbers as follows. Take a0 := a. For each i ≥ 0, define
(124) ai+1 := max{x ∈ [ai, b] : d(Hai, Hx) ≤ (κ + ϑ)(x − ai)}.
We give two remarks about (124). First, we do not require that d(Hai , Ht ) ≤ (κ + ϑ)(t − ai ) for
all ai ≤ t ≤ x. We only require the inequality to be true when t = x. Second, ai+1 is well-defined
since x 7→ d(Hai , Hx ) − (κ + ϑ)(x − ai ) is a continuous function, and hence the set in (124) is a
closed set.
Lemma A.2 implies that ai+1 > ai unless ai = b. We argue that there exists T < ∞ such that aT = b. Suppose otherwise, for the sake of contradiction. Then {ai}i≥0 is a strictly increasing sequence, and hence it converges to a limit L ∈ (a, b].
Using Lemma A.2 again, we find that there exists ̺ > 0 such that for any t ∈ (L − ̺, L + ̺) ∩ I,
we have
(125) d(HL , Ht ) ≤ (κ + ϑ) · |t − L|.
Since {ai}i≥0 converges to L, there exists k ∈ Z≥0 such that L − ̺ < ak < L, and so (125) gives
d(Hak , HL ) ≤ (κ + ϑ) · |L − ak |,
which, by (124), implies that ak+1 ≥ L. However, this means ak+1 = ak+2 = · · · = L, which is a
contradiction.
Therefore, there exists an index T for which aT = b. By the triangle inequality, we have
d(Ha, Hb) ≤ Σi=0T−1 d(Hai, Hai+1) ≤ Σi=0T−1 (κ + ϑ)(ai+1 − ai) = (κ + ϑ)(b − a).
Since ϑ > 0 was arbitrary, by taking ϑ ց 0, the lemma follows.
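Lemma A.3 can be illustrated on the unit circle, where the tangent line rotates at unit speed (κ = 1). Taking d(H, H′) to be the sine of the principal angle between the two lines — an assumption about the subspace metric, made only for this illustration — the bound d(Ha, Hb) ≤ κ|b − a| becomes |sin(b − a)| ≤ |b − a|:

```python
import math

# Unit-speed circle: gamma(t) = (cos t, sin t); tangent direction (-sin t, cos t).
kappa = 1.0

def tangent(t):
    return (-math.sin(t), math.cos(t))

def d_subspace(s, t):
    """Sine of the principal angle between the tangent lines H_s and H_t."""
    u, v = tangent(s), tangent(t)
    dot = abs(u[0] * v[0] + u[1] * v[1])
    return math.sqrt(max(0.0, 1 - dot * dot))

for k in range(100):
    a, b = 0.0, 0.015 * k
    assert d_subspace(a, b) <= kappa * abs(b - a) + 1e-12
print("d(Ha, Hb) <= kappa * |b - a| verified on the circle")
```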
A.2. Projection as a local diffeomorphism. This subsection collects some useful geometric
results for us to use in the paper. Let M and κ be as described in the previous subsection. Let
dgd (p, q) denote the geodesic distance between points p, q ∈ M . We use the notation Bgd (p, r) ⊆ M
to denote the set of all points in M with geodesic distance strictly less than r from p ∈ M , while
B(p, r) ⊆ RN denotes the Euclidean open ball of radius r centered at p.
Proposition A.4. Fix p ∈ M . Suppose H is a d-dimensional affine subspace of RN through p
such that 1 − d(H, Tp M ) =: ζ > 0 and PH : RN → H is the orthogonal projection to H. Then
(a) PH is a diffeomorphism from Bgd (p, ζ/10κ) to its image.
(b) B(p, 0.09ζ 2 /κ) ∩ H ⊆ PH (Bgd (p, ζ/10κ)).
(c) Let φ be the inverse of PH|Bgd(p,ζ/10κ), so that φ(PH(q)) = q for every q ∈ Bgd(p, ζ/10κ). For any v ∈ B(p, 0.09ζ2/κ) ∩ H, we have ‖φ(v) − p‖ ≤ dgd(φ(v), p) ≤ (1/0.9ζ)‖v − p‖. In particular,
dgd(φ(v), p) ≤ ‖v − p‖(1 + κ2‖v − p‖2/2) ≤ ‖φ(v) − p‖(1 + κ2‖φ(v) − p‖2/2).
(d) If H = Tp M, then for any v ∈ B(p, 0.09/κ) ∩ Tp M, we have ‖PH⊥ φ(v)‖ ≤ κ‖v − p‖2 and
dgd(φ(v), p) ≤ ‖v − p‖(1 + κ2‖v − p‖2/2).
(e) If M ∩ B(p, 0.09ζ 2 /κ) is connected, then M ∩ B(p, 0.09ζ 2 /κ) ⊆ φ(B(p, 0.09ζ 2 /κ) ∩ H).
Proof. With a translation, we assume p = ~0 ∈ RN throughout this proof.
(a) For any point q ∈ M with dgd(~0, q) < ζ/10κ, the triangle inequality and Lemma A.3 give
1 − d(H, Tq M) ≥ 1 − d(H, T~0 M) − d(T~0 M, Tq M) > ζ − κ · ζ/(10κ) = 0.9ζ.
69
For any v ∈ Tq M with ‖v‖ = 1, we have
(126) ‖PH v‖ = √(1 − ‖PH⊥ v‖2) ≥ √(1 − d(H, Tq M)2) ≥ 1 − d(H, Tq M) ≥ 0.9ζ.
Therefore, when we restrict the domain of PH to M, we find that the least singular value of the linear map
(dPH)q|Tq M = PH|Tq M : Tq M → TPH(q) H
is bounded below by 0.9ζ > 0. Therefore, at any point q ∈ Bgd(~0, ζ/10κ), the map PH|M is a local diffeomorphism.
The next step is to show that PH is a diffeomorphism when restricted to Bgd(~0, ζ/10κ). It suffices to show that PH|Bgd(~0,ζ/10κ) is injective. To that end, consider any q, q′ ∈ Bgd(~0, ζ/10κ) with q ≠ q′. We claim that PH q ≠ PH q′.
Let γ : [0, t0 ] → M be a shortest unit-speed geodesic connecting q and q ′ with γ(0) = q and
γ(t0 ) = q ′ . Observe that γ ′ (0) is a unit vector in Tq M , and therefore by (126), we find that
(127) ‖PH γ′(0)‖ ≥ 0.9ζ > 0.
Hence, we can consider v := PH γ′(0)/‖PH γ′(0)‖. We will show that ⟨PH γ′(t), v⟩ > 0 for t ∈ [0, t0], which implies
⟨PH q′ − PH q, v⟩ = ⟨PH γ(t0) − PH γ(0), v⟩ > 0,
and hence PH q′ ≠ PH q.
To achieve that, we take any t ∈ [0, t0] and compare γ′(t) with γ′(0). Observe that
‖Dγ′(t) γ′(t)‖ = √(‖(Dγ′(t) γ′(t))⊥‖2 + ‖(Dγ′(t) γ′(t))⊤‖2).
Since γ is a unit-speed geodesic, the second summand inside the square root is 0, and thus
‖Dγ′(t) γ′(t)‖ = ‖(Dγ′(t) γ′(t))⊥‖ = ‖II(γ′(t), γ′(t))‖ ≤ κ.
Therefore, we have
(128) ‖γ′(t) − γ′(0)‖ ≤ κt.
Notice that since q, q′ ∈ Bgd(~0, ζ/10κ), we have
(129) t ≤ t0 = dgd(q, q′) ≤ 0.2ζ/κ
by the triangle inequality.
Hence,
⟨PH γ′(t), v⟩ = ‖PH γ′(0)‖ + ⟨PH(γ′(t) − γ′(0)), v⟩
≥ ‖PH γ′(0)‖ − ‖γ′(t) − γ′(0)‖
≥ 0.9ζ − κt (by (127) and (128))
≥ 0.9ζ − κ · 0.2ζ/κ = 0.7ζ > 0 (by (129)).
We have finished the proof of part (a).
(b) Let U := PH(Bgd(~0, ζ/10κ)). From part (a), PH : Bgd(~0, ζ/10κ) → U is a diffeomorphism. Let φ denote the inverse of PH. Suppose, for the sake of contradiction, that there exists z ∈ B(~0, 0.09ζ2/κ) ∩ H such that z ∉ U. Since ~0 ∈ U, we have z ≠ ~0, and it makes sense to consider u := z/‖z‖; let us define S := {t ∈ R : tu ∈ U}.
By construction, observe the following: (i) S is an open set containing 0 ∈ R, (ii) S is bounded, since every element x ∈ S satisfies |x| < ζ/10κ, and (iii) ‖z‖ ∉ S. Therefore, the set
S′ := [0, ζ/10κ] \ S
is a compact set which contains ‖z‖. We define s0 := min S′. Thus, we have
(130) 0 < s0 ≤ ‖z‖ < 0.09ζ2/κ,
and for any s ∈ [0, s0), we have su ∈ U, while s0u ∉ U.
Consider the curve $\tilde{\gamma} : [0, s_0) \to B_{gd}(\vec{0}, \zeta/10\kappa)$, given by $\tilde{\gamma}(s) := \phi(su)$. Let $\gamma : [0, t_0) \to B_{gd}(\vec{0}, \zeta/10\kappa)$ be the unit-speed reparametrization, $\tilde{\gamma}(s(t)) = \gamma(t)$, for a certain monotonically increasing function $s(t)$ with $s(0) = 0$. Note that $P_H \gamma(t) = s(t) \cdot u$, and so we have $s'(t) = \|P_H \gamma'(t)\|$.
For each $t \in [0, t_0)$, since $\gamma(t) \in B_{gd}(\vec{0}, \zeta/10\kappa)$ and $\|\gamma'(t)\| = 1$, we can use (126) to obtain $\|P_H \gamma'(t)\| \ge 0.9\zeta$, and therefore
$$s_0 = \int_0^{t_0} s'(t)\, dt \ge 0.9\zeta t_0. \tag{131}$$
Combining (130) and (131), we find $t_0 < 0.1\zeta/\kappa$. Since $\gamma$ is unit speed, $d_{gd}(\vec{0}, \gamma(t)) \le t < 0.1\zeta/\kappa$ for every $t$, which implies that $\lim_{t \to t_0} \gamma(t) \in B_{gd}(\vec{0}, \zeta/10\kappa)$. However, this gives $s_0 u = \lim_{t \to t_0} P_H \gamma(t) \in U$, a contradiction.
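Spelling out how (130) and (131) combine to give the bound on $t_0$:

```latex
t_0 \;\overset{(131)}{\le}\; \frac{s_0}{0.9\zeta}
    \;\overset{(130)}{<}\; \frac{0.09\zeta^2/\kappa}{0.9\zeta}
    \;=\; \frac{0.1\zeta}{\kappa}.
```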
(c) Each point $v \in B(\vec{0}, 0.09\zeta^2/\kappa) \cap H$ can be expressed as $v = s_0 u$ where $s_0 \in [0, 0.09\zeta^2/\kappa)$ and $u \in H$ with $\|u\| = 1$. Consider the same construction, as in part (b), of a curve $\tilde{\gamma}(s) = \phi(su)$ for $s \in [0, s_0)$ with the unit-speed reparametrization $\gamma(t) = \tilde{\gamma}(s(t))$. The same argument from part (b) implies that
$$d_{gd}(\phi(s_0 u), \vec{0}) \le t_0 \le \frac{s_0}{0.9\zeta} = \frac{\|v\|}{0.9\zeta}.$$
The other inequality is clear, since $d_{gd}(\phi(v), \vec{0})$ is the length of a geodesic joining $\phi(v)$ and $\vec{0}$, which must be bounded below by the Euclidean distance between the two points.
(d) Take any $v \in B(\vec{0}, 0.09/\kappa) \cap H$. Consider the same construction as in the previous steps. In particular, we have
• the expression $v = s_0 u$, and so $\|v\| = s_0$,
• the curve $\tilde{\gamma} : [0, s_0) \to M$, given by $\tilde{\gamma}(s) = \phi(su)$, and
• the unit-speed reparametrization $\gamma : [0, t_0) \to M$, so that $\gamma(t) = \tilde{\gamma}(s(t))$.
Note that we have $s_0 \ge 0.9 t_0$.
For each $t \in [0, t_0)$, using Lemma A.3, we have
$$\|P_{H^\perp} \gamma'(t)\| \le d(H, T_{\gamma(t)} M) = d(T_{\gamma(0)} M, T_{\gamma(t)} M) \le \kappa t, \tag{132}$$
which by integration implies
$$\|P_{H^\perp} \phi(v)\| \le \frac{1}{2} \kappa t_0^2 \le \kappa s_0^2 = \kappa \|v\|^2. \tag{133}$$
From (132), we also obtain
$$s'(t) = \|P_H \gamma'(t)\| \ge \sqrt{1 - \kappa^2 t^2} \ge 1 - 0.51 \kappa^2 t^2,$$
where we use $\kappa t \le \kappa t_0 \le \frac{\kappa s_0}{0.9} < 0.1$. Therefore, we find by integrating that
$$s_0 = \int_0^{t_0} s'(t)\, dt \ge t_0 - 0.17 \kappa^2 t_0^3 \ge t_0 \cdot \big(1 - 0.3 \kappa^2 s_0^2\big),$$
and hence
$$d_{gd}(\phi(v), \vec{0}) \le t_0 \le \frac{s_0}{1 - 0.3 \kappa^2 s_0^2} \le s_0 \cdot \Big(1 + \frac{\kappa^2 s_0^2}{2}\Big).$$
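The elementary numerical estimates used in part (d) can be sanity-checked; the following quick script (not part of the proof) verifies $\sqrt{1 - x^2} \ge 1 - 0.51 x^2$ for $x = \kappa t \in [0, 0.1]$, and the constant $0.51/3 = 0.17$ from the integration step:

```python
import math

# Check sqrt(1 - x^2) >= 1 - 0.51 * x^2 on a fine grid of x = kappa * t in [0, 0.1].
for i in range(10001):
    x = 0.1 * i / 10000
    assert math.sqrt(1 - x * x) >= 1 - 0.51 * x * x

# Integrating 1 - 0.51 * (kappa * t)^2 from 0 to t0 gives
# t0 - (0.51 / 3) * kappa^2 * t0^3, and 0.51 / 3 = 0.17,
# matching the displayed bound.
assert abs(0.51 / 3 - 0.17) < 1e-12
```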
(e) Suppose that $N := M \cap B(\vec{0}, 0.09\zeta^2/\kappa)$ is connected. Take an arbitrary point $q \in N$. We want to show that $q \in \phi(B(\vec{0}, 0.09\zeta^2/\kappa) \cap H)$. Note that this is clear when $q = \vec{0}$, so for the rest of this proof let us assume $q \neq \vec{0}$. Because $N$ is connected, we can take a unit-speed path $\gamma : [0, t_0] \to N$ with $\gamma(0) = \vec{0}$ and $\gamma(t_0) = q$, for some $t_0 > 0$.
Since $P_H$ is a contraction, using part (b), we find that for each $t \in [0, t_0]$,
$$\begin{aligned}
P_H(\gamma(t)) &\in P_H(M \cap B(\vec{0}, 0.09\zeta^2/\kappa)) \\
&\subseteq P_H M \cap P_H(B(\vec{0}, 0.09\zeta^2/\kappa)) \\
&\subseteq H \cap B(\vec{0}, 0.09\zeta^2/\kappa) \\
&\subseteq P_H(B_{gd}(\vec{0}, \zeta/10\kappa)),
\end{aligned} \tag{134}$$
which shows that $P_H(\gamma(t))$ is in the domain of $\phi$. Hence, it makes sense to define $\tilde{\gamma} : [0, t_0] \to M$ by $\tilde{\gamma}(t) := \phi(P_H(\gamma(t)))$, for each $t \in [0, t_0]$. We claim that $\gamma(t) = \tilde{\gamma}(t)$, for every $t \in [0, t_0]$.
Consider the set
$$S'' := \{t \in [0, t_0] : \gamma(t) = \tilde{\gamma}(t)\}.$$
For each $t \in S''$, from (134), we have
$$\gamma(t) = \tilde{\gamma}(t) \in \phi(H \cap B(\vec{0}, 0.09\zeta^2/\kappa)) =: N'.$$
Since $N'$ is an open set in $M$, there exists $\varrho > 0$ such that for each $t' \in (t - \varrho, t + \varrho) \cap [0, t_0]$, we have $\gamma(t') \in N'$. From parts (a) and (b), we know that $\phi \circ P_H|_{N'}$ is the identity map on $N'$. Thus, $\tilde{\gamma}(t') = \phi(P_H(\gamma(t'))) = \gamma(t')$, which implies that $t' \in S''$.
The argument from the previous paragraph shows that $S''$ is an open set in $[0, t_0]$. On the other hand, since the function $t \mapsto \gamma(t) - \tilde{\gamma}(t)$ is continuous, $S''$ is also a closed set in $[0, t_0]$. Since $[0, t_0]$ is connected, $S''$ is either $[0, t_0]$ or $\emptyset$. Because $0 \in S''$, we conclude that $S'' = [0, t_0]$, and in particular,
$$q = \gamma(t_0) = \tilde{\gamma}(t_0).$$
Finally, using (134) again, we find
$$q = \tilde{\gamma}(t_0) = \phi(P_H(\gamma(t_0))) \in \phi(B(\vec{0}, 0.09\zeta^2/\kappa) \cap H),$$
as desired.
Corollary A.5. If $p, p' \in M$ satisfy $\|p - p'\| \le 0.01/\kappa$, then $d(T_p M, T_{p'} M) \le 2\kappa \cdot \|p - p'\|$.
Proof. Let $H := T_p M$, and let $P_H : \mathbb{R}^N \to H$ denote the orthogonal projection onto $T_p M$. Let $v := P_H(p')$. Note that $\|v - p\| \le \|p' - p\|$. Proposition A.4(d) gives
$$d_{gd}(p', p) \le \|v - p\| \big(1 + \kappa^2 \|v - p\|^2/2\big) \le \|p' - p\| \big(1 + \kappa^2 \|p' - p\|^2/2\big) \le 2\|p' - p\|,$$
and thus Lemma A.3 yields $d(T_p M, T_{p'} M) \le \kappa \cdot d_{gd}(p, p') \le 2\kappa \cdot \|p - p'\|$.
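As a quick sanity check (not part of the proof), the factor from Proposition A.4(d) is harmless at this scale: with $\kappa \|p' - p\| \le 0.01$, the factor $1 + \kappa^2 \|p' - p\|^2 / 2$ is at most $1.00005$, far below the slack of $2$ used in the last step:

```python
# Worst case of kappa * ||p' - p|| under the hypothesis of Corollary A.5.
r = 0.01
factor = 1 + r * r / 2  # the factor from Proposition A.4(d)
assert factor <= 1.00005 + 1e-12
assert factor <= 2  # the slack used to conclude d_gd(p', p) <= 2 ||p' - p||
```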
Corollary A.6. In the setting of Proposition A.4, if $M' := M \cap B(p, 0.09\zeta^2/\kappa)$ is connected, then the projection $P_H$ is a diffeomorphism from $M'$ to its image.
Proof. Using parts (a), (b) and (e) of Proposition A.4, we find that
$$M \cap B(p, 0.09\zeta^2/\kappa) \overset{(e)}{\subseteq} \phi(B(p, 0.09\zeta^2/\kappa) \cap H) \overset{(b)}{\subseteq} \phi(P_H(B_{gd}(p, \zeta/10\kappa))) \overset{(a)}{=} B_{gd}(p, \zeta/10\kappa),$$
and therefore, by part (a) again, we conclude that $P_H|_{M \cap B(p, 0.09\zeta^2/\kappa)}$ is a diffeomorphism.