
RECONSTRUCTING THE GEOMETRY OF RANDOM GEOMETRIC GRAPHS

HAN HUANG, PAKAWUT JIRADILOK, AND ELCHANAN MOSSEL

Abstract. Random geometric graphs are random graph models defined on metric spaces. Such
a model is defined by first sampling points from a metric space and then connecting each pair of
sampled points with probability that depends on their distance, independently among pairs.
arXiv:2402.09591v1 [cs.LG] 14 Feb 2024

In this work, we show how to efficiently reconstruct the geometry of the underlying space from
the sampled graph under the manifold assumption, i.e., assuming that the underlying space is a
low dimensional manifold and that the connection probability is a strictly decreasing function of
the Euclidean distance between the points in a given embedding of the manifold in RN .
Our work complements a large body of work on manifold learning, where the goal is to recover
a manifold from points sampled in the manifold along with their (approximate) distances.

1. Introduction
The manifold assumption in machine learning is a popular assumption postulating that many
models of data arise from distributions over manifolds, see e.g. [DDC23, ADC19, FIL+ 21, FILN20,
FILN23, ADL23] among many others.
A major problem studied in this area is the inference problem of estimating an unknown manifold
given data sampled from the manifold. Some of the foundational work in this area shows that
in many situations given the (possibly noisy) distances between sampled points it is possible to
estimate the unknown manifold.
We are interested in a more difficult problem which arises in the context of random geometric
graphs. In our basic setup, points are again sampled from a manifold but now the data is a graph. In
the graph each sampled point corresponds to a vertex and an edge is included with probability that
depends in a strictly monotone fashion on the (embedded) distance of the corresponding endpoints,
where the inclusions of different potential edges are independent events.
It is clear that this problem is harder than inferring the geometry from (exact) distances. Indeed,
assuming we know the function determining the edge probability as a function of the distance, and
given the sampled points and exact distances between them we can generate a random graph that
depends on the distances between them and has the correct distribution. Thus being able to infer
the geometry from the graph allows us to determine the geometry from the distances.
The main contribution of the current paper is showing that even given only the graph information,
it is possible to estimate the geometry of the underlying manifold.
We now describe the model and the main results in more detail. Suppose M is a d-dimensional
manifold embedded in an ambient Euclidean space and let µ denote a probability measure defined
on M . We construct a random graph G with n vertices as follows. First, generate n i.i.d. random
points X1 , . . . , Xn on M according to µ. We let the vertex set be [n] = {1, . . . , n}. Each pair {v, w}
of vertices is independently connected with a probability that depends on the Euclidean distance
of Xv and Xw ; the closer the points on the manifold are, the more likely the corresponding vertices
on the graph are connected. The probability of connection is determined by a distance-probability
function p, which is a monotone decreasing function from [0, ∞) to [0, 1]. We set the probability of
connecting {v, w} to p(kXv − Xw k), where k · k is the standard Euclidean norm.

Date: February 16, 2024.


We denote the resulting random geometric graph by G = G(n, M, µ, p). The main question we
address in this paper is:
Question: can we estimate the geometry of the manifold M from the graph G? If so, is there an
efficient algorithm to do so?
Indeed, we answer both questions affirmatively under some relatively mild assumptions on M, µ,
and p. We briefly describe these assumptions informally here. We only consider the case where
M is a smooth compact connected d-dimensional manifold embedded in a Euclidean space. Our
results will depend on a bound on the second fundamental form of M , which is a measure of how
curved the manifold is. They will also depend on a “repulsion property,” informally requiring that
if p, q ∈ M are sufficiently close in the Euclidean distance, then p and q are also close in the geodesic
distance. As for µ, we assume a lower bound of order r d on the probability measure of any ball of
radius r for r close to 0.
Finally, we assume that p is a smooth strictly monotone decreasing function, with some quanti-
tative bound on the derivative of p. For an exact formulation, see Assumptions 2.3 and 2.4.
In our main results, we prove that the geometry can be recovered from the sampled graph. There
are two different notions of geometric recovery stated in Theorems 1.1 and 1.3 below.
Theorem 1.1. There exists c > 0 so that the following holds. Suppose (M, µ, p) satisfies As-
sumptions 2.3 and 2.4. Then there exists a deterministic polynomial-in-n time algorithm that takes
G = G(n, M, µ, p) as input, and outputs a weighted graph Γ on [n] and a metric deuc on [n], such
that, with probability 1 − n−ω(1) ,
• for every p ∈ M , there exists v such that
kXv − pk ≤ dgd (Xv , p) ≤ C(M, µ, p)n−c/d .
• for every pair of vertices v, w,
|dgd (Xv , Xw ) − dΓ (v, w)| ≤ C(M, µ, p)n−c/d ,
where dgd denotes the geodesic distance on M and dΓ denotes the path metric on Γ.
• for every pair of vertices v, w,
|kXv − Xw k − deuc (v, w)| ≤ C(M, µ, p)n−c/d ,
where k · k denotes the Euclidean norm.
Here, C(M, µ, p) ≥ 1 is a constant which depends on M , µ, and p.
What this theorem says informally is that we recover both the intrinsic and the extrinsic distances
of the sampled points on the manifold.
Using the main results of [FIK+ 20, FIL+ 21], it is possible to recover a manifold that is close to
the one that data is sampled from.
Corollary 1.2. Assume the same settings as in Theorem 1.1. There exist an absolute constant
c > 0 and a deterministic polynomial-in-n time algorithm that takes G = G(n, M, µ, p) as input, and
outputs, with probability 1 − n−ω(1) , a smooth Riemannian manifold (M̂ , ĝ) which is diffeomorphic
to (M, g), together with a diffeomorphism
F : M̂ → M
such that
1/L ≤ dgdM (F (x), F (y)) / dgdM̂ (x, y) ≤ L,
where L = 1 + C(M, µ, p) · n−c/d , and where C(M, µ, p) ≥ 1 is a constant which depends on M , µ,
and p.
Proof. This corollary essentially follows from [FIL+ 21, Thm. 1.2 (2)]. To apply this result, we need
to check that the manifold in our setting satisfies the required hypotheses. The bounds on the diameter, the injectivity
radius, and the sectional curvature are given under Assumptions 2.3 and 2.4. Theorem 1.1 implies
that we obtain a finite ε0 -net with ε0 ≪M,µ,p n−c/d in the whole manifold M , and we also obtain
distance vector data (see [FIL+ 21]) with distance noise upper bound ε1 ≪M,µ,p n−c/d between any
two points of the net. Thus, [FIL+ 21] gives an algorithmic reconstruction of the desired smooth
Riemannian manifold together with the desired diffeomorphism. □

Our second main result involves recovering the manifold M together with the measure µ in a
version of Gromov–Hausdorff distance for metric measure spaces. For more background on metric
measure spaces and Gromov–Hausdorff distances, see e.g. [?, Shi16].
Theorem 1.3. Suppose (M, µ, p) satisfies Assumptions 2.3 and 2.4. Then there exists C(M, µ, p) ≥
1 so that the following holds. There exists a deterministic polynomial-in-n time algorithm that takes
G = G(n, M, µ, p) as input, and with probability 1 − n−ω(1) , outputs (Γ̃, ν, deuc ) such that
• Γ̃ is a weighted graph whose vertex set is a subset V ′ of the vertices of G,
• ν is a probability measure on V ′ ,
• deuc defines a metric space on V ′ ,
• there exists a coupling π of µ and ν so that, for two independent copies (X, u), (X ′ , u′ ) ∼ π,
P( |dgd (X, X ′ ) − dΓ̃ (u, u′ )| > C(M, µ, p)n−c/d ) ≤ C(M, µ, p)n−c/d ,
and
P( |kX − X ′ k − deuc (u, u′ )| > C(M, µ, p)n−c/d ) ≤ C(M, µ, p)n−1/4 ,
• for any vertex u of Γ̃ and t ≥ 0,
µ(Bgd (Xu , t − C(M, µ, p)n−c/d )) − C(M, µ, p)n−1/4 ≤ ν(BΓ̃ (u, t)) ≤ µ(Bgd (Xu , t + C(M, µ, p)n−c/d )) + C(M, µ, p)n−1/4 ,
and
µ(Bk·k (Xu , t − C(M, µ, p)n−c/d )) − C(M, µ, p)n−1/4 ≤ ν(Bdeuc (u, t)) ≤ µ(Bk·k (Xu , t + C(M, µ, p)n−c/d )) + C(M, µ, p)n−1/4 .
Here, Bgd (x, t) (resp., Bk·k (x, t), BΓ̃ (x, t), and Bdeuc (x, t)) denotes the open ball of radius t around
x in the geodesic distance on M (resp., the Euclidean distance in the ambient space, the path metric
on Γ̃, and the metric deuc on Γ̃).

The theorem shows that Γ̃ is a good approximation to M as a metric measure space. The
coupling π “matches” the two spaces. Under this matching, distances match: the intrinsic distance
on the original manifold matches the graph distance on Γ̃, and the embedded distance on the original
manifold matches the distance deuc .
The proofs of both theorems essentially use the following result which is the main technical result
of the paper. It shows how to extract a “net of clusters” from the graph.
Theorem 1.4. Suppose (M, µ, p) satisfies Assumptions 2.3 and 2.4. There exist constants C =
C(M, µ, p) ≥ 1, c ∈ (0, 1), and a polynomial-in-n time algorithm buildNet that takes G = G(n, M, µ, p)
as input and, with probability 1 − n−ω(1) , outputs a collection of pairs {(Uα , uα )}α∈[ℓ] with uα ∈
Uα ⊆ [n] such that the output is a Cluster-Net in the following sense:
(1) For each α ∈ [ℓ], |Uα | ≥ n1/2 , and for each v ∈ Uα , kXuα − Xv k ≤ Cn−c/d .

(2) For each p ∈ M , there exists α ∈ [ℓ] such that kp − Xuα k ≤ 1000 dCn−c/d .
1.1. Related Works in the Literature. Our work connects to two large bodies of work: Mani-
fold Learning and Random Geometric Graphs. Due to space limitations, we cannot review many of
the beautiful works in these areas. We will focus on some of the most relevant works. For surveys
on Random Geometric Graphs, see e.g. [Pen03, DDC23].
Perhaps the closest to our work is the work of Araya and De Castro [ADC19]. They consider
random geometric graphs generated from latent points on the sphere Sd−1 with the probability
of having an edge between two latent points given by the value of the probability “link” function
evaluated at the distance between the two points. Similar to our result, their work gives a way
to approximate the distances between the latent points from the random geometric graph. The
setup of [ADC19] crucially relies on the assumption of uniform sampling from the unit sphere. This
allows them to use spectral/harmonic analysis on the sphere to obtain fast algorithms and good
rates of convergence. This also allows them to reconstruct graphs in the sparse regime, which we do
not. See also the follow-up work of Eldan, Mikulincer, and Pieters [EMP22], as well as [STP12] for
related work in a similar setup. Since we consider general manifolds, such techniques cannot
be applied. This may also explain why our algorithms for the general case are less efficient and
have worse error rates.
Works of Fefferman, Ivanov, Kurylev, Lassas, Lu, and Narayanan (some with/without Kurylev
and some with/without Lu) [FIK+ 20, FILN20, FIL+ 21, FILN23] consider the problem of manifold
learning as part of a more general manifold extension problem. The main difference in perspective
is that the interest in this line of work is in finding a bona fide manifold and, moreover, points are
given with (approximate) distances between them. As mentioned earlier, we can leverage this line
of work together with our results to recover manifolds in our setting as well.
There are many other problems that have been studied in random geometric graphs, including
finding the location of points given noisy distances [OMK10, JM13], questions related to testing
geometric vs. non-geometric random graphs, especially in high dimensions, e.g. [LMSY22, ADL23], and
questions related to geometric block models, e.g. [LS23].
1.2. Some Proof Ideas. Here we sketch the thought process behind the construction of the
algorithm buildNet and the proof of Theorem 1.4.
Why clusters? Let us begin by explaining why we want clusters of points. Given two vertices
v, w, the presence or absence of an edge between them is indeed only weakly associated with their
distance. However, if there exists a cluster V surrounding the vertex v in the sense that for every
u in V ,
kXv − Xu k ≤ η,
for a small η, then it becomes possible to infer the distance from v to w by examining the number
of edges connecting w and V . This is because the quantity
|{(u, w) : u ∈ V, {u, w} is an edge }|
is a sum of independent Bernoulli random variables with parameter p(kXw −Xu k) ≈ p(kXw −Xv k),
once we condition on the realization of {Xu }u∈V ∪ {Xv , Xw } in which V is a cluster. Leveraging the
concentration of the sum of independent Bernoulli random variables, we can, with high probability,
estimate p(kXw − Xv k) within an error margin determined by η and the size of V . Consequently,
this allows for the approximation of kXw − Xv k with small error.
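This estimation step can be sketched in a few lines of code. The link function p(d) = e−d, the one-dimensional latent space, and all numerical parameters below are hypothetical choices made only for the illustration:

```python
import numpy as np

# Hypothetical link function p(d) = exp(-d): strictly decreasing, as required,
# but not a choice made anywhere in the paper.
def p(dist):
    return np.exp(-dist)

def p_inverse(prob):
    return -np.log(prob)

def estimate_distance(edge_indicators):
    """Invert the link function on the empirical edge fraction between a vertex
    w and a cluster V whose latent points all lie within eta of X_v."""
    frac = np.mean(edge_indicators)      # concentrates near p(||X_w - X_v||)
    return p_inverse(max(frac, 1e-12))   # clip away from 0 before inverting

rng = np.random.default_rng(0)
eta = 0.01
cluster_pts = rng.uniform(-eta, eta, size=5000)   # cluster around X_v = 0 on a line
x_w = 2.0                                         # true distance to X_v is 2
edges = rng.random(5000) <= p(np.abs(x_w - cluster_pts))
d_hat = estimate_distance(edges)                  # close to 2.0 with high probability
```

The error in d_hat is governed, as in the text, by the cluster radius eta and the cluster size through the concentration of the Bernoulli sum.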
Finding a Cluster using extremal statistics. The question of how to generate a cluster at a given
vertex v is not trivial. It seems intuitively true that, for every other vertex w, the closer Xv and Xw
are, the larger the size of their common neighbors should be. This reasoning suggests that forming a
cluster around v could be as simple as grouping vertices that share many common neighbors with v.
Unfortunately, this intuition does not hold in general. Counterexamples can be constructed based
on the geometry of the embedded manifold M or the properties of the measure µ, demonstrating
scenarios where, despite having Xv = Xv′ and a non-negligible distance between Xv and Xw , the
expected number of common neighbors of v and w can be larger than that of v and v ′ .
Therefore, directly extracting a cluster based on this premise is not feasible.
Nevertheless, the underlying intuition is not entirely wrong. It still applies in an extremal
setting. To illustrate, consider the function
(1) K(x, y) := ∫M p(kx − zk) p(ky − zk) dµ(z).
Conditioning on Xv = x and Xw = y, K(x, y) represents the probability that a third vertex u
is a common neighbor of both v and w. Consequently, (n − 2)K(x, y) is the expected number of
common neighbors of v and w.
It is worth noting that, simply by applying the Cauchy–Schwarz inequality,
(2) K(x, y) ≤ √(K(x, x)K(y, y)) ≤ max{K(x, x), K(y, y)},
which implies that the global maximum of K(x, y) occurs only along the diagonal x = y. Therefore, if
there are sufficiently many vertices, and |N (v) ∩ N (w)| ≃ (n − 2)K(Xv , Xw ) is among the largest
common-neighbor counts over all distinct pairs of vertices, it suggests that (Xv , Xw ) might be near a
global maximum of K, and hence close to each other. Stemming from this observation, we can
derive an algorithm generateCluster that takes an induced subgraph GW of G = G(n, M, µ, p) as input
and attempts to extract a cluster (V, v) with v ∈ V ⊆ W in the sense that
∀w ∈ V, kXv − Xw k ≤ η,
for some small parameter η. The algorithm is simple: it searches for v ∈ W such that there are
many vertices w in W for which |N (v) ∩ N (w) ∩ W |/|W | is within a small gap of
(3) maxw1 ,w2 ∈W K(Xw1 , Xw2 ) ≃ maxw1 ∈W K(Xw1 , Xw1 ).
In order for the algorithm to have a high success rate, besides requiring that the normalized common-neighbor count
|N (v) ∩ N (w) ∩ W |/|W | approximate K(Xv , Xw ) for v, w ∈ W , we also need
(4) ∀w0 ∈ W, {Xw }w∈W ∩ Bk·k (Xw0 , ε) is large enough,
for some parameter ε much smaller than η, the radius of the cluster. This is necessary so that the
comparison (3) holds.
With these assumptions, we have a simple algorithm generateCluster that can extract a cluster
(V, v) from a given batch of vertices W . The only drawback is that, since we rely on the
extremal statistics of K, we have no control over the location of v.
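A schematic version of generateCluster, under simplifying assumptions made only for this illustration: latent points on the unit circle, a hypothetical link function e−d, and a gap parameter set generously by hand rather than by the careful analysis the paper requires.

```python
import numpy as np

def generate_cluster(adj, W, gap, min_count):
    """Schematic version of generateCluster: the normalized common-neighbor
    count |N(v) ∩ N(w) ∩ W| / |W| plays the role of K(X_v, X_w); a vertex v
    with at least min_count near-maximal partners w is returned together with
    those partners as a candidate cluster."""
    A = adj[np.ix_(W, W)].astype(float)
    C = (A @ A) / len(W)              # C[i, j] = |N(W[i]) ∩ N(W[j]) ∩ W| / |W|
    np.fill_diagonal(C, -np.inf)      # only distinct pairs compete for the max
    top = C.max()
    for i, v in enumerate(W):
        partners = [W[j] for j in np.flatnonzero(C[i] >= top - gap) if j != i]
        if len(partners) >= min_count:
            return v, partners
    return None

# Toy run: latent points uniform on the unit circle, link p(d) = exp(-d).
rng = np.random.default_rng(1)
n = 800
theta = rng.uniform(0, 2 * np.pi, n)
pts = np.stack([np.cos(theta), np.sin(theta)], axis=1)
D = np.linalg.norm(pts[:, None] - pts[None, :], axis=2)
upper = np.triu((rng.random((n, n)) <= np.exp(-D)).astype(int), 1)
adj = upper + upper.T                               # symmetric, no self-loops
center, members = generate_cluster(adj, list(range(n)), gap=0.1, min_count=20)
```

Because the empirical counts fluctuate around K, the returned vertices are only typically close to the center; controlling this rigorously is exactly the role of the gap and of condition (4).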
Finding the next Cluster. The main algorithm buildNet operates as a recursive algorithm that
partitions the vertex set into batches. Each iteration extracts a new cluster from a newly intro-
duced batch. Within each iteration, the algorithm generateCluster is employed with minor
adjustments to gain control over the next generated cluster. Consider the scenario where we already
have a collection of clusters {(Vs , vs )}s∈[k] , and a new batch of vertices, W , is presented. Given the
pre-existing clusters, the distance between each vertex w ∈ W and each cluster center vs can be
estimated, with the accuracy and likelihood of this estimation depending on the parameter η and
the size of Vs . Specifically, this concerns how well
 
p−1 ( |N (w) ∩ Vs | / |Vs | ) ≃ kXw − Xvs k,
including the associated probability.
Consider W ′ ⊆ W as identified through distance-estimate constraints relative to {vs }. Intu-
itively, our aim is for each w ∈ W ′ to satisfy ds ≤ kXw − Xvs k ≤ Ds for some s ∈ [k]. The actual
selection criteria for W ′ should be defined so that they are observable from the graph, in the following
way:
ds ≤ p−1 ( |N (w) ∩ Vs | / |Vs | ) ≤ Ds ,
which serves as an approximation of our intended condition. Then, we will apply generateCluster
using the input GW but restricting it to produce a cluster (V, v) within W ′ .
Specifically, the algorithm seeks a vertex v ∈ W ′ for which there are many w ∈ W ′ such that
|N (v) ∩ N (w) ∩ W |/|W | is comparable to the maximum of |N (w1 ) ∩ N (w2 ) ∩ W |/|W | ≃ K(Xw1 , Xw2 )
over distinct w1 , w2 ∈ W ′ .
To ensure that the outcome constitutes a valid cluster, W ′ must satisfy conditions such as
(4), ensuring that (3) holds when restricting w, w1 , w2 to W ′ .
Since W ′ is only an approximation of what we want from the imposed geometric
constraints, it becomes difficult to justify the desired properties of W ′ when we have no control
over the shape of the manifold M and the measure µ. For this reason, we need to
choose both ds and Ds of the order of some parameter δ such that, for every p ∈ M , the manifold
looks like a flat space of dimension d within a ball of radius δ centered at p. Here we crucially rely
on the fact that the underlying metric space of the random geometric graph is locally Euclidean.
Finding the Next Cluster Close to a Given Cluster. Following from this discussion, we can now
outline an intermediate step of the buildNet algorithm. Consider a scenario where we are given a
cluster (U, u) and a vertex u′ with the property that kXu − Xu′ k ≈ δ, indicating that the points
are neither too close nor too far apart, but within a distance that allows the manifold M to appear
flat. Our objective is to identify a cluster (V, v) in the vicinity of u′ .
This is accomplished through the utilization of d + 1 new batches: W1 , W2 , . . . , Wd+1 . Starting
with the first batch, we apply generateCluster with input GU ∪W1 and a carefully selected subset
W1′ ⊆ W1 to derive a cluster (V1 , v1 ) positioned at a distance r from (U, u), where r is comparable
to δ. Subsequent applications of generateCluster with inputs GU ∪V1 ∪W2 , GU ∪V1 ∪V2 ∪W3 , . . . and
corresponding subsets W2′ ⊆ W2 , W3′ ⊆ W3 , . . . yield clusters (V2 , v2 ), (V3 , v3 ), . . . such that each
cluster (Vi , vi ) satisfies
kXvi − Xu k ≈ r and ∀j ∈ [i − 1], kXvi − Xvj k ≈ √2 r.
The distance criteria for selecting these clusters are chosen so that the vectors {Xvi − Xu }i∈[d]
form an approximately orthogonal basis for M locally at Xu . Finally, we can apply
generateCluster with input GU ∪{Vi }i∈[d] ∪Wd+1 and a subset Wd+1′ ⊆ Wd+1 containing vertices w whose
distances from Xw to the points {Xu , Xv1 , . . . , Xvd } closely match those from Xu′ . Given
the local flatness of M within a δ-radius ball centered at Xu , the cluster (V, v) identified in this
manner will be proximal to u′ in the underlying metric space.
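To see why distances to u and to d frame points suffice to locate a vertex, it helps to look at the exactly flat case. The following sketch illustrates only this geometric principle, not the paper's procedure (which must work with noisy, graph-derived distance estimates and an only approximately orthogonal frame):

```python
import numpy as np

def locate(dist_to_u, dists_to_frame, r):
    """In flat R^d, with reference points u = 0 and v_i = r * e_i, the
    coordinates of x are determined by its distance profile:
    ||x - v_i||^2 = ||x||^2 - 2 r x_i + r^2, hence
    x_i = (||x||^2 + r^2 - ||x - v_i||^2) / (2 r)."""
    d2 = dist_to_u ** 2
    return np.array([(d2 + r ** 2 - dv ** 2) / (2 * r) for dv in dists_to_frame])

rng = np.random.default_rng(2)
d, r = 3, 0.5
x = rng.normal(size=d)                    # unknown point to be located
frame = r * np.eye(d)                     # exactly orthogonal frame v_1, ..., v_d
dist_u = np.linalg.norm(x)
dist_frame = np.linalg.norm(frame - x, axis=1)
x_hat = locate(dist_u, dist_frame, r)     # recovers x exactly in flat space
```

On the manifold, the same identity holds only up to curvature and estimation errors, which is why r must stay below the flatness scale δ.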
While the algorithm is relatively straightforward, the bulk of the technical proof focuses on
meticulously justifying, at various stages, that our chosen subset W ′ indeed meets all the prerequisites
for generateCluster to successfully identify a valid cluster. Simultaneously, it is essential to ensure
that the creation of new clusters at each step does not lead to growth of the cluster radius η.

1.3. Open Problems.


(1) Efficiency: To approximate the manifold M within a distance δ, it is necessary and suf-
ficient to construct a net of M with mesh size no greater than δ. A simple volumetric
argument indicates that such a net has size of order δ−d , where d is the dimension of M .
In our setting, while we do not know the underlying distance since only a graph is given,
our result indicates that this is possibly true since we can produce a net of M with mesh
size at most n−c/d , where n is the number of vertices. Still, our algorithm, or perhaps our
analysis of its performance, is suboptimal in two primary aspects:
• Our proof can only guarantee that the algorithm produces the desired result when the number
of vertices n is at least of order rM−d² , where rM is the radius within which M
appears as a flat space of dimension d, centered at any point on M . Ideally, the order
should be rM−d . This discrepancy primarily arises from the difficulty of comparing
maxv,w∈W K(Xv , Xw ) and maxv,w∈W |N (v) ∩ N (w) ∩ W |/|W | for a typical batch W ,
which can be reduced to finding good upper and lower estimates of |K(x, x) − K(x, y)|
when y is not far from x.
• Furthermore, even when the vertex count is sufficiently large, our algorithm yields a
net with a mesh size of at most n−c/d , rather than the desired n−1/d .
(2) Generalization of the Manifold Constraints: The algorithm seems adaptable to scenarios
where the manifold is not connected, featuring several components of potentially varying
dimensions. Detecting the local dimension at a given point is feasible by continuously
identifying clusters, which then serve to construct “orthogonal vectors,” as discussed in the
proof ideas. However, we have not considered the case where the manifold has a boundary.
(3) Sparse Random Geometric Graphs: One can naturally extend the definition of the ran-
dom geometric graph to a sparse setting, namely connecting an edge {v, w} with probability
λ(n)p(kXv − Xw k), where λ(n) is a sparse parameter. It seems promising that our result
can be adapted to this setting when λ(n) = n−c with a sufficiently small c > 0 without
much modification, and produce a net at the cost of a larger mesh size. However, in general,
it is interesting to understand how this question can be answered in a sparse setting.
(4) Random Graphs on Riemannian Manifolds: Consider the generation of a random
graph according to the intrinsic distance on a Riemannian Manifold M , where edges are
connected based on the probability related to the geodesic distance. The adaptation of our
algorithm to this context poses an intriguing question for further exploration.
1.4. Structure of the paper.
(1) In Section 2, we introduce the random graph model and the notations used throughout the
paper.
(2) In Section 3, we introduce the function K(x, y) and discuss its properties.
(3) In Section 4, we introduce the algorithm generateCluster and discuss necessary prerequisites
for the algorithm to successfully identify a valid cluster.
(4) In Section 5, we discuss some regularity properties of the function p 7→ Σu∈V p(kp − Xu k)
for a given cluster (V, v).
(5) Sections 6 and 7 contain the two technical parts of the proof, where we prove that,
under suitable constraints, one can produce a new cluster satisfying prescribed geometric constraints.
(6) In Section 8, we introduce a simple algorithm that either verifies that a given set of clusters
forms a cluster-net or returns a vertex which will be used as a pivot to build the next cluster.
(7) In Section 9, we outline the main algorithm buildNet and prove Theorem 1.4.
(8) In Section 10, we prove Theorem 1.1 and Theorem 1.3 based on Theorem 1.4.
(9) In Section 11, we discuss how the parameters in the algorithm can be chosen.
(10) In Appendix A, we give the necessary background on differential geometry.

2. Random Graph Model


2.1. Notations. In this subsection, we define a few notations so we can use them throughout the
paper. We fix a positive integer N . For p ∈ RN and r ≥ 0, let
B(p, r) := {x ∈ RN : kx − pk < r},
where throughout this paper we use k · k to denote the Euclidean norm in RN . We use SN −1 to
denote the set of all unit vectors in RN , namely, w ∈ RN with kwk = 1.
For any (linear or affine) subspace H ⊆ RN , let PH be the orthogonal projection from RN to H.

Definition 2.1. Let H1 , H2 ⊆ RN be linear subspaces of dimension d ≥ 1. Define
(5) d(H1 , H2 ) := max{ kPH1⊥ vk : v ∈ H2 , kvk = 1 }.

We also extend the definition to affine subspaces as follows. For any affine subspaces A1 , A2 ⊆ RN
of dimension d ≥ 1, take any points a1 ∈ A1 and a2 ∈ A2 , and define
d(A1 , A2 ) := d(A1 − a1 , A2 − a2 ).
Remark 2.2. When we consider only linear subspaces, the function
d : Gr(N, d) × Gr(N, d) → R≥0 ,

where Gr(N, d) denotes the Grassmannian of d-spaces in RN , is a distance function. However, one
needs to be careful when extending the definition of d to affine spaces, where d is no longer a
distance function, as d(A, A′ ) vanishes whenever A and A′ are parallel.
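For orthonormal basis matrices, the distance in Definition 2.1 reduces to one singular value decomposition, since the maximum of kPH1⊥ vk over unit v ∈ H2 is the largest singular value of PH1⊥ applied to a basis of H2 . A short numerical sketch (an illustration, assuming orthonormal basis columns):

```python
import numpy as np

def subspace_distance(B1, B2):
    """d(H1, H2) from Definition 2.1, where the columns of B1 and B2 are
    orthonormal bases of H1 and H2. The maximum of ||P_{H1-perp} v|| over
    unit v in H2 equals the top singular value of (I - B1 B1^T) B2."""
    n = B1.shape[0]
    proj_h1 = B1 @ B1.T                       # orthogonal projection onto H1
    residual = (np.eye(n) - proj_h1) @ B2     # P_{H1-perp} applied to a basis of H2
    return np.linalg.svd(residual, compute_uv=False)[0]

# Two lines in R^2 meeting at angle t are at distance sin(t).
t = 0.3
B1 = np.array([[1.0], [0.0]])
B2 = np.array([[np.cos(t)], [np.sin(t)]])
dist = subspace_distance(B1, B2)              # equals sin(0.3)
```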

2.2. Random Graph Model. This subsection describes the random graph model.

2.2.1. Manifold and probability measure.

Assumption 2.3 (Manifold and probability measure). Let d be a fixed positive integer. Let M be a
d-dimensional compact complete connected smooth manifold embedded in RN with its Riemannian
metric induced from RN . Define the following.
(i) Let
κ := maxp∈M maxu,v∈Tp M, kuk=kvk=1 kII(u, v)p k,
where II denotes the second fundamental form of M (see Definition A.1 in the appendix).
(ii) Let rM,0 be the largest value such that for every point p ∈ M and for every real number
r ∈ (0, rM,0 ), the intersection B(p, r) ∩ M is connected. Here, we assume rM,0 > 0,
and we allow rM,0 = ∞.
(iii) Let
(6) rM := 0.01 · min{1/κ, rM,0 }.
Note that since κ is strictly positive, we have rM < ∞.
Let µ be a probability measure on M and define µmin : [0, ∞) → [0, 1] by
(7) µmin (r) := minp∈M µ(B(p, r) ∩ M ),
for each r ≥ 0. We assume that µmin satisfies
infr∈(0,rM ] µmin (r) · r−d > cµ
for some constant cµ > 0.


2.2.2. Distance-probability function.
Assumption 2.4 (distance-probability function). Let p : [0, ∞) → [0, 1] be a monotone strictly
decreasing continuous function. Let M be an embedded manifold satisfying Assumption 2.3, and let
diameuc (M ) denote the diameter of M with respect to the Euclidean distance.
We assume p and M satisfy the following:
(1) p is Lipschitz continuous with a bounded Lipschitz constant denoted by Lp < ∞.
(2) The largest value ℓp ≥ 0 satisfying
(8) ∀ 0 ≤ a ≤ b ≤ 2diameuc (M ), p(a) − p(b) ≥ ℓp |b − a|,
is strictly positive.
Remark 2.5. Although it might seem intuitive to impose conditions on p independently of the
manifold M , incorporating a dependence on M is, in fact, a necessary condition for recovering the
manifold structure with respect to the Euclidean distance. We will illustrate this necessity with an
example in Section 2.2.4, following the introduction of our graph model.
Conversely, while not explicitly stated, it appears that, with slight adjustments to the proof, our
theorems could also accommodate geodesic distances. This adjustment would involve assuming the
existence of a constant interval [0, c]—independent of M and µ—where ∀a, b ∈ [0, c], p(|b − a|) ≥
ℓp |b − a| for some positive constant ℓp .
2.2.3. Graph Model. Let V be a nonempty finite index set. We construct the random (simple)
graph G = G(V, M, µ, p) in the following way. The graph G has vertex set V. Let {Xi }i∈V be
i.i.d. random points in M generated according to the probability measure µ. Each pair of distinct
vertices i and j are connected by an edge in G with probability p(kXi − Xj k) independently. Here
is the formal definition of the graph.
Definition 2.6. Let M be a manifold embedded in a Euclidean space, let µ be a probability measure
on M , and let p : [0, ∞) → [0, 1] be a distance-probability function. The random geometric graph
G = G(V, M, µ, p) with vertex set V is the random graph with adjacency matrix A = (ai,j )i,j∈V
given by
(9) ai,j = 1{Ui,j ≤ p(kXi − Xj k)} if i ≠ j, and ai,j = 0 if i = j,
where
where
• X = {Xi }i∈V are i.i.d. random points in M generated according to µ;
• S = {Ui,j }i,j∈V are i.i.d. random variables uniformly distributed over the interval [0, 1],
subject to the constraints that Ui,j = Uj,i , and Ui,i = 0.
Further, we write G(n′ , M, µ, p) to denote such a random graph with a given vertex set of size n′
without specifying the vertex set itself.
For any subsets V, U ⊆ V, let GV be the induced subgraph of G with vertex set V . We use N (i)
and NV (i) to denote the neighbors of i in G and the neighbors of i in V , respectively. For example,
NV (i) can be observed from GV ∪{i} . Further, let
(10) XV := {Xi }i∈V and UV,U := {Ui,j }i∈V,j∈U .
For simplicity, we also use UV := UV,V .
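Definition 2.6 translates directly into a sampler. In the sketch below, the manifold is the unit circle in R2 with uniform µ, and the link function e−d is a hypothetical choice; both are assumptions made only for the illustration:

```python
import numpy as np

def sample_rgg(n, sample_point, p, rng):
    """Sample G([n], M, mu, p) following Definition 2.6: draw i.i.d. latent
    points, then connect each pair {i, j} independently with probability
    p(||X_i - X_j||), using symmetric uniforms U_{i,j} = U_{j,i}."""
    X = np.array([sample_point(rng) for _ in range(n)])
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    upper = np.triu(rng.random((n, n)), 1)
    U = upper + upper.T                      # U_{i,j} = U_{j,i}, diagonal is 0
    A = (U <= p(D)).astype(int)
    np.fill_diagonal(A, 0)                   # a_{i,i} = 0 by definition
    return X, A

# M = unit circle in R^2, mu = uniform on M, hypothetical link p(d) = exp(-d).
def circle_point(rng):
    t = rng.uniform(0, 2 * np.pi)
    return np.array([np.cos(t), np.sin(t)])

rng = np.random.default_rng(0)
X, A = sample_rgg(200, circle_point, lambda dist: np.exp(-dist), rng)
```

Only the adjacency matrix A is handed to the algorithms in this paper; the latent points X are returned here purely so the illustration can be inspected.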
Figure 1. Two closed curves, Ma and Mb .

2.2.4. Remark on the choice of ℓp with dependency on M . Let Ma and Mb denote two embedded
manifolds within R2 , characterized by the configurations depicted in Figure 1. It is evident that
an isometry F exists, respecting the intrinsic metric, such that F reflects the protruding segments
of Ma onto the corresponding indented segments of Mb and acts identically elsewhere. A key
observation is the existence of a constant t > 0 ensuring that, for any points p, q ∈ Ma satisfying
kp − qk ≤ t, the equation kF (p) − F (q)k = kp − qk holds.
Consider µa and µb as the uniform measures on Ma and Mb , respectively. Notice that µb is also
the pushforward measure of µa under F . Assume a function p : [0, ∞) → [0, 1] that is constant for
x ≥ t. Now, let us couple the two graphs G([n], Ma , µa , p) and G([n], Mb , µb , p): Let X1 , . . . , Xn
represent independent and identically distributed random points on Ma , based on the distribution
µa , and let Ui,j denote independent and identically distributed uniform random variables on [0, 1] for
i, j ∈ [n], with the stipulation that Ui,j = Uj,i . A graph is then formed with vertex set [n], connecting
vertices i and j if Ui,j ≤ p(kXi − Xj k), yielding a graph distribution equivalent to G([n], Ma , µa , p).
Analogously, using the same vertex set but connecting i and j if Ui,j ≤ p(kF (Xi ) − F (Xj )k) results
in a graph with the distribution of G([n], Mb , µb , p).
Given that p(kF (Xi ) − F (Xj )k) = p(kXi − Xj k) whether kXi − Xj k ≤ t (by virtue of the properties
of Ma and Mb ) or kXi − Xj k > t (as p is constant beyond t), the resultant random graphs are
indistinguishable. Consequently, it is impossible to recover the Euclidean structure from the graphs.

2.3. Parameter Assumptions. In this subsection, we introduce variables (“parameters”) along
with relations—equalities and inequalities—between them (“parameter assumptions”), which we
frequently refer to throughout the paper.
In the proof of this theorem, we will consider a graph with ns vertices, where 1 < s < 2,
instead of the n vertices stated in the theorem. This alteration is made solely to enhance the
clarity of the proof. We partition the set of ns vertices into batches, each containing n vertices.
As a rough picture, the algorithm is employed to construct a cluster-net, progressively processing
each batch to generate a new cluster. Further, this modification does not change the theorem’s
statement, but it does lead to a modification of the constants, by a factor of up to 2. Next, we
introduce some technical parameters in the following subsection.
We fix
ς ∈ (0, 1/4)
and let n be a positive integer. The vertex set V is defined to have a size:
(11) |V| = n · (d + 2) · ⌈nς ⌉.
This formulation allows the algorithm buildNet to divide V into (d + 2) · ⌈nς ⌉ batches, each of
size n. We will explain the reason for this formulation later.
We will introduce a sequence of parameters
ε < η < δ < r,
whose roles are described in Section ??.
The parameters will be defined in the order of ε, η, δ, and r.
Let ε > 0 be a parameter such that
(12) µmin (ε/6) ≥ 2n−ς and ε ≥ n−1/2+ς / (√kKk∞ · Lp ),
where K : M × M → [0, 1] is a function which depends on M , µ, and p, defined in Section 3, and
kKk∞ := maxp,q∈M K(p, q). (As a technical note, the denominator √kKk∞ · Lp is strictly positive,
so we are not dividing by zero in (12). See Remark 3.5.)
Let Cgap := 2¹⁰. Let c1 ∈ (0, 1), C2 ≥ 1, and c3 ∈ (0, 1) be parameters such that
(13) 0 < c1 ≤ (1/800) · ℓp² · µmin (rM /4),
(14) C2 ≥ 4 · kKk∞^(1/4) Cgap^(1/2) d^(1/2) Lp / (ℓp^(1/2) c1^(1/2)),
and
(15) 0 < c3 ≤ ℓp / (Cgap √d).
Let η > 0 be a parameter such that
(16) η ≥ max{ C2 · ε^(1/2) , (Lp² / (c3 ℓp √d)) · ε }.
Let δ and r be the parameters given by
(17) δ := Cgap d · η and r := Cgap d² · δ.
Finally, we assume that the choice of r satisfies
(18) rM ≥ 2⁴ Cgap d² r and n ≥ 100.
We remark that this is always true when n is large enough.
Finally, let us notice that
(19) δ = Cgap d η ≳ ε^(1/2) ≳ n−ς/(2d) ,
where a ≳ b means that there exists C = poly(Lp , 1/ℓp , 1/kKk∞ , d, µmin (rM /4)) such that a ≥ Cb.
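The cascade ε < η < δ < r above can be illustrated numerically. The sketch below is our own toy instantiation: the constants Lp, ℓp, kKk∞, c1, and ς are hypothetical stand-ins (and the µmin condition in (12) is ignored); the only point is that the ordering in (19) comes out of the defining inequalities (12)–(17).

```python
# A toy numerical sketch of the parameter cascade (12)-(17). All concrete
# constants (L_p, ell_p, ||K||_inf, c_1, varsigma) are hypothetical stand-ins.
import math

def parameter_cascade(n, d, L_p=2.0, ell_p=0.5, K_inf=0.25, c1=1e-3, varsigma=0.1):
    C_gap = 2 ** 10
    # (12): eps dominates n^{-1/2+varsigma} / (sqrt(||K||_inf) * L_p)
    # (the mu_min condition in (12) is ignored in this sketch)
    eps = n ** (-0.5 + varsigma) / (math.sqrt(K_inf) * L_p)
    # (14): lower bound for C_2
    C2 = 4 * K_inf ** 0.25 * C_gap ** 0.5 * d ** 0.5 * L_p / (ell_p ** 0.5 * c1 ** 0.5)
    # (15): upper bound for c_3
    c3 = ell_p / (C_gap * math.sqrt(d))
    # (16): eta dominates both C_2 * eps^{1/2} and (L_p^2 / (c_3 ell_p sqrt(d))) * eps
    eta = max(C2 * math.sqrt(eps), L_p ** 2 * eps / (c3 * ell_p * math.sqrt(d)))
    # (17): delta and r
    delta = C_gap * d * eta
    r = C_gap * d ** 2 * delta
    return eps, eta, delta, r

eps, eta, delta, r = parameter_cascade(n=10 ** 6, d=2)
assert eps < eta < delta < r   # the ordering in (19)
```

Since C2 ≥ 1 and Cgap d ≥ 1, the ordering holds for any admissible choice of the toy constants.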

Remark 2.7. In Section 11 below, we discuss practical use of our results in this paper. There we
will be explicit about what the graph observer observes and what the graph observer can compute
to obtain “graph-observer-accessible” versions of parameters. Furthermore, Proposition 11.1 gives
an explicit test, which can be carried out by the graph observer, to check whether the parameters
are feasible.
2.4. Additional Notation and Tools. For each i ∈ V, let
(20) Hi := TXi M
be the tangent plane of M at the point Xi . Let diam(M ) and diamgd (M ) be the diameters of M
with respect to the Euclidean distance and the geodesic distance, respectively.
Let us also recall the standard Hoeffding inequality for sums of bounded independent random
variables.

Lemma 2.8 (Hoeffding's inequality). For any n ≥ 1, let Y1 , Y2 , . . . , Yn be independent random
variables such that Yi ∈ [ai , bi ] for each i ∈ [n]. Then, for any t > 0, we have
P{ | Σi∈[n] Yi − E Σi∈[n] Yi | ≥ t } ≤ 2 exp( −2t² / Σi∈[n] (bi − ai )² ).
If {Yi }i∈[n] are i.i.d. Ber(p) random variables, we have
P{ | Σi∈[n] Yi − E Σi∈[n] Yi | ≥ t } ≤ 2 exp( −2t²/n ).
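The Bernoulli case of Lemma 2.8 can be sanity-checked empirically. The following is a toy sketch (the sample size, success probability, threshold, and trial count are arbitrary choices of ours) comparing the simulated tail frequency against the right-hand side of the inequality.

```python
# A quick numerical sanity check of the Bernoulli case of Hoeffding's inequality.
import math
import random

def hoeffding_rhs(n, t):
    # right-hand side 2 exp(-2 t^2 / n) for i.i.d. Ber(p) variables
    return 2.0 * math.exp(-2.0 * t * t / n)

def empirical_tail(n, p, t, trials=2000, seed=0):
    # fraction of trials in which the sum deviates from its mean by at least t
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        s = sum(rng.random() < p for _ in range(n))
        if abs(s - n * p) >= t:
            hits += 1
    return hits / trials

n, p, t = 500, 0.3, 40
assert empirical_tail(n, p, t) <= hoeffding_rhs(n, t)
```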

3. Common neighbor probability K(x, y)


Let us recall the definition of K introduced in Section
Definition 3.1. For x, y ∈ M , we define the common neighbor probability of x and y as
K(x, y) := EZ∼µ p(kx − Zk)p(ky − Zk).
Remark 3.2. For three different vertices i, j, k ∈ V, conditioned on Xi = xi and Xj = xj , the
probability of k ∈ N (i) ∩ N (j) is K(xi , xj ). This explains the name of K(x, y) above.
The main goal in this section is to derive the following property of K(x, y).
Lemma 3.3. For any x, y ∈ M , we have
K(x, y) ≤ (K(x, x) + K(y, y))/2 − c1 · min{kx − yk², rM ²},
where c1 is the parameter described in (13).
An immediate consequence of the lemma is
(21) K(x, y) ≤ kKk∞ − c1 · min{kx − yk², rM ²},
where kKk∞ = maxx,y∈M K(x, y).
Suppose |V| is sufficiently large so that |N (i) ∩ N (j)| ≃ (|V| − 2) K(Xi , Xj ) for each pair {i, j}
of distinct vertices of V. Then, by comparing |N (i) ∩ N (j)| with maxi′ ,j′ ∈V |N (i′ ) ∩ N (j′ )|, we can
use (21) to derive information about the Euclidean distance kXi − Xj k.
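This mechanism can be observed on a toy example. The sketch below is entirely our own instantiation (the manifold is the unit circle in R², µ is uniform on it, and p(s) = exp(−s) is a strictly decreasing connection function, none of which come from the paper): K is estimated by Monte Carlo over a shared sample of Z's, and farther pairs indeed have smaller common neighbor probability.

```python
# Toy Monte Carlo illustration: K(x, y) decreases as ||x - y|| grows.
import math
import random

def dist(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

def p_conn(s):
    return math.exp(-s)   # toy strictly decreasing connection function

def K_hat(x, y, zs):
    # Monte Carlo estimate of K(x, y) = E_Z p(||x - Z||) p(||y - Z||)
    return sum(p_conn(dist(x, z)) * p_conn(dist(y, z)) for z in zs) / len(zs)

rng = random.Random(1)
zs = [(math.cos(2 * math.pi * rng.random()), math.sin(2 * math.pi * rng.random()))
      for _ in range(10000)]

x = (1.0, 0.0)
near = (math.cos(0.3), math.sin(0.3))   # close to x on the circle
far = (math.cos(2.0), math.sin(2.0))    # far from x on the circle

# With a shared sample of Z's, the estimate never exceeds the average of the
# diagonal values (consistent with Lemma 3.3).
assert K_hat(x, near, zs) <= (K_hat(x, x, zs) + K_hat(near, near, zs)) / 2
# Farther pairs have a smaller common neighbor probability.
assert K_hat(x, far, zs) < K_hat(x, near, zs)
```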


Let us begin with some standard properties about K(x, y).
Remark 3.4. K(x, y) is Lipschitz continuous in each variable with Lipschitz constant ≤ Lp √kKk∞ .
To see this, observe that
(22) |K(x, y) − K(x′ , y)| ≤ ∫M |p(kx − zk) − p(kx′ − zk)| p(ky − zk) dµ(z)
≤ Lp kx − x′ k ∫M p(ky − zk) dµ(z)
≤ Lp kx − x′ k √kKk∞ .
The last inequality follows from Cauchy–Schwarz:
( ∫M p(ky − zk) dµ(z) )² ≤ ∫M p(ky − zk)² dµ(z) = K(y, y) ≤ kKk∞ .

Using the formula (a − b)² = a² − 2ab + b², we have
(23) K(x, y) = (K(x, x) + K(y, y))/2 − (1/2) ∫M (p(kx − zk) − p(ky − zk))² dµ(z).
Since the integral on the right-hand side of the equation above is non-negative, it follows that
K(x, y) ≤ (K(x, x) + K(y, y))/2.
Recalling that K is continuous and M is compact, we find that the maximum of K is attained at
K(x, x) for some (not necessarily unique) point x ∈ M .
Remark 3.5. We remark that Lp √kKk∞ ≥ |p′ (0)| · p(diam(M )) > 0 is strictly positive.
We are now ready to prove Lemma 3.3.
Proof of Lemma 3.3. It suffices to prove that there exists a point z0 ∈ B(x, (3/4) rM ) ∩ M such that
every point z ∈ B(z0 , (1/4) rM ) ∩ M satisfies
kz − yk − kz − xk ≥ (1/20) min{kx − yk, rM }.
Suppose such a point z0 exists. Consider z ∈ B(z0 , (1/4) rM ) ∩ M . From the triangle inequality we have
kz − xk ≤ kz − z0 k + kz0 − xk ≤ rM .
Then, by the monotonicity of p,
p(kz − xk) − p(kz − yk) ≥ min_{s∈[0,rM ]} ( p(s) − p(s + (1/20) min{kx − yk, rM }) ) ≥ (ℓp /20) · min{kx − yk, rM },
where the last inequality follows from the definition of ℓp (see (8)).
Hence,
∫M (p(kx − zk) − p(ky − zk))² dµ(z) ≥ ∫B(z0 ,rM /4)∩M |p(kx − zk) − p(ky − zk)|² dµ(z)
≥ µmin (rM /4) · (ℓp²/400) · min{kx − yk, rM }²
≥ 2c1 · min{kx − yk², rM ²}.
Here, we capture the term µmin (rM /4) · ℓp²/400 and introduce the parameter c1 with the constraint
(13) that
c1 ≤ (1/2) · µmin (rM /4) · ℓp²/400,
to simplify our expression, where the additional factor of 1/2 was included to compensate for cancellation in the derivation later.
With the above bound and Equation (23), the lemma immediately follows.
Now we prove that z0 exists. If kx − yk ≥ rM , then we can simply take z0 = x. In this case, for
every z ∈ B(x, (1/4) rM ) ∩ M , we have
kz − yk − kz − xk ≥ kx − yk − kz − xk − kz − xk ≥ rM /2.
It thus suffices to consider kx − yk < rM . Note that the desired inequality holds trivially when
x = y, so let us assume 0 < kx − yk < rM .
With a shift and a rotation, we may assume x = ~0 and T~0 M = Rd . From Proposition A.4 in the
appendix, we know that there exists a map φ : B(~0, rM ) ∩ Rd → M such that φ ◦ P |B(~0,rM )∩M =
IdB(~0,rM )∩M , where P is the orthogonal projection from RN to Rd . Furthermore, for any v ∈
B(~0, rM ) ∩ Rd , we can express φ(v) = v + φ̃(v), where φ̃(v) ∈ T~0 M ⊥ and kφ̃(v)k ≤ κkvk2 .
Now, we express y = P y + y⊥ , where y⊥ = y − P y = φ̃(P y). Note that since y ≠ ~0, we know by
Corollary A.6 in the appendix that P y ≠ ~0, so it makes sense to consider
u := −P y/kP yk and take z0 := φ((rM /2) u) ∈ M.
First, since kP z0 k = k(rM /2) uk = rM /2 < rM ≤ 0.01/κ (by (6)), we have
kz0 k ≤ √( (rM /2)² + κ² (rM /2)⁴ ) < (3/4) rM ,
and hence B := B(z0 , rM /4) ∩ M ⊆ B(~0, rM ) ∩ M . Therefore,
B = φ ◦ P (B) ⊆ φ( B((rM /2) u, rM /4) ∩ Rd ),
since the projection P is a contraction.
Next, for any z ∈ B, we can decompose z = au + w with w ⊥ u. We will estimate kz − yk − kz − ~0k
from below. First,
(24) kz − yk − kz − ~0k = (kz − yk² − kzk²) / (kz − yk + kzk).
To estimate the numerator, we find by decomposing kz − yk² that
kz − yk² = ⟨z − y, u⟩² + kw − y⊥ k²
= a² + 2akP yk + kP yk² + kwk² − 2⟨w, y⊥ ⟩ + ky⊥ k²
≥ a² + 2akP yk + kwk² − 2kwk · ky⊥ k
(25) = kzk² + 2akP yk − 2kwk · ky⊥ k.
Now we want to bound the second and the third summands of the above equation. First, since
a = ⟨z, u⟩ = ⟨P z, u⟩ and P z ∈ B((rM /2) u, rM /4) ∩ Rd , we have
a ≥ rM /2 − rM /4 ≥ rM /4.
Second,
kwk ≤ kzk ≤ kz − z0 k + kz0 k ≤ (1/4) rM + (3/4) rM ≤ rM .
Last, with kP yk ≤ kyk < rM , we can apply the property of φ̃ to get, using (6),
(26) ky⊥ k = kφ̃(P y)k ≤ κkP yk² ≤ κ rM kP yk ≤ 0.01 kP yk.
With the above three estimates substituted into (25), we have
kz − yk2 ≥ kzk2 + (0.5 − 0.02)rM kP yk.
Together with kyk² = kP yk² + ky⊥ k² ≤ 1.01 kP yk², which follows from (26), the numerator of the
right-hand side of Equation (24) can be bounded below by
kz − yk² − kzk² ≥ ((0.5 − 0.02)/1.01) rM kyk.

With the estimate kz − yk + kzk ≤ 3rM for the denominator in (24), we conclude
kz − yk − kz − ~0k ≥ (1/20) kyk,
and the proof is completed. □

4. The Cluster-Finding Algorithm and Good Pairs


In this section, we describe our main algorithm (Algorithm 1) which we use multiple times
in this paper to find clusters. (See Definition 4.2 for the definition of a cluster.) We introduce
in Definition 4.3 the notion of a “good pair.” The upshot is that if we input a good pair into
Algorithm 1, then under certain assumptions, the algorithm gives a cluster (Lemma 4.4). The
main achievement of this section is Proposition 4.5, in which we obtain the first cluster using the
algorithm.
In the following, we define some events. Later, in Section ??, we will see that they occur with
high probability, but for now we use them as assumptions.

Definition 4.1. For any W ⊆ V with |W | = n, the common neighbor event of W is
(27) Ecn (W ) := { for every pair {i, j} of distinct vertices of W : | |NW (i) ∩ NW (j)|/n − K(Xi , Xj ) | ≤ n−1/2+ς },
and the net event of W is
(28) Enet (W ) := { ∀p ∈ M : |{i ∈ W : Xi ∈ B(p, ε)}| ≥ µmin (2ε/3) · n/2 }.

The term “cluster” which we use throughout this paper is defined as follows.

Definition 4.2. A cluster is a pair (V, i) for i ∈ V ⊆ V which satisfies the following two properties:
(i) for each j ∈ V , kXj − Xi k < η.
(ii) |V | ≥ n1−ς .
Further, we say that (V, i) is a t-cluster for t > 0 if condition (i) is replaced by kXj − Xi k < t for
j ∈V.

The following is the algorithm generateCluster1 which we use to find clusters in the paper.

1See Subsection ?? for how the graph observer can practically use the algorithms in this paper. All of the four
algorithms in this paper are practical.
Algorithm 1: GenerateCluster
Input : V1 ⊆ V, a subset of vertices of size n.
V2 ⊆ V, a subset of at least 2 vertices.
Output: Wgc ⊆ V2 , a subset of vertices.
igc ∈ Wgc , a vertex.

Step 1. Sort the pairs {i, j} of distinct vertices of V2 according to |NV1 (i) ∩ NV1 (j)|, from the
largest to the smallest, and return a list ({i1 , j1 }, {i2 , j2 }, . . . , {iL , jL }), where L is the number
of such pairs.
Step 2. Let m be the largest positive integer such that
|NV1 (im ) ∩ NV1 (jm )|/n ≥ |NV1 (i1 ) ∩ NV1 (j1 )|/n − (1/2) c1 η².
Step 3. Consider a graph Ggc with vertex set Vgc := ∪k∈[m] {ik , jk } and edge set
Egc := {{ik , jk } : k ∈ [m]}.
Step 4. Take a pair (Wgc , igc ) where igc ∈ Vgc maximizes the number of neighbors in Ggc and
Wgc := {j ∈ Vgc : {igc , j} ∈ Egc } ∪ {igc }.
return (Wgc , igc )

Notice that by construction the output (Wgc , igc ) always satisfies igc ∈ Wgc ⊆ V2 .
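The four steps above can be sketched in code. The following is our own minimal rendering (the adjacency-set representation and the toy instance at the end are assumptions for illustration, not part of the paper):

```python
# A minimal sketch of Algorithm 1 (GenerateCluster); adj[v] is the neighbor set of v.
from itertools import combinations

def generate_cluster(adj, V1, V2, n, c1, eta):
    def common(i, j):                         # |N_{V1}(i) ∩ N_{V1}(j)|
        return len(adj[i] & adj[j] & V1)
    # Step 1: sort pairs of V2 by common neighbor count, largest first
    pairs = sorted(combinations(sorted(V2), 2), key=lambda e: -common(*e))
    best = common(*pairs[0])
    # Step 2: keep pairs within (1/2) c1 eta^2 of the best, normalized by n
    kept = [e for e in pairs if common(*e) / n >= best / n - 0.5 * c1 * eta ** 2]
    # Steps 3-4: in the graph of kept pairs, pick a vertex of maximum degree
    deg = {}
    for i, j in kept:
        deg[i] = deg.get(i, 0) + 1
        deg[j] = deg.get(j, 0) + 1
    i_gc = max(deg, key=deg.get)
    W_gc = {i_gc}
    for i, j in kept:
        if i == i_gc:
            W_gc.add(j)
        elif j == i_gc:
            W_gc.add(i)
    return W_gc, i_gc

# Toy instance: vertices 0, 1, 2 share 15 common neighbors in V1; vertex 3 shares none.
V1 = set(range(10, 30))
adj = {0: set(range(10, 25)), 1: set(range(10, 25)),
       2: set(range(10, 25)), 3: set(range(25, 30))}
W_gc, i_gc = generate_cluster(adj, V1, {0, 1, 2, 3}, n=15, c1=0.5, eta=1.0)
assert W_gc == {0, 1, 2} and i_gc in W_gc
```

Since the pairs are sorted in decreasing order, filtering by the threshold is equivalent to taking the prefix of length m in Step 2.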
Definition 4.3. A pair (W1 , W2 ) of subsets of V is said to be a good pair if the following
conditions are satisfied:
• ∅ ≠ W2 ⊆ W1 ,
• |W1 | = n,
• for every i ∈ W2 , there exists p ∈ M such that
(i) kXi − pk < Cgap d (Lp /ℓp ) ε, and
(ii) {j ∈ W1 : kp − Xj k < ε} ⊆ W2 .
Lemma 4.4 (Good pair lemma). Assume that the parameters are feasible. If (W1 , W2 ) is a good
pair and W1 is a sample within the events Ecn (W1 ) and Enet (W1 ), then the output (Wgc , igc ) of
Algorithm 1 with input V1 = W1 and V2 = W2 is a cluster with Wgc ⊆ W2 .
Proof. Before even discussing the output (Wgc , igc ), let us begin by checking that (W1 , W2 ) is an
appropriate input for the algorithm. Namely, we argue why W2 has at least two elements, and
along the way we introduce a set S which we will use in later parts of this proof.
Since (W1 , W2 ) is a good pair, we know that W2 is nonempty. Thus, Lemma 3.3 implies that
there exists i0 ∈ W2 such that
K(Xi0 , Xi0 ) = max_{i,j∈W2} K(Xi , Xj ).
Consider the set
S := { j ∈ W2 : kXj − Xi0 k < 2Cgap d (Lp /ℓp ) ε }.
Since (W1 , W2 ) is a good pair and i0 ∈ W2 , there exists p ∈ M such that kXi0 − pk < Cgap d (Lp /ℓp ) ε,
and
S′ := {j ∈ W1 : kp − Xj k < ε} ⊆ W2 .
Using the triangle inequality together with the above inclusion, we obtain
S ′ ⊆ S ⊆ W2 .
We know from the event Enet (W1 ) that
|S′ | ≥ µmin (2ε/3) · n/2 ≥ µmin (ε/6) · n/2 ≥ 2n−ς · n/2 = n1−ς ,
where the last inequality uses (12).
which implies that
|W2 | ≥ |S| ≥ |S ′ | ≥ n1−ς .
In particular, by the feasibility assumption, we know n ≥ 100, and thus |W2 | ≥ 100^(3/4) > 31. Hence,
we have established that (W1 , W2 ) is an appropriate input for Algorithm 1.
Now we will show that (Wgc , igc ) is a cluster, starting with condition (i) from Definition 4.2. For any
pair {s1 , s2 } of distinct vertices of S, the event Ecn (W1 ) implies that
|NW1 (s1 ) ∩ NW1 (s2 )|/n ≥ K(Xs1 , Xs2 ) − n−1/2+ς .
By the definition of S, both kXs1 − Xi0 k and kXs2 − Xi0 k are bounded by 2Cgap d (Lp /ℓp ) ε. Together
with the fact that the Lipschitz constant of K in each variable is bounded above by Lp √kKk∞ (see
Remark 3.4), we have
K(Xs1 , Xs2 ) ≥ K(Xi0 , Xi0 ) − 2Lp √kKk∞ · 2Cgap d (Lp /ℓp ) ε.
Combining the two inequalities above yields
(29) |NW1 (s1 ) ∩ NW1 (s2 )|/n ≥ K(Xi0 , Xi0 ) − 2Lp √kKk∞ · 2Cgap d (Lp /ℓp ) ε − n−1/2+ς ,
for any pair {s1 , s2 } of distinct vertices of S. Since S has at least n1−ς > 2 elements and because
S ⊆ W2 , we have the following bound:
(30) max_{{i′ ,j′ }⊆W2 } |NW1 (i′ ) ∩ NW1 (j′ )|/n ≥ K(Xi0 , Xi0 ) − 4√kKk∞ Cgap d (Lp²/ℓp ) ε − n−1/2+ς .

Now consider the graph Ggc from Algorithm 1. Observe that a pair {i, j} of distinct vertices of W2
appears as an edge in Ggc if and only if
(31) |NW1 (i) ∩ NW1 (j)|/n ≥ max_{{i′ ,j′ }⊆W2 } |NW1 (i′ ) ∩ NW1 (j′ )|/n − (1/2) c1 η².
Consider any pair {i, j} ⊆ W2 which satisfies the above inequality. We claim kXi − Xj k < η.
Suppose, for the sake of contradiction, that kXi − Xj k ≥ η. Then by Ecn (W1 ) and Lemma 3.3,
we find
(32) |NW1 (i) ∩ NW1 (j)|/n ≤ K(Xi , Xj ) + n−1/2+ς ≤ K(Xi0 , Xi0 ) − c1 η² + n−1/2+ς ,
where we also use that rM ≥ η in the last inequality. This follows from the feasibility assumption:
rM ≥ 2⁴ Cgap d² r > r > δ > η. Combining (30), (31), and (32), we find the inequality
4√kKk∞ Cgap d (Lp²/ℓp ) ε + 2n−1/2+ς ≥ (1/2) c1 η²,
which will be a contradiction if
(33) (1/4) c1 η² ≥ 4√kKk∞ Cgap d (Lp²/ℓp ) ε > 2n−1/2+ς .
This is precisely the reason why we impose the condition η ≥ 4 kKk∞^(1/4) Cgap^(1/2) d^(1/2) Lp /(ℓp^(1/2) c1^(1/2)) · ε^(1/2)
from (16) (also (14)) and ε ≥ n−1/2+ς /(√kKk∞ Lp ) from (12). With these two conditions and
Cgap d Lp /ℓp > 1, (33) holds and hence a contradiction follows.
The argument above shows that every edge {i, j} in Ggc satisfies kXi − Xj k < η. Since for each
i ∈ Wgc − {igc }, the pair {igc , i} is an edge in Ggc by construction, we have established (i).
It remains to establish (ii). Consider any pair {i′ , j′ } of distinct vertices of W2 . Within the event
Ecn (W1 ), we have
|NW1 (i′ ) ∩ NW1 (j′ )|/n ≤ K(Xi′ , Xj′ ) + n−1/2+ς ≤ K(Xi0 , Xi0 ) + n−1/2+ς .
This shows that
K(Xi0 , Xi0 ) ≥ max_{{i′ ,j′ }⊆W2 } |NW1 (i′ ) ∩ NW1 (j′ )|/n − n−1/2+ς .

Using the above estimate with (29), we find that every pair {s1 , s2 } of distinct vertices of S satisfies
|NW1 (s1 ) ∩ NW1 (s2 )|/n ≥ max_{{i′ ,j′ }⊆W2 } |NW1 (i′ ) ∩ NW1 (j′ )|/n − 4√kKk∞ Cgap d (Lp²/ℓp ) ε − 2n−1/2+ς
> max_{{i′ ,j′ }⊆W2 } |NW1 (i′ ) ∩ NW1 (j′ )|/n − (1/2) c1 η²,
where the last inequality follows from the parameter assumptions. This implies that every pair
{s1 , s2 } ⊆ S is an edge in Ggc . Therefore, the maximum degree in Ggc is at least |S| − 1 ≥ n1−ς − 1,
and thus |Wgc | ≥ n1−ς . We have completed the proof. □
As a consequence of the lemma above, we can use Algorithm 1 to find the first cluster.
Proposition 4.5. Assume that the parameters are feasible. For any subset W ⊆ V with |W | = n,
within the events Enet (W ) and Ecn (W ), Algorithm 1 with input V1 = W and V2 = W returns a
cluster (Wgc , igc ) with igc ∈ Wgc ⊆ W .
Proof. It is straightforward to check that (W, W ) is a good pair. This proposition therefore follows
immediately by applying Lemma 4.4. 

5. Regularity of pα
In the previous section, we described how we can use Algorithm 1 to generate a cluster. When
we have a collection of clusters (Vα , iα ), indexed by α, we can define a corresponding collection of
functions pα : M → R, where pα is an approximation of the function q ↦ p(kq − Xiα k) obtained by
averaging p(kq − Xi k) for i ∈ Vα . Intuitively, due to the concentration of measure, for any given
i ∈ V, we expect
p(kXi − Xiα k) ≃ pα (Xi ) := Σj∈Vα p(kXi − Xj k)/|Vα | ≃ |N (i) ∩ Vα |/|Vα |,
which allows us to estimate kXi − Xiα k. The goal of this section is to derive a regularity property of
pα , stated in Lemma 5.3.
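The displayed chain of approximations can be checked numerically on a toy instance. The sketch below uses our own assumptions throughout (unit circle, p(s) = exp(−s), a "cluster" of points within distance 0.2 of a center); it compares the empirical neighbor fraction into the cluster against the averaged connection probability pα(Xi).

```python
# Toy check: |N(i) ∩ V_alpha| / |V_alpha| concentrates around p_alpha(X_i).
import math
import random

rng = random.Random(7)
p_conn = lambda s: math.exp(-s)                      # toy connection function
dist = lambda a, b: math.hypot(a[0] - b[0], a[1] - b[1])

# sample points on the unit circle
pts = [(math.cos(t), math.sin(t))
       for t in (2 * math.pi * rng.random() for _ in range(3000))]

# a toy "cluster" V_alpha: indices of points within 0.2 of pts[0]
V_alpha = [j for j in range(1, len(pts)) if dist(pts[0], pts[j]) < 0.2]

# a vertex far from the cluster center
i = max(range(len(pts)), key=lambda j: dist(pts[0], pts[j]))

# empirical fraction of the cluster adjacent to i (one independent coin per pair)
frac = sum(rng.random() < p_conn(dist(pts[i], pts[j])) for j in V_alpha) / len(V_alpha)
# p_alpha(X_i): the averaged connection probability into the cluster
p_alpha = sum(p_conn(dist(pts[i], pts[j])) for j in V_alpha) / len(V_alpha)
assert abs(frac - p_alpha) < 0.15
```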
Throughout this section, let us fix some nonempty finite index set A. We shall begin by defining
what we call the cluster event, within which we have a collection {(Vα , iα )}α∈A of clusters with the
desired property.
Definition 5.1. For a collection of pairs {(Vα , iα )}α∈A with iα ∈ Vα ⊆ V, the cluster event of
the collection {(Vα , iα )}α∈A is defined as
Eclu ({(Vα , iα )}α∈A ) := { for each α ∈ A, the pair (Vα , iα ) is a cluster }.
The discussion in this section is within the event
Eclu ({(Vα , iα )}α∈A ).
We also assume that the parameters are feasible throughout this section.
Definition 5.2. For each α ∈ A, define the function pα : M → R by
pα (q) := Σj∈Vα p(kq − Xj k)/|Vα |.

The lemma below describes a useful regularity property of the function pα , which we are going
to use in Sections 6 and 7.
Lemma 5.3 (Regularity of pα ). Fix an index α ∈ A and assume the event Eclu ((Vα , iα )). Suppose
q ∈ M satisfies 0.9r ≤ kq − Xiα k ≤ 2r. Suppose that σ ∈ {−1, +1}, 16η/r ≤ τ ≤ 1, and
u ∈ SN−1 = {x ∈ RN : kxk = 1} satisfy
⟨u, σ (q − Xiα )/kq − Xiα k⟩ = τ,
where ⟨·, ·⟩ denotes the usual Euclidean inner product in RN . Then we have the following items.
(a) For every i ∈ Vα ,
τ/2 ≤ ⟨u, σ (q − Xi )/kq − Xi k⟩ ≤ (3/2) τ.
(b) If σ = +1, then for any 0 ≤ t < 0.1τ r,
pα (q) − 2Lp τ t ≤ pα (q + tu) ≤ pα (q) − (1/4) ℓp τ t.
(c) If σ = −1, then for any 0 ≤ t < 0.1τ r,
pα (q) + (1/4) ℓp τ t ≤ pα (q + tu) ≤ pα (q) + 2Lp τ t.
Proof. (a) Take an arbitrary i ∈ Vα . First, since kXi − Xiα k < η and ⟨u, σ(q − Xiα )⟩ = τ kq − Xiα k,
we have
(34) | ⟨u, σ(q − Xi )⟩ − τ kq − Xiα k | = | ⟨u, σ(Xiα − Xi )⟩ + ⟨u, σ(q − Xiα )⟩ − τ kq − Xiα k | = | ⟨u, σ(Xiα − Xi )⟩ | < η.
Together with | kq − Xiα k − kq − Xi k | ≤ kXi − Xiα k < η (the triangle inequality), we find
| ⟨u, σ(q − Xi )/kq − Xi k⟩ − τ | ≤ | ⟨u, σ(q − Xi )⟩ − τ kq − Xiα k | / kq − Xi k + τ · | kq − Xiα k − kq − Xi k | / kq − Xi k
≤ 2η/kq − Xi k ≤ 2η/(kq − Xiα k − η) ≤ 4η/r,
where the second inequality uses (34) and τ ≤ 1, and the last inequality follows from the coarse
estimate η ≤ r/Cgap ≤ 0.1r (by (17)) and the assumption kq − Xiα k ≥ 0.9r. With the assumption
that τ ≥ 16η/r, part (a) of the lemma follows.
(b) Now we are ready to prove the second statement. To bound
pα (q + tu) = Σi∈Vα p(kq + tu − Xi k)/|Vα |,
we will need to estimate kq + tu − Xi k for i ∈ Vα . From now on, let us fix i ∈ Vα .
For t ≥ 0, using part (a) (with σ = +1), we can show
kq + tu − Xi k ≥ ⟨q + tu − Xi , (q − Xi )/kq − Xi k⟩ = kq − Xi k + t ⟨u, (q − Xi )/kq − Xi k⟩ ≥ kq − Xi k + (τ/2) t.
Imposing the condition that t < 0.1τ r, we can also derive an upper bound for kq + tu − Xi k by
a similar approach:
kq + tu − Xi k ≤ √( ⟨q + tu − Xi , (q − Xi )/kq − Xi k⟩² + t² )
≤ √( (kq − Xi k + (3τ/2) t)² + t² )
= (kq − Xi k + (3τ/2) t) · √( 1 + t²/(kq − Xi k + (3τ/2) t)² )
≤ (kq − Xi k + (3τ/2) t) · ( 1 + t²/(kq − Xi k + (3τ/2) t)² )
(35) ≤ kq − Xi k + (3τ/2) t + t²/kq − Xi k.
Moreover, by the assumption kq − Xiα k ≥ 0.9r,
t/kq − Xi k ≤ 0.1τ r/(kq − Xiα k − kXiα − Xi k) ≤ 0.1τ r/(0.9r − η) ≤ 0.1τ r/(r/2) = 0.2τ.
Substituting the above estimate into (35), we get kq + tu − Xi k ≤ kq − Xi k + 1.7τ t. To summarize,
we have obtained
(36) kq − Xi k + 0.5τ t ≤ kq + tu − Xi k ≤ kq − Xi k + 1.7τ t.
Since the above expression is valid for every i ∈ Vα , we have
pα (q + tu) = Σi∈Vα p(kq + tu − Xi k)/|Vα |
≤ Σi∈Vα p(kq − Xi k + 0.5τ t)/|Vα |       [by (36)]
≤ Σi∈Vα p(kq − Xi k)/|Vα | − 0.5 ℓp τ t
= pα (q) − 0.5 ℓp τ t,
where we applied (8) in the last inequality; this is valid since, by the upper bound kq − Xiα k ≤ 2r,
the parameter assumptions, and the feasibility assumption, we have
kq − Xi k + 0.5τ t ≤ kq − Xiα k + η + (0.5τ )(0.1τ r) ≤ 2r + η + r ≤ rM .
For the lower bound of pα (q + tu), we use the Lipschitz constant of p:
pα (q + tu) = Σi∈Vα p(kq + tu − Xi k)/|Vα |
≥ Σi∈Vα p(kq − Xi k + 1.7τ t)/|Vα |       [by (36)]
≥ Σi∈Vα p(kq − Xi k)/|Vα | − 1.7 Lp τ t
= pα (q) − 1.7 Lp τ t.
As a consequence, we have obtained a slightly stronger estimate than the one stated in the lemma.
(c) In the case σ = −1, we can estimate in the same way to get
kq − Xi k − (1/4) τ t ≥ kq + tu − Xi k ≥ kq − Xi k − (3/2) τ t,
for any i ∈ Vα . The statement for this case follows by applying the same estimate for pα (q + tu). □

6. Forming almost orthogonal clusters


Now that under appropriate assumptions we have managed to find one cluster, let us find more.
Suppose we start with one cluster (V0 , i0 ). In this section, we would like to find d more clusters,
(V1 , i1 ), . . . , (Vd , id ) so that the list Xi0 , Xi1 , . . . , Xid is “almost orthogonal” in the sense that the d
vectors {Xiα − Xi0 : α ∈ [d]} are geometrically close to an orthogonal frame of vectors in Rd and
the size of each vector in the collection is close to r (see Definition 6.1).
We are going to find these clusters inductively. The base case of the induction is finding the cluster
(V0 , i0 ), which we have done. The inductive case is the main content of this section. Suppose
that for some 0 ≤ k ≤ d − 1, we have found clusters (V0 , i0 ), . . . , (Vk , ik ) such that Xi0 , . . . , Xik are
almost orthogonal. With a fresh batch W ⊆ V of n vertices—fresh meaning we have not used any
vertex in it before in previous analysis—we find a new cluster (Vk+1 , ik+1 ) such that Xi0 , . . . , Xik+1
is almost orthogonal.
Our plan to find the new cluster is as follows. We define a subset W ′ ⊆ W with the property
that for any i ∈ W ′ , the point Xi would make the list Xi0 , Xi1 , . . . , Xik , Xi an almost orthogonal
list (see Proposition 6.6). Then we show that (W, W ′ ) is a good pair, and hence we can find a
desired cluster inside W ′ , using the good pair lemma (Lemma 4.4).
Definition 6.1. For a non-negative integer k and i0 , . . . , ik ∈ V, the almost-orthogonal event
of the finite sequence i0 , . . . , ik , denoted by Eao (i0 , . . . , ik ), consists of
• for each index α ∈ [k], we have | kXiα − Xi0 k − r | ≤ δ, and
• for distinct indices α, β ∈ [k], we have | kXiα − Xiβ k − √2 r | ≤ δ.
We also extend the definition to Eao (i0 ), which is the trivial event.
Definition 6.2. Suppose that A is an index set. For a collection {Vα }α∈A of nonempty subsets of
V and for a subset W ⊆ V, the navigation event of ({Vα }α∈A , W ) is
(37) Enavi ({Vα }α∈A , W ) := { ∀i ∈ W, ∀α ∈ A : | |N (i) ∩ Vα |/|Vα | − Σj∈Vα p(kXi − Xj k)/|Vα | | ≤ n−1/2+ς }.
Throughout this section, we fix k ∈ [0, d − 1]. Consider a collection of pairs (Vα , iα ) with
iα ∈ Vα ⊆ V for α ∈ [0, k], and a new batch W of vertices,
W ⊆ V \ ∪α∈[0,k] Vα ,
such that |W | = n.
The discussion in this section is within the event
Eortho = Eortho ({(Vα , iα )}α∈[0,k] , W )
(38) := Eao (i0 , . . . , ik ) ∩ Eclu ({(Vα , iα )}α∈[0,k] ) ∩ Enavi ({Vα }α∈[0,k] , W ) ∩ Ecn (W ) ∩ Enet (W ).
Consider the subset W′ ⊆ W defined as
(39) W′ := { i ∈ W : ∀α ∈ [k], p(√2 r + 0.95δ) ≤ |N (i) ∩ Vα |/|Vα | ≤ p(√2 r − 0.95δ),
and p(r + 0.95δ) ≤ |N (i) ∩ V0 |/|V0 | ≤ p(r − 0.95δ) }.
Recall that N (i) denotes the set of neighbors of i in V. Thus, N (i) ∩ Vα is the same set as
NVα (i).
Note that, given a collection of vertex sets {Vα }α∈[0,k] and a vertex set W , we can determine
the vertex set W′ by examining the structure of the graph G. The primary objective of this section
is to demonstrate that, conditioned on the event Eortho , it is possible to identify a specific cluster
(Wgc , igc ) within W′ . Furthermore, the index igc , in conjunction with the sequence of indices
(i0 , . . . , ik ), will be shown to satisfy the event Eao (i0 , . . . , ik , igc ). Here is the main objective of this
section:
Proposition 6.3. For k ∈ [0, d − 1], let (V0 , i0 ), . . . , (Vk , ik ) be k + 1 pairs with iα ∈ Vα ⊂ V, and
let W ⊂ V be a subset of size |W | = n which is disjoint from ∪α∈[0,k] Vα . Let W′ be the set defined
in (39), and suppose that the event Eortho ({(Vα , iα )}, W ) described in (38) occurs. If we run
Algorithm 1 with input V1 = W and V2 = W′ , then the output (Wgc , igc ) is a cluster and (i0 , . . . , ik , igc )
satisfies the event Eao (i0 , . . . , ik , igc ). In other words,
Eortho ({(Vα , iα )}, W ) ⊆ Eclu ((Wgc , igc )) ∩ Eao (i0 , . . . , ik , igc ).
In this section, we will first prove Proposition 6.3 and then prove the corollary. Before we proceed,
let us introduce some additional notation to be used in this section. Without loss of generality,
let us assume
Xi0 = ~0 ∈ RN
and recall from (20) that
(40) Hi0 = TXi0 M.
Let
P : R N → H i0
be the orthogonal projection to Hi0 . For each point q ∈ M , we write
q = q⊤ + q⊥,
where q⊤ = P q and q⊥ = q − q⊤ . We apply Proposition A.4 with p = ~0, H = Hi0 , and ζ = 1.
Together with rM ≤ 0.01/κ from its definition (6), there exists a local inverse of P ,
φ : B(~0, rM ) ∩ Hi0 → M,
such that
(41) P ◦ φ = Id on B(~0, rM ) ∩ Hi0 ,
(42) B(~0, rM ) ∩ M ⊆ φ(B(~0, rM ) ∩ Hi0 ),


(43) ∀x ∈ B(~0, rM ) ∩ Hi0 , kφ(x) − xk ≤ κkxk2 , and
(44) ∀x ∈ B(~0, rM ) ∩ Hi0 , dgd (φ(x), q0 ) ≤ kxk(1 + κ2 kxk2 /2),
where in (44), dgd (x, y) is the geodesic distance between two points x, y ∈ M . We refer the reader
to the appendix for details.
6.1. W′ is nonempty. Since our objective is to extract a cluster from W′ by applying Algorithm
1 to (W, W′ ), in this subsection we will first show W′ ≠ ∅, which is one requirement of a good pair
(see Definition 4.3).
Lemma 6.4 (Existence of a navigation point in M ). Assume the event Eao (i0 , i1 , . . . , ik ). There
exists p ∈ M such that
∀α ∈ [k], | kp − Xiα k − √2 r | ≤ 0.9δ and | kp − Xi0 k − r | ≤ 0.9δ.
Proof. Let p′ ∈ Hi0 be a point with kp′ k = kp′ − Xi0 k = r which is orthogonal to Xi⊤α = P Xiα for
every α ∈ [k]. Such a point p′ exists because k < d = dim(Hi0 ). Since p′ ∈ B(~0, rM ) ∩ Hi0 is in the
domain of φ, we can set p = φ(p′ ) ∈ M . Notice that p⊤ = P p = P φ(p′ ) = p′ .
Fix α ∈ [k]. Our first step is to compare kXiα − pk and kXiα⊤ − p⊤ k. We have
kXiα⊤ − p⊤ k ≤ kXiα − pk ≤ kXiα⊤ − p⊤ k + kp − p⊤ k + kXiα − Xiα⊤ k,
where the first inequality relies on the fact that P is a contraction, and the second inequality follows
from the triangle inequality. Let us first simplify the right-hand side of the above expression. For
the second summand, using (43), we have
(45) kp − p⊤ k ≤ κkp⊤ k² = κr² ≤ (0.01/rM ) r² ≤ 0.01 · r/(2⁴ Cgap d²) ≤ 2⁻⁴ δ,
where we relied on rM ≤ 0.01/κ (by (6)), the feasibility assumption (18), and the relation
r = Cgap d² δ from (17). The third summand can be bounded similarly. Within the event
Eao (i0 , i1 , . . . , ik ),
Xiα ∈ B(~0, rM ) ∩ M ⊆ φ(B(~0, rM ) ∩ Hi0 ),
and together with (41) we have φ ◦ P (Xiα ) = Xiα . Repeating the same derivation as in (45),
(46) kXiα⊥ k = kφ(Xiα⊤ ) − Xiα⊤ k ≤ κkXiα⊤ k² ≤ κkXiα k² ≤ κ(2r)² ≤ 4 · 0.01 · r/(2⁴ Cgap d²) ≤ 2⁻⁴ δ.
Together we conclude that
(47) kXiα⊤ − p⊤ k ≤ kXiα − pk ≤ kXiα⊤ − p⊤ k + 2⁻³ δ.
The second step is to estimate kXiα⊤ − p⊤ k. Let us begin with the upper estimate. Within the event
Eao (i0 , i1 , . . . , ik ), we have
kXiα⊤ k ≤ kXiα k ≤ r + δ,
and by taking ⟨Xiα⊤ , p⊤ ⟩ = 0 into account, we obtain
kXiα⊤ − p⊤ k = √( kXiα⊤ k² + kp⊤ k² ) ≤ √( (r + δ)² + r² ) ≤ √2 r + (1/√2) δ + (1/4) δ²/r.
Recalling that r = Cgap d² δ from (17), we have
(48) kXiα⊤ − p⊤ k ≤ √2 r + (1/√2)(1 + 2⁻⁶ ) δ.
To derive a lower estimate of kXiα⊤ − p⊤ k, we recycle the estimate of kXiα⊥ k from (46) to get
kXiα⊤ k ≥ kXiα k − kXiα⊥ k ≥ r − δ − 2⁻⁴ δ,
and hence
(49) kXiα⊤ − p⊤ k ≥ √( (r − (1 + 2⁻⁴ )δ)² + r² ) ≥ √2 r − (1 + 2⁻⁴ ) δ/√2 ≥ √2 r − 0.9δ.
By combining (47), (48), and (49), we obtain
√2 r − 0.9δ ≤ kXiα − pk ≤ √2 r + 0.9δ.
It remains to derive the statement for α = 0, which follows immediately from our previous
estimate of kp − p⊤ k in (45):
| kXi0 − pk − r | = | kXi0 − pk − kXi0 − p⊤ k | ≤ kp − p⊤ k ≤ 2⁻⁴ δ.
The proof is completed. □
Lemma 6.5. Assume the events Eao (i0 , . . . , ik ), Eclu ({(Vα , iα )}α∈[0,k] ), Enavi ({Vα }α∈[0,k] , W ), and
Enet (W ). Then |W′ | ≥ n1−ς .
Proof. With the occurrence of event Eao (i0 , . . . , ik ), we can pick a point p ∈ M described in Lemma
6.4. Within the event Enet (W ), the set
W ′′ := {i ∈ W : kXi − pk < ε}
satisfies |W ′′ | ≥ n1−ς . Thus it suffices to show that W ′′ ⊆ W ′ .
Take an arbitrary i ∈ W′′ . For any α ∈ [k], the event Enavi ({Vα }α∈[0,k] , W ) guarantees that
(50) |N (i) ∩ Vα |/|Vα | ≥ pα (Xi ) − n−1/2+ς .
For each j ∈ Vα , within the event Eclu ({(Vα , iα )}α∈[0,k] ) we have
kXi − Xj k ≤ kXi − pk + kp − Xiα k + kXiα − Xj k ≤ ε + (√2 r + 0.9δ) + η < √2 r + 0.91δ.
Therefore, p(kXi − Xj k) > p(√2 r + 0.91δ). Using this in (50) yields
(51) |N (i) ∩ Vα |/|Vα | ≥ pα (Xi ) − n−1/2+ς > p(√2 r + 0.91δ) − n−1/2+ς .
By the feasibility assumption (18), we have
0 < √2 r + 0.91δ < √2 r + 0.95δ < rM ,
and therefore,
p(√2 r + 0.91δ) − p(√2 r + 0.95δ) ≥ ℓp · 0.04δ > n−1/2+ς .
Using the above inequality in (51) yields
|N (i) ∩ Vα |/|Vα | > p(√2 r + 0.95δ).
In a similar fashion, we can show
|N (i) ∩ Vα |/|Vα | < p(√2 r − 0.95δ),
|N (i) ∩ V0 |/|V0 | > p(r + 0.95δ), and
|N (i) ∩ V0 |/|V0 | < p(r − 0.95δ).
This shows that i ∈ W′ . Since i was arbitrarily chosen, we conclude that W′′ ⊆ W′ , and hence
|W′ | ≥ n1−ς . □
6.2. Existence of a point p′ . In this subsection, we show that for any i ∈ W′ , there exists p′ ∈ M
such that kp′ − Xi k is small and {j ∈ W : Xj ∈ B(p′ , ε)} ⊆ W′ . This is the key condition in the
definition of a good pair (see Definition 4.3).
To demonstrate the proof idea, let us make the simplifying (false) assumption that the following
quantities are equal, with no gaps between them:
|N (i) ∩ Vα |/|Vα | = pα (Xi ) = p(kXi − Xiα k)
for every i ∈ W′ and α ∈ [0, k]. Consider i ∈ W′ . In this case, kXi − Xiα k falls within the interval
(√2 r − 0.95δ, √2 r + 0.95δ) for α ∈ [k], and similarly for α = 0 but with √2 r replaced by r.
Solving a linear equation allows us to identify a small vector u, with magnitude proportional
to ε (also depending on d), within span({Xi − Xiα }α∈[0,k] ) such that kXi + u − Xiα k is contained in
a narrower interval
(√2 r − 0.95δ + Cε, √2 r + 0.95δ − Cε)
for each α ∈ [k] (and similarly for α = 0 with √2 r replaced by r), where C ≥ 1 is a constant
introduced to absorb some error terms. Now, for every j ∈ W with Xj ∈ B(Xi + u, ε), the triangle
inequality implies that kXj − Xiα k lies within
(√2 r − 0.95δ + Cε − ε, √2 r + 0.95δ − Cε + ε)
for α ∈ [k] (and the same for α = 0 with √2 r replaced by r). This in turn implies that j ∈ W′ ,
which is our desired property for p′ . However, there is a complication: the point Xi + u does not
necessarily lie in M . To address this, we will find u∗ , perpendicular to Xi − Xiα for α ∈ [0, k],
ensuring Xi + u + u∗ ∈ M . If the span of {Xi − Xiα }α∈[0,k] aligns closely with TXi M , then u∗ can be
chosen with magnitude proportional to that of u. Further, since u∗ is orthogonal to {Xi − Xiα }α∈[0,k]
and Xi + u − Xiα has a significantly larger magnitude than u∗ , one can show that
kXi + u + u∗ − Xiα k − kXi + u − Xiα k
is of order ku∗ k² = O(ε²), which has negligible impact on the argument above. Therefore, one can
take p′ = Xi + u + u∗ .
The actual argument, while based on the simple idea above, must account for the gaps between
|N (i) ∩ Vα |/|Vα |, pα (Xi ), and p(kXi − Xiα k). Notably, the gap between pα (Xi ) and p(kXi − Xiα k) can be
proportional to η, which is significantly larger than kuk. If we naively extended the above argument
and attempted to treat the gap between |N (i) ∩ Vα |/|Vα | and p(kXi − Xiα k) merely as an error term,
then we could only construct such a u with kuk = O(η), a size proportional to the radius
of the current clusters. This approach would lead to a geometrically increasing radius for the clusters
subsequently generated, which is not desirable. Hence, to attain the required level of precision,
we will work directly with pα , relying on the regularity property established in Lemma 5.3.
Let us begin with a distance estimate kXi − Xiα k for i ∈ W ′ .
Proposition 6.6. Assume that the event Eclu ({(Vα , iα )}α∈A ) occurs. For any i ∈ W′ , the point Xi ∈ M
satisfies
(52) ∀α ∈ [k], √2 r − δ ≤ kXi − Xiα k ≤ √2 r + δ and r − δ ≤ kXi − Xi0 k ≤ r + δ.
Proof. Given the occurrence of the event Eclu ({(Vα , iα )}α∈A ) and the definition (39) of W′ , it follows
that for α ∈ [k],
(53) p(√2 r + 0.95δ) − n−1/2+ς ≤ pα (Xi ) ≤ p(√2 r − 0.95δ) + n−1/2+ς
and
(54) p(r + 0.95δ) − n−1/2+ς ≤ p0 (Xi ) ≤ p(r − 0.95δ) + n−1/2+ς .
Notice that we can bound kXi − Xiα k from the assumption on Xi . Within the event Eclu ({(Vα , iα )}α∈A ),
for α ∈ [k] we have, using (53),
p(kXi − Xiα k + η) ≤ pα (Xi ) ≤ p(√2r − 0.95δ) + n−1/2+ς .
Next, from the assumptions on the parameters, we have
δ ≥ Cgap d · η ≥ (Cgap² Lp²/(ℓp² kKk∞ )) ε ≥ (Cgap² Lp /(ℓp² kKk∞ )) n−1/2+ς ≥ (100/ℓp ) n−1/2+ς ,
where the first three inequalities are (17), (16) together with (15), and (12), respectively, and the last
inequality relies on the facts that Lp /ℓp ≥ 1 by definition, kKk∞ ≤ 1, and Cgap ≥ 100.
Now, relying on the definition (8) of ℓp ,
p(√2r − 0.95δ) + n−1/2+ς ≤ p(√2r − 0.96δ) − ℓp · 0.01δ + n−1/2+ς ≤ p(√2r − 0.96δ),
and hence kXi − Xiα k ≥ √2r − 0.96δ − η ≥ √2r − δ.
With the same approach, from (53) and (54), we can show, for each α ∈ [k],
√2r − δ ≤ kXi − Xiα k ≤ √2r + δ,
and r − δ ≤ kXi − Xi0 k ≤ r + δ. 
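The proof above converts two-sided bounds on the empirical frequencies into two-sided bounds on distances by inverting the strictly decreasing connection function p. That inversion step can be sketched by bisection; the exponential p used here is only an illustrative choice, not the paper's connection function:

```python
import math

def invert_decreasing(p, y, lo=0.0, hi=10.0):
    """Find t with p(t) = y for a strictly decreasing p on [lo, hi], by bisection."""
    for _ in range(200):
        mid = (lo + hi) / 2
        if p(mid) > y:
            lo = mid          # p(mid) too large, so the true t lies to the right
        else:
            hi = mid
    return (lo + hi) / 2

# illustrative strictly decreasing connection function
p = lambda t: math.exp(-t)

# if the observed frequency sits between p(2r + 0.95*delta) and
# p(2r - 0.95*delta), the inverted distance lands in the matching window
r, delta = 1.0, 0.1
freq = p(2 * r)                      # idealized observed frequency
t = invert_decreasing(p, freq)
assert 2 * r - 0.95 * delta <= t <= 2 * r + 0.95 * delta
assert abs(t - 2 * r) < 1e-6
```

In the paper the frequencies carry an additive n−1/2+ς error, which is why the recovered distance is only pinned down to a window of width proportional to δ rather than exactly.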
6.2.1. Small angles between planes. Here we will derive that span({Xi − Xiα }α∈[0,k] ) aligns closely
with the tangent plane TXi M .
Lemma 6.7. Consider iW′ ∈ W ′ . We define an N by (k + 1) matrix
Y = (Y0 , . . . , Yk )
with column vectors {Yα }α∈[0,k] defined as
Yα := XiW′ − Xiα
for each α ∈ [0, k]. Furthermore, we define two linear subspaces
HY := span({Yα }α∈[0,k] ) ⊆ RN and HiW′ := TXiW′ M.

Then, within the event Eortho , we have
d(H̃Y , HiW′ ) ≤ 1/Cgap ,
where H̃Y is any d-dimensional subspace of RN satisfying
HY ⊆ H̃Y ⊆ span(HY , Hi0 ∩ HY⊥ ).
The lemma will be obtained by showing that both spaces form only a small angle with Hi0 .
Lemma 6.8. Consider iW′ ∈ W ′ and adopt the notation described in Lemma 6.7. Recall that P
is the orthogonal projection to Hi0 and u⊥ = u − P u for every u ∈ RN . Within the event Eortho ,
we have
max u∈HY : kuk=1 ku⊥ k ≤ 1/(4Cgap ).
Notice that
(55) max u∈HY : kuk=1 ku⊥ k = max v∈R[0,k] : kvk=1 kPHi0⊥ (Y v)k / kY vk ≤ kPHi0⊥ Y kop / smin (Y ),
where PHi0⊥ is the orthogonal projection to Hi0⊥ , Y is the N by (k + 1) matrix with columns
(Y0 , Y1 , . . . , Yk ), and smin (Y ) is the least singular value of Y . With k + 1 ≤ d ≤ N , we have
smin (Y ) = √(smin (Y T Y )),
and (Y T Y )α,β = hYα , Yβ i for α, β ∈ [0, k].
Lemma 6.9. Consider iW′ ∈ W ′ and adopt the notation described in Lemma 6.7. Within the
event Eortho , for α, β ∈ [0, k],
|hYα , Yβ i − r² − r² 1(α = β ≠ 0)| ≤ 15rδ,
where 1(α = β ≠ 0) is the indicator function which equals 1 when α = β = s for some s ∈ [k].
Proof. By definition,
(56) hYα , Yβ i = hXiW′ , XiW′ i − hXiα , XiW′ i − hXiβ , XiW′ i + hXiα , Xiβ i.
Now we will estimate the four terms on the right-hand side of the above expression. We start
with the expression hXiα , Xiβ i. For distinct α, β ∈ [k], using our assumptions |kXiα k − r| ≤ δ and
|kXiα − Xiβ k − √2r| ≤ δ from event Eao (i0 , . . . , ik ), we find
|hXiα , Xiβ i| = ½ |kXiα k² + kXiβ k² − kXiα − Xiβ k²|
≤ ½ (|kXiα k² − r²| + |kXiβ k² − r²| + |kXiα − Xiβ k² − 2r²|)
≤ ½ (2rδ + δ² + 2rδ + δ² + 2√2rδ + δ²)
(57) < 4rδ,
where the last inequality relies on the coarse bound δ² ≤ 0.1rδ, which follows from the parameter
assumption (17). Similarly, when α = β ∈ [k], we have
|hXiα , Xiβ i − r²| = |kXiα k² − r²| ≤ 2rδ + δ² < 3rδ.
And if either α = 0 or β = 0, we simply have hXiα , Xiβ i = 0 since Xi0 = ~0.
Now consider the second and the third terms on the right-hand side of (56). Within the event
Eclu ({(Vα , iα )}α∈A ), we can apply (52) from Proposition 6.6 with i = iW′ to show that for α ∈ [k],
|hXiα , XiW′ i| = ½ |kXiα k² + kXiW′ k² − kXiW′ − Xiα k²|
≤ ½ (|kXiα k² − r²| + |kXiW′ − Xi0 k² − r²| + |kXiW′ − Xiα k² − 2r²|) < 4rδ,
where the last inequality is carried out in the same way as that for (57), and we used kXiW′ k = kXiW′ − Xi0 k
since Xi0 = ~0. The above estimate also holds when α = 0.
For the last term on the right-hand side of (56), it also follows from (52) that |hXiW′ , XiW′ i − r²| <
3rδ. Now substituting all these estimates into (56), the lemma follows.

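The Gram identity in Lemma 6.9 can be sanity-checked in the idealized, exactly orthogonal configuration (Xi0 = ~0, Xiα = r·eα for α ∈ [k], and XiW′ at distance r from Xi0 and √2r from each Xiα), where the 15rδ error term vanishes. The coordinates below are an idealized stand-in for the sampled points:

```python
k, N, r = 3, 6, 2.0

def e(i):
    v = [0.0] * N
    v[i] = 1.0
    return v

def sub(a, b): return [x - y for x, y in zip(a, b)]
def dot(a, b): return sum(x * y for x, y in zip(a, b))

X = {0: [0.0] * N}                     # X_{i_0} at the origin
for a in range(1, k + 1):
    X[a] = [r * c for c in e(a - 1)]   # X_{i_alpha} = r * e_alpha
XiW = [-r * c for c in e(k)]           # orthogonal to every e_alpha, norm r

# Y_alpha = X_{i_W'} - X_{i_alpha}
Y = {a: sub(XiW, X[a]) for a in range(k + 1)}

for a in range(k + 1):
    for b in range(k + 1):
        expected = r**2 + (r**2 if a == b != 0 else 0.0)
        assert abs(dot(Y[a], Y[b]) - expected) < 1e-12
```

Here kXiW − Xiα k = √2r and kXiW k = r hold exactly, so every Gram entry equals r² + r²·1(α = β ≠ 0); in the paper the sampled points satisfy these relations only up to δ, producing the 15rδ error.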
Relying on the estimates of hYα , Yβ i, one can give a lower bound on smin (Y ⊤ Y ).
Lemma 6.10. Suppose B = (bα,β )α,β∈[0,k] is a (k + 1) by (k + 1) matrix such that for α, β ∈ [0, k],
|bα,β − r² − r² 1(α = β ≠ 0)| ≤ 15rδ.
We have
smin (B) ≥ r²/(4(k + 1)).
Proof. We can express
B = r² QT Q + B̃,
where Q = (qij )i,j∈[0,k] ∈ R(k+1)×(k+1) is the upper-triangular matrix such that for i, j ∈ [0, k],
qij = 1 if i = j or i = 0, and qij = 0 otherwise,
and B̃ ∈ R(k+1)×(k+1) is a matrix whose entries are uniformly bounded above by 15rδ in absolute
value.
Notice that Q⁻¹ = (q̃ij )i,j∈[0,k] is simply the matrix such that for i, j ∈ [0, k],
q̃ij = 1 if i = j, q̃ij = −1 if i = 0 and j ≥ 1, and q̃ij = 0 otherwise.
For example, when k = 3,
Q =
1 1 1 1
0 1 0 0
0 0 1 0
0 0 0 1
and Q⁻¹ =
1 −1 −1 −1
0 1 0 0
0 0 1 0
0 0 0 1.
Thus, we have the following upper bound on the operator norm2 of Q⁻¹ :
kQ⁻¹ kop ≤ √(Σ i,j∈[0,k] q̃ij ²) = √(2k + 1) < √(2(k + 1)).

Therefore,
smin (r² QT Q) = r² · smin (Q)² ≥ r²/kQ⁻¹ kop² ≥ r²/(2(k + 1)).
Since we have the following bound on the operator norm of B̃:
kB̃kop ≤ √(Σ i,j∈[0,k] B̃ij ²) ≤ (k + 1) · 15rδ ≤ r²/(4(k + 1)),
where the last inequality follows from the parameter assumption (17), we conclude that
smin (B) ≥ r²/(2(k + 1)) − r²/(4(k + 1)) = r²/(4(k + 1)).

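The algebra in the proof above, including the exact value of kQ⁻¹kop noted in the footnote, can be verified numerically for a small k; the power iteration below is a stand-in for a linear-algebra library and computes the operator norm of Q⁻¹:

```python
import math

k = 3
n = k + 1

# Q: identity plus an all-ones first row; Q^{-1}: identity with -1's in the first row
Q    = [[1.0 if (i == j or i == 0) else 0.0 for j in range(n)] for i in range(n)]
Qinv = [[1.0 if i == j else (-1.0 if i == 0 and j >= 1 else 0.0) for j in range(n)]
        for i in range(n)]

def matmul(A, B):
    return [[sum(A[i][t] * B[t][j] for t in range(n)) for j in range(n)] for i in range(n)]

# Q * Q^{-1} = I
I = matmul(Q, Qinv)
assert all(abs(I[i][j] - (1.0 if i == j else 0.0)) < 1e-12
           for i in range(n) for j in range(n))

# operator norm of Q^{-1} via power iteration on M = (Q^{-1})^T Q^{-1}
M = matmul([[Qinv[j][i] for j in range(n)] for i in range(n)], Qinv)
v = [1.0] * n
for _ in range(500):
    w = [sum(M[i][j] * v[j] for j in range(n)) for i in range(n)]
    s = math.sqrt(sum(x * x for x in w))
    v = [x / s for x in w]
op_norm = math.sqrt(s)

closed_form = math.sqrt((k + 2 + math.sqrt(k * k + 4 * k)) / 2)  # footnote's exact value
frob = math.sqrt(2 * k + 1)                                      # Frobenius bound used above
assert abs(op_norm - closed_form) < 1e-9
assert op_norm <= frob <= math.sqrt(2 * (k + 1))
```

For k = 3 this gives kQ⁻¹kop ≈ 2.19 against the Frobenius bound √7 ≈ 2.65, consistent with the footnote's remark that the bound is within a factor of 2 of the exact value.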
Remark 6.11. As an immediate consequence of Lemmas 6.9 and 6.10, within the event Eortho we
have
smin (Y ) = √(smin (Y T Y )) ≥ r/(2√(k + 1)).
Now we are ready to prove Lemma 6.8.
Proof of Lemma 6.8. By (55), we have
(58) max u∈HY : kuk=1 ku⊥ k ≤ kPHi0⊥ Y kop / smin (Y ).
Let us first derive an upper bound for kPHi0⊥ Y kop .
Using kXiW′ k = kXiW′ − Xi0 k ≤ r + δ ≤ 2r < rM from (52) and the feasibility assumptions (18),
we find that XiW′ is contained in the image of φ by (42). With the properties of φ (see (41) and
(43)) we obtain
kPHi0⊥ XiW′ k = kXiW′⊥ k ≤ κkXiW′⊤ k² ≤ κkXiW′ k² ≤ 2⁻⁴ δ,
2Because the matrix Q is particularly nice, we can actually compute this operator norm explicitly. A more careful
analysis yields
kQ⁻¹ kop = √((k + 2 + √(k² + 4k))/2).
This shows that our upper bound is at most a multiplicative factor of 2 away from the actual value, which is good
enough for our purpose. Thus, instead of showing tedious calculation details, we decided to simply use this bound.
where we once again use rM ≤ 0.01/κ from (18). Furthermore, the same estimate also holds for
Xiα , since kXiα k = kXiα − Xi0 k ≤ 2r. Hence, we conclude that
(59) kYα⊥ k = k(XiW′ − Xiα )⊥ k ≤ kXiW′⊥ k + kXiα⊥ k ≤ 2⁻³ δ
for any α ∈ [0, k].
Therefore, PHi0⊥ Y = (PHi0⊥ Y0 , . . . , PHi0⊥ Yk ) is a matrix with columns whose Euclidean norms are bounded by 2⁻³ δ, and hence
kPHi0⊥ Y kop ≤ √(Σ α∈[0,k] kPHi0⊥ Yα k²) ≤ 2⁻³ δ √(k + 1).

Substituting this bound and smin (Y ) ≥ r/(2√(k + 1)) from Remark 6.11 into (58), we conclude
that
max u∈HY : kuk=1 ku⊥ k ≤ δ(k + 1)/(4r) ≤ δd/(4r) ≤ 1/(4Cgap ),
where the last inequality follows from the parameter assumption (17). 

Proof of Lemma 6.7. Recall from (52) that kXiW′ − Xi0 k ≤ r + δ ≤ rM ≤ 0.01/κ. Hence, Corollary A.5 gives
(60) d(Hi0 , HiW′ ) ≤ 2κ · kXi0 − XiW′ k ≤ 2κ · 2r ≤ 0.04/(Cgap² d) ≤ 0.04/Cgap .

Let H̃Y be a d-dimensional subspace containing HY that is contained in the span of HY and
Hi0 ∩ HY⊥ . By doing this, and using Lemma 6.8, we have
(61) d(H̃Y , Hi0 ) ≤ 1/(4Cgap ).
Using the triangle inequality, (61), and (60), we obtain
d(H̃Y , HiW′ ) ≤ d(H̃Y , Hi0 ) + d(Hi0 , HiW′ ) ≤ 0.25/Cgap + 0.04/Cgap < 1/Cgap . 


6.2.2. Existence of p. The goal of this subsection is to establish the following proposition:
Proposition 6.12. With the occurrence of the event Eortho , for any iW′ ∈ W ′ , there exists a point
p ∈ M satisfying kXiW′ − pk ≤ 2⁸ d (Lp /ℓp ) ε so that the following holds:
{j ∈ W : Xj ∈ B(p, ε)} ⊆ W ′ .
Before we proceed, let us set up additional notation. Here we fix iW′ ∈ W ′ . For α ∈ [k], let
(62) σα := sign(pα (XiW′ ) − p(√2r)) and σ0 := sign(p0 (XiW′ ) − p(r)).
Here, the function sign : R → {−1, +1} is defined as
(63) sign(x) := +1 if x ≥ 0, and sign(x) := −1 if x < 0,
for each x ∈ R. In particular, by our convention, sign outputs either −1 or +1, but never 0.
The key ingredient is the following lemma.
Lemma 6.13. With the occurrence of Eortho , for any iW′ ∈ W ′ , there exists a point p ∈ M with
kp − XiW′ k ≤ 2⁸ d (Lp /ℓp ) ε such that
∀α ∈ [k], p(√2r + 0.95δ) + 2Lp ε ≤ pα (p) ≤ p(√2r − 0.95δ) − 2Lp ε,
and
p(r + 0.95δ) + 2Lp ε ≤ p0 (p) ≤ p(r − 0.95δ) − 2Lp ε.
Given the above lemma, let us derive the proof of Proposition 6.12.
Proof of Proposition 6.12. Suppose that j ∈ W satisfies kXj − pk < ε. Then by using the Lipschitz
constant Lp and the triangle inequality, we obtain, for each α ∈ [0, k],
|pα (Xj ) − pα (p)| ≤ (1/|Vα |) Σ i∈Vα |p(kXj − Xi k) − p(kp − Xi k)|
≤ (1/|Vα |) Σ i∈Vα Lp · |kXj − Xi k − kp − Xi k|
(64) ≤ (1/|Vα |) Σ i∈Vα Lp · kXj − pk < Lp ε.

Next, Enavi and the parameter assumptions give, for each α ∈ [0, k],
(65) | |N (j) ∩ Vα |/|Vα | − pα (Xj ) | ≤ n−1/2+ς < Lp ε.
Combining (64), (65), and the properties of p from Lemma 6.13, we find that for each α ∈ [k],
p(√2r + 0.95δ) ≤ |N (j) ∩ Vα |/|Vα | ≤ p(√2r − 0.95δ),
and
p(r + 0.95δ) ≤ |N (j) ∩ V0 |/|V0 | ≤ p(r − 0.95δ),
whence j ∈ W ′ . 
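The averaging step (64) uses only the Lipschitz property of p and the triangle inequality; the same computation in code, with an illustrative 1-Lipschitz connection function standing in for the paper's p:

```python
import math

def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def p_alpha(q, cluster, p):
    """Empirical connection level of a point q against a cluster of points,
    i.e. the average of p(||q - X_i||) over the cluster."""
    return sum(p(dist(q, x)) for x in cluster) / len(cluster)

p = lambda t: math.exp(-t)      # illustrative; |p'| <= 1, so L_p = 1 works
L_p = 1.0

cluster = [(float(i), float(j)) for i in range(3) for j in range(3)]
q1, q2 = (0.3, 0.4), (0.32, 0.41)

# |p_alpha(q1) - p_alpha(q2)| <= L_p * ||q1 - q2||, exactly as in (64)
assert abs(p_alpha(q1, cluster, p) - p_alpha(q2, cluster, p)) <= L_p * dist(q1, q2)
```

This is the mechanism that lets the proof transfer the frequency conditions defining W ′ from p to any point within distance ε of p.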
Proof of Lemma 6.13. Let H̃Y and HiW′ be the subspaces introduced in Lemma 6.7. With the occurrence of the event
Eortho , we can apply the lemma to get
d(H̃Y , HiW′ ) ≤ 1/Cgap .
Observe that XiW′ = XiW′ − Xi0 = Y0 ∈ HY ⊆ H̃Y . By applying Proposition A.4 (with XiW′ , H̃Y ,
PH̃Y , and 1 − 1/Cgap playing the roles of p, H, PH , and ζ in the proposition, respectively), we find
that there exists a local inverse map
φY : B(XiW′ , 0.09(1 − 1/Cgap )²/κ) ∩ H̃Y → M
such that PH̃Y (φY (x)) = x for every point x ∈ B(XiW′ , 0.09(1 − 1/Cgap )²/κ) ∩ H̃Y . Since
0.09(1 − 1/Cgap )²/κ ≥ 0.01/κ ≥ rM ,
where the last inequality is (6), we may restrict the domain of φY to B(XiW′ , rM ) ∩ H̃Y . Furthermore, Proposition A.4(c) gives
(66) kφY (x) − XiW′ k ≤ 2kx − XiW′ k
for any x ∈ B(XiW′ , rM ) ∩ H̃Y .
Let ϖ ∈ HY be the vector satisfying
(67) ∀α ∈ [0, k], hσα Yα /kYα k, ϖi = 12 (Lp /ℓp ) ε.
Equivalently, ϖ ∈ HY is the vector so that (Y T ϖ)α = σα · 12 (Lp /ℓp ) ε kYα k for every α ∈ [0, k].
The existence (and uniqueness) of ϖ ∈ HY = Im(Y ) follows from Remark 6.11, which shows that the matrix
Y has full rank. By Remark 6.11 and the coarse bound kYα k ≤ 2r for α ∈ [0, k] (see (52)),
(68) kϖk ≤ (r/(2√(k + 1)))⁻¹ kY T ϖk ≤ (2√(k + 1)/r) · √(k + 1) · 12 (Lp /ℓp ) ε · 2r ≤ 48d (Lp /ℓp ) ε.
We take p ∈ M to be the following point:
p := φY (XiW′ + ϖ),
and claim that p satisfies the desired property.
To begin with, by (66) and (68), p is not too far from XiW′ :
kp − XiW′ k ≤ 2kϖk ≤ 96d (Lp /ℓp ) ε < 2⁸ d (Lp /ℓp ) ε.
Next, note that from (67), since 12 (Lp /ℓp ) ε > 0, we know that ϖ ≠ ~0. Because φY is a diffeomorphism,
we then have that p ≠ XiW′ . Hence, it makes sense to consider u := (p − XiW′ )/kp − XiW′ k.
Fix α ∈ [0, k]. Define
τ := hu, σα Yα /kYα ki.
Observe that
(69) τ = h(p − (XiW′ + ϖ) + ϖ)/kp − XiW′ k, σα Yα /kYα ki = hϖ/kp − XiW′ k, σα Yα /kYα ki = (1/kp − XiW′ k) · 12 (Lp /ℓp ) ε,
where the middle equality follows from the fact that Yα ∈ H̃Y and p − (XiW′ + ϖ) ⊥ H̃Y , since
PH̃Y (p) = PH̃Y (φY (XiW′ + ϖ)) = XiW′ + ϖ,
and the last equality is (67).


Now, we would like to estimate pα (p) for α ∈ [0, k] by applying Lemma 5.3 with XiW′ , σα , τ ,
u, and kp − XiW′ k playing the roles of q, σ, τ , u, and t in the lemma, respectively. We shall
show that τ and t satisfy the assumptions stated in the lemma, namely 16η/r ≤ τ ≤ 1 and
0 ≤ kp − XiW′ k < 0.1τ r.
We start with the lower bound for τ . With kp − XiW′ k ≤ 2kϖk ≤ 96d (Lp /ℓp ) ε and η = r/(Cgap² d^5/2 ) ≤ r/(128d)
from (17), we have
τ = (1/kp − XiW′ k) · 12 (Lp /ℓp ) ε ≥ 1/(8d) ≥ 16η/r.
The upper bound for τ follows from:
τ = hu, σα Yα /kYα ki ≤ kuk · kσα Yα /kYα kk = 1.
The lower bound for kp − XiW′ k is trivial, and the upper bound follows from:
kp − XiW′ k/(0.1τ r) = kp − XiW′ k²/(0.1 · 12(Lp /ℓp )εr) ≤ (96d(Lp /ℓp )ε)²/(0.1 · 12(Lp /ℓp )εr) = 7680d² · (Lp /ℓp ) · (ε/r) < 1.
We can now apply Lemma 5.3. With τ t = τ · kp − XiW′ k = 12 (Lp /ℓp ) ε, we get
if σα = +1, then pα (XiW′ ) − 3Lp ε ≥ pα (p) ≥ pα (XiW′ ) − 24 (Lp²/ℓp ) ε, and
if σα = −1, then pα (XiW′ ) + 3Lp ε ≤ pα (p) ≤ pα (XiW′ ) + 24 (Lp²/ℓp ) ε.
Suppose α ∈ [k] and σα = +1. From the assumption iW′ ∈ W ′ and the definition of σα (see (62)), we
have
p(√2r) ≤ pα (XiW′ ) ≤ p(√2r − 0.95δ) + n−1/2+ς .
With the assumption that Lp ε ≥ n−1/2+ς from (12),
pα (p) ≤ pα (XiW′ ) − 3Lp ε ≤ p(√2r − 0.95δ) − 2Lp ε.
For the lower bound of pα (p), notice that
p(√2r) − p(√2r + 0.95δ) ≥ ℓp · 0.95δ ≥ 100 (Lp²/ℓp ) ε.
We have
pα (p) ≥ pα (XiW′ ) − 24 (Lp²/ℓp ) ε ≥ p(√2r) − 24 (Lp²/ℓp ) ε ≥ p(√2r + 0.95δ) + 2Lp ε.
Thus, the statement of the lemma follows when α ∈ [k] and σα = +1. For the rest of the cases
(when α = 0 or σα = −1), the derivations are the same. We omit the details here. 

6.3. Finding a cluster in W ′ . It remains to prove the main statement of this section.
Proof of Proposition 6.3. Let us check that (W, W ′ ) is a good pair. Lemma 6.5 shows that W ′ ≠ ∅.
It is clear by construction that W ′ ⊆ W and |W | = n. Now for any i ∈ W ′ , by Lemma 6.13 and
Proposition 6.12, there exists p ∈ M with
kXi − pk ≤ 2⁸ d (Lp /ℓp ) ε < Cgap d (Lp /ℓp ) ε,
such that
{j ∈ W : kXj − pk < ε} ⊆ W ′ .
This implies that (W, W ′ ) is a good pair. Therefore, by Lemma 4.4, (Wgc , igc ) is a cluster.
Further, since igc ∈ Wgc ⊆ W ′ , for α ∈ [k] we have
p(√2r + 0.95δ) ≤ |N (igc ) ∩ Vα |/|Vα | ≤ p(√2r − 0.95δ).
Within the event Enavi ({Vα }α∈[0,k] , W ) ⊇ Eortho ,
| |N (igc ) ∩ Vα |/|Vα | − pα (Xigc ) | ≤ n−1/2+ς .
Within the event Eclu ({(Vα , iα )}α∈[0,k] ),
|p(kXiα − Xigc k) − pα (Xigc )| ≤ Lp η.
Combining these two bounds, together with (17) and (16), we obtain
√2r − δ ≤ kXiα − Xigc k ≤ √2r + δ.

We can derive the same bound for α = 0 with √2r replaced by r. Finally, together with the
occurrence of Eao (i0 , . . . , ik ), we conclude that the list (i0 , . . . , ik , igc ) satisfies Eao (i0 , . . . , ik , igc ). 
7. Using navigation clusters to capture the next cluster
In this section, we consider the following scenario. We have d + 1 clusters (Vα , iα ) which form an
"almost orthogonal" configuration as described in the previous section. Further, there is a vertex inx ∈ V
for which it is known that the underlying point Xinx is approximately 0.5δ away from Xi0 . Now a new
batch of vertices W ⊆ V of size n is given. The objective in this section is to identify a cluster in
close proximity to Xinx , relying on the established "almost orthogonal" clusters represented by
{(Vα , iα )}α∈[0,d] . Let us first formulate the assumptions described here as events.
Definition 7.1. For points i, j ∈ V, we define the distance event of i and j as
Edist (i, j) := {0.4δ ≤ kXi − Xj k ≤ 0.6δ}.
The discussion in this section is always within the event
(70) Enext := Enext ({(Vα , iα )}α∈[0,d] , inx , W )
= Eclu ({(Vα , iα )}α∈[0,d] ) ∩ Eao (i0 , i1 , . . . , id ) ∩ Edist (inx , i0 ) ∩ Enavi ({Vα }α∈[0,d] , W ) ∩ Enavi ({Vα }α∈[d] , {inx }) ∩ Enet (W ),
and the goal of this section is to show the following:
Proposition 7.2. Consider d + 1 pairs {(Vα , iα )}α∈[0,d] , where each iα is an element of Vα , and
each Vα is a subset of V. Let inx ∈ V, and let W ⊂ V be a subset such that |W | = n. Further,
∪α∈[0,d] Vα , {inx }, and W are pairwise disjoint. Let
W ♯ := { i ∈ W : ∀α ∈ [d], | |N (i) ∩ Vα |/|Vα | − |N (inx ) ∩ Vα |/|Vα | | ≤ c3 δ
and |N (i) ∩ V0 |/|V0 | ≥ p(2δ) }.
Given the occurrence of the event Enext ({(Vα , iα )}α∈[0,d] , inx , W ), when Algorithm 1 is applied
with inputs V1 = W and V2 = W ♯ , its output (Wgc , igc ) is a cluster and kXigc − Xinx k ≤ 0.1δ.
Throughout this section, we fix a triple {(Vα , iα )}α∈[0,d] , inx , and W as described in Proposition 7.2.
The proof of Proposition 7.2 essentially follows the same idea as the one shown in the previous section.
We need to show:
(1) For every i ∈ W ♯ , kXi − Xinx k ≤ 0.1δ.
(2) (W, W ♯ ) is a good pair.
To achieve these two objectives, we begin with the derivation of the closeness of certain
subspaces, similar to the step presented in Section 6.2.1.
7.1. Small angles between planes. The goal here is to establish the following:
Lemma 7.3. Suppose i ∈ V satisfies kXi − Xinx k ≤ δ. Let
Z = (Z1 , . . . , Zd )
be the N × d matrix with columns Zα := Xi − Xiα for α ∈ [d], and let HZ := span(Z1 , . . . , Zd ). Given
the occurrence of the event Enext , we have
(a) smin (Z) ≥ r/2,
(b) d(HZ , Hi0 ) ≤ 2⁻³/Cgap , and
(c) d(HZ , Hi ) ≤ 2⁻²/Cgap ,
where Hi0 and Hi are the notations for TXi0 M and TXi M , respectively (see (20)).
Proof. The proof is essentially analogous to the derivation of Lemma 6.7, relying on Lemmas 6.8,
6.9, and 6.10. Given its relative simplicity, we only provide a sketch of the proof here.
(a) First, smin (Z T Z) = (smin (Z))² since N ≥ d. Notice that, for α, β ∈ [d],
(Z T Z)α,β = hZα , Zβ i = −½ (kZα − Zβ k² − kZα k² − kZβ k²).
Now we shall estimate the summands. First, using Edist (inx , i0 ) and Eao (i0 , i1 , . . . , id ),
(71) |kZα k − r| = |kXi − Xiα k − r| ≤ kXi − Xinx k + kXinx − Xi0 k + |kXi0 − Xiα k − r| ≤ δ + 0.6δ + δ < 3δ,
which in turn implies
|kZα k² − r²| ≤ 2r · 3δ + (3δ)² ≤ 7rδ,
where the last inequality uses the parameter assumption (17).
Second, using Eao (i0 , i1 , . . . , id ),
(72) |kZα − Zβ k − √2r 1(α ≠ β)| = |kXiβ − Xiα k − √2r 1(α ≠ β)| ≤ δ
⇒ |kZα − Zβ k² − 2r² 1(α ≠ β)| ≤ 2√2rδ + δ² ≤ 4rδ,
where the last inequality uses (17). Substituting these two bounds into the expression for (Z T Z)α,β
above, we obtain
|(Z T Z)α,β − r² 1(α = β)| ≤ 9rδ.
Hence, we can express
Z T Z = r² Id + B,
where Id is the d by d identity matrix and B = (bα,β )α,β∈[d] is a d by d matrix whose entries are
bounded by 9rδ in absolute value. Given this, we can show
(73) smin (Z T Z) ≥ r² − kBkop ≥ r² − √(Σ α,β∈[d] bα,β ²) ≥ r² − 9rδd ≥ r²/4,
where the last inequality uses (17), and (a) follows.


(b) Similar to (55), we have
(74) d(HZ , Hi0 ) = max u∈HZ : kuk=1 kPHi0⊥ uk ≤ kPHi0⊥ Zkop / smin (Z) ≤ 2kPHi0⊥ Zkop / r,
where the last step uses (73).
Now we will establish an upper bound for kPHi0⊥ Zkop . The estimate in (71) gives kZα k < r + 3δ <
rM . Thus, we can apply (43) and obtain
(75) kPHi0⊥ Zkop ≤ √(Σ α∈[d] kPHi0⊥ Zα k²) = √(Σ α∈[d] kZα − P Zα k²) ≤ √(Σ α∈[d] (κkP Zα k²)²)
≤ √(Σ α∈[d] (κkZα k²)²) ≤ κ(2r)² √d ≤ (0.01/rM ) · 4r² √d ≤ 4r/Cgap²,
where the second-to-last inequality uses (6) and the last uses the feasibility assumption (18).
Substituting the bound for kPHi0⊥ Zkop into (74), we conclude
d(HZ , Hi0 ) ≤ (2/r) · (4r/Cgap²) = 8/Cgap² ≤ 2⁻³/Cgap ,
where the last inequality holds since Cgap ≥ 2⁶.
(c) By Corollary A.5, we have
(76) d(Hi0 , Hi ) ≤ 2κkXi0 − Xi k ≤ 2κ (kXi0 − Xinx k + kXinx − Xi k) ≤ 2κ(0.6δ + δ) ≤ (0.02/rM )(1.6δ) < 2⁻⁴/Cgap ,
where we used Edist (inx , i0 ), the assumption kXi − Xinx k ≤ δ, (6), (17), and (18). By the triangle
inequality, together with (b),
d(HZ , Hi ) ≤ d(HZ , Hi0 ) + d(Hi0 , Hi ) ≤ 2⁻³/Cgap + 2⁻⁴/Cgap ≤ 2⁻²/Cgap .


7.2. W ♯ is the desired set of vertices.
Lemma 7.4. Assume that the event Enext occurs. For each i ∈ W ♯ , we have
(a) kXi − Xinx k ≤ 0.1δ, and
(b) kXi − Xi0 k ≤ 0.7δ.
To prove this lemma, we first need to derive a weaker statement:
Lemma 7.5. Within the event Enext , for every i ∈ W ♯ , we have kXi0 − Xi k < 3δ.
Proof. Suppose that i ∈ W ♯ . If kXi0 − Xi k ≤ η, then the result follows from the parameter assumptions.
From now on, suppose kXi0 − Xi k > η. We have
p(kXi0 − Xi k − η) ≥ p0 (Xi ) (since (V0 , i0 ) is a cluster)
≥ |N (i) ∩ V0 |/|V0 | − n−1/2+ς (by the event Enavi ({Vα }α∈[0,d] , W ))
≥ p(2δ) − n−1/2+ς (by the definition of W ♯ )
≥ p(2.5δ) + ½ ℓp δ − n−1/2+ς (by the definition of ℓp (see (8)))
≥ p(2.5δ).
Since p is strictly decreasing, we conclude kXi0 − Xi k ≤ 2.5δ + η < 3δ. 
Proof of Lemma 7.4. Without loss of generality, after a shift of RN we assume
Xinx = ~0.
Let Y = (Y1 , . . . , Yd ) be the matrix with columns Yα = Xinx − Xiα and let HY = span(Y1 , . . . , Yd ).
Let PHY be the orthogonal projection from RN to HY .
(a) From Lemma 7.3 (applied with i = inx ), we apply Proposition A.4 with
ζ = 1 − d(HY , Hinx ) ≥ 1 − 1/(4Cgap ) > 0.99,
and find that there exists a map
φHY : B(Xinx , rM ) ∩ HY → M
such that PHY (φHY (x)) = x and
(77) kφHY (x) − Xinx k ≤ (3/2) kx − Xinx k
for every x ∈ B(Xinx , rM ) ∩ HY .
Take an arbitrary i ∈ W ♯ . By Lemma 7.5, we have the rough estimate
(78) kPHY Xi − Xinx k ≤ kXi − Xinx k ≤ kXi − Xi0 k + kXi0 − Xinx k < 4δ < rM .
This shows that PHY Xi is in the domain of φHY , and hence we can apply (77) to compare kXi − Xinx k
and kPHY Xi − Xinx k:
(79) kXi − Xinx k = kφHY (PHY (Xi )) − Xinx k ≤ (3/2) kPHY Xi − Xinx k.
If Xi = Xinx , then the inequality we want to prove holds trivially. From now on, suppose
Xi − Xinx ≠ ~0, and so we can consider
u := (Xi − Xinx )/kXi − Xinx k.
We have
PHY (u) = PHY (Xi − Xinx )/kXi − Xinx k = (PHY Xi − PHY Xinx )/kXi − Xinx k = (PHY Xi − Xinx )/kXi − Xinx k,
and thus from (79), we find that kPHY (u)k ≥ 2/3. Hence,
kY T uk = kY T PHY uk ≥ ( min v∈Im Y : kvk=1 kY T vk ) · kPHY uk = smin (Y ) kPHY uk ≥ (r/2) · (2/3) = r/3,
where the middle equality is a standard consequence of the singular value decomposition and
smin (Y ) ≥ r/2 follows from Lemma 7.3(a) applied with i = inx .
This implies that there exists α ∈ [d] such that
(80) |hYα , ui| ≥ r/(3√d).
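The step from kY T uk ≥ r/3 to (80) is a pigeonhole over the d coordinates of Y T u: if every |hYα , ui| were below r/(3√d), the Euclidean norm of Y T u would fall below r/3. A quick numeric check with illustrative numbers:

```python
import math

d, r = 4, 2.0
coords = [0.1, -0.5, 0.45, 0.2]        # entries of Y^T u (illustrative values)
vec_norm = math.sqrt(sum(c * c for c in coords))

assert vec_norm >= r / 3               # the hypothesis ||Y^T u|| >= r/3
# pigeonhole: some coordinate must reach r / (3 sqrt(d))
assert max(abs(c) for c in coords) >= r / (3 * math.sqrt(d))
```

The same one-line argument is what lets the proof single out a specific coordinate α to feed into Lemma 5.3 below.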
From now on, we fix such an α ∈ [d]. Our plan is to bound kXi − Xinx k from above by |pα (Xi ) −
pα (Xinx )| using Lemma 5.3, and to bound the latter term by | |N (i) ∩ Vα |/|Vα | − |N (inx ) ∩ Vα |/|Vα | |
from the conditions of the events Enavi ({Vα }α∈[0,d] , W ) and Enavi ({Vα }α∈[d] , {inx }). Once this is
completed, we will see that this provides the desired upper bound for kXi − Xinx k from the definition of W ♯ .
Let us set up the proper parameters and show that they satisfy the constraints from Lemma 5.3.
• First, we have
kYα k ≤ kXinx − Xi0 k + kXi0 − Xiα k ≤ 0.6δ + (r + δ) ≤ 2r,
and similarly, kYα k ≥ kXi0 − Xiα k − kXinx − Xi0 k ≥ r − δ − 0.6δ ≥ 0.9r. Hence, Xinx satisfies
the condition of q in Lemma 5.3.
• Second, let σ ∈ {−1, +1} be the sign of hYα , ui and define
τ := hu, σ Yα /kYα ki.
We find that
(81) τ = (1/kYα k) · |hYα , ui| ≥ (1/kYα k) · r/(3√d) ≥ 1/(6√d) ≥ 16η/r,
where the second inequality is (80) and the last follows from the parameter assumptions.
Hence, τ also satisfies the assumption in Lemma 5.3.
• Lastly, let t := kXi − Xinx k, so that Xinx + tu = Xi . By (78), we have
t < 4δ ≤ r/(100√d) ≤ 0.1τ r,
where the middle inequality follows from the parameter assumption (17) and the last step uses
τ ≥ 1/(10√d), which we know from (81). This shows that t satisfies the condition in Lemma 5.3.
Now we can apply Lemma 5.3 to get
(82) |pα (Xi ) − pα (Xinx )| = |pα (Xinx + tu) − pα (Xinx )| ≥ (τ ℓp /4) kXi − Xinx k ≥ (ℓp /(24√d)) kXi − Xinx k.
Within the events Enavi ({Vα }α∈[0,d] , W ) and Enavi ({Vα }α∈[d] , {inx }), we have
(83) |pα (Xi ) − pα (Xinx )| ≤ | |N (i) ∩ Vα |/|Vα | − |N (inx ) ∩ Vα |/|Vα | | + 2n−1/2+ς
≤ c3 δ + 2n−1/2+ς ≤ (25/24) c3 δ ≤ (25/24) (ℓp /(Cgap √d)) δ.
Combining (82) and (83), we conclude that kXi − Xinx k ≤ (25/Cgap ) δ < 0.1δ.
(b) This follows immediately from part (a) and the triangle inequality:
kXi − Xi0 k ≤ kXi − Xinx k + kXinx − Xi0 k ≤ 0.1δ + 0.6δ = 0.7δ.
This finishes the proof. 
7.3. (W, W ♯ ) is a good pair. We begin this subsection by showing that, within the event Enext ,
the set W ♯ contains a non-negligible number of vertices.
Lemma 7.6. Within the event Enext , we have |W ♯ | ≥ n1−ς .
Proof. Consider the set S := {i ∈ W : kXi − Xinx k < ε}. Within the event Enet (W ) (see (28)), we
know |S| ≥ n1−ς . We claim that S ⊆ W ♯ .
Take any i ∈ S. Let us check the two conditions for i to be in W ♯ . First, within the events
Enavi ({Vα }α∈[0,d] , W ) and Enavi ({Vα }α∈[d] , {inx }), for any α ∈ [d], we have
| |N (i) ∩ Vα |/|Vα | − |N (inx ) ∩ Vα |/|Vα | | ≤ |pα (Xi ) − pα (Xinx )| + 2n−1/2+ς
≤ (1/|Vα |) Σ j∈Vα |p(kXi − Xj k) − p(kXj − Xinx k)| + 2n−1/2+ς
≤ Lp ε + 2n−1/2+ς ≤ 3Lp ε ≤ c3 δ.
For the second condition, since p is decreasing and kXj − Xi k ≤ kXj − Xi0 k + kXi0 − Xinx k + kXinx − Xi k ≤ η + 0.6δ + ε
for every j ∈ V0 ,
|N (i) ∩ V0 |/|V0 | ≥ (1/|V0 |) Σ j∈V0 p(kXj − Xi k) − n−1/2+ς
≥ p(η + 0.6δ + ε) − n−1/2+ς
≥ p(0.8δ) − n−1/2+ς > p(2δ),
where the last two inequalities follow from (16), (17), (12).
Hence, i ∈ W ♯ . The proof is complete. 
Lemma 7.7. Assume that the event Enext occurs. For any i ∈ W ♯ and any α ∈ [d], we have
|pα (Xi ) − pα (Xinx )| ≤ c3 δ + 2Lp ε.
Proof. By the triangle inequality and the parameter assumptions,
|pα (Xi ) − pα (Xinx )| ≤ |pα (Xi ) − |N (i) ∩ Vα |/|Vα | | + | |N (i) ∩ Vα |/|Vα | − |N (inx ) ∩ Vα |/|Vα | | + | |N (inx ) ∩ Vα |/|Vα | − pα (Xinx )|
≤ n−1/2+ς + c3 δ + n−1/2+ς ≤ c3 δ + 2Lp ε,
where the first and third terms are controlled within the events Enavi ({(Vα , iα )}α∈[0,d] , W ) and
Enavi ({(Vα , iα )}α∈[0,d] , {inx }), the middle term is bounded by the definition of W ♯ , and the last
inequality uses (12), as desired. 
The main technical result is the following.
Lemma 7.8. For any i♯ ∈ W ♯ , there exists a point p ∈ M so that the following holds:
(a) kXi♯ − pk ≤ 2⁸ √d (Lp /ℓp ) ε;
(b) for every α ∈ [d], |pα (p) − pα (Xinx )| ≤ c3 δ − 3Lp ε;
(c) kp − Xinx k ≤ δ.
Proof. Let us fix i♯ ∈ W ♯ . Without loss of generality, we make a shift of RN and assume
Xi♯ = ~0.
Step 1: Finding p through construction.
By Lemma 7.4(a), we have kXi♯ − Xinx k ≤ 0.1δ. Hence, we can apply Lemma 7.3 with i = i♯ .
Let Z = (Z1 , . . . , Zd ) with Zα = Xi♯ − Xiα for α ∈ [d] and HZ = span({Zα }α∈[d] ). By Lemma 7.3,
(84) d(HZ , Hi♯ ) < 2⁻²/Cgap .
Let PHZ : RN → HZ be the orthogonal projection to HZ . From (84), we can apply Proposition A.4
with ζ = 1 − d(HZ , Hi♯ ) > 1 − 1/Cgap > 0.99 to obtain a map
(85) φHZ : B(Xi♯ , rM ) ∩ HZ → M
such that PHZ (φHZ (x)) = x and
(86) kφHZ (x) − Xi♯ k ≤ 2kx − Xi♯ k
for any x ∈ B(Xi♯ , rM ) ∩ HZ .
For each α ∈ [d], let σα := sign(pα (Xi♯ ) − pα (Xinx )). (Recall the definition of sign from (63).)
Now, using the same argument as in the proof of Lemma 6.13, since the matrix Z has full rank
(from Lemma 7.3(a)), there exists a unique ϖ ∈ HZ such that
(87) hσα Zα /kZα k, ϖi = 2⁵ (Lp /ℓp ) ε
for every α ∈ [d]. We attempt to set the point p described in the statement of the lemma to be
(88) p = φHZ (Xi♯ + ϖ).
To achieve that, we need to show that Xi♯ + ϖ is in the domain of φHZ , which is equivalent to
showing kϖk < rM (see (85)).
First, relying on smin (Z) ≥ r/2 from Lemma 7.3(a),
kϖk ≤ kZ T ϖk / smin (Z) = (2/r) √(Σ α∈[d] hZα , ϖi²) = (2/r) · 2⁵ (Lp /ℓp ) ε · √(Σ α∈[d] kZα k²),
where the last equality uses (87). Using the estimate
kZα k ≤ kXi♯ − Xi0 k + kXi0 − Xiα k ≤ 0.7δ + r + δ < 2r,
which follows from Lemma 7.4(b), Eao (i0 , . . . , id ), and (17), we find that
(89) kϖk ≤ 2⁷ √d (Lp /ℓp ) ε < 0.1δ < rM ,
which shows that Xi♯ + ϖ is in the domain of φHZ . Hence, the definition of p in (88) is valid.
Step 2: Demonstrating that p possesses the required properties.
First, property (a) follows from (86) and (89):
(90) kp − Xi♯ k = kφHZ (Xi♯ + ϖ) − Xi♯ k ≤ 2kϖk ≤ 2⁸ √d (Lp /ℓp ) ε.
Next, let us estimate pα (p) − pα (Xinx ) for α ∈ [d]. Once again, our tool is Lemma 5.3, and so
we have to set up some parameters. From the definition of ϖ, we know that ϖ ≠ ~0, and therefore
p = φHZ (Xi♯ + ϖ) ≠ Xi♯ . Hence, it makes sense to define
u := (p − Xi♯ )/kp − Xi♯ k ∈ SN−1 and set τ := hu, σα Zα /kZα ki.
We would like to apply Lemma 5.3 with Xi♯ , σα , τ , u, and kp − Xi♯ k playing the roles of q, σ, τ ,
u, and t in the lemma, respectively. Let us check the following items.
(i) 0.9r ≤ kZα k ≤ 2r,
(ii) 16η/r ≤ τ , and
(iii) kp − Xi♯ k < 0.1τ r.
Item (i) follows from the triangle inequality, Eao (i0 , . . . , id ), and Lemma 7.4(b):
|kZα k − r| ≤ |kZα k − kXi0 − Xiα k| + |kXi0 − Xiα k − r| ≤ kXi♯ − Xi0 k + δ ≤ 1.7δ < 0.1r.
For Item (ii), we have
τ = h(p − Xi♯ )/kp − Xi♯ k, σα Zα /kZα ki = hϖ/kp − Xi♯ k, σα Zα /kZα ki = (1/kp − Xi♯ k) · 2⁵ (Lp /ℓp ) ε ≥ 1/(8√d) ≥ 16 η/r,
where we used (87), (90), and the parameter assumptions.
Finally, for Item (iii), we have
kp − Xi♯ k/τ = kp − Xi♯ k²/(2⁵ (Lp /ℓp ) ε) ≤ 2¹¹ d (Lp /ℓp ) ε ≤ 2¹¹ d η < 0.1r,
where we used (90), (16), (15), and (17).
Hence, Lemma 5.3 is applicable. For the moment let us assume σα = +1. In this case, pα (Xi♯ ) −
pα (Xinx ) ≥ 0. Lemma 5.3(b) gives
(91) pα (Xi♯ ) − 2Lp · 2⁵ (Lp /ℓp ) ε ≤ pα (p) ≤ pα (Xi♯ ) − (1/4) ℓp · 2⁵ (Lp /ℓp ) ε,
where we applied τ kp − Xi♯ k = 2⁵ (Lp /ℓp ) ε. Using σα = +1, we have
(92) pα (p) ≥ pα (Xi♯ ) − 2⁶ (Lp²/ℓp ) ε ≥ pα (Xinx ) − 2⁶ (Lp²/ℓp ) ε.
Note that Lemma 7.7 implies that
(93) pα (Xi♯ ) ≤ pα (Xinx ) + c3 δ + 2Lp ε.
Thus, by (91) and (93),
(94) pα (p) ≤ pα (Xi♯ ) − 8Lp ε ≤ pα (Xinx ) + c3 δ − 6Lp ε.
Considering (92) and (94), we conclude that
|pα (p) − pα (Xinx )| ≤ max{ c3 δ − 6Lp ε, 2⁶ (Lp²/ℓp ) ε } < c3 δ − 3Lp ε.
The proof for the case when σα = −1 is analogous to the argument above, where we apply
Lemma 5.3(c) instead. We omit the details here.
Finally, it remains to show Property (c): kp − Xinx k ≤ δ. This last piece follows from the triangle
inequality, (90), and Lemma 7.4(a):
kp − Xinx k ≤ kp − Xi♯ k + kXi♯ − Xinx k ≤ 2⁸ √d (Lp /ℓp ) ε + 0.1δ < δ.
We have finished the proof. 
Proof of Proposition 7.2. Let (Wgc , igc ) be the output of Algorithm 1 with input (W, W ♯ ).
Step 1: (Wgc , igc ) is a cluster.
From Lemma 4.4, considering Enext ⊆ Ecn (W ) ∩ Enet (W ), it suffices to show that (W, W ♯ ) is a
good pair. We have W ♯ ⊆ W and |W | = n by construction, and we know W ♯ ≠ ∅ from Lemma 7.6.
Take an arbitrary point i♯ ∈ W ♯ . Let p ∈ M be a point obtained from Lemma 7.8. We claim
that this point p may play the role of p in Definition 4.3 (when i♯ plays the role of i).
Note that
kXi♯ − pk ≤ 2⁸ √d (Lp /ℓp ) ε < Cgap d (Lp /ℓp ) ε,
so we only need to establish the inclusion {j ∈ W : kp − Xj k < ε} ⊆ W ♯ . To that end, take any
j ∈ W with kp − Xj k < ε. By the triangle inequality, for each α ∈ [d], we have
| |N (j) ∩ Vα |/|Vα | − |N (inx ) ∩ Vα |/|Vα | | ≤ | |N (j) ∩ Vα |/|Vα | − pα (Xj )| + |pα (Xj ) − pα (p)|
+ |pα (p) − pα (Xinx )| + |pα (Xinx ) − |N (inx ) ∩ Vα |/|Vα | |
≤ n−1/2+ς + Lp ε + (c3 δ − 3Lp ε) + n−1/2+ς
≤ c3 δ,
where the third term is bounded by Lemma 7.8(b) and the last inequality uses n−1/2+ς ≤ Lp ε.
Next, note that for each j ′ ∈ V0 , we have
kXj − Xj ′ k ≤ kXj − pk + kp − Xi0 k + kXi0 − Xj ′ k < ε + δ + η < 1.1δ,
since kp − Xi0 k ≤ kp − Xi♯ k + kXi♯ − Xi0 k ≤ δ by (90) and Lemma 7.4(b). Therefore,
|N (j) ∩ V0 |/|V0 | ≥ (1/|V0 |) Σ j ′ ∈V0 p(kXj − Xj ′ k) − n−1/2+ς ≥ p(1.1δ) − n−1/2+ς > p(2δ).
This shows that j ∈ W ♯ .


Step 2: kXigc − Xinx k ≤ 0.1δ.
Since igc ∈ W ♯ and each i ∈ W ♯ satisfies kXi − Xinx k ≤ 0.1δ by Lemma 7.4(a), the statement follows
immediately. 

7.4. Building a Nearby Cluster. Having completed our analysis in the previous and current sections,
we now present an algorithm that produces a cluster. The overview of the algorithm is as follows.
Suppose we have an index inx ∈ V and a cluster (V, i) such that 0.4δ ≤ kXinx − Xi k ≤ 0.6δ. The
algorithm outputs a new cluster whose corresponding points in M are near Xinx .

Algorithm 2: BuildNearbyCluster
Input : inx ∈ V;
(V, i), where i ∈ V ⊆ V with |V | ≥ n1−ς ;
W1 , W2 , . . . , Wd , Wd+1 ⊆ V.
Output: U ⊆ Wd+1 and u ∈ U .
(V0 , i0 ) ← (V, i);
k ← 1;
while k ≤ d do
 Wk′ ← the set of all vertices i ∈ Wk such that
  p(√2r + 0.95δ) ≤ |N (i) ∩ Vα |/|Vα | ≤ p(√2r − 0.95δ)
 for every α ∈ [k − 1], and such that
  p(r + 0.95δ) ≤ |N (i) ∩ V0 |/|V0 | ≤ p(r − 0.95δ);
 (Vk , ik ) ← GenerateCluster(Wk , Wk′ );
 k ← k + 1;
end
W ♯ ← the set of all vertices i ∈ Wd+1 such that
 | |N (i) ∩ Vα |/|Vα | − |N (inx ) ∩ Vα |/|Vα | | ≤ c3 δ
for every α ∈ [d], and such that
 |N (i) ∩ V0 |/|V0 | ≥ p(2δ);
(U, u) ← GenerateCluster(Wd+1 , W ♯ );
return (U, u)
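The control flow of Algorithm 2 can be sketched in Python. Here frac(i, V) stands for the observed ratio |N(i) ∩ V|/|V|, and generate_cluster stands for Algorithm 1 (GenerateCluster); both are supplied by the caller, and all helper names are hypothetical stand-ins for the paper's subroutines:

```python
import math

def build_nearby_cluster(inx, V0, i0, batches, frac, p, r, delta, c3, d,
                         generate_cluster):
    """Sketch of Algorithm 2. batches = [W_1, ..., W_{d+1}];
    frac(i, V) ~ |N(i) cap V| / |V|; generate_cluster ~ Algorithm 1."""
    clusters = [(V0, i0)]
    for k in range(1, d + 1):
        W = batches[k - 1]
        # W'_k: vertices at distance ~ sqrt(2) r from earlier pivots, ~ r from X_{i_0}
        W_prime = [i for i in W
                   if all(p(math.sqrt(2) * r + 0.95 * delta)
                          <= frac(i, clusters[a][0])
                          <= p(math.sqrt(2) * r - 0.95 * delta)
                          for a in range(1, k))
                   and p(r + 0.95 * delta) <= frac(i, V0) <= p(r - 0.95 * delta)]
        clusters.append(generate_cluster(W, W_prime))
    W_last = batches[d]
    # W#: vertices whose frequency profile matches that of inx
    W_sharp = [i for i in W_last
               if all(abs(frac(i, clusters[a][0]) - frac(inx, clusters[a][0]))
                      <= c3 * delta for a in range(1, d + 1))
               and frac(i, V0) >= p(2 * delta)]
    return generate_cluster(W_last, W_sharp)
```

This mirrors the algorithm's two phases: the while loop rebuilds the d "almost orthogonal" navigation clusters from fresh batches, and the final filtering selects the vertices near Xinx before one last call to GenerateCluster.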

8. Next pivot for the net
Our algorithm will construct a sequence of clusters. If successful, in the ℓth round we will
have (U1 , u1 ), (U2 , u2 ), . . . , (Uℓ , uℓ ), where each pair (Uα , uα ) is a cluster. Furthermore, for any
pair of distinct α, β ∈ [ℓ], kXuα − Xuβ k ≥ 0.3δ.
In this section, we consider the scenario where the desired ℓ pairs {(Uα , uα )}α∈[ℓ] have been found,
and we would like to find the next pair (Uℓ+1 , uℓ+1 ). As an intermediate step, we will use a new
batch of vertices W and find a vertex inx ∈ W such that kXuα − Xinx k ≥ 0.4δ for each α ∈ [ℓ].
Let us introduce the necessary notation and build the algorithm. In this section, we fix ℓ ∈ N and
suppose {(Uα , uα )}α∈[ℓ] are ℓ pairs where uα ∈ Uα ⊆ V. Further, let W ⊆ V be a subset of size
|W | = n. We define the "repulsion" event
(95) Erps ({uα }α∈[ℓ] ) := { for all distinct α, β ∈ [ℓ], kXuα − Xuβ k ≥ 0.3δ }.
For each q ∈ M and each α ∈ [ℓ], write
pα (q) := (1/|Uα |) Σ j∈Uα p(kq − Xj k).

Next, we define two sets:
W ♭ := { i ∈ W : ∀α ∈ [ℓ], |N (i) ∩ Uα |/|Uα | ≤ p(0.45δ) },
and, for each α ∈ [ℓ],
Wα♮ := { i ∈ W : |N (i) ∩ Uα |/|Uα | ≥ p(0.55δ) }.
|Uα |
The discussion in this section is within the event
(96) EnetCheck := EnetCheck ({(Uα , uα )}α∈[ℓ] , W )
= Eclu ({(Uα , uα )}α∈[ℓ] ) ∩ Erps ({uα }α∈[ℓ] ) ∩ Enavi ({Uα }α∈[ℓ] , W ) ∩ Enet (W ).
The following algorithm is to find a vertex inx ∈ W such that kXinx − Xuα k ≥ 0.4δ for every
α ∈ [ℓ] at the same time there exists a specific α ∈ [ℓ] such that kXinx − Xuα k ≤ 0.6δ:
Algorithm 3: NetCheck
Input : {(Uα, uα)}α∈[ℓ], and W ⊆ V with |W| = n.
Output: inx ∈ W and (V0, i0) ∈ {(Uα, uα)}α∈[ℓ].
W♭ ← { i ∈ W : ∀α ∈ [ℓ], |N(i) ∩ Uα|/|Uα| ≤ p(0.45δ) };
for α ∈ [ℓ] do
    Wα♮ ← { i ∈ W : |N(i) ∩ Uα|/|Uα| ≥ p(0.55δ) };
    for w ∈ Wα♮ ∩ W♭ do
        inx ← w;
        (V0, i0) ← (Uα, uα);
        return (inx, (V0, i0));
    end
end
return null
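The selection step above can be sketched in Python. This is a minimal illustration, not the paper's implementation: `adj` (vertex → neighbor set), `p`, and `delta` are assumed inputs standing in for the graph G, the connection function, and the net scale.

```python
import math  # only needed by the usage example below

def net_check(clusters, W, adj, p, delta):
    """Sketch of Algorithm 3 (NetCheck).

    clusters: list of pairs (U_alpha, u_alpha), with U_alpha a vertex set.
    W: a fresh batch of vertices.  adj: vertex -> set of neighbors.
    p: strictly decreasing connection function.  delta: the net scale.
    Returns (i_nx, (V0, i0)) or None (meaning the pivots already form a net).
    """
    # W♭: vertices whose edge density into every cluster is low,
    # i.e. vertices far from every current pivot (Lemma 8.2).
    w_flat = {i for i in W
              if all(len(adj[i] & U) / len(U) <= p(0.45 * delta)
                     for U, _ in clusters)}
    for U, u in clusters:
        # Wα♮: vertices with high edge density into U_alpha,
        # i.e. vertices within roughly 0.6*delta of u_alpha (Lemma 8.3).
        w_nat = {i for i in W if len(adj[i] & U) / len(U) >= p(0.55 * delta)}
        for w in w_nat & w_flat:
            return w, (U, u)
    return None
```

With p(d) = e^{−d} and δ = 1, a vertex whose edge density into a cluster lies between p(0.55) ≈ 0.577 and p(0.45) ≈ 0.638 is returned as the next pivot candidate.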
The main statement we will prove in this section is the following.
Proposition 8.1. Condition on EnetCheck. Running Algorithm 3 will result in one of the following
two cases:
• If it returns a pair (inx, (V0, i0)), then

EnetCheck ⊆ { ∀α ∈ [ℓ], kXinx − Xuαk ≥ 0.4δ } ∩ { kXinx − Xi0k ≤ 0.6δ } ∩ Eclu((V0, i0)).

• If it returns null, then {Xuα}α∈[ℓ] is a δ-net of M.
The proof of Proposition 8.1 will be broken into the following three lemmas.
Lemma 8.2. Condition on EnetCheck . If i ∈ W ♭ , then for every α ∈ [ℓ], we have kXi −Xuα k > 0.4δ.
Proof. We argue by contraposition. Suppose i ∈ W is an index for which there exists α ∈ [ℓ] with
kXi − Xuα k ≤ 0.4δ. For any j ∈ Uα , we have
0 ≤ kXi − Xj k ≤ kXi − Xuα k + kXuα − Xj k < 0.4δ + η < 0.41δ,
and thus p(kXi − Xj k) > p(0.41δ). Since p(0.41δ) − p(0.45δ) ≥ ℓp · 0.04δ > n−1/2+ς , we have
p(kXi − Xj k) > p(0.45δ) + n−1/2+ς .

Therefore, from Enavi({Uα}α∈[ℓ], W), we have

|N(i) ∩ Uα|/|Uα| ≥ (1/|Uα|) Σ_{j∈Uα} p(kXi − Xjk) − n^{−1/2+ς} > p(0.45δ),

which implies i ∉ W♭. □
Lemma 8.3. Condition on EnetCheck . For any α ∈ [ℓ], if i ∈ Wα♮ , then kXi − Xuα k < 0.6δ.
Proof. This proof is similar to the proof of Lemma 8.2, so we omit the details. 
The last piece to prove Proposition 8.1 is the following lemma.
Lemma 8.4. Condition on EnetCheck. If

(⋃_{α∈[ℓ]} Wα♮) ∩ W♭ = ∅,

then there is no point q ∈ M such that kq − Xuαk ≥ δ for every α ∈ [ℓ]. In other words, {Xuα}α∈[ℓ]
is a δ-net.
Proof. Suppose, for the sake of contradiction, that there exists q ∈ M such that for every α ∈ [ℓ],
kq − Xuαk ≥ δ. Since M is connected, there exists a path γ : [0, 1] → M from q to Xu1.
The function f : [0, 1] → R given by

f(t) := min_{α∈[ℓ]} kγ(t) − Xuαk

is continuous, with f(0) ≥ δ and f(1) = 0. Let t0 ∈ [0, 1] be the smallest number for which
f(t0) = 0.5δ, and let q′ := γ(t0).
Now we have q′ such that kq′ − Xuαk ≥ 0.5δ for every α ∈ [ℓ]. Within the event Enet(W), there
exists i ∈ W such that kXi − q′k < ε. By the triangle inequality, for any α ∈ [ℓ] and any j ∈ Uα, we have

kXi − Xjk ≥ kq′ − Xuαk − kq′ − Xik − kXj − Xuαk ≥ 0.5δ − ε − η > 0.48δ.

Therefore, within Enavi({Uα}α∈[ℓ], W), we have

|N(i) ∩ Uα|/|Uα| ≤ (1/|Uα|) Σ_{j∈Uα} p(kXi − Xjk) + n^{−1/2+ς} < p(0.48δ) + n^{−1/2+ς} < p(0.45δ).

Hence, i ∈ W♭.
Also, since f(t0) = 0.5δ, there exists a ∈ [ℓ] such that kq′ − Xuak = 0.5δ. By a similar triangle
inequality argument as above, we find that for any j ∈ Ua, we have kXi − Xjk < 0.52δ. Therefore,
within Enavi({Uα}α∈[ℓ], W), we have

|N(i) ∩ Ua|/|Ua| ≥ (1/|Ua|) Σ_{j∈Ua} p(kXi − Xjk) − n^{−1/2+ς} ≥ p(0.52δ) − n^{−1/2+ς} > p(0.55δ),

which shows that i ∈ Wa♮. We have reached a contradiction. □


Now we check whether there exists any α ∈ [ℓ] for which Wα♮ ∩ W♭ is nonempty.
If so, fix any w ∈ Wα♮ ∩ W♭. By Lemmas 8.2 and 8.3, we have kXw − Xuβk > 0.4δ for every β ∈ [ℓ],
and kXw − Xuαk < 0.6δ.
Then we proceed to build d navigation orthogonal clusters centered at Xuα, which requires d
new n-vertex batches, each with success rate 1 − n^{−ω(1)}, and we use one additional n-vertex batch to
find a cluster (Uℓ+1, uℓ+1) with |Uℓ+1| ≥ n^{1−ς} such that kXuℓ+1 − Xwk ≤ 0.1δ and kXj − Xuℓ+1k < η
for j ∈ Uℓ+1. Therefore, by the triangle inequality, we have

kXuℓ+1 − Xuβk ≥ kXw − Xuβk − kXw − Xuℓ+1k > 0.4δ − 0.1δ = 0.3δ

for every β ∈ [ℓ].

Proof of Proposition 8.1. If

(⋃_{α∈[ℓ]} Wα♮) ∩ W♭ ≠ ∅,

then Algorithm 3 will return a pair (inx, (V0, i0)).
In particular, inx ∈ Wα♮ ∩ W♭ for some specific α ∈ [ℓ]; let us fix such an α. Then we also
have (V0, i0) = (Uα, uα).
By Lemma 8.2 and Lemma 8.3, we have

∀α ∈ [ℓ], kXinx − Xuαk ≥ 0.4δ and kXinx − Xi0k ≤ 0.6δ.

Furthermore, that (V0, i0) is a cluster follows simply from the fact that (Uα, uα) is a cluster, since
we condition on EnetCheck. Therefore, the first case of Proposition 8.1 is proved.
If

(⋃_{α∈[ℓ]} Wα♮) ∩ W♭ = ∅,

then by Lemma 8.4, we know that {Xuα}α∈[ℓ] is a δ-net. Therefore, the proposition follows. □

9. The algorithm buildNet and Proof of Theorem ??


Decompose V into (d + 2) · ⌈n^ς⌉ subsets of size n:

{W_α^ℓ}_{ℓ∈[⌈n^ς⌉], α∈[0,d+1]}.

For ease of notation, we use the lexicographic order on {(ℓ, k)}_{ℓ≥1, k∈[0,d+1]}: We denote

(ℓ′, k′) ≼ (ℓ, k)

if and only if ℓ′ < ℓ, or ℓ′ = ℓ and k′ ≤ k. Given this notation, let us define

W_{≼(ℓ,k)} = ⋃_{(ℓ′,k′) ≼ (ℓ,k)} W_{k′}^{ℓ′} and Y_k^ℓ = (X_{W_{≼(ℓ,k)}}, U_{W_{≼(ℓ,k)}}).

As a sanity check, the induced subgraph on the vertex set W_{≼(ℓ,k)} is a function of Y_k^ℓ.
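In code, ≼ is just (non-strict) lexicographic comparison of index pairs; a one-line Python sketch (the function name is ours):

```python
def precedes(a, b):
    """(l', k') ≼ (l, k)  iff  l' < l, or l' = l and k' <= k."""
    (l1, k1), (l2, k2) = a, b
    return l1 < l2 or (l1 == l2 and k1 <= k2)
```

For pairs of integers this coincides with Python's built-in tuple comparison `a <= b`.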
Our goal is to run an algorithm that generates a sequence of clusters (U1, u1), (U2, u2), (U3, u3), . . . ,
where the ℓth cluster (Uℓ, uℓ) is extracted from W_{d+1}^ℓ, such that, with high probability, the following
properties hold after running ℓ steps:
(1) ∀s ∈ [ℓ], (Us, us) is a cluster.
(2) ∀{s, s′} ∈ \binom{[ℓ]}{2}, kXus − Xus′k ≥ 0.3δ.
Notice that the second property implies that such a process cannot continue indefinitely, due to
a volumetric constraint. When it terminates, it generates the desired δ-cluster-net with high
probability.
Here is the algorithm buildNet:
Algorithm 4: buildNet
Input : {W_α^s}_{s∈[⌈n^ς⌉], α∈[0,d+1]}.
Output: {(Us, us)}s∈[ℓ].
(U1, u1) ← GenerateCluster(W_{d+1}^1, W_{d+1}^1);
ℓ ← 2;
(i_nx^ℓ, (V_0^ℓ, i_0^ℓ)) ← NetCheck((U1, u1), W_0^ℓ);
while (i_nx^ℓ, (V_0^ℓ, i_0^ℓ)) ≠ null do
    (Uℓ, uℓ) ← BuildNearbyCluster(i_nx^ℓ, (V_0^ℓ, i_0^ℓ), W_1^ℓ, W_2^ℓ, . . . , W_d^ℓ, W_{d+1}^ℓ);
    ℓ ← ℓ + 1;
    (i_nx^ℓ, (V_0^ℓ, i_0^ℓ)) ← NetCheck({(Us, us)}s∈[ℓ−1], W_0^ℓ);
end
return {(Us, us)}s∈[ℓ]
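The control flow of buildNet can be sketched as a driver loop; the three subroutines are passed in as callables with hypothetical signatures (the paper's GenerateCluster, NetCheck, and BuildNearbyCluster are not reproduced here):

```python
def build_net(batches, generate_cluster, net_check, build_nearby_cluster, d):
    """Sketch of Algorithm 4 (buildNet).

    batches[(l, k)] is the fresh n-vertex batch W_k^l; the callables stand
    in for the paper's subroutines.  Returns the list of clusters found.
    """
    # Round 1: seed cluster from the batch W_{d+1}^1 (used for both inputs).
    clusters = [generate_cluster(batches[(1, d + 1)], batches[(1, d + 1)])]
    l = 2
    hit = net_check(clusters, batches[(l, 0)])
    while hit is not None:          # NetCheck found a next pivot candidate
        i_nx, (V0, i0) = hit
        side = [batches[(l, k)] for k in range(1, d + 2)]  # W_1^l..W_{d+1}^l
        clusters.append(build_nearby_cluster(i_nx, (V0, i0), side))
        l += 1
        hit = net_check(clusters, batches[(l, 0)])
    return clusters                 # null from NetCheck: pivots form a net
```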
Associated with this algorithm, we define the following events: For each integer ℓ ≥ 2,

E_k^ℓ := EnetCheck({(Us, us)}s∈[ℓ−1], W_0^ℓ)                 if k = 0,
E_k^ℓ := Eortho({(V_α^ℓ, i_α^ℓ)}α∈[0,k−1], W_k^ℓ)            if k ∈ [d],
E_k^ℓ := Enext({(V_α^ℓ, i_α^ℓ)}α∈[0,d], i_nx^ℓ, W_{d+1}^ℓ)   if k = d + 1.

(See (96), (38), and (70) for the definitions of these events.)
For these events, there is an ambiguity, since {(Us, us)}s∈[ℓ−1] or {(V_α^ℓ, i_α^ℓ)}α∈[0,k−1] might be null.
In that case, we simply treat it as though the random sets do not satisfy the property described by the
corresponding event.
The events E_k^ℓ are defined to be the events that the algorithm buildNet is successful at the (ℓ, k)th
step. For example, given E_0^ℓ, by Proposition 8.1, we know that applying the NetCheck algorithm with
input ({(Us, us)}s∈[ℓ−1], W_0^ℓ) either returns (inx, (V0, i0)) with the properties described in Proposition
8.1, or returns null, which also guarantees that {Xus}s∈[ℓ−1] is a δ-net.
Further, for ease of notation, we also define E_k^1 to be the trivial event for k ∈ [d], and

E_{d+1}^1 := Ecn(W_{d+1}^1) ∩ Enet(W_{d+1}^1),

which is the event that guarantees that GenerateCluster(W_{d+1}^1, W_{d+1}^1) returns a cluster (U1, u1), by
Proposition 4.5.
Further, for ℓ ≥ 2,

E_halt^ℓ := { NetCheck({(Us, us)}s∈[ℓ−1], W_0^ℓ) = null }.

Then, we define the intersection of such events according to the ≼ order: For ℓ ≥ 2 and k ∈ [0, d + 1],
we define

Ω_k^ℓ = ⋂_{(ℓ′,k′) ≼ (ℓ,k)} E_{k′}^{ℓ′}.
Notice that we also have the relation that, for ℓ ≥ 2,

Ω_1^ℓ ⊆ Ω_0^ℓ ∩ (E_halt^ℓ)^c,

since Ω_1^ℓ requires that (V_0^ℓ, i_0^ℓ) be non-null.
Let us rephrase Theorem ?? here for convenience, in terms of the parameters introduced
in Section 2.3 and Algorithm 4.
Theorem 9.1 (Rephrasing of Theorem ??). Let (M, µ, p) be the manifold, probability measure, and
distance-probability function satisfying Assumptions 2.3 and 2.4. Suppose n is sufficiently large so
that the parameters are feasible. Then, with probability at least 1 − n^{−ω(1)}, applying Algorithm 4 with input
G(V, M, µ, p) returns a (δ, η)-cluster-net {(Us, us)}s∈[ℓ] with us ∈ Us ⊆ V and ℓ ≤ n^{3ς/4}, in the
sense that
(1) ∀s ∈ [ℓ], (Us, us) is a cluster;
(2) {Xus}s∈[ℓ] is a δ-net of M.
Let

(97) ΩnetFound

denote the event that the output of Algorithm buildNet is a (δ, η)-cluster-net with ℓ ≤ n^{3ς/4}.
Observe that, by Proposition 8.1, we have

ΩnetFound = ⋃_{2≤ℓ≤n^{3ς/4}} (Ω_0^ℓ ∩ E_halt^ℓ).

9.1. Probability Estimates for Basic Events.


Lemma 9.2. For any W ⊆ V with |W | = n, we have P{(Ecn (W ))c } = n−ω(1) .
Proof. Recall that the adjacency matrix A of G is given by
(98) ai,j = 1(Ui,j ≤ p(kXi − Xj k)),
where {Xi }i∈V ∪ {Ui,j }i,j∈V are jointly independent up to the symmetric restriction Ui,j = Uj,i .
Write

(Ecn(W))^c = ⋃_{{i,j}∈\binom{W}{2}} O_{i,j},

where

O_{i,j} := { | |N_W(i) ∩ N_W(j)|/n − K(Xi, Xj) | > n^{−1/2+ς} }.
W
Note that for any {i, j} ∈ \binom{W}{2}, we have

|N_{W\{i,j}}(i) ∩ N_{W\{i,j}}(j)|/(n − 2) − |N_W(i) ∩ N_W(j)|/n = (1/(n − 2) − 1/n) · |N_W(i) ∩ N_W(j)|
= (2/(n(n − 2))) · |N_W(i) ∩ N_W(j)| ≤ 2/n

(the two numerators coincide, since neither i nor j can lie in N_W(i) ∩ N_W(j)).
Therefore, by the triangle inequality, the event O_{i,j} is a subevent of the event

O′_{i,j} := { | |N_{W\{i,j}}(i) ∩ N_{W\{i,j}}(j)|/(n − 2) − K(Xi, Xj) | ≥ n^{−1/2+ς} − 2/n }.

Now, let us fix a pair {i, j} ∈ \binom{W}{2}. For each k ∈ W \ {i, j}, let Zk be the indicator that
k ∈ N_W(i) ∩ N_W(j). From (98), we have

Zk = 1(Ui,k ≤ p(kXi − Xkk)) · 1(Uj,k ≤ p(kXj − Xkk)).
According to the above expression and the joint independence of {Xi}i∈V ∪ {Ui,j}i,j∈V, we know
that, conditioning on Xi = xi and Xj = xj, the {Zk}k∈W\{i,j} are i.i.d. Bernoulli random variables with

E[Zk | Xi = xi, Xj = xj] = E_{Xk} p(kXk − xik) p(kXk − xjk) = K(xi, xj).

Now we apply Hoeffding's inequality:

P{O′_{i,j} | Xi = xi, Xj = xj}
= P{ | Σ_{k∈W\{i,j}} Zk − (n − 2)K(Xi, Xj) | ≥ (n − 2)(n^{−1/2+ς} − 2/n) | Xi = xi, Xj = xj }
≤ 2 · exp(−2(n − 2)(n^{−1/2+ς} − 2/n)²) = n^{−ω(1)}.
By Fubini's theorem,

P{O_{i,j}} ≤ P{O′_{i,j}} = n^{−ω(1)}.

Now we use the union bound:

P{(Ecn(W))^c} ≤ Σ_{{i,j}∈\binom{W}{2}} P{O_{i,j}} ≤ n² · n^{−ω(1)} = n^{−ω(1)}. □
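The concentration behind Lemma 9.2 is easy to see numerically in a toy model; the model below (points uniform on [0, 1], p(d) = e^{−d}) is our assumption for illustration, not the paper's setting.

```python
import math
import random

def common_neighbor_check(n=20000, seed=0):
    """Monte-Carlo sketch: the common-neighbor fraction of two fixed points
    concentrates around the kernel K(xi, xj) = E[p(|X-xi|) p(|X-xj|)]."""
    rng = random.Random(seed)
    p = lambda d: math.exp(-d)
    xi, xj = 0.2, 0.7
    count = 0
    for _ in range(n):
        xk = rng.random()                           # a fresh sample point
        if (rng.random() <= p(abs(xk - xi))         # edge {k, i} present?
                and rng.random() <= p(abs(xk - xj))):  # edge {k, j} present?
            count += 1
    # K(xi, xj) by a midpoint Riemann sum over [0, 1]
    m = 10000
    K = sum(p(abs((t + 0.5) / m - xi)) * p(abs((t + 0.5) / m - xj))
            for t in range(m)) / m
    return count / n, K
```

With n = 20000 the typical deviation is of order n^{−1/2} ≈ 0.007, in line with the n^{−1/2+ς} window in the definition of Ecn(W).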


Lemma 9.3. For any W ⊆ V with |W| = n, we have

(99) P{(Enet(W))^c} ≤ exp(−n^{1/2}).
Proof. Consider any subset 𝓜 ⊆ M such that every pair of points p, q ∈ 𝓜 satisfies kp − qk ≥ ε/3.
Since the collection of open balls {B(p, ε/6)}p∈𝓜 is pairwise disjoint,

1 = µ(M) ≥ µ(⋃_{p∈𝓜} B(p, ε/6)) = Σ_{p∈𝓜} µ(B(p, ε/6)) ≥ |𝓜| µmin(ε/6)
⇒ |𝓜| ≤ 1/µmin(ε/6) ≤ n^ς/2,

where the last inequality uses (12).
Now, we start with an empty set 𝓜 and keep adding points from M into 𝓜, with the restriction
that any pair of points within 𝓜 has distance at least ε/3, until it is not possible to proceed. This
process must terminate in finitely many steps, due to the above bound. The resulting set 𝓜 is an
ε/3-net: every point q in M is within distance ε/3 of some point p ∈ 𝓜. From now on, we fix the
set 𝓜.
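The greedy construction in this proof is the standard maximal-separated-set argument; a minimal Python sketch (names are ours), whose output is simultaneously r-separated and an r-net:

```python
def greedy_net(points, r, dist):
    """Keep adding points at pairwise distance >= r until no longer
    possible.  Maximality makes the result an r-net: any point that was
    not added must lie within r of some chosen point."""
    net = []
    for q in points:
        if all(dist(q, c) >= r for c in net):
            net.append(q)
    return net
```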
For each q ∈ 𝓜, by Hoeffding's inequality,

P{ |{i ∈ W : Xi ∈ B(q, 2ε/3)}| ≤ µmin(2ε/3) · n/2 }
≤ P{ |{i ∈ W : Xi ∈ B(q, 2ε/3)}| ≤ E|{i ∈ W : Xi ∈ B(q, 2ε/3)}| − n^{1−ς} }
≤ exp(−2n^{2(1−ς)}/n) = exp(−2n^{1−2ς}).

Taking a union bound, we have

(100) P{ ∃q ∈ 𝓜, |{i ∈ W : Xi ∈ B(q, 2ε/3)}| ≤ µmin(2ε/3) · n/2 } ≤ (n^ς/2) · exp(−2n^{1−2ς})
≤ exp(−n^{1/2}).
Within the complement of the event Enet(W), there exists p ∈ M such that

|{i ∈ W : Xi ∈ B(p, ε)}| < µmin(2ε/3) · n/2.

Let q ∈ 𝓜 be a point such that kp − qk < ε/3. Then B(q, 2ε/3) ⊆ B(p, ε), so

|{i ∈ W : Xi ∈ B(q, 2ε/3)}| ≤ |{i ∈ W : Xi ∈ B(p, ε)}| < µmin(2ε/3) · n/2.

This shows that the event in the estimate (100) contains the event (Enet(W))^c, and hence

P{(Enet(W))^c} ≤ exp(−n^{1/2}). □

Lemma 9.4. Let V, W be a pair of disjoint subsets of V. For a positive integer k, suppose
{(Vα, iα)}α∈[0,k] are k + 1 random pairs with iα ∈ Vα ⊆ V, which are functions of (XV, UV). For
any realization (XV, UV) = (xV, uV) with (xV, uV) ∈ Eclu({(Vα, iα)}α∈[0,k]) and any realization
XW = xW,

P{ (Enavi({Vα}α∈[0,k], W))^c | XV = xV, UV = uV, XW = xW } = n^{−ω(1)}.

In particular, the above inequality holds without conditioning on XW = xW.
Proof. We claim that

P{ (Enavi({Vα}α∈[0,k], W))^c | XV = xV, UV = uV } ≤ 2d · n² · exp(−2n^ς).

Observe that the event (Enavi({Vα}α∈[0,k], W))^c can be expressed as

(Enavi({Vα}α∈[0,k], W))^c = ⋃_{i∈W} ⋃_{α=0}^{k} O_{i,α},

where

O_{i,α} := { | |N(i) ∩ Vα|/|Vα| − (1/|Vα|) Σ_{j∈Vα} p(kXi − Xjk) | > n^{−1/2+ς} }.
Let us fix a pair (i, α) with i ∈ W and α ∈ [0, k]. For each j ∈ V, let Zj be the indicator of the
edge {i, j}, which can be expressed as

Zj := 1(Ui,j ≤ p(kXi − Xjk)).

With this notation, we have

|N(i) ∩ Vα| = Σ_{j∈Vα} Zj.
On the other hand, conditioning on XV = xV, UV = uV, and XW = xW, the {Zj}j∈V are
independent Bernoulli random variables with EZj = p(kxi − xjk) for each j ∈ V. Thus, we can
apply Hoeffding's inequality to obtain

P{ O_{i,α} | XV = xV, UV = uV, XW = xW } ≤ 2 exp(−2n^ς),

where we rely on the condition |Vα| ≥ n^{1−ς} from the event Eclu({(Vα, iα)}α∈[0,k]).
Hence, by the union bound with |W| ≤ |V| ≤ n², we obtain

P{ (Enavi({Vα}α∈[0,k], W))^c | XV = xV, UV = uV, XW = xW } ≤ n² · (k + 1) · 2 exp(−2n^ς) ≤ 2d · n² · exp(−2n^ς) = n^{−ω(1)}.

The second statement of the lemma simply follows by taking the expectation with respect to XW on
both sides of the above inequality.

9.2. Probability Estimates for the Events Ekℓ .

Lemma 9.5. For ℓ ≥ 2 and k ∈ [d], if P{Ω_k^ℓ} > 0, then

P{ (E_{k+1}^ℓ)^c | Ω_k^ℓ } = n^{−ω(1)}.

In particular, P{Ω_{k+1}^ℓ} = P{Ω_k^ℓ}(1 − n^{−ω(1)}) > 0.

Proof. Let us consider the case k ∈ [d − 1] first, since in the case k = d, the definition of E_{d+1}^ℓ is slightly
different.
Case 1: k ∈ [d − 1]. Recall that by definition from (38),

E_{k+1}^ℓ = Eortho({(V_α^ℓ, i_α^ℓ)}α∈[0,k], W_{k+1}^ℓ)
          = Eao(i_0^ℓ, . . . , i_k^ℓ) ∩ Eclu({(V_α^ℓ, i_α^ℓ)}α∈[0,k]) ∩ Enavi({V_α^ℓ}α∈[0,k], W_{k+1}^ℓ) ∩ Ecn(W_{k+1}^ℓ) ∩ Enet(W_{k+1}^ℓ).
By Proposition 6.3, we know that

Ω_k^ℓ ⊆ E_k^ℓ ⊆ Eao(i_0^ℓ, . . . , i_k^ℓ) ∩ Eclu(V_k^ℓ, i_k^ℓ),

which in turn implies

(101) Ω_k^ℓ ⊆ Eao(i_0^ℓ, . . . , i_k^ℓ) ∩ Eclu({(V_α^ℓ, i_α^ℓ)}α∈[0,k]),

since E_k^ℓ ⊆ Eclu({(V_α^ℓ, i_α^ℓ)}α∈[0,k−1]). Now, we conclude that

P{ (E_{k+1}^ℓ)^c | Ω_k^ℓ } = P{ (Enavi({V_α^ℓ}α∈[0,k], W_{k+1}^ℓ))^c ∪ (Ecn(W_{k+1}^ℓ))^c ∪ (Enet(W_{k+1}^ℓ))^c | Ω_k^ℓ }.

Next, let us fix any realization y_k^ℓ ∈ Ω_k^ℓ. Given y_k^ℓ ∈ Eclu({(V_α^ℓ, i_α^ℓ)}α∈[0,k]), we can apply Lemma
9.4 to get

P{ (Enavi({V_α^ℓ}α∈[0,k], W_{k+1}^ℓ))^c | Y_k^ℓ = y_k^ℓ } ≤ n^{−ω(1)}.

Notice that Enavi({V_α^ℓ}α∈[0,k], W_{k+1}^ℓ) (and thus its complement) is an event of

(Y_k^ℓ, X_{W_{k+1}^ℓ}, U_{W_{≼(ℓ,k)}, W_{k+1}^ℓ}), where U_{W_{≼(ℓ,k)}, W_{k+1}^ℓ} := (U_{i,j})_{i∈W_{≼(ℓ,k)}, j∈W_{k+1}^ℓ}.

Then, by applying Fubini's theorem,

P{ (Enavi({V_α^ℓ}α∈[0,k], W_{k+1}^ℓ))^c | Y_k^ℓ ∈ Ω_k^ℓ }
= E[ 1_{(Enavi({V_α^ℓ}α∈[0,k], W_{k+1}^ℓ))^c}(Y_k^ℓ, X_{W_{k+1}^ℓ}, U_{W_{≼(ℓ,k)}, W_{k+1}^ℓ}) | Y_k^ℓ ∈ Ω_k^ℓ ]
= E[ E_{X_{W_{k+1}^ℓ}, U_{W_{≼(ℓ,k)}, W_{k+1}^ℓ}} 1_{(Enavi({V_α^ℓ}α∈[0,k], W_{k+1}^ℓ))^c}(Y_k^ℓ, X_{W_{k+1}^ℓ}, U_{W_{≼(ℓ,k)}, W_{k+1}^ℓ}) | Y_k^ℓ ∈ Ω_k^ℓ ]
≤ E[ n^{−ω(1)} | Y_k^ℓ ∈ Ω_k^ℓ ] = n^{−ω(1)}.

On the other hand, since Ecn(W_{k+1}^ℓ) and Enet(W_{k+1}^ℓ) are events that do not depend on Y_k^ℓ, we can
directly apply Lemma 9.2 and Lemma 9.3 to get

P{ (Ecn(W_{k+1}^ℓ))^c ∪ (Enet(W_{k+1}^ℓ))^c | Y_k^ℓ ∈ Ω_k^ℓ } = P{ (Ecn(W_{k+1}^ℓ))^c ∪ (Enet(W_{k+1}^ℓ))^c } = n^{−ω(1)}.
Combining the above two estimates and (101), we conclude that

P{ (E_{k+1}^ℓ)^c | Ω_k^ℓ } = P{ (Enavi({V_α^ℓ}α∈[0,k], W_{k+1}^ℓ))^c ∪ (Ecn(W_{k+1}^ℓ))^c ∪ (Enet(W_{k+1}^ℓ))^c | Ω_k^ℓ } = n^{−ω(1)}.

Case 2: k = d. Recall that E_{d+1}^ℓ is defined as

E_{d+1}^ℓ = Enext({(V_α^ℓ, i_α^ℓ)}α∈[0,d], i_nx^ℓ, W_{d+1}^ℓ)
          = Eclu({(V_α^ℓ, i_α^ℓ)}α∈[0,d]) ∩ Eao(i_0^ℓ, i_1^ℓ, . . . , i_d^ℓ)
            ∩ Edist(i_nx^ℓ, i_0^ℓ) ∩ Enavi({V_α^ℓ}α∈[0,d], W_{d+1}^ℓ) ∩ Enavi({V_α^ℓ}α∈[d], {i_nx^ℓ}) ∩ Enet(W_{d+1}^ℓ).

Following the same proof as in the previous case, we can conclude that

P{ (E_{d+1}^ℓ)^c | Ω_d^ℓ } ≤ P{ (Edist(i_nx^ℓ, i_0^ℓ))^c ∪ (Enavi({V_α^ℓ}α∈[d], {i_nx^ℓ}))^c | Ω_d^ℓ } + n^{−ω(1)}.
It remains to show that the first summand on the R.H.S. is also n^{−ω(1)}. Given that P{Ω_d^ℓ} > 0, within
the event Ω_d^ℓ, we have that

(i_nx^ℓ, (V_0^ℓ, i_0^ℓ))

is a valid output of NetCheck({(Us, us)}s∈[ℓ−1], W_0^ℓ). It follows from Proposition 8.1 that

Ω_d^ℓ ⊆ E_0^ℓ ⊆ Edist(i_nx^ℓ, i_0^ℓ).

Further, we know i_nx^ℓ ∈ W_0^ℓ from Algorithm 3, which in turn implies

Ω_d^ℓ ⊆ E_0^ℓ ⊆ Enavi({Us}s∈[ℓ−1], W_0^ℓ) ⊆ Enavi({Us}s∈[ℓ−1], {i_nx^ℓ}).

The above two inclusions imply that

P{ (Edist(i_nx^ℓ, i_0^ℓ))^c ∪ (Enavi({V_α^ℓ}α∈[d], {i_nx^ℓ}))^c | Ω_d^ℓ } = 0,

and the proof is complete. □
Lemma 9.6. For ℓ ≥ 1, if P{Ω_{d+1}^ℓ} > 0, then

P{ (E_0^{ℓ+1})^c | Ω_{d+1}^ℓ } = n^{−ω(1)}.

In particular, P{Ω_0^{ℓ+1}} > 0.

Proof. We will consider the case ℓ ≥ 2 only; the proof of the case ℓ = 1 is similar but simpler.
Recall the precise definition of E_0^{ℓ+1} (see (96)):

E_0^{ℓ+1} := EnetCheck({(Us, us)}s∈[ℓ], W_0^{ℓ+1})
           = Eclu({(Us, us)}s∈[ℓ]) ∩ Erps({us}s∈[ℓ]) ∩ Enavi({Us}s∈[ℓ], W_0^{ℓ+1}) ∩ Enet(W_0^{ℓ+1}).
Using the union bound, we get

(102) P{ (E_0^{ℓ+1})^c | Ω_{d+1}^ℓ } ≤ P{ (Eclu({(Us, us)}s∈[ℓ]))^c | Ω_{d+1}^ℓ } + P{ (Erps({us}s∈[ℓ]))^c | Ω_{d+1}^ℓ }
      + P{ (Enavi({Us}s∈[ℓ], W_0^{ℓ+1}))^c | Ω_{d+1}^ℓ } + P{ (Enet(W_0^{ℓ+1}))^c | Ω_{d+1}^ℓ }.

Recall that

∅ ≠ Ω_{d+1}^ℓ ⊆ E_0^ℓ ⊆ Eclu({(Us, us)}s∈[ℓ−1]) ∩ Erps({us}s∈[ℓ−1]),

where from (95) we have

(103) Erps({uα}α∈[ℓ]) := { ∀{α, β} ∈ \binom{[ℓ]}{2}, kXuα − Xuβk ≥ 0.3δ }.

We can replace the first two summands on the R.H.S. of (102) to get

P{ (E_0^{ℓ+1})^c | Ω_{d+1}^ℓ } ≤ P{ (Eclu(Uℓ, uℓ))^c | Ω_{d+1}^ℓ } + P{ ∃s ∈ [ℓ − 1], kXus − Xuℓk < 0.3δ | Ω_{d+1}^ℓ }
      + P{ (Enavi({Us}s∈[ℓ], W_0^{ℓ+1}))^c | Ω_{d+1}^ℓ } + P{ (Enet(W_0^{ℓ+1}))^c | Ω_{d+1}^ℓ }.
It remains to show that each of the summands above is n^{−ω(1)}. Next, we apply Proposition 7.2 to get

(104) Ω_{d+1}^ℓ ⊆ E_{d+1}^ℓ ⊆ Eclu(Uℓ, uℓ) ∩ { kXuℓ − Xi_nx^ℓk ≤ 0.1δ }.

An immediate consequence is that the first summand is 0:

P{ (Eclu(Uℓ, uℓ))^c | Ω_{d+1}^ℓ } = 0.

Given that Ω_{d+1}^ℓ ≠ ∅, within the event Ω_{d+1}^ℓ, we have that

(i_nx^ℓ, (V_0^ℓ, i_0^ℓ))

is a valid output of NetCheck({(Us, us)}s∈[ℓ−1], W_0^ℓ). Then, from Proposition 8.1, we have

∀s ∈ [ℓ − 1], kXi_nx^ℓ − Xusk ≥ 0.4δ.

Using the above estimate, for each s ∈ [ℓ − 1], we can apply the triangle inequality together with (104) to get

kXuℓ − Xusk ≥ kXi_nx^ℓ − Xusk − kXuℓ − Xi_nx^ℓk ≥ 0.4δ − 0.1δ = 0.3δ,

which in turn implies that the second summand is also 0:

P{ ∃s ∈ [ℓ − 1], kXus − Xuℓk < 0.3δ | Ω_{d+1}^ℓ } = 0.

Finally, the estimate

P{ (Enavi({Us}s∈[ℓ], W_0^{ℓ+1}))^c | Ω_{d+1}^ℓ } + P{ (Enet(W_0^{ℓ+1}))^c | Ω_{d+1}^ℓ } = n^{−ω(1)}

follows from the same argument as shown in the proof of Lemma 9.5. We will omit the details here.
Therefore, the lemma follows. □
Lemma 9.7. For ℓ ≥ 2, if P{ Ω_0^ℓ ∩ {NetCheck({(Us, us)}s∈[ℓ−1], W_0^ℓ) ≠ null} } > 0, then

P{ (E_1^ℓ)^c | Ω_0^ℓ ∩ {NetCheck({(Us, us)}s∈[ℓ−1], W_0^ℓ) ≠ null} } = n^{−ω(1)}.

Proof. Within the event Ω_0^ℓ ∩ {NetCheck({(Us, us)}s∈[ℓ−1], W_0^ℓ) ≠ null}, we have that

(i_nx^ℓ, (V_0^ℓ, i_0^ℓ))

is a valid output of NetCheck({(Us, us)}s∈[ℓ−1], W_0^ℓ).
Recall that

E_1^ℓ = Eortho((V_0^ℓ, i_0^ℓ), W_1^ℓ) = Eao(i_0^ℓ) ∩ Eclu((V_0^ℓ, i_0^ℓ)) ∩ Enavi(V_0^ℓ, W_1^ℓ) ∩ Ecn(W_1^ℓ) ∩ Enet(W_1^ℓ),

where Eao(i_0^ℓ) is a trivial event. Since (V_0^ℓ, i_0^ℓ) ∈ {(Us, us)}s∈[ℓ−1], we automatically have

Ω_0^ℓ ∩ {NetCheck({(Us, us)}s∈[ℓ−1], W_0^ℓ) ≠ null} ⊆ Eclu((V_0^ℓ, i_0^ℓ)).
Hence,

P{ (E_1^ℓ)^c | Ω_0^ℓ ∩ {NetCheck({(Us, us)}s∈[ℓ−1], W_0^ℓ) ≠ null} }
= P{ (Enavi(V_0^ℓ, W_1^ℓ))^c ∪ (Ecn(W_1^ℓ))^c ∪ (Enet(W_1^ℓ))^c | Ω_0^ℓ ∩ {NetCheck({(Us, us)}s∈[ℓ−1], W_0^ℓ) ≠ null} }
≤ P{ (Enavi(V_0^ℓ, W_1^ℓ))^c | Ω_0^ℓ ∩ {NetCheck({(Us, us)}s∈[ℓ−1], W_0^ℓ) ≠ null} } + P{ (Ecn(W_1^ℓ))^c } + P{ (Enet(W_1^ℓ))^c }
≤ P{ (Enavi(V_0^ℓ, W_1^ℓ))^c | Ω_0^ℓ ∩ {NetCheck({(Us, us)}s∈[ℓ−1], W_0^ℓ) ≠ null} } + n^{−ω(1)},

where the first ≤ drops the conditioning on the last two terms because the conditioning event is an
event of Y_0^ℓ, which is independent of Ecn(W_1^ℓ) and Enet(W_1^ℓ), and in the last inequality we applied
Lemma 9.2 and Lemma 9.3.
Finally, the argument showing

P{ (Enavi(V_0^ℓ, W_1^ℓ))^c | Ω_0^ℓ ∩ {NetCheck({(Us, us)}s∈[ℓ−1], W_0^ℓ) ≠ null} } = n^{−ω(1)}

is precisely the same as in the proof of Lemma 9.5. We will omit the details here. The
lemma follows. □
9.3. Proof of Theorem 1.4.
Proof. For simplicity, let us denote the event described by the theorem by ΩnetFound. Further, for
ℓ ≥ 2,

E_halt^ℓ := { NetCheck({(Us, us)}s∈[ℓ−1], W_0^ℓ) = null }.

Given that W_0^ℓ does not exist for ℓ > ⌈n^ς⌉,

E_halt^ℓ = ∅ for all ℓ > ⌈n^ς⌉.
Indeed, we claim that

Ω_0^{ℓ+1} = ∅ for all ℓ ≥ n^{3ς/4}.

(The +1 plays no substantial role here; it is just convenient to have it for the proof later.) Let us prove
this claim: Fix ℓ ≥ n^{3ς/4}, and assume Ω_0^{ℓ+1} ≠ ∅. First of all, this implies that Ω_0^ℓ is also non-empty.
Condition on

Ω_0^ℓ ⊆ E_0^ℓ ⊆ Erps(u1, . . . , uℓ).

We can apply a standard volumetric argument to show that

1 = µ(M) ≥ µ(⋃_{s∈[ℓ]} B(Xus, 0.15δ)) ≥ Σ_{s∈[ℓ]} µ(B(Xus, 0.15δ)) ≥ µmin(0.15δ) · ℓ ≥ cµ (0.15δ)^d ℓ.

Given (19), that δ ≳ n^{−ς/2d}, we have

ℓ ≤ C n^{ς/2}

for some constant C = poly(Lp, 1/ℓp, 1/kKk∞, d, µmin(rM/4)), which is a contradiction. The claim
follows.
Now, we can apply Proposition 8.1 and the claim above to get

ΩnetFound = ⋃_{ℓ≥2} (Ω_0^ℓ ∩ E_halt^ℓ) = ⋃_{ℓ∈[2, n^{3ς/4}]} (Ω_0^ℓ ∩ E_halt^ℓ).

Observe that the union above is a disjoint union. Indeed, for ℓ1 ≠ ℓ2, this follows from the relation

Ω_0^{ℓ+1} ⊆ Ω_0^ℓ ∩ (E_halt^ℓ)^c.

Hence,

P{ΩnetFound} = Σ_{ℓ∈[2, n^{3ς/4}]} P{ Ω_0^ℓ ∩ E_halt^ℓ }.

Next, as a corollary of Lemma 9.5, Lemma 9.7, and Lemma 9.6, we have that, for ℓ ≥ 2,

P{ (Ω_0^{ℓ+1})^c | Ω_0^ℓ ∩ (E_halt^ℓ)^c } = n^{−ω(1)}
⇒ P{ Ω_0^ℓ ∩ (E_halt^ℓ)^c } ≤ P{ Ω_0^{ℓ+1} } + n^{−ω(1)},

which again is due to Ω_0^{ℓ+1} ⊆ Ω_0^ℓ ∩ (E_halt^ℓ)^c. Let us formulate this inequality into an inductive form:

(105) P{ Ω_0^ℓ ∩ (E_halt^ℓ)^c } ≤ P{ Ω_0^{ℓ+1} } + n^{−ω(1)}
      ≤ P{ Ω_0^{ℓ+1} ∩ (E_halt^{ℓ+1})^c } + P{ Ω_0^{ℓ+1} ∩ E_halt^{ℓ+1} } + n^{−ω(1)}.
Now, applying the above inequality recursively, we have

1 = P{Ω_0^2} + P{(Ω_0^2)^c}
  = P{ Ω_0^2 ∩ E_halt^2 } + P{ Ω_0^2 ∩ (E_halt^2)^c } + n^{−ω(1)}
(106) ≤ P{ Ω_0^2 ∩ E_halt^2 } + P{Ω_0^3} + 2n^{−ω(1)}
  ≤ Σ_{s∈[2, ℓ]} P{ Ω_0^s ∩ E_halt^s } + P{ Ω_0^{ℓ+1} } + n^{3ς/4} · n^{−ω(1)} = P{ΩnetFound} + 0 + n^{−ω(1)},

where ℓ = n^{3ς/4} in the last line. Rearranging the terms, the proof is complete. □




10. From cluster-net to distance approximation


The goal of this section is to prove Theorem 1.1 and Theorem 1.3, which build on the
cluster-net {(Us, us)}s∈[ℓ] obtained from Algorithm buildNet. This section is organized as follows:
(1) First, we show that, with high probability, we can determine the distance between us and any v ∈ V
from the number of points in the cluster which share an edge with v.
(2) Second, we construct a weighted graph Γ = Γ(G), the graph stated in Theorem 1.1, and
the corresponding metric deuc stated in Theorem 1.1.
(3) Finally, we introduce the measure ν stated in Theorem 1.3 and prove the theorem.

10.1. Partition of V and distance estimate. To avoid dependence on the random sets
{(Us, us)}s∈[ℓ], which are the output of buildNet, we double the size of our vertex set from (11):

|V| = 2n · (d + 2) · ⌈n^ς⌉,

and partition it into two disjoint subsets of size n · (d + 2) · ⌈n^ς⌉:

V = V1 ⊔ V2.

This modification does not affect the statement of the theorems, as discussed for the adjustment
in (11) previously. A second modification is the following: We apply Algorithm buildNet
with input GV1, instead of GV. Let ΩnetFound be the event stated in Theorem 9.1. Conditioned
on ΩnetFound, the algorithm returns a (δ, η)-cluster-net {(Us, us)}s∈[ℓ] with us ∈ Us ⊆ V1 and
ℓ ≤ n^{3ς/4}.
The discussion in this section is based on the above description and conditions on ΩnetFound.
Next, we partition V2 into (d + 2) · ⌈n^ς⌉ subsets of size n, indexed from 0 to (d + 2) · ⌈n^ς⌉ − 1:

V2 = ⋃_{s∈[0, (d+2)·⌈n^ς⌉−1]} Vs.

For each Vs with s ∈ [ℓ], we want to extract Us′ ⊆ Vs so that (Us′, us) is a 3η-cluster (see
Definition 4.2). This is possible if Enavi(Us, Vs) holds.
Let us denote

ps(x) := (1/|Us|) Σ_{u∈Us} p(kXu − xk),

and define

Us′ = { v ∈ Vs : |N(v) ∩ Us|/|Us| ≥ p(1.5η) }.
Let

(107) Ω′netFound = ΩnetFound ∩ (⋂_{s∈[ℓ]} Enavi(Us, Vs) ∩ Enet(Vs)).

The reason we introduce Ω′netFound is the following:

Lemma 10.1. Given the occurrence of Ω′netFound, (Us′, us) is a 3η-cluster for each s ∈ [ℓ].
Proof. Within the event Enavi(Us, Vs),

∀v ∈ Vs, | |N(v) ∩ Us|/|Us| − ps(Xv) | ≤ n^{−1/2+ς}.

And given that ΩnetFound ⊆ Eclu(Us, us), for v ∈ Vs we have

ps(Xv) ≤ p(kXv − Xusk − η).

Now, suppose kXv − Xusk ≥ 3η. Then,

|N(v) ∩ Us|/|Us| ≤ ps(Xv) + n^{−1/2+ς} ≤ p(2η) + n^{−1/2+ς} ≤ p(2η − ℓp^{−1} n^{−1/2+ς}) < p(1.5η),

where the third inequality uses (16). Therefore, we conclude that for v ∈ Us′, kXv − Xusk < 3η.
It remains to show |Us′| ≥ n^{1−ς}. First, from Enet(Vs),

(108) |{v ∈ Vs : kXv − Xusk ≤ ε}| ≥ n^{1−ς}.

Now, for each v ∈ B(Xus, ε) ∩ Vs,

|N(v) ∩ Us|/|Us| ≥ ps(Xv) − n^{−1/2+ς} ≥ p(ε + η) − n^{−1/2+ς} ≥ p(η + ε + ℓp^{−1} n^{−1/2+ς}) ≥ p(1.5η),

again using (16). Therefore, we conclude that

{v ∈ Vs : kXv − Xusk ≤ ε} ⊆ Us′,

and |Us′| ≥ n^{1−ς} follows from (108). □
Expanding our discussion, we introduce additional notation for simplicity. Define Vrest as the
set of vertices in V excluding those involved in any of the Us and Us′ across all s ∈ [ℓ]:

Vrest := V \ ⋃_{s∈[ℓ]} (Us ∪ Us′).

It is important to note that applying the algorithm buildNet to GV1 only reveals (XV1, UV1).
Moreover, for each s ∈ [ℓ], to extract the set Us′, the additional information we need to reveal is
UUs,Vs and XVs. In particular, for each s ∈ [ℓ], the edges UUs′,Vrest remain hidden.
Our next step involves estimating the distance from each vertex to the clusters and the distances
between clusters, which relies on part of the remaining unrevealed edges. The primary goal of this
subsection is the following:

Proposition 10.2. Consider the event

(109) Ωdist = Ω′netFound ∩ ⋂_{s∈[ℓ]} Enavi(Us′, Vrest),

which is an event of

Y = (XV1, UV1, XV1, UU1,V1, . . . , XVℓ, UUℓ,Vℓ).
Then, conditioned on Ωdist, for each v ∉ ⋃_{s∈[ℓ]} (Us ∪ Us′),

| p^{−1}(|N(v) ∩ Us′|/|Us′|) − kXv − Xusk | ≤ 0.01δ,

and if v ∈ Us ∪ Us′,

kXv − Xusk ≤ 3η ≤ 0.01δ.

Further, for any sample y ∈ Ω′netFound,

P{ Ωdist^c | Y = y } = n^{−ω(1)}.

In particular,

P{ Ωdist^c | Ω′netFound } = n^{−ω(1)}.

In other words, given Ωdist, by observing the graph G, one can estimate the distance from Xus to every
other Xv with accuracy 0.01δ.
Proof. Probability estimate: Given the definition of Ωdist and ℓ ≤ n^{3ς/4} from Theorem 9.1,
it suffices to prove

(110) ∀s ∈ [ℓ], P{ (Enavi(Us′, Vrest))^c | Y = y } = n^{−ω(1)}.

Since the conditioning might not seem straightforward, let us go over it carefully once. First, let us
fix a realization

(XV1, UV1) = (xV1, uV1) ∈ ΩnetFound.

Then, the set {(Us, us)}s∈[ℓ] is determined (as a function of (xV1, uV1)). Now we fix s ∈ [ℓ] and a
realization of

(UUs,Vs, XVs) = (uUs,Vs, xVs) ∈ Enavi(Us, Vs) ∩ Enet(Vs).

Similarly, the set Us′ is determined (as a function of (xV1, uV1, uUs,Vs, xVs)). Now, let us fix a
realization of Y = y as described in the statement of the proposition. As a sanity check, the random
variables UUs′,V\Us have not been revealed yet.
Now, we apply Lemma 9.4 to get

P_{UUs′,V\Us}{ (Enavi(Us′, V \ Us))^c | Y = y } = n^{−ω(1)},

which in turn implies

P{ (Enavi(Us′, V \ Us))^c | Ω′netFound } = n^{−ω(1)},

following Fubini's theorem. With ℓ ≤ n^{3ς/4} from Theorem 9.1, (110) follows.
Distance estimate: For the second statement, let us first note that (Us′, us) is a 3η-cluster by
Lemma 10.1. Thus, for v ∈ V \ Us,

|N(v) ∩ Us′|/|Us′| ≤ ps(Xv) + n^{−1/2+ς} ≤ p(kXv − Xusk − 3η) + n^{−1/2+ς}
⇒ |N(v) ∩ Us′|/|Us′| ≤ p(kXv − Xusk − 3η − ℓp^{−1} n^{−1/2+ς}).

Now, we apply p^{−1} on both sides and rely on the fact that p is monotone decreasing to get

p^{−1}(|N(v) ∩ Us′|/|Us′|) ≥ kXv − Xusk − 3η − ℓp^{−1} n^{−1/2+ς} ≥ kXv − Xusk − 0.01δ.

The bound on the other side can be derived by the same argument. We will omit the proof here. □
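Numerically, p^{−1} of an observed edge fraction can be computed by bisection, using only that p is strictly decreasing; a sketch under the assumption that p is known and continuous on a bracketing interval:

```python
import math  # used only by the usage example below

def p_inverse(p, y, lo=0.0, hi=10.0, iters=60):
    """Invert a strictly decreasing p on [lo, hi] by bisection; assumes
    p(lo) >= y >= p(hi).  Used as in p^{-1}(|N(v) ∩ U's| / |U's|)."""
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if p(mid) >= y:
            lo = mid   # p(mid) still too large: the preimage lies to the right
        else:
            hi = mid
    return (lo + hi) / 2.0
```

After 60 halvings of an interval of length 10, the returned value is accurate to well below 1e-9.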
10.2. Construction of the weighted graph Γ and the metric deuc.

Definition 10.3 (Graph Γ(G, r)). Given the event Ωdist ⊆ ΩnetFound and a parameter r > 0,
we construct a weighted graph Γ(G, r) with weights w(u, v) in the following way, starting with the
edgeless graph on V:
• First, for 1 ≤ i < j ≤ ℓ, if

p^{−1}(|N(uj) ∩ Ui′|/|Ui′|) ≤ r,

then we connect the edge {ui, uj} and assign the edge the weight

w(ui, uj) = p^{−1}(|N(uj) ∩ Ui′|/|Ui′|) + 0.04δ.

(Intuitively, we would like to define w(ui, uj) = p^{−1}(|N(uj) ∩ Ui′|/|Ui′|). However, a constant term
0.04δ is added to this weight to discourage overly long paths from appearing as shortest paths.
This modification avoids accumulating gaps when comparing the shortest-path distance
and the underlying geodesic distance.)
• Second, for v ∉ {us}s∈[ℓ],
  – if v ∈ ⋃_{s∈[ℓ]} (Us ∪ Us′), then there exists a unique sv ∈ [ℓ] such that v ∈ Usv ∪ U′sv. Here,
    we connect the edge {v, usv} with weight

    w(v, usv) = δ.

  – if v ∉ ⋃_{s∈[ℓ]} (Us ∪ Us′), then let

    sv := argmin_{s∈[ℓ]} p^{−1}(|N(v) ∩ Us′|/|Us′|).

    (In the event of a tie, we choose sv to be the smallest such index.) Then, we
    connect the edge {v, usv} with weight

    w(v, usv) = δ.

Further, let dΓ(G,r)(v, w) denote the path distance of Γ(G, r). That is,

dΓ(G,r)(v, w) = min{ Σ_{i=0}^{k−1} w(vi, vi+1) : v0 = v, vk = w, and each {vi, vi+1} is an edge in Γ(G, r) }.
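Once the edges and weights of Γ(G, r) are assembled per the definition above, dΓ(G,r) is an ordinary shortest-path computation; a minimal Dijkstra sketch (the `edges` adjacency map is an assumed input, built from the cluster-net as in the definition):

```python
import heapq

def gamma_path_distance(edges, v, w):
    """Shortest-path distance in a weighted graph such as Γ(G, r).
    edges: vertex -> list of (neighbor, weight) pairs, weights >= 0."""
    dist = {v: 0.0}
    heap = [(0.0, v)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == w:
            return d                      # first pop of w is optimal
        if d > dist.get(u, float("inf")):
            continue                      # stale heap entry
        for nbr, wt in edges.get(u, []):
            nd = d + wt
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                heapq.heappush(heap, (nd, nbr))
    return float("inf")                   # w unreachable from v
```

In the geodesic setting, the pivot-pivot edges carry weight p^{−1}(·) + 0.04δ and the leaf edges weight δ, exactly as in Definition 10.3.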

Indeed, we will only consider the graph Γ for two choices of the parameter r:
• For approximation of the geodesic distance, Γ = Γ(G, 2δ^{1/3}), where r = 2δ^{1/3}. Roughly speaking,
the induced subgraph of Γ restricted to {us}s∈[ℓ] is a graph where edges are formed
between ui and uj if kXui − Xujk ≲ δ^{1/3}. Then, each v ∉ {us}s∈[ℓ] is connected to exactly
one neighbor usv, with weight δ on the edge.
• In our approach to approximation within the Euclidean distance framework, we specify deuc
as the path-distance metric on Γ = Γ(G, ∞), with r = ∞. We recognize that
this choice may not seem the most intuitive, particularly given that Γ(G, ∞) offers limited
metric information compared to the geodesic setting. Specifically, the induced subgraph
of Γ(G, ∞) corresponding to {us}s∈[ℓ] is simply a complete graph. Further, as we will show
later, the direct edge between any two pivots ui, uj indeed represents the shortest
path from ui to uj for all i, j ∈ [ℓ]. The reason behind adopting Γ(G, ∞) for defining deuc
is that we can recycle arguments common to the geodesic-distance setting.
Definition 10.4. Given the event Ωdist, we define the weighted graph

Γ = Γ(G, 2δ^{1/3})

as in Definition 10.3. This will be the graph stated in Theorem 1.1. Further, let Γeuc = Γ(G, ∞)
and define the metric deuc on V as

deuc(v, w) = dΓeuc(v, w),

the path metric on Γeuc.
To avoid ambiguity, let w(v, w) denote the weight of the edge {v, w} in Γ and weuc (v, w) denote
the weight of the edge {v, w} in Γeuc .
Let us wrap up this subsection with the following observation on Γ:

Lemma 10.5. Condition on Ωdist. The following holds.
• For all 1 ≤ i < j ≤ ℓ,

p^{−1}(|N(uj) ∩ Ui′|/|Ui′|) ≤ 2δ^{1/3}
(111) ⇒ kXui − Xujk + 0.03δ ≤ w(ui, uj) ≤ kXui − Xujk + 0.05δ.

• For each v ∉ ⋃_{s∈[ℓ]} (Us ∪ Us′),

(112) kXv − Xusvk ≤ 1.02δ.
Proof. The first statement follows from the definition of Γ and Proposition 10.2. For the second
statement, recall that Ωdist ⊆ ΩnetFound; thus {us}s∈[ℓ] is a δ-net. In particular, there exists s′v such
that kXv − Xus′vk ≤ δ. Thus, by Proposition 10.2 and the minimality in the definition of sv,

kXv − Xusvk ≤ p^{−1}(|N(v) ∩ U′_{sv}|/|U′_{sv}|) + 0.01δ
            ≤ p^{−1}(|N(v) ∩ U′_{s′v}|/|U′_{s′v}|) + 0.01δ ≤ kXv − Xus′vk + 0.02δ ≤ 1.02δ. □


An immediate analogue in the case r = ∞ is the following:

Lemma 10.6. Condition on Ωdist. If r = ∞, then the following holds.
• For all 1 ≤ i < j ≤ ℓ,

(113) kXui − Xujk + 0.03δ ≤ weuc(ui, uj) ≤ kXui − Xujk + 0.05δ.

• For each v ∉ ⋃_{s∈[ℓ]} (Us ∪ Us′),

(114) kXv − Xusvk ≤ 1.02δ.

We omit the proof here since it is analogous to that of Lemma 10.5.
10.3. Proof of Theorem 1.1. Let us reformulate Theorem 1.1 in the context of the Γ and Ωdist
discussed in the previous subsection.

Theorem 10.7. On the event Ωdist, which happens with probability 1 − n^{−ω(1)}, for all v, w ∈ V
we have

|dΓ(v, w) − dgd(Xv, Xw)| ≤ C diamgd(M) rM δ^{2/3},

where C ≥ 1 is a universal constant, and

|deuc(v, w) − kXv − Xwk| ≤ 4δ.
Before we proceed to the proof of the theorem, we need a comparison between the Euclidean distance
and the geodesic distance:

Lemma 10.8. For p, q ∈ M with kp − qk ≤ rM,

dgd(p, q) ≤ kp − qk(1 + κkp − qk²/2).

Proof. We fix p, q as described in the lemma. Consider Proposition A.4 with H = TpM.
By the definition of rM, we know that B(p, rM) ∩ M is connected. By (e) from Proposition A.4,
the orthogonal projection P from M to TpM is invertible when restricted to B(p, rM) ∩ M; denote its
inverse by φ. Then, there exists v ∈ TpM ∩ B(p, rM) such that φ(v) = q. Now, we apply (d) from
the proposition with v to get

dgd(p, q) ≤ kp − qk(1 + κkp − qk²/2). □

10.4. Proof for the Geodesic Distance Setting. Let us begin with the proof of Theorem 10.7
in the geodesic setting. In particular, we split the comparison of dΓ and dgd into two parts:

Lemma 10.9. Condition on Ωdist. With Γ = Γ(G, 2δ^{1/3}), for every pair of vertices v, w ∈ V,

dΓ(v, w) ≤ dgd(Xv, Xw)(1 + 20δ^{2/3}) ≤ dgd(Xv, Xw) + 20 diamgd(M) δ^{2/3}.
Proof. Fix $v, w \in V$.
Case: $\|X_v - X_w\| \le \delta^{1/3}$. Assume first that $v, w \notin \bigcup_{s \in [\ell]} U_s$. Then
$\|X_{u_{s_v}} - X_{u_{s_w}}\| \le \|X_v - X_{u_{s_v}}\| + \|X_v - X_w\| + \|X_w - X_{u_{s_w}}\| \le 1.02\delta + \delta^{1/3} + 1.02\delta \le 1.1\delta^{1/3},$
where the second-to-last inequality follows from Lemma 10.5. In particular, according to the event $\Omega_{\mathrm{dist}}$, we have
$\max\left\{ p^{-1}\!\left(\frac{|N(u_{s_v}) \cap U_{s_w}'|}{|U_{s_w}'|}\right),\ p^{-1}\!\left(\frac{|N(u_{s_w}) \cap U_{s_v}'|}{|U_{s_v}'|}\right) \right\} \le \|X_{u_{s_v}} - X_{u_{s_w}}\| + 0.01\delta \le 1.2\delta^{1/3}.$
Hence, by (111), {usv , usw } is an edge with weight
w(usv , usw ) ≤ kXusv − Xusw k + 0.05δ.
Combining these together, we have
\begin{align*}
d_\Gamma(v, w) &\le w(v, u_{s_v}) + w(u_{s_v}, u_{s_w}) + w(u_{s_w}, w) \\
&\le \delta + \|X_{u_{s_v}} - X_{u_{s_w}}\| + 0.05\delta + \delta \\
&\le 3\delta + \|X_v - X_{u_{s_v}}\| + \|X_v - X_w\| + \|X_w - X_{u_{s_w}}\| \\
&\le 6\delta + \|X_v - X_w\| \tag{112} \\
&\le d_{gd}(X_v, X_w) + 6\delta.
\end{align*}
Therefore, we conclude that
$d_\Gamma(v, w) \le d_{gd}(X_v, X_w) + 6\delta.$
When $v$ or $w$ is contained in $U_s \cup U_s'$, the proof is similar; we omit the repetition here.
Case: $\|X_v - X_w\| \ge \delta^{1/3}$. First of all, we have
$d_{gd}(X_v, X_w) \ge \delta^{1/3}.$
Consider a geodesic $\gamma$ from $X_v$ to $X_w$ in $M$. Let $t := \lceil 2\, d_{gd}(X_v, X_w)/\delta^{1/3} \rceil$. Along the geodesic, we can find a sequence of points
$(X_v = X_0, X_1, X_2, \dots, X_t = X_w)$
such that for $i \in [t-1]$,
$d_{gd}(X_i, X_{i-1}) = \tfrac{1}{2}\delta^{1/3},$
and $d_{gd}(X_{t-1}, X_t) \le \tfrac{1}{2}\delta^{1/3}$.
From $\Omega_{\mathrm{netFound}} \supseteq \Omega_{\mathrm{dist}}$, we know that $\{u_s\}_{s \in [\ell]}$ is a $\delta$-net. Thus, there exists a sequence of vertices
$(v = v_0, v_1, v_2, \dots, v_t = w)$
with $\{v_i\}_{i \in [t-1]}$ a multiset of $\{u_s\}_{s \in [\ell]}$ such that for each $i \in [t-1]$,
$\|X_{v_i} - X_i\| \le \delta.$
As an immediate consequence, we have for $i \in [t]$,
$\|X_{v_{i-1}} - X_{v_i}\| \le \tfrac{1}{2}\delta^{1/3} + 2\delta \le \delta^{1/3},$
and following the previous case, we have
$d_\Gamma(v_{i-1}, v_i) \le d_{gd}(X_{v_{i-1}}, X_{v_i}) + 6\delta.$
Now,
\begin{align*}
d_\Gamma(v, w) &\le \sum_{i \in [t]} d_\Gamma(v_{i-1}, v_i) \\
&\le \sum_{i \in [t]} \big( d_{gd}(X_{v_{i-1}}, X_{v_i}) + 6\delta \big) \\
&\le \sum_{i \in [t]} \big( d_{gd}(X_{i-1}, X_i) + d_{gd}(X_{i-1}, X_{v_{i-1}}) + d_{gd}(X_i, X_{v_i}) + 6\delta \big) \\
&\le \sum_{i \in [t]} \big( d_{gd}(X_{i-1}, X_i) + \delta(1 + \kappa\delta^2/2) + \delta(1 + \kappa\delta^2/2) + 6\delta \big) \\
&\le d_{gd}(X_v, X_w) + \frac{2\, d_{gd}(X_v, X_w)}{\delta^{1/3}} \big( 6\delta + 2\delta(1 + \kappa\delta^2/2) \big) \\
&\le d_{gd}(X_v, X_w)(1 + 20\delta^{2/3}),
\end{align*}
where the last inequality follows from $\kappa \le r_M^{-1}$ and $\delta^2 \le r_M$, which give $\kappa\delta^2 \le 1$. $\square$
Next, it remains to bound $d_\Gamma(v, w)$ from below in terms of $d_{gd}(X_v, X_w)$.
Lemma 10.10. Suppose n is large enough so that δ1/3 < 0.1rM . Condition on Ωdist . With
Γ = Γ(G, 2δ1/3 ), for every pair of vertices v, w ∈ V,
$d_\Gamma(v, w) \ge d_{gd}(X_v, X_w) - C\,\mathrm{diam}_{gd}(M)\, r_M^{-1}\, \delta^{2/3},$
for some universal constant C ≥ 1.
Proof. Let
$(v = v_0, v_1, v_2, \dots, v_t = w)$
be a shortest path in $\Gamma$ connecting $v$ and $w$. That is,
$d_\Gamma(v, w) = \sum_{i \in [t]} w(v_{i-1}, v_i).$
The main technical part is to show that $t$ is at most of order $\delta^{-1/3}$. Let us assume that $v, w \notin \{u_s\}_{s \in [\ell]}$. We will first derive two claims about properties of the path.

Claim 1: $\{v_i\}_{i \in [t-1]} \subseteq \{u_s\}_{s \in [\ell]}$. Let us prove this by contradiction. Given that $v$ has only one neighbor $u_{s_v}$, we have $v_1 \in \{u_s\}_{s \in [\ell]}$. For the same reason, $v_{t-1} \in \{u_s\}_{s \in [\ell]}$ as well. Suppose $v_i \notin \{u_s\}_{s \in [\ell]}$ for some $1 < i < t-1$. For the same reason we have
$v_{i-1} = v_{i+1} = u_{s_{v_i}},$
which makes $(v_0, v_1, \dots, v_{i-1}, v_{i+1}, \dots, v_t)$ a shorter path, and this is a contradiction.
Claim 2: For $1 \le i < j \le t-1$ with $j - i > 1$, we have
$\|X_{v_i} - X_{v_j}\| > \delta^{1/3}.$
Assume toward a contradiction that $\|X_{v_i} - X_{v_j}\| \le \delta^{1/3}$. Without loss of generality, from the first claim we may assume $(v_i, v_{i+1}, \dots, v_j) = (u_i, u_{i+1}, \dots, u_j)$. (The path $(v_i, v_{i+1}, \dots, v_j)$ must have distinct vertices; otherwise it is not a shortest path.) Then, given $\Omega_{\mathrm{dist}}$,
$p^{-1}\!\left( \frac{|N(u_i) \cap U_j'|}{|U_j'|} \right) \le \delta^{1/3} + 0.01\delta \le 2\delta^{1/3}.$
Hence, by (111) we have
w(ui , uj ) ≤ kXui − Xuj k + 0.05δ.
On the other hand,
\begin{align*}
\sum_{i' \in [i+1, j]} w(u_{i'-1}, u_{i'}) &= \sum_{i' \in [i+1, j]} \left( p^{-1}\!\left( \frac{|N(u_{i'-1}) \cap U_{i'}'|}{|U_{i'}'|} \right) + 0.04\delta \right) \\
&\ge \sum_{i' \in [i+1, j]} \big( \|X_{u_{i'-1}} - X_{u_{i'}}\| - 0.01\delta + 0.04\delta \big) \\
&\ge \|X_{u_i} - X_{u_j}\| + (j - i)\cdot 0.03\delta \ge \|X_{u_i} - X_{u_j}\| + 0.06\delta > w(u_i, u_j),
\end{align*}
which implies that
(v0 , v1 , . . . , vi , vj , vj+1 , . . . , vt )
is a shorter path, contradicting the assumption that the path is shortest. Thus, the claim holds.
With the two claims proved, $t$ can be bounded from above. By the second claim, for any $i \in [1, t-3]$,
$w(v_i, v_{i+1}) + w(v_{i+1}, v_{i+2}) \ge \|X_{v_i} - X_{v_{i+1}}\| + 0.03\delta + \|X_{v_{i+1}} - X_{v_{i+2}}\| + 0.03\delta \ge \|X_{v_{i+2}} - X_{v_i}\| \ge \delta^{1/3}.$
Hence,
$d_\Gamma(v, w) \ge \sum_{i \in [t-2]} w(v_i, v_{i+1}) \ge \max\left\{ \left\lfloor \frac{t-2}{2} \right\rfloor, 0 \right\} \delta^{1/3} \ge \frac{t-4}{2}\,\delta^{1/3}.$
On the other hand, by Lemma 10.9,
$d_\Gamma(v, w) \le 2\,\mathrm{diam}_{gd}(M).$
Together we conclude that
$t \le 4\,\mathrm{diam}_{gd}(M)\,\delta^{-1/3} + 4 \le 5\,\mathrm{diam}_{gd}(M)\,\delta^{-1/3}.$
The case when $v$ or $w$ is contained in $\{u_s\}_{s \in [\ell]}$ is the same and simpler, and the same upper bound for $t$ follows. We omit the proof here.
Now we are ready to prove the lemma. Given the boundedness of $t$, we can show the following. For each $i \in [t]$,
$w(v_i, v_{i-1}) \ge \|X_{v_i} - X_{v_{i-1}}\| - 0.01\delta.$
Further, from the definition of $\Gamma(G)$, no weight exceeds $2\delta^{1/3} + 0.04\delta \le 3\delta^{1/3}$.
Finally, relying on the additional assumption that $\delta^{1/3} \le 0.1 r_M$, we can apply Lemma 10.8 to get, for $i \in [t]$,
$d_{gd}(X_{v_i}, X_{v_{i-1}}) \le \|X_{v_i} - X_{v_{i-1}}\|(1 + \kappa\|X_{v_i} - X_{v_{i-1}}\|^2/2) \le \|X_{v_i} - X_{v_{i-1}}\| + \frac{27}{2}\, r_M^{-1}\,\delta.$
Then, we conclude that
\begin{align*}
d_\Gamma(v, w) &\ge \|X_{v_1} - X_{v_0}\| - \delta + \|X_{v_t} - X_{v_{t-1}}\| - \delta + \sum_{i \in [2, t-1]} \|X_{v_i} - X_{v_{i-1}}\| \\
&\ge \sum_{i \in [t]} \left( d_{gd}(X_{v_i}, X_{v_{i-1}}) - \frac{27}{2}\, r_M^{-1}\,\delta \right) - 2\delta \\
&\ge d_{gd}(X_v, X_w) - C\,\mathrm{diam}_{gd}(M)\, r_M^{-1}\,\delta^{2/3} - 2\delta. \qquad \square
\end{align*}
Proof of the geodesic distance statement in Theorem 10.7: The proof follows from Lemma 10.9 and
Lemma 10.10. 
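In the proofs above, $d_\Gamma$ is simply the shortest-path distance of the weighted graph $\Gamma$, so once the weights $w(\cdot,\cdot)$ have been computed from the observed graph, the estimate is obtained by a standard shortest-path computation. The following minimal sketch is our own illustration (the toy vertices and weights in the usage are placeholders, not the actual construction of $\Gamma(G, 2\delta^{1/3})$):

```python
import heapq

def shortest_path_distances(weights, source):
    """Dijkstra's algorithm on an undirected weighted graph.

    `weights` maps an edge (u, v) to its nonnegative weight, playing the
    role of w(., .); the returned dict gives the shortest-path distance
    from `source` to every reachable vertex, i.e. d_Gamma(source, .).
    """
    adj = {}
    for (u, v), w in weights.items():
        adj.setdefault(u, []).append((v, w))
        adj.setdefault(v, []).append((u, w))
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry, already settled with a shorter path
        for v, w in adj.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist
```

Running this on the weighted net graph returns $d_\Gamma(\mathrm{source}, \cdot)$ for all vertices at once, which is how the comparisons with $d_{gd}$ in Lemmas 10.9 and 10.10 would be evaluated in practice.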
10.5. Proof for the Euclidean Distance Setting.
Proof of the Euclidean distance statement in Theorem 10.7: Claim 1: For $1 \le i < j \le \ell$, the shortest path in $\Gamma_{euc}$ connecting $u_i$ and $u_j$ is $(u_i, u_j)$.
First, the path $(u_i, u_j)$ has length $w(u_i, u_j) \le \|X_{u_i} - X_{u_j}\| + 0.05\delta$ by Lemma 10.6.
Consider any shortest path $(u_i = v_0, v_1, \dots, v_t = u_j)$ in $\Gamma_{euc}$ connecting $u_i$ and $u_j$. Then, as illustrated in the proof of Claim 1 of Lemma 10.10, all vertices of this path are contained in $\{u_s\}_{s \in [\ell]}$. Now suppose $t \ge 2$. Then, for $1 \le t' \le t$, we have $w(v_{t'}, v_{t'-1}) \ge \|X_{v_{t'}} - X_{v_{t'-1}}\| + 0.03\delta$. Hence, by Lemma 10.6,
$\text{length of } (u_i = v_0, v_1, \dots, v_t = u_j) \ge \sum_{t' \in [t]} \big( \|X_{v_{t'}} - X_{v_{t'-1}}\| + 0.03\delta \big) \ge \|X_{u_i} - X_{u_j}\| + 0.06\delta,$
which is strictly greater than the length of $(u_i, u_j)$. Thus, a contradiction follows and the claim holds.
Claim 2: In general, for any distinct $v, w \in V$, the shortest path in $\Gamma_{euc}$ connecting $v$ and $w$ is
\[
\begin{cases}
(v, u_{s_v}, u_{s_w}, w) & \text{if } v, w \notin \{u_s\}_{s \in [\ell]}, \\
(v, u_{s_v}, w) & \text{if } v \notin \{u_s\}_{s \in [\ell]},\ w \in \{u_s\}_{s \in [\ell]}, \\
(v, u_{s_w}, w) & \text{if } v \in \{u_s\}_{s \in [\ell]},\ w \notin \{u_s\}_{s \in [\ell]}, \\
(v, w) & \text{if } v, w \in \{u_s\}_{s \in [\ell]}.
\end{cases}
\]
The last case was shown in the first claim. It suffices to show the first case, since the second and third cases are similar. Consider the shortest path $(v = v_0, v_1, \dots, v_t = w)$ in
Γeuc connecting v and w. Then, immediately we have v1 = usv and vt−1 = usw since v and w are
only connected to usv and usw , respectively. Next, (usv = v1 , v2 , . . . , vt−1 = usw ) must also be the
shortest path in Γeuc connecting usv and usw , as a subpath of (v0 , v1 , . . . , vt ). From the first claim,
we know (usv = v1 , v2 , . . . , vt−1 = usw ) = (usv , usw ). Hence, the first case follows. The claim holds.
Finally, by Lemma 10.6, for $v, w \notin \{u_s\}_{s \in [\ell]}$,
\begin{align*}
d_{euc}(v, w) = d_{\Gamma_{euc}}(v, w) &\le \delta + \|X_{u_{s_v}} - X_{u_{s_w}}\| + 0.05\delta + \delta \le \|X_{u_{s_v}} - X_{u_{s_w}}\| + 2\delta \\
&\le \|X_v - X_w\| + \|X_v - X_{u_{s_v}}\| + \|X_w - X_{u_{s_w}}\| + 2\delta \le \|X_v - X_w\| + 4\delta.
\end{align*}
Also, the lower bound can be established in the same way:
\begin{align*}
d_{euc}(v, w) = d_{\Gamma_{euc}}(v, w) &\ge w(u_{s_v}, u_{s_w}) \ge \|X_{u_{s_v}} - X_{u_{s_w}}\| + 0.03\delta \\
&\ge \|X_v - X_w\| - \|X_v - X_{u_{s_v}}\| - \|X_w - X_{u_{s_w}}\| + 0.03\delta \ge \|X_v - X_w\| - 2\delta.
\end{align*}
The same estimates when $v \in \{u_s\}_{s \in [\ell]}$ or $w \in \{u_s\}_{s \in [\ell]}$ can be established in the same way; we omit the proof here. Therefore, the theorem follows. $\square$
10.6. Gromov-Hausdorff Distance: Proof of Theorem 1.3. Recall that $\Omega'_{\mathrm{netFound}}$ is an event of
$Y := (X_V, U_{V_1}, \{X_{V_s}, U_{U_s, V_s}\}_{s \in [\ell]}).$
The discussion in this subsection is always conditioned on a sample $Y = y \in \Omega'_{\mathrm{netFound}}$.
It is worth making a few remarks:
• Recall that $V_0 \subseteq V_2$ is a vertex set of size $n$, and $\Omega'_{\mathrm{netFound}}$ is not an event of $(X_{V_0}, U_{V_0, \bigcup_s U_s'})$.
• While we have not conditioned on $\Omega_{\mathrm{dist}} \subseteq \Omega'_{\mathrm{netFound}}$, the graph $\Gamma(G, r)$ is well-defined for any $r \ge 0$.
• For $v \in V_0$, recall the definition of $s_v$ from the definition of $\Gamma(G, r)$ (regardless of the choice of $r$):
$s_v = \operatorname{argmax}_{s \in [\ell]} \frac{|N(v) \cap U_s'|}{|U_s'|}.$
Now, we define a (random) measure $\nu$ on $\{u_s\}_{s \in [\ell]}$ as follows:
$\nu(\{u_s\}) = \frac{|\{v \in V_0 : s_v = s\}|}{|V_0|}.$
Fix any $v \in V_0$. The pairs $(X_w, U_{\{w\}, \bigcup_{s \in [\ell]} U_s'})$ for $w \in V_0 \setminus \{v\}$ are i.i.d. copies of $(X_v, U_{\{v\}, \bigcup_{s \in [\ell]} U_s'})$, which in turn implies that the conditional random variables $\{s_w\}_{w \in V_0}$ are i.i.d. copies. Hence, $\nu$ is the empirical measure of the measure $\nu_0$ defined by
$\nu_0(\{u_s\}) = \mathbb{P}\{s_v = s\}.$
Before we proceed, let us emphasize that both $\nu$ and $\nu_0$ are random measures, due to the fact that $\{U_s, u_s\}_{s \in [\ell]}$ are random. To avoid the complexity of notation, though, we will not explicitly write the dependence on the sample $y \in \Omega'_{\mathrm{netFound}}$.
Lemma 10.11. Fix any sample $y \in \Omega'_{\mathrm{netFound}}$. With probability $1 - n^{-\omega(1)}$ (on $(X_{V_0}, U_{V_0, \bigcup_{s \in [\ell]} U_s'})$), the total variation distance between $\nu$ and $\nu_0$ is bounded by $n^{-1/4}$. That is, for any $U \subseteq \{u_s\}_{s \in [\ell]}$,
$|\nu_0(U) - \nu(U)| < n^{-1/4}.$
We denote the above event by $\Omega_{\nu,\nu_0} \subseteq \Omega'_{\mathrm{netFound}}$.
Proof. Notice that $n\nu(\{u_s\})$ is the sum of $n$ i.i.d. Bernoulli random variables with success probability $\nu_0(\{u_s\})$. For each $s \in [\ell]$, we can apply Hoeffding's inequality: for $t \ge 0$,
$\mathbb{P}\big\{ |\nu(\{u_s\}) - \nu_0(\{u_s\})| \ge t \big\} \le 2\exp\left( -2\frac{(tn)^2}{n} \right).$
By taking $t = \frac{\log(n)}{\sqrt{n}}$ and applying a union bound, we have
 
$\mathbb{P}\left\{ \exists s \in [\ell],\ |\nu(\{u_s\}) - \nu_0(\{u_s\})| \ge \frac{\log(n)}{\sqrt{n}} \right\} = n^{-\omega(1)},$
where we relied on $\ell \le n^{3\varsigma/4}$ from $\Omega_{\mathrm{netFound}} \supseteq \Omega'_{\mathrm{netFound}}$.
Finally, for $U' \subseteq \{u_s\}_{s \in [\ell]}$,
$|\nu(U') - \nu_0(U')| \le \sum_{s \in [\ell]} \frac{\log(n)}{\sqrt{n}} \le \frac{\log(n)}{\sqrt{n}}\,\ell \le \log(n)\, n^{-1/2}\, n^{3\varsigma/4} < n^{-1/4}. \qquad \square$
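The mechanism behind Lemma 10.11, an empirical distribution over $\ell$ atoms approaching its base measure in total variation at rate roughly $\ell \log(n)/\sqrt{n}$, is easy to see numerically. The following small simulation is our own illustration; the base measure here is an arbitrary placeholder for $\nu_0$:

```python
import random

def empirical_tv(nu0, n, seed=0):
    """Draw n i.i.d. samples from the distribution nu0 over {0, ..., l-1}
    and return the total variation distance between the empirical measure
    nu and nu0, i.e. (1/2) * sum_s |nu({s}) - nu0({s})|."""
    rng = random.Random(seed)
    counts = [0] * len(nu0)
    for _ in range(n):
        r, acc = rng.random(), 0.0
        for s, p in enumerate(nu0):  # inverse-CDF sampling from nu0
            acc += p
            if r < acc:
                counts[s] += 1
                break
        else:  # guard against floating-point round-off at the tail
            counts[-1] += 1
    return 0.5 * sum(abs(c / n - p) for c, p in zip(counts, nu0))
```

With $n$ on the order of $10^4$, the empirical measure is already within a few hundredths of $\nu_0$ in total variation, matching the per-atom $\log(n)/\sqrt{n}$ rate used in the proof.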
Proof of Theorem 1.3. Recall the event $\Omega'_{\mathrm{netFound}}$ introduced in the last section, which happens with probability $1 - n^{-\omega(1)}$. Now we fix a realization $y \in \Omega'_{\mathrm{netFound}}$. We will show that the statements of the theorem hold within this realization.
We define the graph $\tilde\Gamma$ to be the induced subgraph of $\Gamma$ with vertex set $V' = \{u_s\}_{s \in [\ell]}$.
Notice that there is a natural coupling $\pi_{\mu,\nu_0}$ of $\mu$ and $\nu_0$ in the generation of the random graph $\Gamma(G)$: Fix $v \in V_0$ and consider $(X_v, U_{\{v\}, \bigcup_{s \in [\ell]} U_s'})$. We have $X_v \sim \mu$ and $s_v = s_v(X_v, U_{\{v\}, \bigcup_{s \in [\ell]} U_s'}) \sim \nu_0$. Now, let $v'$ be another vertex of $V_0$. Then, $(X, u) = (X_v, u_{s_v})$ and $(X', u') = (X_{v'}, u_{s_{v'}})$ are two independent copies of random pairs with distribution $\pi_{\mu,\nu_0}$.

Consider the event $\Omega_{\mathrm{dist}} \subseteq \Omega'_{\mathrm{netFound}}$ introduced in the last section. Within the event, we have $\|X_v - X_{u_{s_v}}\| \le 1.02\delta$. Further, from Proposition 10.2,
(115) $\mathbb{P}\{\Omega_{\mathrm{dist}}^c \mid Y = y\} = n^{-\omega(1)}.$
Then, by Theorem 10.7, within the event $\Omega_{\mathrm{dist}}$, we have
\begin{align*}
|d_{gd}(X, X') - d_{\tilde\Gamma}(u, u')| &\le d_{gd}(X_v, X_{u_{s_v}}) + d_{gd}(X_{v'}, X_{u_{s_{v'}}}) + |d_{gd}(X_{u_{s_v}}, X_{u_{s_{v'}}}) - d_\Gamma(u_{s_v}, u_{s_{v'}})| \\
&\le \delta + \delta + C\,\mathrm{diam}_{gd}(M)\, r_M^{-1}\,\delta^{2/3} \le C'\,\mathrm{diam}_{gd}(M)\, r_M^{-1}\,\delta^{2/3}. \tag{116}
\end{align*}
Next, given that the total variation distance between $\nu_0$ and $\nu$ is at most $n^{-1/4}$ by Lemma 10.11, we also know there exists a coupling $\pi_{\nu_0,\nu}$ such that for $(u, w) \sim \pi_{\nu_0,\nu}$,
$\mathbb{P}\{u \ne w\} < n^{-1/4}.$
We can further couple $(\mu, \nu_0, \nu)$ together to get a measure $\pi_{\mu,\nu_0,\nu}$ with the marginal distributions corresponding to $(\mu, \nu_0)$ and $(\nu_0, \nu)$ being $\pi_{\mu,\nu_0}$ and $\pi_{\nu_0,\nu}$, respectively. While such a coupling is not unique, for any $\pi_{\mu,\nu_0,\nu}$, the marginal distribution corresponding to $(\mu, \nu)$ is a probability measure $\pi$ such that for a pair of independent copies $(X, u, w), (X', u', w') \sim \pi_{\mu,\nu_0,\nu}$,
\begin{align*}
\mathbb{P}\big\{ |d_{gd}(X, X') - d_\Gamma(w, w')| \ge 1.01\delta \big\} &\le \mathbb{P}\big\{ |d_{gd}(X, X') - d_\Gamma(u, u')| \ge 1.01\delta \big\} + \mathbb{P}\{u \ne w\} + \mathbb{P}\{u' \ne w'\} \\
&\le n^{-\omega(1)} + 2n^{-1/4},
\end{align*}
and the first statement about the coupling in the geodesic distance setting follows.
Now we move to the proof of the second statement. As in the derivation of (116), we can apply Theorem 10.7 to show the following: Given the event $\Omega_{\mathrm{dist}} \subseteq \Omega'_{\mathrm{netFound}}$,
$\forall \tilde u \in \{u_s\}_{s \in [\ell]}, \quad |d_{gd}(X, X_{\tilde u}) - d_\Gamma(u, \tilde u)| \le C'\,\mathrm{diam}_{gd}(M)\, r_M^{-1}\,\delta^{2/3}.$
Thus, for any $\tilde u \in \{u_s\}_{s \in [\ell]}$ and $t > 0$, we have
\begin{align*}
&\mathbb{P}\big\{ X \in B_{gd}(X_{\tilde u},\ t - C'\,\mathrm{diam}_{gd}(M)\, r_M^{-1}\,\delta^{2/3}) \mid Y = y \big\} - \mathbb{P}\{\Omega_{\mathrm{dist}}^c \mid Y = y\} \\
&\qquad \le \mathbb{P}\big\{ u \in B_\Gamma(\tilde u, t) \big\} \\
&\qquad \le \mathbb{P}\big\{ X \in B_{gd}(X_{\tilde u},\ t + C'\,\mathrm{diam}_{gd}(M)\, r_M^{-1}\,\delta^{2/3}) \mid Y = y \big\} + \mathbb{P}\{\Omega_{\mathrm{dist}}^c \mid Y = y\}.
\end{align*}
Now, if we combine the above inequality with (115) and
$\big| \mathbb{P}\{w \in B_\Gamma(\tilde u, t)\} - \mathbb{P}\{u \in B_\Gamma(\tilde u, t)\} \big| < n^{-1/4}$
from the total variation distance estimate in Lemma 10.11, the second statement follows.
It remains to prove these two statements for the coupling with respect to the Euclidean distance. However, the proof is identical to the one in the geodesic distance setting, with the difference being that (116) is replaced by the corresponding estimate given by Theorem 10.7 for the Euclidean distance setting. We omit the proof here. $\square$
11. The Graph Observer
The goal of the graph observer is to reconstruct geometric information of the underlying manifold M from the random geometric graph G. In this subsection, we describe explicitly what is given to the graph observer. Proposition 11.1, a crucial practical result of this subsection, says how the graph observer may “use” the results in this paper. First, the proposition says there are “graph-observer versions” of parameters which the graph observer can compute exactly. Then the proposition gives a test which the graph observer can run on these parameters. If the parameters pass the test, then the graph observer may plug in these graph-observer versions as the various parameters throughout the paper, and the parameters will be feasible.
The graph observer is given the following information:
(i) an instance of the random graph G—for example, in terms of the adjacency matrix—
together with the description of how the random graph is constructed (see Subsubsec-
tion 2.2.3),
(ii) the property of the manifold M that it is compact, complete, connected, and smooth, and
that it is embedded in some Euclidean space RN (even though N is not given explicitly to
the graph observer),
(iii) the dimension d ∈ Z≥1 of the manifold M ,
(iv) the function p : [0, ∞) → [0, 1] (see Subsubsection 2.2.2),
(v) an upper bound3 D ◦ > 0, together with the information that diam(M ) ≤ D ◦ ,
(vi) an upper bound κ◦ > 0, together with the information that κ ≤ κ◦ (see Subsubsection 2.2.1),
(vii) a lower bound $r_{M,0}^\circ > 0$, together with the information that $r_{M,0} \ge r_{M,0}^\circ$ (see Subsubsection 2.2.1), and
(viii) a constant C ◦ > 0, together with the information that
µmin (x) ≥ C ◦ · xd ,
for every x ∈ [0, rM ] (even though rM is not given explicitly to the graph observer).
The graph observer may take any ς ∈ (0, 1/4) and take Cgap = 210 . These two parameters, ς and
Cgap , are considered known and need not be given to the graph observer.
Next, the graph observer computes:
(i) the positive integer $n = \lfloor |V|^{1-2\varsigma} \rfloor$,
(ii) $r_M^\circ := 0.01 \cdot \min\{1/\kappa^\circ,\ r_{M,0}^\circ\}$,
(iii) the Lipschitz constant $L_p$,
(iv) $\ell_p^\circ := \min_{x \in [0, 2D^\circ]} |p'(x)|$,
(v) $\varepsilon^\circ := \max\left\{ 6 \cdot (2/C^\circ)^{1/d}\, n^{-\varsigma/d},\ \frac{n^{-1/2+\varsigma}}{p(D^\circ) \cdot L_p} \right\}$,
(vi) $c_1^\circ := \frac{1}{800} (\ell_p^\circ)^2\, C^\circ\, (r_M^\circ/4)^d$,
(vii) $C_2^\circ := 4 \cdot \frac{C_{\mathrm{gap}}^{1/2}\, d^{1/2}\, L_p}{(\ell_p^\circ)^{1/2}\, (c_1^\circ)^{1/2}}$,
(viii) $c_3^\circ := \frac{\ell_p^\circ}{C_{\mathrm{gap}} \sqrt{d}}$,
(ix) $\eta^\circ := \max\left\{ C_2^\circ \cdot (\varepsilon^\circ)^{1/2},\ \frac{L_p^2}{c_3^\circ\, \ell_p^\circ \sqrt{d}} \cdot \varepsilon^\circ \right\}$,
(x) $\delta^\circ := C_{\mathrm{gap}}\, d \cdot \eta^\circ$, and
(xi) $r^\circ := C_{\mathrm{gap}}\, d^2 \cdot \delta^\circ$.
³As a rule of thumb, we use the superscript ◦ to indicate parameters which the graph observer is given or can compute exactly. This circle superscript is a mnemonic: it has the shape of the letter “o”, the first letter in the word “observer.”
It is not hard to see that all eleven numbers computed above are strictly positive (and finite).
We note from the definition of $r_M^\circ$ that
(117) $r_M^\circ = 0.01 \cdot \min\{1/\kappa^\circ,\ r_{M,0}^\circ\} \le 0.01 \cdot \min\{1/\kappa,\ r_{M,0}\} = r_M.$
Moreover, we deduce from Corollary A.6 in the appendix that
$D^\circ \ge \mathrm{diam}(M) > 2 r_M,$
and therefore, by the definition of $\ell_p^\circ$, we obtain
(118) $\ell_p^\circ = \min_{x \in [0, 2D^\circ]} |p'(x)| \le \min_{x \in [0, 2 r_M]} |p'(x)| = \ell_p.$
Hence, both parameters $r_M^\circ$ and $\ell_p^\circ$, which the graph observer can compute exactly, are lower bounds for their corresponding variables.
Proposition 11.1. We have the following items.
(a) The parameters ε◦ , c◦1 , C2◦ , c◦3 , η ◦ , δ◦ , r ◦ can be computed exactly by the graph observer.
(b) (“the test”) The two inequalities
(119) $r_M^\circ \ge 2^4\, C_{\mathrm{gap}}\, d^2\, r^\circ$ and $n \ge 100$
can be checked by the graph observer.
(c) If both inequalities in part (b) hold, then the parameters ε◦ , c◦1 , C2◦ , c◦3 , η ◦ , δ◦ , r ◦ may
play the roles of ε, c1 , C2 , c3 , η, δ, r, respectively, throughout the paper, as they satisfy
the parameter assumptions (12)–(17), and furthermore, these parameters are feasible (see
(18)).
Proof. The first two parts, (a) and (b), follow by design. Let us prove (c). Suppose that both inequalities in (119) hold. We now check the parameter assumptions. First, we have
$r_M \overset{(117)}{\ge} r_M^\circ \overset{(119)}{\ge} 2^4\, C_{\mathrm{gap}}\, d^2\, r^\circ > r^\circ > \delta^\circ > \eta^\circ > \varepsilon^\circ > \varepsilon^\circ/6.$
This shows that $\varepsilon^\circ/6 \in [0, r_M]$, and thus by the definitions of $C^\circ$ and $\varepsilon^\circ$, we have
$\mu_{\min}(\varepsilon^\circ/6) \ge C^\circ \cdot (\varepsilon^\circ/6)^d \ge C^\circ \cdot \frac{2}{C^\circ} \cdot n^{-\varsigma} = 2 n^{-\varsigma}.$
We also have
$\varepsilon^\circ \ge \frac{n^{-1/2+\varsigma}}{p(D^\circ) L_p} \ge \frac{1}{\sqrt{\|K\|_\infty}\, L_p} \cdot n^{-1/2+\varsigma},$
where in the last inequality, we use the bound from Remark 3.5. This shows that (12) is satisfied.
Second, (13) is satisfied, since
$0 < c_1^\circ = \frac{1}{800} (\ell_p^\circ)^2\, C^\circ\, (r_M^\circ/4)^d \le \frac{1}{800}\, \ell_p^2\, C^\circ\, (r_M/4)^d \le \frac{1}{800}\, \ell_p^2\, \mu_{\min}(r_M/4),$
where we use the definition of $C^\circ$, noting that $r_M/4 \in [0, r_M]$.
Third, (14) is satisfied, since
$C_2^\circ = 4 \cdot \frac{C_{\mathrm{gap}}^{1/2}\, d^{1/2}\, L_p}{(\ell_p^\circ)^{1/2}\, (c_1^\circ)^{1/2}} \ge 4 \cdot \frac{\|K\|_\infty^{1/4}\, C_{\mathrm{gap}}^{1/2}\, d^{1/2}\, L_p}{(\ell_p)^{1/2}\, (c_1^\circ)^{1/2}},$
where we use the observation that $\|K\|_\infty \le 1$ from the definition of K (see Definition 3.1).
Next, (15) and (16) follow from the definitions of $c_3^\circ$ and $\eta^\circ$ and the inequality $\ell_p^\circ \le \ell_p$ from (118). Finally, (17) is immediate from the definitions of $\delta^\circ$ and $r^\circ$.
Having shown that $\varepsilon^\circ, c_1^\circ, C_2^\circ, c_3^\circ, \eta^\circ, \delta^\circ, r^\circ$ may play the roles of $\varepsilon, c_1, C_2, c_3, \eta, \delta, r$, we proceed to show that the parameters are feasible. This final step is easy. From (117), we know that $r_M \ge r_M^\circ$. Thus, with $r^\circ$ playing the role of $r$, if both inequalities in (119) hold, then
$r_M \ge r_M^\circ \ge 2^4\, C_{\mathrm{gap}}\, d^2\, r^\circ,$
and $n \ge 100$, implying that the parameters are feasible. $\square$
Acknowledgments
Han Huang and Pakawut Jiradilok were supported by Elchanan Mossel’s Vannevar Bush Faculty
Fellowship ONR-N00014-20-1-2826 and by Elchanan Mossel’s Simons Investigator award (622132).
Elchanan Mossel was partially supported by Bush Faculty Fellowship ONR-N00014-20-1-2826, Si-
mons Investigator award (622132), ARO MURI W911NF1910217 and NSF award CCF 1918421.
We would like to thank Anna Brandenberger, Tristan Collins, Dan Mikulincer, Ankur Moitra, Greg
Parker, and Konstantin Tikhomirov for helpful discussions.

References
[ADC19] Ernesto Araya and Yohann De Castro. Latent distance estimation for random geometric graphs. Advances
in Neural Information Processing Systems, 32, 2019.
[ADL23] Caelan Atamanchuk, Luc Devroye, and Gábor Lugosi. A note on estimating the dimension from a random
geometric graph. arXiv:2311.13059, 2023.
[DDC23] Quentin Duchemin and Yohann De Castro. Random geometric graph: some recent developments and
perspectives. In High dimensional probability IX—the ethereal volume, volume 80 of Progr. Probab.,
pages 347–392. Birkhäuser/Springer, Cham, [2023] ©2023.
[EMP22] Ronen Eldan, Dan Mikulincer, and Hester Pieters. Community detection and percolation of information
in a geometric setting. Combin. Probab. Comput., 31(6):1048–1069, 2022.
[FIK+ 20] Charles Fefferman, Sergei Ivanov, Yaroslav Kurylev, Matti Lassas, and Hariharan Narayanan. Recon-
struction and interpolation of manifolds. I: The geometric Whitney problem. Found. Comput. Math.,
20(5):1035–1133, 2020.
[FIL+ 21] Charles Fefferman, Sergei Ivanov, Matti Lassas, Jinpeng Lu, and Hariharan Narayanan. Reconstruction
and interpolation of manifolds II: Inverse problems for Riemannian manifolds with partial distance data.
arXiv:2111.14528, 2021.
[FILN20] Charles Fefferman, Sergei Ivanov, Matti Lassas, and Hariharan Narayanan. Reconstruction of a Riemann-
ian manifold from noisy intrinsic distances. SIAM J. Math. Data Sci., 2(3):770–808, 2020.
[FILN23] Charles Fefferman, Sergei Ivanov, Matti Lassas, and Hariharan Narayanan. Fitting a manifold to data in
the presence of large noise. arXiv:2312.10598, 2023.
[GLZ15] Chao Gao, Yu Lu, and Harrison H. Zhou. Rate-optimal graphon estimation. Ann. Statist., 43(6):2624–
2652, 2015.
[HMST23] Keaton Hamm, Caroline Moosmüller, Bernhard Schmitzer, and Matthew Thorpe. Manifold learning in
wasserstein space. arXiv:2311.08549, 2023.
[JM13] Adel Javanmard and Andrea Montanari. Localization from incomplete noisy distance measurements.
Found. Comput. Math., 13(3):297–345, 2013.
[LMSY22] Siqi Liu, Sidhanth Mohanty, Tselil Schramm, and Elizabeth Yang. Testing thresholds for high-dimensional
sparse random geometric graphs. In STOC ’22—Proceedings of the 54th Annual ACM SIGACT
Symposium on Theory of Computing, pages 672–677. ACM, New York, [2022] ©2022.
[LS23] Shuangping Li and Tselil Schramm. Spectral clustering in the gaussian mixture block model.
arXiv:2305.00979, 2023.
[MWZ23] Cheng Mao, Alexander S. Wein, and Shenduo Zhang. Detection-recovery gap for planted dense cycles.
arXiv:2302.06737, 2023.
[OMK10] Sewoong Oh, Andrea Montanari, and Amin Karbasi. Sensor network localization from local connectiv-
ity: Performance analysis for the mds-map algorithm. In 2010 IEEE Information Theory Workshop on
Information Theory (ITW 2010, Cairo), pages 1–5. IEEE, 2010.
[Pen03] Mathew Penrose. Random geometric graphs, volume 5 of Oxford Studies in Probability. Oxford University
Press, Oxford, 2003.
[Shi16] Takashi Shioya. Metric measure geometry, volume 25 of IRMA Lectures in Mathematics and Theoretical
Physics. EMS Publishing House, Zürich, 2016. Gromov’s theory of convergence and concentration of
metrics and measures.
[STP12] Daniel L. Sussman, Minh Tang, and Carey E. Priebe. Universally consistent latent position estimation
and vertex classification for random dot product graphs. arXiv:1207.6745, 2012.
[TSP13] Minh Tang, Daniel L. Sussman, and Carey E. Priebe. Universally consistent vertex classification for latent
positions graphs. Ann. Statist., 41(3):1406–1430, 2013.
[YSLY23] Zhigang Yao, Jiaji Su, Bingjie Li, and Shing-Tung Yau. Manifold fitting. arXiv:2304.07680, 2023.

Appendix A. Embedded Riemannian Manifold
A.1. Second fundamental form and distance between subspaces. The goal of this subsec-
tion is to establish Lemma A.3, which gives an upper bound for the distance between two subspaces
in terms of the second fundamental form. We begin by recalling the definition of the second fun-
damental form. We then show Lemma A.2, and use it to prove Lemma A.3.
Let M be a d-dimensional compact complete connected smooth manifold embedded in RN . Let
DX be the standard directional derivative with respect to a smooth vector field X and PTp M be
the orthogonal projection from Tp RN to Tp M (with respect to the Euclidean structure).
Definition A.1. The second fundamental form of M is
(120) $\mathrm{II}(X, Y)_p := D_{X_p} Y - P_{T_p M} D_{X_p} Y = P_{(T_p M)^\perp}(D_{X_p} Y),$
for smooth vector fields $X, Y$ in $M$ and $p \in M$. Let us also recall the standard fact that, for each $p \in M$, $\mathrm{II}(X, Y)_p$ is determined by $X_p$ and $Y_p$.
Define
$\kappa = \kappa_M := \max_{p \in M}\ \max_{u, v \in T_p M \cap S^{N-1}} \|\mathrm{II}(u, v)_p\| > 0.$
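For a concrete instance (our own example, not from the text), consider the sphere of radius $R$ centered at the origin in $\mathbb{R}^N$, where a direct computation along great circles gives

```latex
% Sphere of radius R: for p in M and u, v in T_p M,
\mathrm{II}(u, v)_p \;=\; -\,\frac{\langle u, v \rangle}{R^{2}}\; p,
\qquad\text{so}\qquad
\kappa_M \;=\; \max_{u, v \in T_p M \cap S^{N-1}} \frac{|\langle u, v \rangle|}{R^{2}}\,\|p\| \;=\; \frac{1}{R}.
```

So a smaller radius (higher extrinsic curvature) yields a larger $\kappa$.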
For the rest of this subsection, let I ⊆ R be a nonempty interval. Let γ : I → M be a unit-speed
smooth curve. For each t ∈ I, define Ht := Tγ(t) M . Note that we think of each tangent space Tp M
as the linear subspace of dimension d in RN containing the tangent vectors at p ∈ M . In particular,
the zero vector ~0 is always in Tp M .
Lemma A.2. For any interior point s ∈ I, and for any ϑ > 0, there exists ̺ > 0 such that for any
t ∈ (s − ̺, s + ̺) ∩ I, we have d(Hs , Ht ) ≤ (κ + ϑ) · |t − s|.
Proof. Without loss of generality, let us assume s = 0. Take any ϑ > 0.
For each $t$ in an open neighborhood of 0, let $X_1(t), \dots, X_d(t)$ be smooth vector fields along $\gamma$ which form an orthonormal basis of $H_t$. Since $M$ is embedded in $\mathbb{R}^N$, they are smooth vector fields in $\mathbb{R}^N$ along $\gamma$. Hence, we can define $G(t) := P_{H_0^\perp}[X_1(t), \dots, X_d(t)] \in \mathbb{R}^{(N-d)\times d}$, where $P_{H_0^\perp} : \mathbb{R}^N \to H_0^\perp$ is the orthogonal projection to $H_0^\perp$. Observe that
\begin{align*}
\|G(t)\|_{op} &= \max_{w \in S^{d-1}} \|G(t) \cdot w\| \\
&= \max\left\{ \big\| P_{H_0^\perp}\big( w_1 X_1(t) + \cdots + w_d X_d(t) \big) \big\| : [w_1, \dots, w_d]^T \in S^{d-1} \right\} \\
&= \max\left\{ \|P_{H_0^\perp} v\| : v \in H_t,\ \|v\| = 1 \right\} \\
&= d(H_0, H_t). \tag{121}
\end{align*}
Notice that $G(0) \in \mathbb{R}^{(N-d)\times d}$ is the zero matrix. Consider the first-degree Taylor approximation $G(t) = A \cdot t + B(t)$, where $A, B(t) \in \mathbb{R}^{(N-d)\times d}$ are matrices and each entry $(B(t))_{ij}$ is of order $O(t^2)$. Therefore, for each $(i, j) \in [N-d] \times [d]$, there exists $\tilde\varrho_{ij} > 0$ such that for any $t$ with $|t| < \tilde\varrho_{ij}$, we have
$|(B(t))_{ij}| < \frac{\vartheta}{N} \cdot |t|.$
Now we take $\varrho := \min\{\tilde\varrho_{ij} \mid (i, j) \in [N-d] \times [d]\} > 0$. Thus, for any $t \in \mathbb{R}$ with $|t| < \varrho$, we have
$\|B(t)\|_{op} \le \sqrt{(N-d)d} \cdot \frac{\vartheta}{N} \cdot |t| < \vartheta \cdot |t|.$
Therefore, for any $v \in \mathbb{R}^d$, we have by the triangle inequality that
$\|(G(t))(v)\| = \|(A \cdot t + B(t))(v)\| \le \|Atv\| + \|(B(t))(v)\| \le (\|A\|_{op} + \vartheta) \cdot |t| \cdot \|v\|,$
whence, by (121),
(122) $d(H_0, H_t) = \|G(t)\|_{op} \le (\|A\|_{op} + \vartheta) \cdot |t|,$
for any $t \in (-\varrho, \varrho)$.
Now let $\alpha = [\alpha_1, \dots, \alpha_d]^T \in S^{d-1}$ be a unit vector such that $\|A\alpha\| = \|A\|_{op}$. Then
$Y(t) := \sum_{i=1}^{d} \alpha_i X_i(t)$
is a smooth unit vector field along the curve $\gamma$ in $M$, defined in an open neighborhood of 0. Observe that
$A\alpha = G'(0)\alpha = P_{H_0^\perp}\big( D_{\gamma'(t)} \widetilde{Y} \big)\big|_{t=0},$
where $\widetilde{Y}$ is the vector field along $\gamma$ in $M$ given by $\widetilde{Y}_{\gamma(t)} := Y(t)$. Therefore, we have
(123) $\|A\|_{op} = \|A\alpha\| = \|\mathrm{II}(\gamma', \widetilde{Y})_{\gamma(0)}\| \le \kappa.$
Note that the inequality above is valid since both $\gamma'(0)$ and $\widetilde{Y}_{\gamma(0)} = Y(0)$ are unit vectors. We finish by combining (122) and (123). $\square$
Lemma A.3. For any interior points a, b ∈ I, we have d(Ha , Hb ) ≤ κ · |b − a|.
Proof. Without loss of generality, assume $a \le b$. Take an arbitrary $\vartheta > 0$. We define a non-decreasing sequence $\{a_i\}_{i=0}^{\infty}$ of real numbers as follows. Take $a_0 := a$. For each $i \ge 0$, define
(124) $a_{i+1} := \max\{ x \in [a_i, b] \mid d(H_{a_i}, H_x) \le (\kappa + \vartheta)(x - a_i) \}.$
We give two remarks about (124). First, we do not require that $d(H_{a_i}, H_t) \le (\kappa + \vartheta)(t - a_i)$ for all $a_i \le t \le x$. We only require the inequality to be true when $t = x$. Second, $a_{i+1}$ is well-defined since $x \mapsto d(H_{a_i}, H_x) - (\kappa + \vartheta)(x - a_i)$ is a continuous function, and hence the set in (124) is a closed set.
Lemma A.2 implies that $a_{i+1} > a_i$ unless $a_i = b$. We argue that there exists $T < \infty$ such that $a_T = b$. Suppose otherwise for the sake of contradiction. Then $\{a_i\}_{i=0}^{\infty}$ is a strictly increasing sequence, and hence it converges to a limit $L \in (a, b]$.
Using Lemma A.2 again, we find that there exists $\varrho > 0$ such that for any $t \in (L - \varrho, L + \varrho) \cap I$, we have
(125) $d(H_L, H_t) \le (\kappa + \vartheta) \cdot |t - L|.$
Since $\{a_i\}_{i=0}^{\infty}$ converges to $L$, there exists $k \in \mathbb{Z}_{\ge 0}$ such that $L - \varrho < a_k < L$, and so (125) gives
$d(H_{a_k}, H_L) \le (\kappa + \vartheta) \cdot |L - a_k|,$
which, by (124), implies that $a_{k+1} \ge L$. However, this means $a_{k+1} = a_{k+2} = \cdots = L$, which is a contradiction.
Therefore, there exists an index $T$ for which $a_T = b$. By the triangle inequality, we have
$d(H_a, H_b) \le \sum_{i=0}^{T-1} d(H_{a_i}, H_{a_{i+1}}) \le \sum_{i=0}^{T-1} (\kappa + \vartheta)(a_{i+1} - a_i) = (\kappa + \vartheta)(b - a).$
Since $\vartheta > 0$ was arbitrary, by taking $\vartheta \searrow 0$, the lemma follows. $\square$
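As a sanity check on Lemma A.3 (a worked example of our own, not from the text), consider the circle of radius $R$ in the plane with its unit-speed parametrization; the tangent lines rotate at rate exactly $\kappa = 1/R$:

```latex
% Circle of radius R: kappa = 1/R, and H_t is spanned by the unit tangent.
\gamma(t) = \bigl(R\cos(t/R),\ R\sin(t/R)\bigr), \qquad
H_t = \operatorname{span}\{(-\sin(t/R),\ \cos(t/R))\},
\qquad
d(H_a, H_b) = \left|\sin\!\left(\tfrac{b-a}{R}\right)\right| \;\le\; \frac{|b-a|}{R} \;=\; \kappa\,|b - a|.
```

Here the bound of Lemma A.3 is attained to first order in $|b - a|$, so the constant $\kappa$ cannot be improved in general.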
A.2. Projection as a local diffeomorphism. This subsection collects some useful geometric
results for us to use in the paper. Let M and κ be as described in the previous subsection. Let
dgd (p, q) denote the geodesic distance between points p, q ∈ M . We use the notation Bgd (p, r) ⊆ M
to denote the set of all points in M with geodesic distance strictly less than r from p ∈ M , while
B(p, r) ⊆ RN denotes the Euclidean open ball of radius r centered at p.
Proposition A.4. Fix $p \in M$. Suppose $H$ is a $d$-dimensional affine subspace of $\mathbb{R}^N$ through $p$ such that $1 - d(H, T_p M) =: \zeta > 0$, and let $P_H : \mathbb{R}^N \to H$ be the orthogonal projection to $H$. Then
(a) $P_H$ is a diffeomorphism from $B_{gd}(p, \zeta/10\kappa)$ to its image.
(b) $B(p, 0.09\zeta^2/\kappa) \cap H \subseteq P_H(B_{gd}(p, \zeta/10\kappa))$.
(c) Let $\varphi$ be the inverse of $P_H|_{B_{gd}(p, \zeta/10\kappa)}$, so that $\varphi(P_H(q)) = q$ for every $q \in B_{gd}(p, \zeta/10\kappa)$. For any $v \in B(p, 0.09\zeta^2/\kappa) \cap H$, we have $\|\varphi(v) - p\| \le d_{gd}(\varphi(v), p) \le \frac{1}{0.9\zeta}\|v - p\|$. In particular,
$d_{gd}(\varphi(v), p) \le \|v - p\|(1 + \kappa^2\|v - p\|^2/2) \le \|\varphi(v) - p\|(1 + \kappa^2\|\varphi(v) - p\|^2/2).$
(d) If $H = T_p M$, then for any $v \in B(p, 0.09/\kappa) \cap T_p M$, we have $\|P_{H^\perp}\varphi(v)\| \le \kappa\|v - p\|^2$ and
$d_{gd}(\varphi(v), p) \le \|v - p\|(1 + \kappa^2\|v - p\|^2/2).$
(e) If $M \cap B(p, 0.09\zeta^2/\kappa)$ is connected, then $M \cap B(p, 0.09\zeta^2/\kappa) \subseteq \varphi(B(p, 0.09\zeta^2/\kappa) \cap H)$.
Proof. With a translation, we assume p = ~0 ∈ RN throughout this proof.
(a) For any point $q \in M$ with $d_{gd}(\vec 0, q) < \zeta/10\kappa$, the triangle inequality and Lemma A.3 give
$1 - d(H, T_q M) \ge 1 - d(H, T_{\vec 0} M) - d(T_{\vec 0} M, T_q M) > \zeta - \kappa \cdot \frac{\zeta}{10\kappa} = 0.9\zeta.$
For any $v \in T_q M$ with $\|v\| = 1$, we have
(126) $\|P_H v\| = \sqrt{1 - \|P_{H^\perp} v\|^2} \ge \sqrt{1 - d(H, T_q M)^2} \ge 1 - d(H, T_q M) \ge 0.9\zeta.$
Therefore, when we restrict the domain of $P_H$ to $M$, we find that the least singular value of the linear map
$(dP_H)_q\big|_{T_q M} = P_H\big|_{T_q M} : T_q M \to T_{P_H(q)} H$
is bounded below by $0.9\zeta > 0$. Therefore, at any point $q \in B_{gd}(\vec 0, \zeta/10\kappa)$, the map $P_H\big|_M$ is a local diffeomorphism.
The next step is to show that $P_H$ is a diffeomorphism when restricted to $B_{gd}(\vec 0, \zeta/10\kappa)$. It suffices to show that $P_H\big|_{B_{gd}(\vec 0, \zeta/10\kappa)}$ is injective. To that end, consider any $q, q' \in B_{gd}(\vec 0, \zeta/10\kappa)$ with $q \ne q'$. We claim that $P_H q \ne P_H q'$.
Let $\gamma : [0, t_0] \to M$ be a shortest unit-speed geodesic connecting $q$ and $q'$ with $\gamma(0) = q$ and $\gamma(t_0) = q'$. Observe that $\gamma'(0)$ is a unit vector in $T_q M$, and therefore by (126), we find that
(127) $\|P_H \gamma'(0)\| \ge 0.9\zeta > 0.$
Hence, we can consider $v := \frac{P_H \gamma'(0)}{\|P_H \gamma'(0)\|}$.
We will show that $\langle P_H \gamma'(t), v \rangle > 0$ for $t \in [0, t_0]$, which implies
$\langle P_H q' - P_H q, v \rangle = \langle P_H \gamma(t_0) - P_H \gamma(0), v \rangle > 0,$
and hence $P_H q' \ne P_H q$.
To achieve that, we take any $t \in [0, t_0]$ and compare $\gamma'(t)$ and $\gamma'(0)$. Observe that
$\|D_{\gamma'(t)} \gamma'(t)\| = \sqrt{ \|(D_{\gamma'(t)} \gamma'(t))^\perp\|^2 + \|(D_{\gamma'(t)} \gamma'(t))^\top\|^2 }.$
Since $\gamma$ is a unit-speed geodesic, the second summand inside the square root is 0, and thus
$\|D_{\gamma'(t)} \gamma'(t)\| = \|(D_{\gamma'(t)} \gamma'(t))^\perp\| = \|\mathrm{II}(\gamma'(t), \gamma'(t))\| \le \kappa.$
Therefore, we have
(128) $\|\gamma'(t) - \gamma'(0)\| \le \kappa t.$
Notice that since $q, q' \in B_{gd}(\vec 0, \zeta/10\kappa)$, we have
(129) $t \le t_0 = d_{gd}(q, q') \le 0.2\zeta/\kappa,$
by the triangle inequality.
Hence,
\begin{align*}
\langle P_H \gamma'(t), v \rangle &= \|P_H \gamma'(0)\| + \langle P_H(\gamma'(t) - \gamma'(0)), v \rangle \\
&\ge \|P_H \gamma'(0)\| - \|\gamma'(t) - \gamma'(0)\| \\
&\overset{(127),(128)}{\ge} 0.9\zeta - \kappa t \\
&\overset{(129)}{\ge} 0.9\zeta - \kappa \cdot 0.2\zeta/\kappa = 0.7\zeta > 0.
\end{align*}
We have finished the proof of part (a).
(b) Let $U := P_H(B_{gd}(\vec 0, \zeta/10\kappa))$. From part (a), $P_H : B_{gd}(\vec 0, \zeta/10\kappa) \to U$ is a diffeomorphism. Let $\varphi$ denote the inverse of $P_H$. Suppose, for the sake of contradiction, that there exists $z \in B(\vec 0, 0.09\zeta^2/\kappa) \cap H$ such that $z \notin U$. Since $\vec 0 \in U$, we have $z \ne \vec 0$, and it makes sense to consider $u := \frac{z}{\|z\|}$; let us define $S := \{t \in \mathbb{R} : tu \in U\}$.
By construction, observe the following: (i) $S$ is an open set containing $0 \in \mathbb{R}$, (ii) $S$ is bounded, since every element $x \in S$ satisfies $|x| < \zeta/10\kappa$, and (iii) $\|z\| \notin S$. Therefore, the set
$S' := [0, \zeta/10\kappa] \setminus S$
is a compact set which contains $\|z\|$. We define $s_0 := \min S'$. Thus, we have
(130) $0 < s_0 \le \|z\| < 0.09\zeta^2/\kappa,$
and for any $s \in [0, s_0)$, we have $su \in U$, while $s_0 u \notin U$.
Consider the curve $\tilde\gamma : [0, s_0) \to B_{gd}(\vec 0, \zeta/10\kappa)$, given by $\tilde\gamma(s) := \varphi(su)$. Let $\gamma : [0, t_0) \to B_{gd}(\vec 0, \zeta/10\kappa)$ be the unit-speed reparametrization, $\tilde\gamma(s(t)) = \gamma(t)$, for a certain monotonically increasing function $s(t)$ with $s(0) = 0$. Note that $P_H \gamma(t) = s(t) \cdot u$, and so we have $s'(t) = \|P_H \gamma'(t)\|$.
For each $t \in [0, t_0)$, since $\gamma(t) \in B_{gd}(\vec 0, \zeta/10\kappa)$ and $\|\gamma'(t)\| = 1$, we can use (126) to obtain $\|P_H \gamma'(t)\| \ge 0.9\zeta$, and therefore
(131) $s_0 = \int_0^{t_0} s'(t)\, dt \ge 0.9\zeta t_0.$
Combining (130) and (131), we find $t_0 < 0.1\zeta/\kappa$. This implies that $\lim_{t \to t_0} \gamma(t) \in B_{gd}(\vec 0, \zeta/10\kappa)$. However, this gives $s_0 u = \lim_{t \to t_0} P_H \gamma(t) \in U$, a contradiction.
(c) Each point $v \in B(\vec 0, 0.09\zeta^2/\kappa) \cap H$ can be expressed as $v = s_0 u$, where $s_0 \in [0, 0.09\zeta^2/\kappa)$ and $u \in H$ with $\|u\| = 1$. Consider the same construction, as in part (b), of a curve $\tilde\gamma(s) = \varphi(su)$ for $s \in [0, s_0)$ with the unit-speed reparametrization $\gamma(t) = \tilde\gamma(s(t))$. The same argument from part (b) implies that $d_{gd}(\varphi(s_0 u), \vec 0) \le t_0 \le \frac{s_0}{0.9\zeta} = \frac{\|v\|}{0.9\zeta}$. The other inequality is clear, since $d_{gd}(\varphi(v), \vec 0)$ is the length of a geodesic joining $\varphi(v)$ and $\vec 0$, which must be bounded below by the Euclidean distance between the two points.
(d) Take any $v \in B(\vec{0}, 0.09/\kappa) \cap H$. Consider the same construction as in the previous steps. In particular, we have
• the expression $v = s_0 u$, and so $\|v\| = s_0$,
• the curve $\tilde{\gamma} : [0, s_0) \to M$, given by $\tilde{\gamma}(s) = \phi(su)$, and
• the unit-speed reparametrization $\gamma : [0, t_0) \to M$ so that $\gamma(t) = \tilde{\gamma}(s(t))$.
Note that we have $s_0 \geq 0.9 t_0$.
For each $t \in [0, t_0)$, using Lemma A.3, we have
(132) $\|P_{H^\perp} \gamma'(t)\| \leq d(H, T_{\gamma(t)} M) = d(T_{\gamma(0)} M, T_{\gamma(t)} M) \leq \kappa t$,
which by integration implies
(133) $\|P_{H^\perp} \phi(v)\| \leq \frac{1}{2} \kappa t_0^2 \leq \kappa s_0^2 = \kappa \|v\|^2$.
From (132), we also obtain
$s'(t) = \|P_H \gamma'(t)\| \geq \sqrt{1 - \kappa^2 t^2} \geq 1 - 0.51 \kappa^2 t^2$,
where we use $\kappa t \leq \kappa t_0 \leq \frac{\kappa s_0}{0.9} < 0.1$. Therefore, we find by integrating that
$s_0 = \int_0^{t_0} s'(t)\, dt \geq t_0 - 0.17 \kappa^2 t_0^3 \geq t_0 \cdot \left(1 - 0.3 \kappa^2 s_0^2\right)$,
and hence
$d_{gd}(\phi(v), \vec{0}) \leq t_0 \leq \dfrac{s_0}{1 - 0.3 \kappa^2 s_0^2} \leq s_0 \cdot \left(1 + \dfrac{\kappa^2 s_0^2}{2}\right)$.
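The elementary estimates used in part (d) — that $\sqrt{1-x^2} \geq 1 - 0.51x^2$ for $x \leq 0.1$, that the cubic error term is absorbed when $t_0 \leq s_0/0.9$, and that $\frac{1}{1-0.3y} \leq 1 + y/2$ for $y \leq 0.01$ — can be confirmed numerically. The grids below are our own illustrative choices, not taken from the text:

```python
# Sanity checks (illustration only) of the three elementary inequalities in part (d).
import math

# sqrt(1 - x^2) >= 1 - 0.51*x^2 whenever x = kappa*t <= 0.1.
for i in range(1001):
    x = 0.1 * i / 1000
    assert math.sqrt(1 - x * x) >= 1 - 0.51 * x * x

# With t0 <= s0/0.9, the cubic term satisfies 0.17*t0^3 <= 0.3*s0^2*t0
# (the common factor kappa^2 cancels on both sides).
for i in range(1, 1001):
    s0 = i / 1000
    t0 = s0 / 0.9
    assert 0.17 * t0**3 <= 0.3 * s0**2 * t0

# 1/(1 - 0.3*y) <= 1 + y/2 for y = (kappa*s0)^2 <= 0.01.
for i in range(1001):
    y = 0.01 * i / 1000
    assert 1 / (1 - 0.3 * y) <= 1 + y / 2
print("ok")
```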
(e) Suppose that $N := M \cap B(\vec{0}, 0.09\zeta^2/\kappa)$ is connected. Take an arbitrary point $q \in N$. We want to show that $q \in \phi(B(\vec{0}, 0.09\zeta^2/\kappa) \cap H)$. Note that this is clear when $q = \vec{0}$, so for the rest of this proof let us assume $q \neq \vec{0}$. Because $N$ is connected, we can take a unit-speed path $\gamma : [0, t_0] \to N$ with $\gamma(0) = \vec{0}$ and $\gamma(t_0) = q$, for some $t_0 > 0$.
Since $P_H$ is a contraction, using part (b), we find that for each $t \in [0, t_0]$,
(134) $P_H(\gamma(t)) \in P_H(M \cap B(\vec{0}, 0.09\zeta^2/\kappa)) \subseteq P_H M \cap P_H(B(\vec{0}, 0.09\zeta^2/\kappa)) \subseteq H \cap B(\vec{0}, 0.09\zeta^2/\kappa) \subseteq P_H(B_{gd}(\vec{0}, \zeta/10\kappa))$,
which shows that $P_H(\gamma(t))$ is in the domain of $\phi$. Hence, it makes sense to define $\tilde{\gamma} : [0, t_0] \to M$ by $\tilde{\gamma}(t) := \phi(P_H(\gamma(t)))$, for each $t \in [0, t_0]$. We claim that $\gamma(t) = \tilde{\gamma}(t)$, for every $t \in [0, t_0]$.
Consider the set
$S'' := \{t \in [0, t_0] : \gamma(t) = \tilde{\gamma}(t)\}$.
For each $t \in S''$, from (134), we have
$\gamma(t) = \tilde{\gamma}(t) \in \phi(H \cap B(\vec{0}, 0.09\zeta^2/\kappa)) =: N'$.
Since $N'$ is an open set in $M$, there exists $\varrho > 0$ such that for each $t' \in (t - \varrho, t + \varrho) \cap [0, t_0]$, we have $\gamma(t') \in N'$. From parts (a) and (b), we know that $\phi \circ P_H|_{N'}$ is the identity map on $N'$. Thus, $\tilde{\gamma}(t') = \phi(P_H(\gamma(t'))) = \gamma(t')$, which implies that $t' \in S''$.
The argument from the previous paragraph shows that $S''$ is an open set in $[0, t_0]$. On the other hand, since the function $t \mapsto \gamma(t) - \tilde{\gamma}(t)$ is continuous, $S''$ is also a closed set in $[0, t_0]$. Since $[0, t_0]$ is connected, we find that $S''$ is either $[0, t_0]$ or $\emptyset$. Because $0 \in S''$, we conclude that $S'' = [0, t_0]$, and in particular,
$q = \gamma(t_0) = \tilde{\gamma}(t_0)$.
Finally, using (134) again, we find
$q = \tilde{\gamma}(t_0) = \phi(P_H(\gamma(t_0))) \in \phi(B(\vec{0}, 0.09\zeta^2/\kappa) \cap H)$,
as desired. $\square$
Corollary A.5. If $p, p' \in M$ satisfy $\|p - p'\| \leq 0.01/\kappa$, then $d(T_p M, T_{p'} M) \leq 2\kappa \cdot \|p - p'\|$.
Proof. Let $H := T_p M$, and let $P_H : \mathbb{R}^N \to H$ denote the orthogonal projection to $T_p M$. Let $v := P_H(p')$. Note that $\|v - p\| \leq \|p' - p\|$. Proposition A.4(d) gives
$d_{gd}(p', p) \leq \|v - p\|(1 + \kappa^2 \|v - p\|^2/2) \leq \|p' - p\|(1 + \kappa^2 \|p' - p\|^2/2) \leq 2\|p' - p\|$,
and thus Lemma A.3 yields $d(T_p M, T_{p'} M) \leq \kappa \cdot d_{gd}(p, p') \leq 2\kappa \cdot \|p - p'\|$. $\square$
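As a concrete illustration (ours, not from the text), the conclusion of Corollary A.5 can be checked on the simplest curved example: a circle of radius $1/\kappa$, whose second fundamental form has norm exactly $\kappa$. Two points at arc-length $t$ apart have chord length $\frac{2}{\kappa}\sin(\kappa t/2)$, and their tangent lines meet at angle $\kappa t$, so $d(T_p M, T_{p'} M) = \sin(\kappa t)$:

```python
# Sanity check (illustration only) of Corollary A.5 on a circle of radius 1/kappa.
import math

kappa = 3.0
for i in range(1, 1001):
    t = (0.01 / kappa) * i / 1000            # geodesic (arc-length) distance, kept small
    chord = (2 / kappa) * math.sin(kappa * t / 2)   # Euclidean distance ||p - p'||
    if chord <= 0.01 / kappa:                # hypothesis of Corollary A.5
        assert t <= 2 * chord                # intermediate bound d_gd <= 2*||p - p'||
        assert math.sin(kappa * t) <= 2 * kappa * chord   # the corollary's conclusion
print("ok")
```

The choice $\kappa = 3$ is arbitrary; the identity $\sin(\kappa t) = 2\sin(\kappa t/2)\cos(\kappa t/2)$ shows the final inequality holds with room to spare on this example.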
Corollary A.6. In the setting of Proposition A.4, if $M' := M \cap B(p, 0.09\zeta^2/\kappa)$ is connected, then the projection $P_H$ is a diffeomorphism from $M'$ to its image.
Proof. Using parts (a), (b) and (e) of Proposition A.4, we find that
$M \cap B(p, 0.09\zeta^2/\kappa) \overset{(e)}{\subseteq} \phi(B(p, 0.09\zeta^2/\kappa) \cap H) \overset{(b)}{\subseteq} \phi(P_H(B_{gd}(p, \zeta/10\kappa))) \overset{(a)}{=} B_{gd}(p, \zeta/10\kappa)$,
and therefore, by part (a) again, we conclude that $P_H|_{M \cap B(p, 0.09\zeta^2/\kappa)}$ is a diffeomorphism. $\square$
Department of Mathematics, University of Missouri, Columbia, MO 65203
Email address, H. Huang: hhuang@missouri.edu

Department of Mathematics, Massachusetts Institute of Technology, Cambridge, MA 02139
Email address, P. Jiradilok: pakawut@mit.edu

Department of Mathematics, Massachusetts Institute of Technology, Cambridge, MA 02139
Email address, E. Mossel: elmos@mit.edu