
MIT Open Access Articles

Robust Recovery for Stochastic Block Models, Simplified and Generalized

The MIT Faculty has made this article openly available. Please share how this access benefits you. Your story matters.

Citation: Mohanty, Sidhanth, Raghavendra, Prasad and Wu, David X. 2024. "Robust Recovery for Stochastic Block Models, Simplified and Generalized."

As Published: 10.1145/3618260.3649761

Publisher: ACM

Persistent URL: https://hdl.handle.net/1721.1/155672

Version: Final published version, as it appeared in a journal, conference proceedings, or other formally published context

Terms of use: Creative Commons Attribution


Robust Recovery for Stochastic Block Models, Simplified and Generalized

Sidhanth Mohanty (MIT, United States, sidhanth@csail.mit.edu)
Prasad Raghavendra (UC Berkeley, United States, raghavendra@berkeley.edu)
David X. Wu (UC Berkeley, United States, david_wu@berkeley.edu)

ABSTRACT
We study the problem of robust community recovery: efficiently recovering communities in sparse stochastic block models in the presence of adversarial corruptions. In the absence of adversarial corruptions, there are efficient algorithms when the signal-to-noise ratio exceeds the Kesten–Stigum (KS) threshold, widely believed to be the computational threshold for this problem. The question we study is: does the computational threshold for robust community recovery also lie at the KS threshold? We answer this question affirmatively, providing an algorithm for robust community recovery for arbitrary stochastic block models on any constant number of communities, generalizing the work of Ding, d'Orsi, Nasser & Steurer on an efficient algorithm above the KS threshold in the case of 2-community block models.

There are three main ingredients to our work:
(1) The Bethe Hessian of the graph is defined as 𝐻_𝐺(𝑡) ≜ (𝐷_𝐺 − 𝐼)𝑡² − 𝐴_𝐺 𝑡 + 𝐼, where 𝐷_𝐺 is the diagonal matrix of degrees and 𝐴_𝐺 is the adjacency matrix. Empirical work suggested that the Bethe Hessian for the stochastic block model has outlier eigenvectors corresponding to the communities right above the Kesten–Stigum threshold. We formally confirm the existence of outlier eigenvalues for the Bethe Hessian by explicitly constructing outlier eigenvectors from the community vectors.
(2) We develop an algorithm for a variant of robust PCA on sparse matrices. Specifically, an algorithm to partially recover top eigenspaces from adversarially corrupted sparse matrices under mild delocalization constraints.
(3) A rounding algorithm to turn vector assignments of vertices into a community assignment, inspired by the algorithm of Charikar & Wirth for 2XOR.

CCS CONCEPTS
• Theory of computation → Random network models.

KEYWORDS
robust inference, stochastic block model, community detection, spectral algorithms

ACM Reference Format:
Sidhanth Mohanty, Prasad Raghavendra, and David X. Wu. 2024. Robust Recovery for Stochastic Block Models, Simplified and Generalized. In Proceedings of the 56th Annual ACM Symposium on Theory of Computing (STOC '24), June 24–28, 2024, Vancouver, BC, Canada. ACM, New York, NY, USA, 8 pages. https://doi.org/10.1145/3618260.3649761

This work is licensed under a Creative Commons Attribution 4.0 International License.
STOC '24, June 24–28, 2024, Vancouver, BC, Canada
© 2024 Copyright held by the owner/author(s).
ACM ISBN 979-8-4007-0383-6/24/06
https://doi.org/10.1145/3618260.3649761

1 INTRODUCTION
The stochastic block model (SBM) has provided an enlightening lens into understanding a wide range of computational phenomena in Bayesian inference problems, such as computational phase transitions & information-computation gaps [11, 20, 26, 31], spectral methods for sparse matrices [6, 24, 32], local message-passing algorithms [17, 30, 34], and robustness [3, 13, 25, 28].

The SBM is a model of random graphs where the vertices are partitioned into communities, denoted by 𝒙, and the probability of an edge existing is contingent on the communities that the two endpoints are part of. The algorithmic task is the community recovery problem: given an input graph 𝑮, estimate the posterior 𝒙 | 𝑮 with an efficient algorithm.

Definition 1.1 (Informal). In the stochastic block model, we are given a 𝑘 × 𝑘 matrix M, a distribution 𝜋 over [𝑘], and 𝑑 > 0, and SBM_𝑛(M, 𝜋, 𝑑) denotes the distribution where an 𝑛-vertex graph 𝑮 is sampled by:
(1) drawing a color 𝒙(𝑢) ∼ 𝜋 for every 𝑢 ∈ [𝑛],
(2) for each pair of vertices 𝑢, 𝑣, the edge {𝑢, 𝑣} is chosen with probability M_{𝒙(𝑢),𝒙(𝑣)} · 𝑑/𝑛.
In the community recovery problem, the goal is to give an efficient algorithm that takes 𝑮 as input and outputs a community assignment 𝑥̂ approximating 𝒙 | 𝑮 (see Definition 4.4 for a formal definition).

Computational Thresholds. For a given M and 𝜋, increasing 𝑑 can only possibly make the problem easier. The main question is to understand the computational threshold for community recovery — i.e. the minimum value of 𝑑 where the problem goes from being intractable to admitting efficient algorithms.

The first predictions for this computational threshold came from the cavity method in statistical physics in the work of Decelle, Krzakala, Moore & Zdeborová [11]. They posited that the location of this transition is at the Kesten–Stigum threshold (henceforth KS threshold), a threshold for broadcast processes on trees studied in the works of Kesten & Stigum [22, 23]. The algorithmic side of these predictions was confirmed in the case of the 2-community block model in the works of Mossel, Neeman & Sly [31] and Massoulié [26], and then for block models in increasing levels of generality by Bordenave, Lelarge & Massoulié [6], Abbe & Sandon [2], and Hopkins & Steurer [20].
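The generative process of Definition 1.1 translates directly into code. The following is a minimal sketch of our own (the name `sample_sbm` and the dense O(n²) pair loop are illustrative choices, not anything from the paper):

```python
import numpy as np

def sample_sbm(M, pi, d, n, rng):
    """Sample (G, x) ~ SBM_n(M, pi, d) following Definition 1.1.

    M  : k x k symmetric block probability matrix
    pi : prior distribution over the k communities
    d  : average-degree parameter (normalization M @ pi = all-ones)
    Returns the hidden colors x and the sampled edge list of G.
    """
    k = len(pi)
    # Step (1): draw a color x(u) ~ pi independently for every vertex u.
    x = rng.choice(k, size=n, p=pi)
    # Step (2): include each pair {u, v} with probability M[x(u), x(v)] * d / n.
    edges = []
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < M[x[u], x[v]] * d / n:
                edges.append((u, v))
    return x, edges
```

With the normalization M𝜋 = 1 (stated in Section 3), the expected average degree is 𝑑; taking M to be the all-ones matrix degenerates to an Erdős–Rényi graph of average degree 𝑑.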


Robust Algorithms. All of these algorithms utilize the knowledge of the distribution the input is sampled from quite strongly — they are based on Ω(log 𝑛)-length walk statistics in the stochastic block model. However, the full generative process in inference is not always known precisely. Thus, we would like algorithms that utilize but do not overfit to the distributional assumptions.

Demanding that our algorithm be robust, i.e. resilient to adversarial corruptions to the input, is often a useful way to design algorithms that are less sensitive to distributional assumptions. This leads one to wonder: can algorithms that don't strongly exploit the prior distribution approach the KS threshold?

Optimization vs. Inference. Earlier approaches to robust recovery in 2-community block models were based on optimization: semidefinite programming relaxations of the minimum bisection problem, as in the work of Guedon & Vershynin [18]. These approaches have the advantage of being naturally robust, since the algorithms are approximately Lipschitz around random inputs, but the minimum bisection relaxation is not known to achieve statistical optimality and only succeeds well above the KS threshold.

The following two results point to the suboptimality of optimization-based strategies. Moitra, Perry & Wein [28] considered the monotone adversary in the 2-community setting, where the adversary is allowed to make an unbounded number of edge insertions within communities and edge deletions across communities. At an intuitive level, this is supposed to only make the problem easier, and indeed it does so for the minimum bisection approach, but to the contrary [28] proves that the threshold for recovery increases. Dembo, Montanari & Sen [12] exactly nailed the size of the minimum bisection in Erdős–Rényi graphs, which are complete noise and have no signal in the form of a planted bisection — and strikingly, it is actually smaller than the size of the planted bisection in the detectable regime! Thus, it is conceivable that there are bisections completely orthogonal to the planted bisection in a stochastic block model graph that nevertheless have the same size.

The problem of recovering communities is more closely related to the task of Bayesian inference, i.e., applying Bayes' rule and approximating 𝒙 | 𝑮. Optimizing for the minimum bisection is akin to computing the maximum likelihood estimate, which does not necessarily produce samples representative of the posterior distribution of 𝒙 | 𝑮.

SDPs for Inference. The work of Banks, Mohanty & Raghavendra [3] proposed a semidefinite programming-based algorithm for inference tasks that incorporates the prior distribution in the formulation, and illustrated that this algorithm can distinguish 𝑮 sampled from the stochastic block model from an Erdős–Rényi graph of equal average degree anywhere above the KS threshold, while being resilient to Ω(𝑛) arbitrary edge insertions and deletions.

A similar SDP formulation was later studied by Ding, d'Orsi, Nasser & Steurer [13] in the 2-community setting, and was used to give an algorithm to recover the communities with a constant advantage over random guessing in the presence of Ω(𝑛) edge corruptions for all degrees above the KS threshold. They analyze the spectra of matrices associated with random graphs after deleting vertices with large neighborhoods, which introduces unfriendly correlations and causes their analysis to be highly technical.

The main contribution of our work is an algorithm for robust recovery which is amenable to a significantly simpler analysis. Our algorithm also succeeds at the recovery task for arbitrary block models with a constant number of communities.

Theorem 1.2 (Informal statement of main theorem). Let (M, 𝜋, 𝑑) be SBM parameters such that 𝑑 is above the KS threshold, and let 𝑮, 𝒙 ∼ SBM_𝑛(M, 𝜋, 𝑑). There exists 𝛿 = 𝛿(M, 𝜋, 𝑑) > 0 such that the following holds. There is a polynomial time algorithm that takes as input any graph 𝑮̃ that can be obtained by performing arbitrary 𝛿𝑛 edge insertions and deletions to 𝑮, and outputs a coloring 𝒙̂ that has "constant correlation" with 𝒙, with high probability over the randomness of 𝑮 and 𝒙.

Many of the ingredients in the above result are of independent interest. First, we exhibit a symmetric matrix closely related to the Bethe Hessian of the graph, such that its bottom eigenspace is correlated with the communities. Next, we design an efficient algorithm to robustly recover the bottom-𝑟 eigenspace of a sparse matrix in the presence of adversarial corruptions. Finally, we demonstrate a general rounding scheme to obtain community assignments from this eigenspace.

Remark 1.3 (Robustness against node corruptions). The node corruption model, introduced by Liu & Moitra [25], is a harsher generalization of the edge corruption model. In recent work, Ding, d'Orsi, Hua & Steurer [14] proved that in the setting of sparse SBM, any algorithm that is robust to edge corruptions can be turned into one robust to node corruptions in a blackbox manner. Hence, our results apply in this harsher setting too.

1.1 Related Work
We refer the reader to the survey of Abbe [1] for a detailed treatment of the rich history and literature on community detection in block models, its study in other disciplines, and the many information-theoretic and computational results in various parameter regimes.

Introducing an adversary into the picture provides a beacon towards algorithms that utilize but do not overfit to distributional assumptions. Over the years, a variety of adversarial models have been considered, some of which we survey below.

Corruption Models for Stochastic Block Model. Prior to the works of [3, 13], Stephan & Massoulié [33] considered the robust recovery problem, and gave a robust spectral algorithm to recover communities under 𝑂(𝑛^𝜀) adversarial edge corruptions for some small enough 𝜀 > 0.

Liu & Moitra [25] introduced the node corruption model, where an adversary gets to perform arbitrary edge corruptions incident to a constant fraction of corrupted vertices, and gave algorithms that achieved optimal accuracy in the presence of node corruptions and the monotone adversary sufficiently above the KS threshold. Soon after, Ding, d'Orsi, Hua & Steurer [14] gave algorithms achieving the Kesten–Stigum threshold, using algorithms for the edge corruption model in the low-degree setting [13] and results on the optimization SDP in the high-degree setting [29] in a blackbox manner.

Semirandom & Smoothed Models. Some works have considered algorithm design under harsher adversarial models, where an adversarially chosen input undergoes some random perturbations. Remarkably, at this point, the best algorithms for several graph and hypergraph problems match the performance of our best algorithms for their completely random counterparts. For example, at


this point, the semirandom planted coloring and clique problems were introduced by Blum & Spencer [5] and Feige & Kilian [16], and a line of work [9, 27] culminating in the work of Buhai, Kothari & Steurer [7] showed that the size of the planted clique/coloring recoverable in the semirandom setting matches the famed √𝑛 in the fully random setting.

Another example where algorithms for a semirandom version of a block model-like problem have been considered is semirandom CSPs with planted solutions, where the work of Guruswami, Hsieh, Kothari & Manohar [19] gives algorithms matching the guarantees of solving fully random planted CSPs.

1.2 Organization
In Section 2, we give an overview of our algorithm and proof. In Section 3, we give some technical preliminaries. In Section 4, we describe our algorithm and show how to analyze it.

2 TECHNICAL OVERVIEW
An 𝑛-vertex graph 𝑮 is drawn from a stochastic block model and undergoes 𝛿𝑛 adversarial edge corruptions, and then the corrupted graph 𝑮̃ is given to us as input. For simplicity of discussion, we restrict our attention to assortative symmetric 𝑘-community block models above the KS threshold, i.e. the connection probability between two vertices 𝑖 and 𝑗 depends only on whether they belong to the same community or different communities, and the intra-community probability is higher. Nevertheless, our approach generalizes to any arbitrarily specified 𝑘-community block model above the KS threshold.

Let us first informally outline the algorithm; see Section 4 for formal details.
(1) First, we preprocess the corrupted graph 𝑮̃ by truncating high degree vertices, which removes corruptions localized on small sets of vertices in the graph.
(2) We then construct an appropriately defined graph-aware symmetric matrix 𝑀_𝑮 ∈ R^{𝑛×𝑛} whose negative eigenvalues contain information about the true communities for the uncorrupted graph. We motivate this construction in Section 2.1.
(3) We recursively trim the rows and columns of 𝑀_𝑮̃ to remove small negative eigenvalues in its spectrum. Then we use a spectral algorithm to robustly recover a subspace 𝑈 which contains information about the communities. Both points are described in Section 2.2.
(4) Finally, we round the subspace 𝑈 into a community assignment, using a vertex embedding provided by 𝑈. This is detailed in Section 2.3.

2.1 Outlier Eigenvectors for the Bethe Hessian
Bordenave, Lelarge & Massoulié [6] analyzed the spectrum of the nonbacktracking matrix and rigorously established its connection to community detection. The asymmetric nonbacktracking matrix 𝐵_𝐺 ∈ {0,1}^{2|𝐸(𝐺)| × 2|𝐸(𝐺)|} is indexed by directed edges, with

(𝐵_𝐺)_{(𝑢₁→𝑣₁),(𝑢₂→𝑣₂)} ≜ 1[𝑣₁ = 𝑢₂] · 1[𝑣₂ ≠ 𝑢₁].

[6] showed that above the KS threshold, the 𝑘 outlier eigenvalues for 𝐵_𝑮 correspond to the 𝑘 community vectors. More precisely, in the case of symmetric 𝑘-community stochastic block models above the KS threshold, [6] proved that for the randomly drawn graph 𝑮, there is a small 𝜀 > 0 for which its nonbacktracking matrix 𝐵_𝑮 has exactly 𝑘 eigenvalues larger than (1 + 𝜀)√𝑑 in magnitude.

The Bethe Hessian matrix is a symmetric matrix associated with a graph, whose early appearances can be traced to the works of Ihara [21] and Bass [4]. The Bethe Hessian of a graph with parameter 𝑡 ∈ R is defined as

𝐻_𝐺(𝑡) ≜ (𝐷_𝐺 − 𝐼)𝑡² − 𝐴_𝐺 𝑡 + 𝐼,

where 𝐷_𝐺 and 𝐴_𝐺 are the diagonal degree matrix and adjacency matrix of 𝐺, respectively. For 𝑡 in the interval [0, 1], it can be interpreted as a regularized version of the standard graph Laplacian. The Bethe Hessian for the stochastic block model was considered in the empirical works [24, 32], where they observed that for some choice of 𝑡, the Bethe Hessian and the nonbacktracking matrix have outlier eigenvectors which can be used for finding communities in block models. Concretely, in [32] they observed that for 𝑮 drawn from stochastic block models above the KS threshold, there is a choice of 𝑡 such that 𝐻_𝑮(𝑡) has only a small number of negative eigenvalues, whose eigenvectors all correlate with the hidden community assignment.

We confirm this empirical observation in the following proposition.

Proposition 2.1 (Bethe Hessian spectrum). Let (M, 𝜋, 𝑑) be 𝑘-community SBM parameters such that 𝑑 is above the KS threshold, and let 𝑮, 𝒙 ∼ SBM_𝑛(M, 𝜋, 𝑑). Then there exists 𝜀 > 0 such that for 𝑡* = 1/((1 + 𝜀)√𝑑), the Bethe Hessian 𝐻_𝑮(𝑡*) has at most 𝑘 negative eigenvalues and at least 𝑘 − 1 negative eigenvalues.

Constructing the outlier eigenspace. There are two assertions in Proposition 2.1. To show that 𝐻_𝑮(𝑡*) has at most 𝑘 negative eigenvalues, one can relate these negative eigenvalues to the 𝑘 outlier eigenvalues of 𝐵_𝑮 using an Ihara–Bass argument and a continuity argument, as outlined in Fan and Montanari [15, Theorem 5.1].

The more interesting direction is to exhibit at least 𝑘 − 1 negative eigenvalues; we will explicitly construct a (𝑘 − 1)-dimensional subspace, starting with the community vectors, to witness the negative eigenvalues for 𝐻_𝑮(𝑡*).

Let 1_𝑐 denote the indicator vector for the vertices belonging to community 𝑐, and 1 the all-ones vector. We show that every vector in the span of {𝐴^{(ℓ)}(1_𝑐 − (1/𝑘)·1)}_{𝑐∈[𝑘]} achieves a negative quadratic form against 𝐻_𝑮(𝑡*), where 𝐴^{(ℓ)} is the 𝑛 × 𝑛 matrix whose (𝑖,𝑗)-th entry encodes the number of length-ℓ nonbacktracking walks between 𝑖 and 𝑗. This demonstrates a (𝑘 − 1)-dimensional subspace on which the quadratic form is negative. Formally, we show the following:

Proposition 2.2. Under the same setting and notation as Proposition 2.1, for ℓ ⩾ 0 define

𝑀_{𝑮,ℓ} ≜ 𝐴^{(ℓ)} 𝐻_𝑮(𝑡*) 𝐴^{(ℓ)}.

For ℓ = Θ(log(1/𝜀)/𝜀) and every 𝑐 ∈ [𝑘], we have

⟨1_𝑐 − (1/𝑘)·1, 𝑀_{𝑮,ℓ} (1_𝑐 − (1/𝑘)·1)⟩ ⩽ −Ω(𝑛).


Hence, 𝑀_{𝑮,ℓ} has at most 𝑘 negative eigenvalues and at least 𝑘 − 1 negative eigenvalues.

Nonbacktracking powers and related constructions were previously studied in [26, 31], but there they take ℓ = Θ(log 𝑛), whereas we only consider constant ℓ. Besides simplifying the analysis of the quadratic form, using constant ℓ is also critical for tolerating up to Ω(𝑛) corruptions.

As a consequence of Proposition 2.2, the negative eigenvectors of 𝑀_{𝑮,ℓ} are correlated with the centered community indicators {1_𝑐 − (1/𝑘)·1}_{𝑐∈[𝑘]}, while the negative eigenvectors of 𝐻_𝑮(𝑡*) are correlated with {𝐴^{(ℓ)}(1_𝑐 − (1/𝑘)·1)}_{𝑐∈[𝑘]}. The upshot is that we can directly use the negative eigenvectors of 𝑀_{𝑮,ℓ} to recover the true communities in the absence of corruptions.

Remark 2.3. Based on the empirical observations in [24, 32], a natural hope is to directly use the Bethe Hessian for recovery. However, it turns out that the quadratic forms of the centered true community indicators, ⟨(1_𝑐 − (1/𝑘)·1), 𝐻_𝑮(𝑡*)(1_𝑐 − (1/𝑘)·1)⟩, are actually positive close to the KS threshold, so the same approach does not establish that the negative eigenvectors of 𝐻_𝑮(𝑡*) correlate with the communities.

We will now discuss how to recover the outlier eigenspace in the presence of adversarial corruptions.

2.2 Robust PCA for Sparse Matrices
The discussion above naturally leads to the following algorithmic problem of robust recovery: given as input a corrupted version 𝑀̃ of a symmetric matrix 𝑀, can we recover the bottom/top 𝑟-dimensional eigenspace of 𝑀? Since the true communities are constantly correlated with the outlier eigenspace of 𝑀 = 𝑀_{𝑮,ℓ}, recovering the outlier eigenspace of 𝑀 from its corrupted version 𝑀̃ = 𝑀̃_{𝑮,ℓ} is a major step towards robustly recovering communities.

The problem of robustly recovering the top eigenspace, a.k.a. robust PCA, has been extensively studied, and algorithms with provable guarantees have been designed (see [8]). However, the robust PCA problem in our work is distinct from those considered in the literature in a couple of ways. For us, the uncorrupted matrix 𝑀 is sparse, and both the magnitude and location of the noisy entries are adversarial. Furthermore, for our purposes, we need not recover the actual outlier eigenspace of 𝑀. Indeed, as we discuss below, it suffices to robustly recover a constant dimensional subspace which is constantly correlated with the true communities.

We design an efficient algorithm to robustly recover such a subspace under a natural set of sufficient conditions on 𝑀. Before we describe these conditions, let us fix some notation. We call a vector 𝑥 ∈ R^𝑛 𝐶-delocalized if no coordinate is large relative to the others, i.e., |𝑥_𝑖|² ⩽ (𝐶/𝑛)·∥𝑥∥² for all 𝑖 ∈ [𝑛]. Delocalization has previously been used in the robust PCA literature under the name "incoherence" [8].

Let 𝑀 be an 𝑛 × 𝑛 matrix with at most 𝑟 negative eigenvalues. In particular, the 𝑟-dimensional negative eigenspace 𝑉_𝑀 of 𝑀 is the object of interest. Let 𝑀̃ be a corrupted version of 𝑀, differing from 𝑀 in 𝛿𝑛 coordinates.

Given the corrupted version 𝑀̃, a natural goal would be to recover the 𝑟-dimensional negative eigenspace 𝑉_𝑀. It is easy to see that it could be impossible to recover the space 𝑉_𝑀. Instead, we will settle for a relaxed goal, namely, to recover a slightly larger dimensional subspace 𝑈 that non-trivially correlates with delocalized vectors in the true eigenspace 𝑉_𝑀. More formally, we will solve the following problem.

Problem 2.4. Given the corrupted matrix 𝑀̃ as input, give an efficient algorithm to output a subspace 𝑈 with the following properties:
(1) Low dimensional. The dimension of 𝑈 is 𝑂(𝑟).
(2) Delocalized. The diagonal entries of its projection matrix Π_𝑈 are bounded by 𝑂(𝑟/𝑛).
(3) Preserves the delocalized part of the negative eigenspace. For any 𝐶-delocalized unit vector 𝑦 such that ⟨𝑦, 𝑀𝑦⟩ < −Ω(1), we have ⟨𝑦, Π_𝑈 𝑦⟩ ⩾ Ω(1).

In fact, our algorithm will recover a principal submatrix of 𝑀̃ whose eigenspace 𝑉 for eigenvalues less than −𝜂 is 𝑂(𝑟)-dimensional. Moreover, the eigenspace 𝑉 can be processed into another delocalized, 𝑂(𝑟)-dimensional subspace 𝑈 that satisfies the conditions outlined above.

Although the matrix 𝑀 has a constant number of negative eigenvalues, its corruption 𝑀̃ can have up to Ω(𝑛) many. At first glance, it may be unclear how a constant dimensional subspace 𝑈 can be extracted from 𝑀̃. The crucial observation is that the large negative eigenvalues introduced by the corruptions are highly localized. Thus, we will design an iterative trimming algorithm that aims to delete rows and columns to clean up these localized corruptions. When the algorithm terminates, it yields the 𝑂(𝑟)-dimensional subspace 𝑉.

Recovering a Principal Submatrix. We now describe the trimming algorithm informally and refer the reader to the full version of the paper for the formal details.

We first fix some small parameter 𝜂 > 0 and execute the following procedure, which produces a series of principal submatrices 𝑀̃^(𝑡) for 𝑡 ⩾ 0, starting with 𝑀̃^(0) ≜ 𝑀̃.
(1) At step 𝑡, if the eigenspace 𝑉 of eigenvalues of 𝑀̃^(𝑡) less than −𝜂 is 𝑂(𝑟)-dimensional, we terminate the algorithm and output 𝑉.
(2) Otherwise, compute the projection Π^(𝑡) corresponding to the ⩽ −𝜂 eigenspace of 𝑀̃^(𝑡).
(3) Sample an index 𝑖 ∈ [𝑛] of 𝑀̃^(𝑡) with probability proportional to Π^(𝑡)_{𝑖,𝑖}.
(4) Zero out row and column 𝑖, and set this new principal submatrix as 𝑀̃^(𝑡+1).

We now discuss the intuition behind the procedure, and formally prove its correctness in the full version. The main idea of step 3 is that one should prefer to delete highly localized eigenvectors which have relatively large negative eigenvalues. This is reasonable because the sizes of the diagonal entries of Π^(𝑡) serve as a rough proxy for the level of delocalization.

As a concrete illustration of this intuition, suppose that 𝑀̃ = −𝑢𝑢^⊤ − 𝑣𝑣^⊤, where 𝑢 and 𝑣 are orthogonal unit vectors, so that Π^(0) = 𝑢𝑢^⊤ + 𝑣𝑣^⊤. Moreover, suppose 𝑢 is 𝐶-delocalized whereas 𝑣 = 𝑒₁. Then Π^(0)_{1,1} = 1, whereas Π^(0)_{𝑖,𝑖} ⩽ 𝐶/𝑛 for 𝑖 > 1. Hence, deleting the first row and column of 𝑀̃ also deletes the localized eigenvector 𝑣. In general, whenever one of the eigenvectors of 𝑀̃^(𝑡) is heavily localized on a


subset of coordinates 𝑆, the diagonal entries in Π^(𝑡)_{𝑆,𝑆} are disproportionately large. This leads to a win-win scenario: either we reach the termination condition, or we are likely to mitigate the troublesome large localized eigenvectors.

We now discuss how we achieve the second and third guarantees in Problem 2.4.

Trimming the Subspace. The final postprocessing step is simple. Let 𝑉 denote the eigenspace with eigenvalues less than −𝜂 for the matrix 𝑀̃^(𝑇) obtained at the end of the iterative procedure.

To ensure delocalization (condition 2 in Problem 2.4), the idea is to take its projector Π_𝑉 and trim away the rows and columns with diagonal entry exceeding 𝜏/𝑛 for some large parameter 𝜏 > 0. The desired delocalized subspace 𝑈 is obtained by taking the eigenspace of the trimmed Π_𝑉 corresponding to the eigenvalues exceeding a threshold that is 𝑂(𝜂). Since 𝑉 is 𝑂(𝑟)-dimensional, so too is 𝑈.

The more delicate part is condition 3 in Problem 2.4. Namely, we must show that despite corruptions and the repeated trimming steps, 𝑥 remains a delocalized witness vector for Π_𝑈, and thus has constant correlation with the subspace 𝑈. The key intuition for this is that delocalized witnesses are naturally robust to adversarial corruptions, so long as the adversarial corruptions have bounded row-wise ℓ₁ norm. In particular, since delocalization is an ℓ∞ constraint, Hölder's inequality bounds the difference in value of the quadratic form using 𝑀 and 𝑀̃. In the full version of the paper, we prove that for sufficiently small constant levels of corruption, 𝑥 is also a delocalized witness for 𝑀̃ and Π_𝑈.

Finally, we discuss how to round the recovered subspace 𝑈 into a community assignment.

2.3 Rounding to Communities
At this stage, we are presented with a constant-dimensional subspace 𝑈 with the key feature that it is correlated with the community assignment vectors {1_𝑐}_{𝑐∈[𝑘]}. Our goal is to round 𝑈 into a community assignment that is "well-correlated" with the ground truth. In order to discuss how we achieve this goal, we must make precise what it means to be "well-correlated" with the ground truth. Notice that a community assignment is just as plausible as the same assignment with the names of the communities permuted, and thus counting the number of correctly labeled vertices is not a meaningful metric.

A more meaningful metric is the number of pairwise mistakes, i.e. the number of pairs of vertices in the same community assigned to different communities, or in different communities assigned to the same community. A convenient way to express this metric is via the inner product of positive semidefinite matrices encoding whether pairs of vertices belong to the same community or not. Given a community assignment 𝑥, we assign it the matrix 𝑋, defined as

𝑋[𝑖,𝑗] = 1 if 𝑥(𝑖) = 𝑥(𝑗), and 𝑋[𝑖,𝑗] = −1/(𝑘−1) if 𝑥(𝑖) ≠ 𝑥(𝑗).

For the ground truth assignment 𝒙 and the output of our algorithm 𝑥̂, we measure the correlation with ⟨𝑿, 𝑋̂⟩. Observe that for any guess 𝑋̂ that is oblivious to the input (for example, classifying all vertices into the same community, or blindly guessing), the value of ⟨𝑿, 𝑋̂⟩ is concentrated below 𝑂̃(𝑛^{3/2}). On the other hand, if 𝑋̂ = 𝑿, then this correlation is Ω(𝑛²). See Definition 4.4 for how this notion generalizes to arbitrary block models and subsumes other notions of weak-recovery defined in the literature.

The projection matrix Π_𝑈 satisfies

⟨Π_𝑈, 𝑿⟩ ⩾ Ω(∥Π_𝑈∥_𝐹 · ∥𝑿∥_𝐹) = Ω(𝑛).

We give a randomized rounding strategy according to which E 𝑋̂ ⪰ 𝑐 · 𝑛 · Π_𝑈 for some constant 𝑐 > 0. Consequently, E⟨𝑿, 𝑋̂⟩ ⩾ 𝑐𝑛 · ⟨Π_𝑈, 𝑿⟩ ⩾ Ω(𝑛²).

Observe that for any community assignment 𝑥, its matrix representation 𝑋 is rank-(𝑘−1), which lets us write it as 𝑉𝑉^⊤ for some 𝑛 × (𝑘−1) matrix 𝑉. Here, the 𝑖-th row of 𝑉 is some vector 𝑣_{𝑥(𝑖)} that depends only on the community 𝑥(𝑖) to which vertex 𝑖 is assigned.

Our rounding scheme uses Π_𝑈 to produce an embedding of the 𝑛 vertices as the rows of an 𝑛 × (𝑘−1) matrix 𝑊 whose rows lie in {𝑣₁, …, 𝑣_𝑘}. In the community assignment 𝑥̂ outputted by the algorithm, the 𝑖-th vertex is assigned to community 𝑗 if the 𝑖-th row of 𝑊 is equal to 𝑣_𝑗. We then show that E 𝑊𝑊^⊤ ⪰ 𝑐 · 𝑛 · Π_𝑈. Since 𝑋̂ = 𝑊𝑊^⊤, we can conclude E⟨𝑿, 𝑋̂⟩ ⩾ Ω(𝑛²).

Rounding Scheme. Our first step is to obtain an embedding of the 𝑛 vertices into R^{𝑘−1} by choosing a (𝑘−1)-dimensional random subspace 𝑈′ of 𝑈, writing its projector as 𝑀′𝑀′^⊤, and choosing the embedding to be the rows 𝑢′₁, …, 𝑢′_𝑛 of 𝑀′. Suppose this embedding has the property that for some 𝑐′ > 0, the rows of √(𝑐′𝑛) · 𝑀′ lie inside the convex hull of 𝑣₁, …, 𝑣_𝑘. Then we can express each 𝑢′_𝑖 as a convex combination Σ_{𝑗=1}^{𝑘} 𝑝_𝑗^{(𝑖)} 𝑣_𝑗, and independently sample 𝑤_𝑖 from {𝑣₁, …, 𝑣_𝑘} according to the probability distribution (𝑝_𝑗^{(𝑖)})_{𝑗∈[𝑘]}. The resulting embedding 𝑊 satisfies E 𝑊𝑊^⊤ ⪰ 𝑐′ · ((𝑘−1)/dim(𝑈)) · 𝑛 · Π_𝑈, where this inequality holds since the off-diagonal entries are equal and the diagonal of 𝑊𝑊^⊤ is larger.

The reason an appropriate scaling 𝑐′ exists follows from the facts that the convex hull of 𝑣₁, …, 𝑣_𝑘 is full-dimensional and contains the origin, which we prove in the full version of the paper.

3 PRELIMINARIES
Stochastic Block Model Notation. We write 1 to denote the all-ones vector and 𝑒_𝑖 to denote the 𝑖-th standard basis vector, with the dimensions implicit. For a 𝑘-community block model, let 𝜋 ∈ R^𝑘 denote the prior community probabilities, and Π = diag(𝜋), so that 𝜋 = Π1. The edge probabilities are parameterized by a symmetric matrix M ∈ R^{𝑘×𝑘}, the block probability matrix. A true community assignment 𝒙 : [𝑛] → [𝑘] is sampled i.i.d. from 𝜋. Conditioned on 𝒙, an edge between 𝑖 and 𝑗 is sampled with probability M_{𝒙(𝑖),𝒙(𝑗)} · 𝑑/𝑛. To ensure that the average degree is 𝑑, we stipulate that M𝜋 = 1.

We will also use 𝑿 ∈ R^{𝑛×𝑘} to denote the one-hot encoding of 𝒙, i.e., the matrix whose 𝑡-th row is equal to 𝑒_{𝒙(𝑡)}. We will sometimes find it convenient to access the columns of 𝑿, which are the indicator vectors for the 𝑘 different communities; we denote these by 1_𝑐 for any community 𝑐 ∈ [𝑘]. For any 𝑓 : [𝑘] → R, define the lift of 𝑓 with respect to the true community assignment by 𝒇^{(𝑛)} ≜ Σ_{𝑐∈[𝑘]} 𝑓(𝑐) · 1_𝑐.


Another natural matrix that appears throughout the analysis is the Markov transition matrix 𝑇 ≜ MΠ, which by detailed balance evidently has stationary distribution 𝜋. This is an asymmetric matrix, but since 𝑇 defines a time-reversible Markov chain with respect to 𝜋, 𝑇 is self-adjoint with respect to the inner product ⟨·, ·⟩_𝜋 on R^𝑘 induced by 𝜋. Hence 𝑇 is diagonalizable with real eigenvalues, and its eigenvalues are 1 = 𝜆_1 > |𝜆_2| ⩾ · · · ⩾ |𝜆_𝑘|, with ties broken by placing positive eigenvalues before the negative ones. Note that the normalization condition M𝜋 = 1 translates into 𝑇1 = 1.

Matrix Notation. We use ⪯ and ⪰ to denote inequalities on matrices in the Loewner order. For any 𝑛 × 𝑛 matrix 𝑋, we use Π_{⩽𝑎}(𝑋) and Π_{⩾𝑎}(𝑋) to denote the projectors onto the spaces spanned by eigenvectors of 𝑋 with eigenvalue at most and at least 𝑎, respectively. We also define 𝑋_{⩽𝑎} ≜ Π_{⩽𝑎}(𝑋) 𝑋 Π_{⩽𝑎}(𝑋) and 𝑋_{⩾𝑎} ≜ Π_{⩾𝑎}(𝑋) 𝑋 Π_{⩾𝑎}(𝑋), the corresponding truncations of the eigendecomposition of 𝑋. For 𝑆 ⊆ [𝑛], we use 𝑋_{𝑆,𝑆} to denote the matrix obtained by taking 𝑋 and zeroing out all rows and columns with indices outside 𝑆.

Nonbacktracking Matrix and Bethe Hessian. For a graph 𝐺, let 𝐵_𝐺 be its nonbacktracking matrix, 𝐴_𝐺 be its adjacency matrix, 𝐷_𝐺 be its diagonal matrix of degrees, 𝐴_𝐺^{(ℓ)} be the ℓ-th nonbacktracking power of 𝐴_𝐺, and 𝐻_𝐺(𝑡) ≜ (𝐷_𝐺 − 𝐼)𝑡^2 − 𝐴_𝐺 𝑡 + 𝐼 be its Bethe Hessian matrix. The matrix we use for our algorithm is 𝑀_{𝐺,ℓ}(𝑡) ≜ 𝐴_𝐺^{(ℓ)} 𝐻_𝐺(𝑡) 𝐴_𝐺^{(ℓ)}. We will drop the 𝐺 from the subscript when the graph 𝐺 is clear from context.

Determinants. Below, we collect some standard linear algebraic facts that will prove useful.

Fact 3.1. Suppose a matrix 𝑋 has a kernel of dimension 𝑘; then every (𝑛 − 𝑗) × (𝑛 − 𝑗) submatrix of 𝑋 for 𝑗 < 𝑘 is singular.

Fact 3.2 (Jacobi's formula). For any differentiable function 𝑋 : R → R^{𝑛×𝑛},

    (𝑑/𝑑𝑢) det(𝑋(𝑢)) = Σ_{𝑖=1}^{𝑛} Σ_{𝑗=1}^{𝑛} det(𝑋(𝑢)_{[𝑛]∖{𝑖},[𝑛]∖{𝑗}}) · (−1)^{𝑖+𝑗} · (𝑑/𝑑𝑢)(𝑋(𝑢))_{𝑖,𝑗}.

Lemma 3.3. Let 𝑋 : R → R^{𝑛×𝑛} and 𝑓 : R → R be any pair of smooth functions. For any 𝑗 ⩾ 0, there exist functions (𝑞_{𝑆,𝑇} : R → R)_{𝑆,𝑇 ⊆ [𝑛], |𝑆|=|𝑇| ⩾ 𝑛−𝑗} such that:

    (𝑑/𝑑𝑢)^𝑗 [det(𝑋(𝑢)) · 𝑓(𝑢)] = Σ_{𝑆,𝑇 ⊆ [𝑛], |𝑆|=|𝑇| ⩾ 𝑛−𝑗} det(𝑋(𝑢)_{𝑆,𝑇}) · 𝑞_{𝑆,𝑇}(𝑢).

Proof. We prove this by induction. This is clearly true when 𝑗 = 0, and the induction step is a consequence of Jacobi's formula. □

Kesten–Stigum Threshold. We say that a stochastic block model is above the Kesten–Stigum (KS) threshold if 𝜆_2(𝑇)^2 𝑑 > 1, where we recall that 𝜆_2 is the second largest eigenvalue in absolute value. We use 𝑟 to denote the number of eigenvalues of 𝑇 equal to 𝜆_2(𝑇).

4 RECOVERY ALGORITHM

Let 𝑮 be the graph drawn from SBM_𝑛(M, 𝜋, 𝑑), and let 𝑮̃ denote the input graph, which is 𝑮 together with an arbitrary set of 𝛿𝑛 adversarial edge corruptions. Our algorithm for clustering the vertices into communities proceeds in multiple phases, described formally below.

The first phase preprocesses the graph by making it bounded degree and constructs an appropriate matrix 𝑀 associated to the graph. The second phase cleans up 𝑀 and uses a spectral algorithm to robustly recover a subspace containing nontrivial information about the true communities. Finally, the third phase rounds the subspace to an actual community assignment.

Algorithm 4.1. 𝑮̃ is given as input, and a community assignment to the vertices is produced as output.

Phase 1: Deletion of High-degree Vertices. For some large constant 𝐵 > 0 to be specified later, we perform the following truncation step: delete all edges incident on vertices with degree larger than 𝐵 in 𝑮̃. This forms a graph 𝑮̃_𝐵, with corresponding adjacency matrix 𝐴_{𝑮̃_𝐵} ∈ R^{|𝑉(𝑮)|×|𝑉(𝑮)|}. To avoid confusion, we preserve the vertex set 𝑉(𝑮), but it should be understood that the truncated vertices do not contribute to the graph.

For technical considerations, we also define a (nonstandard) truncated diagonal matrix

    𝐷̄_{𝑮̃_𝐵} ≜ diag((deg(𝑣) · 1[deg(𝑣) ⩽ 𝐵])_{𝑣 ∈ 𝑉(𝑮)}).    (1)

With this, we can then define the truncated Bethe Hessian matrix

    𝐻̄_{𝑮̃_𝐵}(𝑡) ≜ 𝐼 − 𝑡 𝐴_{𝑮̃_𝐵} + 𝑡^2 (𝐷̄_{𝑮̃_𝐵} − 𝐼).    (2)

Finally, the input matrix to the next phase is

    𝑀̄_{𝑮̃_𝐵,ℓ}(𝑡) ≜ 𝐴_{𝑮̃_𝐵}^{(ℓ)} 𝐻̄_{𝑮̃_𝐵}(𝑡) 𝐴_{𝑮̃_𝐵}^{(ℓ)},    (3)

where we also choose the value of 𝑡 later.

Remark 4.2. To reduce any chance of confusion with the notation, we reiterate our conventions for distinguishing between different versions of various matrices. If a graph is truncated at level 𝐵, then we add a subscript 𝐵. We use a tilde to denote that we are working with a corrupted graph. Finally, we use an overline to denote that we are working with the nonstandard version of the Bethe Hessian after truncation.

For example, the matrix 𝐷̄_{𝑮_𝐵} no longer corresponds to the degree matrix of 𝑮_𝐵, since as stated it still counts edges from truncated vertices. This is done to simplify the analysis of the spectrum of 𝑀̄_{𝑮_𝐵,ℓ}(𝑡), but we do not believe it to be essential.

Phase 2: Recovering a Subspace with Planted Signal. Define 𝑀 ≜ 𝑀̄_{𝑮̃_𝐵,ℓ}. We give an iterative procedure to "clean up" 𝑀 by deleting a few rows and columns. We then run a spectral algorithm on the cleaned-up version of 𝑀. Let 𝜂 > 0 be a small constant we choose later, and let 𝐾 ≜ 𝐵^{2ℓ+3}.

(1) Define 𝑀^{(0)} as 𝑀. Let 𝑡 be a counter initialized at 0, and Φ(𝑋) as the number of eigenvalues of 𝑋 smaller than −𝜂.
(2) While Φ(𝑀^{(𝑡)}) > (2𝐾/𝜂) · 𝑟: compute the projection matrix Π^{(𝑡)} ≜ Π_{⩽−𝜂}(𝑀^{(𝑡)}), choose a random 𝑖 ∈ [𝑛] with probability
Π^{(𝑡)}_{𝑖,𝑖} / Tr(Π^{(𝑡)}), and define 𝑀^{(𝑡+1)} as the matrix obtained by zeroing out the 𝑖-th row and column of 𝑀^{(𝑡)}. Then increment 𝑡.

Let 𝑇 be the time of termination and 𝜏 > 0 be a large enough constant we choose later. We compute Π^{(𝑇)}, and then compute the set 𝑆 of all indices 𝑖 where Π^{(𝑇)}_{𝑖,𝑖} ⩽ 𝜏/𝑛. Define Π̃ as (Π^{(𝑇)}_{𝑆,𝑆})_{⩾𝜂/𝐾}, and compute its span 𝑈, where we recall that (𝑋)_{⩾𝑎} denotes the truncation of the eigendecomposition of 𝑋 for eigenvalues at least 𝑎. This subspace 𝑈 is passed to the next phase.

Phase 3: Rounding to a Community Assignment. Define 𝑟′ as 𝑟 − 1 when 𝜆_2(𝑇) > 0 and as 𝑟 when 𝜆_2(𝑇) < 0. We first obtain an 𝑟′-dimensional embedding of the vertices into R^{𝑟′}. Compute a random 𝑟′-dimensional subspace 𝑈′ of 𝑈, and take an orthogonal basis 𝑢′_1, . . . , 𝑢′_{𝑟′}. Place these vectors as the columns of a matrix 𝑀′ in R^{𝑛×𝑟′}. The rows of 𝑀′ give us the desired embedding.

On the other hand, we use the natural embedding of the 𝑘 communities into R^{𝑟′} induced by the 𝑟′ nontrivial right eigenvectors (𝜓_𝑖)_{1⩽𝑖⩽𝑟′} of 𝑇 corresponding to the eigenvalue 𝜆_2(𝑇). In particular, let Ψ_{𝑟′} ≜ [𝜓_1 · · · 𝜓_{𝑟′}] ∈ R^{𝑘×𝑟′} be the matrix of these 𝑟′ nontrivial eigenvectors of 𝑇. Then the row vectors 𝜙_1, . . . , 𝜙_𝑘 ∈ R^{𝑟′} form the desired embedding of communities.

In the rounding algorithm, we first find the largest 𝑐 such that all the rows of 𝑐 · 𝑀′ lie in the convex hull of 𝜙_1, . . . , 𝜙_𝑘. We can find such a value of 𝑐, if it exists, by solving a linear program, and we prove that this 𝑐 > 0 is guaranteed to exist in the full version of this paper. Then, for each 𝑖 ∈ [𝑛] we express each row of 𝑐 · 𝑀′ as a convex combination Σ_{𝑗=1}^{𝑘} 𝑤_𝑖^{(𝑗)} 𝜙_𝑗 for nonnegative 𝑤_𝑖^{(𝑗)} such that Σ_{𝑗=1}^{𝑘} 𝑤_𝑖^{(𝑗)} = 1. Finally, we assign vertex 𝑖 to community 𝑗 with probability 𝑤_𝑖^{(𝑗)}, and output the resulting community assignment 𝒙̂.

Remark 4.3. Scaling the rows of 𝑀′ so as to lie in the convex hull of {𝜙_𝑗}_{𝑗∈[𝑘]} is reminiscent of the rounding algorithm of Charikar & Wirth [10] to find a cut of size 1/2 + Ω(𝜀/log(1/𝜀)) in a graph with maximum cut of size 1/2 + 𝜀: in their algorithm, they scale 𝑛 scalars to lie in the interval [−1, 1].

Analysis of Algorithm. Our goal is to prove that the output 𝒙̂ of our algorithm is well-correlated with the true community assignment 𝒙. We begin by defining a notion of weak recovery for 𝑘-community stochastic block models.

Definition 4.4 (Weak recovery). Let Ψ ≜ [𝜓_2 · · · 𝜓_𝑘] ∈ R^{𝑘×(𝑘−1)} be the matrix of the top-(𝑘 − 1) nontrivial eigenvectors of the transition matrix 𝑇 of a stochastic block model. For 𝜌 > 0, we say that a (randomized) algorithm for producing community assignments 𝑿̂ ∈ R^{𝑛×𝑘} achieves 𝜌-weak recovery if

    ⟨E 𝑿̂_Ψ, 𝑿_Ψ⟩ ⩾ 𝜌 · ∥E 𝑿̂_Ψ∥_𝐹 · ∥𝑿_Ψ∥_𝐹,

where 𝐵_Ψ ≜ (𝐵Ψ)(𝐵Ψ)^⊤ for a matrix 𝐵 ∈ R^{𝑛×𝑘}.

Remark 4.5. Intuitively, this notion captures the "advantage" of the algorithm over random guessing, or simply outputting the most likely community. See the full version for a more detailed discussion of this notion, how it recovers other previously considered measures of correlation in the case of the symmetric block model, and why it is meaningful. In particular, it implies the notion of weak recovery used in [13].

Our main guarantee is stated below.

Theorem 4.6. For any SBM parameters (M, 𝜋, 𝑑) above the KS threshold, there is a constant 𝜌(M, 𝜋, 𝑑) > 0 such that the above algorithm takes in the corrupted graph 𝑮̃ as input and outputs 𝒙̂ achieving 𝜌(M, 𝜋, 𝑑)-weak recovery with probability 1 − 𝑜_𝑛(1) over the randomness of 𝑮 ∼ SBM_𝑛(M, 𝜋, 𝑑).

To prove the above theorem, it suffices to analyze ⟨(E 𝑿̂)_Ψ, 𝑿_Ψ⟩. To see why, let us first set up some notation. For each vertex 𝑖, we obtain a simplex vector 𝑤_𝑖 ∈ R^𝑘, which we can stack as rows into a weight matrix 𝑊 ∈ R^{𝑛×𝑘}. We then independently round each vertex so that E 𝑿̂ = 𝑊.

To analyze our rounding scheme, first note that E[𝑿̂_Ψ] is equal to 𝑊_Ψ off of the diagonal and is larger than 𝑊_Ψ on the diagonal, and thus E[𝑿̂_Ψ] ⪰ 𝑊_Ψ. Since 𝑿_Ψ is positive semidefinite, ⟨E[𝑿̂_Ψ], 𝑿_Ψ⟩ ⩾ ⟨𝑊_Ψ, 𝑿_Ψ⟩. Thus, it suffices to lower bound ⟨𝑊_Ψ, 𝑿_Ψ⟩. By construction, 𝑊_Ψ is equal to 𝑐^2 · Π_{𝑈′}, where recall that 𝑈′ was a random 𝑟′-dimensional subspace of 𝑈, the output of Phase 2 of the algorithm. Thus,

    E_{𝑈′} 𝑊_Ψ = 𝑐^2 · E_{𝑈′} Π_{𝑈′} = 𝑐^2 · (𝑟′/dim(𝑈)) · Π_𝑈.

In the full version of this manuscript, we prove that when (M, 𝜋, 𝑑) are above the KS threshold, ⟨Π_𝑈, 𝑿_Ψ⟩ ⩾ Ω(1) · ∥Π_𝑈∥_𝐹 · ∥𝑿_Ψ∥_𝐹 and dim(𝑈) = 𝑂(1). Furthermore, we show that when diag(Π_𝑈) = 𝑂(1/𝑛), we can take 𝑐 = Ω(√𝑛); this delocalization condition is guaranteed by Phase 2 of the algorithm. Combined with the fact that ∥𝑿̂_Ψ∥_𝐹 = 𝑂(𝑛), it follows that ⟨E 𝑿̂_Ψ, 𝑿_Ψ⟩ ⩾ Ω(1) · ∥𝑿̂_Ψ∥_𝐹 · ∥𝑿_Ψ∥_𝐹, which establishes Theorem 4.6.

ACKNOWLEDGMENTS
We would like to thank Omar Alrabiah and Kiril Bangachev for diligent feedback on an earlier draft of this paper. DW acknowledges support from NSF Graduate Research Fellowship DGE-2146752.

REFERENCES
[1] Emmanuel Abbe. 2017. Community detection and stochastic block models: recent developments. The Journal of Machine Learning Research 18, 1 (2017), 6446–6531. https://doi.org/10.1561/0100000067
[2] Emmanuel Abbe and Colin Sandon. 2015. Community detection in general stochastic block models: Fundamental limits and efficient algorithms for recovery. In 2015 IEEE 56th Annual Symposium on Foundations of Computer Science. IEEE, 670–688. https://doi.org/10.1109/focs.2015.47
[3] Jess Banks, Sidhanth Mohanty, and Prasad Raghavendra. 2021. Local Statistics, Semidefinite Programming, and Community Detection. In Proceedings of the 2021 ACM-SIAM Symposium on Discrete Algorithms (SODA). SIAM, 1298–1316. https://doi.org/10.1137/1.9781611976465.79
[4] Hyman Bass. 1992. The Ihara-Selberg zeta function of a tree lattice. International Journal of Mathematics 3, 06 (1992), 717–797.
[5] Avrim Blum and Joel Spencer. 1995. Coloring random and semi-random k-colorable graphs. Journal of Algorithms 19, 2 (1995), 204–234. https://doi.org/10.1006/jagm.1995.1034
[6] Charles Bordenave, Marc Lelarge, and Laurent Massoulié. 2015. Non-backtracking spectrum of random graphs: community detection and non-regular Ramanujan graphs. In 2015 IEEE 56th Annual Symposium on Foundations of Computer Science. IEEE, 1347–1357. https://doi.org/10.1109/focs.2015.86


[7] Rares-Darius Buhai, Pravesh K Kothari, and David Steurer. 2023. Algorithms approaching the threshold for semi-random planted clique. In Proceedings of the 55th Annual ACM Symposium on Theory of Computing. 1918–1926. https://doi.org/10.1145/3564246.3585184
[8] Emmanuel J Candès, Xiaodong Li, Yi Ma, and John Wright. 2011. Robust principal component analysis? Journal of the ACM (JACM) 58, 3 (2011), 1–37. https://doi.org/10.1145/1970392.1970395
[9] Moses Charikar, Jacob Steinhardt, and Gregory Valiant. 2017. Learning from untrusted data. In Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing. 47–60. https://doi.org/10.1145/3055399.3055491
[10] Moses Charikar and Anthony Wirth. 2004. Maximizing quadratic programs: Extending Grothendieck's inequality. In 45th Annual IEEE Symposium on Foundations of Computer Science. IEEE, 54–60. https://doi.org/10.1109/focs.2004.39
[11] Aurelien Decelle, Florent Krzakala, Cristopher Moore, and Lenka Zdeborová. 2011. Asymptotic analysis of the stochastic block model for modular networks and its algorithmic applications. Physical Review E 84, 6 (2011), 066106. https://doi.org/10.1103/physreve.84.066106
[12] Amit Dembo, Andrea Montanari, and Subhabrata Sen. 2017. Extremal cuts of sparse random graphs. The Annals of Probability 45, 2 (2017), 1190–1217. https://doi.org/10.1214/15-aop1084
[13] Jingqiu Ding, Tommaso d'Orsi, Rajai Nasser, and David Steurer. 2022. Robust recovery for stochastic block models. In 2021 IEEE 62nd Annual Symposium on Foundations of Computer Science (FOCS). IEEE, 387–394. https://doi.org/10.1109/focs52979.2021.00046
[14] Jingqiu Ding, Tommaso d'Orsi, Yiding Hua, and David Steurer. 2023. Reaching Kesten-Stigum Threshold in the Stochastic Block Model under Node Corruptions. In The Thirty Sixth Annual Conference on Learning Theory. PMLR, 4044–4071. https://doi.org/10.48550/arxiv.2305.10227
[15] Zhou Fan and Andrea Montanari. 2017. How well do local algorithms solve semidefinite programs?. In Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing. 604–614. https://doi.org/10.1145/3055399.3055451
[16] Uriel Feige and Joe Kilian. 2001. Heuristics for semirandom graph problems. J. Comput. System Sci. 63, 4 (2001), 639–671. https://doi.org/10.1006/jcss.2001.1773
[17] Yuzhou Gu and Yury Polyanskiy. 2023. Uniqueness of BP fixed point for the Potts model and applications to community detection. arXiv preprint arXiv:2303.14688 (2023). https://doi.org/10.48550/arxiv.2303.14688
[18] Olivier Guédon and Roman Vershynin. 2016. Community detection in sparse networks via Grothendieck's inequality. Probability Theory and Related Fields 165, 3-4 (2016), 1025–1049. https://doi.org/10.1007/s00440-015-0659-z
[19] Venkatesan Guruswami, Jun-Ting Hsieh, Pravesh K Kothari, and Peter Manohar. 2023. Efficient Algorithms for Semirandom Planted CSPs at the Refutation Threshold. arXiv preprint arXiv:2309.16897 (2023). https://doi.org/10.1109/focs57990.2023.00026
[20] Samuel B Hopkins and David Steurer. 2017. Efficient Bayesian estimation from few samples: community detection and related problems. In 2017 IEEE 58th Annual Symposium on Foundations of Computer Science (FOCS). IEEE, 379–390. https://doi.org/10.1109/focs.2017.42
[21] Yasutaka Ihara. 1966. On discrete subgroups of the two by two projective linear group over p-adic fields. Journal of the Mathematical Society of Japan 18, 3 (1966), 219–235. https://doi.org/10.2969/jmsj/01830219
[22] Harry Kesten and Bernt P Stigum. 1966. Additional limit theorems for indecomposable multidimensional Galton-Watson processes. The Annals of Mathematical Statistics 37, 6 (1966), 1463–1481. https://doi.org/10.1214/aoms/1177699139
[23] Harry Kesten and Bernt P Stigum. 1967. Limit theorems for decomposable multi-dimensional Galton-Watson processes. J. Math. Anal. Appl. 17, 2 (1967), 309–338. https://doi.org/10.1016/0022-247x(67)90155-2
[24] Florent Krzakala, Cristopher Moore, Elchanan Mossel, Joe Neeman, Allan Sly, Lenka Zdeborová, and Pan Zhang. 2013. Spectral redemption in clustering sparse networks. Proceedings of the National Academy of Sciences 110, 52 (2013), 20935–20940. https://doi.org/10.48550/arXiv.1306.5550
[25] Allen Liu and Ankur Moitra. 2022. Minimax rates for robust community detection. In 2022 IEEE 63rd Annual Symposium on Foundations of Computer Science (FOCS). IEEE, 823–831. https://doi.org/10.1109/focs54457.2022.00083
[26] Laurent Massoulié. 2014. Community detection thresholds and the weak Ramanujan property. In Proceedings of the forty-sixth annual ACM symposium on Theory of computing. 694–703. https://doi.org/10.1145/2591796.2591857
[27] Theo McKenzie, Hermish Mehta, and Luca Trevisan. 2020. A new algorithm for the robust semi-random independent set problem. In Proceedings of the Fourteenth Annual ACM-SIAM Symposium on Discrete Algorithms. SIAM, 738–746. https://doi.org/10.48550/arXiv.1808.03633
[28] Ankur Moitra, William Perry, and Alexander S Wein. 2016. How robust are reconstruction thresholds for community detection?. In Proceedings of the forty-eighth annual ACM symposium on Theory of Computing. 828–841. https://doi.org/10.1145/2897518.2897573
[29] Andrea Montanari and Subhabrata Sen. 2016. Semidefinite programs on sparse random graphs and their application to community detection. In Proceedings of the forty-eighth annual ACM symposium on Theory of Computing. 814–827. https://doi.org/10.1145/2897518.2897548
[30] Elchanan Mossel, Joe Neeman, and Allan Sly. 2014. Belief propagation, robust reconstruction and optimal recovery of block models. In Conference on Learning Theory. PMLR, 356–370. https://doi.org/10.48550/arXiv.1309.1380
[31] Elchanan Mossel, Joe Neeman, and Allan Sly. 2018. A proof of the block model threshold conjecture. Combinatorica 38, 3 (2018), 665–708. https://doi.org/10.1007/s00493-016-3238-8
[32] Alaa Saade, Florent Krzakala, and Lenka Zdeborová. 2014. Spectral clustering of graphs with the Bethe Hessian. Advances in Neural Information Processing Systems 27 (2014). https://doi.org/10.48550/arXiv.1406.1880
[33] Ludovic Stephan and Laurent Massoulié. 2019. Robustness of spectral methods for community detection. In Conference on Learning Theory. PMLR, 2831–2860. https://doi.org/10.48550/arXiv.1811.05808
[34] Qian Yu and Yury Polyanskiy. 2023. Ising model on locally tree-like graphs: Uniqueness of solutions to cavity equations. IEEE Transactions on Information Theory (2023). https://doi.org/10.1109/tit.2023.3316795

Received 13-NOV-2023; accepted 2024-02-11
