Solving Sparse Linear Systems Faster than Matrix Multiplication
matrix A. Formally, the condition number of a symmetric matrix A, κ(A), is the ratio between the maximum and minimum eigenvalues of A. Here the best known rate of convergence when all intermediate operations are restricted to bit-complexity O(log(κ(A)/ε)) is to an error of ε in O(√κ(A) log(1/ε)) iterations. This is known to be tight if one restricts to matrix-vector multiplications in the intermediate steps [SV13, MMS18]. This means for moderately conditioned (e.g. with κ = poly(n)), sparse systems, the best runtime bounds are still via direct methods, which are stable when O(log(κ/ε)) words of precision are maintained in intermediate steps [DDHK07].

Many of the algorithms used in scientific computing for solving linear systems involving large, sparse matrices are based on combining direct and iterative methods: we will briefly discuss this perspective in Section 1.3. From the asymptotic complexity perspective, the practical successes of many such methods naturally lead to the question of whether one can provably do better than the O(min{n^ω, nnz(A)·√κ(A)}) time corresponding to the faster of direct or iterative methods.

Somewhat surprisingly, despite the central role of this question in scientific computing and numerical analysis, as well as extensive studies of linear systems solvers, progress on this question has been elusive. The continued lack of progress on this question has led to its use as a hardness assumption for showing conditional lower bounds for numerical primitives such as linear elasticity problems [KZ17] and positive linear programs [KWZ20]. One formalization of such hardness is the Sparse Linear Equation Time Hypothesis (SLTH) from [KWZ20]: SLTH_k^γ denotes the assumption that a sparse linear system with κ ≤ nnz(A)^k cannot be solved in time faster than nnz(A)^γ to within relative error n^{−10k}. Here improving over the smaller running time of both direct and iterative methods can be succinctly encapsulated as refuting SLTH_k^{min{1 + k/2, ω}}.²

In this paper, we provide a faster algorithm for solving sparse linear systems. Our formal result is the following (we use the form defined in [KWZ20] [Linear Equation Approximation Problem, LEA]): we can compute, in time

    O(max{nnz(A)^{(ω−2)/(ω−1)} n², n^{(5ω−4)/(ω+1)}} log²(κ/ε)),

a vector x such that

    ||Ax − Π_A b||₂² ≤ ε ||Π_A b||₂²,

where c is a fixed constant and Π_A is the projection operator onto the column space of A. Note that ||Π_A b||₂ = ||A(AᵀA)^{−1}Aᵀb||₂, and when A is square and full rank, it is just ||b||₂.

The cross-over point for the two bounds is at nnz(A) ≈ n^{3(ω−1)/(ω+1)}. In particular, for the sparse case with nnz(A) = O(n), and the current best ω ≤ 2.372864 [LG14], we get an exponent of

    max{2 + (ω−2)/(ω−1), (5ω−4)/(ω+1)} = max{2.271595, 2.331645} = 2.331645.

As n ≤ nnz, this also translates to a running time of O(nnz^{(5ω−4)/(ω+1)}), which, as (5ω−4)/(ω+1) = ω − (ω−2)²/(ω+1) < ω, refutes SLTH_k^ω for constant values of k and any value of ω > 2.

We can parameterize the asymptotic gains over matrix multiplication for moderately sparse instances. Here we use the Õ(·) notation to hide lower-order terms, specifically Õ(f(n)) denotes O(f(n) log^c(f(n))) for some absolute constant c.

Corollary 1.1. For any matrix A with dimension at most n, O(n^{ω−1−θ}) non-zeros, and condition number n^{O(1)}, a linear system in A can be solved to accuracy n^{−O(1)} in time Õ(max{n^{(5ω−4)/(ω+1)}, n^{ω − θ(ω−2)/(ω−1)}}).

Here the cross-over point happens at θ = (ω−1)(ω−2)/(ω+1). Also, because (5ω−4)/(ω+1) = ω − (ω−2)²/(ω+1), we can also infer that for any 0 < θ ≤ ω − 2 and any ω > 2, the runtime is o(n^ω), or asymptotically faster than matrix multiplication.

1.1 Idea At a high level, our algorithm follows the block Krylov space method (see e.g. Chapter 6.12 of Saad [Saa03]). This method is a multi-vector extension of the conjugate gradient / Lanczos method, which in the single-vector setting is known to be problematic under round-off errors both in theory [MMS18] and in

² The hardness results in [KWZ20] were based on SLTH_{1.5}^{1.99} under the Real RAM model, in part due to the uncertain status of conjugate gradient in different models of computation.
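As a numeric sanity check of the runtime exponents quoted above, the following short Python sketch evaluates them for the sparse case nnz(A) = O(n), using the bound on ω from [LG14]:

```python
# Check the runtime exponents for the sparse case nnz(A) = O(n),
# with omega <= 2.372864 from [LG14].
omega = 2.372864

low_nnz_exp = 2 + (omega - 2) / (omega - 1)   # from nnz(A)^((w-2)/(w-1)) * n^2
krylov_exp = (5 * omega - 4) / (omega + 1)    # from n^((5w-4)/(w+1))

assert abs(low_nnz_exp - 2.271595) < 1e-5
assert abs(krylov_exp - 2.331645) < 1e-5
assert max(low_nnz_exp, krylov_exp) == krylov_exp

# Identity used in the text: (5w-4)/(w+1) = w - (w-2)^2/(w+1),
# so the overall exponent is strictly below w for any w > 2.
assert abs(krylov_exp - (omega - (omega - 2) ** 2 / (omega + 1))) < 1e-12
```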
Downloaded 06/27/21 to 143.215.38.55. Redistribution subject to SIAM license or copyright; see https://epubs.siam.org/page/terms

Krylov spaces. Such bounds upper bound the prob-

Furthermore, due to the shortcomings of the matrix anti-concentration bounds, we modify the solver algorithm so that it uses a more limited version of the block-Krylov space that falls under the cases that could be analyzed.
Before we describe the difficulties and new tools needed, we first provide some intuition on why a factor m increase in word lengths may be the right answer, by upper-bounding the magnitudes of entries in an m-step Krylov space. The maximum magnitude of A^m b is bounded by the max magnitude of A to the power of m, times a factor corresponding to the number of summands in the matrix product:

    ||A^m b||_∞ ≤ (n ||A||_∞)^m ||b||_∞.

So the largest numbers in K (as well as AK) can be bounded by (nκ)^{O(m)}, or O(m log κ) words in front of the decimal point under the assumption of κ > n.

Should such a bound of O(m log κ) hold for all numbers that arise, including the matrix inversions, and the matrix B is sparse with O(n) entries, the cost of computing the block-Krylov matrices becomes O(m log κ · ms · nnz), while the cost of the matrix inversion portion encounters an overhead of O(m log κ), for a total of Õ(m² s^ω log κ). In the sparse case of nnz = O(n), and n = ms, this becomes:

    O(n² m log κ + m² s^ω log κ) =
    (1.1)  O(n² m log κ + m^{2−ω} n^ω log κ).

Due to the gap between n² and n^ω, setting m appropriately gives an improvement over n^ω when log κ = n^{o(1)}.

However, the magnitude of an entry in the inverse depends on the smallest magnitude, or in the matrix case, its minimum singular value. Bounding and propagating the min singular value, which intuitively corresponds to how close a matrix is to being degenerate, represents our main challenge. In exact/finite fields settings, non-degeneracies are certified via the Schwartz-Zippel Lemma about polynomial roots. The numerical analog of this is more difficult: the Krylov space matrix K is asymmetric, even for a symmetric matrix A. It is much easier for an asymmetric matrix with correlated entries to be close to singular.

Consider for example a two-banded, two-block matrix whose entries involve a single random variable α (see Figure 1):

    A_ij = 1   if i = j and j ≤ n/2,
           α   if i = j + 1 and j ≤ n/2,
           α   if i = j and n/2 < j,
           2   if i = j + 1 and n/2 < j,
           0   otherwise.

Figure 1: The difference between matrix anti-concentration over finite fields and reals: a matrix that is full rank for all α ≠ 0, but is always ill conditioned.

In the exact case, this matrix is full rank unless α = 0, even over finite fields. On the other hand, its minimum singular value is close to 0 for all values of α because:

Observation 1.1. The minimum singular value of a matrix with 1s on the diagonal, α on the entries immediately below the diagonal, and 0 everywhere else is at most |α|^{−(n−1)}, due to the test vector [1; −α; α²; …; (−α)^{n−1}].

Specifically, in the top-left block, as long as |α| > 3/2, the top left block has minimum singular value at most (2/3)^{n−1}. On the other hand, rescaling the bottom-right block by 1/α to get 1s on the diagonal gives 2/α on the off-diagonal. So as long as |α| < 3/2, this value is at least 4/3, which in turn implies a minimum singular
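Observation 1.1 can be checked directly in a few lines of pure Python (the choices n = 20 and α = 2 are arbitrary illustrative values):

```python
# Check Observation 1.1: for M with 1s on the diagonal, alpha immediately
# below the diagonal, and 0 elsewhere, the test vector
# v = [1, -alpha, alpha^2, ..., (-alpha)^(n-1)] gives ||Mv|| / ||v|| <= |alpha|^-(n-1).
n, alpha = 20, 2.0

v = [(-alpha) ** i for i in range(n)]
# (M v)_i = v_i + alpha * v_{i-1}; consecutive terms cancel except at i = 0.
Mv = [v[0]] + [v[i] + alpha * v[i - 1] for i in range(1, n)]

norm = lambda x: sum(t * t for t in x) ** 0.5
ratio = norm(Mv) / norm(v)   # an upper bound on the minimum singular value

assert all(t == 0 for t in Mv[1:])            # cancellation: Mv = e_1
assert ratio <= abs(alpha) ** (-(n - 1))      # the bound in Observation 1.1
```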
also encounter similar difficulties: a more detailed discussion of it can be found in Section 7 of Sankar et al. [SST06].

In order to bound the bit complexity of all intermediate steps of the block Krylov algorithm by Õ(m) · log κ, we devise a more numerically stable algorithm for solving block Hankel matrices, as well as provide a new perturbation scheme to quickly generate a well-conditioned block Krylov space. Central to both of our key components is the close connection between condition number and bit complexity bounds.

First, we give a more numerically stable solver for block Hankel/Toeplitz matrices. Fast solvers for Hankel (and closely related Toeplitz) matrices have been extensively studied in numerical analysis, with several recent developments on more stable algorithms [XXG12]. However, the notion of numerical stability studied in these algorithms is the more practical variant where the number of bits of precision is fixed. As a result, the asymptotic behavior of the stable algorithm from [XXG12] is quadratic in the number of digits in the condition number, which in our case would translate to a prohibitive cost of Õ(m²) (i.e., the overall cost would be higher than n^ω).

The randomness of the Krylov matrix induced by the initial set of random vectors B is more difficult to analyze: each column of B affects m columns of the overall Krylov space matrix. In contrast, all existing analyses of lower bounds of singular values of possibly asymmetric random matrices [SST06, TV10] rely on the randomness in the columns of matrices being independent. The dependence between columns necessitates analyzing singular values of random linear combinations of matrices, which we handle by adapting ε-net based proofs of anti-concentration bounds. Here we encounter an additional challenge in bounding the minimum singular value of the block Krylov matrix. We resolve this issue algorithmically: instead of picking a Krylov space that spans the entire ℝⁿ, we stop things short by picking ms = n − Õ(m). This set of extra columns significantly simplifies the proof of singular value lower bounds. This is similar in spirit to the analysis of minimum singular values of random matrices, which is significantly easier for non-square matrices [RV10]. In the algorithm, the remaining columns are treated as a separate block that we reduce to via a Schur complement at the very end of the block elimination algorithm. Since the block is small, so is its overhead on the running time.
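For intuition on why block Hankel matrices arise from block Krylov spaces, here is a small pure-Python sketch (the matrices A and B are arbitrary illustrative choices): for symmetric A, the Gram matrix of K = [B, AB, ..., A^{m−1}B] has (i, j) block BᵀA^{i+j}B, which depends only on i + j.

```python
# Sketch: for symmetric A, the (i, j) block of K^T K, where
# K = [B, AB, ..., A^(m-1)B], equals B^T A^(i+j) B, so the Gram matrix
# is block-Hankel (blocks constant along anti-diagonals).
def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def transpose(X):
    return [list(r) for r in zip(*X)]

A = [[2, 1, 0, 0], [1, 3, 1, 0], [0, 1, 2, 1], [0, 0, 1, 4]]  # symmetric n x n
B = [[1, 0], [0, 1], [1, 1], [0, 2]]                          # n x s block
m = 3

powers = [B]
for _ in range(m - 1):
    powers.append(matmul(A, powers[-1]))     # A^i B for i = 0 .. m-1

def block(i, j):                             # (i, j) block of K^T K
    return matmul(transpose(powers[i]), powers[j])

# The block only depends on i + j: the Gram matrix is block-Hankel.
assert block(0, 2) == block(1, 1) == block(2, 0)
assert block(0, 1) == block(1, 0)
```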
Instead, we combine developments in recursive block Gaussian elimination [DDHK07, KLP+16, CKK+18] with the low displacement rank representation of Hankel/Toeplitz matrices [KKM79, BA80]. Such representations allow us to implicitly express both the Hankel matrix and its inverse by displaced versions of rank 2s matrices. This means the intermediate sizes of instances arising from recursion are O(s) times the dimension, for a total size of O(n log n), giving a total of Õ(n s^{ω−1}) arithmetic operations involving words of size Õ(m). We provide a rigorous analysis of the accumulation of round-off errors similar to the analysis of recursive matrix multiplication based matrix inversion from [DDHK07].

Motivated by this close connection with the condition number of Hankel matrices, we then try to initialize with Krylov spaces of low condition number. Here we show that a sufficiently small perturbation suffices for producing a well conditioned overall matrix. In fact, the first step of our proof, that a small sparse random perturbation to A guarantees good separations between its eigenvalues, is a direct combination …

1.3 History and Related Work Our algorithm has close connections with multiple lines of research on more efficient solvers for sparse linear systems. This topic has been extensively studied not only in computer science, but also in applied mathematics and engineering. For example, in the Society for Industrial and Applied Mathematics News' editors' 'top 10 algorithms of the 20th century', three of them (Krylov space methods, matrix decompositions, and QR factorizations) are directly related to linear systems solvers [Cip00].

At a high level, our algorithm is a hybrid linear systems solver. It combines iterative methods, namely block Krylov space methods, with direct methods that factorize the resulting Gram matrix of the Krylov space. Hybrid methods have their origins in the incomplete Cholesky method for speeding up elimination/factorization based direct solvers. A main goal of these methods is to reduce the Ω(n²) space needed to represent matrix factorizations / inverses, which is even more problematic than time when handling large sparse
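The low displacement rank property used above can be sketched in pure Python (entries of t are arbitrary illustrative values): for a Toeplitz matrix T and the down-shift displacement, T − ZTZᵀ is supported on the first row and first column only, so its rank is at most 2.

```python
# Sketch of low displacement rank: for Toeplitz T (T[i][j] = t[i-j]) and
# down-shift Z, the displacement T - Z T Z^T vanishes off the first row
# and column, hence has rank at most 2.
n = 6
t = {d: 10 + d for d in range(-(n - 1), n)}       # arbitrary Toeplitz symbols
T = [[t[i - j] for j in range(n)] for i in range(n)]

# (Z T Z^T)[i][j] = T[i-1][j-1] for i, j >= 1, and 0 on the first row/column.
ZTZt = [[T[i - 1][j - 1] if i and j else 0 for j in range(n)] for i in range(n)]

D = [[T[i][j] - ZTZt[i][j] for j in range(n)] for i in range(n)]

# Off the first row and column, the Toeplitz structure makes D vanish ...
assert all(D[i][j] == 0 for i in range(1, n) for j in range(1, n))
# ... so D is a sum of one row and one column outer product: rank(D) <= 2.
```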
the interactions between objects have invariances (e.g. either by time or space differences). Examples of such structure include circulant matrices [Gra06], Toeplitz / Hankel matrices [KKM79, BA80, XXG12, XXCB14], and distances from n-body simulations [CRW93]. Many such algorithms require exact preservation of the structure in intermediate steps. As a result, many of these works develop algorithms over finite fields [BA80, BL94, BJMS17].

More recently, there has been work on developing more numerically stable variants of these algorithms for structured matrices, or more generally, matrices that are numerically close to being structured [XCGL10, LLY11, XXG12, XXCB14]. However, these results only explicitly discussed the entry-wise Hankel/Toeplitz case (which corresponds to s = 1). Furthermore, because they rely on domain-decomposition techniques similar to fast multipole methods, they produce one bit of precision per each outer iteration loop. As the Krylov space matrix has condition number exp(Ω(m)), such methods would lead to another factor of m in the solve cost when directly invoked.

Instead, our techniques for handling and bounding numerical errors are more closely related to recent developments in provably efficient sparse Cholesky factorizations [KLP+16, KS16, Kyn17, CKK+18]. These methods generated efficient preconditioners using only the condition of intermediate steps of Gaussian elimination, known as Schur complements, having small representations. They avoided the explicit generation of the dense representations of Schur complements by treating them as operators, and implicitly applied randomized tools to directly sample/sketch the final succinct representations, which have much smaller algorithmic costs.

On the other hand, previous works on sparse Cholesky factorizations required the input matrix to be decomposable into a sum of simple elements, often through additional combinatorial structure of the matrices. In particular, this line of work on combinatorial preconditioning was initiated through a focus on graph Laplacians, which are built from 2-by-2 matrix blocks corresponding to edges of undirected graphs [Vai89, Gre96, ST14, KMP12]. Since then, there have been substantial generalizations to the structures amenable to such approaches, notably to finite element … algorithms that are critical to our elimination routine for block-Hankel matrices.

Key to recent developments in combinatorial preconditioning is matrix concentration [RV07, Tro15]. Such bounds provide guarantees for (relative) eigenvalues of random sums of matrices. For generating preconditioners, such randomness arises from whether each element is kept, and a small condition number (which in turn implies a small number of outer iterations using the preconditioners) corresponds to a small deviation between the original and sampled matrices. In contrast, we introduce randomness in order to obtain block Krylov spaces whose minimum eigenvalue is large. As a result, the matrix tool we need is anti-concentration, which somewhat surprisingly is far less studied. Previous works on it are mostly related by similar problems from numerical precision [SST06, TV10], and mostly address situations where the entries in the resulting matrix are independent. Our bound on the min singular value of the random Krylov space can yield a crude bound for a sum of rectangular random matrices, but we believe much better matrix anti-concentration bounds are possible.

Due to space constraints, we will only describe the overall algorithm in Section 2, and give an outline of its analysis in Section 3. Section 3.2 contains a breakdown of the main components of the analysis, whose details and proofs can be found in the arXiv version [PV20]. Some research directions raised by this work, including possible improvements and extensions, are discussed in Section 4.

2 Algorithm

We describe the algorithm, as well as the running times of its main components, in this section. To simplify discussion, we assume the input matrix A is symmetric, and has poly(n) condition number. If it is asymmetric (but invertible), we implicitly apply the algorithm to AᵀA, using the identity A^{−1} = (AᵀA)^{−1}Aᵀ derived from (AᵀA)^{−1} = A^{−1}A^{−T}. Also, recall from the discussion after Theorem 1.1 that we use Õ(·) to hide lower order terms in order to simplify runtimes.

Before giving details on our algorithm, we first discuss what constitutes a linear systems solver algorithm, specifically the equivalence between many such algorithms and linear operators.
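The symmetrization mentioned above (implicitly applying the algorithm to AᵀA when A is asymmetric) can be sketched in pure Python with a hypothetical 2-by-2 instance: solving the symmetric system (AᵀA)x = Aᵀb recovers the solution of Ax = b for invertible A.

```python
# Sketch of the reduction to a symmetric system: for invertible A,
# x = (A^T A)^{-1} A^T b solves A x = b. The 2x2 instance is hypothetical.
A = [[3.0, 1.0],
     [0.0, 2.0]]          # asymmetric but invertible
b = [5.0, 4.0]

AtA = [[sum(A[k][i] * A[k][j] for k in range(2)) for j in range(2)]
       for i in range(2)]                            # A^T A, symmetric PD
Atb = [sum(A[k][i] * b[k] for k in range(2)) for i in range(2)]

# Solve the 2x2 symmetric system (A^T A) x = A^T b by Cramer's rule.
det = AtA[0][0] * AtA[1][1] - AtA[0][1] * AtA[1][0]
x = [(Atb[0] * AtA[1][1] - AtA[0][1] * Atb[1]) / det,
     (AtA[0][0] * Atb[1] - Atb[0] * AtA[1][0]) / det]

# x indeed solves the original (asymmetric) system A x = b.
residual = [sum(A[i][j] * x[j] for j in range(2)) - b[i] for i in range(2)]
assert all(abs(r) < 1e-12 for r in residual)
```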
Figure 4: Full matrix AQ and its Associated Gram Matrix (AQ)ᵀ(AQ). Note that by our choice of parameters … which means all singular values can change by at most ε.

3 Outline of Analysis

    σ_min(M) = min_x ||Mx||₂ / ||x||₂,    σ_max(M) = max_x ||Mx||₂ / ||x||₂,

and the condition number of M is defined as κ(M) = σ_max(M) / σ_min(M). Bounds on the minimum singular value allow us to transfer perturbation errors on a matrix to its inverse.

Lemma 3.1. If M is a full rank square matrix with min and max singular values in the range [σ_min, σ_max], and M̃ is some approximation of it such that ||M − M̃||_F ≤ ε for some ε < σ_min/2, then all singular values of M̃ lie in the range [σ_min − ε, σ_max + ε], and

    ||M̃^{−1} − M^{−1}||_F ≤ 2 σ_min^{−2} ε.

Note that this implies that M̃ is invertible. For the bounds on inverses, note that

    M̃^{−1} − M^{−1} = M^{−1}(M M̃^{−1} − I) = M^{−1}(M − M̃)M̃^{−1}.

So applying bounds on norms, as well as ||M̃^{−1}||₂ ≤ (σ_min − ε)^{−1} ≤ 2 σ_min^{−1}, gives

    ||M̃^{−1} − M^{−1}||_F ≤ ||M^{−1}||₂ ||M − M̃||_F ||M̃^{−1}||₂ ≤ 2 σ_min^{−2} ε.

Such a bound means that it suffices to show that the failure probability of any step of our algorithm is n^{−c} for some constant c. The total number of steps is poly(n), so unioning over such probabilities still gives a success probability of at least 1 − n^{−c+O(1)}.

We will perturb our matrices using Gaussian random variables. These random variables N(0, σ) have density function g(x) = (1/(σ√(2π))) e^{−x²/(2σ²)}. They are particularly useful for showing anti-concentration because the sum of Gaussians is another Gaussian, with variance equal to the sum of squares, or ℓ₂²-norm, of the variance terms. That is, for a vector x and a (dense) Gaussian vector with entry-wise i.i.d. N(0, 1) normal random variables, aka g ∼ N(0, 1)ⁿ, we have xᵀg ∼ N(0, ||x||₂).

The density function of Gaussians means that their magnitudes exceed nσ with probability at most O(exp(−n²)). This probability is much smaller than the n^{−c} failure probabilities that we want, so to simplify presentation we will remove it at the start.

Claim 3.1. We can analyze our algorithm conditioning on any normal random variable with variance σ, N(0, σ), having magnitude at most nσ.

Error Accumulation. Our notion of approximate operators also composes well with errors.

Lemma 3.2. If Z^(1) and Z^(2) are linear operators with (algorithmic) approximations Z̃^(1) and Z̃^(2) such that for some ε < 0.1 we have

    ||Z^(1) − Z̃^(1)||_F, ||Z^(2) − Z̃^(2)||_F ≤ ε,

then their product satisfies

    ||Z^(1) Z^(2) − Z̃^(1) Z̃^(2)||_F ≤ 10 ε max{1, ||Z^(1)||₂, ||Z^(2)||₂}.

Proof. Expanding out the errors gives

    Z^(1) Z^(2) − Z̃^(1) Z̃^(2) = Z^(1)(Z^(2) − Z̃^(2)) + (Z^(1) − Z̃^(1)) Z^(2) − (Z^(1) − Z̃^(1))(Z^(2) − Z̃^(2)).

The terms involving the original matrix against the error get bounded by the error times the norm of the original matrix. For the cross term, we have

    ||(Z^(1) − Z̃^(1))(Z^(2) − Z̃^(2))||_F ≤ ||Z^(1) − Z̃^(1)||_F ||Z^(2) − Z̃^(2)||₂ ≤ ε²,

which along with ε < 0.1 gives the overall bound.

Tracking Word Length. The numerical rounding model that we will use is fixed point precision. The advantage of such a fixed point representation is that it significantly simplifies the tracking of errors during additions/subtractions. The need to keep exact operators means we cannot omit intermediate digits. Instead, we track both the number of digits before and after the decimal point.

The number of trailing digits, or words after the decimal point, compounds as follows:

1. adding two numbers with L1 and L2 words after the decimal point each results in a number with max{L1, L2} words after the decimal point.

2. multiplying two numbers with L1 and L2 words after the decimal point each results in a number with L1 + L2 words after the decimal point.

As max{L1, L2} ≤ L1 + L2 when L1 and L2 are non-negative, we will in general assume that when we multiply matrices with at most L1 and L2 words after the decimal point, the result has at most L1 + L2 words after the decimal point.
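The two compounding rules above can be captured by a small bookkeeping sketch (pure Python; the class and field names are illustrative):

```python
# Sketch of fixed-point bookkeeping: track the number of words after the
# decimal point through additions and multiplications, as described above.
class Fixed:
    def __init__(self, value, words):
        self.value, self.words = value, words   # words after the decimal point

    def __add__(self, other):
        # Addition aligns representations: max(L1, L2) trailing words.
        return Fixed(self.value + other.value, max(self.words, other.words))

    def __mul__(self, other):
        # Multiplication compounds precision: L1 + L2 trailing words.
        return Fixed(self.value * other.value, self.words + other.words)

a = Fixed(1.5, 1)     # one word after the point
b = Fixed(2.25, 2)    # two words after the point

assert (a + b).words == 2        # max(L1, L2)
assert (a * b).words == 3        # L1 + L2
assert (a * b).value == 3.375    # 1.5 * 2.25, exact in binary
```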
BᵀA^{i}B.

Theorem 3.2. Let A be an n × n symmetric positive definite matrix where:

1. all eigenvalues of A are at least α_A, and at most 1/α_A, …

and its associated matrix-multiplication operator is MatVec_A(·, δ), and obtains a routine Solve_A that, when given a vector b with L_b words after the decimal point, returns Z_A b in time

    Õ(n² m (log(1/α_A) + log ||b||_∞ + L_b)),

for some Z_A with at most O(log(1/α_A)) words after the decimal point such that

    ||Z_A − A^{−1}||_F ≤ α_A.

Theorem 3.3. If H is an sm × sm symmetric s-block-Hankel matrix and 0 < α_H < (sm)^{−100} is a parameter …, we can pre-process H in time Õ(m s^ω log(α_H^{−1} ε^{−1})) to form Solve_H(·, ·), which corresponds to a linear operator Z_H ….

Note that we dropped ε as a parameter to simplify the statement of the guarantees. Such an omission is acceptable because if we want accuracy ε less than the eigenvalue bounds of A, we can simply run the algorithm with α_A ← ε, due to the pre-conditions holding upon α_A decreasing. With iterative refinement (e.g. [Saa03]), it's also possible to separate out the dependence on log(1/ε), making it linear instead of the cubic dependence on log(1/α_H).

…
with α_Ã ← (n⁸κ²)^{−(1+5 log n)} and its associated matrix-multiplication operator MatVec_Ã(·, δ).

4. Build solver for Ã:
   Solve_Ã ← BlockKrylov(MatVec_Ã(·, δ), m, …), with m ← n · nnz(A)^{−1/(ω−1)}.

5. Return (n⁴θ_A²)^{−1} Solve_Ã(Aᵀb).

Figure 5: Pseudocode for block Krylov space algorithm

Therefore its Frobenius norm, and in turn max singular value, is at most 1.
3.3 Proof of Main Theorem It remains to use the random perturbation specified in Theorem 3.1 and taking the outer product to reduce a general system to the symmetric, eigenvalue well separated case covered in Lemma 3.4. … Ã is still at most 2n². Taking this perturbation bound into Lemma 3.1 also gives that all eigenvalues of Ã are in the range [(n⁵κ²)^{−1}, 1].

By Theorem 3.1, the minimum eigenvalue separation in this perturbed matrix Ã is at least

    (n⁸κ²)^{−(1+5 log n)}.

As we only want an error of ε, we can round all entries in A to precision ε/κ without affecting the quality of the answer. So we can invoke Lemma 3.4 with

    α_Ã ← (n⁸κ²)^{−(1+5 log n)},

which leads to a solve operator Z_Ã such that

    ||Z_Ã − Ã^{−1}||_F ≤ α_Ã = (n⁸κ²)^{−(1+5 log n)} ≤ n^{−40} κ^{−10}.

The error conversion lemma from Lemma 3.1, along with the condition that the min singular value of Ã is at least …, or factoring ε into the bound on the size of R via triangle inequality:

    ||Z_Ã − (ÂᵀÂ)^{−1}||_F ≤ ε² / (n⁸ κ⁴),

which when inverted again via Lemma 3.1 and the min singular value bound gives

    ||Z_Ã^{−1} − ÂᵀÂ||_F ≤ ε / n².

It remains to propagate this error across the rescaling in Step 5. Since Â = (1/(n²θ_A)) A, we have

    (AᵀA)^{−1} = (1/(n⁴θ_A²)) (ÂᵀÂ)^{−1},

and in turn the error bound translates to

    ||(1/(n⁴θ_A²)) Z_Ã − (AᵀA)^{−1}||_F ≤ ε / (n⁴θ_A² · n²).

On the other hand, the max singular value of A is at least ||A||_∞ = θ_A: consider the unit vector that's 1 in the entry corresponding to the column containing the max magnitude entry of A, and 0 everywhere else. This plus the bound on condition number of κ gives that the minimum singular value of A is at least θ_A/κ, which incorporating the above, as well as ||A||₂ ≤ nθ_A, gives

    ||Z_A b − Π_A b||₂ ≤ 3ε ||Aᵀb||₂.

On the other hand, because the max eigenvalue of AᵀA is at most ||A||_F² ≤ n²θ_A², the minimum eigenvalue of (AᵀA)^{−1} is at least n^{−2}θ_A^{−2}. So we have … Combining the two bounds then gives that the error in the return value is at most ε ||Π_A b||₂.

For the total running time, the number of non-zeros in R implies that the total cost of multiplying Ã against a vector with O(m log(1/α_Ã)) = O(m log n log(κ/ε)) = Õ(m log(κ/ε)) words is

    Õ((nnz(A) + n) m log²(κ/ε)) ≤ Õ(nnz(A) m log²(κ/ε)),

where the inequality n ≤ nnz(A) follows from pre-processing to remove empty rows and columns. So the total construction cost given in Lemma 3.4 simplifies to …

The input vector, (1/θ_Y) y, has max magnitude at most 1, and can thus be rounded to O(log(κ/ε)) words after the decimal point as well. This then goes into the solve cost with log(||b||_∞) + L_b ≤ O(log(nκ/ε)), which gives a total of Õ(n² m log(κ/ε)), which is a lower order term compared to the construction cost. The cost of the additional multiplication by A is also a lower order term.

Optimizing m in this expression above based on n and nnz(A) gives that we should choose m so that

    max{n · nnz(A) · m, n² m³, n^ω m^{2−ω}} …,

or

    max{n · nnz(A) · m^{ω−1}, n² m^{ω+1}} ≤ n^ω.

The first term implies

    m ≤ (n^{ω−1} / nnz(A))^{1/(ω−1)} = n · nnz(A)^{−1/(ω−1)},
… solving linear regression problems involving an n-by-n integer matrix with rank r?

Finally, we note that the paper by Eberly et al. [EGG+06] that proposed block-Krylov based methods for matrix inversion also included experimental results that demonstrated good performance as an exact solver over finite fields. It might be possible to practically evaluate block-Krylov type methods for solving general systems of linear equations. Here it is worth remarking that even if one uses naive Θ(n³) time matrix multiplication, both the Eberly et al. algorithm [EGG+07] (when combined with p-adic representations), as well as our algorithm, still take sub-cubic time.

Acknowledgements

Richard Peng was supported in part by NSF CAREER award 1846218, and Santosh Vempala by NSF awards AF-1909756 and AF-2007443. We thank Mark Giesbrecht for bringing to our attention the works on block-Krylov space algorithms; Yin Tat Lee for discussions on random linear systems; Yi Li, Anup B. Rao, and Ameya Velingker for discussions about high dimensional concentration and anti-concentration bounds; Mehrdad Ghadiri, He Jia and anonymous reviewers for comments on earlier versions of this paper.

References

… real numbers: NP-completeness, recursive functions and universal machines. Bulletin (New Series) of the American Mathematical Society, 21(1):1–46, 1989.

[Cip00] Barry A Cipra. The best of the 20th century: Editors name top 10 algorithms. SIAM News, 33(4):1–2, 2000. Available at: https://archive.siam.org/pdf/news/637.pdf.

[CKK+18] Michael B. Cohen, Jonathan A. Kelner, Rasmus Kyng, John Peebles, Richard Peng, Anup B. Rao, and Aaron Sidford. Solving directed Laplacian systems in nearly-linear time through sparse LU factorizations. In Mikkel Thorup, editor, 59th IEEE Annual Symposium on Foundations of Computer Science, FOCS 2018, Paris, France, October 7-9, 2018, pages 898–909. IEEE Computer Society, 2018. Available at: https://arxiv.org/abs/1811.10722.

[CKP+17] Michael B. Cohen, Jonathan A. Kelner, John Peebles, Richard Peng, Anup B. Rao, Aaron Sidford, and Adrian Vladu. Almost-linear-time algorithms for Markov chains and new spectral primitives for directed graphs. In Hamed Hatami, Pierre McKenzie, and Valerie King, editors, Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing, STOC 2017, Montreal, QC, Canada, June 19-23, 2017, pages 410–419. ACM, 2017. Available at: https://arxiv.org/abs/1611.00755.

[CLRS09] Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein. Introduction to Algorithms, 3rd Edition. MIT Press, 2009.

[CRW93] Ronald Coifman, Vladimir Rokhlin, and Stephen Wandzura. The fast multipole method for the wave equation: A pedestrian prescription. IEEE Antennas and Propagation Magazine, 35(3):7–12, 1993.
[BA80] Robert R Bitmead and Brian DO Anderson. Asymptotically fast solution of Toeplitz and related systems of linear equations. Linear Algebra and its Applications, 34:103–116, 1980.

[BHK20] Avrim Blum, John Hopcroft, and Ravindran Kannan. Foundations of Data Science. Cambridge University Press, 2020.

[BHV08] Erik G. Boman, Bruce Hendrickson, and Stephen A. Vavasis. Solving elliptic finite element systems in near-linear time with support preconditioners. SIAM J. Numer. Anal., 46(6):3264–3284, 2008.

[BJMS17] Alin Bostan, C-P Jeannerod, Christophe Mouilleron, and É Schost. On matrices with displacement structure: Generalized operators and faster algorithms. SIAM Journal on Matrix Analysis and Applications, 38(3):733–775, 2017.

[BL94] Bernhard Beckermann and George Labahn. A uniform approach for the fast computation of matrix-type …

… Computation, International Symposium, ISSAC 2006, Genoa, Italy, July 9-12, 2006, Proceedings, pages 63–70, 2006.

[CW87] Don Coppersmith and Shmuel Winograd. Matrix multiplication via arithmetic progressions. In Alfred V. Aho, editor, Proceedings of the 19th Annual ACM Symposium on Theory of Computing, 1987, New York, New York, USA, pages 1–6. ACM, 1987.

[DDH07] James Demmel, Ioana Dumitriu, and Olga Holtz. Fast linear algebra is stable. Numerische Mathematik, 108(1):59–91, 2007.

[DDHK07] James Demmel, Ioana Dumitriu, Olga Holtz, and Robert Kleinberg. Fast matrix multiplication is stable. Numerische Mathematik, 106(2):199–224, 2007.

[Dix82] John D Dixon. Exact solution of linear equations using p-adic expansions. Numerische Mathematik, 40(1):137–141, 1982.

[DMM08] Petros Drineas, Michael W Mahoney, and Shan Muthukrishnan. Relative-error CUR matrix decompositions. SIAM Journal on Matrix Analysis and Applications, 30(2):844–881, 2008.

[EGG+07] Wayne Eberly, Mark Giesbrecht, Pascal Giorgi, Arne Storjohann, and Gilles Villard. Faster inversion and other black box matrix computations using efficient block projections. In Symbolic and Algebraic Computation, International Symposium, ISSAC 2007, Waterloo, Ontario, Canada, July 28 - August 1, 2007, Proceedings, pages 143–150, 2007.

[EH10] Herbert Edelsbrunner and John Harer. Computational topology: an introduction. American Mathematical Soc., 2010.

[GO89] Gene H Golub and Dianne P O'Leary. Some history of the conjugate gradient and Lanczos algorithms: 1948–1976. SIAM Review, 31(1):50–102, 1989.

[Gra06] Robert M Gray. Toeplitz and circulant matrices: A review. now publishers inc, 2006.

[Gre96] Keith D. Gremban. Combinatorial Preconditioners for Sparse, Symmetric, Diagonally Dominant Linear Systems. PhD thesis, Carnegie Mellon University, Pittsburgh, October 1996. CMU CS Tech Report CMU-CS-96-123.

[GTVDV96] KA Gallivan, S Thirumalai, Paul Van Dooren, and V Vermaut. High performance algorithms for Toeplitz and block Toeplitz matrices. Linear Algebra and its Applications, 241:343–388, 1996.

[HS52] Magnus Rudolph Hestenes and Eduard Stiefel. Methods of conjugate gradients for solving linear systems, volume 49-1. NBS Washington, DC, 1952.

[HVDH19] David Harvey and Joris Van Der Hoeven. Polynomial multiplication over finite fields in time O(n log n). Available at https://hal.archives-ouvertes.fr/hal-02070816/document, 2019.

[KKM79] Thomas Kailath, Sun-Yuan Kung, and Martin Morf. Displacement ranks of matrices and linear equations. Journal of Mathematical Analysis and Applications, 68(2):395–407, 1979.

[KLP+16] Rasmus Kyng, Yin Tat Lee, Richard Peng, Sushant Sachdeva, and Daniel A Spielman. Sparsified Cholesky and multigrid solvers for connection Laplacians. In Proceedings of the 48th Annual ACM SIGACT Symposium on Theory of Computing, pages 842–850. ACM, 2016. Available at http://arxiv.org/abs/1512.01892.

[KMP12] Ioannis Koutis, Gary L. Miller, and Richard Peng. A fast solver for a class of linear systems. Communications of the ACM, 55(10):99–107, October 2012. Available at

[KWZ20] Rasmus Kyng, Di Wang, and Peng Zhang. Packing LPs are hard to solve accurately, assuming linear equations are hard. In Shuchi Chawla, editor, Proceedings of the 2020 ACM-SIAM Symposium on Discrete Algorithms, SODA 2020, Salt Lake City, UT, USA, January 5-8, 2020, pages 279–296. SIAM, 2020. Available at: https://dl.acm.org/doi/pdf/10.5555/3381089.3381106.

[Kyn17] Rasmus Kyng. Approximate Gaussian Elimination. PhD thesis, Yale University, 2017. Available at: http://rasmuskyng.com/rjkyng-dissertation.pdf.

[KZ17] Rasmus Kyng and Peng Zhang. Hardness results for structured linear systems. In 58th IEEE Annual Symposium on Foundations of Computer Science, FOCS 2017, Berkeley, CA, USA, October 15-17, 2017, pages 684–695, 2017. Available at: https://arxiv.org/abs/1705.02944.

[Lan50] Cornelius Lanczos. An iteration method for the solution of the eigenvalue problem of linear differential and integral operators. Journal of Research of the National Bureau of Standards, 1950.

[LG14] Francois Le Gall. Powers of tensors and fast matrix multiplication. In Proceedings of the 39th International Symposium on Symbolic and Algebraic Computation, pages 296–303. ACM, 2014. Available at http://arxiv.org/abs/1401.7714.

[LLY11] Lin Lin, Jianfeng Lu, and Lexing Ying. Fast construction of hierarchical matrix representation from matrix–vector multiplication. Journal of Computational Physics, 230(10):4071–4087, 2011.

[LS92] George Labahn and Tamir Shalom. Inversion of Toeplitz matrices with only two standard equations. Linear Algebra and its Applications, 175:143–158, 1992.

[LV18] Kyle Luh and Van Vu. Sparse random matrices have simple spectrum. arXiv preprint arXiv:1802.03662, 2018.

[MMS18] Cameron Musco, Christopher Musco, and Aaron Sidford. Stability of the Lanczos method for matrix function approximation. In Proceedings of the Twenty-Ninth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2018, New Orleans, LA, USA, January 7-10, 2018, pages 1605–1624, 2018. Available at: https://arxiv.org/abs/1708.07788.

[NTV17] Hoi Nguyen, Terence Tao, and Van Vu. Random matrices: tail bounds for gaps between eigenvalues. Probability Theory and Related Fields, 167(3-4):777–816, 2017.

[Pan84] Victor Y. Pan. How to Multiply Matrices Faster, volume 179 of Lecture Notes in Computer Science.
https://cacm.acm.org/magazines/2012/10/155538-a- Springer, 1984.
fast-solver-for-a-class-of-linear-systems/fulltext. [PS85] Franco P Preparata and Michael Ian Shamos. Com-
[RV07] Mark Rudelson and Roman Vershynin. Sampling from large matrices: An approach through geometric functional analysis. J. ACM, 54(4):21, 2007.
[RV10] Mark Rudelson and Roman Vershynin. Non-asymptotic theory of random matrices: extreme singular values. In Proceedings of the International Congress of Mathematicians 2010 (ICM 2010), pages 1576–1602. World Scientific, 2010.
[Saa03] Y. Saad. Iterative Methods for Sparse Linear Systems. Society for Industrial and Applied Mathematics, Philadelphia, PA, USA, 2nd edition, 2003. Available at http://www-users.cs.umn.edu/~saad/toc.pdf.
[SST06] Arvind Sankar, Daniel A. Spielman, and Shang-Hua Teng. Smoothed analysis of the condition numbers and growth factors of matrices. SIAM Journal on Matrix Analysis and Applications, 28(2):446–476, 2006.
[ST11] D. Spielman and S. Teng. Spectral sparsification of graphs. SIAM Journal on Computing, 40(4):981–1025, 2011. Available at http://arxiv.org/abs/0808.4134.
[ST14] D. Spielman and S. Teng. Nearly linear time algorithms for preconditioning and solving symmetric, diagonally dominant linear systems. SIAM Journal on Matrix Analysis and Applications, 35(3):835–885, 2014. Available at http://arxiv.org/abs/cs/0607105.
[Sto05] Arne Storjohann. The shifted number system for fast linear algebra on integer matrices. Journal of Complexity, 21(4):609–650, 2005. Available at: https://cs.uwaterloo.ca/~astorjoh/shifted.pdf.
[Str69] Volker Strassen. Gaussian elimination is not optimal. Numerische Mathematik, 13(4):354–356, 1969.
[SV13] Sushant Sachdeva and Nisheeth K. Vishnoi. Faster algorithms via approximation theory. Foundations and Trends in Theoretical Computer Science, 9(2):125–210, 2013.
[Tro15] Joel A. Tropp. An introduction to matrix concentration inequalities. Found. Trends Mach. Learn., 8(1-2):1–230, 2015.
[Tur48] Alan M. Turing. Rounding-off errors in matrix processes. The Quarterly Journal of Mechanics and Applied Mathematics, 1(1):287–308, 1948.
[TV10] Terence Tao and Van H. Vu. Smooth analysis of the condition number and the least singular value. Math. Comput., 79(272):2333–2352, 2010. Available at: https://arxiv.org/abs/0805.3167.
[Vai89] P. M. Vaidya. Speeding-up linear programming using fast matrix multiplication. In 30th Annual Symposium on Foundations of Computer Science, pages 332–337, Oct 1989.
[Wil12] Virginia Vassilevska Williams. Multiplying matrices faster than Coppersmith–Winograd. In Proceedings of the Forty-Fourth Annual ACM Symposium on Theory of Computing, pages 887–898. ACM, 2012.
[XCGL10] Jianlin Xia, Shivkumar Chandrasekaran, Ming Gu, and Xiaoye S. Li. Fast algorithms for hierarchically semiseparable matrices. Numerical Linear Algebra with Applications, 17(6):953–976, 2010.
[XXCB14] Yuanzhe Xi, Jianlin Xia, Stephen Cauley, and Venkataramanan Balakrishnan. Superfast and stable structured solvers for Toeplitz least squares via randomized sampling. SIAM Journal on Matrix Analysis and Applications, 35(1):44–72, 2014.
[XXG12] Jianlin Xia, Yuanzhe Xi, and Ming Gu. A superfast structured solver for Toeplitz linear systems via randomized sampling. SIAM Journal on Matrix Analysis and Applications, 33(3):837–858, 2012.
[Ye11] Yinyu Ye. Interior point algorithms: Theory and analysis, volume 44. John Wiley & Sons, 2011.
[Zha18] Peng Zhang. Hardness and Tractability For Structured Numerical Problems. PhD thesis, Georgia Institute of Technology, 2018. Available at: https://drive.google.com/file/d/1KEZNzna-Y7y6rDERKuBFvFuU-hfHUMR3/view.