Download as pdf or txt
Download as pdf or txt
You are on page 1of 18

Solving Sparse Linear Systems Faster than Matrix Multiplication

Richard Peng* Santosh Vempala„


Downloaded 06/27/21 to 143.215.38.55. Redistribution subject to SIAM license or copyright; see https://epubs.siam.org/page/terms

Abstract can be reduced to matrix multiplication via divide-and-


Can linear systems be solved faster than matrix multipli- conquer, and this reduction was shown to be stable
cation? While there has been remarkable progress for the when the word size for representing numbers1 is in-
creased by a factor of Oplog nq [DDHK07]. The cur-
special cases of graph structured linear systems, in the gen-

eral setting, the bit complexity of solving an n n linear
system Ax  p q
b is Õ nω , where ω 2.372864 is the ma- rent best runtime of Opnω q with ω 2.372864 [LG14]
trix multiplication exponent. Improving on this has been an follows a long line of work on faster matrix multipli-
open problem even for sparse linear systems with poly n pq cation algorithms [Str69, Pan84, CW87, Wil12, LG14]
condition number.
In this paper, we present an algorithm that solves linear and is also the current best running time for solving
systems in sparse matrices asymptotically faster than matrix Ax  b: When the input matrix/vector are integers,
multiplication for any ω ¡
2. This speedup holds for any
input matrix A with o nω1 log κ A
p { p p qqq matrix multiplication based algorithms can obtain the
exact rational value solution using Opnω q word opera-
non-zeros, where
p q pq
κ A is the condition number of A. For poly n -conditioned
pq
matrices with Õ n nonzeros, and the current value of ω, tions [Dix82, Sto05].
the bit complexity of our algorithm to solve to within any Methods for Matrix inversion or factorization are
{ pq p
1 poly n error is O n2.331645 . q often referred to as direct methods in the linear sys-
Our algorithm can be viewed as an efficient, randomized tems literature [DRSL16]. This is in contrast to it-
implementation of the block Krylov method via recursive erative methods, which gradually converge to the so-
low displacement rank factorizations. It is inspired by the lution. Iterative methods have little space overhead,
algorithm of [Eberly et al. ISSAC ‘06 ‘07] for inverting and therefore are widely used for solving large, sparse,
matrices over finite fields. In our analysis of numerical linear systems that arise in scientific computing. An-
stability, we develop matrix anti-concentration techniques other reason for their popularity is that they are natu-
to bound the smallest eigenvalue and the smallest gap in rally suited to producing approximate solutions of de-
eigenvalues of semi-random matrices. sired accuracy in floating point arithmetic, the de facto
method for representing real numbers. Perhaps the
1 Introduction most famous iterative method is the Conjugate Gra-
Solving a linear system Ax  b is a basic algorithmic dient (CG) / Lanczos algorithm [HS52, Lan50]. It was
problem with direct applications to scientific comput- introduced as an Opn  nnz q time algorithm under ex-
ing, engineering, and physics, and is at the core of al- act arithmetic, where nnz is the number of non-zeros
gorithms for many other problems, including optimiza- in the matrix. However, this bound only holds under
tion [Ye11], data science [BHK20], and computational the Real RAM model where the words have with un-
geometry [EH10]. It has enjoyed an array of elegant ap- bounded precision [PS85, BSS 89]. When taking bit
proaches, from Cramer’s rule and Gaussian elimination sizes into account, it incurs an additional factor of n.
to numerically stable iterative methods to more modern Despite much progress in iterative techniques in the in-
randomized variants based on random sampling [ST11, tervening decades, obtaining analogous gains over ma-
KMP12] and sketching [DMM08, Woo14]. Despite much trix multiplication in the presence of round-off errors
recent progress on faster solvers for graph-structured has remained an open question.
linear systems [Vai89, Gre96, ST14, KMP12, Kyn17],
1 We will be measuring bit-complexity under fixed-point arith-
progress on the general case has been elusive.
Most of the work in obtaining better running time metic. Here the machine word size is on the order of the maximum
number of digits of precision in A, and the total cost is measured
bounds for linear systems solvers has focused on effi- by the number of word operations. The need to account for bit-
ciently computing the inverse of A, or some factoriza- complexity of the numbers naturally led to the notion of condition
tion of it. Such operations are in turn closely related number [Tur48, Blu04]. The logarithm of the condition number
to the cost of matrix multiplication. Matrix inversion measures the additional number of words needed to store A1
(and thus A1 b) compared to A. In particular, matrices with
poly pnq condition number can be stored with a constant factor
* Georgia Tech rpeng@cc.gatech.edu
overhead in precision, and are numerically stable under standard
„ Georgia Tech vempala@gatech.edu
floating point number representations.

Copyright © 2021 by SIAM


504 Unauthorized reproduction of this article is prohibited
The convergence and stability of iterative methods Theorem 1.1. Given a matrix A with max dimension
typically depends on some condition number of the in- n, nnz pAq non-zeros (whose values fit into a single
put. When all intermediate steps are carried out to word), along with a parameter κpAq such that κpAq ¥
precision close to the condition number of A, the run- σmax pAq{σmin pAq, along with a vector b and error re-
ning time bounds of the conjugate gradient algorithm, quirement , we can compute, under fixed point arith-
as well as other currently known iterative methods, de- metic, in time
pend polynomially on the condition number of the input  ! )
Downloaded 06/27/21 to 143.215.38.55. Redistribution subject to SIAM license or copyright; see https://epubs.siam.org/page/terms

ω 2 5ω 4
matrix A. Formally, the condition number of a sym- O max nnz pAq ω1 n2 , n ω 1 log2 pκ{q
metric matrix A, κpAq, is the ratio between the max-
imum and minimum eigenvalues of A. Here the best a vector x such that
known rate of convergence when all intermediate opera-
tions are restricted ato bit-complexity OplogpκpAqq is to }Ax  ΠA b}22 ¤  }ΠA b}22 ,
an error of  in Op κpAq logp1{qq iterations. This is
known to be tight if one restricts to matrix-vector mul- where c is a fixed constant and ΠA is the projection
tiplications in the intermediate steps [SV13, MMS18]. operator onto the column space of A.
This means for moderately conditioned (e.g. with κ   
Note that }ΠA b}2  AT bpAT Aq1 , and when A is
poly pnq), sparse, systems , the best runtime bounds
are still via direct methods, which are stable when square and full rank, it is just }b}2 .
Oplogp1{κqq words of precision are maintained in inter- The cross-over point for the two bounds is at
3pω 1q
mediate steps [DDHK07]. nnz pAq  n ω 1 . In particular, for the sparse case
Many of the algorithms used in scientific computing with nnz pAq  Opnq, and the current best ω ¤
for solving linear systems involving large, space, matri- 2.372864 [LG14], we get an exponent of
ces are based on combining direct and iterative meth- " *
ods: we will briefly discuss this perspectives in Sec- ω  2 5ω  4
ω1 ω 1
max 2 ,
tion 1.3. From the asymptotic complexity perspective,
the practical successes of many such methods naturally maxt2.271595, 2.331645u  2.331645.
leads to the question of whether a one can provably do
better than the Opmintnω , nnz  κpAquq time corre- As n ¤ nnz, this also translates to a running time
5ω 4
of Opnnz ω 1 q, which as 5ω 4 pω2q2
ω 1  ω  ω 1 , refutes
sponding to the faster of direct or iterative methods.
k for constant values of k and any value of ω ¡ 2.
Somewhat surprisingly, despite the central role of this SLTHω
question in scientific computing and numerical analysis, We can parameterize the asymptotic gains over
as well as extensive studies of linear systems solvers, matrix multiplication for moderately sparse instances.
progress on this question has been elusive. The contin- Here we use the Or pq notation to hide lower-order terms,
ued lack of progress on this question has led to its use specifically Opf pnqq denotes Opf pnq  logc pf pnqqq for
r
as a hardness assumption for showing conditional lower some absolute constant c.
bounds for numerical primitives such as linear elasticity
problems [KZ17] and positive linear programs [KWZ20]. Corollary 1.1. For any matrix A with dimension at
One formalization of such hardness is the Sparse Lin- most n, Opnω1θ q non-zeros, and condition number
ear Equation Time Hypothesis (SLTH) from [KWZ20]: nOp1q , a linear system in A can be solved to accuracy
5ω 4 θ pω 2q
SLTHγk denotes the assumption that a sparse linear sys-
nOp1q in time O r pmaxtn ω 1 , nω ω1 uq.
tem with κ ¤ nnz pAqk cannot be solved in time faster
than nnz pAqγ to within relative error   n10k . Here Here the cross-over point happens at θ 
pω1qpω2q . Also, because
5ω 4 pω2q2
ω 1  ω  ω 1 , we can
improving over the smaller running time of both direct
also infer that for any 0 θ ¤ ω  2 and any ω ¡ 2, the
and iterative methods can be succinctly encapsulated as ω 1
mint1 k{2,ω u 2
refuting SLTHk . runtime is opnω q, or asymptotically faster than matrix
In this paper, we provide a faster algorithm for multiplication.
solving sparse linear systems. Our formal result is the
following (we use the form defined in [KWZ20] [Linear 1.1 Idea At a high level, our algorithm follows the
Equation Approximation Problem, LEA]). block Krylov space method (see e.g. Chapter 6.12 of
Saad [Saa03]). This method is a multi-vector extension
2 The hardness results in [KWZ20] were based on SLTH1.99 of the conjugate gradient / Lanczos method, which in
1.5
under the Real RAM model in part due to the uncertain status the single-vector setting is known to be problematic
of conjugate gradient in different models of computation. under round-off errors both in theory [MMS18] and in

Copyright © 2021 by SIAM


505 Unauthorized reproduction of this article is prohibited
practice [GO89]. Our algorithm starts with a set of s convolution algorithms. The performance gains of the
initial vectors, B P <ns , and forms a column space Eberly et al. algorithms [EGG 06, EGG 07] can be
by multiplying these vectors by A repeatedly, m times. viewed as of similar nature, albeit in the more diffi-
Formally, the block Krylov space matrix is cult direction of solving linear systems. Specifically,
  they utilize algorithms for the Padé problem of com-
K  B AB A2 B . . . Am1 B . puting a polynomial from the result of its convolu-
tion [XB90, BL94]. Over finite fields, or under exact
Downloaded 06/27/21 to 143.215.38.55. Redistribution subject to SIAM license or copyright; see https://epubs.siam.org/page/terms

The core idea of Krylov space methods is to efficiently


arithmetic, such algorithms for matrix Padé problems
take Opm log mq block operations [BL94], for a total of
orthogonalize this column space. For this space to be
Ops mq operations..
spanning, block Krylov space methods typically choose r ω
s and m so that sm  n.
The conjugate gradient algorithm can be viewed The overall time complexity follows from two op-
as an efficient implementation of the case s  1, posing goals:
m  n, and B is set to b, the RHS of the input 1. Quickly generate the Krylov space: repeated mul-
linear system. The block case with larger values of s tiplication by A allows us to generate Ai B using
was studied by Eberly, Giesbrecht, Giorgi, Storjohann, Opms  nnz q  Opn  nnz q arithmetic operations.
and Villard [EGG 06, EGG 07] over finite fields, and Choosing a sparse B then allows us to compute
they gave an Opn2.28 q time algorithm for computing the B T Ai B in Opn  sq arithmetic operations, for a to-
inverse of a sparse matrix over a finite field. tal overhead of Opn2 q  Opn  nnz q.
Our algorithm also leverages the top-level insight of
2. Quickly invert the Hankel matrix. Each operation
on an s-by-s block takes Opsω q time. Under the
the Eberly et al. results: the Gram matrix of the Krylov
space matrix (which can be used inter-changeably for
optimistic assumption of O r pmq block operations,
solving linear systems) is a block Hankel matrix. That
r pm  s q.
is, if we view the Gram matrix pAK qT pAK q as an m-
ω
the total is O
by-m matrix containing s-by-s sized blocks, then all the Under these assumptions, and the requirement of n 
blocks along each anti-diagonal are the same: ms, the total cost becomes about Opn  nnz m
s ω
q , which is at most O p n  nnz q as long as m ¡
pAK qT pAK q n
ω 2
ω 1 . However, this runtime complexity is over finite
 
B T A2 B B T A3 B . . . B T Am 1 B fields, where numerical stability is not an issue, instead
 B T A3 B T 4 T m 2 
  . . . B A
...
B . .
...
. B A
...
B 

of over reals under round-off errors, where one must
contend with numerical errors without blowing up the
B T Am 1 B B T Am 2 B . . . B T A2m B bit complexity. This is a formidable challenge; indeed,
with exact arithmetic, the CG method takes time Opn 
Formally, the s-by-s inner product matrix formed from nnz q, but this is misleading since the computation is
Ai B and Aj B is B T Ai j B, and only depends on i j. effective only the word sizes increase by a factor of
So instead of m2 blocks each of size s  s, we are able n (to about n log κ words), which leads to an overall
to represent a n-by-n matrix with about m blocks. complexity of Opn2  nnz  log κq.
Operations involving these m blocks of the Hankel
matrix can be handled using O r pmq block operations.
1.2 Our Contributions Our algorithm can be
This is perhaps easiest seen for computing matrix-vector viewed as the numerical generalization of the algorithms
products using K. If we use tiu to denote the ith block from [EGG 06, EGG 07]. We work with real num-
of the Hankel matrix, that is bers of bounded precision, instead of entries over a finite
Hti,j u  M pi j q
field. The core of our approach can be summarized as:
The block Krylov space method together with
for a sequence of matrices M , we get that the ith block fast Hankel solvers can be made numerically
of the product Hx can be written in block-form as stable using O r pm logpκqq words of precision.
¸ ¸
pHxqtiu  Hti,j u xtj u  M pi j q xt j u .
Doing so, on the other hand, requires developing
j j tools for two topics that have been extensively studied
Note this is precisely the convolution of (a sub-interval) in mathematics, but separately.
of M and x, with shifts indicated by i. Therefore, in 1. Obtain low numerical cost solvers for block Han-
the forward matrix-vector multiplication direction, a kel/Toeplitz matrices. Many of the prior algo-
speedup by a factor of about m is possible with fast rithms rely on algebraic identities that do not

Copyright © 2021 by SIAM


506 Unauthorized reproduction of this article is prohibited
generalize to the block setting, and are often n
(experimentally) numerically unstable [GTVDV96,
Gra06]. 1
2. Develop matrix anti-concentration bounds for an- α 1
alyzing the word lengths of inverses of random α

.. .
Krylov spaces. Such bounds upper bound the prob-

.
..
Downloaded 06/27/21 to 143.215.38.55. Redistribution subject to SIAM license or copyright; see https://epubs.siam.org/page/terms

ability of random matrices being in some set of 1


small measure, which in our case is the set of α 1
nearly singular matrices. Previously, they were
known assuming the matrix entries are indepen- α n
dent [SST06, TV10], while Krylov spaces have cor- 2 α
related columns. 2

.. .
Furthermore, due to the shortcomings of the matrix

.
..
anti-concentration bounds, we modify the solver algo- α
rithm so that it uses a more limited version of the block- 2 α
Krylov space that fall under the cases that could be
analyzed.
Before we describe the difficulties and new tools Figure 1: The difference between matrix anti-
needed, we first provide some intuition on why a factor concentration over finite fields and reals: a matrix that
m increase in word lengths may be the right answer is full rank for all α  0, but is always ill conditioned.
by upper-bounding the magnitudes of entries in a m-
step Krylov space. The maximum magnitude of Am b
is bounded by the max magnitude of A to the power analog of this is more difficult: the Krylov space matrix
of m, times a factor corresponding to the number of K is asymmetric, even for a symmetric matrix A. It is
summands in the matrix product: much easier for an asymmetric matrix with correlated
entries to be close to singular.
} Am b} 8 ¤ p n ~ A~ 8 q m } b} 8 . Consider for example a two-banded, two-block ma-
So the largest numbers in K (as well as AK) can be trix with all diagonal entries set to the same random
O p mq
bounded by pnκq , or Opm log κq words in front of variable α (see Figure 1):
the decimal point under the assumption of κ ¡ n. $
Should such a bound of Opm log κq hold for all
'
'
'
1 if i  j and j ¤ n{2,
'
&α if i  j 1 and j ¤ n{2,
'
'
numbers that arise, including the matrix inversions,
and the matrix B is sparse with Opnq entries, the Aij  α if i  j 1 and n{2 j,
'
2 if i  j 1 and n{2 j,
cost of computing the block-Krylov matrices becomes '
'
'
Opm log κ  ms  nnz q, while the cost of the matrix
'
'
%
inversion portion encounters an overhead of Opm log κq,
0 otherwise.

for a total of O r pm2 sω log κq. In the sparse case of In the exact case, this matrix is full rank unless
nnz  Opnq, and n  ms, this becomes: α  0, even over finite fields. On the other hand, its
2 2 ω
 minimum singular value is close to 0 for all values of α
O n m log κ m s log κ
 ω
because:
(1.1)  O n2 m log κ mnω2 log κ . Observation 1.1. The minimum singular value of
a matrix with 1s on the diagonal, α on the en-
Due to the gap between n2 and nω , setting m appropri- tries immediately below the diagonal, and 0 every-
ately gives improvement over nω when log κ nop1q . where else is at most |α|pn1q , due to the test vector
However, the magnitude of an entry in the inverse r1; α; α2 ; . . . ; pαqn1 s.
depends on the smallest magnitude, or in the matrix
case, its minimum singular value. Bounding and prop- Specifically, in the top-left block, as long as |α| ¡ 3{2,
agating the min singular value, which intuitively cor- the top left block has minimum singular value at most
responds to how close a matrix is to being degenerate, p2{3qn1 . On the other hand, rescaling the bottom-
represents our main challenge. In exact/finite fields set- right block by 1{α to get 1s on the diagonal gives 2{α
tings, non-degeneracies are certified via the Schwartz- on the off-diagonal. So as long as |α| 3{2, this value is
Zippel Lemma about polynomial roots. The numerical at least 4{3, which in turn implies a minimum singular

Copyright © 2021 by SIAM


507 Unauthorized reproduction of this article is prohibited
value of at most p3{4qn1 in the bottom right block. of bounds on eigenvalue separation of random Gaus-
This means no matter what value α is set to, this matrix sians [NTV17] as well as min eigenvalue of random
will always have a singular value that’s exponentially sparse matrices [LV18]. This separation then ensures
close to 0. Furthermore, the Gram matrix of this matrix that the powers of A, A1 , A2 , . . . Am , are sufficiently dis-
also gives such a counter example to symmetric matrices tinguishable from each other. Such considerations also
with (non-linearly) correlated entries. Previous works come up in the smoothed analysis of numerical algo-
on analyzing condition numbers of asymmetric matrices rithms [SST06].
Downloaded 06/27/21 to 143.215.38.55. Redistribution subject to SIAM license or copyright; see https://epubs.siam.org/page/terms

also encounter similar difficulties: a more detailed The randomness of the Krylov matrix induced by
discussion of it can be found in Section 7 of Sankar et the initial set of random vectors B is more difficult to
al. [SST06]. analyze: each column of B affects m columns of the
In order to bound the bit complexity of all interme- overall Krylov space matrix. In contrast, all existing
diate steps of the block Krylov algorithm by O r pmq log κ, analyses of lower bounds of singular values of possibly
we devise a more numerically stable algorithm for solv- asymmetric random matrices [SST06, TV10] rely on the
ing block Hankel matrices, as well as provide a new per- randomness in the columns of matrices being indepen-
turbation scheme to quickly generate a well-conditioned dent. The dependence between columns necessitates an-
block Krylov space. Central to both of our key compo- alyzing singular values of random linear combinations
nents is the close connection between condition number of matrices, which we handle by adapting -net based
and bit complexity bounds. proofs of anti-concentration bounds. Here we encounter
First, we give a more numerically stable solver for an additional challenge in bounding the minimum sin-
block Hankel/Toeplitz matrices. Fast solvers for Han- gular value of the block Krylov matrix. We resolve this
kel (and closely related Toeplitz) matrices have been ex- issue algorithmically: instead of picking a Krylov space
tensively studied in numerical analysis, with several re- that spans the entire <n , we stop things short by picking
cent developments on more stable algorithms [XXG12]. ms  n  O r pmq This set of extra columns significantly
However, the notion of numerical stability studied in simplify the proof of singular value lower bounds. This
these algorithms is the more practical variant where is similar in spirit to the analysis of minimum singular
the number of bits of precision is fixed. As a re- values of random matrices, which is significantly easier
sult, the asymptotic behavior of the stable algorithm for non-square matrices [RV10]. In the algorithm, the
from [XXG12] is quadratic in the number of digits in remaining columns are treated as a separate block that
the condition number, which in our case would trans- we reduce to via a Schur complement at the very end
r pm2 q (i.e., the overall cost
late to a prohibitive cost of O of the block elimination algorithm. Since the block is
ω
would be higher than n ). small, so is its overhead on the running time.
Instead, we combine developments in recur-
sive block Gausssian elimination [DDHK07, KLP 16, 1.3 History and Related Work Our algorithm
CKK 18] with the low displacement rank representa- has close connections with multiple lines of research
tion of Hankel/Toeplitz matrices [KKM79, BA80]. Such on more efficient solvers for sparse linear systems.
representations allow us to implicitly express both the This topic has been extensively studied not only in
Hankel matrix and its inverse by displaced versions of computer science, but also in applied mathematics
rank 2s matrices. This means the intermediate sizes of and engineering. For example, in the Editors of the
instances arising from recursion is Opsq times the di- Society of Industrial and Applied Mathematics News’
mension, for a total size of Opn log nq, giving a total ‘top 10 algorithms of the 20th century’, three of them
of Or pnsω1 q arithmetic operations involving words of (Krylov space methods, matrix decompositions, and
size Or pmq. We provide a rigorous analysis of the accu- QR factorizations) are directly related to linear systems
mulation of round-off errors similar to the analysis of solvers [Cip00].
recursive matrix multiplication based matrix inversion At a high level, our algorithm is a hybrid lin-
from [DDHK07]. ear systems solver. It combines iterative methods,
Motivated by this close connection with the con- namely block Krylov space methods, with direct meth-
dition number of Hankel matrices, we then try to ini- ods that factorize the resulting Gram matrix of the
tialize with Krylov spaces of low condition number. Krylov space. Hybrid methods have their origins in the
Here we show that a sufficiently small perturbation incomplete Cholesky method for speeding up elimina-
suffices for producing a well conditioned overall ma- tion/factorization based direct solvers. A main goal of
trix. In fact, the first step of our proof, that a small these methods is to reduce the Ωpn2 q space needed to
sparse random perturbation to A guarantees good sep- represent matrix factorizations / inverses, which is even
arations between its eigenvalues is a direct combination more problematic than time when handling large sparse

Copyright © 2021 by SIAM


508 Unauthorized reproduction of this article is prohibited
matrices. Such reductions can occur in two ways: either matrices [BHV08] and directed graphs / irreversible
by directly dropping entries from the (intermediate) ma- Markov chains [CKP 17]. However, recent works have
trices, or by providing more succinct representations of also shown that many classes of structures involving
these matrices using additional structures. more than two variables are complete for general linear
The main structure of our algorithm is based on systems [Zha18]. Nonetheless, the prevalence of approx-
the latter line of work on solvers for structured matri- imation errors in such algorithms led to the development
ces. Such systems arise from physical processes where of new ways of bounding numerical/round-off errors in
Downloaded 06/27/21 to 143.215.38.55. Redistribution subject to SIAM license or copyright; see https://epubs.siam.org/page/terms

the interactions between objects have invariances (e.g. algorithms that are critical to our elimination routine
either by time or space differences). Examples of such for block-Hankel matrices.
structure include circulant matrices [Gra06], Toeplitz Key to recent developments in combinatorial pre-
/ Hankel matrices [KKM79, BA80, XXG12, XXCB14], conditioning is matrix concentration [RV07, Tro15].
and distances from n-body simulations [CRW93]. Many Such bounds provide guarantees for (relative) eigenval-
such algorithms require exact preservation of the struc- ues of random sums of matrices. For generating pre-
ture in intermediate steps. As a result, many of these conditioners, such randomness arise from whether each
works develop algorithms over finite fields [BA80, BL94, element is kept, and a small condition number (which
BJMS17]. in turn implies a small number of outer iterations usign
More recently, there has been work on developing the preconditioners) corresponds to a small deviation
more numerically stable variants of these algorithms for between the original and sampled matrices. In con-
structured matrices, or more generally, matrices that trast, we introduce randomness in order to obtain block
are numerically close to being structured [XCGL10, Krylov spaces whose minimum eigen-value is large. As
LLY11, XXG12, XXCB14]. However, these results only a result, the matrix tool we need is anti-concentration,
explicitly discussed the entry-wise Hankel/Toeplitz case which somewhat surprisingly is far less studied. Previ-
(which corresponds to s  1). Furthermore, because ous works on it are mostly related by similar problems
they rely on domain-decomposition techniques similar from numerical precision [SST06, TV10], and mostly ad-
to fast multiple methods, they produce one bit of dress situations where the entries in the resulting matrix
precision per each outer iteration loop. As the Krylov are independent. Our bound on the min singular value
space matrix has condition number exppΩpmqq, such of the random Krylov space can yield a crude bound
methods would lead to another factor of m in the solve for a sum of rectangluar random matrices, but we be-
cost when directly invoked. lieve much better matrix anti-concentration bounds are
Instead, our techniques for handling and bounding possible.
numerical errors are more closely related to recent devel- Due to space constraints, we will only describe the
opments in provably efficient sparse Cholesky factoriza- overall algorithm in Section 2, and give an outline of its
tions [KLP 16, KS16, Kyn17, CKK 18]. These meth- analysis in Section 3. Section 3.2 contains a breakdown
ods generated efficient preconditioners using only the of the main components of the analysis, whose details
condition of intermediate steps of Gaussian eliminat- and proofs can be found in the arXiv version [PV20].
nion, known as Schur complements, having small repre- Some research directions raised by this work, including
sentations. They avoided the explicit generation of the possible improvements and extensions are discussed in
dense representations of Schur complements by treat- Section 4.
ment them as operators, and implicitly applied random-
ized tools to directly sample/sketch the final succinct 2 Algorithm
representations, which have much smaller algorithmic We describe the algorithm, as well as the running times
costs. of its main components in this section. To simplify
On the other hand, previous works on spare discussion, we assume the input matrix A is symmetric,
Choleskfy factorizations required the input matrix to and has poly pnq condition number. If it is asymmetric
be decomposable into a sum of simple elements, of- (but invertible), we implicitly apply the algorithm to
ten through additional combinatorial structure of the AT A, using the identity A1  pAT Aq1 AT derived
matrices. In particular, this line of work on com- from pAT Aq1  A1 AT . Also, recall from the
binatorial preconditioning was initiated through a fo- discussion after Theorem 1.1 that we use O r pq to hide
cus on graph Laplacians, which are built from 2-by- lower order terms in order to simplify runtimes.
2 matrix blocks corresponding to edges of undirected Before giving details on our algorithm, we first dis-
graphs [Vai89, Gre96, ST14, KMP12]. Since then, there cuss what constitutes a linear systems solver algorithm,
has been substantial generalizations to the structures specifically the equivalence between many such algo-
amenable to such approaches, notably to finite element rithms and linear operators.

Copyright © 2021 by SIAM


509 Unauthorized reproduction of this article is prohibited
For an algorithm Alg that takes a matrix B as
input, we say that Alg is linear if there is a matrix BlockKrylov( MatVecA px, δ q: symmetric matrix
ZAlg such that for any input B, we have given as implicit matrix vector muliplication
access, αA : eigenvalue range/separation bounds
Alg pB q  ZAlg . for A that also doubles as error threshold, m:
Krylov step count, )
Downloaded 06/27/21 to 143.215.38.55. Redistribution subject to SIAM license or copyright; see https://epubs.siam.org/page/terms

In this section, in particular in the pseudocode in


1. (FORM KRYLOV SPACE)
Algorithm 2, we use the name of the procedure,
SolveA pb, δ q, interchangeably with the operator correp- (a) Set s Ð tn{mu  Opmq,
sonding to a linear algorithm that solves a system in A, h Ð Opm2 logp1{αA qq. Let GS be an n  s
on vector b, to error δ ¡ 0. In the more formal analysis, random matrix with GSij set to N p0, 1q with
we will denote such corresponding linear operators us- probability nh , and 0 otherwise.
ing the symbol Z, with subscripts corresponding to the
routine if appropriate. (b) (Implicitly) compute the block Krylov space
 
 Am  1 G S
This operator/matrix based analysis of algorithms
K GS AGS A2 GS ... ,
was first introduced in the analysis of recursive Cheby-
shev iteration by Spielman and Teng [ST14], with cred-
2. (SPARSE INVERSE) Use fast solvers for block
its to the technique also attributed to Rohklin. It the
Hankel matrices to obtain a solver for the matrix:
advantage of simplifying analyses of multiple iterations
of such algorithms, as we can directly measure Frobe- M Ð pAK qT pAK q ,
nius norm differences between such operators and the
exact ones that they approximate. and in turn a solve to arbitrary error which we
Under this correspondence, the goal of producing denote SolveM p, q.
an algorithm that solves Ax  b for any b as input
becomes equivalent to producing a linear operator ZA 3. (PAD and SOLVE)
that approximates A1 , and then running it on the
(a) Let r  n  ms denote the number of
remaining columns. Generate a n  r dense
input b. For convenience, we also let the solver take
as input a matrix instead of a vector, in which case the
Gaussian matrix G, use it to complete the
basis as: Q  rK |Gs.
output is the result of solves against each of the columns
of the input matrix.
The high-level description of our algorithm is in Fig- (b) Compute the Schur complement of
ure 2. To keep our algorithms as linear operators, we pAQqT AQ onto its last r  n  ms en-
will ensure that the only approximate steps are from tries (the ones corresponding to the columns
inverting matrices (where condition numbers naturally of G) via the operation
lead to matrix approximation errors), and in forming
operators using fast convolution. We will specify ex- pAGqT AG  pAGqT  AK

plicitly in our algorithms when such round-off errors
occur.  SolveM pAK qT AG, αA10m
Some of the steps of the algorithm require care
for, efficiency, as well as tracking the number of words and invert this r-by-r matrix.
needed to represent the numbers. We assume the (c) Use the inverse of this Schur complement, as
bounds on bit-complexity in the analysis (Section 3) well as SolveM p, q to obtain a solver for
below, which is O r pmq when κ  poly pnq, and use this in QT Q, SolveQT Q p, q.
4. (SOLVE and UNRAVEL) Return the operator Q 
the brief description of costs in the outline of the steps
SolvepAQqT AQ ppAQqT x, αA q as an approximate
below. 10m
We start by perturbing the input matrix, resulting
solver for A.
in a symmetric positive definite matrix where all eigen-
values are separated by αA . Then we explicitly form
a Krylov matrix from sparse Random Gaussians: For Figure 2: Pseudocode for block Krylov space algorithm:
any vector u, we can compute Ai u from Ai1 u via a Solve p, q are operators corresponding to linear system
single matrix-vector multiplication in A. So computing solving algorithms whose formalization we discuss at the
each column of K requires Opnnz pAqq operations, each start of this section.
involving a length n vector with words of length O r pmq.

Copyright © 2021 by SIAM


510 Unauthorized reproduction of this article is prohibited
s s s s Of course, it does not suffice to solve these m s-by-s
blocks independently. Instead, the full algorithm, as
well as the SolveM operator, is built from efficiently
convolving such s-by-s blocks with matrices using Fast
Fourier Transforms. Such ideas can be traced back to
the development of super-fast solvers for (entry-wise)
Hankel/Toeplitz matrices [BA80, LS92, XXCB14].
Downloaded 06/27/21 to 143.215.38.55. Redistribution subject to SIAM license or copyright; see https://epubs.siam.org/page/terms

Choosing s and m so that n  sm would then


give the overal running time, assuming that we
can bound the minimum singular value of K
by exppO r pmqq. This is a major shortcoming of our
. . . Am1 GS
n GS AGS A2 GS analysis: we can only prove such a bound when nsm ¥
Ωpmq. Its underlying cause is that rectangular semi-
random matrices can be analyzed using -nets, and thus
are significantly easier to analyze than square matrices.
This means we can only use m and s such that
n  ms  Θpmq, and we need to pad K with n  ms
columns to form a full rank, invertible, matrix. To this
end we add Θpmq dense Gaussian columns to K to form
Q, and solve the system AQ, and its associated Gram
matrix pAQqT pAQq instead. These matrices are shown
in Figure 4.
Because these additional columns are entry-wise
Figure 3: Randomized m-step Krylov Space Matrix i.i.d, its minimum singular value can be analyzed using
with n-by-s sparse Gaussian GS as starter existing tools [SST06, TV10], namely lower bounding
the dot product of a random vector against any normal
So we get the matrix K, as well as AK, in time vector. Thus, we can lower bound the minimum singular
value of Q, and in turn AQ, by exppO r pmqq as well.
r pnnz pAq  n  mq .
O This bound in turn translates to the minimum
To obtain a solver for AK, we instead solve its eigenvalue of the Gram matrix of AQ, pAQqT pAQq.
Gram matrix pAK qT pAK q. Each block of K T K has Partitioning its entries by those from K and G gives
the form pGS qT Ai GS for some 2 ¤ i ¤ 2m, and can be four blocks: one psmq-by-psmq block corresponding to
computed by multiplying pGS qT and Ai GS . As Ai GS pAK qT pAK q, one Θpmq-by-Θpmq block corresponding
is an n-by-s matrix, each non-zero in GS leads to a to pAGqT pAGq, and then the cross terms. To solve this
cost of Opsq operations involving words of length O r pmq. matrix, we apply block-Gaussian elimination, or equiv-
Then because we chose GS to have O r pm3 q non-zeros per alently, form the Schur complement onto the Θpmq-by-
column, the total number of non-zeros in GS is about Θpmq corresponding to the columns in AG.
Or p s  m3 q  O
r pnm2 q. This leads to a total cost (across To compute this Schur complement, it suffices to
the m values of i) of: solve the top-left block (corresponding to pAK qT pAK q)
 against every column in the cross term. As there are
r n2 m3 .
O at most Θpmq s columns, this solve cost comes out
to less than O r psω mq as well. We are then left with a
Θpmq-by-Θpmq matrix, whose solve cost is a lower order
The key step is then Step 2: a block version of the
Conjugate Gradient method. It will be implemented us-
ing a recursive data structure based on the notion of dis- term.
placement rank [KKM79, BA80]. To get a sense of why So the final solver operator costs

a facter algorithm may be possible, note that there are r nnz pAq  nm
O n2 m 3 nω m2ω
only Opmq distinct blocks in the matrix pAK qT pAK q.
So a natural hope is to invert these blocks by them- which leads to the final running time by choosing m to
selves: the cost of (stable) matrix inversion [DDH07], balance the terms. This bound falls short of the ideal
times the Or pmq numerical word complexity, would then case given in Equation 1.1 mainly due to the need for a
give a total of denser B to the well-conditionedness of the Krylov space
    matrix. Instead of Opnq non-zeros total, or about Opmq
r m2 sω  O  r nω m ω  2 .
ω
O r m2 n O per column, we need poly pmq non-zero variables per
m

Copyright © 2021 by SIAM


511 Unauthorized reproduction of this article is prohibited
column to ensure the an exppOpmqq condition number
of the block Krylov space matrix K. This in turn leads
to a total cost of Opn  nnz  poly pmqq for computing the
blocks of the Hankel matrix, and a worse trade off when
ω
summed against the mnω2 term.

s s s Θp m q 3 Outline of Analysis
Downloaded 06/27/21 to 143.215.38.55. Redistribution subject to SIAM license or copyright; see https://epubs.siam.org/page/terms

In this section we outline our analysis of the algorithm


through formal theorem statements. We start by for-
malizing our tracking of convergence, and the tracking
of errors and roundoff errors.

3.1 Preliminaries We will use capital letters for


matrices, lower case letters for vectors and scalars. All
n AGS A2 GS ... Am GS AG subscripts are for indexing into entries of matrices and
vectors, and superscripts are for indexing into entries of
a sequence.
Norms and Singular Values. Our convergence
bounds are all in terms of the Euclidean, or `2 norms.
For a length
b° n vector x, the norm of x is given by
} x}  1¤i¤n xi . Similarly, for a matrix M , the
2

norm of its entries treated as a vector is known as the


Frobenius norm, and we have
d¸ b
ms Θpmq }M }F  2
Mij  Trace pM T M q.
ij

We will also use ~~ to denote entry-wise norms


over a matrix, specifically }M }8 to denote the max
magnitude of an entry in M . Note that }M }F  ~M ~2 ,
ms pAK qT AK pAK qT AG so we have ~M ~8 ¤ }M }F ¤ n ~M ~8 .
The minimum and maximum singular values of a
matrix M are then defined as the min/max norms of its
product against a unit vector:

σmin pM q  min
} M x} 2 σmax pM q  max
} M x} 2 ,
x } x} 2 x } x} 2
Θp m q pAGqT AK pAGqT AG and the condition number of M is defined as κpM q 
σmax pM q{σmin pM q.
Bounds on the minimum singular value allows us to
transfer perturbation errors to it to its inverse.
n
Lemma 3.1. If M is a full rank square matrix with min
and max singular values in the range rσmin , σmaxs, and
Figure 4: Full matrix AQ and its Associated Gram Ma- M €  M  ¤ 
€ is some approximation of it such that M
trix pAQqT pAQq. Note that by our choice of parameters for some  σ {2, then F

m is much smaller than s  n{m.


min

€ are in the range rσmin


1. All singular values in M 
, σmax s, and
€ is close to the inverse of M :
2. The inverse of M
 
 € 1
M
2 .
 M 1 F ¤ 10σmin

Copyright © 2021 by SIAM


512 Unauthorized reproduction of this article is prohibited
Proof. The bound on singular values follows from the Randomization and Normal Distributions
norm minimization/maximization definition of singular Our algorithms rely on randomly perturbing the input
values. Specifically, we get that for a unit vector x, matrices to make them non-degenerate, and much of
     our analysis revolving analyzing the effect of such per-
  M x  } M x } 2  ¤  M 
 €    € 
M x turbations on the eigenvalues. We make use of stan-
2 2
  dard notions of probability, in particular, the union
¤ M €  M  }x} ¤ , bound, which states that for any two events E1 and
Downloaded 06/27/21 to 143.215.38.55. Redistribution subject to SIAM license or copyright; see https://epubs.siam.org/page/terms

E2 , Pr rE1 Y E2 s ¤ Pr rE1 s Pr rE2 s.


2
2

which means all singular values can change by at most Such a bound means that it suffices to show that
. the failure probability of any step of our algorithm is
nc for some constant c. The total number of steps is
Note that this implies that M € is invertible. For the
bounds on inverses, note that poly pnq, so unioning over such probabilities still give a
 success probability of at least 1  nc Op1q .
€ 1  M  1  M 1 M M
M € 1  I We will perturb our matrices using Gaussian ran-
 variables. These random variables N p0, σ q have
 M 1 M  M € 1 .
dom
€ M
density function g pxq  σ?12π ex {2σ . They are par-
2 2

 
 € 1 
 ¤ the sum of Gaussians is another Gaussian, with variance
ticularly useful for showing anti-concentration because
So applying bounds on norms, as well as M
pσmin  q1 ¤ 2σmin 1 gives 2
equalling to the sum of squares, or `22 -norm, of the vari-
  ance terms. That is, for a vector x and a (dense) Gaus-
 € 1
M  M F  1 sian vector with entry-wise i.i.d. N p0, 1q normal random
 1      variables, aka. g  N p0, 1qn , we have xT g  N p0, }x}2 q.
¤ M 2 M  M 
€  M  
€  ¤ 2σ
1   2
The density function of Gaussians means that
min .
F 2
their magnitude exceed n with probability at most
Opexppn2 qq. This probability is much smaller than
the nc failure probabilities that we want, so to sim-
Error Accumulation. Our notion of approximate
plify presentation we will remove it at the start.
operators also compose well with errors.
Lemma 3.2. If Z p1q and Z p2q are linear operators with
Claim 3.1. We can analyze our algorithm condition-
r p 1 q r p 2q ing on any normal random variable with variance σ,
N p0, σ q, having magnitude at most nσ.
(algorithmic) approximations Z and Z such that
for some  0.1, we have
   
 p 1q r p1q  , Z p2q  Z r p2q  ¤ 
Z  Z
Tracking Word Length. The numerical round-
F F ing model that we will use is fixed point precision. The
advantage of such a fixed point representation is that
then their product satisfies
  !     ) it significantly simplifies the tracking of errors during
 p 1q p 2q p 1q r p2q   p1q   p2q 
Z Z  Z Z  ¤ 10 max 1, Z  , Z  . additions/subtractions. The need to keep exact opera-
r
F 2 2 tors means we cannot omit intermediate digits. Instead,
Proof. Expanding out the errors gives we track both the number of digits before and after the
decimal point.
p 1q p2q
Z Z Z Z r p 1q r p2q
The number of trailing digits, or words after the
 
p 1q
 Z  Zr Z p 1q p 2q p
Z Z 2q r p 2 q Z p 1q decimal point, compound as follows:
  1. adding two numbers with L1 and L2 words after
Z p1q  Z r p1q Z p2q  Z r p2q
the decimal point each results in a number with
maxtL1 , L2 u words after the decimal point.
The terms involving the original matrix against the
error gets bounded by the error times the norm of the 2. multiply two numbers with L1 and L2 words after
original matrix. For the cross term, we have the decimal point each results in a number with
   L1 L2 words after the decimal point.
 p1q r p1q
 Z Z Z p2q  Z r p2q 
    As maxtL1 , L2 u ¤ L1 L2 when L1 and L2 are
 p 1q p 1q   p2q p 2q 
¤ Z  Z F  Z  Z 2 ¤  ,
r r 2 non-negative, we will in general assume that when we
multiply matrices with at most L1 and L2 words after
which along with  0.1 gives the overall bound. the decimal point, the result has at most L1 L2 words

Copyright © 2021 by SIAM


513 Unauthorized reproduction of this article is prohibited
after the decimal point. In particular, if Z is an operator 1. A can be perturbed so that its eigenvalues are
with LZ words after the decimal point, and its input B separated (Theorem 3.1).
has LB words after the decimal point, the output has
2. Such a separation implies a well-conditioned Krylov
at most LZ LB words after the decimal point.
space, when it is initialized with sparse random
Note that both of these bounds are for exact compu-
Gaussian vectors. (Theorem 3.2)
tations. The only round off errors come from round-off
errors by dropping some of the digits, as the matrices 3. This Krylov matrix can be solved efficiently using a
Downloaded 06/27/21 to 143.215.38.55. Redistribution subject to SIAM license or copyright; see https://epubs.siam.org/page/terms

themselves are created. combination of low displacement rank solvers and


On the other hand, we need to bound the maximum fast convolutions (Theorem 3.3).
magnitude of our operators. The number of digits before
the decimal point is given by bounds on the magnitude 4. The last few rows/columns can be solved efficiently
of the numbers themselves. Such bounds also propagate via the Schur complement (Lemma 3.4).
nicely along multiplications. Due to space constraints, we refer to the reader to
the full version of the paper [PV20] for proofs of these
Lemma 3.3. If the maximum magnitude of entries in components.
two matrices Y and Z with dimension at most n are Anti-Concentration of semi-random matri-
both at most α, then all entries in Y Z have magnitude ces. A crucial part of our analysis is bounding the
at most nα2 as well. spectrum of semi-random matrices. Unfortunately, the
highly developed literature on spectral properties of ran-
Proof. dom matrices assumes independent entries or indepen-
  dent columns, which no longer hold in the semi-random
  ¸  ¸

p
 YZ q 
ij 
 
 Yik Zkj 
 
¤ |Yik | |Zkj | ¤ nα2 case of K. However, getting tight estimates is not im-
k k portant for us (the running time is affected only by the
logarithm of the gap/min value). So we adapt meth-
ods from random matrix theory to prove sufficient anti-
concentration.
Throughout our analyses, we will often rescale the
matrices so that their max magnitudes are n2 . This
Specifically, after symmetrizing the potentially
asymmetric input A by implicitly generating the op-
allows us to absorb any constant factor increases in mag- erator AT A, we need to bound (1) the minimum eigen-
nitudes from multiplying these matrices by Lemma 3.3
because pc  n2 q2 ¤ n2 .
value gap of the coefficient matrix after perturbation by
a symmetric sparse matrix and (2) the minimum sin-
Finally, by doing FFT based fast multiplications gular value of the block Krylov matrix constructed by
for all numbers involved [CLRS09, HVDH19], we can multiplying with a sparse random matrix.
multiple two numbers with an Oplog nq factor in their The first step of showing eigenvalue separation is
lengths. This means that when handling two matrices needed because if A has a duplicate eigenvalue, the
with L1 and L2 words after the decimal point, and resulting Krylov space in it has rank at most n  1.
whose maximum magnitude is µ, the overhead caused We obtain such a separation by perturbing the matrix
by the word-lengths of the numbers involved is O r pµ
randomly: its analysis follows readily from recent results
L1 L2 ) on separations of eigenvalues in random matrices by Luh
Random Variables and Probability For our and Vu [LV18].
perturbations we use standard Gaussian N p0, 1q ran-
dom variables. Theorem 3.1. For any n  n symmetric positive defi-
nite matrix A with:
Fact 3.1. For x  N p0, 1q, we have Prp|x| ¥ tq ¤
1. entries at most 1{n,
?2 et2 {2 .
2. eigenvalues at least 1{κ for some κ ¥ n3 ,
t 2π

Thus, with probability at least 1  exp?pn{2q, a stan-


and any probability where
dard Gaussian variable is bounded by n. We will use
Opn2 q such variables and condition
? on the event that all p¥
300 log κ log n
their norms are bounded by n. n
the symmetrically random perturbed matrix A defined as
3.2 Main Technical Ingredients The main tech- #
Āij n12 κ N p0, 1q
Aij  Aji 
nical components of the analysis can be summarized as def w.p. p,
follows: Āij w.p. 1  p,

Copyright © 2021 by SIAM


514 Unauthorized reproduction of this article is prohibited
with probability at least 1  n10 has all eigenvalues such that every contiguous square block-aligned minor
separated by at least κ5 log n . of H containing the top-right or bottom left corner have
minimum eigenvalue at least αH :
Given this separation, we show that a random n-  
by-s B gives a m-step Krylov space matrix, as long σmin Ht1:i,pmi 1q:mu , σmin Htpmi 1q:m,1:iu ¥ αH
as n  ms  Ωpmq. Furthermore, we pick this B to
be sparse in the columns, so we can quickly compute for all 1 ¤ i ¤ m, and all entries in H have magnitude
at most psmq2 αH 1 , then for any error , we can
Downloaded 06/27/21 to 143.215.38.55. Redistribution subject to SIAM license or copyright; see https://epubs.siam.org/page/terms

B T Ai B.
pre-process H in time O r pmsω logpα1 1 qq to form
Theorem 3.2. Let A be an n  n symmetric positive Solve p, q that corresponds to a linear operator Z
H

definite matrix with entries at most 1{n, and αA n  10 H H


such that:
a parameter such that:
1. For any pmsq  k matrix B with max magni-
tude ~B ~8 and LB words after the decimal point,
1. all eigenvalues of A are at least αA , and
2. all pairs of eigenvalues of A are separated by at least SolveH pB, q returns ZH B in time
αA . (
r m  max sω1 k, s2 k ω2 
0.01
¤ m ¤ n4 1 O
 
p1 ~B ~8 q ms L
Let s and m be parameters such that n
and s  m ¤ n  5m. The n-by-s sparse Gaussian matrix log .
GS where each entry is set to N p0, 1q with probability
B
αH 
at least
10000m3 log p1{αA q 2. ZH is a high-accuracy approximation to H 1 :
n  
leads to the Krylov space matrix ZH  H 1  ¤ .
F
 
K  GS AGS A2 GS ... Am1 GS . 3. the entries of ZH have at most
Oplog2 m logpαH1 1 qq words after the decimal
With probability at least 1  n2 (over randomness in
point.
GS ), K has maximum singular value at most n2 , and
5m
minimum singular value at least αA .
The overhead of log2 m in the word lengths of ZH
These bounds allow us to bound the length of is from the Oplog nq layers of recursion in Fast Fourier
numbers, and in turn running time complexity of solving Transform times the Oplog nq levels of recursion in the
pAK qT AK. divide-and conquer block Schur complement algorithm.
Solvers for block Hankel matrices. An impor- Note that the relation between the ith block of H 
T
tant ingredient in our algorithm is a numerically effi- K K and K itself is:
cient solver for block Hankel matrices. For this we use
the notion of displacement rank by Kailath, Kung and Ht1:i,pni 1q:mu  KtT:,1:iu Kt:,pni 1q:mu
Morf [KKM79]. Its key statement is that any Schur  KtT:,1:iu Ami1 Kt:,1:iu .
Complement of a Toeplitz Matrix has displacement rank
2, and can be uniquely represented as the factorization So the min/max singular values of these matrices off by
of a rank 2 matrix. This combined with the fact that a factor of at most αA m
from the singular value bounds
the displacement rank of the inverse is the same as that on H itself.
of the matrix is used to compute the Schur comple- It remains to pad K with random columns to make
ment in the first super-fast/asymptotically fast solvers it a square matrix, Q. We bound the condition number
for Toeplitz matrices by Bitmead and Anderson [BA80]. of this matrix, and turn a solver for M  pAK qT pAK q
Here we extend this to block Toeplitz/Hankel matrices to one for pAQqT pAQq. Specifically, combining Theo-
and prove numerical stability, using the natural preser- rems 3.2 and 3.3 leads to:
vation of singular values of Schur complements. Specif-
ically, the analysis from Demmel, Dumitriu, Holtz and Lemma 3.4. Let A be an n  n symmetric positive
Kleinberg [DDHK07] can be readily adapted to this set- definite matrix with entries at most 1{n, and 0 αA
ting. n10 a parameter such that:

Theorem 3.3. If H is an sm  sm symmetric s-block- 1. all eigenvalues of A are at least αA , and at most
Hankel matrix and 0 αH psmq100 is a parameter 1 ,
αA

Copyright © 2021 by SIAM


515 Unauthorized reproduction of this article is prohibited
2. all pairs of eigenvalues of A are separated by at least LinearEquationApproximation( A, b: integer
αA , matrix/vector pair, κ: condition number bound
for A, : error threshold. )
3. all entries in A have magnitude at most αA , and at
most Oplogp1{αA qq words after the decimal point. 1. Compute θA Ð ~ A~ 8 .
For any parameter m such that n ¤ n ¤ 0.01n ,
0.01 0.2
2. Generate random symmetric matrix R with
Downloaded 06/27/21 to 143.215.38.55. Redistribution subject to SIAM license or copyright; see https://epubs.siam.org/page/terms

the routine BlockKrylov as shown in Figure 2 pre-


processes A in time: Rij  Rji  n10κ2 N p0, 1q
1. Opnq matrix-vector multiplications of A against
vectors with at most Opm logp1{αA qq words both with probability p p {q
O log κ  log n q.
n
before and after the decimal point,
3. Implicitly generate
2. plus operations that cost a total of:
 r  1
n m  log p1{αA q .
AT A
 
r n2 m3 log2 1 αA
O p{ q ω 2 ω A 2
n4 θA
R

and obtains a routine SolveA that when given a vector and its associated matrix-multiplication operator
b with Lb words after the decimal point, returns ZA b in MatVecAr p, δ q.
time  r via
r n2 m  plog p1{αA q ~b~
8 Lb q ,
4. Build solver for A
O
for some ZA with at most Oplogp1{αA qq words after the SolveAr Ð BlockKrylov MatVecAr p, δ q ,
ω 2
decimal point such that
n8 κ2  1 5 log n 2
, n 1 nnz pAq ω 1 .
ω
ω
 
 ZA  A 1  F
¤ αA .
5. Return
Note that we dropped  as a parameter to simplify 
the statement of the guarantees. Such an omission is 4
1
n θA2  SolveAr AT b .
acceptable because if we want accuracy less than the
eigenvalue bounds of A, we can simply run the algorithm
with αA Ð  due to the pre-conditions holding upon Figure 5: Pseudocode for block Krylov space algorithm
αA decreasing. With iterative refinement (e.g. [Saa03],
it’s also possible to lower separate the dependence on
logp1{q to linear instead of the cubic dependence on Therefore its Frobenius norm, and in turn max singular
logp1{αH q. value, is at most 1. On the other hand, the max
singular of A is at least ~A~8  θA : consider the
3.3 Proof of Main Theorem It remains to use unit vector that’s 1 in the entry corresponding to the
the random perturbation specified in Theorem 3.1 and column containing the max magnitude entry of A, and
taking the outer product to reduce a general system to 0 everywhere else. This plus the bound on condition
the symmetric, eigenvalue well separated case covered number of κ gives that the minimum singular value of
in Lemma 3.4. A is at least

σmin pAq ¥ σmax pAq ¥


The overall algorithm then takes a symmetrized 1 θA
,
version of the original matrix A, perturbs it, and then κ κ
converts the result of the block Krylov space method which coupled with the rescaling by 1 gives
n2 θ A
back. Its pseudocode is in Figure 5

p be the copy of A scaled σmin A p ¥ 1  θA  1 .
Proof. (Of Theorem 1.1) Let A n2 θA κ n2 κ
down by n2 θA :
The matrix that we pass onto the block Krylov
r is then the outer-product of A p plus a
p  1
A 2
1 method, A,
n } A} 8
A 2
A.
n θA sparse random perturbation R with each entry is set
(symmetrically when across the diagonal) to n4 κ N p0, 1q
This rescaling gives us bounds on both the maximum with probability Oplog n logpκ{qq{n
and minimum entries of A. p The rescaling ensures that
the max magnitude of an entry in A p is at most 1{n2 . r ÐA
A pT A
p R.

Copyright © 2021 by SIAM


516 Unauthorized reproduction of this article is prohibited
This matrix Ã is symmetric. Furthermore, by Claim 3.1, we may assume that the max magnitude of an entry in R is at most ε/(n^9 κ^2), which gives

      ‖R‖_F ≤ ε/(n^8 κ^2).

So we also get that the max magnitude of an entry in Ã is still at most 2/n^2. Taking this perturbation bound into Lemma 3.1 also gives that all eigenvalues of Ã are in the range

      [ 1/(n^5 κ^2), 1 ].

By Theorem 3.1, the minimum eigenvalue separation in this perturbed matrix Ã is at least

      2 · ( ε/(n^8 κ^2) )^{1 + 5 log n}.

Also, by concentration bounds on the number of entries picked in R, its number of non-zeros is with high probability at most O(n log n log(κn/ε)) = Õ(n log(κ/ε)).

As we only want an error of ε, we can round all entries in A to precision ε/κ without affecting the quality of the answer. So we can invoke Lemma 3.4 with

      α_Ã = ( ε/(n^8 κ^2) )^{1 + 5 log n},

which leads to a solve operator Z_Ã such that

      ‖Z_Ã − Ã^{-1}‖_F ≤ α_Ã = ( ε/(n^8 κ^2) )^{1 + 5 log n} ≤ ε/(n^40 κ^10).

The error conversion lemma from Lemma 3.1, along with the condition that the min singular value of Ã is at least 1/(n^5 κ^2), implies that

      ‖Z_Ã^{-1} − Ã‖_F ≤ ε/(n^30 κ^6),

or, factoring in the bound on the size of R via the triangle inequality,

      ‖Z_Ã^{-1} − Â^T Â‖_F ≤ 2ε/(n^8 κ^4),

which when inverted again via Lemma 3.1 and the min singular value bound gives

      ‖Z_Ã − (Â^T Â)^{-1}‖_F ≤ ε/n^2.

It remains to propagate this error across the rescaling in Step 5. Since Â = (1/(n^2 θ_A)) · A, we have

      (A^T A)^{-1} = (1/(n^4 θ_A^2)) · (Â^T Â)^{-1},

and in turn the error bound translates to

      ‖ (1/(n^4 θ_A^2)) · Z_Ã − (A^T A)^{-1} ‖_F ≤ ε/(n^4 θ_A^2).

The input, on the other hand, has

      Π_A b = A (A^T A)^{-1} A^T b,

so the error after multiplication by A is

      A ( (1/(n^4 θ_A^2)) · Z_Ã − (A^T A)^{-1} ) A^T b,

which, incorporating the above as well as ‖A‖_2 ≤ n θ_A, gives

      ‖Z_A b − Π_A b‖_2 ≤ (ε/(n^3 θ_A)) · ‖A^T b‖_2.

On the other hand, because the max eigenvalue of A^T A is at most ‖A‖_F^2 ≤ n^2 θ_A^2, the minimum eigenvalue of (A^T A)^{-1} is at least 1/(n^2 θ_A^2). So we have

      ‖Π_A b‖_2 = ‖A^T b‖_{(A^T A)^{-1}} ≥ (1/(n^2 θ_A)) · ‖A^T b‖_2.

Combining the two bounds then gives that the error in the return value is at most ε ‖Π_A b‖_2.

For the total running time, the number of non-zeros in R implies that the total cost of multiplying Ã against a vector with O(m log(1/α_Ã)) = O(m log n log(κ/ε)) = Õ(m log(κ/ε)) words is

      Õ( (nnz(A) + n) · m log^2(κ/ε) ) ≤ Õ( nnz(A) · m log^2(κ/ε) ),

where the inequality nnz(A) ≥ n follows from pre-processing to remove empty rows and columns. So the total construction cost given in Lemma 3.4 simplifies to

      Õ( n^2 m^3 log^2(κ/ε) + n^ω m^{2−ω} log(κ/ε) + n · nnz(A) · m log(κ/ε) ).

The input vector, (1/θ_Y) · y, has max magnitude at most 1, and can thus be rounded to O(log(κ/ε)) words after the decimal point as well. This then goes into the solve cost with log(‖b‖_∞) + L_b ≤ O(log(nκ/ε)), which gives a total of Õ(n^2 m log(κ/ε)), a lower order term compared to the construction cost. The cost of the additional multiplication by A is also a lower order term.

Optimizing m in this expression above based only on n and nnz(A) gives that we should choose m so that

      max{ n · nnz(A) · m, n^2 m^3 } = n^ω m^{2−ω},

or

      max{ n · nnz(A) · m^{ω−1}, n^2 m^{ω+1} } = n^ω.

The first term implies

      m ≤ ( n^{ω−1} / nnz(A) )^{1/(ω−1)} = n · nnz(A)^{−1/(ω−1)},

while the second term implies

      m ≤ n^{(ω−2)/(ω+1)}.

Substituting the minimum of these two bounds back into n^ω m^{2−ω}, and noting that 2 − ω ≤ 0, gives that the total runtime dependence on n and nnz(A) is at most

      n^ω · max{ n^{(ω−2)(2−ω)/(ω+1)}, n^{2−ω} · nnz(A)^{(ω−2)/(ω−1)} } = max{ n^{(5ω−4)/(ω+1)}, n^2 · nnz(A)^{(ω−2)/(ω−1)} }.

Incorporating the trailing terms then gives the bound stated in Theorem 1.1, with c set to 2 plus the number of log factors hidden in the Õ.
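To make the exponent concrete, the following quick calculation (an illustrative check added here, not part of the proof) plugs ω = 2.372864 into the two terms of the max above, under the additional assumption nnz(A) = Õ(n) and ignoring all logarithmic factors. It matches the running time of roughly n^2.33 referred to in the discussion below.

```python
omega = 2.372864   # matrix multiplication exponent used throughout the paper

# First term of the max: n^((5*omega - 4) / (omega + 1)).
exp_first = (5 * omega - 4) / (omega + 1)

# Second term with nnz(A) = n (up to log factors): n^(2 + (omega - 2)/(omega - 1)).
exp_second = 2 + (omega - 2) / (omega - 1)

print(f"first term exponent : {exp_first:.6f}")    # ~2.331645
print(f"second term exponent: {exp_second:.6f}")   # ~2.271595
print(f"overall exponent    : {max(exp_first, exp_second):.6f}")
```

Plugging nnz(A) = Θ(n^{ω−1}) into the second term instead gives n^ω, which is exactly the regime where the advantage over fast matrix multiplication disappears.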
4 Discussion

We have presented a faster solver for linear systems with moderately sparse coefficient matrices under bounded word-length arithmetic, with logarithmic dependence on the condition number. This is the first separation between the complexity of matrix multiplication and solving linear systems in the bounded precision setting. While both our algorithm and analysis are likely improvable, we believe they demonstrate that there are still many sparse numerical problems and algorithms that remain to be better understood theoretically. We list a few avenues for future work.

Random Matrices. The asymptotic gap between our running time of about n^2.33 and the O(n^2.28) running time of computing inverses of sparse matrices over finite fields [EGG+06] is mainly due to the overhead of our minimum singular value bound from Theorem 3.2, specifically the requirement of Ω(m^3) non-zeros per column on average. We conjecture that a similar bound holds for Õ(m) non-zeros per column, and also in the full Krylov space case.

1. Can we lower bound the min singular value of a block Krylov space matrix generated from a random matrix with Õ(m) non-zeros per column?

2. Can we lower bound the min singular value of a block Krylov space matrix where m · s = n, for general values of s (block size) and m (number of steps)?

The second improvement of removing the additional Ω(m) columns would not give asymptotic speedups. It would however remove the need for the extra steps (padding with random Gaussian columns) in Lemma 3.4. Such a bound for the square case would likely require developing new tools for analyzing matrix anti-concentration.

3. Are there general purpose bounds on the min singular value of a sum of random matrices, akin to matrix concentration bounds (which focus on the max singular value) [RV10, Tro15]?

The connections with random matrix theory can also be leveraged in the reverse direction:

4. Can linear systems over random matrices with i.i.d. entries be solved faster?

An interesting case here is sparse matrices with non-zeros set to ±1 independently. Such matrices have condition number Θ(n^2) with constant probability [RV10], which means that the conjugate gradient algorithm has bit complexity O(n · nnz) on such systems. Therefore, we believe these matrices present a natural starting point for investigating the possibility of faster algorithms for denser matrices with nnz > Ω(n^{ω−1}).

Numerical Algorithms. The bounded precision solver for block Hankel matrices in Theorem 3.3 is built upon the earliest tools for speeding up solvers for such structured matrices [KKM79, BA80], as well as the first sparsified block Cholesky algorithm for solving graph Laplacians [KLP+16]. We believe the more recent developments in solvers for Hankel/Toeplitz matrices [XXCB14], as well as graph Laplacians [KS16], can be incorporated to give better and more practical routines for solving block-Hankel/Toeplitz matrices.

5. Is there a superfast solver under bounded precision for block Hankel/Toeplitz matrices that does not use recursion?

It would also be interesting to investigate whether recent developments in randomized numerical linear algebra can work for Hankel/Toeplitz matrices. Some possible questions there are:

6. Can we turn the m · s^ω term into O(m s^2 + s^ω) using more recent developments in sparse projections (e.g. CountSketch / sparse JL / sparse Gaussian instead of a dense Gaussian)?

7. Is there an algorithm that takes a rank r factorization of I − XY ∈ ℝ^{n×n}, and computes in time Õ(n · poly(r)) a rank r factorization/approximation of I − YX?

Another intriguing question is the extension of this approach to the high condition number, or exact integer solution, setting. Here the current best running time bounds are via p-adic representations of fractions [Dix82], which are significantly less understood compared to decimal point based representations. In the dense case, an algorithm via shifted p-adic numbers by Storjohann [Sto05] achieves an O(n^ω) bit complexity.
Therefore, it is natural to hope for a similar Õ(n · nnz) bit complexity algorithm for producing exact integer solutions. A natural starting point could be the role of low-rank sketching in solvers that take advantage of displacement rank, i.e., extending the p-adic algorithms to handle low rank matrices:

8. Is there an O(n · r^{ω−1}) time algorithm for exactly solving linear regression problems involving an n-by-n integer matrix with rank r?

Finally, we note that the paper by Eberly et al. [EGG+06] that proposed block-Krylov based methods for matrix inversion also included experimental results that demonstrated good performance as an exact solver over finite fields. It might be possible to practically evaluate block-Krylov type methods for solving general systems of linear equations. Here it is worth remarking that even if one uses naive Θ(n^3) time matrix multiplication, both the Eberly et al. algorithm [EGG+07] (when combined with p-adic representations) and our algorithm still take sub-cubic time.

Acknowledgements

Richard Peng was supported in part by NSF CAREER award 1846218, and Santosh Vempala by NSF awards AF-1909756 and AF-2007443. We thank Mark Giesbrecht for bringing to our attention the works on block-Krylov space algorithms; Yin Tat Lee for discussions on random linear systems; Yi Li, Anup B. Rao, and Ameya Velingker for discussions about high dimensional concentration and anti-concentration bounds; and Mehrdad Ghadiri, He Jia, and anonymous reviewers for comments on earlier versions of this paper.

References

[BA80] Robert R. Bitmead and Brian D. O. Anderson. Asymptotically fast solution of Toeplitz and related systems of linear equations. Linear Algebra and its Applications, 34:103–116, 1980.

[BHK20] Avrim Blum, John Hopcroft, and Ravindran Kannan. Foundations of Data Science. Cambridge University Press, 2020.

[BHV08] Erik G. Boman, Bruce Hendrickson, and Stephen A. Vavasis. Solving elliptic finite element systems in near-linear time with support preconditioners. SIAM J. Numer. Anal., 46(6):3264–3284, 2008.

[BJMS17] Alin Bostan, C.-P. Jeannerod, Christophe Mouilleron, and É. Schost. On matrices with displacement structure: Generalized operators and faster algorithms. SIAM Journal on Matrix Analysis and Applications, 38(3):733–775, 2017.

[BL94] Bernhard Beckermann and George Labahn. A uniform approach for the fast computation of matrix-type Padé approximants. SIAM Journal on Matrix Analysis and Applications, 15(3):804–823, 1994.

[Blu04] Lenore Blum. Computing over the reals: Where Turing meets Newton. Notices of the AMS, 51(9):1024–1034, 2004.

[BSS+89] Lenore Blum, Mike Shub, Steve Smale, et al. On a theory of computation and complexity over the real numbers: NP-completeness, recursive functions and universal machines. Bulletin (New Series) of the American Mathematical Society, 21(1):1–46, 1989.

[Cip00] Barry A. Cipra. The best of the 20th century: Editors name top 10 algorithms. SIAM News, 33(4):1–2, 2000. Available at: https://archive.siam.org/pdf/news/637.pdf.

[CKK+18] Michael B. Cohen, Jonathan A. Kelner, Rasmus Kyng, John Peebles, Richard Peng, Anup B. Rao, and Aaron Sidford. Solving directed Laplacian systems in nearly-linear time through sparse LU factorizations. In FOCS 2018, pages 898–909. IEEE Computer Society, 2018. Available at: https://arxiv.org/abs/1811.10722.

[CKP+17] Michael B. Cohen, Jonathan A. Kelner, John Peebles, Richard Peng, Anup B. Rao, Aaron Sidford, and Adrian Vladu. Almost-linear-time algorithms for Markov chains and new spectral primitives for directed graphs. In STOC 2017, pages 410–419. ACM, 2017. Available at: https://arxiv.org/abs/1611.00755.

[CLRS09] Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein. Introduction to Algorithms, 3rd Edition. MIT Press, 2009.

[CRW93] Ronald Coifman, Vladimir Rokhlin, and Stephen Wandzura. The fast multipole method for the wave equation: A pedestrian prescription. IEEE Antennas and Propagation Magazine, 35(3):7–12, 1993.

[CW87] Don Coppersmith and Shmuel Winograd. Matrix multiplication via arithmetic progressions. In STOC 1987, pages 1–6. ACM, 1987.

[DDH07] James Demmel, Ioana Dumitriu, and Olga Holtz. Fast linear algebra is stable. Numerische Mathematik, 108(1):59–91, 2007.

[DDHK07] James Demmel, Ioana Dumitriu, Olga Holtz, and Robert Kleinberg. Fast matrix multiplication is stable. Numerische Mathematik, 106(2):199–224, 2007.

[Dix82] John D. Dixon. Exact solution of linear equations using p-adic expansions. Numerische Mathematik, 40(1):137–141, 1982.

[DMM08] Petros Drineas, Michael W. Mahoney, and Shan Muthukrishnan. Relative-error CUR matrix decompositions. SIAM Journal on Matrix Analysis and Applications, 30(2):844–881, 2008.
[DRSL16] Timothy A. Davis, Sivasankaran Rajamanickam, and Wissam M. Sid-Lakhdar. A survey of direct methods for sparse linear systems. Acta Numerica, 25:383–566, 2016.

[EGG+06] Wayne Eberly, Mark Giesbrecht, Pascal Giorgi, Arne Storjohann, and Gilles Villard. Solving sparse rational linear systems. In ISSAC 2006, pages 63–70, 2006.

[EGG+07] Wayne Eberly, Mark Giesbrecht, Pascal Giorgi, Arne Storjohann, and Gilles Villard. Faster inversion and other black box matrix computations using efficient block projections. In ISSAC 2007, pages 143–150, 2007.

[EH10] Herbert Edelsbrunner and John Harer. Computational topology: an introduction. American Mathematical Society, 2010.

[GO89] Gene H. Golub and Dianne P. O'Leary. Some history of the conjugate gradient and Lanczos algorithms: 1948–1976. SIAM Review, 31(1):50–102, 1989.

[Gra06] Robert M. Gray. Toeplitz and circulant matrices: A review. Now Publishers Inc, 2006.

[Gre96] Keith D. Gremban. Combinatorial Preconditioners for Sparse, Symmetric, Diagonally Dominant Linear Systems. PhD thesis, Carnegie Mellon University, Pittsburgh, October 1996. CMU CS Tech Report CMU-CS-96-123.

[GTVDV96] K. A. Gallivan, S. Thirumalai, Paul Van Dooren, and V. Vermaut. High performance algorithms for Toeplitz and block Toeplitz matrices. Linear Algebra and its Applications, 241:343–388, 1996.

[HS52] Magnus Rudolph Hestenes and Eduard Stiefel. Methods of conjugate gradients for solving linear systems, volume 49-1. NBS Washington, DC, 1952.

[HVDH19] David Harvey and Joris Van Der Hoeven. Polynomial multiplication over finite fields in time O(n log n). Available at https://hal.archives-ouvertes.fr/hal-02070816/document, 2019.

[KKM79] Thomas Kailath, Sun-Yuan Kung, and Martin Morf. Displacement ranks of matrices and linear equations. Journal of Mathematical Analysis and Applications, 68(2):395–407, 1979.

[KLP+16] Rasmus Kyng, Yin Tat Lee, Richard Peng, Sushant Sachdeva, and Daniel A. Spielman. Sparsified Cholesky and multigrid solvers for connection Laplacians. In STOC 2016, pages 842–850. ACM, 2016. Available at http://arxiv.org/abs/1512.01892.

[KMP12] Ioannis Koutis, Gary L. Miller, and Richard Peng. A fast solver for a class of linear systems. Communications of the ACM, 55(10):99–107, October 2012. Available at https://cacm.acm.org/magazines/2012/10/155538-a-fast-solver-for-a-class-of-linear-systems/fulltext.

[KS16] Rasmus Kyng and Sushant Sachdeva. Approximate Gaussian elimination for Laplacians - fast, sparse, and simple. In FOCS 2016, pages 573–582, 2016. Available at: https://arxiv.org/abs/1605.02353.

[KWZ20] Rasmus Kyng, Di Wang, and Peng Zhang. Packing LPs are hard to solve accurately, assuming linear equations are hard. In SODA 2020, pages 279–296. SIAM, 2020. Available at: https://dl.acm.org/doi/pdf/10.5555/3381089.3381106.

[Kyn17] Rasmus Kyng. Approximate Gaussian Elimination. PhD thesis, Yale University, 2017. Available at: http://rasmuskyng.com/rjkyng-dissertation.pdf.

[KZ17] Rasmus Kyng and Peng Zhang. Hardness results for structured linear systems. In FOCS 2017, pages 684–695, 2017. Available at: https://arxiv.org/abs/1705.02944.

[Lan50] Cornelius Lanczos. An iteration method for the solution of the eigenvalue problem of linear differential and integral operators. Journal of Research of the National Bureau of Standards, 1950.

[LG14] Francois Le Gall. Powers of tensors and fast matrix multiplication. In ISSAC 2014, pages 296–303. ACM, 2014. Available at http://arxiv.org/abs/1401.7714.

[LLY11] Lin Lin, Jianfeng Lu, and Lexing Ying. Fast construction of hierarchical matrix representation from matrix-vector multiplication. Journal of Computational Physics, 230(10):4071–4087, 2011.

[LS92] George Labahn and Tamir Shalom. Inversion of Toeplitz matrices with only two standard equations. Linear Algebra and its Applications, 175:143–158, 1992.

[LV18] Kyle Luh and Van Vu. Sparse random matrices have simple spectrum. arXiv preprint arXiv:1802.03662, 2018.

[MMS18] Cameron Musco, Christopher Musco, and Aaron Sidford. Stability of the Lanczos method for matrix function approximation. In SODA 2018, pages 1605–1624, 2018. Available at: https://arxiv.org/abs/1708.07788.

[NTV17] Hoi Nguyen, Terence Tao, and Van Vu. Random matrices: tail bounds for gaps between eigenvalues. Probability Theory and Related Fields, 167(3-4):777–816, 2017.

[Pan84] Victor Y. Pan. How to Multiply Matrices Faster, volume 179 of Lecture Notes in Computer Science. Springer, 1984.

[PS85] Franco P. Preparata and Michael Ian Shamos. Computational geometry: an introduction. Springer-Verlag New York, 1985.
[PV20] Richard Peng and Santosh S. Vempala. Solving sparse linear systems faster than matrix multiplication. CoRR, abs/2007.10254, 2020.

[RV07] Mark Rudelson and Roman Vershynin. Sampling from large matrices: An approach through geometric functional analysis. J. ACM, 54(4):21, 2007.

[RV10] Mark Rudelson and Roman Vershynin. Non-asymptotic theory of random matrices: extreme singular values. In Proceedings of the International Congress of Mathematicians 2010 (ICM 2010), pages 1576–1602. World Scientific, 2010.

[Saa03] Y. Saad. Iterative Methods for Sparse Linear Systems. Society for Industrial and Applied Mathematics, Philadelphia, PA, USA, 2nd edition, 2003. Available at http://www-users.cs.umn.edu/~saad/toc.pdf.

[SST06] Arvind Sankar, Daniel A. Spielman, and Shang-Hua Teng. Smoothed analysis of the condition numbers and growth factors of matrices. SIAM Journal on Matrix Analysis and Applications, 28(2):446–476, 2006.

[ST11] D. Spielman and S. Teng. Spectral sparsification of graphs. SIAM Journal on Computing, 40(4):981–1025, 2011. Available at http://arxiv.org/abs/0808.4134.

[ST14] D. Spielman and S. Teng. Nearly linear time algorithms for preconditioning and solving symmetric, diagonally dominant linear systems. SIAM Journal on Matrix Analysis and Applications, 35(3):835–885, 2014. Available at http://arxiv.org/abs/cs/0607105.

[Sto05] Arne Storjohann. The shifted number system for fast linear algebra on integer matrices. Journal of Complexity, 21(4):609–650, 2005. Available at: https://cs.uwaterloo.ca/~astorjoh/shifted.pdf.

[Str69] Volker Strassen. Gaussian elimination is not optimal. Numerische Mathematik, 13(4):354–356, 1969.

[SV13] Sushant Sachdeva and Nisheeth K. Vishnoi. Faster algorithms via approximation theory. Theoretical Computer Science, 9(2):125–210, 2013.

[Tro15] Joel A. Tropp. An introduction to matrix concentration inequalities. Found. Trends Mach. Learn., 8(1-2):1–230, 2015.

[Tur48] Alan M. Turing. Rounding-off errors in matrix processes. The Quarterly Journal of Mechanics and Applied Mathematics, 1(1):287–308, 1948.

[TV10] Terence Tao and Van H. Vu. Smooth analysis of the condition number and the least singular value. Math. Comput., 79(272):2333–2352, 2010. Available at: https://arxiv.org/abs/0805.3167.

[Vai89] P. M. Vaidya. Speeding-up linear programming using fast matrix multiplication. In 30th Annual Symposium on Foundations of Computer Science, pages 332–337, October 1989.

[Wil12] Virginia Vassilevska Williams. Multiplying matrices faster than Coppersmith-Winograd. In STOC 2012, pages 887–898. ACM, 2012.

[Woo14] David P. Woodruff. Sketching as a tool for numerical linear algebra. Foundations and Trends in Theoretical Computer Science, 10(1-2):1–157, 2014. Available at: https://arxiv.org/abs/1411.4357.

[XB90] Guo-liang Xu and Adhemar Bultheel. Matrix Padé approximation: definitions and properties. Linear Algebra and its Applications, 137:67–136, 1990.

[XCGL10] Jianlin Xia, Shivkumar Chandrasekaran, Ming Gu, and Xiaoye S. Li. Fast algorithms for hierarchically semiseparable matrices. Numerical Linear Algebra with Applications, 17(6):953–976, 2010.

[XXCB14] Yuanzhe Xi, Jianlin Xia, Stephen Cauley, and Venkataramanan Balakrishnan. Superfast and stable structured solvers for Toeplitz least squares via randomized sampling. SIAM Journal on Matrix Analysis and Applications, 35(1):44–72, 2014.

[XXG12] Jianlin Xia, Yuanzhe Xi, and Ming Gu. A superfast structured solver for Toeplitz linear systems via randomized sampling. SIAM Journal on Matrix Analysis and Applications, 33(3):837–858, 2012.

[Ye11] Yinyu Ye. Interior point algorithms: theory and analysis, volume 44. John Wiley & Sons, 2011.

[Zha18] Peng Zhang. Hardness and Tractability For Structured Numerical Problems. PhD thesis, Georgia Institute of Technology, 2018. Available at: https://drive.google.com/file/d/1KEZNzna-Y7y6rDERKuBFvFuU-hfHUMR3/view.