Jas Edinburgh 21I05

Which sparse direct solver?
Jennifer A. Scott
Computational Science and Engineering Department,
Rutherford Appleton Laboratory.
J.A.Scott@rl.ac.uk
Joint work with Nicholas I.M. Gould, RAL,
and Yifan F. Hu, Wolfram Research.
Which?
Expert advice from an independent source.
Aim of Which?: a fair consumer world where everyone makes
confident choices.
Edinburgh January 2005

Sparse linear systems
Problem: we wish to solve
Ax = b
where the matrix A is LARGE and sparse.
Such systems need to be solved efficiently and reliably.
Informal definition: A is sparse if

• many entries are zero
• it is worthwhile to exploit these zeros.

• The idea of what is LARGE has changed significantly over the last
30-40 years.
• Problems of order > 10^6 are common.
• The largest problems require iterative solvers (e.g. CG, GMRES,
MINRES, ...).
• Our interest lies mainly in direct solvers.
• Direct methods involve an explicit factorization, e.g. A = LU
(L and U are lower and upper triangular matrices).
• Combining direct and iterative solvers has recently become
an active area of research, e.g. direct solvers used to obtain
preconditioners for iterative solvers.

Many application areas in science, engineering, and finance give
rise to sparse systems, including
• chemical engineering
• economic modelling
• fluid flow
• oceanography
• acoustics
• linear programming
• structural engineering ...
In this study, we are interested in problems that lead to symmetric
systems.

Unconstrained optimization
$$\min_{x \in \mathbb{R}^n} f(x)$$
Iterative solution: given $x_k$, find an improved approximation $x_k + p_k$, where
$$p_k = \arg\min_p \left\{ f(x_k) + \nabla f(x_k)^T p + \tfrac{1}{2} p^T G_k p \right\}.$$
Provided $G_k$ is symmetric positive definite, preconditioned CG may
be used.
Crucial preconditioning step: given $g$, solve the $n \times n$ system
$$B_k s = -g$$
for some symmetric positive definite $B_k$ that approximates $G_k$.
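
A minimal sketch of preconditioned CG for the step above, assuming small dense matrices stored as Python lists. The names `pcg` and `apply_preconditioner` are illustrative, and a diagonal (Jacobi) preconditioner stands in for B_k; in the setting of this talk, applying the preconditioner would be a solve with a sparse direct solver. With r = -g the residual of the model problem, the slide's s = -B_k^{-1} g is the vector z = B_k^{-1} r below.

```python
def matvec(M, v):
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in M]

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def pcg(G, rhs, apply_preconditioner, tol=1e-10, max_iter=100):
    """Solve G p = rhs for symmetric positive definite G.

    apply_preconditioner(r) must return B^{-1} r for an SPD matrix B.
    Returns the approximate solution and the number of iterations used.
    """
    p = [0.0] * len(rhs)
    r = list(rhs)                      # residual rhs - G p for p = 0
    z = apply_preconditioner(r)        # preconditioning step: solve B z = r
    d = list(z)                        # first search direction
    rz = dot(r, z)
    rhs_norm = dot(rhs, rhs) ** 0.5
    for iteration in range(max_iter):
        if dot(r, r) ** 0.5 <= tol * rhs_norm:
            return p, iteration
        Gd = matvec(G, d)
        step = rz / dot(d, Gd)
        p = [pi + step * di for pi, di in zip(p, d)]
        r = [ri - step * gi for ri, gi in zip(r, Gd)]
        z = apply_preconditioner(r)
        rz_new = dot(r, z)
        d = [zi + (rz_new / rz) * di for zi, di in zip(z, d)]
        rz = rz_new
    return p, max_iter

# Illustrative data (not from the talk): G_k, gradient, Jacobi preconditioner.
G = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
grad = [1.0, 2.0, 3.0]
jacobi = lambda r: [ri / G[i][i] for i, ri in enumerate(r)]
p_k, iterations = pcg(G, [-gi for gi in grad], jacobi)
```

The better B_k approximates G_k, the fewer CG iterations are needed, which is why the cost and reliability of the solve with B_k matter.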

[Figure: sparsity pattern of GRIDGENA, nz = 512084; both axes in units of 10^4.]

Constrained optimization
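$$\min_{x \in \mathbb{R}^n} f(x) \quad \text{such that} \quad c(x) = 0$$
Iterative solution: given $x_k$, find an improved approximation $x_k + p_k$, where
$$p_k = \arg\min_p \left\{ f(x_k) + \nabla_x f(x_k)^T p + \tfrac{1}{2} p^T G_k p \right\} \quad \text{such that} \quad A_k p = -c(x_k),$$
using preconditioned CG for given $G_k$ and $A_k = \nabla_x c(x_k)$.

Crucial preconditioning step: given $g$, solve the $2n \times 2n$ sparse
symmetric indefinite system
$$\begin{pmatrix} B_k & A_k^T \\ A_k & 0 \end{pmatrix} \begin{pmatrix} s \\ y \end{pmatrix} = - \begin{pmatrix} g \\ d \end{pmatrix} \qquad (1)$$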
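
To see why system (1) is indefinite, here is a small sketch that solves it by block elimination for the simplest case: a diagonal positive B and a single constraint row a. This is an illustrative range-space calculation, not what the solvers in this study do (they factorize the whole matrix); the function name and the data are assumptions.

```python
def solve_kkt_diagonal_B(b_diag, a_row, g, d):
    """Solve [[B, a^T], [a, 0]] [s; y] = -[g; d] with B = diag(b_diag) > 0.

    Returns s, y and the Schur complement -a B^{-1} a^T, which is the
    last pivot of the block factorization.
    """
    Binv_g = [gi / bi for gi, bi in zip(g, b_diag)]      # B^{-1} g
    Binv_a = [ai / bi for ai, bi in zip(a_row, b_diag)]  # B^{-1} a^T
    a_Binv_a = sum(ai * wi for ai, wi in zip(a_row, Binv_a))
    # First block row: s = -B^{-1}(g + a^T y). Substituting into a s = -d
    # gives (a B^{-1} a^T) y = d - a B^{-1} g.
    y = (d - sum(ai * wi for ai, wi in zip(a_row, Binv_g))) / a_Binv_a
    s = [-(wg + y * wa) for wg, wa in zip(Binv_g, Binv_a)]
    return s, y, -a_Binv_a

# Illustrative data: n = 2 variables, one constraint.
b_diag, a_row, g, d = [2.0, 4.0], [1.0, 1.0], [1.0, -1.0], 0.5
s, y, last_pivot = solve_kkt_diagonal_B(b_diag, a_row, g, d)
```

The pivots from B are positive while `last_pivot` is negative, so the matrix has eigenvalues of both signs and a Cholesky factorization cannot be used.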
  

[Figure: sparsity pattern of DTOC, nz = 69972; both axes in units of 10^4.]

[Figure: sparsity pattern of AUG3D, nz = 69984; both axes in units of 10^4.]

HSL
• Began as the Harwell Subroutine Library in 1963.

• Collection of portable, fully documented Fortran packages.
• Primarily written and developed by the Numerical Analysis
Group at RAL.
• Each package performs a basic numerical task and is designed
to be incorporated into programs under the user’s control.
• Particular strengths in
– sparse matrix computations
– optimization
– large-scale system solution
• Used worldwide by academics and commercial organisations.
• International reputation for reliability and efficiency.

HSL
HSL is FREE for UK academics: see
www.cse.clrc.ac.uk/Activity/HSL

HSL direct solvers
HSL contains a number of very different solvers:
• Some are for symmetric systems, others for unsymmetric
systems.
• There are solvers designed for element problems.
• There are solvers that use minimal storage.
• Some are designed for particular sparsity structures (banded,
highly unsymmetric, KKT ...).
• There are solvers for real systems and for complex systems.
• Some exploit high-level BLAS.
• Some incorporate scaling, iterative refinement, different
orderings ...
Help! How does a user decide which to use?

HSL sparse real symmetric solvers
Code   Year   Fortran   Pos. Def.   Indefinite   Element   Banded   Out-of-core
MA27   1982   77        √           √            ×         ×        ×
MA47   1993   77        ×           √            ×         ×        ×
MA55   1999   90        √           ×            ×         √        √
MA57   2000   77/90     √           √            ×         ×        ×
MA62   1997   77        √           ×            √         ×        √
MA67   2001   77        ×           √            ×         ×        ×
√ in the Pos. Def. column indicates an option for efficient solution of positive-definite
problems, or that the code is designed specifically for such systems.

Key findings of HSL study
• MA57 was the best overall.

• For large problems, advantageous to use a nested dissection
ordering (rather than minimum degree).
• Out-of-core solvers use minimum memory, but existing codes are
not competitive with MA57 in terms of time.
• In many cases, using a tiny pivot tolerance is better than using the default
tolerance.
• For some “tough” indefinite systems it is better to use MA47
or MA67.
A small number of problems were not solved by any of our solvers.

Solvers tested in this study
• We consider only direct solvers.

• Our tests are on real symmetric problems (some packages offer
versions for complex symmetric and/or Hermitian matrices, and
some can be used for unsymmetric systems).
• Serial solvers only. Note some packages have parallel versions
(and may be written primarily as parallel codes); we consider
only serial codes and serial versions of parallel solvers.

The packages tested
BCSLIB-EXT     2001   F77       The Boeing Company
MA57∗∗         2004   F77/F90   I.S. Duff, HSL
MUMPS∗         2003   F90       P. Amestoy, I. Duff, J.-Y. L’Excellent, J. Koster
Oblio∗         2004   C++       F. Dobrian and A. Pothen
PARDISO∗       2004   F77       O. Schenk and K. Gärtner
SPOOLES∗       1999   C         C. Ashcraft and R. Grimes
SPRSBLKLLT∗    1997   F77       E.G. Ng and B.W. Peyton
TAUCS∗         2003   C         S. Toledo
UMFPACK∗       2003   C         T. Davis
WSMP∗          2003   F90 & C   A. Gupta and M. Joshi, IBM
∗ = free to academic users
∗∗ = free to UK academics

Phases of sparse direct solvers
• Analyse: chooses a (tentative) pivot sequence using the sparsity
pattern and sets up data structures.
• Factorise: uses the pivot sequence to factorize A (may scale A
prior to the factorization). Numerical considerations may delay
pivots.
• Solve: forward elimination followed by back substitution using
the computed factors. Iterative refinement may be needed.
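
The three phases can be sketched on a small positive definite example, assuming dense storage and a plain Cholesky factorization (no scaling, no delayed pivots). The function names mirror the phase names but are otherwise hypothetical, and the 5 × 5 "arrow" matrix is illustrative data chosen to show how the pivot order affects fill-in.

```python
import math

def analyse(pattern, order):
    """Symbolic phase: count fill-in entries created by eliminating in `order`.

    pattern maps each index to the set of its off-diagonal neighbours.
    """
    adj = {v: set(nbrs) for v, nbrs in pattern.items()}
    fill = 0
    for v in order:
        nbrs = adj.pop(v)
        for u in nbrs:
            adj[u].discard(v)
        # Eliminating v makes its remaining neighbours pairwise adjacent.
        for u in nbrs:
            new = nbrs - adj[u] - {u}
            fill += len(new)
            adj[u] |= new
    return fill // 2   # each new edge was counted from both endpoints

def factorise(A, order):
    """Numerical phase: Cholesky factor L with P A P^T = L L^T."""
    n = len(order)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[order[i]][order[j]] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("matrix is not positive definite")
                L[i][i] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    return L

def solve(L, order, b):
    """Solve phase: forward elimination, back substitution, undo permutation."""
    n = len(order)
    y = [0.0] * n
    for i in range(n):
        y[i] = (b[order[i]] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    z = [0.0] * n
    for i in reversed(range(n)):
        z[i] = (y[i] - sum(L[k][i] * z[k] for k in range(i + 1, n))) / L[i][i]
    x = [0.0] * n
    for i in range(n):
        x[order[i]] = z[i]
    return x

# Arrow matrix: dense first row and column, otherwise diagonal.
n = 5
A = [[5.0 if i == j else (1.0 if 0 in (i, j) else 0.0) for j in range(n)] for i in range(n)]
pattern = {i: {j for j in range(n) if j != i and A[i][j] != 0.0} for i in range(n)}
b = [1.0, 2.0, 3.0, 4.0, 5.0]
bad_order, good_order = [0, 1, 2, 3, 4], [1, 2, 3, 4, 0]
fill_bad, fill_good = analyse(pattern, bad_order), analyse(pattern, good_order)
x = solve(factorise(A, good_order), good_order, b)
```

Eliminating the dense row first fills the whole factor, while eliminating it last creates no fill at all. The analyse phase only needs the pattern, so trying several orderings is cheap relative to factorizing.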

Ordering options and factorization algorithm
Code         Orderings (MD AMD MMD ND METIS MS User Other)   Algorithm
BCSLIB-EXT   √ √ √                                           Multifrontal
MA57         √ √ √ √ √                                       Multifrontal
MUMPS        √ √ √ √ √ √                                     Multifrontal
Oblio        √ √ √                                           Left-/right-looking, multifrontal
PARDISO      √ √ √                                           Left-right looking
SPOOLES      √ √ √ √                                         Left-looking
SPRSBLKLLT   √ √                                             Left-looking
TAUCS        √ √ √ √ √ √                                     Left-looking/multifrontal
UMFPACK      √                                               Unsym. multifrontal
WSMP         √ √ √                                           Multifrontal
√ denotes default

Pivoting strategies
Code         Pos. Def.   Indefinite
BCSLIB-EXT   √           Numerical pivoting with 1 × 1 and 2 × 2 pivots.
MA57         √           Numerical pivoting with 1 × 1 and 2 × 2 pivots.
MUMPS        √           Numerical pivoting with 1 × 1 pivots.
Oblio        √           Numerical pivoting with 1 × 1 and 2 × 2 pivots.
PARDISO      √           Supernode Bunch-Kaufman.
SPOOLES      √           Fast Bunch-Parlett.
SPRSBLKLLT   √           ×
TAUCS        √           ×∗
UMFPACK      ×           Partial pivoting with preference for diagonal pivots.
WSMP         √           No pivoting.
∗ = numerical pivoting to be included in future release

Other key features
Features: Element entry, Scaling, Out-of-core, Iterative refinement, Multiple rhs, Complex symmetric, Hermitian
Code         Features offered
BCSLIB-EXT   √ √ √ √
MA57         √ √ √
MUMPS        √ √ √ √
Oblio        √ √ √ √
PARDISO      √ √ √ √
SPOOLES      √ √ √
SPRSBLKLLT   √
TAUCS        √ √
UMFPACK      √ √ √
WSMP         √ √ √

How can we compare solvers?
First choose set T of test problems. Criteria used:

• Problems must arise from practical applications
• Include problems from a wide range of application areas
(linear programming, structural engineering, computational
fluid dynamics, acoustics, financial modelling ...)
• The matrices must be of order > 10000
• The data must be available to other users (June 2003)
Diverse test set comprises 88 positive definite and 61 indefinite
examples.

The performance profile (Dolan and Moré)
• Suppose solver $i \in A$ returns statistic $s_{ij} \ge 0$ when run on
problem $j \in T$.
• Assume the smaller this statistic, the better the solver
(e.g. $s_{ij}$ might be the CPU time required to solve problem $j$ using
solver $i$).
• For $j \in T$, let $\hat{s}_j = \min\{ s_{ij} ;\ i \in A \}$.
• For $\alpha \ge 1$ and each $i \in A$ define
$$k(s_{ij}, \hat{s}_j, \alpha) = \begin{cases} 1 & \text{if } s_{ij} \le \alpha \hat{s}_j, \\ 0 & \text{otherwise.} \end{cases}$$
• The performance profile of solver $i$ is then given by
$$p_i(\alpha) = \frac{1}{|T|} \sum_{j \in T} k(s_{ij}, \hat{s}_j, \alpha), \qquad \alpha \ge 1.$$
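
A short sketch of the definition above. The function name, the data layout, and the convention of recording a failed run as infinity are assumptions made for illustration; the solver names "X" and "Y" and their timings are made up.

```python
import math

def performance_profile(stats, alpha):
    """Return p_i(alpha) for every solver i.

    stats[solver] is the list of statistics s_ij over the test set T,
    with math.inf marking a failure on that problem.
    """
    solvers = list(stats)
    n_problems = len(stats[solvers[0]])
    # Best (smallest) statistic achieved on each problem by any solver.
    best = [min(stats[s][j] for s in solvers) for j in range(n_problems)]
    return {
        s: sum(
            1
            for j in range(n_problems)
            if math.isfinite(stats[s][j]) and stats[s][j] <= alpha * best[j]
        ) / n_problems
        for s in solvers
    }

# Hypothetical CPU times for two solvers on four problems.
times = {
    "X": [1.0, 4.0, 2.0, math.inf],   # X fails on the last problem
    "Y": [2.0, 3.0, 2.0, 10.0],
}
profile_at_1 = performance_profile(times, 1.0)   # {'X': 0.5, 'Y': 0.75}
profile_at_2 = performance_profile(times, 2.0)   # {'X': 0.75, 'Y': 1.0}
```

Because X fails on one of the four problems, its profile can never exceed 0.75 however large α becomes.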

The performance profile (Dolan and Moré) (cont.)
• Thus $p_i(1)$ is the fraction of problems for which solver $i$ is the best
(according to the statistic $s_{ij}$).
• $p_i(2)$ is the fraction for which solver $i$ is within a factor of 2 of the
best.
• $\lim_{\alpha \to \infty} p_i(\alpha)$ is the fraction that solver $i$ solved successfully.
In this study, the statistics we use are:
• The CPU times
• The number of entries in the matrix factor
• The memory used by the solver

Test environment
• Tests performed on a single processor of a Compaq DS20 with
3.5 Gbytes RAM.
• Codes compiled with full optimization and vendor BLAS.
• Double precision reals used.
• CPU time limit of 30 minutes for each code on each example.
• Default settings for most control parameters. Exceptions:
– Blocking parameter for high-level BLAS set to 16
– For indefinite problems, stability threshold parameter
also set to u = 10^{-10}
• Failures due to: time limit reached, insufficient memory, or
scaled residual too large.
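
A sketch of what a stability threshold u typically controls, assuming the standard threshold test for a 1 × 1 pivot: accept the diagonal entry only if it is at least u times the largest off-diagonal entry in its column. Individual packages may apply variants of this test (and a separate test for 2 × 2 pivots); the function name and the values below are illustrative only.

```python
def passes_threshold_test(column, k, u):
    """Accept column[k] as a 1 x 1 pivot if |a_kk| >= u * max_{i != k} |a_ik|."""
    pivot = abs(column[k])
    largest_off_diagonal = max(
        (abs(v) for i, v in enumerate(column) if i != k), default=0.0
    )
    return pivot > 0.0 and pivot >= u * largest_off_diagonal

# A candidate pivot that is small relative to the rest of its column.
column = [1e-6, 1.0, 0.5]
accepted_with_moderate_u = passes_threshold_test(column, 0, 0.01)   # False
accepted_with_tiny_u = passes_threshold_test(column, 0, 1e-10)      # True
```

A tiny u accepts almost every pivot from the analyse phase, so fewer pivots are delayed and the factors stay sparser. The price is that multipliers as large as 1/u are allowed, so accuracy must be checked through the scaled residual.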

Positive definite problems: AFS CPU times
[Figure: performance profile 0.AFS.CPU, 88 positive-definite problems, u = default. Horizontal axis: α from 1 to 5; vertical axis: fraction of problems for which solver is within α of best.]
Failures listed in the legend: BCSLIB-EXT 1, MA57 1, MUMPS 1, Oblio 2, PARDISO 1, SPOOLES 2, SPRSBLKLLT 1, TAUCS 2, UMFPACK 4, WSMP 1.

Positive definite problems: Analyse CPU times
[Figure: performance profile 0.Analyse.CPU, 88 positive-definite problems, u = default. Horizontal axis: α from 1 to 5.]
Failures listed in the legend: MA57 1, MUMPS 1, Oblio 2, PARDISO 1, SPOOLES 2, UMFPACK 4, WSMP 1.

Positive definite problems: Factorise CPU times
[Figure: performance profile 0.Factorise.CPU, 88 positive-definite problems, u = default. Horizontal axis: α from 1 to 5.]
Failures listed in the legend: MA57 1, MUMPS 1, Oblio 2, PARDISO 1, SPOOLES 2, UMFPACK 4, WSMP 1.

Positive definite problems: Solve CPU times
[Figure: performance profile 0.Solve.CPU, 88 positive-definite problems, u = default. Horizontal axis: α from 1 to 5.]
Failures listed in the legend: MA57 1, MUMPS 1, Oblio 2, PARDISO 1, SPOOLES 2, UMFPACK 4, WSMP 1.

Positive definite problems: main findings
• Definitely worth using a code designed for symmetric systems.

• When the same ordering is used, there is little to choose
between most of the solvers (reassuring!).
• For many large problems in our set, using nested dissection was
the best ordering.
• BUT analyse is cheap so it can be worthwhile to try more than
one ordering (especially if repeated factorizations required).
• If repeated solves are required, it is worth investing in a code
which has carefully coded the solve phase.

Indefinite problems: AFS CPU times
[Figure: performance profile 1.AFS.CPU, 61 indefinite problems. Horizontal axis: α from 1 to 5.]
Failures listed in the legend: BCSLIB-EXT 15, MA57 2, MUMPS 16, MUMPS-unsym 8, Oblio 8, PARDISO 3, SPOOLES 15, UMFPACK 5, WSMP 31.

Indefinite problems: Analyse CPU times
[Figure: performance profile 1.Analyse.CPU, 61 indefinite problems. Horizontal axis: α from 1 to 5.]
Failures listed in the legend: MA57 2, MUMPS 16, Oblio 8, PARDISO 3, UMFPACK 5, WSMP 31.

Indefinite problems: Factorise CPU times
[Figure: performance profile 1.Factorise.CPU, 61 indefinite problems. Horizontal axis: α from 1 to 5.]
Failures listed in the legend: MA57 2, MUMPS 16, Oblio 8, PARDISO 3, UMFPACK 5, WSMP 31.

Indefinite problems: Solve CPU times
[Figure: performance profile 1.Solve.CPU, 61 indefinite problems. Horizontal axis: α from 1 to 5.]
Failures listed in the legend: MA57 2, MUMPS 16, Oblio 8, PARDISO 3, UMFPACK 5, WSMP 31.

Indefinite problems, tiny u: AFS CPU times
[Figure: performance profile 2.AFS.CPU, 61 indefinite problems, u = tiny. Horizontal axis: α from 1 to 5.]
Failures listed in the legend: MA57 5, MUMPS 9, Oblio 5, PARDISO 3, UMFPACK 17, WSMP 31.

Indefinite problems: main findings
• There are some tough problems in the real world!

• The static pivoting used by PARDISO is fast but there are a few
examples where this can lead to an inaccurate solution (that
does not converge with iterative refinement).
• A penalty of static pivoting is the need for iterative refinement,
which increases the solve time.
• Better out-of-core solvers are needed for really large problems.
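
A minimal sketch of iterative refinement, to show where the extra solve time comes from. The inexact factorization is imitated here by solving with a diagonally perturbed matrix; this is only a stand-in chosen for illustration, not the perturbation scheme of any particular package. The scaled residual formula and all names are assumptions as well.

```python
def gauss_solve(M, rhs):
    """Dense Gaussian elimination with partial pivoting (stand-in direct solver)."""
    n = len(rhs)
    aug = [list(row) + [r] for row, r in zip(M, rhs)]
    for c in range(n):
        p = max(range(c, n), key=lambda i: abs(aug[i][c]))
        aug[c], aug[p] = aug[p], aug[c]
        for i in range(c + 1, n):
            m = aug[i][c] / aug[c][c]
            for j in range(c, n + 1):
                aug[i][j] -= m * aug[c][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (aug[i][n] - sum(aug[i][j] * x[j] for j in range(i + 1, n))) / aug[i][i]
    return x

def scaled_residual(A, x, b):
    """Return r = b - A x and ||r|| / (||A|| ||x|| + ||b||) in the infinity norm."""
    r = [bi - sum(aij * xj for aij, xj in zip(row, x)) for row, bi in zip(A, b)]
    norm_A = max(sum(abs(v) for v in row) for row in A)
    norm_x = max(abs(v) for v in x)
    norm_b = max(abs(v) for v in b)
    return r, max(abs(v) for v in r) / (norm_A * norm_x + norm_b)

def refine(A, b, approx_solve, tol=1e-12, max_steps=20):
    """Improve x using repeated solves with the same (inexact) factors."""
    x = approx_solve(b)
    for step in range(max_steps + 1):
        r, res = scaled_residual(A, x, b)
        if res <= tol or step == max_steps:
            return x, step, res
        dx = approx_solve(r)            # one extra solve per refinement step
        x = [xi + di for xi, di in zip(x, dx)]

# Illustrative data: exact matrix A, and a perturbed copy used for the solves.
A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
A_perturbed = [[4.1, 1.0], [1.0, 3.1]]
x, steps, res = refine(A, b, lambda rhs: gauss_solve(A_perturbed, rhs))
```

Each refinement step costs one more forward and back substitution, and the iteration only converges when the perturbed factors are close enough to those of A, which matches the failures noted above.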

The future
• The study has shown up weaknesses in the solvers.
• It has already led to the authors of several of the codes making
improvements.
• For some, further changes/new versions are on the way.
• The static pivoting used by PARDISO has, in particular, opened
up new areas of research.
• Reliably solving large, tough indefinite problems (which may be
singular) remains a challenge.
• Note: the ease of use, generality of input data, quality of user
documentation, software maintenance etc. vary considerably
between packages.
All results shortly to appear in a Rutherford Technical Report.

Our thanks
We would like to thank all the authors of the codes for their
invaluable help with this project.
We hope it has been mutually beneficial.
For more pictures see: www.numerical.rl.ac.uk/talks/talks.shtml

Positive definite problems: factor entries
[Figure: performance profile 0.Real.factor, 88 positive-definite problems, u = default. Horizontal axis: α from 1 to 5.]
Failures listed in the legend: MA57 1, MUMPS 1, Oblio 2, PARDISO 1, SPOOLES 2, UMFPACK 4, WSMP 1.

Indefinite problems: factor entries
[Figure: performance profile 1.Real.factor, 61 indefinite problems. Horizontal axis: α from 1 to 5.]
Failures listed in the legend: MA57 2, MUMPS 16, Oblio 8, PARDISO 3, UMFPACK 5, WSMP 31.

Indefinite problems, tiny u: factor entries
[Figure: performance profile 2.Real.factor, 61 indefinite problems, u = tiny. Horizontal axis: α from 1 to 5.]
Failures listed in the legend: MA57 5, MUMPS 9, Oblio 5, PARDISO 3, UMFPACK 17, WSMP 31.

Indefinite problems, tiny u: Factorise CPU times
[Figure: performance profile 2.Factorise.CPU, 61 indefinite problems, u = tiny. Horizontal axis: α from 1 to 5.]
Failures listed in the legend: MA57 5, MUMPS 9, Oblio 5, PARDISO 3, UMFPACK 17, WSMP 31.

Indefinite problems, tiny u: Solve CPU times
[Figure: performance profile 2.Solve.CPU, 61 indefinite problems, u = tiny. Horizontal axis: α from 1 to 5.]
Failures listed in the legend: MA57 5, MUMPS 9, Oblio 5, PARDISO 3, UMFPACK 17, WSMP 31.
