Jas Edinburgh 21I05


Which sparse direct solver?

Jennifer A. Scott
Computational Science and Engineering Department,
Rutherford Appleton Laboratory.

J.A.Scott@rl.ac.uk

Joint work with Nicholas I.M. Gould, RAL, and Yifan F. Hu, Wolfram Research.
Which?

Expert advice from an independent source.

Aim of Which?: a fair consumer world where everyone makes confident choices.

Edinburgh January 2005


Sparse linear systems
Problem: we wish to solve

Ax = b
where the matrix A is

LARGE
s p a r s e

Need to be solved efficiently and reliably.

Informal definition: A is sparse if


• many entries are zero
• it is worthwhile to exploit these zeros.
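One way to picture "exploiting the zeros" is to store only the nonzero entries and to skip the zeros in every operation. The sketch below uses compressed sparse row (CSR) storage, which is an illustrative choice and not something the talk prescribes; the function names and the small tridiagonal matrix are hypothetical.

```python
def dense_to_csr(rows):
    """Convert a dense list-of-lists matrix to CSR arrays (values, col_idx, row_ptr)."""
    values, col_idx, row_ptr = [], [], [0]
    for row in rows:
        for j, a in enumerate(row):
            if a != 0.0:
                values.append(a)      # keep the nonzero value
                col_idx.append(j)     # and remember its column
        row_ptr.append(len(values))   # row i occupies values[row_ptr[i]:row_ptr[i+1]]
    return values, col_idx, row_ptr


def csr_matvec(values, col_idx, row_ptr, x):
    """Compute y = A x touching only the stored (nonzero) entries."""
    y = [0.0] * (len(row_ptr) - 1)
    for i in range(len(y)):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += values[k] * x[col_idx[k]]
    return y


# Hypothetical 4 x 4 tridiagonal matrix: 10 nonzeros out of 16 entries.
A = [
    [2.0, -1.0, 0.0, 0.0],
    [-1.0, 2.0, -1.0, 0.0],
    [0.0, -1.0, 2.0, -1.0],
    [0.0, 0.0, -1.0, 2.0],
]
values, col_idx, row_ptr = dense_to_csr(A)
y = csr_matvec(values, col_idx, row_ptr, [1.0, 1.0, 1.0, 1.0])
```

For an n × n tridiagonal matrix the stored data grow like 3n rather than n^2, which is the kind of saving that makes large problems tractable.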



• The idea of what is LARGE has changed significantly over the last
30-40 years.
• Problems of order > $10^6$ are common.
• The largest problems require iterative solvers (e.g. CG, GMRES,
MINRES, ...).
• Our interest lies mainly in direct solvers.
• Direct methods involve an explicit factorization, e.g. A = LU
(L and U lower and upper triangular matrices).
• Recently, combining direct and iterative solvers has become
an active area of research, e.g. direct solvers used to obtain
preconditioners for iterative solvers.



Many application areas in science, engineering, and finance give
rise to sparse systems, including
• chemical engineering
• economic modelling
• fluid flow
• oceanography
• acoustics
• linear programming
• structural engineering ...

In this study, we are interested in problems that lead to symmetric systems.



Unconstrained optimization

$$\min_{x \in \mathbb{R}^n} f(x)$$

Iterative solution: given $x_k$, find an improved approximation $x_k + p_k$, where

$$p_k = \arg\min_p \left\{ f(x_k) + \nabla f(x_k)^T p + \tfrac{1}{2} p^T G_k p \right\}.$$

Provided $G_k$ is symmetric positive definite, preconditioned CG may
be used.
Crucial preconditioning step: given $g$, solve the $n \times n$ system
$$B_k s = -g$$
for some symmetric positive definite $B_k$ that approximates $G_k$.
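The role of the preconditioning solve can be seen in a minimal preconditioned CG sketch. It assumes a callback `apply_prec` that solves B z = r; in the setting of the talk that solve would be done by a sparse direct solver, whereas here B = diag(G) is used purely as a simple stand-in. The matrix, right-hand side and function names are hypothetical.

```python
def pcg(matvec, b, apply_prec, tol=1e-10, max_iter=100):
    """Preconditioned CG for G x = b with G symmetric positive definite.

    apply_prec(r) must return the solution z of B z = r, where B is a
    symmetric positive definite approximation to G.
    """
    x = [0.0] * len(b)
    r = list(b)                                   # residual b - G x for x = 0
    z = apply_prec(r)                             # preconditioning step
    p = list(z)
    rz = sum(ri * zi for ri, zi in zip(r, z))
    b_norm = sum(bi * bi for bi in b) ** 0.5
    for it in range(1, max_iter + 1):
        Gp = matvec(p)
        alpha = rz / sum(pi * gi for pi, gi in zip(p, Gp))
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * gi for ri, gi in zip(r, Gp)]
        if sum(ri * ri for ri in r) ** 0.5 <= tol * b_norm:
            return x, it
        z = apply_prec(r)                         # one solve with B per iteration
        rz_new = sum(ri * zi for ri, zi in zip(r, z))
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter


# Hypothetical 3 x 3 SPD example (not from the talk).
G = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
b = [6.0, 10.0, 8.0]                              # equals G times [1, 2, 3]


def matvec(v):
    return [sum(gij * vj for gij, vj in zip(row, v)) for row in G]


def jacobi(r):
    # Solve B z = r with B = diag(G), a deliberately simple choice of B.
    return [ri / G[i][i] for i, ri in enumerate(r)]


x, iterations = pcg(matvec, b, jacobi)
```

Because one solve with B is needed in every CG iteration, the factorization of B is computed once and its solve phase is reused many times.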

[Figure: sparsity pattern of GRIDGENA (axes in units of 10^4), nz = 512084]


Constrained optimization

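$$\min_{x \in \mathbb{R}^n} f(x) \quad \text{such that} \quad c(x) = 0$$

Iterative solution: given $x_k$, find an improved approximation $x_k + p_k$, where

$$p_k = \arg\min_p \left\{ f(x_k) + \nabla_x f(x_k)^T p + \tfrac{1}{2} p^T G_k p \right\}$$

such that

$$A_k p = -c(x_k)$$

using preconditioned CG for given $G_k$ and $A_k = \nabla_x c(x_k)$.

Crucial preconditioning step: given $g$, solve the $2n \times 2n$ sparse
symmetric indefinite system

$$\begin{pmatrix} B_k & A_k^T \\ A_k & 0 \end{pmatrix} \begin{pmatrix} s \\ y \end{pmatrix} = -\begin{pmatrix} g \\ d \end{pmatrix} \qquad (1)$$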

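As a small illustration of system (1), the sketch below assembles the block matrix from B and A and solves it. Everything here is an assumption made for illustration: the data are hypothetical, the number of constraints is written m (the 2n × 2n case of the slide corresponds to m = n), and dense Gaussian elimination with partial pivoting stands in for the sparse symmetric indefinite solvers that the talk compares.

```python
def assemble_kkt(B, A):
    """Build K = [[B, A^T], [A, 0]] for an n x n block B and an m x n block A."""
    n, m = len(B), len(A)
    K = [[0.0] * (n + m) for _ in range(n + m)]
    for i in range(n):
        for j in range(n):
            K[i][j] = B[i][j]
    for i in range(m):
        for j in range(n):
            K[n + i][j] = A[i][j]      # A block
            K[j][n + i] = A[i][j]      # A^T block
    return K


def solve_dense(K, rhs):
    """Gaussian elimination with partial pivoting (illustrative stand-in only)."""
    n = len(K)
    M = [row[:] + [r] for row, r in zip(K, rhs)]
    for k in range(n):
        piv = max(range(k, n), key=lambda i: abs(M[i][k]))
        M[k], M[piv] = M[piv], M[k]
        for i in range(k + 1, n):
            f = M[i][k] / M[k][k]
            for j in range(k, n + 1):
                M[i][j] -= f * M[k][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


# Hypothetical data: n = 3 variables, m = 1 constraint.
B = [[2.0, 0.0, 0.0], [0.0, 3.0, 0.0], [0.0, 0.0, 4.0]]
A = [[1.0, 1.0, 1.0]]
g, d = [1.0, 2.0, 3.0], [3.0]

K = assemble_kkt(B, A)
sol = solve_dense(K, [-v for v in g + d])   # right-hand side is -(g, d)
s, y = sol[:3], sol[3:]
```

The zero block on the diagonal is what makes K indefinite even when B is positive definite, which is why a Cholesky factorization cannot be used for (1).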
[Figure: sparsity pattern of DTOC (axes in units of 10^4), nz = 69972]

[Figure: sparsity pattern of AUG3D (axes in units of 10^4), nz = 69984]


HSL

• Began as the Harwell Subroutine Library in 1963.
• Collection of portable, fully documented Fortran packages.
• Primarily written and developed by the Numerical Analysis
Group at RAL.
• Each package performs a basic numerical task and is designed
to be incorporated into programs under the user’s control.
• Particular strengths in
– sparse matrix computations
– optimization
– large-scale system solution
• Used worldwide by academics and commercial organisations.
• International reputation for reliability and efficiency.



HSL

HSL is FREE for UK academics: see www.cse.clrc.ac.uk/Activity/HSL



HSL direct solvers
HSL contains a number of very different solvers:
• Some are for symmetric systems, others for unsymmetric
systems.
• There are solvers designed for element problems.
• There are solvers that use minimal storage.
• Some are designed for particular sparsity structures (banded,
highly unsymmetric, KKT ...).
• There are solvers for real systems and for complex systems.
• Some exploit high level BLAS.
• Some incorporate scaling, iterative refinement, different
orderings ...

Help! How does a user decide which to use?



HSL sparse real symmetric solvers

Code Year Fortran Pos. Def. Indefinite Element Banded Out-of-core


√ √
MA27 1982 77 × × ×

MA47 1993 77 × × × ×
√ √ √
MA55 1999 90 × ×
√ √
MA57 2000 77/90 × × ×
√ √ √
MA62 1997 77 × ×

MA67 2001 77 × × × ×


in Pos. Def. column indicates option for efficient solution of positive-definite
problems or designed specifically for such systems.



Key findings of HSL study

• MA57 was the best overall.
• For large problems, it is advantageous to use a nested dissection
ordering (rather than minimum degree).
• Out-of-core solvers use minimum memory, but existing codes are
not competitive with MA57 in terms of time.
• In many cases, using a tiny pivot tolerance is better than the default
tolerance.
• For some “tough” indefinite systems it is better to use MA47
or MA67.

A small number of problems were not solved by any of our solvers.



Solvers tested in this study

• We consider only direct solvers.


• Our tests are on real symmetric problems (some packages offer
versions for complex symmetric and/or Hermitian matrices, and
some can be used for unsymmetric systems).
• Serial solvers only. Note some packages have parallel versions
(and may be written primarily as parallel codes); we consider
only serial codes and serial versions of parallel solvers.



The packages tested

BCSLIB-EXT   2001  F77      The Boeing Company
MA57∗∗       2004  F77/F90  I.S. Duff, HSL
MUMPS∗       2003  F90      P. Amestoy, I. Duff, J.-Y. L’Excellent, J. Koster
Oblio∗       2004  C++      F. Dobrian and A. Pothen
PARDISO∗     2004  F77      O. Schenk and K. Gärtner
SPOOLES∗     1999  C        C. Ashcraft and R. Grimes
SPRSBLKLLT∗  1997  F77      E.G. Ng and B.W. Peyton
TAUCS∗       2003  C        S. Toledo
UMFPACK∗     2003  C        T. Davis
WSMP∗        2003  F90 & C  A. Gupta and M. Joshi, IBM

∗ = free to academic users
∗∗ = free to UK academics



Phases of sparse direct solvers

• Analyse: chooses a (tentative) pivot sequence using the sparsity
pattern and sets up data structures.
• Factorise: uses the pivot sequence to factorize A (may scale A
prior to the factorization). Numerical considerations may delay
pivots.
• Solve: forward elimination followed by back substitution using
the computed factors. Iterative refinement may be needed.
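The three phases can be mimicked by a toy dense solver. This is only a sketch under strong assumptions: the class `ToyCholesky` is hypothetical, it handles positive definite matrices only, it stores the factor densely, and its "ordering" (rows sorted by nonzero count) is a crude stand-in for the fill-reducing orderings used by the real packages.

```python
import math


class ToyCholesky:
    """Dense toy with the analyse / factorise / solve split of a direct solver."""

    def analyse(self, pattern):
        # pattern[i] = set of column indices of the nonzeros in row i.
        # Only the structure is used here, never the numerical values.
        n = len(pattern)
        self.perm = sorted(range(n), key=lambda i: (len(pattern[i]), i))
        return self.perm

    def factorise(self, A):
        # Cholesky factor L of the permuted matrix: P A P^T = L L^T.
        p, n = self.perm, len(A)
        L = [[0.0] * n for _ in range(n)]
        for i in range(n):
            for j in range(i + 1):
                s = A[p[i]][p[j]] - sum(L[i][k] * L[j][k] for k in range(j))
                if i == j:
                    if s <= 0.0:
                        raise ValueError("matrix is not positive definite")
                    L[i][i] = math.sqrt(s)
                else:
                    L[i][j] = s / L[j][j]
        self.L = L

    def solve(self, b):
        p, L, n = self.perm, self.L, len(b)
        y = [0.0] * n
        for i in range(n):                    # forward elimination: L y = P b
            y[i] = (b[p[i]] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
        z = [0.0] * n
        for i in reversed(range(n)):          # back substitution: L^T z = y
            z[i] = (y[i] - sum(L[k][i] * z[k] for k in range(i + 1, n))) / L[i][i]
        x = [0.0] * n
        for i in range(n):                    # undo the permutation
            x[p[i]] = z[i]
        return x


# Hypothetical "arrow" matrix: dense first row and column, diagonal elsewhere.
A = [[4.0, 1.0, 1.0, 1.0],
     [1.0, 2.0, 0.0, 0.0],
     [1.0, 0.0, 2.0, 0.0],
     [1.0, 0.0, 0.0, 2.0]]
pattern = [{0, 1, 2, 3}, {0, 1}, {0, 2}, {0, 3}]

solver = ToyCholesky()
perm = solver.analyse(pattern)                # once per sparsity pattern
solver.factorise(A)                           # once per set of numerical values
x1 = solver.solve([7.0, 3.0, 3.0, 3.0])       # once per right-hand side
x2 = solver.solve([4.0, 1.0, 1.0, 1.0])       # factors are reused
```

The split matters for cost: one analyse can serve many factorizations with the same pattern, and one factorization can serve many solves.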



Ordering options and factorization algorithm

Code MD AMD MMD ND METIS MS User Other Algorithm


√ √ √
BCSLIB-EXT Multifrontal
√ √ √ √ √
MA57 Multifrontal
√ √ √ √ √ √
MUMPS Multifrontal
√ √ √
Oblio Left-/Right-looking, Multifrontal
√ √ √
PARDISO Left-right looking
√ √ √ √
SPOOLES Left-looking
√ √
SPRSBLKLLT Left-looking
√ √ √ √ √ √
TAUCS Left-looking/ Multifrontal

UMFPACK Unsym. Multifrontal
√ √ √
WSMP Multifrontal


denotes default



Pivoting strategies

Code Pos. Def. Indefinite



BCSLIB-EXT Numerical pivoting with 1 × 1 and 2 × 2 pivots.

MA57 Numerical pivoting with 1 × 1 and 2 × 2 pivots.

MUMPS Numerical pivoting with 1 × 1 pivots.

Oblio Numerical pivoting with 1 × 1 and 2 × 2 pivots.

PARDISO Supernode Bunch-Kaufman.

SPOOLES Fast Bunch-Parlett.

SPRSBLKLLT ×

TAUCS ×∗
UMFPACK × Partial pivoting with preference for diagonal pivots.

WSMP No pivoting.

∗ = numerical pivoting to be included in future release



Other key features

Code Element Scaling Out-of Iterative Multiple Complex Hermitian


entry -core refinement rhs symmetric
√ √ √ √
BCSLIB-EXT
√ √ √
MA57
√ √ √ √
MUMPS
√ √ √ √
Oblio
√ √ √ √
PARDISO
√ √ √
SPOOLES

SPRSBLKLLT
√ √
TAUCS
√ √ √
UMFPACK
√ √ √
WSMP



How can we compare solvers?

First choose set T of test problems. Criteria used:


• Problems must arise from practical applications
• Include problems from a wide range of application areas
(linear programming, structural engineering, computational
fluid dynamics, acoustics, financial modelling ...)
• The matrices must be of order > 10000
• The data must be available to other users (June 2003)

The diverse test set comprises 88 positive definite and 61 indefinite examples.



The performance profile (Dolan and Moré)

• Suppose solver $i \in A$ returns statistic $s_{ij} \ge 0$ when run on
problem $j \in T$.
• Assume the smaller this statistic, the better the solver
(e.g. $s_{ij}$ might be the CPU time required to solve problem $j$ using
solver $i$).
• For $j \in T$, let $\hat{s}_j = \min\{ s_{ij} ;\ i \in A \}$.
• For $\alpha \ge 1$ and each $i \in A$ define
$$k(s_{ij}, \hat{s}_j, \alpha) = \begin{cases} 1 & \text{if } s_{ij} \le \alpha \hat{s}_j \\ 0 & \text{otherwise.} \end{cases}$$
• The performance profile of solver $i$ is then given by
$$p_i(\alpha) = \frac{1}{|T|} \sum_{j \in T} k(s_{ij}, \hat{s}_j, \alpha), \qquad \alpha \ge 1.$$



The performance profile (Dolan and Moré) (cont.)

• Thus $p_i(1)$ is the fraction of problems for which solver $i$ is the best
(according to the statistic $s_{ij}$).
• $p_i(2)$ is the fraction for which solver $i$ is within a factor of 2 of the
best.
• $\lim_{\alpha \to \infty} p_i(\alpha)$ is the fraction that solver $i$ solved successfully.
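The definition of the performance profile translates almost directly into code. The sketch below assumes that a failed run is recorded as an infinite statistic, which is a convention chosen here and not stated in the talk; the solver names "X" and "Y" and all timings are made up for illustration.

```python
import math


def performance_profile(stats, alphas):
    """Compute p_i(alpha) for each solver.

    stats[solver][j] is the statistic s_ij for problem j (math.inf = failure).
    Returns {solver: [p_i(alpha) for alpha in alphas]}.
    """
    solvers = list(stats)
    n_problems = len(next(iter(stats.values())))
    # Best statistic per problem over all solvers.
    best = [min(stats[s][j] for s in solvers) for j in range(n_problems)]
    profile = {}
    for s in solvers:
        profile[s] = [
            sum(
                1
                for j in range(n_problems)
                if math.isfinite(stats[s][j]) and stats[s][j] <= alpha * best[j]
            ) / n_problems
            for alpha in alphas
        ]
    return profile


# Made-up CPU times for two hypothetical solvers on four problems.
stats = {
    "X": [1.0, 4.0, 3.0, math.inf],   # X fails on the last problem
    "Y": [2.0, 2.0, 3.5, 10.0],
}
profile = performance_profile(stats, alphas=[1.0, 2.0, 5.0])
```

In this made-up example each solver is best on half of the problems, but Y is within a factor of 2 of the best on all of them, while the profile of X levels off at 0.75 because of its one failure.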

In this study, the statistics we use are:


• The CPU times
• The number of entries in the matrix factor
• The memory used by the solver



Test environment

• Tests performed on a single processor of a Compaq DS20 with
3.5 Gbytes RAM.
• Codes compiled with full optimization and vendor BLAS.
• Double precision reals used.
• CPU time limit of 30 minutes for each code on each example.
• Default settings for most control parameters. Exceptions:
– Blocking parameter for high level BLAS set to 16
– For indefinite problems, stability threshold parameter
also set to $u = 10^{-10}$
• Failures due to: time limit reached, insufficient memory, or
scaled residual too large.
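The slides do not say how the scaled residual is defined. A common choice, assumed here purely for illustration, is the normwise measure ||b - Ax|| / (||A|| ||x|| + ||b||) in the infinity norm; the function name and the small test system are hypothetical.

```python
def scaled_residual(A, x, b):
    """Return ||b - A x|| / (||A|| ||x|| + ||b||) using infinity norms.

    This particular scaling is an assumption; the study may use another one.
    """
    r = [bi - sum(aij * xj for aij, xj in zip(row, x)) for row, bi in zip(A, b)]
    norm_r = max(abs(ri) for ri in r)
    norm_A = max(sum(abs(aij) for aij in row) for row in A)   # max row sum
    norm_x = max(abs(xi) for xi in x)
    norm_b = max(abs(bi) for bi in b)
    return norm_r / (norm_A * norm_x + norm_b)


# Hypothetical 2 x 2 system with exact solution [1, 1].
A = [[2.0, 1.0], [1.0, 3.0]]
b = [3.0, 4.0]
exact = scaled_residual(A, [1.0, 1.0], b)
perturbed = scaled_residual(A, [1.1, 1.0], b)
```

A test harness would compare this value against a chosen tolerance and record a failure when it is exceeded.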



Positive definite problems: AFS CPU times
[Figure: performance profile 0.AFS.CPU for the 88 positive-definite problems, u = default. Horizontal axis: α from 1 to 5; vertical axis: fraction of problems for which solver is within α of best. Failures: BCSLIB-EXT 1, MA57 1, MUMPS 1, Oblio 2, PARDISO 1, SPOOLES 2, SPRSBLKLLT 1, TAUCS 2, UMFPACK 4, WSMP 1.]


Positive definite problems: Analyse CPU times
[Figure: performance profile 0.Analyse.CPU for the 88 positive-definite problems, u = default. Horizontal axis: α from 1 to 5; vertical axis: fraction of problems for which solver is within α of best. Failures: BCSLIB-EXT 1, MA57 1, MUMPS 1, Oblio 2, PARDISO 1, SPOOLES 2, SPRSBLKLLT 1, TAUCS 2, UMFPACK 4, WSMP 1.]


Positive definite problems: Factorise CPU times
[Figure: performance profile 0.Factorise.CPU for the 88 positive-definite problems, u = default. Horizontal axis: α from 1 to 5; vertical axis: fraction of problems for which solver is within α of best. Failures: BCSLIB-EXT 1, MA57 1, MUMPS 1, Oblio 2, PARDISO 1, SPOOLES 2, SPRSBLKLLT 1, TAUCS 2, UMFPACK 4, WSMP 1.]


Positive definite problems: Solve CPU times
[Figure: performance profile 0.Solve.CPU for the 88 positive-definite problems, u = default. Horizontal axis: α from 1 to 5; vertical axis: fraction of problems for which solver is within α of best. Failures: BCSLIB-EXT 1, MA57 1, MUMPS 1, Oblio 2, PARDISO 1, SPOOLES 2, SPRSBLKLLT 1, TAUCS 2, UMFPACK 4, WSMP 1.]


Positive definite problems: main findings

• Definitely worth using a code designed for symmetric systems.


• When the same ordering is used, there is little to choose
between most of the solvers (reassuring!).
• For many large problems in our set, using nested dissection was
the best ordering.
• BUT analyse is cheap so it can be worthwhile to try more than
one ordering (especially if repeated factorizations required).
• If repeated solves are required, it is worth investing in a code
which has carefully coded the solve phase.
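Why the ordering matters can be seen by counting fill-in symbolically. The sketch below is neither nested dissection nor minimum degree; it only assumes the standard graph model of symmetric elimination (eliminating a vertex makes its remaining neighbours pairwise adjacent) and compares two hand-picked orderings on a hypothetical "arrow" pattern.

```python
def count_fill(adj, order):
    """Count fill entries created by eliminating vertices in the given order.

    adj maps each vertex to the set of its neighbours, i.e. the
    off-diagonal nonzeros of a symmetric sparsity pattern.
    """
    g = {v: set(nb) for v, nb in adj.items()}   # work on a copy
    fill = 0
    for v in order:
        nbrs = g.pop(v)
        for u in nbrs:
            g[u].discard(v)
        # Eliminating v couples every pair of its remaining neighbours.
        for u in nbrs:
            for w in nbrs:
                if u < w and w not in g[u]:
                    g[u].add(w)
                    g[w].add(u)
                    fill += 1
    return fill


# Arrow pattern of order 6: vertex 0 is coupled to every other vertex.
n = 6
arrow = {0: set(range(1, n))}
for i in range(1, n):
    arrow[i] = {0}

fill_hub_first = count_fill(arrow, [0, 1, 2, 3, 4, 5])
fill_hub_last = count_fill(arrow, [1, 2, 3, 4, 5, 0])
```

Eliminating the dense vertex first fills in the whole remaining matrix, while eliminating it last creates no fill at all. Since this symbolic work is cheap compared with the numerical factorization, trying more than one ordering can pay off.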



Indefinite problems: AFS CPU times
[Figure: performance profile 1.AFS.CPU for the 61 indefinite problems. Horizontal axis: α from 1 to 5; vertical axis: fraction of problems for which solver is within α of best. Failures: BCSLIB-EXT 15, MA57 2, MUMPS 16, MUMPS-unsym 8, Oblio 8, PARDISO 3, SPOOLES 15, UMFPACK 5, WSMP 31.]


Indefinite problems: Analyse CPU times
[Figure: performance profile 1.Analyse.CPU for the 61 indefinite problems. Horizontal axis: α from 1 to 5; vertical axis: fraction of problems for which solver is within α of best. Failures: BCSLIB-EXT 15, MA57 2, MUMPS 16, MUMPS-unsym 8, Oblio 8, PARDISO 3, SPOOLES 15, UMFPACK 5, WSMP 31.]


Indefinite problems: Factorise CPU times
[Figure: performance profile 1.Factorise.CPU for the 61 indefinite problems. Horizontal axis: α from 1 to 5; vertical axis: fraction of problems for which solver is within α of best. Failures: BCSLIB-EXT 15, MA57 2, MUMPS 16, MUMPS-unsym 8, Oblio 8, PARDISO 3, SPOOLES 15, UMFPACK 5, WSMP 31.]


Indefinite problems: Solve CPU times
[Figure: performance profile 1.Solve.CPU for the 61 indefinite problems. Horizontal axis: α from 1 to 5; vertical axis: fraction of problems for which solver is within α of best. Failures: BCSLIB-EXT 15, MA57 2, MUMPS 16, MUMPS-unsym 8, Oblio 8, PARDISO 3, SPOOLES 15, UMFPACK 5, WSMP 31.]


Indefinite problems, tiny u: AFS CPU times
[Figure: performance profile 2.AFS.CPU for the 61 indefinite problems, u = tiny. Horizontal axis: α from 1 to 5; vertical axis: fraction of problems for which solver is within α of best. Failures: BCSLIB-EXT 11, MA57 5, MUMPS 9, MUMPS-unsym 10, Oblio 5, PARDISO 3, SPOOLES 14, UMFPACK 17, WSMP 31.]


Indefinite problems: main findings

• There are some tough problems in the real world!
• The static pivoting used by PARDISO is fast, but there are a few
examples where it can lead to an inaccurate solution (one that
does not converge with iterative refinement).
• A penalty of static pivoting is the need for iterative refinement,
which increases the solve time.
• Better out-of-core solvers are needed for really large problems.
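The interplay between static pivoting and iterative refinement can be sketched on a tiny example. This is a heavily simplified illustration and not PARDISO's algorithm: the toy below does an LDL^T factorization with no pivoting at all, replaces any pivot smaller in magnitude than a hypothetical threshold `delta` by ±`delta`, and then relies on refinement with the true matrix to recover accuracy.

```python
def ldlt_static(A, delta):
    """LDL^T without pivoting; tiny pivots are replaced by +/- delta."""
    n = len(A)
    L = [[float(i == j) for j in range(n)] for i in range(n)]
    d = [0.0] * n
    perturbed = 0
    for j in range(n):
        dj = A[j][j] - sum(L[j][k] ** 2 * d[k] for k in range(j))
        if abs(dj) < delta:                    # static pivoting: perturb, do not permute
            dj = delta if dj >= 0.0 else -delta
            perturbed += 1
        d[j] = dj
        for i in range(j + 1, n):
            L[i][j] = (A[i][j] - sum(L[i][k] * L[j][k] * d[k] for k in range(j))) / dj
    return L, d, perturbed


def ldlt_solve(L, d, b):
    """Solve (L D L^T) x = b using the computed factors."""
    n = len(b)
    y = list(b)
    for i in range(n):                         # forward elimination
        y[i] -= sum(L[i][k] * y[k] for k in range(i))
    z = [yi / di for yi, di in zip(y, d)]      # diagonal solve
    for i in reversed(range(n)):               # back substitution
        z[i] -= sum(L[k][i] * z[k] for k in range(i + 1, n))
    return z


def refine(A, L, d, b, steps):
    """Iterative refinement: residuals use the true A, corrections the perturbed factors."""
    x = ldlt_solve(L, d, b)
    for _ in range(steps):
        r = [bi - sum(a * xj for a, xj in zip(row, x)) for row, bi in zip(A, b)]
        dx = ldlt_solve(L, d, r)
        x = [xi + di for xi, di in zip(x, dx)]
    return x


# Hypothetical indefinite matrix with a zero first pivot; exact solution is [2, 1].
A = [[0.0, 1.0], [1.0, 0.0]]
b = [1.0, 2.0]
L, d, perturbed = ldlt_static(A, delta=1e-8)

x0 = refine(A, L, d, b, steps=0)               # perturbed factors only
x2 = refine(A, L, d, b, steps=2)               # after two refinement steps
err0 = max(abs(x0[0] - 2.0), abs(x0[1] - 1.0))
err2 = max(abs(x2[0] - 2.0), abs(x2[1] - 1.0))
```

Each refinement step costs an extra matrix-vector product and an extra solve, which is the increase in solve time noted in the findings above. On harder problems the refinement need not converge, in line with the few failures reported.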



The future
• The study has shown up weaknesses in the solvers.
• It has already led to the authors of several of the codes making
improvements.
• For some, further changes/new versions are on the way.
• The static pivoting used by PARDISO has, in particular, opened
up new areas of research.
• Reliably solving large, tough indefinite problems (which may be
singular) remains a challenge.
• Note: the ease of use, generality of input data, quality of user
documentation, software maintenance etc. vary considerably
between packages.

All results shortly to appear in a Rutherford Technical Report.



Our thanks

We would like to thank all the authors of the codes for their
invaluable help with this project.

We hope it has been mutually beneficial.

For more pictures see: www.numerical.rl.ac.uk/talks/talks.shtml



Positive definite problems: factor entries
[Figure: performance profile 0.Real.factor for the 88 positive-definite problems, u = default. Horizontal axis: α from 1 to 5; vertical axis: fraction of problems for which solver is within α of best. Failures: MA57 1, MUMPS 1, Oblio 2, PARDISO 1, SPOOLES 2, SPRSBLKLLT 1, TAUCS 2, UMFPACK 4, WSMP 1.]


Indefinite problems: factor entries
[Figure: performance profile 1.Real.factor for the 61 indefinite problems. Horizontal axis: α from 1 to 5; vertical axis: fraction of problems for which solver is within α of best. Failures: MA57 2, MUMPS 16, MUMPS-unsym 8, Oblio 8, PARDISO 3, SPOOLES 15, UMFPACK 5, WSMP 31.]


Indefinite problems, tiny u: factor entries
[Figure: performance profile 2.Real.factor for the 61 indefinite problems, u = tiny. Horizontal axis: α from 1 to 5; vertical axis: fraction of problems for which solver is within α of best. Failures: MA57 5, MUMPS 9, MUMPS-unsym 10, Oblio 5, PARDISO 3, SPOOLES 14, UMFPACK 17, WSMP 31.]


Indefinite problems, tiny u: Factorise CPU times
[Figure: performance profile 2.Factorise.CPU for the 61 indefinite problems, u = tiny. Horizontal axis: α from 1 to 5; vertical axis: fraction of problems for which solver is within α of best. Failures: BCSLIB-EXT 11, MA57 5, MUMPS 9, MUMPS-unsym 10, Oblio 5, PARDISO 3, SPOOLES 14, UMFPACK 17, WSMP 31.]


Indefinite problems, tiny u: Solve CPU times
[Figure: performance profile 2.Solve.CPU for the 61 indefinite problems, u = tiny. Horizontal axis: α from 1 to 5; vertical axis: fraction of problems for which solver is within α of best. Failures: BCSLIB-EXT 11, MA57 5, MUMPS 9, MUMPS-unsym 10, Oblio 5, PARDISO 3, SPOOLES 14, UMFPACK 17, WSMP 31.]