Intel Adaptive Spike-Based Solver 1.0 User Guide
INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL(R)
PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY
INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. EXCEPT AS PROVIDED
IN INTEL'S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO
LIABILITY WHATSOEVER, AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY,
RELATING TO SALE AND/OR USE OF INTEL PRODUCTS INCLUDING LIABILITY OR
WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR
INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT.
UNLESS OTHERWISE AGREED IN WRITING BY INTEL, THE INTEL PRODUCTS ARE NOT
DESIGNED NOR INTENDED FOR ANY APPLICATION IN WHICH THE FAILURE OF THE INTEL
PRODUCT COULD CREATE A SITUATION WHERE PERSONAL INJURY OR DEATH MAY OCCUR.
Intel may make changes to specifications and product descriptions at any
time, without notice. Designers must not rely on the absence or
characteristics of any features or instructions marked "reserved" or
"undefined". Intel reserves these for future definition and shall have no
responsibility whatsoever for conflicts or incompatibilities arising from
future changes to them. The information here is subject to change without
notice. Do not finalize a design with this information.
The products described in this document may contain design defects or errors
known as errata which may cause the product to deviate from published
specifications. Current characterized errata are available on request.
Contact your local Intel sales office or your distributor to obtain the
latest specifications and before placing your product order.
Copies of documents which have an order number and are referenced in this
document, or other Intel literature, may be obtained by calling
1-800-548-4725, or by visiting Intel's Web Site.
* Other names and brands may be claimed as the property of others.
Contents
1 Overview
1.1 A Quick What, Why, and How
1.2 A Hello World Example
1.3 Future Developments
1.4 User Guide Outline
3 Separate calls
4 Banded Preconditioner
6 Intel(R) Adaptive Spike-Based Solver Examples
6.1 Example1: Automatic Partitioning
6.2 Example2: Automatic Partitioning and Multiple RHS
6.3 Example3: Automatic Partitioning and Multiple RHS with Separate Factorization and Solution
6.4 Example4: Manual Partitioning
6.5 Example5: Automatic Partitioning Using the CSR Input Format
6.6 Example 6: Automatic Partitioning Using the CSR Input Format with a Preconditioner
6.7 Toeplitz Matrix Example
6.8 Sparse Banded Matrix Example
6.9 Calling Intel(R) Adaptive Spike-Based Solver from C Programs
7 Reference guide
7.1 Intel(R) Adaptive Spike-Based Solver 1.0 directory structure
7.2 Intel(R) Adaptive Spike-Based Solver and ScaLAPACK
7.3 Spike_Default
7.4 Spike
7.5 Spike_Begin
7.6 Spike_Preprocess
7.7 Spike_Process
7.8 Spike_End
7.9 spike_param details
7.10 matrix_data details
7.11 info details
Bibliography
Chapter 1
Overview
[Figure 1.1: The partitioned matrix A, with diagonal blocks A1, A2, A3 and off-diagonal coupling blocks B1, C2, B2, C3.]
block diagonal matrix consisting of all the A_j blocks (see Figure 1.1) and
S is D^{-1} A, assuming for the moment that the A_j blocks are non-singular.
Matrix S has the structure of an identity matrix with some extra "spikes",
hence the name of the package (Figure 1.2). In practice, D and S may
[Figure 1.2: The factorization A = D S. D is block diagonal with blocks A1, A2, A3; S is an identity matrix with the spikes V1, W2, V2, W3.]
A = D S + R
The key step of this iterative method is the solution of systems with the
D S matrix. Solving AX = F can now be seen as involving three steps
conceptually:
1. Solving the block-diagonal system DG = F. Because D consists of
decoupled systems, one for each diagonal block A_i, they can be solved in
parallel without synchronization between the individual systems. A
number of strategies based on the LU decomposition of each A_i can be
applied here. These include variants such as LU without pivoting, LU
with pivoting, as well as a combination of LU and UL decompositions
with or without pivoting.
3. Depending on how D and S were obtained earlier, which is related to
the exact strategy used in the two previous steps, R can be zero or
non-zero. If R is zero, then of course the Y obtained is the desired
solution to AX = F . Otherwise, some corrections must be computed.
This can be accomplished by a number of standard iterative methods
such as iterative refinement, GMRES, or BiCGStab, just to name a
few.
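The three steps can be illustrated on a small dense example. The sketch below is written in plain Python purely for illustration; it is not part of the package and calls none of its routines. It assumes that the second step, which is not reproduced above, consists of solving S Y = G. The 8-by-8 tridiagonal matrix, the split into two diagonal blocks, and the helper names solve and matmul are all choices made for this example. D and S = D^{-1} A are formed exactly, so R = 0 and no correction step is needed.

```python
def solve(M, B):
    """Solve M X = B (B may have several columns) by Gaussian elimination
    with partial pivoting. Dense and O(n^3): for tiny examples only."""
    n, m = len(M), len(B[0])
    aug = [list(M[i]) + list(B[i]) for i in range(n)]  # work on a copy
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(c + 1, n):
            t = aug[r][c] / aug[c][c]
            for k in range(c, n + m):
                aug[r][k] -= t * aug[c][k]
    X = [[0.0] * m for _ in range(n)]
    for r in range(n - 1, -1, -1):
        for j in range(m):
            s = aug[r][n + j] - sum(aug[r][k] * X[k][j] for k in range(r + 1, n))
            X[r][j] = s / aug[r][r]
    return X


def matmul(P, Q):
    """Dense matrix product P Q."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]


n, half = 8, 4
# Hypothetical test matrix: 6 on the diagonal, 1 on the two adjacent diagonals.
A = [[6.0 if i == j else 1.0 if abs(i - j) == 1 else 0.0 for j in range(n)]
     for i in range(n)]
F = [[1.0] for _ in range(n)]

# D keeps only the two diagonal blocks (rows/columns 0..3 and 4..7).
D = [[A[i][j] if (i < half) == (j < half) else 0.0 for j in range(n)]
     for i in range(n)]

S = solve(D, A)  # S = D^{-1} A: identity blocks plus two spike columns
G = solve(D, F)  # step 1: block-diagonal solve D G = F
Y = solve(S, G)  # step 2 (assumed): spike system S Y = G
X = solve(A, F)  # reference: direct solve of A X = F

# With R = 0, Y already solves A X = F, so step 3 is not needed here.
print(max(abs(Y[i][0] - X[i][0]) for i in range(n)))
```

In this serial sketch the two diagonal blocks are solved together inside one call; in the parallel setting described above, each block would be handled by its own process.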
All in all, a large variety of strategies can be applied based on the basic
decomposition A = DS and the realization of the approximations D and
S; i.e., A = D S + R in which R is a correction, where M = D S is
an effective preconditioner for a variety of iterative schemes. The package
offers a number of choices to solve AX = F based on the framework of this
decomposition.
One can use the software to compute the solution of AX = F by a single
call where the specific strategy can be selected automatically or manually.
A user can also solve a system by issuing several step-by-step calls similar to
separating the LU factorization and the forward/backward substitutions in
LAPACK [1]. In this case, the user can handle more interesting situations
including the solution of different right-hand sides (RHS) at different times,
A X_i = F_i, while amortizing those one-time computation costs related to the
same matrix A.
To summarize, Intel(R) Adaptive Spike-Based Solver 1.0 aims to solve
AX = F in parallel where A is a banded matrix. It currently supports users
using MPI to express parallelism. The algorithmic framework is based on a
decomposition of the form A = D S + R. This framework allows many
different strategies that can exploit special properties of the underlying
processor architectures, network properties, as well as the numerical nature
of the input matrix A. Intel(R) Adaptive Spike-Based Solver 1.0 consists of
two main layers: a computational layer called Spike Core and a strategy
selection layer called Spike Adapt. Spike Core consists of the necessary
linear algebra software to support different solution strategies, whereas
Spike Adapt is an independent layer that selects an efficient strategy based
on the characteristics of the input matrix A and the underlying computer
system. By default, Spike Adapt automatically picks a strategy on the user's
behalf. Nevertheless, expert users have the option to pick a strategy
manually. A strategy is defined by algorithmic choices for each of the three
steps (involving D, S, and as needed for non-zero R) outlined previously.
A user can ask for the solution to the problem AX = F via a single
software library call. This is covered in Chapter 2. Alternatively, this single
function call can be replaced by separate calls similar to separating the
calls to triangular factorization and the subsequent triangular solves. This
added complexity is especially worthwhile when solutions with different RHS
for the same matrix A are needed at different times, allowing the common
preprocessing cost pertaining to A to be amortized. Invoking the package
with multiple function calls is covered in Chapter 3. Finally, concerning
data distribution, the user can provide the complete matrix A and the RHS
in the MPI master process and rely on the functionality provided by the
software package to distribute the data to the remaining MPI processes.
Alternatively, the user can manually distribute the data. Chapter 5 covers
the data distribution options in greater detail.
A single call to the SPIKE subroutine takes care of data distribution and
strategy selection. The user only needs to set a few global parameters such
as the number of processors, the local MPI rank, and the structure and
bandwidth of the matrix. The matrix and RHS data are stored initially on the
MPI master process (i.e., process-0). The source code of hello_world.f90
is listed in Figure 1.3. To create the executable, compile the source program
and link it with the Intel(R) Adaptive Spike-Based Solver 1.0 libraries, which
also provide the BLAS and LAPACK libraries. Assuming that the package has been
installed in a directory called <SPIKE directory> and the user is compiling
the source program called hello_world.f90:
mpiifort hello_world.f90 -o hello_world.exe \
-I<SPIKE directory>/include \
-L<SPIKE directory>/lib/<arch> \
-lspike -lspike_mpi_comm \
-lspike_adapt -lspike_adapt_de -lspike_adapt_grid_f \
-lmkl_solver -lmkl_lapack -lmkl -lguide -lpthread
where mpiifort is the Fortran compiler driver for the Intel MPI Library
and <arch> is either 64, for IA-64 architecture, or em64t, for Intel(R) 64
architecture.
INCLUDE 'spike.fi'
program hello_world_code
  use spike_module
  use mpi
  ! before the MPI_INIT calling sequences
  integer :: i, rank, nbprocs, code
  integer :: info
  type(spike_param) :: pspike   ! Spike parameter data structure
  type(matrix_data) :: mat      ! Spike matrix data structure
  double precision, dimension(:,:), allocatable :: f   ! rhs
  call MPI_INIT(code)
  call MPI_COMM_SIZE(MPI_COMM_WORLD, nbprocs, code)
  call MPI_COMM_RANK(MPI_COMM_WORLD, rank, code)
  ! set up Spike parameter data structure on all processors
  pspike%nbprocs = nbprocs; pspike%rank = rank
  call SPIKE_DEFAULT(pspike)    ! default values for pspike
  ! set up Spike matrix data parameters on all processors
  mat%format = 'D'; mat%astru = 'G'; mat%diagdo = 'Y'
  mat%n = 32; mat%kl = 1; mat%ku = 1
  ! create input matrix and rhs on Processor 0
  if (rank == 0) then
    allocate(mat%A(1:mat%kl+mat%ku+1, mat%n))
    allocate(f(1:mat%n, 1:1))
    ! off-diagonals are -1 (interior solution values 1/(6-2) = 0.25)
    mat%A(1,:) = -1.0d0; mat%A(2,:) = 6.0d0; mat%A(3,:) = -1.0d0
    f = 1.0d0
  end if
  ! one call to Spike for solving Ax=f
  call SPIKE(pspike, mat, f, info)
  if (info >= 0) then
    if (rank == 0) then
      do i = 1, mat%n
        print *, i, f(i,1)
      end do
    end if
  end if
  call MPI_FINALIZE(code)
end program hello_world_code
Figure 1.3: Source code of hello_world.f90.
A run of the resulting executable hello_world.exe may look like
1 0.207106781186548
2 0.242640687119285
3 0.248737341529163
4 0.249783362055695
5 0.249962830805007
6 0.249993622774344
7 0.249998905841058
8 0.249999812272004
9 0.249999967790968
10 0.249999994473804
11 0.249999999051855
12 0.249999999837324
13 0.249999999972089
14 0.249999999995211
15 0.249999999999174
16 0.249999999999835
17 0.249999999999835
18 0.249999999999174
19 0.249999999995211
20 0.249999999972089
21 0.249999999837324
22 0.249999999051855
23 0.249999994473804
24 0.249999967790968
25 0.249999812272004
26 0.249998905841058
27 0.249993622774344
28 0.249962830805007
29 0.249783362055695
30 0.248737341529163
31 0.242640687119285
32 0.207106781186548
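The listed values can be cross-checked independently of the package. The sketch below, in plain Python and not part of the package, stores a 32-by-32 tridiagonal matrix in three band rows (superdiagonal, diagonal, subdiagonal) and solves the system serially with the Thomas algorithm rather than with Spike. It assumes off-diagonal entries of -1, the value consistent with the printed solution (interior values close to 1/(6-2) = 0.25). The function name thomas_solve is hypothetical.

```python
def thomas_solve(band, rhs):
    """Solve a tridiagonal system given as band = [sup, diag, sub].

    The layout mirrors LAPACK band storage with kl = ku = 1 (0-based here):
      sup[j]  holds A[j-1][j]  (sup[0] is unused),
      diag[j] holds A[j][j],
      sub[j]  holds A[j+1][j]  (sub[n-1] is unused).
    No pivoting is done, which is safe for diagonally dominant matrices.
    """
    sup, diag, sub = band
    n = len(rhs)
    c = [0.0] * n  # modified superdiagonal after forward elimination
    d = [0.0] * n  # modified right-hand side after forward elimination
    c[0] = sup[1] / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):
        # Row i couples x[i-1] (coefficient sub[i-1]) and x[i+1] (sup[i+1]).
        denom = diag[i] - sub[i - 1] * c[i - 1]
        if i < n - 1:
            c[i] = sup[i + 1] / denom
        d[i] = (rhs[i] - sub[i - 1] * d[i - 1]) / denom
    x = [0.0] * n
    x[n - 1] = d[n - 1]
    for i in range(n - 2, -1, -1):  # back substitution
        x[i] = d[i] - c[i] * x[i + 1]
    return x


n = 32
band = [[-1.0] * n, [6.0] * n, [-1.0] * n]  # same values in every column
x = thomas_solve(band, [1.0] * n)
for i, value in enumerate(x, start=1):
    print(i, value)
```

The solution is symmetric about the middle of the vector, as in the listing above, because both the matrix and the right-hand side are symmetric under reversal of the index order.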
1.3 Future Developments
Enhancements to Intel(R) Adaptive Spike-Based Solver 1.0 will be made in
several orthogonal areas: the kinds of sparse matrices handled via added
utility functions, the set of solution strategies it encompasses, and the variety
of parallel environments it supports.
When A is a general sparse matrix, reordering can often transform
it either into a banded matrix, or a low-rank perturbation of a banded
matrix. We intend to offer utilities for matrix reordering and capabilities to
handle more general sparse matrices.
In addition to the current LU-based strategies for handling the diagonal
blocks of the D matrix, we intend to add other strategies (e.g., based on
least-squares) to handle very ill-conditioned systems. Other data distribution
strategies that exhibit better load-balancing properties will also be added.
MPI is the only parallel environment supported currently, but alternative
parallel environments may be considered in future releases.
Chapter 2
Intel(R) Adaptive Spike-Based Solver 1.0 contains two main components:
Spike Core is the component that implements the underlying numerical
methods including, for example, the solution of the S system in A = DS + R,
factorization of the D system, and outer iterations to deal with a non-zero
R. The second component, Spike Adapt, implements a strategy selection
method based on information about the underlying architecture, computer
platform, and the linear system in question. The single driver Spike
conveniently integrates and makes available the functionalities offered by
these two components to the user via a single call. In brief, this driver
exercises the strategy selection mechanism and then proceeds to solve AX = F
for X given A and F using the selected strategy. The user can find out what
strategy was chosen by examining several parameters in the program, or by
running the standalone binary executable spike_adapt.exe (at command
line) that comes with the package. The user also has the option of selecting a
strategy manually through setting several parameters, but this requires more
detailed knowledge of how the strategies work. To this end, this chapter also
gives a brief guideline on choosing strategies, but defers to the Appendix for
a more mathematical description.
The single driver call is
<SPIKE directory>/tools/environment
directory, where <SPIKE directory> is the package's main directory after
installation. For example, it could be
/opt/intel/spike/1.0.
These scripts set environment variables that are needed to build and run
applications using the software package. Select the appropriate script for the
Linux shell and architecture. For example, to initialize the package for the
BASH shell on an Intel(R) EM64T system, execute the following command:
2.2 Autoadapt
As illustrated in the hello world program in Figure 1.3, parameters
contained in two components of the derived type spike_param variable pspike
need to be set. While the type spike_param has many components, only
two need to be set manually by the user; the rest can be assigned default
values by making a call to the routine Spike_Default. The two components
that need to be set are
Component   Type      Description
nbprocs     integer   number of processors - MPI related
rank        integer   rank of the local processor - MPI related
The rest of the components can be set to their defaults by calling the
routine Spike_Default. For example,
call SPIKE_DEFAULT(pspike)
will set those components in the derived type spike_param variable pspike to
their default values. These default values are given in Table 2.1. Note that
some of these components are inout in nature, which means the subroutine may
actually overwrite the input values as a result of executing the software. The
spike_param derived type also contains a host of output components.
Refer to Section 7.9 for comprehensive information.
Component Type Default Description
The three components above together specify a strategy for solving a banded
system using the Spike framework. When the autoadapt component below
is set to .true. (which is the default value), the input values of these three
components are ignored and overwritten to record the automatically chosen
strategy. Section 2.4 has more details on manual strategy selection.
Table 2.1: List of input components for the derived type spike_param. Note:
RSS, DFS, OIS are inout whereas the rest are input only.
2.3 Data
In this section we explain how to set up the parameters within the
matrix_data variable mat that we use in our calling sequence example.
The main purpose of the type matrix_data is to hold the matrix in one of
several popular representations. Both the LAPACK banded storage format
(without additional storage for pivoting) and the CSR (Compressed
Sparse Row) format are supported. Depending on the value of
pspike%tp in the variable pspike being passed to the routine, the mat variable
on MPI process-0 may be used to hold the full original matrix, or the mat
variable on each of the MPI processes may be used to hold part of the original
matrix. In the former case, Spike Core will partition the data held on
process-0 and distribute them to the other processes under the hood. In the
latter case, the user needs to manually put the appropriate part of the matrix
on each of the different MPI processes. Chapter 5 gives the necessary
details for one to perform this task. For now, Table 2.2 gives details of the
matrix_data structure relevant for pspike%tp=0, that is, when the user puts the
complete matrix into the mat variable on process-0.
Finally, if the matrix data have been defined on process-0, the data for
the right-hand-side (rhs) should also be defined on process-0 as described in
Table 2.3.
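For the dense banded case, the allocation mat%A(1:kl+ku+1, n) used in the hello world program corresponds to the LAPACK band layout without extra pivoting rows, in which entry A(i,j) is stored at row ku+1+i-j of column j. The sketch below shows that mapping in plain Python. It is illustrative only, uses 0-based indices (so the row index becomes ku+i-j), and the helper names to_band and from_band are hypothetical rather than routines of the package.

```python
def to_band(A, kl, ku):
    """Pack a dense n-by-n matrix with kl sub- and ku superdiagonals into a
    (kl+ku+1)-by-n band array: band[ku + i - j][j] = A[i][j]."""
    n = len(A)
    band = [[0.0] * n for _ in range(kl + ku + 1)]
    for j in range(n):
        for i in range(max(0, j - ku), min(n, j + kl + 1)):
            band[ku + i - j][j] = A[i][j]
    return band


def from_band(band, kl, ku):
    """Inverse of to_band: rebuild the dense matrix from the band array."""
    n = len(band[0])
    A = [[0.0] * n for _ in range(n)]
    for j in range(n):
        for i in range(max(0, j - ku), min(n, j + kl + 1)):
            A[i][j] = band[ku + i - j][j]
    return A


# Hypothetical 6-by-6 matrix with kl = 2, ku = 1; entry (i, j) is encoded as
# 10*(i+1) + (j+1) so that positions are easy to recognize in the band array.
n, kl, ku = 6, 2, 1
A = [[float(10 * (i + 1) + (j + 1)) if -ku <= i - j <= kl else 0.0
      for j in range(n)] for i in range(n)]
band = to_band(A, kl, ku)
for row in band:
    print(row)
```

Row ku of the band array (row ku+1 in Fortran numbering) holds the main diagonal. The few corner entries of the band array that fall outside the matrix are left at zero.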
mat% Type Distribution Description
The input fields below are associated with the sparse CSR format (mat%format='S')
Table 2.2: List of parameter fields of the type matrix_data variable mat.
Here all the matrix data are stored in process-0. If space for mat%A in
process-0 is allocated dynamically, the user may want to have it deallocated
automatically by setting pspike%memfree = .true. All the other parameter
fields must be declared as global (i.e., common to each MPI process).
Parameter   Type                         Distribution   Description
f           double(mat%n,nrhs) (inout)   rank 0         Right-hand side f (in);
                                                        solution x of Ax=f (out)
nrhs stands for the number of RHS.
Table 2.3: Definition of the RHS (in) and solution (out) stored in rank 0.
While RSS and DFS are mostly orthogonal, they are not completely so.
Indeed, some factorization strategies are motivated by, and consequently
applicable only to, some particular reduced system strategies. Therefore, not all
pspike%tp pspike%nbprocs
1 2 2^n (n > 1) Even (≠ 2^n) Odd
TU FL
0 RL RP All All TU FL
EA TA
TU FL
1 None All TU FL TU FL
RL RP
Table 2.4: This table illustrates how the type of matrix partitioning and the
number of MPI processes affect the choice of (RSS,DFS) for the Spike Core
strategy. In future developments of Intel(R) Adaptive Spike-Based Solver, the
choice of (RSS,DFS) will be independent of the setting of the tp component.
spike_adapt.exe
<SPIKE directory>/bin/<arch>
where <arch> is either 64, for IA-64 architecture, or em64t, for Intel(R) 64
architecture. Given a set of input characteristics (matrix size, bandwidth,
number of MPI processes, sparsity, diagonal dominance, number of right-hand
sides, type of matrix partitioning), this executable will suggest an
optimal Spike Core strategy. Edit the Fortran NAMELIST file, ivars.nml,
to specify the matrix parameters, e.g.:
sparsity = 0.9d0
diagonal_dominance = 1.2d0
n_rhs = 1
tp = 0 /
[cluster0]$ spike_adapt.exe
./spike_adapt.exe
Bandwidth = 161
Diagonal dominance = 1.20000000000000
Matrix size = 400000
Sparsity = 0.900000000000000
# RHS = 1
# Procs = 4
Type of partition: 0
Chapter 3
Separate calls
CALL Spike(pspike,mat,f,info)
We can see that, in addition to pspike, mat, f and info, there is a new
parameter pre needed for the split calls. This parameter pre is of type
matrix_data and pertains to a preconditioner. However, the user need not set
any of the component values. Consider it a work array of some sort that the
software uses internally.
Splitting a single call to SPIKE is useful for applications having iterations
with changing right-hand sides but using the same original matrix. The
following program invokes Spike_Process multiple times rather than invoking
Spike multiple times. Figure 3.1 presents a program solving two different
right-hand sides: (1, 0, 0, 0, 0, 0, 0, 0)^T and then (0, 1, 0, 0, 0, 0, 0, 0)^T. Note
that the program uses the global partitioning scheme, so the right-hand sides
are set up in node 0.
In the program, Spike_Begin, Spike_Preprocess and Spike_End are
called once while Spike_Process is called twice (once for each right-hand
side). This program is expected to run faster than an equivalent one with
two Spike calls because this program only initializes and frees Spike data
structures once, while a program calling Spike twice would duplicate
this work.

! Declare variables used by SpikePACK
integer :: info
type(spike_param) :: pspike
type(matrix_data) :: mat, pre
double precision, dimension(8,1) :: f
...
! Set up pspike and mat as usual
...
! The following two calls are called once
call Spike_Begin(pspike, mat, pre, info)
call Spike_Preprocess(pspike, pre, info)
! Solve for the first right hand side
if (rank == 0) then
  f = 0.0d0
  f(1,1) = 1.0d0
end if
! Spike_Process() is invoked for the first right hand side
call Spike_Process(pspike, mat, pre, f, info)
! The solution of the first RHS is stored in f after Spike_Process().
...
! Solve for the second right hand side
if (rank == 0) then
  f=f 0.1d0
end if
! Spike_Process() is invoked for the second right hand side
call Spike_Process(pspike, mat, pre, f, info)
! The solution of the second RHS is stored in f after Spike_Process().
...
! The following call is called once
call Spike_End(pspike, mat, pre, info)
...
Figure 3.1: A program solving two right-hand sides using separate Spike calls.
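The saving from splitting the call can be illustrated with a serial analogy in plain Python. This is not the package's algorithm and uses none of its routines: an LU factorization without pivoting is simply computed once and then reused for two right-hand sides, in the same spirit as one preprocessing call followed by one processing call per right-hand side. The 8-by-8 test matrix and the helper names lu_factor and lu_solve are chosen for this example only.

```python
def lu_factor(A):
    """Return (L, U) with A = L U, without pivoting.
    Adequate here because the test matrix is strictly diagonally dominant."""
    n = len(A)
    L = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    U = [row[:] for row in A]
    for k in range(n):
        for i in range(k + 1, n):
            L[i][k] = U[i][k] / U[k][k]
            for j in range(k, n):
                U[i][j] -= L[i][k] * U[k][j]
    return L, U


def lu_solve(L, U, f):
    """Solve L U x = f by forward then backward substitution."""
    n = len(f)
    y = [0.0] * n
    for i in range(n):
        y[i] = f[i] - sum(L[i][k] * y[k] for k in range(i))
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (y[i] - sum(U[i][k] * x[k] for k in range(i + 1, n))) / U[i][i]
    return x


n = 8
# Hypothetical pentadiagonal matrix: 6 on the diagonal, 1 within distance 2.
A = [[6.0 if i == j else 1.0 if abs(i - j) <= 2 else 0.0 for j in range(n)]
     for i in range(n)]

L, U = lu_factor(A)                 # expensive part, done once

e1 = [1.0] + [0.0] * (n - 1)        # first right-hand side (1,0,...,0)^T
e2 = [0.0, 1.0] + [0.0] * (n - 2)   # second right-hand side (0,1,0,...,0)^T
x1 = lu_solve(L, U, e1)             # cheap part, repeated per right-hand side
x2 = lu_solve(L, U, e2)
print(x1)
print(x2)
```

For a dense n-by-n matrix the factorization costs on the order of n^3 operations and each substitution on the order of n^2, which is why reusing the factors pays off as soon as more than one right-hand side is solved.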
Chapter 4
Banded Preconditioner
Intel(R) Adaptive Spike-Based Solver can be used as a framework for solving
banded systems that serve as effective preconditioners for general sparse
systems, which are solved via iterative methods. In future releases, we will
offer different options for enabling an automatic derivation of a robust banded
preconditioner from an arbitrary general sparse system. In particular, the
component %BPS of the derived type spike_param in Table 2.1 has been
introduced to this effect. For the current version, Version 1.0, the
component %BPS can only take two values: 0 (no preconditioner, the default
value) or 1, where the banded preconditioner has to be set by the user. Some
users may take advantage of this option in the case where banded preconditioners
can be constructed directly from the application at hand, such as in
nanoelectronics nanowire simulations [7]. Using the separate calling sequences
presented in Chapter 3, one can decide on a preconditioner pre that will be
used by the preprocessing sequence, while the processing sequence takes
advantage of the obtained factorization of the preconditioner to accelerate
the outer-iterative schemes. Therefore, with the option %BPS=1 the user
can define their own banded preconditioner (either dense
or sparse within the band) for iteratively solving an original system whose
matrix can be general sparse. Depending on the data distribution format
(component %tp), the user must define the preconditioner pre using the
derived type matrix_data, in the same way as the original matrix mat,
using either Table 2.2 (%tp=0) or Table 5.1 (%tp=1). In Chapter 6, Example
6 illustrates the use of the option %BPS=1.
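The role of a banded preconditioner in an outer iteration can be sketched in a few lines of plain Python. The sketch is illustrative only and is not the package's implementation: M is taken as the tridiagonal part of a hypothetical 12-by-12 matrix A that also has weak entries far from the diagonal, and the simple iteration x <- x + M^{-1}(f - A x) stands in for the outer-iterative schemes mentioned above. A dense solve replaces the banded factorization for brevity, and all names are made up for the example.

```python
def solve(M, b):
    """Dense Gaussian elimination with partial pivoting, one right-hand side."""
    n = len(b)
    aug = [M[i][:] + [b[i]] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(c + 1, n):
            t = aug[r][c] / aug[c][c]
            for k in range(c, n + 1):
                aug[r][k] -= t * aug[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = aug[r][n] - sum(aug[r][k] * x[k] for k in range(r + 1, n))
        x[r] = s / aug[r][r]
    return x


def matvec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


n = 12
A = [[0.0] * n for _ in range(n)]
for i in range(n):
    A[i][i] = 6.0
    if i + 1 < n:
        A[i][i + 1] = A[i + 1][i] = -1.0
    A[i][n - 1 - i] += 0.1  # weak anti-diagonal entries, mostly outside the band

# Banded preconditioner: keep only the entries with |i - j| <= 1.
M = [[A[i][j] if abs(i - j) <= 1 else 0.0 for j in range(n)] for i in range(n)]

f = [1.0] * n
x = [0.0] * n
history = []  # max-norm of the residual at the start of each iteration
for it in range(20):
    r = [fi - ai for fi, ai in zip(f, matvec(A, x))]
    history.append(max(abs(v) for v in r))
    if history[-1] < 1e-12:
        break
    dx = solve(M, r)  # preconditioner solve: M dx = r
    x = [xi + di for xi, di in zip(x, dx)]
print(len(history), history[-1])
```

The iteration converges quickly here because the part of A left out of M is small relative to M. For matrices where that is not the case, a Krylov method such as GMRES or BiCGStab with M as preconditioner would normally be preferred over this plain refinement loop.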
Chapter 5
It has been assumed until now that the matrix and RHS data initially reside
in the MPI master process (i.e., process-0). This is specified by setting the
tp component of the spike_param variable to zero. When the matrix and RHS are
entirely in process-0, our software package automatically distributes a
portion of the data to each MPI process before invoking the solver. The price
paid for this convenience is the overhead associated with the data distribution
and potential limits on the overall problem size. Specifically, the problem
size is limited by the memory available to process-0.
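A lower bound on the memory needed on process-0 follows from the storage of the band array and the right-hand sides alone. The sketch below (plain Python, illustrative only; the function name band_bytes is hypothetical) counts the bytes for a double-precision array of kl+ku+1 rows and n columns plus nrhs right-hand-side columns. It deliberately ignores any workspace the package allocates internally, which this guide does not quantify. The sample values n = 400000 and kl = ku = 80 (161 stored rows) are assumptions picked for the example.

```python
def band_bytes(n, kl, ku, nrhs=1, bytes_per_value=8):
    """Bytes for a (kl+ku+1)-by-n band array plus an n-by-nrhs right-hand side,
    both stored in double precision (8 bytes per value by default)."""
    matrix = (kl + ku + 1) * n * bytes_per_value
    rhs = n * nrhs * bytes_per_value
    return matrix + rhs


total = band_bytes(n=400000, kl=80, ku=80)
print(total / 1e6, "MB")
```

With manual distribution, each process would hold only its own share of the columns, so the same count applied to n_j instead of n gives the corresponding per-process figure.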
Alternatively, we allow the user to partition dense matrices and RHSs
among the MPI processes before calling Spike Core. This chapter describes
the local partitioning schemes supported by Intel(R) Adaptive Spike-Based
Solver 1.0. Let pspike and mat be the variables of type spike_param and
matrix_data, respectively, used during calls to Spike_Default, Spike,
Spike_Begin, Spike_Process, etc. The dense banded format is specified by
mat%format = 'D', while the sparse CSR format is specified by
mat%format = 'S'. In the following, pspike%tp is set to 1 to manually
distribute the matrix and RHS to the MPI processes.
If the software were to distribute the data automatically (i.e., tp=0), one
would allocate a space of bwd-by-n for mat%A. Table 5.1 gives details
of the matrix_data structure relevant for pspike%tp=1, that is, when the user
manually distributes the complete matrix into the local mat variable on
each processor. Figure 5.1 illustrates this partitioning scheme. The user
must distribute this bwd-by-n array into pspike%nbprocs arrays of dimension
bwd-by-n_j, where the values of n_j, satisfying
n = \sum_{j=1}^{nbprocs} n_j
are set by the user. The values of n_j are stored globally (i.e., common
to all processors) in the integer array mat%sizeA of dimension nbprocs,
such that mat%sizeA = (n_1, n_2, ..., n_nbprocs). The matrix elements are
stored locally on each processor in mat%A.
The RHSs are distributed by rows in a natural way. Each MPI process
j-1 will have an array of dimension n_j-by-nRHS, for j = 1, 2, ..., nbprocs.
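The bookkeeping involved in a manual distribution can be sketched as follows, in plain Python and independently of the package. The sketch splits a bwd-by-n band array into column blocks of widths n_j and the RHS into matching row blocks. The near-even split used to choose the n_j is only one possible choice, since the values are left to the user, and the helper names are hypothetical.

```python
def partition_sizes(n, nbprocs):
    """Near-even block sizes n_j that sum to n (one possible choice)."""
    base, extra = divmod(n, nbprocs)
    return [base + 1 if j < extra else base for j in range(nbprocs)]


def split_columns(band, sizes):
    """Cut a bwd-by-n band array into bwd-by-n_j column blocks."""
    blocks, start = [], 0
    for nj in sizes:
        blocks.append([row[start:start + nj] for row in band])
        start += nj
    return blocks


def split_rows(rhs, sizes):
    """Cut an n-by-nrhs right-hand side into n_j-by-nrhs row blocks."""
    blocks, start = [], 0
    for nj in sizes:
        blocks.append(rhs[start:start + nj])
        start += nj
    return blocks


n, kl, ku, nbprocs = 8, 2, 2, 3
bwd = kl + ku + 1
# Dummy data: entry (r, c) of the band array is 100*r + c, RHS row i is [i].
band = [[float(100 * r + c) for c in range(n)] for r in range(bwd)]
rhs = [[float(i)] for i in range(n)]

sizeA = partition_sizes(n, nbprocs)   # plays the role of mat%sizeA
local_A = split_columns(band, sizeA)  # block j would go to MPI process j-1
local_f = split_rows(rhs, sizeA)
print(sizeA)
```

In an MPI program each process would build or receive only its own block; the lists local_A and local_f stand in for the per-process arrays.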
are set by the user. Figure 5.2 illustrates this partitioning scheme and Ta-
ble 5.1 gives details of the matrix data structure relevant for pspike%tp=1.
mat% Type Distribution Description
The input field below is for the dense banded case (mat%format='D')
The input fields below are associated with the sparse CSR format (mat%format='S')
Table 5.1: List of parameter fields of the type matrix_data variable mat.
Here all the matrix data are distributed on each processor with pspike%tp=1.
All the other parameter fields must be declared as global (i.e., common to
each MPI process).
The values of nnz_j are stored locally (i.e., on each processor) in the
integer mat%nbsa. The matrix elements are also stored locally on each
processor in the arrays mat%sa, mat%jsa, and mat%isa, with dimensions
mat%nbsa, mat%nbsa, and n_j + 1, respectively.
The RHSs are distributed by rows in a natural way. Each MPI process
j-1 will have an array of dimension n_j-by-nRHS, for j = 1, 2, ..., nbprocs.
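The three CSR arrays can be illustrated with a small sketch in plain Python, independent of the package. It builds the value, column-index and row-pointer lists, named sa, jsa and isa after the components above, for each block of rows of a small matrix. The 1-based indices follow the usual Fortran CSR convention and the column indices are kept global; both are assumptions, since this section does not spell out the indexing the package expects. The function name rows_to_csr and the test matrix are hypothetical.

```python
def rows_to_csr(rows):
    """Build 1-based CSR arrays (sa, jsa, isa) for a block of dense rows.

    sa  : nonzero values, row by row
    jsa : column index of each value (1-based, global column numbering)
    isa : isa[k] is the position in sa where row k starts; the last entry
          is one past the end, so len(isa) == number of rows + 1
    """
    sa, jsa, isa = [], [], [1]
    for row in rows:
        for j, value in enumerate(row):
            if value != 0.0:
                sa.append(value)
                jsa.append(j + 1)
        isa.append(len(sa) + 1)
    return sa, jsa, isa


n = 8
# Hypothetical pentadiagonal matrix: 6 on the diagonal, 1 within distance 2.
A = [[6.0 if i == j else 1.0 if abs(i - j) <= 2 else 0.0 for j in range(n)]
     for i in range(n)]

sizeA = [4, 4]  # n_1, n_2 for two processes
local, start = [], 0
for nj in sizeA:
    local.append(rows_to_csr(A[start:start + nj]))
    start += nj

for sa, jsa, isa in local:
    print(len(sa), len(jsa), len(isa), isa)
```

Each block yields arrays of lengths nnz_j, nnz_j and n_j + 1, which matches the dimensions stated above for mat%sa, mat%jsa and mat%isa.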
Chapter 6
Intel(R) Adaptive Spike-Based Solver Examples
6 1 1 0 0 0 0 0 x1 f1
1 6 1 1 0 0 0 0 x2 f2
1 1 6 1 1 0 0 0 x3 f3
0 1 1 6 1 1 0 0 x4 = f4
0 0 1 1 6 1 1 0 x5 f5
0 0 0 1 1 6 1 1 x6 f6
0 0 0 0 1 1 6 1 x7 f7
0 0 0 0 0 1 1 6 x8 f8
Note that examples 1, 2, 3, and 5 can use 1, 2, or 4 MPI processes.
Example 4 is designed for only 2 MPI processes.
INCLUDE 'spike.fi'
program example1
  use spike_module
  use mpi
  implicit none
  integer :: rank, code, nbprocs, i
  double precision, dimension(:,:), allocatable :: f
  integer :: info
  type(spike_param) :: pspike
  type(matrix_data) :: mat
  !
  call MPI_INIT(code)
  call MPI_COMM_SIZE(MPI_COMM_WORLD, nbprocs, code)
  call MPI_COMM_RANK(MPI_COMM_WORLD, rank, code)
  call MPI_Errhandler_set(MPI_COMM_WORLD, MPI_ERRORS_RETURN, code)
  !
  !!!!!!!!!! INPUT PARAMETER SPIKE
  pspike%nbprocs = nbprocs
  pspike%rank = rank
  call SPIKE_DEFAULT(pspike)
  !! changes from default
  pspike%autoadapt = .false.
  !!!!!!!!!! INPUT PARAMETER MATRIX and RHS
  !! All processors
  mat%format = 'D'
  mat%ASTRU = 'G'
  mat%DIAGDO = 'Y'
  mat%n = 8
  mat%kl = 2
  mat%ku = 2
  !! Global matrix is defined only on processor 0
  if (rank == 0) then   !! only on processor 0 (global matrix)
    allocate(mat%A(1:mat%kl+mat%ku+1, mat%n))
    mat%A(mat%ku+1,:) = 6.0d0
    mat%A(mat%ku-1,:) = 1.0d0
    mat%A(mat%ku,:)   = 1.0d0
    mat%A(mat%ku+2,:) = 1.0d0
    mat%A(mat%ku+3,:) = 1.0d0
    !! RHS
    allocate(f(1:mat%n, 1:1))
    f = 1.0d0
  end if
  !!!!!!!!!! CALLING SPIKE
  call SPIKE(pspike, mat, f, info)
  if (info >= 0) then
    !!!!!! Global Solution
    if (rank == 0) then
      print *, 'Global solution'
      do i = 1, mat%n
        print *, i, f(i,1)
      end do
    end if
  end if
  call MPI_FINALIZE(code)
end program example1
6.2 Example2: Automatic Partitioning and Multiple RHS
In this example, two systems with the same coefficient matrix are solved. The
RHS are (1, 0, 0, 0, 0, 0, 0, 0)^T and (0, 1, 0, 0, 0, 0, 0, 0)^T. This example calls
the subroutine SPIKE.
INCLUDE 'spike.fi'
program example2
  use mpi
  implicit none
  integer :: rank, code, nbprocs, i
  double precision, dimension(:,:), allocatable :: f
  !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
  integer :: info
  type(spike_param) :: pspike
  type(matrix_data) :: mat
  !
  call MPI_INIT(code)
  call MPI_COMM_SIZE(MPI_COMM_WORLD, nbprocs, code)
  call MPI_COMM_RANK(MPI_COMM_WORLD, rank, code)
  call MPI_Errhandler_set(MPI_COMM_WORLD, MPI_ERRORS_RETURN, code)
  !
  !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
  !!!!!!!!!! INPUT PARAMETER SPIKE
  !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
  pspike%nbprocs=nbprocs
  pspike%rank=rank
  call SPIKE_DEFAULT(pspike)
  !! changes from default
  pspike%autoadapt=.false.
  pspike%DFS='L'
  !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
  !!!!!!!!!! INPUT PARAMETER MATRIX and RHS
  !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
  mat%format='D'
  mat%ASTRU='G'
  mat%DIAGDO='Y'
  mat%n=8
  mat%kl=2
  mat%ku=2
  if (rank==0) then
     allocate(mat%A(1:mat%kl+mat%ku+1,mat%n))
     mat%A(mat%kl+1,:)=6.0d0
     mat%A(mat%kl-1,:)=-1.0d0
     mat%A(mat%kl,:)=-1.0d0
     mat%A(mat%kl+2,:)=-1.0d0
     mat%A(mat%kl+3,:)=-1.0d0
     !! RHS
     allocate(f(1:mat%n,1:2))
     f=0.0d0
     f(1,1)=1.0d0
     f(2,2)=1.0d0
  end if
  !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
  !!!!!!!!!! CALLING SPIKE
  !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
  call SPIKE(pspike,mat,f,info)
  if (info>=0) then
     !!!!!! Global Solution
     if (rank==0) then
        print *, 'Global solution'
        do i=1,mat%n
           print *, i, f(i,1), f(i,2)
        end do
     end if
  end if
  call MPI_FINALIZE(code)
end program example2
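The banded array mat%A used above can be pictured with a small Python sketch. It assumes a LAPACK-style layout in which entry A(i,j) of the full matrix sits in row ku+1+i-j of the band array (so the diagonal is row ku+1); this convention, the helper name band_to_dense, and the off-diagonal value -1.0 are assumptions made for illustration, not statements about the package internals.

```python
def band_to_dense(band, n, kl, ku):
    """Expand a (kl+ku+1) x n band array into a dense n x n matrix.

    Assumed convention (0-based here): A[i][j] is stored in
    band[ku + i - j][j] for max(0, j-ku) <= i <= min(n-1, j+kl).
    """
    dense = [[0.0] * n for _ in range(n)]
    for j in range(n):
        for i in range(max(0, j - ku), min(n, j + kl + 1)):
            dense[i][j] = band[ku + i - j][j]
    return dense


n, kl, ku = 8, 2, 2
# Every band row is constant, as in the example: 6 on the diagonal,
# -1 (assumed) on the two upper and two lower off-diagonals.
band = [[-1.0] * n for _ in range(kl + ku + 1)]
band[ku] = [6.0] * n

dense = band_to_dense(band, n, kl, ku)
for row in dense:
    print(" ".join(f"{v:5.1f}" for v in row))
```

Band entries that fall outside the matrix (the corners of the band array) are simply ignored by the expansion.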
We get the following output:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> SPIKE SUMMARY >>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
Spike Strategy RL3
TIME FOR PARTITIONING        4.251956939697266e-03
TIME FOR SPIKE BANDED FACT   7.100105285644531e-04
TIME FOR SPIKE BANDED SOLV   6.840229034423828e-04
program example3
  use mpi
  implicit none
  integer :: rank, code, nbprocs, i
  double precision, dimension(:,:), allocatable :: f
  !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
  integer :: info
  type(spike_param) :: pspike
  type(matrix_data) :: mat, pre
  !
  call MPI_INIT(code)
  call MPI_COMM_SIZE(MPI_COMM_WORLD, nbprocs, code)
  call MPI_COMM_RANK(MPI_COMM_WORLD, rank, code)
  call MPI_Errhandler_set(MPI_COMM_WORLD, MPI_ERRORS_RETURN, code)
  !
  !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
  !!!!!!!!!! INPUT PARAMETER SPIKE
  !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
  pspike%nbprocs=nbprocs
  pspike%rank=rank
  call SPIKE_DEFAULT(pspike)
  !! changes from default
  pspike%autoadapt=.false.
  !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
  !!!!!!!!!! INPUT PARAMETER MATRIX and RHS
  !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
  mat%format='D' ! D for Dense Banded, S for Sparse banded, G for General Sparse
  mat%ASTRU='G'  !!! General structure (non-symmetric)
  mat%DIAGDO='Y'
  mat%n=8
  mat%kl=2
  mat%ku=2
  if (rank==0) then
     allocate(mat%A(1:mat%kl+mat%ku+1,mat%n))
     mat%A(mat%ku+1,:)=6.0d0
     mat%A(mat%ku-1,:)=-1.0d0
     mat%A(mat%ku,:)=-1.0d0
     mat%A(mat%ku+2,:)=-1.0d0
     mat%A(mat%ku+3,:)=-1.0d0
     !! RHS
     allocate(f(1:mat%n,1:1))
  end if
  !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
  !!!!!!!!!! CALLING SPIKE
  !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
  call SPIKE_BEGIN(pspike,mat,pre,info)
  call SPIKE_PREPROCESS(pspike,pre,info)
  if (rank==0) then
     f=0.0d0
     f(1,1)=1.0d0
  end if
  call SPIKE_PROCESS(pspike,mat,pre,f,info)
  if (info>=0) then
     !!!!!! Global Solution 1
     if (rank==0) then
        print *, 'Global solution 1'
        do i=1,mat%n
           print *, i, f(i,1)
        end do
     end if
  end if
  if (rank==0) then
     f=0.0d0
     f(2,1)=1.0d0
  end if
  call SPIKE_PROCESS(pspike,mat,pre,f,info)
  if (info>=0) then
     !!!!!! Global Solution 2
     if (rank==0) then
        print *, 'Global solution 2'
        do i=1,mat%n
           print *, i, f(i,1)
        end do
     end if
  end if
  call SPIKE_END(pspike,mat,pre,info)
  call MPI_FINALIZE(code)
end program example3
 4  4.462053676713363E-002
 5  1.844252629592944E-002
 6  1.191992291468731E-002
 7  5.545560519382510E-003
 8  2.910913905678303E-003
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> SPIKE SUMMARY >>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
Spike Strategy RP3
TIME FOR PARTITIONING        3.890991210937500e-03
TIME FOR SPIKE BANDED FACT   7.140636444091797e-04
TIME FOR SPIKE BANDED SOLV   4.520416259765625e-04
program example4
  use mpi
  implicit none
  integer :: rank, code, nbprocs, i
  double precision, dimension(:,:), allocatable :: f
  !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
  integer :: info
  type(spike_param) :: pspike
  type(matrix_data) :: mat
  !
  call MPI_INIT(code)
  call MPI_COMM_SIZE(MPI_COMM_WORLD, nbprocs, code)
  call MPI_COMM_RANK(MPI_COMM_WORLD, rank, code)
  call MPI_Errhandler_set(MPI_COMM_WORLD, MPI_ERRORS_RETURN, code)
  !
  !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
  !!!!!!!!!! INPUT PARAMETER SPIKE
  !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
  pspike%nbprocs=nbprocs
  pspike%rank=rank
  call SPIKE_DEFAULT(pspike)
  !! changes from default
  pspike%tp=1 !! customized local partitioning of type 1
  pspike%autoadapt=.false.
  !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
  !!!!!!!!!! INPUT PARAMETER MATRIX and RHS
  !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
  mat%format='D' !! dense banded format
  mat%ASTRU='G'
  mat%DIAGDO='Y'
  ! global data
  mat%n=8
  mat%kl=2
  mat%ku=2
  allocate(mat%sizeA(1:2)) !! only 2 partitions are considered
  mat%sizeA(1)=4
  mat%sizeA(2)=4
  ! local data for partition number rank+1
  allocate(mat%A(1:mat%kl+mat%ku+1,mat%sizeA(rank+1)))
  mat%A(mat%ku+1,:)=6.0d0
  mat%A(mat%ku-1,:)=-1.0d0
  mat%A(mat%ku,:)=-1.0d0
  mat%A(mat%ku+2,:)=-1.0d0
  mat%A(mat%ku+3,:)=-1.0d0
  !! RHS (local)
  allocate(f(1:mat%sizeA(rank+1),1:1))
  f=1.0d0
  !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
  !!!!!!!!!! CALLING SPIKE
  !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
  call SPIKE(pspike,mat,f,info)
  if (info>=0) then
     !!!!!! Local Solution
     print *, 'Local solution for partition', rank+1
     do i=1,mat%sizeA(rank+1)
        print *, i, f(i,1)
     end do
  endif
  call MPI_FINALIZE(code)
end program example4
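\[
\begin{pmatrix}
 6 &  0 & -1 &  0 &  0 &  0 &  0 &  0 \\
 0 &  6 &  0 & -1 &  0 &  0 &  0 &  0 \\
-1 &  0 &  6 &  0 & -1 &  0 &  0 &  0 \\
 0 & -1 &  0 &  6 &  0 & -1 &  0 &  0 \\
 0 &  0 & -1 &  0 &  6 &  0 & -1 &  0 \\
 0 &  0 &  0 & -1 &  0 &  6 &  0 & -1 \\
 0 &  0 &  0 &  0 & -1 &  0 &  6 &  0 \\
 0 &  0 &  0 &  0 &  0 & -1 &  0 &  6
\end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \\ x_6 \\ x_7 \\ x_8 \end{pmatrix}
=
\begin{pmatrix} 1 \\ 1 \\ 1 \\ 1 \\ 1 \\ 1 \\ 1 \\ 1 \end{pmatrix}
\]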
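As a rough illustration of the CSR storage used for such a system, the Python sketch below builds the three CSR arrays for an 8 x 8 matrix with 6 on the diagonal and non-zeros on the second off-diagonals only. The value -1 for those off-diagonals is an assumption (the non-zero count does not depend on it), and the names sa, jsa, and isa merely mirror the component names of the example; the 1-based indexing is also assumed.

```python
def dense_to_csr(a):
    """Return (sa, jsa, isa): values, column indices, row pointers (1-based)."""
    sa, jsa, isa = [], [], [1]
    for row in a:
        for j, v in enumerate(row):
            if v != 0.0:
                sa.append(v)
                jsa.append(j + 1)      # 1-based column index
        isa.append(len(sa) + 1)        # start of the next row in sa/jsa
    return sa, jsa, isa


n = 8
A = [[6.0 if i == j else (-1.0 if abs(i - j) == 2 else 0.0) for j in range(n)]
     for i in range(n)]
sa, jsa, isa = dense_to_csr(A)

print("nbsa =", len(sa))
print("isa  =", isa)
print("jsa  =", jsa)
```

The sketch yields 20 stored entries, which agrees with mat%nbsa=20 in the sparse example, and n+1 row pointers, which agrees with the allocation of mat%isa.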
INCLUDE 'spike.fi'
program example5
  use mpi
  implicit none
  integer :: rank, code, nbprocs, i
  double precision, dimension(:,:), allocatable :: f
  !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
  integer :: info
  type(spike_param) :: pspike
  type(matrix_data) :: mat
  !
  call MPI_INIT(code)
  call MPI_COMM_SIZE(MPI_COMM_WORLD, nbprocs, code)
  call MPI_COMM_RANK(MPI_COMM_WORLD, rank, code)
  call MPI_Errhandler_set(MPI_COMM_WORLD, MPI_ERRORS_RETURN, code)
  !
  !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
  !!!!!!!!!! INPUT PARAMETER SPIKE
  !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
  pspike%nbprocs=nbprocs
  pspike%rank=rank
  call SPIKE_DEFAULT(pspike)
  !! changes from default
  pspike%autoadapt=.false.
  pspike%RSS='F'
  pspike%DFS='L'
  !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
  !!!!!!!!!! INPUT PARAMETER MATRIX and RHS
  !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
  mat%format='S' !! CSR
  mat%ASTRU='G'
  mat%DIAGDO='Y'
  mat%n=8
  if (rank==0) then
     mat%nbsa=20 !! number of non-zero elements in CSR format
     allocate(mat%sa(1:mat%nbsa))  ! array for values
     allocate(mat%jsa(1:mat%nbsa)) ! array for column indexes
     allocate(mat%isa(1:mat%n+1))  ! array for row CSR indexes
  !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
  !!!!!!!!!! CALLING SPIKE
  !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
  call SPIKE(pspike,mat,f,info)
  if (info>=0) then
     !!!!!! Global Solution
     if (rank==0) then
        print *, 'Global solution'
        do i=1,mat%n
           print *, i, f(i,1)
        end do
     end if
  endif
  call MPI_FINALIZE(code)
end program example5
3 0.241379310344828
4 0.241379310344828
5 0.241379310344828
6 0.241379310344828
7 0.206896551724138
8 0.206896551724138
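\[
\begin{pmatrix}
6 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \\
0 & 6 & 0 & 0 & 0 & 0 & 1 & 0 \\
0 & 0 & 6 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 6 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 6 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 6 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & 0 & 6 & 0 \\
1 & 0 & 0 & 0 & 0 & 0 & 0 & 6
\end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \\ x_6 \\ x_7 \\ x_8 \end{pmatrix}
=
\begin{pmatrix} 1 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \end{pmatrix}
\]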
This linear system is solved iteratively with the following dense, banded
preconditioner:
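\[
M =
\begin{pmatrix}
6 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
1 & 6 & 1 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 6 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 6 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 6 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 6 & 1 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 & 6 & 1 \\
0 & 0 & 0 & 0 & 0 & 0 & 1 & 6
\end{pmatrix}
\]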
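To give a feel for how a banded preconditioner drives an iterative solve, here is a minimal Python sketch. It uses a plain preconditioned Richardson iteration, x <- x + M^{-1}(b - A x), which is a deliberately simpler technique than the outer iteration performed by the package (the error table mentions BiCGStab). The matrix entries are taken with the signs as printed (+1), and all function names are hypothetical.

```python
def solve_tridiagonal(lower, diag, upper, rhs):
    """Thomas algorithm for a constant-coefficient tridiagonal system."""
    n = len(rhs)
    c = [0.0] * n                      # modified upper diagonal
    d = [0.0] * n                      # modified right-hand side
    c[0] = upper / diag
    d[0] = rhs[0] / diag
    for i in range(1, n):
        denom = diag - lower * c[i - 1]
        c[i] = upper / denom
        d[i] = (rhs[i] - lower * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def apply_a(x):
    """y = A x for A = 6*I + anti-diagonal of ones (n even)."""
    n = len(x)
    return [6.0 * x[i] + x[n - 1 - i] for i in range(n)]


def richardson(b, tol=1e-12, max_iter=200):
    """Preconditioned Richardson iteration with M = tridiag(1, 6, 1)."""
    x = [0.0] * len(b)
    for it in range(max_iter):
        ax = apply_a(x)
        r = [bi - ai for bi, ai in zip(b, ax)]
        if max(abs(v) for v in r) < tol:
            return x, it
        z = solve_tridiagonal(1.0, 6.0, 1.0, r)   # z = M^{-1} r
        x = [xi + zi for xi, zi in zip(x, z)]
    return x, max_iter


b = [1.0] + [0.0] * 7
x, iterations = richardson(b)
print(iterations, x[0], x[7])
```

Under these assumptions the exact solution has only two non-zero entries, x1 = 6/35 and x8 = -1/35; the other components converge to zero, so an iterative solver reports them as values near the tolerance.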
INCLUDE 'spike.fi'
program example6
  use mpi
  implicit none
  integer :: rank, code, nbprocs, i
  double precision, dimension(:,:), allocatable :: f
  !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
  integer :: info
  type(spike_param) :: pspike
  type(matrix_data) :: mat, pre
  !
  call MPI_INIT(code)
  call MPI_COMM_SIZE(MPI_COMM_WORLD, nbprocs, code)
  call MPI_COMM_RANK(MPI_COMM_WORLD, rank, code)
  call MPI_Errhandler_set(MPI_COMM_WORLD, MPI_ERRORS_RETURN, code)
  !
  !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
  !!!!!!!!!! INPUT PARAMETER SPIKE
  !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
  pspike%nbprocs=nbprocs
  pspike%rank=rank
  call SPIKE_DEFAULT(pspike)
  !! changes from default
  pspike%autoadapt=.false.
  pspike%DFS='L'
  pspike%BPS=1 ! a banded preconditioner is provided by the user
  !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
  !!!!!!!!!! INPUT PARAMETER MATRIX and RHS
  !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
  mat%format='S' !! CSR
  mat%ASTRU='G'
  mat%DIAGDO='Y'
  mat%n=8
  if (rank==0) then
     mat%nbsa=16
     allocate(mat%sa(1:mat%nbsa))
     allocate(mat%jsa(1:mat%nbsa))
     allocate(mat%isa(1:mat%n+1))
  !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
  !!!!!!!!!! INPUT PARAMETER PRECONDITIONER
  !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
  !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
  !!!!!!!!!! CALLING SPIKE
  !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
  call SPIKE_BEGIN(pspike,mat,pre,info)
  if (info>=0) then
     !!!!!! Global Solution
     if (rank==0) then
        print *, 'Global solution'
        do i=1,mat%n
           print *, i, f(i,1)
        end do
     end if
  endif
  call MPI_FINALIZE(code)
end program example6
 2  6.395468064419293E-009
 3  3.662243791124311E-009
 4  2.602255083262622E-010
 5  1.804453966738207E-009
 6  9.713082754665346E-009
 7  6.919736181372207E-010
 8  2.857142191573501E-002
The input matrix elements and properties must be defined in the file
The following is a sample input file for a banded matrix (n = 48,000) with 30
on the main diagonal, 4 on the first upper and lower off-diagonals, 0.1 on the
other off-diagonals, and upper and lower bandwidths of 80 (total bandwidth
is 161):
48000   !! n, size of the matrix
80      !! kl, Lower band
80      !! ku, Upper band
30.0d0  !! diagonal element
4.0d0   !! first lower off-diagonal element
4.0d0   !! first upper off-diagonal element
0.1d0   !! OTHERS off-diagonal element
1       !! s, number of RHS (THE values of the RHS are generated by the code)
Y       !! DIAGDO ? Y (Yes), N (No), I (Investigate)
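Whether Y is a sensible DIAGDO setting for such a Toeplitz matrix can be checked by hand. The Python sketch below compares the diagonal entry with the sum of the absolute off-diagonal entries of an interior row (one first lower, one first upper, and kl-1 plus ku-1 "other" entries). The function names are hypothetical, and strict row dominance is assumed as the criterion; the package's own test may differ.

```python
def offdiag_row_sum(first, others, kl, ku):
    """Sum of |off-diagonal| entries in an interior row of the Toeplitz band."""
    return 2 * abs(first) + (kl - 1 + ku - 1) * abs(others)


def is_diagonally_dominant(diag, first, others, kl, ku):
    """Strict row diagonal dominance, judged on an interior (full) row."""
    return abs(diag) > offdiag_row_sum(first, others, kl, ku)


kl = ku = 80
row_sum = offdiag_row_sum(4.0, 0.1, kl, ku)
print("off-diagonal row sum:", row_sum)
print("dominant with diagonal 30.0:", is_diagonally_dominant(30.0, 4.0, 0.1, kl, ku))
```

For the values of the sample file the off-diagonal sum is about 23.8, which is below the diagonal value 30.0.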
Some of the components of the derived type spike_param variable can be
changed from their default values by modifying the input file
Finally, one can run the example_toeplitz program with the command
SPIKE INFO
!! NBPROCESSORS ? 4
!! NBPARTITIONS ? 4
!! SPIKE ADAPT ? F
!! ALGORITHM ? R
!! FACTORIZATION ? L
!! TYPE OF SOLVER ? 3
!! ACCURACY OUT. ? 1.000000000000000e-07
!! NB ITMAX OUT. ? 50
!! ACCURACY IN. ? 1.000000000000000e-05
!! NB ITMAX IN. ? 30
!! NEW ZERO PIVOT ? 1.000000000000000e-09
!! BOOST ? 1.000000000000000e-10
!! Orign. Partition. ? 0
!! Size first last partition ? 12000
!! Size partition middle ? 12000
!! Free memory ? T
!! Compute Residual ? T
!! ADD. MEMORY NEEDED (Mb) 1.056396789550781e+02
MATRIX INFO
!! MATRIX FORMAT ? D
!! MATRIX STRUCT. ? G
!! Diag. Dominant ? Y
!! SIZE MATRIX ? 48000
DENSE BANDED MATRIX
!! Lower band ? 80
!! Upper band ? 80
RHS INFO
!! Number of RHS ? 1
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> SPIKE SUMMARY >>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
Spike Strategy RL3
TIME FOR PARTITIONING        1.257519721984863e-01
TIME FOR SPIKE BANDED FACT   9.365081787109375e-01
TIME FOR SPIKE BANDED SOLV   2.131659984588623e-01
and it contains the following fields
csrfile  !! generic name of sparse format
I        !! DIAGDO ? Y (Yes), N (No), I (Investigate)
.false.  !! sparse2dense banded (true or false)
The sparse system matrix is stored in four files whose generic name is
defined by the first line of the input file above (here the name is
csrfile). The four files, located in the same directory as the input file,
are: csrfile.sa for the matrix elements, csrfile.jsa for the column
indices, csrfile.isa for the start-of-row indices, and csrfile.sf for the
right-hand-side elements. The number of non-zero elements is indicated at
the beginning of the first two files, while the beginning of the last two
indicates the number of rows. In addition, the first line of csrfile.sf
also contains the number of right-hand sides (if this number is greater
than one, the elements should be stored in multiple columns).
As in the toeplitz example, some of the components of the derived
type spike_param variable can be changed from their default values by
modifying the input file
In Intel® Adaptive Spike-Based Solver 1.0, only the (F,L) strategy is
allowed for solving sparse banded systems. However, the last field of the
file matrix_sparse.in controls a utility routine that lets the user
transform the CSR input matrix into a dense banded matrix. The routine then
sets the option mat%format=D, which enables all the other strategies
available for dense banded systems.
Finally, one can run the example_toeplitz program with the command
SPIKE INFO
!! NBPROCESSORS ? 4
!! NBPARTITIONS ? 4
!! SPIKE ADAPT ? F
!! ALGORITHM ? F
!! FACTORIZATION ? L
!! TYPE OF SOLVER ? 3
!! ACCURACY OUT. ? 1.000000000000000e-07
!! NB ITMAX OUT. ? 50
!! ACCURACY IN. ? 1.000000000000000e-05
!! NB ITMAX IN. ? 30
!! NEW ZERO PIVOT ? 1.000000000000000e-09
!! BOOST ? 1.000000000000000e-10
!! Orign. Partition. ? 0
!! Size first last partition ? 240
!! Size partition middle ? 240
!! Free memory ? T
!! Compute Residual ? T
!! ADD. MEMORY NEEDED (Mb) 1.129760742187500e-01
MATRIX INFO
!! MATRIX FORMAT ? S
!! MATRIX STRUCT. ? G
!! Diag. Dominant ? N
!! Degree of Diag. Dominant ? 4.515916502082181e-01
!! Degree of Sparsity (within the band) ? 1.939978695008020e-01
!! SIZE MATRIX ? 960
SPARSE BANDED MATRIX
!! Lower band ? 43
!! Upper band ? 43
!! # of non-zero el. ? 15844
Pardiso Reorder 2.156949043273926e-01
Pardiso Factor 3.281998634338379e-02
TIME FACTLU (< to copy UL+FACT LU, if any) 1 2.488820552825928e-01
TIME FOR COMPUTING THE SPIKES 1 1.597404479980469e-05
RHS INFO
!! Number of RHS ? 1
20 1.402865850536105e-04
21 9.767563925207885e-05
22 1.549147170819683e-04
23 2.623354861726824e-05
24 8.823943147011544e-06
TIME postprocess MATMUL 3.493118286132812e-02
TIME postprocess SOLVE 9.059906005859375e-06
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> SPIKE SUMMARY >>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
Spike Strategy FL3
TIME FOR PARTITIONING        1.117944717407227e-03
TIME FOR SPIKE BANDED FACT   2.490720748901367e-01
TIME FOR SPIKE BANDED SOLV   1.574320793151855e-01
<SPIKE dir>/include/spike.h
matrix_data_c_interface mat, pre;
/* Data structure associated with the original matrix mat
   and pre (if separate calling is used) */
/* Inside main function */
//
code = MPI_Init(&argc, &argv);
code = MPI_Comm_size(MPI_COMM_WORLD, &nbprocs);
code = MPI_Comm_rank(MPI_COMM_WORLD, &rank);
//
spike_default(&pspike);
/* Default values for pspike */
.
.
/* CALL FOR SPIKE with DEFINITION of INPUT PARAMETERS */
.
.
/* End of main function */
<SPIKE dir>/examples/examples_c
If necessary, modify both the makefile and makefile.target to use the desired
compiler and MPI implementation. The examples use the Intel compilers
and MPI library by default. Moreover, the makefile should (i) link the
libspike.a library, (ii) link the BLAS and LAPACK libraries, and (iii)
specify the path to the spike.h header file.
Chapter 7
Reference guide
7.1 Intel® Adaptive Spike-Based Solver 1.0 directory structure
High-level Directory Structure
The table below shows the high-level structure after installation. All
directories are under the package's main directory, for example
/opt/intel/spike/1.0.
Directory Comment
doc Documentation
package's main directory, for example, /opt/intel/spike/1.0.
  libmkl_intel_thread.so          Parallel drivers library supporting Intel compiler
  libmkl_lapack.a                 Dummy library. Contains references to Intel MKL libraries
  libmkl.so                       Dummy library. Contains references to Intel MKL libraries
  libmkl_solver.a                 Dummy library. Contains references to Intel MKL libraries
  libmkl_solver_lp64.a            Sparse Solver, Interval Solver, and GMP routines library
                                  supporting LP64 interface
  libspike.a                      Spike Core routines
  libspike_adapt.a                Spike Adapt routines
  libspike_adapt_de.so            Spike Adapt routines, performance model specific
  libspike_adapt_grid_f.a         Spike Adapt routines, grid specific
  libspike_mpi_comm.a             Default MPI wrapper copied from libspike_mpi_comm_intelmpi.a.
                                  Users can build their own. See Appendix C for details.
  libspike_mpi_comm_intelmpi.a    MPI wrapper supporting Intel MPI Library for Linux
  libspike_mpi_comm_mpich1.a      MPI wrapper supporting MPICH 1
  libspike_mpi_comm_mpich2.a      MPI wrapper supporting MPICH 2
  libspike_mpi_comm_openmpi.a     MPI wrapper supporting Open MPI
lib/em64t                         Intel® 64 static libraries
  libguide.a                      Intel® Legacy OpenMP run-time library for static linking
  libguide.so                     Intel® Legacy OpenMP run-time library for dynamic linking
  libmkl_core.a                   Kernel library for Intel® 64 architecture
  libmkl_core.so                  Library dispatcher for dynamic load of processor-specific
                                  kernel library
  libmkl_intel_lp64.a             LP64 interface library for Intel compiler
  libmkl_intel_lp64.so            LP64 interface library for Intel compiler
  libmkl_intel_thread.a           Parallel drivers library supporting Intel compiler
  libmkl_intel_thread.so          Parallel drivers library supporting Intel compiler
  libmkl_lapack.a                 Dummy library. Contains references to Intel MKL libraries
  libmkl.so                       Dummy library. Contains references to Intel MKL libraries
  libmkl_solver.a                 Dummy library. Contains references to Intel MKL libraries
  libmkl_solver_lp64.a            Sparse Solver, Interval Solver, and GMP routines library
                                  supporting LP64 interface
  libspike.a                      Spike Core routines
  libspike_adapt.a                Spike Adapt routines
  libspike_adapt_de.so            Spike Adapt routines, performance model specific
  libspike_adapt_grid_f.a         Spike Adapt routines, grid specific
  libspike_mpi_comm.a             Default MPI wrapper copied from libspike_mpi_comm_intelmpi.a.
                                  Users can build their own. See Appendix C for details.
  libspike_mpi_comm_intelmpi.a    MPI wrapper supporting Intel MPI Library for Linux
  libspike_mpi_comm_mpich1.a      MPI wrapper supporting MPICH 1
  libspike_mpi_comm_mpich2.a      MPI wrapper supporting MPICH 2
  libspike_mpi_comm_openmpi.a     MPI wrapper supporting Open MPI
spike_adapt/64                    Itanium® 2 Spike Adapt data files
  de                              Subdirectory, calibration data files
spike_adapt/em64t                 Intel® 64 Spike Adapt data files
  de                              Subdirectory, calibration data files
tools/environment                 Initialization shell scripts
  spikevars64.csh                 Itanium® 2 platforms; C shell
  spikevars64.sh                  Itanium® 2 platforms; Bourne shell
  spikevarsem64t.csh              Intel® 64 platforms; C shell
  spikevarsem64t.sh               Intel® 64 platforms; Bourne shell
Table 7.1: Detailed package directory structure
7.2 Intel® Adaptive Spike-Based Solver and ScaLAPACK
This section is addressed to ScaLAPACK users who would like to experiment
with Intel® Adaptive Spike-Based Solver while making only minor changes to
their code for solving dense banded linear systems (data in double precision).
We describe a practical way to insert Intel® Adaptive Spike-Based Solver
calling sequences in place of the ScaLAPACK ones.
The ScaLAPACK calling sequences concerned by this migration procedure are:
For non-diagonally dominant systems
PDDBTRF, PDGBTRS: separated calling sequences (factorization and solve)
As described in the documentation, our software package can also handle
single or separated calling sequences. In contrast to ScaLAPACK, the
diagonally dominant property does not require different calling sequences;
it is declared in the matrix_data data structure through the parameter
mat%diagdo.
where we assume that the user is familiar with all the above parameters
(as described in the ScaLAPACK user guide [3]). This calling sequence can
be replaced by the following one:
pspike%tp=1 ! data local distribution of type 1 is compatible with ScaLAPACK
! if the user wants to turn off spike adapt by pspike%autoadapt=.false.
! the user can select here his own Spike Core strategy (RSS, DFS, OIS)
allocate(mat%sizeA(1:nbprocs))
mat%sizeA(1:nbprocs-1)=DESCA(4) ! ScaLAPACK variable
                                ! size of the local partition
mat%sizeA(nbprocs)=n-(nbprocs-1)*mat%sizeA(1) ! size of the last partition
In the case of separated calling sequences, the setup of the above
parameters is identical. Also, the BLACS commands introduced in ScaLAPACK
are unnecessary, as our package is independent of the BLACS library.
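The partition sizes set above follow a simple rule: every partition except the last takes the ScaLAPACK block size, and the last one takes whatever rows remain. A small Python sketch of that arithmetic follows; the function name partition_sizes and the argument name nb (standing in for DESCA(4)) are hypothetical.

```python
def partition_sizes(n, nb, nbprocs):
    """Sizes of the local partitions: nb for the first nbprocs-1, remainder last."""
    sizes = [nb] * (nbprocs - 1)
    last = n - (nbprocs - 1) * nb
    if last <= 0:
        raise ValueError("block size too large for this matrix and process count")
    sizes.append(last)
    return sizes


print(partition_sizes(48000, 12000, 4))   # equal partitions
print(partition_sizes(10, 3, 4))          # shorter last partition
```

The sizes always add up to n, which is a convenient sanity check before the solver is called.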
Syntax
CALL Spike_Default(pspike)
Description
The routine assigns default values to those input and inout components of
the type spike_param variable pspike that have defaults. Other components
remain unchanged.
Input Parameters
pspike   Intel® Adaptive Spike-Based Solver data structure of
         type spike_param, described in Section 2.2.
Output Parameters
pspike   Intel® Adaptive Spike-Based Solver data structure
         described in Section 2.2. On exit, the components of
         pspike tabulated in Table 2.1 are assigned the
         default values specified there.
7.4 Spike
The Spike solver driver solves the complete system in one call.
Syntax
CALL Spike(pspike,mat,f,info)
Description
The routine solves the system specified by the matrix contained in mat with
the right-hand side(s) contained in f.
Input Parameters
pspike   Intel® Adaptive Spike-Based Solver data structure of
         type spike_param, described in Section 2.2.
mat      Matrix data structure of type matrix_data, described
         in Section 2.3 and Chapter 5.
f        Double precision array containing the right-hand
         side(s). Depending on the value of pspike%tp, f may
         be global on rank 0 or locally distributed on each
         processor.
Output Parameters
pspike   Intel® Adaptive Spike-Based Solver data structure
         described in Section 2.2.
f        The computed solution of the system.
info     Returns the error code. If info=0, the execution is
         successful. If info≠0, the package encountered a
         problem and has stopped unexpectedly; the meaning
         of the error codes is described in detail in
         Section 7.11.
Syntax
CALL Spike_Begin(pspike,mat,pre,info)
Description
The routine partitions the matrix and allocates a work table. Moreover,
Spike Adapt may be invoked in this routine.
Input Parameters
pspike   Intel® Adaptive Spike-Based Solver data structure of
         type spike_param, described in Section 2.2. On entry,
         if pspike%autoadapt is .true., Spike Adapt will be
         invoked to select a solver strategy.
mat      Matrix data structure of type matrix_data, described
         in Section 2.3 and Chapter 5.
pre      Preconditioner data structure of type matrix_data.
         The use of a banded preconditioner is described in
         Chapter 4.
Output Parameters
pspike   Intel® Adaptive Spike-Based Solver data structure
         described in Section 2.2. On exit, if Spike Adapt was
         invoked, pspike%DFS, pspike%RSS and pspike%OIS will
         be updated.
mat      Matrix data structure described in Section 2.3. If the
         matrix is defined with global data as input, on exit
         mat will contain the local partitioning of the matrix
         on each processor (the memory of the global matrix
         on rank 0 is deallocated if pspike%memfree is set to
         .true.).
pre      Contents set by Spike_Begin. It contains the local
         partitioning of the preconditioner (it may just
         be a copy of the matrix) that will be used in
         Spike_Preprocess.
info     Returns the error code. If info=0, the execution is
         successful. If info≠0, the package encountered a
         problem and has stopped unexpectedly; the meaning
         of the error codes is described in detail in
         Section 7.11.
Syntax
CALL Spike_Preprocess(pspike,pre,info)
Description
The routine factorizes the preconditioner pre using the strategy specified
in pspike. Note that pre can be either an explicit preconditioner supplied
by the user or simply a copy of the original system (made automatically by
the software package).
Input Parameters
pspike   Intel® Adaptive Spike-Based Solver data structure
         described in Section 2.2.
pre      The output of a previous Spike_Begin call.
Output Parameters
pspike   Intel® Adaptive Spike-Based Solver data structure
         described in Section 2.2.
pre      Contents modified. It contains the factorization of the
         preconditioner, ready to be used in Spike_Process
         any number of times.
info     Returns the error code. If info=0, the execution is
         successful. If info≠0, the package encountered a
         problem and has stopped unexpectedly; the meaning
         of the error codes is described in detail in
         Section 7.11.
Syntax
CALL Spike_Process(pspike,mat,pre,f,info)
Description
The routine solves the reduced system and then retrieves the overall solution.
In this version of Intel® Adaptive Spike-Based Solver, the solver includes
outer iterations. The preconditioner is defined by pre, and the original
matrix is defined by mat. The routine Spike_Process can be called repeatedly
in applications that involve iterations with changing right-hand sides f but
with the same original matrix of coefficients.
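The benefit of repeating only the solve step can be sketched in Python with a small dense LU factorization standing in for the package: the factorization is computed once and then reused for each new right-hand side. This is an analogy only; lu_factor and lu_solve are hypothetical helpers, the 8 x 8 test matrix (6 on the diagonal, -1 on the two upper and two lower off-diagonals) is an assumption, and nothing here reflects how the package itself factorizes.

```python
def lu_factor(a):
    """Doolittle LU without pivoting (adequate for a diagonally dominant matrix)."""
    n = len(a)
    lu = [row[:] for row in a]
    for k in range(n):
        for i in range(k + 1, n):
            lu[i][k] /= lu[k][k]
            for j in range(k + 1, n):
                lu[i][j] -= lu[i][k] * lu[k][j]
    return lu


def lu_solve(lu, b):
    """Forward and backward substitution with a stored LU factorization."""
    n = len(lu)
    y = b[:]
    for i in range(n):
        for j in range(i):
            y[i] -= lu[i][j] * y[j]
    x = y[:]
    for i in reversed(range(n)):
        for j in range(i + 1, n):
            x[i] -= lu[i][j] * x[j]
        x[i] /= lu[i][i]
    return x


n = 8
A = [[6.0 if i == j else (-1.0 if abs(i - j) <= 2 else 0.0) for j in range(n)]
     for i in range(n)]

lu = lu_factor(A)              # done once (the factorization step)
solutions = []
for col in (0, 1):             # two successive right-hand sides e1 and e2
    b = [0.0] * n
    b[col] = 1.0
    solutions.append(lu_solve(lu, b))   # repeated solve step

print(solutions[1][3])
```

Under these assumptions the fourth component of the solution for the second right-hand side comes out at about 0.0446.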
Input Parameters
pspike   Intel® Adaptive Spike-Based Solver data structure
         described in Section 2.2.
mat      Matrix data structure. On entry, the matrix
         data should have been processed by a previous
         Spike_Begin call, so that the data have been
         distributed to all processors.
pre      Set up by a previous call to Spike_Preprocess.
f        On entry, f stores the right-hand side. Depending on
         the value of pspike%tp, f may be global on rank 0 or
         locally distributed on each processor.
Output Parameters
pspike   Intel® Adaptive Spike-Based Solver data structure
         described in Section 2.2.
f        On exit, f stores the solution of the system. Depending
         on the value of pspike%tp, f may be global on rank
         0 or locally distributed on each processor.
info     Returns the error code. If info=0, the execution is
         successful. If info≠0, the package encountered a
         problem and has stopped unexpectedly; the meaning
         of the error codes is described in detail in
         Section 7.11.
Syntax
CALL Spike_End(pspike,mat,pre,info)
Description
The routine clears the memory space, deallocating all local partitioning for
mat and pre.
Input Parameters
pspike   Intel® Adaptive Spike-Based Solver data structure
         described in Section 2.2.
mat      Matrix data structure described in Section 2.3.
pre      Preconditioner data structure.
Output Parameters
pspike   Intel® Adaptive Spike-Based Solver data structure
         described in Section 2.2.
mat      Matrix data structure described in Section 2.3. On
         exit, several components of mat are deallocated.
pre      On exit, pre is deallocated.
info     Returns the error code. If info=0, the execution is
         successful. If info≠0, the package encountered a
         problem and has stopped unexpectedly; the meaning
         of the error codes is described in detail in
         Section 7.11.
output components. These are listed in Table 7.2.
memory   double (out)    global   Total amount of memory (in Mb) needed by
                                  Spike Core
failed   logical (out)   global   Returns .true. if Spike Core fails to reach
                                  the accuracy specified in the eps_out
                                  component
Below are the output component fields for timing information if the timing
component is set to .true.
Table 7.2: List of output components for the derived type spike_param. A
variable of this type can be local on each partition or global (i.e. common
to all partitions).
Adaptive Spike-Based Solver 1.0, this is exclusively used for the matrix
representing the linear system. In the future, the user will be able to
store explicitly, using this type, a separate matrix used as a preconditioner
for the linear system. The components and meaning of this type are given in
Chapter 5.
7.11 info details
Errors and warnings encountered during a run of Intel® Adaptive Spike-Based
Solver are stored in an integer variable, info. All MPI, LAPACK,
and PARDISO errors are fatal; in other words, execution of the program is
terminated if such an error is encountered. Other possible sources of warnings
and errors are Spike Core and Spike Adapt. If the output info parameter
is not zero, either an error (info < 0) or a warning (info > 0) was encountered.
The possible return values for the info parameter are given in Table 7.3.
If info < 0, the user can determine whether Spike Core, Spike Adapt,
MPI, LAPACK, or PARDISO is responsible for the unexpected termination.
The corresponding error code is stored in the component pspike%error_code.
Refer to Table 7.4 for the possible values of pspike%error_code if a
fatal error is encountered in Spike Core (info=-1), and to Table 7.5
if a fatal error is encountered in Spike Adapt (info=-2). When info
equals -3, -4, or -5, the error code is also stored in pspike%error_code, and
the user should consult the MPI, LAPACK, or PARDISO documentation,
respectively.
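The decision logic described above can be summarized in a short Python sketch. The mapping from the magnitude of info to a component follows the order in which the components are listed in this section; using the absolute value of info (so that warnings and errors from the same component share an entry) is an assumption, as are the function and dictionary names.

```python
# Component responsible for a non-zero info, keyed by |info| (assumed mapping).
SOURCES = {1: "Spike Core", 2: "Spike Adapt", 3: "MPI", 4: "LAPACK", 5: "PARDISO"}


def describe_info(info, error_code=0):
    """Turn an (info, error_code) pair into a short human-readable message."""
    if info == 0:
        return "success"
    kind = "error" if info < 0 else "warning"
    source = SOURCES.get(abs(info), "unknown component")
    return f"{kind} in {source} (error_code={error_code})"


print(describe_info(0))
print(describe_info(-1, -309))
print(describe_info(-5, -3))
```

A caller would typically stop on an error message and merely log a warning, then look up error_code in the table or documentation of the component named in the message.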
%error_code (info = 1)   Description
   0    Successful exit (default value)
-200    Memory allocation error
-201    rho = 0, BiCGStab(out) failed
-202    omega = 0, BiCGStab(out) failed
-303    Cannot select Spike Adapt if you want to use your own preconditioner (%BPS=1)
-304    The format of the preconditioner is incorrect; it should be pre%format=D or S
-305    The preconditioner should be banded
-306    The preconditioner should be the same size as the matrix
-307    If a preconditioner is used (option %BPS=1), iterative methods must be used (%OIS=0)
-308    The preconditioner cannot be used with DFS=P
-309    Either the upper or the lower bandwidth is too small for the size of the partitions
-310    The number of processors has to be even for RSS=A or P
-313    The size of the matrix mat%n must be > 1
-314    mat%kl and mat%ku must be 1
-315    The format of the matrix is incorrect; it should be mat%format=D or S
-320    Spike Adapt cannot be selected if only one processor is used
-399    Wrong value for %tp
-400    Combinations (DFS, RSS) not supported by Version 1.0
-401    DFS=L or P are the only possible options if one processor is used
-402    DFS=A cannot be used here; see Table 2.4
-405    RSS=R cannot be used here; see Table 2.4
-407    Only tp=0 can handle a one-processor run
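Some of the conditions in this table can be checked before the solver is called. The sketch below, in Python for illustration, mirrors three of them (codes -313, -310, and -320). The function precheck and its arguments are hypothetical, and the order in which the library itself performs these checks is not stated in this guide.

```python
def precheck(n, nprocs, rss, use_spike_adapt=False):
    """Return 0 if the inputs pass these checks, else the matching table code.

    n               -- matrix size (mat%n)
    nprocs          -- number of MPI processes
    rss             -- reduced system strategy letter, e.g. "A", "P", "R"
    use_spike_adapt -- hypothetical flag: True if Spike Adapt is selected
    """
    if n <= 1:
        return -313  # the size of the matrix must be > 1
    if rss in ("A", "P") and nprocs % 2 != 0:
        return -310  # even number of processors required for RSS=A or P
    if use_spike_adapt and nprocs == 1:
        return -320  # Spike Adapt cannot be selected with one processor
    return 0


print(precheck(100, 3, "P"))
```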
%error_code (info = 2)   Classification   Description
   1    Information   Spike Core strategy selected by grid lookup
   2    Information   Spike Core strategy selected by performance models
   3    Warning       Spike Core strategy selected arbitrarily
-310    Error         pspike%tp=2 requires an even number of MPI processes
-312    Error         pspike%tp=2 requires RSS=A
-313    Error         pspike%tp=1 cannot be used when RSS=A
-402    Error         Memory allocation failed during model evaluation
-403    Error         SPIKE_ADAPT_DATA environment variable not set
-404    Error         Error reading directory specified by SPIKE_ADAPT_DATA environment variable
-405    Error         Performance models not found in directory specified by SPIKE_ADAPT_DATA environment variable
-406    Error         Could not open performance models
-407    Error         Could not read performance models
Table 7.5: Descriptions of the Spike Adapt return codes for %error_code.
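Codes -403 and -404 depend only on the environment, so a job script can test for them before launching a run. The following Python sketch is one way to do that. It assumes the variable is spelled SPIKE_ADAPT_DATA (with underscores), and check_spike_adapt_data is a hypothetical name. It does not look for the performance model files themselves, because their names are not given here.

```python
import os


def check_spike_adapt_data(environ=os.environ):
    """Return 0 if the data directory looks usable, else the matching code."""
    path = environ.get("SPIKE_ADAPT_DATA")  # assumed spelling of the variable
    if not path:
        return -403  # environment variable not set
    if not os.path.isdir(path) or not os.access(path, os.R_OK):
        return -404  # directory cannot be read
    return 0


print(check_spike_adapt_data())
```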
Bibliography
[2] Michael W. Berry and Ahmed Sameh. Multiprocessor schemes for solving
block tridiagonal linear systems. The International Journal of
Supercomputer Applications, 1(3):37-57, 1988.
[8] Eric Polizzi and Ahmed H. Sameh. A parallel hybrid banded system
solver: the SPIKE algorithm. Parallel Comput., 32(2):177-194, 2006.
[9] Eric Polizzi and Ahmed H. Sameh. SPIKE: A parallel environment for
solving banded linear systems. Computers & Fluids, 36(1):113-120,
2007.
[11] O. Schenk and K. Gärtner. Solving unsymmetric sparse systems of
linear equations with PARDISO. Journal of Future Generation Computer
Systems, 20(3):475-487, 2004.
Appendix A
Mathematical Description of
Key Strategies
sections. Throughout, we present the solution process of $Az = r$, in which $z$
is the action $M^{-1} r$.
A.1 Az = r via TU
The matrix, RHS, and solution are distributed among the MPI processes as
shown in Figure A.1.
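\[
A = \begin{pmatrix}
A_1 & B_1 &     &     \\
C_2 & A_2 & B_2 &     \\
    & C_3 & A_3 & B_3 \\
    &     & C_4 & A_4
\end{pmatrix},
\qquad
z = \begin{pmatrix} z_1 \\ z_2 \\ z_3 \\ z_4 \end{pmatrix},
\qquad
r = \begin{pmatrix} r_1 \\ r_2 \\ r_3 \\ r_4 \end{pmatrix}
\]
Block row $j$ of $A$, together with $z_j$ and $r_j$, is held by process ($j$), $j = 1, 2, 3, 4$.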
$L_j U_j = A_j$ for $j = 1, 2, 3$
$U_j L_j = A_j$ for $j = 2, 3, 4$
\[
S = \begin{pmatrix}
I   & V_1 &     &     \\
W_2 & I   & V_2 &     \\
    & W_3 & I   & V_3 \\
    &     & W_4 & I
\end{pmatrix}
\]
Block row $j$ of $S$ is held by process ($j$), $j = 1, 2, 3, 4$.
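\[
L \, U \begin{pmatrix} \ast \\ V_j^{(b)} \end{pmatrix} = \begin{pmatrix} 0 \\ B_j \end{pmatrix}
\]
Figure A.3: The bottom of the $V_j$ spike can be computed using only the
bottom $m \times m$ blocks of $L$ and $U$. Similarly, the top of the $W_j$ spike may be
obtained if one performs the UL-factorization.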
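5. Solve
\[
A_j z_j = r_j
- \begin{pmatrix} 0 \\ \vdots \\ 0 \\ B_j \end{pmatrix} z_{j+1}^{(t)}
- \begin{pmatrix} C_j \\ 0 \\ \vdots \\ 0 \end{pmatrix} z_{j-1}^{(b)}
\]
using the LU or UL factorization of $A_j$ ($j = 1, 2, 3, 4$; $C_1 = 0$; and
$B_4 = 0$).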
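The observation in Figure A.3 can be checked numerically on a small example. The Python sketch below uses exact rational arithmetic and an LU factorization without pivoting. The matrix, the coupling block, and the helper names lu_nopivot and solve_lu are illustrative assumptions, not SpikePACK routines. It solves for each column of the full spike and compares its last m entries with the result obtained from the bottom m x m blocks of L and U alone.

```python
from fractions import Fraction


def lu_nopivot(a):
    """Doolittle LU without pivoting; returns (L, U) with unit-diagonal L."""
    n = len(a)
    u = [[Fraction(x) for x in row] for row in a]
    l = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    for k in range(n):
        for i in range(k + 1, n):
            l[i][k] = u[i][k] / u[k][k]
            for j in range(k, n):
                u[i][j] -= l[i][k] * u[k][j]
    return l, u


def solve_lu(l, u, b):
    """Solve L U x = b for a single right-hand-side vector b."""
    n = len(b)
    y = [Fraction(0)] * n
    for i in range(n):  # forward substitution with L
        y[i] = Fraction(b[i]) - sum(l[i][j] * y[j] for j in range(i))
    x = [Fraction(0)] * n
    for i in reversed(range(n)):  # back substitution with U
        x[i] = (y[i] - sum(u[i][j] * x[j] for j in range(i + 1, n))) / u[i][i]
    return x


# Illustrative diagonally dominant block A_j and coupling block B_j (m = 2).
A = [[4, 1, 0, 0], [1, 4, 1, 0], [0, 1, 4, 1], [0, 0, 1, 4]]
B = [[1, 0], [2, 1]]
n, m = len(A), len(B)
L, U = lu_nopivot(A)
Lb = [row[n - m:] for row in L[n - m:]]  # bottom m x m block of L
Ub = [row[n - m:] for row in U[n - m:]]  # bottom m x m block of U

matches = []
for c in range(m):
    col = [B[i][c] for i in range(m)]
    full = solve_lu(L, U, [0] * (n - m) + col)  # whole column of V_j
    bottom = solve_lu(Lb, Ub, col)              # bottom m entries only
    matches.append(bottom == full[n - m:])

print(matches)
```

The shortcut works because the right-hand side is zero above its last m rows, so forward substitution leaves those rows zero and back substitution for the last m unknowns never touches the rest of U.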
A.2 Az = r via FL
The matrix, RHS, and solution are distributed among the MPI processes as
shown in Figure A.1.
The FL scheme consists of the following steps:
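4. Solve
\[
A_j z_j = r_j
- \begin{pmatrix} 0 \\ \vdots \\ 0 \\ B_j \end{pmatrix} z_{j+1}^{(t)}
- \begin{pmatrix} C_j \\ 0 \\ \vdots \\ 0 \end{pmatrix} z_{j-1}^{(b)}
\]
using the LU factorization of $A_j$ ($j = 1, 2, 3, 4$; $C_1 = 0$; and $B_4 = 0$).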
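The final retrieval solve can be illustrated with a small tridiagonal system split into three partitions with coupling blocks of size m = 1. In this Python sketch the neighbouring tips are taken from a reference solution of the assembled system purely for demonstration. In the actual scheme they come from the reduced system. The helper names solve and retrieve and all numerical values are assumptions made for the example.

```python
from fractions import Fraction


def solve(a, b):
    """Gauss-Jordan elimination without pivoting for a small dense system."""
    n = len(b)
    aug = [[Fraction(v) for v in row] + [Fraction(b[i])] for i, row in enumerate(a)]
    for k in range(n):
        pivot = aug[k][k]
        aug[k] = [v / pivot for v in aug[k]]
        for i in range(n):
            if i != k:
                f = aug[i][k]
                aug[i] = [vi - f * vk for vi, vk in zip(aug[i], aug[k])]
    return [row[n] for row in aug]


n, p = 6, 2  # global size and rows per partition (three partitions)
A = [[4 if i == j else 1 if abs(i - j) == 1 else 0 for j in range(n)]
     for i in range(n)]
r = [1, 2, 3, 4, 5, 6]
z_ref = solve(A, r)  # reference solution of the assembled system


def retrieve(j):
    """Recover partition j (0-based) from the tips of its neighbours."""
    lo, hi = j * p, (j + 1) * p
    Aj = [row[lo:hi] for row in A[lo:hi]]
    rhs = [Fraction(v) for v in r[lo:hi]]
    if lo > 0:   # C_j acts on the bottom tip of the previous partition
        rhs[0] -= A[lo][lo - 1] * z_ref[lo - 1]
    if hi < n:   # B_j acts on the top tip of the next partition
        rhs[-1] -= A[hi - 1][hi] * z_ref[hi]
    return solve(Aj, rhs)


z_blocks = [retrieve(j) for j in range(n // p)]
print(sum(z_blocks, []) == z_ref)
```

Each call to retrieve uses only local data plus two scalars from its neighbours, which is why this step runs independently on every process.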
2. Solve for $V_j$:
\[
L_j U_j V_j = \begin{pmatrix} 0 \\ \vdots \\ 0 \\ B_j \end{pmatrix}
\quad \text{for } j = 1, 2, 3
\]
3. Solve for $W_j$:
\[
L_j U_j W_j = \begin{pmatrix} C_j \\ 0 \\ \vdots \\ 0 \end{pmatrix}
\quad \text{for } j = 2, 3, 4
\]
9. Retrieve $z_1$ and $z_2$:
\[
z_1 = h_1 - V_1 z_2^{(t)}, \qquad z_2 = h_2 - W_2 z_1^{(b)}
\]
10. Retrieve $z_j$ ($j = 1, 2, 3, 4$):
\[
z_j = r_j - V_j z_{j+1}^{(t)} - W_j z_{j-1}^{(b)} \qquad (V_4 = 0 \text{ and } W_1 = 0)
\]
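To see how the spikes and the retrieval formulas fit together, the following Python sketch runs the whole process for two partitions with m = 1. It is a minimal illustration, not the library's implementation. The vectors g1 and g2 stand for the modified right-hand sides obtained by solving with each diagonal block, which is an assumption about what the retrieval formulas above take as input, and all numerical values are made up.

```python
from fractions import Fraction


def solve(a, b):
    """Gauss-Jordan elimination without pivoting for a small dense system."""
    n = len(b)
    aug = [[Fraction(v) for v in row] + [Fraction(b[i])] for i, row in enumerate(a)]
    for k in range(n):
        pivot = aug[k][k]
        aug[k] = [v / pivot for v in aug[k]]
        for i in range(n):
            if i != k:
                f = aug[i][k]
                aug[i] = [vi - f * vk for vi, vk in zip(aug[i], aug[k])]
    return [row[n] for row in aug]


# Two partitions of a 4 x 4 tridiagonal system, coupling blocks of size m = 1.
A1 = [[4, 1], [1, 4]]
A2 = [[4, 1], [1, 4]]
B1, C2 = 1, 1
r1, r2 = [1, 2], [3, 4]

V1 = solve(A1, [0, B1])  # spike V_1: A_1 V_1 = (0, B_1)^T
W2 = solve(A2, [C2, 0])  # spike W_2: A_2 W_2 = (C_2, 0)^T
g1, g2 = solve(A1, r1), solve(A2, r2)  # modified right-hand sides

# Reduced system for the tips z1b (bottom of z_1) and z2t (top of z_2):
#   z1b + V1b * z2t = g1b
#   W2t * z1b + z2t = g2t
z1b, z2t = solve([[1, V1[-1]], [W2[0], 1]], [g1[-1], g2[0]])

z1 = [g - v * z2t for g, v in zip(g1, V1)]  # z_1 = g_1 - V_1 z_2^(t)
z2 = [g - w * z1b for g, w in zip(g2, W2)]  # z_2 = g_2 - W_2 z_1^(b)

# Compare with a direct solve of the assembled 4 x 4 system.
A = [[4, 1, 0, 0], [1, 4, 1, 0], [0, 1, 4, 1], [0, 0, 1, 4]]
z_direct = solve(A, r1 + r2)
print(z1 + z2 == z_direct)
```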
A.4 Az = r via TA
The matrix, RHS, and solution are distributed among the MPI processes as
shown in Figure A.4.
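\[
A = \begin{pmatrix}
A_1 & B_1 &     \\
C_2 & A_2 & B_2 \\
    & C_3 & A_3
\end{pmatrix},
\qquad
z = \begin{pmatrix} z_1 \\ z_2 \\ z_3 \end{pmatrix},
\qquad
r = \begin{pmatrix} r_1 \\ r_2 \\ r_3 \end{pmatrix}
\]
Block row 1 is held by process (1), block row 2 by processes (2, 4), and block row 3 by process (3).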
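5. Solve the truncated reduced system (block diagonal) via a direct scheme
where each block has the following form:
\[
\begin{pmatrix} I_m & V_j^{(b)} \\ W_{j+1}^{(t)} & I_m \end{pmatrix}
\begin{pmatrix} z_j^{(b)} \\ z_{j+1}^{(t)} \end{pmatrix}
=
\begin{pmatrix} g_j^{(b)} \\ g_{j+1}^{(t)} \end{pmatrix}
\qquad (j = 1, 2)
\]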
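6. Solve
\[
A_j z_j = r_j
- \begin{pmatrix} 0 \\ \vdots \\ 0 \\ B_j \end{pmatrix} z_{j+1}^{(t)}
- \begin{pmatrix} C_j \\ 0 \\ \vdots \\ 0 \end{pmatrix} z_{j-1}^{(b)}
\]
using the LU or UL factorization of $A_j$ ($j = 1, 2, 3$; $C_1 = 0$; and
$B_3 = 0$).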
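Each block of the truncated reduced system in step 5 is a 2m x 2m system that can be solved independently. The guide says only that a direct scheme is used. The Python sketch below shows one possible choice, block elimination through the Schur complement of the identity block. The helper names and the sample values for the spike tips are hypothetical.

```python
from fractions import Fraction as F


def matvec(a, x):
    """Product of a small dense matrix and a vector."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]


def matmul(a, b):
    """Product of two small dense matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def solve(a, b):
    """Gauss-Jordan elimination without pivoting for a small dense system."""
    n = len(b)
    aug = [[F(v) for v in row] + [F(b[i])] for i, row in enumerate(a)]
    for k in range(n):
        pivot = aug[k][k]
        aug[k] = [v / pivot for v in aug[k]]
        for i in range(n):
            if i != k:
                f = aug[i][k]
                aug[i] = [vi - f * vk for vi, vk in zip(aug[i], aug[k])]
    return [row[n] for row in aug]


def solve_reduced_block(Vb, Wt, gb, gt):
    """Solve [[I, Vb], [Wt, I]] (zb, zt) = (gb, gt) by eliminating zb."""
    m = len(gb)
    WV = matmul(Wt, Vb)
    schur = [[F(int(i == j)) - WV[i][j] for j in range(m)] for i in range(m)]
    zt = solve(schur, [g - w for g, w in zip(gt, matvec(Wt, gb))])
    zb = [g - v for g, v in zip(gb, matvec(Vb, zt))]
    return zb, zt


# Made-up spike tips for m = 2.
Vb = [[F(1, 4), F(0)], [F(1, 8), F(1, 4)]]
Wt = [[F(1, 4), F(1, 8)], [F(0), F(1, 4)]]
gb, gt = [F(1), F(2)], [F(3), F(4)]
zb, zt = solve_reduced_block(Vb, Wt, gb, gt)

# Residual check of both block rows.
row1 = [z + v for z, v in zip(zb, matvec(Vb, zt))]
row2 = [w + z for w, z in zip(matvec(Wt, zb), zt)]
print(row1 == gb and row2 == gt)
```

Because the blocks for different j do not share unknowns, they can be solved concurrently, which is the point of truncating the reduced system.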
A.5 Az = r via EA
The matrix, RHS, and solution are distributed among the MPI processes as
shown in Figure A.4.
The EA scheme consists of the following steps:
with a truncated preconditioner
\[
M_r = \begin{pmatrix}
I_m       & V_1^{(b)} &           &           \\
W_2^{(t)} & I_m       &           &           \\
          &           & I_m       & V_2^{(b)} \\
          &           & W_3^{(t)} & I_m
\end{pmatrix}
\]
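6. Solve
\[
A_j z_j = r_j
- \begin{pmatrix} 0 \\ \vdots \\ 0 \\ B_j \end{pmatrix} z_{j+1}^{(t)}
- \begin{pmatrix} C_j \\ 0 \\ \vdots \\ 0 \end{pmatrix} z_{j-1}^{(b)}
\]
using the LU or UL factorization of $A_j$ ($j = 1, 2, 3$; $C_1 = 0$; and
$B_3 = 0$).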
Appendix B
Figure B.1: This schematic illustrates how Spike Adapt might select an
optimal Spike Core strategy using grid lookup. The horizontal and vertical
axes represent two of the relevant matrix characteristics (e.g., matrix size
and bandwidth). If the grid encloses this matrix, an optimal Spike Core
strategy, represented by the different colors, is selected based on proximity.
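The idea behind the schematic can be expressed as a nearest-neighbour lookup. The Python sketch below is a toy model only. The grid points, the strategy assigned to each point, the function name grid_lookup, and the use of a logarithmic distance are all assumptions made for illustration and are not taken from Spike Adapt.

```python
import math

# Hypothetical grid: (matrix size, bandwidth) -> strategy label.
# The labels reuse the scheme names of Appendix A; the assignments are made up.
GRID = {
    (10_000, 10): "TU",
    (10_000, 100): "FL",
    (1_000_000, 10): "TA",
    (1_000_000, 100): "EA",
}


def grid_lookup(n, bandwidth, grid=GRID):
    """Return the strategy of the nearest grid point, or None if outside the grid."""
    sizes = [point[0] for point in grid]
    bands = [point[1] for point in grid]
    inside = (min(sizes) <= n <= max(sizes)
              and min(bands) <= bandwidth <= max(bands))
    if not inside:
        return None  # the grid does not enclose this matrix

    def distance(point):
        # Log scale so that both axes contribute comparably (a design choice).
        return math.hypot(math.log10(point[0] / n),
                          math.log10(point[1] / bandwidth))

    return grid[min(grid, key=distance)]


print(grid_lookup(20_000, 12))
```

When the lookup returns None, some other selection method is needed, which corresponds to the remaining classifications in Table 7.5 (selection by performance models, or an arbitrary choice).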
Appendix C
SpikePACK uses the Message Passing Interface (MPI) for parallel computation.
Though MPI is a standard API, different implementations are generally
not compatible because of header inconsistencies. Also, MPI libraries
built with different Fortran compilers are usually not compatible. Therefore,
SpikePACK does not call the MPI library directly, in order to avoid
becoming dependent on a particular MPI implementation. Instead, it calls
wrapper functions contained in a separate library. Pre-built versions of this
library are provided for four common MPI implementations:
These libraries were built with the Intel compilers and are in <spikepack
directory>/lib/<arch>, where <arch> is either 64 for the IA-64 architecture
or em64t for the Intel® 64 architecture. SpikePACK also includes a default
library, libspike_mpi_comm.a, which is identical to libspike_mpi_comm_intelmpi.a.
It is used by the example build scripts. Users can build their own default
library if they prefer a different compiler or MPI implementation. To do
this, build a new libspike_mpi_comm.a using the source code for
the MPI wrappers shipped with SpikePACK, as follows:
3. Compile the wrappers using the desired MPI compiler driver for Fortran.
To use the Intel® MPI Library and the Intel® Fortran Compiler for
Linux, for example, compile the MPI wrappers as follows:
mpiifort -O3 -c spike_mpi_comm.f90