
Applied Numerical Mathematics 1 (1985) 489-492

North-Holland

AN EFFICIENT IMPLEMENTATION FOR SSOR AND INCOMPLETE FACTORIZATION PRECONDITIONINGS *

Randolph E. BANK
Department of Mathematics, University of California, San Diego, La Jolla, CA 92093, U.S.A.

Craig C. DOUGLAS
Department of Computer Science, Duke University, Durham, NC 27706, U.S.A.

We investigate methods for efficiently implementing a class of incomplete factorization preconditioners which includes Symmetric Gauss-Seidel [9], SSOR [9], generalized SSOR [1], Dupont-Kendall-Rachford [4], ICCG(0) [7], and MICCG(0) [6]. Our techniques can be extended to similar methods for nonsymmetric matrices.

1. Symmetric matrices

We consider the solution of the linear system

Ax = b,  (1)

where A is an N x N symmetric, positive definite matrix and A = D - L - L^T, where D is diagonal and L is strictly lower triangular. Such linear systems are often solved by iterative methods, for example, Symmetric Gauss-Seidel [9], SSOR [9], generalized SSOR [1], Dupont-Kendall-Rachford [4], ICCG(0) [7], and MICCG(0) [6].
A single step of a basic (unaccelerated) iterative method, starting from an initial guess x̂, can be written as

Solve Bδ = r = b - Ax̂,  (2a)
Set x̄ = x̂ + δ.  (2b)

For the iterative methods cited above, B is symmetric, positive definite, and can be written as

B = (D̄ - L) D̄^{-1} (D̄ - L^T),  (3)

where D̄ is diagonal. Since A and B are symmetric and positive definite, the underlying iterative scheme (2) can be accelerated by standard techniques such as Chebyshev, conjugate gradients, and conjugate residuals.
Let F = D̄ - D be a diagonal matrix and let M denote the computational cost (in floating point multiplies) of forming the matrix-vector product Ax. The obvious approach to implementing the basic iterative step (2a) apparently requires 2M + O(N) multiplies. Our goal is to reduce this to M + O(N). See Eisenstat [5] for a different solution to the same problem. We note that the case F = 0 (unaccelerated Symmetric Gauss-Seidel) is of particular interest, since we can reduce the number of multiplies per iteration to M + N.

* This research was supported in part by ONR Grant N00014-82-K-0197, ONR Grant N00014-82-K-0184, FSU-ONR Grant F1.N00014-80-C-0076, and NSF MCS-8106181. This work was done while the second author was at the Yale University Department of Computer Science.

0168-9274/85/$3.30 © 1985, Elsevier Science Publishers B.V. (North-Holland)


The basic idea for accomplishing this reduction in cost is embodied in the following procedure for solving

Bz = α(r + Lv),  (4)

where r and v are input vectors and α is a scalar. This is solved using the process

D̄w = αr + L(αv + w) = q,  (5a)
(D̄ - L^T)z = q,  (5b)
r - Az = r - q + Fz + Lz.  (5c)

Despite the apparently implicit nature of (5a), it can be solved easily for w using forward substitution. In fact, w itself need not be saved in any form, since q is the important vector computed in this equation. Computing q and z, given r and v, requires M + 3N multiplies (multiplies and divides). Computing r - Az requires N multiplications if we represent the vector implicitly in terms of r - q + Fz and z.
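The process (5) can be checked on a small dense example; the following NumPy sketch is our illustration (the matrix A, the diagonal D̄, and the dense solves are arbitrary choices, not the authors' code):

```python
import numpy as np

# Small dense sketch of process (5): A = D - L - L^T, B = (D̄-L)D̄^{-1}(D̄-L^T),
# F = D̄ - D. All concrete values below are illustrative.
n = 6
A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)  # 1-D Laplacian, SPD
D = np.diag(np.diag(A))
Lmat = -np.tril(A, -1)                                  # A = D - L - L^T
Dbar = 1.5 * D                                          # some positive diagonal
F = Dbar - D
B = (Dbar - Lmat) @ np.linalg.inv(Dbar) @ (Dbar - Lmat.T)

rng = np.random.default_rng(0)
r = rng.standard_normal(n)
v = rng.standard_normal(n)
alpha = 0.7

# (5a): D̄w = αr + L(αv + w) = q, solved by forward substitution
# (row i of L touches only components 0..i-1 of w).
w = np.zeros(n)
for i in range(n):
    w[i] = (alpha * r[i] + Lmat[i, :i] @ (alpha * v[:i] + w[:i])) / Dbar[i, i]
q = Dbar @ w
# (5b): (D̄ - L^T)z = q, a back substitution.
z = np.linalg.solve(Dbar - Lmat.T, q)

# z indeed solves Bz = α(r + Lv) ...
assert np.allclose(B @ z, alpha * (r + Lmat @ v))
# ... and (5c) yields r - Az with only the one multiply by L:
assert np.allclose(r - A @ z, (r - q + F @ z) + Lmat @ z)
```

Note that only the multiply by L appears; the multiply by L^T has been absorbed into the back substitution (5b), which is the source of the savings.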
The basic algorithm, using fixed acceleration parameters τ_i, 1 ≤ i ≤ m, is given by

Algorithm 1. Fixed acceleration parameters - Preliminary.

(1) r_0 = b - Ax_0,
(2) For i = 1 to m
    (a) Bz_i = τ_i r_{i-1},
    (b) x_i = x_{i-1} + z_i,
    (c) r_i = r_{i-1} - Az_i.

A straightforward implementation of Algorithm 1 requires 2M + 2N multiplies per iteration. Using the process in (5), we can reformulate this algorithm as

Algorithm 2. Fixed acceleration parameters - Final.

(1) r_0 = b - Dx_0 + L^T x_0,
(2) For i = 1 to m
    (a) D̄w_i = τ_i r_{i-1} + L(τ_i x_{i-1} + w_i) = q_i,
    (b) (D̄ - L^T)z_i = q_i,
    (c) r_i = r_{i-1} - q_i + Fz_i,
    (d) x_i = x_{i-1} + z_i,
(3) r_m = r_m + Lx_m = b - Ax_m.

The computational cost of the inner loop of Algorithm 2 is at most M + 4N multiplies. If we do not accelerate at all (τ_i = 1), the cost is reduced to at most M + 2N multiplies. Algorithm 2 requires one additional N-vector for storing q_i and z_i (which may share the same space). The vector r_i can be stored over the original right hand side b.
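A dense sketch of Algorithm 2 (function and variable names are ours, not the paper's), specialized to τ_i = 1 and D̄ = D, i.e. F = 0, which is unaccelerated Symmetric Gauss-Seidel:

```python
import numpy as np

# Our dense sketch of Algorithm 2; the implicit residual r satisfies
# b - Ax = r + Lx throughout, so only one multiply by L occurs per sweep.
def algorithm2(A, Dbar, taus, b, x0):
    n = len(b)
    D = np.diag(np.diag(A))
    L = -np.tril(A, -1)                     # A = D - L - L^T
    F = Dbar - D
    x = x0.copy()
    r = b - D @ x + L.T @ x                 # step (1)
    for tau in taus:
        w = np.zeros(n)                     # (a): forward substitution
        for i in range(n):
            w[i] = (tau * r[i] + L[i, :i] @ (tau * x[:i] + w[:i])) / Dbar[i, i]
        q = Dbar @ w
        z = np.linalg.solve(Dbar - L.T, q)  # (b): back substitution
        r = r - q + F @ z                   # (c)
        x = x + z                           # (d)
    return x, r + L @ x                     # step (3): true residual

n = 8
A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)  # 1-D Laplacian, SPD
b = np.ones(n)
x, r = algorithm2(A, np.diag(np.diag(A)), [1.0] * 300, b, np.zeros(n))
assert np.allclose(r, b - A @ x)            # implicit residual stays exact
assert np.linalg.norm(r) < 1e-8             # SGS converges on this SPD system
```

The returned residual agrees with b - Ax to rounding, confirming that the implicitly maintained residual is exact and not merely an approximation.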
This technique is not limited to fixed acceleration parameters. For instance, the preconditioned conjugate gradient algorithm is given by

Algorithm 3. PCG - Preliminary.

(1) r_0 = b - Ax_0,
(2) p_0 = 0,
(3) For i = 1 to m
    (a) Bz_i = r_{i-1},
    (b) γ_i = z_i^T r_{i-1}; β_i = γ_i/γ_{i-1}; β_1 = 0,
    (c) p_i = z_i + β_i p_{i-1},
    (d) α_i = γ_i / p_i^T A p_i,
    (e) x_i = x_{i-1} + α_i p_i,
    (f) r_i = r_{i-1} - α_i A p_i.

In order to reduce the number of matrix multiplies to one, we implicitly represent Ap_i as well as the residual. Thus, we set Ap_i = v_i - Lp_i. Then we can reformulate this algorithm as

Algorithm 4. PCG - Final.

(1) r_0 = b - Dx_0 + L^T x_0,
(2) p_0 = v_0 = 0,
(3) For i = 1 to m
    (a) D̄w_i = r_{i-1} + L(x_{i-1} + w_i) = q_i,
    (b) γ_i = w_i^T q_i; β_i = γ_i/γ_{i-1}; β_1 = 0,
    (c) (D̄ - L^T)z_i = q_i,
    (d) v_i = q_i + β_i v_{i-1} - Fz_i,
    (e) p_i = z_i + β_i p_{i-1},
    (f) α_i = γ_i / (p_i^T(2v_i - Dp_i)),
    (g) x_i = x_{i-1} + α_i p_i,
    (h) r_i = r_{i-1} - α_i v_i,
(4) r_m = r_m + Lx_m = b - Ax_m.
To implement Algorithm 4, we need three temporary vectors of length N, one each for v_i, p_i, and q_i. The vector z_i can share the space of q_i. As before, r_i can be stored over the right hand side b. The inner loop of Algorithm 4 requires at most M + 8N multiplies per iteration.
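A dense sketch of Algorithm 4 (names are ours, not the paper's); the choice D̄ = 1.5 D is an arbitrary generalized-SSOR example, so that F = D̄ - D is nonzero and every term of the recurrences is exercised:

```python
import numpy as np

# Our dense sketch of Algorithm 4. The residual is kept implicitly as
# b - Ax = r + Lx, and Ap is kept implicitly as Ap = v - Lp.
def algorithm4(A, Dbar, b, x0, m):
    n = len(b)
    D = np.diag(np.diag(A))
    L = -np.tril(A, -1)                     # A = D - L - L^T
    F = Dbar - D
    x = x0.copy()
    r = b - D @ x + L.T @ x                 # (1): implicit residual
    p = np.zeros(n)
    v = np.zeros(n)
    gamma_old = 1.0
    for i in range(1, m + 1):
        w = np.zeros(n)                     # (a): forward substitution
        for k in range(n):
            w[k] = (r[k] + L[k, :k] @ (x[:k] + w[:k])) / Dbar[k, k]
        q = Dbar @ w
        gamma = w @ q                       # (b): γ = w^T q = z^T(b - Ax)
        beta = 0.0 if i == 1 else gamma / gamma_old
        z = np.linalg.solve(Dbar - L.T, q)  # (c): back substitution
        v = q + beta * v - F @ z            # (d): v = Ap + Lp, kept implicitly
        p = z + beta * p                    # (e)
        alpha = gamma / (p @ (2.0 * v - D @ p))  # (f): p^T A p = p^T(2v - Dp)
        x = x + alpha * p                   # (g)
        r = r - alpha * v                   # (h)
        gamma_old = gamma
    return x, r + L @ x                     # (4): true residual

n = 8
A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)  # 1-D Laplacian, SPD
b = np.ones(n)
x, r = algorithm4(A, 1.5 * np.diag(np.diag(A)), b, np.zeros(n), n)
assert np.allclose(r, b - A @ x)
assert np.linalg.norm(r) < 1e-6             # CG terminates in at most n steps
```

Note in line (f) that p^T A p is formed from v and the diagonal D alone, using the identity p^T L p = p^T L^T p; this is the step that fails in the nonsymmetric case discussed below.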

2. Nonsymmetric matrices

Assume A is an N x N nonsymmetric stiffness matrix and A = D - L - U, where D is diagonal, L is strictly lower triangular, and U is strictly upper triangular. Then the matrix B corresponding to the incomplete LDU factorization class of smoothers is

B = (T̄ - L) S̄^{-1} (D̄ - U),  (6)

where D̄, S̄, and T̄ are diagonal.
The algorithms of the last section can be extended to handle B of the form (6). Given the linear system (4), we replace (5) by

T̄w = αr + L(αv + w),  (7a)
q = S̄w,  (7b)
(D̄ - U)z = q,  (7c)
r - Az = r - q + Fz + Lz.  (7d)
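The process (7) can be checked the same way as (5); in this sketch (ours, not the authors') the matrix A and the diagonals D̄, S̄, T̄ are arbitrary illustrative choices:

```python
import numpy as np

# Our sketch of the nonsymmetric process (7), with F = D̄ - D.
n = 6
rng = np.random.default_rng(1)
A = 4.0 * np.eye(n) + 0.5 * rng.standard_normal((n, n))
D = np.diag(np.diag(A))
L = -np.tril(A, -1)                         # A = D - L - U
U = -np.triu(A, 1)
Dbar = np.diag(rng.uniform(3.0, 5.0, n))    # arbitrary positive diagonals
Sbar = np.diag(rng.uniform(3.0, 5.0, n))
Tbar = np.diag(rng.uniform(3.0, 5.0, n))
F = Dbar - D
B = (Tbar - L) @ np.linalg.inv(Sbar) @ (Dbar - U)

r = rng.standard_normal(n)
v = rng.standard_normal(n)
alpha = 0.3

w = np.zeros(n)                             # (7a): forward substitution
for i in range(n):
    w[i] = (alpha * r[i] + L[i, :i] @ (alpha * v[:i] + w[:i])) / Tbar[i, i]
q = Sbar @ w                                # (7b)
z = np.linalg.solve(Dbar - U, q)            # (7c): back substitution

assert np.allclose(B @ z, alpha * (r + L @ v))        # z solves Bz = α(r + Lv)
assert np.allclose(r - A @ z, r - q + F @ z + L @ z)  # (7d)
```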

The generalization of Algorithm 2 requires M + O(N) multiplies. Unfortunately, some adaptive schemes, like Orthomin(1) [8] or Orthodir(1) [10], appear to require 1.5M + O(N) multiplies (assuming the costs of multiplying by L and U are the same). This is because the identity

x^T L x = x^T L^T x,

which is implicitly used in Algorithm 4, line 3(f), does not necessarily hold when U replaces L^T. Thus, it appears we need an extra half matrix multiply to form the equivalent of Ap for purposes of computing inner products.

Table 1. Cost (in multiplies) of each algorithm

Algorithm/Form      Preliminary   Final    Final with F = 0
Unaccelerated       2M + N        M + 2N   M + N
Accelerated/Fixed   2M + 2N       M + 4N   M + 3N
PCG                 2M + 5N       M + 8N   M + 7N
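The failure of this identity in the nonsymmetric case is easy to demonstrate numerically (our sketch, with arbitrary random data):

```python
import numpy as np

# x^T L x = x^T L^T x holds for any matrix (a scalar equals its own
# transpose), but x^T L x = x^T U x fails for a generic strictly upper
# triangular U, so line 3(f) of Algorithm 4 cannot be reused verbatim.
rng = np.random.default_rng(2)
n = 5
x = rng.standard_normal(n)
L = np.tril(rng.standard_normal((n, n)), -1)
U = np.triu(rng.standard_normal((n, n)), 1)

assert np.isclose(x @ L @ x, x @ L.T @ x)    # always true
assert not np.isclose(x @ L @ x, x @ U @ x)  # fails once U replaces L^T
```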

3. Final remarks

Table 1 contains a summary of the cost of each algorithm. The column in Table 1 corresponding to the special case F = 0 is important since it corresponds to the Symmetric Gauss-Seidel preconditioner. In practice, variants of the Gauss-Seidel iteration are among the most popular smoothing iterations used in multigrid codes [2,3]. Since the cost of smoothing is usually a major expense in a multigrid code, reducing the number of matrix multiplies can significantly reduce the overall computational cost.
Although the cost of the adaptive acceleration in Algorithm 4 is somewhat higher than that of the fixed acceleration in Algorithm 2 in terms of multiplications, the actual cost may not be that much greater. In particular, if A is stored in a general sparse format, then the effective cost of the floating point operations in a matrix multiply is normally somewhat higher than that for inner products or scalar-vector multiplies, because the operations corresponding to a matrix multiplication are usually done in N short loops, and accessing each nonzero of A involves some sort of indirect addressing.

References

[1] O. Axelsson, A generalized SSOR method, BIT 13 (1972) 443-467.
[2] R.E. Bank, PLTMG User's Guide, Research Report, University of California at San Diego, San Diego, 1981.
[3] C. Douglas, A multigrid optimal order solver for elliptic boundary value problems: the finite difference case, in: R. Vichnevetsky and R.S. Stepleman, Eds., Advances in Computer Methods for Partial Differential Equations - V (IMACS, New Brunswick, NJ, 1984) 369-374.
[4] T. Dupont, R.P. Kendall and H.H. Rachford, Jr., An approximate factorization procedure for solving self-adjoint elliptic difference equations, SIAM J. Numer. Anal. 5 (1968) 559-573.
[5] S.C. Eisenstat, Efficient implementation of a class of preconditioned conjugate gradient methods, SIAM J. Sci. Statist. Comput. 2 (1981).
[6] I. Gustafsson, A class of first order factorization methods, BIT 18 (1978) 142-156.
[7] J.A. Meijerink and H.A. van der Vorst, An iterative solution method for linear systems of which the coefficient matrix is a symmetric M-matrix, Math. Comp. 31 (1977) 148-162.
[8] P.K.W. Vinsome, Orthomin, an iterative method for solving sparse sets of simultaneous linear equations, in: Proceedings of the Fourth Symposium on Reservoir Simulation (Society of Petroleum Engineers of AIME, 1976) 149-159.
[9] D.M. Young, Iterative Solution of Large Linear Systems (Academic Press, New York, 1971).
[10] D.M. Young and K.C. Jea, Generalized conjugate gradient acceleration of nonsymmetrizable iterative methods, Linear Algebra Appl. 34 (1980) 159-194.
