Scientific Libraries Reference

Manual, Volume 1
004–2081–002
Copyright © 1989, 1994, 1995, 1997–1999 Silicon Graphics, Inc. All Rights Reserved. This manual or parts thereof may not be
reproduced in any form unless permitted by contract or by written permission of Silicon Graphics, Inc.

LIMITED AND RESTRICTED RIGHTS LEGEND

Use, duplication, or disclosure by the Government is subject to restrictions as set forth in the Rights in Data clause at FAR
52.227-14 and/or in similar or successor clauses in the FAR, or in the DOD, DOE or NASA FAR Supplements. Unpublished rights
reserved under the Copyright Laws of the United States. Contractor/manufacturer is Silicon Graphics, Inc., 1600 Amphitheatre
Pkwy., Mountain View, CA 94043-1351.

Autotasking, CF77, Cray, Cray Ada, CraySoft, Cray Y-MP, Cray-1, CRInform, CRI/TurboKiva, HSX, LibSci, MPP Apprentice, SSD,
SUPERCLUSTER, UNICOS, X-MP EA, and UNICOS/mk are federally registered trademarks and Because no workstation is an
island, CCI, CCMT, CF90, CFT, CFT2, CFT77, ConCurrent Maintenance Tools, COS, Cray Animation Theater, Cray APP, Cray C90,
Cray C90D, Cray C++ Compiling System, CrayDoc, Cray EL, Cray J90, Cray J90se, CrayLink, Cray NQS, Cray/REELlibrarian,
Cray S-MP, Cray SSD-T90, Cray SV1, Cray T90, Cray T3D, Cray T3E, CrayTutor, Cray X-MP, Cray XMS, Cray-2, CSIM, CVT,
Delivering the power . . ., DGauss, Docview, EMDS, GigaRing, HEXAR, IOS, ND Series Network Disk Array,
Network Queuing Environment, Network Queuing Tools, OLNET, RQS, SEGLDR, SMARTE, SUPERLINK,
System Maintenance and Remote Testing Environment, Trusted UNICOS, and UNICOS MAX are trademarks of Cray Research,
L.L.C., a wholly owned subsidiary of Silicon Graphics, Inc.

SGI is a trademark of Silicon Graphics, Inc. IRIX and Silicon Graphics are registered trademarks and the Silicon Graphics logo is a
trademark of Silicon Graphics, Inc.

CDC is a trademark of Control Data Systems, Inc. DEC, ULTRIX, VAX, and VMS are trademarks of Digital Equipment
Corporation. ER90 is a trademark of EMASS, Inc. ETA is a trademark of ETA Systems, Inc. IBM is a trademark of International
Business Machines Corporation. MIPS is a trademark of MIPS Computer Systems. UNIX is a registered trademark in the United
States and other countries, licensed exclusively through X/Open Company Limited. X/Open is a registered trademark of X/Open
Company Ltd. X Window System and the X device are trademarks of The Open Group.

The UNICOS operating system is derived from UNIX® System V. The UNICOS operating system is also based in part on the
Fourth Berkeley Software Distribution (BSD) under license from The Regents of the University of California.
New Features

Scientific Libraries Reference Manual, Volume 1 004–2081–002

No user interface changes were made for this release.


Record of Revision

Version Description

5.0 March 1989


Documentation supporting the UNICOS 5.0 release running on Cray Research
computer systems.

6.0 January 1991


Reprint with revision supporting the UNICOS 6.0 release running on Cray Research
computer systems.

7.0 August 1992


Reprint with revision supporting UNICOS 7.0 release running on Cray Research
computer systems.

8.0 August 1993


Reprint with revision supporting the CrayLibs 1.0 release (asynchronous) that runs
on Cray Research systems. In this revision of the documentation, the math library is
no longer documented in the same manual as the scientific library. Instead, it is
documented in the Math Library Reference Manual, publication SR-2138. The sort and
search routines, which were in the UNICOS 7.0 version of the scientific library, were
moved to the UNICOS Fortran Library.

8.1 June 1994


Rewrite to support the CrayLibs 1.1 release (asynchronous) that runs on Cray
Research systems. This revision incorporates support for the Cray MPP hardware
platform.

8.2 October 1994


Rewrite to support the CrayLibs 1.2 release (asynchronous) that runs on Cray
Research systems. This revision incorporates support for the Basic Linear Algebra
Subprograms for shared arrays (BLAS_S).

2.0 December 1995


Rewrite to support the CrayLibs 2.0 release that runs on Cray Research systems.
This revision incorporates support for Scalable LAPACK (ScaLAPACK) and
documented support for 32-bit FFT routines. Additional routines were added for
FFT and BLAS.

3.0 June 1997


Rewrite to support the CrayLibs 3.0 release that runs on Cray Research systems.
This revision removes support for the Basic Linear Algebra Subprograms for shared
arrays (BLAS_S). See the New Features page for more details about additional
functionality added at this release.

3.1 August 1998


Updated to reflect changes in the Programming Environment 3.1 release. The
printed text of this manual was made available in postscript (.ps) format only for
this release.

3.3 July 1999


Updated to reflect changes in the Programming Environment 3.3 release. The
printed text of this manual was made available in postscript (.ps) format only for
this release.

About This Guide

This publication documents subprograms and routines available to users of the
CrayLibs product, which is included in the Programming Environment 3.3
release. The CrayLibs product contains several libraries; the library routines can
be called from source code written in a number of programming languages,
including Fortran, C, Pascal, and assembly language. The information in this
document supplements information contained in other manuals of the
Programming Environment documentation set.
This is a reference manual for application and system programmers. Readers
should also have a working knowledge of the UNICOS, UNICOS/mk, or
UNIX operating system and of the Fortran or C programming language.
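As a rough illustration of what calling such a routine from C can look like,
the following sketch assumes the usual Fortran convention of passing every
argument by address and borrows the standard BLAS argument order
(n, alpha, x, incx, y, incy) of SAXPY. The function saxpy_sketch is a
hypothetical stand-in written for this example, not a CrayLibs entry point;
the actual routine names, integer and real sizes, and link options are
platform-specific and are given in the individual man pages.

```c
#include <assert.h>

/* Hypothetical stand-in with a Fortran-style interface: every argument,
   including the scalars, is passed by address. It computes
   y := alpha * x + y over n elements, the operation SAXPY is described
   as performing. This is an illustration only, not the library routine. */
void saxpy_sketch(const int *n, const float *alpha,
                  const float *x, const int *incx,
                  float *y, const int *incy)
{
    /* Simplifying assumption of this sketch: positive increments only. */
    assert(*incx > 0 && *incy > 0);

    int ix = 0;   /* current index into x */
    int iy = 0;   /* current index into y */
    for (int i = 0; i < *n; i++) {
        y[iy] += (*alpha) * x[ix];
        ix += *incx;
        iy += *incy;
    }
}
```

Called with n = 3, alpha = 2, unit increments, x = (1, 2, 3), and
y = (10, 20, 30), the sketch leaves y = (12, 24, 36). A C caller would pass
&n, &alpha, and the address of the increment rather than the values
themselves.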

Documentation Organization
The printed versions of the Scientific Library man pages appear in 2 volumes
and are grouped according to topics. See the INTRO_LIBSCI(3S) man page for
details about the contents of each volume.
Each topic section also has an introductory man page that explains the
contents of the section and provides other information about the usage of those
routines. The following introductory man pages are available:
INTRO_BLACS(3S)
INTRO_BLAS1(3S)
INTRO_BLAS2(3S)
INTRO_BLAS3(3S)
INTRO_CORE(3S)
INTRO_FFT(3S)
INTRO_LAPACK(3S)
INTRO_MACH(3S)
INTRO_SCALAPACK(3S)
INTRO_SPARSE(3S)
INTRO_SPEC(3S)
INTRO_SUPERSEDED(3S)

Related Publications
The following manuals document the CrayLibs products. All man pages in
these manuals can also be viewed online by using the man command.
• Intrinsic Procedures Reference Manual
• Application Programmer’s Library Reference Manual
• Scientific Libraries Ready Reference
• Application Programmer’s Library Ready Reference
The following manuals describe the products in the Programming Environment.
These publications describe the operating system, input/output (I/O), and
other related topics.
• Segment Loader (SEGLDR) and ld Reference Manual
• UNICOS User Commands Reference Manual
• UNICOS User Commands Ready Reference
• Guide to Parallel Vector Applications
• Application Programmer’s I/O Guide
In addition to these documents, several documents are available that describe
the compiler systems available on UNICOS and UNICOS/mk. Some of these
manuals are:
• CF90 Ready Reference
• CF90 Commands and Directives Reference Manual
• Fortran Language Reference Manual, Volume 1
• Fortran Language Reference Manual, Volume 2
• Fortran Language Reference Manual, Volume 3
• Cray C/C++ Reference Manual

The following manuals document the compilers that are available on IRIX
systems:
• MIPSPro 7 Fortran 90 Commands and Directives Reference Manual
• MIPSpro Assembly Language Programmer’s Guide
• MIPSpro Fortran 77 Language Reference Manual
• MIPSpro Fortran 77 Programmer’s Guide
• MIPSpro 64-Bit Porting and Transition Guide

Obtaining Publications
SGI maintains information about available publications at the following URL:
http://techpubs.sgi.com/library

This Web site contains information that allows you to browse documents online,
order documents, and send feedback to SGI. You can also order a printed SGI
document by calling 1 800 627 9307.
The User Publications Catalog describes the availability and content of all Cray
hardware and software documents that are available to customers. Customers
who subscribe to the Cray Inform (CRInform) program can access this
information on the CRInform system.
SGI maintains information on publicly available Cray documents at the
following URL:
http://www.cray.com/swpubs/

This Web site contains information that allows you to browse documents online
and send feedback to SGI. To order a printed Cray document, either call
+1 651 683 5907 or send a facsimile of your request to fax number
+1 651 683 3840. SGI employees may also order printed Cray documents by
sending their orders via electronic mail to orderdsk.
Customers outside of the United States and Canada should contact their local
service organization for ordering information and documentation information.

Conventions
The following conventions are used throughout this document:

Convention      Meaning

command         This fixed-space font denotes literal items such as
                commands, files, routines, path names, signals,
                messages, and programming language structures.
variable        Italic typeface denotes variable entries and words
                or concepts being defined.
user input      This bold, fixed-space font denotes literal items
                that the user enters in interactive sessions.
                Output is shown in nonbold, fixed-space font.

In addition to these formatting conventions, several naming conventions are
used throughout the documentation. “Cray PVP systems” denotes all
configurations of Cray parallel vector processing (PVP) systems that run the
UNICOS operating system. “Cray MPP systems” denotes all configurations of
the Cray T3E series that run the UNICOS/mk operating system. “IRIX
systems” denotes SGI platforms that run the IRIX operating system.
The default shell in the UNICOS and UNICOS/mk operating systems, referred
to as the standard shell, is a version of the Korn shell that conforms to the
following standards:
• Institute of Electrical and Electronics Engineers (IEEE) Portable Operating
System Interface (POSIX) Standard 1003.2–1992
• X/Open Portability Guide, Issue 4 (XPG4)
The UNICOS and UNICOS/mk operating systems also support the optional use
of the C shell.

Man page sections


The entries in this document are based on a common format. The following list
shows the order of sections in an entry and describes each section. Most entries
contain only a subset of these sections.

Section heading         Description

NAME                    Specifies the name of the entry and briefly states
                        its function.
SYNOPSIS                Presents the syntax of the entry.
IMPLEMENTATION          Identifies the systems to which the entry applies.
STANDARDS               Provides information about the portability of a
                        utility or routine.
DESCRIPTION             Discusses the entry in detail.
NOTES                   Presents items of particular importance.
CAUTIONS                Describes actions that can destroy data or
                        produce undesired results.
WARNINGS                Describes actions that can harm people,
                        equipment, or system software.
ENVIRONMENT VARIABLES   Describes predefined shell variables that
                        determine some characteristics of the shell or that
                        affect the behavior of some programs, commands,
                        or utilities.
RETURN VALUES           Describes possible return values that indicate a
                        library or system call executed successfully, or
                        identifies the error condition under which it
                        failed.
EXIT STATUS             Describes possible exit status values that indicate
                        whether the command or utility executed
                        successfully.
MESSAGES                Describes informational, diagnostic, and error
                        messages that may appear. Self-explanatory
                        messages are not listed.
ERRORS                  Documents error codes. Applies only to system
                        calls.
FORTRAN EXTENSIONS      Describes how to call a system call from Fortran.
                        Applies only to system calls.
BUGS                    Indicates known bugs and deficiencies.
EXAMPLES                Shows examples of usage.
FILES                   Lists files that are either part of the entry or are
                        related to it.
SEE ALSO                Lists entries and publications that contain related
                        information.

Reader Comments
If you have comments about the technical accuracy, content, or organization of
this document, please tell us. Be sure to include the title and part number of
the document with your comments.
You can contact us in any of the following ways:
• Send e-mail to the following address:
techpubs@sgi.com

• Send a fax to the attention of “Technical Publications” at: +1 650 932 0801.
• Use the Feedback option on the Technical Publications Library World Wide
Web page:
http://techpubs.sgi.com

• Call the Technical Publications Group, through the Technical Assistance
Center, using one of the following numbers:
For SGI IRIX based operating systems: 1 800 800 4SGI
For UNICOS or UNICOS/mk based operating systems or Cray Origin 2000
systems: 1 800 950 2729 (toll free from the United States and Canada) or
+1 651 683 5600
• Send mail to the following address:
Technical Publications
SGI
1600 Amphitheatre Pkwy.
Mountain View, California 94043–1351
We value your comments and will respond to them promptly.

CONTENTS

intro_libsci, INTRO_LIBSCI .......................... Introduction to Scientific Library routines .................................................... 1

Solvers for dense linear systems and eigensystems


intro_lapack, INTRO_LAPACK .......................... Introduction to LAPACK solvers for dense linear systems .......................... 7
eispack, EISPACK .................................................. Introduction to Eigensystem computation for dense linear systems ........... 23
linpack, LINPACK .................................................. Single-precision real and complex LINPACK routines .............................. 29

Vector-vector linear algebra subprograms


intro_blas1, INTRO_BLAS1 ............................... Introduction to vector-vector linear algebra subprograms .......................... 33
csrot, CSROT ........................................................... Applies a real plane rotation to a pair of complex vectors ........................ 37
haxpy, HAXPY, GAXPY ............................................. Adds a scalar multiple of a real or complex vector to another real or
complex vector ............................................................................................ 38
hdot, HDOT, GDOTC, GDOTU .................................... Computes a dot product (inner product) of two real or complex
vectors .......................................................................................................... 40
sasum, SASUM, SCASUM ........................................... Sums the absolute value of elements in a real or complex vector ............. 42
saxpby, SAXPBY, CAXPBY ...................................... Adds a scalar multiple of a real or complex vector x to a scalar
multiple of another real or complex vector y .............................................. 44
saxpy, SAXPY, CAXPY ............................................. Adds a scalar multiple of a real or complex vector to another real or
complex vector ............................................................................................ 46
scopy, SCOPY, CCOPY ............................................. Copies a real or complex vector into another real or complex vector ....... 48
sdot, SDOT, CDOTC, CDOTU .................................... Computes a dot product (inner product) of two real or complex
vectors .......................................................................................................... 50
shad, SHAD ................................................................ Computes the Hadamard product of two vectors ........................................ 52
snrm2, SNRM2, SCNRM2 ........................................... Computes the Euclidean norm of a vector .................................................. 54
spaxpy, SPAXPY ...................................................... Adds a scalar multiple of a real vector to a sparse real vector .................. 56
spdot, SPDOT ........................................................... Computes the dot product of a real vector and a real sparse vector .......... 57
srot, SROT, CROT .................................................... Applies a real plane rotation or complex coordinate rotation ..................... 58
srotg, SROTG, CROTG ............................................. Constructs a Givens plane rotation ............................................................. 60
srotm, SROTM ........................................................... Applies a modified Givens plane rotation ................................................... 62
srotmg, SROTMG ...................................................... Constructs a modified Givens plane rotation .............................................. 64
sscal, SSCAL, CSSCAL, CSCAL ............................. Scales a real or complex vector .................................................................. 70
ssum, SSUM, CSUM .................................................... Sums the elements of a real or complex vector .......................................... 72
sswap, SSWAP, CSWAP ............................................. Swaps two real or complex vectors ............................................................ 73

Matrix-vector linear algebra subprograms


intro_blas2, INTRO_BLAS2 ............................... Introduction to matrix-vector linear algebra subprograms .......................... 75
chbmv, CHBMV ........................................................... Multiplies a complex vector by a complex Hermitian band matrix ........... 78
chemv, CHEMV ........................................................... Multiplies a complex vector by a complex Hermitian matrix .................... 81
cher, CHER ................................................................ Performs Hermitian rank 1 update of a complex Hermitian matrix ........... 83
cher2, CHER2 ........................................................... Performs Hermitian rank 2 update of a complex Hermitian matrix ........... 85
chpmv, CHPMV ........................................................... Multiplies a complex vector by a packed complex Hermitian matrix ........ 87
chpr, CHPR ................................................................ Performs Hermitian rank 1 update of a packed complex Hermitian
matrix ........................................................................................................... 89
chpr2, CHPR2 ........................................................... Performs Hermitian rank 2 update of a packed complex Hermitian
matrix ........................................................................................................... 91
sgbmv, SGBMV, CGBMV ............................................. Multiplies a real or complex vector by a real or complex general
band matrix .................................................................................................. 93

sgemv, SGEMV, CGEMV ............................................. Multiplies a real or complex vector by a real or complex general
matrix ........................................................................................................... 96
sger, SGER, CGERC, CGERU .................................... Performs rank 1 update of a real general matrix ........................................ 98
sgesum, SGESUM, CGESUM ...................................... Adds a scalar multiple of a real or complex matrix to a scalar
multiple of another real or complex matrix .............................................. 100
ssbmv, SSBMV ........................................................... Multiplies a real vector by a real symmetric band matrix ........................ 102
sspmv, SSPMV, CSPMV ............................................. Multiplies a real or complex symmetric packed matrix by a real or
complex vector .......................................................................................... 104
sspr, SSPR, CSPR .................................................... Performs symmetric rank 1 update of a real or complex symmetric
packed matrix ............................................................................................ 106
sspr2, SSPR2 ........................................................... Performs symmetric rank 2 update of a real symmetric packed matrix ... 108
sspr12, SSPR12 ...................................................... Performs two simultaneous symmetric rank 1 updates of a real
symmetric packed matrix .......................................................................... 110
ssymv, SSYMV, CSYMV ............................................. Multiplies a real or complex vector by a real or complex symmetric
matrix ......................................................................................................... 112
ssyr, SSYR, CSYR .................................................... Performs symmetric rank 1 update of a real or complex symmetric
matrix ......................................................................................................... 114
ssyr2, SSYR2 ........................................................... Performs symmetric rank 2 update of a real symmetric matrix ............... 116
stbmv, STBMV, CTBMV ............................................. Multiplies a real or complex vector by a real or complex triangular
band matrix ................................................................................................ 118
stbsv, STBSV, CTBSV ............................................. Solves a real or complex triangular banded system of equations ............. 121
stpmv, STPMV, CTPMV ............................................. Multiplies a real or complex vector by a real or complex triangular
packed matrix ............................................................................................ 124
stpsv, STPSV, CTPSV ............................................. Solves a real or complex triangular packed system of equations ............. 126
strmv, STRMV, CTRMV ............................................. Multiplies a real or complex vector by a real or complex triangular
matrix ......................................................................................................... 128
strsv, STRSV, CTRSV ............................................. Solves a real or complex triangular system of equations ......................... 130

Matrix-matrix linear algebra subprograms


intro_blas3, INTRO_BLAS3 ............................... Introduction to matrix-matrix linear algebra subprograms ........................ 133
chemm, CHEMM ........................................................... Multiplies a complex general matrix by a complex Hermitian matrix ..... 135
cher2k, CHER2K ...................................................... Performs Hermitian rank 2k update of a complex Hermitian matrix ....... 138
cherk, CHERK ........................................................... Performs Hermitian rank k update of a complex Hermitian matrix ......... 141
scopy2, SCOPY2, CCOPY2 ...................................... Copies a real or complex matrix into another real or complex matrix .... 143
sgemm, SGEMM, CGEMM ............................................. Multiplies a real or complex general matrix by a real or complex
general matrix ............................................................................................ 145
sgemms, SGEMMS, CGEMMS ...................................... Multiplies a real or complex general matrix by a real or complex
general matrix, using Strassen’s algorithm ............................................... 148
ssymm, SSYMM, CSYMM ............................................. Multiplies a real or complex general matrix by a real or complex
symmetric matrix ....................................................................................... 152
ssyr2k, SSYR2K, CSYR2K ...................................... Performs symmetric rank 2k update of a real or complex symmetric
matrix ......................................................................................................... 155
ssyrk, SSYRK, CSYRK ............................................. Performs symmetric rank k update of a real or complex symmetric
matrix ......................................................................................................... 158
strmm, STRMM, CTRMM ............................................. Multiplies a real or complex general matrix by a real or complex
triangular matrix ........................................................................................ 160

Signal processing routines (FFT)


intro_fft, INTRO_FFT ........................................ Introduction to signal processing routines ................................................. 163

ccfft, CCFFT ........................................................... Applies a multitasked complex-to-complex Fast Fourier Transform
(FFT) .......................................................................................................... 167
ccfft2d, CCFFT2D .................................................. Applies a two-dimensional complex-to-complex Fast Fourier
Transform (FFT) ........................................................................................ 172
ccfft3d, CCFFT3D .................................................. Applies a three-dimensional complex-to-complex Fast Fourier
Transform (FFT) ........................................................................................ 178
ccfftm, CCFFTM ...................................................... Applies multiple multitasked complex-to-complex Fast Fourier
Transforms (FFTs) ..................................................................................... 183
ccnvl, CCNVL ........................................................... Computes the convolution of a complex sequence with one or more
other complex sequences ........................................................................... 188
ccnvlf, CCNVLF ...................................................... Computes the convolution of a complex sequence with one or more
other complex sequences by using a Fourier transform method .............. 192
cfft, CFFT ................................................................ Applies a multitasked complex Fast Fourier Transform (FFT) ................ 195
cfft2, CFFT2 ........................................................... Applies a complex Fast Fourier Transform (FFT) .................................... 201
cfft2d, CFFT2D ...................................................... Applies a multitasked two-dimensional complex Fast Fourier
Transform (FFT) ........................................................................................ 202
cfft3d, CFFT3D ...................................................... Applies a multitasked three-dimensional complex Fast Fourier
Transform (FFT) ........................................................................................ 208
cfftmlt, CFFTMLT .................................................. Applies complex-to-complex Fast Fourier Transforms (FFTs) on
multiple input vectors ................................................................................ 215
crfft2, CRFFT2 ...................................................... Applies a complex-to-real Fast Fourier Transform (FFT) ........................ 217
descinit3d, DESCINIT3D ................................... Initializes a descriptor vector that contains information about the
distribution of a three-dimensional (3D) array across a 3D grid of
processors .................................................................................................. 219
filterg, FILTERG .................................................. Computes a correlation of two vectors ..................................................... 224
filters, FILTERS .................................................. Computes a correlation of two vectors (symmetric coefficient) ............... 225
ggfft, GGFFT ........................................................... Applies a multitasked complex-to-complex Fast Fourier Transform
(FFT) .......................................................................................................... 227
hconv, HCONV ........................................................... Performs the convolution of two sequences of real numbers ................... 231
hcorr, HCORR ........................................................... Performs the correlation of two sequences of real numbers ..................... 233
hcorrs, HCORRS ...................................................... Performs the correlation of two sequences of real numbers
(symmetric filter) ....................................................................................... 235
hgfft, HGFFT, GHFFT ............................................. Computes a real-to-complex or complex-to-real Fast Fourier
Transform (FFT) ........................................................................................ 237
hopfilt, HOPFILT .................................................. Solves Wiener-Levinson linear equations ................................................. 243
mcfft, MCFFT ........................................................... Applies multiple multitasked complex Fast Fourier Transforms
(FFTs) ........................................................................................................ 245
opfilt, OPFILT ...................................................... Solves Wiener-Levinson linear equations ................................................. 253
pccfft2d, PCCFFT2D ............................................. Applies a two-dimensional (2D) complex-to-complex Fast Fourier
Transform (FFT) to a matrix distributed across a set of processors ......... 255
pccfft3d, PCCFFT3D ............................................. Applies a three-dimensional (3D) complex-to-complex Fast Fourier
Transform (FFT) to a matrix distributed across a set of processors ......... 262
pscfft2d, PSCFFT2D, PCSFFT2D ........................ Applies a two-dimensional (2D) real-to-complex or complex-to-real
Fast Fourier Transform (FFT) to a matrix distributed across a set of
processors .................................................................................................. 269
pscfft3d, PSCFFT3D, PCSFFT3D ........................ Applies a three-dimensional (3D) real-to-complex or complex-to-real
Fast Fourier Transform (FFT) to a matrix distributed across a set of
processors .................................................................................................. 277
rcfft2, RCFFT2 ...................................................... Applies a real-to-complex Fast Fourier Transform (FFT) ........................ 285
rfftmlt, RFFTMLT .................................................. Applies complex-to-real or real-to-complex Fast Fourier Transforms
(FFTs) on multiple input vectors .............................................................. 286

scfft, SCFFT, CSFFT ............................................. Computes a real-to-complex or complex-to-real Fast Fourier
Transform (FFT) ........................................................................................ 290
scfft2d, SCFFT2D, CSFFT2D ............................... Applies a two-dimensional real-to-complex or complex-to-real Fast
Fourier Transform (FFT) ........................................................................... 296
scfft3d, SCFFT3D, CSFFT3D ............................... Applies a multitasked three-dimensional real-to-complex Fast Fourier
Transform (FFT) ........................................................................................ 303
scfftm, SCFFTM, CSFFTM ...................................... Applies multiple real-to-complex or complex-to-real Fast Fourier
Transforms (FFTs) ..................................................................................... 309
scnvl1d, SCNVL1D .................................................. Computes a real one-dimensional (1D) convolution of two vectors ........ 316
scnvl2d, SCNVL2D .................................................. Computes a real two-dimensional (2D) convolution of two matrices ...... 320
sconv, SCONV ........................................................... Performs the convolution of two sequences of real numbers ................... 326
scorr, SCORR ........................................................... Performs the correlation of two sequences of real numbers ..................... 328
scorrs, SCORRS ...................................................... Performs the correlation of two sequences of real numbers
(symmetric filter) ....................................................................................... 330

xii 004– 2081– 002


INTRO_LIBSCI ( 3S ) INTRO_LIBSCI ( 3S )

NAME
INTRO_LIBSCI – Introduction to Scientific Library routines

IMPLEMENTATION
See individual man pages for implementation details

DESCRIPTION
The printed versions of the Scientific Library routines appear in 3 volumes and are grouped according to
topics. Not all man pages are available on all hardware types; see the individual man pages for details about
supported hardware types.
Volume 1 contains the following topic sections:
• Solvers for dense linear systems and eigensystems (see INTRO_LAPACK(3S) introductory man page)
• Vector-vector linear algebra subprograms (see INTRO_BLAS1(3S) introductory man page)
• Matrix-vector linear algebra subprograms (see INTRO_BLAS2(3S) introductory man page)
• Matrix-matrix linear algebra subprograms (see INTRO_BLAS3(3S) introductory man page)
• Signal processing routines (see INTRO_FFT(3S) introductory man page)
Volume 2 contains the following topic sections:
• Solvers for dense linear systems and eigensystems (see INTRO_LAPACK(3S) introductory man page)
• Scalable LAPACK subprograms for UNICOS/mk systems (see INTRO_SCALAPACK(3S) introductory
man page)
• Solvers for sparse linear systems (not available on UNICOS/mk systems) (see INTRO_SPARSE(3S)
introductory man page)
• Solvers for special linear systems (not available on UNICOS/mk systems) (see INTRO_SPEC(3S)
introductory man page)
• Basic Linear Algebra Communication Subprograms (BLACS) routines (see INTRO_BLACS(3S)
introductory man page)
• Out-of-core routines (not available on UNICOS/mk systems) (see INTRO_CORE(3S) introductory man
page)
• Machine constant functions (see INTRO_MACH(3S) introductory man page)
• Superseded routines (not available on UNICOS/mk systems) (see INTRO_SUPERSEDED(3S)
introductory man page)

NOTES

Default kinds
When using the CF90 compiler or MIPSpro 7 Fortran 90 compiler on UNICOS, UNICOS/mk, or IRIX, all
arguments must be of default kind unless documented otherwise. On UNICOS and UNICOS/mk, default
kind is KIND=8 for integer, real, complex, and logical arguments; on IRIX, the default kind is KIND=4.
Multitasking
Many of the Scientific Library routines are multitasked. This means that a program that calls a multitasked
Scientific Library routine will run in parallel mode and take advantage of multiple processors whenever
possible, even if the program has not specifically requested multitasking. If a significant percentage of time
is spent in the Scientific Library routine, this feature can significantly reduce wall-clock time.
The NCPUS environment variable determines the maximum number of (logical) central processors that a
multitasked Scientific Library routine uses. If you do not define this variable, the default value is the
number of central processors on the system. To change the number of CPUs used, you can set the value of
NCPUS before your program is executed. If you do not want your program to run in multitasked mode, set
the value of NCPUS equal to 1.
To set the number of logical CPUs used by multitasked Scientific Library routines equal to n, use one of the
following commands.
Under the POSIX shell (sh) or Korn shell (ksh):
NCPUS=n
export NCPUS

Under C shell (csh):
setenv NCPUS n

Most LAPACK routines are not multitasked themselves, but almost all of them call Level 2 and Level 3
BLAS routines that are.


Guidelines for Choosing NCPUS
The value of NCPUS can dramatically impact the performance of the routines in the Scientific Library.
Clearly, in a dedicated environment you should set NCPUS to the number of physical processors on the
machine to get optimal performance. However, in a multiuser environment, the choice of NCPUS is not as
simple. Setting NCPUS to a large number on a heavily loaded machine probably will lead to an increase in
both elapsed "wall-clock" time and user time. Setting NCPUS to a very small number can mean a lost
opportunity for better job turnaround time.
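The trade-off described above can be sketched in a few lines of Python. This is only an illustration: the function names choose_ncpus and env_with_ncpus and the rule "leave one CPU for each unit of load average" are assumptions made for this example, not part of the Scientific Library or a recommendation from this manual.

```python
import math
import os


def choose_ncpus(physical_cpus, load_average):
    """Hypothetical heuristic: leave one CPU for each unit of existing load.

    It only illustrates the trade-off between oversubscribing a busy
    machine and underusing an idle one; it is not a library feature.
    """
    busy = math.ceil(load_average)
    return max(1, physical_cpus - busy)


def env_with_ncpus(ncpus, base_env=None):
    """Return a copy of the environment with NCPUS set for a child process."""
    env = dict(os.environ if base_env is None else base_env)
    env["NCPUS"] = str(ncpus)
    return env


if __name__ == "__main__":
    cpus = os.cpu_count() or 1
    # os.getloadavg() exists on Unix-like systems only.
    load = os.getloadavg()[0] if hasattr(os, "getloadavg") else 0.0
    n = choose_ncpus(cpus, load)
    print("NCPUS =", env_with_ncpus(n)["NCPUS"])
```

The dictionary returned by env_with_ncpus can be passed as the env argument of subprocess.run when launching the program that calls the multitasked routines.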
Multitasked Routines
The following lists show the Scientific Library routines that are multitasked. Each of these routines (or
one or more subprograms called by it) was compiled with Autotasking, except for the routines listed as
superseded. The superseded routines (or one or more of their subprograms) were compiled with
microtasking.

The routines are grouped by the section of the manual in which they appear, according to the list given
previously in the DESCRIPTION section of this man page. In many cases, a real variable (single-precision)
routine is paired with its complex variable equivalent.
LAPACK routines are not listed. Most LAPACK routines do not perform multiprocessing, but almost all
LAPACK routines call Level 2 BLAS and Level 3 BLAS that do multiprocessing.
The following are the multitasked Level 2 BLAS routines:
• SGBMV, CGBMV
• SGEMV, CGEMV
• SGER
• CGERC
• CGERU
• CHBMV
• SSBMV
• STRSV, CTRSV
• CHEMV
• CHER
• CHER2
• SSPR
• SSPR2
• SSYMV, CSYMV
• SSYR, CSYR
• SSYR2
• STBMV, CTBMV
• STBSV, CTBSV
• STRMV, CTRMV
The following are the multitasked Level 3 BLAS routines:
• SCOPY2, CCOPY2
• SGEMMS, CGEMMS
• SGEMM, CGEMM
• CHEMM
• CHER2K
• CHERK
• SSYMM, CSYMM
• SSYR2K, CSYR2K
• SSYRK, CSYRK
• STRMM, CTRMM
• STRSM, CTRSM
The following are the multitasked LINPACK routines:
• SCHDD, CCHDD
• SCHEX, CCHEX
• SCHUD, CCHUD
• SGBFA, CGBFA
• SGEDI
• SGEFA
• SPODI, CPODI
• SSVDC, CSVDC
• STRDI, CTRDI
The following are the multitasked Out-of-core routines:
• SCOPY2RV, CCOPY2RV
• SCOPY2VR, CCOPY2VR
• VSGEMM, VCGEMM
• VSGETRF, VCGETRF
• VSGETRS, VCGETRS
• VSPOTRF, VSPOTRS
• VSTRSM, VCTRSM
The following are the multitasked EISPACK routines:
• BAKVEC
• BALBAK
• BANDR
• CBABK2
• COMBAK
• COMLR2
• COMQR2
• CORTB
• CORTH
• FIGI2
• HQR2
• HTRIB3
• HTRIDI
• IMTQLV
• MINFIT
• QZIT
• REBAKB
• REDUC
• REDUC2
• SVD
• TRED2
The following are the multitasked Sparse routines:
• SITRSOL
• SSGETRF
• SSGETRS
• SSPOTRF
• SSPOTRS
• SSTSTRF
• SSTSTRS
The following are the multitasked Signal processing routines:
• CCNVL
• CCNVLF
• CCFFT
• CCFFTM
• CCFFT2D
• CCFFT3D
The following are the multitasked Superseded routines:
• MXM
• MXMA
• MXV
• MXVA
• SMXPY
• SXMPY
Multiple-routine Man Pages
Many of the routines in the Scientific Library are available in both real (single-precision) and complex
versions. Often little or no difference exists between these versions, other than the data types of some inputs
and outputs. In this case, the routines are described on the same man page, and that man page is named
after the real (single-precision) routine.
The man(1) command can find such a man page online by either the real or complex routine name.
However, in this manual, you may have to search for the real equivalent of any complex routine you want to
find. Typically, the real routine name is the same as the complex routine name, except that when the
complex routine name’s first or second letter is C (for complex), the real routine name’s first or second letter
is S (for single precision).
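The naming rule above can be expressed as a small helper. The following Python sketch is illustrative only; the function name real_routine_name is an assumption made for this example. It applies the "typical" rule and nothing more, so it can guess wrong for routines that have no same-named real counterpart (for example, Hermitian routines such as CHEMV).

```python
def real_routine_name(name):
    """Guess the real (single-precision) name under which a man page is filed.

    Applies only the typical rule: a leading C, or a C in second position,
    becomes S. Names that already start with S are returned unchanged.
    """
    upper = name.upper()
    if upper.startswith("S"):
        return upper  # already a real (single-precision) name
    if upper.startswith("C"):
        return "S" + upper[1:]
    if len(upper) > 1 and upper[1] == "C":
        return upper[0] + "S" + upper[2:]
    return upper  # no C in first or second position; nothing to change


if __name__ == "__main__":
    for routine in ("CGEMV", "CCOPY2", "VCGEMM", "SGEMV"):
        print(routine, "->", real_routine_name(routine))
```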

SEE ALSO
INTRO_BLACS(3S), INTRO_BLAS1(3S), INTRO_BLAS2(3S), INTRO_BLAS3(3S), INTRO_FFT(3S),
INTRO_LAPACK(3S), INTRO_MACH(3S), INTRO_SCALAPACK(3S)
The following man pages are not available on UNICOS/mk systems:
INTRO_CORE(3S), INTRO_SPARSE(3S), INTRO_SPEC(3S), INTRO_SUPERSEDED(3S)

INTRO_LAPACK ( 3S ) INTRO_LAPACK ( 3S )

NAME
INTRO_LAPACK – Introduction to LAPACK solvers for dense linear systems

IMPLEMENTATION
See individual man pages for implementation details

DESCRIPTION
The preferred solvers for dense linear systems are those parts of the LAPACK package included in the
current version of the Scientific Library. The LAPACK routines in the Scientific Library supersede the older
LINPACK routines (see LINPACK(3S) for more information).
LAPACK Routines
LAPACK is a public domain library of subroutines for solving dense linear algebra problems, including the
following:
• Systems of linear equations
• Linear least squares problems
• Eigenvalue problems
• Singular value decomposition (SVD) problems
For details about which routines are supported, see LAPACK Routines Contained in the Scientific Library,
which follows.
The LAPACK package is designed to be the successor to the older LINPACK and EISPACK packages. It
uses today’s high-performance computers more efficiently than the older packages. It also extends the
functionality of these packages by including equilibration, iterative refinement, error bounds, and driver
routines for linear systems, routines for computing and reordering the Schur factorization, and condition
estimation routines for eigenvalue problems.
Performance is addressed by implementing the most computationally intensive algorithms with the
Level 2 and Level 3 Basic Linear Algebra Subprograms (BLAS). Because most of the BLAS were optimized
for single- and multiple-processor environments on UNICOS and UNICOS/mk systems, these algorithms give
near-optimal performance.
The original Fortran programs are described in the LAPACK User’s Guide by E. Anderson, Z. Bai,
C. Bischof, J. Demmel, J. Dongarra, J. Du Croz, A. Greenbaum, S. Hammarling, A. McKenney,
S. Ostrouchov, and D. Sorensen, published by the Society for Industrial and Applied Mathematics (SIAM),
Philadelphia, 1992. You can order the LAPACK User’s Guide, publication TPD–0003.
LAPACK Routines Contained in the Scientific Library
Most of the single-precision (64-bit) real and complex routines from LAPACK 2.0 are supported in the
Scientific Library. This includes driver routines and computational routines for solving linear systems, least
squares problems, and eigenvalue and singular value problems. Selected auxiliary routines for generating
and manipulating elementary orthogonal transformations are also supported.

The Scientific Library does not include the LAPACK driver routines for certain generalized eigenvalue and
singular value computations or the divide-and-conquer routines for computing eigenvalues, which were new
in LAPACK 2.0. These routines may be added in a future release. Also, most of the auxiliary routines used
only internally by LAPACK have been renamed to avoid conflicts with user-defined subroutine names.
The LAPACK routines in the Scientific Library are described online in man pages. For example, to see a
description of the arguments to the expert driver routine for solving a general system of equations, enter the
following command:
% man sgesvx

The user interface to all LAPACK routines is exactly the same as the standard LAPACK interface, except
for the CPTSV(3L) and CPTSVX(3L) driver routines. An optional character argument was added to CPTSV
and CPTSVX to afford upward compatibility with the storage format in LINPACK’s CPTSL. However,
because the argument is optional, the standard LAPACK calling sequence is also accepted.
Several enhancements were made to the public-domain LAPACK software to improve performance for
UNICOS and UNICOS/mk systems. In particular, the solve routines were redesigned to give better
performance for one or a small number of right-hand sides, and to make better use of parallelism when the
number of right-hand sides is large.
Tuning parameters for the block algorithms provided in the Scientific Library are set within the LAPACK
routine ILAENV(3L). ILAENV(3L) is an integer function subprogram that accepts information about the
problem type and dimensions, and it returns one integer parameter, such as the optimal block size, the
minimum block size for which a block algorithm should be used, or the crossover point (the problem size at
which it becomes more efficient to switch to an unblocked algorithm). The setting of tuning parameters
occurs without user intervention, but users may call ILAENV(3L) directly to discover the values that will be
used (for example, to determine how much workspace to provide).
Naming Scheme
The name of each LAPACK routine is a coded specification of its function (within the limits of standard
FORTRAN 77 six-character names).
All driver and computational routines have five- or six-character names of the form XYYZZ or XYYZZZ.
The first letter in each name, X, indicates the data type, as follows:
S REAL (single precision)
C COMPLEX
The next two letters, YY, indicate the type of matrix (or the most-significant matrix). Most of these
two-letter codes apply to both real and complex matrices, but a few apply specifically to only one or the
other. The matrix types are as follows:
BD BiDiagonal
GB General Band
GE GEneral (nonsymmetric)
GG General matrices, Generalized problem
GT General Tridiagonal
HB Hermitian Band (complex only)
HE HErmitian (possibly indefinite) (complex only)
HG Hessenberg matrix, Generalized problem
HP Hermitian Packed (possibly indefinite) (complex only)
HS upper HeSsenberg
OP Orthogonal Packed (real only)
OR ORthogonal (real only)
PB Positive definite Band (symmetric or Hermitian)
PO POsitive definite (symmetric or Hermitian)
PP Positive definite Packed (symmetric or Hermitian)
PT Positive definite Tridiagonal (symmetric or Hermitian)
SB Symmetric Band (real only)
SP Symmetric Packed (possibly indefinite)
ST Symmetric Tridiagonal
SY SYmmetric (possibly indefinite)
TB Triangular Band
TG Triangular matrices, Generalized problem
TP Triangular Packed
TR TRiangular
TZ TrapeZoidal
UN UNitary (complex only)
UP Unitary Packed (complex only)
Some LAPACK auxiliary routines also have man pages on UNICOS and UNICOS/mk systems. These
routines use the special YY designation:
LA LAPACK Auxiliary routine
For example, ILAENV(3L) is the auxiliary routine that determines the block size for a particular algorithm
and problem size.
The last two or three letters, ZZ or ZZZ, indicate the computation performed. For example, SGETRF
performs a TRiangular Factorization of a Single-precision (real) GEneral matrix; CGETRF performs the
factorization of a Complex GEneral matrix.
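The naming scheme can be illustrated with a short decoder. The following Python sketch is not part of the Scientific Library; the name decode_lapack_name and the table layout are assumptions made for this example. The data-type and matrix-type codes are copied from the lists above, and the computation code (ZZ or ZZZ) is returned undecoded.

```python
DATA_TYPES = {"S": "REAL (single precision)", "C": "COMPLEX"}

MATRIX_TYPES = {
    "BD": "bidiagonal",
    "GB": "general band",
    "GE": "general (nonsymmetric)",
    "GG": "general matrices, generalized problem",
    "GT": "general tridiagonal",
    "HB": "Hermitian band (complex only)",
    "HE": "Hermitian (possibly indefinite) (complex only)",
    "HG": "Hessenberg matrix, generalized problem",
    "HP": "Hermitian packed (possibly indefinite) (complex only)",
    "HS": "upper Hessenberg",
    "OP": "orthogonal packed (real only)",
    "OR": "orthogonal (real only)",
    "PB": "positive definite band (symmetric or Hermitian)",
    "PO": "positive definite (symmetric or Hermitian)",
    "PP": "positive definite packed (symmetric or Hermitian)",
    "PT": "positive definite tridiagonal (symmetric or Hermitian)",
    "SB": "symmetric band (real only)",
    "SP": "symmetric packed (possibly indefinite)",
    "ST": "symmetric tridiagonal",
    "SY": "symmetric (possibly indefinite)",
    "TB": "triangular band",
    "TG": "triangular matrices, generalized problem",
    "TP": "triangular packed",
    "TR": "triangular",
    "TZ": "trapezoidal",
    "UN": "unitary (complex only)",
    "UP": "unitary packed (complex only)",
    "LA": "LAPACK auxiliary routine",
}


def decode_lapack_name(name):
    """Split an XYYZZ or XYYZZZ name into (data type, matrix type, computation)."""
    upper = name.upper()
    if len(upper) not in (5, 6):
        raise ValueError("expected a five- or six-character name: " + name)
    x, yy, zz = upper[0], upper[1:3], upper[3:]
    if x not in DATA_TYPES:
        raise ValueError("unknown data type code: " + x)
    if yy not in MATRIX_TYPES:
        raise ValueError("unknown matrix type code: " + yy)
    return DATA_TYPES[x], MATRIX_TYPES[yy], zz


if __name__ == "__main__":
    print(decode_lapack_name("SGETRF"))
    print(decode_lapack_name("CGETRF"))
```

Names that do not begin with S or C, such as ILAENV, fall outside the XYYZZ pattern and are rejected by this sketch.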

Lists of Available LAPACK Routines
The following tables list the driver and computational routines from LAPACK that are available in the
Scientific Library. For details about the argument lists and usage of these routines, see the individual online
man pages or the LAPACK User’s Guide, publication TPD–0003.
Driver Routines
These routines are listed in alphabetical order.

Name            Purpose

CHESV           Solves a complex Hermitian indefinite system of linear equations AX = B.
CHESVX          Solves a complex Hermitian indefinite system of linear equations AX = B and provides an
                estimate of the condition number and error bounds on the solution.
CHPSV           Solves a complex Hermitian indefinite system of linear equations AX = B; A is held in
                packed storage.
CHPSVX          Solves a complex Hermitian indefinite system of linear equations AX = B (A is held in
                packed storage) and provides an estimate of the condition number and error bounds on the
                solution.
SGBSV, CGBSV    Solves a general banded system of linear equations AX = B.
SGBSVX, CGBSVX  Solves any of the following general banded systems of linear equations and provides an
                estimate of the condition number and error bounds on the solution:
                AX = B, A^T X = B, A^H X = B.
SGEES, CGEES    Computes eigenvalues, Schur form, and Schur vectors of a general matrix.
SGEESX, CGEESX  Computes eigenvalues, Schur form, Schur vectors, and condition numbers of a general
                matrix.
SGEEV, CGEEV    Computes eigenvalues and eigenvectors of a general matrix.
SGEEVX, CGEEVX  Computes eigenvalues, eigenvectors, and condition numbers of a general matrix.
SGEGS, CGEGS    Computes the generalized Schur factorization of a matrix pair (A,B).
SGEGV, CGEGV    Computes the eigenvalues and eigenvectors of a matrix pair (A,B).
SGELS, CGELS    Finds a least squares or minimum norm solution of an overdetermined or underdetermined
                linear system.
SGELSS, CGELSS  Solves a linear least squares problem using SVD.
SGELSX, CGELSX  Computes a minimum norm solution of a linear least squares problem using a complete
                orthogonal factorization.
SGESV, CGESV    Solves a general system of linear equations AX = B.
SGESVD, CGESVD  Computes the singular value decomposition (SVD) of a general matrix.
SGESVX, CGESVX  Solves any of the following general systems of linear equations and provides an estimate
                of the condition number and error bounds on the solution:
                AX = B, A^T X = B, A^H X = B.
SGTSV, CGTSV    Solves a general tridiagonal system of linear equations AX = B.
SGTSVX, CGTSVX  Solves any of the following general tridiagonal systems of linear equations and provides
                an estimate of the condition number and error bounds on the solution:
                AX = B, A^T X = B, A^H X = B.
SPBSV, CPBSV    Solves a symmetric or Hermitian positive definite banded system of linear equations
                AX = B.
SPBSVX, CPBSVX  Solves a symmetric or Hermitian positive definite banded system of linear equations
                AX = B and provides an estimate of the condition number and error bounds on the solution.
SPOSV, CPOSV    Solves a symmetric or Hermitian positive definite system of linear equations AX = B.
SPOSVX, CPOSVX  Solves a symmetric or Hermitian positive definite system of linear equations AX = B and
                provides an estimate of the condition number and error bounds on the solution.
SPPSV, CPPSV    Solves a symmetric or Hermitian positive definite system of linear equations AX = B; A is
                held in packed storage.
SPPSVX, CPPSVX  Solves a symmetric or Hermitian positive definite system of linear equations AX = B (A is
                held in packed storage) and provides an estimate of the condition number and error bounds
                on the solution.
SPTSV, CPTSV    Solves a symmetric or Hermitian positive definite tridiagonal system of linear equations
                AX = B.
SPTSVX, CPTSVX  Solves a symmetric or Hermitian positive definite tridiagonal system of linear equations
                AX = B and provides an estimate of the condition number and error bounds on the solution.
SSBEV, CHBEV    Computes all eigenvalues and eigenvectors of a symmetric or Hermitian band matrix.
SSBEVX, CHBEVX  Computes selected eigenvalues and eigenvectors of a symmetric or Hermitian band matrix.
SSBGV, CHBGV    Computes all eigenvalues and eigenvectors of a generalized symmetric-definite or
                Hermitian-definite banded eigenproblem.
SSPEV, CHPEV    Computes all eigenvalues and eigenvectors of a symmetric or Hermitian packed matrix.
SSPEVX, CHPEVX  Computes selected eigenvalues and eigenvectors of a symmetric or Hermitian packed matrix.
SSPGV, CHPGV    Computes all eigenvalues and eigenvectors of a generalized symmetric-definite or
                Hermitian-definite packed eigenproblem.
SSPSV, CSPSV    Solves a real or complex symmetric indefinite system of linear equations AX = B; A is
                held in packed storage.
SSPSVX, CSPSVX  Solves a real or complex symmetric indefinite system of linear equations AX = B (A is
                held in packed storage) and provides an estimate of the condition number and error bounds
                on the solution.
SSTEV           Computes all eigenvalues and eigenvectors of a real symmetric tridiagonal matrix.
SSTEVX          Computes selected eigenvalues and eigenvectors of a real symmetric tridiagonal matrix.
SSYEV, CHEEV    Computes all eigenvalues and eigenvectors of a symmetric or Hermitian matrix.
SSYEVX, CHEEVX  Computes selected eigenvalues and eigenvectors of a symmetric or Hermitian matrix.
SSYGV, CHEGV    Computes all eigenvalues and eigenvectors of a generalized symmetric-definite or
                Hermitian-definite eigenproblem.
SSYSV, CSYSV    Solves a real or complex symmetric indefinite system of linear equations AX = B.
SSYSVX, CSYSVX  Solves a real or complex symmetric indefinite system of linear equations AX = B and
                provides an estimate of the condition number and error bounds on the solution.

Computational Routines
These computational routines are listed in alphabetical order, with real matrix routines and complex matrix
routines grouped together as appropriate.

Name            Purpose

CHECON          Estimates the reciprocal of the condition number of a complex Hermitian indefinite matrix,
                using the factorization computed by CHETRF.
CHERFS          Improves the computed solution to a complex Hermitian indefinite system of linear
                equations AX = B and provides error bounds for the solution.
CHETRF          Computes the factorization of a complex Hermitian indefinite matrix, using the diagonal
                pivoting method.
CHETRI          Computes the inverse of a complex Hermitian indefinite matrix, using the factorization
                computed by CHETRF.
CHETRS          Solves a complex Hermitian indefinite system of linear equations AX = B, using the
                factorization computed by CHETRF.
CHPCON          Estimates the reciprocal of the condition number of a complex Hermitian indefinite matrix
                in packed storage, using the factorization computed by CHPTRF.
CHPRFS          Improves the computed solution to a complex Hermitian indefinite system of linear
                equations AX = B (A is held in packed storage) and provides error bounds for the solution.
CHPTRF          Computes the factorization of a complex Hermitian indefinite matrix in packed storage,
                using the diagonal pivoting method.
CHPTRI          Computes the inverse of a complex Hermitian indefinite matrix in packed storage, using the
                factorization computed by CHPTRF.
CHPTRS          Solves a complex Hermitian indefinite system of linear equations AX = B (A is held in
                packed storage) using the factorization computed by CHPTRF.
ILAENV          Determines tuning parameters (such as the block size).
SBDSQR, CBDSQR  Computes the singular value decomposition of a general matrix reduced to bidiagonal
                form.
SGBCON, CGBCON  Estimates the reciprocal of the condition number of a general band matrix, in either the
                1-norm or the infinity-norm, using the LU factorization computed by SGBTRF or CGBTRF.
SGBEQU, CGBEQU  Computes row and column scalings to equilibrate a general band matrix and reduce its
                condition number. Does not multiprocess or call any multiprocessing routines.
SGBRFS, CGBRFS  Improves the computed solution to any of the following general banded systems of linear
                equations and provides error bounds for the solution:
                AX = B, A^T X = B, A^H X = B.
SGBTRF, CGBTRF  Computes an LU factorization of a general band matrix, using partial pivoting with row
                interchanges.
SGBTRS, CGBTRS  Solves any of the following general banded systems of linear equations using the LU
                factorization computed by SGBTRF or CGBTRF:
                AX = B, A^T X = B, A^H X = B.
SGEBAK, CGEBAK  Back-transforms the eigenvectors of a matrix transformed by SGEBAL/CGEBAL.
SGEBAL, CGEBAL  Balances a general matrix A.
SGEBRD, CGEBRD  Reduces a general matrix to upper or lower bidiagonal form by an orthogonal/unitary
                transformation.
SGECON, CGECON  Estimates the reciprocal of the condition number of a general matrix, in either the
                1-norm or the infinity-norm, using the LU factorization computed by SGETRF or CGETRF.
SGEEQU, CGEEQU  Computes row and column scalings to equilibrate a general rectangular matrix and to
                reduce its condition number.
SGEHRD, CGEHRD  Reduces a general matrix to upper Hessenberg form by an orthogonal/unitary
                transformation.
SGELQF, CGELQF  Computes an LQ factorization of a general rectangular matrix.
SGEQLF, CGEQLF  Computes a QL factorization of a general rectangular matrix.
SGEQPF, CGEQPF  Computes a QR factorization with column pivoting of a general rectangular matrix.
SGEQRF, CGEQRF  Computes a QR factorization of a general rectangular matrix.
SGERFS, CGERFS  Improves the computed solution to any of the following general systems of linear
                equations and provides error bounds for the solution:
                AX = B, A^T X = B, A^H X = B.
SGERQF, CGERQF  Computes an RQ factorization of a general rectangular matrix.
SGETRF, CGETRF  Computes an LU factorization of a general matrix, using partial pivoting with row
                interchanges.
SGETRI, CGETRI  Computes the inverse of a general matrix, using the LU factorization computed by SGETRF
                or CGETRF.
SGETRS, CGETRS  Solves any of the following general systems of linear equations using the LU
                factorization computed by SGETRF or CGETRF:
                AX = B, A^T X = B, A^H X = B.
SGGBAK, CGGBAK  Back-transforms the eigenvectors of a generalized eigenvalue problem transformed by
                SGGBAL/CGGBAL.
SGGBAL, CGGBAL  Balances a pair of general matrices (A,B).
SGGHRD, CGGHRD  Reduces a pair of matrices (A,B) to generalized upper Hessenberg form.
SGTCON, CGTCON  Estimates the reciprocal of the condition number of a general tridiagonal matrix, in
                either the 1-norm or the infinity-norm, using the LU factorization computed by SGTTRF or
                CGTTRF.
SGTRFS, CGTRFS  Improves the computed solution to any of the following general tridiagonal systems of
                linear equations and provides error bounds for the solution:
                AX = B, A^T X = B, A^H X = B.
SGTTRF, CGTTRF  Computes an LU factorization of a general tridiagonal matrix, using partial pivoting with
                row interchanges.
SGTTRS, CGTTRS  Solves any of the following general tridiagonal systems of linear equations using the LU
                factorization computed by SGTTRF or CGTTRF:
                AX = B, A^T X = B, A^H X = B.
SHGEQZ, CHGEQZ  Computes the eigenvalues of a matrix pair (A,B) in generalized upper Hessenberg form
                using the QZ method.
SHSEIN, CHSEIN  Computes eigenvectors of an upper Hessenberg matrix by inverse iteration.
SHSEQR, CHSEQR  Computes eigenvalues, Schur form, and Schur vectors of an upper Hessenberg matrix.
SLAMCH          Computes machine-specific constants.
SLARF, CLARF    Applies an elementary reflector.
SLARFB, CLARFB  Applies a block reflector.
SLARFG, CLARFG  Generates an elementary reflector.
SLARFT, CLARFT  Forms the triangular factor of a block reflector.
SLARGV, CLARGV  Generates a vector of real or complex plane rotations.
SLARNV, CLARNV  Generates a vector of random numbers.
SLARTG, CLARTG  Generates a plane rotation.
SLARTV, CLARTV  Applies a vector of real or complex plane rotations to two vectors.
SLASR, CLASR    Applies a sequence of real plane rotations to a matrix.
SOPGTR, CUPGTR  Generates the orthogonal/unitary matrix Q from SSPTRD/CHPTRD.
SOPMTR, CUPMTR  Multiplies by the orthogonal/unitary matrix Q from SSPTRD/CHPTRD.
SORGBR, CUNGBR  Generates one of the orthogonal/unitary matrices Q or P^H from SGEBRD/CGEBRD.
SORGHR, CUNGHR  Generates the orthogonal/unitary matrix Q from SGEHRD/CGEHRD.
SORGLQ, CUNGLQ  Generates all or part of the orthogonal or unitary matrix Q from an LQ factorization
                determined by SGELQF or CGELQF.
SORGQL, CUNGQL  Generates all or part of the orthogonal or unitary matrix Q from a QL factorization
                determined by SGEQLF or CGEQLF.
SORGQR, CUNGQR  Generates all or part of the orthogonal or unitary matrix Q from a QR factorization
                determined by SGEQRF or CGEQRF.
SORGRQ, CUNGRQ  Generates all or part of the orthogonal or unitary matrix Q from an RQ factorization
                determined by SGERQF or CGERQF.
SORGTR, CUNGTR  Generates the orthogonal/unitary matrix Q from SSYTRD/CHETRD.
SORMBR, CUNMBR  Multiplies by one of the orthogonal/unitary matrices Q or P from SGEBRD/CGEBRD.
SORMHR, CUNMHR  Multiplies by the orthogonal/unitary matrix Q from SGEHRD/CGEHRD.
SORMLQ, CUNMLQ  Multiplies a general matrix by the orthogonal or unitary matrix from an LQ factorization
                determined by SGELQF or CGELQF.
SORMQL, CUNMQL  Multiplies a general matrix by the orthogonal or unitary matrix from a QL factorization
                determined by SGEQLF or CGEQLF.
SORMQR, CUNMQR  Multiplies a general matrix by the orthogonal or unitary matrix from a QR factorization
                determined by SGEQRF or CGEQRF.
SORMRQ, CUNMRQ  Multiplies a general matrix by the orthogonal or unitary matrix from an RQ factorization
                determined by SGERQF or CGERQF.
SORMTR, CUNMTR  Multiplies by the orthogonal/unitary matrix Q from SSYTRD/CHETRD.
SPBCON, CPBCON  Estimates the reciprocal of the condition number of a symmetric or Hermitian positive
                definite band matrix, using the Cholesky factorization computed by SPBTRF or CPBTRF.
SPBEQU, CPBEQU  Computes row and column scalings to equilibrate a symmetric or Hermitian positive
                definite band matrix and to reduce its condition number.
SPBRFS Improves the computed solution to a symmetric or Hermitian positive definite banded
CPBRFS system of linear equations AX = B and provides error bounds for the solution.
SPBSTF Compute a split Cholesky factorization of a symmetric or Hermitian positive definite band
CPBSTF matrix.
SPBTRF Computes the Cholesky factorization of a symmetric or Hermitian positive definite band
CPBTRF matrix.
SPBTRS Solves a symmetric or Hermitian positive definite banded system of linear equations AX =
CPBTRS B, using the Cholesky factorization computed by SPBTRF or CPBTRF.
SPOCON Estimates the reciprocal of the condition number of a symmetric or Hermitian positive
CPOCON definite matrix, using the Cholesky factorization computed by SPOTRF or CPOTRF.
SPOEQU Computes row and column scalings to equilibrate a symmetric or Hermitian positive definite
CPOEQU matrix and reduces its condition number.
SPORFS Improves the computed solution to a symmetric or Hermitian positive definite system of
CPORFS linear equations AX = B and provides error bounds for the solution.
SPOTRF Computes the Cholesky factorization of a symmetric or Hermitian positive definite matrix.
CPOTRF
SPOTRI Computes the inverse of a symmetric or Hermitian positive definite matrix, using the
CPOTRI Cholesky factorization computed by SPOTRF or CPOTRF.
SPOTRS Solves a symmetric or Hermitian positive definite system of linear equations AX = B, using
CPOTRS the Cholesky factorization computed by SPOTRF or CPOTRF.
SPPCON Estimates the reciprocal of the condition number of a symmetric or Hermitian positive
CPPCON definite matrix in packed storage, using the Cholesky factorization computed by SPPTRF or
CPPTRF.
SPPEQU Computes row and column scalings to equilibrate a symmetric or Hermitian positive definite
CPPEQU matrix in packed storage and reduces its condition number.
SPPRFS Improves the computed solution to a symmetric or Hermitian positive definite system of
CPPRFS linear equations AX = B (A is held in packed storage) and provides error bounds for the
solution.
SPPTRF Computes the Cholesky factorization of a symmetric or Hermitian positive definite matrix in
CPPTRF packed storage.
SPPTRI Computes the inverse of a symmetric or Hermitian positive definite matrix in packed
CPPTRI storage, using the Cholesky factorization computed by SPPTRF or CPPTRF.
SPPTRS Solves a symmetric or Hermitian positive definite system of linear equations AX = B (A is
CPPTRS held in packed storage) using the Cholesky factorization computed by SPPTRF or CPPTRF.

SPTCON Uses the $LDL^H$ factorization computed by SPTTRF or CPTTRF to compute the reciprocal
CPTCON of the condition number of a symmetric or Hermitian positive definite tridiagonal matrix.
SPTEQR Compute eigenvalues and eigenvectors of a symmetric or Hermitian positive definite
CPTEQR tridiagonal matrix.
SPTRFS Improves the computed solution to a symmetric or Hermitian positive definite tridiagonal
CPTRFS system of linear equations AX = B and provides error bounds for the solution.
SPTTRF Computes the $LDL^H$ factorization of a symmetric or Hermitian positive definite tridiagonal
CPTTRF matrix.
SPTTRS Uses the $LDL^H$ factorization computed by SPTTRF or CPTTRF to solve a symmetric or
CPTTRS Hermitian positive definite tridiagonal system of linear equations.
SSBGST Reduce a symmetric or Hermitian definite banded generalized eigenproblem to standard
CHBGST form.
SSBTRD Reduce a symmetric or Hermitian band matrix to real symmetric tridiagonal form by an
CHBTRD orthogonal/unitary transformation.
SSPCON Estimates the reciprocal of the condition number of a real or complex symmetric indefinite
CSPCON matrix in packed storage, using the factorization computed by SSPTRF or CSPTRF.
SSPGST Reduce a symmetric or Hermitian definite generalized eigenproblem to standard form, using
CHPGST packed storage.
SSPRFS Improves the computed solution to a real or complex symmetric indefinite system of linear
CSPRFS equations AX = B (A is held in packed storage) and provides error bounds for the solution.
SSPTRD Reduces a symmetric/Hermitian packed matrix A to real symmetric tridiagonal form by an
CHPTRD orthogonal/unitary transformation.
SSPTRF Computes the factorization of a real or complex symmetric indefinite matrix in packed
CSPTRF storage, using the diagonal pivoting method.
SSPTRI Computes the inverse of a real or complex symmetric indefinite matrix in packed storage,
CSPTRI using the factorization computed by SSPTRF or CSPTRF.
SSPTRS Solves a real or complex symmetric indefinite system of linear equations AX = B (A is held
CSPTRS in packed storage) using the factorization computed by SSPTRF or CSPTRF.
SSTEBZ Compute eigenvalues of a symmetric tridiagonal matrix by bisection.
SSTEIN Compute eigenvectors of a real symmetric tridiagonal matrix by inverse iteration.
CSTEIN
SSTEQR Compute eigenvalues and eigenvectors of a real symmetric tridiagonal matrix using the
CSTEQR implicit QL or QR method.

SSTERF Compute all eigenvalues of a symmetric tridiagonal matrix using the root-free variant of the
QL or QR algorithm.
SSYCON Estimates the reciprocal of the condition number of a real or complex symmetric indefinite
CSYCON matrix, using the factorization computed by SSYTRF or CSYTRF.
SSYGST Reduce a symmetric or Hermitian definite generalized eigenproblem to standard form.
CHEGST
SSYRFS Improves the computed solution to a real or complex symmetric indefinite system of linear
CSYRFS equations AX = B and provides error bounds for the solution.
SSYTRD Reduces a symmetric/Hermitian matrix A to real symmetric tridiagonal form by an
CHETRD orthogonal/unitary transformation.
SSYTRF Computes the factorization of a real or complex symmetric indefinite matrix, using the
CSYTRF diagonal pivoting method.
SSYTRI Computes the inverse of a real or complex symmetric indefinite matrix, using the
CSYTRI factorization computed by SSYTRF or CSYTRF.
SSYTRS Solves a real or complex symmetric indefinite system of linear equations AX = B, using the
CSYTRS factorization computed by SSYTRF or CSYTRF.
STBCON Estimates the reciprocal of the condition number of a triangular band matrix, in either the
CTBCON 1-norm or the infinity-norm.
STBRFS Provides error bounds for the solution of any of the following triangular banded systems of
CTBRFS linear equations: $AX = B$, $A^T X = B$, or $A^H X = B$.
STBTRS Solves any of the following triangular banded systems of linear equations:
CTBTRS $AX = B$, $A^T X = B$, or $A^H X = B$.
STGEVC Compute eigenvectors of a pair of matrices (A,B) in generalized Schur form.
CTGEVC
STPCON Estimates the reciprocal of the condition number of a triangular matrix in packed storage, in
CTPCON either the 1-norm or the infinity-norm.

STPRFS Provides error bounds for the solution of any of the following triangular systems of linear
CTPRFS equations, where A is held in packed storage: $AX = B$, $A^T X = B$, or $A^H X = B$.
STPTRI Computes the inverse of a triangular matrix in packed storage.
CTPTRI
STPTRS Solves any of the following triangular systems of linear equations, where A is held in packed
CTPTRS storage: $AX = B$, $A^T X = B$, or $A^H X = B$.
STRCON Estimates the reciprocal of the condition number of a triangular matrix, in either the 1-norm
CTRCON or the infinity-norm.
STREVC Compute eigenvectors of a real upper quasi-triangular matrix.
CTREVC Compute eigenvectors of a complex triangular matrix.
STREXC Exchange diagonal blocks in the real Schur factorization of a real matrix.
CTREXC Exchange diagonal elements in the Schur factorization of a complex matrix.
STRRFS Provides error bounds for the solution of any of the following triangular systems of linear
CTRRFS equations: $AX = B$, $A^T X = B$, or $A^H X = B$.
STRSEN Compute condition numbers to measure the sensitivity of a cluster of eigenvalues and its
CTRSEN corresponding invariant subspace.
STRSNA Compute condition numbers for specified eigenvalues and eigenvectors of a real upper
quasi-triangular matrix.
CTRSNA Compute condition numbers for specified eigenvalues and eigenvectors of a complex upper
triangular matrix.
STRSYL Solve the Sylvester matrix equation
CTRSYL

STRTRI Computes the inverse of a triangular matrix.
CTRTRI
STRTRS Solves any of the following triangular systems of linear equations:
CTRTRS $AX = B$, $A^T X = B$, or $A^H X = B$.
STZRQF Reduces an upper trapezoidal matrix to upper triangular form by an orthogonal/unitary
CTZRQF transformation.

SEE ALSO
LINPACK(3S), which lists the names of the LINPACK routines that are superseded by the linear system
solvers in LAPACK
LAPACK User’s Guide, CRI publication TPD–0003

EISPACK ( 3S ) EISPACK ( 3S )

NAME
EISPACK – Introduction to Eigensystem computation for dense linear systems

IMPLEMENTATION
UNICOS systems (except Cray T90 systems that support IEEE arithmetic)

DESCRIPTION
EISPACK is a package of Fortran routines for solving the eigenvalue problem and for computing and using
the singular-value decomposition.
The original Fortran versions are described in the Matrix Eigensystem Routines – EISPACK Guide, second
edition, by B. T. Smith, J. M. Boyle, J. J. Dongarra, B. S. Garbow, Y. Ikebe, V. C. Klema, and C. B. Moler,
published by Springer-Verlag, New York, 1976, Library of Congress catalog card number 76–2662. The
original Fortran versions also are documented in the Matrix Eigensystem Routines - EISPACK Guide
Extensions (Lecture Notes in Computer Science, Vol. 51) by B. S. Garbow, J. M. Boyle, J. J. Dongarra, and
C. B. Moler, published by Springer-Verlag, New York, 1977, Library of Congress catalog card number
77–2802.
Most EISPACK routines are superseded by routines from the more recent public domain package, LAPACK,
described in the LAPACK User’s Guide (see INTRO_LAPACK(3S) for a complete reference). Of particular
interest to EISPACK users who want to switch to LAPACK is Appendix D, "Converting from LINPACK
and EISPACK," of the LAPACK User’s Guide. This appendix contains a table that shows the names of the
LAPACK routines that are functionally equivalent to each EISPACK routine.
Each Scientific Library version of the EISPACK routines has the same name, algorithm, and calling
sequence as the original version. Optimization of each routine includes the following:
• Use of the Level 1 BLAS routines when applicable, and use of the Level 2 and 3 BLAS in TRED1,
TRED2, TRBAK, and REDUC.
• Removal of Fortran IF statements when the result of either branch is the same.
• Unrolling complicated Fortran DO loops to improve vectorization.
• Use of Fortran compiler directives to aid vector optimization.
These modifications increase vectorization and use optimized library routines; therefore, they reduce
execution time. Only the order of computations within a loop is changed; the modified versions produce the
same answers as the original versions, unless the problem is sensitive to small changes in the data.

The following table lists the purpose, matrix or decomposition, and name of each routine.

Purpose Matrix or Decomposition Name

Forms eigenvectors by back transforming corresponding Real nonsymmetric tridiagonal BAKVEC
matrix determined by FIGI
Balances matrix and isolates eigenvalues when possible Real general BALANC
Forms eigenvectors by back transforming those of the Real general BALBAK
corresponding matrices determined by BALANC
Finds the eigenvalues that lie in a specified interval by using Real symmetric tridiagonal BISECT
bisection
Forms eigenvectors by back transforming those of the Complex general CBABK2
corresponding matrices determined by CBAL
Balances matrix and isolates eigenvalues when possible Complex general CBAL
Reduces to a symmetric tridiagonal matrix Real symmetric banded BANDR
Finds those eigenvectors that correspond to ordered list of Real symmetric banded BANDV
eigenvalues by using inverse iteration
Finds some eigenvalues by using QR algorithm with shifts Real symmetric banded BQR
of origin
Finds eigenvalues and eigenvectors Complex general CG
Finds eigenvalues and eigenvectors Complex Hermitian CH
Finds eigenvectors that correspond to specified eigenvalues Complex upper Hessenberg CINVIT
by using inverse iteration
Forms eigenvectors by back transforming those of the Complex general COMBAK
corresponding matrices determined by COMHES
Reduces matrix to upper Hessenberg form by using Complex general COMHES
elementary similarity transformations
Finds eigenvalues by using modified LR method Complex upper Hessenberg COMLR
Finds eigenvalues and eigenvectors, by using modified LR Complex upper Hessenberg COMLR2
method
Finds eigenvalues by QR method Complex upper Hessenberg COMQR
Finds eigenvalues and eigenvectors by QR method Complex upper Hessenberg COMQR2
Forms eigenvectors by back transforming those of the Complex general CORTB
corresponding matrices determined by CORTH

Reduces matrix to upper Hessenberg form by using unitary Complex general CORTH
similarity transformations
Forms eigenvectors by back transforming those of the Real general ELMBAK
corresponding matrices determined by ELMHES
Reduces matrix to upper Hessenberg form by using Real general ELMHES
elementary similarity transformations
Accumulates transformations used in the reduction to upper Real general ELTRAN
Hessenberg form done by ELMHES
Reduces to symmetric tridiagonal matrix that has the same Real nonsymmetric tridiagonal FIGI
eigenvalues
Reduces to symmetric tridiagonal matrix that has the same Real nonsymmetric tridiagonal FIGI2
eigenvalues, retaining the diagonal similarity transformations
Finds eigenvalues by QR method Real upper Hessenberg HQR
Finds eigenvalues and eigenvectors by QR method Real upper Hessenberg HQR2
Finds eigenvectors given the eigenvectors of the real Complex Hermitian HTRIBK
symmetric tridiagonal matrix calculated by HTRIDI
(including eigenvectors calculated by TQL2 or IMTQL2)
Finds eigenvectors given the eigenvectors of the real Complex Hermitian (packed) HTRIB3
symmetric tridiagonal matrix calculated by HTRID3
(eigenvectors calculated by TQL2 or IMTQL2, among
others)
Reduces to real symmetric tridiagonal form by using unitary Complex Hermitian HTRIDI
similarity transformations
Reduces to real symmetric tridiagonal form by using unitary Complex Hermitian (packed) HTRID3
similarity transformations
Finds eigenvalues by using implicit QL method, and Real symmetric tridiagonal IMTQLV
associates them with their corresponding submatrix indices
Finds eigenvalues by implicit QL method Real symmetric tridiagonal IMTQL1
Finds eigenvalues and eigenvectors by implicit QL method Real symmetric tridiagonal IMTQL2
Finds eigenvectors that correspond to specified eigenvalues Real upper Hessenberg INVIT
by using inverse iteration
Determines the singular-value decomposition $A = USV^T$, Real rectangular MINFIT
forming $U^T B$ rather than $U$ by using Householder
bidiagonalization and a variant of the QR algorithm

Forms eigenvectors by back transforming those of the Real general ORTBAK
corresponding matrices determined by ORTHES
Reduces matrix to upper Hessenberg form by using Real general ORTHES
orthogonal similarity transformations
Accumulates transformations used in the reduction to upper Real general ORTRAN
Hessenberg form done by ORTHES
Reduces matrices A and B in the generalized eigenproblem Real general QZHES
(Ax = λBx ) so that A is in upper Hessenberg form and B is
in upper triangular form by using orthogonal transformations
Further reduces matrices A and B as calculated by QZHES Real general QZIT
for the generalized eigenproblem (Ax = λBx ), so that A is in
quasi-upper triangular form and B is still upper triangular
Produces three arrays that can be used to calculate the Real general QZVAL
eigenvalues for the generalized eigenproblem (Ax = λBx ),
with A and B as calculated by QZIT
Finds the eigenvectors that correspond to a list of Real general QZVEC
eigenvalues for the generalized eigenproblem (Ax = λBx ),
with A and B as calculated by QZIT
Finds the smallest or largest eigenvalues by rational QR Real symmetric tridiagonal RATQR
method with Newton corrections
Forms generalized eigenvectors by back transforming those Real general REBAK
of the corresponding matrices determined by REDUC or
REDUC2
Forms eigenvectors by back transforming those of the Real general REBAKB
corresponding matrices determined by REDUC2
Reduces the generalized eigenproblem (Ax = λBx ) to a Real symmetric REDUC
standard symmetric eigenproblem by using the Cholesky
factorization of B
Reduces either of the generalized eigenproblems Real symmetric REDUC2
(ABx = λx or BAx = λx ) to a standard symmetric
eigenproblem by using the Cholesky factorization of B
Finds eigenvalues and eigenvectors Real general RG
Finds generalized eigenvalues and eigenvectors Real general RGG
(Ax = λBx )

Finds eigenvalues and eigenvectors Real symmetric RS
Finds eigenvalues and eigenvectors Real symmetric banded RSB
Finds generalized eigenvalues and eigenvectors Real symmetric RSG
(Ax = λBx )
Finds generalized eigenvalues and eigenvectors Real symmetric RSGAB
(ABx = λx )
Finds generalized eigenvalues and eigenvectors Real symmetric RSGBA
(BAx = λx )
Finds eigenvalues and eigenvectors Real symmetric RSM
Finds eigenvalues and eigenvectors Real symmetric packed RSP
Finds eigenvalues and eigenvectors Real symmetric tridiagonal RST
Finds eigenvalues and eigenvectors Special real tridiagonal RT
Determines the singular-value decomposition $A = USV^T$ by Real rectangular SVD
using Householder bidiagonalization and a variant of the QR
algorithm
Finds the eigenvectors from a set of ordered eigenvalues by Real symmetric tridiagonal TINVIT
using inverse iteration
Finds the eigenvalues by rational QL method Real symmetric tridiagonal TQLRAT
Finds the eigenvalues and/or eigenvectors by the rational QL Real symmetric tridiagonal TQL1
or QL method
Finds the eigenvalues and/or eigenvectors by the rational QL Real symmetric tridiagonal TQL2
or QL method
Forms eigenvectors by back transforming those of the Real symmetric TRBAK
corresponding matrices determined by TRED1
Forms eigenvectors by back transforming those of the Real symmetric (packed) TRBAK3
corresponding matrices determined by TRED3
Reduces to symmetric tridiagonal matrix by using orthogonal Real symmetric TRED1
similarity transformations
Reduces to symmetric tridiagonal matrix by using and Real symmetric TRED2
accumulating orthogonal similarity transformations
Reduces to symmetric tridiagonal matrix by using orthogonal Real symmetric (packed) TRED3
similarity transformations

Finds the eigenvalues that lie between specified indices by Real symmetric tridiagonal TRIDIB
using bisection
Finds the eigenvalues that lie in a specified interval and each Real symmetric tridiagonal TSTURM
corresponding eigenvector by using bisection and inverse
iteration

SEE ALSO
LAPACK User’s Guide, CRI publication TPD–0003

LINPACK ( 3S ) LINPACK ( 3S )

NAME
LINPACK – Single-precision real and complex LINPACK routines

IMPLEMENTATION
UNICOS systems (except Cray T90 systems that support IEEE arithmetic)

DESCRIPTION
LINPACK is a public domain package of Fortran routines that solves systems of linear equations and
computes the QR, Cholesky, and singular value decompositions. The original Fortran programs are
described in the LINPACK User’s Guide by J. J. Dongarra, C. B. Moler, J. R. Bunch, and G. W. Stewart,
published by the Society for Industrial and Applied Mathematics (SIAM), Philadelphia, 1979, Library of
Congress catalog card number 78– 78206.
Most LINPACK routines are superseded by routines from the more recent public domain package, LAPACK,
described in the LAPACK User’s Guide (see INTRO_LAPACK(3S) for a complete reference). Of particular
interest to LINPACK users who want to switch to LAPACK is Appendix D, "Converting from LINPACK
and EISPACK," of the LAPACK User’s Guide. This appendix contains a table that shows the name of the
LAPACK routines that are functionally equivalent to each LINPACK routine.
Each single-precision Scientific Library version of the LINPACK routines has the same name, algorithm, and
calling sequence as the original version. Optimization of each routine includes the following:
• Replacement of calls to the BLAS routines SSCAL, SCOPY, SSWAP, SAXPY, and SROT with inline
Fortran code vectorized by the Cray Research Fortran compilers. (SROTG is still called by LINPACK.)
• Removal of Fortran IF statements in which the result of either branch is the same.
• Replacement of SDOT to solve triangular systems of linear equations in SPOSL, STRSL, and SCHDD
with more vectorizable code.
These optimizations affect only the execution order of floating-point operations in DO loops. See the
LINPACK User’s Guide for further descriptions. The complex routines have been added without much
optimization.
As mentioned previously, LAPACK does not completely supersede LINPACK. In the following table, an
asterisk (*) marks LINPACK routines that are not superseded in public domain LAPACK. This table lists
the name, matrix or decomposition, and purpose for each routine.

Name Matrix or Decomposition Purpose

SGECO Real general Factors and estimates condition
SGEFA Factors
SGESL Solves
SGEDI Computes determinant and inverse

CGECO Complex general Factors and estimates condition
CGEFA Factors
CGESL Solves
CGEDI Computes determinant and inverse
SGBCO Real general banded Factors and estimates condition
SGBFA Factors
SGBSL Solves
SGBDI Computes determinant
CGBCO Complex general banded Factors and estimates condition
CGBFA Factors
CGBSL Solves
CGBDI Computes determinant
SPOCO Real positive definite Factors and estimates condition
SPOFA Factors
SPOSL Solves
SPODI Computes determinant and inverse
CPOCO Complex positive definite Factors and estimates condition
CPOFA Factors
CPOSL Solves
CPODI Computes determinant and inverse
SPPCO Real positive definite packed Factors and estimates condition
SPPFA Factors
SPPSL Solves
SPPDI Computes determinant and inverse
CPPCO Complex positive definite packed Factors and estimates condition
CPPFA Factors
CPPSL Solves
CPPDI Computes determinant and inverse
SPBCO Real positive definite banded Factors and estimates condition
SPBFA Factors
SPBSL Solves
SPBDI Computes determinant
CPBCO Complex positive definite banded Factors and estimates condition
CPBFA Factors
CPBSL Solves
CPBDI Computes determinant

SSICO Real symmetric indefinite Factors and estimates condition
SSIFA Factors
SSISL Solves
SSIDI Computes inertia, determinant, and inverse
CSICO Complex symmetric Factors and estimates condition
CSIFA Factors
CSISL Solves
CSIDI Computes determinant and inverse
CHICO Complex Hermitian indefinite Factors and estimates condition
CHIFA Factors
CHISL Solves
CHIDI Computes inertia, determinant, and inverse
SSPCO Real symmetric indefinite packed Factors and estimates condition
SSPFA Factors
SSPSL Solves
SSPDI Computes inertia, determinant, and inverse
CSPCO Complex symmetric indefinite packed Factors and estimates condition
CSPFA Factors
CSPSL Solves
CSPDI Computes inertia, determinant, and inverse
CHPCO Complex Hermitian indefinite packed Factors and estimates condition
CHPFA Factors
CHPSL Solves
CHPDI Computes inertia, determinant, and inverse
STRCO Real triangular Factors and estimates condition
STRSL Solves
STRDI Computes determinant and inverse
CTRCO Complex triangular Factors and estimates condition
CTRSL Solves
CTRDI Computes determinant and inverse
SGTSL Real tridiagonal Solves
CGTSL Complex tridiagonal Solves
SPTSL Real positive definite tridiagonal Solves
CPTSL Complex positive definite tridiagonal Solves

SCHDC * Real Cholesky decomposition Decomposes
SCHDD * Downdates
SCHUD * Updates
SCHEX * Exchanges
CCHDC * Complex Cholesky decomposition Decomposes
CCHDD * Downdates
CCHUD * Updates
CCHEX * Exchanges
SQRDC Real Performs orthogonal factorization
SQRSL Solves
CQRDC Complex Performs orthogonal factorization
CQRSL Solves
SSVDC Real Performs singular value decomposition
CSVDC Complex Performs singular value decomposition

SEE ALSO
INTRO_LAPACK(3S) for information and references about the LAPACK routines that supersede LINPACK
LAPACK User’s Guide, CRI publication TPD–0003
Dongarra, J. J., C. B. Moler, J. R. Bunch, and G. W. Stewart, LINPACK User’s Guide. Society for
Industrial and Applied Mathematics (SIAM), Philadelphia, 1979.

INTRO_BLAS1 ( 3S ) INTRO_BLAS1 ( 3S )

NAME
INTRO_BLAS1 – Introduction to vector-vector linear algebra subprograms

IMPLEMENTATION
See individual man pages for implementation details

DESCRIPTION
The linear algebra subprograms are written to run optimally on UNICOS and UNICOS/mk systems. These
subprograms use call-by-address convention when called by a Fortran, C, or CAL program.
Level 1 Basic Linear Algebra Subprograms
The Level 1 BLAS perform basic vector-vector operations. Only the single-precision real and complex data
types are supported from the standard set of Level 1 BLAS. In addition, several half-precision subroutines
are provided as extensions to the BLAS on UNICOS/mk systems, using the following naming conventions:
H half-precision (32-bit) REAL
G half-precision (32-bit) COMPLEX
The following three types of vector-vector operations are available:
• Computing dot products and various vector norms
• Scaling, copying, swapping, and computing linear combinations of vectors
• Generating or applying plane or modified plane rotations
Increment arguments
A vector’s description consists of the name of the array (x or y) followed by the storage spacing (increment)
in the array of vector elements (incx or incy). The increment can be positive or negative. When a vector x
consists of n elements, the corresponding actual array argument must have a length of at least
$1+(n-1)\cdot|incx|$. For a negative increment, the first element of x is assumed to be $x(1+(n-1)\cdot|incx|)$.
The standard specification of _SCAL, _NRM2, _ASUM, and I_AMAX does not define their behavior for
negative increments, so this functionality is an extension to the standard BLAS.
Setting an increment argument to 0 can cause unpredictable results.
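The increment convention can be illustrated with a short Python sketch. It is not part of the library; the helper names blas_offsets and min_array_length are hypothetical and exist only to show which 1-based array positions the logical elements x(1), ..., x(n) occupy under the rules stated above.

```python
def blas_offsets(n, inc):
    """Return the 1-based array positions of logical elements 1..n.

    For inc > 0, element i sits at 1 + (i - 1) * inc.
    For inc < 0, element i sits at 1 - (n - i) * inc, so the first
    logical element is at 1 + (n - 1) * |inc|.
    """
    if inc == 0:
        # The manual only says a zero increment is unpredictable;
        # rejecting it here is a choice made for this sketch.
        raise ValueError("an increment of 0 is not supported")
    if inc > 0:
        return [1 + (i - 1) * inc for i in range(1, n + 1)]
    return [1 - (n - i) * inc for i in range(1, n + 1)]


def min_array_length(n, inc):
    """Smallest array length that holds n elements with spacing inc."""
    return 1 + (n - 1) * abs(inc) if n > 0 else 0


print(blas_offsets(4, 2))       # -> [1, 3, 5, 7]
print(blas_offsets(4, -2))      # -> [7, 5, 3, 1]
print(min_array_length(4, -2))  # -> 7
```

A negative increment uses the same storage as the matching positive increment but visits it from the far end.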
Fortran type declaration for functions
Always declare the data type of external functions. Declaring the data type of the complex Level 1 BLAS
functions is particularly important because, based on the first letter of their names and the Fortran data
typing rules, the default implied data type would be REAL.
Fortran type declarations for function names follow:
Type Function Name
REAL SASUM, SCASUM, SCNRM2, SDOT, SNRM2, SPDOT, SSUM
COMPLEX CDOTC, CDOTU, CSUM

When using half-precision routines, the following types can only be declared in Fortran 90:
Type Function Name
REAL(KIND=4) HDOT
COMPLEX(KIND=4) GDOTC, GDOTU
Level 1 BLAS search functions
Several search functions are properly a part of Level 1 BLAS, but they are not described in this section of
the manual. See the INTRO_SORTSEARCH(3F) man page for details. These functions are as follows
(functions marked with an asterisk [*] are extensions to the standard set of Level 1 BLAS routines):
ISAMAX, ICAMAX, ISAMIN*, ISMAX*, ISMIN*

These man pages are documented in the Application Programmer’s Library Reference Manual.
Table of Level 1 BLAS routines
The following table contains the purpose, operation, and name of each Level 1 BLAS routine (except search
functions). The first routine name listed in each table block is the name of the manual page that contains
documentation for any routines listed in that block. The routines marked with an asterisk (*) are extensions
to the standard set of Level 1 BLAS routines. For complete details about each operation, see the individual
man pages.

Purpose   Operation   Name

Adds a scalar multiple of a real or complex vector to another real or complex vector (32-bit version)   $y \leftarrow \alpha x + y$   HAXPY, GAXPY
Computes a dot product (inner product) of two real or complex vectors (32-bit version)   $hdot \leftarrow x^T y = \sum_{i=1}^{n} x_i y_i$   HDOT
   $gdotc \leftarrow x^H y = \sum_{i=1}^{n} \bar{x}_i y_i$   GDOTC
   $gdotu \leftarrow x^T y = \sum_{i=1}^{n} x_i y_i$   GDOTU
Sums the absolute values of the elements of a real vector (also called the $l_1$ norm)   $sasum \leftarrow \|x\|_1 = \sum_{i=1}^{n} |x_i|$   SASUM
Sums the absolute values of the real and imaginary parts of the elements of a complex vector   $scasum \leftarrow \|Real(x)\|_1 + \|Imag(x)\|_1 = \sum_{i=1}^{n} |Real(x_i)| + \sum_{i=1}^{n} |Imag(x_i)|$   SCASUM
Adds a scalar multiple of a real or complex vector to a scalar multiple of another vector   $y \leftarrow \alpha x + \beta y$   SAXPBY, CAXPBY*
Adds a scalar multiple of a real or complex vector to another vector   $y \leftarrow \alpha x + y$   SAXPY, CAXPY
Copies a real or complex vector into another vector   $y \leftarrow x$   SCOPY, CCOPY
Computes a dot product of two real or complex vectors   $sdot \leftarrow x^T y = \sum_{i=1}^{n} x_i y_i$   SDOT
   $cdotc \leftarrow x^H y = \sum_{i=1}^{n} \bar{x}_i y_i$   CDOTC
   $cdotu \leftarrow x^T y = \sum_{i=1}^{n} x_i y_i$   CDOTU
Computes the Hadamard product of two vectors   $z(i) := \alpha\, x(i)\, y(i) + \beta\, z(i)$   SHAD
Computes the Euclidean norm (also called $l_2$ norm) of a real or complex vector   $snrm2 \leftarrow \|x\|_2 = \sqrt{\sum_{i=1}^{n} x_i^2}$   SNRM2
   $scnrm2 \leftarrow \|x\|_2 = \sqrt{\sum_{i=1}^{n} \bar{x}_i x_i}$   SCNRM2
Adds a scalar multiple of a vector to a sparse vector   $y_{J_i} \leftarrow \alpha x_i + y_{J_i}$   SPAXPY*
Computes a dot product of a real vector and a sparse real vector   $spdot \leftarrow \sum_{i=1}^{n} x_i y_{J_i}$   SPDOT*
Applies a real plane rotation to a pair of complex vectors   CSROT
Applies an orthogonal plane rotation   SROT
Applies a complex Givens plane rotation   CROT
Constructs a Givens plane rotation   SROTG, CROTG
Applies a modified Givens plane rotation   SROTM
Constructs a modified Givens plane rotation   SROTMG
Scales a real or complex vector   $x \leftarrow \alpha x$   SSCAL, CSSCAL, CSCAL
Sums the elements of a real or complex vector   $sum \leftarrow \sum_{i=1}^{n} x_i$   SSUM*, CSUM*
Swaps two real or two complex vectors   $x \leftrightarrow y$   SSWAP, CSWAP

SEE ALSO
Lawson, C., Hanson, R., Kincaid, D., and Krogh, F., "Basic Linear Algebra Subprograms for Fortran Usage,"
ACM Transactions on Mathematical Software, 5 (1979), pp. 308 – 325.

CSROT ( 3S ) CSROT ( 3S )

NAME
CSROT – Applies a real plane rotation to a pair of complex vectors

SYNOPSIS
CALL CSROT ( n, x, incx, y, incy, c, s)

IMPLEMENTATION
UNICOS and UNICOS/mk systems
On UNICOS/mk systems, this subroutine executes on a single processor and uses private data.

DESCRIPTION
CSROT applies a real plane rotation to a pair of complex vectors. The form of the operation is the
following:
$$\begin{pmatrix} x_i \\ y_i \end{pmatrix} := \begin{pmatrix} c & s \\ -s & c \end{pmatrix} \begin{pmatrix} x_i \\ y_i \end{pmatrix}$$
for each pair of values (x(i), y(i)), i = 1, . . ., n.


This routine has the following arguments:
n Integer. (input)
Number of pairs of elements of x and y to be rotated.
x Complex array of dimension $(n-1)\cdot|incx| + 1$. (input and output)
On input, array x. If incx > 0, x(i) is stored in $x(1+(i-1)\cdot incx)$; if incx < 0, x(i) is stored in
$x(1-(n-i)\cdot incx)$. On output, the rotated vector x.
incx Integer. (input)
Increment between elements of x. incx should not be 0.
y Complex array of dimension $(n-1)\cdot|incy| + 1$. (input and output)
On input, array y. If incy > 0, y(i) is stored in $y(1+(i-1)\cdot incy)$; if incy < 0, y(i) is stored in
$y(1-(n-i)\cdot incy)$. On output, the rotated vector y.
incy Integer. (input)
Increment between elements of y. incy should not be 0.
c Real. (input)
Cosine of the angle of rotation.
s Real. (input)
Sine of the angle of rotation.
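
The operation and argument layout described above can be sketched in plain Python. This is an illustrative reference model only, not the library routine: the function name csrot_sketch is hypothetical, arrays are 0-based Python lists, and the early return for n ≤ 0 is an assumption borrowed from common BLAS practice rather than something this page states.

```python
def csrot_sketch(n, x, incx, y, incy, c, s):
    """Rotate n pairs (x(i), y(i)) in place using real c and s.

    x(i) := c*x(i) + s*y(i)
    y(i) := c*y(i) - s*x(i)
    """
    if n <= 0:
        return  # assumption: nothing to do for an empty vector
    # 0-based start positions; a negative increment starts at the far end.
    ix = 0 if incx > 0 else (n - 1) * (-incx)
    iy = 0 if incy > 0 else (n - 1) * (-incy)
    for _ in range(n):
        xi, yi = x[ix], y[iy]  # read both before overwriting either
        x[ix] = c * xi + s * yi
        y[iy] = c * yi - s * xi
        ix += incx
        iy += incy


# Example: a 90-degree rotation (c = 0, s = 1) maps x to y and y to -x.
x = [1 + 1j, 2 + 0j]
y = [0j, 1j]
csrot_sketch(2, x, 1, y, 1, 0.0, 1.0)
print(x, y)
```

Both old values are read before either is written, so each pair is rotated using its original contents.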

SEE ALSO
CROTG(3S), SROT(3S), SROTG(3S), SROTM(3S)

HAXPY ( 3S ) HAXPY ( 3S )

NAME
HAXPY, GAXPY – Adds a scalar multiple of a real or complex vector to another real or complex vector

SYNOPSIS
CALL HAXPY (n, alpha, x, incx, y, incy)
CALL GAXPY (n, alpha, x, incx, y, incy)

IMPLEMENTATION
UNICOS/mk systems
These subroutines execute on a single processor and use private data.

DESCRIPTION
HAXPY adds a scalar multiple of a real vector to another real vector.
GAXPY adds a scalar multiple of a complex vector to another complex vector.
HAXPY and GAXPY perform the following vector operation:
$y \leftarrow \alpha x + y$
where α is a real or complex scalar, and x and y are real or complex vectors.
These routines have the following arguments:
n INTEGER(KIND=8). (input)
Number of elements in the vectors. If n ≤ 0, HAXPY and GAXPY return without any
computation.
alpha HAXPY: REAL(KIND=4). (input)
GAXPY: COMPLEX(KIND=4). (input)
Scalar multiplier α. If real α = 0. or complex α = 0 = 0. + 0.i, HAXPY and GAXPY return
without any computation.
x HAXPY: REAL(KIND=4) array of dimension $(n-1)\cdot|incx| + 1$. (input)
GAXPY: COMPLEX(KIND=4) array of dimension $(n-1)\cdot|incx| + 1$. (input)
Contains the vector to be scaled before summation.
incx INTEGER(KIND=8). (input)
Increment between elements of x. incx should not be 0.
y HAXPY: REAL(KIND=4) array of dimension $(n-1)\cdot|incy| + 1$. (input and output)
GAXPY: COMPLEX(KIND=4) array of dimension $(n-1)\cdot|incy| + 1$. (input and output)
Before calling the routine, y contains the vector to be summed. After the routine ends, y
contains the result of the summation.
incy INTEGER(KIND=8). (input)
Increment between elements of y. incy should not be 0.

NOTES
HAXPY and GAXPY are based on SAXPY and CAXPY from the Level 1 Basic Linear Algebra Subprograms
(Level 1 BLAS).
When working backward (incx < 0 or incy < 0), each routine starts at the end of the vector and moves
backward, as follows:
x(1-incx*(n-1)), x(1-incx*(n-2)), ..., x(1)
y(1-incy*(n-1)), y(1-incy*(n-2)), ..., y(1)

RETURN VALUES
When n ≤ 0 or α = 0 (real α = 0. or complex α = 0. + 0.i), these routines return immediately with no
change in their arguments.

HDOT ( 3S ) HDOT ( 3S )

NAME
HDOT, GDOTC, GDOTU – Computes a dot product (inner product) of two real or complex vectors

SYNOPSIS
dot = HDOT (n, x, incx, y, incy)
dot = GDOTC (n, x, incx, y, incy)
dot = GDOTU (n, x, incx, y, incy)

IMPLEMENTATION
UNICOS/mk systems
These functions execute on a single processor and use private data.

DESCRIPTION
HDOT computes a dot product of two real vectors (l2 real inner product).
GDOTC computes a dot product of the conjugate of a complex vector and another complex vector (l2 complex
inner product).
GDOTU computes a dot product of two complex vectors.
HDOT and GDOTU perform the following vector operation:
$dot \leftarrow x^T y = \sum_{i=1}^{n} x_i y_i$
where x and y are real or complex vectors and $x^T$ is the transpose of x.
GDOTC performs the following vector operation:
$dot \leftarrow x^H y = \sum_{i=1}^{n} \bar{x}_i y_i$
where x and y are complex vectors, and $x^H$ is the conjugate transpose of x.
These functions have the following arguments:
dot HDOT: REAL(KIND=4). (output)
GDOTC, GDOTU: COMPLEX(KIND=4). (output)
Result (dot product). If n ≤ 0, dot is set to 0.
n INTEGER(KIND=8). (input)
Number of elements in each vector.
x HDOT: REAL(KIND=4) array of dimension (n-1)*|incx| + 1. (input)
GDOTC, GDOTU: COMPLEX(KIND=4) array of dimension (n-1)*|incx| + 1. (input)
Array x contains the first vector operand.
incx INTEGER(KIND=8). (input)
Increment between elements of x. incx should not be 0.
y HDOT: REAL(KIND=4) array of dimension (n-1)*|incy| + 1. (input)
GDOTC, GDOTU: COMPLEX(KIND=4) array of dimension (n-1)*|incy| + 1. (input)
Array y contains the second vector operand.
incy INTEGER(KIND=8). (input)
Increment between elements of y. incy should not be 0.

NOTES
HDOT, GDOTC, and GDOTU are based on SDOT, CDOTC, and CDOTU from the Level 1 Basic Linear Algebra
Subprograms (Level 1 BLAS).
When working backward (incx < 0 or incy < 0), each routine starts at the end of the vector and moves
backward, as follows:
x(1-incx*(n-1)), x(1-incx*(n-2)), ..., x(1)
y(1-incy*(n-1)), y(1-incy*(n-2)), ..., y(1)

SASUM ( 3S ) SASUM ( 3S )

NAME
SASUM, SCASUM – Sums the absolute value of elements in a real or complex vector

SYNOPSIS
sum = SASUM (n, x, incx)
sum = SCASUM (n, x, incx)

IMPLEMENTATION
UNICOS and UNICOS/mk systems
On UNICOS/mk systems, these subroutines execute on a single processor and use private data.

DESCRIPTION
SASUM sums the absolute values of the elements of a real vector, as follows:
$sum \leftarrow \|x\|_1 = \sum_{i=1}^{n} |x_i|$
where x is a real vector of length n.
SCASUM sums the absolute values of the real and imaginary parts of the elements of a complex vector, as
follows:
$sum \leftarrow \|Real(x)\|_1 + \|Imag(x)\|_1 = \sum_{i=1}^{n} |Real(x_i)| + \sum_{j=1}^{n} |Imag(x_j)|$
where x is a complex vector of length n.
These real functions have the following arguments:
sum Real. (output)
SASUM: Sum of the absolute values of the elements of x.
SCASUM: Sum of the absolute values of the real and imaginary parts of the elements of x.
n Integer. (input)
Number of elements in the vector to be summed. If n ≤ 0, SASUM and SCASUM return 0.
x SASUM: Real array of dimension (n-1)*|incx| + 1. (input)
SCASUM: Complex array of dimension (n-1)*|incx| + 1. (input)
Array x contains the vector to be summed.
incx Integer. (input)
Increment between elements of x. If incx = 0, the results will be unpredictable.

NOTES
SASUM and SCASUM are Level 1 Basic Linear Algebra Subprograms (Level 1 BLAS).
When working backward (incx < 0), each routine starts at the end of the vector and moves backward, as
follows:
x(1-incx*(n-1)), x(1-incx*(n-2)), ..., x(1)
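
The SCASUM definition differs from a sum of complex moduli, which is easy to overlook. The following is a minimal sketch that uses only standard Fortran intrinsics and does not call the library; the program and variable names are illustrative assumptions. It evaluates both quantities for a small vector.

```fortran
program scasum_sketch
  implicit none
  complex :: x(3)
  real :: sum_parts, sum_moduli

  x = (/ (3.0, 4.0), (-1.0, 0.0), (0.0, -2.0) /)

  ! Quantity documented for SCASUM: sum of |Real(x_i)| + |Imag(x_i)|
  sum_parts = sum(abs(real(x)) + abs(aimag(x)))

  ! Sum of complex moduli, shown only for contrast
  sum_moduli = sum(abs(x))

  print *, 'sum of |Re| + |Im| :', sum_parts    ! 3+4+1+0+0+2 = 10
  print *, 'sum of moduli      :', sum_moduli   ! 5+1+2 = 8
end program scasum_sketch
```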

SAXPBY ( 3S ) SAXPBY ( 3S )

NAME
SAXPBY, CAXPBY – Adds a scalar multiple of a real or complex vector x to a scalar multiple of another
real or complex vector y

SYNOPSIS
CALL SAXPBY (n, alpha, x, incx, beta, y, incy)
CALL CAXPBY (n, alpha, x, incx, beta, y, incy)

IMPLEMENTATION
UNICOS/mk systems
These subroutines execute on a single processor and use private data only.

DESCRIPTION
SAXPBY adds a scalar multiple of a real vector x to a scalar multiple of a real vector y.
CAXPBY adds a scalar multiple of a complex vector x to a scalar multiple of a complex vector y.
y←αx+βy
where x and y are n-vectors and α and β are scalars.
The following special cases are recognized:
α = 0: equivalent to SSCAL or CSCAL
α = 1, β = 0: equivalent to SCOPY or CCOPY
α ≠ 1, β = 0: like SCOPY or CCOPY, with scaling
α ≠ 0, β = 1: equivalent to SAXPY or CAXPY
These routines have the following arguments:
n Integer. (input)
Number of elements of the vectors x and y.
alpha SAXPBY: Real. (input)
CAXPBY: Complex. (input)
The scalar α.
x SAXPBY: Real array of dimension (1+(n-1)*|incx|). (input)
CAXPBY: Complex array of dimension (1+(n-1)*|incx|). (input)
The vector x. If incx > 0, the i-th element of the vector x is located in x(1+(i-1)*|incx|). If
incx < 0, the i-th element of the vector x is located in x(1+(n-i)*|incx|).
incx Integer. (input)
Increment between elements of the vector x. If incx < 0, x is processed in reverse order.
beta SAXPBY: Real. (input)
CAXPBY: Complex. (input)
The scalar β.
y SAXPBY: Real array of dimension (1+(n-1)*|incy|). (input/output)
CAXPBY: Complex array of dimension (1+(n-1)*|incy|). (input/output)
On entry, the vector y. If incy > 0, the i-th element of the vector y is located in y(1+(i-1)*|incy|).
If incy < 0, the i-th element of the vector y is located in y(1+(n-i)*|incy|).
On exit, y is overwritten with the vector sum αx+βy.
incy Integer. (input)
Increment between elements of y. If incy < 0, y is processed in reverse order.
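
The special cases listed in the DESCRIPTION can be checked against the general formula. The following sketch assumes unit increments and models the operation with whole-array arithmetic rather than calling SAXPBY; the function name axpby and the data are hypothetical.

```fortran
program saxpby_cases
  implicit none
  real :: x(3), y0(3)

  x  = (/ 1.0, 2.0, 3.0 /)
  y0 = (/ 10.0, 20.0, 30.0 /)

  print *, 'alpha = 0, beta = 2 :', axpby(0.0, 2.0)   ! same as scaling y by 2
  print *, 'alpha = 1, beta = 0 :', axpby(1.0, 0.0)   ! same as copying x
  print *, 'alpha = 3, beta = 0 :', axpby(3.0, 0.0)   ! copy with scaling
  print *, 'alpha = 3, beta = 1 :', axpby(3.0, 1.0)   ! same as y <- 3x + y

contains

  ! Unit-increment model of y <- alpha*x + beta*y; returns the new y.
  function axpby(alpha, beta) result(y)
    real, intent(in) :: alpha, beta
    real :: y(3)
    y = alpha*x + beta*y0
  end function axpby
end program saxpby_cases
```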

SAXPY ( 3S ) SAXPY ( 3S )

NAME
SAXPY, CAXPY – Adds a scalar multiple of a real or complex vector to another real or complex vector

SYNOPSIS
CALL SAXPY (n, alpha, x, incx, y, incy)
CALL CAXPY (n, alpha, x, incx, y, incy)

IMPLEMENTATION
UNICOS and UNICOS/mk systems
On UNICOS/mk systems, this subroutine executes on a single processor and uses only private data.

DESCRIPTION
SAXPY adds a scalar multiple of a real vector to another real vector.
CAXPY adds a scalar multiple of a complex vector to another complex vector.
SAXPY and CAXPY perform the following vector operation:
y ← αx +y
where α is a real or complex scalar, and x and y are real or complex vectors.
These routines have the following arguments:
n Integer. (input)
Number of elements in the vectors. If n ≤ 0, SAXPY and CAXPY return without any
computation.
alpha SAXPY: Real. (input)
CAXPY: Complex. (input)
Scalar multiplier α. If α = 0 (that is, real α = 0. or complex α = 0. + 0.i), SAXPY and CAXPY
return without any computation.
x SAXPY: Real array of dimension (n-1)*|incx| + 1. (input)
CAXPY: Complex array of dimension (n-1)*|incx| + 1. (input)
Contains the vector to be scaled before summation.
incx Integer. (input)
Increment between elements of x. If incx = 0, the results will be unpredictable.
y SAXPY: Real array of dimension (n-1)*|incy| + 1. (input and output)
CAXPY: Complex array of dimension (n-1)*|incy| + 1. (input and output)
Before calling the routine, y contains the vector to be summed. After the routine ends, y
contains the result of the summation.
incy Integer. (input)
Increment between elements of y. If incy = 0, the results will be unpredictable.

NOTES
SAXPY and CAXPY are Level 1 Basic Linear Algebra Subprograms (Level 1 BLAS).
When working backward (incx < 0 or incy < 0), each routine starts at the end of the vector and moves
backward, as follows:
x(1-incx*(n-1)), x(1-incx*(n-2)), ..., x(1)
y(1-incy*(n-1)), y(1-incy*(n-2)), ..., y(1)
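
The effect of the increments, including the backward traversal described above, can be sketched with a plain reference loop. This is an illustrative model only, not the library implementation; the name axpy_ref and the sample data are assumptions.

```fortran
program saxpy_increments
  implicit none
  real :: x(3), y(5)

  x = (/ 1.0, 2.0, 3.0 /)
  y = (/ 10.0, 20.0, 30.0, 40.0, 50.0 /)

  ! n = 3, incx = -1 (x is read backward), incy = 2 (every other element of y)
  call axpy_ref(3, 2.0, x, -1, y, 2)
  print *, y   ! 16.0 20.0 34.0 40.0 52.0

contains

  ! Reference loop for y <- alpha*x + y with BLAS-style increments.
  subroutine axpy_ref(n, alpha, x, incx, y, incy)
    integer, intent(in) :: n, incx, incy
    real, intent(in) :: alpha, x(*)
    real, intent(inout) :: y(*)
    integer :: i, ix, iy

    if (n <= 0 .or. alpha == 0.0) return   ! documented quick return
    ix = 1
    iy = 1
    if (incx < 0) ix = 1 - incx*(n - 1)    ! start at x(1-incx*(n-1))
    if (incy < 0) iy = 1 - incy*(n - 1)    ! start at y(1-incy*(n-1))
    do i = 1, n
      y(iy) = alpha*x(ix) + y(iy)
      ix = ix + incx
      iy = iy + incy
    end do
  end subroutine axpy_ref
end program saxpy_increments
```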

RETURN VALUES
When n ≤ 0 or α = 0 (real α = 0. or complex α = 0. + 0.i), these routines return immediately with no
change in their arguments.

SCOPY ( 3S ) SCOPY ( 3S )

NAME
SCOPY, CCOPY – Copies a real or complex vector into another real or complex vector

SYNOPSIS
CALL SCOPY (n, x, incx, y, incy)
CALL CCOPY (n, x, incx, y, incy)

IMPLEMENTATION
UNICOS and UNICOS/mk systems
On UNICOS/mk systems, this subroutine executes on a single processor and uses private data.

DESCRIPTION
SCOPY copies a real vector into another real vector.
CCOPY copies a complex vector into another complex vector.
SCOPY and CCOPY perform the following vector operation:
y←x
where x and y are real or complex vectors.
These routines have the following arguments:
n Integer. (input)
Number of elements to be copied. If n ≤ 0, SCOPY and CCOPY return without any computation.
x SCOPY: Real array of dimension (n-1)*|incx| + 1. (input)
CCOPY: Complex array of dimension (n-1)*|incx| + 1. (input)
Vector from which to copy.
incx Integer. (input)
Increment between elements of x. If incx = 0, the results will be unpredictable.
y SCOPY: Real array of dimension (n-1)*|incy| + 1. (output)
CCOPY: Complex array of dimension (n-1)*|incy| + 1. (output)
Result vector.
incy Integer. (input)
Increment between elements of y. If incy = 0, the results will be unpredictable.

NOTES
SCOPY and CCOPY are Level 1 Basic Linear Algebra Subprograms (Level 1 BLAS).

When working backward (incx < 0 or incy < 0), each routine starts at the end of the vector and moves
backward, as follows:
x(1-incx*(n-1)), x(1-incx*(n-2)), ..., x(1)
y(1-incy*(n-1)), y(1-incy*(n-2)), ..., y(1)
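
One consequence of the backward traversal is that a copy with increments of opposite sign reverses the vector. The sketch below models CALL SCOPY(n, x, 1, y, -1) with a Fortran array section instead of calling the library; the program name and data are illustrative.

```fortran
program scopy_reverse
  implicit none
  integer, parameter :: n = 4
  real :: x(n), y(n)

  x = (/ 1.0, 2.0, 3.0, 4.0 /)

  ! With incx = 1 and incy = -1, y is traversed from y(n) down to y(1),
  ! so x(1) lands in y(n) and x(n) lands in y(1).
  y(n:1:-1) = x(1:n)

  print *, y   ! 4.0 3.0 2.0 1.0
end program scopy_reverse
```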

SDOT ( 3S ) SDOT ( 3S )

NAME
SDOT, CDOTC, CDOTU – Computes a dot product (inner product) of two real or complex vectors

SYNOPSIS
dot = SDOT (n, x, incx, y, incy)
dot = CDOTC (n, x, incx, y, incy)
dot = CDOTU (n, x, incx, y, incy)

IMPLEMENTATION
UNICOS and UNICOS/mk systems
On UNICOS/mk systems, this subroutine executes on a single processor and uses private data.

DESCRIPTION
SDOT computes a dot product of two real vectors (l2 real inner product).
CDOTC computes a dot product of the conjugate of a complex vector and another complex vector (l2 complex
inner product).
CDOTU computes a dot product of two complex vectors.
SDOT and CDOTU perform the following vector operation:
$dot \leftarrow x^T y = \sum_{i=1}^{n} x_i y_i$
where x and y are real or complex vectors, and $x^T$ is the transpose of x.
CDOTC performs the following vector operation:
$dot \leftarrow x^H y = \sum_{i=1}^{n} \bar{x}_i y_i$
where x and y are complex vectors, and $x^H$ is the conjugate transpose of x.
These functions have the following arguments:
dot SDOT: Real. (output)
CDOTC, CDOTU: Complex. (output)
Result (dot product). If n ≤ 0, dot is set to 0.
n Integer. (input)
Number of elements in each vector.
x SDOT: Real array of dimension (n-1)*|incx| + 1. (input)
CDOTC, CDOTU: Complex array of dimension (n-1)*|incx| + 1. (input)
Array x contains the first vector operand.
incx Integer. (input)
Increment between elements of x. If incx = 0, the results will be unpredictable.
y SDOT: Real array of dimension (n-1)*|incy| + 1. (input)
CDOTC, CDOTU: Complex array of dimension (n-1)*|incy| + 1. (input)
Array y contains the second vector operand.
incy Integer. (input)
Increment between elements of y. If incy = 0, the results will be unpredictable.

NOTES
SDOT, CDOTC, and CDOTU are Level 1 Basic Linear Algebra Subprograms (Level 1 BLAS).
When working backward (incx < 0 or incy < 0), each routine starts at the end of the vector and moves
backward, as follows:
x(1-incx*(n-1)), x(1-incx*(n-2)), ..., x(1)
y(1-incy*(n-1)), y(1-incy*(n-2)), ..., y(1)
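
The difference between the CDOTU and CDOTC definitions shows up as soon as the first operand has a nonzero imaginary part. The following sketch evaluates both sums with standard Fortran intrinsics; it does not call the library, and the names and data are illustrative assumptions.

```fortran
program cdot_sketch
  implicit none
  complex :: x(2), y(2), dotu, dotc

  x = (/ (1.0, 2.0), (0.0, 1.0) /)
  y = (/ (3.0, -1.0), (2.0, 0.0) /)

  dotu = sum(x*y)            ! unconjugated sum: the CDOTU definition
  dotc = sum(conjg(x)*y)     ! conjugated first operand: the CDOTC definition

  print *, 'x^T y =', dotu   ! (5.0, 7.0)
  print *, 'x^H y =', dotc   ! (1.0, -9.0)

  ! The Fortran intrinsic DOT_PRODUCT conjugates its first complex argument,
  ! so it matches the CDOTC definition rather than the CDOTU one.
  print *, 'dot_product(x, y) =', dot_product(x, y)
end program cdot_sketch
```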

SHAD ( 3S ) SHAD ( 3S )

NAME
SHAD – Computes the Hadamard product of two vectors

SYNOPSIS
CALL SHAD (n, alpha, x, incx, y, incy, beta, z, incz)

IMPLEMENTATION
UNICOS/mk systems

DESCRIPTION
SHAD computes the Hadamard (element-by-element) product of two vectors x and y, scales it by α, adds β
times a vector z, and stores the results in z:
z(i) := α x(i) y(i) + β z(i), i = 1, ..., n
The cases α = 0, β = 0, and β = 1 are recognized as special cases.
The SHAD routine accepts the following arguments:
n Integer. (input)
The number of elements in each vector.
alpha Real. (input)
The scalar α.
x Real array, dimension (1+(n-1)*|incx|). (input)
The vector x.
If incx > 0, the ith element of the vector x is located in x(1+(i-1)*incx).
If incx < 0, the ith element of the vector x is located in x(1+(n-i)*|incx|).
incx Integer. (input)
The increment between elements of the vector x. incx must not be 0.
y Real array, dimension (1+(n-1)*|incy|). (input)
The vector y.
If incy > 0, the ith element of the vector y is located in y(1+(i-1)*incy).
If incy < 0, the ith element of the vector y is located in y(1+(n-i)*|incy|).
incy Integer. (input)
The increment between elements of the vector y. incy must not be 0.
beta Real. (input)
The scalar β.
z Real array, dimension (1+(n-1)*|incz|). (input/output)
On entry, the vector z.
If incz > 0, the ith element of the vector z is located in z(1+(i-1)*incz).
If incz < 0, the ith element of the vector z is located in z(1+(n-i)*|incz|).
On exit, each element z(i) is overwritten with α x(i) y(i) + β z(i).
incz Integer. (input)
The increment between elements of the vector z. incz must not be 0.
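
The update that SHAD is documented to perform can be modeled with Fortran array arithmetic. The following sketch assumes unit increments and does not call the library; the program name and data are illustrative.

```fortran
program shad_sketch
  implicit none
  real :: x(3), y(3), z(3)
  real :: alpha, beta

  x = (/ 1.0, 2.0, 3.0 /)
  y = (/ 4.0, 5.0, 6.0 /)
  z = (/ 100.0, 200.0, 300.0 /)
  alpha = 2.0
  beta = 0.5

  ! Unit-increment model of z(i) := alpha*x(i)*y(i) + beta*z(i)
  z = alpha*x*y + beta*z

  print *, z   ! 58.0 120.0 186.0
end program shad_sketch
```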

SNRM2 ( 3S ) SNRM2 ( 3S )

NAME
SNRM2, SCNRM2 – Computes the Euclidean norm of a vector

SYNOPSIS
enrm = SNRM2 (n, x, incx)
enrm = SCNRM2 (n, x, incx)

IMPLEMENTATION
UNICOS and UNICOS/mk systems
On UNICOS/mk systems, this subroutine executes on a single processor and uses private data.

DESCRIPTION
SNRM2 computes the Euclidean (l2) norm of a real vector, as follows:
$enrm \leftarrow \|x\|_2 = \sqrt{x^T x} = \sqrt{\sum_{i=1}^{n} x_i^2}$
where x is a real vector, and $x^T$ denotes the transpose of x.
SCNRM2 computes the Euclidean (l2) norm of a complex vector, as follows:
$enrm \leftarrow \|x\|_2 = \sqrt{x^H x} = \sqrt{\sum_{i=1}^{n} \bar{x}_i x_i}$
where x is a complex vector, and $x^H$ denotes the conjugate transpose of x.
These functions have the following arguments:
enrm Real. (output)
Result (Euclidean norm). If n ≤ 0, enrm is set to 0.
n Integer. (input)
Number of elements in the operand vector.
x SNRM2: Real array of dimension (n-1)*|incx| + 1. (input)
SCNRM2: Complex array of dimension (n-1)*|incx| + 1. (input)
Array x contains the operand vector.
incx Integer. (input)
Increment between elements of x. If incx = 0, the results will be unpredictable.

NOTES
SNRM2 and SCNRM2 are Level 1 Basic Linear Algebra Subprograms (Level 1 BLAS).

The versions of these routines on UNICOS systems do not behave the same way as the public domain
Fortran versions. For performance reasons, they do not scale the input values; input for the
routines must be within a range for which the squares of the elements, and their sum, can be
represented without overflow or underflow.
When working backward (incx < 0), each routine starts at the end of the vector and moves backward, as
follows:
x(1-incx*(n-1)), x(1-incx*(n-2)), ..., x(1)
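
The practical effect of omitting input scaling can be illustrated without the library. The sketch below compares an unscaled sum of squares with a scaled variant for elements whose squares exceed the largest default real. The names are hypothetical, and the scaled formula is only one common approach, not necessarily the one used by the public domain version. On a system with IEEE arithmetic and non-trapping overflow, the unscaled result would be expected to overflow while the scaled one stays finite.

```fortran
program nrm2_scaling
  implicit none
  real :: x(2), big, xmax, unscaled, scaled

  ! A value whose square exceeds the largest representable default real
  big = 2.0*sqrt(huge(1.0))
  x = big

  ! Unscaled form: squares each element directly
  unscaled = sqrt(sum(x**2))

  ! Scaled form: divide by the largest magnitude before squaring
  xmax = maxval(abs(x))
  scaled = xmax*sqrt(sum((x/xmax)**2))

  print *, 'unscaled:', unscaled
  print *, 'scaled  :', scaled
end program nrm2_scaling
```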

SPAXPY ( 3S ) SPAXPY ( 3S )

NAME
SPAXPY – Adds a scalar multiple of a real vector to a sparse real vector

SYNOPSIS
CALL SPAXPY (n, alpha, x, y, index)

IMPLEMENTATION
UNICOS systems

DESCRIPTION
SPAXPY adds a scalar multiple of a real vector to a sparse real vector. It performs the following vector
operation where α is a real scalar, x is a real vector, and y is a sparse real vector:
y←αx+y
This routine has the following arguments:
n Integer. (input)
Number of vector elements to be used in the computation. If n ≤ 0, SPAXPY returns without
any computation.
alpha Real. (input)
Scalar multiplier α. If α = 0.0, SPAXPY returns without any computation.
x Real array of dimension n. (input)
Contains the dense vector operand to be scaled before adding.
y Real array of dimension MAX{index(1),. . .,index(n)}. (input and output)
On input, y contains the sparse vector used in the addition. On output, y receives the resulting
vector.
index Integer array of dimension n. (input)
Contains the vector of indices for elements of y. All elements in index should be unique.
SPAXPY executes an operation equivalent to the following Fortran code:
      DO 10 I=1,N
         Y(INDEX(I))=ALPHA*X(I)+Y(INDEX(I))
   10 CONTINUE

NOTES
SPAXPY is an extension to the standard Level 1 Basic Linear Algebra Subprograms (Level 1 BLAS).
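
In Fortran 90 the same scatter-add can be written with a vector subscript, which also shows why the entries of index should be unique: a vector-subscripted array that appears on the left side of an assignment must not contain repeated subscripts. The following sketch does not call SPAXPY; the array name idx and the data are illustrative assumptions.

```fortran
program spaxpy_sketch
  implicit none
  real :: x(3), y(6)
  integer :: idx(3)

  x = (/ 1.0, 2.0, 3.0 /)
  y = 10.0
  idx = (/ 2, 6, 4 /)              ! unique positions in y

  ! Vector-subscript form of the documented loop
  !   Y(INDEX(I)) = ALPHA*X(I) + Y(INDEX(I))   with ALPHA = 0.5
  y(idx) = 0.5*x + y(idx)

  print *, y   ! 10.0 10.5 10.0 11.5 10.0 11.0
end program spaxpy_sketch
```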

SPDOT ( 3S ) SPDOT ( 3S )

NAME
SPDOT – Computes the dot product of a real vector and a real sparse vector

SYNOPSIS
dot = SPDOT (n, y, index, x)

IMPLEMENTATION
UNICOS systems

DESCRIPTION
SPDOT computes the dot product of a real vector and a sparse real vector (l2 real inner product). It
performs the following vector operation, where x is a dense real vector and y is a sparse real vector whose
participating elements are selected by index:
$dot \leftarrow \sum_{i=1}^{n} y_{index(i)} x_i$

This function has the following arguments:


dot Real. (output)
Result (dot product). If n ≤ 0, dot is set to 0 on output.
n Integer. (input)
Number of vector elements to be used in the computation.
y Real array of dimension MAX{index(1),. . .,index(n)}. (input)
Contains the sparse real vector operand.
index Integer array of dimension n. (input)
Contains the vector of indices for the elements of y. All values in index should be unique.
x Real array of dimension n. (input)
Contains the real dense vector operand.
SPDOT executes an operation equivalent to the following Fortran code:
      DOT=0.0
      DO 10 I=1,N
         DOT=DOT+Y(INDEX(I))*X(I)
   10 CONTINUE

NOTES
SPDOT is an extension to the standard Level 1 Basic Linear Algebra Subprograms (Level 1 BLAS).

SROT ( 3S ) SROT ( 3S )

NAME
SROT, CROT – Applies a real plane rotation or complex coordinate rotation

SYNOPSIS
CALL SROT (n, x, incx, y, incy, c, s)
CALL CROT (n, x, incx, y, incy, c, s)

IMPLEMENTATION
UNICOS and UNICOS/mk systems
On UNICOS/mk systems, these subroutines execute on a single processor and use private data.

DESCRIPTION
SROT applies a plane rotation matrix to a real sequence of ordered pairs:
(x_i, y_i), for all i = 1, 2, ..., n.
CROT applies a rotation matrix to a complex sequence of ordered pairs:
(x_i, y_i), for all i = 1, 2, ..., n.
These routines have the following arguments:
n Integer. (input)
Number of ordered pairs (planar points in SROT) to be rotated. If n ≤ 0, SROT or CROT returns
without computation.
x SROT: Real array of dimension (n-1)*|incx| + 1. (input and output)
On input, array x contains the x-coordinate of each planar point to be rotated. On output, array x
contains the x-coordinate of each rotated planar point.
CROT: Complex array of dimension (n-1)*|incx| + 1. (input and output)
On input, array x contains the first element of each ordered pair to be rotated. On output, array
x contains the first element of each rotated ordered pair.
incx Integer. (input)
Increment between elements of x. If incx = 0, the results will be unpredictable.
y SROT: Real array of dimension (n-1)*|incy| + 1. (input and output)
On input, array y contains the y-coordinate of each planar point to be rotated. On output, array y
contains the y-coordinate of each rotated planar point.
CROT: Complex array of dimension (n-1)*|incy| + 1. (input and output)
On input, array y contains the second element of each ordered pair to be rotated. On output,
array y contains the second element of each rotated ordered pair.
incy Integer. (input)
Increment between elements of y. If incy = 0, the results will be unpredictable.
c Real. (input)
Cosine of the angle of rotation, usually calculated using SROTG(3S) or CROTG(3S).
s SROT: Real. (input)
Sine of the angle of rotation, usually calculated using SROTG.
CROT: Complex. (input)
Complex sine of the angle of rotation, usually calculated using CROTG.

NOTES
SROT and CROT are Level 1 Basic Linear Algebra Subprograms (Level 1 BLAS). SROT applies the
following plane rotation to each pair of elements (x_i, y_i):
$\begin{pmatrix} x_i \\ y_i \end{pmatrix} \leftarrow \begin{pmatrix} c & s \\ -s & c \end{pmatrix} \begin{pmatrix} x_i \\ y_i \end{pmatrix}$ for i = 1, 2, ..., n
If coefficients c and s satisfy $c^2 + s^2 = 1.0$, the rotation matrix is orthogonal, and the transformation is called
a Givens plane rotation. If c = 1 and s = 0, SROT returns without modifying any input parameters.
CROT applies the following rotation to each pair of complex elements (x_i, y_i):
$\begin{pmatrix} x_i \\ y_i \end{pmatrix} \leftarrow \begin{pmatrix} c & s \\ -\bar{s} & c \end{pmatrix} \begin{pmatrix} x_i \\ y_i \end{pmatrix}$ for i = 1, 2, ..., n
where $\bar{s}$ is the complex conjugate of s.
For CROT, if the coefficient c is real, and the coefficients c and s satisfy $c^2 + s\bar{s} = 1.0$, the rotation matrix is
unitary, and the transformation is called a Givens complex rotation.
To calculate the Givens coefficients c and s from a two-element vector to determine the angle of rotation,
use SROTG(3S) or CROTG(3S).
When working backward (incx < 0 or incy < 0), each routine starts at the end of the vector and moves
backward, as follows:
x(1-incx*(n-1)), x(1-incx*(n-2)), ..., x(1)
y(1-incy*(n-1)), y(1-incy*(n-2)), ..., y(1)
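
The real plane rotation can be modeled in a few lines. The sketch below assumes unit increments, applies the SROT update to two points with array arithmetic, and does not call the library. The names and the 45-degree angle are illustrative. Both new values must be computed from the old x, which is why a temporary is used.

```fortran
program srot_sketch
  implicit none
  real :: x(2), y(2), xnew(2), c, s, theta

  x = (/ 1.0, 0.0 /)
  y = (/ 0.0, 1.0 /)
  theta = atan(1.0)          ! 45 degrees
  c = cos(theta)
  s = sin(theta)

  ! x <- c*x + s*y and y <- -s*x + c*y, both using the old x
  xnew = c*x + s*y
  y = -s*x + c*y
  x = xnew

  print *, 'x =', x
  print *, 'y =', y
  print *, 'c**2 + s**2 =', c*c + s*s
end program srot_sketch
```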

SEE ALSO
CROTG(3S), SROTG(3S), SROTM(3S)

SROTG ( 3S ) SROTG ( 3S )

NAME
SROTG, CROTG – Constructs a Givens plane rotation

SYNOPSIS
CALL SROTG (a, b, c, s)
CALL CROTG (a, b, c, s)

IMPLEMENTATION
UNICOS and UNICOS/mk systems
On UNICOS/mk systems, these subroutines execute on a single processor and use private data.

DESCRIPTION
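SROTG computes the elements of a rotation matrix such that:
$\begin{pmatrix} c & s \\ -s & c \end{pmatrix} \begin{pmatrix} a \\ b \end{pmatrix} = \begin{pmatrix} r \\ 0 \end{pmatrix}$
where $r = \pm\sqrt{a^2 + b^2}$ and $c^2 + s^2 = 1$
CROTG computes the elements of a rotation matrix such that:
$\begin{pmatrix} c & s \\ -\bar{s} & c \end{pmatrix} \begin{pmatrix} a \\ b \end{pmatrix} = \begin{pmatrix} r \\ 0 \end{pmatrix}$
where $r = \frac{a}{\sqrt{a\bar{a}}} \sqrt{a\bar{a} + b\bar{b}}$
and the notation $\bar{z}$ represents the complex conjugate of z.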
These routines have the following arguments:
a SROTG: Real.
CROTG: Complex.
(input and output)
SROTG: On input, the first component of the vector to be rotated. On output, a is overwritten by r,
the first component of the vector in the rotated coordinate system, where:
$r = \mathrm{sign}(\sqrt{a^2 + b^2}, a)$, if |a| > |b|
$r = \mathrm{sign}(\sqrt{a^2 + b^2}, b)$, if |a| ≤ |b|
CROTG: On output, a is overwritten by the unique complex number r, whose size in the complex
plane is the Euclidean norm of the complex vector (a,b), and whose direction in the complex
plane is the same as that of the original complex element a.
b SROTG: Real.
CROTG: Complex.
(input and output)
On input, the second component of the vector to be rotated. On output, b contains z, where:
z = s if |a| > |b|
z = 1/c if |a| ≤ |b| and c ≠ 0
z = 1 if c = 0
c Real. (output)
Cosine, c, of the angle of rotation.
s SROTG: Real.
CROTG: Complex.
(output)
Sine, s, of the angle of rotation.

NOTES
SROTG and CROTG are Level 1 Basic Linear Algebra Subprograms (Level 1 BLAS).
The value of z, returned in b by SROTG, gives a compact representation of the rotation matrix, which can be
used later to reconstruct c and s as in the following example:
      IF (B .EQ. 1.) THEN
         C = 0.
         S = 1.
      ELSEIF (ABS(B) .LT. 1) THEN
         C = SQRT(1. - B * B)
         S = B
      ELSE
         C = 1. / B
         S = SQRT(1 - C * C)
      ENDIF
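
The round trip from (a, b) to z and back can be traced in a short, self-contained sketch. It derives c = a/r and s = b/r from the defining equation in the DESCRIPTION and applies the documented rules for r and z. It is not the library implementation, it assumes a and b are not both zero, and all names are illustrative.

```fortran
program srotg_sketch
  implicit none
  real :: a, b, r, z, c, s, c2, s2

  a = 3.0
  b = 4.0

  ! r takes the sign of whichever of a, b has the larger magnitude
  if (abs(a) > abs(b)) then
    r = sign(sqrt(a*a + b*b), a)
  else
    r = sign(sqrt(a*a + b*b), b)
  end if
  c = a/r
  s = b/r

  ! Compact representation z, as documented for output argument b
  if (abs(a) > abs(b)) then
    z = s
  else if (c /= 0.0) then
    z = 1.0/c
  else
    z = 1.0
  end if

  ! Rebuild c and s from z alone, following the example above
  if (z == 1.0) then
    c2 = 0.0; s2 = 1.0
  else if (abs(z) < 1.0) then
    c2 = sqrt(1.0 - z*z); s2 = z
  else
    c2 = 1.0/z; s2 = sqrt(1.0 - c2*c2)
  end if

  print *, 'r =', r
  print *, 'c, s (direct)        :', c, s
  print *, 'c, s (rebuilt from z):', c2, s2
  print *, 'check -s*a + c*b     :', -s*a + c*b
end program srotg_sketch
```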

SEE ALSO
SROT(3S)

SROTM ( 3S ) SROTM ( 3S )

NAME
SROTM – Applies a modified Givens plane rotation

SYNOPSIS
CALL SROTM (n, x, incx, y, incy, rparam)

IMPLEMENTATION
UNICOS and UNICOS/mk systems
On UNICOS/mk systems, this subroutine executes on a single processor and uses only private data.

DESCRIPTION
SROTM applies the modified Givens plane rotation constructed by SROTMG(3S).
This routine has the following arguments:
n Integer. (input)
Number of planar points to be rotated. If n ≤ 0, SROTM returns without any computation.
x Real array of dimension (n-1)*|incx| + 1. (input and output)
On input, array x contains the x-coordinate of each planar point to be rotated. On output, array x
contains the x-coordinate of each rotated planar point.
incx Integer. (input)
Increment between elements of x. If incx = 0, the results will be unpredictable.
y Real array of dimension (n-1)*|incy| + 1. (input and output)
On input, array y contains the y-coordinate of each planar point to be rotated. On output, array y
contains the y-coordinate of each rotated planar point.
incy Integer. (input)
Increment between elements of y. If incy = 0, the results will be unpredictable.
rparam Real array of dimension 5. (input)
Contains rotation matrix information.
SROTM computes a planar rotation, with possible scaling or reflection, as follows:
$\begin{pmatrix} x_i \\ y_i \end{pmatrix} \leftarrow \begin{pmatrix} h_{1,1} & h_{1,2} \\ h_{2,1} & h_{2,2} \end{pmatrix} \begin{pmatrix} x_i \\ y_i \end{pmatrix}$ for i = 1, 2, ..., n
where the matrix that contains the elements h_{1,1}, h_{2,1}, h_{1,2}, and h_{2,2} is called a rotation matrix.
The rparam array determines the contents of the rotation matrix, as follows:
The key parameter, rparam(1), may have one of four values:
1.0, 0.0, -1.0, or -2.0
If rparam(1) = 1.0:
$\begin{pmatrix} h_{1,1} & h_{1,2} \\ h_{2,1} & h_{2,2} \end{pmatrix} = \begin{pmatrix} \mathrm{rparam}(2) & 1.0 \\ -1.0 & \mathrm{rparam}(5) \end{pmatrix}$
and rparam(3) and rparam(4) are ignored.
If rparam(1) = 0.0:
$\begin{pmatrix} h_{1,1} & h_{1,2} \\ h_{2,1} & h_{2,2} \end{pmatrix} = \begin{pmatrix} 1.0 & \mathrm{rparam}(4) \\ \mathrm{rparam}(3) & 1.0 \end{pmatrix}$
and rparam(2) and rparam(5) are ignored.
If rparam(1) = -1.0 (rescaling case):
$\begin{pmatrix} h_{1,1} & h_{1,2} \\ h_{2,1} & h_{2,2} \end{pmatrix} = \begin{pmatrix} \mathrm{rparam}(2) & \mathrm{rparam}(4) \\ \mathrm{rparam}(3) & \mathrm{rparam}(5) \end{pmatrix}$
This is a full matrix multiplication.
If rparam(1) = -2.0:
$\begin{pmatrix} h_{1,1} & h_{1,2} \\ h_{2,1} & h_{2,2} \end{pmatrix} = \begin{pmatrix} 1.0 & 0.0 \\ 0.0 & 1.0 \end{pmatrix} = I$
where I is the identity matrix. In this case, rparam(2), rparam(3), rparam(4), and rparam(5) are ignored.
If n ≤ 0, or if the rotation matrix is the identity matrix (for example, when rparam(1) = -2.0), SROTM returns
with no operation on input arrays x and y.
If rparam(1) holds a value that is not valid (any value other than 1.0, 0.0, -1.0, or -2.0), SROTM
aborts the job and the following message appears on stderr (diagnostic output):
SROTM CALLED WITH INCORRECT PARAMETER KEY
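
The mapping from rparam to the matrix H can be summarized in code. The following sketch expands rparam into a full 2x2 matrix according to the four key values and applies it to two points with unit increments. It is an illustrative model, not the library routine. The helper build_h, its error handling, and the data are assumptions.

```fortran
program srotm_sketch
  implicit none
  real :: rparam(5), h(2,2), x(2), y(2), xnew(2)

  rparam = (/ 0.0, 0.0, -0.5, 0.25, 0.0 /)   ! key 0.0: unit diagonal
  x = (/ 1.0, 2.0 /)
  y = (/ 4.0, 8.0 /)

  call build_h(rparam, h)
  xnew = h(1,1)*x + h(1,2)*y
  y    = h(2,1)*x + h(2,2)*y
  x    = xnew
  print *, 'x =', x   ! 2.0 4.0
  print *, 'y =', y   ! 3.5 7.0

contains

  ! Expand rparam into the full 2x2 matrix H for each key value.
  subroutine build_h(rparam, h)
    real, intent(in) :: rparam(5)
    real, intent(out) :: h(2,2)

    if (rparam(1) == 1.0) then
      h(1,1) = rparam(2); h(1,2) = 1.0
      h(2,1) = -1.0;      h(2,2) = rparam(5)
    else if (rparam(1) == 0.0) then
      h(1,1) = 1.0;       h(1,2) = rparam(4)
      h(2,1) = rparam(3); h(2,2) = 1.0
    else if (rparam(1) == -1.0) then
      h(1,1) = rparam(2); h(1,2) = rparam(4)
      h(2,1) = rparam(3); h(2,2) = rparam(5)
    else if (rparam(1) == -2.0) then
      h = 0.0
      h(1,1) = 1.0; h(2,2) = 1.0
    else
      stop 'invalid key in rparam(1)'
    end if
  end subroutine build_h
end program srotm_sketch
```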

SEE ALSO
SROTMG(3S) for further details about the modified Givens transformation and array rparam

SROTMG ( 3S ) SROTMG ( 3S )

NAME
SROTMG – Constructs a modified Givens plane rotation

SYNOPSIS
CALL SROTMG (d1, d2, b1, b2, rparam)

IMPLEMENTATION
UNICOS and UNICOS/mk systems
On UNICOS/mk systems, this subroutine executes on a single processor and uses only private data.

DESCRIPTION
SROTMG computes the elements of a modified Givens plane rotation matrix.
This routine has the following arguments:
d1 Real. (input and output)
On input, this value is the first diagonal element of the scaling matrix D. On the first call to
SROTMG, this value is typically 1.0. Subsequent calls typically use the value from the previous
call. On output, this value is the first diagonal element of the updated scaling matrix D’.
d2 Real. (input and output)
On input, this value is the second diagonal element of the scaling matrix D. On the first call to SROTMG,
this value is typically 1.0. Subsequent calls typically use the value from the previous call. On
output, this value is the second diagonal element of the updated scaling matrix D’.
b1 Real. (input and output)
On input, this value is the x-coordinate of the vector used to define the angle of rotation, before
scaling (multiplying by the matrix D). On output, this value is the x-coordinate of the rotated
vector, before scaling (multiplying by the matrix D’).
b2 Real. (input)
On input, this value is the y-coordinate of the vector used to define the angle of rotation, before
scaling (multiplying by the matrix D). It is unchanged on output.
rparam Real array of dimension 5. (output)
This array contains rotation matrix information. SROTMG sets up the computed elements in
rparam from inputs d1, d2, b1, and b2.
Standard Givens Rotation
A standard Givens rotation (see SROTG(3S)) is based on an orthogonal matrix G that rotates points on a
Cartesian xy-coordinate plane. To calculate the rotation matrix, you must provide the angle of rotation
desired, or, equivalently, a vector (point) that lies along the angle of rotation. For a given planar point
(x_r, y_r), G is formed so that:
$\begin{pmatrix} x' \\ 0 \end{pmatrix} = \begin{pmatrix} c & s \\ -s & c \end{pmatrix} \begin{pmatrix} x_r \\ y_r \end{pmatrix} = G \begin{pmatrix} x_r \\ y_r \end{pmatrix}$
where $x' = \sqrt{x_r^2 + y_r^2}$.
With this rotation matrix G, you can then convert any number of existing planar points to the new (rotated)
xy-coordinate system. For n points, the rotations would be as follows:
$\begin{pmatrix} x_i \\ y_i \end{pmatrix} \leftarrow \begin{pmatrix} c & s \\ -s & c \end{pmatrix} \begin{pmatrix} x_i \\ y_i \end{pmatrix}$ for i = 1, 2, ..., n
Modified Givens Rotation
The algorithm for SROTMG is based on the following observation. The rotation matrix G can be factored
into a scaling matrix (diagonal matrix) and a modified rotation matrix H, for which either the diagonal or the
off-diagonal elements are units (that is, ±1). Thus, performing m modified (scaled) rotations on n planar
points requires only 2nm multiplications, rather than the 4nm required for the standard rotation.
Because you may want to perform several successive rotations, this routine assumes that you have leftover
scaling factors from your previous modified Givens rotation; that is, the routine requires you to input not
only a planar rotation vector (b1, b2) but also the squares of the diagonal elements of the scaling matrix, d1
and d2. The actual rotation vector is specified as follows:
$\begin{pmatrix} x_r \\ y_r \end{pmatrix} = \begin{pmatrix} \sqrt{d_1} & 0 \\ 0 & \sqrt{d_2} \end{pmatrix} \begin{pmatrix} b_1 \\ b_2 \end{pmatrix} = D^{1/2} \begin{pmatrix} b_1 \\ b_2 \end{pmatrix}$
where d1 and d2 are the input scaling factors.


Given these inputs, SROTMG generates a new modified rotation matrix H with units for either the diagonal or
off-diagonal elements, and new elements d1′ and d2′ for the new scaling factors, and rotated but unscaled
vector (b1′, 0), with the following results:
$G \begin{pmatrix} x_r \\ y_r \end{pmatrix} = G D^{1/2} \begin{pmatrix} b_1 \\ b_2 \end{pmatrix} = {D'}^{1/2} H \begin{pmatrix} b_1 \\ b_2 \end{pmatrix}$
where:
${D'}^{1/2} = \begin{pmatrix} \sqrt{d_1'} & 0 \\ 0 & \sqrt{d_2'} \end{pmatrix}$
uses the updated scaling factors d1′ and d2′, which are d1 and d2 on output.
$H = \begin{pmatrix} h_{1,1} & h_{1,2} \\ h_{2,1} & h_{2,2} \end{pmatrix}$ is stored in the output array argument rparam.
b1′ is stored as b1 on output.
${D'}^{1/2} H$ equals $G D^{1/2}$, not G, as implied earlier. You must account for the old scaling factors when
calculating the new scaling factors.
After calculating the matrix H by using SROTMG, you can then use it in SROTM(3S) to convert points to the
new coordinate system.

NOTES

Meaning of the Output Values


The output values are returned through arguments d1, d2, b1, and rparam.
The scaling factors d1 and d2 are updated with each call to SROTMG. Although SROTM(3S) does not need
the updated factors, they are needed in two other important contexts:
• As input for subsequent calls to SROTMG.
• As scaling factors for rotated but unscaled points (x_i, y_i), which are output from SROTM(3S).
In this second usage, the actual (scaled) points would be given by (√d1 x_i, √d2 y_i). Doing this operation
frequently on all your points is counterproductive. The main advantage of the modified rotation algorithm
is to reduce the number of operations. If you fold in the scaling factors after each rotation, you are
performing the same number of operations as in the standard Givens rotation.
These two uses for the scaling factors are mutually exclusive; that is, if you fold the scaling factors back into
all your points, you no longer need those factors for SROTMG. After folding in the scaling factors, and
before the next call to SROTMG, reset d1 and d2 to 1.0.
On output, b1 represents the new x-coordinate after rotating (but before scaling) the rotation vector.
Although the y-coordinate of this vector is 0.0 (see the previous discussion), the corresponding value b2
is unchanged on output.
The output array argument rparam specifies the format of matrix H, and it holds the nonunit values of H.
This is the only output of SROTMG that SROTM(3S) requires. Each element of rparam has a specific
meaning, as follows:
rparam(1) a flag parameter that specifies how the matrix is stored
= 0.0 Diagonal elements of H are units (case 2).
= 1.0 Off-diagonal elements of H are units (case 3).
= –1.0 Rescaling case (see the following subsection).
= –2.0 H is the identity matrix; no rotation needed.
rparam(2) = $h_{1,1}$ if needed
rparam(3) = $h_{2,1}$ if needed
rparam(4) = $h_{1,2}$ if needed
rparam(5) = $h_{2,2}$ if needed


Calculating the Output Values
The following presents the algorithm for calculating the output values d1, d2, b1, and rparam, based on the
input values d1, d2, b1, and b2. This algorithm is presented without explanation or proof. For a more
complete discussion of the modified Givens algorithm, see the papers listed in the SEE ALSO section.
Case 1: b2 = 0 (trivial case)
In this case, no rotation is needed. The flag value, rparam(1), is set to –2.0. When passed to routine
SROTM(3S), this flag indicates that it should not do any rotations.
On output from SROTMG, d1, d2, and b1 are unchanged.
Case 2: $|\sqrt{d_1}\,b_1| > |\sqrt{d_2}\,b_2|$ (that is, $|x_r| > |y_r|$)
In this case, the diagonal elements of H ($h_{1,1}$ and $h_{2,2}$) are set to 1.0. Thus, the rparam values set on
output are as follows:
\[
\begin{aligned}
\mathrm{rparam}(1) &\leftarrow 0.0 \\
\mathrm{rparam}(3) &\leftarrow h_{2,1} = -\frac{b_2}{b_1} \\
\mathrm{rparam}(4) &\leftarrow h_{1,2} = \frac{d_2 b_2}{d_1 b_1}
\end{aligned}
\]
The output values of the scaling factors are as follows:
\[
d_1 \leftarrow d_1' = \frac{d_1}{u}, \qquad d_2 \leftarrow d_2' = \frac{d_2}{u}
\]
where
\[
u = \det(H) = 1 + \frac{d_2 b_2^2}{d_1 b_1^2}.
\]
The output value of b1 is as follows:
\[
b_1 \leftarrow b_1' = b_1 u
\]

If rescaling is needed, SROTMG will further modify these output values before the end of the routine. See
case 4 later in this subsection.
Case 3: $|\sqrt{d_1}\,b_1| \le |\sqrt{d_2}\,b_2|$ (that is, $|x_r| \le |y_r|$)
In this case, the off-diagonal elements of H are units (to be specific, $h_{2,1} = -1$ and $h_{1,2} = 1$). Thus, the
rparam values set on output are as follows:
\[
\begin{aligned}
\mathrm{rparam}(1) &\leftarrow 1.0 \\
\mathrm{rparam}(2) &\leftarrow h_{1,1} = \frac{d_1 b_1}{d_2 b_2} \\
\mathrm{rparam}(5) &\leftarrow h_{2,2} = \frac{b_1}{b_2}
\end{aligned}
\]
The output values of the scaling factors are as follows:
\[
d_1 \leftarrow d_1' = \frac{d_2}{u}, \qquad d_2 \leftarrow d_2' = \frac{d_1}{u}
\]
where
\[
u = \det(H) = 1 + \frac{d_1 b_1^2}{d_2 b_2^2}.
\]
The output value of b1 is as follows:
\[
b_1 \leftarrow b_1' = b_2 u
\]
If rescaling is needed, SROTMG will further modify these output values before the end of the routine. See
case 4.
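Cases 1 through 3 can be written out as a short reference sketch. The Python below is executable pseudocode for the formulas in this subsection, not the library's implementation; the function names srotmg_no_rescale and full_h are hypothetical, the sketch assumes d1 > 0 and d2 > 0 so that the case test can compare d1·b1² with d2·b2², and it leaves unused rparam entries at 0.0 and omits the case 4 rescaling.

```python
def srotmg_no_rescale(d1, d2, b1, b2):
    """Cases 1-3 of the text, without the case 4 rescaling.

    Assumes d1 > 0 and d2 > 0 (an assumption of this sketch).
    Returns (d1_out, d2_out, b1_out, rparam); rparam[0..4] stand for
    rparam(1)..rparam(5), and unused entries are left at 0.0.
    """
    rparam = [0.0] * 5
    if b2 == 0.0:                        # case 1: nothing to rotate
        rparam[0] = -2.0
        return d1, d2, b1, rparam
    if d1 * b1 * b1 > d2 * b2 * b2:      # case 2: |sqrt(d1) b1| > |sqrt(d2) b2|
        h21 = -b2 / b1
        h12 = (d2 * b2) / (d1 * b1)
        u = 1.0 - h12 * h21              # det(H) with unit diagonal
        rparam[0], rparam[2], rparam[3] = 0.0, h21, h12
        return d1 / u, d2 / u, b1 * u, rparam
    h11 = (d1 * b1) / (d2 * b2)          # case 3
    h22 = b1 / b2
    u = 1.0 + h11 * h22                  # det(H) with h12 = 1, h21 = -1
    rparam[0], rparam[1], rparam[4] = 1.0, h11, h22
    return d2 / u, d1 / u, b2 * u, rparam


def full_h(rparam):
    """Expand rparam into the 2-by-2 matrix [[h11, h12], [h21, h22]]."""
    flag = rparam[0]
    if flag == -2.0:
        return [[1.0, 0.0], [0.0, 1.0]]
    if flag == 0.0:
        return [[1.0, rparam[3]], [rparam[2], 1.0]]
    if flag == 1.0:
        return [[rparam[1], 1.0], [-1.0, rparam[4]]]
    return [[rparam[1], rparam[3]], [rparam[2], rparam[4]]]
```

Applying full_h(rparam) to the input pair (b1, b2) gives a second component of zero and a first component equal to the output b1, which is a convenient check on either case.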
Case 4: Rescaling
If the scaling factors become either very large or very small, the scaling and rotation operations may lose a
lot of accuracy; therefore, each scaling factor from case 2 or case 3 is kept within the range:
\[
\gamma^{-2} \le |d_i'| \le \gamma^{2}, \quad i = 1, 2
\]
where $\gamma = 2^{12} = 4096.0$.
At the end of case 2 or case 3 assignments, if either of the scaling factors falls outside this range, SROTMG
must rescale that factor (and the corresponding elements of H) to bring its size back within the specified
range. When the scaling factors stay within this range, scaling and rotation operations should keep the
following accuracy:
\[
48 - \log_2 \gamma = 48 - 12 = 36 \text{ bits} \quad (\approx 10 \text{ decimal digits})
\]
Rescaling is performed as follows:
If either d1 or d2 is 0, no rescaling is done; otherwise, let
\[
q_i = \operatorname{int}\!\left(\log_\gamma \sqrt{|d_i'|}\right)
    = \operatorname{int}\!\left(\frac{\tfrac{1}{2}\log_2 |d_i'|}{12}\right)
    = \operatorname{int}\!\left(\frac{\log_2 |d_i'|}{24}\right), \quad i = 1, 2.
\]
Then the following is true:
\[
\begin{aligned}
q_i &< 0 \quad \text{if } |d_i'| < \gamma^{-2} \\
q_i &= 0 \quad \text{if } \gamma^{-2} \le |d_i'| \le \gamma^{2} \\
q_i &> 0 \quad \text{if } |d_i'| > \gamma^{2}
\end{aligned}
\]
Furthermore, $|q_i|$ represents the number of times $d_i'$ must be multiplied (or divided) by $\gamma^{2}$ to return it to
the proper range of values.
In this case, the rparam values set on output are as follows:
\[
\begin{aligned}
\mathrm{rparam}(1) &\leftarrow -1.0 \\
\mathrm{rparam}(2) &\leftarrow h_{1,1}' = h_{1,1}\,\gamma^{q_1} \\
\mathrm{rparam}(3) &\leftarrow h_{2,1}' = h_{2,1}\,\gamma^{q_2} \\
\mathrm{rparam}(4) &\leftarrow h_{1,2}' = h_{1,2}\,\gamma^{q_1} \\
\mathrm{rparam}(5) &\leftarrow h_{2,2}' = h_{2,2}\,\gamma^{q_2}
\end{aligned}
\]
The output values of the scaling factors are as follows:
\[
d_1 \leftarrow d_1'' = d_1'\,\gamma^{-2 q_1}, \qquad d_2 \leftarrow d_2'' = d_2'\,\gamma^{-2 q_2}
\]
The output value of b1 is as follows:
\[
b_1 \leftarrow b_1'' = b_1'\,\gamma^{q_1}
\]
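
The rescaling step can be illustrated with a small sketch. The Python below is executable pseudocode only; the names GAMMA, rescale_exponent, and rescale are hypothetical. It assumes that each scaling factor is rescaled by its own exponent, which is the reading under which |q| counts the multiplications by γ² and the product √d1 · b1 is left unchanged.

```python
import math

GAMMA = 4096.0  # 2**12, the value of gamma given in the text


def rescale_exponent(d):
    """q = int(log2(|d|) / 24), truncated toward zero; d must be nonzero."""
    return int(math.log2(abs(d)) / 24.0)


def rescale(d1, d2, b1, h):
    """Apply the case 4 scaling to (d1', d2', b1') and H.

    h is [[h11, h12], [h21, h22]]. Row 1 of H follows q1, row 2 follows q2.
    Returns (d1'', d2'', b1'', H').
    """
    if d1 == 0.0 or d2 == 0.0:
        return d1, d2, b1, h             # no rescaling is done
    q1 = rescale_exponent(d1)
    q2 = rescale_exponent(d2)
    g1 = GAMMA ** q1
    g2 = GAMMA ** q2
    h_new = [[h[0][0] * g1, h[0][1] * g1],
             [h[1][0] * g2, h[1][1] * g2]]
    return d1 * GAMMA ** (-2 * q1), d2 * GAMMA ** (-2 * q2), b1 * g1, h_new
```

For example, a factor of 2⁻³⁰ lies below γ⁻² = 2⁻²⁴, so its exponent is –1 and a single multiplication by γ² brings it to 2⁻⁶.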

SEE ALSO
SROTG(3S), SROTM(3S)
Gentleman, W. M., "Least Squares Computations by Givens Transformations Without Square Roots,"
Journal of the Institute for Mathematical Applications 12 (1973), pp. 329 – 336.
Lawson, C., Hanson, R., Kincaid, D., and Krogh, F., "Basic Linear Algebra Subprograms for Fortran Usage,"
ACM Transactions on Mathematical Software, 5 (1979), pp. 308 – 325.



SSCAL ( 3S ) SSCAL ( 3S )

NAME
SSCAL, CSSCAL, CSCAL – Scales a real or complex vector

SYNOPSIS
CALL SSCAL (n, alpha, x, incx)
CALL CSSCAL (n, alpha, x, incx)
CALL CSCAL (n, alpha, x, incx)

IMPLEMENTATION
UNICOS and UNICOS/mk systems
On UNICOS/mk systems, these subroutines execute on a single processor and use only private data.

DESCRIPTION
SSCAL scales a real vector with a real scalar.
CSSCAL scales a complex vector with a real scalar.
CSCAL scales a complex vector with a complex scalar.
These routines perform the following vector operation:
x←αx
where α is a real or complex scalar, and x is a real or complex vector.
These routines have the following arguments:
n Integer. (input)
Number of elements in the vector. If n ≤ 0, SSCAL, CSSCAL, and CSCAL return without any
computation.
alpha SSCAL, CSSCAL: Real. (input)
CSCAL: Complex. (input)
Scalar value α by which to scale the vector.
x SSCAL: Real array of dimension (n–1)·|incx| + 1. (input and output)
CSSCAL, CSCAL: Complex array of dimension (n–1)·|incx| + 1. (input and output)
Vector to be scaled.
incx Integer. (input)
Increment between elements of x. If incx = 0, the results will be unpredictable.

NOTES
SSCAL, CSSCAL, and CSCAL are Level 1 Basic Linear Algebra Subprograms (Level 1 BLAS).
When working backward (incx < 0), each routine starts at the end of the vector and moves backward, as
follows:
x(1 – incx·(n–1)), x(1 – incx·(n–2)), ..., x(1)
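
The effect of n, alpha, and incx can be pictured with a short reference sketch. The Python below is executable pseudocode, not the library routine; the name scal_reference is hypothetical, and list indices are 0-based, so Fortran element x(p) corresponds to x[p-1]. Because incx = 0 is documented as unpredictable, the sketch simply rejects it.

```python
def scal_reference(n, alpha, x, incx):
    """Scale n strided elements of the list x in place: x <- alpha * x."""
    if n <= 0:
        return                            # no computation is done
    if incx == 0:
        raise ValueError("incx = 0 is not supported in this sketch")
    # For a negative increment, start at the far end and walk backward.
    start = 0 if incx > 0 else (n - 1) * (-incx)
    for i in range(n):
        x[start + i * incx] *= alpha
```

With x = [1, 2, 3, 4, 5], n = 3, and incx = 2 (or –2), only the first, third, and fifth entries are scaled. Passing complex values for alpha or x models CSSCAL and CSCAL.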



SSUM ( 3S ) SSUM ( 3S )

NAME
SSUM, CSUM – Sums the elements of a real or complex vector

SYNOPSIS
sum = SSUM (n, x, incx)
sum = CSUM (n, x, incx)

IMPLEMENTATION
UNICOS and UNICOS/mk systems
On UNICOS/mk systems, these subroutines execute on a single processor and use only private data.

DESCRIPTION
SSUM sums the elements of a real vector.
CSUM sums the elements of a complex vector.
SSUM and CSUM perform the following vector operation:
\[
\mathrm{sum} \leftarrow \sum_{i=1}^{n} x_i
\]

These routines have the following arguments:


sum SSUM: Real. (output)
CSUM: Complex. (output)
Sum of the elements of the vector x. If n ≤ 0, sum is set to 0.
n Integer. (input)
Number of vector elements to be summed.
x SSUM: Real array of dimension (n–1)·|incx| + 1. (input)
CSUM: Complex array of dimension (n–1)·|incx| + 1. (input)
Vector that contains elements to be summed.
incx Integer. (input)
Increment between elements of x. If incx = 0, the results will be unpredictable.

NOTES
SSUM and CSUM are extensions to the standard Level 1 Basic Linear Algebra Subprograms (Level 1 BLAS).
When working backward (incx < 0), each routine starts at the end of the vector and moves backward, as
follows:
x(1 – incx·(n–1)), x(1 – incx·(n–2)), ..., x(1)
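
As an illustration of which elements are summed, here is a minimal Python sketch. It is executable pseudocode rather than the library routine; the name sum_reference is hypothetical, indices are 0-based, and incx = 0 is rejected because the text calls that case unpredictable.

```python
def sum_reference(n, x, incx):
    """Return the sum of n strided elements of x; 0 when n <= 0."""
    if n <= 0:
        return 0
    if incx == 0:
        raise ValueError("incx = 0 is not supported in this sketch")
    # For a negative increment, start at the far end and walk backward.
    start = 0 if incx > 0 else (n - 1) * (-incx)
    total = 0
    for i in range(n):
        total += x[start + i * incx]
    return total
```

The same function models CSUM when x holds complex values.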



SSWAP ( 3S ) SSWAP ( 3S )

NAME
SSWAP, CSWAP – Swaps two real or complex vectors

SYNOPSIS
CALL SSWAP (n, x, incx, y, incy)
CALL CSWAP (n, x, incx, y, incy)

IMPLEMENTATION
UNICOS and UNICOS/mk systems
On UNICOS/mk systems, these subroutines execute on a single processor and use only private data.

DESCRIPTION
SSWAP swaps two real vectors.
CSWAP swaps two complex vectors.
SSWAP and CSWAP perform the following vector operation:
x ↔ y
where x and y are real or complex vectors.
These routines have the following arguments:
n Integer. (input)
Number of vector elements to be swapped. If n ≤ 0, SSWAP and CSWAP return without any
computation.
x SSWAP: Real array of dimension (n–1)·|incx| + 1. (input and output)
CSWAP: Complex array of dimension (n–1)·|incx| + 1. (input and output)
Vector to be swapped.
incx Integer. (input)
Increment between elements of x.
If incx = 0, the results will be unpredictable.
y SSWAP: Real array of dimension (n–1)·|incy| + 1. (input and output)
CSWAP: Complex array of dimension (n–1)·|incy| + 1. (input and output)
Vector to be swapped.
incy Integer. (input)
Increment between elements of y. If incy = 0, the results will be unpredictable.

NOTES
SSWAP and CSWAP are Level 1 Basic Linear Algebra Subprograms (Level 1 BLAS).
When working backward (incx < 0 or incy < 0), each routine starts at the end of the vector and moves
backward, as follows:
x(1 – incx·(n–1)), x(1 – incx·(n–2)), ..., x(1)
y(1 – incy·(n–1)), y(1 – incy·(n–2)), ..., y(1)
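
The pairing of elements when the two increments differ in sign can be seen in a small sketch. The Python below is executable pseudocode, not the library routine; the name swap_reference is hypothetical and indices are 0-based.

```python
def swap_reference(n, x, incx, y, incy):
    """Swap n strided elements of x with n strided elements of y, in place."""
    if n <= 0:
        return                            # no computation is done
    if incx == 0 or incy == 0:
        raise ValueError("zero increments are not supported in this sketch")
    # Each vector starts at its far end when its increment is negative.
    ix = 0 if incx > 0 else (n - 1) * (-incx)
    iy = 0 if incy > 0 else (n - 1) * (-incy)
    for _ in range(n):
        x[ix], y[iy] = y[iy], x[ix]
        ix += incx
        iy += incy
```

With incx = 1 and incy = –1, the first element of x is exchanged with the last element of y, so the swap also reverses the order.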



INTRO_BLAS2 ( 3S ) INTRO_BLAS2 ( 3S )

NAME
INTRO_BLAS2 – Introduction to matrix-vector linear algebra subprograms

IMPLEMENTATION
See individual man pages for implementation details

DESCRIPTION
The linear algebra subprograms are written to run optimally on UNICOS and UNICOS/mk systems. These
subprograms use call-by-address convention when called by a Fortran or C program, or the assembler for
your system.
Level 2 Basic Linear Algebra Subprograms
The Level 2 Basic Linear Algebra Subprograms (Level 2 BLAS) consist of CAM or CAL routines for real
and complex data. They handle matrix-vector operations. Only the single-precision real and complex data
types are supported.
Increment arguments for vectors
The description of a vector consists of the name of the array (x or y) followed by the storage spacing
(increment) in the array of vector elements (incx or incy). The increment can be positive or negative. When
a vector x consists of n elements, the corresponding actual array arguments must be of length at least
1 + (n–1)·|incx|. For a negative increment, the first element of x is assumed to be x(1 + (n–1)·|incx|).
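The increment rule can be made concrete with a short sketch. The Python below is illustrative only; the helper names vector_positions and min_array_length are hypothetical. It lists the 1-based array positions that hold logical elements 1 through n, following the rule stated above, and assumes n ≥ 1 and a nonzero increment.

```python
def min_array_length(n, inc):
    """Smallest array length that can hold an n-element vector (n >= 1)."""
    return 1 + (n - 1) * abs(inc)


def vector_positions(n, inc):
    """1-based array positions of logical vector elements 1..n."""
    if inc == 0:
        raise ValueError("the increment must not be 0")
    # A negative increment places the first element at the far end.
    first = 1 if inc > 0 else 1 + (n - 1) * abs(inc)
    return [first + i * inc for i in range(n)]
```

For n = 4 and an increment of –3, the array needs at least 10 elements and the vector occupies positions 10, 7, 4, and 1, in that order.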
Table of Level 2 BLAS routines
The following table describes these routines. If more than one routine name appears for a given block in the
table, the first name listed is the name of the man page that documents all routines listed in that block.
The table is in alphabetic order, except that each Hermitian matrix routine (any routine whose name begins
with CH) is grouped next to equivalent symmetric matrix routines (whose names begin with SS or CS). This
is because the Hermitian property is a type of symmetry.
Each routine in the table marked with an asterisk (*) is an extension to the standard set of Level 2 BLAS
routines.

Purpose                                         Operation                             Name
----------------------------------------------  ------------------------------------  --------
Multiplies a complex vector by a Hermitian      y ← αAx + βy                          CHBMV
band matrix

Multiplies a complex vector by a Hermitian      y ← αAx + βy                          CHEMV
matrix

Performs Hermitian rank 1 update of a           A ← αxx^H + A                         CHER
Hermitian matrix

Performs Hermitian rank 2 update of a           A ← αxy^H + conj(α)yx^H + A           CHER2
Hermitian matrix

Multiplies a complex vector by a Hermitian      y ← αAx + βy                          CHPMV
packed matrix

Performs Hermitian rank 1 update of a           A ← αxx^H + A                         CHPR
packed Hermitian matrix

Performs Hermitian rank 2 update of a           A ← αxy^H + conj(α)yx^H + A           CHPR2
packed Hermitian matrix

Multiplies a real vector by a real general      y ← α op(A)x + βy,                    SGBMV
band matrix; multiplies a complex vector        op(A) = A or op(A) = A^T              CGBMV
by a complex general band matrix                or op(A) = A^H (CGBMV only)

Multiplies a real vector by a real general      y ← α op(A)x + βy,                    SGEMV
matrix; multiplies a complex vector by a        op(A) = A or op(A) = A^T              CGEMV
complex general matrix                          or op(A) = A^H (CGEMV, ZGEMV only)

Performs rank 1 update of a real general        A ← αxy^T + A                         SGER
matrix

Performs conjugated rank 1 update of a          A ← αxy^H + A                         CGERC
complex general matrix

Performs unconjugated rank 1 update of a        A ← αxy^T + A                         CGERU
complex general matrix

Adds a scalar multiple of a real or complex     B ← α op(A) + βB                      SGESUM
matrix to a scalar multiple of another real
or complex matrix

Multiplies a real vector by a real symmetric    y ← αAx + βy                          SSBMV
band matrix

Multiplies a real vector by a real symmetric    y ← αAx + βy                          SSPMV
packed matrix; multiplies a complex vector                                            CSPMV*
by a complex symmetric packed matrix

Performs symmetric rank 1 update of a real      A ← αxx^T + A                         SSPR
symmetric packed matrix; performs symmetric                                           CSPR*
rank 1 update of a complex symmetric packed
matrix

Performs two simultaneous symmetric rank 1      A ← αxx^T + βyy^T + A                 SSPR12*
updates of a real symmetric packed matrix

Performs symmetric rank 2 update of a real      A ← αxy^T + αyx^T + A                 SSPR2
symmetric packed matrix

Multiplies a real vector by a real symmetric    y ← αAx + βy                          SSYMV
matrix; multiplies a complex vector by a                                              CSYMV*
complex symmetric matrix

Performs symmetric rank 1 update of a real      A ← αxx^T + A                         SSYR
symmetric matrix; performs symmetric rank 1                                           CSYR*
update of a complex symmetric matrix

Performs symmetric rank 2 update of a real      A ← αxy^T + αyx^T + A                 SSYR2
symmetric matrix

Multiplies a real vector by a real triangular   x ← op(A)x,                           STBMV
band matrix; multiplies a complex vector        op(A) = A or op(A) = A^T              CTBMV
by a complex triangular band matrix             or op(A) = A^H (CTBMV only)

Solves a real triangular band system of         x ← op(A)^-1 x,                       STBSV
equations; solves a complex triangular          op(A) = A or op(A) = A^T              CTBSV
band system of equations                        or op(A) = A^H (CTBSV only)

Multiplies a real vector by a real triangular   x ← op(A)x,                           STPMV
packed matrix; multiplies a complex vector      op(A) = A or op(A) = A^T              CTPMV
by a complex triangular packed matrix           or op(A) = A^H (CTPMV only)

Solves a real triangular packed system of       x ← op(A)^-1 x,                       STPSV
equations; solves a complex triangular          op(A) = A or op(A) = A^T              CTPSV
packed system of equations                      or op(A) = A^H (CTPSV only)

Multiplies a real vector by a real triangular   x ← op(A)x,                           STRMV
matrix; multiplies a complex vector by a        op(A) = A or op(A) = A^T              CTRMV
complex triangular matrix                       or op(A) = A^H (CTRMV only)

Solves a real triangular system of              x ← op(A)^-1 x,                       STRSV
equations; solves a complex triangular          op(A) = A or op(A) = A^T              CTRSV
system of equations                             or op(A) = A^H (CTRSV only)

SEE ALSO
Dongarra, J., J. Du Croz, S. Hammarling, and R. Hanson, "An Extended Set of FORTRAN Basic Linear
Algebra Subprograms," ACM Transactions on Mathematical Software, Vol. 14, No. 1, March 1988, pp. 1–17.



CHBMV ( 3S ) CHBMV ( 3S )

NAME
CHBMV – Multiplies a complex vector by a complex Hermitian band matrix

SYNOPSIS
CALL CHBMV (uplo, n, k, alpha, a, lda, x, incx, beta, y, incy)

IMPLEMENTATION
UNICOS and UNICOS/mk systems
On UNICOS/mk systems, this subroutine executes on a single processor and uses only private data.

DESCRIPTION
CHBMV performs the following matrix-vector operation where α and β are scalars, x and y are n-element
vectors, and A is an n-by-n Hermitian band matrix:
y ← α Ax+ β y
This routine has the following arguments:
uplo Character*1. (input)
Specifies whether the upper or lower triangular part of the band matrix A is supplied, as follows:
uplo= ’U’ or ’u’: the upper triangular part of A is being supplied.
uplo= ’L’ or ’l’: the lower triangular part of A is being supplied.
n Integer. (input)
Specifies the order of matrix A. n ≥ 0.
k Integer. (input)
Specifies the number of superdiagonals of matrix A. k ≥ 0.
alpha Complex. (input)
Scalar factor α.
a Complex array of dimension (lda,n). (input)
Before entry with uplo = ’U’ or ’u’, the leading (k+1)-by-n part of array a must contain the
upper triangular band part of the Hermitian matrix, supplied column-by-column, with the leading
diagonal of the matrix in row k+1 of the array, the first superdiagonal starting at position 2 in
row k, and so on. The top left k-by-k triangle of array a is not referenced.
Before entry with uplo = ’L’ or ’l’, the leading (k+1)-by-n part of array a must contain the lower
triangular band part of the Hermitian matrix, supplied column-by-column, with the leading
diagonal of the matrix in row 1 of the array, the first subdiagonal starting at position 1 in row 2,
and so on. The bottom right k-by-k triangle of array a is not referenced.
The imaginary parts of the diagonal elements need not be set and are assumed to be 0. See the
EXAMPLES section for examples of Fortran code that transfer a band matrix from conventional
full matrix storage to band storage.

lda Integer. (input)
Specifies the first dimension of a as declared in the calling program. lda ≥ (k+1).
x Complex array of dimension 1+(n–1)·|incx|. (input)
Contains vector x.
incx Integer. (input)
Specifies the increment for the elements of x. incx must not be 0.
beta Complex. (input)
Scalar factor β.
y Complex array of dimension 1+(n–1)·|incy|. (input and output)
Contains vector y. On exit, the updated vector overwrites array y.
incy Integer. (input)
Specifies the increment for the elements of y. incy must not be 0.

NOTES
CHBMV is a Level 2 Basic Linear Algebra Subprogram (Level 2 BLAS).
When working backward (incx < 0 or incy < 0), this routine starts at the end of the vector and moves
backward, as follows:
x(1 – incx·(n–1)), x(1 – incx·(n–2)), ..., x(1)
y(1 – incy·(n–1)), y(1 – incy·(n–2)), ..., y(1)

EXAMPLES
The following program segment transfers the upper triangular part of a Hermitian band matrix from
conventional full matrix storage to band storage:
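C     Copy the upper triangular band of MATRIX into rows 1 to K+1 of A.
C     The main diagonal of MATRIX lands in row K+1 of A.
      DO 20, J = 1, N
         M = K + 1 - J
         DO 10, I = MAX( 1, J - K ), J
            A( M + I, J ) = MATRIX( I, J )
   10    CONTINUE
   20 CONTINUE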

The following program segment transfers the lower triangular part of a Hermitian band matrix from
conventional full matrix storage to band storage:
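C     Copy the lower triangular band of MATRIX into rows 1 to K+1 of A.
C     The main diagonal of MATRIX lands in row 1 of A.
      DO 20, J = 1, N
         M = 1 - J
         DO 10, I = J, MIN( N, J + K )
            A( M + I, J ) = MATRIX( I, J )
   10    CONTINUE
   20 CONTINUE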
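
The band layout can also be checked with a short reference sketch. The Python below is executable pseudocode, not the library code; the names to_band and hbmv_reference are hypothetical, indices are 0-based, and unit increments are assumed. to_band mirrors the two Fortran segments above, and hbmv_reference reads only the stored triangle of the band to form αAx + βy.

```python
def to_band(matrix, n, k, uplo):
    """Copy one triangle of the band of a full matrix into (k+1)-by-n storage."""
    a = [[0j] * n for _ in range(k + 1)]
    for j in range(n):
        if uplo in ("U", "u"):
            # 0-based form of A(K+1-J+I, J) = MATRIX(I, J)
            for i in range(max(0, j - k), j + 1):
                a[k - j + i][j] = matrix[i][j]
        else:
            # 0-based form of A(1-J+I, J) = MATRIX(I, J)
            for i in range(j, min(n, j + k + 1)):
                a[i - j][j] = matrix[i][j]
    return a


def hbmv_reference(uplo, n, k, alpha, a, x, beta, y):
    """Return alpha*A*x + beta*y for a Hermitian band matrix in band storage."""
    upper = uplo in ("U", "u")
    out = [0j if beta == 0 else beta * yi for yi in y[:n]]
    for j in range(n):
        diag = a[k][j] if upper else a[0][j]
        out[j] += alpha * diag.real * x[j]       # imaginary part taken as 0
        rows = range(max(0, j - k), j) if upper else range(j + 1, min(n, j + k + 1))
        for i in rows:
            aij = a[k - j + i][j] if upper else a[i - j][j]
            out[i] += alpha * aij * x[j]              # stored element A(i,j)
            out[j] += alpha * aij.conjugate() * x[i]  # mirrored element A(j,i)
    return out
```

Both values of uplo give the same product for the same Hermitian matrix, which makes a simple consistency check.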

SEE ALSO
SSBMV(3S)



CHEMV ( 3S ) CHEMV ( 3S )

NAME
CHEMV – Multiplies a complex vector by a complex Hermitian matrix

SYNOPSIS
CALL CHEMV (uplo, n, alpha, a, lda, x, incx, beta, y, incy)

IMPLEMENTATION
UNICOS and UNICOS/mk systems
On UNICOS/mk systems, this subroutine executes on a single processor and uses only private data.

DESCRIPTION
CHEMV performs the following matrix-vector operation:
y ← α Ax + β y
where α and β are scalars, x and y are n-element vectors, and A is an n-by-n Hermitian matrix.
This routine has the following arguments:
uplo Character*1. (input)
Specifies whether the upper or lower triangular part of array a is referenced, as follows:
uplo= ’U’ or ’u’: only the upper triangular part of a is referenced.
uplo= ’L’ or ’l’: only the lower triangular part of a is referenced.
n Integer. (input)
Specifies the order of matrix A. n ≥ 0.
alpha Complex. (input)
Scalar factor α.
a Complex array of dimension (lda,n). (input)
Before entry with uplo = ’U’ or ’u’, the leading n-by-n upper triangular part of array a must
contain the upper triangular part of the Hermitian matrix. The strictly lower triangular part of a
is not referenced.
Before entry with uplo = ’L’ or ’l’, the leading n-by-n lower triangular part of array a must
contain the lower triangular part of the Hermitian matrix. The strictly upper triangular part of a
is not referenced.
The imaginary parts of the diagonal elements need not be set and are assumed to be 0.
lda Integer. (input)
Specifies the first dimension of a as declared in the calling program.
Argument lda ≥ MAX(1,n).
x Complex array of dimension 1+(n–1)·|incx|. (input)
Contains vector x.
incx Integer. (input)
Specifies the increment for the elements of x. incx must not be 0.
beta Complex. (input)
Scalar factor β. If beta is supplied as 0, y need not be set on input.
y Complex array of dimension 1+(n–1)·|incy|. (input and output)
Contains vector y. On exit, the updated vector overwrites array y.
incy Integer. (input)
Specifies the increment for the elements of y. incy must not be 0.

NOTES
CHEMV is a Level 2 Basic Linear Algebra Subprogram (Level 2 BLAS).
When working backward (incx < 0 or incy < 0), this routine starts at the end of the vector and moves
backward, as follows:
x(1 – incx·(n–1)), x(1 – incx·(n–2)), ..., x(1)
y(1 – incy·(n–1)), y(1 – incy·(n–2)), ..., y(1)
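
The role of uplo can be illustrated with a reference sketch that touches only one triangle of the array. The Python below is executable pseudocode, not the library routine; the name hemv_reference is hypothetical, indices are 0-based, and unit increments are assumed.

```python
def hemv_reference(uplo, n, alpha, a, x, beta, y):
    """Return alpha*A*x + beta*y, reading only the triangle selected by uplo."""
    upper = uplo in ("U", "u")
    # When beta is 0, y is not read at all.
    out = [0j if beta == 0 else beta * yi for yi in y[:n]]
    for j in range(n):
        out[j] += alpha * a[j][j].real * x[j]    # diagonal: imaginary part taken as 0
        rows = range(0, j) if upper else range(j + 1, n)
        for i in rows:
            out[i] += alpha * a[i][j] * x[j]              # stored element A(i,j)
            out[j] += alpha * a[i][j].conjugate() * x[i]  # mirrored element A(j,i)
    return out
```

Filling the unreferenced triangle with NaN and confirming that the result is unaffected is an easy way to see that it is never read.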

SEE ALSO
SSYMV(3S)



CHER ( 3S ) CHER ( 3S )

NAME
CHER – Performs Hermitian rank 1 update of a complex Hermitian matrix

SYNOPSIS
CALL CHER (uplo, n, alpha, x, incx, a, lda)

IMPLEMENTATION
UNICOS and UNICOS/mk systems
On UNICOS/mk systems, this subroutine executes on a single processor and uses only private data.

DESCRIPTION
CHER performs the following Hermitian rank 1 operation:
\[
A \leftarrow \alpha x x^{H} + A
\]
where α is a real scalar, x is an n-element vector, $x^{H}$ is the conjugate transpose of x, and A is an n-by-n
Hermitian matrix.
This routine has the following arguments:
uplo Character*1. (input)
Specifies whether the upper or lower triangular part of array a is referenced, as follows:
uplo= ’U’ or ’u’: only the upper triangular part of a is referenced.
uplo= ’L’ or ’l’: only the lower triangular part of a is referenced.
n Integer. (input)
Specifies the order of matrix A. n ≥ 0.
alpha Real. (input)
Scalar factor α.
x Complex array of dimension 1+(n–1)·|incx|. (input)
Contains vector x.
incx Integer. (input)
Specifies the increment for the elements of x. incx must not be 0.
a Complex array of dimension (lda,n). (input and output)
Before entry with uplo = ’U’ or ’u’, the leading n-by-n upper triangular part of array a must
contain the upper triangular part of the Hermitian matrix. The strictly lower triangular part of a
is not referenced. On exit, the upper triangular part of the updated matrix overwrites the upper
triangular part of array a.

Before entry with uplo = ’L’ or ’l’, the leading n-by-n lower triangular part of array a must
contain the lower triangular part of the Hermitian matrix. The strictly upper triangular part of a
is not referenced. On exit, the lower triangular part of the updated matrix overwrites the lower
triangular part of array a.
The imaginary parts of the diagonal elements need not be set. They are assumed to be 0; on
exit, they are set to 0.
lda Integer. (input)
On entry, lda specifies the first dimension of a as declared in the calling program. lda ≥
MAX(1,n).

NOTES
CHER is a Level 2 Basic Linear Algebra Subprogram (Level 2 BLAS).
When working backward (incx < 0), this routine starts at the end of the vector and moves backward, as
follows:
x(1 – incx·(n–1)), x(1 – incx·(n–2)), ..., x(1)
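
A minimal sketch of the rank 1 update may help show which entries of a change. The Python below is executable pseudocode, not the library routine; the name her_reference is hypothetical, indices are 0-based, and a unit increment is assumed.

```python
def her_reference(uplo, n, alpha, x, a):
    """In-place update A <- alpha * x * x^H + A on one triangle of a."""
    upper = uplo in ("U", "u")
    for j in range(n):
        rows = range(0, j + 1) if upper else range(j, n)
        for i in rows:
            a[i][j] = a[i][j] + alpha * x[i] * complex(x[j]).conjugate()
        # Diagonal entries are real; the imaginary part is set to 0 on exit.
        a[j][j] = complex(a[j][j].real, 0.0)
```

Entries in the other triangle are left exactly as they were on entry.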

SEE ALSO
SSYR(3S)



CHER2 ( 3S ) CHER2 ( 3S )

NAME
CHER2 – Performs Hermitian rank 2 update of a complex Hermitian matrix

SYNOPSIS
CALL CHER2 (uplo, n, alpha, x, incx, y, incy, a, lda)

IMPLEMENTATION
UNICOS and UNICOS/mk systems
On UNICOS/mk systems, this subroutine executes on a single processor and uses only private data.

DESCRIPTION
CHER2 performs the following Hermitian rank 2 operation:
\[
A \leftarrow \alpha x y^{H} + \bar{\alpha} y x^{H} + A
\]
where α is a scalar, $\bar{\alpha}$ is the complex conjugate of α, x and y are n-element vectors, $x^{H}$ and $y^{H}$ are the conjugate
transposes of x and y, respectively, and A is an n-by-n Hermitian matrix.
This routine has the following arguments:
uplo Character*1. (input)
Specifies whether the upper or lower triangular part of array a is referenced, as follows:
uplo= ’U’ or ’u’: only the upper triangular part of a is referenced.
uplo= ’L’ or ’l’: only the lower triangular part of a is referenced.
n Integer. (input)
Specifies the order of matrix A. n ≥ 0.
alpha Complex. (input)
Scalar factor α.
x Complex array of dimension 1+(n–1)·|incx|. (input)
Contains vector x.
incx Integer. (input)
Specifies the increment for the elements of x. incx must not be 0.
y Complex array of dimension 1+(n–1)·|incy|. (input)
Contains vector y.
incy Integer. (input)
Specifies the increment for the elements of y. incy must not be 0.
a Complex array of dimension (lda,n). (input and output)

Before entry with uplo = ’U’ or ’u’, the leading n-by-n upper triangular part of array a must
contain the upper triangular part of the Hermitian matrix. The strictly lower triangular part of a
is not referenced. On exit, the upper triangular part of the updated matrix overwrites the upper
triangular part of array a.
Before entry with uplo = ’L’ or ’l’, the leading n-by-n lower triangular part of array a must
contain the lower triangular part of the Hermitian matrix. The strictly upper triangular part of a
is not referenced. On exit, the lower triangular part of the updated matrix overwrites the lower
triangular part of array a.
The imaginary parts of the diagonal elements need not be set. They are assumed to be 0; on
exit, they are set to 0.
lda Integer. (input)
Specifies the first dimension of a as declared in the calling program. lda ≥ MAX(1,n).

NOTES
CHER2 is a Level 2 Basic Linear Algebra Subprogram (Level 2 BLAS).
When working backward (incx < 0 or incy < 0), this routine starts at the end of the vector and moves
backward, as follows:
x(1 – incx·(n–1)), x(1 – incx·(n–2)), ..., x(1)
y(1 – incy·(n–1)), y(1 – incy·(n–2)), ..., y(1)
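
The rank 2 update can be sketched in the same way. The Python below is executable pseudocode, not the library routine; the name her2_reference is hypothetical, indices are 0-based, and unit increments are assumed.

```python
def her2_reference(uplo, n, alpha, x, y, a):
    """In-place update A <- alpha*x*y^H + conj(alpha)*y*x^H + A on one triangle."""
    upper = uplo in ("U", "u")
    alpha_conj = complex(alpha).conjugate()
    for j in range(n):
        xj_conj = complex(x[j]).conjugate()
        yj_conj = complex(y[j]).conjugate()
        rows = range(0, j + 1) if upper else range(j, n)
        for i in rows:
            a[i][j] = a[i][j] + alpha * x[i] * yj_conj + alpha_conj * y[i] * xj_conj
        # Diagonal entries are real; the imaginary part is set to 0 on exit.
        a[j][j] = complex(a[j][j].real, 0.0)
```

Running the update once with uplo = 'U' and once with uplo = 'L' on the same data yields off-diagonal entries that are complex conjugates of each other, as expected for a Hermitian result.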

SEE ALSO
SSYR2(3S)



CHPMV ( 3S ) CHPMV ( 3S )

NAME
CHPMV – Multiplies a complex vector by a packed complex Hermitian matrix

SYNOPSIS
CALL CHPMV (uplo, n, alpha, ap, x, incx, beta, y, incy)

IMPLEMENTATION
UNICOS and UNICOS/mk systems
On UNICOS/mk systems, this subroutine executes on a single processor and uses only private data.

DESCRIPTION
CHPMV performs the following matrix-vector operation:
y ← α Ax + β y
where α and β are complex scalars, x and y are n-element vectors, and A is an n-by-n packed complex
Hermitian matrix.
This routine has the following arguments:
uplo Character*1. (input)
Specifies whether the upper or lower triangular part of matrix A is packed into the array
argument ap, as follows:
uplo= ’U’ or ’u’: the upper triangular part of A is being supplied in the argument ap.
uplo= ’L’ or ’l’: the lower triangular part of A is being supplied in the argument ap.
n Integer. (input)
Specifies the order of matrix A. n ≥ 0.
alpha Complex. (input)
Scalar factor α.
ap Complex array of dimension n(n+1)/2. (input)
Before entry with uplo = ’U’ or ’u’, array ap must contain the upper triangular part of the
Hermitian matrix packed sequentially, column-by-column, so that ap(1) contains A(1,1), ap(2)
contains A(1,2), ap(3) contains A(2,2), and so on.
Before entry with uplo = ’L’ or ’l’, array ap must contain the lower triangular part of the
Hermitian matrix packed sequentially, column-by-column, so that ap(1) contains A(1,1), ap(2)
contains A(2,1), ap(3) contains A(3,1), and so on.
x Complex array of dimension 1+(n–1)·|incx|. (input)
Contains vector x.
incx Integer. (input)
Specifies the increment for the elements of x. incx must not be 0.
beta Complex. (input)
Scalar factor β. If beta is supplied as 0, y need not be set on input.
y Complex array of dimension 1+(n–1)·|incy|. (input and output)
Contains vector y. On exit, y is overwritten by updated vector y.
incy Integer. (input)
Specifies the increment for the elements of y. incy must not be 0.

NOTES
CHPMV is a Level 2 Basic Linear Algebra Subprogram (Level 2 BLAS).
When working backward (incx < 0 or incy < 0), this routine starts at the end of the vector and moves
backward, as follows:
x(1 – incx·(n–1)), x(1 – incx·(n–2)), ..., x(1)
y(1 – incy·(n–1)), y(1 – incy·(n–2)), ..., y(1)
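
The packed layout described for ap can be expressed as an index formula. The Python below is executable pseudocode, not the library routine; the names packed_index and hpmv_reference are hypothetical, indices are 0-based (so ap(1) in the text is ap[0]), and unit increments are assumed.

```python
def packed_index(uplo, n, i, j):
    """0-based position in ap of element A(i, j), columns packed in sequence."""
    if uplo in ("U", "u"):
        return i + j * (j + 1) // 2           # valid for i <= j
    return i + j * (2 * n - j - 1) // 2       # valid for i >= j


def hpmv_reference(uplo, n, alpha, ap, x, beta, y):
    """Return alpha*A*x + beta*y for a packed Hermitian matrix."""
    upper = uplo in ("U", "u")
    out = [0j if beta == 0 else beta * yi for yi in y[:n]]
    for j in range(n):
        out[j] += alpha * ap[packed_index(uplo, n, j, j)].real * x[j]
        rows = range(0, j) if upper else range(j + 1, n)
        for i in rows:
            aij = ap[packed_index(uplo, n, i, j)]
            out[i] += alpha * aij * x[j]              # stored element A(i,j)
            out[j] += alpha * aij.conjugate() * x[i]  # mirrored element A(j,i)
    return out
```

For n = 3, the upper packing holds A(1,1), A(1,2), A(2,2), A(1,3), A(2,3), A(3,3) and the lower packing holds A(1,1), A(2,1), A(3,1), A(2,2), A(3,2), A(3,3), matching the first entries listed in the argument description.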

SEE ALSO
SSPMV(3S)



CHPR ( 3S ) CHPR ( 3S )

NAME
CHPR – Performs Hermitian rank 1 update of a packed complex Hermitian matrix

SYNOPSIS
CALL CHPR (uplo, n, alpha, x, incx, ap)

IMPLEMENTATION
UNICOS and UNICOS/mk systems
On UNICOS/mk systems, this subroutine executes on a single processor and uses private data.

DESCRIPTION
CHPR performs the following Hermitian rank 1 operation:
\[
A \leftarrow \alpha x x^{H} + A
\]
where α is a real scalar, x is an n-element vector, $x^{H}$ is the conjugate transpose of x, and A is an n-by-n
packed complex Hermitian matrix.
This routine has the following arguments:
uplo Character*1. (input)
Specifies whether the upper or lower triangular part of matrix A is packed into the array
argument ap, as follows:
uplo= ’U’ or ’u’: the upper triangular part of A is supplied in the argument ap.
uplo= ’L’ or ’l’: the lower triangular part of A is supplied in the argument ap.
n Integer. (input)
Specifies the order of matrix A. n ≥ 0.
alpha Real. (input)
Scalar factor α.
x Complex array of dimension 1+(n–1)·|incx|. (input)
Contains vector x.
incx Integer. (input)
Specifies the increment for the elements of x. incx must not be 0.
ap Complex array of dimension n(n+1)/2. (input and output)
Before entry with uplo = ’U’ or ’u’, array ap must contain the upper triangular part of the
Hermitian matrix packed sequentially, column-by-column, so that ap(1) contains A(1,1), ap(2)
contains A(1,2), ap(3) contains A(2,2), and so on. On exit, the upper triangular part of the
updated matrix overwrites array ap.

Before entry with uplo = ’L’ or ’l’, array ap must contain the lower triangular part of the
Hermitian matrix packed sequentially, column-by-column, so that ap(1) contains A(1,1), ap(2)
contains A(2,1), ap(3) contains A(3,1), and so on. On exit, the lower triangular part of the
updated matrix overwrites array ap.
The imaginary parts of the diagonal elements need not be set. They are assumed to be 0; on
exit, they are set to 0.

NOTES
CHPR is a Level 2 Basic Linear Algebra Subprogram (Level 2 BLAS).
When working backward (incx < 0), this routine starts at the end of the vector and moves backward, as
follows:
x(1 – incx·(n–1)), x(1 – incx·(n–2)), ..., x(1)
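
Because the packed array stores the selected triangle column by column, the update can walk ap sequentially. The Python below is executable pseudocode, not the library routine; the name hpr_reference is hypothetical, indices are 0-based, and a unit increment is assumed.

```python
def hpr_reference(uplo, n, alpha, x, ap):
    """In-place update A <- alpha * x * x^H + A on a packed triangle."""
    upper = uplo in ("U", "u")
    pos = 0                                   # next position in ap
    for j in range(n):
        xj_conj = complex(x[j]).conjugate()
        rows = range(0, j + 1) if upper else range(j, n)
        for i in rows:
            ap[pos] = ap[pos] + alpha * x[i] * xj_conj
            if i == j:
                # Diagonal entries are real; the imaginary part is set to 0.
                ap[pos] = complex(ap[pos].real, 0.0)
            pos += 1
```

For n = 2 the packed array has three entries: the two diagonal elements and the single off-diagonal element of the chosen triangle.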

SEE ALSO
SSPR(3S)



CHPR2 ( 3S ) CHPR2 ( 3S )

NAME
CHPR2 – Performs Hermitian rank 2 update of a packed complex Hermitian matrix

SYNOPSIS
CALL CHPR2 (uplo, n, alpha, x, incx, y, incy, ap)

IMPLEMENTATION
UNICOS and UNICOS/mk systems
On UNICOS/mk systems, this subroutine executes on a single processor and uses private data.

DESCRIPTION
CHPR2 performs the following Hermitian rank 2 operation:
\[
A \leftarrow \alpha x y^{H} + \bar{\alpha} y x^{H} + A
\]
where α is a scalar, $\bar{\alpha}$ is the complex conjugate of α, x and y are n-element vectors, $x^{H}$ and $y^{H}$ are the conjugate
transposes of x and y, respectively, and A is an n-by-n packed complex Hermitian matrix.
This routine has the following arguments:
uplo Character*1. (input)
Specifies whether the upper or lower triangular part of matrix A is packed into the array
argument ap, as follows:
uplo= ’U’ or ’u’: the upper triangular part of A is being supplied in the argument ap.
uplo= ’L’ or ’l’: the lower triangular part of A is being supplied in the argument ap.
n Integer. (input)
Specifies the order of matrix A. n ≥ 0.
alpha Complex. (input)
Scalar factor α.
x Complex array of dimension 1+(n–1)·|incx|. (input)
Contains vector x.
incx Integer. (input)
Increment for the elements of x. incx must not be 0.
y Complex array of dimension 1+(n–1)·|incy|. (input)
Contains vector y.
incy Integer. (input)
Increment for the elements of y. Argument incy must not be 0.
ap Complex array of dimension n(n+1)/2. (input and output)
Before entry with uplo = ’U’ or ’u’, array ap must contain the upper triangular part of the
Hermitian matrix packed sequentially, column-by-column, so that ap(1) contains A(1,1), ap(2)
contains A(1,2), ap(3) contains A(2,2), and so on. On exit, the upper triangular part of the
updated matrix overwrites array ap.
Before entry with uplo = ’L’ or ’l’, array ap must contain the lower triangular part of the
Hermitian matrix packed sequentially, column-by-column, so that ap(1) contains A(1,1), ap(2)
contains A(2,1), ap(3) contains A(3,1), and so on. On exit, the lower triangular part of the
updated matrix overwrites array ap.
The imaginary parts of the diagonal elements need not be set. They are assumed to be 0; on
exit, they are set to 0.
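The packed layout and the rank 2 update described above can be modelled in a few lines of Python. This is only a reference sketch of the arithmetic, not the library's implementation: the name `chpr2_upper` is hypothetical, it assumes uplo = 'U' and unit increments, and it uses 0-based indices where the manual uses 1-based ones.

```python
def chpr2_upper(n, alpha, x, y, ap):
    """Hypothetical model of A <- alpha*x*y^H + conj(alpha)*y*x^H + A.

    ap holds the upper triangle of A packed column by column (0-based).
    """
    k = 0
    for j in range(n):              # column j of the upper triangle
        for i in range(j + 1):      # rows 0..j of that column
            ap[k] += (alpha * x[i] * y[j].conjugate()
                      + alpha.conjugate() * y[i] * x[j].conjugate())
            if i == j:
                # Diagonal of a Hermitian matrix is real: force imaginary part to 0.
                ap[k] = complex(ap[k].real, 0.0)
            k += 1
    return ap


ap = chpr2_upper(2, 1j, [1 + 0j, 2 + 0j], [0 + 1j, 1 + 0j], [0j, 0j, 0j])
```

Here `ap[0]`, `ap[1]`, and `ap[2]` correspond to A(1,1), A(1,2), and A(2,2) in the manual's numbering.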

NOTES
CHPR2 is a Level 2 Basic Linear Algebra Subprogram (Level 2 BLAS).
When working backward (incx < 0 or incy < 0), this routine starts at the end of the vector and moves
backward, as follows:
x(1-incx*(n-1)), x(1-incx*(n-2)), ..., x(1)
y(1-incy*(n-1)), y(1-incy*(n-2)), ..., y(1)

SEE ALSO
SSPR(3S)

SGBMV ( 3S ) SGBMV ( 3S )

NAME
SGBMV, CGBMV – Multiplies a real or complex vector by a real or complex general band matrix

SYNOPSIS
CALL SGBMV (trans, m, n, kl, ku, alpha, a, lda, x, incx, beta, y, incy)
CALL CGBMV (trans, m, n, kl, ku, alpha, a, lda, x, incx, beta, y, incy)

IMPLEMENTATION
UNICOS and UNICOS/mk systems
On UNICOS/mk systems, this subroutine executes on a single processor and uses private data.

DESCRIPTION
SGBMV multiplies a real vector by a real general band matrix.
CGBMV multiplies a complex vector by a complex general band matrix.
SGBMV and CGBMV perform one of the following matrix-vector operations:
y ← α A x + β y
y ← α A^T x + β y
y ← α A^H x + β y
where
• α and β are scalars
• x and y are vectors
• A is an m-by-n band matrix with kl subdiagonals and ku superdiagonals
• A^T is the transpose of A
• A^H is the conjugate transpose of A
These routines have the following arguments:
trans Character*1. (input)
Specifies the operation to be performed:
trans = ’N’ or ’n’: y ← α A x + β y
trans = ’T’ or ’t’: y ← α A^T x + β y
trans = ’C’ or ’c’: y ← α A^T x + β y (SGBMV), or y ← α A^H x + β y (CGBMV)
m Integer. (input)
Specifies the number of rows in matrix A. m ≥ 0.
n Integer. (input)
Specifies the number of columns in the matrix A. n ≥ 0.
kl Integer. (input)
Specifies the number of subdiagonals of matrix A. kl ≥ 0.
ku Integer. (input)
Specifies the number of superdiagonals of matrix A. ku ≥ 0.
alpha SGBMV: Real. (input)
CGBMV: Complex. (input)
Scalar factor α.
a SGBMV: Real array of dimension (lda,n). (input)
CGBMV: Complex array of dimension (lda,n). (input)
Before entry, the leading (kl+ku+1)-by-n part of array a must contain the matrix of coefficients,
supplied column-by-column, with the leading diagonal of the matrix in row (ku+1) of the array,
the first superdiagonal starting at position 2 in row ku, the first subdiagonal starting at position 1
in row (ku+2), and so on. Elements in array a that do not correspond to elements in the band
matrix (such as the top left ku-by-ku triangle) are not referenced.
See the NOTES section for an example of Fortran code that transfers a band matrix from
conventional full matrix storage to band storage.
lda Integer. (input)
Specifies the first dimension of a as declared in the calling program. lda ≥ (kl+ku+1).
x SGBMV: Real array of dimension 1+(kx-1)*|incx|. (input)
CGBMV: Complex array of dimension 1+(kx-1)*|incx|. (input)
Contains the vector x. When trans = ’N’ or ’n’, kx is n; otherwise, it is m.
incx Integer. (input)
Specifies the increment for the elements of x. incx must not be 0.
beta SGBMV: Real. (input)
CGBMV: Complex. (input)
Scalar factor β. When beta is supplied as 0, y need not be set on input.
y SGBMV: Real array of dimension 1+(ky-1)*|incy|. (input and output)
CGBMV: Complex array of dimension 1+(ky-1)*|incy|. (input and output)
Contains the vector y. When trans = ’N’ or ’n’, ky is m; otherwise, it is n. On exit, the updated
vector overwrites array y.
incy Integer. (input)
Specifies the increment for the elements of y. incy must not be 0.

NOTES
The following program segment transfers a band matrix from conventional full matrix storage to band
storage:
      DO 20, J = 1, N
         K = KU + 1 - J
         DO 10, I = MAX (1, J - KU), MIN (M, J + KL)
            A(K + I, J) = MATRIX (I, J)
   10    CONTINUE
   20 CONTINUE
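The same transfer can be sketched in Python to make the index arithmetic easier to check. This is an illustrative model, not the library's code: `to_band` and `gbmv_notrans` are hypothetical names, indices are 0-based, and only the trans = 'N' case with unit increments is covered.

```python
def to_band(matrix, m, n, kl, ku):
    """Copy an m-by-n full matrix into (kl+ku+1)-by-n band storage."""
    band = [[0.0] * n for _ in range(kl + ku + 1)]
    for j in range(n):
        k = ku - j                          # row offset for column j (0-based)
        for i in range(max(0, j - ku), min(m, j + kl + 1)):
            band[k + i][j] = matrix[i][j]
    return band


def gbmv_notrans(m, n, kl, ku, alpha, band, x, beta, y):
    """Hypothetical model of y <- alpha*A*x + beta*y using band storage."""
    out = [beta * yi for yi in y]
    for j in range(n):
        k = ku - j
        for i in range(max(0, j - ku), min(m, j + kl + 1)):
            out[i] += alpha * band[k + i][j] * x[j]
    return out


full = [[1.0, 2.0, 0.0],
        [3.0, 4.0, 5.0],
        [0.0, 6.0, 7.0]]
band = to_band(full, 3, 3, 1, 1)
y = gbmv_notrans(3, 3, 1, 1, 1.0, band, [1.0, 1.0, 1.0], 0.0, [0.0, 0.0, 0.0])
```

With kl = ku = 1, row 0 of `band` holds the superdiagonal, row 1 the main diagonal, and row 2 the subdiagonal, which matches the layout described for argument a.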

SGBMV and CGBMV are Level 2 Basic Linear Algebra Subprograms (Level 2 BLAS).
When working backward (incx < 0 or incy < 0), each routine starts at the end of the vector and moves
backward, as follows:
x(1-incx*(kx-1)), x(1-incx*(kx-2)), ..., x(1)
y(1-incy*(ky-1)), y(1-incy*(ky-2)), ..., y(1)
where kx and ky are the vector lengths given in the descriptions of x and y.

SGEMV ( 3S ) SGEMV ( 3S )

NAME
SGEMV, CGEMV – Multiplies a real or complex vector by a real or complex general matrix

SYNOPSIS
CALL SGEMV (trans, m, n, alpha, a, lda, x, incx, beta, y, incy)
CALL CGEMV (trans, m, n, alpha, a, lda, x, incx, beta, y, incy)

IMPLEMENTATION
UNICOS and UNICOS/mk systems
On UNICOS/mk systems, this subroutine executes on a single processor and uses private data.

DESCRIPTION
SGEMV multiplies a real vector by a real general matrix.
CGEMV multiplies a complex vector by a complex general matrix.
SGEMV and CGEMV perform one of the following matrix-vector operations:
y ← α A x + β y
y ← α A^T x + β y
y ← α A^H x + β y
where
• α and β are scalars
• x and y are vectors
• A is an m-by-n general matrix
• A^T is the transpose of A
• A^H is the conjugate transpose of A
These routines have the following arguments:
trans Character*1. (input)
Specifies the operation to be performed:
trans = ’N’ or ’n’: y ← α A x + β y
trans = ’T’ or ’t’: y ← α A^T x + β y
trans = ’C’ or ’c’: y ← α A^T x + β y (SGEMV), or y ← α A^H x + β y (CGEMV)
m Integer. (input)
Specifies the number of rows in matrix A. m ≥ 0.
n Integer. (input)
Specifies the number of columns in matrix A. n ≥ 0.
alpha SGEMV: Real. (input)
CGEMV: Complex. (input)
Scalar factor α.
a SGEMV: Real array of dimension (lda,n). (input)
CGEMV: Complex array of dimension (lda,n). (input)
Before entry, the leading m-by-n part of array a must contain the matrix of coefficients.
lda Integer. (input)
Specifies the first dimension of a as declared in the calling program. lda ≥ MAX(1,m).
x SGEMV: Real array of dimension 1+(kx-1)*|incx|. (input)
CGEMV: Complex array of dimension 1+(kx-1)*|incx|. (input)
Contains the vector x. When trans = ’N’ or ’n’, kx is n; otherwise, it is m.
incx Integer. (input)
Specifies the increment for the elements of x. incx must not be 0.
beta SGEMV: Real. (input)
CGEMV: Complex. (input)
Scalar factor β. When beta is supplied as 0, y need not be set on input.
y SGEMV: Real array of dimension 1+(ky-1)*|incy|. (input and output)
CGEMV: Complex array of dimension 1+(ky-1)*|incy|. (input and output)
Contains the vector y. When trans = ’N’ or ’n’, ky is m; otherwise, it is n. When beta is
supplied as 0, y need not be set on input. On exit, the updated vector overwrites array y.
incy Integer. (input)
Specifies the increment for the elements of y. incy must not be 0.

NOTES
SGEMV and CGEMV are Level 2 Basic Linear Algebra Subprograms (Level 2 BLAS).
When working backward (incx < 0 or incy < 0), each routine starts at the end of the vector and moves
backward, as follows:
x(1-incx*(kx-1)), x(1-incx*(kx-2)), ..., x(1)
y(1-incy*(ky-1)), y(1-incy*(ky-2)), ..., y(1)
where kx and ky are the vector lengths given in the descriptions of x and y.
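The effect of the increment arguments, including the backward traversal, can be illustrated with a small Python model. This is a sketch under stated assumptions rather than the library's implementation: `gemv` is a hypothetical name, only the real 'N' and 'T' cases are handled, the matrix is a list of rows, and indices are 0-based.

```python
def gemv(trans, m, n, alpha, a, x, incx, beta, y, incy):
    """Hypothetical model of y <- alpha*op(A)*x + beta*y with strided vectors."""
    notrans = trans in ('N', 'n')
    lenx, leny = (n, m) if notrans else (m, n)
    # With a negative increment, logical element 1 sits at the far end of the array.
    kx = 0 if incx > 0 else (lenx - 1) * -incx
    ky = 0 if incy > 0 else (leny - 1) * -incy
    for i in range(leny):
        s = 0.0
        for j in range(lenx):
            aij = a[i][j] if notrans else a[j][i]
            s += aij * x[kx + j * incx]
        y[ky + i * incy] = alpha * s + beta * y[ky + i * incy]
    return y


a = [[1.0, 2.0],
     [3.0, 4.0]]
forward = gemv('N', 2, 2, 1.0, a, [1.0, 10.0], 1, 0.0, [0.0, 0.0], 1)
backward = gemv('N', 2, 2, 1.0, a, [10.0, 1.0], -1, 0.0, [0.0, 0.0], 1)
```

The second call stores x in reverse order and passes incx = -1, so both calls multiply by the same logical vector and produce the same result.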

SGER ( 3S ) SGER ( 3S )

NAME
SGER, CGERC, CGERU – Performs rank 1 update of a real or complex general matrix

SYNOPSIS
CALL SGER (m, n, alpha, x, incx, y, incy, a, lda)
CALL CGERC (m, n, alpha, x, incx, y, incy, a, lda)
CALL CGERU (m, n, alpha, x, incx, y, incy, a, lda)

IMPLEMENTATION
UNICOS and UNICOS/mk systems
On UNICOS/mk systems, this subroutine executes on a single processor and uses only private data.

DESCRIPTION
SGER performs a rank 1 update of a real general matrix.
CGERC performs a conjugated rank 1 update of a complex general matrix.
CGERU performs an unconjugated rank 1 update of a complex general matrix.
SGER and CGERU perform the rank 1 operation:
A ← α x y^T + A
where y^T is the transpose of y, α is a scalar, x is an m-element vector, y is an n-element vector, and A is an
m-by-n matrix.
CGERC performs the rank 1 operation:
A ← α x y^H + A
where y^H is the conjugate transpose of y, α is a scalar, x is an m-element vector, y is an n-element vector,
and A is an m-by-n matrix.
These routines have the following arguments:
m Integer. (input)
Specifies the number of rows in matrix A. m ≥ 0.
n Integer. (input)
Specifies the number of columns in matrix A. n ≥ 0.
alpha SGER: Real. (input)
CGERC, CGERU: Complex. (input)
Scalar factor α.
x SGER: Real. (input)
CGERC, CGERU: Complex. (input)
Array of dimension 1+(m-1)*|incx|.
Contains vector x.
incx Integer. (input)
Specifies the increment for the elements of x. incx must not be 0.
y SGER: Real. (input)
CGERC, CGERU: Complex. (input)
Array of dimension 1+(n-1)*|incy|.
Contains vector y.
incy Integer. (input)
Specifies the increment for the elements of y. incy must not be 0.
a SGER: Real.
CGERC, CGERU: Complex.
Array of dimension (lda,n). (input and output)
Before entry, the leading m-by-n part of array a must contain the matrix of coefficients. On exit,
the updated matrix overwrites array a.
lda Integer. (input)
Specifies the first dimension of a as declared in the calling program. lda ≥ MAX(1,m).

NOTES
SGER, CGERC, and CGERU are Level 2 Basic Linear Algebra Subprograms (Level 2 BLAS).
When working backward (incx < 0 or incy < 0), these routines start at the end of the vector and move
backward, as follows:
x(1-incx*(m-1)), x(1-incx*(m-2)), ..., x(1)
y(1-incy*(n-1)), y(1-incy*(n-2)), ..., y(1)
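The difference between the unconjugated and conjugated updates is easiest to see on a small complex example. The Python sketch below is a reference model only: `ger` and its `conjugate_y` flag are hypothetical, and unit increments and 0-based indices are assumed.

```python
def ger(m, n, alpha, x, y, a, conjugate_y=False):
    """Hypothetical model of a rank 1 update with unit increments.

    conjugate_y=False models SGER/CGERU (A <- alpha*x*y^T + A);
    conjugate_y=True models CGERC (A <- alpha*x*y^H + A).
    """
    for i in range(m):
        for j in range(n):
            yj = y[j].conjugate() if conjugate_y else y[j]
            a[i][j] += alpha * x[i] * yj
    return a


def zeros():
    return [[0j, 0j], [0j, 0j]]


x = [1 + 0j, 0 + 1j]
y = [0 + 1j, 2 + 0j]
unconj = ger(2, 2, 1 + 0j, x, y, zeros())                  # CGERU-style
conj = ger(2, 2, 1 + 0j, x, y, zeros(), conjugate_y=True)  # CGERC-style
```

Only the entries that involve the imaginary component of y differ between the two results; for real data the two updates coincide.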

SGESUM ( 3S ) SGESUM ( 3S )

NAME
SGESUM, CGESUM – Adds a scalar multiple of a real or complex matrix to a scalar multiple of another real
or complex matrix

SYNOPSIS
CALL SGESUM (trans, m, n, alpha, a, lda, beta, b, ldb)
CALL CGESUM (trans, m, n, alpha, a, lda, beta, b, ldb)

IMPLEMENTATION
UNICOS/mk systems
These subroutines execute on a single processor and use private data only.

DESCRIPTION
SGESUM adds two real matrices with optional scaling; CGESUM adds two complex matrices.
B ← α op(A) + β B
where
• op(A) represents A, its transpose A^T, or its conjugate transpose A^H
• op(A) and B are m-by-n matrices
• α and β are scalars. β = 0 is a special case, used to copy α op(A) to B. α = 0 is a special case, used to
scale B.
These routines have the following arguments:
trans Character*1. (input)
Specifies whether the matrix A is transposed.
trans = ’N’ or ’n’: op(A) = A
trans = ’T’ or ’t’: op(A) = A^T
trans = ’C’ or ’c’: op(A) = A^T (SGESUM), or op(A) = A^H (CGESUM)
m Integer. (input)
Specifies the number of rows in matrix op(A) and in matrix B.
n Integer. (input)
Specifies the number of columns in matrix op(A) and in matrix B.
alpha SGESUM: Real. (input)
CGESUM: Complex. (input)
Scalar factor α.
a SGESUM: Real array of dimension (lda,k). (input)
CGESUM: Complex array of dimension (lda,k). (input)
When trans = ’N’ or ’n’, k is n; otherwise, it is m. When trans = ’N’ or ’n’, the leading m-by-n
part of the array a contains matrix A. When trans = ’T’ or ’t’ or trans = ’C’ or ’c’, the leading
n-by-m part of the array a contains matrix A, whose transpose or conjugate transpose will be
used in the matrix sum. If alpha = 0, a need not be specified on entry.
lda Integer. (input)
Specifies the first dimension of a as declared in the calling program. When trans = ’N’ or ’n’,
lda ≥ MAX(1,m); otherwise, lda ≥ MAX(1,n).
beta SGESUM: Real. (input)
CGESUM: Complex. (input)
Scalar factor β.
b SGESUM: Real array of dimension (ldb,n). (input/output)
CGESUM: Complex array of dimension (ldb,n). (input/output)
On entry, if beta ≠ 0, the m-by-n matrix b contains B. (If beta = 0, b need not be specified on
entry.) On exit, b is overwritten with the matrix sum α op(A) + β B.
ldb Integer. (input)
The leading dimension of array b. ldb ≥ MAX(1,m).

EXAMPLES
An important use of SGESUM is to copy an array to another array, in which the second array may be a
temporary workspace that has a better data layout than the first array. For example, suppose array A was
declared as follows in the main program:
      REAL A(1024, 1024)

This data layout is particularly bad for Level 3 BLAS operations that must fit a block of A in the
direct-mapped cache of UNICOS/mk systems, because every column has exactly the same cache offset as every
other column. It might be worthwhile to operate on a block of A at a time by copying a part of A into a
second array B. Give B a leading dimension that is an odd multiple of 16, so that a 16-by-64 subblock of B
will fit in the cache:
      REAL B(80, 64)
CDIR$ CACHE_ALIGN B

The following call to SGESUM copies a 64-by-64 block of A to B:
      CALL SGESUM ('N', 64, 64, 1.0, A(1,1), 1024, 0.0, B(1,1), 80)

Similarly, the following call copies a 64-by-64 block of the transpose of A to B:
      CALL SGESUM ('T', 64, 64, 1.0, A(1,1), 1024, 0.0, B(1,1), 80)
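The copy and scale special cases can be checked against a small Python model of the operation B ← α op(A) + β B. This is an illustrative sketch for real data, not the library's code: `gesum` is a hypothetical name, matrices are lists of rows, and the leading-dimension arguments are omitted because Python lists carry their own shape.

```python
def gesum(trans, m, n, alpha, a, beta, b):
    """Hypothetical model of B <- alpha*op(A) + beta*B for real matrices.

    b is m-by-n; a is m-by-n when trans is 'N', otherwise n-by-m.
    """
    notrans = trans in ('N', 'n')
    for i in range(m):
        for j in range(n):
            aij = a[i][j] if notrans else a[j][i]
            # beta == 0 is treated as a pure copy, so the old contents of b are ignored.
            old = beta * b[i][j] if beta != 0.0 else 0.0
            b[i][j] = alpha * aij + old
    return b


a = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]]
# Transposed copy into an uninitialized 3-by-2 workspace (beta = 0).
copy_t = gesum('T', 3, 2, 1.0, a, 0.0, [[None] * 2 for _ in range(3)])
# General case: B <- 2*A + B.
summed = gesum('N', 2, 3, 2.0, a, 1.0, [[1.0] * 3 for _ in range(2)])
```

The first call mirrors the transposed-copy usage shown in the Fortran example, on a much smaller array.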

SEE ALSO
SAXPBY(3S)

SSBMV ( 3S ) SSBMV ( 3S )

NAME
SSBMV – Multiplies a real vector by a real symmetric band matrix

SYNOPSIS
CALL SSBMV (uplo, n, k, alpha, a, lda, x, incx, beta, y, incy)

IMPLEMENTATION
UNICOS and UNICOS/mk systems
On UNICOS/mk systems, this subroutine executes on a single processor and uses only private data.

DESCRIPTION
SSBMV performs the following matrix-vector operation:
y ← α Ax + β y
where α and β are scalars, x and y are n-element vectors, and A is an n-by-n symmetric band matrix.
This routine has the following arguments:
uplo Character*1. (input)
Specifies whether the upper or lower triangular part of band matrix A is supplied, as follows:
uplo= ’U’ or ’u’: the upper triangular part of A is being supplied.
uplo= ’L’ or ’l’: the lower triangular part of A is being supplied.
n Integer. (input)
Specifies the order of matrix A. n ≥ 0.
k Integer. (input)
Specifies the number of superdiagonals of matrix A. k ≥ 0.
alpha Real. (input)
Scalar factor α.
a Real array of dimension (lda,n). (input)
Before entry with uplo = ’U’ or ’u’, the leading (k+1)-by-n part of array a must contain the
upper triangular band part of the symmetric matrix, supplied column-by-column, with the leading
diagonal of the matrix in row (k+1) of the array, the first superdiagonal starting at position 2 in
row k, and so on. The top left k-by-k triangle of array a is not referenced.
Before entry with uplo = ’L’ or ’l’, the leading (k+1)-by-n part of array a must contain the lower
triangular band part of the symmetric matrix, supplied column-by-column, with the leading
diagonal of the matrix in row 1 of the array, the first subdiagonal starting at position 1 in row 2,
and so on. The bottom right k-by-k triangle of array a is not referenced.
See the NOTES section for examples of Fortran code that transfer upper and lower parts of
symmetric band matrices from conventional full matrix storage to band storage.
lda Integer. (input)
Specifies the first dimension of a as declared in the calling program. lda ≥ (k+1).
x Real array of dimension 1+(n-1)*|incx|. (input)
Contains vector x.
incx Integer. (input)
Specifies the increment for the elements of x. incx must not be 0.
beta Real. (input)
Scalar factor β.
y Real array of dimension 1+(n-1)*|incy|. (input and output)
Contains vector y. On exit, the updated vector overwrites array y.
incy Integer. (input)
Specifies the increment for the elements of y. incy must not be 0.

NOTES
The following program segment transfers the upper triangular part of a symmetric band matrix from
conventional full matrix storage to band storage:
      DO 20, J = 1, N
         M = K + 1 - J
         DO 10, I = MAX (1, J - K), J
            A(M + I, J) = MATRIX (I, J)
   10    CONTINUE
   20 CONTINUE

The following program segment transfers the lower triangular part of a symmetric band matrix from
conventional full matrix storage to band storage:
      DO 20, J = 1, N
         M = 1 - J
         DO 10, I = J, MIN (N, J + K)
            A(M + I, J) = MATRIX (I, J)
   10    CONTINUE
   20 CONTINUE
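Both transfers can be mirrored in Python to confirm where each diagonal lands in band storage. The functions below are hypothetical illustrations with 0-based indices; they are not part of the library.

```python
def sym_to_band_upper(matrix, n, k):
    """Upper triangle of an n-by-n symmetric band matrix -> (k+1)-by-n band storage."""
    band = [[0.0] * n for _ in range(k + 1)]
    for j in range(n):
        m = k - j                           # row offset for column j (0-based)
        for i in range(max(0, j - k), j + 1):
            band[m + i][j] = matrix[i][j]
    return band


def sym_to_band_lower(matrix, n, k):
    """Lower triangle of an n-by-n symmetric band matrix -> (k+1)-by-n band storage."""
    band = [[0.0] * n for _ in range(k + 1)]
    for j in range(n):
        m = -j
        for i in range(j, min(n, j + k + 1)):
            band[m + i][j] = matrix[i][j]
    return band


sym = [[1.0, 2.0, 0.0],
       [2.0, 3.0, 4.0],
       [0.0, 4.0, 5.0]]
upper = sym_to_band_upper(sym, 3, 1)
lower = sym_to_band_lower(sym, 3, 1)
```

For k = 1 the main diagonal ends up in the last row of `upper` and in the first row of `lower`, as the description of argument a states; the unused corner entries are left at 0.0 here only for readability.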

SSBMV is a Level 2 Basic Linear Algebra Subprogram (Level 2 BLAS).
When working backward (incx < 0 or incy < 0), this routine starts at the end of the vector and moves
backward, as follows:
x(1-incx*(n-1)), x(1-incx*(n-2)), ..., x(1)
y(1-incy*(n-1)), y(1-incy*(n-2)), ..., y(1)

SSPMV ( 3S ) SSPMV ( 3S )

NAME
SSPMV, CSPMV – Multiplies a real or complex symmetric packed matrix by a real or complex vector

SYNOPSIS
CALL SSPMV (uplo, n, alpha, ap, x, incx, beta, y, incy)
CALL CSPMV (uplo, n, alpha, ap, x, incx, beta, y, incy)

IMPLEMENTATION
UNICOS and UNICOS/mk systems
Only the real routine executes on UNICOS/mk systems (on a single processor, using only private data).

DESCRIPTION
SSPMV and CSPMV perform the following matrix-vector operation:
y ← α Ax + β y
where α and β are scalars, x and y are n-element vectors, and A is an n-by-n symmetric packed matrix.
These routines have the following arguments:
uplo Character*1. (input)
Specifies whether the upper or lower triangular part of matrix A is packed into the array
argument ap, as follows:
uplo= ’U’ or ’u’: the upper triangular part of A is being supplied in the argument ap.
uplo= ’L’ or ’l’: the lower triangular part of A is being supplied in the argument ap.
n Integer. (input)
Specifies the order of matrix A. n ≥ 0.
alpha SSPMV: Real. (input)
CSPMV: Complex. (input)
Scalar factor α.
ap SSPMV: Real array of dimension n(n+1)/2. (input)
CSPMV: Complex array of dimension n(n+1)/2. (input)
Before entry with uplo = ’U’ or ’u’, array ap must contain the upper triangular part of the
symmetric matrix packed sequentially, column-by-column, so that ap(1) contains A(1,1), ap(2)
contains A(1,2), ap(3) contains A(2,2), and so on.
Before entry with uplo = ’L’ or ’l’, array ap must contain the lower triangular part of the
symmetric matrix packed sequentially, column-by-column, so that ap(1) contains A(1,1), ap(2)
contains A(2,1), ap(3) contains A(3,1), and so on.
x SSPMV: Real array of dimension 1+(n-1)*|incx|. (input)
CSPMV: Complex array of dimension 1+(n-1)*|incx|. (input)
Contains vector x.
incx Integer. (input)
Specifies the increment for the elements of x. incx must not be 0.
beta SSPMV: Real. (input)
CSPMV: Complex. (input)
Scalar factor β. If beta is supplied as 0, y need not be set on input.
y SSPMV: Real array of dimension 1+(n-1)*|incy|. (input and output)
CSPMV: Complex array of dimension 1+(n-1)*|incy|. (input and output)
Contains vector y. On exit, the updated vector overwrites array y.
incy Integer. (input)
Specifies the increment for the elements of y. incy must not be 0.

NOTES
SSPMV is a Level 2 Basic Linear Algebra Subprogram (Level 2 BLAS). CSPMV is an extension to Level 2
BLAS.
When working backward (incx < 0 or incy < 0), each routine starts at the end of the vector and moves
backward, as follows:
x(1-incx*(n-1)), x(1-incx*(n-2)), ..., x(1)
y(1-incy*(n-1)), y(1-incy*(n-2)), ..., y(1)
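A short Python model shows how a symmetric matrix-vector product can be formed from packed upper storage alone. This is a sketch for the real case with uplo = 'U' and unit increments; `pack_upper` and `spmv_upper` are hypothetical names and indices are 0-based.

```python
def pack_upper(a, n):
    """Pack the upper triangle column by column: A(1,1), A(1,2), A(2,2), ..."""
    return [a[i][j] for j in range(n) for i in range(j + 1)]


def spmv_upper(n, alpha, ap, x, beta, y):
    """Hypothetical model of y <- alpha*A*x + beta*y from packed upper storage."""
    out = [beta * yi for yi in y]
    k = 0
    for j in range(n):
        for i in range(j + 1):
            out[i] += alpha * ap[k] * x[j]
            if i != j:
                # Each stored off-diagonal element also stands for its mirror image.
                out[j] += alpha * ap[k] * x[i]
            k += 1
    return out


a = [[1.0, 2.0, 3.0],
     [2.0, 4.0, 5.0],
     [3.0, 5.0, 6.0]]
ap = pack_upper(a, 3)
y = spmv_upper(3, 1.0, ap, [1.0, 1.0, 1.0], 0.0, [0.0, 0.0, 0.0])
```

Multiplying by a vector of ones returns the row sums of the full symmetric matrix, which makes the result easy to verify by hand.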

SEE ALSO
CHPMV(3S)

SSPR ( 3S ) SSPR ( 3S )

NAME
SSPR, CSPR – Performs symmetric rank 1 update of a real or complex symmetric packed matrix

SYNOPSIS
CALL SSPR (uplo, n, alpha, x, incx, ap)
CALL CSPR (uplo, n, alpha, x, incx, ap)

IMPLEMENTATION
UNICOS and UNICOS/mk systems
On UNICOS/mk systems, this subroutine executes on a single processor and uses only private data.

DESCRIPTION
SSPR and CSPR each perform the following symmetric rank 1 operation:
A ← α x x^T + A
where x^T is the transpose of x, α is a real or complex scalar, x is an n-element vector, and A is an n-by-n
symmetric packed matrix.
These routines have the following arguments:
uplo Character*1. (input)
Specifies whether the upper or lower triangular part of matrix A is packed into the array
argument ap, as follows:
uplo= ’U’ or ’u’: the upper triangular part of A is being supplied in the argument ap.
uplo= ’L’ or ’l’: the lower triangular part of A is being supplied in the argument ap.
n Integer. (input)
Specifies the order of matrix A. n ≥ 0.
alpha SSPR: Real. (input)
CSPR: Complex. (input)
Scalar factor α.
x SSPR: Real array of dimension 1+(n-1)*|incx|. (input)
CSPR: Complex array of dimension 1+(n-1)*|incx|. (input)
Contains vector x.
incx Integer. (input)
Specifies the increment for the elements of x. incx must not be 0.
ap SSPR: Real array of dimension n(n+1)/2. (input and output)
CSPR: Complex array of dimension n(n+1)/2. (input and output)
Before entry with uplo = ’U’ or ’u’, array ap must contain the upper triangular part of the
symmetric matrix packed sequentially, column-by-column, so that ap(1) contains A(1,1), ap(2)
contains A(1,2), ap(3) contains A(2,2), and so on. On exit, the upper triangular part of the
updated matrix overwrites array ap.
Before entry with uplo = ’L’ or ’l’, array ap must contain the lower triangular part of the
symmetric matrix packed sequentially, column-by-column, so that ap(1) contains A(1,1), ap(2)
contains A(2,1), ap(3) contains A(3,1), and so on. On exit, the lower triangular part of the
updated matrix overwrites array ap.

NOTES
SSPR is a Level 2 Basic Linear Algebra Subprogram (Level 2 BLAS). CSPR is an extension to Level 2
BLAS.
When working backward (incx < 0), each routine starts at the end of the vector and moves backward, as
follows:
x(1-incx*(n-1)), x(1-incx*(n-2)), ..., x(1)
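The lower-triangle packing order can be illustrated with a Python model of the rank 1 update. This is a reference sketch, not the library's implementation: `spr_lower` is a hypothetical name, and it assumes real data, uplo = 'L', a unit increment, and 0-based indices.

```python
def spr_lower(n, alpha, x, ap):
    """Hypothetical model of A <- alpha*x*x^T + A on packed lower storage.

    ap holds the lower triangle column by column: A(1,1), A(2,1), A(3,1), ...
    """
    k = 0
    for j in range(n):
        for i in range(j, n):       # rows j..n-1 of column j
            ap[k] += alpha * x[i] * x[j]
            k += 1
    return ap


ap = spr_lower(3, 2.0, [1.0, 2.0, 3.0], [0.0] * 6)
```

Starting from a zero matrix, the first three entries of `ap` form the first column of 2·x·x^T, followed by the two remaining entries of the second column and the last diagonal element.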

SEE ALSO
CHPR(3S)

SSPR2 ( 3S ) SSPR2 ( 3S )

NAME
SSPR2 – Performs symmetric rank 2 update of a real symmetric packed matrix

SYNOPSIS
CALL SSPR2 (uplo, n, alpha, x, incx, y, incy, ap)

IMPLEMENTATION
UNICOS and UNICOS/mk systems
On UNICOS/mk systems, this subroutine executes on a single processor and uses only private data.

DESCRIPTION
SSPR2 performs the following symmetric rank 2 operation:
A ← α x y^T + α y x^T + A
where x^T is the transpose of x, y^T is the transpose of y, α is a real scalar, x and y are n-element vectors,
and A is an n-by-n symmetric packed matrix.
This routine has the following arguments:
uplo Character*1. (input)
Specifies whether the upper or lower triangular part of matrix A is packed into the array
argument ap, as follows:
uplo= ’U’ or ’u’: the upper triangular part of A is being supplied in the argument ap.
uplo= ’L’ or ’l’: the lower triangular part of A is being supplied in the argument ap.
n Integer. (input)
Specifies the order of matrix A. n ≥ 0.
alpha Real. (input)
Scalar factor α.
x Real array of dimension 1+(n-1)*|incx|. (input)
Contains vector x.
incx Integer. (input)
Increment for the elements of x. incx must not be 0.
y Real array of dimension 1+(n-1)*|incy|. (input)
Contains vector y.
incy Integer. (input)
Increment for the elements of y. incy must not be 0.
ap Real array of dimension n(n+1)/2. (input and output)
Before entry with uplo = ’U’ or ’u’, array ap must contain the upper triangular part of the
symmetric matrix packed sequentially, column-by-column, so that ap(1) contains A(1,1), ap(2)
contains A(1,2), ap(3) contains A(2,2), and so on. On exit, the upper triangular part of the
updated matrix overwrites array ap.
Before entry with uplo = ’L’ or ’l’, array ap must contain the lower triangular part of the
symmetric matrix packed sequentially, column-by-column, so that ap(1) contains A(1,1), ap(2)
contains A(2,1), ap(3) contains A(3,1), and so on. On exit, the lower triangular part of the
updated matrix overwrites array ap.

NOTES
SSPR2 is a Level 2 Basic Linear Algebra Subprogram (Level 2 BLAS).
When working backward (incx < 0 or incy < 0), this routine starts at the end of the vector and moves
backward, as follows:
x(1-incx*(n-1)), x(1-incx*(n-2)), ..., x(1)
y(1-incy*(n-1)), y(1-incy*(n-2)), ..., y(1)
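The following Python sketch combines the rank 2 update with non-unit and negative increments, so that the array dimension 1+(n-1)*|inc| and the backward traversal can be seen together. It is a model under stated assumptions, not the library's code: `spr2_upper` is a hypothetical name, uplo = 'U' is assumed, and indices are 0-based.

```python
def spr2_upper(n, alpha, x, incx, y, incy, ap):
    """Hypothetical model of A <- alpha*x*y^T + alpha*y*x^T + A, packed upper storage."""
    # For a negative increment, logical element 1 is stored at the far end.
    kx = 0 if incx > 0 else (n - 1) * -incx
    ky = 0 if incy > 0 else (n - 1) * -incy
    k = 0
    for j in range(n):
        xj, yj = x[kx + j * incx], y[ky + j * incy]
        for i in range(j + 1):
            xi, yi = x[kx + i * incx], y[ky + i * incy]
            ap[k] += alpha * (xi * yj + yi * xj)
            k += 1
    return ap


# Logical x = (1, 2) stored with stride 2; logical y = (3, 4) stored backward.
strided = spr2_upper(2, 1.0, [1.0, 99.0, 2.0], 2, [4.0, 3.0], -1, [0.0] * 3)
plain = spr2_upper(2, 1.0, [1.0, 2.0], 1, [3.0, 4.0], 1, [0.0] * 3)
```

The value 99.0 is padding that the stride of 2 skips over, so both calls operate on the same logical vectors.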

SEE ALSO
CHPR2(3S)

SSPR12 ( 3S ) SSPR12 ( 3S )

NAME
SSPR12 – Performs two simultaneous symmetric rank 1 updates of a real symmetric packed matrix

SYNOPSIS
CALL SSPR12 (uplo, n, alpha, x, incx, beta, y, incy, ap)

IMPLEMENTATION
UNICOS systems

DESCRIPTION
SSPR12 performs the following matrix-vector operation:
A ← α x x^T + β y y^T + A
where x^T is the transpose of x, y^T is the transpose of y, α and β are real scalars, x and y are n-element
vectors, and A is an n-by-n real symmetric packed matrix.
This routine has the following arguments:
uplo Character*1. (input)
Specifies whether the upper or lower triangular part of matrix A is packed into the array
argument ap, as follows:
uplo= ’U’ or ’u’: the upper triangular part of A is being supplied in the argument ap.
uplo= ’L’ or ’l’: the lower triangular part of A is being supplied in the argument ap.
n Integer. (input)
Specifies the order of matrix A. n ≥ 0.
alpha Real. (input)
Scalar factor α.
x Real array of dimension 1+(n-1)*|incx|. (input)
Contains vector x.
incx Integer. (input)
Increment for the elements of x. incx must not be 0.
beta Real. (input)
Scalar factor β.
y Real array of dimension 1+(n-1)*|incy|. (input)
Contains vector y.
incy Integer. (input)
Increment for the elements of y. incy must not be 0.
ap Real array of dimension n(n+1)/2. (input and output)
Before entry with uplo = ’U’ or ’u’, array ap must contain the upper triangular part of the
symmetric matrix packed sequentially, column-by-column, so that ap(1) contains A(1,1), ap(2)
contains A(1,2), ap(3) contains A(2,2), and so on. On exit, the upper triangular part of the
updated matrix overwrites array ap.
Before entry with uplo = ’L’ or ’l’, array ap must contain the lower triangular part of the
symmetric matrix packed sequentially, column-by-column, so that ap(1) contains A(1,1), ap(2)
contains A(2,1), ap(3) contains A(3,1), and so on. On exit, the lower triangular part of the
updated matrix overwrites array ap.

NOTES
SSPR12 is an extension to the standard Level 2 Basic Linear Algebra Subprograms (Level 2 BLAS). It is
similar in function to the Level 2 BLAS routine SSPR2(3S) and is equivalent to two rank 1 updates.
For example,
      CALL SSPR12(UPLO,N,ALPHA,X,INCX,BETA,Y,INCY,AP)

is equivalent to:
      CALL SSPR(UPLO,N,ALPHA,X,INCX,AP)
      CALL SSPR(UPLO,N,BETA,Y,INCY,AP)

When working backward (incx < 0 or incy < 0), this routine starts at the end of the vector and moves
backward, as follows:
x(1-incx*(n-1)), x(1-incx*(n-2)), ..., x(1)
y(1-incy*(n-1)), y(1-incy*(n-2)), ..., y(1)
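The stated equivalence between one SSPR12 call and two SSPR calls can be checked with a Python model. The functions `spr_upper` and `spr12_upper` are hypothetical stand-ins that assume uplo = 'U', unit increments, and 0-based indices; they model the arithmetic only.

```python
def spr_upper(n, alpha, x, ap):
    """Hypothetical model of A <- alpha*x*x^T + A on packed upper storage."""
    k = 0
    for j in range(n):
        for i in range(j + 1):
            ap[k] += alpha * x[i] * x[j]
            k += 1
    return ap


def spr12_upper(n, alpha, x, beta, y, ap):
    """Hypothetical model of A <- alpha*x*x^T + beta*y*y^T + A in a single pass."""
    k = 0
    for j in range(n):
        for i in range(j + 1):
            ap[k] += alpha * x[i] * x[j] + beta * y[i] * y[j]
            k += 1
    return ap


x, y = [1.0, 2.0], [3.0, 4.0]
fused = spr12_upper(2, 2.0, x, -1.0, y, [1.0, 1.0, 1.0])
two_calls = spr_upper(2, -1.0, y, spr_upper(2, 2.0, x, [1.0, 1.0, 1.0]))
```

The two results agree exactly here because the example uses small integer values; with general floating-point data the fused and two-step forms may differ by rounding.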

SSYMV ( 3S ) SSYMV ( 3S )

NAME
SSYMV, CSYMV – Multiplies a real or complex vector by a real or complex symmetric matrix

SYNOPSIS
CALL SSYMV (uplo, n, alpha, a, lda, x, incx, beta, y, incy)
CALL CSYMV (uplo, n, alpha, a, lda, x, incx, beta, y, incy)

IMPLEMENTATION
UNICOS and UNICOS/mk systems
On UNICOS/mk systems, this subroutine executes on a single processor and uses only private data.

DESCRIPTION
SSYMV multiplies a real vector by a real symmetric matrix.
CSYMV multiplies a complex vector by a complex symmetric matrix.
SSYMV and CSYMV perform the following matrix-vector operation:
y ← α Ax + β y
where α and β are scalars, x and y are n-element vectors, and A is an n-by-n symmetric matrix.
These routines have the following arguments:
uplo Character*1. (input)
Specifies whether the upper or lower triangular part of matrix A is being supplied, as follows:
uplo= ’U’ or ’u’: only the upper triangular part of A is being supplied.
uplo= ’L’ or ’l’: only the lower triangular part of A is being supplied.
n Integer. (input)
Specifies the order of matrix A. n ≥ 0.
alpha SSYMV: Real. (input)
CSYMV: Complex. (input)
Scalar factor α.
a SSYMV: Real array of dimension (lda,n). (input)
CSYMV: Complex array of dimension (lda,n). (input)
Before entry with uplo = ’U’ or ’u’, the leading n-by-n upper triangular part of array a must
contain the upper triangular part of the symmetric matrix. The strictly lower triangular part of a
is not referenced.
Before entry with uplo = ’L’ or ’l’, the leading n-by-n lower triangular part of array a must
contain the lower triangular part of the symmetric matrix. The strictly upper triangular part of a
is not referenced.
lda Integer. (input)
Specifies the first dimension of a as declared in the calling program. lda ≥ MAX(1,n).
x SSYMV: Real array of dimension 1+(n-1)*|incx|. (input)
CSYMV: Complex array of dimension 1+(n-1)*|incx|. (input)
Contains the vector x.
incx Integer. (input)
Specifies the increment for the elements of x. incx must not be 0.
beta SSYMV: Real. (input)
CSYMV: Complex. (input)
Scalar factor β. If beta is supplied as 0, y need not be set on input.
y SSYMV: Real array of dimension 1+(n-1)*|incy|. (input and output)
CSYMV: Complex array of dimension 1+(n-1)*|incy|. (input and output)
Contains the vector y. On exit, the updated vector overwrites array y.
incy Integer. (input)
Specifies the increment for the elements of y. incy must not be 0.

NOTES
SSYMV is a Level 2 Basic Linear Algebra Subprogram (Level 2 BLAS). CSYMV is an extension to Level 2
BLAS.
When working backward (incx < 0 or incy < 0), each routine starts at the end of the vector and moves
backward, as follows:
x(1-incx*(n-1)), x(1-incx*(n-2)), ..., x(1)
y(1-incy*(n-1)), y(1-incy*(n-2)), ..., y(1)
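The statement that the opposite triangle of a is not referenced can be demonstrated with a Python model that deliberately leaves it unset. This is a sketch for the real case with uplo = 'U' and unit increments; `symv_upper` is a hypothetical name and indices are 0-based.

```python
def symv_upper(n, alpha, a, x, beta, y):
    """Hypothetical model of y <- alpha*A*x + beta*y reading only the upper triangle."""
    out = [beta * yi for yi in y]
    for j in range(n):
        for i in range(j + 1):
            out[i] += alpha * a[i][j] * x[j]
            if i != j:
                # Use symmetry instead of reading a[j][i].
                out[j] += alpha * a[i][j] * x[i]
    return out


nan = float('nan')
a = [[1.0, 2.0],
     [nan, 3.0]]        # strictly lower triangle deliberately left unset
y = symv_upper(2, 1.0, a, [1.0, 1.0], 0.0, [0.0, 0.0])
```

If the lower-triangle entry were read, the NaN would propagate into the result; the finite output shows that only the upper triangle is touched.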

SEE ALSO
CHEMV(3S)

SSYR ( 3S ) SSYR ( 3S )

NAME
SSYR, CSYR – Performs symmetric rank 1 update of a real or complex symmetric matrix

SYNOPSIS
CALL SSYR (uplo, n, alpha, x, incx, a, lda)
CALL CSYR (uplo, n, alpha, x, incx, a, lda)

IMPLEMENTATION
UNICOS and UNICOS/mk systems
Only the real routine executes on UNICOS/mk systems, on a single processor, using only private data.

DESCRIPTION
SSYR and CSYR perform the following symmetric rank 1 operation:
A ← α x x^T + A
where x^T is the transpose of x, α is a real or complex scalar, x is an n-element vector, and A is an n-by-n
symmetric matrix.
These routines have the following arguments:
uplo Character*1. (input)
Specifies whether the upper or lower triangular part of array a is referenced, as follows:
uplo= ’U’ or ’u’: only the upper triangular part of a is referenced.
uplo= ’L’ or ’l’: only the lower triangular part of a is referenced.
n Integer. (input)
Specifies the order of matrix A. n ≥ 0.
alpha SSYR: Real. (input)
CSYR: Complex. (input)
Scalar factor α.
x SSYR: Real array of dimension 1+(n-1)*|incx|. (input)
CSYR: Complex array of dimension 1+(n-1)*|incx|. (input)
Contains vector x.
incx Integer. (input)
Specifies the increment for the elements of x. incx must not be 0.
a SSYR: Real array of dimension (lda,n). (input and output)
CSYR: Complex array of dimension (lda,n). (input and output)
Before entry with uplo = ’U’ or ’u’, the leading n-by-n upper triangular part of array a must
contain the upper triangular part of the symmetric matrix. The strictly lower triangular part of a
is not referenced. On exit, the upper triangular part of the updated matrix overwrites the upper
triangular part of array a.
Before entry with uplo = ’L’ or ’l’, the leading n-by-n lower triangular part of array a must
contain the lower triangular part of the symmetric matrix. The strictly upper triangular part of a
is not referenced. On exit, the lower triangular part of the updated matrix overwrites the lower
triangular part of array a.
lda Integer. (input)
Specifies the first dimension of a as declared in the calling program. lda ≥ MAX(1,n).

NOTES
SSYR is a Level 2 Basic Linear Algebra Subprogram (Level 2 BLAS). CSYR is an extension to Level 2
BLAS.
When working backward (incx < 0), each routine starts at the end of the vector and moves backward, as
follows:
x(1-incx*(n-1)), x(1-incx*(n-2)), ..., x(1)
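A Python model of the rank 1 update on full storage shows that only the selected triangle of a is modified. This is an illustrative sketch, not the library's code: `syr_lower` is a hypothetical name, and it assumes real data, uplo = 'L', a unit increment, and 0-based indices.

```python
def syr_lower(n, alpha, x, a):
    """Hypothetical model of A <- alpha*x*x^T + A updating only the lower triangle."""
    for j in range(n):
        for i in range(j, n):       # rows on or below the diagonal
            a[i][j] += alpha * x[i] * x[j]
    return a


a = [[1.0, -9.0],      # -9.0 marks the strictly upper triangle, which is never touched
     [2.0, 3.0]]
syr_lower(2, 1.0, [1.0, 2.0], a)
```

After the call the marker value in the strictly upper triangle is unchanged, while the lower triangle holds the updated matrix.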

SEE ALSO
CHER(3S)

SSYR2 ( 3S ) SSYR2 ( 3S )

NAME
SSYR2 – Performs symmetric rank 2 update of a real symmetric matrix

SYNOPSIS
CALL SSYR2 (uplo, n, alpha, x, incx, y, incy, a, lda)

IMPLEMENTATION
UNICOS and UNICOS/mk systems
On UNICOS/mk systems, this subroutine executes on a single processor and uses only private data.

DESCRIPTION
SSYR2 performs the following symmetric rank 2 operation:
A ← α x y^T + α y x^T + A
where α is a real scalar, y^T is the transpose of y, x^T is the transpose of x, x and y are n-element vectors, and
A is an n-by-n real symmetric matrix.
This routine has the following arguments:
uplo Character*1. (input)
Specifies whether the upper or lower triangular part of matrix A is being supplied, as follows:
uplo= ’U’ or ’u’: only the upper triangular part of array a is referenced.
uplo= ’L’ or ’l’: only the lower triangular part of array a is referenced.
n Integer. (input)
Specifies the order of matrix A. n ≥ 0.
alpha Real. (input)
Scalar factor α.
x Real array of dimension 1+(n–1)·|incx|. (input)
Contains vector x.
incx Integer. (input)
On entry, incx specifies the increment for the elements of x. incx must not be 0.
y Real array of dimension 1+(n–1)·|incy|. (input)
Contains vector y.
incy Integer. (input)
On entry, incy specifies the increment for the elements of y. incy must not be 0.
a Real array of dimension (lda,n). (input and output)
Before entry with uplo=’U’ or ’u’, the leading n-by-n upper triangular part of array a must
contain the upper triangular part of the symmetric matrix and the strictly lower triangular part of
a is not referenced. On exit, the upper triangular part of the updated matrix overwrites the upper
triangular part of array a.
Before entry with uplo=’L’ or ’l’, the leading n-by-n lower triangular part of array a must
contain the lower triangular part of the symmetric matrix and the strictly upper triangular part of
a is not referenced. On exit, the lower triangular part of the updated matrix overwrites the lower
triangular part of array a.
lda Integer. (input)
Specifies the first dimension of a as declared in the calling program. lda ≥ MAX(1,n).

NOTES
SSYR2 is a Level 2 Basic Linear Algebra Subprogram (Level 2 BLAS).
When working backward (incx < 0 or incy < 0), this routine starts at the end of the vector and moves
backward, as follows:
x(1–incx·(n–1)), x(1–incx·(n–2)), ..., x(1)
y(1–incy·(n–1)), y(1–incy·(n–2)), ..., y(1)
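As an illustration of the calling sequence, the following free-form Fortran sketch applies SSYR2 to a 2-by-2 matrix. The program name, the test data, and the sentinel value are illustrative assumptions, and the sketch assumes it is linked against a library that provides SSYR2.

```fortran
program ssyr2_demo
  implicit none
  integer, parameter :: n = 2, lda = 2
  real :: a(lda, n), x(n), y(n)
  external ssyr2

  ! Start from A = 0; only the upper triangle is supplied and updated.
  a = 0.0
  ! Mark the strictly lower triangle with a sentinel to show it is untouched.
  a(2, 1) = -99.0
  x = (/ 1.0, 2.0 /)
  y = (/ 3.0, 4.0 /)

  ! A <- alpha*x*y**T + alpha*y*x**T + A, with alpha = 1 and unit increments.
  call ssyr2('U', n, 1.0, x, 1, y, 1, a, lda)

  ! Print the upper triangle, then the sentinel in the lower triangle.
  print *, a(1, 1), a(1, 2), a(2, 2), a(2, 1)
end program ssyr2_demo
```

Working the update by hand with α = 1, x = (1, 2), and y = (3, 4) gives an upper triangle of 6, 10, 16. Because uplo = ’U’, the sentinel stored in a(2,1) should be printed unchanged.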

SEE ALSO
CHER2(3S)


STBMV ( 3S ) STBMV ( 3S )

NAME
STBMV, CTBMV – Multiplies a real or complex vector by a real or complex triangular band matrix

SYNOPSIS
CALL STBMV (uplo, trans, diag, n, k, a, lda, x, incx)
CALL CTBMV (uplo, trans, diag, n, k, a, lda, x, incx)

IMPLEMENTATION
UNICOS and UNICOS/mk systems
On UNICOS/mk systems, this subroutine executes on a single processor and uses only private data.

DESCRIPTION
STBMV multiplies a real vector by a real triangular band matrix.
CTBMV multiplies a complex vector by a complex triangular band matrix.
STBMV and CTBMV perform one of the following matrix-vector operations:
x ← Ax
x ← A^T x
x ← A^H x (CTBMV only)
where A^T is the transpose of A, A^H is the conjugate transpose of A, x is an n-element vector, and A may be
either a unit or nonunit n-by-n upper or lower triangular band matrix with (k+1) diagonals.
These routines have the following arguments:
uplo Character*1. (input)
Specifies whether the matrix is upper or lower triangular, as follows:
uplo = ’U’ or ’u’: A is an upper triangular matrix.
uplo = ’L’ or ’l’: A is a lower triangular matrix.
trans Character*1. (input)
Specifies the operation to be performed, as follows:
trans = ’N’ or ’n’: x ← Ax
trans = ’T’ or ’t’: x ← A^T x
trans = ’C’ or ’c’: x ← A^T x (STBMV), or x ← A^H x (CTBMV)
diag Character*1. (input)
Specifies whether A is unit triangular, as follows:
diag = ’U’ or ’u’: A is assumed to be unit triangular.
diag = ’N’ or ’n’: A is not assumed to be unit triangular.
n Integer. (input)
Specifies the order of matrix A. n ≥ 0.
k Integer. (input)
uplo = ’U’ or ’u’: k specifies the number of superdiagonals of matrix A.
uplo = ’L’ or ’l’: k specifies the number of subdiagonals of matrix A.
k ≥ 0.
a STBMV: Real array of dimension (lda,n). (input)
CTBMV: Complex array of dimension (lda,n). (input)
Before entry with uplo = ’U’ or ’u’, the leading (k+1)-by-n upper part of array a must contain
the upper triangular band part of the matrix of coefficients, supplied column-by-column, with the
leading diagonal of the matrix in row (k+1) of the array, the first superdiagonal starting at
position 2 in row k, and so on. The top left k-by-k triangle of array a is not referenced.
Before entry with uplo = ’L’ or ’l’, the leading (k+1)-by-n part of array a must contain the lower
triangular band part of the matrix of coefficients, supplied column-by-column, with the leading
diagonal of the matrix in row 1 of the array, the first subdiagonal starting at position 1 in row 2,
and so on. The bottom right k-by-k triangle of array a is not referenced.
See the NOTES section for examples of Fortran code that transfer upper and lower triangular
band matrices from conventional full matrix storage to band storage.
When diag = ’U’ or ’u’, these routines assume that all elements of the array a that represent
diagonal elements of the matrix A are 1. In this case, neither of these routines will reference any
of the diagonal elements.
lda Integer. (input)
Specifies the first dimension of a as declared in the calling program. lda ≥ (k+1).
x STBMV: Real array of dimension 1+(n–1)·|incx|. (input and output)
CTBMV: Complex array of dimension 1+(n–1)·|incx|. (input and output)
Contains the vector x. On exit, the transformed vector overwrites array x.
incx Integer. (input)
Specifies the increment for the elements of x. incx must not be 0.

NOTES
The following program segment transfers an upper triangular band matrix from conventional full matrix
storage to band storage:
      DO 20, J = 1, N
         M = K + 1 - J
         DO 10, I = MAX( 1, J - K ), J
            A( M + I, J ) = MATRIX( I, J )
   10    CONTINUE
   20 CONTINUE

The following program segment transfers a lower triangular band matrix from conventional full matrix
storage to band storage:
      DO 20, J = 1, N
         M = 1 - J
         DO 10, I = J, MIN( N, J + K )
            A( M + I, J ) = MATRIX( I, J )
   10    CONTINUE
   20 CONTINUE

STBMV and CTBMV are Level 2 Basic Linear Algebra Subprograms (Level 2 BLAS).
When working backward (incx < 0), each routine starts at the end of the vector and moves backward, as
follows:
x(1–incx·(n–1)), x(1–incx·(n–2)), ..., x(1)
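The two band-storage transfer loops given in these NOTES can be exercised in a small self-contained program. The following free-form Fortran sketch is an illustration only: the program name, the 4-by-4 test matrix with k = 1, and the encoding of element (i,j) as 10*i + j are assumptions chosen to make the layout easy to read. It does not call any library routine.

```fortran
program band_pack_demo
  implicit none
  integer, parameter :: n = 4, k = 1, lda = k + 1
  real :: matrix(n, n), ub(lda, n), lb(lda, n)
  integer :: i, j, m

  ! Full-storage test matrix: element (i,j) is encoded as 10*i + j.
  do j = 1, n
    do i = 1, n
      matrix(i, j) = real(10*i + j)
    end do
  end do

  ! Unused corners of the band arrays stay at 0 so they are easy to spot.
  ub = 0.0
  lb = 0.0

  ! Upper triangular band: diagonal lands in row k+1, superdiagonal in row k.
  do j = 1, n
    m = k + 1 - j
    do i = max(1, j - k), j
      ub(m + i, j) = matrix(i, j)
    end do
  end do

  ! Lower triangular band: diagonal lands in row 1, subdiagonal in row 2.
  do j = 1, n
    m = 1 - j
    do i = j, min(n, j + k)
      lb(m + i, j) = matrix(i, j)
    end do
  end do

  print '(a)', 'upper band storage (rows of the band array):'
  do i = 1, lda
    print '(4f6.1)', (ub(i, j), j = 1, n)
  end do
  print '(a)', 'lower band storage (rows of the band array):'
  do i = 1, lda
    print '(4f6.1)', (lb(i, j), j = 1, n)
  end do
end program band_pack_demo
```

For this data, the upper band array should hold 0, 12, 23, 34 in row 1 and 11, 22, 33, 44 in row 2. The lower band array should hold 11, 22, 33, 44 in row 1 and 21, 32, 43, 0 in row 2. The zeros mark the corner elements that the routines do not reference.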


STBSV ( 3S ) STBSV ( 3S )

NAME
STBSV, CTBSV – Solves a real or complex triangular banded system of equations

SYNOPSIS
CALL STBSV (uplo, trans, diag, n, k, a, lda, x, incx)
CALL CTBSV (uplo, trans, diag, n, k, a, lda, x, incx)

IMPLEMENTATION
UNICOS and UNICOS/mk systems
On UNICOS/mk systems, this subroutine executes on a single processor and uses only private data.

DESCRIPTION
STBSV solves a real triangular banded system of equations.
CTBSV solves a complex triangular banded system of equations.
STBSV and CTBSV solve one of the following systems of equations, using the operation associated with
each:
Equations        Operation
Ax = b           x ← A^{-1} x
A^T x = b        x ← A^{-T} x
A^H x = b        x ← A^{-H} x (CTBSV only)
where
• b and x are n-element vectors
• A is either a unit or nonunit n-by-n upper or lower triangular band matrix with (k+1) diagonals
• A^{-1} is the inverse of A
• A^T is the transpose of A
• A^{-T} is the inverse of A^T
• A^H is the conjugate transpose of A
• A^{-H} is the inverse of A^H
On input, the right-hand side vector b is stored in the array argument x. On output, the solution vector x
overwrites b in the same array argument x.
These routines have the following arguments:
uplo Character*1. (input)
Specifies whether the matrix is an upper or lower triangular matrix, as follows:
uplo = ’U’ or ’u’: A is an upper triangular matrix.
uplo = ’L’ or ’l’: A is a lower triangular matrix.
trans Character*1. (input)
Specifies the operation to be performed, as follows:
trans = ’N’ or ’n’: x ← A^{-1} x
trans = ’T’ or ’t’: x ← A^{-T} x
trans = ’C’ or ’c’: x ← A^{-T} x (STBSV), or x ← A^{-H} x (CTBSV)
diag Character*1. (input)
Specifies whether A is unit triangular, as follows:
diag = ’U’ or ’u’: A is assumed to be unit triangular.
diag = ’N’ or ’n’: A is not assumed to be unit triangular.
n Integer. (input)
Specifies the order of matrix A. n ≥ 0.
k Integer. (input)
uplo = ’U’ or ’u’: k specifies the number of superdiagonals of matrix A.
uplo = ’L’ or ’l’: k specifies the number of subdiagonals of matrix A.
k ≥ 0.
a STBSV: Real array of dimension (lda,n). (input)
CTBSV: Complex array of dimension (lda,n). (input)
Before entry with uplo = ’U’ or ’u’, the leading (k+1)-by-n part of array a must
contain the upper triangular band part of the matrix of coefficients, supplied column-by-column,
with the leading diagonal of the matrix in row (k+1) of the array, the first superdiagonal starting
at position 2 in row k, and so on. The top left k-by-k triangle of array a is not referenced.
Before entry with uplo = ’L’ or ’l’, the leading (k+1)-by-n part of array a must contain the lower
triangular band part of the matrix of coefficients, supplied column-by-column, with the leading
diagonal of the matrix in row 1 of the array, the first subdiagonal starting at position 1 in row 2,
and so on. The bottom right k-by-k triangle of array a is not referenced.
When diag = ’U’ or ’u’, these routines assume that all elements of array a that represent
diagonal elements of the matrix A are 1. In this case, neither of these routines will reference any
of the diagonal elements.
lda Integer. (input)
Specifies the first dimension of a as declared in the calling program. lda ≥ (k+1).
x STBSV: Real array of dimension 1+(n–1)·|incx|. (input and output)
CTBSV: Complex array of dimension 1+(n–1)·|incx|. (input and output)
On input, x contains the right-hand side vector b. On output, the solution vector overwrites
array x.
incx Integer. (input)
Specifies the increment for the elements of x. incx must not be 0.

NOTES
The following program segment transfers an upper triangular band matrix from conventional full matrix
storage to band storage:
      DO 20, J = 1, N
         M = K + 1 - J
         DO 10, I = MAX( 1, J - K ), J
            A( M + I, J ) = MATRIX( I, J )
   10    CONTINUE
   20 CONTINUE

The following program segment transfers a lower triangular band matrix from conventional full matrix
storage to band storage:
      DO 20, J = 1, N
         M = 1 - J
         DO 10, I = J, MIN( N, J + K )
            A( M + I, J ) = MATRIX( I, J )
   10    CONTINUE
   20 CONTINUE

Tests for singularity or near-singularity are not included in STBSV or CTBSV. You must perform such tests
before calling these routines.
STBSV and CTBSV are Level 2 Basic Linear Algebra Subprograms (Level 2 BLAS).
When working backward (incx < 0), each routine starts at the end of the vector and moves backward, as
follows:
x(1–incx·(n–1)), x(1–incx·(n–2)), ..., x(1)
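To show how the band storage and the in-place solution fit together, the following free-form Fortran sketch solves a 3-by-3 lower bidiagonal system with STBSV. The program name and the test data are illustrative assumptions, and the sketch assumes it is linked against a library that provides STBSV.

```fortran
program stbsv_demo
  implicit none
  integer, parameter :: n = 3, k = 1, lda = k + 1
  real :: ab(lda, n), x(n)
  external stbsv

  ! Lower bidiagonal A = [2 0 0; 1 2 0; 0 1 2] in band storage:
  ! row 1 holds the diagonal, row 2 the subdiagonal (last entry unused).
  ab(1, :) = (/ 2.0, 2.0, 2.0 /)
  ab(2, :) = (/ 1.0, 1.0, 0.0 /)

  ! x holds the right-hand side b on entry.
  x = (/ 2.0, 3.0, 3.0 /)

  ! Solve A*x = b; the solution overwrites x.
  call stbsv('L', 'N', 'N', n, k, ab, lda, x, 1)

  ! For this data the exact solution is (1, 1, 1).
  print *, x
end program stbsv_demo
```

Forward substitution by hand confirms the result: 2·x(1) = 2, x(1) + 2·x(2) = 3, and x(2) + 2·x(3) = 3 give x = (1, 1, 1).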


STPMV ( 3S ) STPMV ( 3S )

NAME
STPMV, CTPMV – Multiplies a real or complex vector by a real or complex triangular packed matrix

SYNOPSIS
CALL STPMV (uplo, trans, diag, n, ap, x, incx)
CALL CTPMV (uplo, trans, diag, n, ap, x, incx)

IMPLEMENTATION
UNICOS and UNICOS/mk systems
On UNICOS/mk systems, this subroutine executes on a single processor and uses only private data.

DESCRIPTION
STPMV and CTPMV perform one of the following matrix-vector operations:
x ← Ax
x ← A^T x
x ← A^H x (CTPMV only)
where A^T is the transpose of A, A^H is the conjugate transpose of A, x is an n-element vector, and A may be
either a unit or nonunit n-by-n upper or lower triangular matrix.
These routines have the following arguments:
uplo Character*1. (input)
Specifies whether the matrix is an upper or lower triangular matrix, as follows:
uplo = ’U’ or ’u’: A is an upper triangular matrix.
uplo = ’L’ or ’l’: A is a lower triangular matrix.
trans Character*1. (input)
Specifies the operation to be performed, as follows:
trans = ’N’ or ’n’: x ← Ax
trans = ’T’ or ’t’: x ← A^T x
trans = ’C’ or ’c’: x ← A^T x (STPMV), or x ← A^H x (CTPMV)
diag Character*1. (input)
Specifies whether A is unit triangular, as follows:
diag = ’U’ or ’u’: A is assumed to be unit triangular.
diag = ’N’ or ’n’: A is not assumed to be unit triangular.
n Integer. (input)
Specifies the order of matrix A. n ≥ 0.
ap STPMV: Real array of dimension n(n+1)/2. (input)
CTPMV: Complex array of dimension n(n+1)/2. (input)
Before entry with uplo = ’U’ or ’u’, array ap must contain the upper triangular matrix packed
sequentially, column-by-column, so that ap(1) contains A(1,1), ap(2) contains A(1,2), ap(3)
contains A(2,2), and so on.
Before entry with uplo = ’L’ or ’l’, array ap must contain the lower triangular matrix packed
sequentially, column-by-column, so that ap(1) contains A(1,1), ap(2) contains A(2,1), ap(3)
contains A(3,1), and so on.
When diag = ’U’ or ’u’, these routines assume that all elements of array ap that represent
diagonal elements of the matrix A are 1. In this case, neither of these routines will reference any
of the diagonal elements.
x STPMV: Real array of dimension 1+(n–1)·|incx|. (input and output)
CTPMV: Complex array of dimension 1+(n–1)·|incx|. (input and output)
Contains the vector x. On exit, the transformed vector overwrites array x.
incx Integer. (input)
Specifies the increment for the elements of x. incx must not be 0.

NOTES
STPMV and CTPMV are Level 2 Basic Linear Algebra Subprograms (Level 2 BLAS).
When working backward (incx < 0), each routine starts at the end of the vector and moves backward, as
follows:
x(1–incx·(n–1)), x(1–incx·(n–2)), ..., x(1)
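The packed layout described for argument ap can be produced from conventional full storage with two short loops. The following free-form Fortran sketch is an illustration only: the program name, the array names apu and apl, and the encoding of element (i,j) as 10*i + j are assumptions. It does not call any library routine.

```fortran
program packed_demo
  implicit none
  integer, parameter :: n = 3
  real :: a(n, n), apu(n*(n + 1)/2), apl(n*(n + 1)/2)
  integer :: i, j, p

  ! Full-storage test matrix: element (i,j) is encoded as 10*i + j.
  do j = 1, n
    do i = 1, n
      a(i, j) = real(10*i + j)
    end do
  end do

  ! Upper triangle, column by column: A(1,1), A(1,2), A(2,2), A(1,3), ...
  p = 0
  do j = 1, n
    do i = 1, j
      p = p + 1
      apu(p) = a(i, j)
    end do
  end do

  ! Lower triangle, column by column: A(1,1), A(2,1), A(3,1), A(2,2), ...
  p = 0
  do j = 1, n
    do i = j, n
      p = p + 1
      apl(p) = a(i, j)
    end do
  end do

  print '(a,6f6.1)', 'upper packed:', apu
  print '(a,6f6.1)', 'lower packed:', apl
end program packed_demo
```

For n = 3 the upper packed array should read 11, 12, 22, 13, 23, 33 and the lower packed array 11, 21, 31, 22, 32, 33. Equivalently, A(i,j) sits at ap(i + j(j–1)/2) in the upper case and at ap(i + (j–1)(2n–j)/2) in the lower case.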


STPSV ( 3S ) STPSV ( 3S )

NAME
STPSV, CTPSV – Solves a real or complex triangular packed system of equations

SYNOPSIS
CALL STPSV (uplo, trans, diag, n, ap, x, incx)
CALL CTPSV (uplo, trans, diag, n, ap, x, incx)

IMPLEMENTATION
UNICOS and UNICOS/mk systems
On UNICOS/mk systems, this subroutine executes on a single processor and uses only private data.

DESCRIPTION
STPSV and CTPSV solve one of the following systems of equations, using the operation associated with
each:
Equations        Operation
Ax = b           x ← A^{-1} x
A^T x = b        x ← A^{-T} x
A^H x = b        x ← A^{-H} x (CTPSV only)
where
• b and x are n-element vectors
• A is either a unit or nonunit n-by-n upper or lower triangular matrix, supplied in packed form
• A^{-1} is the inverse of A
• A^T is the transpose of A
• A^{-T} is the inverse of A^T
• A^H is the conjugate transpose of A
• A^{-H} is the inverse of A^H
On input, the right-hand side vector b is stored in the array argument x. On output, the solution vector x
overwrites b in the same array argument x.
These routines have the following arguments:
uplo Character*1. (input)
Specifies whether the matrix is an upper or lower triangular matrix, as follows:
uplo = ’U’ or ’u’: A is an upper triangular matrix.
uplo = ’L’ or ’l’: A is a lower triangular matrix.
trans Character*1. (input)
Specifies the operation to be performed, as follows:
trans = ’N’ or ’n’: x ← A^{-1} x
trans = ’T’ or ’t’: x ← A^{-T} x
trans = ’C’ or ’c’: x ← A^{-T} x (STPSV), or x ← A^{-H} x (CTPSV)
diag Character*1. (input)
Specifies whether A is unit triangular, as follows:
diag = ’U’ or ’u’: A is assumed to be unit triangular.
diag = ’N’ or ’n’: A is not assumed to be unit triangular.
n Integer. (input)
Specifies the order of matrix A. n ≥ 0.
ap STPSV: Real array of dimension n(n+1)/2. (input)
CTPSV: Complex array of dimension n(n+1)/2. (input)
Before entry with uplo = ’U’ or ’u’, array ap must contain the upper triangular matrix packed
sequentially, column-by-column, so that ap(1) contains A(1,1), ap(2) contains A(1,2), ap(3)
contains A(2,2), and so on.
Before entry with uplo = ’L’ or ’l’, array ap must contain the lower triangular matrix packed
sequentially, column-by-column, so that ap(1) contains A(1,1), ap(2) contains A(2,1), ap(3)
contains A(3,1), and so on.
When diag = ’U’ or ’u’, these routines assume that all elements of array ap that represent
diagonal elements of the matrix A are 1. In this case, neither of these routines will reference any
of the diagonal elements.
x STPSV: Real array of dimension 1+(n–1)·|incx|. (input and output)
CTPSV: Complex array of dimension 1+(n–1)·|incx|. (input and output)
On input, x contains the right-hand side vector b. On output, the solution vector overwrites
array x.
incx Integer. (input)
Specifies the increment for the elements of x. incx must not be 0.

NOTES
Tests for singularity or near-singularity are not included in STPSV or CTPSV. You must perform such tests
before calling either routine.
STPSV and CTPSV are Level 2 Basic Linear Algebra Subprograms (Level 2 BLAS).
When working backward (incx < 0), each routine starts at the end of the vector and moves backward, as
follows:
x(1–incx·(n–1)), x(1–incx·(n–2)), ..., x(1)


STRMV ( 3S ) STRMV ( 3S )

NAME
STRMV, CTRMV – Multiplies a real or complex vector by a real or complex triangular matrix

SYNOPSIS
CALL STRMV (uplo, trans, diag, n, a, lda, x, incx)
CALL CTRMV (uplo, trans, diag, n, a, lda, x, incx)

IMPLEMENTATION
UNICOS and UNICOS/mk systems
On UNICOS/mk systems, this subroutine executes on a single processor and uses only private data.

DESCRIPTION
STRMV multiplies a real vector by a real triangular matrix.
CTRMV multiplies a complex vector by a complex triangular matrix.
STRMV and CTRMV perform one of the following matrix-vector operations:
x ← Ax
x ← A^T x
x ← A^H x (CTRMV only)
where A^T is the transpose of A, A^H is the conjugate transpose of A, x is an n-element vector, and A may be
either a unit or nonunit n-by-n upper or lower triangular matrix.
These routines have the following arguments:
uplo Character*1. (input)
Specifies whether the matrix is upper or lower triangular, as follows:
uplo = ’U’ or ’u’: A is an upper triangular matrix.
uplo = ’L’ or ’l’: A is a lower triangular matrix.
trans Character*1. (input)
Specifies the operation to be performed, as follows:
trans = ’N’ or ’n’: x ← Ax
trans = ’T’ or ’t’: x ← A^T x
trans = ’C’ or ’c’: x ← A^T x (STRMV), or x ← A^H x (CTRMV)
diag Character*1. (input)
Specifies whether A is unit triangular, as follows:
diag = ’U’ or ’u’: A is assumed to be unit triangular.
diag = ’N’ or ’n’: A is not assumed to be unit triangular.
n Integer. (input)
Specifies the order of matrix A. n ≥ 0.
a STRMV: Real array of dimension (lda,n). (input)
CTRMV: Complex array of dimension (lda,n). (input)
Before entry with uplo = ’U’ or ’u’, the leading n-by-n upper triangular part of array a must
contain the upper triangular matrix. The strictly lower triangular part of a is not referenced.
Before entry with uplo = ’L’ or ’l’, the leading n-by-n lower triangular part of array a must
contain the lower triangular matrix. The strictly upper triangular part of a is not referenced.
When diag = ’U’ or ’u’, these routines assume that all elements of array a that represent
diagonal elements of matrix A are 1. In this case, neither of these routines will reference any of
the diagonal elements.
lda Integer. (input)
Specifies the first dimension of a as declared in the calling program. lda must be at least
MAX(1,n).
x STRMV: Real array of dimension 1+(n–1)·|incx|. (input and output)
CTRMV: Complex array of dimension 1+(n–1)·|incx|. (input and output)
Contains the vector x. On exit, the transformed vector overwrites array x.
incx Integer. (input)
Specifies the increment for the elements of x. incx must not be 0.

NOTES
STRMV and CTRMV are Level 2 Basic Linear Algebra Subprograms (Level 2 BLAS).
When working backward (incx < 0), each routine starts at the end of the vector and moves backward, as
follows:
x(1–incx·(n–1)), x(1–incx·(n–2)), ..., x(1)
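The backward traversal described above can be traced with a few lines of code. The following free-form Fortran sketch is an illustration only: the program name, the values n = 4 and incx = –2, and the variable ix are assumptions. It prints the storage location of each logical vector element and does not call any library routine.

```fortran
program neg_inc_demo
  implicit none
  integer, parameter :: n = 4, incx = -2
  real :: x(1 + (n - 1)*abs(incx))
  integer :: i, ix

  ! Fill the array so each entry records its own storage index.
  do i = 1, size(x)
    x(i) = real(i)
  end do

  ! For incx < 0 the first logical element sits at the far end of the array.
  ix = 1 - incx*(n - 1)
  do i = 1, n
    print '(a,i0,a,i0,a,f4.1)', 'logical element ', i, ' is x(', ix, ') = ', x(ix)
    ix = ix + incx
  end do
end program neg_inc_demo
```

With n = 4 and incx = –2 the array has 1+(n–1)·|incx| = 7 elements, and the logical elements 1 through 4 are read from x(7), x(5), x(3), and x(1), in that order.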


STRSV ( 3S ) STRSV ( 3S )

NAME
STRSV, CTRSV – Solves a real or complex triangular system of equations

SYNOPSIS
CALL STRSV (uplo, trans, diag, n, a, lda, x, incx)
CALL CTRSV (uplo, trans, diag, n, a, lda, x, incx)

IMPLEMENTATION
UNICOS and UNICOS/mk systems
On UNICOS/mk systems, this subroutine executes on a single processor and uses only private data.

DESCRIPTION
STRSV solves a real triangular system of equations.
CTRSV solves a complex triangular system of equations.
STRSV and CTRSV solve one of the following systems of equations, using the operation associated with
each:
Equations        Operation
Ax = b           x ← A^{-1} x
A^T x = b        x ← A^{-T} x
A^H x = b        x ← A^{-H} x (CTRSV only)
where
• b and x are n-element vectors
• A is either a unit or nonunit n-by-n upper or lower triangular matrix
• A^{-1} is the inverse of A
• A^T is the transpose of A
• A^{-T} is the inverse of A^T
• A^H is the conjugate transpose of A
• A^{-H} is the inverse of A^H
On input, the right-hand side vector b is stored in array argument x. On output, the solution vector x
overwrites b in the same array argument x.
These routines have the following arguments:
uplo Character*1. (input)
Specifies whether the matrix is an upper or lower triangular matrix, as follows:
uplo = ’U’ or ’u’: A is an upper triangular matrix.
uplo = ’L’ or ’l’: A is a lower triangular matrix.
trans Character*1. (input)
Specifies the operation to be performed, as follows:
trans = ’N’ or ’n’: x ← A^{-1} x
trans = ’T’ or ’t’: x ← A^{-T} x
trans = ’C’ or ’c’: x ← A^{-T} x (STRSV), or x ← A^{-H} x (CTRSV)
diag Character*1. (input)
Specifies whether A is unit triangular, as follows:
diag = ’U’ or ’u’: A is assumed to be unit triangular.
diag = ’N’ or ’n’: A is not assumed to be unit triangular.
n Integer. (input)
Specifies the order of matrix A. n ≥ 0.
a STRSV: Real array of dimension (lda,n). (input)
CTRSV: Complex array of dimension (lda,n). (input)
Before entry with uplo = ’U’ or ’u’, the leading n-by-n upper triangular part of array a must
contain the upper triangular matrix. The strictly lower triangular part of a is not referenced.
Before entry with uplo = ’L’ or ’l’, the leading n-by-n lower triangular part of array a must
contain the lower triangular matrix. The strictly upper triangular part of a is not referenced.
When diag = ’U’ or ’u’, these routines assume that all elements of array a that represent
diagonal elements of matrix A are 1. In this case, neither of these routines will reference any of
the diagonal elements.
lda Integer. (input)
Specifies the first dimension of a as declared in the calling program. lda ≥ MAX(1,n).
x STRSV: Real array of dimension 1+(n–1)·|incx|. (input and output)
CTRSV: Complex array of dimension 1+(n–1)·|incx|. (input and output)
On input, x contains the right-hand side vector b. On output, x is overwritten with the solution
vector x.
incx Integer. (input)
Specifies the increment for the elements of x. incx must not be 0.

NOTES
Tests for singularity or near-singularity are not included in STRSV or CTRSV. You must perform such tests
before calling either routine.
STRSV and CTRSV are Level 2 Basic Linear Algebra Subprograms (Level 2 BLAS).
When working backward (incx < 0), each routine starts at the end of the vector and moves backward, as
follows:
x(1–incx·(n–1)), x(1–incx·(n–2)), ..., x(1)
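Because the routines do not test for singularity, a caller might check the diagonal before solving. The following free-form Fortran sketch shows one minimal way to do so for STRSV. The program name, the test data, and the choice of an exact zero test are illustrative assumptions, and the sketch assumes it is linked against a library that provides STRSV.

```fortran
program strsv_demo
  implicit none
  integer, parameter :: n = 3, lda = n
  real :: a(lda, n), x(n)
  integer :: i
  external strsv

  ! Upper triangular A = [2 1 0; 0 2 1; 0 0 2]; the strictly lower
  ! triangle is set to 0 here but is not referenced when uplo = 'U'.
  a = 0.0
  do i = 1, n
    a(i, i) = 2.0
  end do
  a(1, 2) = 1.0
  a(2, 3) = 1.0

  ! STRSV does not test for singularity, so check the diagonal first.
  ! An exact zero test is the simplest possible check; detecting
  ! near-singularity needs a problem-specific tolerance.
  do i = 1, n
    if (a(i, i) == 0.0) stop 'zero diagonal element: A is singular'
  end do

  ! x holds the right-hand side b on entry.
  x = (/ 3.0, 3.0, 2.0 /)

  ! Solve A*x = b; the solution overwrites x.
  call strsv('U', 'N', 'N', n, a, lda, x, 1)

  ! For this data the exact solution is (1, 1, 1).
  print *, x
end program strsv_demo
```

Back substitution by hand confirms the result: 2·x(3) = 2, 2·x(2) + x(3) = 3, and 2·x(1) + x(2) = 3 give x = (1, 1, 1). The diagonal check applies only when diag = ’N’, since a unit triangular matrix is never singular.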


INTRO_BLAS3 ( 3S ) INTRO_BLAS3 ( 3S )

NAME
INTRO_BLAS3 – Introduction to matrix-matrix linear algebra subprograms

IMPLEMENTATION
See individual man pages for implementation details

DESCRIPTION
The Level 3 Basic Linear Algebra Subprograms (Level 3 BLAS) consist of routines for unpacked real and
complex data. They handle matrix-matrix operations.
Level 3 Basic Linear Algebra Subprograms
The following table describes these routines. If more than one routine name appears for a given block in the
table, the first name listed is the name of the man page that describes all routines listed in that block. For
complete information about each operation performed by the routine, see the individual man page for that
routine.
The table is in alphabetic order, except that each Hermitian matrix routine (any routine whose name begins
with CH) is grouped next to equivalent symmetric matrix routines (whose names begin with SS or CS). This
is because the Hermitian property is a type of symmetry.
Each routine in the table marked with an asterisk is an extension to the standard set of Level 3 BLAS
routines.

Purpose: Copies a real matrix into another real matrix; copies a complex matrix into another complex matrix (used by the out-of-core routines only on Cray Y-MP systems, see INTRO_CORE(3S))
Operation: B ← A
Name: SCOPY2*, CCOPY2*

Purpose: Multiplies a real general matrix by a real general matrix; multiplies a complex general matrix by a complex general matrix
Operation: C ← α op(A) op(B) + β C, where op(X) = X, op(X) = X^T, or op(X) = X^H (CGEMM only)
Name: SGEMM, CGEMM

Purpose: Multiplies a real general matrix by a real general matrix, using a variation of Strassen’s algorithm; multiplies a complex general matrix by a complex general matrix, using a variation of Strassen’s algorithm
Operation: C ← α op(A) op(B) + β C
Name: SGEMMS*, CGEMMS*

Purpose: Multiplies a real general matrix by a real symmetric matrix; multiplies a complex general matrix by a complex symmetric matrix
Operation: C ← α A B + β C or C ← α B A + β C
Name: SSYMM, CSYMM

Purpose: Multiplies a complex general matrix by a Hermitian matrix
Operation: C ← α A B + β C or C ← α B A + β C
Name: CHEMM

Purpose: Performs symmetric rank 2k update of a real symmetric matrix; performs symmetric rank 2k update of a complex symmetric matrix
Operation: C ← α A B^T + α B A^T + β C or C ← α A^T B + α B^T A + β C
Name: SSYR2K, CSYR2K

Purpose: Performs Hermitian rank 2k update of a Hermitian matrix
Operation: C ← α A B^H + conj(α) B A^H + β C or C ← α A^H B + conj(α) B^H A + β C
Name: CHER2K

Purpose: Performs symmetric rank k update of a real symmetric matrix; performs symmetric rank k update of a complex symmetric matrix
Operation: C ← α A A^T + β C or C ← α A^T A + β C
Name: SSYRK, CSYRK

Purpose: Performs Hermitian rank k update of a Hermitian matrix
Operation: C ← α A A^H + β C or C ← α A^H A + β C
Name: CHERK

Purpose: Multiplies a real general matrix by a real triangular matrix; multiplies a complex general matrix by a complex triangular matrix
Operation: B ← α op(A) B or B ← α B op(A)
Name: STRMM, CTRMM

Purpose: Solves a real triangular system of equations with multiple right-hand sides; solves a complex triangular system of equations with multiple right-hand sides
Operation: B ← α op(A^{-1}) B or B ← α B op(A^{-1}), where op(A) = A, op(A) = A^T, or op(A) = A^H (CTRSM only)
Name: STRSM, CTRSM

SEE ALSO
Dongarra, J., J. Du Croz, I. Duff, and S. Hammarling, "A Set of Level 3 Basic Linear Algebra Subprograms,"
ACM Transactions on Mathematical Software, Vol. 16, No. 1, March 1990, pp. 1–17.


CHEMM ( 3S ) CHEMM ( 3S )

NAME
CHEMM – Multiplies a complex general matrix by a complex Hermitian matrix

SYNOPSIS
CALL CHEMM (side, uplo, m, n, alpha, a, lda, b, ldb, beta, c, ldc)

IMPLEMENTATION
UNICOS and UNICOS/mk systems
On UNICOS/mk systems, this subroutine executes on a single processor and uses only private data.

DESCRIPTION
CHEMM multiplies a complex general matrix by a complex Hermitian matrix.
CHEMM performs one of the following matrix-matrix operations:
C ← αA B + βC
C ← αB A + βC
where α and β are scalars, A is a Hermitian matrix, and B and C are m-by-n matrices.
This routine has the following arguments:
side Character*1. (input)
Specifies whether the Hermitian matrix A appears on the left or right in the operation, as
follows:
side = ’L’ or ’l’: C ← α A B + β C
side = ’R’ or ’r’: C ← α B A + β C
uplo Character*1. (input)
Specifies whether the upper or lower triangular part of the Hermitian matrix A is referenced, as
follows:
uplo = ’U’ or ’u’: only the upper triangular part of the Hermitian matrix is referenced.
uplo = ’L’ or ’l’: only the lower triangular part of the Hermitian matrix is referenced.
m Integer. (input)
Specifies the number of rows in matrix C. m must be ≥ 0.
n Integer. (input)
Specifies the number of columns in matrix C. n must be ≥ 0.
alpha Complex. (input)
Scalar factor α.
a Complex array of dimension (lda,ka). (input)
Contains matrix A. When side = ’L’ or ’l’, ka is m; otherwise, it is n.
Before entry with side = ’L’ or ’l’, the m-by-m part of array a must contain the Hermitian
matrix, such that:
• If uplo = ’U’ or ’u’, the leading m-by-m upper triangular part of array a must contain the
upper triangular part of the Hermitian matrix. The strictly lower triangular part of a is not
referenced.
• If uplo = ’L’ or ’l’, the leading m-by-m lower triangular part of array a must contain the
lower triangular part of the Hermitian matrix. The strictly upper triangular part of a is not
referenced.
Before entry with side = ’R’ or ’r’, the n-by-n part of array a must contain the Hermitian matrix,
such that:
• If uplo = ’U’ or ’u’, the leading n-by-n upper triangular part of array a must contain the
upper triangular part of the Hermitian matrix. The strictly lower triangular part of a is not
referenced.
• If uplo = ’L’ or ’l’, the leading n-by-n lower triangular part of array a must contain the lower
triangular part of the Hermitian matrix. The strictly upper triangular part of a is not
referenced.
The imaginary parts of the diagonal elements need not be set. They are assumed to be 0.
lda Integer. (input)
Specifies the first dimension of a as declared in the calling program. When side = ’L’ or ’l’, lda
≥ MAX(1,m); otherwise, lda ≥ MAX(1,n).
b Complex array of dimension (ldb,n). (input)
Contains matrix B. Before entry, the leading m-by-n part of array b must contain matrix B.
ldb Integer. (input)
Specifies the first dimension of b as declared in the calling program. ldb ≥ MAX(1,m).
beta Complex. (input)
Scalar factor β. When beta is supplied as 0, c need not be set on input.
c Complex array of dimension (ldc,n). (input and output)
Contains matrix C.
Before entry, the leading m-by-n part of array c must contain matrix C, except when beta is 0; in
which case, c need not be set. On exit, the m-by-n updated matrix overwrites array c.
ldc Integer. (input)
Specifies the first dimension of c as declared in the calling program. ldc ≥ MAX(1,m).

NOTES
CHEMM is a Level 3 Basic Linear Algebra Subprogram (Level 3 BLAS).
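As an illustration of the calling sequence, the following free-form Fortran sketch multiplies a 2-by-1 complex matrix by a 2-by-2 Hermitian matrix with side = ’L’ and uplo = ’U’. The program name, the test data, and the sentinel value are illustrative assumptions, and the sketch assumes it is linked against a library that provides CHEMM.

```fortran
program chemm_demo
  implicit none
  integer, parameter :: m = 2, n = 1
  complex :: a(m, m), b(m, n), c(m, n)
  external chemm

  ! Hermitian A = [2, 1+i; 1-i, 3]. With uplo = 'U' only the upper
  ! triangle is read, so a(2,1) is filled with a sentinel value.
  a(1, 1) = (2.0, 0.0)
  a(1, 2) = (1.0, 1.0)
  a(2, 2) = (3.0, 0.0)
  a(2, 1) = (-99.0, -99.0)

  ! B is the column vector (1, i).
  b(1, 1) = (1.0, 0.0)
  b(2, 1) = (0.0, 1.0)

  ! C <- alpha*A*B + beta*C with alpha = 1; beta = 0, so c need not be set.
  call chemm('L', 'U', m, n, (1.0, 0.0), a, m, b, m, (0.0, 0.0), c, m)

  ! Working the product by hand gives C = (1+i, 1+2i).
  print *, c
end program chemm_demo
```

The hand calculation uses the full Hermitian matrix: row 1 gives 2·1 + (1+i)·i = 1+i, and row 2 gives (1–i)·1 + 3·i = 1+2i. The lower-triangle entry 1–i is implied by the stored upper-triangle entry 1+i, so the sentinel in a(2,1) should not affect the result.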

SEE ALSO
SSYMM(3S)


CHER2K ( 3S ) CHER2K ( 3S )

NAME
CHER2K – Performs Hermitian rank 2k update of a complex Hermitian matrix

SYNOPSIS
CALL CHER2K (uplo, trans, n, k, alpha, a, lda, b, ldb, beta, c, ldc)

IMPLEMENTATION
UNICOS and UNICOS/mk systems
On UNICOS/mk systems, this subroutine executes on a single processor and uses only private data.

DESCRIPTION
CHER2K performs a Hermitian rank 2k update of a complex Hermitian matrix.
CHER2K performs one of the following Hermitian rank 2k operations:
C ← α A B^H + conj(α) B A^H + β C
C ← α A^H B + conj(α) B^H A + β C
where the following is true:
• α is a complex scalar, conj(α) is its complex conjugate, and β is a real scalar;
• A^H and B^H are the conjugate transposes of A and B, respectively;
• C is an n-by-n Hermitian matrix;
• A and B are n-by-k matrices in the first operation listed previously, and k-by-n matrices in
the second.
This routine has the following arguments:
uplo Character*1. (input)
Specifies whether the upper or lower triangular part of array c is referenced, as follows:
uplo = ’U’ or ’u’: only the upper triangular part of c is referenced.
uplo = ’L’ or ’l’: only the lower triangular part of c is referenced.
trans Character*1. (input)
Specifies the operation to be performed, as follows:
trans = ’N’ or ’n’: C ← α A B^H + conj(α) B A^H + β C
trans = ’C’ or ’c’: C ← α A^H B + conj(α) B^H A + β C
n Integer. (input)
Specifies the order of matrix C. n must be ≥ 0.
k Integer. (input)
On entry with trans = ’N’ or ’n’, k specifies the number of columns of matrices A and B.
On entry with trans = ’C’ or ’c’, k specifies the number of rows of matrices A and B.
k must be ≥ 0.
alpha Complex. (input)
Scalar factor α.
a Complex array of dimension (lda,ka). (input)
When trans = ’N’ or ’n’, ka is k; otherwise, it is n. Contains matrix A.
Before entry with trans = ’N’ or ’n’, the leading n-by-k part of array a must contain matrix A;
otherwise, the leading k-by-n part of array a must contain matrix A.
lda Integer. (input)
Specifies the first dimension of a as declared in the calling program.
If trans = ’N’ or ’n’, lda ≥ MAX(1,n); otherwise, lda ≥ MAX(1,k).
b Complex array of dimension (ldb,kb). (input)
When trans = ’N’ or ’n’, kb is k; otherwise, it is n. Contains matrix B.
Before entry with trans = ’N’ or ’n’, the leading n-by-k part of array b must contain matrix B;
otherwise, the leading k-by-n part of array b must contain matrix B.
ldb Integer. (input)
Specifies the first dimension of b as declared in the calling program. If trans = ’N’ or ’n’, ldb ≥
MAX(1,n); otherwise, ldb ≥ MAX(1,k).
beta Real. (input)
Scalar factor β.
c Complex array of dimension (ldc,n). (input and output)
Contains matrix C.
Before entry with uplo = ’U’ or ’u’, the leading n-by-n upper triangular part of array c must
contain the upper triangular part of the Hermitian matrix. The strictly lower triangular part of c
is not referenced. On exit, the upper triangular part of the updated matrix overwrites the upper
triangular part of array c.
Before entry with uplo = ’L’ or ’l’, the leading n-by-n lower triangular part of array c must
contain the lower triangular part of the Hermitian matrix. The strictly upper triangular part of c
is not referenced. On exit, the lower triangular part of the updated matrix overwrites the lower
triangular part of array c.
The imaginary parts of the diagonal elements need not be set and are assumed to be 0. On exit,
they are set to 0.
ldc Integer. (input)
Specifies the first dimension of c as declared in the calling program. ldc ≥ MAX(1,n).

NOTES
CHER2K is a Level 3 Basic Linear Algebra Subprogram (Level 3 BLAS).
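
EXAMPLES
The following Python sketch is not part of the library. It is a reference model, written for illustration, of the trans = ’N’ operation, using the standard BLAS definition in which the second term carries the complex conjugate of alpha. The function name cher2k_n and the list-of-rows storage are assumptions made for the example; the Fortran routine works on column-major arrays with leading dimensions.

```python
# Reference model of CHER2K with trans = 'N' (illustrative, not the library code).
# a and b are n-by-k lists of rows; c is n-by-n; alpha is complex; beta is real.

def cher2k_n(uplo, alpha, a, b, beta, c):
    """Update the uplo triangle of c in place:
    C <- alpha*A*B^H + conj(alpha)*B*A^H + beta*C."""
    n, k = len(a), len(a[0])
    upper = uplo.upper() == "U"
    for i in range(n):
        for j in range(n):
            if (j < i) if upper else (j > i):
                continue  # the other strict triangle is never referenced
            s = sum(alpha * a[i][p] * b[j][p].conjugate()
                    + alpha.conjugate() * b[i][p] * a[j][p].conjugate()
                    for p in range(k))
            if i == j:
                # Diagonal imaginary parts are taken as 0 on entry and set to 0 on exit.
                c[i][j] = complex(s.real + beta * c[i][j].real, 0.0)
            else:
                c[i][j] = s + beta * c[i][j]
    return c

a = [[1 + 2j], [3 - 1j]]
b = [[2 - 1j], [1j]]
c = [[6 + 0j, 1 + 1j],
     [99 + 99j, 7 + 0j]]      # strictly lower entry is a sentinel value
cher2k_n("U", 1 + 1j, a, b, 2.0, c)
# c is now [[2+0j, 11-5j], [99+99j, 18+0j]]; the sentinel is untouched.
```

The sentinel in the strictly lower triangle shows that, with uplo = ’U’, that part of c is neither read nor written.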

SEE ALSO
SSYR2K(3S)


CHERK ( 3S ) CHERK ( 3S )

NAME
CHERK – Performs Hermitian rank k update of a complex Hermitian matrix

SYNOPSIS
CALL CHERK (uplo, trans, n, k, alpha, a, lda, beta, c, ldc)

IMPLEMENTATION
UNICOS and UNICOS/mk systems
On UNICOS/mk systems, this subroutine executes on a single processor and uses only private data.

DESCRIPTION
CHERK performs a Hermitian rank k update of a complex Hermitian matrix.
CHERK performs one of the following Hermitian rank k operations:
$C \leftarrow \alpha A A^{H} + \beta C$
$C \leftarrow \alpha A^{H} A + \beta C$
where the following is true:
• α and β are real scalars;
• $A^{H}$ is the conjugate transpose of A;
• C is an n-by-n Hermitian matrix;
• A is an n-by-k matrix in the first operation listed previously, and a k-by-n matrix in the second.
This routine has the following arguments:
uplo Character*1. (input)
Specifies whether the upper or lower triangular part of array c is referenced, as follows:
uplo = ’U’ or ’u’: only the upper triangular part of c is referenced.
uplo = ’L’ or ’l’: only the lower triangular part of c is referenced.
trans Character*1. (input)
Specifies the operation to be performed, as follows:
trans = ’N’ or ’n’: $C \leftarrow \alpha A A^{H} + \beta C$
trans = ’C’ or ’c’: $C \leftarrow \alpha A^{H} A + \beta C$
n Integer. (input)
Specifies the order of matrix C. n must be ≥ 0.
k Integer. (input)
On entry with trans = ’N’ or ’n’, k specifies the number of columns of matrix A.
On entry with trans = ’C’ or ’c’, k specifies the number of rows of matrix A.
k must be ≥ 0.
alpha Real. (input)
Scalar factor α.
a Complex array of dimension (lda,ka). (input)
When trans = ’N’ or ’n’, ka is k; otherwise, it is n. Contains matrix A.
Before entry with trans = ’N’ or ’n’, the leading n-by-k part of array a must contain matrix A;
otherwise, the leading k-by-n part of array a must contain matrix A.
lda Integer. (input)
Specifies the first dimension of a as declared in the calling program. If trans = ’N’ or ’n’, lda ≥
MAX(1,n); otherwise, lda ≥ MAX(1,k).
beta Real. (input)
Scalar factor β.
c Complex array of dimension (ldc,n). (input and output) Contains matrix C.
Before entry with uplo = ’U’ or ’u’, the leading n-by-n upper triangular part of array c must
contain the upper triangular part of the Hermitian matrix. The strictly lower triangular part of c
is not referenced. On exit, the upper triangular part of the updated matrix overwrites the upper
triangular part of array c.
Before entry with uplo = ’L’ or ’l’, the leading n-by-n lower triangular part of array c must
contain the lower triangular part of the Hermitian matrix. The strictly upper triangular part of c
is not referenced. On exit, the lower triangular part of the updated matrix overwrites the lower
triangular part of array c.
The imaginary parts of the diagonal elements need not be set and are assumed to be 0. On exit,
they are set to 0.
ldc Integer. (input)
Specifies the first dimension of c as declared in the calling (sub)program. ldc ≥ MAX(1,n).

NOTES
CHERK is a Level 3 Basic Linear Algebra Subprogram (Level 3 BLAS).

SEE ALSO
SSYRK(3S)


SCOPY2 ( 3S ) SCOPY2 ( 3S )

NAME
SCOPY2, CCOPY2 – Copies a real or complex matrix into another real or complex matrix

SYNOPSIS
CALL SCOPY2 (m, n, a, lda, b, ldb)
CALL CCOPY2 (m, n, a, lda, b, ldb)

IMPLEMENTATION
UNICOS systems

DESCRIPTION
SCOPY2 copies a real matrix into another real matrix.
CCOPY2 copies a complex matrix into another complex matrix.
SCOPY2 and CCOPY2 perform the following matrix operation:
B←A
where A and B are real or complex matrices.
This routine has the following arguments:
m Integer. (input)
Number of rows of A and B.
n Integer. (input)
Number of columns of A and B.
a SCOPY2: Real array of dimension (lda,n). (input)
CCOPY2: Complex array of dimension (lda,n). (input)
Contains matrix from which to copy.
lda Integer. (input)
Leading dimension of array a.
b SCOPY2: Real array of dimension (ldb,n). (output)
CCOPY2: Complex array of dimension (ldb,n). (output)
Contains matrix into which to copy.
ldb Integer. (input)
Leading dimension of array b.

NOTES
SCOPY2 and CCOPY2 are extensions to the standard set of Level 3 Basic Linear Algebra Subprograms
(Level 3 BLAS). They are matrix analogues of the Level 1 BLAS vector copy routines SCOPY(3S) and
CCOPY(3S).
The 2 in the routine name means "two-dimensional."
This routine vectorizes along the rows or columns, whichever is longer, and processes the other direction in
parallel.
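
EXAMPLES
The role of the leading dimensions lda and ldb can be illustrated with a small Python model of column-major storage. This is a sketch only: the name copy2, the flat-list representation, and the check that each leading dimension is at least MAX(1,m) are assumptions made for the example and are not taken from the library.

```python
# Model of B <- A on Fortran-style column-major arrays held in flat lists.
# Element (i, j), counted from 0, of an array with leading dimension ld
# lives at offset i + j*ld.

def copy2(m, n, a, lda, b, ldb):
    """Copy the leading m-by-n part of a into the leading m-by-n part of b."""
    if lda < max(1, m) or ldb < max(1, m):
        raise ValueError("leading dimensions must be at least max(1, m)")
    for j in range(n):          # one column at a time
        for i in range(m):
            b[i + j * ldb] = a[i + j * lda]
    return b

# a is declared 3-by-2 but only its leading 2-by-2 part holds the matrix;
# the -1.0 entries are padding that must not be copied.
a = [1.0, 2.0, -1.0,
     3.0, 4.0, -1.0]
b = [0.0] * 8                   # declared 4-by-2
copy2(2, 2, a, 3, b, 4)
# b is now [1.0, 2.0, 0.0, 0.0, 3.0, 4.0, 0.0, 0.0]
```

Because the source and destination may be declared with different first dimensions, the same call can copy a submatrix between arrays of different shapes.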

SEE ALSO
CCOPY(3S), SCOPY(3S), SGESUM(3S)


SGEMM ( 3S ) SGEMM ( 3S )

NAME
SGEMM, CGEMM – Multiplies a real or complex general matrix by a real or complex general matrix

SYNOPSIS
CALL SGEMM (transa, transb, m, n, k, alpha, a, lda, b, ldb, beta, c, ldc)
CALL CGEMM (transa, transb, m, n, k, alpha, a, lda, b, ldb, beta, c, ldc)

IMPLEMENTATION
UNICOS and UNICOS/mk systems
On UNICOS/mk systems, this subroutine executes on a single processor and uses only private data.

DESCRIPTION
SGEMM multiplies a real general matrix by a real general matrix.
CGEMM multiplies a complex general matrix by a complex general matrix.
SGEMM and CGEMM perform one of the matrix-matrix operations:
$C \leftarrow \alpha\,\mathrm{op}(A)\,\mathrm{op}(B) + \beta C$
where op(X) is one of the following:
op(X) = X
op(X) = $X^{T}$
op(X) = $X^{H}$ (CGEMM only)
where
• α and β are scalars
• A, B, and C are matrices
• op(A) is an m-by-k matrix
• op(B) is a k-by-n matrix
• C is an m-by-n matrix
• $X^{T}$ is the transpose of X
• $X^{H}$ is the conjugate transpose of X.
These routines have the following arguments:
transa Character*1. (input)
Specifies the form of op(A) to be used in the matrix multiplication, as follows:
transa = ’N’ or ’n’: op(A) = A
transa = ’T’ or ’t’: op(A) = $A^{T}$
transa = ’C’ or ’c’: op(A) = $A^{T}$ (SGEMM), or op(A) = $A^{H}$ (CGEMM)
transb Character*1. (input)
Specifies the form of op(B) to be used in the matrix multiplication, as follows:
transb = ’N’ or ’n’: op(B) = B
transb = ’T’ or ’t’: op(B) = $B^{T}$
transb = ’C’ or ’c’: op(B) = $B^{T}$ (SGEMM), or op(B) = $B^{H}$ (CGEMM)
m Integer. (input)
Specifies the number of rows in matrix op(A) and in matrix C. m must be ≥ 0.
n Integer. (input)
Specifies the number of columns in matrix op(B) and in matrix C. n must be ≥ 0.
k Integer. (input)
Specifies the number of columns of matrix op(A) and the number of rows of matrix op(B). k
must be ≥ 0.
alpha SGEMM: Real. (input)
CGEMM: Complex. (input)
Scalar factor α.
a SGEMM: Real array of dimension (lda,ka). (input)
CGEMM: Complex array of dimension (lda,ka). (input)
When transa = ’N’ or ’n’, ka is k; otherwise, it is m. Contains the matrix A.
Before entry with transa = ’N’ or ’n’, the leading m-by-k part of array a must contain matrix A;
otherwise, the leading k-by-m part of array a must contain matrix A.
lda Integer. (input)
Specifies the first dimension of a as declared in the calling program.
When transa = ’N’ or ’n’, lda ≥ MAX(1,m); otherwise, lda ≥ MAX(1,k).
b SGEMM: Real array of dimension (ldb,kb). (input)
CGEMM: Complex array of dimension (ldb,kb). (input)
When transb = ’N’ or ’n’, kb is n; otherwise, it is k. Contains the matrix B.
Before entry with transb = ’N’ or ’n’, the leading k-by-n part of array b must contain matrix B;
otherwise, the leading n-by-k part of array b must contain matrix B.
ldb Integer. (input)
Specifies the first dimension of b as declared in the calling program. When transb = ’N’ or ’n’,
ldb ≥ MAX(1,k); otherwise, ldb ≥ MAX(1,n).
beta SGEMM: Real. (input)
CGEMM: Complex. (input)
Scalar factor β. When beta is supplied as 0, c need not be set on input.
c SGEMM: Real array of dimension (ldc,n). (input and output)
CGEMM: Complex array of dimension (ldc,n). (input and output)
Contains the matrix C.
Before entry, the leading m-by-n part of array c must contain matrix C, except when beta is 0; in
which case, c need not be set. On exit, the m-by-n result matrix overwrites array c.
ldc Integer. (input)
Specifies the first dimension of c as declared in the calling program. ldc ≥ MAX(1,m).

NOTES
SGEMM and CGEMM are Level 3 Basic Linear Algebra Subprograms (Level 3 BLAS).
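
EXAMPLES
The operation C ← α op(A) op(B) + β C can be written out as a short Python reference model. This is a sketch for checking small cases, not the library implementation. The names op and gemm and the list-of-rows storage are assumptions for the example, and the sketch assumes m, n, and k are all at least 1.

```python
# Reference model of the GEMM operation (illustrative only).

def op(x, trans):
    """Return op(X) for a matrix stored as a list of rows."""
    t = trans.upper()
    if t == "N":
        return x
    xt = [list(col) for col in zip(*x)]            # transpose
    if t == "T":
        return xt
    if t == "C":
        # conjugate() leaves real values unchanged, so 'C' acts like 'T'
        # for real data and gives the conjugate transpose for complex data.
        return [[v.conjugate() for v in row] for row in xt]
    raise ValueError("trans must be 'N', 'T', or 'C'")

def gemm(transa, transb, alpha, a, b, beta, c):
    """C <- alpha*op(A)*op(B) + beta*C, updating c in place."""
    oa, ob = op(a, transa), op(b, transb)
    m, k, n = len(oa), len(ob), len(ob[0])
    for i in range(m):
        for j in range(n):
            s = sum(oa[i][p] * ob[p][j] for p in range(k))
            # When beta is 0, c is never read, so it need not be set on input.
            c[i][j] = alpha * s if beta == 0 else alpha * s + beta * c[i][j]
    return c

a = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]   # 3-by-2, so op(A) = A^T is 2-by-3
b = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]   # 3-by-2
c = [[None, None], [None, None]]            # deliberately unset; beta is 0
gemm("T", "N", 2.0, a, b, 0.0, c)
# c is now [[22.0, 26.0], [28.0, 32.0]]
```

Leaving c unset in the example mirrors the rule that c need not be initialized when beta is supplied as 0.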

SEE ALSO
SGEMMS(3S) to multiply general matrices by using Strassen’s algorithm


SGEMMS ( 3S ) SGEMMS ( 3S )

NAME
SGEMMS, CGEMMS – Multiplies a real or complex general matrix by a real or complex general matrix, using
Strassen’s algorithm

SYNOPSIS

UNICOS systems:
CALL SGEMMS (transa, transb, m, n, k, alpha, a, lda, b, ldb, beta, c, ldc, work)
CALL CGEMMS (transa, transb, m, n, k, alpha, a, lda, b, ldb, beta, c, ldc, work)
UNICOS/mk systems:
CALL SGEMMS (transa, transb, m, n, k, alpha, a, lda, b, ldb, beta, c, ldc)
CALL CGEMMS (transa, transb, m, n, k, alpha, a, lda, b, ldb, beta, c, ldc)

IMPLEMENTATION
UNICOS and UNICOS/mk systems

DESCRIPTION
SGEMMS multiplies a real general matrix by a real general matrix. CGEMMS multiplies a complex general
matrix by a complex general matrix.
SGEMMS and CGEMMS are implementations of Winograd’s variation of Strassen’s algorithm for matrix
multiplication. The algorithm is described in the NOTES section of this man page. Because of the very
different order of operations performed by Strassen’s algorithm, numerical results from SGEMMS and
CGEMMS may differ slightly from those of SGEMM and CGEMM.
On UNICOS systems, these routines are functionally equivalent to SGEMM and CGEMM except for the
additional argument, work.
On UNICOS/mk systems, SGEMMS and CGEMMS are functionally equivalent to SGEMM and CGEMM except
for the following:
• m, n, and k must be greater than 0.
• transa = ’C’ or ’c’ and transb = ’C’ or ’c’ are invalid in SGEMMS.
The UNICOS/mk version of these routines requires a workspace which is allocated, managed, and freed by
the routines.
SGEMMS and CGEMMS perform one of the matrix-matrix operations:
$C \leftarrow \alpha\,\mathrm{op}(A)\,\mathrm{op}(B) + \beta C$
where op(X) is one of the following:
op(X) = X
op(X) = $X^{T}$
op(X) = $X^{H}$ (CGEMMS only)
where
• α and β are scalars
• A, B, and C are matrices
• op(A) is an m-by-k matrix
• op(B) is a k-by-n matrix
• C is an m-by-n matrix
• $X^{T}$ is the transpose of X
• $X^{H}$ is the conjugate transpose of X.
These routines have the following arguments:
transa Character*1. (input)
Specifies the form of op(A) to be used in the matrix multiplication, as follows:
If transa = ’N’ or ’n’, op(A) = A
If transa = ’T’ or ’t’, op(A) = $A^{T}$
In the UNICOS version, if transa = ’C’ or ’c’, op(A) = $A^{T}$ (SGEMMS) or op(A) = $A^{H}$ (CGEMMS)
In the UNICOS/mk version, if transa = ’C’ or ’c’, op(A) = $A^{H}$
transb Character*1. (input)
Specifies the form of op(B) to be used in the matrix multiplication, as follows:
If transb = ’N’ or ’n’, op(B) = B
If transb = ’T’ or ’t’, op(B) = $B^{T}$
In the UNICOS version, if transb = ’C’ or ’c’, op(B) = $B^{T}$ (SGEMMS) or op(B) = $B^{H}$ (CGEMMS)
In the UNICOS/mk version, if transb = ’C’ or ’c’, op(B) = $B^{H}$
m Integer. (input)
Specifies the number of rows in matrix op(A) and in matrix C. m must be ≥ 0 for UNICOS
systems; m must be ≥ 1 for UNICOS/mk systems.
n Integer. (input)
Specifies the number of columns in matrix op(B) and in matrix C. n must be ≥ 0 for UNICOS
systems; n must be ≥ 1 for UNICOS/mk systems.
k Integer. (input)
Specifies the number of columns of matrix op(A) and the number of rows of matrix op(B). k
must be ≥ 0 for UNICOS systems; k must be ≥ 1 for UNICOS/mk systems.
alpha SGEMMS: Real. (input)
CGEMMS: Complex. (input)
Scalar factor α.
a SGEMMS: Real array of dimension (lda,ka). (input)
CGEMMS: Complex array of dimension (lda,ka). (input)
When transa = ’N’ or ’n’, ka is k; otherwise, it is m. Contains the matrix A.
Before entry with transa = ’N’ or ’n’, the leading m-by-k part of array a must contain matrix A;
otherwise, the leading k-by-m part of array a must contain matrix A.
lda Integer. (input)
Specifies the first dimension of a as declared in the calling (sub)program. When transa = ’N’ or
’n’, lda ≥ MAX(1,m); otherwise, lda ≥ MAX(1,k).
b SGEMMS: Real array of dimension (ldb,kb). (input)
CGEMMS: Complex array of dimension (ldb,kb). (input)
When transb = ’N’ or ’n’, kb is n; otherwise, it is k. Contains the matrix B.
Before entry with transb = ’N’ or ’n’, the leading k-by-n part of array b must contain matrix B;
otherwise, the leading n-by-k part of array b must contain matrix B.
ldb Integer. (input)
Specifies the first dimension of b as declared in the calling (sub)program.
When transb = ’N’ or ’n’, ldb ≥ MAX(1,k); otherwise, ldb ≥ MAX(1,n).
beta SGEMMS: Real. (input)
CGEMMS: Complex. (input)
Scalar factor β. When beta is supplied as 0, c need not be set on input.
c SGEMMS: Real array of dimension (ldc,n). (input and output)
CGEMMS: Complex array of dimension (ldc,n). (input and output)
Contains the matrix C.
Before entry, the leading m-by-n part of array c must contain matrix C, except when beta is 0; in
which case, c need not be set. On exit, the m-by-n result matrix overwrites array c.
ldc Integer. (input)
Specifies the first dimension of c as declared in the calling (sub)program. ldc ≥ MAX(1,m).
work User-allocated workspace required only in UNICOS version.
UNICOS SGEMMS: Real array of dimension 2.34 * MAX(m,k) * MAX(k,n).
UNICOS CGEMMS: Complex array of dimension 2.34 * MAX(m,k) * MAX(k,n).
On exit, work is overwritten.

NOTES
SGEMMS and CGEMMS are extensions to the standard Level 3 Basic Linear Algebra Subprograms (Level 3
BLAS).
Strassen’s Algorithm
Strassen’s algorithm for matrix multiplication is a complex, recursive algorithm that performs the
multiplication in a manner completely different from the usual inner product method.
Suppose you want to multiply a pair of square matrices of order n. The typical inner product method for
matrix multiplication has an operations count on the order of $n^{3}$; the operations count for Strassen’s
algorithm is on the order of $n^{2.8}$. The trade-off is that Strassen’s algorithm requires an auxiliary work space.
The UNICOS implementations of SGEMMS and CGEMMS require an array work, supplied by the calling
program, of the following size (or equivalently, a real array of twice this dimension for CGEMMS):
2.34*MAX(m,k)*MAX(k,n)

The work array is overwritten, and no diagnostic is given if the supplied array is too small.
For small problem sizes, with dimensions less than or equal to 128 on UNICOS systems and 360 on
UNICOS/mk systems, SGEMMS and CGEMMS call SGEMM and CGEMM, respectively, to compute the matrix
multiply. Strassen’s algorithm is used only when the problem dimensions exceed these thresholds.
Because of the very different order of operations carried out by Strassen’s algorithm,
numerical results of SGEMMS and CGEMMS may differ slightly from those of SGEMM and CGEMM.
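
EXAMPLES
The recursive idea can be sketched in Python. The code below uses the classic Strassen recurrence (seven half-size products per level), not the Winograd variation that the library implements, and it is restricted to square matrices whose order is a power of two. The names strassen, matmul, cutoff, and work_length are assumptions made for the example. Rounding the documented work-array size up to a whole number of elements is also an assumption.

```python
# Classic Strassen recursion with a crossover to the conventional method.

def matmul(a, b):
    """Conventional inner-product multiply, used at or below the crossover size."""
    n = len(a)
    return [[sum(a[i][p] * b[p][j] for p in range(n)) for j in range(n)]
            for i in range(n)]

def add(x, y):
    return [[u + v for u, v in zip(r, s)] for r, s in zip(x, y)]

def sub(x, y):
    return [[u - v for u, v in zip(r, s)] for r, s in zip(x, y)]

def split(x):
    """Return the four half-size blocks X11, X12, X21, X22."""
    h = len(x) // 2
    return ([r[:h] for r in x[:h]], [r[h:] for r in x[:h]],
            [r[:h] for r in x[h:]], [r[h:] for r in x[h:]])

def strassen(a, b, cutoff=2):
    """Multiply square matrices whose order is a power of two."""
    if len(a) <= cutoff:
        return matmul(a, b)
    a11, a12, a21, a22 = split(a)
    b11, b12, b21, b22 = split(b)
    # Seven half-size products instead of the usual eight.
    m1 = strassen(add(a11, a22), add(b11, b22), cutoff)
    m2 = strassen(add(a21, a22), b11, cutoff)
    m3 = strassen(a11, sub(b12, b22), cutoff)
    m4 = strassen(a22, sub(b21, b11), cutoff)
    m5 = strassen(add(a11, a12), b22, cutoff)
    m6 = strassen(sub(a21, a11), add(b11, b12), cutoff)
    m7 = strassen(sub(a12, a22), add(b21, b22), cutoff)
    c11 = add(sub(add(m1, m4), m5), m7)
    c12 = add(m3, m5)
    c21 = add(m2, m4)
    c22 = add(add(sub(m1, m2), m3), m6)
    return ([r + s for r, s in zip(c11, c12)] +
            [r + s for r, s in zip(c21, c22)])

def work_length(m, n, k):
    """Elements of work per 2.34*MAX(m,k)*MAX(k,n), rounded up (assumption)."""
    x = max(m, k) * max(k, n)
    return (234 * x + 99) // 100     # exact integer ceiling of 2.34*x

x = [[i + j for j in range(4)] for i in range(4)]
y = [[i * j - 1 for j in range(4)] for i in range(4)]
same = strassen(x, y, cutoff=1) == matmul(x, y)   # True for integer data
size = work_length(200, 200, 200)                 # 93600 elements
```

With integer data the two methods agree exactly. With floating-point data the different order of operations generally produces slightly different rounding, which is the effect described above. Seven products per halving step gives an operation count on the order of n raised to log2(7), approximately 2.8.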

SEE ALSO
SGEMM(3S) to multiply general matrices by using the more standard inner product algorithm


SSYMM ( 3S ) SSYMM ( 3S )

NAME
SSYMM, CSYMM – Multiplies a real or complex general matrix by a real or complex symmetric matrix

SYNOPSIS
CALL SSYMM (side, uplo, m, n, alpha, a, lda, b, ldb, beta, c, ldc)
CALL CSYMM (side, uplo, m, n, alpha, a, lda, b, ldb, beta, c, ldc)

IMPLEMENTATION
UNICOS and UNICOS/mk systems
On UNICOS/mk systems, this subroutine executes on a single processor and uses only private data.

DESCRIPTION
SSYMM multiplies a real general matrix by a real symmetric matrix.
CSYMM multiplies a complex general matrix by a complex symmetric matrix.
SSYMM and CSYMM perform one of the following matrix-matrix operations:
$C \leftarrow \alpha A B + \beta C$
$C \leftarrow \alpha B A + \beta C$
where α and β are scalars, A is a symmetric matrix, and B and C are m-by-n matrices.
These routines have the following arguments:
side Character*1. (input)
Specifies whether the symmetric matrix A appears on the left or right in the operation, as
follows:
side = ’L’ or ’l’: C ← α A B + β C
side = ’R’ or ’r’: C ← α B A + β C
uplo Character*1. (input)
Specifies whether the upper or lower triangular part of the symmetric matrix A is referenced, as
follows:
uplo = ’U’ or ’u’: only the upper triangular part of the symmetric matrix is referenced.
uplo = ’L’ or ’l’: only the lower triangular part of the symmetric matrix is referenced.
m Integer. (input)
Specifies the number of rows in matrix C. m must be ≥ 0.
n Integer. (input)
Specifies the number of columns in matrix C. n must be ≥ 0.
alpha SSYMM: Real. (input)
CSYMM: Complex. (input)
Scalar factor α.
a SSYMM: Real array of dimension (lda,ka). (input)
CSYMM: Complex array of dimension (lda,ka). (input)
When side = ’L’ or ’l’, ka is m; otherwise, it is n. Contains the matrix A.
Before entry with side = ’L’ or ’l’, the m-by-m part of array a must contain the symmetric
matrix A, such that:
• If uplo = ’U’ or ’u’, the leading m-by-m upper triangular part of array a must contain the
upper triangular part of the symmetric matrix. The strictly lower triangular part of a is not
referenced.
• If uplo = ’L’ or ’l’, the leading m-by-m lower triangular part of array a must contain the
lower triangular part of the symmetric matrix. The strictly upper triangular part of a is not
referenced.
Before entry with side = ’R’ or ’r’, the n-by-n part of array a must contain the symmetric matrix
A, such that:
• If uplo = ’U’ or ’u’, the leading n-by-n upper triangular part of array a must contain the
upper triangular part of the symmetric matrix. The strictly lower triangular part of a is not
referenced.
• If uplo = ’L’ or ’l’, the leading n-by-n lower triangular part of array a must contain the lower
triangular part of the symmetric matrix. The strictly upper triangular part of a is not
referenced.
lda Integer. (input)
Specifies the first dimension of a as declared in the calling program. When side = ’L’ or ’l’, lda
≥ MAX(1,m); otherwise, lda ≥ MAX(1,n).
b SSYMM: Real array of dimension (ldb,n). (input)
CSYMM: Complex array of dimension (ldb,n). (input)
Contains the matrix B. Before entry, the leading m-by-n part of array b must contain matrix B.
ldb Integer. (input)
Specifies the first dimension of b as declared in the calling program. ldb ≥ MAX(1,m).
beta SSYMM: Real. (input)
CSYMM: Complex. (input)
Scalar factor β. When beta is supplied as 0, c need not be set on input.
c SSYMM: Real array of dimension (ldc,n). (input and output)
CSYMM: Complex array of dimension (ldc,n). (input and output)
Contains the matrix C. Before entry, the leading m-by-n part of array c must contain matrix C,
except when beta is 0; in which case, c need not be set. On exit, the m-by-n updated matrix
overwrites array c.
ldc Integer. (input)
Specifies the first dimension of c as declared in the calling program. ldc ≥ MAX(1,m).

NOTES
SSYMM and CSYMM are Level 3 Basic Linear Algebra Subprograms (Level 3 BLAS).
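
EXAMPLES
The effect of the side and uplo arguments can be modelled in a few lines of Python. This is an illustrative sketch, not the library code: the names symmetrize and symm and the list-of-rows storage are assumptions, and the model expands the referenced triangle into a full matrix, which an optimized routine would not need to do.

```python
# Reference model of the SYMM operation (illustrative only).

def symmetrize(a, uplo):
    """Build the full symmetric matrix from the triangle that uplo selects."""
    n = len(a)
    upper = uplo.upper() == "U"
    return [[a[min(i, j)][max(i, j)] if upper else a[max(i, j)][min(i, j)]
             for j in range(n)] for i in range(n)]

def symm(side, uplo, alpha, a, b, beta, c):
    """C <- alpha*A*B + beta*C (side 'L') or alpha*B*A + beta*C (side 'R')."""
    full = symmetrize(a, uplo)
    m, n = len(b), len(b[0])
    left = side.upper() == "L"
    for i in range(m):
        for j in range(n):
            if left:      # A is m-by-m
                s = sum(full[i][p] * b[p][j] for p in range(m))
            else:         # A is n-by-n
                s = sum(b[i][p] * full[p][j] for p in range(n))
            c[i][j] = alpha * s + beta * c[i][j]
    return c

a = [[2.0, 1.0],
     [-99.0, 3.0]]          # uplo = 'U': the -99.0 below the diagonal is ignored
b = [[1.0, 0.0, 2.0],
     [0.0, 1.0, 1.0]]       # 2-by-3
c = [[1.0, 1.0, 1.0],
     [1.0, 1.0, 1.0]]
symm("L", "U", 1.0, a, b, 1.0, c)
# c is now [[3.0, 2.0, 6.0], [2.0, 4.0, 6.0]]
```

With side = ’L’ the symmetric matrix has order m (here 2); with side = ’R’ it must have order n.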

SEE ALSO
CHEMM(3S)


SSYR2K ( 3S ) SSYR2K ( 3S )

NAME
SSYR2K, CSYR2K – Performs symmetric rank 2k update of a real or complex symmetric matrix

SYNOPSIS
CALL SSYR2K (uplo, trans, n, k, alpha, a, lda, b, ldb, beta, c, ldc)
CALL CSYR2K (uplo, trans, n, k, alpha, a, lda, b, ldb, beta, c, ldc)

IMPLEMENTATION
UNICOS and UNICOS/mk systems
On UNICOS/mk systems, this subroutine executes on a single processor and uses only private data.

DESCRIPTION
SSYR2K performs a symmetric rank 2k update of a real symmetric matrix.
CSYR2K performs a symmetric rank 2k update of a complex symmetric matrix.
SSYR2K and CSYR2K perform one of the following symmetric rank 2k operations:
$C \leftarrow \alpha A B^{T} + \alpha B A^{T} + \beta C$
$C \leftarrow \alpha A^{T} B + \alpha B^{T} A + \beta C$
where
• α and β are scalars
• C is an n-by-n symmetric matrix
• A and B are n-by-k matrices in the first operation listed previously and k-by-n matrices in the second
• $A^{T}$ and $B^{T}$ are the transposes of A and B, respectively
These routines have the following arguments:
uplo Character*1. (input)
Specifies whether the upper or lower triangular part of array c is referenced, as follows:
uplo = ’U’ or ’u’: only the upper triangular part of c is referenced.
uplo = ’L’ or ’l’: only the lower triangular part of c is referenced.
trans Character*1. (input)
Specifies the operation to be performed, as follows:
trans = ’N’ or ’n’: $C \leftarrow \alpha A B^{T} + \alpha B A^{T} + \beta C$
trans = ’T’ or ’t’: $C \leftarrow \alpha A^{T} B + \alpha B^{T} A + \beta C$
n Integer. (input)
Specifies the order of matrix C. n must be ≥ 0.
k Integer. (input)
On entry with trans = ’N’ or ’n’, k specifies the number of columns of matrices A and B.
On entry with trans = ’T’ or ’t’, k specifies the number of rows of matrices A and B.
k must be ≥ 0.
alpha SSYR2K: Real. (input)
CSYR2K: Complex. (input)
Scalar factor α.
a SSYR2K: Real array of dimension (lda,ka). (input)
CSYR2K: Complex array of dimension (lda,ka). (input)
When trans = ’N’ or ’n’, ka is k; otherwise, it is n. Contains the matrix A.
Before entry with trans = ’N’ or ’n’, the leading n-by-k part of array a must contain matrix A;
otherwise, the leading k-by-n part of array a must contain matrix A.
lda Integer. (input)
Specifies the first dimension of a as declared in the calling program. If trans = ’N’ or ’n’, lda ≥
MAX(1,n); otherwise, lda ≥ MAX(1,k).
b SSYR2K: Real array of dimension (ldb,kb). (input)
CSYR2K: Complex array of dimension (ldb,kb). (input)
When trans = ’N’ or ’n’, kb is k; otherwise, it is n. Contains the matrix B.
Before entry with trans = ’N’ or ’n’, the leading n-by-k part of array b must contain matrix B;
otherwise, the leading k-by-n part of array b must contain matrix B.
ldb Integer. (input)
Specifies the first dimension of b as declared in the calling program. If trans = ’N’ or ’n’, ldb ≥
MAX(1,n); otherwise, ldb ≥ MAX(1,k).
beta SSYR2K: Real. (input)
CSYR2K: Complex. (input)
Scalar factor β.
c SSYR2K: Real array of dimension (ldc,n). (input and output)
CSYR2K: Complex array of dimension (ldc,n). (input and output)
Contains the matrix C.
Before entry with uplo = ’U’ or ’u’, the leading n-by-n upper triangular part of array c must
contain the upper triangular part of the symmetric matrix. The strictly lower triangular part of c
is not referenced. On exit, the upper triangular part of the updated matrix overwrites the upper
triangular part of array c.
Before entry with uplo = ’L’ or ’l’, the leading n-by-n lower triangular part of array c must
contain the lower triangular part of the symmetric matrix. The strictly upper triangular part of c
is not referenced. On exit, the lower triangular part of the updated matrix overwrites the lower
triangular part of array c.
ldc Integer. (input)
Specifies the first dimension of c as declared in the calling program. ldc ≥ MAX(1,n).

NOTES
SSYR2K and CSYR2K are Level 3 Basic Linear Algebra Subprograms (Level 3 BLAS).

SEE ALSO
CHER2K(3S)


SSYRK ( 3S ) SSYRK ( 3S )

NAME
SSYRK, CSYRK – Performs symmetric rank k update of a real or complex symmetric matrix

SYNOPSIS
CALL SSYRK (uplo, trans, n, k, alpha, a, lda, beta, c, ldc)
CALL CSYRK (uplo, trans, n, k, alpha, a, lda, beta, c, ldc)

IMPLEMENTATION
UNICOS and UNICOS/mk systems
On UNICOS/mk systems, this subroutine executes on a single processor and uses only private data.

DESCRIPTION
SSYRK performs a symmetric rank k update of a real symmetric matrix.
CSYRK performs a symmetric rank k update of a complex symmetric matrix.
SSYRK and CSYRK perform one of the following symmetric rank k operations:
$C \leftarrow \alpha A A^{T} + \beta C$
$C \leftarrow \alpha A^{T} A + \beta C$
where $A^{T}$ is the transpose of A; α and β are scalars; C is an n-by-n symmetric matrix; A is an n-by-k matrix
in the first operation listed previously, and a k-by-n matrix in the second.
These routines have the following arguments:
uplo Character*1. (input)
Specifies whether the upper or lower triangular part of array c is referenced, as follows:
uplo = ’U’ or ’u’: only the upper triangular part of c is referenced.
uplo = ’L’ or ’l’: only the lower triangular part of c is referenced.
trans Character*1. (input)
Specifies the operation to be performed, as follows:
trans = ’N’ or ’n’: $C \leftarrow \alpha A A^{T} + \beta C$
trans = ’T’ or ’t’: $C \leftarrow \alpha A^{T} A + \beta C$
n Integer. (input)
Specifies the order of matrix C. n must be ≥ 0.
k Integer. (input)
On entry with trans = ’N’ or ’n’, k specifies the number of columns of matrix A.
On entry with trans = ’T’ or ’t’, k specifies the number of rows of matrix A.
k must be ≥ 0.
alpha SSYRK: Real. (input)
CSYRK: Complex. (input)
Scalar factor α.
a SSYRK: Real array of dimension (lda,ka). (input)
CSYRK: Complex array of dimension (lda,ka). (input)
When trans = ’N’ or ’n’, ka is k; otherwise, it is n. Contains the matrix A.
Before entry with trans = ’N’ or ’n’, the leading n-by-k part of array a must contain matrix A;
otherwise, the leading k-by-n part of array a must contain matrix A.
lda Integer. (input)
Specifies the first dimension of a as declared in the calling program. If trans = ’N’ or ’n’, lda ≥
MAX(1,n); otherwise, lda ≥ MAX(1,k).
beta SSYRK: Real. (input)
CSYRK: Complex. (input)
Scalar factor β.
c SSYRK: Real array of dimension (ldc,n). (input and output)
CSYRK: Complex array of dimension (ldc,n). (input and output)
Contains the matrix C. Before entry with uplo = ’U’ or ’u’, the leading n-by-n upper triangular
part of array c must contain the upper triangular part of the symmetric matrix. The strictly
lower triangular part of c is not referenced. On exit, the upper triangular part of the updated
matrix overwrites the upper triangular part of array c.
Before entry with uplo = ’L’ or ’l’, the leading n-by-n lower triangular part of array c must
contain the lower triangular part of the symmetric matrix. The strictly upper triangular part of c
is not referenced. On exit, the lower triangular part of the updated matrix overwrites the lower
triangular part of array c.
ldc Integer. (input)
Specifies the first dimension of c as declared in the calling program. ldc ≥ MAX(1,n).

NOTES
SSYRK and CSYRK are Level 3 Basic Linear Algebra Subprograms (Level 3 BLAS).
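
EXAMPLES
A rank k update touches only the triangle of c that uplo selects. The Python sketch below models that behavior for real data. It is illustrative only: the name syrk and the list-of-rows storage are assumptions made for the example.

```python
# Reference model of the real SYRK operation (illustrative only).

def syrk(uplo, trans, alpha, a, beta, c):
    """Update the uplo triangle of c in place:
    trans 'N': C <- alpha*A*A^T + beta*C   (A is n-by-k)
    trans 'T': C <- alpha*A^T*A + beta*C   (A is k-by-n)"""
    # Entry (i, j) of the update is a dot product of two length-k vectors:
    # rows of A for 'N', columns of A for 'T'.
    vecs = a if trans.upper() == "N" else [list(col) for col in zip(*a)]
    n = len(vecs)
    for i in range(n):
        cols = range(i, n) if uplo.upper() == "U" else range(i + 1)
        for j in cols:
            dot = sum(x * y for x, y in zip(vecs[i], vecs[j]))
            c[i][j] = alpha * dot + beta * c[i][j]
    return c

a = [[1.0, 2.0],
     [3.0, 4.0],
     [5.0, 6.0]]            # k-by-n with k = 3, n = 2
c = [[0.0, -1.0],
     [0.0, 0.0]]            # the -1.0 above the diagonal is a sentinel
syrk("L", "T", 1.0, a, 0.0, c)
# c is now [[35.0, -1.0], [44.0, 56.0]]
```

Only n(n+1)/2 entries are computed, which is why a rank k update is preferable to a general matrix multiply when the result is known to be symmetric.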

SEE ALSO
CHERK(3S)


STRMM ( 3S ) STRMM ( 3S )

NAME
STRMM, CTRMM – Multiplies a real or complex general matrix by a real or complex triangular matrix

SYNOPSIS
CALL STRMM (side, uplo, transa, diag, m, n, alpha, a, lda, b, ldb)
CALL CTRMM (side, uplo, transa, diag, m, n, alpha, a, lda, b, ldb)

IMPLEMENTATION
UNICOS and UNICOS/mk systems
On UNICOS/mk systems, this subroutine executes on a single processor and uses only private data.

DESCRIPTION
STRMM multiplies a real general matrix by a real triangular matrix.
CTRMM multiplies a complex general matrix by a complex triangular matrix.
STRMM and CTRMM perform one of the matrix-matrix operations:
$B \leftarrow \alpha\,\mathrm{op}(A)\,B$
$B \leftarrow \alpha\,B\,\mathrm{op}(A)$
where α is a scalar; B is an m-by-n matrix; A is either a unit or nonunit upper or lower triangular matrix,
and op(A) is one of the following:
• op(A) = A
• op(A) = $A^{T}$
• op(A) = $A^{H}$ (CTRMM only)
where
• $A^{T}$ is the transpose of A
• $A^{H}$ is the conjugate transpose of A.
These routines have the following arguments:
side Character*1. (input)
Specifies whether op(A) multiplies B from the left or right, as follows:
side = ’L’ or ’l’: B ← α op(A) B
side = ’R’ or ’r’: B ← α B op(A)
uplo Character*1. (input)
Specifies whether matrix A is an upper or lower triangular matrix, as follows:
uplo = ’U’ or ’u’: A is an upper triangular matrix.
uplo = ’L’ or ’l’: A is a lower triangular matrix.
transa Character*1. (input)
Specifies the form of op(A) to be used in the matrix multiplication, as follows:
transa = ’N’ or ’n’: op(A) = A
transa = ’T’ or ’t’: op(A) = $A^{T}$
transa = ’C’ or ’c’: op(A) = $A^{T}$ (STRMM), or op(A) = $A^{H}$ (CTRMM)
diag Character*1. (input)
Specifies whether A is unit triangular, as follows:
diag = ’U’ or ’u’: A is assumed to be unit triangular.
diag = ’N’ or ’n’: A is not assumed to be unit triangular.
m Integer. (input)
Specifies the number of rows in B. m must be ≥ 0.
n Integer. (input)
Specifies the number of columns in B. n must be ≥ 0.
alpha STRMM: Real. (input)
CTRMM: Complex. (input)
Scalar factor α. When alpha is 0, a is not referenced and b need not be set before entry.
a STRMM: Real array of dimension (lda,k). (input)
CTRMM: Complex array of dimension (lda,k). (input)
When side = ’L’ or ’l’, k is m; when side = ’R’ or ’r’, k is n. Contains the matrix A.
Before entry with uplo = ’U’ or ’u’, the leading k-by-k upper triangular part of array a must
contain the upper triangular matrix. The strictly lower triangular part of a is not referenced.
Before entry with uplo = ’L’ or ’l’, the leading k-by-k lower triangular part of array a must
contain the lower triangular matrix. The strictly upper triangular part of a is not referenced.
When diag = ’U’ or ’u’, the diagonal elements of a are not referenced, but they are assumed to
be unity.
lda Integer. (input)
Specifies the first dimension of a as declared in the calling program.
When side = ’L’ or ’l’, lda ≥ MAX(1,m). When side = ’R’ or ’r’, lda ≥ MAX(1,n).
b STRMM: Real array of dimension (ldb,n). (input and output)
CTRMM: Complex array of dimension (ldb,n). (input and output)
Contains the matrix B.
Before entry, the leading m-by-n part of array b must contain matrix B. On exit, the transformed
matrix overwrites array b.
ldb Integer. (input)
Specifies the first dimension of b as declared in the calling program. ldb ≥ MAX(1,m).

NOTES
STRMM and CTRMM are Level 3 Basic Linear Algebra Subprograms (Level 3 BLAS).
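
EXAMPLES
The interaction of uplo, diag, and side can be checked against a small Python model for real data. This is a sketch, not the library routine: the names triangular and trmm are assumptions, and the model forms the product in a temporary before overwriting b, whereas an optimized routine can work in place.

```python
# Reference model of the real TRMM operation (illustrative only).

def triangular(a, uplo, diag):
    """Expand the referenced triangle of a into a full matrix with explicit zeros."""
    k = len(a)
    upper = uplo.upper() == "U"
    unit = diag.upper() == "U"
    t = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(k):
            if i == j:
                # With a unit diagonal the stored diagonal is not referenced.
                t[i][j] = 1.0 if unit else a[i][j]
            elif (j > i) == upper:
                t[i][j] = a[i][j]
    return t

def trmm(side, uplo, transa, diag, alpha, a, b):
    """B <- alpha*op(A)*B (side 'L') or alpha*B*op(A) (side 'R'), in place."""
    t = triangular(a, uplo, diag)
    if transa.upper() != "N":
        t = [list(col) for col in zip(*t)]   # real data: 'T' and 'C' both transpose
    m, n = len(b), len(b[0])
    if side.upper() == "L":
        out = [[alpha * sum(t[i][p] * b[p][j] for p in range(m))
                for j in range(n)] for i in range(m)]
    else:
        out = [[alpha * sum(b[i][p] * t[p][j] for p in range(n))
                for j in range(n)] for i in range(m)]
    b[:] = out                               # the result overwrites b
    return b

a = [[2.0, 1.0],
     [7.0, 3.0]]            # uplo = 'U': the 7.0 below the diagonal is ignored
b1 = [[1.0, 2.0], [3.0, 4.0]]
trmm("L", "U", "N", "N", 1.0, a, b1)
# b1 is now [[5.0, 8.0], [9.0, 12.0]]

b2 = [[1.0, 2.0], [3.0, 4.0]]
trmm("L", "U", "N", "U", 1.0, a, b2)
# b2 is now [[4.0, 6.0], [3.0, 4.0]]; the stored diagonal 2.0, 3.0 was treated as 1.0
```

The second call shows the effect of diag = ’U’: the diagonal entries stored in a are not referenced and are taken to be 1.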


INTRO_FFT ( 3S ) INTRO_FFT ( 3S )

NAME
INTRO_FFT – Introduction to signal processing routines

IMPLEMENTATION
See individual man pages for implementation details

DESCRIPTION
The signal processing routines consist of Fast Fourier Transform (FFT) routines, filter routines, and
convolution routines.
Fast Fourier Transform Routines
These routines compute one or more FFTs. The Standard FFT package, available only on UNICOS systems,
is discussed first. Then the superseded FFT routines, available on all Cray vector architectures, are
discussed.
Standard FFT package (UNICOS systems)
The following is a matrix of preferred FFT routines. These routines are preferred because they have more
functionality, are more generally applicable, and are more fully optimized than the superseded routines that
follow them. Each of these routines is multitasked, but they also are highly optimized for single-processor
use. Each routine can compute either a forward or inverse Fourier transform.
In this matrix, the columns represent the input and output data types of the routines in each column:
• Complex-to-complex implies complex input and output. In this column, the routine name in parentheses
is the name of the equivalent UNICOS routine, which is provided in this release for backward
compatibility. Each routine named in this column has a man page in this section.
• Real-to-complex implies real input and complex output. Each routine named in this column has a man
page in this section.
• Complex-to-real implies complex input and real output. Each routine named in this column is
documented with the real-to-complex routine in the same row.
Rows of the matrix represent the number of dimensions for which the FFT is calculated for the routines in
each row:
• One-dimensional (single) calculates one FFT in one dimension.
• One-dimensional (multiple) calculates an FFT in one dimension for each column of a two-dimensional
matrix.
• Two-dimensional calculates one FFT in two dimensions.
• Three-dimensional calculates one FFT in three dimensions.
Those routines marked with an asterisk (*) are available on UNICOS/mk systems only.


Dimensions                   Complex-to-complex   Real-to-complex   Complex-to-real

One-dimensional (single)     CCFFT (CFFT)         SCFFT             CSFFT
One-dimensional (multiple)   CCFFTM (MCFFT)       SCFFTM            CSFFTM
Two-dimensional              CCFFT2D (CFFT2D)     SCFFT2D           CSFFT2D
                             PCCFFT2D*            PSCFFT2D*         PCSFFT2D*
Three-dimensional            CCFFT3D (CFFT3D)     SCFFT3D           CSFFT3D
                             PCCFFT3D*            PSCFFT3D*         PCSFFT3D*

Superseded FFT routines


The following is a matrix of superseded FFT routines. Each routine has its own man page.
The columns of the matrix represent data types, as in the previous matrix. The rows of the matrix represent
the number of FFTs calculated by the routines in each row:
• Single calculates one FFT
• Multiple calculates an FFT for each column of a two-dimensional matrix
No two- and three-dimensional FFT routines are included among the superseded routines.

Number of FFTs   Complex-to-complex   Real-to-complex            Complex-to-real

Single           CFFT2                SCFFT2                     CSFFT2
Multiple         CFFTMLT              RFFTMLT                    RFFTMLT
                                      (forward transform only)   (inverse transform only)


Linear Digital Filter Routines


These filter routines are used for filter analysis and design, but they also solve more general problems. The
following table contains a summary of the filter routines. Each routine has its own man page.

Purpose                                                      Name

Computes a correlation of two vectors                        FILTERG
Computes a correlation of two vectors (assuming the filter   FILTERS
coefficient vector is symmetric)
Solves the Wiener-Levinson linear equations                  OPFILT

Convolution Routines
The convolution routines compute the convolution of a complex sequence with one or more other complex
sequences. The following table contains a summary of the convolution routines. Each routine has its own
man page.

Purpose                                    Name

Computes a standard complex convolution    CCNVL
Computes a convolution using FFTs          CCNVLF

UNICOS/mk Routines
These routines run only on UNICOS/mk systems. Each routine has its own man page.

Purpose                                                      Name

Performs the convolution of two sequences of real numbers    SCONV
Performs the correlation of two sequences of real numbers    SCORR
Performs the correlation of two sequences of real numbers    SCORRS
(symmetric filter)

32-bit UNICOS/mk routines

Purpose                                                      Name

Applies a multitasked complex-to-complex FFT                 GGFFT
Performs the convolution of two sequences of real numbers    HCONV
Performs the correlation of two sequences of real numbers    HCORR
Performs the correlation of two sequences of real numbers    HCORRS
Computes a real-to-complex or complex-to-real FFT            HGFFT, GHFFT
Solves the Wiener-Levinson linear equations                  HOPFILT



CCFFT ( 3S ) CCFFT ( 3S )

NAME
CCFFT – Applies a multitasked complex-to-complex Fast Fourier Transform (FFT)

SYNOPSIS
CALL CCFFT (isign, n, scale, x, y, table, work, isys)

IMPLEMENTATION
UNICOS and UNICOS/mk systems
On UNICOS/mk systems, this subroutine executes on a single processor and uses private data only.

DESCRIPTION
CCFFT computes the Fast Fourier Transform (FFT) of the complex vector x, and it stores the result in vector
y.
In FFT applications, it is customary to use zero-based subscripts; the formulas are simpler that way.
Suppose that the arrays are dimensioned as follows:
COMPLEX X(0:N-1), Y(0:N-1)

The output array is the FFT of the input array, using the following formula for the FFT:

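$$Y_k = \mathrm{scale} \cdot \sum_{j=0}^{n-1} X_j \cdot \omega^{\,\mathrm{isign} \cdot j \cdot k} \qquad \text{for } k = 0, \ldots, n-1$$

where

$$\omega = e^{2 \pi i / n}, \qquad i = +\sqrt{-1}, \qquad \pi = 3.14159\ldots, \qquad \mathrm{isign} = \pm 1$$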
Different authors use different conventions for which of the transforms, isign = +1 or isign = -1, is the
forward or inverse transform, and what the scale factor should be in either case. You can make this routine
compute any of the various possible definitions, however, by choosing the appropriate values for isign and
scale.
The relevant fact from FFT theory is this: If you take the FFT with any particular values of isign and scale,
the mathematical inverse function is computed by taking the FFT with -isign and 1/(n · scale). In particular,
if you use isign = +1 and scale = 1.0 you can compute the inverse FFT by using isign = -1 and
scale = 1.0/n.
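
The inverse relationship can be checked numerically. The following Python sketch is an illustration only: it evaluates the formula above directly in O(n^2) operations and does not call CCFFT. The names dft and inverse_parameters are hypothetical helpers introduced for this example.

```python
import cmath

def dft(x, isign, scale):
    """Evaluate Y_k = scale * sum_j X_j * w**(isign*j*k) directly, O(n^2)."""
    n = len(x)
    w = cmath.exp(2j * cmath.pi / n)  # w = e^(2*pi*i/n), as in the formula
    return [scale * sum(x[j] * w ** (isign * j * k) for j in range(n))
            for k in range(n)]

def inverse_parameters(n, isign, scale):
    """Parameters of the inverse transform: -isign and 1/(n*scale)."""
    return -isign, 1.0 / (n * scale)

x = [1 + 2j, -3 + 0.5j, 0.25 - 1j, 4 + 0j, -2 - 2j]
y = dft(x, +1, 1.0)                          # forward: isign = +1, scale = 1.0
inv_sign, inv_scale = inverse_parameters(len(x), +1, 1.0)
x_back = dft(y, inv_sign, inv_scale)         # inverse: isign = -1, scale = 1.0/n
round_trip_error = max(abs(a - b) for a, b in zip(x, x_back))
```

Here round_trip_error should be at the level of floating-point rounding, which indicates that the second call undoes the first.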


The output array may be the same as the input array, provided that n has at least 2 factors.
On UNICOS/mk systems only: if the length of the FFT, n, is not factorizable into powers of 2, 3, and 5
(that is, when a fast mixed-radix algorithm cannot be used), you can specify that a fast algorithm based on
the chirp-z transform be used instead of a slow O(n^2) algorithm. The isys argument selects between the
two: setting isys to 0 uses the slow algorithm, and setting it to 1 uses the fast algorithm. The required
sizes of the table and work arrays depend on the value of isys.
This routine has the following arguments:
isign Integer. (input)
Specifies whether to initialize the table array or to do the forward or inverse Fourier transform,
as follows:
If isign = 0, the routine initializes the table array and returns. In this case, the only arguments
used or checked are isign, n, and table.
If isign = +1 or -1, the value of isign is the sign of the exponent used in the FFT formula.
n Integer. (input)
Size of the transform (the number of values in the input array). n ≥ 2.
scale Real. (input)
Scale factor. Each element of the output array is multiplied by scale after taking the Fourier
transform, as defined by the previous formula.
x Complex array of dimension (0:n-1). (input)
Input array of values to be transformed.
y Complex array of dimension (0:n-1). (output)
Output array of transformed values. The output array may be the same as the input array. In
that case, the transform is done in place and the input array is overwritten with the transformed
values.
table UNICOS: Real array of dimension 100 + 8n. (input or output)
UNICOS/mk: Real array of dimension 2n when isys = 0 and real array of dimension 12n when
isys = 1.
Table of factors and trigonometric functions.
If isign = 0, the routine initializes table (table is output only).
If isign = +1 or -1, the values in table are assumed to be initialized already by a prior call with
isign = 0 (table is input only).
work UNICOS: Real array of dimension 8n. (workspace)
UNICOS/mk: Real array of dimension 4n when isys = 0 and real array of dimension 8n when
isys = 1 (workspace).
Work array. This is a scratch array used for intermediate calculations. Its address space must
be different from that of the input and output arrays.


isys Integer. (input)
On UNICOS/mk systems: If n is prime or otherwise not factorizable into powers of 2, 3, and 5,
setting isys = 1 results in the use of a fast FFT algorithm based on the chirp-z transform. When n is
factorizable into powers of 2, 3, and 5, setting isys = 0 results in the fastest algorithm being
used; setting isys = 1 in that case results in a slower algorithm.
On UNICOS systems, the value of isys must be set to 0.

NOTES
This section contains information about the algorithm for CCFFT, the initialization of the table array, the
declaration of dimensions for x and y arrays, some performance tips, and some implementation dependent
details.
Algorithm
UNICOS/mk:
The algorithm used is a variant of Agarwal’s algorithm when n is factorizable into powers of 2, 3, and 5. If
n is prime or is not factorizable into powers of 2, 3, and 5, setting isys to 1 results in the use of a fast
(O(n log(n))) algorithm based on the chirp-z transform. For example, 120 and 256 are factorizable into
powers of 2, 3, and 5, but 254 = 2 · 127 is not. To obtain a considerable reduction in the time needed to
compute the FFT of a vector of length 254, set isys to 1.
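
A caller can test the factorization before choosing isys. The following Python sketch shows one way to do this; is_235_smooth and suggest_isys are hypothetical helper names, not library routines, and the returned value simply follows the recommendation stated above for UNICOS/mk systems.

```python
def is_235_smooth(n):
    """Return True if n is a product of powers of 2, 3, and 5 only."""
    if n < 1:
        return False
    for p in (2, 3, 5):
        while n % p == 0:   # divide out every factor of p
            n //= p
    return n == 1           # anything left over is another prime factor

def suggest_isys(n):
    """isys value recommended in the text for UNICOS/mk: 0 if n is smooth, else 1."""
    return 0 if is_235_smooth(n) else 1
```

For the lengths mentioned above, 120 and 256 give isys = 0 and 254 gives isys = 1.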
UNICOS systems:
The algorithm is the "four-step" method, in which the data is considered as a matrix of dimensions n1 by n2,
for which n1 · n2 = n, and the values of n1 and n2 are chosen for efficiency.
The rows are transformed, the phase factors are applied, the columns are transformed, and finally the matrix
is transposed to obtain the result.
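
The four steps can be sketched in Python as follows. This is a textbook formulation written for illustration, under the assumption that the input index is split as j = j1 + n1·j2 and the output index as k = k2 + n2·k1; it is not the library's implementation, and it uses a direct O(n^2) transform for the short sub-transforms. The names dft and four_step are hypothetical.

```python
import cmath

def dft(v, sign):
    """Direct DFT of v with exponent sign (+1 or -1), no scaling."""
    n = len(v)
    return [sum(v[j] * cmath.exp(sign * 2j * cmath.pi * j * k / n)
                for j in range(n)) for k in range(n)]

def four_step(x, n1, n2, sign):
    """Length n1*n2 DFT computed by the four-step method."""
    n = n1 * n2
    assert len(x) == n
    # Step 1: transform the rows. Row j1 holds x[j1], x[j1+n1], x[j1+2*n1], ...
    rows = [dft(x[j1::n1], sign) for j1 in range(n1)]
    # Step 2: apply the phase factors w**(j1*k2), with w = e^(sign*2*pi*i/n).
    for j1 in range(n1):
        for k2 in range(n2):
            rows[j1][k2] *= cmath.exp(sign * 2j * cmath.pi * j1 * k2 / n)
    # Step 3: transform the columns (length n1 each).
    cols = [dft([rows[j1][k2] for j1 in range(n1)], sign) for k2 in range(n2)]
    # Step 4: transpose into natural output order, y[k2 + n2*k1].
    y = [0j] * n
    for k2 in range(n2):
        for k1 in range(n1):
            y[k2 + n2 * k1] = cols[k2][k1]
    return y

sample = [complex(j % 5, -(j % 3)) for j in range(12)]
max_diff = max(abs(a - b)
               for a, b in zip(four_step(sample, 3, 4, +1), dft(sample, +1)))
```

The value max_diff compares the four-step result for n = 12 (n1 = 3, n2 = 4) with the direct transform and should be at rounding level.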
Initialization
The table array stores the trigonometric tables used in calculation of the FFT. You must initialize table by
calling the routine with isign = 0 prior to doing the transforms. If the value of the problem size, n, does not
change, table does not have to be reinitialized.
Dimensions
In the preceding description, it is assumed that array subscripts are zero-based, as is customary in FFT
applications. Thus, the input and output arrays are declared as follows:
COMPLEX X(0:N-1)
COMPLEX Y(0:N-1)

However, if you prefer to use the more customary Fortran style with subscripts starting at 1, you do not have
to change the calling sequence. The arrays can be declared as follows (assuming N > 0):
COMPLEX X(N)
COMPLEX Y(N)


Performance Tips
This routine computes an FFT for any value of n, but the performance for a given value of n depends on the
prime factorization of n. This fact is characteristic of all FFT algorithms.
Fastest performance is realized when n is a power of 2. In that case, the number of floating-point operations
is approximately 5n log2(n).
If n contains factors of 3, computation time is slightly longer, because more floating-point operations are
required. If n contains powers of 5, it is longer still. Performance is slowest when n is a prime number. In
that case, the number of floating-point operations is approximately 8n^2 when isys = 0.
On UNICOS/mk systems only, if n is prime, setting isys = 1 results in the use of an algorithm whose
complexity is approximately the same as using a fast mixed-radix algorithm on a vector of twice the length.
The kernel routines are optimized for values of n that are products of powers of 2, 3, and 5.
(Because the kernel routines have a special case for multiples of 4, powers of 4 will be slightly faster than
odd powers of 2.)
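
To give a sense of scale, the following Python sketch evaluates the two operation-count estimates quoted in this subsection (approximately 5n log2(n) for a power of 2, and approximately 8n^2 for a prime n with isys = 0). The function names are hypothetical, and the results are rough estimates, not measured timings.

```python
import math

def flops_power_of_two(n):
    """Approximate operation count when n is a power of 2: 5 * n * log2(n)."""
    return 5 * n * math.log2(n)

def flops_prime_slow(n):
    """Approximate operation count when n is prime and isys = 0: 8 * n**2."""
    return 8 * n * n

ratio = flops_prime_slow(1021) / flops_power_of_two(1024)
```

For n = 1024 the first estimate is 51200 operations, while for the nearby prime 1021 the second is 8339528, so ratio is a little over 160.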
Implementation-dependent Items
The UNICOS and UNICOS/mk FFT routines were designed so that they can be implemented efficiently on
many different architectures. The calling sequence is the same in any implementation. Certain details,
however, depend on the particular implementation. These details are confined to two areas:
• The first area is the size of the table and work arrays. Different systems may need different sizes. The
subroutine call requires no change, but you may have to change array sizes in the DIMENSION or type
statements that declare the arrays.
• The second area is the isys parameter array, an array that gives certain implementation-specific
information. All features and functions of the FFT routines specific to any particular implementation are
confined to this isys array. On any implementation, you can use the default values by using an argument
value of 0.
In the current UNICOS implementation, no special options are supported; therefore, you may specify that
the isys parameter always be given as a constant 0. Subsequent software releases may provide other
options.
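
Because the array sizes are the part most likely to change between systems, it can help to compute them in one place. The following Python sketch collects the table and work sizes listed in the argument descriptions of this man page; ccfft_array_sizes is a hypothetical helper, and the system labels are plain strings chosen for this example.

```python
def ccfft_array_sizes(n, system="UNICOS", isys=0):
    """Return (table_size, work_size) in real words for a length-n CCFFT."""
    if system == "UNICOS":
        return 100 + 8 * n, 8 * n      # isys must be 0 on UNICOS systems
    if system == "UNICOS/mk":
        if isys == 0:
            return 2 * n, 4 * n        # mixed-radix or slow algorithm
        if isys == 1:
            return 12 * n, 8 * n       # chirp-z based algorithm
    raise ValueError("unsupported system or isys value")
```

For example, ccfft_array_sizes(1024) gives the sizes 100 + 8*1024 and 8*1024 used for TABLE and WORK on UNICOS systems.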

EXAMPLES
These examples use the table and workspace sizes appropriate to UNICOS systems.
Example 1: Initialize the table array in preparation for doing an FFT of size 1024. Only the isign,
n, and table arguments are used in this case. You can use dummy arguments or zeros for the other
arguments in the subroutine call.
REAL TABLE(100 + 8*1024)
CALL CCFFT(0, 1024, 0.0, DUMMY, DUMMY, TABLE, DUMMY, 0)


Example 2: x and y are complex arrays of dimension (0:1023). Take the FFT of x and store the results in y.
Before taking the FFT, initialize the table array, as in example 1.
COMPLEX X(0:1023), Y(0:1023)
REAL TABLE(100 + 8*1024)
REAL WORK(8*1024)
...
CALL CCFFT(0, 1024, 1.0, X, Y, TABLE, WORK, 0)
CALL CCFFT(1, 1024, 1.0, X, Y, TABLE, WORK, 0)

Example 3: Using the same x and y as in example 2, take the inverse FFT of y and store it back in x. The
scale factor 1/1024 is used. Assume that the table array is already initialized.
CALL CCFFT(-1, 1024, 1.0/1024.0, Y, X, TABLE, WORK, 0)

Example 4: Do the same computation as in example 2, but assume that the lower bound of each array is 1,
rather than 0. No change is needed in the subroutine calls.
COMPLEX X(1024), Y(1024)
...
CALL CCFFT(0, 1024, 1.0, X, Y, TABLE, WORK, 0)
CALL CCFFT(1, 1024, 1.0, X, Y, TABLE, WORK, 0)

Example 5: Do the same computation as in example 4, but put the output back in array x to save storage
space. Assume that table is already initialized.
COMPLEX X(1024)
...
CALL CCFFT(1, 1024, 1.0, X, X, TABLE, WORK, 0)

SEE ALSO
CCFFTM(3S), SCFFT(3S), SCFFTM(3S)



CCFFT2D ( 3S ) CCFFT2D ( 3S )

NAME
CCFFT2D – Applies a two-dimensional complex-to-complex Fast Fourier Transform (FFT)

SYNOPSIS
CALL CCFFT2D (isign, n1, n2, scale, x, ldx, y, ldy, table, work, isys)

IMPLEMENTATION
UNICOS and UNICOS/mk systems
On UNICOS/mk systems, these subroutines execute on a single processor and use only private data.

DESCRIPTION
CCFFT2D computes the two-dimensional complex Fast Fourier Transform (FFT) of the complex matrix X,
and it stores the results in the complex matrix Y.
In FFT applications, it is customary to use zero-based subscripts; the formulas are simpler that way.
Suppose that the arrays are dimensioned as follows:
COMPLEX X(0:n1-1, 0:n2-1)
COMPLEX Y(0:n1-1, 0:n2-1)

CCFFT2D computes the formula:


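$$Y_{k_1,k_2} = \mathrm{scale} \cdot \sum_{j_1=0}^{n1-1} \sum_{j_2=0}^{n2-1} X_{j_1,j_2} \cdot \omega_1^{\,j_1 \cdot k_1} \cdot \omega_2^{\,j_2 \cdot k_2} \qquad \text{for } k_1 = 0, \ldots, n1-1, \quad k_2 = 0, \ldots, n2-1$$

where

$$\omega_1 = e^{\mathrm{isign} \cdot 2 \pi i / n1}, \qquad \omega_2 = e^{\mathrm{isign} \cdot 2 \pi i / n2}, \qquad i = +\sqrt{-1}, \qquad \pi = 3.14159\ldots, \qquad \mathrm{isign} = \pm 1$$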
Different authors use different conventions for which of the transforms, isign = +1 or isign = -1, is the
forward or inverse transform, and what the scale factor should be in either case. You can make this routine
compute any of the various possible definitions, however, by choosing the appropriate values for isign and
scale.
The relevant fact from FFT theory is this: If you take the FFT with any particular values of isign and scale,
the mathematical inverse function is computed by taking the FFT with -isign and 1/(n1 · n2 · scale). In
particular, if you use isign = +1 and scale = 1.0 for the forward FFT, you can compute the inverse FFT by
using isign = -1 and scale = 1/(n1 · n2).


UNICOS/mk systems only


If either n1 or n2 is prime or otherwise not factorizable into powers of 2, 3, and 5, significant
improvements in computational time can be obtained by using the following initializations of isys, which is a
vector of length 3.
If both n1 and n2 are factorizable into powers of 2, 3, and 5 (for example, n1 = 30 and n2 = 120), then
   isys(1) = 2
   isys(2) = 0
   isys(3) = 0
If either dimension is not factorizable into powers of 2, 3, and 5, then the following initializations of isys
yield the fastest times:
n1 not factorizable but n2 factorizable
   isys(1) = 2
   isys(2) = 1
   isys(3) = 0
n1 factorizable but n2 not factorizable
   isys(1) = 2
   isys(2) = 0
   isys(3) = 1
both n1 and n2 not factorizable
   isys(1) = 2
   isys(2) = 1
   isys(3) = 1
Here isys(1) indicates the dimension of the matrix over which the FFT is being performed.
If the numbers n1 and n2 are not known ahead of time, then isys(2) and isys(3) could be initialized to 0 and
the routine would compute correct results for n1 and n2, albeit slowly, if either n1 or n2 were prime.
The storage requirements for the vector table depend on the values of the isys vector.
This feature does not exist for UNICOS systems and isys is ignored on those machines.
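
The isys settings for UNICOS/mk systems follow a simple rule, which the following Python sketch expresses for any number of dimensions. The names is_235_smooth, build_isys, and mk_table_size are hypothetical helpers; the table sizes are those given for UNICOS/mk systems in the argument list of this man page (2 times the sum of the transform sizes when every flag is 0, and 12 times the sum otherwise).

```python
def is_235_smooth(n):
    """Return True if n is a product of powers of 2, 3, and 5 only."""
    if n < 1:
        return False
    for p in (2, 3, 5):
        while n % p == 0:
            n //= p
    return n == 1

def build_isys(*sizes):
    """isys vector: number of dimensions, then one flag per transform size."""
    return [len(sizes)] + [0 if is_235_smooth(n) else 1 for n in sizes]

def mk_table_size(*sizes):
    """Length of the table array on UNICOS/mk for the isys built above."""
    flags = build_isys(*sizes)[1:]
    return (12 if any(flags) else 2) * sum(sizes)
```

For n1 = 30 and n2 = 120, build_isys returns [2, 0, 0], in agreement with the first case listed above.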
This routine has the following arguments:
isign Integer. (input)
Specifies whether to initialize the table array or to do the forward or inverse transform as
follows:
If isign = 0, the routine initializes the table array and returns. In this case, the only arguments
used or checked are isign, n1, n2, and table.
If isign = +1 or -1, the value of isign is the sign of the exponent used in the FFT formula.
n1 Integer. (input)
Transform size in the first dimension. If n1 is not positive, CCFFT2D returns without
performing a transform.


n2 Integer. (input)
Transform size in the second dimension. If n2 is not positive, CCFFT2D returns without
performing a transform.
scale Real. (input)
Scale factor. Each element of the output array is multiplied by scale after taking the
Fourier transform, as defined previously.
x Complex array of dimension (0:ldx-1, 0:n2-1). (input)
Input array of values to be transformed.
ldx Integer. (input)
The number of rows in the x array, as it was declared in the calling program (the leading
dimension of x). ldx ≥ MAX(n1, 1).
y Complex array of dimension (0:ldy-1, 0:n2-1). (output)
Output array of transformed values. The output array may be the same as the input array, in
which case, the transform is done in place (the input array is overwritten with the transformed
values). In this case, it is necessary that ldx = ldy.
ldy Integer. (input)
The number of rows in the y array, as it was declared in the calling program (the leading
dimension of y). ldy ≥ MAX(n1, 1).
table UNICOS systems: Real array of dimension 100 + 2(n1 + n2). (input or output)
UNICOS/mk systems: Real array of dimension 2(n1 + n2) if both isys(2) and isys(3) are equal to
zero. Private real vector of length 12(n1 + n2) if either isys(2) or isys(3) is equal to 1. (input or
output)
Table of factors and trigonometric functions.
If isign = 0, the routine initializes table (table is output only).
If isign = +1 or -1, the values in table are assumed to be initialized already by a prior call with
isign = 0 (table is input only).
work UNICOS systems: Real array of dimension 512 · MAX(n1, n2). (scratch output)
UNICOS/mk systems: Real array of dimension 2 · n1 · n2. (scratch output)
Work array. This is a scratch array used for intermediate calculations. Its address space must
be different from that of the input and output arrays.
isys UNICOS systems: ignored.
UNICOS/mk systems: Integer array of length 3. (input)
isys(1) = 2
isys(2) = 0 (if n1 is factorizable into powers of 2, 3, and 5)
          1 (if n1 is not factorizable into powers of 2, 3, and 5)
isys(3) = 0 (if n2 is factorizable into powers of 2, 3, and 5)
          1 (if n2 is not factorizable into powers of 2, 3, and 5)


See PCCFFT2D(3S) for a more detailed explanation of the isys parameter.

NOTES
This section contains information about the algorithm for CCFFT2D, the initialization of the table array, the
declaration of dimensions for x and y arrays, some performance tips, and some implementation dependent
details.
The following notes are for UNICOS systems only. CCFFT2D(3S) on UNICOS/mk systems provides the
functionality of PCCFFT2D(3S) on a single PE. For notes about CCFFT2D on UNICOS/mk systems, see
PCCFFT2D(3S).
Algorithm
CCFFT2D uses a routine very much like CCFFTM(3S) to do multiple FFTs first on all columns in an input
matrix and then on all of the rows.
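
The column-then-row approach works because the two-dimensional formula separates into one-dimensional transforms. The following Python sketch illustrates this with direct O(n^2) transforms and compares the result with the double sum; it is not the library's implementation, and the names dft, dft2d_direct, and dft2d_columns_then_rows are hypothetical. The scale factor is taken as 1.0.

```python
import cmath

def dft(v, isign):
    """Direct one-dimensional DFT with w = e^(isign*2*pi*i/n)."""
    n = len(v)
    w = cmath.exp(isign * 2j * cmath.pi / n)
    return [sum(v[j] * w ** (j * k) for j in range(n)) for k in range(n)]

def dft2d_direct(x, isign):
    """Double sum from the two-dimensional formula; x[j1][j2] is n1 by n2."""
    n1, n2 = len(x), len(x[0])
    w1 = cmath.exp(isign * 2j * cmath.pi / n1)
    w2 = cmath.exp(isign * 2j * cmath.pi / n2)
    return [[sum(x[j1][j2] * w1 ** (j1 * k1) * w2 ** (j2 * k2)
                 for j1 in range(n1) for j2 in range(n2))
             for k2 in range(n2)] for k1 in range(n1)]

def dft2d_columns_then_rows(x, isign):
    """Transform every column (first index), then every row (second index)."""
    n1, n2 = len(x), len(x[0])
    cols = [dft([x[j1][j2] for j1 in range(n1)], isign) for j2 in range(n2)]
    return [dft([cols[j2][k1] for j2 in range(n2)], isign) for k1 in range(n1)]

x = [[complex(a + 2 * b, a - b) for b in range(4)] for a in range(3)]
direct = dft2d_direct(x, +1)
separable = dft2d_columns_then_rows(x, +1)
max_diff = max(abs(direct[k1][k2] - separable[k1][k2])
               for k1 in range(3) for k2 in range(4))
```

The value max_diff should be at rounding level, which indicates that the two evaluations agree.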
Initialization
The table array stores factors of n1 and n2 and also trigonometric tables that are used in calculation of the
FFT. This table must be initialized by calling the routine with isign = 0. If the values of the problem sizes,
n1 and n2, do not change, the table does not have to be reinitialized.
Dimensions
In the preceding description, it is assumed that array subscripts are zero-based, as is customary in FFT
applications. Thus, the input and output arrays are declared as follows:
COMPLEX X(0:ldx-1, 0:n2-1)
COMPLEX Y(0:ldy-1, 0:n2-1)

However, the calling sequence does not change if you prefer to use the more customary Fortran style with
subscripts starting at 1. The same values of ldx and ldy would be passed to the subroutine even if the input
and output arrays were dimensioned as follows:
COMPLEX X(ldx, n2)
COMPLEX Y(ldy, n2)

Performance Tips
This routine computes an FFT for any values of n1 and n2, but the performance depends on the prime
factorizations of n1 and n2. This fact is characteristic of all FFT algorithms.
Fastest performance is realized when both n1 and n2 are powers of 2. In that case, the number of
floating-point operations is approximately 5 · n1 · n2 · log2(n1 · n2).
If either n1 or n2 contains factors of 3, computation time is slightly longer, because more floating-point
operations are required. If they contain powers of 5, it is longer still.
The kernel routines are optimized for values of n1 and n2 that are products of powers of 2, 3, and 5.


In the UNICOS systems implementation, it is very important to make the leading dimensions of the arrays
odd numbers (or, if that is not possible, make them an odd multiple of 2) to avoid memory bank conflicts.
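
One simple way to follow this advice is to pad an even leading dimension by one element. The following Python sketch shows the arithmetic; padded_leading_dimension is a hypothetical helper, and adding a single element is an assumption that fits the case where the extra storage is affordable.

```python
def padded_leading_dimension(n1):
    """Smallest odd value that is at least MAX(n1, 1)."""
    ld = max(n1, 1)
    return ld if ld % 2 == 1 else ld + 1
```

For n1 = 128 this gives 129, so the arrays would be declared with 129 rows while n1 = 128 is still passed as the transform size.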
Implementation-dependent Items
The UNICOS and UNICOS/mk FFT routines were designed so that they could be implemented efficiently on
many different architectures. The calling sequence is the same in any implementation. Certain details,
however, depend on the particular implementation. These details are confined to three areas:
• The first area is the size of the table and work arrays. Different systems may need different sizes. No
change is required to the subroutine call, but you may have to change array sizes in the DIMENSION or
type statements that declare the arrays.
• The second area is the isys parameter array, an argument that gives certain implementation-specific
information. All features and functions of the FFT routines that are specific to any particular
implementation are confined to this isys array. On any implementation, you can use the default values by
using an argument value of 0.
In the UNICOS systems implementation, no special options are supported; therefore, you can always
specify the isys parameter as a constant 0. Subsequent software releases may provide other options.
• The third area is the issue of which problem sizes or dimensions give optimal performance in a particular
implementation. See the Performance Tips subsection.

EXAMPLES
All examples here are for UNICOS systems only.
Example 1: Initialize the TABLE array in preparation for doing a two-dimensional FFT of size 128 by 256.
In this case only the isign, n1, n2, and table arguments are used; you can use dummy arguments or zeros for
other arguments.
REAL TABLE(100 + 2*(128 + 256))
CALL CCFFT2D(0, 128, 256, 0.0, DUMMY, 1, DUMMY, 1,
& TABLE, DUMMY, 0)

Example 2: X and Y are complex arrays of dimension (0:128, 0:255). The first 128 elements of each
column contain data. For performance reasons, the extra element forces the leading dimension to be an odd
number. Take the two-dimensional FFT of X and store it in Y. Initialize the TABLE array, as in example 1.
COMPLEX X(0:128, 0:255)
COMPLEX Y(0:128, 0:255)
REAL TABLE(100 + 2*(128 + 256))
REAL WORK(512*256)
...
CALL CCFFT2D(0, 128, 256, 1.0, X, 129, Y, 129, TABLE, WORK, 0)
CALL CCFFT2D(1, 128, 256, 1.0, X, 129, Y, 129, TABLE, WORK, 0)


Example 3: With X and Y as in example 2, take the inverse FFT of Y and store it back in X. The scale
factor 1/(128 · 256) is used. Assume that the TABLE array is already initialized.
CALL CCFFT2D(-1, 128, 256, 1.0/(128.0*256.0), Y, 129,
& X, 129, TABLE, WORK, 0)

Example 4: Do the same computation as in example 2, but assume that the lower bound of each array is 1,
rather than 0. The subroutine calls are not changed.
COMPLEX X(129, 256)
COMPLEX Y(129, 256)
...
CALL CCFFT2D(0, 128, 256, 1.0, X, 129, Y, 129, TABLE, WORK, 0)
CALL CCFFT2D(1, 128, 256, 1.0, X, 129, Y, 129, TABLE, WORK, 0)

Example 5: Do the same computation as in example 4, but put the output back in array X to save storage
space. Assume that the TABLE array is already initialized.
COMPLEX X(129, 256)
...
CALL CCFFT2D(1, 128, 256, 1.0, X, 129, X, 129, TABLE, WORK, 0)

SEE ALSO
CCFFT(3S), CCFFT3D(3S), CCFFTM(3S), SCFFT(3S), SCFFT2D(3S), SCFFT3D(3S), SCFFTM(3S)



CCFFT3D ( 3S ) CCFFT3D ( 3S )

NAME
CCFFT3D – Applies a three-dimensional complex-to-complex Fast Fourier Transform (FFT)

SYNOPSIS
CALL CCFFT3D (isign, n1, n2, n3, scale, x, ldx, ldx2, y, ldy, ldy2, table, work, isys)

IMPLEMENTATION
UNICOS and UNICOS/mk systems
On UNICOS/mk systems, these subroutines execute on a single processor and use only private data.

DESCRIPTION
CCFFT3D computes the three-dimensional complex FFT of the complex array X, and it stores the results in
the complex array Y.
In FFT applications, it is customary to use zero-based subscripts; the formulas are simpler that way. So
suppose the arrays are dimensioned as follows:
COMPLEX X(0:n1-1, 0:n2-1, 0:n3-1)
COMPLEX Y(0:n1-1, 0:n2-1, 0:n3-1)

CCFFT3D computes the formula:


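$$Y_{k_1,k_2,k_3} = \mathrm{scale} \cdot \sum_{j_1=0}^{n1-1} \sum_{j_2=0}^{n2-1} \sum_{j_3=0}^{n3-1} X_{j_1,j_2,j_3} \cdot \omega_1^{\,j_1 \cdot k_1} \cdot \omega_2^{\,j_2 \cdot k_2} \cdot \omega_3^{\,j_3 \cdot k_3}$$

for $k_1 = 0, \ldots, n1-1$, $k_2 = 0, \ldots, n2-1$, and $k_3 = 0, \ldots, n3-1$

where:

$$\omega_1 = e^{\mathrm{isign} \cdot 2 \pi i / n1}, \qquad \omega_2 = e^{\mathrm{isign} \cdot 2 \pi i / n2}, \qquad \omega_3 = e^{\mathrm{isign} \cdot 2 \pi i / n3}$$

$$i = +\sqrt{-1}, \qquad \pi = 3.14159\ldots, \qquad \mathrm{isign} = \pm 1$$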
Different authors use different conventions for which of the transforms, isign = +1 or isign = -1, is the
forward or inverse transform, and what the scale factor should be in either case. You can make this routine
compute any of the various possible definitions, however, by choosing the appropriate values for isign and
scale.
The relevant fact from FFT theory is this: If you take the FFT with any particular values of isign and scale,
the mathematical inverse function is computed by taking the FFT with -isign and 1/(n1 · n2 · n3 · scale).
In particular, if you use isign = +1 and scale = 1.0 for the forward FFT, you can compute the inverse FFT
by using isign = -1 and scale = 1/(n1 · n2 · n3).


UNICOS/mk systems only


If any of n1, n2, or n3 is prime or otherwise not factorizable into powers of 2, 3, and 5, significant
improvements in computational time can be obtained by using the following initializations of isys, which is a
vector of length 4.
The first element of isys indicates the dimension of the problem, that is, isys(1) = 3. The next three elements
of isys indicate if the lengths n1, n2 and n3 are factorizable into powers of 2, 3 and 5. isys(2) is set to 0 if
n1 is factorizable into powers of 2, 3 and 5 and is set to 1 otherwise. Similarly, isys(3) and isys(4) are set to
zero if n2 and n3 are factorizable into powers of 2, 3 and 5 and set to 1 if they are not.
For example, if n1 = 256, n2 = 240, and n3 = 254, the best computational time is obtained by setting the
following:
isys(1) = 3 (dimension of the problem)
isys(2) = 0
isys(3) = 0
isys(4) = 1
If the numbers n1, n2, and n3 are not known ahead of time, then isys(2), isys(3), and isys(4) could be
initialized to 0 and the routine would compute correct results, albeit slowly, if any of n1, n2, or n3 were not
factorizable into powers of 2, 3, and 5.
The storage requirements for the vector table depend on the values of the isys vector.
UNICOS systems
The isys parameter is used to choose between two multitasking strategies and correspondingly different
amounts of workspace to be provided. A brief discussion about the significance of the isys parameter is
provided in the following argument list.
This routine has the following arguments:
isign Integer. (input)
Specifies whether to initialize the table array or to do the forward or inverse Fourier transform,
as follows:
If isign = 0, the routine initializes the table array and returns. In this case, the only arguments
used or checked are isign, n1, n2, n3, and table.
If isign = +1 or -1, the value of isign is the sign of the exponent used in the FFT formula.
n1 Integer. (input)
Transform size in the first dimension. If n1 is not positive, CCFFT3D returns without
computing a transform.
n2 Integer. (input)
Transform size in the second dimension. If n2 is not positive, CCFFT3D returns without
computing a transform.
n3 Integer. (input)
Transform size in the third dimension. If n3 is not positive, CCFFT3D returns without
computing a transform.


scale Real. (input)
Scale factor. Each element of the output array is multiplied by scale after taking the Fourier
transform, as defined previously.
x Complex array of dimension (0:ldx-1, 0:ldx2-1, 0:n3-1). (input)
Input array of values to be transformed.
ldx Integer. (input)
The first dimension of x, as it was declared in the calling program (the leading dimension of x).
ldx ≥ MAX(n1, 1).
ldx2 Integer. (input)
The second dimension of x, as it was declared in the calling program. ldx2 ≥ MAX(n2, 1).
y Complex array of dimension (0:ldy-1, 0:ldy2-1, 0:n3-1). (output)
Output array of transformed values. The output array may be the same as the input array, in
which case, the transform is done in place; that is, the input array is overwritten with the
transformed values. In this case, it is necessary that ldx = ldy, and ldx2 = ldy2.
ldy Integer. (input)
The first dimension of y, as it was declared in the calling program (the leading dimension of y).
ldy ≥ MAX(n1, 1).
ldy2 Integer. (input)
The second dimension of y, as it was declared in the calling program. ldy2 ≥ MAX(n2, 1).
table UNICOS systems: Real array of dimension 100 + 2(n1 + n2 + n3). (input or output)
UNICOS/mk systems: Real array of dimension 2(n1 + n2 + n3) if isys(2), isys(3), and isys(4)
are all equal to zero. Private real vector of length 12(n1 + n2 + n3) if any of isys(2), isys(3), or
isys(4) is equal to 1. (input or output)
Table of factors and trigonometric functions. If isign = 0, the routine initializes table (table is
output only). If isign = +1 or -1, the values in table are assumed to be initialized already by a
prior call with isign = 0 (table is input only).
work UNICOS systems: Real array of dimension 512 · MAX(n1, n2, n3) when isys = 0 and of
dimension 4 · NCPUS · MAX(n1 · n2, n2 · n3, n3 · n1) when isys = 1. (scratch output)
When isys = 0, the parallel performance of this routine may be poor for small problem sizes.
Setting isys to 1 and providing the additional memory significantly enhances parallel
performance. NCPUS is the environment variable that is set to indicate the
maximum number of CPUs that can be used in the application. The default value is 4.
UNICOS/mk systems: Real array of dimension 2(n1 · n2 · n3). (scratch output)
Work array. This is a scratch array used for intermediate calculations. Its address space must
be different from that of the input and output arrays.
isys UNICOS systems: Integer array of dimension (1). (input)
isys = 0 or 1 depending on the amount of workspace the user can provide to the routine.
UNICOS/mk systems: Integer array of dimension 4. (input)
isys(1) = 3
isys(2) = 0 if n1 is factorizable into powers of 2, 3, and 5
          1 if n1 is not factorizable into powers of 2, 3, and 5
isys(3) = 0 if n2 is factorizable into powers of 2, 3, and 5
          1 if n2 is not factorizable into powers of 2, 3, and 5
isys(4) = 0 if n3 is factorizable into powers of 2, 3, and 5
          1 if n3 is not factorizable into powers of 2, 3, and 5
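
The isys settings for UNICOS/mk systems can be derived mechanically from the transform sizes. The following is a minimal sketch in free-form Fortran 90, not library code: the program name, the helper function flag, and the sample sizes are all hypothetical and serve only to illustrate the rule stated above.

```fortran
program isys_flags
  implicit none
  integer :: isys(4)
  integer :: n1, n2, n3

  ! Illustrative sizes only: 128 = 2**7, 90 = 2*3**2*5, 77 = 7*11.
  n1 = 128
  n2 = 90
  n3 = 77

  isys(1) = 3
  isys(2) = flag(n1)
  isys(3) = flag(n2)
  isys(4) = flag(n3)
  print *, 'isys =', isys

contains

  ! Hypothetical helper: returns 0 if n is a product of powers of
  ! 2, 3, and 5, and 1 otherwise.
  integer function flag(n)
    integer, intent(in) :: n
    integer :: m, i
    integer :: p(3)

    p = (/ 2, 3, 5 /)
    m = n
    ! Divide out every factor of 2, then 3, then 5.
    do i = 1, 3
      do while (m > 1 .and. mod(m, p(i)) == 0)
        m = m / p(i)
      end do
    end do
    ! Anything left over is a factor other than 2, 3, or 5.
    if (m == 1) then
      flag = 0
    else
      flag = 1
    end if
  end function flag

end program isys_flags
```

For the sample sizes, only n3 = 77 has factors other than 2, 3, and 5, so the sketch sets isys(2) and isys(3) to 0 and isys(4) to 1.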

NOTES
CCFFT3D is the generalization of CCFFT2D to three dimensions. All of the notes for CCFFT2D apply,
with the obvious modifications for three dimensions.

EXAMPLES
The following examples are for UNICOS systems only. CCFFT3D(3S) on UNICOS/mk systems provides
the functionality of PCCFFT3D(3S) on a single PE. For notes on CCFFT3D(3S) on UNICOS/mk systems,
see PCCFFT3D(3S).
In all the examples shown below, isys is set to 0. For small 3D FFTs, setting isys = 1 and providing
adequate workspace would yield better performance.
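
As a worked check of the array sizes for a 128 by 128 by 128 transform on UNICOS systems, the formulas in the argument list give the following. This is a sketch that assumes NCPUS = 4, the stated default; other NCPUS values change only the last line.

```latex
\begin{aligned}
\text{table:}\quad & 100 + 2(128 + 128 + 128) = 868 \text{ real words} \\
\text{work, isys} = 0\text{:}\quad & 512 \cdot \max(128, 128, 128) = 65\,536 \text{ real words} \\
\text{work, isys} = 1\text{:}\quad & 4 \cdot 4 \cdot \max(128 \cdot 128,\ 128 \cdot 128,\ 128 \cdot 128) = 262\,144 \text{ real words}
\end{aligned}
```

Under that assumption, the isys = 1 work array is four times larger than the isys = 0 work array for this problem size.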
Example 1: Initialize the TABLE array in preparation for doing a three-dimensional FFT of size 128 by 128
by 128. In this case, only the isign, n1, n2, n3, and table arguments are used; you can use dummy
arguments or zeros for other arguments.
      REAL TABLE(100 + 2*(128 + 128 + 128))
      CALL CCFFT3D(0, 128, 128, 128, 0.0, DUMMY, 1, 1, DUMMY, 1, 1,
     &             TABLE, DUMMY, 0)

Example 2: X and Y are complex arrays of dimension (0:128, 0:128, 0:128). The first 128 elements of each
dimension contain data; for performance reasons, the extra element forces the leading dimensions to be odd
numbers. Take the three-dimensional FFT of X and store it in Y. Initialize the TABLE array, as in example
1.
      COMPLEX X(0:128, 0:128, 0:128)
      COMPLEX Y(0:128, 0:128, 0:128)
      REAL TABLE(100 + 2*(128 + 128 + 128))
      REAL WORK(512*128)
      ...
      CALL CCFFT3D(0, 128, 128, 128, 1.0, DUMMY, 1, 1,
     &             DUMMY, 1, 1, TABLE, WORK, 0)
      CALL CCFFT3D(1, 128, 128, 128, 1.0, X, 129, 129,
     &             Y, 129, 129, TABLE, WORK, 0)

Example 3: With X and Y as in example 2, take the inverse FFT of Y and store it back in X. The scale
factor $1/128^3$ is used. Assume that the TABLE array is already initialized.
      CALL CCFFT3D(-1, 128, 128, 128, 1.0/(128.0**3), Y, 129, 129,
     &             X, 129, 129, TABLE, WORK, 0)

Example 4: Do the same computation as in example 2, but assume that the lower bound of each array is 1,
rather than 0. The subroutine calls do not change.
      COMPLEX X(129, 129, 129)
      COMPLEX Y(129, 129, 129)
      ...
      CALL CCFFT3D(0, 128, 128, 128, 1.0, DUMMY, 1, 1,
     &             DUMMY, 1, 1, TABLE, WORK, 0)
      CALL CCFFT3D(1, 128, 128, 128, 1.0, X, 129, 129,
     &             Y, 129, 129, TABLE, WORK, 0)

Example 5: Do the same computation as in example 4, but put the output back in the array X to save
storage space. Assume that the TABLE array is already initialized.
      COMPLEX X(129, 129, 129)
      ...
      CALL CCFFT3D(1, 128, 128, 128, 1.0, X, 129, 129,
     &             X, 129, 129, TABLE, WORK, 0)

SEE ALSO
CCFFT(3S), CCFFT2D(3S), CCFFTM(3S), SCFFT(3S), SCFFT2D(3S), SCFFT3D(3S), SCFFTM(3S)


CCFFTM ( 3S ) CCFFTM ( 3S )

NAME
CCFFTM – Applies multiple multitasked complex-to-complex Fast Fourier Transforms (FFTs)

SYNOPSIS
CALL CCFFTM (isign, n, lot, scale, x, ldx, y, ldy, table, work, isys)

IMPLEMENTATION
UNICOS and UNICOS/mk systems
On UNICOS/mk systems, this subroutine executes on a single processor and uses only private data.

DESCRIPTION
CCFFTM computes the FFT of each column of the complex matrix X, and it stores the results in the columns
of complex matrix Y.
Suppose the arrays are dimensioned as follows:
      COMPLEX X(0:ldx-1, 0:lot-1)
      COMPLEX Y(0:ldy-1, 0:lot-1)

where ldx ≥ n, ldy ≥ n.


Then column L of the output array is the FFT of column L of the input array, using the following formula
for the FFT:
$$ Y_{k,L} = \mathrm{scale} \cdot \sum_{j=0}^{n-1} X_{j,L} \, \omega^{\mathrm{isign} \cdot j \cdot k} \qquad \text{for } k = 0, \ldots, n-1, \quad L = 0, \ldots, \mathrm{lot}-1 $$

where:
$\omega = e^{2 \pi i / n}$
$i = +\sqrt{-1}$
$\pi = 3.14159\ldots$
isign = ±1
lot = Number of columns to transform
Different authors use different conventions for which of the transforms, isign = +1 or isign = – 1, is the
forward or inverse transform, and what the scale factor should be in either case. You can make this routine
compute any of the various possible definitions, however, by choosing the appropriate values for isign and
scale.
The relevant fact from FFT theory is this: If you take the FFT with any particular values of isign and scale,
the mathematical inverse function is computed by taking the FFT with –isign and 1 / (n · scale). In
particular, if you use isign = +1 and scale = 1.0 for the forward FFT, you can compute the inverse FFT by
using the following: isign = –1 and scale = 1.0 / n.
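
The inverse relation can be checked numerically by evaluating the FFT formula directly. The following free-form Fortran 90 program is a sketch for illustration only: the subroutine dftm is a hypothetical stand-in that sums the definition term by term at a cost of order lot · n², it is not CCFFTM and does not use table, work, or isys. The array sizes and test data are arbitrary.

```fortran
program dft_columns
  implicit none
  integer, parameter :: nsize = 8, nlot = 3
  complex :: x(0:nsize-1, 0:nlot-1)
  complex :: y(0:nsize-1, 0:nlot-1)
  complex :: z(0:nsize-1, 0:nlot-1)
  integer :: j, l

  ! Arbitrary test data.
  do l = 0, nlot - 1
    do j = 0, nsize - 1
      x(j, l) = cmplx(real(j + 1), real(l - j))
    end do
  end do

  ! Forward transform with isign = +1 and scale = 1.0, then the
  ! inverse with isign = -1 and scale = 1.0/n.
  call dftm(+1, nsize, nlot, 1.0, x, y)
  call dftm(-1, nsize, nlot, 1.0 / real(nsize), y, z)

  ! z should reproduce x up to rounding error.
  print *, 'max round-trip error =', maxval(abs(z - x))

contains

  ! Hypothetical reference routine: direct evaluation of
  !   Y(k,L) = scale * sum over j of X(j,L) * w**(isign*j*k),
  ! with w = exp(2*pi*i/n), for every column L.
  subroutine dftm(isign, n, lot, scale, xin, yout)
    integer, intent(in) :: isign, n, lot
    real, intent(in) :: scale
    complex, intent(in) :: xin(0:n-1, 0:lot-1)
    complex, intent(out) :: yout(0:n-1, 0:lot-1)
    real :: twopi, angle
    complex :: s
    integer :: j, k, l

    twopi = 8.0 * atan(1.0)
    do l = 0, lot - 1
      do k = 0, n - 1
        s = (0.0, 0.0)
        do j = 0, n - 1
          ! w**(j*k) depends only on mod(j*k, n).
          angle = twopi * real(isign * mod(j * k, n)) / real(n)
          s = s + xin(j, l) * cmplx(cos(angle), sin(angle))
        end do
        yout(k, l) = scale * s
      end do
    end do
  end subroutine dftm

end program dft_columns
```

The printed error should be on the order of the rounding error of the default real kind, which confirms that the pair (–isign, 1/(n · scale)) undoes the pair (isign, scale).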

This routine has the following arguments:


isign Integer. (input)
Specifies whether to initialize the table array or to do the forward or inverse Fourier transform,
as follows:
If isign = 0, the routine initializes table and returns. In this case, the only arguments used or
checked are isign, n, and table.
If isign = +1 or – 1, the value of isign is the sign of the exponent used in the FFT formula.
n Integer. (input)
Size of each transform (the number of elements in each column of the input and output matrix to
be transformed). Performance depends on the value of n, as explained in the NOTES section. n
≥ 0; if n = 0, the routine returns.
lot Integer. (input)
The number of transforms to be computed (lot size). This is the number of elements in each
row of the input and output matrix. lot ≥ 0. If lot = 0, the routine returns.
scale Real. (input)
Scale factor. Each element of the output array is multiplied by the scale factor after taking the
Fourier transform, as defined previously.
x Complex array of dimension (0:ldx– 1, 0:lot– 1). (input)
Input array of values to be transformed.
ldx Integer. (input)
The number of rows in x, as it was declared in the calling program (the leading dimension of X).
ldx ≥ MAX(n, 1).
y Complex array of dimension (0:ldy– 1, 0:lot– 1). (output)
Output array of transformed values. Each column of the output array, y, is the FFT of the
corresponding column of the input array, x, computed according to the preceding formula.
The output array may be the same as the input array. In that case, the transform is done in place.
The input array is overwritten with the transformed values. In this case, it is necessary that ldx
= ldy.
ldy Integer. (input)
The number of rows in the Y array, as it was declared in the calling program (the leading
dimension of Y). ldy ≥ MAX(n, 1).
table UNICOS systems: Real array of dimension 100 + 2n. (input or output)
UNICOS/mk systems: Real array of dimension 2n.
Table of trigonometric functions.
If isign = 0, the routine initializes table (table is output only).
If isign = +1 or – 1, the values in table are assumed to be initialized already by a prior call with
isign = 0 (table is input only).

work UNICOS systems: Real array of dimension 4 · lot · n. (scratch output)
UNICOS/mk systems: Real array of dimension 4n.
Work array. This is a scratch array used for intermediate calculations. Its address space must
be different from the input and output arrays.
isys Integer array of dimension (0:isys(0)). (input and output)
The first element of the array specifies how many more elements are in the array. Use isys to
specify certain processor-specific parameters or options.
If isys(0) = 0, the default values of such parameters are used. In this case, you can specify the
argument value as the scalar integer constant 0.
If isys(0) > 0, isys(0) gives the upper bound of the isys array. Therefore, if il = isys(0), user-
specified parameters are expected in isys(1) through isys(il).

NOTES
This section contains information about the algorithm for CCFFTM, the initialization of the table array, the
declaration of dimensions for x and y arrays, some performance tips, and some implementation-dependent
details.
Algorithm
UNICOS only: CCFFTM uses a decimation-in-frequency FFT. It takes the FFT of the columns and
vectorizes the operations along the rows of the matrix. Thus, the vector length in the calculations depends
on the row size, and the strides for vector loads and stores are the leading dimensions, ldx and ldy.
On UNICOS/mk systems, this routine is not optimized.
Initialization
The table array stores the trigonometric tables used in calculation of the FFT. You must initialize the table
array by calling the routine with isign = 0 prior to doing the transforms. If the value of the problem size, n,
does not change, table does not have to be reinitialized.
Dimensions
In the preceding description, it is assumed that array subscripts were zero-based, as is customary in FFT
applications. Thus, the input and output arrays are declared as follows:
      COMPLEX X(0:ldx-1, 0:lot-1)
      COMPLEX Y(0:ldy-1, 0:lot-1)

The calling sequence does not have to change, however, if you prefer to use the more customary Fortran
style with subscripts starting at 1. The same values of ldx and ldy would be passed to the subroutine even if
the input and output arrays were dimensioned as follows:
      COMPLEX X(ldx, lot)
      COMPLEX Y(ldy, lot)

Performance Tips
This routine computes an FFT for any value of n, but the performance for a given value of n depends on the
prime factorization of n. This fact is characteristic of all FFT algorithms.
Fastest performance is realized when n is a power of 2. In that case, the number of floating-point operations
is approximately $5 \cdot lot \cdot n \cdot \log_2(n)$.
If n contains factors of 3, computation time is slightly longer, because more floating-point operations are
required. If n contains powers of 5, it is longer still. Slowest performance is when n is a prime number. In
that case, the number of floating-point operations is approximately $8 \cdot lot \cdot n^2$.
The kernel routines are optimized for values of n that are products of powers of 2, 3, and 5. Because the
kernel routines have a special case for multiples of 4, even powers of 2 will be slightly faster than odd
powers of 2.
In this implementation, to avoid memory bank conflicts, it is very important to make the leading dimensions
of the arrays odd numbers (or, if that is not possible, make them an odd multiple of 2). To attain best
vectorization performance, the lot size should be at least 64, and preferably, it should be a multiple of 64.
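
To give a sense of scale, the two operation-count estimates can be compared for neighboring sizes. This is worked arithmetic on the approximate formulas above, with lot = 64 and the sizes n = 128 and n = 127 chosen here purely for illustration. It is not a measured timing.

```latex
\begin{aligned}
n = 128 = 2^7\text{:}\quad & 5 \cdot 64 \cdot 128 \cdot \log_2(128) = 5 \cdot 64 \cdot 128 \cdot 7 = 286\,720 \\
n = 127 \text{ (prime):}\quad & 8 \cdot 64 \cdot 127^2 = 8 \cdot 64 \cdot 16\,129 = 8\,258\,048
\end{aligned}
```

By these estimates, the slightly smaller prime size requires roughly 29 times as many floating-point operations as the power of 2.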
Implementation-dependent Items
The UNICOS and UNICOS/mk FFT routines were designed so that they can be implemented efficiently on
many different architectures. The calling sequence is the same in any implementation. Certain details,
however, depend on the particular implementation. These details are confined to three areas:
• The first area is the size of the table and work arrays. Different systems may need different sizes. The
subroutine call requires no change, but you may have to change the array sizes in the DIMENSION or
type statements that declare the arrays.
• The second area is the isys parameter array, an argument that gives certain implementation-specific
information. All features and functions of the FFT routines that are specific to any particular
implementation are confined to this isys array. On any implementation, you can use the default values by
specifying an argument value of 0.
In the UNICOS systems implementation, no special options are supported; therefore, you can always
specify the isys parameter as a constant 0. Subsequent software releases may provide other options.
• The third area is the issue of which problem sizes or dimensions give optimal performance in a particular
implementation. See the Performance Tips subsection.

EXAMPLES
Example 1: Initialize the TABLE array in preparation for doing an FFT of size 128. Only the isign, n, and
table arguments are used in this case. You can use dummy arguments or zeros for the other arguments in
the subroutine call.
      REAL TABLE(100 + 2*128)
      CALL CCFFTM(0, 128, 0, 0., DUMMY, 1, DUMMY, 1, TABLE, DUMMY, 0)

Example 2: X and Y are complex arrays of dimension (0:128) by (0:55). The first 128 elements of each
column contain data. For performance reasons, the extra element forces the leading dimension to be an odd
number. Take the FFT of the first 50 columns of X and store the results in the first 50 columns of Y.
Before taking the FFT, initialize the TABLE array, as in example 1.
      COMPLEX X(0:128, 0:55)
      COMPLEX Y(0:128, 0:55)
      REAL TABLE(100 + 2*128)
      REAL WORK(4*128*50)
      ...
      CALL CCFFTM(0, 128, 50, 1.0, X, 129, Y, 129, TABLE, WORK, 0)
      CALL CCFFTM(1, 128, 50, 1.0, X, 129, Y, 129, TABLE, WORK, 0)

Example 3: With X and Y as in example 2, take the inverse FFT of Y and store it back in X. The scale
factor 1/128 is used. Assume that the TABLE array is already initialized.
      CALL CCFFTM(-1, 128, 50, 1./128., Y, 129, X, 129, TABLE, WORK, 0)

Example 4: Do the same computation as in example 2, but assume that the lower bound of each array is 1,
rather than 0. The subroutine calls are not changed.
      COMPLEX X(129, 55)
      COMPLEX Y(129, 55)
      ...
      CALL CCFFTM(0, 128, 50, 1.0, X, 129, Y, 129, TABLE, WORK, 0)
      CALL CCFFTM(1, 128, 50, 1.0, X, 129, Y, 129, TABLE, WORK, 0)

Example 5: Do the same computation as in example 4, but put the output back in array X to save storage
space. Assume that the TABLE array is already initialized.
      COMPLEX X(129, 55)
      ...
      CALL CCFFTM(1, 128, 50, 1.0, X, 129, X, 129, TABLE, WORK, 0)

SEE ALSO
CCFFT(3S), SCFFT(3S), SCFFTM(3S)


CCNVL ( 3S ) CCNVL ( 3S )

NAME
CCNVL – Computes the convolution of a complex sequence with one or more other complex sequences

SYNOPSIS
CALL CCNVL (nh, nx, m, ny, h, inc1h, x, inc1x, inc2x, y, inc1y, inc2y)

IMPLEMENTATION
UNICOS systems

DESCRIPTION
CCNVL computes the convolution of the complex sequence h with one or more complex sequences x.
This routine has the following arguments:
nh Integer. (input)
Number of elements in the h sequence. nh ≥ 0. If nh = 0, CCNVL zeroes out all elements of
output matrix y.
nx Integer. (input)
Number of elements in each sequence of x values. nx ≥ 0. If nx = 0, CCNVL zeroes out all
elements of output matrix y.
m Integer. (input)
Number of sequences of x values. m ≥ 0. If m = 0, CCNVL returns without calculating a
convolution.
ny Integer. (input)
Number of elements in output sequence y. ny ≥ 0. If ny = 0, CCNVL returns without calculating
a convolution.
h Complex array of dimension nh. (input)
Input sequence to be convolved with x.
inc1h Integer. (input)
Address increment between elements in array h. inc1h must not be zero.
x Complex array of dimension (nx, m). (input)
Input matrix to be convolved with h.
inc1x Integer. (input)
Address increment between elements in each sequence of x values. inc1x must not be zero.
inc2x Integer. (input)
Address increment between sequences of x values. inc2x must not be 0.
y Complex array of dimension (ny, m). (output)
Output matrix of convolutions.

inc1y Integer. (input)
Address increment between elements in each sequence of y values. inc1y must not be 0.
inc2y Integer. (input)
Address increment between sequences of y values. inc2y must not be 0.
Calculating a Convolution
Suppose h and x are two sequences of complex numbers, having nh and nx elements, respectively. As is
customary in signal processing applications, let the subscripts start at 0, so that
$$ h = \begin{pmatrix} h_0 \\ h_1 \\ \vdots \\ h_{nh-1} \end{pmatrix} \quad \text{and} \quad x = \begin{pmatrix} x_0 \\ x_1 \\ \vdots \\ x_{nx-1} \end{pmatrix} $$
The convolution product, $y = h * x$, is the sequence that has nh + nx – 1 elements defined by the following:
$$ \begin{aligned}
y_0 &= h_0 x_0 \\
y_1 &= h_1 x_0 + h_0 x_1 \\
y_2 &= h_2 x_0 + h_1 x_1 + h_0 x_2 \\
y_3 &= h_3 x_0 + h_2 x_1 + h_1 x_2 + h_0 x_3 \\
&\;\;\vdots \\
y_{nh+nx-2} &= h_{nh-1} x_{nx-1}
\end{aligned} $$
In the CCNVL routine, the number of terms in the output sequence is specified by an argument, ny. If
ny < nh + nx – 1, the sequence y is truncated. If ny > nh + nx – 1, the terms beyond y(nh + nx – 2) are
set to 0.
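
As a small worked case, take the illustrative sequences h = (1, -1) and x = (2, 5, 7), so that nh = 2, nx = 3, and nh + nx – 1 = 4. These values are chosen here only as an example. Terms with a subscript outside a sequence are simply absent from the sums.

```latex
\begin{aligned}
y_0 &= h_0 x_0 = 1 \cdot 2 = 2 \\
y_1 &= h_1 x_0 + h_0 x_1 = (-1) \cdot 2 + 1 \cdot 5 = 3 \\
y_2 &= h_1 x_1 + h_0 x_2 = (-1) \cdot 5 + 1 \cdot 7 = 2 \\
y_3 &= h_1 x_2 = (-1) \cdot 7 = -7
\end{aligned}
```

With ny = 3, the output is truncated to (2, 3, 2). With ny = 6, it is extended with zeros to (2, 3, 2, -7, 0, 0).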
Multiple Convolutions
CCNVL can actually compute several convolutions in one call. The input sequence h is always a vector, but
the input x can be a matrix. In this case, CCNVL convolves h with each column of x, resulting in a column
of the output matrix, y.

NOTES
The following notes define the convolution more precisely, and discuss its uses and performance.
Convolution Definition
The precise definition of convolution computed by CCNVL is as follows:
Let h be a sequence of nh elements and X be a matrix with m columns and nx elements in each column, as
follows:
$$ h = \begin{pmatrix} h_0 \\ h_1 \\ h_2 \\ \vdots \\ h_{nh-1} \end{pmatrix}
\qquad
X = \begin{pmatrix}
x_{0,0} & x_{0,1} & x_{0,2} & \cdots & x_{0,m-1} \\
x_{1,0} & x_{1,1} & x_{1,2} & \cdots & x_{1,m-1} \\
x_{2,0} & x_{2,1} & x_{2,2} & \cdots & x_{2,m-1} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
x_{nx-1,0} & x_{nx-1,1} & x_{nx-1,2} & \cdots & x_{nx-1,m-1}
\end{pmatrix} $$

Then each column of the output matrix:
$$ Y = \begin{pmatrix}
y_{0,0} & y_{0,1} & y_{0,2} & \cdots & y_{0,m-1} \\
y_{1,0} & y_{1,1} & y_{1,2} & \cdots & y_{1,m-1} \\
y_{2,0} & y_{2,1} & y_{2,2} & \cdots & y_{2,m-1} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
y_{ny-1,0} & y_{ny-1,1} & y_{ny-1,2} & \cdots & y_{ny-1,m-1}
\end{pmatrix} $$
is obtained by convolving h with the corresponding column of X, so that:
$$ y_{k,j} = \sum_{i=\max(0,\,k-nx+1)}^{\min(k,\,nh-1)} h_i \, x_{k-i,\,j} $$
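
The summation limits translate directly into loop bounds. The following free-form Fortran 90 program is a sketch of the definition only: it is not the CCNVL implementation, it ignores the increment arguments and multitasking, and the program name and sample data are hypothetical.

```fortran
program direct_convolution
  implicit none
  integer, parameter :: nh = 2, nx = 3, m = 2, ny = 5
  complex :: h(0:nh-1)
  complex :: x(0:nx-1, 0:m-1)
  complex :: y(0:ny-1, 0:m-1)
  integer :: i, j, k

  ! Illustrative data: one filter h and two input columns.
  h = (/ cmplx(1.0, 0.0), cmplx(2.0, 0.0) /)
  x(:, 0) = (/ cmplx(3.0, 0.0), cmplx(4.0, 0.0), cmplx(5.0, 0.0) /)
  x(:, 1) = (/ cmplx(0.0, 1.0), cmplx(1.0, 0.0), cmplx(0.0, -1.0) /)

  ! y(k,j) = sum of h(i) * x(k-i,j) for i from MAX(0, k-nx+1)
  ! to MIN(k, nh-1). When k > nh+nx-2 the range is empty, so the
  ! element stays 0, which matches the padding rule for large ny.
  do j = 0, m - 1
    do k = 0, ny - 1
      y(k, j) = (0.0, 0.0)
      do i = max(0, k - nx + 1), min(k, nh - 1)
        y(k, j) = y(k, j) + h(i) * x(k - i, j)
      end do
    end do
  end do

  do j = 0, m - 1
    print *, 'column', j, ':', y(:, j)
  end do
end program direct_convolution
```

For column 0 of this sample data, the real parts of the result are 3, 10, 13, 10, and 0. The final zero appears because ny = 5 exceeds nh + nx – 1 = 4.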

A complex convolution is similar to the multiplication of complex polynomials. You can think of the
sequence h as a sequence of coefficients of the polynomial
$$ H(z) = h_0 + h_1 z + h_2 z^2 + h_3 z^3 + \cdots + h_{nh-1} z^{nh-1} $$
Similarly, each column
$$ x_j = \begin{pmatrix} x_{0,j} \\ \vdots \\ x_{nx-1,j} \end{pmatrix} $$
of the matrix X can be considered the coefficients of the polynomial
$$ X_j(z) = x_{0,j} + x_{1,j} z + x_{2,j} z^2 + x_{3,j} z^3 + \cdots + x_{nx-1,j} z^{nx-1} $$
The convolution product of h and each column, $y_j = h * x_j$, is the sequence whose elements are the
coefficients of the product polynomial
$$ Y_j(z) = H(z) X_j(z) $$
The operation of convolution is commutative, so that $h * x = x * h$. In this subroutine, however, h and x are
not interchangeable, because h is restricted to be a vector, but x can be a matrix of one or more vectors, all
of the same length.
Uses for Convolution
Convolutions have numerous applications in signal processing, where the convolution operation is sometimes
called filtering. The sequence h might represent a filter, and the matrix x might represent a set of input
signals. The output matrix y would represent the output signals obtained by filtering (convolving) x with h.
Performance
If the NCPUS environment variable is set and greater than 1, CCNVL multitasks on m; that is, it performs the
convolutions of successive x sequences in parallel.

This routine is efficient for any value of the arguments. For long sequences, however, there is a faster
algorithm that uses a Fourier transform technique. For details, see the CCNVLF(3S) routine.

SEE ALSO
CCNVLF(3S)


CCNVLF ( 3S ) CCNVLF ( 3S )

NAME
CCNVLF – Computes the convolution of a complex sequence with one or more other complex sequences by
using a Fourier transform method

SYNOPSIS
CALL CCNVLF (nh, nx, m, ny, h, inc1h, x, inc1x, inc2x, y, inc1y, inc2y, table, ntable,
work, nwork)

IMPLEMENTATION
UNICOS systems

DESCRIPTION
CCNVLF computes the convolution of the complex sequence h with one or more complex sequences x.
CCNVLF has exactly the same effect as the CCNVL(3S) routine. The difference is in the algorithm.
CCNVL(3S) computes the convolution directly, but CCNVLF uses a Fourier transform method, using a
routine similar to CCFFTM(3S). CCNVLF requires additional space for tables and workspace, but for long
sequences, it can be significantly faster than CCNVL(3S).
See the CCNVL(3S) man page for a definition of the convolution product that is computed by both routines.
This routine has the following arguments:
nh Integer. (input)
Number of elements in sequence h. nh ≥ 0. If nh = 0, CCNVLF zeroes out all elements of
output matrix y.
nx Integer. (input)
Number of elements in each sequence of x values. nx ≥ 0. If nx = 0, CCNVLF zeroes out all
elements of output matrix y.
m Integer. (input)
Number of sequences of x values. m ≥ 0. If m = 0, CCNVLF returns without calculating a
convolution.
ny Integer. (input)
Number of elements in output sequence y. ny ≥ 0. If ny = 0, CCNVLF returns without
calculating a convolution.
h Complex array of dimension nh. (input)
Input sequence to be convolved with x.
inc1h Integer. (input)
Address increment between elements in array h. inc1h must not be zero.
x Complex array of dimension (nx, m). (input)
Input matrix to be convolved with h.

inc1x Integer. (input)
Address increment between elements in each sequence of x values. inc1x must not be zero.
inc2x Integer. (input)
Address increment between sequences of x values. inc2x must not be 0.
y Complex array of dimension (ny, m). (output)
Output matrix of convolutions.
inc1y Integer. (input)
Address increment between elements in each sequence of y values. inc1y must not be 0.
inc2y Integer. (input)
Address increment between sequences of y values. inc2y must not be 0.
table Real array of dimension ntable. (input and output)
Table of factors and trigonometric function values.
ntable Integer. (input)
Size of array table. The value of ntable should be at least 2(nh + nx) + 100.
work Real array of dimension nwork. (scratch output)
Work array. This is a scratch array used for intermediate calculations. Its address space must
be different from that of the input and output arrays.
nwork Integer. (input)
Size of array work. The value of nwork should be at least 6(nh+nx)(m+1).
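
As a worked example of these minimum sizes, consider a filter of nh = 64 elements applied to m = 4 sequences of nx = 960 elements each. These values are chosen here only for illustration.

```latex
\begin{aligned}
ntable &\ge 2(nh + nx) + 100 = 2(64 + 960) + 100 = 2\,148 \\
nwork &\ge 6(nh + nx)(m + 1) = 6 \cdot 1\,024 \cdot 5 = 30\,720
\end{aligned}
```

The workspace grows with the number of sequences m, while the table size depends only on nh + nx.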

NOTES
The computed output values y are the same as those computed by routine CCNVL(3S). See the CCNVL(3S)
man page for a definition of the convolution product.
The algorithm of this routine uses the famous Convolution Theorem, which states that the Fourier transform
of a convolution is the product of the Fourier transforms of the original sequences.
This routine performs a Fast Fourier Transform (FFT) on each of the sequences h and x, and an inverse FFT
on the product to compute the convolution. In each case, the order of each FFT is n = nh + nx.
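
In formula form, the method can be sketched as follows for a single sequence x. This assumes that h and x are extended with zeros to length n (h_j = 0 for j ≥ nh, x_j = 0 for j ≥ nx) and that ω = e^{2πi/n}. The choice of sign in the exponents is a convention assumed here, not something this page specifies.

```latex
\begin{aligned}
\hat{h}_k &= \sum_{j=0}^{n-1} h_j \, \omega^{jk}, \qquad
\hat{x}_k = \sum_{j=0}^{n-1} x_j \, \omega^{jk}, \qquad k = 0, \ldots, n-1 \\
y_j &= \frac{1}{n} \sum_{k=0}^{n-1} \hat{h}_k \, \hat{x}_k \, \omega^{-jk}, \qquad j = 0, \ldots, n-1
\end{aligned}
```

Because n = nh + nx is at least nh + nx – 1, the number of nonzero terms in the convolution, the cyclic product computed this way does not wrap around and equals the convolution defined on the CCNVL(3S) man page.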
The routine works correctly for any value of n, but as with all FFT routines, the performance depends on the
prime factorization of n. For the best performance, n should be a power of 2. If n is a moderately large
number that is a product of powers of 2, 3, and 5, you will still get very good performance. To obtain a
good value of n, the input sequences can be padded with zeros, if necessary. See the CCFFTM(3S) man page
for general information about FFTs.
The table array is initialized on the first call to routine CCNVLF. It is not reinitialized in subsequent calls
unless the value of n changes.

If the NCPUS environment variable is set and greater than 1, this routine multitasks on m; that is, it performs
the convolutions of successive x sequences in parallel.

SEE ALSO
CCFFTM(3S), CCNVL(3S)


CFFT ( 3S ) CFFT ( 3S )

NAME
CFFT – Applies a multitasked complex Fast Fourier Transform (FFT)

SYNOPSIS
CALL CFFT (isign, n, scale, x, incx, y, incy, table, ntable, work, nwork)

IMPLEMENTATION
UNICOS systems

DESCRIPTION
CFFT computes the FFT of the complex vector x, and it stores the result in vector y. For most purposes,
CFFT is superseded by the FFT routine CCFFT(3S).
Suppose arrays X and Y are dimensioned as follows:
      COMPLEX X(0:N-1), Y(0:N-1)

Array X contains the input vector x:
$$ x_0, x_1, x_2, \ldots, x_{n-1} $$
and CFFT computes the output vector y in the array Y:
$$ y_0, y_1, y_2, \ldots, y_{n-1} $$
according to the following formula:
$$ y_k = \mathrm{scale} \sum_{j=0}^{n-1} x_j \, \omega^{jk}, \qquad \text{for } k = 0, 1, \ldots, n-1 $$

where
$\omega = e^{(\mathrm{isign}) \, 2 \pi i / n}$
isign = ±1
$\pi = 3.14159\ldots$
$e = 2.71828\ldots$
$i = \sqrt{-1}$
Different authors use different conventions for which of the transforms, isign = +1 or isign = – 1, is the
forward or inverse transform, and what the scale factor should be in either case. In this routine, when
isign = +1 it is called the forward transform, and when isign = – 1 it is called the inverse transform.
This routine has the following arguments:

isign Integer. (input)
Specifies whether to initialize the table array or to do the forward or inverse Fourier transform:
isign = 0 Initializes the table array
isign = +1 Computes the forward transform
isign = –1 Computes the inverse transform
n Integer. (input)
Order of transform (the number of values in the input (or output) array). If n ≤ 0, CFFT
returns without calculating a transform.
scale Real. (input)
Real scale factor. Each element of the output array is multiplied by scale after taking the
Fourier transform, as defined by the preceding formula.
x Complex array element. (input)
First element to be used in an array of values to be transformed. The array of input values has
the dimension |incx| (n – 1) + 1.
incx Integer. (input)
The increment between successive complex array elements in input array. To use every element,
set incx = 1. incx must not be 0.
y Complex array element. (output)
First element to be output in an array of transformed values.
The array of output values has the dimension |incy| (n – 1) + 1. The output array can be the same
as the input array. In that case, the transform is done in place, provided that the arrays are the
same size and that incx = incy.
incy Integer. (input)
The increment between successive complex array elements in the complex output array. To use
every element, set incy = 1. incy must not be 0.
table Real array of dimension ntable. (input or output)
Table of factors and trigonometric functions. This array can be initialized by a call to CFFT
with isign = 0.
ntable Integer. (input)
Number of (real) words in the table array. The value of ntable should be at least 8n + 100. If
the value of ntable does not provide enough space, CFFT prints an error message and stops.
work Real array of dimension nwork. (scratch output)
Work array. This is a scratch array used for intermediate calculations. Its address space must
be different from that of the input and output arrays.
nwork Integer. (input)
Number of (real) words in the work array. The value of nwork should be at least 12n. If the
value of nwork does not provide enough space, CFFT prints an error message and stops.

NOTES
This section contains information about the algorithm for CFFT, the initialization of the table array, the size
of the table and work arrays, and some performance tips.
Algorithm
The algorithm for CFFT is the "Four Step FFT," which is as follows:
Let n be the order of the transform.
Let n = n1 · n2, where n1 and n2 are close to the square root of n and are chosen for performance. Then:
1. Perform n1 simultaneous n2-point FFTs on the n data elements, treated as an n1-by-n2 matrix.
2. Multiply the resulting matrix by the phase factors $\alpha^{jk}$.
3. Transpose the resulting data array, treated as an n1-by-n2 matrix, into an n2-by-n1 matrix.
4. Perform n2 simultaneous n1-point FFTs on the n data elements, treated as an n2-by-n1 matrix.
CFFT includes a special case for n2 = 1, in which case, a conventional FFT is done.
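
The four steps can be traced on a small case. The following free-form Fortran 90 program is a sketch under stated assumptions, not the CFFT implementation: it uses n1 = 3 and n2 = 4, takes isign = +1 and scale = 1.0, replaces each short FFT by a direct sum, and assumes that the phase factor in step 2 is exp(2πi · j · k / n), where j is the row index and k is the transformed column index. The function w and all other names are hypothetical.

```fortran
program four_step_demo
  implicit none
  integer, parameter :: n1 = 3, n2 = 4, n = n1 * n2
  complex :: x(0:n-1), yref(0:n-1), y(0:n-1)
  complex :: a(0:n1-1, 0:n2-1), b(0:n1-1, 0:n2-1)
  complex :: c(0:n2-1, 0:n1-1), d(0:n2-1, 0:n1-1)
  integer :: j, k, j1, j2, k1, k2

  ! Arbitrary test data.
  do j = 0, n - 1
    x(j) = cmplx(real(j), real(n - 2 * j))
  end do

  ! Reference: direct n-point transform from the definition.
  do k = 0, n - 1
    yref(k) = (0.0, 0.0)
    do j = 0, n - 1
      yref(k) = yref(k) + x(j) * w(j * k, n)
    end do
  end do

  ! View x as an n1-by-n2 matrix: a(j1,j2) = x(j1 + n1*j2).
  a = reshape(x, (/ n1, n2 /))

  ! Step 1: n1 simultaneous n2-point transforms, one per row of a.
  ! Step 2: multiply element (j1,k2) by the phase factor.
  do j1 = 0, n1 - 1
    do k2 = 0, n2 - 1
      b(j1, k2) = (0.0, 0.0)
      do j2 = 0, n2 - 1
        b(j1, k2) = b(j1, k2) + a(j1, j2) * w(j2 * k2, n2)
      end do
      b(j1, k2) = b(j1, k2) * w(j1 * k2, n)
    end do
  end do

  ! Step 3: transpose into an n2-by-n1 matrix.
  c = transpose(b)

  ! Step 4: n2 simultaneous n1-point transforms, one per row of c.
  do k2 = 0, n2 - 1
    do k1 = 0, n1 - 1
      d(k2, k1) = (0.0, 0.0)
      do j1 = 0, n1 - 1
        d(k2, k1) = d(k2, k1) + c(k2, j1) * w(j1 * k1, n1)
      end do
    end do
  end do

  ! Read the result in storage order: y(k2 + n2*k1) = d(k2,k1).
  y = reshape(d, (/ n /))

  print *, 'max difference from direct transform =', &
           maxval(abs(y - yref))

contains

  ! Hypothetical helper: exp(2*pi*i*p/q) for nonnegative p.
  complex function w(p, q)
    integer, intent(in) :: p, q
    real :: angle
    angle = 8.0 * atan(1.0) * real(mod(p, q)) / real(q)
    w = cmplx(cos(angle), sin(angle))
  end function w

end program four_step_demo
```

The printed difference should be at rounding-error level. The sketch also shows why the transpose matters: after step 4, the matrix d, read in storage order, is already the output vector in natural order.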
Initialization
The table array is used to store factors of n1 and n2 and trigonometric tables that are used in calculation of
the FFT. You can initialize table explicitly by calling the routine with isign = 0. If you do not initialize
table, the routine does so automatically on the first call. If the value of the problem size, n, does not change
between calls, the table does not need to be reinitialized. If you call the routine with a different value of n
without first reinitializing the table, the routine reinitializes the table automatically.
Re-initialization of table is relatively time-consuming. If you are continually changing the problem size, you
might consider using more than one table array so that it will not have to be reinitialized on each subroutine
call.
If you initialize the table explicitly by calling the routine with isign = 0, the only arguments that are
significant are isign, n, table, and ntable. In this case, the other arguments are ignored.
The value of ntable is checked when the table is initialized to verify that the table space you provided is
large enough. If it is not, the routine stops after printing an error message, which indicates the amount of
table space required. (See the following subsection.)
Size of Table and Work Arrays
The precise sizes of the table and work arrays depend on the numbers n1 and n2 in the factorization of
n = n1 · n2.
ntable = 100 + 2(n1 + n2 + np)
nwork = 4np
where np = 2n if n2 is odd,
      np = 2n + n1 if n2 is even

Because the user does not know in advance the values of n1 and n2, the estimates given in the preceding
argument list may be used in all cases. If insufficient table or workspace is provided, the subroutine stops
after printing an error message that tells exactly how much space was needed.
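
The estimates ntable = 8n + 100 and nwork = 12n are sufficient for every factorization, as the following short derivation suggests. It is a sketch that assumes only that n1 and n2 are positive integers with n1 · n2 = n. It uses n1 + n2 ≤ n + 1, which follows from (n1 – 1)(n2 – 1) ≥ 0, and, for even n2 ≥ 2, the inequality 2 n1 + n2 ≤ 2 n1 n2.

```latex
\begin{aligned}
n_2 \text{ odd:}\quad & np = 2n, & n_1 + n_2 + np &\le (n + 1) + 2n \le 4n \\
n_2 \text{ even:}\quad & np = 2n + n_1, & n_1 + n_2 + np &= 2n + (2 n_1 + n_2) \le 4n \\
\text{hence}\quad & ntable = 100 + 2(n_1 + n_2 + np) \le 100 + 8n, & nwork &= 4\,np \le 4(2n + n_1) \le 12n
\end{aligned}
```

The exact requirements are usually well below these bounds, because n1 and n2 are close to the square root of n.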
Performance Tips
CFFT computes an FFT for any value of n, but the performance for a given value of n depends on the
factorization of n. This is characteristic of all FFT algorithms.
Best performance is realized when n is a power of 2. In that case, the number of arithmetic operations is
proportional to $n \log_2(n)$.
Performance is slightly worse when n contains factors of 3; it is even worse when n contains powers of 5.
The worst performance is when n is a prime number. In that case, the number of operations is proportional
to $n^2$. The kernel routines are optimized for values of n that are products of powers of 2, 3, and 5.
CFFT is rather slow for small values of n (for example, n < 128) because in such cases, the vector lengths
are very short. For small n, however, performance should not matter unless you are performing many
transforms. In this case, you should use the MCFFT(3S) routine (multiple complex FFT). MCFFT(3S)
vectorizes in the "lot direction," and performance can be quite good even for small values of n.
CFFT runs in multitasked mode for large values of n. If
$$ \frac{\min(n1,\, n2)}{ncpus} \ge 16 $$
where ncpus is the number of CPUs being used, the calculation runs in multitasked mode. The value of n
must be fairly large to realize an appreciable performance gain from multitasking, however, because the
vector lengths are proportional to the square root of n.

EXAMPLES
The following is a test program for CFFT. It computes the FFT of a random sequence of complex numbers,
first using the direct definition of the Fourier Transform, and then using CFFT. Afterward, it compares the
results. Finally, it computes the inverse transform (dividing by N), and compares with the original data.
      PARAMETER (N = 4*3*5*7)
      COMPLEX I, W, X(0:N-1), Y(0:N-1), YY(0:N-1)
      PARAMETER (NTABLE = 100 + 8*N, NWORK = 12*N)
      REAL TABLE(NTABLE), WORK(NWORK)
      PARAMETER (I = (0.0, 1.0))
      LOGICAL LFWD, LINV
*-------------------------------------------------------
*     Initialize input array, X, with random
*     complex numbers.

      DO 10, J = 0, N-1
         X(J) = CMPLX(RANF(), RANF())
   10 CONTINUE
*-------------------------------------------------------
*     Compute YY = the Fourier Transform of X,
*     using the definition of the Fourier transform.

      PI = ATAN2(0.0, -1.0)
      W = EXP(2*PI*I/N)
      DO 30, K = 0, N-1
         YY(K) = 0.
         DO 20, J = 0, N-1
            YY(K) = YY(K) + X(J)*W**(J*K)
   20    CONTINUE
   30 CONTINUE
*-------------------------------------------------------
*     Compute Y = the Fourier Transform of X,
*     using cfft.

      CALL CFFT(+1, N, 1.0, X, 1, Y, 1,
     &          TABLE, NTABLE, WORK, NWORK)
*-------------------------------------------------------
*     Compare Y and YY.

      LFWD = .TRUE.
      DO 40, J = 0, N-1
         ERROR = ABS(Y(J) - YY(J))/ABS(Y(J))
         LFWD = LFWD .AND. (ERROR .LE. 1.E-6)
   40 CONTINUE
      IF (.NOT. LFWD) PRINT *, 'Failed forward test'
      IF (LFWD) PRINT *, 'Forward transform OK'
*-------------------------------------------------------
*     Compute the inverse transform of Y,
*     and store it back in Y.

      CALL CFFT(-1, N, 1.0/N, Y, 1, Y, 1,
     &          TABLE, NTABLE, WORK, NWORK)
*-------------------------------------------------------
*     Compare the transformed Y array with the
*     original X array.

      LINV = .TRUE.
      DO 50, J = 0, N-1
         ERROR = ABS(X(J) - Y(J))/ABS(X(J))
         LINV = LINV .AND. (ERROR .LE. 1.0E-6)
   50 CONTINUE
      IF (.NOT. LINV) PRINT *, 'Failed inverse test'
      IF (LINV) PRINT *, 'Inverse transform OK'
      IF (LFWD .AND. LINV) PRINT *, 'Test succeeded'
      END

SEE ALSO
CCFFT(3S), which supersedes most uses of CFFT
CCFFT2D(3S), CFFT2D(3S) to calculate a two-dimensional FFT. CCFFT2D(3S) supersedes most uses of
CFFT2D(3S).
CCFFT3D(3S), CFFT3D(3S) to calculate a three-dimensional FFT. CCFFT3D(3S) supersedes most uses of
CFFT3D(3S).
CCFFTM(3S), MCFFT(3S) to calculate multiple one-dimensional FFTs. CCFFTM(3S) supersedes most uses
of MCFFT(3S).



CFFT2 ( 3S ) CFFT2 ( 3S )

NAME
CFFT2 – Applies a complex Fast Fourier Transform (FFT)

SYNOPSIS
CALL CFFT2 (init, ix, n, x, work, y)

IMPLEMENTATION
UNICOS systems

DESCRIPTION
CFFT2 performs the following calculation:
$$y_k = \sum_{j=0}^{n-1} x_j \exp\left(\pm \frac{2\pi i}{n}\, j k\right) \quad \text{for } k = 0, \ldots, n-1, \text{ where } i^2 = -1$$
The sign of the exponent is the same as the sign of the argument ix. This routine has the following
arguments:
init Integer. (input)
If nonzero, generates sine and cosine tables in work. If 0, calculates FFTs by using sine and
cosine tables of the previous call.
ix Integer. (input)
>0 Calculates a forward transform
<0 Calculates an inverse transform
n Integer. (input)
Size of the Fourier transform ($2^m$, where m ≥ 3).
x Complex array of dimension n. (input)
Input vector. Range of x:
$$\frac{n}{10^{2466}} \le |x_i| \le \frac{10^{2466}}{n}, \quad \text{for } i = 1, 2, \ldots, n.$$
Vector x can be equivalenced to the work vector. In this case, scratch work overwrites the
input values.
work Complex array of dimension 5n/2. (scratch output)
Work storage vector.
y Complex array of dimension n. (output)
Result vector.
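The calculation given in the DESCRIPTION can be written out directly from its definition. The Python sketch below is an $O(n^2)$ reference for checking results on small inputs; it is not the algorithm that CFFT2 uses, and the function name dft_reference is hypothetical. The source does not define the behavior for ix = 0, so the sketch rejects that value as an assumption.

```python
import cmath


def dft_reference(x, ix):
    """Evaluate y[k] = sum_j x[j] * exp(sign * 2*pi*i*j*k / n) directly.

    sign is +1 when ix > 0 and -1 when ix < 0, following the rule that the
    sign of the exponent is the same as the sign of ix.
    """
    if ix == 0:
        # Assumption: ix = 0 is not defined in the text, so reject it here.
        raise ValueError("ix must be nonzero")
    n = len(x)
    sign = 1 if ix > 0 else -1
    y = []
    for k in range(n):
        total = 0j
        for j in range(n):
            total += x[j] * cmath.exp(sign * 2j * cmath.pi * j * k / n)
        y.append(total)
    return y


if __name__ == "__main__":
    x = [complex(j, -j) for j in range(8)]
    y = dft_reference(x, +1)
    # Applying the opposite sign and dividing by n recovers the input.
    back = [v / len(x) for v in dft_reference(y, -1)]
    print(max(abs(a - b) for a, b in zip(x, back)))
```

Because the formula carries no scale factor, the round trip recovers the input only after division by n, as the last lines of the sketch show.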

SEE ALSO
CCFFT(3S) (which supersedes CFFT2 only on Cray Y-MP systems), CCFFTM(3S), CRFFT2(3S),
RCFFT2(3S)



CFFT2D ( 3S ) CFFT2D ( 3S )

NAME
CFFT2D – Applies a multitasked two-dimensional complex Fast Fourier Transform (FFT)

SYNOPSIS
CALL CFFT2D (isign, n1, n2, scale, x, inc1x, inc2x, y, inc1y, inc2y, table, ntable, work,
nwork)

IMPLEMENTATION
UNICOS systems

DESCRIPTION
CFFT2D computes the two-dimensional complex Fourier transform of the complex matrix X, and it stores
the results in the complex matrix Y. For most purposes, CFFT2D is superseded by
CCFFT2D(3S).
Suppose the matrices are stored in Fortran arrays dimensioned as follows:
      COMPLEX X(0:N1-1, 0:N2-1)
      COMPLEX Y(0:N1-1, 0:N2-1)

CFFT2D computes the following formula:

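$$Y_{k1,k2} = \mathit{scale} \sum_{j1=0}^{n1-1} \sum_{j2=0}^{n2-1} X_{j1,j2}\, \omega_1^{j1 \cdot k1}\, \omega_2^{j2 \cdot k2} \quad \text{for } k1 = 0, \ldots, n1-1, \; k2 = 0, \ldots, n2-1$$

where:
$$\omega_1 = e^{(\mathit{isign})\, 2 \pi i / n1}, \qquad \omega_2 = e^{(\mathit{isign})\, 2 \pi i / n2}$$
$$\mathit{isign} = \pm 1, \qquad i = \sqrt{-1}, \qquad \pi = 3.14159\ldots, \qquad e = 2.71828\ldots$$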

In this documentation, when isign = +1 it is called the forward transform, and when isign = – 1 it is called
the inverse transform.
This routine has the following arguments:
isign Integer. (input)
Specifies whether to initialize the table array, or whether to do the forward or inverse transform:
isign = 0 Initializes the table array
isign = +1 Computes the forward transform
isign = – 1 Computes the inverse transform

n1 Integer. (input)
Transform size in the first dimension. If n1 is not positive, CFFT2D returns without performing
a transform.
n2 Integer. (input)
Transform size in the second dimension. If n2 is not positive, CFFT2D returns without
performing a transform.
scale Real. (input)
Real scale factor. Each element of the output array is multiplied by scale after taking the
Fourier transform, as defined previously.
x Complex array. (input)
Input array of values to be transformed.
inc1x Integer. (input)
x increment in the first dimension (the address increment between successive complex row
elements of input array x). To use every row element in a given column, set inc1x = 1. inc1x
must not be 0.
inc2x Integer. (input)
x increment in the second dimension (the address increment between successive complex column
elements of input array x). To use every column element in a given row, set inc2x to be
the leading dimension of the complex array x. inc2x must not be 0.
y Complex array. (output)
Output array of transformed values. The output array may be the same as the input array. In
that case, the transform is done in place.
inc1y Integer. (input)
y increment in the first dimension (the address increment between successive complex row
elements of output array y). To use every row element in a given column, set inc1y = 1. inc1y
must not be 0.
inc2y Integer. (input)
y increment in the second dimension (the address increment between successive complex column
elements of output array y). To use every column element in a given row, set inc2y to be
the leading dimension of the complex array y. inc2y must not be 0.
table Real array of dimension ntable. (input or output)
Table of factors and trigonometric functions. This array may be initialized by a call to CFFT2D
with isign = 0.
ntable Integer. (input)
Number of (real) words in table. The value of ntable should be at least 2(n1 + n2) + 100. If
not enough space is provided, CFFT2D prints an error message and stops.
work Real array. (scratch output)
Work array of size nwork. This is a scratch array used for intermediate calculations. It must be
a different address space from the input and output arrays.

nwork Integer. (input)
Number of words in the work array. The value of nwork should be at least
$4 \cdot \mathrm{MAX}(n1, n2) \cdot \mathrm{MIN}(n1, n2, 16 \cdot \mathit{ncpus})$, where ncpus is the number of CPUs used in the
calculation. If not enough space is provided, CFFT2D prints an error message and stops.

NOTES
This section includes information about the algorithm for CFFT2D, initialization of arrays, the significance
of the increment arguments, and performance.
Algorithm
CFFT2D uses MCFFT(3S) to do multiple FFTs, first on all of the columns of the input matrix and then on
all of the rows.
Initialization
The table array stores factors of n1 and n2 and also trigonometric tables that are used in the calculation of
the FFT. You can initialize table explicitly by calling the routine with isign = 0. If you do not initialize
table, the routine does so automatically on the first call. If the values of the problem size, n1 and n2, do not
change, the table does not need to be reinitialized. If you call the routine with different values of n1 and n2
without first reinitializing the table, the routine will reinitialize the table automatically.
Reinitialization of table is relatively time-consuming. If you are continually changing the problem size, you
might consider using more than one table array, so that it will not have to be reinitialized on each subroutine
call.
If you initialize the table explicitly by calling the routine with isign = 0, the only arguments that are
significant are isign, n1, n2, table, and ntable. In this case, the other arguments are ignored.
The value of ntable is checked when the table is initialized to verify that the table space you provided is
large enough. If it is not, the routine stops after printing an error message, which indicates the amount of
table space required.
Increment Arguments
The inc1x, inc2x, inc1y, and inc2y increment arguments describe how the matrices are stored in Fortran
arrays. These arguments are the link between the mathematical matrices and their representation in computer
memory.
Consider the following 4-by-5 matrix X:
      X(1,1) X(1,2) X(1,3) X(1,4) X(1,5)
      X(2,1) X(2,2) X(2,3) X(2,4) X(2,5)
      X(3,1) X(3,2) X(3,3) X(3,4) X(3,5)
      X(4,1) X(4,2) X(4,3) X(4,4) X(4,5)

Suppose that this matrix is declared by the following Fortran statement:

      COMPLEX X(4,5)

Fortran stores matrices "by column", in the following order:

      X(1,1), X(2,1), X(3,1), X(4,1), X(1,2), X(2,2) . . .

Thus, the increment in the first dimension, inc1x, is 1. The increment in the second dimension, inc2x, is the
(address) distance between X(1,1) and X(1,2), which is 4, the leading dimension of X. Generally, the
increment in the second dimension is the leading dimension of the array as it is declared in the Fortran
program.
The increment arguments are not directly related to the values of n1 and n2, except that the matrix must fit
in the allocated address space.
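The relationship between the increments and the storage order can be checked with a few lines of arithmetic. The following Python sketch is illustrative only; the function name element_offset is hypothetical, and offsets are counted in complex elements from X(1,1). It uses the 4-by-5 array discussed above, with inc1x = 1 and inc2x = 4.

```python
def element_offset(i, j, inc1, inc2):
    """Offset, in complex elements, of X(i, j) from X(1, 1) (1-based indices)."""
    return (i - 1) * inc1 + (j - 1) * inc2


if __name__ == "__main__":
    rows, cols = 4, 5          # COMPLEX X(4,5)
    inc1, inc2 = 1, rows       # inc1x = 1, inc2x = leading dimension
    # Sorting the index pairs by offset reproduces the column-major order.
    order = sorted(
        ((i, j) for j in range(1, cols + 1) for i in range(1, rows + 1)),
        key=lambda ij: element_offset(ij[0], ij[1], inc1, inc2),
    )
    print(order[:6])
    # Distance between X(1,1) and X(1,2) is the leading dimension.
    print(element_offset(1, 2, inc1, inc2))
```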
Performance Tips
CFFT2D works for any values of the arguments, subject only to the restrictions as described. The
performance of this algorithm, however, depends on the values of the following arguments:
• n1 and n2 (the problem size in each dimension)
• inc1x, inc2x, inc1y, inc2y (the increment arguments)
• nwork (the amount of workspace)
Each of these factors is considered separately in the following subsections.
Performance relative to the problem size
CFFT2D computes an FFT for any values of n1 and n2, but the performance depends on the factorization of
these numbers. This is characteristic of all FFT algorithms.
Best performance is realized when n1 and n2 are each a power of 2. In that case, the number of arithmetic
operations in the calculation is proportional to $n1 \cdot n2 \cdot \log_2(n1 \cdot n2)$.
Performance is slightly worse when n1 or n2 contains factors of 3. It is worse if n1 or n2 contains factors of 5.
Worst performance occurs when n1 and n2 are prime numbers. In that case, the number of arithmetic operations
in the calculation is proportional to $(n1 \cdot n2)^2$.
The kernel routines are optimized for values of n1 and n2 that are products of powers of 2, 3, and 5. The
values of n1 and n2 also relate to vectorization and multitasking performance. Each of the dimensions is
used as a vector length for part of the calculation. Thus, as with all vector calculations, performance is less
than optimal if either n1 or n2 is small (for example, < 8).
If either of the dimensions is large enough, CFFT2D will multitask. If
$$\frac{\mathrm{MIN}(n1, n2)}{\mathit{ncpus}} \ge 16$$
where ncpus is the number of CPUs being used, the entire calculation runs in multitasked mode.
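The multitasking condition above is a simple inequality on the smaller dimension. As a minimal sketch (the function name runs_multitasked is hypothetical, and the check reflects only the condition stated here, not any other criteria the library may apply):

```python
def runs_multitasked(n1, n2, ncpus):
    """Return True when MIN(n1, n2) / ncpus >= 16, the condition stated in the text."""
    if ncpus < 1:
        raise ValueError("ncpus must be at least 1")
    # Multiplying through by ncpus avoids any floating-point division.
    return min(n1, n2) >= 16 * ncpus


if __name__ == "__main__":
    print(runs_multitasked(256, 300, 4))    # 256 >= 64
    print(runs_multitasked(256, 300, 32))   # 256 < 512
    print(runs_multitasked(8, 1000, 1))     # 8 < 16
```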
Performance relative to the increment arguments
The increment arguments have no effect on the algorithm itself, but their values are significant for memory
contention.

The stride for vector loads is, alternately, inc1x and inc2x. To avoid memory bank conflicts, neither number
should be a multiple of a large power of 2. Best performance occurs when both numbers are odd. One way to
do this is to make the leading dimension of array x an odd number when the array is declared in the calling
program.
Likewise, the strides for vector stores are inc1y and inc2y, and the best performance occurs when both
numbers are odd.
Performance relative to the amount of workspace
To do all of the FFTs in one lot, the workspace required (in real words of storage) is $nwork = 4 \cdot n1 \cdot n2$.
If n1 and n2 are large, this amounts to a lot of memory. You can provide less workspace and still obtain
very good performance. At a minimum, you need the following amount of storage, in (real) words:
$$nwork = 4 \cdot \mathrm{MAX}(n1, n2) \cdot \mathrm{MIN}(n1, n2, 16 \cdot \mathit{ncpus})$$
where ncpus is the number of CPUs being used. If you give a value of nwork in the range:
$$4 \cdot \mathrm{MAX}(n1, n2) \cdot \mathrm{MIN}(n1, n2, 16 \cdot \mathit{ncpus}) \le nwork < 4 \cdot n1 \cdot n2$$
CFFT2D divides the work into lots, in which the size of each lot is sufficiently small to be accommodated
by the workspace provided. For best performance, nwork should be at least 128(n1n2)ncpus.
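The size formulas for CFFT2D can be evaluated ahead of time when arrays are dimensioned. The Python sketch below is illustrative only; the function names are hypothetical, and the formulas are copied from the argument descriptions and from this section.

```python
def cfft2d_min_ntable(n1, n2):
    """Minimum table size in real words: 2 * (n1 + n2) + 100."""
    return 2 * (n1 + n2) + 100


def cfft2d_min_nwork(n1, n2, ncpus):
    """Minimum workspace in real words: 4 * MAX(n1, n2) * MIN(n1, n2, 16 * ncpus)."""
    return 4 * max(n1, n2) * min(n1, n2, 16 * ncpus)


def cfft2d_one_lot_nwork(n1, n2):
    """Workspace in real words needed to do all of the FFTs in one lot: 4 * n1 * n2."""
    return 4 * n1 * n2


if __name__ == "__main__":
    n1, n2 = 256, 300
    print(cfft2d_min_ntable(n1, n2))
    for ncpus in (1, 4):
        print(ncpus, cfft2d_min_nwork(n1, n2, ncpus), cfft2d_one_lot_nwork(n1, n2))
```

Any nwork between the two workspace values is accepted, with the work divided into lots as described above.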

EXAMPLES
The following program computes the forward and inverse two-dimensional Fourier transform of a random
matrix of complex numbers and compares the result with the original matrix.
      PARAMETER (N1 = 256, N2 = 300)
      COMPLEX X(N1, N2), Y(N1, N2)
      PARAMETER (NTABLE = 100 + 2*(N1 + N2))
      PARAMETER (NWORK = 4*N1*N2)
      REAL TABLE(NTABLE), WORK(NWORK)
      LOGICAL LPASS
*-----------------------------------------------------
*     Fill array X with random complex numbers.
      DO 2, J = 1, N2
         DO 1, I = 1, N1
            X(I,J) = CMPLX(RANF(), RANF())
 1       CONTINUE
 2    CONTINUE
*-----------------------------------------------------
*     Compute Y = 2-D Fourier transform of X.
      CALL CFFT2D(+1, N1, N2, 1.0,
     &            X, 1, N1, Y, 1, N1,
     &            TABLE, NTABLE, WORK, NWORK)
*-----------------------------------------------------
*     Compute Y = inverse 2-D transform of Y.
      CALL CFFT2D(-1, N1, N2, 1.0/(N1*N2),
     &            Y, 1, N1, Y, 1, N1,
     &            TABLE, NTABLE, WORK, NWORK)
*-----------------------------------------------------
*     Compare X and Y.
      LPASS = .TRUE.
      DO 5, J = 1, N2
         DO 4, I = 1, N1
            ERROR = ABS(X(I,J)-Y(I,J)) / ABS(X(I,J))
            LPASS = LPASS .AND. (ERROR .LE. 1.0E-6)
 4       CONTINUE
 5    CONTINUE
      IF (.NOT. LPASS) PRINT *, 'Failed the test'
      IF (LPASS) PRINT *, 'Passed the test'
      END

SEE ALSO
CCFFT(3S), CFFT(3S) to calculate a one-dimensional FFT. CCFFT(3S) supersedes most uses of CFFT(3S).
CCFFT2D(3S), which supersedes most uses of CFFT2D
CCFFT3D(3S), CFFT3D(3S) to calculate a three-dimensional FFT. CCFFT3D(3S) supersedes most uses of
CFFT3D(3S).
CCFFTM(3S), MCFFT(3S) to calculate multiple one-dimensional FFTs. CCFFTM(3S) supersedes most uses
of MCFFT(3S).



CFFT3D ( 3S ) CFFT3D ( 3S )

NAME
CFFT3D – Applies a multitasked three-dimensional complex Fast Fourier Transform (FFT)

SYNOPSIS
CALL CFFT3D (isign, n1, n2, n3, scale, x, inc1x, inc2x, inc3x, y, inc1y, inc2y, inc3y,
table, ntable, work, nwork)

IMPLEMENTATION
UNICOS systems

DESCRIPTION
CFFT3D computes the three-dimensional complex Fourier transform of the complex matrix X, and it stores
the results in the complex matrix Y. For most purposes, CFFT3D is superseded by CCFFT3D(3S).
On this man page, the first dimension of a three-dimensional matrix is defined as the row dimension, the
second dimension is the column dimension, and the third dimension is the plane dimension.
Suppose that the matrices are stored in Fortran arrays, which are declared as follows:
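      COMPLEX X(0:N1-1, 0:N2-1, 0:N3-1)
      COMPLEX Y(0:N1-1, 0:N2-1, 0:N3-1)

CFFT3D computes the following formula:

$$Y_{k1,k2,k3} = \mathit{scale} \sum_{j1=0}^{n1-1} \sum_{j2=0}^{n2-1} \sum_{j3=0}^{n3-1} X_{j1,j2,j3}\, \omega_1^{j1 \cdot k1}\, \omega_2^{j2 \cdot k2}\, \omega_3^{j3 \cdot k3}$$

for k1 = 0, . . ., n1-1, k2 = 0, . . ., n2-1, k3 = 0, . . ., n3-1

where:
$$\omega_1 = e^{\mathit{isign} \cdot 2 \pi i / n1}, \qquad \omega_2 = e^{\mathit{isign} \cdot 2 \pi i / n2}, \qquad \omega_3 = e^{\mathit{isign} \cdot 2 \pi i / n3}$$
$$\mathit{isign} = \pm 1, \qquad i = \sqrt{-1}, \qquad \pi = 3.14159\ldots, \qquad e = 2.71828\ldots$$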
Different authors use different conventions for which of the transforms, isign = +1 or isign = – 1, is the
forward or the inverse transform, and what the scale factor should be in either case. In this documentation,
when isign = +1, it is called the forward transform, and when isign = – 1, it is called the inverse transform.
This routine has the following arguments:
isign Integer. (input)
Specifies whether to initialize the table array or to do the forward or inverse Fourier transform,
as follows:

isign = 0 Initializes the table array
isign = +1 Computes the forward transform
isign = -1 Computes the inverse transform
n1 Integer. (input)
Transform size in the first dimension. If n1 is not positive, CFFT3D returns without computing
any transform.
n2 Integer. (input)
Transform size in the second dimension. If n2 is not positive, CFFT3D returns without
computing any transform.
n3 Integer. (input)
Transform size in the third dimension. If n3 is not positive, CFFT3D returns without computing
any transform.
scale Real. (input)
Scale factor. Each element of the output array is multiplied by scale after taking the Fourier
transform, as defined previously.
x Complex array element. (input)
First element to be used in an input array of values to be transformed.
inc1x Integer. (input)
Input array increment in the first dimension (the address increment between successive complex
array row elements in a given column and plane of the input array). To access successive row
elements in a given column and plane of the declared input array, set inc1x = 1. inc1x must not
be 0.
inc2x Integer. (input)
Input array increment in the second dimension (the address increment between successive
complex array column elements in a given row and plane of the input array). To access
successive column elements in a given row and plane of the declared input array, set inc2x to be
the declared leading dimension of the input array. inc2x must not be 0.
inc3x Integer. (input)
Input array increment in the third dimension (the address increment between successive complex
array plane elements in a given row and column of the input array). To access successive plane
elements in a given row and column of the declared input array, set inc3x to be the product of
the two leading dimensions of the input array. inc3x must not be 0.
y Complex array element. (output)
First element to be output in the array of transformed values. The output array may be the same
as the input array, in which case, the transform is done in place if the following are true:
inc1x = inc1y
inc2x = inc2y
inc3x = inc3y

inc1y Integer. (input)
Output array increment in the first dimension (the address increment between successive complex
array row elements in a given column and plane of the output array). To access successive row
elements in a given column and plane of the declared output array, set inc1y = 1. inc1y must not
be 0.
inc2y Integer. (input)
Output array increment in the second dimension (the address increment between successive
complex array column elements in a given row and plane of the output array). To access
successive column elements in a given row and plane of the declared output array, set inc2y to be
the declared leading dimension of the output array. inc2y must not be 0.
inc3y Integer. (input)
Output array increment in the third dimension (the address increment between successive
complex array plane elements in a given row and column of the output array). To access
successive plane elements in a given row and column of the declared output array, set inc3y to
be the product of the two leading dimensions of the output array. inc3y must not be 0.
table Real array of dimension ntable. (input or output)
Table of factors and trigonometric functions. This array can be initialized by a call to CFFT3D
with isign = 0.
ntable Integer. (input)
Number of (real) words in the table array. The value of ntable should be at least
2(n1 + n2 + n3) + 100. If the value of ntable does not provide enough space, CFFT3D prints an
error message and stops.
work Real array. (scratch output)
Work array of size nwork. This is a scratch array used for intermediate calculations. It must be
a different address space from the input and output arrays.
nwork Integer. (input)
Number of words in the work array. The value of nwork should be at least 64(MAX(n1, n2, n3)).
If the value of nwork does not provide enough space, CFFT3D prints an error message and
stops.

NOTES
This section includes information about the algorithm for CFFT3D, table initialization, increment arguments,
and performance tips.
Algorithm
CFFT3D uses MCFFT(3S) to do multiple FFTs first on all of the rows, then on all of the columns, and then
on all of the planes of the input matrix.

Table Initialization
The table array stores factors of n1, n2, and n3 and also trigonometric tables that are used in calculation of
the FFT. You can initialize table explicitly by calling CFFT3D with isign = 0. If you do not initialize
table, CFFT3D does so automatically on the first call. If the values of the problem size, n1, n2, and n3, do
not change, table does not need to be reinitialized. If you call CFFT3D with different values of n1, n2, and
n3 without reinitializing table first, CFFT3D reinitializes table automatically.
Reinitialization of table is relatively time-consuming. If you are continually changing the problem size, you
might consider using more than one table array, so that it will not have to be reinitialized on each call to
CFFT3D.
If you initialize table explicitly by calling the routine with isign = 0, the only arguments that are significant
are isign, n1, n2, n3, table, and ntable. In this case, the other arguments are ignored.
CFFT3D checks the value of ntable to ensure that enough space is available to store the entire table. If
ntable is not large enough, the routine stops after printing an error message, which indicates the amount of
table space required.
Increment Arguments
The inc1x, inc2x, inc3x, inc1y, inc2y, and inc3y increment arguments describe how the matrices are stored in
Fortran arrays. These arguments are the link between the mathematical matrices and their representation in
computer memory. The use of these increment arguments allows complete generality in specifying the
matrices. Because CFFT3D deals with three-dimensional matrices, some explanation is necessary.
Consider the following 2-by-3-by-4 matrix X:
      X(1,1,1) X(1,2,1) X(1,3,1)
      X(2,1,1) X(2,2,1) X(2,3,1)

      X(1,1,2) X(1,2,2) X(1,3,2)
      X(2,1,2) X(2,2,2) X(2,3,2)

      X(1,1,3) X(1,2,3) X(1,3,3)
      X(2,1,3) X(2,2,3) X(2,3,3)

      X(1,1,4) X(1,2,4) X(1,3,4)
      X(2,1,4) X(2,2,4) X(2,3,4)

Suppose that this matrix is declared by the Fortran statement:

      COMPLEX X(2,3,4)

Fortran stores matrices "by column," meaning that it stores the matrix so that the first index changes most
rapidly and the last index changes least rapidly, which results in the following order:

      X(1,1,1) -> X(2,1,1) -> X(1,2,1) -> X(2,2,1) -> X(1,3,1) -> X(2,3,1) ->
      X(1,1,2) -> X(2,1,2) -> X(1,2,2) -> X(2,2,2) -> X(1,3,2) -> X(2,3,2) ->
      X(1,1,3) -> X(2,1,3) -> X(1,2,3) -> X(2,2,3) -> X(1,3,3) -> X(2,3,3) ->
      X(1,1,4) -> X(2,1,4) -> X(1,2,4) -> X(2,2,4) -> X(1,3,4) -> X(2,3,4)

Thus, the increment in the first dimension, inc1x, is 1. The increment in the second dimension, inc2x, is the
(address) distance between X(1,1,1) and X(1,2,1), which is 2, the leading dimension of X.
The increment in the third dimension, inc3x, is the (address) distance between X(1,1,1) and X(1,1,2),
which is 6. This number 6 is the product of the first two leading dimensions of X.
Generally, the increment in the second dimension is the leading dimension of the array as it is declared in
the Fortran program, and the increment in the third dimension is the product of the two leading dimensions
of the array.
The increment arguments are not directly related to the values of n1, n2, and n3, except that the matrix must
fit in the allocated address space.
Negative increments are legal. If a row, column, or plane increment is negative, the address given as the x
or y argument should be the address of the first element used in the array (last in memory) by row number,
column number, or plane number.
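The three increments can be checked against the storage order with a short calculation. The following Python sketch is illustrative only; the function name element_offset_3d is hypothetical, offsets are counted in complex elements from X(1,1,1), and only positive increments are shown. It uses the 2-by-3-by-4 array discussed above.

```python
def element_offset_3d(i, j, k, inc1, inc2, inc3):
    """Offset, in complex elements, of X(i, j, k) from X(1, 1, 1) (1-based indices)."""
    return (i - 1) * inc1 + (j - 1) * inc2 + (k - 1) * inc3


if __name__ == "__main__":
    d1, d2, d3 = 2, 3, 4                   # COMPLEX X(2,3,4)
    inc1, inc2, inc3 = 1, d1, d1 * d2      # 1, 2, 6
    print(element_offset_3d(1, 2, 1, inc1, inc2, inc3))   # X(1,1,1) to X(1,2,1)
    print(element_offset_3d(1, 1, 2, inc1, inc2, inc3))   # X(1,1,1) to X(1,1,2)
    # Every element maps to a distinct offset in 0 .. d1*d2*d3 - 1.
    offsets = sorted(
        element_offset_3d(i, j, k, inc1, inc2, inc3)
        for k in range(1, d3 + 1)
        for j in range(1, d2 + 1)
        for i in range(1, d1 + 1)
    )
    print(offsets == list(range(d1 * d2 * d3)))
```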
Performance Tips
CFFT3D works for any values of the arguments, subject only to the restrictions given previously. The
performance of this algorithm, however, depends on the values of the following arguments:
• n1, n2, n3: problem size in each dimension
• inc1x, inc2x, inc3x, inc1y, inc2y, inc3y: increment arguments
• nwork: amount of workspace
Each of these factors is considered separately in the following subsections.
Performance relative to the problem size
CFFT3D computes an FFT for any value of n1, n2, and n3, but the performance depends on the factorization
of these numbers. This is characteristic of all FFT algorithms.
Best performance is realized when n1, n2, and n3 are each a power of 2. In that case the number of
arithmetic operations in the calculation is proportional to
$$n1 \cdot n2 \cdot n3 \cdot \log_2(n1 \cdot n2 \cdot n3).$$
Performance is slightly worse when n1, n2, or n3 contains factors of 3. It is worse if n1, n2, or n3 contains
factors of 5. Worst performance occurs when n1, n2, and n3 are prime numbers. In that case, the number of
arithmetic operations in the calculation is proportional to $(n1 \cdot n2 \cdot n3)^2$.
The kernel routines are optimized for values of n1, n2, and n3 that are products of powers of 2, 3, and 5.
The values of n1, n2, and n3 also relate to vectorization and multitasking performance. Each of the
dimensions is used as a vector length for part of the calculation. Thus, as with all vector calculations,
performance will be less than optimum if any of n1, n2, or n3 is small (for example, < 8).

If any of the dimensions is large enough, CFFT3D will multitask. If
$$\frac{\mathrm{MIN}(n1, n2, n3)}{\mathit{ncpus}} \ge 16,$$
where ncpus is the number of CPUs being used, the entire calculation runs in multitasked mode.
Performance relative to the increment arguments
The increment arguments have no effect on the algorithm itself, but their values are significant for memory
contention.
The stride for vector loads alternates between inc1x, inc2x, and inc3x. Likewise, the strides for vector stores
are inc1y, inc2y, and inc3y. To avoid memory bank conflicts, none of these numbers should be a multiple of
a large power of 2. Best performance occurs when all these numbers are odd. To ensure this, declare both
the input and output arrays with odd numbers for both of the two leading dimensions.
Performance relative to the amount of workspace
The size of the workspace, nwork, is also relevant to performance. To do all of the FFTs in one lot requires
the following amount of workspace (in real words of storage):
$$nwork = 4 \cdot \mathrm{MAX}(n1, n2) \cdot \mathrm{MAX}(n2, n3)$$
If n1, n2, and n3 are large, this amounts to a lot of memory. You can provide less workspace and still
obtain very good performance. At a minimum, you need the following (in real words of storage):
$$nwork = 4 \cdot \mathrm{MAX}(n1, n2, n3) \cdot \mathrm{MIN}(n1, n2, n3, 16 \cdot \mathit{ncpus})$$
where ncpus is the number of CPUs being used. If you specify less than this minimum, CFFT3D generates
an error message. If you give a value of nwork in the range:
$$4 \cdot \mathrm{MAX}(n1, n2, n3) \cdot \mathrm{MIN}(n1, n2, n3, 16 \cdot \mathit{ncpus}) \le nwork < 4 \cdot \mathrm{MAX}(n1, n2) \cdot \mathrm{MAX}(n2, n3)$$
CFFT3D divides the work into lots, in which the size of each lot is sufficiently small to be accommodated
by the workspace provided. For best performance, nwork should be at least 128(n1 . n2 . n3 . ncpus).
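The workspace bounds for CFFT3D can be evaluated before the work array is dimensioned. The Python sketch below is illustrative only; the function names are hypothetical, and the formulas are the ones given in this section.

```python
def cfft3d_min_nwork(n1, n2, n3, ncpus):
    """Minimum workspace in real words:
    4 * MAX(n1, n2, n3) * MIN(n1, n2, n3, 16 * ncpus)."""
    return 4 * max(n1, n2, n3) * min(n1, n2, n3, 16 * ncpus)


def cfft3d_one_lot_nwork(n1, n2, n3):
    """Workspace in real words to do all of the FFTs in one lot:
    4 * MAX(n1, n2) * MAX(n2, n3)."""
    return 4 * max(n1, n2) * max(n2, n3)


if __name__ == "__main__":
    n1, n2, n3 = 16, 18, 25
    print(cfft3d_min_nwork(n1, n2, n3, 1))
    print(cfft3d_one_lot_nwork(n1, n2, n3))
```

For n1 = 16, n2 = 18, and n3 = 25, the one-lot value equals 4 · n2 · n3, which is the NWORK value used in the EXAMPLES section.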

EXAMPLES
The following program computes the forward and inverse three-dimensional Fourier transform of a random
matrix of complex numbers and compares the result with the original matrix.
      PARAMETER (N1 = 16, N2 = 18, N3 = 25)
      COMPLEX X(N1, N2, N3), Y(N1, N2, N3)
      PARAMETER (NTABLE = 100 + 2*(N1 + N2 + N3))
      PARAMETER (NWORK = 4*N2*N3)
      REAL TABLE(NTABLE), WORK(NWORK)
      LOGICAL LPASS
*-----------------------------------------------------
*     Fill array X with random complex numbers.
      DO 3, K = 1, N3
         DO 2, J = 1, N2
            DO 1, I = 1, N1
               X(I,J,K) = CMPLX(RANF(), RANF())
 1          CONTINUE
 2       CONTINUE
 3    CONTINUE
*-----------------------------------------------------
*     Compute Y = 3-D Fourier transform of X.
      CALL CFFT3D(+1, N1, N2, N3, 1.0,
     &            X, 1, N1, N1*N2, Y, 1, N1, N1*N2,
     &            TABLE, NTABLE, WORK, NWORK)
*-----------------------------------------------------
*     Compute Y = inverse 3-D transform of Y.
      CALL CFFT3D(-1, N1, N2, N3, 1.0/(N1*N2*N3),
     &            Y, 1, N1, N1*N2, Y, 1, N1, N1*N2,
     &            TABLE, NTABLE, WORK, NWORK)
*-----------------------------------------------------
*     Compare X and Y.
      LPASS = .TRUE.
      DO 6, K = 1, N3
         DO 5, J = 1, N2
            DO 4, I = 1, N1
               ERROR = ABS(X(I,J,K)-Y(I,J,K))/ABS(X(I,J,K))
               LPASS = LPASS .AND. (ERROR .LE. 1.0E-6)
 4          CONTINUE
 5       CONTINUE
 6    CONTINUE
      IF (.NOT. LPASS) PRINT *, 'Failed the test'
      IF (LPASS) PRINT *, 'Passed the test'
      END

SEE ALSO
CCFFT(3S), CFFT(3S) to calculate a one-dimensional FFT. CCFFT(3S) supersedes most uses of CFFT(3S).
CCFFT2D(3S), CFFT2D(3S) to calculate a two-dimensional FFT. CCFFT2D(3S) supersedes most uses of
CFFT2D(3S).
CCFFT3D(3S), which supersedes most uses of CFFT3D
CCFFTM(3S), MCFFT(3S) to calculate multiple one-dimensional FFTs. CCFFTM(3S) supersedes most uses
of MCFFT(3S).



CFFTMLT ( 3S ) CFFTMLT ( 3S )

NAME
CFFTMLT – Applies complex-to-complex Fast Fourier Transforms (FFTs) on multiple input vectors

SYNOPSIS
CALL CFFTMLT (ar, ai, work, trigs, ifax, inc, jump, n, lot, isign)

IMPLEMENTATION
UNICOS systems

DESCRIPTION
This routine is now obsolete. It has been replaced by CCFFTM. See the CCFFTM man page for details on
its use.
CFFTMLT applies complex-to-complex FFTs on more than one input vector, as follows:
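$$\bigl(ar(\mathit{jump} \cdot l + \mathit{inc} \cdot k + 1),\; ai(\mathit{jump} \cdot l + \mathit{inc} \cdot k + 1)\bigr) = \sum_{j=0}^{n-1} \exp\left(\frac{\mathit{isign} \cdot i \cdot 2\pi \cdot j \cdot k}{n}\right) \bigl(ar(\mathit{jump} \cdot l + \mathit{inc} \cdot j + 1),\; ai(\mathit{jump} \cdot l + \mathit{inc} \cdot j + 1)\bigr)$$

for k = 0, 1, . . ., n-1, l = 0, . . ., lot-1, and $i = \sqrt{-1}$.
This calculation is performed for each of the lot vectors in the input.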
Vectorization is achieved by doing parallel transforms, with vector length = lot.
This routine has the following arguments:
ar Real array of dimension n · lot. (input and output)
On input, it contains the real part of the input data. On output, it contains the real part of the
transformed data.
ai Real array of dimension n · lot. (input and output)
On input, it contains the imaginary part of the input data. On output, it contains the imaginary
part of the transformed data.
work Real array of dimension 4 · n · lot. (scratch output)
Work storage array.
trigs Real array of dimension 2 · n. (input)
Must be initialized to contain sine and cosine tables. The following call initializes both trigs
and ifax:
CALL CFTFAX (n,ifax,trigs)

ifax Integer array of dimension 19. (input)
Contains a previously prepared list of factors of n.
inc Integer. (input)
The increment within each data vector.
jump Integer. (input)
The increment between the start of each data vector. inc and jump apply to both the real and
imaginary parts of the data. To obtain best performance, jump should be an odd number.
n Integer. (input)
Length of the data vectors. On UNICOS systems, n ≥ 0.
Any value of n that is not valid causes CFTFAX to return the error code ifax(1) = – 99.
lot Integer. (input)
The number of data vectors.
isign Integer. (input)
isign = +1 Fourier analysis (forward transform)
isign = – 1 Fourier synthesis (inverse transform)

NOTES
In the division by n, the normalization used by CFFTMLT differs from that used by CFFT2, CRFFT2, and
RCFFT2.
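The roles of inc and jump follow from the index expression jump · l + inc · k + 1 in the DESCRIPTION. The Python sketch below is illustrative only; the function name element_index is hypothetical, and the two layouts shown are examples chosen here, not layouts required by the routine.

```python
def element_index(l, k, inc, jump):
    """1-based index into ar and ai of element k of data vector l."""
    return jump * l + inc * k + 1


def vector_indices(l, n, inc, jump):
    """Indices occupied by data vector l (l = 0, ..., lot-1) of length n."""
    return [element_index(l, k, inc, jump) for k in range(n)]


if __name__ == "__main__":
    n, lot = 4, 3
    # Each vector stored contiguously: inc = 1, jump = n.
    print([vector_indices(l, n, 1, n) for l in range(lot)])
    # Vectors interleaved: inc = lot, jump = 1 (an odd jump).
    print([vector_indices(l, n, lot, 1) for l in range(lot)])
```

Both layouts cover the indices 1 through n · lot exactly once; the interleaved layout has jump = 1, which satisfies the recommendation that jump be odd.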

SEE ALSO
CCFFTM(3S), which supersedes this routine only on Cray Y-MP systems
RFFTMLT(3S) to calculate multiple real-to-complex or complex-to-real FFTs



CRFFT2 ( 3S ) CRFFT2 ( 3S )

NAME
CRFFT2 – Applies a complex-to-real Fast Fourier Transform (FFT)

SYNOPSIS
CALL CRFFT2 (init, ix, n, x, work, y)

IMPLEMENTATION
UNICOS systems

DESCRIPTION
CRFFT2 calculates the following:

\[ y_k = \sum_{j=0}^{n-1} x_j \exp\!\left(\pm \frac{2\pi i}{n} jk\right) \quad \text{for } k = 0, 1, \ldots, n-1 \]

This routine has the following arguments:


init Integer. (input)
If nonzero, generates sine and cosine tables in work. If 0, calculates FFT by using sine and
cosine tables of the previous call.
ix Integer. (input)
> 0 Calculates a forward transform (analysis).
< 0 Calculates an inverse transform (synthesis).
n Integer. (input)
Size of the Fourier transform (n = 2^m, for some m ≥ 3).
x Complex array of dimension (n / 2) + 1. (input)
Input vector.
Range of x: $\frac{n}{10^{2466}} \le x_i \le \frac{10^{2466}}{n}$ for i = 1, 2, ..., n.
work Complex array of dimension (3 . n/2) + 2. (input)
Work storage vector.
y Real array of dimension n. (output)
Result vector.

NOTES
The elements x_j are complex and are related by x_j = conjg(x_(n-j)) for j = 1, 2, ..., n/2, where conjg
denotes the complex conjugate. Only the first (n/2)+1 elements are stored in x.
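The following Python sketch illustrates why storing (n/2)+1 values is sufficient. It rebuilds the full conjugate-symmetric sequence and evaluates the sum above directly. It is a model of the mathematics only, not of the CRFFT2 implementation, and the names expand_hermitian and dft are hypothetical.

```python
import cmath

def expand_hermitian(half, n):
    """Rebuild all n complex values from the stored (n/2)+1 values,
    using x[n-j] = conjugate(x[j])."""
    full = [0j] * n
    for j in range(n // 2 + 1):
        full[j] = half[j]
    for j in range(1, n // 2):
        full[n - j] = half[j].conjugate()
    return full

def dft(x, sign):
    """Direct evaluation of y_k = sum_j x_j * exp(sign * 2*pi*i*j*k/n)."""
    n = len(x)
    return [sum(x[j] * cmath.exp(sign * 2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]

n = 8
half = [1 + 0j, 2 - 1j, 0.5j, -1 + 3j, 4 + 0j]   # x_0 and x_(n/2) are real
y = dft(expand_hermitian(half, n), +1)
largest_imag = max(abs(v.imag) for v in y)        # rounding noise only
```

Because the expanded input is conjugate symmetric, every y_k has a zero imaginary part up to rounding, which is why the routine can return a real array.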


SEE ALSO
CFFT2(3S), RCFFT2(3S)
SCFFT(3S) for a description of CSFFT, a routine that supersedes CRFFT2 only on UNICOS systems



DESCINIT3D ( 3S ) DESCINIT3D ( 3S )

NAME
DESCINIT3D – Initializes a descriptor vector that contains information about the distribution of a
three-dimensional (3D) array across a 3D grid of processors

SYNOPSIS
CALL DESCINIT3D (desc, nx, ny, nz, nxpp, nypp, nzpp, pesx, pesy, pesz, ictxt, lldx, lldy,
info)

IMPLEMENTATION
UNICOS/mk systems

DESCRIPTION
DESCINIT3D initializes a descriptor vector desc with information about the distribution of a 3D array A
across a 3D grid of processors. The information contained in this descriptor vector allows any routine that
uses distributed 3D arrays to know how the data is distributed across the processors. To specify a lower
dimensional grid of processors, initialize the corresponding entries in the descriptor vector to be 1.
Description of the Distributed Data
Consider a 3D array A, which is passed to a distributed library routine to be operated on. Let this 3D array
A of global size nx-by-ny-by-nz be distributed over a 3D grid of processors of size npx-by-npy-by-npz where
N$PES = npx × npy × npz.
To specify a two-dimensional (2D) grid of processors, set any one of npx, npy, or npz to 1.
Each processor is assigned an address to denote its location in the 3D grid. The routine that initializes the
3D grid is called GRIDINIT3D(3S), and the user must first call it before calling DESCINIT3D (see the
man pages for GRIDINIT3D(3S)). This sets up the processor set as a 3D grid of processors. If you
initialized a 3D grid of size npx-by-npy-by-npz, the call to GRIDINIT3D(3S) would be as follows:
CALL GRIDINIT3D (ictxt, npx, npy, npz)

Other support routines are provided to help manage 3D-processor grids:


GRIDINFO3D(3S) Provides information about the processor grid; if it is needed, it must be called after
GRIDINIT3D(3S).
PCOORD3D(3S) Determines the grid coordinates of the processor given the processing element (PE)
number.
PNUM3D(3S) Determines the PE number given the grid coordinates.
As allowed by Fortran D or High-Performance Fortran (HPF), the array A can be distributed across the
processor grid in one of four ways along each dimension: degenerate, block, block-cyclic, or cyclic.
Descriptor desc specifies this distribution.


The nxpp, nypp, and nzpp arguments are initialized to the block size along each of the dimensions. Consider
the X dimension. If all of the data along this dimension will reside on the same processor (degenerate
distribution), nxpp would have the same value as nx. If the distribution is block, nxpp would be initialized
to ICEIL(nx, npx). If the distribution were cyclic, nxpp would be initialized to 1, and if the distribution
were block-cyclic, nxpp would be initialized to the corresponding block size.
Distribution along X dimension    Value of nxpp
Degenerate                        nxpp = nx
Block                             nxpp = ICEIL(nx, npx)
Cyclic                            nxpp = 1
Block-cyclic                      nxpp = block size desired
The nypp and nzpp arguments are initialized accordingly.
As an example, consider an array A of size 128-by-200-by-100 distributed on a 128-processor 2D grid of
size 1-by-16-by-8, as follows:
nx = 128
ny = 200
nz = 100

npx = 1
npy = 16
npz = 8

Let the distribution be degenerate along the X axis, block along the Y axis, and block-cyclic with a size of 4
along the Z axis, then
nxpp = 128
nypp = ICEIL(200,16) = 13
nzpp = 4
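The block-size rules can be sketched in Python as follows. This is an illustration only: iceil is assumed to be the integer ceiling of a/b (which is consistent with ICEIL(200,16) = 13), and block_size is a hypothetical helper, not a library routine.

```python
def iceil(a, b):
    """Integer ceiling of a/b, mirroring the ICEIL(a, b) notation (assumed meaning)."""
    return -(-a // b)

def block_size(n_global, n_procs, kind, block=None):
    """Return the nxpp/nypp/nzpp value for one dimension."""
    if kind == "degenerate":
        return n_global                  # all data on one processor
    if kind == "block":
        return iceil(n_global, n_procs)  # one contiguous block per processor
    if kind == "cyclic":
        return 1                         # single elements dealt out in turn
    if kind == "block-cyclic":
        if block is None:
            raise ValueError("block-cyclic needs an explicit block size")
        return block
    raise ValueError("unknown distribution: " + kind)

# The 128-by-200-by-100 example on a 1-by-16-by-8 grid.
nxpp = block_size(128, 1, "degenerate")
nypp = block_size(200, 16, "block")
nzpp = block_size(100, 8, "block-cyclic", block=4)
```

With these inputs the sketch gives nxpp = 128, nypp = 13, and nzpp = 4, matching the values listed above.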

The pesx, pesy, and pesz arguments are the grid coordinates of the processor specifying the location of the
first element of the global array A (that is, the processor that owns A(1,1,1)).
If the first element is in processor pes, then pesx, pesy, and pesz can be obtained by a call to
PCOORD3D(3S), as follows:
CALL PCOORD3D (ictxt, pes, pesx, pesy, pesz)

The first global element of the array A given by A(1,1,1) is usually located in processor 0. Therefore, the
following is true:
pesx = 0
pesy = 0
pesz = 0


However, in some cases it is advantageous to align array A in a slightly skewed manner in regard to another
array to avoid some communication. The following example illustrates this in the one-dimensional case.
Consider two vectors X and Y of length N that are involved in the computation (globally speaking) of a third
vector Z, as follows:
do i = 1, N
   Z(i) = X(i) + Y(mod(i+N/2, N))
end do

In this example, the vector Y is stored in a skewed manner relative to X so that the first element of X
resides on the same processor as Y(mod(1+N/2,N)). If pesx is that processor, pass pesx to DESCINIT3D when
initializing the descriptor for the vector X. You can extend this idea to the other two dimensions.
However, as mentioned earlier, most applications would not require this flexibility in the data distribution;
therefore, pesx, pesy, and pesz in these applications would be initialized to 0.
The ictxt argument is a handle that describes the 3D partitioning of the set of processors done by
GRIDINIT3D(3S).
The lldx and lldy arguments are the leading dimensions of the local array in each processor that stores a
share of the global data.
If the distribution along the X or Y axes were degenerate, lldx ≥ nx and lldy ≥ ny, respectively.
If the distribution were block along either dimension, lldx ≥ ICEIL(nx,npx) and lldy ≥ ICEIL(ny,npy).
If the distribution were cyclic along either dimension, lldx ≥ INT(nx/npx) + 1 and lldy ≥ INT(ny/npy) + 1.
If the distribution were block-cyclic along either dimension with a block size of nxpp and nypp,
lldx ≥ INT(nx/(npx*nxpp))*nxpp + nxpp and lldy ≥ INT(ny/(npy*nypp))*nypp + nypp.
Returning to the previous example, in which the array A of size 128-by-200-by-100 is distributed on a
128-processor 2D grid of size 1-by-16-by-8, with the distribution degenerate along the X axis and block
along the Y axis, the following is true:
nx = 128
ny = 200
nz = 100

npx = 1
npy = 16
npz = 8

lldx ≥ 128 and lldy ≥ 13
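One way to check a leading dimension is to count how many elements each processor actually owns. The Python sketch below does this under the assumption that blocks are dealt out round-robin starting at processor 0 (the usual case, in which pesx = pesy = pesz = 0). The names local_count and min_leading_dim are hypothetical.

```python
def local_count(n, nprocs, block, p):
    """Number of global indices 0..n-1 owned by processor p when blocks of
    size `block` are dealt out round-robin starting at processor 0."""
    count = 0
    for start in range(0, n, block):
        if (start // block) % nprocs == p:
            count += min(block, n - start)   # the last block may be short
    return count

def min_leading_dim(n, nprocs, block):
    """Smallest local extent that is large enough on every processor."""
    return max(local_count(n, nprocs, block, p) for p in range(nprocs))

lldx_min = min_leading_dim(128, 1, 128)   # degenerate along X: block size = nx
lldy_min = min_leading_dim(200, 16, 13)   # block along Y: block size = ICEIL(200,16)
```

For the example above this gives 128 and 13, in agreement with lldx ≥ 128 and lldy ≥ 13. A degenerate, block, or cyclic distribution is the same computation with block sizes nx, ICEIL(nx,npx), and 1, respectively.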


The DESCINIT3D routine accepts the following arguments.


desc Vector of length 12. (output)
Contains information about the distribution of 3D data across a 3D grid of processors.
nx Integer. (input)
Global size of the array along the X axis.
ny Integer. (input)
Global size of the array along the Y axis.
nz Integer. (input)
Global size of the array along the Z axis.
nxpp Integer. (input)
Block size of the distribution along the X axis.
nypp Integer. (input)
Block size of the distribution along the Y axis.
nzpp Integer. (input)
Block size of the distribution along the Z axis.
pesx Integer. (input)
X coordinate of the processor address that specifies the location of element (1,1,1) of the global
array.
pesy Integer. (input)
Y coordinate of the processor address that specifies the location of element (1,1,1) of the global
array.
pesz Integer. (input)
Z coordinate of the processor address that specifies the location of element (1,1,1) of the global
array.
ictxt Integer. (input)
The context variable initialized by GRIDINIT3D(3S). This is a handle that describes the grid.
lldx Integer. (input)
Local leading X dimension of the local array that contains a processor's share of the global array.
lldy Integer. (input)
Local leading Y dimension of the local array that contains a processor's share of the global array.
info Integer. (output)
Variable to indicate error status. A value of 0 is returned in the info argument if all of the
arguments passed to DESCINIT3D have legal values. A nonzero value is returned if any of the
arguments have illegal values.


NOTES
The GRIDINIT3D(3S) routine must be called somewhere in the program before the first call to
DESCINIT3D.

SEE ALSO
GRIDINFO3D(3S), GRIDINIT3D(3S), PCOORD3D(3S), PNUM3D(3S)



FILTERG ( 3S ) FILTERG ( 3S )

NAME
FILTERG – Computes a correlation of two vectors

SYNOPSIS
CALL FILTERG (a, m, d, n, o)

IMPLEMENTATION
UNICOS systems

DESCRIPTION
FILTERG computes a correlation of two vectors.
Given the following:
(a i ) i = 1,. . .,m Filter coefficients
(d j ) j = 1,. . .,n Data
FILTERG computes the following:

\[ o_i = \sum_{j=1}^{m} a_j \, d_{i+j-1}, \qquad i = 1, \ldots, n-m+1 \]

This routine has the following arguments:


a Real array of dimension m. (input)
Vector of filter coefficients.
m Integer. (input)
Number of filter coefficients.
d Real array of dimension n. (input)
Data vector.
n Integer. (input)
Number of data points.
o Real array of dimension n-m+1. (output)
Resulting vector.
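As an illustration of the formula, the following Python sketch evaluates the same sum with 0-based lists. It is a reference model only, not the library code, and the lowercase function name filterg is used here purely for illustration.

```python
def filterg(a, d):
    """o[i] = sum over j of a[j] * d[i + j], for i = 0 .. n-m (0-based form
    of the documented 1-based sum)."""
    m, n = len(a), len(d)
    return [sum(a[j] * d[i + j] for j in range(m)) for i in range(n - m + 1)]

o = filterg([1.0, 2.0, 3.0], [4.0, 5.0, 6.0, 7.0])
```

With three coefficients and four data points there are n-m+1 = 2 outputs: 1*4 + 2*5 + 3*6 = 32 and 1*5 + 2*6 + 3*7 = 38.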

SEE ALSO
FILTERS(3S)



FILTERS ( 3S ) FILTERS ( 3S )

NAME
FILTERS – Computes a correlation of two vectors (symmetric coefficient)

SYNOPSIS
CALL FILTERS (a, m, d, n, r)

IMPLEMENTATION
UNICOS systems

DESCRIPTION
FILTERS computes the same correlation as FILTERG(3S) except that it assumes the filter coefficient vector
is symmetric.
Given the following:
(a_i), i = 1, ..., ⌈m/2⌉
(d_j), j = 1, ..., n
where ⌈m/2⌉ = m/2 for m even, and (m+1)/2 for m odd. This is called the ceiling function.
FILTERS computes the following when m is an odd number:

\[ r_i = \sum_{j=1}^{(m-1)/2} a_j \left(d_{i+j-1} + d_{i+m-j}\right) + a_{(m+1)/2} \, d_{i+(m+1)/2-1}, \qquad i = 1, \ldots, n-m+1 \]

FILTERS computes the following when m is an even number:

\[ r_i = \sum_{j=1}^{m/2} a_j \left(d_{i+j-1} + d_{i+m-j}\right), \qquad i = 1, \ldots, n-m+1 \]
This routine has the following arguments:
a Real array of dimension ⌈m/2⌉. (input)
Symmetric filter coefficient vector.
m Integer. (input)
Formal length of vector a. The actual length of a is as indicated previously.


d Real array of dimension n. (input)
Data vector.
n Integer. (input)
Number of data points.
r Array of dimension n-m+1. (output)
Resulting vector.
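The symmetric form can be modeled in Python as shown below. The sketch assumes that the result must equal the FILTERG correlation with the full symmetric coefficient vector, as the description states. The function name filters is used for illustration only, and a_half holds the first ⌈m/2⌉ coefficients.

```python
def filters(a_half, m, d):
    """Correlation with a symmetric filter of formal length m, given only its
    first ceil(m/2) coefficients (0-based lists)."""
    n = len(d)
    half = m // 2                       # number of mirrored coefficient pairs
    out = []
    for i in range(n - m + 1):
        # each stored coefficient multiplies a pair of mirrored data points
        acc = sum(a_half[j] * (d[i + j] + d[i + m - 1 - j]) for j in range(half))
        if m % 2 == 1:                  # odd m: unpaired middle coefficient
            acc += a_half[half] * d[i + half]
        out.append(acc)
    return out

r_odd = filters([1.0, 2.0, 3.0], 5, [4.0, 5.0, 6.0, 7.0, 8.0, 9.0])   # full filter 1,2,3,2,1
r_even = filters([1.0, 2.0], 4, [4.0, 5.0, 6.0, 7.0, 8.0])            # full filter 1,2,2,1
```

Pairing the mirrored data points first roughly halves the number of multiplications compared with the general correlation.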

SEE ALSO
FILTERG(3S)



GGFFT ( 3S ) GGFFT ( 3S )

NAME
GGFFT – Applies a multitasked complex-to-complex Fast Fourier Transform (FFT)

SYNOPSIS
CALL GGFFT (isign, n, scale, x, y, table, work, isys)

IMPLEMENTATION
UNICOS/mk systems
On UNICOS/mk systems, this subroutine executes on a single processor and uses private data only.

DESCRIPTION
GGFFT computes the Fast Fourier Transform (FFT) of the complex vector x, and it stores the result in vector
y.
In FFT applications, it is customary to use zero-based subscripts; the formulas are simpler that way.
Suppose that the arrays are dimensioned as follows:
COMPLEX(KIND=4) X(0:N-1), Y(0:N-1)

The output array is the FFT of the input array, using the following formula for the FFT:

\[ Y_k = scale \cdot \sum_{j=0}^{n-1} X_j \, \omega^{isign \cdot j \cdot k} \quad \text{for } k = 0, \ldots, n-1 \]

where

\[ \omega = e^{2\pi i / n}, \qquad i = +\sqrt{-1}, \qquad \pi = 3.14159\ldots, \qquad isign = \pm 1 \]
Different authors use different conventions for which of the transforms, isign = +1 or isign = – 1, is the
forward or inverse transform, and what the scale factor should be in either case. You can make this routine
compute any of the various possible definitions, however, by choosing the appropriate values for isign and
scale.
The relevant fact from FFT theory is this: If you take the FFT with any particular values of isign and scale,
the mathematical inverse function is computed by taking the FFT with -isign and 1/(n*scale). In
particular, if you use isign = +1 and scale = 1.0 for the forward FFT, you can compute the inverse FFT by
using isign = -1 and scale = 1.0/n.
The output array may be the same as the input array, provided that n has at least 2 factors.
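The inverse relationship can be checked numerically with a small Python model that evaluates the formula directly. This is a slow O(n^2) reference computation, not the algorithm used by GGFFT, and the function name dft is hypothetical.

```python
import cmath

def dft(x, isign, scale):
    """Y[k] = scale * sum_j x[j] * w**(isign*j*k), with w = exp(2*pi*i/n)."""
    n = len(x)
    return [scale * sum(x[j] * cmath.exp(isign * 2j * cmath.pi * j * k / n)
                        for j in range(n))
            for k in range(n)]

x = [1 + 2j, -1j, 3 + 0j, 0.5 - 0.5j, 2j]
n = len(x)

y = dft(x, +1, 1.0)              # forward transform, isign = +1, scale = 1.0
back = dft(y, -1, 1.0 / n)       # inverse: -isign and 1/(n*scale)
round_trip_error = max(abs(a - b) for a, b in zip(x, back))
```

The round-trip error is at the level of floating-point rounding. The same holds for any nonzero scale, provided the second call uses 1/(n*scale).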


This routine has the following arguments:


isign INTEGER(KIND=8). (input)
Specifies whether to initialize the table array or to do the forward or inverse Fourier transform,
as follows:
If isign = 0, the routine initializes the table array and returns. In this case, the only arguments
used or checked are isign, n, and table.
If isign = +1 or – 1, the value of isign is the sign of the exponent used in the FFT formula.
n INTEGER(KIND=8). (input)
Size of the transform; that is, the number of values in the input array (all systems): n ≥ 2
scale REAL(KIND=4). (input)
Scale factor. Each element of the output array is multiplied by scale after taking the Fourier
transform, as defined by the preceding formula.
x COMPLEX(KIND=4) array of dimension (0:n– 1). (input)
Input array of values to be transformed.
y COMPLEX(KIND=4) array of dimension (0:n– 1). (output)
Output array of transformed values. The output array may be the same as the input array, in
which case, the transform is done in place; that is, the input array is overwritten with the
transformed values.
table REAL(KIND=4) array of dimension 2n. (input or output)
Table of factors and trigonometric functions. If isign = 0, the routine initializes table (table is
output only). If isign = +1 or – 1, the values in table are assumed to be initialized already by a
prior call with isign = 0 (table is input only).
work REAL(KIND=4) array of dimension 4n. (input or output)
Work array. This is a scratch array used for intermediate calculations. Its address space must
be different from that of the input and output arrays.
isys INTEGER(KIND=8) array of dimension (0:isys(0)). (input and output)
The first element of the array specifies how many more elements are in the array.
Use isys to specify certain processor-specific parameters or options.
If isys(0) = 0, the default values of such parameters are used. In this case, the argument value
can be given as the scalar integer constant 0.
If isys(0) > 0, isys(0) gives the upper bound of the isys array; that is, if il = isys(0), user-
specified parameters are expected in isys(1) through isys(il).


NOTES
This section contains information about the algorithm for GGFFT, the initialization of the table array, the
declaration of dimensions for x and y arrays, some performance tips, and some implementation dependent
details.
Algorithm
The algorithm used is a variant of Agarwal’s algorithm.
Initialization
The table array stores the trigonometric tables used in calculation of the FFT. You must initialize table by
calling the routine with isign = 0 prior to doing the transforms. If the value of the problem size, n, does not
change, table does not have to be reinitialized.
Dimensions
In the preceding description, it is assumed that array subscripts were zero-based, as is customary in FFT
applications. Thus, the input and output arrays are declared as follows:
COMPLEX(KIND=4) X(0:N-1)
COMPLEX(KIND=4) Y(0:N-1)

However, if you prefer the more customary Fortran style with subscripts starting at 1, you do not have
to change the calling sequence, as in (assuming N > 0):
COMPLEX(KIND=4) X(N)
COMPLEX(KIND=4) Y(N)

Performance Tips
This routine computes an FFT for any value of n, but the performance for a given value of n depends on the
prime factorization of n. This fact is characteristic of all FFT algorithms.
Fastest performance is realized when n is a power of 2; in which case, the number of floating-point
operations is approximately 5n log2(n).
If n contains factors of 3, computation time is slightly longer, because more floating-point operations are
required. If n contains powers of 5, it is longer still. Slowest performance is when n is a prime number; in
which case, the number of floating-point operations is approximately 8n^2.
The kernel routines are optimized for values of n that are products of powers of 2, 3, and 5. (Because the
kernel routines have a special case for multiples of 4, powers of 4 will be slightly faster than odd powers of
2.)
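To get a feel for the difference between the two operation-count estimates quoted above, the following Python sketch evaluates them for a power of 2 and for a nearby prime. The function names are hypothetical, and the results are the approximate counts from the text, not measured performance.

```python
import math

def flops_power_of_two(n):
    """Approximate operation count 5*n*log2(n) quoted for n a power of 2."""
    return 5 * n * math.log2(n)

def flops_prime(n):
    """Approximate operation count 8*n**2 quoted for prime n."""
    return 8 * n * n

fast = flops_power_of_two(1024)   # n = 2**10
slow = flops_prime(1021)          # 1021 is the largest prime below 1024
ratio = slow / fast
```

By these estimates, a transform of prime size 1021 needs over 100 times as many floating-point operations as one of size 1024, so padding the data to a size with small prime factors is usually worthwhile when the application allows it.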
Implementation-dependent Items
The UNICOS and UNICOS/mk FFT routines were designed so that they can be implemented efficiently on
many different architectures. The calling sequence is the same in any implementation. Certain details,
however, depend on the particular implementation. These details are confined to two areas:
• The first area is the size of the table and work arrays. Different systems may need different sizes. The
subroutine call requires no change, but you may have to change array sizes in the DIMENSION or type
statements that declare the arrays.


• The second area is the isys parameter array, an array that gives certain implementation-specific
information. All features and functions of the FFT routines specific to any particular implementation are
confined to this isys array. On any implementation, you can use the default values by using an argument
value of 0.

EXAMPLES

Example 1: Initialize the TABLE array in preparation for doing an FFT of size 1024. Only the
ISIGN, N, and TABLE arguments are used in this case; you can use dummy arguments or zeros for the
other arguments in the subroutine call.
REAL(KIND=4) TABLE(100 + 8*1024)
CALL GGFFT(0, 1024, 0.0, DUMMY, DUMMY, TABLE, DUMMY, 0)

Example 2: X and Y are complex arrays of dimension (0:1023). Take the FFT of X and store the results in
Y. Before taking the FFT, initialize the TABLE array, as in example 1.
COMPLEX(KIND=4) X(0:1023), Y(0:1023)
REAL TABLE(100 + 8*1024)
REAL WORK(8*1024)
...
CALL GGFFT(0, 1024, 1.0, X, Y, TABLE, WORK, 0)
CALL GGFFT(1, 1024, 1.0, X, Y, TABLE, WORK, 0)

Example 3: Using the same X and Y as in example 2, take the inverse FFT of Y and store it back in X. The
scale factor 1/1024 is used. Assume that the TABLE array is already initialized.
CALL GGFFT(-1, 1024, 1.0/1024.0, Y, X, TABLE, WORK, 0)

Example 4: Do the same computation as in example 2, but assume that the lower bound of each array is 1,
rather than 0. No change was needed in the subroutine calls.
COMPLEX X(1024), Y(1024)
...
CALL GGFFT(0, 1024, 1.0, X, Y, TABLE, WORK, 0)
CALL GGFFT(1, 1024, 1.0, X, Y, TABLE, WORK, 0)

Example 5: Do the same computation as in example 4, but put the output back in array X to save storage
space. Assume that TABLE is already initialized.
COMPLEX X(1024)
...
CALL GGFFT(1, 1024, 1.0, X, X, TABLE, WORK, 0)

SEE ALSO
CCFFT(3S), HGFFT(3S), SCFFT(3S)



HCONV ( 3S ) HCONV ( 3S )

NAME
HCONV – Performs the convolution of two sequences of real numbers

SYNOPSIS
CALL HCONV (nh, nx, ny, h, x, y)

IMPLEMENTATION
UNICOS/mk systems

DESCRIPTION
HCONV computes the convolution of the filter sequence h with the data sequence x, producing the output
sequence y.
Suppose h and x are two sequences of real numbers, having nh and nx elements, respectively. As is
customary in signal processing applications, let the subscripts start at 0, therefore:
h = h(0), h(1), . . . h(nh – 1)
x = x(0), x(1), . . . x(nx – 1)
The "convolution product," y, is the sequence having elements defined by:
y(0) = h(nh– 1) . x(0) + h(nh– 2) . x(1) + . . . + h(0) . x(nh– 1)
y(1) = h(nh– 1) . x(1) + h(nh– 2) . x(2) + . . . + h(0) . x(nh)
y(2) = h(nh– 1) . x(2) + h(nh– 2) . x(3) + . . . + h(0) . x(nh+1)
This example definition assumes nx > nh.
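The precise definition of the convolution is:

\[ y(k) = \sum_{j=0}^{\min(nh-1,\; nx-1-k)} h(nh-1-j) \cdot x(k+j) \quad \text{for } 0 \le k \le ny-1 \]

An empty sum (which occurs when k > nx-1) has the value 0.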
The number of terms in the output sequence is specified by an argument, ny. If ny < nx, then the output
sequence is just truncated. If ny > nx, then zeros are appended to the output sequence.
By choosing ny > nx − nh+1, the routine does what is sometimes called "post-tapered" convolution. The
effect is as though the data sequence, x, were padded on the end with zeros, except that no zeros are actually
stored and no multiplications by zero are actually done.
This routine has the following arguments:
nh INTEGER(KIND=8). (input)
Specifies the number of elements in the filter sequence, h. nh ≥ 0.
nx INTEGER(KIND=8). (input)
Specifies the number of elements in the data sequence, x. nx ≥ 0.


ny INTEGER(KIND=8). (input)
Specifies the number of elements in the output sequence, y. ny ≥ 0.
h REAL(KIND=4) array of dimension (0:nh-1). (input)
Specifies the input sequence of filter values.
x REAL(KIND=4) array of dimension (0:nx-1). (input)
Specifies the input sequence of data values.
y REAL(KIND=4) array of dimension (0:ny-1). (output)
Specifies the output sequence of convolution values.

NOTES
If ny = 0, the routine just returns. If either nh = 0 or nx = 0, the routine will zero the first ny elements in y
and return.

EXAMPLES

REAL(KIND=4) h(0:2), x(0:3), y(0:7)
DATA (h(i), i = 0, 2) / 1.0, 2.0, 3.0 /
DATA (x(i), i = 0, 3) / 4.0, 5.0, 6.0, 7.0 /
CALL HCONV(3, 4, 8, h, x, y)
PRINT *, y

The output produced is:

28., 34., 32., 21., 0., 0., 0., 0.
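The example can be reproduced with a short Python reference model of the definition. In the sketch, data points past the end of x are treated as zero, as described for post-tapered convolution. This is not the library implementation, and the lowercase name hconv is used for illustration only.

```python
def hconv(h, x, ny):
    """y[k] = sum_j h[nh-1-j] * x[k+j], skipping terms where k+j runs past
    the end of x (equivalent to padding x with zeros)."""
    nh, nx = len(h), len(x)
    y = [0.0] * ny
    for k in range(ny):
        for j in range(nh):
            if k + j < nx:
                y[k] += h[nh - 1 - j] * x[k + j]
    return y

y = hconv([1.0, 2.0, 3.0], [4.0, 5.0, 6.0, 7.0], 8)
```

For instance, y(0) = 3*4 + 2*5 + 1*6 = 28 and y(3) = 3*7 = 21, and the last four outputs are zero because k is past the end of the data.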

SEE ALSO
HCORR(3S), HCORRS(3S), SCONV(3S)



HCORR ( 3S ) HCORR ( 3S )

NAME
HCORR – Performs the correlation of two sequences of real numbers

SYNOPSIS
CALL HCORR (nh, nx, ny, h, x, y)

IMPLEMENTATION
UNICOS/mk systems

DESCRIPTION
HCORR computes the correlation of the filter sequence h with the data sequence x, producing the output
sequence y.
Suppose h and x are two sequences of real numbers, having nh and nx elements, respectively. As is
customary in signal processing applications, let the subscripts start at 0, so
h = h(0), h(1), . . ., h(nh – 1)
x = x(0), x(1), . . ., x(nx – 1)
The "correlation product", y, is the sequence having elements defined by:
y(0) = h(0) . x(0) + h(1) . x(1) + . . . + h(nh – 1) . x(nh – 1)
y(1) = h(0) . x(1) + h(1) . x(2) + . . . + h(nh – 1) . x(nh)
y(2) = h(0) . x(2) + h(1) . x(3) + . . . + h(nh – 1) . x(nh + 1)
This example definition assumes that nx ≥ nh.
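The precise definition is as follows:

\[ y(k) = \sum_{j=0}^{\min(nh-1,\; nx-1-k)} h(j) \cdot x(k+j) \quad \text{for } 0 \le k \le ny-1 \]

An empty sum (which occurs when k > nx-1) has the value 0.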

The number of terms in the output sequence is specified by the argument ny. If ny < nx, the output sequence
is just truncated. If ny > nx, zeros are appended to the output sequence.
By choosing ny > nx – nh + 1, the routine does what is sometimes called "post-tapered" correlation. The
effect is as though the data sequence, x, were padded on the end with zeros, except that no zeros are actually
stored and no multiplications by zero are actually done.
This routine has the following arguments:
nh INTEGER(KIND=8). (input)
Specifies the number of elements in the filter sequence, h. nh ≥ 0.
nx INTEGER(KIND=8). (input)
Specifies the number of elements in the data sequence, x. nx ≥ 0.
ny INTEGER(KIND=8). (input)
Specifies the number of elements in the output sequence, y. ny ≥ 0.
h REAL(KIND=4) array of dimension (0:nh-1). (input)
Specifies the input sequence of filter values.


x REAL(KIND=4) array of dimension (0:nx-1). (input)
Specifies the input sequence of data values.
y REAL(KIND=4) array of dimension (0:ny-1). (output)
Specifies the output sequence of correlation values.

NOTES
If ny = 0, the routine returns. If either nh = 0 or nx = 0, the routine will zero the first ny elements in y and
return.

EXAMPLES

REAL(KIND=4) h(0:2), x(0:3), y(0:7)
DATA (h(i), i = 0, 2) / 1.0, 2.0, 3.0 /
DATA (x(i), i = 0, 3) / 4.0, 5.0, 6.0, 7.0 /
CALL HCORR(3, 4, 8, h, x, y)
PRINT *, y

The output produced is:

32., 38., 20., 7., 0., 0., 0., 0.
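A Python reference model of the correlation reproduces this output. The sketch treats data points past the end of x as zero, as in post-tapered correlation. It is not the library implementation, and the lowercase name hcorr is used for illustration only.

```python
def hcorr(h, x, ny):
    """y[k] = sum_j h[j] * x[k+j], skipping terms where k+j runs past the
    end of x (equivalent to padding x with zeros)."""
    nh, nx = len(h), len(x)
    y = [0.0] * ny
    for k in range(ny):
        for j in range(nh):
            if k + j < nx:
                y[k] += h[j] * x[k + j]
    return y

y = hcorr([1.0, 2.0, 3.0], [4.0, 5.0, 6.0, 7.0], 8)
```

The only difference from convolution is that the filter is applied in forward order: y(0) = 1*4 + 2*5 + 3*6 = 32.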

SEE ALSO
HCONV(3S), HCORRS(3S), SCORR(3S)



HCORRS ( 3S ) HCORRS ( 3S )

NAME
HCORRS – Performs the correlation of two sequences of real numbers (symmetric filter)

SYNOPSIS
CALL HCORRS (nh, nx, ny, h, x, y)

IMPLEMENTATION
UNICOS/mk systems

DESCRIPTION
HCORRS computes the correlation of the symmetric filter sequence h with the data sequence x, producing the
output sequence y. The filter, h, is assumed to be symmetric about its middle.
The computation carried out by HCORRS is exactly the same as that done by routine HCORR, with one
exception: the filter, h, is assumed to be symmetric, so only the first half of the elements are accessed. The
values of the second half are inferred from the first half and do not actually have to be supplied by the
calling routine.
To review the definition of correlation (not necessarily assuming a symmetric filter), suppose h and x are two
sequences of real numbers, having nh and nx elements, respectively. As is customary in signal processing
applications, let the subscripts start at 0, so
h = h(0), h(1), . . ., h(nh – 1)
x = x(0), x(1), . . ., x(nx – 1)
The "correlation product", y, is the sequence having elements defined by:
y(0) = h(0) . x(0) + h(1) . x(1) + . . . + h(nh – 1) . x(nh – 1)
y(1) = h(0) . x(1) + h(1) . x(2) + . . . + h(nh – 1) . x(nh)
y(2) = h(0) . x(2) + h(1) . x(3) + . . . + h(nh – 1) . x(nh + 1)
This example definition assumes that nx ≥ nh.
The precise definition of correlation is as follows:

\[ y(k) = \sum_{j=0}^{\min(nh-1,\; nx-1-k)} h(j) \cdot x(k+j) \quad \text{for } 0 \le k \le ny-1 \]

The HCORRS routine makes the assumption that the filter is symmetric; in other words, that
h(nh-1-j) = h(j), for 0 ≤ j ≤ nh/2.
Only the elements h(0) through h(nh/2) are accessed by the routine. The last half of the filter values are not
accessed and do not actually have to be supplied by the calling routine.
The number of terms in the output sequence is specified by an argument, ny. If ny < nx, then the output
sequence is just truncated. If ny > nx, then zeros are appended to the output sequence.


By choosing ny > nx − nh+1, the routine does what is sometimes called "post-tapered" correlation. The
effect is as though the data sequence, x, were padded on the end with zeros, except that no zeros are actually
stored and no multiplications by zero are actually done.
This routine has the following arguments:
nh Integer. (input)
Specifies the number of elements in the filter sequence, h.
nh ≥ 0.
nx Integer. (input)
Specifies the number of elements in the data sequence, x.
nx ≥ 0.
ny Integer. (input)
Specifies the number of elements in the output sequence, y.
ny ≥ 0.
h REAL(KIND=4) array of dimension (0:nh/2). (input)
Specifies the input sequence of filter values. Only values h(0) through h(nh/2) are accessed; the
second half of the filter values are inferred from the symmetry of h.
x REAL(KIND=4) array of dimension (0:nx-1). (input)
Specifies the input sequence of data values.
y REAL(KIND=4) array of dimension (0:ny-1). (output)
Specifies the output sequence.

NOTES
If ny = 0, the routine just returns. If either nh = 0 or nx = 0, the routine will zero the first ny elements in y
and return.

EXAMPLES

REAL(KIND=4) h(0:2), x(0:3), y(0:7)
DATA (h(i), i = 0, 2) / 1.0, 2.0, 3.0 /
DATA (x(i), i = 0, 3) / 4.0, 5.0, 6.0, 7.0 /
CALL HCORRS(5, 4, 8, h, x, y)
print *, y

The output produced is:

46., 38., 20., 7., 0., 0., 0., 0.
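The example can be checked with a Python reference model that expands the stored half of the filter by symmetry and then correlates. The sketch assumes the symmetry is about the middle of the filter, so that the stored values 1, 2, 3 with nh = 5 stand for the full filter 1, 2, 3, 2, 1. The lowercase name hcorrs is used for illustration only.

```python
def hcorrs(nh, h_half, x, ny):
    """Correlation with a symmetric filter of length nh, given only the
    values h(0) .. h(nh/2)."""
    # Mirror the first half: full[j] equals full[nh-1-j].
    full = [h_half[min(j, nh - 1 - j)] for j in range(nh)]
    nx = len(x)
    y = [0.0] * ny
    for k in range(ny):
        for j in range(nh):
            if k + j < nx:              # data past the end of x counts as zero
                y[k] += full[j] * x[k + j]
    return y

y = hcorrs(5, [1.0, 2.0, 3.0], [4.0, 5.0, 6.0, 7.0], 8)
```

Here y(0) = 1*4 + 2*5 + 3*6 + 2*7 = 46. The fifth filter value is multiplied by a data point past the end of x and contributes nothing.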

SEE ALSO
HCONV(3S), HCORR(3S), SCORRS(3S)



HGFFT ( 3S ) HGFFT ( 3S )

NAME
HGFFT, GHFFT – Computes a real-to-complex or complex-to-real Fast Fourier Transform (FFT)

SYNOPSIS
CALL HGFFT (isign, n, scale, x, y, table, work, isys)
CALL GHFFT (isign, n, scale, x, y, table, work, isys)

IMPLEMENTATION
UNICOS/mk systems
On UNICOS/mk systems, this subroutine executes on a single processor and uses only private data.

DESCRIPTION
HGFFT computes the FFT of the real array X, and it stores the results in the complex array Y. GHFFT
computes the corresponding inverse complex-to-real transform.
In FFT applications, it is customary to use zero-based subscripts; the formulas are simpler that way. The
function of HGFFT is described first. Suppose that the arrays are dimensioned as follows:
REAL(KIND=4) X(0:n-1)
COMPLEX(KIND=4) Y(0:n/2)

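Then the output array is the FFT of the input array, using the following formula for the FFT:

\[ Y_k = scale \cdot \sum_{j=0}^{n-1} X_j \, \omega^{isign \cdot j \cdot k} \quad \text{for } k = 0, \ldots, \frac{n}{2} \]

where

\[ \omega = e^{2\pi i / n}, \qquad i = +\sqrt{-1}, \qquad \pi = 3.14159\ldots, \qquad isign = \pm 1 \]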
Different authors use different conventions for which of the transforms, isign = +1 or isign = – 1, is the
forward or inverse transform, and what the scale factor should be in either case. You can make these
routines compute any of the various possible definitions, however, by choosing the appropriate values for
isign and scale.
The relevant fact from FFT theory is this: If you call HGFFT with any particular values of isign and scale,
the mathematical inverse function is computed by calling GHFFT with -isign and 1/(n*scale). In
particular, if you use isign = +1 and scale = 1.0 in HGFFT for the forward FFT, you can compute the
inverse FFT by using GHFFT with isign = -1 and scale = 1.0/n.


This routine has the following arguments:


isign INTEGER(KIND=8). (input)
Specifies whether to initialize the table array or to do the forward or inverse Fourier transform,
as follows:
If isign = 0, the routine initializes the table array and returns. In this case, the only arguments
used or checked are isign, n, and table.
If isign = +1 or – 1, the value of isign is the sign of the exponent used in the FFT formula.
n INTEGER(KIND=8). (input)
Size of transform. If n ≤ 2, HGFFT returns without calculating the transform.
scale REAL(KIND=4). (input)
Scale factor. Each element of the output array is multiplied by scale after taking the Fourier
transform, as defined in the preceding formula.
x HGFFT: REAL(KIND=4) array of dimension (0:n– 1). (input)
GHFFT: COMPLEX(KIND=4) array of dimension (0:n/2). (input)
Input array of values to be transformed.
y HGFFT: COMPLEX(KIND=4) array of dimension (0:n/2). (output)
GHFFT: REAL(KIND=4) array of dimension (0:n– 1). (output)
Output array of transformed values.
The output array, Y, is the FFT of the input array, X, computed according to the preceding
formula. The output array may be equivalenced to the input array in the calling program. Be
careful when dimensioning the arrays, in this case, to allow for the fact that the complex array
contains two (real) words more than the real array.
table REAL(KIND=4) array of dimension (2n). (input or output)
Table of factors and trigonometric functions.
If isign = 0, the table array is initialized to contain trigonometric tables needed to compute an
FFT of size n.
If isign = +1 or – 1, the values in table are assumed to be initialized already by a prior call with
isign = 0.
work REAL(KIND=4) array of dimension 2n.
Work array used for intermediate calculations. Its address space must be different from that of
the input and output arrays.
isys INTEGER(KIND=8) array of dimension (0:isys(0)). (input and output)
The first element of the array specifies how many more elements are in the array. You may use
isys to specify certain processor-specific parameters or options.


If isys(0) = 0, the default values of such parameters are used. In this case, you can specify the
argument value as the scalar integer constant 0. If isys(0)>0, isys(0) gives the upper bound of
the isys array; that is, if il=isys(0), user-specified parameters are expected in isys(1) through
isys(il).
Real-to-complex FFTs
Notice in the preceding formula that there are n real input values, and n / 2 + 1 complex output values. This
property is characteristic of real-to-complex FFTs.
The mathematical definition of the Fourier transform takes a sequence of n complex values and transforms it
to another sequence of n complex values. A complex-to-complex FFT routine, such as GGFFT(3S), will take
n complex input values, and produce n complex output values. In fact, one easy way to compute a real-to-
complex FFT is to store the input data in a complex array, then call routine GGFFT to compute the FFT.
You get the same answer when using the HGFFT routine.
The reason for having a separate real-to-complex FFT routine is efficiency. Because the input data is real,
you can make use of this fact to save almost half of the computational work.
The theory of Fourier transforms tells us that for real input data, you have to compute only the first n/2 + 1
complex output values, because the remaining values can be computed from the first half of the values by
the following simple formula:
Y(k) = conjg(Y(n-k)) for n/2 ≤ k ≤ n-1
where the notation conjg(z) represents the complex conjugate of z.
In fact, in many applications, the second half of the complex output data is never explicitly computed or
stored. Likewise, as explained below, only the first half of the complex data has to be supplied for the
complex-to-real FFT.
Another implication of FFT theory is that, for real input data, the first output value, Y(0), will always be a
real number; therefore, the imaginary part will always be 0. If n is an even number, Y(n/2) will also be real
and thus, have zero imaginary part.
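The symmetry described above can be checked numerically. The following Python sketch is not the HGFFT algorithm; it evaluates the FFT formula directly (an O(n²) sum) on a small real sequence. The helper name `dft` and the sample data are assumptions made for illustration only.

```python
import cmath

def dft(x, isign=1, scale=1.0):
    """Evaluate Y(k) = scale * sum_j X(j) * exp(isign*2*pi*i*j*k/n) directly."""
    n = len(x)
    return [
        scale * sum(x[j] * cmath.exp(isign * 2j * cmath.pi * j * k / n) for j in range(n))
        for k in range(n)
    ]

x = [1.0, 2.0, 0.5, -1.0, 3.0, 0.0, -2.0, 4.0]  # n = 8 real inputs (made-up data)
n = len(x)
y_full = dft(x)                 # all n complex outputs
y_half = y_full[: n // 2 + 1]   # the n/2 + 1 values a real-to-complex FFT stores

# Check Y(k) = conjg(Y(n-k)) for n/2 <= k <= n-1.
symmetric = all(abs(y_full[k] - y_full[n - k].conjugate()) < 1e-9
                for k in range(n // 2, n))

# Check that Y(0) and Y(n/2) have zero imaginary part (up to rounding).
ends_real = abs(y_half[0].imag) < 1e-9 and abs(y_half[n // 2].imag) < 1e-9
print(symmetric, ends_real)
```

Because the second half of `y_full` is recoverable from `y_half`, storing only `y_half` loses no information for real input.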
Complex-to-real FFTs
Consider the complex-to-real case. The effect of the computation is given by the preceding formula, but
with X complex and Y real.
Generally, the FFT transforms a complex sequence into a complex sequence. However, in a certain
application we may know the output sequence is real. Often, this is the case because the complex input
sequence was the transform of a real sequence. In this case, you can save about half of the computational
work.
According to the theory of Fourier transforms, for the output sequence, Y, to be a real sequence, the
following identity on the input sequence, X, must be true:
X(k) = conjg(X(n-k)) for n/2 ≤ k ≤ n-1
And, in fact, the input values X(k) for k > n/2 need not be supplied; they can be inferred from the first half
of the input.

Thus, in the complex-to-real routine, GHFFT, the arrays can be dimensioned as follows:
COMPLEX(KIND=4) X(0:n/2)
REAL(KIND=4) Y(0:n-1)

There are n / 2 + 1 complex input values and n real output values. Even though only n/2 + 1 input values
are supplied, the size of the transform is still n in this case, because implicitly you are using the FFT
formula for a sequence of length n.
Another implication of the theory is that X(0) must be a real number (that is, it must have zero imaginary
part). Also, if n is even, X(n/2) must also be real. Routine GHFFT assumes that these values are real; if you
specify a nonzero imaginary part, it is ignored.
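As a rough illustration of the complex-to-real case, the Python sketch below rebuilds the implied second half of the input from X(0) through X(n/2) and then evaluates the FFT formula directly. It is not the GHFFT implementation; the function names `forward_half` and `half_to_real` and the sample data are hypothetical, and n is assumed to be even.

```python
import cmath

def forward_half(x, scale=1.0):
    """Real-to-complex direction (isign = +1): return only the first n/2 + 1 outputs."""
    n = len(x)
    return [scale * sum(x[j] * cmath.exp(2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n // 2 + 1)]

def half_to_real(xh, n, isign=-1, scale=1.0):
    """Complex-to-real direction for even n, given only X(0) .. X(n/2)."""
    full = [complex(v) for v in xh] + [0j] * (n - len(xh))
    # X(0) and X(n/2) are taken as real; any imaginary part is dropped.
    full[0] = complex(full[0].real, 0.0)
    full[n // 2] = complex(full[n // 2].real, 0.0)
    # Infer the values that were not supplied: X(k) = conjg(X(n-k)) for k > n/2.
    for k in range(n // 2 + 1, n):
        full[k] = full[n - k].conjugate()
    return [scale * sum(full[j] * cmath.exp(isign * 2j * cmath.pi * j * k / n)
                        for j in range(n)).real
            for k in range(n)]

x = [1.0, 2.0, 0.5, -1.0, 3.0, 0.0, -2.0, 4.0]      # made-up real data, n = 8
n = len(x)
xh = forward_half(x)                                 # n/2 + 1 = 5 complex values
back = half_to_real(xh, n, isign=-1, scale=1.0 / n)  # inverse with scale 1/n
max_err = max(abs(p - q) for p, q in zip(x, back))
print(max_err < 1e-9)
```

The transform size passed to `half_to_real` is still n, even though only n/2 + 1 complex values are supplied.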

NOTES

Table Initialization
The table array stores the trigonometric tables used in calculation of the FFT. This table must be initialized
by calling the routine with isign = 0 prior to doing the transforms. The table does not have to be
reinitialized if the value of the problem size, n, does not change. Because HGFFT and GHFFT use the same
format for table, either can be used to initialize it (GGFFT uses a different table format).
Dimensions
In the preceding description, it is assumed that array subscripts were zero-based, as is customary in FFT
applications. Thus, the input and output arrays are declared (assuming n > 0):
REAL(KIND=4) X(0:n-1)
COMPLEX(KIND=4) Y(0:n/2)

No change is needed in the calling sequence; however, if you prefer you can use the more customary Fortran
style with subscripts starting at 1, as in the following:
REAL(KIND=4) X(n)
COMPLEX(KIND=4) Y(n/2 + 1)

Performance Tips
These routines will compute an FFT for any value of n, provided only that n is an even number, n ≥ 2.
Performance for a given value of n depends on the prime factorization of n. This fact is characteristic of all
FFT algorithms.
Fastest performance is realized when n is a power of 2, in which case the number of floating-point
operations is approximately
$\frac{5}{2} \cdot n \log_2(n)$
If n contains factors of 3, performance is slightly worse; if n contains powers of 5, it is slightly worse still.
Worst performance is when n is a prime number, in which case the number of operations is approximately
$4 \cdot n^2$.

The kernel routines are optimized for values of n that are even numbers and are products of powers of 2, 3,
and 5. (Because the kernel routines have a special case for multiples of 4, even powers of 2 will be slightly
faster than odd powers of 2.)
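Before choosing a transform size, it can help to test whether n falls into the favorable class described above. The following Python sketch is an illustration only; the function names are hypothetical, and the operation estimate simply restates the power-of-2 approximation quoted in the text.

```python
import math

def is_even_235_product(n):
    """True if n is even and has no prime factors other than 2, 3, and 5."""
    if n < 2 or n % 2 != 0:
        return False
    for p in (2, 3, 5):
        while n % p == 0:
            n //= p
    return n == 1

def approx_ops_power_of_two(n):
    """Approximate operation count (5/2) * n * log2(n), quoted for n a power of 2."""
    return 2.5 * n * math.log2(n)

for size in (1000, 1022, 1024):
    print(size, is_even_235_product(size))
```

Here 1000 = 2³·5³ and 1024 = 2¹⁰ qualify, while 1022 = 2·7·73 does not.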
Implementation-dependent Items
The UNICOS and UNICOS/mk FFT routines were designed so that they could be implemented efficiently on
many different architectures. The calling sequence is the same in any implementation. Certain details,
however, depend on the particular implementation. These details are confined to two areas:
• The first area is the size of the table and work arrays. Different sizes may be needed on different
systems. No change is required to the subroutine call, but you may have to change the array sizes in the
DIMENSION or type statements that declare the arrays.
• The second area is the isys parameter array, an argument that gives certain implementation-specific
information. All features and functions of the FFT routines that are specific to any particular
implementation are confined to this isys array. On any implementation, you can use the default values by
specifying an argument value of 0.

EXAMPLES
Example 1: Initialize the array TABLE in preparation for doing an FFT of size 1024. In this case
only the arguments isign, n, and table are used; you can use dummy arguments or zeros for the other
arguments in the subroutine call.
REAL(KIND=4) TABLE(100 + 4*1024)
CALL HGFFT(0, 1024, 0.0, DUMMY, DUMMY, TABLE, DUMMY, 0)

Example 2: X is a real array of dimension (0:1023), and Y is a complex array of dimension (0:512). Take
the FFT of X and store the results in Y. Before taking the FFT, initialize the TABLE array, as in example 1.
REAL(KIND=4) X(0:1023)
COMPLEX(KIND=4) Y(0:512)
REAL(KIND=4) TABLE(100 + 4*1024)
REAL(KIND=4) WORK(4*1024 + 4)
...
CALL HGFFT(0, 1024, 1.0, X, Y, TABLE, WORK, 0)
CALL HGFFT(1, 1024, 1.0, X, Y, TABLE, WORK, 0)

Example 3: With X and Y as in example 2, take the inverse FFT of Y and store it back in X. The scale
factor 1/1024 is used. Assume that the TABLE array is initialized already.
CALL GHFFT(-1, 1024, 1.0/1024.0, Y, X, TABLE, WORK, 0)

Example 4: Do the same computation as in example 2, but assume that the lower bound of each array is 1,
rather than 0. The subroutine calls are not changed.
REAL(KIND=4) X(1024)
COMPLEX(KIND=4) Y(513)
...
CALL HGFFT(0, 1024, 1.0, X, Y, TABLE, WORK, 0)
CALL HGFFT(1, 1024, 1.0, X, Y, TABLE, WORK, 0)

Example 5: Do the same computation as in example 4, but equivalence the input and output arrays to save
storage space. Assume that the TABLE array is initialized already.
REAL(KIND=4) X(1024)
COMPLEX(KIND=4) Y(513)
EQUIVALENCE ( X(1), Y(1) )
...
CALL HGFFT(1, 1024, 1.0, X, Y, TABLE, WORK, 0)

SEE ALSO
GGFFT(3S), SCFFT(3S)


HOPFILT ( 3S ) HOPFILT ( 3S )

NAME
HOPFILT – Solves Wiener-Levinson linear equations

SYNOPSIS
CALL HOPFILT (m, a, b, c, r)

IMPLEMENTATION
UNICOS/mk systems
On UNICOS/mk systems, this subroutine executes on a single processor and uses private data only.

DESCRIPTION
HOPFILT computes the solution to the Wiener-Levinson system of linear equations Ta = b; T is a
symmetric Toeplitz matrix in which elements are described as follows:
$t_{ij} = R(1 + \mathrm{MOD}(m + j - i,\ m))$ for $j \geq i$, with $t_{ji} = t_{ij}$ by symmetry,
for some vector R = (R(1), R(2), . . ., R(m)).
This routine has the following arguments:
m Integer. (input)
Order of the system of equations.
a REAL(KIND=4) array of dimension m. (output)
Resulting vector of filter coefficients.
b REAL(KIND=4) array of dimension m. (input)
Information auto-correlation vector (right-hand side vector in system of linear equations).
c REAL(KIND=4) array of dimension 2m. (scratch output)
Scratch vector.
r REAL(KIND=4) array of dimension m. (input)
Signal auto-correlation vector (band values of the symmetric Toeplitz matrix T).

NOTES
Although HOPFILT solves this matrix equation faster than Gaussian elimination, HOPFILT does no
pivoting; therefore, it is less numerically stable than Gaussian elimination, unless the matrix T is either
positive definite or diagonally dominant.

EXAMPLES
You can solve the following system of linear equations with the call HOPFILT (3,A,B,C,R). Vector c
has a length of at least 6.
$$\begin{pmatrix} R(1) & R(2) & R(3) \\ R(2) & R(1) & R(2) \\ R(3) & R(2) & R(1) \end{pmatrix} \begin{pmatrix} A(1) \\ A(2) \\ A(3) \end{pmatrix} = \begin{pmatrix} B(1) \\ B(2) \\ B(3) \end{pmatrix}$$
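For readers who want to see how a symmetric Toeplitz system of this shape can be solved without forming T, the following Python sketch applies the textbook Levinson recursion. It is not the HOPFILT source; the function name `levinson_solve` and the sample vectors are assumptions, and T is taken to have entries R(1 + |i - j|) as in the 3-by-3 system above. Like the routine described here, the recursion does no pivoting.

```python
def levinson_solve(r, b):
    """Solve T a = b, where T[i][j] = r[abs(i - j)] (symmetric Toeplitz)."""
    m = len(r)
    f = [1.0 / r[0]]    # forward vector: T_k f = first unit vector
    a = [b[0] / r[0]]   # solution of the leading 1-by-1 system
    for k in range(1, m):
        # Error introduced by extending the forward vector with a zero.
        eps_f = sum(r[k - i] * f[i] for i in range(k))
        denom = 1.0 - eps_f * eps_f
        back = f[::-1]  # for symmetric T the backward vector is f reversed
        f = [(fe - eps_f * be) / denom
             for fe, be in zip(f + [0.0], [0.0] + back)]
        # Correct the extended solution using the new backward vector.
        eps_a = sum(r[k - i] * a[i] for i in range(k))
        new_back = f[::-1]
        a = [ai + (b[k] - eps_a) * bi for ai, bi in zip(a + [0.0], new_back)]
    return a

r = [4.0, 2.0, 1.0]   # R(1), R(2), R(3): made-up autocorrelation values
b = [1.0, 2.0, 3.0]   # right-hand side
a = levinson_solve(r, b)

# Verify by forming T explicitly and computing the residual.
m = len(r)
residual = max(abs(sum(r[abs(i - j)] * a[j] for j in range(m)) - b[i])
               for i in range(m))
print(a, residual < 1e-12)
```

The sample matrix is diagonally dominant, which is the well-behaved case identified in the NOTES section.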

SEE ALSO
OPFILT(3S)


MCFFT ( 3S ) MCFFT ( 3S )

NAME
MCFFT – Applies multiple multitasked complex Fast Fourier Transforms (FFTs)

SYNOPSIS
CALL MCFFT (isign, n, m, scale, x, inc1x, inc2x, y, inc1y, inc2y, table, ntable, work,
nwork)

IMPLEMENTATION
UNICOS systems

DESCRIPTION
MCFFT computes the Fourier transform of each column of the complex matrix x, and it stores the results in
the columns of matrix y. For most purposes, MCFFT is superseded by the UNICOS standard FFT routine
CCFFTM(3S).
Suppose the arrays are dimensioned as follows:
COMPLEX X(0:N-1, M), Y(0:N-1, M)

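MCFFT computes the formula:

$$Y_{k,l} = \mathrm{scale} \cdot \sum_{j=0}^{n-1} X_{j,l}\, \omega^{jk} \quad \text{for } k = 0, \ldots, n-1,\ l = 1, \ldots, m$$

where
$\omega = e^{\mathrm{isign} \cdot 2 \pi i / n}$
$\mathrm{isign} = \pm 1$
$\pi = 3.14159\ldots$
$e = 2.71828\ldots$
$i = \sqrt{-1}$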
Different authors use different conventions for which of the transforms, isign = +1 or isign = -1, is the
forward or the inverse transform, and what the scale factor should be in either case. In this documentation,
when isign = +1, it is called the forward transform, and when isign = -1, it is called the inverse transform.
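To make the roles of isign and scale concrete, the following Python sketch evaluates the MCFFT formula directly, one column at a time. It is a slow reference computation and not the MCFFT algorithm; the function name `multi_dft` and the sample matrix are hypothetical. A forward transform with scale 1.0 followed by an inverse transform with scale 1/n should reproduce the input.

```python
import cmath

def multi_dft(columns, isign, scale):
    """Apply Y(k,l) = scale * sum_j X(j,l) * exp(isign*2*pi*i*j*k/n) to each column.

    columns is a list of m columns, each a list of n complex values."""
    result = []
    for col in columns:
        n = len(col)
        result.append([
            scale * sum(col[j] * cmath.exp(isign * 2j * cmath.pi * j * k / n)
                        for j in range(n))
            for k in range(n)
        ])
    return result

# m = 2 columns of n = 6 values each (made-up data).
x = [[complex(j + 1, -j) for j in range(6)],
     [complex(0.5 * j, 2.0) for j in range(6)]]
n = 6

y = multi_dft(x, isign=+1, scale=1.0)           # forward transform
x_back = multi_dft(y, isign=-1, scale=1.0 / n)  # inverse transform

max_err = max(abs(p - q) for cx, cb in zip(x, x_back) for p, q in zip(cx, cb))
print(max_err < 1e-9)
```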

This routine has the following arguments:


isign Integer. (input)
Specifies whether to initialize the table array or do the forward or inverse Fourier transform:
isign = 0 Initializes the table array
isign = +1 Computes the forward transform
isign = -1 Computes the inverse transform
n Integer. (input)
Order of the transforms (the number of elements in each column of the input and output matrix).
If n is not positive, MCFFT returns without computing any transforms.
m Integer. (input)
The number of transforms to be computed (the number of elements in each row of the input and
output matrix). If m is not positive, MCFFT returns without computing any transforms.
scale Real. (input)
Real scale factor. Each element of the output array is multiplied by the scale factor after taking
the Fourier transform, as defined in the preceding formula.
x Complex array element. (input)
First element to be used in an input array of values to be transformed.
inc1x Integer. (input)
Input array increment in the first dimension (the distance between successive complex array row
elements within each column of the input array). inc1x must not be 0.
inc2x Integer. (input)
Input array increment in the second dimension (the distance between successive complex array
column elements within each row of the input array). inc2x must not be 0.
y Complex array element. (output)
First element to be output in an array of transformed values.
The output array may be the same as the input array. In that case, the transform is done in
place, if inc1x = inc1y and inc2x = inc2y.
inc1y Integer. (input)
Output array increment in the first dimension (the distance between successive complex array
row elements within each column of the output array). inc1y must not be 0.
inc2y Integer. (input)
Output array increment in the second dimension (the distance between successive complex array
column elements within each row of the output array). inc2y should be a multiple of the
declared leading dimension of the output array. inc2y must not be 0.
table Real array of dimension ntable. (input or output)
Table of factors and trigonometric functions. This array may be initialized by a call to MCFFT
with isign = 0.

ntable Integer. (input)
Number of (real) words in the table array. The value of ntable should be at least 2n + 100; n is
the order of the transform. If the value of ntable does not provide enough space, MCFFT prints
an error message and stops.
work Real array of dimension nwork. (scratch output)
Work array. This is a scratch array used for intermediate calculations. It must be a different
address space from the input and output arrays.
nwork Integer. (input)
Number of words in the work array.
The value of nwork should be at least 4 * n * MIN(m, 16 * ncpus), where:
n Order of each transform
m Number of transforms
ncpus Number of CPUs used in the calculation
If the value of nwork does not provide enough space, MCFFT prints an error message and stops.

NOTES
This section contains information about the algorithm for MCFFT, table initialization, increment arguments,
and performance tips.
Algorithm
MCFFT uses a decimation-in-frequency FFT that performs its operations on each row of the matrix. This
means that as the algorithm is transforming each column of the input matrix, it vectorizes along the rows.
Thus, the vector length in the calculations depends on the row size. The performance tips later in this
section give more information on the algorithm as it relates to performance.
Table Initialization
The table array stores factors of n and trigonometric tables that are used in calculation of the FFT. You can
initialize table explicitly by calling MCFFT with isign = 0. If you do not initialize table, MCFFT does so
automatically on the first call. If the value of the problem size, n, does not change, table does not have to
be reinitialized. If you call MCFFT with a different value of n without first reinitializing table, MCFFT
reinitializes table automatically.
Reinitialization of table is relatively time-consuming. If you are continually changing the problem size, you
might consider using more than one table array, so that it will not have to be reinitialized on each call to
MCFFT.
If you initialize table explicitly by calling MCFFT with isign = 0, the only arguments that are significant are
isign, n, table, and ntable. In this case, the other arguments are ignored.
The value of ntable is checked when the table is initialized to verify that the table space you provided is
large enough. If it is not, MCFFT stops after printing an error message, indicating the amount of table space
required.

Increment Arguments
The inc1x, inc2x, inc1y, and inc2y increment arguments describe how the matrices are stored in Fortran
arrays. These arguments are the link between the mathematical matrices and their representation in computer
memory.
Consider the following 4-by-5 matrix X.
X(1,1) X(1,2) X(1,3) X(1,4) X(1,5)
X(2,1) X(2,2) X(2,3) X(2,4) X(2,5)
X(3,1) X(3,2) X(3,3) X(3,4) X(3,5)
X(4,1) X(4,2) X(4,3) X(4,4) X(4,5)

Suppose that this matrix is declared by the Fortran statement:


COMPLEX X(4,5)

Fortran stores matrices "by column", in the following order:


X(1,1), X(2,1), X(3,1), X(4,1), X(1,2), X(2,2) . . .

Thus, the increment in the first dimension, inc1x, is just 1. The increment in the second dimension, inc2x, is
the (address) distance between X(1,1) and X(1,2), which is 4, the leading dimension of X. Generally, the
increment in the second dimension is the leading dimension of the array as it is declared in the Fortran
program, or a multiple thereof.
The previous information described transforming each column of X into a column of Y. Actually, it could
just as well have described transforming rows of X into rows of Y. MCFFT can do either one, as follows:
Suppose that X and Y have been declared with the following statement, as in the previous example:
COMPLEX X(4, 5), Y(4, 5)

To transform the columns of X into the columns of Y, using every element of each column, set the
following:
INC1X = 1
INC2X = 4
INC1Y = 1
INC2Y = 4

INC1X and INC1Y are 1, meaning to use every element of the column, and the values of INC2X and
INC2Y are the leading dimensions of the arrays, as they are declared.
To transform the rows of X into the rows of Y, interchange the values of INC1X and INC2X, and also of
INC1Y and INC2Y (the increment arguments) as follows:
INC1X = 4
INC2X = 1
INC1Y = 4
INC2Y = 1

Because of the way that arrays are stored in Fortran, interchanging the increments this way is equivalent to
transposing the matrices.
The increment arguments are not directly related to the values of n and m, except insofar as the matrix must
fit in the allocated address space.
Negative increments are legal. If a row or column increment is negative, the address given as the x or y
argument should be the address of the element at the end of the row or column of the array (not the
beginning).
Because each transform has n elements, this implies that the increment values must satisfy the following
logical expressions:
$|inc2x| \geq n \cdot |inc1x|$ or $|inc1x| \geq n \cdot |inc2x|$
$|inc2y| \geq n \cdot |inc1y|$ or $|inc1y| \geq n \cdot |inc2y|$
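The way the two increments select elements can be modeled with a flat, column-major list. The Python sketch below is an illustration of the addressing only, under the assumption that element j of transform l sits at offset j*inc1 + l*inc2 from the starting element; the helper name `gather` is hypothetical.

```python
def gather(flat, n, m, inc1, inc2, start=0):
    """Return m sequences of n elements each, addressed as
    flat[start + j*inc1 + l*inc2] for j = 0..n-1 (within one transform)
    and l = 0..m-1 (from one transform to the next)."""
    return [[flat[start + j * inc1 + l * inc2] for j in range(n)]
            for l in range(m)]

# Column-major storage of COMPLEX X(4,5): X(1,1), X(2,1), X(3,1), X(4,1), X(1,2), ...
rows, cols = 4, 5
flat = [f"X({i},{j})" for j in range(1, cols + 1) for i in range(1, rows + 1)]

columns = gather(flat, n=4, m=5, inc1=1, inc2=4)   # transform the 5 columns
row_seqs = gather(flat, n=5, m=4, inc1=4, inc2=1)  # transform the 4 rows

# Negative increment: start at the end of the first column and walk backward.
reversed_col = gather(flat, n=4, m=1, inc1=-1, inc2=4, start=3)

print(columns[0])
print(row_seqs[0])
print(reversed_col[0])
```

Swapping the two increments turns column sequences into row sequences, which is the transposition effect described above.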
Performance Tips
MCFFT will work for any values of the arguments, subject only to the restrictions given previously. The
performance of this algorithm, however, depends on the values of the following arguments:
• n: Order of each transform
• m: Number of transforms
• inc1x, inc2x, inc1y, inc2y: Increment arguments
• nwork: Amount of workspace
Each of these factors is considered separately in the following subsections.
Performance relative to the order of transform
MCFFT computes an FFT for any value of n, but the performance for a given value of n depends on the
factorization of n. This is characteristic of all FFT algorithms.
Best performance is realized when n is a power of 2. In that case, the number of operations is proportional
to $m \cdot n \log_2(n)$.
Performance is slightly worse if n contains factors of 3. It is worse if n contains powers of 5. Worst
performance is when n is a prime number. In that case, the number of operations is proportional to $m \cdot n^2$.
The kernel routines are optimized for values of n that are products of powers of 2, 3, and 5. The value of n
has no effect on vectorization or multitasking, which depend only on m.
Performance relative to the number of transforms
MCFFT uses a vectorized FFT algorithm that vectorizes across the rows of x and y. Thus, the vector length
for the computations is m, the number of transforms. As with all vector calculations, performance is poor if
m is small (for example, less than 8). If m ≥ 32, performance will be good. Performance is best when m is
a multiple of 64 (128 on Cray C90 series computer systems), particularly if m ≥ 256.

MCFFT runs in multitasked mode for large values of m. If
$m / ncpus \geq 16$
where ncpus is the number of CPUs allocated to the job, then the calculation will be multitasked.
If m is small (for example, m < 8), the CFFT(3S) routine should be used to do the transforms one at a time.
Performance relative to the values of increment arguments
The increment arguments have no effect on the algorithm itself, but their values are significant for memory
contention.
The strides for vector loads and stores are inc2x and inc2y, respectively. To avoid memory bank conflicts,
neither inc2x nor inc2y should be a multiple of a large power of 2. Best performance is realized when inc2x
and inc2y are odd numbers. One way to do this is to declare both input and output arrays with an odd
leading dimension in the calling program.
The values of inc1x and inc1y should also be odd, if possible, but these values are much less significant to
performance than inc2x and inc2y.
Performance relative to the amount of workspace
The size of the workspace, nwork, is also relevant to performance. To do the FFTs in one lot, the
workspace required (in real words of storage) is $nwork = 4 \cdot n \cdot m$.
If m is large, this amounts to a lot of memory. You can provide less workspace and still obtain very good
performance. At a minimum, you need (in real words of storage) $nwork = 4 \cdot n \cdot \mathrm{MIN}(m,\ 16 \cdot ncpus)$.
If you give a value of nwork in the range $16 \cdot n \cdot ncpus \leq nwork < 4 \cdot n \cdot m$, MCFFT will divide the work into
lots small enough to run in the workspace provided.
For best performance, nwork should be at least $128 \cdot n \cdot ncpus$.
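The sizing rules for MCFFT can be collected in one place. The following Python sketch only restates the formulas given in this man page (the table minimum of 2n + 100, the two workspace sizes, the suggested 128·n·ncpus, and the multitasking condition); the function name `mcfft_sizes` and the dictionary keys are hypothetical.

```python
def mcfft_sizes(n, m, ncpus):
    """Collect the table and workspace sizes quoted for MCFFT (in real words)."""
    return {
        "ntable_min": 2 * n + 100,                   # smallest accepted table size
        "nwork_min": 4 * n * min(m, 16 * ncpus),     # smallest accepted workspace
        "nwork_one_lot": 4 * n * m,                  # all transforms in one lot
        "nwork_suggested": 128 * n * ncpus,          # quoted for best performance
        "multitasked": m / ncpus >= 16,              # condition for multitasking
    }

# Sizes for the example program's n = 2*3*5*7 and m = 256, assuming 4 CPUs.
sizes = mcfft_sizes(n=2 * 3 * 5 * 7, m=256, ncpus=4)
for name, value in sizes.items():
    print(name, value)
```

The CPU count of 4 is an arbitrary choice for the example; the example program in this man page itself uses the one-lot size 4*N*M.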

EXAMPLES
The following program illustrates the use of MCFFT. The program computes 256 one-dimensional FFTs out
of a matrix of random numbers, first by using MCFFT, then by using CFFT(3S) for each column. Then it
compares the two results.
The program then computes each inverse transform, also using MCFFT, and compares the results with the
original sequence.
      PARAMETER (N = 2*3*5*7, M = 256)
      PARAMETER (LD1 = N+1, LD2 = M+3)
      COMPLEX X(LD1, LD2), Y(LD1, LD2), YY(LD1, LD2)
      PARAMETER ( NTABLE = 100 + 8*N )
      PARAMETER ( NWORK = 4*N*M )
      REAL TABLE(NTABLE), WORK(NWORK)
      LOGICAL LFWD, LINV
*-----------------------------------------------------------
*     Initialize input array, X, to a
*     set of random numbers.

      DO 15, J = 1, M
         DO 10, I = 1, N
            X(I, J) = CMPLX(RANF(), RANF())
   10    CONTINUE
   15 CONTINUE
*-----------------------------------------------------------
*     Compute Y(:,J) = the Fourier Transform of X(:,J)
*     using MCFFT.

      CALL MCFFT(+1, N, M, 1.0, X, 1, LD1,
     &           Y, 1, LD1, TABLE, NTABLE, WORK, NWORK)
*-----------------------------------------------------------
*     Compute YY(:,J) = the Fourier Transform of X(:,J)
*     using multiple calls to CFFT.

      DO 20, J = 1, M
         CALL CFFT(+1, N, 1.0, X(1,J), 1, YY(1,J), 1,
     &             TABLE, NTABLE, WORK, NWORK)
   20 CONTINUE
*-----------------------------------------------------------
*     Compare Y and YY.

      LFWD = .TRUE.
      DO 40, J = 1, M
         DO 30, I = 1, N
            ERROR = ABS( Y(I,J)-YY(I,J) )/ABS( Y(I,J) )
            LFWD = LFWD .AND. (ERROR .LE. 1.0E-6)
   30    CONTINUE
   40 CONTINUE
      IF (.NOT. LFWD) PRINT *, 'Failed forward test'
      IF (LFWD) PRINT *, 'Forward transform OK'
*-----------------------------------------------------------
*     Compute the inverse transform of Y,
*     and store it back in Y.

      CALL MCFFT(-1, N, M, 1.0/N, Y, 1, LD1, Y, 1, LD1,
     &           TABLE, NTABLE, WORK, NWORK)
*-----------------------------------------------------------
*     Compare the transformed Y array with the
*     original X array.

      LINV = .TRUE.
      DO 60, I = 1, N
         DO 50, J = 1, M
            ERROR = ABS( X(I,J)-Y(I,J) )/ABS( X(I,J) )
            LINV = LINV .AND. (ERROR .LE. 1.0E-6)
   50    CONTINUE
   60 CONTINUE
      IF (.NOT. LINV) PRINT *, 'Failed inverse test'
      IF (LINV) PRINT *, 'Inverse transform OK'
      IF (LINV .AND. LFWD) PRINT *, 'Test succeeded'
      END

SEE ALSO
CCFFT(3S), CFFT(3S) to calculate a single one-dimensional FFT. CCFFT(3S) supersedes most uses of
CFFT(3S).
CCFFT2D(3S), CFFT2D(3S) to calculate a two-dimensional FFT. CCFFT2D(3S) supersedes most uses of
CFFT2D(3S).
CCFFT3D(3S), CFFT3D(3S) to calculate a three-dimensional FFT. CCFFT3D(3S) supersedes most uses of
CFFT3D(3S).
CCFFTM(3S), which supersedes most uses of MCFFT


OPFILT ( 3S ) OPFILT ( 3S )

NAME
OPFILT – Solves Wiener-Levinson linear equations

SYNOPSIS
CALL OPFILT (m, a, b, c, r)

IMPLEMENTATION
UNICOS and UNICOS/mk systems
On UNICOS/mk systems, this subroutine executes on a single processor and uses private data only.

DESCRIPTION
OPFILT computes the solution to the Wiener-Levinson system of linear equations Ta = b; T is a symmetric
Toeplitz matrix in which elements are described as follows:
$t_{ij} = R(1 + \mathrm{mod}(m + j - i,\ m))$ for $j \geq i$, with $t_{ji} = t_{ij}$ by symmetry,
for some vector R = (R(1), R(2), . . ., R(m)).
This routine has the following arguments:
m Integer. (input)
Order of the system of equations.
a Real array of dimension m. (output)
Resulting vector of filter coefficients.
b Real array of dimension m. (input)
Information auto-correlation vector (right-hand side vector in system of linear equations).
c Real array of dimension 2m. (scratch output)
Scratch vector.
r Real array of dimension m. (input)
Signal auto-correlation vector (band values of the symmetric Toeplitz matrix T).

NOTES
Although OPFILT solves this matrix equation faster than Gaussian elimination, OPFILT does no pivoting;
therefore, it is less numerically stable than Gaussian elimination, unless the matrix T is either positive
definite or diagonally dominant.

EXAMPLES
You can solve the following system of linear equations with the call OPFILT (3,A,B,C,R). Vector c
has a length of at least 6.

$$\begin{pmatrix} R(1) & R(2) & R(3) \\ R(2) & R(1) & R(2) \\ R(3) & R(2) & R(1) \end{pmatrix} \begin{pmatrix} A(1) \\ A(2) \\ A(3) \end{pmatrix} = \begin{pmatrix} B(1) \\ B(2) \\ B(3) \end{pmatrix}$$


PCCFFT2D ( 3S ) PCCFFT2D ( 3S )

NAME
PCCFFT2D – Applies a two-dimensional (2D) complex-to-complex Fast Fourier Transform (FFT) to a
matrix distributed across a set of processors

SYNOPSIS
CALL PCCFFT2D (isign, n1, n2, scale, A, iA, jA, descA, B, iB, jB, descB, table, work,
isys, info)

IMPLEMENTATION
UNICOS/mk systems

DESCRIPTION
PCCFFT2D computes the 2D complex Fast Fourier Transform (FFT) of the distributed complex matrix A,
and it stores the results in the distributed complex matrix B.
Description of the Distributed Data
This routine considers the processors to be partitioned into a one-dimensional (1D) linear array of processors.
The 2D input matrix A is then distributed across this 1D grid of processors as discussed in the following
text.
Consider a 2D matrix A of size nr-by-nc, where nr is the number of rows and nc is the number of columns
of matrix A.
Let the processors (N$PES in number) be partitioned into a 1D grid of size 1-by-npc, where npc = N$PES is
the number of processors assigned to the column dimension of A. Let the number of processors assigned to
the row dimension be 1. To partition processors into this grid, call the BLACS_GRIDINIT routine as
shown:
CALL BLACS_GRIDINIT (ictxt, 'C', 1, npc)

The input matrix, A, and the output matrix, B, are distributed across this 1D linear array of processors by
using the block (as defined in FORTRAN D and HPF) distribution along the columns. The distribution
along the rows is degenerate. The descriptors descA and descB provide information on the distribution of
the matrices A and B across the processor grid. The descriptors descA and descB are initialized using the
DESCINIT(3S) routine. The DESCINIT(3S) routine would have to be called after the call to
BLACS_GRIDINIT and would look like this:
CALL DESCINIT (descA, nr, nc, nr, ICEIL(nc, npc),
pesr, pesc, ictxt, lld, info)

Given that matrix A is distributed in a block manner across the processor grid along the columns, the block
size along the columns would be the following:
ncpp = ICEIL(nc, npc)
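The column block size determines which global columns each processor holds. The Python sketch below illustrates a plain block distribution under two assumptions that are not stated in the text: that ICEIL(a, b) is the integer ceiling of a/b, and that processors along the grid are numbered from 0. The function names are hypothetical.

```python
def iceil(a, b):
    """Ceiling of a / b for positive integers (assumed meaning of ICEIL)."""
    return -(-a // b)

def owned_columns(nc, npc, pe):
    """1-based global column range (first, last) held by processor pe (0-based),
    or None if that processor holds no columns."""
    ncpp = iceil(nc, npc)            # block size along the columns
    first = pe * ncpp + 1
    last = min((pe + 1) * ncpp, nc)
    return (first, last) if first <= last else None

nc, npc = 10, 4
layout = [owned_columns(nc, npc, pe) for pe in range(npc)]
print(iceil(nc, npc), layout)
```

When nc is not a multiple of npc, the last processor holds a shorter block, and with some sizes (for example nc = 9, npc = 4) it holds none.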

Further assume that the user wants the 2D FFT to be performed on a submatrix starting at global address
(iA, jA). Let this submatrix be of size n1-by-n2. Then the arguments iA and jA represent the global address
of the first element of the submatrix and n1 and n2 represent the size of the submatrix over which the 2D
FFT is to be performed.
Similarly, the iB and jB arguments represent the first element of the submatrix to which the output is to be
written.
Restrictions
In the current release, the matrices A and B must be distributed identically. This means that all of the
arguments provided to DESCINIT(3S) to initialize descA and descB must be equal except for the local
leading dimension.
The flexibility of performing the FFT over any submatrix of the global input matrix is not available in the
current release. Therefore, users must initialize iA, jA, iB, and jB with 1 and n1 with nr and n2 with nc.
All processors must call this routine. In future releases, only those processors that own the matrix over
which the FFT is to be performed will participate in the computation. All other processors will exit the
routine immediately.
2D FFT Theory
In FFT applications, it is customary to use zero-based subscripts; the formulas are simpler that way.
Suppose that the arrays are dimensioned as follows:
COMPLEX A(0:n1-1, 0:n2-1)
COMPLEX B(0:n1-1, 0:n2-1)

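PCCFFT2D computes the formula:

$$B_{k_1,k_2} = \mathrm{scale} \cdot \sum_{j_1=0}^{n1-1} \sum_{j_2=0}^{n2-1} A_{j_1,j_2} \cdot \omega_1^{j_1 k_1} \cdot \omega_2^{j_2 k_2} \quad \text{for } k_1 = 0, \ldots, n1-1,\ k_2 = 0, \ldots, n2-1$$

where
$\omega_1 = e^{\mathrm{isign} \cdot 2 \pi i / n1}$
$\omega_2 = e^{\mathrm{isign} \cdot 2 \pi i / n2}$
$i = +\sqrt{-1}$
$\pi = 3.14159\ldots$
$\mathrm{isign} = \pm 1$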
Different authors use different conventions for which of the transforms, isign = +1 or isign = -1, is the
forward or inverse transform, and what the scale factor should be in either case. To compute any of the
various possible definitions, however, choose the appropriate values for isign and scale.
If you take the FFT with any particular values of isign and scale, the mathematical inverse function is
computed by taking the FFT with -isign and 1 / (n1 * n2 * scale). In particular, if you use isign = +1 and
scale = 1.0 for the forward FFT, you can compute the inverse FFT by using isign = -1 and
scale = 1.0 / (n1 * n2).
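The inverse rule stated above (use -isign and 1 / (n1 * n2 * scale)) can be verified on a small matrix. The following Python sketch evaluates the 2D formula directly on a single, undistributed matrix; it is not the PCCFFT2D algorithm and ignores the processor grid entirely. The function name `dft2d` and the sample data are hypothetical.

```python
import cmath

def dft2d(a, isign, scale):
    """Direct evaluation of the 2D formula on a list-of-rows matrix a (n1 by n2)."""
    n1, n2 = len(a), len(a[0])
    def w(p, n):
        # omega**p for omega = exp(isign * 2*pi*i / n)
        return cmath.exp(isign * 2j * cmath.pi * p / n)
    return [[scale * sum(a[j1][j2] * w(j1 * k1, n1) * w(j2 * k2, n2)
                         for j1 in range(n1) for j2 in range(n2))
             for k2 in range(n2)]
            for k1 in range(n1)]

# A 3-by-4 matrix of made-up complex values.
a = [[complex(r + c, r - 2 * c) for c in range(4)] for r in range(3)]
n1, n2 = 3, 4
scale = 2.0   # deliberately not 1.0, to exercise the general rule

b = dft2d(a, isign=+1, scale=scale)
a_back = dft2d(b, isign=-1, scale=1.0 / (n1 * n2 * scale))

max_err = max(abs(p - q) for ra, rb in zip(a, a_back) for p, q in zip(ra, rb))
print(max_err < 1e-9)
```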

If either n1 or n2 is prime or otherwise not factorizable into powers of 2, 3, and 5, significant
improvements in computational time can be obtained by initializing isys, which is a
vector of length 3, as follows.
If both n1 and n2 are factorizable into powers of 2, 3 and 5, for example, n1 = 30 and n2 = 120, then
isys(1) = 2
isys(2) = 0
isys(3) = 0
If any one dimension is not factorizable into powers of 2, 3 and 5, then the following initializations of isys
yield the fastest times:
n1 not factorizable but n2 factorizable
isys(1) = 2
isys(2) = 1
isys(3) = 0
n1 factorizable but n2 not factorizable
isys(1) = 2
isys(2) = 0
isys(3) = 1
both n1 and n2 not factorizable
isys(1) = 2
isys(2) = 1
isys(3) = 1
Here isys(1) indicates the dimension of the matrix over which the FFT is being performed.
If the numbers n1 and n2 are not known ahead of time, then isys(2) and isys(3) could be initialized to 0 or 1;
if an inappropriate choice is made, the routine would compute the correct result for n1 and n2, although
slowly (if either n1 or n2 were prime). If initialized to 1, more workspace is needed; see the description of
table which follows.
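The isys settings listed above follow mechanically from whether each dimension factors into powers of 2, 3, and 5. The Python sketch below restates that selection, together with the table lengths given in the description of table (2(n1 + n2) or 12(n1 + n2)). The function names are hypothetical, and the Python list holds isys(1) through isys(3) at indices 0 through 2.

```python
def is_235_product(n):
    """True if n has no prime factors other than 2, 3, and 5."""
    if n < 1:
        return False
    for p in (2, 3, 5):
        while n % p == 0:
            n //= p
    return n == 1

def choose_isys(n1, n2):
    """Return [isys(1), isys(2), isys(3)] as listed in the text."""
    return [2,
            0 if is_235_product(n1) else 1,
            0 if is_235_product(n2) else 1]

def table_length(n1, n2, isys):
    """Length of table: 2(n1+n2) if isys(2) = isys(3) = 0, else 12(n1+n2)."""
    factor = 2 if isys[1] == 0 and isys[2] == 0 else 12
    return factor * (n1 + n2)

for n1, n2 in ((30, 120), (31, 120), (30, 121), (31, 127)):
    isys = choose_isys(n1, n2)
    print(n1, n2, isys, table_length(n1, n2, isys))
```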
The storage requirements for the vector table depend on the values of the isys vector. The PCCFFT2D
routine accepts the following arguments (all scalar values are private data):
isign Integer. (input)
Specifies whether to initialize the table array or to do the forward or inverse transform as
follows:
If isign = 0, the routine initializes the table array and returns. In this case, the only arguments
used or checked are isign, n1, n2, and table.
If isign = +1 or -1, the value of isign is the sign of the exponent used in the FFT formula.
n1 Integer. (input)
Number of rows in the submatrix to be transformed.

n2 Integer. (input)
Number of columns in the submatrix to be transformed.
scale Real. (input)
Scale factor. Each element of the output array is multiplied by the scale factor after taking the
Fourier transform.
A Private complex array of dimension (0:lldA– 1, 0:ICEIL(nc,npc)– 1). (input)
Input array of values to be transformed. lldA is the local leading dimension and is initialized
using DESCINIT(3S). A must be declared in a COMMON block.
iA With jA, the global address of the first element of the global input matrix.
jA With iA, the global address of the first element of the global input matrix.
descA Integer vector of dimension 9. (input)
Contains description of the distribution of the matrix A across a 1D processor grid.
B Private complex array of dimension (0:lldB– 1, 0:ICEIL(nc,npc)– 1). (output)
Output array of transformed values. lldB is the local leading dimension and is initialized using
DESCINIT(3S).
Output array B may be the same as the input array A in which case the input array A is
overwritten with the transformed values. B must be declared in a COMMON block.
iB With jB, the global address of the first element of the global matrix where output will be written.
jB With iB, the global address of the first element of the global matrix where output will be written.
descB Integer vector of dimension 9. (input)
Contains description of the distribution of the matrix B across a 1D processor grid. If the input
array and the output array are the same, you must use the same descriptors.
table Private real vector of length 2(n1 + n2) if both isys(2) and isys(3) are equal to zero. Private real
vector of length 12(n1 + n2) if either isys(2) or isys(3) is equal to 1. (input or output)
Table of trigonometric function values.
If isign = 0, the routine initializes table (table is output only).
If isign = +1 or – 1, the values in table are assumed to be initialized already by a prior call with
isign = 0 (table is input only).
work Private complex vector of length n1r . ICEIL(n2r, npc). (workspace)
n1r and n2r are the values of n1 and n2 rounded up to the nearest power of 2 greater
than or equal to them. work must be declared in a COMMON block.
isys Private integer vector of length 3. (input)
isys(1) indicates the dimension of the problem which is 2. isys(1) should be set to 2.
isys(2) indicates if n1 is prime or not factorizable into powers of 2, 3 and 5. Should be set to 1
if the number is not factorizable into powers of 2, 3 and 5. Should be set to 0 if it is
factorizable.

isys(3) should be set to 0 or 1, depending on whether n2 is factorizable or not into powers of 2,
3 and 5.
info Integer. (output)
info is set to 0 if all the arguments passed to the routine are legal. If any argument has an
illegal value, the routine exits after setting info to a negative number. -info indicates the
position of the illegal argument.

NOTES
The scale factor scale can take on values of 1.0 or 1.0 /(n1 . n2) depending on whether the forward or
inverse FFT is being computed.
Algorithm
The routine uses a very efficient single processor FFT routine, CCFFT, to do the FFT of each column on the
processors that own the submatrix. It then transposes the submatrix by using intermediate workspace that it
allocates for the purpose, and it again does the FFT along the columns (FFTs of the rows).
If either isys(2) or isys(3) or both are initialized to 1, then a fast (O(n log(n))) algorithm based on the chirp-z
transform is used for the one dimensional FFT in the corresponding direction. In this case, the vector table
must be real of length 12(n1+n2).
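The lengths of table and work given in the argument descriptions can be computed ahead of time when sizing the arrays. The following Python sketch is illustrative only; the function names are hypothetical, iceil is assumed to return the integer ceiling of its first argument divided by its second, and list index 1 and 2 stand for isys(2) and isys(3).

```python
def iceil(num, denom):
    """Smallest integer >= num/denom (the role ICEIL plays in the text)."""
    return -(-num // denom)


def next_power_of_two(n):
    """Smallest power of 2 that is greater than or equal to n."""
    p = 1
    while p < n:
        p *= 2
    return p


def pccfft2d_sizes(n1, n2, npc, isys):
    """Return (table_len, work_len): real elements and complex elements."""
    # isys[1] and isys[2] correspond to isys(2) and isys(3) in Fortran.
    multiplier = 12 if (isys[1] == 1 or isys[2] == 1) else 2
    table_len = multiplier * (n1 + n2)
    work_len = next_power_of_two(n1) * iceil(next_power_of_two(n2), npc)
    return table_len, work_len


# n1 = 240, n2 = 181 on 16 processors, with n2 flagged as not factorizable.
print(pccfft2d_sizes(240, 181, 16, [2, 0, 1]))
```

For these values the sketch gives a table length of 12(240 + 181) and a work length of 256 . ICEIL(256, 16), so arrays declared at least that large satisfy the stated requirements.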
Workspace
The routine dynamically allocates two real work arrays:
Size of one work array:
8 . MAX(nr, nc)
Size of the other work array:
2 . MAX(nr . ICEIL(nc,npc), nc . ICEIL(nr,npc))
The workspaces are freed on exiting.

EXAMPLES
Example code for PCCFFT2D on a 16-processor partition:

      complex A(256,16)
      complex B(256,16)
      complex work(4096)
      common /abw/ A, B, work
      real table(8192)

      integer ictxt, descA(9), descB(9), isign, isys(3)
      integer nr, nc, iceil, info, np
      integer n1, n2

      real scale

      nr = 240
      nc = 181

      n1 = nr
      n2 = nc
      np = n$pes

      call blacs_gridinit(ictxt, 'C', 1, np)

      call descinit(descA, nr, nc, nr, iceil(nc,np),
     &              0, 0, ictxt, 256, info)

      if ( info .ne. 0 ) then
         call exit(0)
      end if

      call descinit(descB, nr, nc, nr, iceil(nc,np),
     &              0, 0, ictxt, 256, info)

      if ( info .ne. 0 ) then
         call exit(0)
      end if

      isign = -1
      scale = 1.0
      isys(1) = 2
      isys(2) = 0
      isys(3) = 1
*
*     Initializing the trig tables
*
      call pccfft2d(0, n1, n2, scale, A, 1, 1, descA,
     &              B, 1, 1, descB, table, work, isys, info)

      if ( info .ne. 0 ) then
         call exit(0)
      end if
*
*     FFT
*
      call pccfft2d(isign, n1, n2, scale, A, 1, 1, descA,
     &              B, 1, 1, descB, table, work, isys, info)

      if ( info .ne. 0 ) then
         call exit(0)
      end if

      call exit(0)
      end

SEE ALSO
BLACS_GRIDINFO(3S), BLACS_GRIDINIT(3S), BLACS_PCOORD(3S), BLACS_PNUM(3S),
DESCINIT(3S)

PCCFFT3D ( 3S ) PCCFFT3D ( 3S )

NAME
PCCFFT3D – Applies a three-dimensional (3D) complex-to-complex Fast Fourier Transform (FFT) to a
matrix distributed across a set of processors

SYNOPSIS
CALL PCCFFT3D (isign, n1, n2, n3, scale, A, iA, jA, kA, descA, B, iB, jB, kB, descB,
table, work, isys, info)

IMPLEMENTATION
UNICOS/mk systems

DESCRIPTION
PCCFFT3D computes the 3D complex Fast Fourier Transform (FFT) of the distributed complex matrix A,
and it stores the results in the distributed complex matrix B.
Description of the Distributed Data
This routine considers the processors to be partitioned into a two-dimensional (2D) grid. The 3D input
matrix A is then distributed across this 2D grid of processors as discussed in the following text.
Consider a 3D matrix A of size nx-by-ny-by-nz. nx, ny, and nz are the sizes of the matrix A along the X, Y,
and Z dimensions, respectively.
Let the processors (N$PES in number) be partitioned into a 2D grid of size npy-by-npz where npy is the
number of processors assigned to the Y dimension and npz is the number of processors assigned to the Z
dimension. Let the number of processors assigned to the X dimension be 1. To partition the processors into
this grid call the GRIDINIT3D(3S) routine as follows:
CALL GRIDINIT3D (ICTXT, 1, npy, npz)

The input matrix, A, and the output matrix, B, are distributed across this 2D processor grid by using the
block (as defined in FORTRAN D and HPF) distribution along the Y and Z dimensions. The distribution
along the X dimension is degenerate. The descriptors descA and descB provide information on the
distribution of the matrices A and B across the processor grid. The descriptors descA and descB are
initialized using the DESCINIT3D(3S) routine.
The DESCINIT3D(3S) routine must be called after the call to GRIDINIT3D(3S); the call has the
following form:
CALL DESCINIT3D (descA, nx, ny, nz, nx, ICEIL(ny, npy), ICEIL(nz, npz), pesx, pesy, pesz,
ictxt, lldx, lldy, info)
Given that matrix A is distributed in a block manner across the processor grid along the Y and Z dimensions,
the block size along these two dimensions would be as follows:
nypp = ICEIL(ny, npy) and nzpp = ICEIL(nz, npz)
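The block sizes nypp and nzpp determine which global indices each processor holds. The following Python sketch illustrates this under two assumptions that are not stated in the text: indices are zero-based, and each processor owns one contiguous block of ICEIL(n, nprocs) indices, with the last block possibly shorter. The helper names are hypothetical.

```python
def iceil(num, denom):
    """Smallest integer >= num/denom (the role ICEIL plays in the text)."""
    return -(-num // denom)


def block_ranges(n, nprocs):
    """Half-open global index ranges [start, stop) owned by each processor."""
    block = iceil(n, nprocs)
    ranges = []
    for p in range(nprocs):
        start = min(p * block, n)
        stop = min(start + block, n)
        ranges.append((start, stop))
    return ranges


ny, nz, npy, npz = 181, 145, 4, 4
print("nypp =", iceil(ny, npy), "nzpp =", iceil(nz, npz))
print("Y ranges:", block_ranges(ny, npy))
print("Z ranges:", block_ranges(nz, npz))
```

Because the block size is rounded up, the last processor along a dimension generally holds fewer elements than the others, which is why the local arrays are dimensioned with ICEIL rather than with an exact quotient.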

Further assume that the user wants the 3D FFT to be performed on a submatrix starting at global address
(iA, jA, kA). Let this submatrix be of size n1-by-n2-by-n3. Then the iA, jA, and kA arguments represent the
global address of the first element of the submatrix and n1, n2, and n3 represent the size of the submatrix
over which the 3D FFT will be performed.
Similarly, the iB, jB, and kB arguments represent the first element of the submatrix to which the output will
be written.
Restrictions
In the current release, the matrices A and B must be distributed identically. This means that all of the
arguments provided to DESCINIT3D(3S) to initialize descA and descB must be equal except the local
leading dimensions.
The flexibility of performing the FFT over any submatrix of the global input matrix A is not available in the
current release. Therefore, you must initialize iA, jA, kA, iB, jB, and kB with 1 and n1 with nx, n2 with ny,
and n3 with nz.
All processors must call this routine. In future releases, only those processors that own the matrix over
which the FFT is to be performed will participate in the computation. All other processors will exit the
routine immediately.
3D FFT Theory
In FFT applications, it is customary to use zero-based subscripts; the formulas are simpler that way.
Suppose that the arrays are dimensioned as follows:
COMPLEX A(0:n1-1, 0:n2-1, 0:n3-1)
COMPLEX B(0:n1-1, 0:n2-1, 0:n3-1)

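PCCFFT3D computes the formula:

\[
B_{k_1,k_2,k_3} = \mathit{scale} \cdot \sum_{j_1=0}^{\mathit{n1}-1} \sum_{j_2=0}^{\mathit{n2}-1} \sum_{j_3=0}^{\mathit{n3}-1}
A_{j_1,j_2,j_3} \cdot \omega_1^{j_1 k_1} \cdot \omega_2^{j_2 k_2} \cdot \omega_3^{j_3 k_3}
\]
for
\[
k_1 = 0, \ldots, \mathit{n1}-1, \qquad k_2 = 0, \ldots, \mathit{n2}-1, \qquad k_3 = 0, \ldots, \mathit{n3}-1
\]
where
\[
\omega_1 = e^{\mathit{isign} \cdot 2 \pi i / \mathit{n1}}, \qquad
\omega_2 = e^{\mathit{isign} \cdot 2 \pi i / \mathit{n2}}, \qquad
\omega_3 = e^{\mathit{isign} \cdot 2 \pi i / \mathit{n3}}
\]
\[
i = +\sqrt{-1}, \qquad \pi = 3.14159\ldots, \qquad \mathit{isign} = \pm 1
\]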
Different authors use different conventions for which of the transforms, isign = +1 or isign = -1, is the
forward or inverse transform, and what the scale factor should be in either case. You can make this routine
compute any of the various possible definitions, however, by choosing the appropriate values for isign and
scale.

If you take the FFT with any particular values of isign and scale, the mathematical inverse function is
computed by taking the FFT with -isign and 1/(n1 . n2 . n3 . scale). In particular, if you use isign = +1 and
scale = 1.0 for the forward FFT, you can compute the inverse FFT by using isign = – 1 and scale = 1/(n1 .
n2 . n3).
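The relationship between a transform and its inverse can be checked on a small array by evaluating the triple sum directly. The following Python sketch is an illustration on a single process, not the library algorithm; dft3 and max_abs_diff are hypothetical names, and the direct sum is practical only for very small sizes.

```python
import cmath


def dft3(a, isign, scale):
    """Evaluate the triple-sum formula directly (tiny inputs only)."""
    n1, n2, n3 = len(a), len(a[0]), len(a[0][0])
    w1 = cmath.exp(isign * 2j * cmath.pi / n1)
    w2 = cmath.exp(isign * 2j * cmath.pi / n2)
    w3 = cmath.exp(isign * 2j * cmath.pi / n3)
    b = [[[0j] * n3 for _ in range(n2)] for _ in range(n1)]
    for k1 in range(n1):
        for k2 in range(n2):
            for k3 in range(n3):
                total = 0j
                for j1 in range(n1):
                    for j2 in range(n2):
                        for j3 in range(n3):
                            total += (a[j1][j2][j3] * w1 ** (j1 * k1)
                                      * w2 ** (j2 * k2) * w3 ** (j3 * k3))
                b[k1][k2][k3] = scale * total
    return b


def max_abs_diff(x, y):
    """Largest element-wise difference between two 3D nested lists."""
    return max(abs(x[i][j][k] - y[i][j][k])
               for i in range(len(x))
               for j in range(len(x[0]))
               for k in range(len(x[0][0])))


n1, n2, n3 = 2, 3, 4
a = [[[complex(j1 + 2 * j2, j3 - j1) for j3 in range(n3)]
      for j2 in range(n2)] for j1 in range(n1)]

forward = dft3(a, +1, 1.0)
restored = dft3(forward, -1, 1.0 / (n1 * n2 * n3))
print(max_abs_diff(a, restored) < 1e-9)
```

The second call uses the opposite sign and the scale 1/(n1 . n2 . n3), so restored agrees with the original array to within rounding error.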
If any of n1, n2 or n3 is prime, or otherwise not factorizable into powers of 2, 3 and 5, computation time
can be reduced significantly by initializing isys, a vector of length 4, as described in the
following text.
The first element of isys indicates the dimension of the problem, i.e., isys(1) = 3. The next three elements of
isys indicate if the lengths n1, n2 and n3 are factorizable into powers of 2, 3 and 5. isys(2) is set to 0 if n1
is factorizable into powers of 2, 3 and 5 and is set to 1 otherwise. Similarly isys(3) and isys(4) are set to
zero if n2 and n3 are factorizable into powers of 2, 3 and 5 and set to 1 if they are not.
For example if n1 = 256, n2 = 240 and n3 = 254, then the best computational time is obtained by setting
isys(1) = 3 (dimension of the problem)
isys(2) = 0
isys(3) = 0
isys(4) = 1
If the numbers n1, n2 and n3 are not known ahead of time, then isys(2), isys(3) and isys(4) could be
initialized to 0 or 1; if an inappropriate choice is made, the routine would compute the correct result,
although slowly (if any of n1, n2 or n3 were not factorizable into powers of 2, 3 and 5). If initialized to 1,
more workspace is needed; see the description of table which follows.
The storage requirements for the vector table depend on the values of the isys vector.
The PCCFFT3D routine accepts the following arguments:
isign Integer. (input)
Specifies whether to initialize the table array or to do the forward or inverse transform as
follows:
If isign = 0, the routine initializes the table array and returns. In this case, the only arguments
used or checked are isign, n1, n2, n3, and table.
If isign = +1 or – 1, the value of isign is the sign of the exponent used in the FFT formula.
n1 Integer. (input)
Transform size in the X dimension.
n2 Integer. (input)
Transform size in the Y dimension.
n3 Integer. (input)
Transform size in the Z dimension.
scale Real. (input)
Scale factor. Each element of the output array is multiplied by scale factor after taking the
Fourier transform.

A Private complex array of the following dimension: (0:lldxA– 1, 0:lldyA– 1, 0:ICEIL(nz,npz)– 1).
(input)
Input array of values to be transformed.
lldxA and lldyA are the local leading dimensions along the X and Y dimensions, and are
initialized using DESCINIT3D(3S). A must be declared in a COMMON block.
iA With jA and kA, the global address of the first element of the global input matrix.
jA With iA and kA, the global address of the first element of the global input matrix.
kA With iA and jA, the global address of the first element of the global input matrix.
descA Integer vector of dimension 12. (input)
Contains description of the distribution of the matrix A across a 3D processor grid.
B Private complex array of the following dimension: (0:lldxB– 1, 0:lldyB– 1, 0:ICEIL(nz,npz)– 1).
(output)
Output array of transformed values.
lldxB and lldyB are the local leading dimensions along the X and Y dimensions and are
initialized using DESCINIT3D(3S).
The output array B may be the same as the input array A in which case the input array A is
overwritten with the transformed values. B must be declared in a COMMON block.
iB With jB and kB, the global address of the first element of the global matrix where output will be
written.
jB With iB and kB, the global address of the first element of the global matrix where output will be
written.
kB With iB and jB, the global address of the first element of the global matrix where output will be
written.
descB Integer vector of dimension 12. (input)
Contains description of the distribution of the matrix B across a 3D processor grid.
If the input array and the output array are the same, then the same descriptors must be used.
table Private real vector of length 2(n1 + n2 + n3) if isys(2), isys(3) and isys(4) are all equal to zero.
Private real vector of length 12(n1 + n2 + n3) if any of isys(2), isys(3) or isys(4) is equal to 1.
(input or output)
Table of trigonometric function values.
If isign = 0, the routine initializes table (table is output only).
If isign = +1 or -1, the values in the table are assumed to be initialized already by a prior call
with isign = 0 (table is input only).
work Private complex vector of length n1r . ICEIL(n2r,npy) . ICEIL(n3r,npz). (workspace)
n1r, n2r and n3r are the values of n1, n2, and n3 rounded up to the nearest power of 2
greater than or equal to them. work must be declared in a COMMON block.

isys Private integer vector of length 4. (input)
isys(1) indicates the dimension of the problem which is 3. isys(1) should be set to 3.
isys(2), isys(3) and isys(4) should be set to 0 or 1, depending on whether n1, n2, or n3 is
factorizable or not into powers of 2, 3 and 5 correspondingly.
info Integer. (output)
info is set to 0 if all the arguments passed to the routine are legal. If any argument has an
illegal value, the routine exits after setting info to a negative number. – info indicates the
position of the illegal argument.

NOTES
The scale factor scale can take on values of 1.0 or 1.0/(n1 . n2 . n3) depending on whether the forward or
inverse FFT is being computed.
Algorithm
The routine uses a very efficient single-processor FFT routine, CCFFT, to do the FFT of each column (X dimension)
on the processors that own the submatrix. It then transposes the submatrix along the X-Y plane, using
intermediate workspace that it allocates for the purpose, and again does the FFT along the columns (FFTs of
the Y dimension). The submatrix is again transposed along the X-Y plane to restore the original
distribution. Now the submatrix is transposed along the X-Z planes and the FFTs along the Z dimension are
computed. Finally another transpose along the X-Z plane restores the original distribution.
If any of isys(2), isys(3) or isys(4) is initialized to 1, then a fast (O(n log(n))) algorithm based on the
chirp-z transform is used for the one dimensional FFT in the corresponding direction. In this case, the
vector table must be real of length 12(n1+n2+n3).
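The axis-by-axis approach described in this section relies on the fact that one-dimensional transforms applied along X, then Y, then Z give the same result as the triple sum. The following single-process Python sketch illustrates that equivalence; it ignores the data distribution and the transposes, uses a plain O(n^2) one-dimensional transform in place of CCFFT, and all function names are hypothetical.

```python
import cmath


def dft1(x, isign):
    """Unscaled one-dimensional transform of the sequence x."""
    n = len(x)
    return [sum(x[j] * cmath.exp(isign * 2j * cmath.pi * j * k / n)
                for j in range(n)) for k in range(n)]


def dft3_by_axis(a, isign, scale):
    """Transform along X, then Y, then Z; apply scale once at the end."""
    n1, n2, n3 = len(a), len(a[0]), len(a[0][0])
    b = [[list(row) for row in plane] for plane in a]
    for j2 in range(n2):                      # X direction
        for j3 in range(n3):
            col = dft1([b[j1][j2][j3] for j1 in range(n1)], isign)
            for k1 in range(n1):
                b[k1][j2][j3] = col[k1]
    for k1 in range(n1):                      # Y direction
        for j3 in range(n3):
            col = dft1([b[k1][j2][j3] for j2 in range(n2)], isign)
            for k2 in range(n2):
                b[k1][k2][j3] = col[k2]
    for k1 in range(n1):                      # Z direction
        for k2 in range(n2):
            b[k1][k2] = [scale * v for v in dft1(b[k1][k2], isign)]
    return b


def dft3_direct(a, isign, scale):
    """Direct evaluation of the triple sum, used here only as a reference."""
    n1, n2, n3 = len(a), len(a[0]), len(a[0][0])
    b = [[[0j] * n3 for _ in range(n2)] for _ in range(n1)]
    for k1 in range(n1):
        for k2 in range(n2):
            for k3 in range(n3):
                b[k1][k2][k3] = scale * sum(
                    a[j1][j2][j3] * cmath.exp(isign * 2j * cmath.pi * (
                        j1 * k1 / n1 + j2 * k2 / n2 + j3 * k3 / n3))
                    for j1 in range(n1)
                    for j2 in range(n2)
                    for j3 in range(n3))
    return b


n1, n2, n3 = 3, 2, 4
a = [[[complex(j1 - j3, j2 + 0.5 * j3) for j3 in range(n3)]
      for j2 in range(n2)] for j1 in range(n1)]
by_axis = dft3_by_axis(a, -1, 1.0)
direct = dft3_direct(a, -1, 1.0)
diff = max(abs(by_axis[i][j][k] - direct[i][j][k])
           for i in range(n1) for j in range(n2) for k in range(n3))
print(diff < 1e-9)
```

In the distributed routine the transposes serve to bring each direction into local memory before the one-dimensional transforms are applied; the arithmetic per direction is the same as in this sketch.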
Workspace
The routine dynamically allocates two real work arrays:
Size of one work array:
8 . MAX(nx, ny, nz)
Size of the other work array:
2 . MAX(nx . ICEIL(ny,npy), ny . ICEIL(nx,npy), nx . ICEIL(nz,npz), nz . ICEIL(nx,npz))
The workspaces are freed on exiting.

EXAMPLES
Example code for PCCFFT3D on a 16-processor partition:

      complex A(256,70,50)
      complex B(256,70,50)
      complex work(1048576)
      common /abw/ A, B, work
      real table(8192)

      integer ictxt, descA(12), descB(12), isign, isys(4)
      integer nx, ny, nz, npy, npz, iceil, info
      integer n1, n2, n3

      real scale

      nx = 240
      ny = 181
      nz = 145

      n1 = nx
      n2 = ny
      n3 = nz

      npy = 4
      npz = n$pes / 4

      call gridinit3d(ictxt, 1, npy, npz)

      call descinit3d(descA, nx, ny, nz, nx, iceil(ny,npy),
     &                iceil(nz,npz), 0, 0, 0, ictxt, 256, 70, info)

      if ( info .ne. 0 ) then
         call exit(0)
      end if

      call descinit3d(descB, nx, ny, nz, nx, iceil(ny,npy),
     &                iceil(nz,npz), 0, 0, 0, ictxt, 256, 70, info)

      if ( info .ne. 0 ) then
         call exit(0)
      end if

      isign = -1
      scale = 1.0
      isys(1) = 3
      isys(2) = 0
      isys(3) = 1
      isys(4) = 1
*
*     Initializing the trig tables
*
      call pccfft3d(0, n1, n2, n3, scale, A, 1, 1, 1,
     &              descA, B, 1, 1, 1, descB, table, work, isys, info)

      if ( info .ne. 0 ) then
         call exit(0)
      end if
*
*     FFT
*
      call pccfft3d(isign, n1, n2, n3, scale, A, 1, 1, 1,
     &              descA, B, 1, 1, 1, descB, table, work, isys, info)

      if ( info .ne. 0 ) then
         call exit(0)
      end if

      call exit(0)
      end

SEE ALSO
DESCINIT3D(3S), GRIDINFO3D(3S), GRIDINIT3D(3S), PCOORD3D(3S), PNUM3D(3S)

PSCFFT2D ( 3S ) PSCFFT2D ( 3S )

NAME
PSCFFT2D, PCSFFT2D – Applies a two-dimensional (2D) real-to-complex or complex-to-real Fast Fourier
Transform (FFT) to a matrix distributed across a set of processors

SYNOPSIS
CALL PSCFFT2D (isign, n1, n2, scale, A, iA, jA, descA, B, iB, jB, descB, table, work,
isys, info)
CALL PCSFFT2D (isign, n1, n2, scale, A, iA, jA, descA, B, iB, jB, descB, table, work,
isys, info)

IMPLEMENTATION
UNICOS/mk systems

DESCRIPTION
PSCFFT2D computes the 2D real-to-complex Fast Fourier Transform (FFT) of the distributed real matrix A,
and it stores the results in the distributed complex matrix B.
PCSFFT2D computes the two-dimensional complex-to-real Fast Fourier Transform (FFT) of the distributed
complex matrix A, and it stores the results in the distributed real matrix B.
Description of the Distributed Data
PSCFFT2D considers the processors to be partitioned into a one-dimensional (1D) linear array of processors.
The 2D input matrix A is then distributed across this 1D grid of processors as discussed in the following
text.
Consider a 2D matrix A of size nr-by-nc, where nr is the number of rows and nc is the number of columns
of matrix A.
Let the processors (N$PES in number) be partitioned into a 1D grid of size 1-by-npc, where npc = N$PES is
the number of processors assigned to the column dimension of A. Let the number of processors assigned to
the row dimension be 1. To partition processors into this grid, call the BLACS_GRIDINIT routine as
shown:
CALL BLACS_GRIDINIT (ictxt, 'C', 1, npc)

The input matrix, A, and the output matrix, B, are distributed across this 1D linear array of processors by
using the block (as defined in FORTRAN D and HPF) distribution along the columns. The distribution
along the rows is degenerate. The descriptors descA and descB provide information on the distribution of
the matrices A and B across the processor grid. The descriptors descA and descB are initialized using the
DESCINIT(3S) routine. The DESCINIT(3S) routine would have to be called after the call to
BLACS_GRIDINIT and would look like this:
CALL DESCINIT (descA, nr, nc, nr, ICEIL(nc, npc), pesr, pesc, ictxt, llda, info)

Due to the symmetry in the FFT of A, the only computed values stored in the output matrix B are (0:(nr/2),
0:nc-1). Therefore, the call to DESCINIT(3S) for the output matrix B would look like the following:
nrb = nr/2 + 1
CALL DESCINIT (descB, nrb, nc, nrb, ICEIL(nc, npc), pesr, pesc, ictxt, lldb, info)
Here, llda and lldb are the local leading dimensions of the private data matrices A and B that store the local
portions of the global input and output 2D matrices participating in the FFT computation.
Given that matrix A is distributed in a block manner across the processor grid along the columns, the block
size along the columns would be the following:
ncpp = ICEIL(nc, npc)
Further assume that the user wants the 2D FFT to be performed on a submatrix starting at global address
(iA, jA). Let this submatrix be of size n1-by-n2. Then the arguments iA and jA represent the global address
of the first element of the submatrix and n1 and n2 represent the size of the submatrix over which the 2D
FFT is to be performed.
Similarly, the iB and jB arguments represent the first element of the submatrix to which the output is to be
written.
Restrictions
In the current release, it is required that the matrices A and B be distributed conformably. This means that
apart from the difference in the dimensions of A and B in the number of rows, the distribution along the
columns must be identical.
The flexibility of performing the FFT over any submatrix of the global matrix is not available in the current
release. Therefore, users must initialize iA, jA, iB, and jB with 1 and n1 with nr and n2 with nc.
All processors must call this routine. In future releases, only those processors that own the matrix over
which the FFT is to be performed will participate in the computation. All other processors will exit the
routine immediately.
2D FFT Theory
In FFT applications, it is customary to use zero-based subscripts; the formulas are simpler that way.
Suppose that the arrays are dimensioned as follows:
REAL A(0:n1-1, 0:n2-1)
COMPLEX B(0:n1/2, 0:n2-1)

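PSCFFT2D computes the formula:

\[
B_{k_1,k_2} = \mathit{scale} \cdot \sum_{j_1=0}^{\mathit{n1}-1} \sum_{j_2=0}^{\mathit{n2}-1}
A_{j_1,j_2} \cdot \omega_1^{j_1 k_1} \cdot \omega_2^{j_2 k_2}
\]
for
\[
k_1 = 0, \ldots, \mathit{n1}-1, \qquad k_2 = 0, \ldots, \mathit{n2}-1
\]
where
\[
\omega_1 = e^{\mathit{isign} \cdot 2 \pi i / \mathit{n1}}, \qquad
\omega_2 = e^{\mathit{isign} \cdot 2 \pi i / \mathit{n2}}
\]
\[
i = +\sqrt{-1}, \qquad \pi = 3.14159\ldots, \qquad \mathit{isign} = \pm 1
\]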
If in a certain application it is known that the FFT of the complex input matrix is real, you can save
computation time by using PCSFFT2D instead of PCCFFT2D. This is often the case because the
complex input matrix is itself the transform of a real matrix. In this case, PCSFFT2D does about half the
computational work; it computes the same formula as PSCFFT2D with the roles of the input and output
matrices reversed.
Different authors use different conventions for which of the transforms, isign = +1 or isign = – 1, is the
forward or inverse transform, and what the scale factor should be in either case. To compute any of the
various possible definitions, however, choose the appropriate values for isign and scale.
If you call PSCFFT2D with any particular values of isign and scale the mathematical inverse function is
computed by calling PCSFFT2D with – isign and 1/(n1 . n2 . scale). In particular, if you use isign = +1 and
scale = 1.0 in PSCFFT2D for the forward FFT, you can compute the inverse FFT by using PCSFFT2D with
isign = – 1 and scale = 1.0/(n1 . n2).
PSCFFT2D is very similar in function to PCCFFT2D, but it takes the real-to-complex transform in the first
dimension, followed by the complex-to-complex transform in the second dimension.
PCSFFT2D does the reverse. It takes the complex-to-complex FFT in the second dimension, followed by the
complex-to-real FFT in the first dimension.
See the SCFFT(3S) man page for more information about real-to-complex and complex-to-real FFTs. The
2D analog of the conjugate formula is as follows:
B(k1, k2) = conjg(B(n1 – k1, n2 – k2))
for
n1/2 < k1 ≤ n1– 1
Therefore, you have to compute only slightly more than half of the output values, namely:
B(k1, k2)
for
0 ≤ k1 ≤ n1 / 2
0 ≤ k2 ≤ n2– 1
Therefore, the only values of B that are computed are B(0:n1/2, 0:n2-1).
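The conjugate formula can be illustrated on a small real matrix by computing only rows 0 through n1/2 of the transform, rebuilding the remaining rows, and then inverting. The following single-process Python sketch is an illustration rather than the library algorithm; dft2 is a hypothetical helper that evaluates the double sum directly, and the indices n1 - k1 and n2 - k2 are taken modulo n1 and n2, which is an assumed reading of the formula for the case k2 = 0.

```python
import cmath


def dft2(a, isign, scale, k1_values):
    """Evaluate the 2D formula for the requested k1 rows and every k2."""
    n1, n2 = len(a), len(a[0])
    rows = []
    for k1 in k1_values:
        row = []
        for k2 in range(n2):
            total = 0j
            for j1 in range(n1):
                for j2 in range(n2):
                    phase = isign * 2 * cmath.pi * (j1 * k1 / n1 + j2 * k2 / n2)
                    total += a[j1][j2] * cmath.exp(1j * phase)
            row.append(scale * total)
        rows.append(row)
    return rows


n1, n2 = 4, 3
a = [[float(3 * j1 - j2 * j2 + 1) for j2 in range(n2)] for j1 in range(n1)]

# Real-to-complex step: keep only rows 0..n1/2 of the spectrum.
half = dft2(a, +1, 1.0, range(n1 // 2 + 1))

# Rebuild the omitted rows from the conjugate formula.
full = [None] * n1
for k1 in range(n1 // 2 + 1):
    full[k1] = half[k1]
for k1 in range(n1 // 2 + 1, n1):
    full[k1] = [half[n1 - k1][(n2 - k2) % n2].conjugate() for k2 in range(n2)]

# Complex-to-real step: opposite sign and scale 1/(n1*n2) recover the input.
restored = dft2(full, -1, 1.0 / (n1 * n2), range(n1))
error = max(abs(restored[j1][j2] - a[j1][j2])
            for j1 in range(n1) for j2 in range(n2))
print(error < 1e-9)
```

Only (n1/2 + 1) . n2 complex values are stored in half, which is the storage saving that the reduced row dimension of B reflects.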

If either n1 or n2 is prime, or otherwise not factorizable into powers of 2, 3 and 5, computation time
can be reduced significantly by initializing isys, a vector of length 3, as described in the
following text.
If both n1 and n2 are factorizable into powers of 2, 3 and 5, for example, n1 = 30 and n2 = 120 then
isys(1) = 2
isys(2) = 0
isys(3) = 0
If any one dimension is not factorizable into powers of 2, 3 and 5, then the following initializations of isys
yield the fastest times:
n1 not factorizable but n2 factorizable
isys(1) = 2
isys(2) = 1
isys(3) = 0
n1 factorizable but n2 not factorizable
isys(1) = 2
isys(2) = 0
isys(3) = 1
both n1 and n2 not factorizable
isys(1) = 2
isys(2) = 1
isys(3) = 1
Here isys(1) indicates the dimension of the matrix over which the FFT is being performed.
If the numbers n1 and n2 are not known ahead of time, then isys(2) and isys(3) could be initialized to 0 or 1;
if an inappropriate choice is made, the routine would compute the correct result for n1 and n2, although
slowly. If initialized to 1, more workspace is needed; see the description of table which follows.
The storage requirements for the vector table depend on the values of the isys vector.
These routines accept the following arguments (all scalar values are private data):
isign Integer. (input)
Specifies whether to initialize the table array or to do the forward or inverse transform as
follows:
If isign = 0, the routine initializes the table array and returns. In this case, the only arguments
used or checked are isign, n1, n2, and table.
If isign = +1 or – 1, the value of isign is the sign of the exponent used in the FFT formula.
n1 Integer. (input)
Number of rows in the submatrix to be transformed.
n2 Integer. (input)
Number of columns in the submatrix to be transformed.

scale Real. (input)
Scale factor. Each element of the output array is multiplied by scale factor after taking the
Fourier transform.
A Private real array of dimension (0:lldA– 1, 0:ICEIL(nc,npc)– 1). (input)
Input array of values to be transformed. lldA is the local leading dimension and is initialized
using DESCINIT(3S). A must be declared in a COMMON block.
iA With jA, the global address of the first element of the global input matrix.
jA With iA, the global address of the first element of the global input matrix.
descA Integer vector of dimension 9. (input)
Contains description of the distribution of the matrix A across a 1D processor grid.
B Private complex array of dimension (0:lldB– 1, 0:ICEIL(nc,npc)– 1). (output)
Output array of transformed values. lldB is the local leading dimension and is initialized using
DESCINIT(3S). B must be declared in a COMMON block.
iB With jB, the global address of the first element of the global matrix where output will be written.
jB With iB, the global address of the first element of the global matrix where output will be written.
descB Integer vector of dimension 9. (input)
Contains description of the distribution of the matrix B across a 1D processor grid.
table Private real vector of length 2(n1 + n2) if both isys(2) and isys(3) are equal to zero. Private real
vector of length 12(n1 + n2) if either isys(2) or isys(3) is equal to 1. (input or output)
Table of trigonometric function values. If isign = 0, the routine initializes table (table is output
only).
If isign = +1 or – 1, the values in table are assumed to be initialized already by a prior call with
isign = 0 (table is input only).
work Private complex vector of length 2 . n1r . ICEIL(n2r,npc). (workspace)
n1r and n2r are the values of n1 and n2 rounded up to the nearest power of 2 greater
than or equal to them. work must be declared in a COMMON block.
isys Private integer vector of length 3. (input)
isys(1) indicates the dimension of the problem which is 2. isys(1) should be set to 2.
isys(2) indicates if n1 is prime or not factorizable into powers of 2, 3 and 5. Should be set to 1
if the number is not factorizable into powers of 2, 3 and 5. Should be set to 0 if it is
factorizable.
isys(3) should be set to 0 or 1, depending on whether n2 is factorizable or not into powers of 2,
3 and 5.
info Integer. (output)

info is set to 0 if all the arguments passed to the routine are legal. If any argument has an
illegal value, the routine exits after setting info to a negative number. – info indicates the
position of the illegal argument.
The argument list for PCSFFT2D is identical to that of PSCFFT2D except that the input array for
PCSFFT2D is complex and the output array is real. If PCSFFT2D is used to compute
the inverse FFT of the matrix B (the FFT of the matrix A), the arguments pertaining to A and B in
PSCFFT2D are swapped in the call to PCSFFT2D.

NOTES
The scale factor scale can take on values of 1.0 or 1.0/(n1 . n2), depending on whether the forward or
inverse FFT is being computed.
The format of the vector that stores the trig tables (table) is the same for both routines. It can be initialized
by either routine.
Algorithm
PSCFFT2D uses a very efficient single-processor FFT routine, SCFFT, to do the real-to-complex FFT of each
column on the processors that own the submatrix. It then transposes the submatrix by using intermediate
workspace that it allocates for the purpose, and it does the complex-to-complex FFT along the columns of the
transposed submatrix (FFTs of the rows).
PCSFFT2D first transposes the matrix and performs a very efficient single processor FFT routine, CCFFT,
on the columns of the transposed matrix (that is, the rows of the original input matrix). This intermediate
matrix is then transposed again after which a complex-to-real FFT, CSFFT, is applied to the columns.
If either isys(2) or isys(3) or both are initialized to 1, then a fast (O(n log(n))) algorithm based on the chirp-z
transform is used for the 1D FFT in the corresponding direction. In this case, the vector table must be real
of length 12(n1+n2).
Workspace
The routine dynamically allocates two real work arrays:
Size of one work array:
8 . MAX(nr, nc)
Size of the other work array:
2 . MAX(nr . ICEIL(nc,npc), nc . ICEIL(nr,npc))
The workspaces are freed on exiting.

EXAMPLES
Example code for PSCFFT2D and PCSFFT2D on a 16-processor partition:

      real A(256,16)
      complex B(129,16)
      real C(256,16)
      complex work(8192)
      common /abcw/ A, B, C, work
      complex table(6000)

      integer ictxt, descA(9), descB(9), descC(9), isign, isys(3)
      integer nr, nc, iceil, info, np, nrB
      integer n1, n2

      real scale

      nr = 240
      nc = 181
      nrB = (nr/2) + 1

      n1 = nr
      n2 = nc

      np = n$pes

      call blacs_gridinit(ictxt, 'C', 1, np)

      call descinit(descA, nr, nc, nr, iceil(nc,np),
     &              0, 0, ictxt, 256, info)

      if ( info .ne. 0 ) then
         call exit(0)
      end if

      call descinit(descC, nr, nc, nr, iceil(nc,np),
     &              0, 0, ictxt, 256, info)

      if ( info .ne. 0 ) then
         call exit(0)
      end if

      call descinit(descB, nrB, nc, nrB, iceil(nc,np),
     &              0, 0, ictxt, 129, info)

      if ( info .ne. 0 ) then
         call exit(0)
      end if

      isign = -1
      scale = 1.0
      isys(1) = 2
      isys(2) = 0
      isys(3) = 1
*
*     Initializing the trig tables
*
      call pscfft2d(0, n1, n2, scale, A, 1, 1, descA,
     &              B, 1, 1, descB, table, work, isys, info)

      if ( info .ne. 0 ) then
         call exit(0)
      end if
*
*     FFT
*
      call pscfft2d(isign, n1, n2, scale, A, 1, 1, descA,
     &              B, 1, 1, descB, table, work, isys, info)

      if ( info .ne. 0 ) then
         call exit(0)
      end if
*
*     Inverse FFT
*
      isign = 1
      scale = 1.0 / float(n1*n2)

      call pcsfft2d(isign, n1, n2, scale, B, 1, 1, descB,
     &              C, 1, 1, descC, table, work, isys, info)

      if ( info .ne. 0 ) then
         call exit(0)
      end if

      call exit(0)
      end

SEE ALSO
BLACS_GRIDINFO(3S), BLACS_GRIDINIT(3S), BLACS_PCOORD(3S), BLACS_PNUM(3S),
DESCINIT(3S)

PSCFFT3D ( 3S ) PSCFFT3D ( 3S )

NAME
PSCFFT3D, PCSFFT3D – Applies a three-dimensional (3D) real-to-complex or complex-to-real Fast Fourier
Transform (FFT) to a matrix distributed across a set of processors

SYNOPSIS
CALL PSCFFT3D (isign, n1, n2, n3, scale, A, iA, jA, kA, descA, B, iB, jB, kB, descB,
table, work, isys, info)
CALL PCSFFT3D (isign, n1, n2, n3, scale, A, iA, jA, kA, descA, B, iB, jB, kB, descB,
table, work, isys, info)

IMPLEMENTATION
UNICOS/mk systems

DESCRIPTION
PSCFFT3D computes the 3D real-to-complex Fast Fourier Transform (FFT) of the distributed real matrix A,
and it stores the results in the distributed complex matrix B.
PCSFFT3D computes the 3D complex-to-real Fast Fourier Transform (FFT) of the distributed complex
matrix B, and it stores the results in the distributed real matrix A.
Description of the Distributed Data
This routine considers the processors to be partitioned into a two-dimensional (2D) grid. The 3D input
matrix A is then distributed across this 2D grid of processors as discussed in the following text.
Consider a 3D matrix A of size nx-by-ny-by-nz. nx, ny, and nz are the sizes of the matrix A along the X, Y,
and Z dimensions, respectively.
Let the processors (N$PES in number) be partitioned into a 2D grid of size npy-by-npz where npy is the
number of processors assigned to the Y dimension and npz is the number of processors assigned to the Z
dimension. Let the number of processors assigned to the X dimension be 1. To partition the processors into
this grid call the GRIDINIT3D(3S) routine as follows:
      CALL GRIDINIT3D (ICTXT, 1, npy, npz)

The input matrix, A, and the output matrix, B, are distributed across this 2D processor grid by using the
block (as defined in FORTRAN D and HPF) distribution along the Y and Z dimensions. The distribution
along the X dimension is degenerate. The descriptors descA and descB provide information on the
distribution of the matrices A and B across the processor grid. The descriptors descA and descB are
initialized using the DESCINIT3D(3S) routine. The DESCINIT3D(3S) routine would have to be called
after the call to GRIDINIT3D(3S) and would look like this:
      CALL DESCINIT3D (descA, nx, ny, nz, nx, ICEIL(ny, npy), ICEIL(nz, npz), pesx, pesy,
      pesz, ictxt, lldxA, lldyA, info)
Due to the symmetry in the FFT of A, the only computed values stored in the output matrix B are (0:(nx/2),
0:ny-1, 0:nz-1). Therefore, the call to DESCINIT3D(3S) for the output matrix B would look like the
following:
nxB = (nx/2) + 1
CALL DESCINIT3D (descB, nxB, ny, nz, nxB, ICEIL(ny, npy), ICEIL(nz, npz), pesx, pesy,
pesz, ictxt, lldxB, lldyB, info)
Here, lldxA and lldyA (and likewise lldxB and lldyB) are the local leading dimensions, along the X and Y
dimensions, of the private data arrays A and B that store the local portions of the global input and output
3D matrices participating in the FFT computation.
Given that matrix A is distributed in a block manner across the processor grid along the Y and Z dimensions,
the block size along these two dimensions would be as follows:
nypp = ICEIL(ny, npy) and nzpp = ICEIL (nz, npz)
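
The block sizes above can be checked with a short sketch. It is written in Python purely for illustration and does not call the library. The helper names iceil and block_range are hypothetical, and the ownership rule (processor p holds one contiguous block, with the last processor holding any short remainder) is an assumption based on the usual HPF-style block distribution, not something this page specifies.

```python
def iceil(a, b):
    """Integer ceiling of a / b, mirroring the ICEIL helper named in the text."""
    return -(-a // b)


def block_range(n, nprocs, p):
    """Half-open range [lo, hi) of 0-based global indices assumed to be owned
    by processor p (0-based) when n elements are split into equal blocks."""
    blk = iceil(n, nprocs)
    lo = min(p * blk, n)
    hi = min(lo + blk, n)
    return lo, hi


# Sizes taken from the example later on this page: ny = 181, nz = 145, 4 x 4 grid.
ny, nz, npy, npz = 181, 145, 4, 4
nypp, nzpp = iceil(ny, npy), iceil(nz, npz)
print("nypp =", nypp, "nzpp =", nzpp)   # nypp = 46 nzpp = 37

for p in range(npy):
    print("Y block on processor", p, "->", block_range(ny, npy, p))
```

With these sizes the last processor along Y holds 43 rows rather than 46, which is why the local arrays in the example are declared slightly larger than the exact block size.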
Further assume that the user wants the 3D FFT to be performed on a submatrix starting at global address
(iA, jA, kA). Let this submatrix be of size n1-by-n2-by-n3. Then the iA, jA, and kA arguments represent the
global address of the first element of the submatrix and n1, n2, and n3 represent the size of the submatrix
over which the 3D FFT will be performed.
Similarly, the iB, jB, and kB arguments represent the first element of the submatrix to which the output will
be written.
Restrictions
In the current release, the matrices A and B must be distributed identically. This means that all of the
arguments provided to DESCINIT3D(3S) to initialize descA and descB must be equal, except the local
leading dimensions, and the size of the matrix in the X dimension.
The flexibility of performing the FFT over any submatrix of the global matrix is not available in the current
release. Therefore, you must initialize iA, jA, kA, iB, jB, and kB with 1 and n1 with nx, n2 with ny, and n3
with nz.
All processors must call this routine. In future releases, only those processors that own the matrix over
which the FFT is to be performed will participate in the computation; all other processors will exit the
routine immediately.
3D FFT Theory
In FFT applications, it is customary to use zero-based subscripts; the formulas are simpler that way.
Suppose that the arrays are dimensioned as follows:
      REAL A(0:n1-1, 0:n2-1, 0:n3-1)
      COMPLEX B(0:n1/2, 0:n2-1, 0:n3-1)
PSCFFT3D computes the formula:
$$B_{k_1,k_2,k_3} = \mathit{scale} \cdot \sum_{j_1=0}^{n_1-1} \sum_{j_2=0}^{n_2-1} \sum_{j_3=0}^{n_3-1} A_{j_1,j_2,j_3} \, \omega_1^{j_1 k_1} \, \omega_2^{j_2 k_2} \, \omega_3^{j_3 k_3}$$
for
$$k_1 = 0, \ldots, n_1/2, \qquad k_2 = 0, \ldots, n_2-1, \qquad k_3 = 0, \ldots, n_3-1$$
where
$$\omega_1 = e^{\mathit{isign} \cdot 2\pi i / n_1}, \qquad \omega_2 = e^{\mathit{isign} \cdot 2\pi i / n_2}, \qquad \omega_3 = e^{\mathit{isign} \cdot 2\pi i / n_3}$$
$$i = +\sqrt{-1}, \qquad \pi = 3.14159\ldots, \qquad \mathit{isign} = \pm 1$$
If in a certain application it is known that the FFT of the complex input matrix is real, then instead of using
PCCFFT3D, you can save computation time by using PCSFFT3D. Often, this is the case because the
complex input matrix was the transform of a real matrix. In this case, you can save about half of the
computational work and PCSFFT3D computes an identical formula with the input and output matrices
reversed.
Different authors use different conventions for which of the transforms, isign = +1 or isign = – 1, is the
forward or inverse transform, and what the scale factor should be in either case. You can make this routine
compute any of the various possible definitions, however, by choosing the appropriate values for isign and
scale.
If you call PSCFFT3D with any particular values of isign and scale, the mathematical inverse function is
computed by taking the FFT with – isign and 1 /(n1 . n2 . n3 . scale). In particular, if you use isign = +1
and scale = 1.0 in PSCFFT3D for the forward FFT, you can compute the inverse FFT by using PCSFFT3D
with isign = – 1 and scale = 1/(n1 . n2 . n3).
PSCFFT3D is very similar in function to PCCFFT3D, but it takes the real-to-complex transform in the first
dimension, followed by the complex-to-complex transform in the second and third dimension. PCSFFT3D
does the reverse. It takes the complex-to-complex FFT in the third and second dimension, followed by the
complex-to-real FFT in the first dimension. See the SCFFT(3S) man page for more information about
real-to-complex and complex-to-real FFTs. The three-dimensional analog of the conjugate formula is as
follows:
B(k1, k2, k3) = conjg(B(n1 – k1, n2 – k2, n3 – k3))
for
n1 / 2 < k1 ≤ n1 – 1
0 ≤ k2 ≤ n2 – 1
0 ≤ k3 ≤ n3 – 1
where the notation conjg(z) represents the complex conjugate of z.
Therefore, you have to compute only slightly more than half of the output values, namely:
B(k1, k2, k3)
for
0 ≤ k1 ≤ n1/2
0 ≤ k2 ≤ n2– 1
0 ≤ k3 ≤ n3– 1

Therefore, the only values of B that are computed are B(0: n1 / 2, 0:n2– 1,0:n3– 1).
If any of n1, n2, or n3 is prime or otherwise not factorizable into powers of 2, 3, and 5, significant
improvements in computational time can be obtained by initializing isys, a vector of length 4, as
follows.
The first element of isys indicates the dimension of the problem, i.e., isys(1) = 3. The next three elements of
isys indicate if the lengths n1, n2 and n3 are factorizable into powers of 2, 3 and 5. isys(2) is set to 0 if n1
is factorizable into powers of 2, 3 and 5 and is set to 1 otherwise. Similarly isys(3) and isys(4) are set to
zero if n2 and n3 are factorizable into powers of 2, 3 and 5 and set to 1 if they are not.
For example if n1 = 256, n2 = 240 and n3 = 254, then the best computational time is obtained by setting
isys(1) = 3 (dimension of the problem)
isys(2) = 0
isys(3) = 0
isys(4) = 1
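
The rule above can be sketched as a small helper, shown here in Python for illustration only. The function names is_235_smooth and make_isys are hypothetical and are not part of the library. The returned list is 0-based, so its first entry corresponds to isys(1).

```python
def is_235_smooth(n):
    """Return True if n factors completely into powers of 2, 3, and 5."""
    if n < 1:
        raise ValueError("transform length must be positive")
    for p in (2, 3, 5):
        while n % p == 0:
            n //= p
    return n == 1


def make_isys(n1, n2, n3):
    """Build the four isys entries for a 3D transform: the dimension (3),
    then 0 for each factorizable length and 1 for each length that is not."""
    return [3] + [0 if is_235_smooth(n) else 1 for n in (n1, n2, n3)]


print(make_isys(256, 240, 254))   # [3, 0, 0, 1]
```

Here 254 = 2 x 127 contains the prime factor 127, so only the last flag is set.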
If the numbers n1, n2 and n3 are not known ahead of time, then isys(2), isys(3) and isys(4) could be
initialized to 0 or 1; if an inappropriate choice is made, the routine would compute the correct result,
although slowly. If initialized to 1, more workspace is needed; see the description of table which follows.
The storage requirements for the vector table depend on the values of the isys vector.
The PSCFFT3D routine accepts the following arguments:
isign Integer. (input)
Specifies whether to initialize the table array or to do the forward or inverse transform as
follows:
If isign = 0, the routine initializes the table array and returns. In this case, the only arguments
used or checked are isign, n1, n2, n3, and table.
If isign = +1 or – 1, the value of isign is the sign of the exponent used in the FFT formula.
n1 Integer. (input)
Transform size in the X dimension.
n2 Integer. (input)
Transform size in the Y dimension.
n3 Integer. (input)
Transform size in the Z dimension.
scale Real. (input)
Scale factor. Each element of the output array is multiplied by scale factor after taking the
Fourier transform.
A Private real array of dimension (0:lldxA– 1,0:lldyA– 1,0:ICEIL(nz,npz)– 1). (input)
Input array of values to be transformed. lldxA and lldyA are the local leading dimensions along
the X and Y dimensions, and are initialized using DESCINIT3D(3S). A must be declared in a
COMMON block.

iA With jA and kA, the global address of the first element of the global input matrix.
jA With iA and kA, the global address of the first element of the global input matrix.
kA With iA and jA, the global address of the first element of the global input matrix.
descA Integer vector of dimension 12. (input)
Contains description of the distribution of the matrix A across a 3D processor grid.
B Private complex array of dimension (0:lldxB– 1,0:lldyB– 1,0:ICEIL(nz,npz)– 1). (output)
Output array of transformed values. lldxB and lldyB are the local leading dimensions along the
X and Y dimensions and are initialized using DESCINIT3D(3S). B must be declared in a
COMMON block.
The output array B may be the same as the input array A in which case the input array A is
overwritten with the transformed values.
iB With jB and kB, the global address of the first element of the global matrix where output will be
written.
jB With iB and kB, the global address of the first element of the global matrix where output will be
written.
kB With iB and jB, the global address of the first element of the global matrix where output will be
written.
descB Integer vector of dimension 12. (input)
Contains description of the distribution of the matrix B across a 3D processor grid.
If the input array and the output array are the same, then the same descriptors must be used.
table Private real vector of length 2(n1 + n2 + n3) if isys(2), isys(3) and isys(4) = 0. Private real
vector of length 12(n1 + n2 + n3), if isys(2), isys(3) or isys(4) = 1. (input or output)
If isign = 0, the routine initializes table (table is output only). If isign = +1 or – 1, the values in
the table are assumed to be initialized already by a prior call with isign = 0 (table is input only).
work Private complex vector of length 2 . n1r . ICEIL(n2r,npy) . ICEIL(n3r,npz),
where n1r, n2r, and n3r are the values of n1, n2, and n3, each rounded up to the nearest power of 2
greater than or equal to it. work must be declared in a COMMON block.
isys Private integer vector of length 4. (input)
isys(1) indicates the dimension of the problem which is 3. isys(1) should be set to 3.
isys(2), isys(3) and isys(4) should be set to 0 or 1, depending on whether n1, n2, or n3 is
factorizable or not into powers of 2, 3 and 5 correspondingly.
info Integer. (output)
info is set to 0 if all the arguments passed to the routine are legal. If any argument has an
illegal value, the routine exits after setting info to a negative number. – info indicates the
position of the illegal argument.
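
The table and work lengths given in the argument descriptions can be worked out ahead of time. The following Python sketch is illustrative only. The helper names are hypothetical, and the work formula is read as 2 . n1r . ICEIL(n2r,npy) . ICEIL(n3r,npz), an interpretation that agrees with the work array declared in the example on this page.

```python
def iceil(a, b):
    """Integer ceiling of a / b."""
    return -(-a // b)


def next_pow2(n):
    """Smallest power of 2 that is greater than or equal to n."""
    p = 1
    while p < n:
        p *= 2
    return p


def table_len(n1, n2, n3, isys):
    """Length of the real table vector; isys is a 0-based list whose
    entries 1..3 are the factorization flags isys(2)..isys(4)."""
    factor = 12 if any(isys[1:4]) else 2
    return factor * (n1 + n2 + n3)


def work_len(n1, n2, n3, npy, npz):
    """Length of the complex work vector on each processor."""
    n1r, n2r, n3r = (next_pow2(n) for n in (n1, n2, n3))
    return 2 * n1r * iceil(n2r, npy) * iceil(n3r, npz)


# Sizes from the example on this page: 240 x 181 x 145 on a 4 x 4 grid.
print(table_len(240, 181, 145, [3, 0, 1, 1]))   # 6792
print(work_len(240, 181, 145, 4, 4))            # 2097152
```

Both results fit the declarations used in the example, which reserves 8192 real words for table and 2097152 complex words for work.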

NOTES
The scale factor scale can take on values of 1.0 or 1.0/(n1 . n2 . n3) depending on whether the forward or
inverse FFT is being computed.
The format of the vector that stores the trig tables (table) is the same for both routines. It can be initialized
by either routine.
Algorithm
The routine uses a very efficient single FFT routine, CCFFT, to do the FFT of each column (X dimension)
on the processors that own the submatrix. It then transposes the submatrix along the X-Y plane, using
intermediate workspace that it allocates for the purpose, and again does the FFT along the columns (FFTs of
the Y dimension). The submatrix is again transposed along the X-Y plane to restore the original
distribution. Now the submatrix is transposed along the X-Z planes and the FFTs along the Z dimension are
computed. Finally another transpose along the X-Z plane restores the original distribution.
If any of isys(2), isys(3), or isys(4) is initialized to 1, then a fast (O(n log(n))) algorithm based on the
chirp-z transform is used for the one-dimensional FFT in the corresponding direction. In this case, the
vector table must be real of length 12(n1+n2+n3).
Workspace
The routine dynamically allocates two real work arrays:
Size of one work array:
8 . MAX(nx, ny, nz)
Size of the other work array:
2 . MAX(nx . ICEIL(ny,npy), ny . ICEIL(nx,npy), nx . ICEIL(nz,npz), nz . ICEIL (nx,npz))

EXAMPLES
Example code for PSCFFT3D on a 16-processor system:
      real A(256,50,40)
      real C(256,50,40)
      complex B(129,50,40)
      complex work(2097152)
      common /abcw/ A, B, C, work
      real table(8192)

      integer ictxt, descA(12), descB(12), descC(12), isign, isys(4)
      integer nx, ny, nz, nxB, npy, npz, iceil, info
      integer n1, n2, n3

      real scale

      nx = 240
      ny = 181
      nz = 145
      nxB = (nx/2) + 1

      n1 = nx
      n2 = ny
      n3 = nz

      npy = 4
      npz = n$pes / 4

      call gridinit3d(ictxt, 1, npy, npz)

      call descinit3d(descA, nx, ny, nz, nx, iceil(ny,npy),
     &                iceil(nz,npz), 0, 0, 0, ictxt, 256, 50, info)

      if (info .ne. 0) then
         call exit(0)
      end if

      call descinit3d(descC, nx, ny, nz, nx, iceil(ny,npy),
     &                iceil(nz,npz), 0, 0, 0, ictxt, 256, 50, info)

      if (info .ne. 0) then
         call exit(0)
      end if

*     The output matrix B holds only nxB = nx/2 + 1 planes along X
      call descinit3d(descB, nxB, ny, nz, nxB, iceil(ny,npy),
     &                iceil(nz,npz), 0, 0, 0, ictxt, 129, 50, info)

      if (info .ne. 0) then
         call exit(0)
      end if

      isign = -1
      scale = 1.0
      isys(1) = 3
      isys(2) = 0
      isys(3) = 1
      isys(4) = 1
*
*     Initializing the trig tables
*
      call pscfft3d(0, n1, n2, n3, scale, A, 1, 1, 1,
     &              descA, B, 1, 1, 1, descB, table, work, isys, info)

      if (info .ne. 0) then
         call exit(0)
      end if
*
*     FFT
*
      call pscfft3d(isign, n1, n2, n3, scale, A, 1, 1, 1,
     &              descA, B, 1, 1, 1, descB, table, work, isys, info)

      if (info .ne. 0) then
         call exit(0)
      end if

      isign = +1
      scale = 1.0 / float(n1*n2*n3)
*
*     Inverse FFT
*
      call pcsfft3d(isign, n1, n2, n3, scale, B, 1, 1, 1,
     &              descB, C, 1, 1, 1, descC, table, work, isys, info)

      if (info .ne. 0) then
         call exit(0)
      end if

      call exit(0)

      end

SEE ALSO
DESCINIT3D(3S), GRIDINFO3D(3S), GRIDINIT3D(3S), PCOORD3D(3S), PNUM3D(3S)

RCFFT2 ( 3S ) RCFFT2 ( 3S )

NAME
RCFFT2 – Applies a real-to-complex Fast Fourier Transform (FFT)

SYNOPSIS
CALL RCFFT2 (init, ix, n, x, work, y)

IMPLEMENTATION
UNICOS systems

DESCRIPTION
RCFFT2 calculates the following:
$$y_k = 2 \sum_{j=0}^{n-1} x_j \exp\left(\pm \frac{2\pi i}{n} jk\right) \qquad \text{for } k = 0, 1, \ldots, n/2$$

The sign of the exponent is the same as the sign of ix.


This routine has the following arguments:
init Integer. (input)
If nonzero, generates sine and cosine tables in work. If 0, calculates FFTs by using sine and
cosine tables of the previous call.
ix Integer. (input)
> 0 Calculates a forward transform
< 0 Calculates an inverse transform
n Integer. (input)
Size of the Fourier transform ($2^m$, where m ≥ 3).
x Real array of dimension n. (input)
Input vector. Range of x:
$$\frac{2n}{10^{2466}} \le x_i \le \frac{10^{2466}}{2n} \qquad \text{for } i = 1, 2, \ldots, n$$
work Complex array of dimension (3 . n / 2) + 2. (scratch output)
Work storage vector.
y Complex array of dimension (n / 2) + 1. (output)
Result vector.

SEE ALSO
CFFT2(3S), CRFFT2(3S)
SCFFT(3S), which supersedes this routine only on Cray Y-MP systems

RFFTMLT ( 3S ) RFFTMLT ( 3S )

NAME
RFFTMLT – Applies complex-to-real or real-to-complex Fast Fourier Transforms (FFTs) on multiple input
vectors

SYNOPSIS
CALL RFFTMLT (x, work, trigs, ifax, inc1x, inc2x, n, lot, isign)

IMPLEMENTATION
UNICOS systems

DESCRIPTION
When isign = – 1, RFFTMLT applies real-to-complex FFTs (forward transforms) on more than one input
vector. When isign = +1, RFFTMLT applies complex-to-real inverse FFTs (inverse transforms) on more than
one input vector.
This routine has the following arguments:
x Real array of dimension (0:n+1, lot). (input and output)
Contains the input values before the call to RFFTMLT, and output values after the call. On exit,
the computed output values are stored in the space originally occupied by input values. Because
the output is written back into the input array and contains n/2+1 complex values per transform,
you must size the input array to contain at least n+2 real elements per transform. (See the Data
Format subsection.)
work Real array of dimension 2 . n . lot . (scratch output)
Work storage vector.
trigs Real array of dimension 2n. (input)
Sine and cosine tables for FFT calculation. The following call initializes the vectors trigs and
ifax:
      CALL FFTFAX(n, ifax, trigs)

ifax Integer array of dimension 19. (input)


List of prime factors of n. Generally, ifax is initialized by FFTFAX, as stated previously.
inc1x Integer. (input)
The increment within each data vector; that is, the address distance between consecutive
elements of each vector to be transformed. The increment is always given in real (not complex)
words.
inc2x Integer. (input)
The increment between data vectors; that is, the address distance between corresponding
elements in adjacent data vectors. inc2x is typically the length of the leading dimension of x, as
noted previously. The increment is always given in real (not complex) words.

n Integer. (input)
Length of each data vector. n ≥ 2. n must be even. Any value of n that is not valid causes
FFTFAX to return the error code ifax(1) = – 99.
lot Integer. (input)
The number of data vectors.
isign Integer. (input)
Sign of the transform:
isign = – 1 Calculates real-to-complex (forward) FFT
isign = +1 Calculates complex-to-real (inverse) FFT

NOTES

Superseded and Preferred Routines


On UNICOS systems, RFFTMLT is superseded by the routines SCFFTM(3S) (real-to-complex, multiple
FFTs) and CSFFTM(3S) (complex-to-real, multiple FFTs).
Choice of Transform
The final argument, isign, determines whether a real-to-complex, or a complex-to-real transform is done, as
follows.
Real-to-complex transform (isign = – 1)
When isign = – 1, RFFTMLT performs a real-to-complex FFT on a set of vectors.
For each of the m real input vectors,
x j,m j = 0,1,. . .,n-1
RFFTMLT computes the complex output vector
y k,m k = 0,1,. . .,n / 2
defined by
$$y_{k,m} = \sum_{j=0}^{n-1} x_{j,m} \, \omega_n^{-jk} \qquad \text{for } k = 0, 1, \ldots, n/2, \quad \text{where } \omega_n = e^{2\pi i / n}$$

Only the first n / 2+1 complex output values are computed for each vector. The theory of Fourier transforms
implies that because the input is real, the values obey the symmetry:
$$y_{n-k,m} = \overline{y_{k,m}}$$
(where the notation $\overline{z}$ denotes the complex conjugate of z).
Thus, the last n / 2 output values are complex conjugates of the first n / 2 output values.

Complex-to-real transform (isign = +1)


When isign = +1, RFFTMLT performs a complex-to-real FFT on a set of vectors. For each of the m
complex input vectors,
y k,m k = 0,1,. . .,n / 2
RFFTMLT computes the real output vector
x j,m j = 0,1,. . .,n-1
defined by
$$x_{j,m} = \frac{1}{n} \sum_{k=0}^{n-1} y_{k,m} \, \omega_n^{jk} \qquad \text{for } j = 0, 1, \ldots, n-1, \quad \text{where } \omega_n = e^{2\pi i / n}$$

Although the summation in the definition runs from 0 to n– 1, actually only the first n/2+1 values for each
input vector are used. The other input values are deduced from the following symmetry:
$$y_{n-k,m} = \overline{y_{k,m}}$$
which, according to the theory, must be true because the transform of the input data is real-valued. The
isign=– 1 and isign=+1 transforms are inverses of each other.
Data Format
The x array contains both input and output values of either the real-to-complex or complex-to-real transform.
The array is declared real, but, on output from the real-to-complex transform and on input into the
complex-to-real transform, x contains complex values. The following describes how complex values are
arranged in the real array x.
Real-to-complex (isign = – 1)
The output values are stored in the same array as the input values. On input, lot real input vectors of length
n are stored as follows:
$x_{j,m}$ is stored in X(j . inc1x, m) for j = 0,1,. . .,n-1 and m = 1,2,. . .,lot
Space for the values X(n . inc1x, m) and X((n+1) . inc1x, m) must also be reserved, although these values are
not used on input, and they may be undefined.
On output, lot complex output vectors of length n / 2 +1 (same as n+2 "real" elements per vector) are stored
in the same array elements as the real input vectors, so that
Real($y_{k,m}$) is stored in X(2k . inc1x, m)
Imaginary($y_{k,m}$) is stored in X((2k+1) . inc1x, m)
for k = 0,1,. . .,n / 2
m = 1,2,. . .,lot
For all lot output vectors, y 0,m and y n / 2,m have real number values. Thus, their imaginary parts are set to 0.
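
The packed layout described above can be illustrated for a single vector with a short Python sketch. It does not call RFFTMLT. The helpers rfft_naive and pack are hypothetical names, the transform is a direct O(n^2) evaluation of the isign = – 1 formula, and the packing follows the offsets stated in the text (real part at 2k . inc1x, imaginary part at (2k+1) . inc1x).

```python
import cmath


def rfft_naive(x):
    """Direct evaluation of y_k = sum_j x_j * exp(-2*pi*i*j*k/n), k = 0..n/2."""
    n = len(x)
    return [
        sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
        for k in range(n // 2 + 1)
    ]


def pack(y, inc1x=1):
    """Store complex values in a flat real list: Real(y_k) at offset
    2k*inc1x and Imaginary(y_k) at offset (2k+1)*inc1x."""
    out = [0.0] * ((2 * len(y) - 1) * inc1x + 1)
    for k, v in enumerate(y):
        out[2 * k * inc1x] = v.real
        out[(2 * k + 1) * inc1x] = v.imag
    return out


x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]    # n = 8
packed = pack(rfft_naive(x))
print(len(packed))                               # 10, that is, n + 2 real words
print(round(packed[0], 9), round(packed[8], 9))  # real parts of y_0 and y_{n/2}
```

With inc1x = 1 and n = 8 the packed vector occupies n + 2 = 10 real words, which is why the input array must reserve two extra elements per transform.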

Complex-to-real (isign = +1)


The input and output data format is exactly the opposite of that given previously for isign = – 1.
Normalization
RFFTMLT uses a normalization (scale factor) that differs from the one used by CFFT2(3S), CRFFT2(3S),
and RCFFT2(3S).
Performance
For fastest performance, n should be a product only of powers of 2, 3, and 5. Vectorization is achieved by
doing parallel transforms, with vector length = lot.
The memory stride for vector loads and stores is inc2x; therefore, to avoid slowdowns due to memory
contention, inc2x should be an odd number.

EXAMPLES
The following program shows how to invoke RFFTMLT.
      parameter (n = 16, lot = 2, inc = 1, jump = inc*(n+2))
      real a(jump,lot), trigs(2*n), work(2*n*lot)
      integer ifax(19)
      . . .
*---------------------------------------------------------------------
*     Compute the FFT of A, using RFFTMLT
      call fftfax(n, ifax, trigs)
      call rfftmlt(a, work, trigs, ifax, inc, jump, n, lot, -1)
      . . .
      end

SEE ALSO
CFFTMLT(3S), CRFFT2(3S), RCFFT2(3S)
SCFFTM(3S), which supersedes this routine only on Cray PVP systems

SCFFT ( 3S ) SCFFT ( 3S )

NAME
SCFFT, CSFFT – Computes a real-to-complex or complex-to-real Fast Fourier Transform (FFT)

SYNOPSIS
CALL SCFFT (isign, n, scale, x, y, table, work, isys)
CALL CSFFT (isign, n, scale, x, y, table, work, isys)

IMPLEMENTATION
UNICOS and UNICOS/mk systems
On UNICOS/mk systems, this subroutine executes on a single processor and uses only private data.

DESCRIPTION
SCFFT computes the FFT of the real array X, and it stores the results in the complex array Y. CSFFT
computes the corresponding inverse complex-to-real transform.
It is customary in FFT applications to use zero-based subscripts; the formulas are simpler that way. For
SCFFT, suppose that the arrays are dimensioned as follows:
      REAL X(0:n-1)
      COMPLEX Y(0:n/2)

Then the output array is the FFT of the input array, using the following formula for the FFT:
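$$Y_k = \mathit{scale} \cdot \sum_{j=0}^{n-1} X_j \, \omega^{\mathit{isign} \cdot j \cdot k} \qquad \text{for } k = 0, \ldots, n/2$$
where
$$\omega = e^{2\pi i / n}, \qquad i = +\sqrt{-1}, \qquad \pi = 3.14159\ldots, \qquad \mathit{isign} = \pm 1$$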
Different authors use different conventions for which of the transforms, isign = +1 or isign = – 1, is the
forward or inverse transform, and what the scale factor should be in either case. You can make these
routines compute any of the various possible definitions, however, by choosing the appropriate values for
isign and scale.
The relevant fact from FFT theory is this: If you call SCFFT with any particular values of isign and scale,
the mathematical inverse function is computed by calling CSFFT with – isign and 1 /(n .scale ). In particular,
if you use isign = +1 and scale = 1.0 in SCFFT for the forward FFT, you can compute the inverse FFT by
using CSFFT with isign = – 1 and scale = 1.0 / n.
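
The inverse relationship can be checked numerically with a direct O(n^2) evaluation of the formula. The Python sketch below is illustrative only and does not call the library. The names scfft_naive and csfft_naive are hypothetical stand-ins that take only the arguments relevant to the mathematics.

```python
import cmath


def scfft_naive(isign, scale, x):
    """Y_k = scale * sum_j X_j * w**(isign*j*k), w = exp(2*pi*i/n), k = 0..n/2."""
    n = len(x)
    return [
        scale * sum(x[j] * cmath.exp(isign * 2j * cmath.pi * j * k / n)
                    for j in range(n))
        for k in range(n // 2 + 1)
    ]


def csfft_naive(isign, n, scale, y):
    """Complex-to-real counterpart: rebuild the implied second half of the
    input from the conjugate symmetry, then apply the same formula."""
    full = list(y) + [y[n - k].conjugate() for k in range(n // 2 + 1, n)]
    return [
        (scale * sum(full[k] * cmath.exp(isign * 2j * cmath.pi * j * k / n)
                     for k in range(n))).real
        for j in range(n)
    ]


x = [0.5, -1.0, 2.0, 3.5, 0.0, 1.25, -2.0, 4.0]
n = len(x)
y = scfft_naive(+1, 1.0, x)                 # forward: isign = +1, scale = 1.0
x_back = csfft_naive(-1, n, 1.0 / n, y)     # inverse: -isign, scale = 1/n
print(max(abs(a - b) for a, b in zip(x, x_back)))   # rounding-level error
```

The printed value is the largest round-trip error, which should be at the level of floating-point rounding.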
This routine has the following arguments:
isign Integer. (input)
Specifies whether to initialize the table array or to do the forward or inverse Fourier transform,
as follows:

If isign = 0, the routine initializes the table array and returns. In this case, the only arguments
used or checked are isign, n, and table.
If isign = +1 or – 1, the value of isign is the sign of the exponent used in the FFT formula.
n Integer. (input)
Size of transform. If n ≤ 2, SCFFT returns without calculating the transform.
scale Real. (input)
Scale factor. Each element of the output array is multiplied by scale after taking the Fourier
transform, as defined in the preceding formula.
x SCFFT: Real array of dimension (0:n– 1). (input)
CSFFT: Complex array of dimension (0:n / 2). (input)
Input array of values to be transformed.
y SCFFT: Complex array of dimension (0:n / 2). (output)
CSFFT: Real array of dimension (0:n– 1). (output)
Output array of transformed values.
The output array, y, is the FFT of the input array, x, computed according to the preceding
formula. The output array may be equivalenced to the input array in the calling program. Be
careful when dimensioning the arrays, in this case, to allow for the fact that the complex array
contains two (real) words more than the real array.
table UNICOS systems: Real array of dimension (100 + 4n). (input or output)
UNICOS/mk systems: Real array of dimension (2n). (input or output)
Table of factors and trigonometric functions.
If isign = 0, the table array is initialized to contain trigonometric tables needed to compute an
FFT of size n.
If isign = +1 or – 1, the values in table are assumed to be initialized already by a prior call with
isign = 0.
work UNICOS systems: Real array of dimension (4 + 4n). (scratch output)
UNICOS/mk systems: Real array of dimension (2n).
Work array used for intermediate calculations. Its address space must be different from that of
the input and output arrays.
isys Integer array of dimension (0:isys(0)). (input and output)
Use isys to specify certain processor-specific parameters or options. The first element of the
array specifies how many more elements are in the array.
If isys(0) = 0, the default values of such parameters are used. In this case, you can specify the
argument value as the scalar integer constant 0. If isys(0) > 0, isys(0) gives the upper bound of
the isys array; that is, if il = isys(0), user-specified parameters are expected in isys(1) through
isys(il).

NOTES
This subsection contains implementation information, initialization information, and performance tips.
Real-to-complex FFTs
Notice in the preceding formula that there are n real input values, and n / 2 + 1 complex output values. This
property is characteristic of real-to-complex FFTs.
The mathematical definition of the Fourier transform takes a sequence of n complex values and transforms it
to another sequence of n complex values. A complex-to-complex FFT routine, such as CCFFT(3S), will take
n complex input values, and produce n complex output values. In fact, one easy way to compute a real-to-
complex FFT is to store the input data in a complex array, then call routine CCFFT to compute the FFT.
You get the same answer when using the SCFFT routine.
The reason for having a separate real-to-complex FFT routine is efficiency. Because the input data is real,
you can make use of this fact to save almost half of the computational work. The theory of Fourier
transforms tells us that for real input data, you have to compute only the first n / 2 + 1 complex output
values, because the remaining values can be computed from the first half of the values by the simple
formula:
Y(k) = conjg(Y(n-k)) for n / 2 ≤ k ≤ n-1
where the notation conjg(z) represents the complex conjugate of z.
In fact, in many applications, the second half of the complex output data is never explicitly computed or
stored. Likewise, as explained later, only the first half of the complex data has to be supplied for the
complex-to-real FFT.
Another implication of FFT theory is that, for real input data, the first output value, Y(0), will always be a
real number; therefore, the imaginary part will always be 0. If n is an even number, Y(n/2) will also be real
and thus have a zero imaginary part.
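
These properties are easy to confirm with a full complex transform of real data. The following Python sketch is illustrative only. dft_full is a hypothetical helper that evaluates all n outputs directly instead of calling CCFFT.

```python
import cmath


def dft_full(x, isign=1):
    """All n outputs of Y_k = sum_j X_j * exp(isign*2*pi*i*j*k/n)."""
    n = len(x)
    return [
        sum(x[j] * cmath.exp(isign * 2j * cmath.pi * j * k / n) for j in range(n))
        for k in range(n)
    ]


x = [0.5, -1.0, 2.0, 3.5, 0.0, 1.25, -2.0, 4.0]   # real input, n even
n = len(x)
Y = dft_full(x)

# Largest deviation from Y(k) = conjg(Y(n-k)) over the second half.
sym_err = max(abs(Y[k] - Y[n - k].conjugate()) for k in range(n // 2, n))
print(sym_err)
print(Y[0].imag, Y[n // 2].imag)   # both zero up to rounding
```

Only Y(0) through Y(n/2) carry independent information, which is exactly the part that SCFFT stores.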
Complex-to-real FFTs
Consider the complex-to-real case. The effect of the computation is given by the preceding formula, but
with X complex and Y real.
Generally, the FFT transforms a complex sequence into a complex sequence. However, in a certain
application we may know the output sequence is real. Often, this is the case because the complex input
sequence was the transform of a real sequence. In this case, you can save about half of the computational
work.
According to the theory of Fourier transforms, for the output sequence, Y, to be a real sequence, the
following identity on the input sequence, X, must be true:
X(k) = conjg(X(n-k)) for n / 2 ≤ k ≤ n-1
And, in fact, the input values X(k) for k > n / 2 need not be supplied; they can be inferred from the first half
of the input.

Thus, in the complex-to-real routine, CSFFT, the arrays can be dimensioned as follows:
      COMPLEX X(0:n/2)
      REAL Y(0:n-1)

There are n / 2 + 1 complex input values and n real output values. Even though only n / 2 + 1 input values
are supplied, the size of the transform is still n in this case, because implicitly you are using the FFT
formula for a sequence of length n.
Another implication of the theory is that X(0) must be a real number (that is, it must have zero imaginary
part). Also, if n is even, X(n/2) must also be real. Routine CSFFT assumes that these values are real; if you
specify a nonzero imaginary part, it is ignored.
Table Initialization
The table array stores the trigonometric tables used in calculation of the FFT. This table must be initialized
by calling the routine with isign = 0 prior to doing the transforms. The table does not have to be
reinitialized if the value of the problem size, n, does not change. Because SCFFT and CSFFT use the same
format for table, either can be used to initialize it (note that CCFFT uses a different table format).
Dimensions
In the preceding description, it is assumed that array subscripts were zero-based, as is customary in FFT
applications. Thus, the input and output arrays are declared (assuming n > 0):
      REAL X(0:n-1)
      COMPLEX Y(0:n/2)

No change is needed in the calling sequence; however, if you prefer you can use the more customary Fortran
style with subscripts starting at 1, as in the following:
      REAL X(n)
      COMPLEX Y(n/2 + 1)

Performance Tips
These routines will compute an FFT for any value of n.
Performance for a given value of n depends on the prime factorization of n. This fact is characteristic of all
FFT algorithms.
Fastest performance is realized when n is a power of 2, in which case the number of floating-point
operations is approximately (5 / 2) . n . log2(n).
If n contains factors of 3, performance is slightly worse. If n contains factors of 5, it is slightly worse still.
Worst performance is when n is a prime number. In that case, the number of operations is approximately
4n^2.
The kernel routines are optimized for values of n that are even numbers and are products of powers of 2, 3,
and 5. (Because the kernel routines have a special case for multiples of 4, even powers of 2 will be slightly
faster than odd powers of 2.)
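
To give a feel for the difference between the best and worst cases, the two operation-count estimates quoted above can be evaluated for neighboring sizes. This Python sketch is illustrative only. The function names are hypothetical, and the counts are the approximations stated in the text, not measurements.

```python
import math


def flops_power_of_two(n):
    """Approximate operation count (5/2) * n * log2(n) for n a power of 2."""
    return 2.5 * n * math.log2(n)


def flops_prime(n):
    """Approximate operation count 4 * n**2 for prime n."""
    return 4 * n * n


print(flops_power_of_two(1024))   # 25600.0
print(flops_prime(1021))          # 4169764
```

By these estimates a transform of prime length 1021 costs roughly 160 times as many operations as one of length 1024, so padding or choosing a nearby factorizable length is usually worthwhile when the application allows it.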

Implementation-dependent Items
The Standard FFT routines were designed so that they could be implemented efficiently on many different
architectures. The calling sequence is the same in any implementation. Certain details, however, depend on
the particular implementation. These details are confined to two areas:
• The first area is the size of the table and work arrays. Different sizes may be needed on different
systems. No change is required to the subroutine call, but you may have to change the array sizes in the
DIMENSION or type statements that declare the arrays.
• The second area is the isys parameter array, an argument that gives certain implementation-specific
information. All features and functions of the FFT routines that are specific to any particular
implementation are confined to this isys array. On any implementation, you can use the default values by
specifying an argument value of 0.
In the UNICOS systems implementation, no special options are supported; therefore, you can always
specify an isys argument as constant 0. Other options may be provided in subsequent software releases.

EXAMPLES
These examples use the table and workspace sizes appropriate to UNICOS systems.
Example 1: Initialize the TABLE array in preparation for doing an FFT of size 1024. In this case,
only the arguments isign, n, and table are used. You can use dummy arguments or zeros for the other
arguments in the subroutine call.
REAL TABLE(100 + 4*1024)
CALL SCFFT(0, 1024, 0.0, DUMMY, DUMMY, TABLE, DUMMY, 0)

Example 2: X is a real array of dimension (0:1023), and Y is a complex array of dimension (0:512). Take
the FFT of X and store the results in Y. Before taking the FFT, initialize the TABLE array, as in example 1.
REAL X(0:1023)
COMPLEX Y(0:512)
REAL TABLE(100 + 4*1024)
REAL WORK(4*1024 + 4)
...
CALL SCFFT(0, 1024, 1.0, X, Y, TABLE, WORK, 0)
CALL SCFFT(1, 1024, 1.0, X, Y, TABLE, WORK, 0)

Example 3: With X and Y as in example 2, take the inverse FFT of Y and store it back in X. The scale
factor 1/1024 is used. Assume that the TABLE array is initialized already.
CALL CSFFT(-1, 1024, 1.0/1024.0, Y, X, TABLE, WORK, 0)

Example 4: Do the same computation as in example 2, but assume that the lower bound of each array is 1,
rather than 0. The subroutine calls are not changed.
REAL X(1024)
COMPLEX Y(513)
...
CALL SCFFT(0, 1024, 1.0, X, Y, TABLE, WORK, 0)
CALL SCFFT(1, 1024, 1.0, X, Y, TABLE, WORK, 0)

Example 5: Do the same computation as in example 4, but equivalence the input and output arrays to save
storage space. Assume that the TABLE array is initialized already.
REAL X(1024)
COMPLEX Y(513)
EQUIVALENCE ( X(1), Y(1) )
...
CALL SCFFT(1, 1024, 1.0, X, Y, TABLE, WORK, 0)

SEE ALSO
CCFFT(3S), CCFFTM(3S), SCFFTM(3S)

SCFFT2D ( 3S ) SCFFT2D ( 3S )

NAME
SCFFT2D, CSFFT2D – Applies a two-dimensional real-to-complex or complex-to-real Fast Fourier
Transform (FFT)

SYNOPSIS
CALL SCFFT2D (isign, n1, n2, scale, x, ldx, y, ldy, table, work, isys)
CALL CSFFT2D (isign, n1, n2, scale, x, ldx, y, ldy, table, work, isys)

IMPLEMENTATION
UNICOS and UNICOS/mk systems
On UNICOS/mk systems, these subroutines execute on a single processor and use only private data.

DESCRIPTION
SCFFT2D computes the two-dimensional Fast Fourier Transform (FFT) of the real matrix X, and it stores
the results in the complex matrix Y. CSFFT2D computes the corresponding inverse transform.
In FFT applications, it is customary to use zero-based subscripts; the formulas are simpler that way. First
the function of SCFFT2D is described. Suppose the arrays are dimensioned as follows:
REAL X(0:ldx-1, 0:n2-1)
COMPLEX Y(0:ldy-1, 0:n2-1)

where ldx ≥ n1 and ldy ≥ (n1/2) + 1.


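SCFFT2D computes the formula:
$$
Y_{k_1,k_2} = \mathit{scale} \cdot \sum_{j_1=0}^{n1-1} \sum_{j_2=0}^{n2-1} X_{j_1,j_2} \cdot \omega_1^{\,j_1 k_1} \cdot \omega_2^{\,j_2 k_2}
\qquad \text{for } k_1 = 0, \ldots, n1/2, \quad k_2 = 0, \ldots, n2-1
$$
where
$$
\omega_1 = e^{\,\mathit{isign} \cdot 2\pi i / n1}, \qquad
\omega_2 = e^{\,\mathit{isign} \cdot 2\pi i / n2}, \qquad
i = +\sqrt{-1}, \qquad \pi = 3.14159\ldots, \qquad \mathit{isign} = \pm 1
$$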
Different authors use different conventions for which of the transforms, isign = +1 or isign = – 1, is the
forward or inverse transform, and what the scale factor should be in either case. You can make these
routines compute any of the various possible definitions, however, by choosing the appropriate values for
isign and scale.
The relevant fact from FFT theory is this: If you take the FFT with any particular values of isign and scale,
the mathematical inverse function is computed by taking the FFT with −isign and 1/(n1 · n2 · scale). In
particular, if you use isign = +1 and scale = 1.0 for the forward FFT, you can compute the inverse FFT by
using isign = −1 and scale = 1.0/(n1 · n2).

SCFFT2D is very similar in function to CCFFT2D, but it takes the real-to-complex transform in the first
dimension, followed by the complex-to-complex transform in the second dimension.
CSFFT2D does the reverse. It takes the complex-to-complex FFT in the second dimension, followed by the
complex-to-real FFT in the first dimension.
See the SCFFT(3S) man page for more information about real-to-complex and complex-to-real FFTs. The
two-dimensional analog of the conjugate formula is as follows:
$$
Y_{k_1,k_2} = \overline{Y_{n1-k_1,\;n2-k_2}}
\qquad \text{for } n1/2 < k_1 \le n1-1, \quad 0 \le k_2 \le n2-1
$$
where the notation $\overline{z}$ represents the complex conjugate of z.

Thus, you have to compute only (slightly more than) half of the output values, namely:
$$
Y_{k_1,k_2} \qquad \text{for } 0 \le k_1 \le n1/2, \quad 0 \le k_2 \le n2-1
$$
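
The conjugate relation can be checked numerically. The following Python sketch evaluates the two-dimensional formula directly (a naive sum, not the library's algorithm) over the full range of k1 and verifies the symmetry for a small real input. The function names are hypothetical. The indices n1 − k1 and n2 − k2 are reduced modulo n1 and n2 so that k2 = 0 pairs with itself; that reduction is the usual DFT convention and is an assumption here, not something stated in the text.

```python
import cmath

def dft2(x, isign=1, scale=1.0):
    """Naive 2-D DFT of x[j1][j2] using the SCFFT2D formula, for all k1 in 0..n1-1."""
    n1, n2 = len(x), len(x[0])
    w1 = cmath.exp(isign * 2j * cmath.pi / n1)
    w2 = cmath.exp(isign * 2j * cmath.pi / n2)
    return [[scale * sum(x[j1][j2] * w1 ** (j1 * k1) * w2 ** (j2 * k2)
                         for j1 in range(n1) for j2 in range(n2))
             for k2 in range(n2)]
            for k1 in range(n1)]

def is_conjugate_symmetric(y, tol=1e-9):
    """Check Y[k1][k2] == conj(Y[n1-k1][n2-k2]) with indices taken modulo n1, n2."""
    n1, n2 = len(y), len(y[0])
    return all(abs(y[k1][k2] - y[(n1 - k1) % n1][(n2 - k2) % n2].conjugate()) < tol
               for k1 in range(n1) for k2 in range(n2))

# A small real 4-by-3 input (n1 = 4, n2 = 3).
x = [[1.0, 2.0, 0.5],
     [-1.0, 0.0, 3.0],
     [4.0, 1.5, -2.0],
     [0.25, 2.0, 1.0]]
print(is_conjugate_symmetric(dft2(x)))
```

Because the symmetry holds for any real input, storing only rows k1 = 0, ..., n1/2 loses no information, which is why the complex array needs only (n1/2) + 1 rows.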

UNICOS/mk systems only


If either n1 or n2 is prime or otherwise not factorizable into powers of 2, 3, and 5, significant
improvements in computational time can be obtained by using the following initializations of isys, which is a
vector of length 3.
If both n1 and n2 are factorizable into powers of 2, 3, and 5 (for example, n1 = 30 and n2 = 120), use:
isys(1) = 2
isys(2) = 0
isys(3) = 0
If either dimension is not factorizable into powers of 2, 3, and 5, the following initializations of isys
yield the fastest times:
n1 not factorizable but n2 factorizable:
isys(1) = 2
isys(2) = 1
isys(3) = 0
n1 factorizable but n2 not factorizable:
isys(1) = 2
isys(2) = 0
isys(3) = 1
Both n1 and n2 not factorizable:
isys(1) = 2
isys(2) = 1
isys(3) = 1
Here isys(1) indicates the number of dimensions of the problem (2 for a two-dimensional FFT).
If n1 and n2 are not known ahead of time, isys(2) and isys(3) can be initialized to 0. The routine
still computes correct results, albeit slowly if either n1 or n2 is not factorizable into powers of 2, 3, and 5.
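
When the sizes are known only at run time, the isys flags can be derived with a simple factorization test. The following Python sketch shows one way to do this, assuming only the rules stated above; is_235_smooth and isys_2d are hypothetical helper names, and the returned list corresponds to isys(1), isys(2), isys(3).

```python
def is_235_smooth(n):
    """Return True if n is a product of powers of 2, 3, and 5 only."""
    if n < 1:
        return False
    for p in (2, 3, 5):
        while n % p == 0:
            n //= p
    return n == 1

def isys_2d(n1, n2):
    """Build [isys(1), isys(2), isys(3)] for a two-dimensional transform."""
    return [2,
            0 if is_235_smooth(n1) else 1,
            0 if is_235_smooth(n2) else 1]

print(isys_2d(30, 120))
```

The same test would have to be repeated in the calling Fortran program; the sketch only makes the decision rule explicit.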

The storage requirements for the vector table depend on the values of the isys vector.
This feature does not exist for UNICOS systems and isys is ignored on those machines.
These routines have the following arguments:
isign Integer. (input)
Specifies whether to initialize the table array or to do the forward or inverse Fourier transform,
as follows:
If isign = 0, the routine initializes the table array and returns. In this case, the only arguments
used or checked are isign, n1, n2, and table.
If isign = +1 or – 1, the value of isign is the sign of the exponent used in the FFT formula.
n1 Integer. (input)
Transform size in the first dimension. If n1 is not positive, SCFFT2D returns without
calculating a transform.
n2 Integer. (input)
Transform size in the second dimension. If n2 is not positive, SCFFT2D returns without
calculating a transform.
scale Real. (input)
Scale factor. Each element of the output array is multiplied by scale after taking the
Fourier transform, as defined previously.
x SCFFT2D: Real array of dimension (0:ldx– 1, 0:n2– 1). (input)
CSFFT2D: Complex array of dimension (0:ldx– 1, 0:n2– 1). (input)
Array of values to be transformed.
ldx Integer. (input)
The number of rows in the x array, as it was declared in the calling program. That is, the
leading dimension of x.
SCFFT2D: ldx ≥ MAX(n1, 1).
CSFFT2D: ldx ≥ MAX(n1/2 + 1, 1).
y SCFFT2D: Complex array of dimension (0:ldy– 1, 0:n2– 1). (output)
CSFFT2D: Real array of dimension (0:ldy– 1, 0:n2– 1). (output)
Output array of transformed values. The output array can be the same as the input array, in
which case, the transform is done in place and the input array is overwritten with the
transformed values. In this case, it is necessary that the following equalities hold:
SCFFT2D: ldx = 2ldy.
CSFFT2D: ldy = 2ldx.
ldy Integer. (input)
The number of rows in the y array, as it was declared in the calling program (the leading
dimension of y).

SCFFT2D: ldy ≥ MAX(n1/2 + 1, 1).
CSFFT2D: ldy ≥ MAX(n1 + 2, 1).
In the complex-to-real routine, two extra elements are in the first dimension (ldy ≥ n1 + 2, rather
than just ldy ≥ n1). These elements are needed for intermediate storage during the computation.
On exit, their value is undefined.
table UNICOS systems: Real array of dimension 100 + 2(n1 + n2). (input or output)
UNICOS/mk systems: Real array of dimension 2(n1 + n2) if both isys(2) and isys(3) are equal
to zero. Private real vector of length 12(n1 + n2) if either isys(2) or isys(3) is equal to 1.
(input or output)
Table of factors and trigonometric functions.
If isign = 0, the routine initializes table (table is output only).
If isign = +1 or – 1, the values in table are assumed to be initialized already by a prior call with
isign = 0 (table is input only).
work UNICOS systems: Real array of dimension 512 · MAX(n1, n2). (scratch output)
UNICOS/mk systems: Real array of dimension (n1 + n2) · n2. (scratch output)
Work array. This is a scratch array used for intermediate calculations. Its address space must
be different from that of the input and output arrays.
isys UNICOS systems: ignored.
UNICOS/mk systems: integer array of length 3. (input)
isys(1) = 2
isys(2) = 0 if n1 is factorizable into powers of 2, 3, and 5; 1 if it is not
isys(3) = 0 if n2 is factorizable into powers of 2, 3, and 5; 1 if it is not
See PCCFFT2D(3S) for a more detailed explanation of the isys parameter.
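
The array sizes given in the argument descriptions are easy to get wrong when the problem size changes. The following Python sketch simply evaluates the table and work formulas listed above so that the values can be checked before editing the Fortran declarations. The function names are hypothetical, and the UNICOS/mk work size is deliberately left out.

```python
def scfft2d_unicos_sizes(n1, n2):
    """Return (table_len, work_len) in real words for UNICOS systems."""
    table_len = 100 + 2 * (n1 + n2)
    work_len = 512 * max(n1, n2)
    return table_len, work_len

def scfft2d_mk_table_len(n1, n2, isys):
    """Return the table length for UNICOS/mk systems.

    isys is a 3-element list; Python index 1 holds isys(2) and index 2 holds isys(3).
    """
    factor = 12 if (isys[1] == 1 or isys[2] == 1) else 2
    return factor * (n1 + n2)

print(scfft2d_unicos_sizes(128, 256))
print(scfft2d_mk_table_len(30, 120, [2, 0, 0]))
```

For the 128 by 256 case, the UNICOS values agree with the declarations TABLE(100 + 2*(128 + 256)) and WORK(512*256) used in the examples for this routine.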

NOTES
The following notes are for UNICOS systems only. SCFFT2D(3S) and CSFFT2D(3S) on UNICOS/mk
systems provide the functionality of PSCFFT2D(3S) and PCSFFT2D(3S) on a single PE. For notes about
CSFFT2D(3S) on UNICOS/mk systems, see PSCFFT2D(3S).
Algorithm
SCFFT2D uses a routine similar to SCFFTM to do a real-to-complex FFT on the columns, then uses a
routine similar to CCFFTM to do a complex-to-complex FFT on the rows.

CSFFT2D uses a routine similar to CCFFTM to do a complex-to-complex FFT on the rows, then uses a
routine similar to CSFFTM to do a complex-to-real FFT on the columns.
Table Initialization
The table array stores factors of n1 and n2, and trigonometric tables that are used in calculation of the FFT.
table must be initialized by calling the routine with isign = 0. table does not have to be reinitialized if the
values of the problem sizes, n1 and n2, do not change.
Dimensions
In the preceding description, array subscripts are assumed to be zero-based, as is customary in FFT
applications. Thus, the input and output arrays are declared as follows:
REAL X(0:ldx-1, 0:n2-1)
COMPLEX Y(0:ldy-1, 0:n2-1)

No change is needed in the calling sequence if you prefer to use the more customary Fortran style
with subscripts starting at 1. The same values of ldx and ldy would be passed to the subroutine even if the
input and output arrays were dimensioned as follows:
REAL X(ldx, n2)
COMPLEX Y(ldy, n2)

Performance Tips
This routine computes an FFT for any values of n1 and n2, but the performance depends on the prime
factorizations of n1 and n2. This fact is characteristic of all FFT algorithms.
Fastest performance is realized when both n1 and n2 are powers of 2, in which case the number of
floating-point operations is approximately 5 · n1 · n2 · log2(n1 · n2).
If either n1 or n2 contains factors of 3, computation time is slightly longer, because more floating-point
operations are required. If they contain powers of 5, it is longer still.
The kernel routines are optimized for values of n1 and n2 that are products of powers of 2, 3, and 5.
In the UNICOS implementation, to avoid memory bank conflicts, it is very important to make the leading
dimensions of the arrays odd numbers (or, if that is not possible, odd multiples of 2).
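
One way to apply this advice is to round each leading dimension up to the next odd number, subject to the minimum sizes given in the argument list. The following Python sketch does that for SCFFT2D. It is only an illustration of the rule, the function name is hypothetical, and it assumes the in-place requirement ldx = 2 · ldy from the description of the y argument.

```python
def leading_dims_scfft2d(n1, in_place=False):
    """Pick (ldx, ldy) for SCFFT2D that satisfy the minimums and avoid even strides."""
    def next_odd(m):
        return m if m % 2 == 1 else m + 1

    # ldy must be at least n1/2 + 1 (integer division, as in Fortran).
    ldy = next_odd(n1 // 2 + 1)
    if in_place:
        # In-place transforms require ldx = 2*ldy, which is an odd multiple of 2.
        ldx = 2 * ldy
    else:
        # ldx must be at least n1; round up to an odd number.
        ldx = next_odd(n1)
    return ldx, ldy

print(leading_dims_scfft2d(128))
print(leading_dims_scfft2d(128, in_place=True))
```

For n1 = 128 this yields the leading dimensions 129 and 65 for separate arrays and 130 and 65 for equivalenced arrays, the same values that appear in the examples for this routine.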
Implementation-dependent Items
The Cray Standard FFT routines were designed so that they could be implemented efficiently on many
different architectures. The calling sequence is the same in any implementation. Certain details, however,
depend on the particular implementation. These details are confined to three areas:
• The first area is the size of the table and work arrays. Different systems may need different sizes. You
do not have to change the subroutine call, but you might have to change the array sizes in the
DIMENSION or type statements that declare the arrays.
• The second area is the isys parameter array, an argument that gives certain implementation-specific
information. All features and functions of the FFT routines that are specific to any particular
implementation are confined to this isys array. On any implementation, you can use the default values by
specifying an argument value of 0.

In the UNICOS implementation, no special options are supported; therefore, you can always specify an
isys argument as constant 0. Other options may be provided in subsequent software releases.
• The third area is the issue of which problem sizes or dimensions give optimal performance in a particular
implementation. See the Performance Tips subsection.

EXAMPLES
The following examples are for UNICOS systems only.
Example 1: Initialize the TABLE array in preparation for doing a two-dimensional FFT of size 128 by 256.
In this case, only the isign, n1, n2, and table arguments are used; you can use dummy arguments or zeros for
other arguments.
REAL TABLE(100 + 2*(128 + 256))
CALL SCFFT2D(0, 128, 256, 0.0, DUMMY, 1, DUMMY, 1,
& TABLE, DUMMY, 0)

Example 2: X is a real array of dimension (0:128, 0:255), and Y is a complex array of dimension (0:64, 0:255).
The first 128 elements of each column of X contain data; for performance reasons, the extra element forces
the leading dimension to be an odd number. Take the two-dimensional FFT of X and store it in Y. Initialize
the TABLE array, as in example 1.
REAL X(0:128, 0:255)
COMPLEX Y(0:64, 0:255)
REAL TABLE(100 + 2*(128 + 256))
REAL WORK(512*256)
...
CALL SCFFT2D(0, 128, 256, 1.0, X, 129, Y, 65, TABLE, WORK, 0)
CALL SCFFT2D(1, 128, 256, 1.0, X, 129, Y, 65, TABLE, WORK, 0)

Example 3: With X and Y as in example 2, take the inverse FFT of Y and store it back in X. The scale
factor 1/(128*256) is used. Assume that the TABLE array is initialized already.
CALL CSFFT2D(-1, 128, 256, 1.0/(128.0*256.0), Y, 65,
& X, 130, TABLE, WORK, 0)

Example 4: Do the same computation as in example 2, but assume that the lower bound of each array is 1,
rather than 0. No change is needed in the subroutine calls.
REAL X(129, 256)
COMPLEX Y(65, 256)
...
CALL SCFFT2D(0, 128, 256, 1.0, X, 129, Y, 65, TABLE, WORK, 0)
CALL SCFFT2D(1, 128, 256, 1.0, X, 129, Y, 65, TABLE, WORK, 0)

Example 5: Do the same computation as in example 4, but equivalence the input and output arrays to save
storage space. In this case, a row must be added to X, because it is equivalenced to a complex array.
Assume that TABLE is already initialized.
REAL X(130, 256)
COMPLEX Y(65, 256)
EQUIVALENCE ( X(1, 1), Y(1, 1) )
...
CALL SCFFT2D(1, 128, 256, 1.0, X, 130, Y, 65, TABLE, WORK, 0)

SEE ALSO
CCFFT(3S), CCFFT2D(3S), CCFFT3D(3S), CCFFTM(3S), SCFFT(3S), SCFFT3D(3S), SCFFTM(3S)

SCFFT3D ( 3S ) SCFFT3D ( 3S )

NAME
SCFFT3D, CSFFT3D – Applies a multitasked three-dimensional real-to-complex Fast Fourier Transform
(FFT)

SYNOPSIS
CALL SCFFT3D (isign, n1, n2, n3, scale, x, ldx, ldx2, y, ldy, ldy2, table, work, isys)
CALL CSFFT3D (isign, n1, n2, n3, scale, x, ldx, ldx2, y, ldy, ldy2, table, work, isys)

IMPLEMENTATION
UNICOS and UNICOS/mk systems
On UNICOS/mk systems, these subroutines execute on a single processor and use only private data.

DESCRIPTION
SCFFT3D computes the three-dimensional Fast Fourier Transform (FFT) of the real matrix X, and it stores
the results in the complex matrix Y. CSFFT3D computes the corresponding inverse transform.
In FFT applications, it is customary to use zero-based subscripts; the formulas are simpler that way. First,
the function of SCFFT3D is described. Suppose the arrays are dimensioned as follows:
REAL X(0:ldx-1, 0:ldx2-1, 0:n3-1)
COMPLEX Y(0:ldy-1, 0:ldy2-1, 0:n3-1)

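SCFFT3D computes the formula:
$$
Y_{k_1,k_2,k_3} = \mathit{scale} \cdot \sum_{j_1=0}^{n1-1} \sum_{j_2=0}^{n2-1} \sum_{j_3=0}^{n3-1}
X_{j_1,j_2,j_3} \cdot \omega_1^{\,j_1 k_1} \cdot \omega_2^{\,j_2 k_2} \cdot \omega_3^{\,j_3 k_3}
$$
for
$$
k_1 = 0, \ldots, n1/2, \qquad k_2 = 0, \ldots, n2-1, \qquad k_3 = 0, \ldots, n3-1
$$
where:
$$
\omega_1 = e^{\,\mathit{isign} \cdot 2\pi i / n1}, \qquad
\omega_2 = e^{\,\mathit{isign} \cdot 2\pi i / n2}, \qquad
\omega_3 = e^{\,\mathit{isign} \cdot 2\pi i / n3}, \qquad
i = +\sqrt{-1}, \qquad \pi = 3.14159\ldots, \qquad \mathit{isign} = \pm 1
$$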
Different authors use different conventions for which of the transforms, isign = +1 or isign = – 1, is the
forward or inverse transform, and what the scale factor should be in either case. You can make these
routines compute any of the various possible definitions, however, by choosing the appropriate values for
isign and scale.

The relevant fact from FFT theory is this: If you take the FFT with any particular values of isign and scale,
the mathematical inverse function is computed by taking the FFT with −isign and 1/(n1 · n2 · n3 · scale).
In particular, if you use isign = +1 and scale = 1.0 for the forward FFT, you can compute the inverse FFT
by using isign = −1 and scale = 1.0/(n1 · n2 · n3).
SCFFT3D is very similar in function to CCFFT3D(3S), but it takes the real-to-complex transform in the first
dimension, followed by the complex-to-complex transform in the second and third dimensions.
CSFFT3D does the reverse. It takes the complex-to-complex FFT in the third and second dimensions,
followed by the complex-to-real FFT in the first dimension.
See the SCFFTM(3S) man page for more information about real-to-complex and complex-to-real FFTs. The
three-dimensional analog of the conjugate formula is as follows:
$$
Y_{k_1,k_2,k_3} = \overline{Y_{n1-k_1,\;n2-k_2,\;n3-k_3}}
\qquad \text{for } n1/2 < k_1 \le n1-1, \quad 0 \le k_2 \le n2-1, \quad 0 \le k_3 \le n3-1
$$
where the notation $\overline{z}$ represents the complex conjugate of z.
Thus, you have to compute only (slightly more than) half of the output values, namely:
$$
Y_{k_1,k_2,k_3}
\qquad \text{for } 0 \le k_1 \le n1/2, \quad 0 \le k_2 \le n2-1, \quad 0 \le k_3 \le n3-1
$$
UNICOS/mk systems only
If any of n1, n2, or n3 is prime or otherwise not factorizable into powers of 2, 3, and 5, significant
improvements in computational time can be obtained by using the following initializations of isys, which is a
vector of length 4.
The first element of isys indicates the dimension of the problem, that is, isys(1) = 3. The next three elements
of isys indicate if the lengths n1, n2 and n3 are factorizable into powers of 2, 3 and 5. isys(2) is set to 0 if
n1 is factorizable into powers of 2, 3 and 5 and is set to 1 otherwise. Similarly, isys(3) and isys(4) are set to
zero if n2 and n3 are factorizable into powers of 2, 3 and 5 and set to 1 if they are not.
For example, if n1 = 256, n2 = 240, and n3 = 254, then the best computational time is obtained by setting
the following:
isys(1) = 3 (dimension of the problem)
isys(2) = 0
isys(3) = 0
isys(4) = 1

If n1, n2, and n3 are not known ahead of time, isys(2), isys(3), and isys(4) can be
initialized to 0. The routine still computes the correct result, albeit slowly if any of n1, n2, or n3 is not
factorizable into powers of 2, 3, and 5.
The storage requirements for the vector table depend on the values of the isys vector.
UNICOS systems
The isys parameter is used to choose between two multitasking strategies and correspondingly different
amounts of workspace to be provided. A brief discussion about the significance of the isys parameter is
provided in the following argument list.
These routines have the following arguments:
isign Integer. (input)
Specifies whether to initialize the table array or to do the forward or inverse Fourier transform,
as follows:
If isign = 0, the routine initializes the table array and returns. In this case, the only arguments
used or checked are isign, n1, n2, n3, and table.
If isign = +1 or – 1, the value of isign is the sign of the exponent used in the FFT formula.
n1 Integer. (input)
Transform size in the first dimension. If n1 is not positive, SCFFT3D returns without
computing a transform.
n2 Integer. (input)
Transform size in the second dimension. If n2 is not positive, SCFFT3D returns without
computing a transform.
n3 Integer. (input)
Transform size in the third dimension. If n3 is not positive, SCFFT3D returns without
computing a transform.
scale Real. (input)
Scale factor. Each element of the output array is multiplied by scale after taking the Fourier
transform, as defined previously.
x SCFFT3D: Real array of dimension (0:ldx– 1, 0:ldx2– 1, 0:n3– 1). (input)
CSFFT3D: Complex array of dimension (0:ldx– 1, 0:ldx2– 1, 0:n3– 1). (input)
Array of values to be transformed.
ldx Integer. (input)
The first dimension of x, as it was declared in the calling program (the leading dimension of x).
SCFFT3D: ldx ≥ MAX(n1, 1).
CSFFT3D: ldx ≥ MAX(n1/2 + 1, 1).
ldx2 Integer. (input)
The second dimension of x, as it was declared in the calling program. ldx2 ≥ MAX(n2, 1).

y SCFFT3D: Complex array of dimension (0:ldy−1, 0:ldy2−1, 0:n3−1). (output)
CSFFT3D: Real array of dimension (0:ldy−1, 0:ldy2−1, 0:n3−1). (output)
Output array of transformed values. The output array can be the same as the input array, in
which case, the transform is done in place; that is, the input array is overwritten with the
transformed values. In this case, it is necessary that the following equalities hold:
SCFFT3D: ldx = 2 . ldy, and ldx2 = ldy2.
CSFFT3D: ldy = 2 . ldx, and ldx2 = ldy2.
ldy Integer. (input)
The first dimension of y, as it was declared in the calling program; that is, the leading dimension
of y.
SCFFT3D: ldy ≥ MAX(n1/2 + 1, 1).
CSFFT3D: ldy ≥ MAX(n1 + 2, 1).
In the complex-to-real routine, two extra elements are in the first dimension (that is, ldy ≥ n1 +
2, rather than just ldy ≥ n1). These elements are needed for intermediate storage during the
computation. On exit, their value is undefined.
ldy2 Integer. (input)
The second dimension of y, as it was declared in the calling program. ldy2 ≥ MAX(n2, 1).
table UNICOS systems: Real array of dimension 100 + 2(n1 + n2 + n3). (input or output)
UNICOS/mk systems: Real array of dimension 2(n1 + n2 + n3) if isys(2), isys(3), and isys(4) are
all equal to zero. Private real vector of length 12(n1 + n2 + n3) if any of isys(2), isys(3), or
isys(4) is equal to 1. (input or output)
Table of factors and trigonometric functions.
This array must be initialized by a call to SCFFT3D or CSFFT3D with isign = 0.
If isign = 0, table is initialized to contain trigonometric tables needed to compute a three-
dimensional FFT of size n1 by n2 by n3. If isign = +1 or – 1, the values in table are assumed to
be initialized already by a prior call with isign = 0.
work UNICOS systems: Real array of dimension 512 · MAX(n1, n2, n3) when isys = 0, and of
dimension 4 · NCPUS · MAX(n1 · n2, n2 · n3, n3 · n1) when isys = 1. (scratch output)
When isys = 0, the parallel performance of this routine may be poor for small problem sizes.
Setting isys to 1 and providing the additional memory significantly enhances parallel
performance. NCPUS is the environment variable that specifies the
maximum number of CPUs that can be used in the application. The default value is 4.
UNICOS/mk systems: Real array of dimension (n1 + 2) · n2 · n3. (scratch output)
Work array. This is a scratch array used for intermediate calculations. Its address space must
be different from that of the input and output arrays.
isys UNICOS systems: Integer array of dimension (1). (input)

isys = 0 or 1 depending on the amount of workspace the user can provide to the routine.
UNICOS/mk systems: Integer array of dimension 4. (input)
isys(1) = 3
isys(2) = 0 (if n1 is factorizable into powers of 2, 3 and 5)
1 ( if n1 is not factorizable into powers of 2, 3 and 5)
isys(3) = 0 (if n2 is factorizable into powers of 2, 3 and 5)
1 ( if n2 is not factorizable into powers of 2, 3 and 5)
isys(4) = 0 (if n3 is factorizable into powers of 2, 3 and 5)
1 ( if n3 is not factorizable into powers of 2, 3 and 5)
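
On UNICOS systems the choice of isys trades workspace for parallel performance, so it can help to compare the two work sizes before deciding. The following Python sketch evaluates the table and work formulas from the argument list. The function name is hypothetical, and ncpus stands in for the value of the NCPUS environment variable (default 4, as stated above).

```python
def scfft3d_unicos_sizes(n1, n2, n3, isys=0, ncpus=4):
    """Return (table_len, work_len) in real words for SCFFT3D/CSFFT3D on UNICOS systems."""
    table_len = 100 + 2 * (n1 + n2 + n3)
    if isys == 0:
        work_len = 512 * max(n1, n2, n3)
    elif isys == 1:
        work_len = 4 * ncpus * max(n1 * n2, n2 * n3, n3 * n1)
    else:
        raise ValueError("isys must be 0 or 1 on UNICOS systems")
    return table_len, work_len

small = scfft3d_unicos_sizes(128, 128, 128, isys=0)
large = scfft3d_unicos_sizes(128, 128, 128, isys=1, ncpus=4)
print(small, large)
```

For a 128 by 128 by 128 transform, the isys = 0 work size equals the WORK(512*128) declaration used in the examples, and the isys = 1 size is four times larger with the default NCPUS.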

NOTES
The following notes are for UNICOS systems only. SCFFT3D and CSFFT3D on UNICOS/mk systems
provide the functionality of PSCFFT3D and PCSFFT3D on a single PE. For notes on SCFFT3D and CSFFT3D on
UNICOS/mk systems, see PSCFFT3D(3S).
SCFFT3D is the generalization of SCFFT2D(3S) to three dimensions. All the notes for SCFFT2D(3S)
apply, with the obvious modifications for three dimensions.
Algorithm
SCFFT3D uses a routine similar to SCFFTM(3S) to do multiple FFTs first on all columns of the input
matrix, then uses a routine similar to CCFFTM(3S) on all rows of the result, and then on all planes of that
result. See SCFFTM(3S) and CCFFTM(3S) for more information about the algorithms used.

EXAMPLES

The following examples are for UNICOS systems only. In all the examples shown below, isys is set to 0.
For small three-dimensional FFTs, setting isys = 1 and providing adequate workspace
yields better performance.
Example 1: Initialize the TABLE array in preparation for doing a three-dimensional FFT of size 128 by 128
by 128. In this case only the isign, n1, n2, n3, and table arguments are used; you can use dummy arguments
or zeros for other arguments.
REAL TABLE(100 + 2*(128 + 128 + 128))
CALL SCFFT3D(0, 128, 128, 128, 0.0, DUMMY, 1, 1, DUMMY, 1, 1,
& TABLE, DUMMY, 0)

Example 2: X is a real array of size (0:128, 0:128, 0:128). The first 128 elements of each dimension
contain data; for performance reasons, the extra element forces the leading dimensions to be odd numbers. Y
is a complex array of dimension (0:64, 0:128, 0:128). Take the three-dimensional FFT of X and store it in
Y. Initialize the TABLE array, as in example 1.

REAL X(0:128, 0:128, 0:128)
COMPLEX Y(0:64, 0:128, 0:128)
REAL TABLE(100 + 2*(128 + 128 + 128))
REAL WORK(512*128)
...
CALL SCFFT3D(0, 128, 128, 128, 1.0, X, 129, 129,
& Y, 65, 129, TABLE, WORK, 0)
CALL SCFFT3D(1, 128, 128, 128, 1.0, X, 129, 129,
& Y, 65, 129, TABLE, WORK, 0)

Example 3: With X and Y as in example 2, take the inverse FFT of Y and store it back in X. The scale
factor 1/(128**3) is used. Assume that the TABLE array is initialized already.
CALL CSFFT3D(-1, 128, 128, 128, 1.0/128.0**3, Y, 65, 129,
& X, 130, 129, TABLE, WORK, 0)

Example 4: Do the same computation as in example 2, but assume that the lower bound of each array is 1,
rather than 0. No change is made in the subroutine calls.
REAL X(129, 129, 129)
COMPLEX Y(65, 129, 129)
REAL TABLE(100 + 2*(128 + 128 + 128))
REAL WORK(512*128)
...
CALL SCFFT3D(0, 128, 128, 128, 1.0, X, 129, 129,
& Y, 65, 129, TABLE, WORK, 0)
CALL SCFFT3D(1, 128, 128, 128, 1.0, X, 129, 129,
& Y, 65, 129, TABLE, WORK, 0)

Example 5: Do the same computation as in example 4, but equivalence the input and output arrays to save
storage space. Assume that the TABLE array is initialized already.
REAL X(130, 129, 129)
COMPLEX Y(65, 129, 129)
EQUIVALENCE ( X(1, 1, 1), Y(1, 1, 1) )
...
CALL SCFFT3D(1, 128, 128, 128, 1.0, X, 130, 129,
& Y, 65, 129, TABLE, WORK, 0)

SEE ALSO
CCFFT(3S), CCFFT2D(3S), CCFFT3D(3S), CCFFTM(3S), SCFFT(3S), SCFFT2D(3S), SCFFTM(3S)

SCFFTM ( 3S ) SCFFTM ( 3S )

NAME
SCFFTM, CSFFTM – Applies multiple real-to-complex or complex-to-real Fast Fourier Transforms (FFTs)

SYNOPSIS
CALL SCFFTM (isign, n, lot, scale, x, ldx, y, ldy, table, work, isys)
CALL CSFFTM (isign, n, lot, scale, x, ldx, y, ldy, table, work, isys)

IMPLEMENTATION
UNICOS and UNICOS/mk systems
On UNICOS/mk systems, these subroutines execute on a single processor and use only private data.

DESCRIPTION
SCFFTM computes the FFT of each column of the real matrix X, and it stores the results in the
corresponding column of the complex matrix Y. CSFFTM computes the corresponding inverse transforms.
In FFT applications, it is customary to use zero-based subscripts; the formulas are simpler that way. First,
the function of SCFFTM is described. Suppose that the arrays are dimensioned as follows:
REAL X(0:ldx-1, 0:lot-1)
COMPLEX Y(0:ldy-1, 0:lot-1)

where ldx ≥ n and ldy ≥ n/2 + 1.


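Then column L of the output array is the FFT of column L of the input array, using the following formula
for the FFT:
$$
Y_{k,L} = \mathit{scale} \cdot \sum_{j=0}^{n-1} X_{j,L} \cdot \omega^{\,\mathit{isign} \cdot j \cdot k}
\qquad \text{for } k = 0, \ldots, n/2, \quad L = 0, \ldots, \mathit{lot}-1
$$
where
$$
\omega = e^{\,2\pi i / n}, \qquad i = +\sqrt{-1}, \qquad \pi = 3.14159\ldots, \qquad \mathit{isign} = \pm 1
$$
lot = the number of columns to transform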
Different authors use different conventions for which transform (isign = +1 or isign = – 1) is used in the
real-to-complex case, and what the scale factor should be. Some adopt the convention that isign = 1 for the
real-to-complex transform, and isign = – 1 for the complex-to-real inverse. Others use the opposite
convention. You can make these routines compute any of the various possible definitions, however, by
choosing the appropriate values for isign and scale.
The relevant fact from FFT theory is this: If you use SCFFTM to take the real-to-complex FFT, using any
particular values of isign and scale, the mathematical inverse function is computed by using CSFFTM with
−isign and 1/(n · scale). In particular, if you call SCFFTM with isign = +1 and scale = 1.0, you can use
CSFFTM to compute the inverse complex-to-real FFT by using isign = −1 and scale = 1.0/n.
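
The inverse rule can be verified with a direct evaluation of the formula. The following Python sketch is a naive stand-in for one column of SCFFTM and CSFFTM (it is not the library's algorithm, and the function names are hypothetical). The forward routine keeps only the n/2 + 1 stored outputs, and the inverse rebuilds the remaining values by conjugate symmetry before summing.

```python
import cmath

def real_to_complex(x, isign=1, scale=1.0):
    """Naive real-to-complex transform of one column: returns Y[0], ..., Y[n//2]."""
    n = len(x)
    w = cmath.exp(2j * cmath.pi / n)
    return [scale * sum(x[j] * w ** (isign * j * k) for j in range(n))
            for k in range(n // 2 + 1)]

def complex_to_real(y, n, isign=-1, scale=1.0):
    """Naive complex-to-real transform of one column of n//2 + 1 stored values."""
    # Rebuild Y[k] for k > n/2 from the stored half: Y[k] = conj(Y[n-k]).
    full = list(y) + [y[n - k].conjugate() for k in range(n // 2 + 1, n)]
    w = cmath.exp(2j * cmath.pi / n)
    return [(scale * sum(full[k] * w ** (isign * j * k) for k in range(n))).real
            for j in range(n)]

# Two columns (lot = 2), each of length n = 6.
columns = [[1.0, -2.0, 3.5, 0.0, 4.0, 2.0],
           [0.5, 0.5, -1.0, 2.0, 0.0, 1.0]]
n = 6
for col in columns:
    y = real_to_complex(col, isign=1, scale=1.0)
    back = complex_to_real(y, n, isign=-1, scale=1.0 / n)
    print(max(abs(a - b) for a, b in zip(col, back)) < 1e-9)
```

Each column is transformed independently, which mirrors the statement that column L of the output depends only on column L of the input.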

These routines have the following arguments:


isign Integer. (input)
Specifies whether to initialize the table array or do the forward or inverse Fourier transform, as
follows:
If isign = 0, the routine initializes table and returns. In this case, the only arguments used or
checked are isign, n, and table.
If isign = +1 or – 1, the value of isign is the sign of the exponent used in the FFT formula.
n Integer. (input)
Size of the transforms (the number of elements in each column of the input and output matrix to
be transformed). If n is not positive, SCFFTM or CSFFTM returns without computing any
transforms.
lot Integer. (input)
The number of transforms to be computed (or "lot size"). This is the number of elements in
each row of the input and output matrix. If lot is not positive, CSFFTM or SCFFTM returns
without computing any transforms.
scale Real. (input)
Scale factor. Each element of the output array is multiplied by scale after taking the transform,
as defined in the preceding formula.
x SCFFTM: Real array of dimension (0:ldx– 1, 0:lot– 1). (input)
CSFFTM: Complex array of dimension (0:ldx– 1, 0:lot– 1). (input)
Input array of values to be transformed.
ldx Integer. (input)
The number of rows in the x array, as it was declared in the calling program (the leading
dimension of x).
SCFFTM: ldx ≥ MAX(n, 1).
CSFFTM: ldx ≥ MAX(n/2 + 1, 1).
y SCFFTM: Complex array of dimension (0:ldy– 1, 0:lot– 1). (output)
CSFFTM: Real array of dimension (0:ldy– 1, 0:lot– 1). (output)
Output array of transformed values.
Each column of the output array, y, is the FFT of the corresponding column of the input array, x,
computed according to the preceding formula. The output array may be equivalenced to the
input array. In that case, the transform is done in place and the input array is overwritten with
the transformed values. In this case, the following conditions on the leading dimensions must
hold:
SCFFTM: ldx = 2ldy.
CSFFTM: ldy = 2ldx.


ldy Integer. (input)
Number of rows in the y array, as declared in the calling program (the leading dimension of y).
SCFFTM: ldy ≥ MAX(n / 2 + 1, 1).
CSFFTM: ldy ≥ MAX(n, 1).
table UNICOS systems: Real array of dimension 100 + 2n. (input or output)
UNICOS/mk systems: Real array of length 2n.
Table of factors and trigonometric functions.
This array must be initialized by a call to SCFFTM (or CSFFTM) with isign = 0.
If isign = 0, table is initialized to contain trigonometric tables needed to compute an FFT of
length n.
If isign = +1 or – 1, the values in table are assumed to be initialized already by a prior call with
isign = 0.
work UNICOS systems: Real array of dimension (2n + 4)lot. (scratch output)
UNICOS/mk systems: Real array of length 2n.
Work array used for intermediate calculations. Its address space must be different from that of
the input and output arrays.
isys Integer array of dimension (0:isys(0)). (input and output)
That is, the first element of the array specifies how many more elements are in the array. Use
isys to specify certain processor-specific parameters or options.
If isys(0) = 0, the default values of such parameters are used. In this case, you can specify the
argument value as the scalar integer constant 0. If isys(0) > 0, isys(0) gives the upper bound of
the isys array; that is, if il = isys(0), user-specified parameters are expected in isys(1) through
isys(il).
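
The size requirements in the preceding argument list can be collected in a small helper. The following Python sketch is illustrative only and is not part of the library; the function name scfftm_sizes and its unicos_mk flag are hypothetical, and it assumes that n/2 denotes integer division, as in Fortran.

```python
def scfftm_sizes(n, lot, unicos_mk=False):
    """Minimum array sizes, in elements, for a real-to-complex SCFFTM call.

    Hypothetical helper: it only restates the bounds from the argument list.
    """
    if n < 1 or lot < 1:
        raise ValueError("n and lot must be positive")
    return {
        "ldx_min": max(n, 1),           # leading dimension of the real input x
        "ldy_min": max(n // 2 + 1, 1),  # leading dimension of the complex output y
        "table": 2 * n if unicos_mk else 100 + 2 * n,
        "work": 2 * n if unicos_mk else (2 * n + 4) * lot,
    }
```

For n = 128 and lot = 50 on a UNICOS system, this gives a table of 356 elements and a work array of 13000 elements. For CSFFTM the bounds on ldx and ldy are exchanged.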
Real-to-complex FFTs
Notice in the preceding formula that there are n real input values and (n/2) + 1 complex output values for
each column. This property is characteristic of real-to-complex FFTs.
The mathematical definition of the Fourier transform takes a sequence of n complex values and transforms it
to another sequence of n complex values. A complex-to-complex FFT routine, such as CCFFTM, will take n
complex input values and produce n complex output values. In fact, one easy way to compute a real-to-
complex FFT is to store the input data x in a complex array, then call routine CCFFTM to compute the FFT.
You get the same answer when using the SCFFTM routine.
A separate real-to-complex FFT routine is more efficient than the equivalent complex-to-complex routine.
Because the input data is real, you can make use of this fact to save almost half of the computational work.
According to the theory of Fourier transforms, for real input data, you have to compute only the first n/2 + 1
complex output values in each column, because the second half of the FFT values in each column can be
computed from the first half of the values by the simple formula:
    $Y_{k,L} = \overline{Y_{n-k,L}}$   for   $n/2 \le k \le n-1$
where $\overline{z}$ denotes the complex conjugate of z.


In fact, in many applications, the second half of the complex output data is never explicitly computed or
stored. Likewise, only the first half of the complex data in each column has to be supplied
for the complex-to-real FFT.
Another implication of FFT theory is that for real input data, the first output value in each column, Y(0, L),
will always be a real number; therefore, the imaginary part will always be 0. If n is an even number, Y(n / 2,
L) will also be real and have a zero imaginary part.
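
The conjugate-symmetry property can be checked numerically on a single column. The following Python sketch is not the library algorithm; it uses a naive O(n^2) transform and assumes the usual form Y(k) = scale * Sum X(j) * exp(isign * 2*pi*i * j*k / n). The names dft, half, and rebuilt are hypothetical.

```python
import cmath

def dft(x, isign=1, scale=1.0):
    """Naive O(n^2) discrete Fourier transform of one column (assumed definition)."""
    n = len(x)
    w = 2j * cmath.pi * isign / n
    return [scale * sum(x[j] * cmath.exp(w * j * k) for j in range(n))
            for k in range(n)]

x = [1.0, 4.0, -2.0, 0.5, 3.0, 7.0, -1.0, 2.0]  # real input, n = 8
n = len(x)
y = dft(x)

# Keep only the first n/2 + 1 values, as a real-to-complex routine would store them.
half = y[: n // 2 + 1]

# Recover the remaining values from Y(k) = conj(Y(n - k)).
rebuilt = half + [half[n - k].conjugate() for k in range(n // 2 + 1, n)]
```

Here rebuilt agrees with the full transform y to rounding error, and the imaginary parts of y[0] and y[n // 2] are zero to rounding error.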
Complex-to-real FFTs
Consider the complex-to-real case. The effect of the computation is given by the preceding formula, but
with X complex and Y real.
In general, the FFT transforms a complex sequence into a complex sequence; however, in a certain
application you may know the output sequence is real, perhaps because the complex input sequence was the
transform of a real sequence. In this case, you can save about half of the computational work.
According to the theory of Fourier transforms, for the output sequence, Y, to be a real sequence, the
following identity on the input sequence, X, must be true:
    $X_{k,L} = \overline{X_{n-k,L}}$   for   $n/2 \le k \le n-1$
In fact, the input values $X_{k,L}$ for $k > n/2$ do not have to be supplied, because they can be inferred from
the first half of the input.
Thus, in the complex-to-real routine, CSFFTM, the arrays can be dimensioned as follows:
      COMPLEX X(0:ldx-1, 0:lot-1)
      REAL Y(0:ldy-1, 0:lot-1)

where ldx ≥ n/2 + 1, ldy ≥ n.


In each column, there are (n / 2) + 1 complex input values and n real output values. Even though only
(n/2) + 1 input values are supplied, the size of the transform is still n in this case, because implicitly the FFT
formula for a sequence of length n is used.
Another implication of the theory is that X(0, L) must be a real number (that is, must have zero imaginary
part). If n is an even number, X(n / 2, L) must also be real. Routine CSFFTM assumes that each of these
values is real; if a nonzero imaginary part is given, it is ignored.
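
A reference model of the complex-to-real case for one column may make the conventions concrete. The following Python sketch is not the library algorithm; it assumes the transform has the form Y(k) = scale * Sum X(j) * exp(isign * 2*pi*i * j*k / n), and the names dft and complex_to_real are hypothetical.

```python
import cmath

def dft(x, isign, scale):
    """Naive O(n^2) discrete Fourier transform of one column (assumed definition)."""
    n = len(x)
    w = 2j * cmath.pi * isign / n
    return [scale * sum(x[j] * cmath.exp(w * j * k) for j in range(n))
            for k in range(n)]

def complex_to_real(xhalf, n, isign, scale):
    """Transform of size n computed from only the first n/2 + 1 complex inputs."""
    full = list(xhalf[: n // 2 + 1])
    # X(0), and X(n/2) for even n, are treated as real; any imaginary part is ignored.
    full[0] = complex(full[0].real, 0.0)
    if n % 2 == 0:
        full[n // 2] = complex(full[n // 2].real, 0.0)
    # Infer the missing inputs from X(k) = conj(X(n - k)).
    full += [full[n - k].conjugate() for k in range(n // 2 + 1, n)]
    return [v.real for v in dft(full, isign, scale)]

data = [3.0, -1.0, 4.0, 1.0, -5.0, 9.0]
spectrum = dft(data, +1, 1.0)[: len(data) // 2 + 1]   # forward, isign = +1, scale = 1.0
back = complex_to_real(spectrum, len(data), -1, 1.0 / len(data))  # inverse
```

With isign = -1 and scale = 1.0/n on the inverse, back reproduces data to rounding error, which mirrors the SCFFTM and CSFFTM pairing described earlier.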

NOTES

Table Initialization
The table array contains the trigonometric tables used in calculation of the FFT. You must initialize this
table by calling the routine with isign = 0 prior to doing the transforms. table does not have to be
reinitialized if the value of the problem size, n, does not change.


Dimensions
In the preceding description, it is assumed that array subscripts were zero-based, as is customary in FFT
applications. Thus, the input and output arrays are declared (for SCFFTM):
      REAL X(0:ldx-1, 0:lot-1)
      COMPLEX Y(0:ldy-1, 0:lot-1)

No change is made in the calling sequence, however, if you prefer to use the more customary Fortran style
with subscripts starting at 1. The same values of ldx and ldy would be passed to the subroutine even if the
input and output arrays were dimensioned as follows:
      REAL X(ldx, lot)
      COMPLEX Y(ldy, lot)

Performance Tips
This routine computes an FFT for any value of n, but the performance for a given value of n depends on the
prime factorization of n. This fact is characteristic of all FFT algorithms.
Fastest performance is realized when n is a power of 2, in which case the number of floating-point
operations is approximately:
    $\frac{5}{2} \cdot lot \cdot n \cdot \log_2(n)$
If n contains factors of 3, computation time is slightly longer, because more floating-point operations are
required. It is longer still if n contains powers of 5. Slowest performance is when n is a prime number, in
which case the number of floating-point operations is approximately $4 \cdot lot \cdot n^2$.
The kernel routines are optimized for values of n that are products of powers of 2, 3, and 5. (Because the
kernel routines have a special case for multiples of 4, even powers of 2 will be slightly faster than odd
powers of 2.)
In the UNICOS implementation, to avoid memory bank conflicts, it is very important to make the leading
dimensions of the arrays odd numbers (or, if that is not possible, make them an odd multiple of 2). To
attain best vectorization performance, the lot size should be at least 64, and preferably it should be a multiple
of 64.
Neither SCFFTM nor CSFFTM is optimized on UNICOS/mk systems.
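
The two operation counts quoted in this subsection, and the check for sizes that suit the optimized kernels, can be written as a short Python sketch. The function names are hypothetical, and the counts are only the approximations stated above, not measurements.

```python
import math

def smooth_235(n):
    """True if n is a product of powers of 2, 3, and 5 only."""
    if n < 1:
        return False
    for p in (2, 3, 5):
        while n % p == 0:
            n //= p
    return n == 1

def flops_power_of_two(n, lot):
    """Approximate operation count when n is a power of 2: (5/2) * lot * n * log2(n)."""
    return 2.5 * lot * n * math.log2(n)

def flops_prime(n, lot):
    """Approximate operation count when n is prime: 4 * lot * n**2."""
    return 4.0 * lot * n * n
```

For lot = 50, the estimate for n = 128 is 112000 operations, while the estimate for the nearby prime n = 127 is 3225800, roughly 29 times as many.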
Implementation-dependent Items
The Standard FFT routines were designed so that they could be implemented efficiently on many different
architectures. The calling sequence is the same in any implementation. Certain details, however, depend on
the particular implementation. These details are confined to three areas:
• The first area is the size of the table and work arrays. Different systems may need different sizes. No
change is required to the subroutine call, but you might have to change the array sizes in the
DIMENSION or type statements that declare the arrays.


• The second area is the isys parameter array, an argument that gives certain implementation-specific
information. All features and functions of the FFT routines that are specific to any particular
implementation are confined to this isys array. On any implementation, you can use the default values by
specifying an argument value of 0.
No special options are supported; therefore, you can always specify an isys argument as constant 0. Other
options may be provided in subsequent software releases.
• The third area is the issue of which problem sizes or dimensions give optimal performance in a particular
implementation. See the Performance Tips subsection.

EXAMPLES

Example 1: Initialize the real array TABLE in preparation for doing an FFT of size 128. In this case,
only the isign, n, and table arguments are used; you may use dummy arguments or zeros for the other
arguments in the subroutine call.
      REAL TABLE(100 + 2*128)
      CALL SCFFTM(0, 128, 1, 0.0, DUMMY, 1, DUMMY, 1,
     &            TABLE, DUMMY, 0)

Example 2: X is a real array of dimension (0:128, 0:55), and Y is a complex array of dimension (0:64,
0:55). The first 128 elements in each column of X contain data; the extra element forces an odd leading
dimension. Take the FFT of the first 50 columns of X and store the results in the first 50 columns of Y.
Before taking the FFT, initialize the TABLE array, as in example 1.
      REAL X(0:128, 0:55)
      COMPLEX Y(0:64, 0:55)
      REAL TABLE(100 + 2*128)
      REAL WORK((2*128 + 4)*50)
      ...
      CALL SCFFTM(0, 128, 50, 1.0, X, 129, Y, 65, TABLE, WORK, 0)
      CALL SCFFTM(1, 128, 50, 1.0, X, 129, Y, 65, TABLE, WORK, 0)

Example 3: With X and Y as in example 2, take the inverse FFT of Y and store it back in X. The scale
factor 1/128 is used. Assume that the TABLE array is initialized already.
      CALL CSFFTM(-1, 128, 50, 1.0/128.0, Y, 65, X, 129,
     &            TABLE, WORK, 0)

Example 4: Do the same computation as in example 2, but assume that the lower bound of each array is 1,
rather than 0. No change is made in the subroutine calls.


      REAL X(129, 56)
      COMPLEX Y(65, 56)
      ...
      CALL SCFFTM(0, 128, 50, 1.0, X, 129, Y, 65, TABLE, WORK, 0)
      CALL SCFFTM(1, 128, 50, 1.0, X, 129, Y, 65, TABLE, WORK, 0)

Example 5: Do the same computation as in example 4, but equivalence the input and output arrays to save
storage space. In this case, a row must be added to X, because it is equivalenced to a complex array. The
leading dimension of X is two times an odd number; therefore, memory bank conflicts are minimal. Assume
that TABLE is initialized already.
      REAL X(130, 56)
      COMPLEX Y(65, 56)
      EQUIVALENCE ( X(1, 1), Y(1, 1) )
      ...
      CALL SCFFTM(1, 128, 50, 1.0, X, 130, Y, 65, TABLE, WORK, 0)

SEE ALSO
CCFFT(3S), CCFFTM(3S), SCFFT(3S)



SCNVL1D ( 3S ) SCNVL1D ( 3S )

NAME
SCNVL1D – Computes a real one-dimensional (1D) convolution of two vectors

SYNOPSIS
CALL SCNVL1D (domain, isign, shape, symm, a, m, b, n, c, inc, table, work)

IMPLEMENTATION
UNICOS/mk systems
This routine executes on a single PE and uses only private data.

DESCRIPTION
SCNVL1D computes the convolution of a real filter vector a with a real data vector b, producing the output
vector c.
Let
a = a(1), a(2), . . . , a(m)
b = b(1), b(2), . . . , b(n)
be the filter and data vectors. This routine requires that m ≤ n. If m is greater than n, then a and b must be
interchanged in the calling sequence.
The convolution operation can be defined either with or without a zero-padded data vector. If we assume a
zero-padded data vector, then the convolution output sequence would be the following:

    $c(i) = \sum_{j=1}^{i} a(j) \, b(i-j+1)$   for i = 1, ..., m-1
    $c(i) = \sum_{j=1}^{m} a(j) \, b(i-j+1)$   for i = m, ..., n
    $c(i) = \sum_{j=i+1-n}^{m} a(j) \, b(i-j+1)$   for i = n+1, ..., n+m-1
If we assume a data vector without zero padding, then the convolution sequence would be:

    $c(i-m+1) = \sum_{j=1}^{m} a(j) \, b(i-j+1)$   for i = m, ..., n
The routine allows the user to choose between the two forms of the output product: with zero-padded data
or without zero-padded data. For computation of the convolution product with a zero-padded data vector,
the character variable shape is set to ’F’ or ’f’ to indicate the full length output vector (n+m– 1). For
computation of the convolution product without a zero-padded data vector, the character variable shape is
set to ’V’ or ’v’ to indicate an output vector computed with valid data.


The routine allows the user to choose between computing the convolution product in either the time or
frequency domain. This can be done by setting the character variable domain to either ’F’ or ’f’ to indicate
frequency domain or ’T’ or ’t’ to indicate time domain computation.
If the user chooses to compute the convolution product in the frequency domain, then the variable isign is
used to decide if the trig tables are being initialized or the convolution product is being computed. The
SCNVL1D routine would have to be called twice (like the FFT routines), once to initialize the trig tables and
the second time to actually compute the convolution product.
If the user chooses to compute the convolution in the time domain, then the variable isign is ignored and the
convolution product is computed in the first and only pass. In addition, the two other vectors, table and
work, are also ignored.
SCNVL1D also provides a feature of immense importance in several signal processing applications. It allows
the user to set a decimation rate for the output vector. This is done using the argument inc. If inc is set to
1, then all the output elements are computed, i.e., (n+m– 1) for zero-padded convolution and (n– m+1)
elements for convolution without zero padding. Setting inc to a positive integer greater than 1 makes the routine
compute every inc-th element of the convolution output. For example, setting inc = 2 means that the
routine computes every other output element.
For example, if m = 10, n = 200 and shape = ’V’, then setting inc = 1 results in 200 - 10 + 1 = 191
elements of the output vector being computed and stored in c(1), . . . , c(191). If inc is set to 2, then
trunc((n - m + inc)/inc) = 96 elements of the output vector are computed and stored in c(1), . . . , c(96).
Here, trunc denotes the truncation operation.
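
The element count for a decimated output can be expressed as a one-line function. The following Python sketch uses hypothetical function names. The formula for shape = ’V’ is the one given above; the formula for shape = ’F’ is an assumption obtained by applying the same rule to an output of length n+m-1, and it agrees with the results listed under EXAMPLES.

```python
def valid_output_length(m, n, inc):
    """Number of elements stored in c for shape = 'V': trunc((n - m + inc) / inc)."""
    if m > n or inc < 1:
        raise ValueError("requires m <= n and inc >= 1")
    return (n - m + inc) // inc

def full_output_length(m, n, inc):
    """Assumed analogue for shape = 'F', where the undecimated length is n + m - 1."""
    if m > n or inc < 1:
        raise ValueError("requires m <= n and inc >= 1")
    return (n + m - 2 + inc) // inc
```

For m = 10 and n = 200, valid_output_length returns 191 for inc = 1 and 96 for inc = 2.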
Another feature of SCNVL1D that is of importance in certain signal processing applications such as FIR
filters is that the filter vector may be symmetric or unsymmetric. If the filter vector is symmetric, the routine
saves some computation by adding the elements of b that contribute equally to the convolution sum. The
filter vector a can be either even or odd symmetric depending on whether m is even or odd. The character
argument symm can be set to ’S’ or ’s’ to indicate symmetry and ’N’ or ’n’ to indicate that a is
non-symmetric.
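
The saving from a symmetric filter can be illustrated as follows. This Python sketch is not the library implementation; it shows one way to add the pair of data elements that share a filter weight before multiplying, for the valid-data form of the convolution. It assumes symmetry means a(j) = a(m+1-j), and the function names are hypothetical.

```python
def valid_conv_direct(a, b):
    """Valid-data convolution with one multiply per filter element (0-based lists)."""
    m, n = len(a), len(b)
    return [sum(a[j] * b[i - j] for j in range(m)) for i in range(m - 1, n)]

def valid_conv_symmetric(a, b):
    """Same result for a symmetric filter, using about half as many multiplies."""
    m, n = len(a), len(b)
    out = []
    for i in range(m - 1, n):
        acc = 0
        for j in range(m // 2):
            # b[i - j] and b[i - (m - 1 - j)] are weighted by the same value a[j].
            acc += a[j] * (b[i - j] + b[i - (m - 1 - j)])
        if m % 2 == 1:
            # Odd m: the middle filter element has no partner.
            acc += a[m // 2] * b[i - m // 2]
        out.append(acc)
    return out
```

Both functions return the same values whenever a is symmetric, for even or odd m.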
This routine has the following arguments:
domain Character*1. (input)
Specifies whether the computation should proceed in the frequency domain or the time domain.
domain = ’F’ or ’f’ for frequency domain computation
domain = ’T’ or ’t’ for time domain computation
isign Integer. (input)
Specifies if the routine should initialize the trig tables or proceed with the computation of the
convolution product in the frequency domain. If the computation is to be performed in the time
domain then isign is ignored.
isign = 0 to set the trig tables
isign = 1 to compute the convolution product
shape Character*1. (input)
Specifies if the computation should proceed with or without a zero padded data vector.


shape = ’F’ or ’f’ for a zero padded convolution product
shape = ’V’ or ’v’ for a convolution with valid data (no zero padding)
symm Character*1. (input)
Specifies if the filter vector is symmetric or nonsymmetric.
symm = ’S’ or ’s’ for filter vectors that are symmetric
symm = ’N’ or ’n’ for non-symmetric filter vectors
Even if the filter vector is symmetric, the entire length of the vector must be stored.
a Real 1D array. (input)
An input array containing the filter vector.
m Integer. (input)
Size of the filter vector.
b Real 1D array. (input)
An input array containing the data vector.
n Integer. (input)
Size of the data vector. n ≥ m.
c Real 1D array. (output)
An output array containing the convolution product of a and b.
inc Integer. (input)
The decimation factor in the computation of elements of the convolution product of a and b. inc
should be a positive integer.
table Real array of size 4(m+n). (input or output)
Table of factors and trigonometric functions. This is used only when the convolution is to be
computed in the frequency domain. If the convolution is computed in the time domain, this
argument is ignored.
work Real array of size 8(m+n). (workspace)
Workspace used when the convolution is computed in the frequency domain. Ignored when
computation is carried out in the time domain.

NOTES
The flexibility of choosing between frequency domain and time domain computation of the convolution has
been provided so that the user may experiment between the two and choose the one that is faster for the
particular problem size at hand.
Usually when n is big and m is small, time domain computation is less expensive. If both m and n are large,
then the frequency domain computation is less expensive. For all problem sizes in between, the user is
encouraged to choose between the two modes depending on the numerical complexity of the two algorithms,
the amount of workspace available, and experimentation.


EXAMPLES
If m = 5, n = 10, a = [1 2 3 4 5], and b = [1 2 3 4 5 6 7 8 9 10], then shape = ’F’ and inc = 1 would yield
c = [1 4 10 20 35 50 65 80 95 110 114 106 85 50].
If shape = ’F’ and inc = 3, then c = [1 20 65 110 85].
If shape = ’V’ and inc = 1, then c = [35 50 65 80 95 110].
If shape = ’V’ and inc = 2, then c = [35 65 95].
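
The results above can be reproduced with a direct time-domain reference model. The following Python sketch is not the SCNVL1D implementation; the function name scnvl1d_reference is hypothetical, and it models only the shape and inc arguments.

```python
def scnvl1d_reference(a, b, shape="F", inc=1):
    """Direct convolution of filter a with data b, then decimation by inc."""
    m, n = len(a), len(b)
    if m > n:
        raise ValueError("requires m <= n; interchange a and b")
    if inc < 1:
        raise ValueError("inc must be a positive integer")
    # Full (zero-padded) product of length n + m - 1.
    full = [0] * (n + m - 1)
    for j, aj in enumerate(a):
        for k, bk in enumerate(b):
            full[j + k] += aj * bk
    if shape in ("F", "f"):
        c = full
    elif shape in ("V", "v"):
        c = full[m - 1:n]  # the n - m + 1 elements computed from valid data only
    else:
        raise ValueError("shape must be 'F' or 'V'")
    return c[::inc]  # keep every inc-th element
```

Calling scnvl1d_reference([1, 2, 3, 4, 5], list(range(1, 11)), "V", 2) returns [35, 65, 95], matching the last case above.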

SEE ALSO
SCNVL2D(3S)



SCNVL2D ( 3S ) SCNVL2D ( 3S )

NAME
SCNVL2D – Computes a real two-dimensional (2D) convolution of two matrices

SYNOPSIS
CALL SCNVL2D (domain, isign, shape, symm, A, lda, m1, m2, B, ldb, n1, n2, C, ldc,
inc1, inc2, table, work)

IMPLEMENTATION
UNICOS/mk systems
This routine executes on a single PE and uses only private data.

DESCRIPTION
SCNVL2D computes the convolution of a real filter matrix A with a real data matrix B, producing the output
matrix C.
Let the following be the filter and data matrices:
A = a(j,i) 1≤j≤m1, 1≤i≤m2
B = b(j,i) 1≤j≤n1, 1≤i≤n2
This routine requires that m1 ≤ n1 and m2 ≤ n2. If m1 > n1 and m2 > n2, then A and B must be
interchanged in the calling sequence. The following cases cannot be handled by this routine: m1 > n1 and
m2 ≤ n2, m1 ≤ n1 and m2 > n2.
The convolution operation can be defined either with or without a zero-padded data matrix. If we assume a
zero-padded data matrix, then the convolution output sequence would be the following:

    $c(j,i) = \sum_{l=r1}^{r2} \sum_{k=1}^{j} a(k,l) \, b(j-k+1,\, i-l+1)$   for j = 1, ..., m1-1
where:
    r1 = 1 and r2 = i   for i = 1, ..., m2-1
    r1 = 1 and r2 = m2   for i = m2, ..., n2
    r1 = i-n2+1 and r2 = m2   for i = n2+1, ..., n2+m2-1


    $c(j,i) = \sum_{l=r1}^{r2} \sum_{k=1}^{m1} a(k,l) \, b(j-k+1,\, i-l+1)$   for j = m1, ..., n1
where:
    r1 = 1 and r2 = i   for i = 1, ..., m2-1
    r1 = 1 and r2 = m2   for i = m2, ..., n2
    r1 = i-n2+1 and r2 = m2   for i = n2+1, ..., n2+m2-1

    $c(j,i) = \sum_{l=r1}^{r2} \sum_{k=j-n1+1}^{m1} a(k,l) \, b(j-k+1,\, i-l+1)$   for j = n1+1, ..., n1+m1-1
where:
    r1 = 1 and r2 = i   for i = 1, ..., m2-1
    r1 = 1 and r2 = m2   for i = m2, ..., n2
    r1 = i-n2+1 and r2 = m2   for i = n2+1, ..., n2+m2-1
If we assume a data matrix without zero padding, then the convolution sequence would be:
    $c(j-m1+1,\, i-m2+1) = \sum_{l=1}^{m2} \sum_{k=1}^{m1} a(k,l) \, b(j-k+1,\, i-l+1)$
    for j = m1, ..., n1 and i = m2, ..., n2
The routine allows the user to choose between the two forms of the convolution product shown previously
with zero-padded data or without zero-padded data.
For computation of the convolution product with a zero padded data matrix, the character argument shape is
set to ’F’ or ’f’ to indicate the full shape output matrix (n1+m1– 1) * (n2+m2– 1). For computation of the
convolution product without a zero-padded data matrix, the character variable shape is set to ’V’ or ’v’ to
indicate an output matrix computed with valid data. The size of the output matrix would be (n1– m1+1) x
(n2– m2+1).
The user cannot specify a zero-padded convolution in one dimension and a non zero-padded convolution in
the other.
The routine also allows the user to choose between computing the convolution product in either the time or
frequency domain. This can be done by setting the character variable domain to either ’F’ or ’f’ to indicate
frequency domain or ’T’ or ’t’ to indicate time domain computation.
If the user chooses to compute the convolution product in the frequency domain, then the variable isign is
used to decide if the trig tables are being initialized or the convolution product is being computed. The
SCNVL2D routine would have to be called twice (like the FFT routines), once to initialize the trig tables and
the second time to actually compute the convolution product.


If the user chooses to compute the convolution in the time domain, then the variable isign is ignored and the
convolution product is computed in the first and only pass. In addition, the two other vectors, table and
work, are also ignored.
SCNVL2D also provides a feature of immense importance in several signal processing applications. It allows
the user to set a decimation rate in both dimensions for the output matrix. This is done using the arguments
inc1 and inc2. If inc1 is set to 1, then all the output elements along the first dimension are computed, i.e.,
(n1+m1– 1) for zero-padded convolution and (n1– m1+1) elements for non zero-padded convolution. The
same is the case for inc2 = 1 along the second dimension. Setting inc1 and inc2 to a positive integer greater
than 1 makes the routine compute every (inc1)th and (inc2)th element of the convolution output in the two
dimensions. For example, setting inc1 = 1 and inc2 = 2 means that the routine computes every other column
in the convolution product.
For example, if m1 = 10, m2 = 12, n1 = 200, n2 = 255, and shape = ’V’, then setting inc1 = 1 and inc2 = 2
would result in (200– 10+1) *trunc((255– 12+2)/2) = 191*122 elements of the output matrix being computed
and stored in c(1,1), . . . , c(191,122). Here trunc() is used to indicate the truncation operation.
Another feature of SCNVL2D that is of importance in certain signal processing applications such as FIR
filters is that the filter matrix may be symmetric or unsymmetric. If the filter matrix is symmetric, the
routine saves some computation by adding the elements of B that contribute equally to the convolution sum.
The filter matrix A can be either even or odd symmetric depending on whether m1 and m2 are even or odd.
The character argument symm can be set to ’S’ or ’s’ to indicate symmetry and ’N’ or ’n’ to indicate that A
is non-symmetric.
One restriction in exploiting the symmetry feature is that m1 and m2 must both be odd or both be even. The
user cannot have m1 even and m2 odd if symmetry is to be exploited. If symm = ’S’ is specified when m1 =
10 and m2 = 11, for example, then the routine ignores the symm flag and computes the convolution product
assuming no symmetry.
This routine has the following arguments:
domain Character*1. (input)
Specifies if the computation should proceed in the frequency domain or the time domain.
domain = ’F’ or ’f’ for frequency domain computation
domain = ’T’ or ’t’ for time domain computation
isign Integer. (input)
Specifies if the routine should initialize the trig tables or proceed with the computation of the
convolution product in the frequency domain. If the computation is to be performed in the time
domain, isign is ignored.
isign = 0 to set the trig tables
isign = 1 to compute the convolution product
shape Character*1. (input)
Specifies if the computation should proceed with or without a zero-padded data matrix.
shape = ’F’ or ’f’ for a zero-padded convolution product
shape = ’V’ or ’v’ for a convolution with valid data (no zero padding)


symm Character*1. (input)
Specifies if the filter matrix is symmetric or nonsymmetric.
symm = ’S’ or ’s’ for filter matrices that are symmetric
symm = ’N’ or ’n’ for non-symmetric filter matrices
Even if the filter matrix is symmetric, the entire matrix must be stored.
A Real 2D array. (input)
An input array containing the filter matrix.
lda Integer. (input)
Leading dimension of the filter matrix.
m1 Integer. (input)
Number of rows in the filter matrix.
m2 Integer. (input)
Number of columns in the filter matrix.
B Real 2D array. (input)
Input array containing the data matrix.
ldb Integer. (input)
Leading dimension of the data matrix.
n1 Integer. (input)
Number of rows in the data matrix.
n2 Integer. (input)
Number of columns in the data matrix.
C Real 2D array. (output)
An output array containing the convolution product of A and B.
ldc Integer. (input)
Leading dimension of the output matrix.
inc1 Integer. (input)
Specifies the decimation factor along the rows in the computation of elements of the convolution
product of A and B.
inc1 should be a positive integer.
inc2 Integer. (input)
Specifies the decimation factor along the columns in the computation of elements of the
convolution product of A and B.
inc2 should be a positive integer.
table Real array of size 4(m1+m2+n1+n2). (input or output)
Table of factors and trigonometric functions.
This is used only when the convolution is to be computed in the frequency domain. If the
convolution is computed in the time domain, this argument is ignored.


work Real array of size 3(m1+n1)(m2+n2). (workspace)
Workspace used when the convolution is computed in the frequency domain. Ignored when
computation is carried out in the time domain.

NOTES
The flexibility of choosing between frequency domain and time domain computation of the convolution has
been provided so that the user may experiment between the two and choose the one that is faster for the
particular problem size at hand.
Usually when n1 and n2 are big and m1 and m2 are small, time domain computation is less expensive. If
m1, m2, n1, and n2 are all large, the frequency domain computation is less expensive. For all problem
sizes in between, the user is encouraged to choose between the two modes depending on the numerical
complexity of the two algorithms, the amount of workspace available, and experimentation.

EXAMPLES
If

    A =  1  2  3
         2  3  1
         3  1  2

and

    B =  1  2  3  4  5
         5  1  2  3  4
         4  5  1  2  3
         3  4  5  1  2
         2  3  4  5  1

then shape = ’F’ and inc1 = inc2 = 1 would yield

    C =  1  4 10 16 22 22 15
         7 18 32 29 41 36 17
        17 37 48 51 54 40 23
        26 40 60 48 51 28 17
        20 43 57 60 48 31 11
        13 27 44 41 38 12  5
         6 11 19 25 16 11  2

If shape = ’F’ and inc1 = 2, inc2 = 1, then


    C =  1  4 10 16 22 22 15
        17 37 48 51 54 40 23
        20 43 57 60 48 31 11
         6 11 19 25 16 11  2

If shape = ’V’ and inc1 = 2, inc2 = 1, then


    C = 48 51 54
        57 60 48
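
The matrices above can be reproduced with a direct time-domain reference model. The following Python sketch is not the SCNVL2D implementation; the function name scnvl2d_reference is hypothetical, matrices are lists of rows, and only the shape, inc1, and inc2 arguments are modeled.

```python
def scnvl2d_reference(A, B, shape="F", inc1=1, inc2=1):
    """Direct 2D convolution of filter A with data B, then decimation."""
    m1, m2 = len(A), len(A[0])
    n1, n2 = len(B), len(B[0])
    if m1 > n1 or m2 > n2:
        raise ValueError("requires m1 <= n1 and m2 <= n2")
    # Full (zero-padded) product of size (n1 + m1 - 1) by (n2 + m2 - 1).
    full = [[0] * (n2 + m2 - 1) for _ in range(n1 + m1 - 1)]
    for k in range(m1):
        for l in range(m2):
            for p in range(n1):
                for q in range(n2):
                    full[k + p][l + q] += A[k][l] * B[p][q]
    if shape in ("F", "f"):
        rows = full
    elif shape in ("V", "v"):
        # Keep the (n1 - m1 + 1) by (n2 - m2 + 1) block computed from valid data only.
        rows = [row[m2 - 1:n2] for row in full[m1 - 1:n1]]
    else:
        raise ValueError("shape must be 'F' or 'V'")
    # inc1 decimates along the first dimension, inc2 along the second.
    return [row[::inc2] for row in rows[::inc1]]
```

With the A and B shown above, shape = ’V’, inc1 = 2, and inc2 = 1, the function returns [[48, 51, 54], [57, 60, 48]].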

SEE ALSO
SCNVL1D(3S)



SCONV ( 3S ) SCONV ( 3S )

NAME
SCONV – Performs the convolution of two sequences of real numbers

SYNOPSIS
CALL SCONV (nh, nx, ny, h, x, y)

IMPLEMENTATION
UNICOS/mk systems

DESCRIPTION
SCONV computes the convolution of the filter sequence h with the data sequence x, producing the output
sequence y.
Suppose h and x are two sequences of real numbers, having nh and nx elements, respectively. As is
customary in signal processing applications, let the subscripts start at 0, so
h = h(0), h(1), . . ., h(nh – 1)
x = x(0), x(1), . . ., x(nx – 1)
The "convolution product", y, is the sequence having elements defined by:
y(0) = h(nh– 1) . x(0) + h(nh– 2) . x(1) + . . . + h(0) . x(nh – 1)
y(1) = h(nh– 1) . x(1) + h(nh– 2) . x(2) + . . . + h(0) . x(nh)
y(2) = h(nh– 1) . x(2) + h(nh– 2) . x(3) + . . . + h(0) . x(nh + 1)
This example definition assumes nx > nh.
The precise definition of the convolution is:
    $y_k = \sum_{j=0}^{\min(nh-1,\; nx-1-k)} h(nh-1-j) \cdot x(k+j)$   for $0 \le k \le ny-1$
where an empty sum (upper limit less than 0) is taken to be zero.

The number of terms in the output sequence is specified by an argument, ny. If ny < nx, the output sequence
is just truncated. If ny > nx, zeros are appended to the output sequence.
By choosing ny > nx – nh + 1, the routine does what is sometimes called "post-tapered" convolution. The
effect is as though the data sequence, x, were padded on the end with zeros, except that no zeros are actually
stored and no multiplications by zero are actually done.
This routine has the following arguments:
nh Integer. (input)
Specifies the number of elements in the filter sequence, h. nh ≥ 0.
nx Integer. (input)
Specifies the number of elements in the data sequence, x. nx ≥ 0.
ny Integer. (input)
Specifies the number of elements in the output sequence, y. ny ≥ 0.


h Real array of dimension (0:nh−1). (input)
Specifies the input sequence of filter values.
x Real array of dimension (0:nx−1). (input)
Specifies the input sequence of data values.
y Real array of dimension (0:ny−1). (output)
Specifies the output sequence of convolution values.

NOTES
If ny = 0, the routine just returns. If either nh = 0 or nx = 0, the routine will zero the first ny elements in y
and return.

EXAMPLES

      real h(0:2), x(0:3), y(0:7)
      data (h(i), i = 0, 2) / 1.0, 2.0, 3.0 /
      data (x(i), i = 0, 3) / 4.0, 5.0, 6.0, 7.0 /
      call sconv(3, 4, 8, h, x, y)
      print *, y

The output produced is:


      28., 34., 32., 21., 0., 0., 0., 0.
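
The output above can be reproduced with a direct reference model of the definition. The following Python sketch is not the SCONV implementation; the function name sconv_reference is hypothetical, and it assumes that the upper summation limit is MIN(nh−1, nx−1−k), which is consistent with the example.

```python
def sconv_reference(nh, nx, ny, h, x):
    """y(k) = sum over j of h(nh-1-j) * x(k+j), using only stored elements of x."""
    y = []
    for k in range(ny):
        jmax = min(nh - 1, nx - 1 - k)  # last j for which x(k+j) exists
        # When jmax < 0 the range is empty, so the term is zero (post-tapering).
        y.append(sum(h[nh - 1 - j] * x[k + j] for j in range(jmax + 1)))
    return y
```

For h = [1.0, 2.0, 3.0], x = [4.0, 5.0, 6.0, 7.0], and ny = 8, the function returns 28, 34, 32, 21 followed by four zeros.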

SEE ALSO
SCORR(3S), SCORRS(3S)



SCORR ( 3S ) SCORR ( 3S )

NAME
SCORR – Performs the correlation of two sequences of real numbers

SYNOPSIS
CALL SCORR (nh, nx, ny, h, x, y)

IMPLEMENTATION
UNICOS/mk systems

DESCRIPTION
SCORR computes the correlation of the filter sequence h with the data sequence x, producing the output
sequence y.
Suppose h and x are two sequences of real numbers, having nh and nx elements, respectively. As is
customary in signal processing applications, let the subscripts start at 0, so
h = h(0), h(1), . . ., h(nh – 1)
x = x(0), x(1), . . ., x(nx – 1)
The "correlation product", y, is the sequence having elements defined by:
y(0) = h(0) . x(0) + h(1) . x(1) + . . . + h(nh – 1) . x(nh – 1)
y(1) = h(0) . x(1) + h(1) . x(2) + . . . + h(nh – 1) . x(nh)
y(2) = h(0) . x(2) + h(1) . x(3) + . . . + h(nh – 1) . x(nh + 1)
This example definition assumes that nx ≥ nh.
The precise definition is as follows:
    $y_k = \sum_{j=0}^{\min(nh-1,\; nx-1-k)} h(j) \cdot x(k+j)$   for $0 \le k \le ny-1$
where an empty sum (upper limit less than 0) is taken to be zero.

The number of terms in the output sequence is specified by the argument ny. If ny < nx, the output sequence
is just truncated. If ny > nx, zeros are appended to the output sequence.
By choosing ny > nx – nh + 1, the routine does what is sometimes called "post-tapered" correlation. The
effect is as though the data sequence, x, were padded on the end with zeros, except that no zeros are actually
stored and no multiplications by zero are actually done.
This routine has the following arguments:
nh Integer. (input)
Specifies the number of elements in the filter sequence, h. nh ≥ 0.
nx Integer. (input)
Specifies the number of elements in the data sequence, x. nx ≥ 0.
ny Integer. (input)
Specifies the number of elements in the output sequence, y. ny ≥ 0.
h Real array of dimension (0:nh−1). (input)
Specifies the input sequence of filter values.


x Real array of dimension (0:nx−1). (input)
Specifies the input sequence of data values.
y Real array of dimension (0:ny−1). (output)
Specifies the output sequence of correlation values.

NOTES
If ny = 0, the routine returns. If either nh = 0 or nx = 0, the routine will zero the first ny elements in y and
return.

EXAMPLES

      real h(0:2), x(0:3), y(0:7)
      data (h(i), i = 0, 2) / 1.0, 2.0, 3.0 /
      data (x(i), i = 0, 3) / 4.0, 5.0, 6.0, 7.0 /
      call scorr(3, 4, 8, h, x, y)
      print *, y

The output produced is:


32., 38., 20., 7., 0., 0., 0., 0.
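The definition of the correlation product can be checked against this example with a plain loop nest. The following is a minimal sketch, not the library implementation: refcorr is a hypothetical name introduced here for illustration, and the inner loop bound assumes that terms with k + j > nx – 1 are simply omitted, which is consistent with the output shown above.

```fortran
      program corr_demo
      implicit none
      integer, parameter :: nh = 3, nx = 4, ny = 8
      real h(0:nh-1), x(0:nx-1), y(0:ny-1)
      integer i
      data (h(i), i = 0, nh-1) / 1.0, 2.0, 3.0 /
      data (x(i), i = 0, nx-1) / 4.0, 5.0, 6.0, 7.0 /
      call refcorr(nh, nx, ny, h, x, y)
      print *, y
      end program corr_demo

      ! Hypothetical reference routine (not part of the library).
      subroutine refcorr(nh, nx, ny, h, x, y)
      implicit none
      integer nh, nx, ny, j, k
      real h(0:*), x(0:*), y(0:*)
      do k = 0, ny - 1
         y(k) = 0.0
         ! Terms with k + j > nx - 1 are skipped, not multiplied by 0.
         do j = 0, min(nh - 1, nx - 1 - k)
            y(k) = y(k) + h(j) * x(k + j)
         end do
      end do
      end subroutine refcorr
```

Built with a Fortran 90 compiler, this should print the values 32, 38, 20, and 7 followed by four zeros; the spacing of list-directed output varies by compiler. When nh = 0 or nx = 0 the inner loop never executes, so the first ny elements of y are zeroed, matching the behavior described under NOTES.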

SEE ALSO
SCONV(3S), SCORRS(3S)



SCORRS ( 3S ) SCORRS ( 3S )

NAME
SCORRS – Performs the correlation of two sequences of real numbers (symmetric filter)

SYNOPSIS
CALL SCORRS (nh, nx, ny, h, x, y)

IMPLEMENTATION
UNICOS/mk systems

DESCRIPTION
SCORRS computes the correlation of the symmetric filter sequence h with the data sequence x, producing the
output sequence y. The filter, h, is assumed to be symmetric about its middle.
The computation carried out by SCORRS is exactly the same as that done by routine SCORR, with one
exception: the filter, h, is assumed to be symmetric, so only the first half of the elements are accessed. The
values of the second half are inferred from the first half and do not actually have to be supplied by the
calling routine.
To review the definition of correlation (not necessarily assuming a symmetric filter), suppose h and x are two
sequences of real numbers, having nh and nx elements, respectively. As is customary in signal processing
applications, let the subscripts start at 0, so
h = h(0), h(1), . . ., h(nh – 1)
x = x(0), x(1), . . ., x(nx – 1)
The "correlation product", y, is the sequence having elements defined by:
y(0) = h(0) . x(0) + h(1) . x(1) + . . . + h(nh – 1) . x(nh – 1)
y(1) = h(0) . x(1) + h(1) . x(2) + . . . + h(nh – 1) . x(nh)
y(2) = h(0) . x(2) + h(1) . x(3) + . . . + h(nh – 1) . x(nh + 1)
This example definition assumes that nx ≥ nh.
The precise definition of correlation is as follows:
y(k) = Σ h(j) . x(k + j), summed over 0 ≤ j ≤ min(nh – 1, nx – 1 – k), for 0 ≤ k ≤ ny – 1
The sum is empty, and y(k) = 0, when k ≥ nx.

The SCORRS routine makes the assumption that the filter is symmetric; in other words, that
h(nh – 1 – j) = h(j), for 0 ≤ j ≤ nh / 2.
Only the elements h(0) through h(nh/2) are accessed by the routine. The last half of the filter values are not
accessed and do not actually have to be supplied by the calling routine.
The number of terms in the output sequence is specified by an argument, ny. If ny < nx, then the output
sequence is just truncated. If ny > nx, then zeros are appended to the output sequence.


By choosing ny > nx – nh + 1, the routine does what is sometimes called "post-tapered" correlation. The
effect is as though the data sequence, x, were padded on the end with zeros, except that no zeros are actually
stored and no multiplications by zero are actually done.
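To see where the tapering begins, it can help to count how many products contribute to each output element. The short program below is an illustrative sketch only; the sizes nh = 3, nx = 4, and ny = 8 are example values chosen here, and nterms is a hypothetical variable name. It assumes the general (full filter) definition of correlation, in which a product h(j) . x(k + j) contributes only when k + j ≤ nx – 1.

```fortran
      program term_count
      implicit none
      integer, parameter :: nh = 3, nx = 4, ny = 8
      integer k, nterms
      do k = 0, ny - 1
         ! Number of products h(j) * x(k + j) with k + j <= nx - 1.
         nterms = max(0, min(nh, nx - k))
         print *, 'k =', k, ' terms =', nterms
      end do
      end program term_count
```

For these sizes the counts are 3, 3, 2, 1, 0, 0, 0, 0. Only the first nx – nh + 1 = 2 elements use the whole filter; elements 2 and 3 are the tapered ones, and every element from k = nx onward is zero.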
This routine has the following arguments:
nh Integer. (input)
Specifies the number of elements in the filter sequence, h. nh ≥ 0.
nx Integer. (input)
Specifies the number of elements in the data sequence, x. nx ≥ 0.
ny Integer. (input)
Specifies the number of elements in the output sequence, y. ny ≥ 0.
h Real array of dimension (0, nh/2). (input)
Specifies the input sequence of filter values. Only values h(0) through h(nh/2) are accessed; the
second half of the filter values are inferred from the symmetry of h.
x Real array of dimension (0, nx−1). (input)
Specifies the input sequence of data values.
y Real array of dimension (0, ny−1). (output)
Specifies the output sequence.

NOTES
If ny = 0, the routine returns. If either nh = 0 or nx = 0, the routine zeroes the first ny elements in y and
returns.

EXAMPLES

real h(0:2), x(0:3), y(0:7)
data (h(i), i = 0, 2) / 1.0, 2.0, 3.0 /
data (x(i), i = 0, 3) / 4.0, 5.0, 6.0, 7.0 /
call scorrs(5, 4, 8, h, x, y)
print *, y

The output produced is:


46., 38., 20., 7., 0., 0., 0., 0.
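The output above can be reproduced by mirroring the stored half of the filter and applying the general correlation definition. This is a sketch under stated assumptions rather than the library's method: hhalf and hfull are hypothetical names, and the mirroring assumes the symmetry relation h(nh – 1 – j) = h(j), which is the relation the example output implies (the full filter is then 1, 2, 3, 2, 1).

```fortran
      program corrs_demo
      implicit none
      integer, parameter :: nh = 5, nx = 4, ny = 8
      real hhalf(0:nh/2), hfull(0:nh-1), x(0:nx-1), y(0:ny-1)
      integer i, j, k
      data (hhalf(i), i = 0, nh/2) / 1.0, 2.0, 3.0 /
      data (x(i), i = 0, nx-1) / 4.0, 5.0, 6.0, 7.0 /
      ! Mirror the stored half: hfull(nh-1-j) = hfull(j) = hhalf(j).
      do j = 0, (nh - 1) / 2
         hfull(j) = hhalf(j)
         hfull(nh - 1 - j) = hhalf(j)
      end do
      ! Apply the general correlation definition to the full filter.
      do k = 0, ny - 1
         y(k) = 0.0
         do j = 0, min(nh - 1, nx - 1 - k)
            y(k) = y(k) + hfull(j) * x(k + j)
         end do
      end do
      print *, y
      end program corrs_demo
```

The first element is 1 . 4 + 2 . 5 + 3 . 6 + 2 . 7 = 46; the fifth filter value has no matching data element and contributes nothing. The remaining values are 38, 20, and 7, followed by four zeros.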

SEE ALSO
SCONV(3S), SCORR(3S)



INDEX

3D array (FFT) ................................................................................................ descinit3d(3S) ................................................ 219


Absolute value ................................................................................................. sasum(3S) .............................................................. 42
Absolute values of vector elements addition (BLAS1) .................................. sasum(3S) .............................................................. 42
Adds a scalar multiple of a real or complex matrix to a scalar multiple of
another real or complex matrix ....................................................................... sgesum(3S) .......................................................... 100
Adds a scalar multiple of a real or complex vector to another real or
complex vector ................................................................................................ haxpy(3S) .............................................................. 38
Adds a scalar multiple of a real or complex vector to another real or
complex vector ................................................................................................ saxpy(3S) .............................................................. 46
Adds a scalar multiple of a real or complex vector x to a scalar multiple
of another real or complex vector y ............................................................... saxpby(3S) ............................................................ 44
Adds a scalar multiple of a real vector to a sparse real vector ...................... spaxpy(3S) ............................................................ 56
Applies a complex Fast Fourier Transform (FFT) ......................................... cfft2(3S) ............................................................ 201
Applies a complex-to-real Fast Fourier Transform (FFT) .............................. crfft2(3S) .......................................................... 217
Applies a modified Givens plane rotation ...................................................... srotm(3S) .............................................................. 62
Applies a multitasked complex Fast Fourier Transform (FFT) ...................... cfft(3S)............................................................... 195
Applies a multitasked complex-to-complex Fast Fourier Transform (FFT) ... ccfft(3S) ............................................................ 167
Applies a multitasked complex-to-complex Fast Fourier Transform (FFT) ... ggfft(3S) ............................................................ 227
Applies a multitasked three-dimensional complex Fast Fourier Transform
(FFT) ............................................................................................................... cfft3d(3S) .......................................................... 208
Applies a multitasked three-dimensional real-to-complex Fast Fourier
Transform (FFT) ............................................................................................. scfft3d(3S)........................................................ 303
Applies a multitasked two-dimensional complex Fast Fourier Transform
(FFT) ............................................................................................................... cfft2d(3S) .......................................................... 202
Applies a real plane rotation or complex coordinate rotation ........................ srot(3S)................................................................. 58
Applies a real plane rotation to a pair of complex vectors ............................ csrot(3S) .............................................................. 37
Applies a real-to-complex Fast Fourier Transform (FFT) .............................. rcfft2(3S) .......................................................... 285
Applies a three-dimensional (3D) complex-to-complex Fast Fourier
Transform (FFT) to a matrix distributed across a set of processors .............. pccfft3d(3S) ..................................................... 262
Applies a three-dimensional (3D) real-to-complex or complex-to-real Fast
Fourier Transform (FFT) to a matrix distributed across a set of
processors ........................................................................................................ pscfft3d(3S) ..................................................... 277
Applies a three-dimensional complex-to-complex Fast Fourier Transform
(FFT) ............................................................................................................... ccfft3d(3S)........................................................ 178
Applies a two-dimensional (2D) complex-to-complex Fast Fourier
Transform (FFT) to a matrix distributed across a set of processors .............. pccfft2d(3S) ..................................................... 255
Applies a two-dimensional (2D) real-to-complex or complex-to-real Fast
Fourier Transform (FFT) to a matrix distributed across a set of
processors ........................................................................................................ pscfft2d(3S) ..................................................... 269
Applies a two-dimensional complex-to-complex Fast Fourier Transform
(FFT) ............................................................................................................... ccfft2d(3S)........................................................ 172
Applies a two-dimensional real-to-complex or complex-to-real Fast
Fourier Transform (FFT) ................................................................................ scfft2d(3S)........................................................ 296
Applies complex-to-complex Fast Fourier Transforms (FFTs) on multiple
input vectors .................................................................................................... cfftmlt(3S)........................................................ 215
Applies complex-to-real or real-to-complex Fast Fourier Transforms
(FFTs) on multiple input vectors .................................................................... rfftmlt(3S)........................................................ 286



Applies multiple multitasked complex Fast Fourier Transforms (FFTs) ....... mcfft(3S) ............................................................ 245
Applies multiple multitasked complex-to-complex Fast Fourier
Transforms (FFTs) .......................................................................................... ccfftm(3S) .......................................................... 183
Applies multiple real-to-complex or complex-to-real Fast Fourier
Transforms (FFTs) .......................................................................................... scfftm(3S) .......................................................... 309
BAKVEC(3S) .................................................................................................... eispack(3S) ......................................................... 23
bakvec(3S) .................................................................................................... eispack(3S) ......................................................... 23
BALANC(3S) .................................................................................................... eispack(3S) ......................................................... 23
balanc(3S) .................................................................................................... eispack(3S) ......................................................... 23
BALBAK(3S) .................................................................................................... eispack(3S) ......................................................... 23
balbak(3S) .................................................................................................... eispack(3S) ......................................................... 23
Banded symmetric systems of linear equations .............................................. eispack(3S) ......................................................... 23
BANDR(3S) ...................................................................................................... eispack(3S) ......................................................... 23
bandr(3S) ...................................................................................................... eispack(3S) ......................................................... 23
BANDV(3S) ...................................................................................................... eispack(3S) ......................................................... 23
bandv(3S) ...................................................................................................... eispack(3S) ......................................................... 23
Basic Linear Algebra Subprogram .................................................................. sspmv(3S) ............................................................ 104
Basic Linear Algebra Subprogram .................................................................. stpsv(3S) ............................................................ 126
Basic Linear Algebra Subprogram .................................................................. strsv(3S) ............................................................ 130
Basic Linear Algebra Subprogram .................................................................. chemm(3S) ............................................................ 135
Basic Linear Algebra Subprogram .................................................................. strmm(3S) ............................................................ 160
Basic Linear Algebra Subprogram .................................................................. hdot(3S)................................................................. 40
Basic Linear Algebra Subprogram .................................................................. srotm(3S) .............................................................. 62
Basic Linear Algebra Subprogram .................................................................. srotmg(3S) ............................................................ 64
Basic Linear Algebra Subprogram .................................................................. intro_blas2(3S) ................................................ 75
BISECT(3S) .................................................................................................... eispack(3S) ......................................................... 23
bisect(3S) .................................................................................................... eispack(3S) ......................................................... 23
BLAS .............................................................................................................. hdot(3S)................................................................. 40
BLAS .............................................................................................................. srotm(3S) .............................................................. 62
BLAS .............................................................................................................. srotmg(3S) ............................................................ 64
BLAS 1 ........................................................................................................... intro_blas1(3S) ................................................ 33
BLAS 2 ........................................................................................................... sspmv(3S) ............................................................ 104
BLAS 2 ........................................................................................................... stpsv(3S) ............................................................ 126
BLAS 2 ........................................................................................................... intro_blas2(3S) ................................................ 75
BLAS 3 ........................................................................................................... intro_blas3(3S) .............................................. 133
BLAS 3 ........................................................................................................... chemm(3S) ............................................................ 135
BLAS 3 ........................................................................................................... strmm(3S) ............................................................ 160
blas1(3S) ...................................................................................................... intro_blas1(3S) ................................................ 33
blas2(3S) ...................................................................................................... intro_blas2(3S) ................................................ 75
blas3(3S) ...................................................................................................... intro_blas3(3S) .............................................. 133
BQR(3S) ........................................................................................................... eispack(3S) ......................................................... 23
bqr(3S) ........................................................................................................... eispack(3S) ......................................................... 23
CAXPBY(3S) .................................................................................................... saxpby(3S) ............................................................ 44
caxpby(3S) .................................................................................................... saxpby(3S) ............................................................ 44
CAXPY(3S) ...................................................................................................... saxpy(3S) .............................................................. 46
caxpy(3S) ...................................................................................................... saxpy(3S) .............................................................. 46
CBABK2(3S) .................................................................................................... eispack(3S) ......................................................... 23
cbabk2(3S) .................................................................................................... eispack(3S) ......................................................... 23
CBAL(3S) ........................................................................................................ eispack(3S) ......................................................... 23
cbal(3S) ........................................................................................................ eispack(3S) ......................................................... 23



CCFFT2D(3S) ................................................................................................. ccfft2d(3S)........................................................ 172
ccfft2d(3S) ................................................................................................. ccfft2d(3S)........................................................ 172
CCFFT3D(3S) ................................................................................................. ccfft3d(3S)........................................................ 178
ccfft3d(3S) ................................................................................................. ccfft3d(3S)........................................................ 178
CCFFT(3S) ...................................................................................................... ccfft(3S) ............................................................ 167
ccfft(3S) ...................................................................................................... ccfft(3S) ............................................................ 167
CCFFTM(3S) .................................................................................................... ccfftm(3S) .......................................................... 183
ccfftm(3S) .................................................................................................... ccfftm(3S) .......................................................... 183
cchdc(3S) ...................................................................................................... linpack(3S) ......................................................... 29
cchdd(3S) ...................................................................................................... linpack(3S) ......................................................... 29
cchex(3S) ...................................................................................................... linpack(3S) ......................................................... 29
cchud(3S) ...................................................................................................... linpack(3S) ......................................................... 29
CCNVL(3S) ...................................................................................................... ccnvl(3S) ............................................................ 188
ccnvl(3S) ...................................................................................................... ccnvl(3S) ............................................................ 188
CCNVLF(3S) .................................................................................................... ccnvlf(3S) .......................................................... 192
ccnvlf(3S) .................................................................................................... ccnvlf(3S) .......................................................... 192
CCOPY2(3S) .................................................................................................... scopy2(3S) .......................................................... 143
ccopy2(3S) .................................................................................................... scopy2(3S) .......................................................... 143
CCOPY(3S) ...................................................................................................... scopy(3S) .............................................................. 48
ccopy(3S) ...................................................................................................... scopy(3S) .............................................................. 48
CDOTC(3S) ...................................................................................................... sdot(3S)................................................................. 50
cdotc(3S) ...................................................................................................... sdot(3S)................................................................. 50
CDOTU(3S) ...................................................................................................... sdot(3S)................................................................. 50
cdotu(3S) ...................................................................................................... sdot(3S)................................................................. 50
CFFT2(3S) ...................................................................................................... cfft2(3S) ............................................................ 201
cfft2(3S) ...................................................................................................... cfft2(3S) ............................................................ 201
CFFT2D(3S) .................................................................................................... cfft2d(3S) .......................................................... 202
cfft2d(3S) .................................................................................................... cfft2d(3S) .......................................................... 202
CFFT3D(3S) .................................................................................................... cfft3d(3S) .......................................................... 208
cfft3d(3S) .................................................................................................... cfft3d(3S) .......................................................... 208
CFFT(3S) ........................................................................................................ cfft(3S)............................................................... 195
cfft(3S) ........................................................................................................ cfft(3S)............................................................... 195
CFFTMLT(3S) ................................................................................................. cfftmlt(3S)........................................................ 215
cfftmlt(3S) ................................................................................................. cfftmlt(3S)........................................................ 215
CFTFAX(3S) .................................................................................................... cfftmlt(3S)........................................................ 215
cftfax(3S) .................................................................................................... cfftmlt(3S)........................................................ 215
CG(3S) ............................................................................................................. eispack(3S) ......................................................... 23
cg(3S) ............................................................................................................. eispack(3S) ......................................................... 23
cgbco(3S) ...................................................................................................... linpack(3S) ......................................................... 29
cgbdi(3S) ...................................................................................................... linpack(3S) ......................................................... 29
cgbfa(3S) ...................................................................................................... linpack(3S) ......................................................... 29
CGBMV(3S) ...................................................................................................... sgbmv(3S) .............................................................. 93
cgbmv(3S) ...................................................................................................... sgbmv(3S) .............................................................. 93
cgbsl(3S) ...................................................................................................... linpack(3S) ......................................................... 29
cgeco(3S) ...................................................................................................... linpack(3S) ......................................................... 29
cgedi(3S) ...................................................................................................... linpack(3S) ......................................................... 29
cgefa(3S) ...................................................................................................... linpack(3S) ......................................................... 29
CGEMM(3S) ...................................................................................................... sgemm(3S) ............................................................ 145
CGEMMS(3S) .................................................................................................... sgemms(3S) .......................................................... 148
cgemms(3S) .................................................................................................... sgemms(3S) .......................................................... 148



CGEMV(3S) ...................................................................................................... sgemv(3S) .............................................................. 96
CGERC(3S) ...................................................................................................... sger(3S)................................................................. 98
CGERU(3S) ...................................................................................................... sger(3S)................................................................. 98
cgesl(3S) ...................................................................................................... linpack(3S) ......................................................... 29
CGESUM(3S) .................................................................................................... sgesum(3S) .......................................................... 100
cgesum(3S) .................................................................................................... sgesum(3S) .......................................................... 100
cgtsl(3S) ...................................................................................................... linpack(3S) ......................................................... 29
CH(3S) ............................................................................................................. eispack(3S) ......................................................... 23
ch(3S) ............................................................................................................. eispack(3S) ......................................................... 23
CHBMV(3S) ...................................................................................................... chbmv(3S) .............................................................. 78
chbmv(3S) ...................................................................................................... chbmv(3S) .............................................................. 78
CHEMM(3S) ...................................................................................................... chemm(3S) ............................................................ 135
chemm(3S) ...................................................................................................... chemm(3S) ............................................................ 135
CHEMV(3S) ...................................................................................................... chemv(3S) .............................................................. 81
chemv(3S) ...................................................................................................... chemv(3S) .............................................................. 81
CHER2(3S) ...................................................................................................... cher2(3S) .............................................................. 85
cher2(3S) ...................................................................................................... cher2(3S) .............................................................. 85
CHER2K(3S) .................................................................................................... cher2k(3S) .......................................................... 138
cher2k(3S) .................................................................................................... cher2k(3S) .......................................................... 138
CHER(3S) ........................................................................................................ cher(3S)................................................................. 83
cher(3S) ........................................................................................................ cher(3S)................................................................. 83
CHERK(3S) ...................................................................................................... cherk(3S) ............................................................ 141
cherk(3S) ...................................................................................................... cherk(3S) ............................................................ 141
chico(3S) ...................................................................................................... linpack(3S) ......................................................... 29
chidi(3S) ...................................................................................................... linpack(3S) ......................................................... 29
chifa(3S) ...................................................................................................... linpack(3S) ......................................................... 29
chisl(3S) ...................................................................................................... linpack(3S) ......................................................... 29
Cholesky factorization ..................................................................................... linpack(3S) ......................................................... 29
Cholesky factorization ..................................................................................... intro_lapack(3S) ................................................ 7
chpco(3S) ...................................................................................................... linpack(3S) ......................................................... 29
chpdi(3S) ...................................................................................................... linpack(3S) ......................................................... 29
chpfa(3S) ...................................................................................................... linpack(3S) ......................................................... 29
CHPMV(3S) ...................................................................................................... chpmv(3S) .............................................................. 87
chpmv(3S) ...................................................................................................... chpmv(3S) .............................................................. 87
CHPR2(3S) ...................................................................................................... chpr2(3S) .............................................................. 91
chpr2(3S) ...................................................................................................... chpr2(3S) .............................................................. 91
CHPR(3S) ........................................................................................................ chpr(3S)................................................................. 89
chpr(3S) ........................................................................................................ chpr(3S)................................................................. 89
chpsl(3S) ...................................................................................................... linpack(3S) ......................................................... 29
CINVIT(3S) .................................................................................................... eispack(3S) ......................................................... 23
cinvit(3S) .................................................................................................... eispack(3S) ......................................................... 23
COMBAK(3S) .................................................................................................... eispack(3S) ......................................................... 23
combak(3S) .................................................................................................... eispack(3S) ......................................................... 23
COMHES(3S) .................................................................................................... eispack(3S) ......................................................... 23
comhes(3S) .................................................................................................... eispack(3S) ......................................................... 23
COMLR2(3S) .................................................................................................... eispack(3S) ......................................................... 23
comlr2(3S) .................................................................................................... eispack(3S) ......................................................... 23
COMLR(3S) ...................................................................................................... eispack(3S) ......................................................... 23
comlr(3S) ...................................................................................................... eispack(3S) ......................................................... 23
Complex convolution ...................................................................................... ccnvl(3S) ............................................................ 188

Complex convolution ...................................................................................... ccnvlf(3S) .......................................................... 192
Complex convolution using Fast Fourier Transform ...................................... ccnvlf(3S) .......................................................... 192
Complex Fast Fourier transform ..................................................................... cfft2(3S) ............................................................ 201
Complex Fast Fourier transform (multiple input vectors) .............................. cfftmlt(3S)........................................................ 215
Complex FFT convolution .............................................................................. ccnvlf(3S) .......................................................... 192
Complex FFT filter ......................................................................................... ccnvlf(3S) .......................................................... 192
Complex filter ................................................................................................. ccnvl(3S) ............................................................ 188
Complex filter ................................................................................................. ccnvlf(3S) .......................................................... 192
Complex filter using Fast Fourier Transform ................................................. ccnvlf(3S) .......................................................... 192
complex general matrix multiplication (BLAS3) ........................................... sgemm(3S) ............................................................ 145
complex general matrix multiplied by a complex symmetric matrix
(BLAS3) .......................................................................................................... ssymm(3S) ............................................................ 152
complex matrix copy operation (BLAS3) ....................................................... scopy2(3S) .......................................................... 143
complex matrix multiplication (BLAS3) ........................................................ sgemms(3S) .......................................................... 148
complex symmetric matrix rank update (BLAS3) ......................................... ssyrk(3S) ............................................................ 158
complex to real computation (FFT) ................................................................ hgfft(3S) ............................................................ 237
Complex vector addition ................................................................................. saxpy(3S) .............................................................. 46
Complex vector addition ................................................................................. ssum(3S)................................................................. 72
Complex vector exchange ............................................................................... sswap(3S) .............................................................. 73
Complex vector multiplication ........................................................................ saxpby(3S) ............................................................ 44
complex vector multiplication (BLAS2) ......................................................... chbmv(3S) .............................................................. 78
complex vector multiplication (BLAS2) ......................................................... chemv(3S) .............................................................. 81
complex vector multiplication (BLAS2) ......................................................... sgemv(3S) .............................................................. 96
Complex-to-real Fast Fourier transform ......................................................... crfft2(3S) .......................................................... 217
Complex-to-real Fast Fourier transform (multiple input vectors) .................. rfftmlt(3S)........................................................ 286
Computes a correlation of two vectors ........................................................... filterg(3S)........................................................ 224
Computes a correlation of two vectors (symmetric coefficient) ..................... filters(3S)........................................................ 225
Computes a dot product (inner product) of two real or complex vectors ...... hdot(3S)................................................................. 40
Computes a dot product (inner product) of two real or complex vectors ...... sdot(3S)................................................................. 50
Computes a real one-dimensional (1D) convolution of two vectors .............. scnvl1d(3S)........................................................ 316
Computes a real two-dimensional (2D) convolution of two matrices ............ scnvl2d(3S)........................................................ 320
Computes a real-to-complex or complex-to-real Fast Fourier Transform
(FFT) ............................................................................................................... hgfft(3S) ............................................................ 237
Computes a real-to-complex or complex-to-real Fast Fourier Transform
(FFT) ............................................................................................................... scfft(3S) ............................................................ 290
Computes the convolution of a complex sequence with one or more other
complex sequences .......................................................................................... ccnvl(3S) ............................................................ 188
Computes the convolution of a complex sequence with one or more other
complex sequences by using a Fourier transform method ............................. ccnvlf(3S) .......................................................... 192
Computes the dot product of a real vector and a real sparse vector .............. spdot(3S) .............................................................. 57
Computes the Euclidean norm of a vector ..................................................... snrm2(3S) .............................................................. 54
Computes the Hadamard product of two vectors ........................................... shad(3S)................................................................. 52
Condition number ............................................................................................ intro_lapack(3S) ................................................ 7
Constructs a Givens plane rotation ................................................................. srotg(3S) .............................................................. 60
Constructs a modified Givens plane rotation .................................................. srotmg(3S) ............................................................ 64
Convolution ..................................................................................................... intro_fft(3S) ................................................... 163
Convolution ..................................................................................................... ccnvl(3S) ............................................................ 188
Convolution ..................................................................................................... ccnvlf(3S) .......................................................... 192
convolution ...................................................................................................... hconv(3S) ............................................................ 231
convolution ...................................................................................................... sconv(3S) ............................................................ 326

CONVOLUTION(3S) ........................................................................................ intro_fft(3S) ................................................... 163
convolution(3S) ........................................................................................ intro_fft(3S) ................................................... 163
Copies a real or complex matrix into another real or complex matrix .......... scopy2(3S) .......................................................... 143
Copies a real or complex vector into another real or complex vector ........... scopy(3S) .............................................................. 48
copying matrices (BLAS3) ............................................................................. scopy2(3S) .......................................................... 143
Copying vectors (BLAS1) .............................................................................. scopy(3S) .............................................................. 48
Correlation ....................................................................................................... filterg(3S)........................................................ 224
Correlation ....................................................................................................... filters(3S)........................................................ 225
correlation ....................................................................................................... hcorr(3S) ............................................................ 233
correlation ....................................................................................................... hcorrs(3S) .......................................................... 235
correlation ....................................................................................................... scorr(3S) ............................................................ 328
correlation ....................................................................................................... scorrs(3S) .......................................................... 330
Correlation of symmetric vectors .................................................................... filters(3S)........................................................ 225
Correlation of two vectors .............................................................................. filters(3S)........................................................ 225
Correlation of vectors ..................................................................................... filterg(3S)........................................................ 224
CORTB(3S) ...................................................................................................... eispack(3S) ......................................................... 23
cortb(3S) ...................................................................................................... eispack(3S) ......................................................... 23
CORTH(3S) ...................................................................................................... eispack(3S) ......................................................... 23
corth(3S) ...................................................................................................... eispack(3S) ......................................................... 23
cpbco(3S) ...................................................................................................... linpack(3S) ......................................................... 29
cpbdi(3S) ...................................................................................................... linpack(3S) ......................................................... 29
cpbfa(3S) ...................................................................................................... linpack(3S) ......................................................... 29
cpbsl(3S) ...................................................................................................... linpack(3S) ......................................................... 29
cpoco(3S) ...................................................................................................... linpack(3S) ......................................................... 29
cpodi(3S) ...................................................................................................... linpack(3S) ......................................................... 29
cpofa(3S) ...................................................................................................... linpack(3S) ......................................................... 29
cposl(3S) ...................................................................................................... linpack(3S) ......................................................... 29
cppco(3S) ...................................................................................................... linpack(3S) ......................................................... 29
cppdi(3S) ...................................................................................................... linpack(3S) ......................................................... 29
cppfa(3S) ...................................................................................................... linpack(3S) ......................................................... 29
cppsl(3S) ...................................................................................................... linpack(3S) ......................................................... 29
cptsl(3S) ...................................................................................................... linpack(3S) ......................................................... 29
cqrdc(3S) ...................................................................................................... linpack(3S) ......................................................... 29
cqrsl(3S) ...................................................................................................... linpack(3S) ......................................................... 29
CRFFT2(3S) .................................................................................................... crfft2(3S) .......................................................... 217
crfft2(3S) .................................................................................................... crfft2(3S) .......................................................... 217
CROT(3S) ........................................................................................................ srot(3S)................................................................. 58
crot(3S) ........................................................................................................ srot(3S)................................................................. 58
CROTG(3S) ...................................................................................................... srotg(3S) .............................................................. 60
crotg(3S) ...................................................................................................... srotg(3S) .............................................................. 60
CSCAL(3S) ...................................................................................................... sscal(3S) .............................................................. 70
cscal(3S) ...................................................................................................... sscal(3S) .............................................................. 70
CSFFT2D(3S) ................................................................................................. scfft2d(3S)........................................................ 296
csfft2d(3S) ................................................................................................. scfft2d(3S)........................................................ 296
CSFFT3D(3S) ................................................................................................. scfft3d(3S)........................................................ 303
csfft3d(3S) ................................................................................................. scfft3d(3S)........................................................ 303
CSFFT(3S) ...................................................................................................... scfft(3S) ............................................................ 290
csfft(3S) ...................................................................................................... scfft(3S) ............................................................ 290
CSFFTM(3S) .................................................................................................... scfftm(3S) .......................................................... 309
csfftm(3S) .................................................................................................... scfftm(3S) .......................................................... 309

cspco(3S) ...................................................................................................... linpack(3S) ......................................................... 29
cspdi(3S) ...................................................................................................... linpack(3S) ......................................................... 29
cspfa(3S) ...................................................................................................... linpack(3S) ......................................................... 29
CSPMV(3S) ...................................................................................................... sspmv(3S) ............................................................ 104
cspmv(3S) ...................................................................................................... sspmv(3S) ............................................................ 104
CSPR(3S) ........................................................................................................ sspr(3S)............................................................... 106
cspr(3S) ........................................................................................................ sspr(3S)............................................................... 106
cspsl(3S) ...................................................................................................... linpack(3S) ......................................................... 29
CSROT(3S) ...................................................................................................... csrot(3S) .............................................................. 37
csrot(3S) ...................................................................................................... csrot(3S) .............................................................. 37
CSSCAL(3S) .................................................................................................... sscal(3S) .............................................................. 70
csscal(3S) .................................................................................................... sscal(3S) .............................................................. 70
CSUM(3S) ........................................................................................................ ssum(3S)................................................................. 72
csum(3S) ........................................................................................................ ssum(3S)................................................................. 72
csvdc(3S) ...................................................................................................... linpack(3S) ......................................................... 29
CSWAP(3S) ...................................................................................................... sswap(3S) .............................................................. 73
cswap(3S) ...................................................................................................... sswap(3S) .............................................................. 73
CSYMM(3S) ...................................................................................................... ssymm(3S) ............................................................ 152
CSYMV(3S) ...................................................................................................... ssymv(3S) ............................................................ 112
CSYR2K(3S) .................................................................................................... ssyr2k(3S) .......................................................... 155
CSYR(3S) ........................................................................................................ ssyr(3S)............................................................... 114
CSYRK(3S) ...................................................................................................... ssyrk(3S) ............................................................ 158
CTBMV(3S) ...................................................................................................... stbmv(3S) ............................................................ 118
ctbmv(3S) ...................................................................................................... stbmv(3S) ............................................................ 118
CTBSV(3S) ...................................................................................................... stbsv(3S) ............................................................ 121
ctbsv(3S) ...................................................................................................... stbsv(3S) ............................................................ 121
CTPMV(3S) ...................................................................................................... stpmv(3S) ............................................................ 124
ctpmv(3S) ...................................................................................................... stpmv(3S) ............................................................ 124
CTPSV(3S) ...................................................................................................... stpsv(3S) ............................................................ 126
ctpsv(3S) ...................................................................................................... stpsv(3S) ............................................................ 126
CTRMM(3S) ...................................................................................................... strmm(3S) ............................................................ 160
CTRMV(3S) ...................................................................................................... strmv(3S) ............................................................ 128
ctrmv(3S) ...................................................................................................... strmv(3S) ............................................................ 128
CTRSV(3S) ...................................................................................................... strsv(3S) ............................................................ 130
ctrsv(3S) ...................................................................................................... strsv(3S) ............................................................ 130
Dense ............................................................................................................... linpack(3S) ......................................................... 29
Dense linear algebra ........................................................................................ intro_lapack(3S) ................................................ 7
Dense linear system ........................................................................................ linpack(3S) ......................................................... 29
Dense linear system solvers ............................................................................ intro_lapack(3S) ................................................ 7
Dense linear systems ....................................................................................... intro_lapack(3S) ................................................ 7
Dense solver .................................................................................................... linpack(3S) ......................................................... 29
DESCINIT3D(3S) .......................................................................................... descinit3d(3S) ................................................ 219
descinit3d(3S) .......................................................................................... descinit3d(3S) ................................................ 219
Dot product ..................................................................................................... hdot(3S)................................................................. 40
Dot product (BLAS1) ..................................................................................... sdot(3S)................................................................. 50
Eigenvalue problem ........................................................................................ eispack(3S) ......................................................... 23
Eigenvalues ..................................................................................................... eispack(3S) ......................................................... 23
Eigenvectors .................................................................................................... eispack(3S) ......................................................... 23
EISPACK(3S) ................................................................................................. eispack(3S) ......................................................... 23
eispack(3S) ................................................................................................. eispack(3S) ......................................................... 23

ELMBAK(3S) .................................................................................................... eispack(3S) ......................................................... 23
elmbak(3S) .................................................................................................... eispack(3S) ......................................................... 23
ELMHES(3S) .................................................................................................... eispack(3S) ......................................................... 23
elmhes(3S) .................................................................................................... eispack(3S) ......................................................... 23
ELTRAN(3S) .................................................................................................... eispack(3S) ......................................................... 23
eltran(3S) .................................................................................................... eispack(3S) ......................................................... 23
Euclidean norm ............................................................................................... snrm2(3S) .............................................................. 54
Fast Fourier transform ..................................................................................... intro_fft(3S) ................................................... 163
Fast Fourier transform ..................................................................................... ccfft(3S) ............................................................ 167
Fast Fourier transform ..................................................................................... ccfft2d(3S)........................................................ 172
Fast Fourier transform ..................................................................................... ccfft3d(3S)........................................................ 178
Fast Fourier transform ..................................................................................... ccfftm(3S) .......................................................... 183
Fast Fourier transform ..................................................................................... cfft(3S)............................................................... 195
Fast Fourier transform ..................................................................................... cfft2(3S) ............................................................ 201
Fast Fourier transform ..................................................................................... cfft2d(3S) .......................................................... 202
Fast Fourier transform ..................................................................................... cfft3d(3S) .......................................................... 208
Fast Fourier transform ..................................................................................... cfftmlt(3S)........................................................ 215
Fast Fourier transform ..................................................................................... crfft2(3S) .......................................................... 217
Fast Fourier transform ..................................................................................... ggfft(3S) ............................................................ 227
Fast Fourier transform ..................................................................................... mcfft(3S) ............................................................ 245
Fast Fourier transform ..................................................................................... rcfft2(3S) .......................................................... 285
Fast Fourier transform ..................................................................................... rfftmlt(3S)........................................................ 286
Fast Fourier transform ..................................................................................... scfft(3S) ............................................................ 290
Fast Fourier transform ..................................................................................... scfft2d(3S)........................................................ 296
Fast Fourier transform ..................................................................................... scfft3d(3S)........................................................ 303
Fast Fourier transform ..................................................................................... scfftm(3S) .......................................................... 309
Fast Fourier transform for multiple input vectors .......................................... rfftmlt(3S)........................................................ 286
FFT .................................................................................................................. intro_fft(3S) ................................................... 163
FFT .................................................................................................................. ccfft(3S) ............................................................ 167
FFT .................................................................................................................. ccfft2d(3S)........................................................ 172
FFT .................................................................................................................. ccfft3d(3S)........................................................ 178
FFT .................................................................................................................. ccfftm(3S) .......................................................... 183
FFT .................................................................................................................. cfft(3S)............................................................... 195
FFT .................................................................................................................. cfft2(3S) ............................................................ 201
FFT .................................................................................................................. cfft2d(3S) .......................................................... 202
FFT .................................................................................................................. cfft3d(3S) .......................................................... 208
FFT .................................................................................................................. cfftmlt(3S)........................................................ 215
FFT .................................................................................................................. crfft2(3S) .......................................................... 217
FFT .................................................................................................................. ggfft(3S) ............................................................ 227
FFT .................................................................................................................. mcfft(3S) ............................................................ 245
FFT .................................................................................................................. rcfft2(3S) .......................................................... 285
FFT .................................................................................................................. rfftmlt(3S)........................................................ 286
FFT .................................................................................................................. scfft(3S) ............................................................ 290
FFT .................................................................................................................. scfft2d(3S)........................................................ 296
FFT .................................................................................................................. scfft3d(3S)........................................................ 303
FFT .................................................................................................................. scfftm(3S) .......................................................... 309
FFT convolution .............................................................................................. ccnvlf(3S) .......................................................... 192
FFT filter ......................................................................................................... ccnvlf(3S) .......................................................... 192
FFT(3S) ........................................................................................................... intro_fft(3S) ................................................... 163
fft(3S) ........................................................................................................... intro_fft(3S) ................................................... 163

FFTFAX(3S) .................................................................................................... rfftmlt(3S)........................................................ 286
fftfax(3S) .................................................................................................... rfftmlt(3S)........................................................ 286
FIGI2(3S) ...................................................................................................... eispack(3S) ......................................................... 23
figi2(3S) ...................................................................................................... eispack(3S) ......................................................... 23
FIGI(3S) ........................................................................................................ eispack(3S) ......................................................... 23
figi(3S) ........................................................................................................ eispack(3S) ......................................................... 23
Filter ................................................................................................................ intro_fft(3S) ................................................... 163
Filter ................................................................................................................ ccnvl(3S) ............................................................ 188
Filter ................................................................................................................ ccnvlf(3S) .......................................................... 192
Filter ................................................................................................................ filterg(3S)........................................................ 224
Filter ................................................................................................................ filters(3S)........................................................ 225
Filter ................................................................................................................ hopfilt(3S)........................................................ 243
Filter ................................................................................................................ opfilt(3S) .......................................................... 253
filter sequence ................................................................................................. hconv(3S) ............................................................ 231
filter sequence ................................................................................................. hcorr(3S) ............................................................ 233
filter sequence ................................................................................................. sconv(3S) ............................................................ 326
filter sequence ................................................................................................. scorr(3S) ............................................................ 328
FILTER(3S) .................................................................................................... intro_fft(3S) ................................................... 163
filter(3S) .................................................................................................... intro_fft(3S) ................................................... 163
FILTERG(3S) ................................................................................................. filterg(3S)........................................................ 224
filterg(3S) ................................................................................................. filterg(3S)........................................................ 224
FILTERS(3S) ................................................................................................. filters(3S)........................................................ 225
filters(3S) ................................................................................................. filters(3S)........................................................ 225
Fourier transform ............................................................................................ intro_fft(3S) ................................................... 163
GAXPY(3S) ...................................................................................................... haxpy(3S) .............................................................. 38
gaxpy(3S) ...................................................................................................... haxpy(3S) .............................................................. 38
GDOTC(3S) ...................................................................................................... hdot(3S)................................................................. 40
gdotc(3S) ...................................................................................................... hdot(3S)................................................................. 40
GDOTU(3S) ...................................................................................................... hdot(3S)................................................................. 40
gdotu(3S) ...................................................................................................... hdot(3S)................................................................. 40
General band matrix ........................................................................................ sgbmv(3S) .............................................................. 93
General band matrix ........................................................................................ sgemv(3S) .............................................................. 96
General matrix ................................................................................................. chemm(3S) ............................................................ 135
General matrix ................................................................................................. strmm(3S) ............................................................ 160
General matrix ................................................................................................. sger(3S)................................................................. 98
GGFFT(3S) ...................................................................................................... ggfft(3S) ............................................................ 227
ggfft(3S) ...................................................................................................... ggfft(3S) ............................................................ 227
GHFFT(3S) ...................................................................................................... hgfft(3S) ............................................................ 237
ghfft(3S) ...................................................................................................... hgfft(3S) ............................................................ 237
Givens ............................................................................................................. srotm(3S) .............................................................. 62
Givens ............................................................................................................. srotmg(3S) ............................................................ 64
Givens plane rotation application ................................................................... srotm(3S) .............................................................. 62
Givens plane rotation construction ................................................................. srotg(3S) .............................................................. 60
Givens plane rotation construction ................................................................. srotmg(3S) ............................................................ 64
Hadamard products ......................................................................................... shad(3S)................................................................. 52
HAXPY(3S) ...................................................................................................... haxpy(3S) .............................................................. 38
haxpy(3S) ...................................................................................................... haxpy(3S) .............................................................. 38
HCONV(3S) ...................................................................................................... hconv(3S) ............................................................ 231
hconv(3S) ...................................................................................................... hconv(3S) ............................................................ 231
HCORR(3S) ...................................................................................................... hcorr(3S) ............................................................ 233

hcorr(3S) ...................................................................................................... hcorr(3S) ............................................................ 233
HCORRS(3S) .................................................................................................... hcorrs(3S) .......................................................... 235
hcorrs(3S) .................................................................................................... hcorrs(3S) .......................................................... 235
HDOT(3S) ........................................................................................................ hdot(3S)................................................................. 40
hdot(3S) ........................................................................................................ hdot(3S)................................................................. 40
Hermitian band matrix .................................................................................... chbmv(3S) .............................................................. 78
Hermitian matrix ............................................................................................. chemm(3S) ............................................................ 135
Hermitian matrix ............................................................................................. cher(3S)................................................................. 83
Hermitian packed matrix ................................................................................. chpmv(3S) .............................................................. 87
Hermitian packed matrix ................................................................................. chpr2(3S) .............................................................. 91
Hermitian rank 1 update ................................................................................. cher(3S)................................................................. 83
Hermitian rank 1 update ................................................................................. chpr(3S)................................................................. 89
Hermitian rank 2 update ................................................................................. cher2(3S) .............................................................. 85
Hermitian rank 2k update (BLAS3) ............................................................... cher2k(3S) .......................................................... 138
Hermitian rank k update (BLAS3) ................................................................. cherk(3S) ............................................................ 141
HGFFT(3S) ...................................................................................................... hgfft(3S) ............................................................ 237
hgfft(3S) ...................................................................................................... hgfft(3S) ............................................................ 237
HOPFILT(3S) ................................................................................................. hopfilt(3S)........................................................ 243
hopfilt(3S) ................................................................................................. hopfilt(3S)........................................................ 243
HTRIB3(3S) .................................................................................................... eispack(3S) ......................................................... 23
htrib3(3S) .................................................................................................... eispack(3S) ......................................................... 23
HTRIBK(3S) .................................................................................................... eispack(3S) ......................................................... 23
htribk(3S) .................................................................................................... eispack(3S) ......................................................... 23
HTRID3(3S) .................................................................................................... eispack(3S) ......................................................... 23
htrid3(3S) .................................................................................................... eispack(3S) ......................................................... 23
HTRIDI(3S) .................................................................................................... eispack(3S) ......................................................... 23
htridi(3S) .................................................................................................... eispack(3S) ......................................................... 23
IMTQL1(3S) .................................................................................................... eispack(3S) ......................................................... 23
imtql1(3S) .................................................................................................... eispack(3S) ......................................................... 23
IMTQL2(3S) .................................................................................................... eispack(3S) ......................................................... 23
imtql2(3S) .................................................................................................... eispack(3S) ......................................................... 23
IMTQLV(3S) .................................................................................................... eispack(3S) ......................................................... 23
imtqlv(3S) .................................................................................................... eispack(3S) ......................................................... 23
initialize descriptor vector (FFT) .................................................................... descinit3d(3S) ................................................ 219
Initializes a descriptor vector that contains information about the
distribution of a three-dimensional (3D) array across a 3D grid of
processors ........................................................................................................ descinit3d(3S) ................................................ 219
Inner product ................................................................................................... hdot(3S)................................................................. 40
Inner product ................................................................................................... sdot(3S)................................................................. 50
INTRO_BLAS1(3S) ........................................................................................ intro_blas1(3S) ................................................ 33
intro_blas1(3S) ........................................................................................ intro_blas1(3S) ................................................ 33
INTRO_BLAS2(3S) ........................................................................................ intro_blas2(3S) ................................................ 75
intro_blas2(3S) ........................................................................................ intro_blas2(3S) ................................................ 75
INTRO_BLAS3(3S) ........................................................................................ intro_blas3(3S) .............................................. 133
intro_blas3(3S) ........................................................................................ intro_blas3(3S) .............................................. 133
Introduction to BLAS 1 .................................................................................. intro_blas1(3S) ................................................ 33
Introduction to BLAS 3 .................................................................................. intro_blas3(3S) .............................................. 133
Introduction to Eigensystem computation for dense linear systems ............... eispack(3S) ......................................................... 23
Introduction to LAPACK solvers for dense linear systems ........................... intro_lapack(3S) ................................................ 7
Introduction to matrix-matrix linear algebra subprograms ............................. intro_blas3(3S) .............................................. 133

Introduction to matrix-vector linear algebra subprograms .............................. intro_blas2(3S) ................................................ 75
Introduction to Scientific Libraries ................................................................. intro_libsci(3S) ................................................ 1
Introduction to Scientific Library routines ...................................................... intro_libsci(3S) ................................................ 1
Introduction to signal processing routines ...................................................... intro_fft(3S) ................................................... 163
Introduction to vector-vector linear algebra subprograms .............................. intro_blas1(3S) ................................................ 33
INTRO_FFT(3S) ............................................................................................. intro_fft(3S) ................................................... 163
intro_fft(3S) ............................................................................................. intro_fft(3S) ................................................... 163
INTRO_LAPACK(3S) ...................................................................................... intro_lapack(3S) ................................................ 7
intro_lapack(3S) ...................................................................................... intro_lapack(3S) ................................................ 7
INTRO_LIBSCI(3S) ...................................................................................... intro_libsci(3S) ................................................ 1
intro_libsci(3S) ...................................................................................... intro_libsci(3S) ................................................ 1
Inverse ............................................................................................................. intro_lapack(3S) ................................................ 7
INVIT(3S) ...................................................................................................... eispack(3S) ......................................................... 23
invit(3S) ...................................................................................................... eispack(3S) ......................................................... 23
LAPACK ......................................................................................................... intro_lapack(3S) ................................................ 7
Lapack ............................................................................................................. intro_lapack(3S) ................................................ 7
LAPACK(3S) .................................................................................................... intro_lapack(3S) ................................................ 7
lapack(3S) .................................................................................................... intro_lapack(3S) ................................................ 7
Level 1 Basic Linear Algebra Subprogram .................................................... hdot(3S)................................................................. 40
Level 1 Basic Linear Algebra Subprogram .................................................... srotm(3S) .............................................................. 62
Level 1 Basic Linear Algebra Subprogram .................................................... srotmg(3S) ............................................................ 64
Level 1 BLAS ................................................................................................. hdot(3S)................................................................. 40
Level 1 BLAS ................................................................................................. srotm(3S) .............................................................. 62
Level 1 BLAS ................................................................................................. srotmg(3S) ............................................................ 64
Level 2 Basic Linear Algebra Subprogram .................................................... sspmv(3S) ............................................................ 104
Level 2 Basic Linear Algebra Subprogram .................................................... stpsv(3S) ............................................................ 126
Level 2 Basic Linear Algebra Subprogram .................................................... intro_blas2(3S) ................................................ 75
Level 2 BLAS ................................................................................................. sspmv(3S) ............................................................ 104
Level 2 BLAS ................................................................................................. stpsv(3S) ............................................................ 126
Level 2 BLAS ................................................................................................. intro_blas2(3S) ................................................ 75
Level 3 Basic Linear Algebra Subprogram .................................................... chemm(3S) ............................................................ 135
Level 3 Basic Linear Algebra Subprogram .................................................... strmm(3S) ............................................................ 160
Level 3 BLAS ................................................................................................. intro_blas3(3S) .............................................. 133
Level 3 BLAS ................................................................................................. chemm(3S) ............................................................ 135
Level 3 BLAS ................................................................................................. strmm(3S) ............................................................ 160
Libsci intro ...................................................................................................... intro_libsci(3S) ................................................ 1
libsci(3S) .................................................................................................... intro_libsci(3S) ................................................ 1
Linear algebra ................................................................................................. intro_lapack(3S) ................................................ 7
Linear digital filter .......................................................................................... intro_fft(3S) ................................................... 163
Linear equations .............................................................................................. hopfilt(3S)........................................................ 243
Linear equations .............................................................................................. opfilt(3S) .......................................................... 253
Linear equations .............................................................................................. linpack(3S) ......................................................... 29
Linear system solvers ...................................................................................... intro_lapack(3S) ................................................ 7
Linear systems ................................................................................................. intro_lapack(3S) ................................................ 7
LINPACK(3S) ................................................................................................. linpack(3S) ......................................................... 29
linpack(3S) ................................................................................................. linpack(3S) ......................................................... 29
LU factorization .............................................................................................. intro_lapack(3S) ................................................ 7
Matrix-matrix multiplication ........................................................................... chemm(3S) ............................................................ 135
Matrix-matrix multiplication ........................................................................... strmm(3S) ............................................................ 160
Matrix-vector linear algebra subprograms ...................................................... intro_blas2(3S) ................................................ 75



Matrix-vector multiplication ........................................................................... sspmv(3S) ............................................................ 104
Matrix-vector multiplication ........................................................................... ssymv(3S) ............................................................ 112
Matrix-vector multiplication ........................................................................... stbmv(3S) ............................................................ 118
Matrix-vector multiplication ........................................................................... strmv(3S) ............................................................ 128
Matrix-vector multiplication ........................................................................... chbmv(3S) .............................................................. 78
Matrix-vector multiplication ........................................................................... chemv(3S) .............................................................. 81
MCFFT(3S) ...................................................................................................... mcfft(3S) ............................................................ 245
mcfft(3S) ...................................................................................................... mcfft(3S) ............................................................ 245
MINFIT(3S) .................................................................................................... eispack(3S) ......................................................... 23
minfit(3S) .................................................................................................... eispack(3S) ......................................................... 23
Modified Givens plane rotation ...................................................................... srotm(3S) .............................................................. 62
Modified Givens plane rotation ...................................................................... srotmg(3S) ............................................................ 64
Multiple multitasked complex Fast Fourier transform ................................... ccfftm(3S) .......................................................... 183
Multiple multitasked complex Fast Fourier transform ................................... mcfft(3S) ............................................................ 245
Multiple multitasked real-to-complex Fast Fourier transform ........................ scfftm(3S) .......................................................... 309
Multiple-input vector complex Fast Fourier transform ................................... cfftmlt(3S)........................................................ 215
Multiplies a complex general matrix by a complex Hermitian matrix .......... chemm(3S) ............................................................ 135
Multiplies a complex vector by a complex Hermitian band matrix ............... chbmv(3S) .............................................................. 78
Multiplies a complex vector by a complex Hermitian matrix ........................ chemv(3S) .............................................................. 81
Multiplies a complex vector by a packed complex Hermitian matrix ........... chpmv(3S) .............................................................. 87
Multiplies a real or complex general matrix by a real or complex general
matrix .............................................................................................................. sgemm(3S) ............................................................ 145
Multiplies a real or complex general matrix by a real or complex general
matrix, using Strassen’s algorithm .................................................................. sgemms(3S) .......................................................... 148
Multiplies a real or complex general matrix by a real or complex
symmetric matrix ............................................................................................ ssymm(3S) ............................................................ 152
Multiplies a real or complex general matrix by a real or complex
triangular matrix .............................................................................................. strmm(3S) ............................................................ 160
Multiplies a real or complex symmetric packed matrix by a real or
complex vector ................................................................................................ sspmv(3S) ............................................................ 104
Multiplies a real or complex vector by a real or complex general band
matrix .............................................................................................................. sgbmv(3S) .............................................................. 93
Multiplies a real or complex vector by a real or complex general matrix ..... sgemv(3S) .............................................................. 96
Multiplies a real or complex vector by a real or complex symmetric
matrix .............................................................................................................. ssymv(3S) ............................................................ 112
Multiplies a real or complex vector by a real or complex triangular band
matrix .............................................................................................................. stbmv(3S) ............................................................ 118
Multiplies a real or complex vector by a real or complex triangular
matrix .............................................................................................................. strmv(3S) ............................................................ 128
Multiplies a real or complex vector by a real or complex triangular
packed matrix .................................................................................................. stpmv(3S) ............................................................ 124
Multiplies a real vector by a real symmetric band matrix ............................. ssbmv(3S) ............................................................ 102
Multitasked complex Fast Fourier transform .................................................. ccfft(3S) ............................................................ 167
Multitasked complex Fast Fourier transform .................................................. cfft(3S)............................................................... 195
Multitasked complex Fast Fourier transform .................................................. ggfft(3S) ............................................................ 227
Multitasked complex Fast Fourier transform .................................................. scfft(3S) ............................................................ 290
Multitasked three-dimensional Fast Fourier transform ................................... ccfft3d(3S)........................................................ 178
Multitasked three-dimensional Fast Fourier transform ................................... cfft3d(3S) .......................................................... 208
Multitasked three-dimensional real-to-complex Fast Fourier transform ......... scfft3d(3S)........................................................ 303
Multitasked two-dimensional complex Fast Fourier transform ...................... ccfft2d(3S)........................................................ 172



Multitasked two-dimensional complex Fast Fourier transform ...................... cfft2d(3S) .......................................................... 202
Multitasked two-dimensional real-to-complex Fast Fourier transform ........... scfft2d(3S)........................................................ 296
OPFILT(3S) .................................................................................................... opfilt(3S) .......................................................... 253
opfilt(3S) .................................................................................................... opfilt(3S) .......................................................... 253
Optional scaling .............................................................................................. sgesum(3S) .......................................................... 100
ORTBAK(3S) .................................................................................................... eispack(3S) ......................................................... 23
ortbak(3S) .................................................................................................... eispack(3S) ......................................................... 23
ORTHES(3S) .................................................................................................... eispack(3S) ......................................................... 23
orthes(3S) .................................................................................................... eispack(3S) ......................................................... 23
ORTRAN(3S) .................................................................................................... eispack(3S) ......................................................... 23
ortran(3S) .................................................................................................... eispack(3S) ......................................................... 23
Packed Hermitian matrix ................................................................................ chpmv(3S) .............................................................. 87
Packed Hermitian matrix ................................................................................ chpr2(3S) .............................................................. 91
Packed symmetric matrix ................................................................................ sspmv(3S) ............................................................ 104
Packed symmetric matrix ................................................................................ sspr(3S)............................................................... 106
Packed symmetric matrix ................................................................................ sspr2(3S) ............................................................ 108
Packed symmetric matrix ................................................................................ sspr12(3S) .......................................................... 110
Packed triangular matrix ................................................................................. stpmv(3S) ............................................................ 124
Packed triangular system of equations ............................................................ stpsv(3S) ............................................................ 126
PCCFFT2D(3S) ............................................................................................... pccfft2d(3S) ..................................................... 255
pccfft2d(3S) ............................................................................................... pccfft2d(3S) ..................................................... 255
PCCFFT3D(3S) ............................................................................................... pccfft3d(3S) ..................................................... 262
pccfft3d(3S) ............................................................................................... pccfft3d(3S) ..................................................... 262
PCSFFT2D(3S) ............................................................................................... pscfft2d(3S) ..................................................... 269
pcsfft2d(3S) ............................................................................................... pscfft2d(3S) ..................................................... 269
PCSFFT3D(3S) ............................................................................................... pscfft3d(3S) ..................................................... 277
pcsfft3d(3S) ............................................................................................... pscfft3d(3S) ..................................................... 277
Performs Hermitian rank 1 update of a complex Hermitian matrix ............... cher(3S)................................................................. 83
Performs Hermitian rank 1 update of a packed complex Hermitian matrix .. chpr(3S)................................................................. 89
Performs Hermitian rank 2 update of a complex Hermitian matrix ............... cher2(3S) .............................................................. 85
Performs Hermitian rank 2 update of a packed complex Hermitian matrix .. chpr2(3S) .............................................................. 91
Performs Hermitian rank 2k update of a complex Hermitian matrix ............. cher2k(3S) .......................................................... 138
Performs Hermitian rank k update of a complex Hermitian matrix ............... cherk(3S) ............................................................ 141
Performs rank 1 update of a real general matrix ............................................ sger(3S)................................................................. 98
Performs symmetric rank 1 update of a real or complex symmetric matrix .. ssyr(3S)............................................................... 114
Performs symmetric rank 1 update of a real or complex symmetric
packed matrix .................................................................................................. sspr(3S)............................................................... 106
Performs symmetric rank 2 update of a real symmetric matrix ..................... ssyr2(3S) ............................................................ 116
Performs symmetric rank 2 update of a real symmetric packed matrix ........ sspr2(3S) ............................................................ 108
Performs symmetric rank 2k update of a real or complex symmetric
matrix .............................................................................................................. ssyr2k(3S) .......................................................... 155
Performs symmetric rank k update of a real or complex symmetric matrix .. ssyrk(3S) ............................................................ 158
Performs the convolution of two sequences of real numbers ......................... hconv(3S) ............................................................ 231
Performs the convolution of two sequences of real numbers ......................... sconv(3S) ............................................................ 326
Performs the correlation of two sequences of real numbers .......................... hcorr(3S) ............................................................ 233
Performs the correlation of two sequences of real numbers .......................... scorr(3S) ............................................................ 328
Performs the correlation of two sequences of real numbers (symmetric
filter) ................................................................................................................ hcorrs(3S) .......................................................... 235
Performs the correlation of two sequences of real numbers (symmetric
filter) ................................................................................................................ scorrs(3S) .......................................................... 330



Performs two simultaneous symmetric rank 1 updates of a real symmetric
packed matrix .................................................................................................. sspr12(3S) .......................................................... 110
Plane rotation .................................................................................................. srotm(3S) .............................................................. 62
Plane rotation .................................................................................................. srotmg(3S) ............................................................ 64
post tapered ..................................................................................................... hconv(3S) ............................................................ 231
post tapered ..................................................................................................... hcorr(3S) ............................................................ 233
post tapered ..................................................................................................... hcorrs(3S) .......................................................... 235
post tapered ..................................................................................................... sconv(3S) ............................................................ 326
post tapered ..................................................................................................... scorr(3S) ............................................................ 328
post tapered ..................................................................................................... scorrs(3S) .......................................................... 330
post-tapered ..................................................................................................... hconv(3S) ............................................................ 231
post-tapered ..................................................................................................... hcorr(3S) ............................................................ 233
post-tapered ..................................................................................................... hcorrs(3S) .......................................................... 235
post-tapered ..................................................................................................... sconv(3S) ............................................................ 326
post-tapered ..................................................................................................... scorr(3S) ............................................................ 328
post-tapered ..................................................................................................... scorrs(3S) .......................................................... 330
PSCFFT2D(3S) ............................................................................................... pscfft2d(3S) ..................................................... 269
pscfft2d(3S) ............................................................................................... pscfft2d(3S) ..................................................... 269
PSCFFT3D(3S) ............................................................................................... pscfft3d(3S) ..................................................... 277
pscfft3d(3S) ............................................................................................... pscfft3d(3S) ..................................................... 277
QR factorization .............................................................................................. linpack(3S) ......................................................... 29
QZHES(3S) ...................................................................................................... eispack(3S) ......................................................... 23
qzhes(3S) ...................................................................................................... eispack(3S) ......................................................... 23
QZIT(3S) ........................................................................................................ eispack(3S) ......................................................... 23
qzit(3S) ........................................................................................................ eispack(3S) ......................................................... 23
QZVAL(3S) ...................................................................................................... eispack(3S) ......................................................... 23
qzval(3S) ...................................................................................................... eispack(3S) ......................................................... 23
QZVEC(3S) ...................................................................................................... eispack(3S) ......................................................... 23
qzvec(3S) ...................................................................................................... eispack(3S) ......................................................... 23
Rank 1 update ................................................................................................. sger(3S)................................................................. 98
RATQR(3S) ...................................................................................................... eispack(3S) ......................................................... 23
ratqr(3S) ...................................................................................................... eispack(3S) ......................................................... 23
RCFFT2(3S) .................................................................................................... rcfft2(3S) .......................................................... 285
rcfft2(3S) .................................................................................................... rcfft2(3S) .......................................................... 285
real general matrix multiplication (BLAS3) ................................................... sgemm(3S) ............................................................ 145
real general matrix multiplied by a real symmetric matrix (BLAS3) ............ ssymm(3S) ............................................................ 152
real matrix copy operation (BLAS3) .............................................................. scopy2(3S) .......................................................... 143
real matrix multiplication (BLAS3) ................................................................ sgemms(3S) .......................................................... 148
real plane rotation ........................................................................................... srot(3S)................................................................. 58
real plane rotation (BLAS1) ........................................................................... csrot(3S) .............................................................. 37
real symmetric matrix rank update (BLAS3) ................................................. ssyrk(3S) ............................................................ 158
real to complex computation (FFT) ................................................................ hgfft(3S) ............................................................ 237
Real vector addition ........................................................................................ haxpy(3S) .............................................................. 38
Real vector addition ........................................................................................ saxpy(3S) .............................................................. 46
Real vector addition ........................................................................................ ssum(3S)................................................................. 72
Real vector exchange ...................................................................................... sswap(3S) .............................................................. 73
Real vector multiplication ............................................................................... saxpby(3S) ............................................................ 44
real vector multiplication (BLAS2) ................................................................ sgemv(3S) .............................................................. 96
Real-to-complex Fast Fourier transform ......................................................... rcfft2(3S) .......................................................... 285
Real-to-complex Fast Fourier transform (multiple input vectors) .................. rfftmlt(3S)........................................................ 286



REBAK(3S) ...................................................................................................... eispack(3S) ......................................................... 23
rebak(3S) ...................................................................................................... eispack(3S) ......................................................... 23
REBAKB(3S) .................................................................................................... eispack(3S) ......................................................... 23
rebakb(3S) .................................................................................................... eispack(3S) ......................................................... 23
REDUC2(3S) .................................................................................................... eispack(3S) ......................................................... 23
reduc2(3S) .................................................................................................... eispack(3S) ......................................................... 23
REDUC(3S) ...................................................................................................... eispack(3S) ......................................................... 23
reduc(3S) ...................................................................................................... eispack(3S) ......................................................... 23
RFFTMLT(3S) ................................................................................................. rfftmlt(3S)........................................................ 286
rfftmlt(3S) ................................................................................................. rfftmlt(3S)........................................................ 286
RG(3S) ............................................................................................................. eispack(3S) ......................................................... 23
rg(3S) ............................................................................................................. eispack(3S) ......................................................... 23
RGG(3S) ........................................................................................................... eispack(3S) ......................................................... 23
rgg(3S) ........................................................................................................... eispack(3S) ......................................................... 23
Rotation ........................................................................................................... srotm(3S) .............................................................. 62
Rotation ........................................................................................................... srotmg(3S) ............................................................ 64
RS(3S) ............................................................................................................. eispack(3S) ......................................................... 23
rs(3S) ............................................................................................................. eispack(3S) ......................................................... 23
RSB(3S) ........................................................................................................... eispack(3S) ......................................................... 23
rsb(3S) ........................................................................................................... eispack(3S) ......................................................... 23
RSG(3S) ........................................................................................................... eispack(3S) ......................................................... 23
rsg(3S) ........................................................................................................... eispack(3S) ......................................................... 23
RSGAB(3S) ...................................................................................................... eispack(3S) ......................................................... 23
rsgab(3S) ...................................................................................................... eispack(3S) ......................................................... 23
RSGBA(3S) ...................................................................................................... eispack(3S) ......................................................... 23
rsgba(3S) ...................................................................................................... eispack(3S) ......................................................... 23
RSM(3S) ........................................................................................................... eispack(3S) ......................................................... 23
rsm(3S) ........................................................................................................... eispack(3S) ......................................................... 23
RSP(3S) ........................................................................................................... eispack(3S) ......................................................... 23
rsp(3S) ........................................................................................................... eispack(3S) ......................................................... 23
RST(3S) ........................................................................................................... eispack(3S) ......................................................... 23
rst(3S) ........................................................................................................... eispack(3S) ......................................................... 23
RT(3S) ............................................................................................................. eispack(3S) ......................................................... 23
rt(3S) ............................................................................................................. eispack(3S) ......................................................... 23
SASUM(3S) ...................................................................................................... sasum(3S) .............................................................. 42
sasum(3S) ...................................................................................................... sasum(3S) .............................................................. 42
SAXPBY(3S) .................................................................................................... saxpby(3S) ............................................................ 44
saxpby(3S) .................................................................................................... saxpby(3S) ............................................................ 44
SAXPY(3S) ...................................................................................................... saxpy(3S) .............................................................. 46
saxpy(3S) ...................................................................................................... saxpy(3S) .............................................................. 46
Scalar ............................................................................................................... saxpby(3S) ............................................................ 44
Scalar multiple addition .................................................................................. haxpy(3S) .............................................................. 38
Scalar multiple addition (BLAS1) .................................................................. saxpy(3S) .............................................................. 46
Scales a real or complex vector ...................................................................... sscal(3S) .............................................................. 70
Scaling ............................................................................................................. sgesum(3S) .......................................................... 100
Scaling a complex vector ................................................................................ sscal(3S) .............................................................. 70
Scaling a real vector ....................................................................................... sscal(3S) .............................................................. 70
SCASUM(3S) .................................................................................................... sasum(3S) .............................................................. 42
scasum(3S) .................................................................................................... sasum(3S) .............................................................. 42
SCFFT2D(3S) ................................................................................................. scfft2d(3S)........................................................ 296

scfft2d(3S) ................................................................................................. scfft2d(3S)........................................................ 296
SCFFT3D(3S) ................................................................................................. scfft3d(3S)........................................................ 303
scfft3d(3S) ................................................................................................. scfft3d(3S)........................................................ 303
SCFFT(3S) ...................................................................................................... scfft(3S) ............................................................ 290
scfft(3S) ...................................................................................................... scfft(3S) ............................................................ 290
SCFFTM(3S) .................................................................................................... scfftm(3S) .......................................................... 309
scfftm(3S) .................................................................................................... scfftm(3S) .......................................................... 309
schdc(3S) ...................................................................................................... linpack(3S) ......................................................... 29
schdd(3S) ...................................................................................................... linpack(3S) ......................................................... 29
schex(3S) ...................................................................................................... linpack(3S) ......................................................... 29
schud(3S) ...................................................................................................... linpack(3S) ......................................................... 29
Scilib ............................................................................................................... intro_libsci(3S) ................................................ 1
Scilib intro ....................................................................................................... intro_libsci(3S) ................................................ 1
SCNRM2(3S) .................................................................................................... snrm2(3S) .............................................................. 54
scnrm2(3S) .................................................................................................... snrm2(3S) .............................................................. 54
SCNVL1D(3S) ................................................................................................. scnvl1d(3S)........................................................ 316
scnvl1d(3S) ................................................................................................. scnvl1d(3S)........................................................ 316
SCNVL2D(3S) ................................................................................................. scnvl2d(3S)........................................................ 320
scnvl2d(3S) ................................................................................................. scnvl2d(3S)........................................................ 320
SCONV(3S) ...................................................................................................... sconv(3S) ............................................................ 326
sconv(3S) ...................................................................................................... sconv(3S) ............................................................ 326
SCOPY2(3S) .................................................................................................... scopy2(3S) .......................................................... 143
scopy2(3S) .................................................................................................... scopy2(3S) .......................................................... 143
SCOPY(3S) ...................................................................................................... scopy(3S) .............................................................. 48
scopy(3S) ...................................................................................................... scopy(3S) .............................................................. 48
SCORR(3S) ...................................................................................................... scorr(3S) ............................................................ 328
scorr(3S) ...................................................................................................... scorr(3S) ............................................................ 328
SCORRS(3S) .................................................................................................... scorrs(3S) .......................................................... 330
scorrs(3S) .................................................................................................... scorrs(3S) .......................................................... 330
SDOT(3S) ........................................................................................................ sdot(3S)................................................................. 50
sdot(3S) ........................................................................................................ sdot(3S)................................................................. 50
sgbco(3S) ...................................................................................................... linpack(3S) ......................................................... 29
sgbdi(3S) ...................................................................................................... linpack(3S) ......................................................... 29
sgbfa(3S) ...................................................................................................... linpack(3S) ......................................................... 29
SGBMV(3S) ...................................................................................................... sgbmv(3S) .............................................................. 93
sgbmv(3S) ...................................................................................................... sgbmv(3S) .............................................................. 93
sgbsl(3S) ...................................................................................................... linpack(3S) ......................................................... 29
sgeco(3S) ...................................................................................................... linpack(3S) ......................................................... 29
sgedi(3S) ...................................................................................................... linpack(3S) ......................................................... 29
sgefa(3S) ...................................................................................................... linpack(3S) ......................................................... 29
SGEMM(3S) ...................................................................................................... sgemm(3S) ............................................................ 145
sgemm(3S) ...................................................................................................... sgemm(3S) ............................................................ 145
SGEMMS(3S) .................................................................................................... sgemms(3S) .......................................................... 148
sgemms(3S) .................................................................................................... sgemms(3S) .......................................................... 148
SGEMV(3S) ...................................................................................................... sgemv(3S) .............................................................. 96
sgemv(3S) ...................................................................................................... sgemv(3S) .............................................................. 96
SGER(3S) ........................................................................................................ sger(3S)................................................................. 98
sger(3S) ........................................................................................................ sger(3S)................................................................. 98
sgesl(3S) ...................................................................................................... linpack(3S) ......................................................... 29
SGESUM(3S) .................................................................................................... sgesum(3S) .......................................................... 100

sgesum(3S) .................................................................................................... sgesum(3S) .......................................................... 100
sgtsl(3S) ...................................................................................................... linpack(3S) ......................................................... 29
SHAD(3S) ........................................................................................................ shad(3S)................................................................. 52
shad(3S) ........................................................................................................ shad(3S)................................................................. 52
Signal processing ............................................................................................ intro_fft(3S) ................................................... 163
Single-precision real and complex LINPACK routines .................................. linpack(3S) ......................................................... 29
Singular value decomposition ......................................................................... eispack(3S) ......................................................... 23
Singular value decomposition ......................................................................... linpack(3S) ......................................................... 29
SNRM2(3S) ...................................................................................................... snrm2(3S) .............................................................. 54
snrm2(3S) ...................................................................................................... snrm2(3S) .............................................................. 54
Solves a real or complex triangular banded system of equations .................. stbsv(3S) ............................................................ 121
Solves a real or complex triangular packed system of equations .................. stpsv(3S) ............................................................ 126
Solves a real or complex triangular system of equations ............................... strsv(3S) ............................................................ 130
Solves Wiener-Levinson linear equations ....................................................... hopfilt(3S)........................................................ 243
Solves Wiener-Levinson linear equations ....................................................... opfilt(3S) .......................................................... 253
Sparse .............................................................................................................. spdot(3S) .............................................................. 57
Sparse dot product .......................................................................................... spdot(3S) .............................................................. 57
Sparse inner product ....................................................................................... spdot(3S) .............................................................. 57
Sparse vector ................................................................................................... spaxpy(3S) ............................................................ 56
Sparse vector ................................................................................................... spdot(3S) .............................................................. 57
SPAXPY(3S) .................................................................................................... spaxpy(3S) ............................................................ 56
spaxpy(3S) .................................................................................................... spaxpy(3S) ............................................................ 56
spbco(3S) ...................................................................................................... linpack(3S) ......................................................... 29
spbdi(3S) ...................................................................................................... linpack(3S) ......................................................... 29
spbfa(3S) ...................................................................................................... linpack(3S) ......................................................... 29
spbsl(3S) ...................................................................................................... linpack(3S) ......................................................... 29
SPDOT(3S) ...................................................................................................... spdot(3S) .............................................................. 57
spdot(3S) ...................................................................................................... spdot(3S) .............................................................. 57
spoco(3S) ...................................................................................................... linpack(3S) ......................................................... 29
spodi(3S) ...................................................................................................... linpack(3S) ......................................................... 29
spofa(3S) ...................................................................................................... linpack(3S) ......................................................... 29
sposl(3S) ...................................................................................................... linpack(3S) ......................................................... 29
sppco(3S) ...................................................................................................... linpack(3S) ......................................................... 29
sppdi(3S) ...................................................................................................... linpack(3S) ......................................................... 29
sppfa(3S) ...................................................................................................... linpack(3S) ......................................................... 29
sppsl(3S) ...................................................................................................... linpack(3S) ......................................................... 29
sptsl(3S) ...................................................................................................... linpack(3S) ......................................................... 29
sqrdc(3S) ...................................................................................................... linpack(3S) ......................................................... 29
sqrsl(3S) ...................................................................................................... linpack(3S) ......................................................... 29
SROT(3S) ........................................................................................................ srot(3S)................................................................. 58
srot(3S) ........................................................................................................ srot(3S)................................................................. 58
SROTG(3S) ...................................................................................................... srotg(3S) .............................................................. 60
srotg(3S) ...................................................................................................... srotg(3S) .............................................................. 60
SROTM(3S) ...................................................................................................... srotm(3S) .............................................................. 62
srotm(3S) ...................................................................................................... srotm(3S) .............................................................. 62
SROTMG(3S) .................................................................................................... srotmg(3S) ............................................................ 64
srotmg(3S) .................................................................................................... srotmg(3S) ............................................................ 64
SSBMV(3S) ...................................................................................................... ssbmv(3S) ............................................................ 102
ssbmv(3S) ...................................................................................................... ssbmv(3S) ............................................................ 102
SSCAL(3S) ...................................................................................................... sscal(3S) .............................................................. 70

sscal(3S) ...................................................................................................... sscal(3S) .............................................................. 70
ssico(3S) ...................................................................................................... linpack(3S) ......................................................... 29
ssidi(3S) ...................................................................................................... linpack(3S) ......................................................... 29
ssifa(3S) ...................................................................................................... linpack(3S) ......................................................... 29
ssisl(3S) ...................................................................................................... linpack(3S) ......................................................... 29
sspco(3S) ...................................................................................................... linpack(3S) ......................................................... 29
sspdi(3S) ...................................................................................................... linpack(3S) ......................................................... 29
sspfa(3S) ...................................................................................................... linpack(3S) ......................................................... 29
SSPMV(3S) ...................................................................................................... sspmv(3S) ............................................................ 104
sspmv(3S) ...................................................................................................... sspmv(3S) ............................................................ 104
SSPR12(3S) .................................................................................................... sspr12(3S) .......................................................... 110
sspr12(3S) .................................................................................................... sspr12(3S) .......................................................... 110
SSPR2(3S) ...................................................................................................... sspr2(3S) ............................................................ 108
sspr2(3S) ...................................................................................................... sspr2(3S) ............................................................ 108
SSPR(3S) ........................................................................................................ sspr(3S)............................................................... 106
sspr(3S) ........................................................................................................ sspr(3S)............................................................... 106
sspsl(3S) ...................................................................................................... linpack(3S) ......................................................... 29
SSUM(3S) ........................................................................................................ ssum(3S)................................................................. 72
ssum(3S) ........................................................................................................ ssum(3S)................................................................. 72
ssvdc(3S) ...................................................................................................... linpack(3S) ......................................................... 29
SSWAP(3S) ...................................................................................................... sswap(3S) .............................................................. 73
sswap(3S) ...................................................................................................... sswap(3S) .............................................................. 73
SSYMM(3S) ...................................................................................................... ssymm(3S) ............................................................ 152
ssymm(3S) ...................................................................................................... ssymm(3S) ............................................................ 152
SSYMV(3S) ...................................................................................................... ssymv(3S) ............................................................ 112
ssymv(3S) ...................................................................................................... ssymv(3S) ............................................................ 112
SSYR2(3S) ...................................................................................................... ssyr2(3S) ............................................................ 116
ssyr2(3S) ...................................................................................................... ssyr2(3S) ............................................................ 116
SSYR2K(3S) .................................................................................................... ssyr2k(3S) .......................................................... 155
ssyr2k(3S) .................................................................................................... ssyr2k(3S) .......................................................... 155
SSYR(3S) ........................................................................................................ ssyr(3S)............................................................... 114
ssyr(3S) ........................................................................................................ ssyr(3S)............................................................... 114
SSYRK(3S) ...................................................................................................... ssyrk(3S) ............................................................ 158
ssyrk(3S) ...................................................................................................... ssyrk(3S) ............................................................ 158
STBMV(3S) ...................................................................................................... stbmv(3S) ............................................................ 118
stbmv(3S) ...................................................................................................... stbmv(3S) ............................................................ 118
STBSV(3S) ...................................................................................................... stbsv(3S) ............................................................ 121
stbsv(3S) ...................................................................................................... stbsv(3S) ............................................................ 121
STPMV(3S) ...................................................................................................... stpmv(3S) ............................................................ 124
stpmv(3S) ...................................................................................................... stpmv(3S) ............................................................ 124
STPSV(3S) ...................................................................................................... stpsv(3S) ............................................................ 126
stpsv(3S) ...................................................................................................... stpsv(3S) ............................................................ 126
strco(3S) ...................................................................................................... linpack(3S) ......................................................... 29
strdi(3S) ...................................................................................................... linpack(3S) ......................................................... 29
STRMM(3S) ...................................................................................................... strmm(3S) ............................................................ 160
strmm(3S) ...................................................................................................... strmm(3S) ............................................................ 160
STRMV(3S) ...................................................................................................... strmv(3S) ............................................................ 128
strmv(3S) ...................................................................................................... strmv(3S) ............................................................ 128
strsl(3S) ...................................................................................................... linpack(3S) ......................................................... 29
STRSV(3S) ...................................................................................................... strsv(3S) ............................................................ 130



strsv(3S) ...................................................................................................... strsv(3S) ............................................................ 130
Sums the absolute value of elements in a real or complex vector ................. sasum(3S) .............................................................. 42
Sums the elements of a real or complex vector ............................................. ssum(3S)................................................................. 72
SVD(3S) ........................................................................................................... eispack(3S) ......................................................... 23
svd(3S) ........................................................................................................... eispack(3S) ......................................................... 23
Swapping vectors ............................................................................................ sswap(3S) .............................................................. 73
Swaps two real or complex vectors ................................................................ sswap(3S) .............................................................. 73
Symmetric band matrix ................................................................................... ssbmv(3S) ............................................................ 102
Symmetric coefficient ..................................................................................... filters(3S)........................................................ 225
symmetric filter ............................................................................................... hcorrs(3S) .......................................................... 235
symmetric filter ............................................................................................... scorrs(3S) .......................................................... 330
Symmetric matrix ............................................................................................ sspmv(3S) ............................................................ 104
Symmetric matrix ............................................................................................ sspr(3S)............................................................... 106
Symmetric matrix ............................................................................................ sspr2(3S) ............................................................ 108
Symmetric matrix ............................................................................................ sspr12(3S) .......................................................... 110
Symmetric matrix ............................................................................................ ssymv(3S) ............................................................ 112
Symmetric matrix ............................................................................................ ssyr(3S)............................................................... 114
Symmetric matrix ............................................................................................ ssyr2(3S) ............................................................ 116
Symmetric packed matrix ............................................................................... sspmv(3S) ............................................................ 104
Symmetric packed matrix ............................................................................... sspr(3S)............................................................... 106
Symmetric packed matrix ............................................................................... sspr2(3S) ............................................................ 108
Symmetric packed matrix ............................................................................... sspr12(3S) .......................................................... 110
Symmetric rank 1 update ................................................................................ sspr(3S)............................................................... 106
Symmetric rank 1 update ................................................................................ sspr12(3S) .......................................................... 110
Symmetric rank 1 update ................................................................................ ssyr(3S)............................................................... 114
Symmetric rank 2 update ................................................................................ sspr2(3S) ............................................................ 108
Symmetric rank 2 update ................................................................................ ssyr2(3S) ............................................................ 116
Symmetric rank 2k update (BLAS3) .............................................................. ssyr2k(3S) .......................................................... 155
Symmetric vectors ........................................................................................... filters(3S)........................................................ 225
TINVIT(3S) .................................................................................................... eispack(3S) ......................................................... 23
tinvit(3S) .................................................................................................... eispack(3S) ......................................................... 23
TQL1(3S) ........................................................................................................ eispack(3S) ......................................................... 23
tql1(3S) ........................................................................................................ eispack(3S) ......................................................... 23
TQL2(3S) ........................................................................................................ eispack(3S) ......................................................... 23
tql2(3S) ........................................................................................................ eispack(3S) ......................................................... 23
TQLRAT(3S) .................................................................................................... eispack(3S) ......................................................... 23
tqlrat(3S) .................................................................................................... eispack(3S) ......................................................... 23
TRBAK3(3S) .................................................................................................... eispack(3S) ......................................................... 23
trbak3(3S) .................................................................................................... eispack(3S) ......................................................... 23
TRBAK(3S) ...................................................................................................... eispack(3S) ......................................................... 23
trbak(3S) ...................................................................................................... eispack(3S) ......................................................... 23
TRED1(3S) ...................................................................................................... eispack(3S) ......................................................... 23
tred1(3S) ...................................................................................................... eispack(3S) ......................................................... 23
TRED2(3S) ...................................................................................................... eispack(3S) ......................................................... 23
tred2(3S) ...................................................................................................... eispack(3S) ......................................................... 23
TRED3(3S) ...................................................................................................... eispack(3S) ......................................................... 23
tred3(3S) ...................................................................................................... eispack(3S) ......................................................... 23
Triangular band matrix ................................................................................... stbmv(3S) ............................................................ 118
Triangular banded system of equations .......................................................... stbsv(3S) ............................................................ 121
Triangular matrix ............................................................................................ strmv(3S) ............................................................ 128



Triangular matrix ............................................................................................ strmm(3S) ............................................................ 160
Triangular packed matrix ................................................................................ stpmv(3S) ............................................................ 124
Triangular packed system of equations .......................................................... stpsv(3S) ............................................................ 126
Triangular system of equations ....................................................................... strsv(3S) ............................................................ 130
TRIDIB(3S) .................................................................................................... eispack(3S) ......................................................... 23
tridib(3S) .................................................................................................... eispack(3S) ......................................................... 23
TSTURM(3S) .................................................................................................... eispack(3S) ......................................................... 23
tsturm(3S) .................................................................................................... eispack(3S) ......................................................... 23
Vector .............................................................................................................. saxpby(3S) ............................................................ 44
Vector addition ................................................................................................ ssum(3S)................................................................. 72
Vector element addition .................................................................................. sasum(3S) .............................................................. 42
Wiener-Levinson linear equations ................................................................... hopfilt(3S)........................................................ 243
Wiener-Levinson linear equations ................................................................... opfilt(3S) .......................................................... 253



Scientific Libraries Reference
Manual, Volume 2
004–2081–002
Copyright © 1989, 1994, 1995, 1997–1999 Silicon Graphics, Inc. All Rights Reserved. This manual or parts thereof may not be
reproduced in any form unless permitted by contract or by written permission of Silicon Graphics, Inc.

LIMITED AND RESTRICTED RIGHTS LEGEND

Use, duplication, or disclosure by the Government is subject to restrictions as set forth in the Rights in Data clause at FAR
52.227-14 and/or in similar or successor clauses in the FAR, or in the DOD, DOE or NASA FAR Supplements. Unpublished rights
reserved under the Copyright Laws of the United States. Contractor/manufacturer is Silicon Graphics, Inc., 1600 Amphitheatre
Pkwy., Mountain View, CA 94043-1351.

Autotasking, CF77, Cray, Cray Ada, CraySoft, Cray Y-MP, Cray-1, CRInform, CRI/TurboKiva, HSX, LibSci, MPP Apprentice, SSD,
SUPERCLUSTER, UNICOS, X-MP EA, and UNICOS/mk are federally registered trademarks and Because no workstation is an
island, CCI, CCMT, CF90, CFT, CFT2, CFT77, ConCurrent Maintenance Tools, COS, Cray Animation Theater, Cray APP, Cray C90,
Cray C90D, Cray C++ Compiling System, CrayDoc, Cray EL, Cray J90, Cray J90se, CrayLink, Cray NQS, Cray/REELlibrarian,
Cray S-MP, Cray SSD-T90, Cray SV1, Cray T90, Cray T3D, Cray T3E, CrayTutor, Cray X-MP, Cray XMS, Cray-2, CSIM, CVT,
Delivering the power . . ., DGauss, Docview, EMDS, GigaRing, HEXAR, IOS, ND Series Network Disk Array,
Network Queuing Environment, Network Queuing Tools, OLNET, RQS, SEGLDR, SMARTE, SUPERLINK,
System Maintenance and Remote Testing Environment, Trusted UNICOS, and UNICOS MAX are trademarks of Cray Research,
L.L.C., a wholly owned subsidiary of Silicon Graphics, Inc.

SGI is a trademark of Silicon Graphics, Inc. IRIX and Silicon Graphics are registered trademarks and the Silicon Graphics logo is a
trademark of Silicon Graphics, Inc.

CDC is a trademark of Control Data Systems, Inc. DEC, ULTRIX, VAX, and VMS are trademarks of Digital Equipment
Corporation. ER90 is a trademark of EMASS, Inc. ETA is a trademark of ETA Systems, Inc. IBM is a trademark of International
Business Machines Corporation. MIPS is a trademark of MIPS Computer Systems. UNIX is a registered trademark in the United
States and other countries, licensed exclusively through X/Open Company Limited. X/Open is a registered trademark of X/Open
Company Ltd. X Window System and the X device are trademarks of The Open Group.

The UNICOS operating system is derived from UNIX® System V. The UNICOS operating system is also based in part on the
Fourth Berkeley Software Distribution (BSD) under license from The Regents of the University of California.
New Features

Scientific Libraries Reference Manual, Volume 2 004–2081–002

No user interface changes were made for this release.


Record of Revision

Version Description

5.0 March 1989


Documentation supporting the UNICOS 5.0 release running on Cray Research
computer systems.

6.0 January 1991


Reprint with revision supporting the UNICOS 6.0 release running on Cray Research
computer systems.

7.0 August 1992


Reprint with revision supporting the UNICOS 7.0 release running on Cray Research
computer systems.

8.0 August 1993


Reprint with revision supporting the CrayLibs 1.0 release (asynchronous) that runs
on Cray Research systems. In this revision of the documentation, the math library is
no longer documented in the same manual as the scientific library. Instead, it is
documented in the Math Library Reference Manual, publication SR-2138. The sort and
search routines, which were in the UNICOS 7.0 version of the scientific library, were
moved to the UNICOS Fortran Library.

8.1 June 1994


Rewrite to support the CrayLibs 1.1 release (asynchronous) that runs on Cray
Research systems. This revision incorporates support for the Cray MPP hardware
platform.

8.2 October 1994


Rewrite to support the CrayLibs 1.2 release (asynchronous) that runs on Cray
Research systems. This revision incorporates support for the Basic Linear Algebra
Subprograms for shared arrays (BLAS_S).

2.0 December 1995


Rewrite to support the CrayLibs 2.0 release that runs on Cray Research systems.
This revision incorporates support for Scalable LAPACK (ScaLAPACK) and
documented support for 32-bit FFT routines. Additional routines were added for
FFT and BLAS.

3.0 June 1997


Rewrite to support the CrayLibs 3.0 release that runs on Cray Research systems.
This revision removes support for the Basic Linear Algebra Subprograms for shared
arrays (BLAS_S). See the New Features page for more details about additional
functionality added at this release.

3.1 August 1998


Updated to reflect changes in the Programming Environment 3.1 release. The
printed text of this manual was made available in postscript (.ps) format only for
this release.

3.3 July 1999


Updated to reflect changes in the Programming Environment 3.3 release. The
printed text of this manual was made available in postscript (.ps) format only for
this release.

About This Guide

This publication documents subprograms and routines available to users of the
CrayLibs product, which is included in the Programming Environment 3.3
release. The CrayLibs product contains several libraries; the library routines can
be called from source code written in a number of programming languages,
including Fortran, C, Pascal, and assembly language. The information in this
document supplements information contained in other manuals of the
Programming Environment documentation set.
This is a reference manual for application and system programmers. Readers
should also have a working knowledge of either the UNICOS, UNICOS/mk, or
UNIX operating system and a working knowledge of the Fortran or C
programming language.

Documentation Organization
The printed versions of the Scientific Library man pages appear in two volumes
and are grouped by topic. See the INTRO_LIBSCI(3S) man page for
details about the contents of each volume.
Each topic section also has an introductory man page that explains the
contents of the section and provides other information about the usage of those
routines. The following introductory man pages are available:
INTRO_BLACS(3S)
INTRO_BLAS1(3S)
INTRO_BLAS2(3S)
INTRO_BLAS3(3S)
INTRO_CORE(3S)
INTRO_FFT(3S)
INTRO_LAPACK(3S)
INTRO_MACH(3S)
INTRO_SCALAPACK(3S)
INTRO_SPARSE(3S)
INTRO_SPEC(3S)
INTRO_SUPERSEDED(3S)

Related Publications
The following manuals document the CrayLibs products. All man pages in
these manuals can also be viewed online by using the man command.
• Intrinsic Procedures Reference Manual
• Application Programmer’s Library Reference Manual
• Scientific Libraries Ready Reference
• Application Programmer’s Library Ready Reference
The following manuals describe the products in the Programming Environment.
These publications describe the operating system, input/output (I/O), and
other related topics.
• Segment Loader (SEGLDR) and ld Reference Manual
• UNICOS User Commands Reference Manual
• UNICOS User Commands Ready Reference
• Guide to Parallel Vector Applications
• Application Programmer’s I/O Guide
In addition to these documents, several documents are available that describe
the compiler systems available on UNICOS and UNICOS/mk. Some of these
manuals are:
• CF90 Ready Reference
• CF90 Commands and Directives Reference Manual
• Fortran Language Reference Manual, Volume 1
• Fortran Language Reference Manual, Volume 2
• Fortran Language Reference Manual, Volume 3
• Cray C/C++ Reference Manual


The following manuals document the compilers that are available on IRIX
systems:
• MIPSpro 7 Fortran 90 Commands and Directives Reference Manual
• MIPSpro Assembly Language Programmer’s Guide
• MIPSpro Fortran 77 Language Reference Manual
• MIPSpro Fortran 77 Programmer’s Guide
• MIPSpro 64-Bit Porting and Transition Guide

Obtaining Publications
SGI maintains information about available publications at the following URL:
http://techpubs.sgi.com/library

This Web site contains information that allows you to browse documents online,
order documents, and send feedback to SGI. You can also order a printed SGI
document by calling 1 800 627 9307.
The User Publications Catalog describes the availability and content of all Cray
hardware and software documents that are available to customers. Customers
who subscribe to the Cray Inform (CRInform) program can access this
information on the CRInform system.
SGI maintains information on publicly available Cray documents at the
following URL:
http://www.cray.com/swpubs/

This Web site contains information that allows you to browse documents online
and send feedback to SGI. To order a printed Cray document, either call
+1 651 683 5907 or send a facsimile of your request to fax number
+1 651 683 3840. SGI employees may also order printed Cray documents by
sending their orders via electronic mail to orderdsk.
Customers outside of the United States and Canada should contact their local
service organization for ordering information and documentation information.

Conventions
The following conventions are used throughout this document:


Convention      Meaning
command         This fixed-space font denotes literal items such as
                commands, files, routines, path names, signals,
                messages, and programming language structures.
variable        Italic typeface denotes variable entries and words
                or concepts being defined.
user input      This bold, fixed-space font denotes literal items
                that the user enters in interactive sessions.
                Output is shown in nonbold, fixed-space font.
In addition to these formatting conventions, several naming conventions are
used throughout the documentation. “Cray PVP systems” denotes all
configurations of Cray parallel vector processing (PVP) systems that run the
UNICOS operating system. “Cray MPP systems” denotes all configurations of
the Cray T3E series that run the UNICOS/mk operating system. “IRIX
systems” denotes SGI platforms which run the IRIX operating system.
The default shell in the UNICOS and UNICOS/mk operating systems, referred
to as the standard shell, is a version of the Korn shell that conforms to the
following standards:
• Institute of Electrical and Electronics Engineers (IEEE) Portable Operating
System Interface (POSIX) Standard 1003.2–1992
• X/Open Portability Guide, Issue 4 (XPG4)
The UNICOS and UNICOS/mk operating systems also support the optional use
of the C shell.

Man page sections


The entries in this document are based on a common format. The following list
shows the order of sections in an entry and describes each section. Most entries
contain only a subset of these sections.

Section heading         Description
NAME                    Specifies the name of the entry and briefly states
                        its function.
SYNOPSIS                Presents the syntax of the entry.
IMPLEMENTATION          Identifies the systems to which the entry applies.
STANDARDS               Provides information about the portability of a
                        utility or routine.
DESCRIPTION             Discusses the entry in detail.
NOTES                   Presents items of particular importance.
CAUTIONS                Describes actions that can destroy data or
                        produce undesired results.
WARNINGS                Describes actions that can harm people,
                        equipment, or system software.
ENVIRONMENT VARIABLES   Describes predefined shell variables that
                        determine some characteristics of the shell or that
                        affect the behavior of some programs, commands,
                        or utilities.
RETURN VALUES           Describes possible return values that indicate a
                        library or system call executed successfully, or
                        identifies the error condition under which it
                        failed.
EXIT STATUS             Describes possible exit status values that indicate
                        whether the command or utility executed
                        successfully.
MESSAGES                Describes informational, diagnostic, and error
                        messages that may appear. Self-explanatory
                        messages are not listed.
ERRORS                  Documents error codes. Applies only to system
                        calls.
FORTRAN EXTENSIONS      Describes how to call a system call from Fortran.
                        Applies only to system calls.
BUGS                    Indicates known bugs and deficiencies.
EXAMPLES                Shows examples of usage.
FILES                   Lists files that are either part of the entry or are
                        related to it.
SEE ALSO                Lists entries and publications that contain related
                        information.

Reader Comments
If you have comments about the technical accuracy, content, or organization of
this document, please tell us. Be sure to include the title and part number of
the document with your comments.
You can contact us in any of the following ways:
• Send e-mail to the following address:
techpubs@sgi.com

• Send a fax to the attention of “Technical Publications” at: +1 650 932 0801.
• Use the Feedback option on the Technical Publications Library World Wide
Web page:
http://techpubs.sgi.com

• Call the Technical Publications Group, through the Technical Assistance
Center, using one of the following numbers:
For SGI IRIX based operating systems: 1 800 800 4SGI
For UNICOS or UNICOS/mk based operating systems or Cray Origin 2000
systems: 1 800 950 2729 (toll free from the United States and Canada) or
+1 651 683 5600
• Send mail to the following address:
Technical Publications
SGI
1600 Amphitheatre Pkwy.
Mountain View, California 94043–1351
We value your comments and will respond to them promptly.

CONTENTS

Solvers for dense linear systems and eigensystems

intro_lapack, INTRO_LAPACK .......................... Introduction to LAPACK solvers for dense linear systems ...................... 333
eispack, EISPACK .................................................. Introduction to Eigensystem computation for dense linear systems ......... 349
linpack, LINPACK .................................................. Single-precision real and complex LINPACK routines ............................ 355

Scalable LAPACK
intro_scalapack, INTRO_SCALAPACK ............ Introduction to the ScaLAPACK routines for distributed matrix
computations .............................................................................................. 359
descinit, DESCINIT ............................................. Initializes a descriptor vector of a distributed two-dimensional array ...... 362
indxg2p, INDXG2P .................................................. Computes the coordinate of the processing element (PE) that
possesses the entry of a distributed matrix ............................................... 364
numroc, NUMROC ...................................................... Computes the number of rows or columns of a distributed matrix
owned locally ............................................................................................ 365
pcheevx, PCHEEVX .................................................. Computes selected eigenvalues and eigenvectors of a Hermitian-
definite eigenproblem ................................................................................ 366
pchegvx, PCHEGVX .................................................. Computes selected eigenvalues and eigenvectors of a Hermitian-
definite generalized eigenproblem ............................................................. 374
psgebrd, PSGEBRD, PCGEBRD ............................... Reduces a real or complex distributed matrix to bidiagonal form ........... 382
psgelqf, PSGELQF, PCGELQF ............................... Computes an LQ factorization of a real or complex distributed matrix ... 387
psgeqlf, PSGEQLF, PCGEQLF ............................... Computes a QL factorization of a real or complex distributed matrix ..... 390
psgeqpf, PSGEQPF, PCGEQPF ............................... Computes a QR factorization with column pivoting of a real or
complex distributed matrix ........................................................................ 393
psgeqrf, PSGEQRF, PCGEQRF ............................... Computes a QR factorization of a real or complex distributed matrix ..... 396
psgerqf, PSGERQF, PCGERQF ............................... Computes an RQ factorization of a real or complex distributed matrix .... 399
psgesv, PSGESV, PCGESV ...................................... Computes the solution to a real or complex system of linear
equations .................................................................................................... 402
psgetrf, PSGETRF, PCGETRF ............................... Computes an LU factorization of a real or complex distributed matrix ... 405
psgetri, PSGETRI, PCGETRI ............................... Computes the inverse of a real or complex distributed matrix ................. 408
psgetrs, PSGETRS, PCGETRS ............................... Solves a real or complex distributed system of linear equations .............. 411
psposv, PSPOSV, PCPOSV ...................................... Solves a real symmetric or complex Hermitian system of linear
equations .................................................................................................... 414
pspotrf, PSPOTRF, PCPOTRF ............................... Computes the Cholesky factorization of a real symmetric or complex
Hermitian positive definite distributed matrix ........................................... 418
pspotri, PSPOTRI, PCPOTRI ............................... Computes the inverse of a real symmetric or complex Hermitian
positive definite distributed matrix ............................................................ 421
pspotrs, PSPOTRS, PCPOTRS ............................... Solves a real symmetric positive definite or complex Hermitian
positive definite system of linear equations .............................................. 424
pssyevx, PSSYEVX .................................................. Computes selected eigenvalues and eigenvectors of a real symmetric
matrix ......................................................................................................... 427
pssygvx, PSSYGVX .................................................. Computes selected eigenvalues and eigenvectors of a real
symmetric-definite generalized eigenproblem ........................................... 434
pssytrd, PSSYTRD, PCHETRD ............................... Reduces a real symmetric or complex Hermitian distributed matrix to
tridiagonal form ......................................................................................... 442
pstrtri, PSTRTRI, PCTRTRI ............................... Computes the inverse of a real or complex upper or lower triangular
distributed matrix ....................................................................................... 446
pstrtrs, PSTRTRS, PCTRTRS ............................... Solves a real or complex distributed triangular system ............................ 449

Solvers for sparse linear systems
intro_sparse, INTRO_SPARSE .......................... Introduction to solvers for sparse linear systems ...................................... 453
dfaults, DFAULTS .................................................. Assigns default values to the parameter arguments for SITRSOL(3S) .... 461
sitrsol, SITRSOL .................................................. Solves a real general sparse system, using a preconditioned conjugate
gradient-like method .................................................................................. 466
ssgetrf, SSGETRF .................................................. Factors a real sparse general matrix with threshold pivoting
implemented .............................................................................................. 482
ssgetrs, SSGETRS .................................................. Solves a real sparse general system, using the factorization computed
in SSGETRF(3S) ....................................................................................... 487
sspotrf, SSPOTRF .................................................. Factors a real sparse symmetric definite matrix ........................................ 489
sspotrs, SSPOTRS .................................................. Solves a real sparse symmetric definite system, using the factorization
computed in SSPOTRF(3S) ....................................................................... 494
sststrf, SSTSTRF .................................................. Factors a real sparse general matrix with a symmetric nonzero pattern
(no form of pivoting is implemented) ....................................................... 496
sststrs, SSTSTRS .................................................. Solves a real sparse general system with a symmetric nonzero
pattern, using the factorization computed in SSTSTRF(3S) .................... 501

Solvers for special linear systems


intro_spec, INTRO_SPEC ................................... Introduction to solvers for special linear systems ..................................... 503
folr, FOLR, FOLRP .................................................. Solves first-order linear recurrences .......................................................... 504
folr2, FOLR2, FOLR2P ........................................... Solves first-order linear recurrences without overwriting the operand
vector ......................................................................................................... 509
folrc, FOLRC ........................................................... Solves a first-order linear recurrence with a scalar multiplier .................. 511
folrn, FOLRN, FOLRNP ........................................... Solves for the last term of a first-order linear recurrence .......................... 513
recpp, RECPP, RECPS ............................................. Solves a partial product or partial summation problem ............................ 516
sdtsol, SDTSOL, CDTSOL ...................................... Solves a real-valued or complex-valued tridiagonal system with one
right-hand side ........................................................................................... 518
sdttrf, SDTTRF, CDTTRF ...................................... Factors a real-valued or complex-valued tridiagonal system .................... 520
sdttrs, SDTTRS, CDTTRS ...................................... Solves a real-valued or complex-valued tridiagonal system with one
right-hand side, using its factorization as computed by SDTTRF(3S)
or CDTTRF(3S) ........................................................................................... 523
solr, SOLR ................................................................ Solves a second-order linear recurrence .................................................... 526
solr3, SOLR3 ........................................................... Solves a second-order linear recurrence for three terms ........................... 528
solrn, SOLRN ........................................................... Solves a second-order linear recurrence for only the last term ................ 531

BLACS routines
intro_blacs, INTRO_BLACS ............................... Introduction to Basic Linear Algebra Communication Subprograms ...... 535
blacs_barrier, BLACS_BARRIER ..................... Stops execution until all specified processes have called a routine .......... 539
blacs_exit, BLACS_EXIT ................................... Frees all existing grids .............................................................................. 540
blacs_gridexit, BLACS_GRIDEXIT ................. Frees a grid ................................................................................................ 541
blacs_gridinfo, BLACS_GRIDINFO ................. Returns information about the two-dimensional processor grid ............... 542
blacs_gridinit, BLACS_GRIDINIT ................. Initializes counters, variables, and so on, for the BLACS routines .......... 543
blacs_gridmap, BLACS_GRIDMAP ..................... a grid of processors ................................................................................... 544
blacs_pcoord, BLACS_PCOORD .......................... Computes coordinates in two-dimensional grids ....................................... 545
blacs_pnum, BLACS_PNUM ................................... Returns the processor element number for specified coordinates in
two-dimensional grids ............................................................................... 546
gridinfo3d, GRIDINFO3D ................................... Returns information about the three-dimensional processor grid ............. 547
gridinit3d, GRIDINIT3D ................................... Initializes variables for a three-dimensional (3D) grid partition of
processor set .............................................................................................. 548
igamn2d, IGAMN2D, SGAMN2D, CGAMN2D ............ Determines minimum absolute values of rectangular matrices ................. 550
igamx2d, IGAMX2D, SGAMX2D, CGAMX2D ............ Determines maximum absolute values of rectangular matrices ................ 552
igebr2d, IGEBR2D, SGEBR2D, CGEBR2D ............ Receives a broadcast general rectangular matrix from all or a subset
of processors .............................................................................................. 554
igebs2d, IGEBS2D, SGEBS2D, CGEBS2D ............ Broadcasts a general rectangular matrix to all or a subset of
processors .................................................................................................. 556
igerv2d, IGERV2D, SGERV2D, CGERV2D ............ Receives a general rectangular matrix from another processor ................ 558
igesd2d, IGESD2D, SGESD2D, CGESD2D ............ Sends a general rectangular matrix to another processor .......................... 560
igsum2d, IGSUM2D, SGSUM2D, CGSUM2D ............ Performs element summation operations on rectangular matrices ............ 562
itrbr2d, ITRBR2D, STRBR2D, CTRBR2D ............ Receives a broadcast trapezoidal rectangular matrix from all or a
subset of processors ................................................................................... 564
itrbs2d, ITRBS2D, STRBS2D, CTRBS2D ............ Broadcasts a trapezoidal rectangular matrix to all or a subset of
processors .................................................................................................. 566
itrrv2d, ITRRV2D, STRRV2D, CTRRV2D ............ Receives a trapezoidal rectangular matrix from another processor .......... 568
itrsd2d, ITRSD2D, STRSD2D, CTRSD2D ............ Sends a trapezoidal rectangular matrix to another processor .................... 570
mynode, MYNODE ...................................................... Returns the calling processor’s assigned number ..................................... 572
pcoord3d, PCOORD3D ............................................. Computes three-dimensional (3D) processor grid coordinates ................ 573
pnum3d, PNUM3D ...................................................... Returns the processor element number for specified three-dimensional
(3D) coordinates ........................................................................................ 574

Out-of-core routines
intro_core, INTRO_CORE ................................... Introduction to the Cray Research Scientific Library out-of-core
routines for linear algebra ......................................................................... 575
scopy2rv, SCOPY2RV, CCOPY2RV ........................ Copies a submatrix of a real or complex matrix in memory into a
virtual matrix ............................................................................................. 590
scopy2vr, SCOPY2VR, CCOPY2VR ........................ Copies a submatrix of a virtual matrix to a real or complex (in
memory) matrix ......................................................................................... 593
vbegin, VBEGIN ...................................................... Initializes the out-of-core routine data structures ...................................... 595
vend, VEND ................................................................ Handles terminal processing for the out-of-core routines ......................... 598
vsgemm, VSGEMM, VCGEMM ...................................... Multiplies a virtual real or complex general matrix by a virtual real
or complex general matrix ........................................................................ 600
vsgetrf, VSGETRF, VCGETRF ............................... Computes an LU factorization of a virtual general matrix with real or
complex elements, using partial pivoting with row interchanges ............. 604
vsgetrs, VSGETRS, VCGETRS ............................... Solves a virtual system of linear equations, using the LU factorization
computed by VSGETRF(3S) or VCGETRF(3S) ......................................... 608
vspotrf, VSPOTRF .................................................. Computes the Cholesky factorization of a real symmetric positive
definite virtual matrix ................................................................................ 610
vspotrs, VSPOTRS .................................................. Solves a virtual system of linear equations with a symmetric positive
definite matrix whose Cholesky factorization has been computed by
VSPOTRF(3S) ............................................................................................ 612
vssyrk, VSSYRK ...................................................... Performs symmetric rank k update of a real or complex symmetric
virtual matrix ............................................................................................. 614
vstorage, VSTORAGE ............................................. Declares packed storage mode for a triangular, symmetric, or
Hermitian (complex only) virtual matrix .................................................. 616
vstrsm, VSTRSM, VCTRSM ...................................... Solves a virtual real or virtual complex triangular system of equations
with multiple right-hand sides ................................................................... 619

Machine constant functions


intro_mach, INTRO_MACH ................................... Introduction to machine constant functions .............................................. 623
r1mach, R1MACH ...................................................... Returns Cray PVP machine constants ....................................................... 624
slamch, SLAMCH ...................................................... Determines single-precision machine parameters ..................................... 626
smach, SMACH, CMACH ............................................. Returns machine epsilon, small or large normalized numbers ................. 628

Superseded routines
intro_superseded, INTRO_SUPERSEDED ....... Introduction to superseded Scientific Library routines ............................. 631
gather, GATHER ...................................................... Gathers a vector from a source vector ...................................................... 633
minv, MINV ................................................................ Solves systems of linear equations by inverting a square matrix ............. 634
mxm, MXM .................................................................... Computes matrix-times-matrix product (unit increments) ........................ 637
mxma, MXMA ................................................................ Computes matrix-times-matrix product (arbitrary increments) ................. 639
mxv, MXV .................................................................... Computes matrix-times-vector product (unit increments) ......................... 642
mxva, MXVA ................................................................ Computes matrix-times-vector product (arbitrary increments) ................. 644
scatter, SCATTER .................................................. Scatters a vector into another vector ......................................................... 646
smxpy, SMXPY ........................................................... Multiplies a column vector by a matrix and adds the result to another
column vector ............................................................................................ 647
sxmpy, SXMPY ........................................................... Multiplies a row vector by a matrix and adds the result to another
row vector .................................................................................................. 649
trid, TRID ................................................................ Solves a tridiagonal system ....................................................................... 651

INTRO_LAPACK ( 3S ) INTRO_LAPACK ( 3S )

NAME
INTRO_LAPACK – Introduction to LAPACK solvers for dense linear systems

IMPLEMENTATION
See individual man pages for implementation details

DESCRIPTION
The preferred solvers for dense linear systems are those parts of the LAPACK package included in the
current version of the Scientific Library. The LAPACK routines in the Scientific Library supersede the older
LINPACK routines (see LINPACK(3S) for more information).
LAPACK Routines
LAPACK is a public-domain library of subroutines for solving dense linear algebra problems, including the
following:
• Systems of linear equations
• Linear least squares problems
• Eigenvalue problems
• Singular value decomposition (SVD) problems
For details about which routines are supported, see LAPACK Routines Contained in the Scientific Library,
which follows.
The LAPACK package is designed to be the successor to the older LINPACK and EISPACK packages. It
uses today’s high-performance computers more efficiently than the older packages. It also extends the
functionality of these packages by including equilibration, iterative refinement, error bounds, and driver
routines for linear systems, routines for computing and reordering the Schur factorization, and condition
estimation routines for eigenvalue problems.
Performance issues are addressed by implementing the most computationally intensive algorithms with the
Level 2 and 3 Basic Linear Algebra Subprograms (BLAS). Because most of the BLAS were optimized
in single- and multiple-processor environments for UNICOS and UNICOS/mk systems, these algorithms give
near-optimal performance.
The original Fortran programs are described in the LAPACK User’s Guide by E. Anderson, Z. Bai,
C. Bischof, J. Demmel, J. Dongarra, J. Du Croz, A. Greenbaum, S. Hammarling, A. McKenney,
S. Ostrouchov, and D. Sorensen, published by the Society for Industrial and Applied Mathematics (SIAM),
Philadelphia, 1992. You can order the LAPACK User’s Guide, publication TPD–0003.
LAPACK Routines Contained in the Scientific Library
Most of the single-precision (64-bit) real and complex routines from LAPACK 2.0 are supported in the
Scientific Library. This includes driver routines and computational routines for solving linear systems, least
squares problems, and eigenvalue and singular value problems. Selected auxiliary routines for generating
and manipulating elementary orthogonal transformations are also supported.

The Scientific Library does not include the LAPACK driver routines for certain generalized eigenvalue and
singular value computations, or the divide-and-conquer routines for computing eigenvalues, which were new
in LAPACK 2.0. These routines may be added in a future release. Also, most of the auxiliary routines used
only internally by LAPACK have been renamed to avoid conflicts with user-defined subroutine names.
The LAPACK routines in the Scientific Library are described online in man pages. For example, to see a
description of the arguments to the expert driver routine for solving a general system of equations, enter the
following command:
% man sgesvx

The user interface to all LAPACK routines is exactly the same as the standard LAPACK interface, except
for the CPTSV(3L) and CPTSVX(3L) driver routines. An optional character argument was added to CPTSV
and CPTSVX to provide upward compatibility with the storage format in LINPACK’s CPTSL. Because the
argument is optional, the standard LAPACK calling sequence is also accepted.
Several enhancements were made to the public-domain LAPACK software to improve performance for
UNICOS and UNICOS/mk systems. In particular, the solve routines were redesigned to give better
performance for one or a small number of right-hand sides, and to make better use of parallelism when the
number of right-hand sides is large.
Tuning parameters for the block algorithms provided in the Scientific Library are set within the LAPACK
routine ILAENV(3L). ILAENV(3L) is an integer function subprogram that accepts information about the
problem type and dimensions, and it returns one integer parameter, such as the optimal block size, the
minimum block size for which a block algorithm should be used, or the crossover point (the problem size at
which it becomes more efficient to switch to an unblocked algorithm). The setting of tuning parameters
occurs without user intervention, but users may call ILAENV(3L) directly to discover the values that will be
used (for example, to determine how much workspace to provide).
Naming Scheme
The name of each LAPACK routine is a coded specification of its function (within the limits of standard
FORTRAN 77 six-character names).
All driver and computational routines have five- or six-character names of the form XYYZZ or XYYZZZ.
The first letter in each name, X, indicates the data type, as follows:
S REAL (single precision)
C COMPLEX
The next two letters, YY, indicate the type of matrix (or the most significant matrix). Most of these
two-letter codes apply to both real and complex matrices, but a few apply specifically to only one or the
other. The matrix types are as follows:
BD BiDiagonal
GB General Band
GE GEneral (nonsymmetric)
GG General matrices, Generalized problem
GT General Tridiagonal
HB Hermitian Band (complex only)
HE HErmitian (possibly indefinite) (complex only)
HG Hessenberg matrix, Generalized problem
HP Hermitian Packed (possibly indefinite) (complex only)
HS upper HeSsenberg
OP Orthogonal Packed (real only)
OR ORthogonal (real only)
PB Positive definite Band (symmetric or Hermitian)
PO POsitive definite (symmetric or Hermitian)
PP Positive definite Packed (symmetric or Hermitian)
PT Positive definite Tridiagonal (symmetric or Hermitian)
SB Symmetric Band (real only)
SP Symmetric Packed (possibly indefinite)
ST Symmetric Tridiagonal
SY SYmmetric (possibly indefinite)
TB Triangular Band
TG Triangular matrices, Generalized problem
TP Triangular Packed
TR TRiangular
TZ TrapeZoidal
UN UNitary (complex only)
UP Unitary Packed (complex only)
Some LAPACK auxiliary routines also have man pages on UNICOS and UNICOS/mk systems. These
routines use the special YY designation:
LA LAPACK Auxiliary routine
For example, ILAENV(3L) is the auxiliary routine that determines the block size for a particular algorithm and
problem size.
The last two or three letters, ZZ or ZZZ, indicate the computation performed. For example, SGETRF
performs a TRiangular Factorization of a Single-precision (real) GEneral matrix; CGETRF performs the
factorization of a Complex GEneral matrix.
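The naming scheme can be illustrated with a short decoder. The following Python sketch is not part of the Scientific Library: the function name decode_lapack_name and the two dictionaries are hypothetical helpers that only restate the data-type and matrix-type tables above. Because this man page does not tabulate the ZZ or ZZZ computation codes, the sketch returns that part of the name unchanged.

```python
"""Illustrative decoder for LAPACK names of the form XYYZZ or XYYZZZ."""

# Data-type prefixes (X) listed in this man page.
DATA_TYPES = {
    "S": "REAL (single precision)",
    "C": "COMPLEX",
}

# Matrix-type codes (YY) copied from the list in this man page.
MATRIX_TYPES = {
    "BD": "BiDiagonal",
    "GB": "General Band",
    "GE": "GEneral (nonsymmetric)",
    "GG": "General matrices, Generalized problem",
    "GT": "General Tridiagonal",
    "HB": "Hermitian Band (complex only)",
    "HE": "HErmitian (possibly indefinite) (complex only)",
    "HG": "Hessenberg matrix, Generalized problem",
    "HP": "Hermitian Packed (possibly indefinite) (complex only)",
    "HS": "upper HeSsenberg",
    "OP": "Orthogonal Packed (real only)",
    "OR": "ORthogonal (real only)",
    "PB": "Positive definite Band (symmetric or Hermitian)",
    "PO": "POsitive definite (symmetric or Hermitian)",
    "PP": "Positive definite Packed (symmetric or Hermitian)",
    "PT": "Positive definite Tridiagonal (symmetric or Hermitian)",
    "SB": "Symmetric Band (real only)",
    "SP": "Symmetric Packed (possibly indefinite)",
    "ST": "Symmetric Tridiagonal",
    "SY": "SYmmetric (possibly indefinite)",
    "TB": "Triangular Band",
    "TG": "Triangular matrices, Generalized problem",
    "TP": "Triangular Packed",
    "TR": "TRiangular",
    "TZ": "TrapeZoidal",
    "UN": "UNitary (complex only)",
    "UP": "Unitary Packed (complex only)",
    "LA": "LAPACK Auxiliary routine",
}


def decode_lapack_name(name):
    """Return (data type, matrix type, computation code) for a routine name.

    Raises ValueError for names that do not follow the XYYZZ/XYYZZZ form,
    such as the integer function ILAENV.
    """
    name = name.upper()
    if len(name) not in (5, 6):
        raise ValueError("expected a five- or six-character name: %r" % name)
    x, yy, zz = name[0], name[1:3], name[3:]
    if x not in DATA_TYPES:
        raise ValueError("unknown data-type letter %r in %r" % (x, name))
    if yy not in MATRIX_TYPES:
        raise ValueError("unknown matrix-type code %r in %r" % (yy, name))
    return DATA_TYPES[x], MATRIX_TYPES[yy], zz


if __name__ == "__main__":
    for routine in ("SGETRF", "CGETRF", "CHETRF"):
        print(routine, decode_lapack_name(routine))
```

For SGETRF, the decoder reports the REAL data type, the GEneral matrix type, and the computation code TRF, which matches the example given above.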

Lists of Available LAPACK Routines


The following pages contain tables of driver and computational routines from LAPACK available in the
Scientific Library. For details about the argument lists and usage of these routines, see the individual online
man pages or the LAPACK User’s Guide, publication TPD–0003.
Driver Routines
These routines are listed in alphabetical order.

Name            Purpose

CHESV           Solves a complex Hermitian indefinite system of linear equations AX = B.
CHESVX          Solves a complex Hermitian indefinite system of linear equations AX = B and provides
                an estimate of the condition number and error bounds on the solution.
CHPSV           Solves a complex Hermitian indefinite system of linear equations AX = B; A is held in
                packed storage.
CHPSVX          Solves a complex Hermitian indefinite system of linear equations AX = B (A is held in
                packed storage) and provides an estimate of the condition number and error bounds on
                the solution.
SGBSV, CGBSV    Solves a general banded system of linear equations AX = B.
SGBSVX, CGBSVX  Solves any of the following general banded systems of linear equations and provides
                an estimate of the condition number and error bounds on the solution:
                AX = B, A^T X = B, A^H X = B.
SGEES, CGEES    Computes eigenvalues, Schur form, and Schur vectors of a general matrix.
SGEESX, CGEESX  Computes eigenvalues, Schur form, Schur vectors, and condition numbers of a general
                matrix.
SGEEV, CGEEV    Computes eigenvalues and eigenvectors of a general matrix.
SGEEVX, CGEEVX  Computes eigenvalues, eigenvectors, and condition numbers of a general matrix.
SGEGS, CGEGS    Computes the generalized Schur factorization of a matrix pair (A,B).
SGEGV, CGEGV    Computes the eigenvalues and eigenvectors of a matrix pair (A,B).
SGELS, CGELS    Finds a least squares or minimum norm solution of an overdetermined or
                underdetermined linear system.
SGELSS, CGELSS  Solves a linear least squares problem using the SVD.
SGELSX, CGELSX  Computes a minimum norm solution of a linear least squares problem using a complete
                orthogonal factorization.
SGESV, CGESV    Solves a general system of linear equations AX = B.
SGESVD, CGESVD  Computes the singular value decomposition (SVD) of a general matrix.
SGESVX, CGESVX  Solves any of the following general systems of linear equations and provides an
                estimate of the condition number and error bounds on the solution:
                AX = B, A^T X = B, A^H X = B.
SGTSV, CGTSV    Solves a general tridiagonal system of linear equations AX = B.
SGTSVX, CGTSVX  Solves any of the following general tridiagonal systems of linear equations and
                provides an estimate of the condition number and error bounds on the solution:
                AX = B, A^T X = B, A^H X = B.
SPBSV, CPBSV    Solves a symmetric or Hermitian positive definite banded system of linear equations
                AX = B.
SPBSVX, CPBSVX  Solves a symmetric or Hermitian positive definite banded system of linear equations
                AX = B and provides an estimate of the condition number and error bounds on the
                solution.
SPOSV, CPOSV    Solves a symmetric or Hermitian positive definite system of linear equations AX = B.
SPOSVX, CPOSVX  Solves a symmetric or Hermitian positive definite system of linear equations AX = B
                and provides an estimate of the condition number and error bounds on the solution.
SPPSV, CPPSV    Solves a symmetric or Hermitian positive definite system of linear equations AX = B;
                A is held in packed storage.
SPPSVX, CPPSVX  Solves a symmetric or Hermitian positive definite system of linear equations AX = B
                (A is held in packed storage) and provides an estimate of the condition number and
                error bounds on the solution.
SPTSV, CPTSV    Solves a symmetric or Hermitian positive definite tridiagonal system of linear
                equations AX = B.
SPTSVX, CPTSVX  Solves a symmetric or Hermitian positive definite tridiagonal system of linear
                equations AX = B and provides an estimate of the condition number and error bounds
                on the solution.
SSBEV, CHBEV    Computes all eigenvalues and eigenvectors of a symmetric or Hermitian band matrix.
SSBEVX, CHBEVX  Computes selected eigenvalues and eigenvectors of a symmetric or Hermitian band
                matrix.
SSBGV, CHBGV    Computes all eigenvalues and eigenvectors of a generalized symmetric-definite or
                Hermitian-definite banded eigenproblem.
SSPEV, CHPEV    Computes all eigenvalues and eigenvectors of a symmetric or Hermitian packed matrix.
SSPEVX, CHPEVX  Computes selected eigenvalues and eigenvectors of a symmetric or Hermitian packed
                matrix.
SSPGV, CHPGV    Computes all eigenvalues and eigenvectors of a generalized symmetric-definite or
                Hermitian-definite packed eigenproblem.
SSPSV, CSPSV    Solves a real or complex symmetric indefinite system of linear equations AX = B;
                A is held in packed storage.
SSPSVX, CSPSVX  Solves a real or complex symmetric indefinite system of linear equations AX = B
                (A is held in packed storage) and provides an estimate of the condition number and
                error bounds on the solution.
SSTEV           Computes all eigenvalues and eigenvectors of a real symmetric tridiagonal matrix.
SSTEVX          Computes selected eigenvalues and eigenvectors of a real symmetric tridiagonal matrix.
SSYEV, CHEEV    Computes all eigenvalues and eigenvectors of a symmetric or Hermitian matrix.
SSYEVX, CHEEVX  Computes selected eigenvalues and eigenvectors of a symmetric or Hermitian matrix.
SSYGV, CHEGV    Computes all eigenvalues and eigenvectors of a generalized symmetric-definite or
                Hermitian-definite eigenproblem.
SSYSV, CSYSV    Solves a real or complex symmetric indefinite system of linear equations AX = B.
SSYSVX, CSYSVX  Solves a real or complex symmetric indefinite system of linear equations AX = B and
                provides an estimate of the condition number and error bounds on the solution.

Computational Routines
These computational routines are listed in alphabetical order, with real matrix routines and complex matrix
routines grouped together as appropriate.

Name            Purpose

CHECON          Estimates the reciprocal of the condition number of a complex Hermitian indefinite
                matrix, using the factorization computed by CHETRF.
CHERFS          Improves the computed solution to a complex Hermitian indefinite system of linear
                equations AX = B and provides error bounds for the solution.
CHETRF          Computes the factorization of a complex Hermitian indefinite matrix, using the
                diagonal pivoting method.
CHETRI          Computes the inverse of a complex Hermitian indefinite matrix, using the
                factorization computed by CHETRF.
CHETRS          Solves a complex Hermitian indefinite system of linear equations AX = B, using the
                factorization computed by CHETRF.
CHPCON          Estimates the reciprocal of the condition number of a complex Hermitian indefinite
                matrix in packed storage, using the factorization computed by CHPTRF.
CHPRFS          Improves the computed solution to a complex Hermitian indefinite system of linear
                equations AX = B (A is held in packed storage) and provides error bounds for the
                solution.
CHPTRF          Computes the factorization of a complex Hermitian indefinite matrix in packed
                storage, using the diagonal pivoting method.
CHPTRI          Computes the inverse of a complex Hermitian indefinite matrix in packed storage,
                using the factorization computed by CHPTRF.
CHPTRS          Solves a complex Hermitian indefinite system of linear equations AX = B (A is held
                in packed storage), using the factorization computed by CHPTRF.
ILAENV          Determines tuning parameters (such as the block size).
SBDSQR, CBDSQR  Computes the singular value decomposition of a general matrix reduced to bidiagonal
                form.
SGBCON, CGBCON  Estimates the reciprocal of the condition number of a general band matrix, in either
                the 1-norm or the infinity-norm, using the LU factorization computed by SGBTRF or
                CGBTRF.
SGBEQU, CGBEQU  Computes row and column scalings to equilibrate a general band matrix and reduce its
                condition number. Does not multiprocess or call any multiprocessing routines.
SGBRFS, CGBRFS  Improves the computed solution to any of the following general banded systems of
                linear equations and provides error bounds for the solution:
                AX = B, A^T X = B, A^H X = B.
SGBTRF, CGBTRF  Computes an LU factorization of a general band matrix, using partial pivoting with
                row interchanges.
SGBTRS, CGBTRS  Solves any of the following general banded systems of linear equations using the LU
                factorization computed by SGBTRF or CGBTRF:
                AX = B, A^T X = B, A^H X = B.
SGEBAK, CGEBAK  Back-transforms the eigenvectors of a matrix transformed by SGEBAL/CGEBAL.
SGEBAL, CGEBAL  Balances a general matrix A.
SGEBRD, CGEBRD  Reduces a general matrix to upper or lower bidiagonal form by an orthogonal/unitary
                transformation.
SGECON, CGECON  Estimates the reciprocal of the condition number of a general matrix, in either the
                1-norm or the infinity-norm, using the LU factorization computed by SGETRF or CGETRF.
SGEEQU, CGEEQU  Computes row and column scalings to equilibrate a general rectangular matrix and to
                reduce its condition number.
SGEHRD, CGEHRD  Reduces a general matrix to upper Hessenberg form by an orthogonal/unitary
                transformation.
SGELQF, CGELQF  Computes an LQ factorization of a general rectangular matrix.
SGEQLF, CGEQLF  Computes a QL factorization of a general rectangular matrix.
SGEQPF, CGEQPF  Computes a QR factorization with column pivoting of a general rectangular matrix.
SGEQRF Computes a QR factorization of a general rectangular matrix.


CGEQRF
SGERFS Improves the computed solution to any of the following general systems of linear equations
CGERFS and provides error bounds for the solution.
AX = B
T
A X=B
H
A X=B
SGERQF Computes an RQ factorization of a general rectangular matrix.
CGERQF
SGETRF Computes an LU factorization of a general matrix, using partial pivoting with row
CGETRF interchanges.
SGETRI Computes the inverse of a general matrix, using the LU factorization computed by SGETRF
CGETRI or CGETRF.
SGETRS Solves any of the following general systems of linear equations using the LU factorization
CGETRS computed by SGETRF or CGETRF.
AX = B
T
A X=B
H
A X=B
SGGBAK Back transform the eigenvectors of a generalized eigenvalue problem transformed by
CGGBAK SGGBAL
SGGBAL Balance a pair of general matrices (A,B)
CGGBAL
SGGHRD Reduce a pair of matrices (A,B) to generalized upper Hessenberg form
CGGHRD
SGTCON Estimates the reciprocal of the condition number of a general tridiagonal matrix, in either
CGTCON the 1-norm or the infinity-norm, using the LU factorization computed by SGTTRF or
CGTTRF.
SGTRFS Improves the computed solution to any of the following general tridiagonal systems of linear
CGTRFS equations and provides error bounds for the solution:
AX = B, A^T X = B, A^H X = B

SGTTRF Computes an LU factorization of a general tridiagonal matrix, using partial pivoting with
CGTTRF row interchanges.
SGTTRS Solves any of the following general tridiagonal systems of linear equations using the LU
CGTTRS factorization computed by SGTTRF or CGTTRF:
AX = B, A^T X = B, A^H X = B
SHGEQZ Computes the eigenvalues of a matrix pair (A,B) in generalized upper Hessenberg form using
CHGEQZ the QZ method.
SHSEIN Computes eigenvectors of an upper Hessenberg matrix by inverse iteration.
CHSEIN
SHSEQR Computes eigenvalues, Schur form, and Schur vectors of an upper Hessenberg matrix.
CHSEQR
SLAMCH Computes machine-specific constants.
SLARF Applies an elementary reflector.
CLARF
SLARFB Applies a block reflector.
CLARFB
SLARFG Generates an elementary reflector.
CLARFG
SLARFT Forms the triangular factor of a block reflector.
CLARFT
SLARGV Generates a vector of real or complex plane rotations.
CLARGV
SLARNV Generates a vector of random numbers.
CLARNV
SLARTG Generates a plane rotation.
CLARTG
SLARTV Applies a vector of real or complex plane rotations to two vectors.
CLARTV
SLASR Applies a sequence of real plane rotations to a matrix.
CLASR
SOPGTR Generates the orthogonal/unitary matrix Q from SSPTRD/CHPTRD.
CUPGTR

SOPMTR Multiplies by the orthogonal/unitary matrix Q from SSPTRD/CHPTRD.
CUPMTR
SORGBR Generates one of the orthogonal/unitary matrices Q or P^H from SGEBRD/CGEBRD.
CUNGBR
SORGHR Generates the orthogonal/unitary matrix Q from SGEHRD/CGEHRD.
CUNGHR
SORGLQ Generates all or part of the orthogonal or unitary matrix Q from an LQ factorization
CUNGLQ determined by SGELQF or CGELQF.
SORGQL Generates all or part of the orthogonal or unitary matrix Q from a QL factorization
CUNGQL determined by SGEQLF or CGEQLF.
SORGQR Generates all or part of the orthogonal or unitary matrix Q from a QR factorization
CUNGQR determined by SGEQRF or CGEQRF.
SORGRQ Generates all or part of the orthogonal or unitary matrix Q from an RQ factorization
CUNGRQ determined by SGERQF or CGERQF.
SORGTR Generates the orthogonal/unitary matrix Q from SSYTRD/CHETRD.
CUNGTR
SORMBR Multiplies by one of the orthogonal/unitary matrices Q or P from SGEBRD/CGEBRD.
CUNMBR
SORMHR Multiplies by the orthogonal/unitary matrix Q from SGEHRD/CGEHRD.
CUNMHR
SORMLQ Multiplies a general matrix by the orthogonal or unitary matrix from an LQ factorization
CUNMLQ determined by SGELQF or CGELQF.
SORMQL Multiplies a general matrix by the orthogonal or unitary matrix from a QL factorization
CUNMQL determined by SGEQLF or CGEQLF.
SORMQR Multiplies a general matrix by the orthogonal or unitary matrix from a QR factorization
CUNMQR determined by SGEQRF or CGEQRF.
SORMRQ Multiplies a general matrix by the orthogonal or unitary matrix from an RQ factorization
CUNMRQ determined by SGERQF or CGERQF.
SORMTR Multiplies by the orthogonal/unitary matrix Q from SSYTRD/CHETRD.
CUNMTR
SPBCON Estimates the reciprocal of the condition number of a symmetric or Hermitian positive
CPBCON definite band matrix, using the Cholesky factorization computed by SPBTRF or CPBTRF.
SPBEQU Computes row and column scalings to equilibrate a symmetric or Hermitian positive definite
CPBEQU band matrix and to reduce its condition number.

SPBRFS Improves the computed solution to a symmetric or Hermitian positive definite banded
CPBRFS system of linear equations AX = B and provides error bounds for the solution.
SPBSTF Compute a split Cholesky factorization of a symmetric or Hermitian positive definite band
CPBSTF matrix.
SPBTRF Computes the Cholesky factorization of a symmetric or Hermitian positive definite band
CPBTRF matrix.
SPBTRS Solves a symmetric or Hermitian positive definite banded system of linear equations AX =
CPBTRS B, using the Cholesky factorization computed by SPBTRF or CPBTRF.
SPOCON Estimates the reciprocal of the condition number of a symmetric or Hermitian positive
CPOCON definite matrix, using the Cholesky factorization computed by SPOTRF or CPOTRF.
SPOEQU Computes row and column scalings to equilibrate a symmetric or Hermitian positive definite
CPOEQU matrix and to reduce its condition number.
SPORFS Improves the computed solution to a symmetric or Hermitian positive definite system of
CPORFS linear equations AX = B and provides error bounds for the solution.
SPOTRF Computes the Cholesky factorization of a symmetric or Hermitian positive definite matrix.
CPOTRF
SPOTRI Computes the inverse of a symmetric or Hermitian positive definite matrix, using the
CPOTRI Cholesky factorization computed by SPOTRF or CPOTRF.
SPOTRS Solves a symmetric or Hermitian positive definite system of linear equations AX = B, using
CPOTRS the Cholesky factorization computed by SPOTRF or CPOTRF.
SPPCON Estimates the reciprocal of the condition number of a symmetric or Hermitian positive
CPPCON definite matrix in packed storage, using the Cholesky factorization computed by SPPTRF or
CPPTRF.
SPPEQU Computes row and column scalings to equilibrate a symmetric or Hermitian positive definite
CPPEQU matrix in packed storage and to reduce its condition number.
SPPRFS Improves the computed solution to a symmetric or Hermitian positive definite system of
CPPRFS linear equations AX = B (A is held in packed storage) and provides error bounds for the
solution.
SPPTRF Computes the Cholesky factorization of a symmetric or Hermitian positive definite matrix in
CPPTRF packed storage.
SPPTRI Computes the inverse of a symmetric or Hermitian positive definite matrix in packed
CPPTRI storage, using the Cholesky factorization computed by SPPTRF or CPPTRF.
SPPTRS Solves a symmetric or Hermitian positive definite system of linear equations AX = B (A is
CPPTRS held in packed storage) using the Cholesky factorization computed by SPPTRF or CPPTRF.

SPTCON Uses the LDL^H factorization computed by SPTTRF or CPTTRF to compute the reciprocal
CPTCON of the condition number of a symmetric or Hermitian positive definite tridiagonal matrix.
SPTEQR Compute eigenvalues and eigenvectors of a symmetric or Hermitian positive definite
CPTEQR tridiagonal matrix.
SPTRFS Improves the computed solution to a symmetric or Hermitian positive definite tridiagonal
CPTRFS system of linear equations AX = B and provides error bounds for the solution.
SPTTRF Computes the LDL^H factorization of a symmetric or Hermitian positive definite tridiagonal
CPTTRF matrix.
SPTTRS Uses the LDL^H factorization computed by SPTTRF or CPTTRF to solve a symmetric or
CPTTRS Hermitian positive definite tridiagonal system of linear equations.
SSBGST Reduce a symmetric or Hermitian definite banded generalized eigenproblem to standard
CHBGST form.
SSBTRD Reduce a symmetric or Hermitian band matrix to real symmetric tridiagonal form by an
CHBTRD orthogonal/unitary transformation.
SSPCON Estimates the reciprocal of the condition number of a real or complex symmetric indefinite
CSPCON matrix in packed storage, using the factorization computed by SSPTRF or CSPTRF.
SSPGST Reduce a symmetric or Hermitian definite generalized eigenproblem to standard form, using
CHPGST packed storage.
SSPRFS Improves the computed solution to a real or complex symmetric indefinite system of linear
CSPRFS equations AX = B (A is held in packed storage) and provides error bounds for the solution.
SSPTRD Reduces a symmetric/Hermitian packed matrix A to real symmetric tridiagonal form by an
CHPTRD orthogonal/unitary transformation.
SSPTRF Computes the factorization of a real or complex symmetric indefinite matrix in packed
CSPTRF storage, using the diagonal pivoting method.
SSPTRI Computes the inverse of a real or complex symmetric indefinite matrix in packed storage,
CSPTRI using the factorization computed by SSPTRF or CSPTRF.
SSPTRS Solves a real or complex symmetric indefinite system of linear equations AX = B (A is held
CSPTRS in packed storage) using the factorization computed by SSPTRF or CSPTRF.
SSTEBZ Compute eigenvalues of a symmetric tridiagonal matrix by bisection.
SSTEIN Compute eigenvectors of a real symmetric tridiagonal matrix by inverse iteration.
CSTEIN
SSTEQR Compute eigenvalues and eigenvectors of a real symmetric tridiagonal matrix using the
CSTEQR implicit QL or QR method.

SSTERF Compute all eigenvalues of a symmetric tridiagonal matrix using the root-free variant of the
QL or QR algorithm.
SSYCON Estimates the reciprocal of the condition number of a real or complex symmetric indefinite
CSYCON matrix, using the factorization computed by SSYTRF or CSYTRF.
SSYGST Reduce a symmetric or Hermitian definite generalized eigenproblem to standard form.
CHEGST
SSYRFS Improves the computed solution to a real or complex symmetric indefinite system of linear
CSYRFS equations AX = B and provides error bounds for the solution.
SSYTRD Reduces a symmetric/Hermitian matrix A to real symmetric tridiagonal form by an
CHETRD orthogonal/unitary transformation.
SSYTRF Computes the factorization of a real or complex symmetric indefinite matrix, using the
CSYTRF diagonal pivoting method.
SSYTRI Computes the inverse of a real or complex symmetric indefinite matrix, using the
CSYTRI factorization computed by SSYTRF or CSYTRF.
SSYTRS Solves a real or complex symmetric indefinite system of linear equations AX = B, using the
CSYTRS factorization computed by SSYTRF or CSYTRF.
STBCON Estimates the reciprocal of the condition number of a triangular band matrix, in either the
CTBCON 1-norm or the infinity-norm.
STBRFS Provides error bounds for the solution of any of the following triangular banded systems of
CTBRFS linear equations:
AX = B, A^T X = B, A^H X = B
STBTRS Solves any of the following triangular banded systems of linear equations:
CTBTRS AX = B, A^T X = B, A^H X = B
STGEVC Compute eigenvectors of a pair of matrices (A,B) in generalized Schur form.
CTGEVC
STPCON Estimates the reciprocal of the condition number of a triangular matrix in packed storage, in
CTPCON either the 1-norm or the infinity-norm.

STPRFS Provides error bounds for the solution of any of the following triangular systems of linear
CTPRFS equations, where A is held in packed storage:
AX = B, A^T X = B, A^H X = B
STPTRI Computes the inverse of a triangular matrix in packed storage.
CTPTRI
STPTRS Solves any of the following triangular systems of linear equations, where A is held in packed
CTPTRS storage:
AX = B, A^T X = B, A^H X = B
STRCON Estimates the reciprocal of the condition number of a triangular matrix, in either the 1-norm
CTRCON or the infinity-norm.
STREVC Compute eigenvectors of a real upper quasi-triangular matrix.
CTREVC Compute eigenvectors of a complex triangular matrix.
STREXC Exchange diagonal blocks in the real Schur factorization of a real matrix.
CTREXC Exchange diagonal elements in the Schur factorization of a complex matrix.
STRRFS Provides error bounds for the solution of any of the following triangular systems of linear
CTRRFS equations:
AX = B, A^T X = B, A^H X = B
STRSEN Compute condition numbers to measure the sensitivity of a cluster of eigenvalues and its
CTRSEN corresponding invariant subspace.
STRSNA Compute condition numbers for specified eigenvalues and eigenvectors of a real upper
quasi-triangular matrix.
CTRSNA Compute condition numbers for specified eigenvalues and eigenvectors of a complex upper
triangular matrix.
STRSYL Solves the Sylvester matrix equation.
CTRSYL

STRTRI Computes the inverse of a triangular matrix.
CTRTRI
STRTRS Solves any of the following triangular systems of linear equations:
CTRTRS AX = B, A^T X = B, A^H X = B
STZRQF Reduces an upper trapezoidal matrix to upper triangular form by an orthogonal/unitary
CTZRQF transformation.

SEE ALSO
LINPACK(3S) which lists the names of the LINPACK routines that are superseded by the linear system
solvers in LAPACK
LAPACK User’s Guide, CRI publication TPD–0003



EISPACK ( 3S ) EISPACK ( 3S )

NAME
EISPACK – Introduction to Eigensystem computation for dense linear systems

IMPLEMENTATION
UNICOS systems (except Cray T90 systems that support IEEE arithmetic)

DESCRIPTION
EISPACK is a package of Fortran routines for solving the eigenvalue problem and for computing and using
the singular-value decomposition.
The original Fortran versions are described in the Matrix Eigensystem Routines – EISPACK Guide, second
edition, by B. T. Smith, J. M. Boyle, J. J. Dongarra, B. S. Garbow, Y. Ikebe, V. C. Klema, and C. B. Moler,
published by Springer-Verlag, New York, 1976, Library of Congress catalog card number 76–2662. The
original Fortran versions also are documented in the Matrix Eigensystem Routines – EISPACK Guide
Extensions (Lecture Notes in Computer Science, Vol. 51) by B. S. Garbow, J. M. Boyle, J. J. Dongarra, and
C. B. Moler, published by Springer-Verlag, New York, 1977, Library of Congress catalog card number
77–2802.
Most EISPACK routines are superseded by routines from the more recent public domain package, LAPACK,
described in the LAPACK User’s Guide (see INTRO_LAPACK(3S) for a complete reference). Of particular
interest to EISPACK users who want to switch to LAPACK is Appendix D, "Converting from LINPACK
and EISPACK," of the LAPACK User’s Guide. This appendix contains a table that shows the names of the
LAPACK routines that are functionally equivalent to each EISPACK routine.
Each Scientific Library version of the EISPACK routines has the same name, algorithm, and calling
sequence as the original version. Optimization of each routine includes the following:
• Use of the Level 1 BLAS routines when applicable, and use of the Level 2 and 3 BLAS in TRED1,
TRED2, TRBAK, and REDUC.
• Removal of Fortran IF statements when the result of either branch is the same.
• Unrolling complicated Fortran DO loops to improve vectorization.
• Use of Fortran compiler directives to aid vector optimization.
These modifications increase vectorization and use optimized library routines; therefore, they reduce
execution time. Only the order of computations within a loop is changed; the modified versions produce the
same answers as the original versions, unless the problem is sensitive to small changes in the data.
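
As a small illustration of why changing only the order of the computations in a loop can still change a result, the following Python sketch adds the same three values in two different orders. It is not taken from EISPACK. It assumes IEEE double-precision arithmetic (Python floats), which differs from the Cray floating-point format on the systems this page covers, but the effect of reordering is analogous.

```python
# Three values whose exact sum is 1.0.
vals = [1.0e16, 1.0, -1.0e16]

# Original order: 1.0 is lost when it is added to 1.0e16, because the
# spacing between adjacent doubles near 1.0e16 is 2.0.
left_to_right = (vals[0] + vals[1]) + vals[2]

# Reordered: the two large terms cancel first, so 1.0 survives.
reordered = (vals[0] + vals[2]) + vals[1]

print(left_to_right)  # 0.0
print(reordered)      # 1.0
```

For well-conditioned problems the difference between orderings is at the level of rounding error; it becomes visible only when, as here, the data make the result sensitive to small perturbations.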


The following table lists the purpose, matrix or decomposition, and name of each routine.

Purpose Matrix or Decomposition Name

Forms eigenvectors by back transforming those of the Real nonsymmetric tridiagonal BAKVEC
corresponding symmetric matrix determined by FIGI
Balances matrix and isolates eigenvalues when possible Real general BALANC
Forms eigenvectors by back transforming those of the Real general BALBAK
corresponding matrices determined by BALANC
Finds the eigenvalues that lie in a specified interval by using Real symmetric tridiagonal BISECT
bisection
Forms eigenvectors by back transforming those of the Complex general CBABK2
corresponding matrices determined by CBAL
Balances matrix and isolates eigenvalues when possible Complex general CBAL
Reduces to a symmetric tridiagonal matrix Real symmetric banded BANDR
Finds those eigenvectors that correspond to ordered list of Real symmetric banded BANDV
eigenvalues by using inverse iteration
Finds some eigenvalues by using QR algorithm with shifts Real symmetric banded BQR
of origin
Finds eigenvalues and eigenvectors Complex general CG
Finds eigenvalues and eigenvectors Complex Hermitian CH
Finds eigenvectors that correspond to specified eigenvalues Complex upper Hessenberg CINVIT
by using inverse iteration
Forms eigenvectors by back transforming those of the Complex general COMBAK
corresponding matrices determined by COMHES
Reduces matrix to upper Hessenberg form by using Complex general COMHES
elementary similarity transformations
Finds eigenvalues by using modified LR method Complex upper Hessenberg COMLR
Finds eigenvalues and eigenvectors, by using modified LR Complex upper Hessenberg COMLR2
method
Finds eigenvalues by QR method Complex upper Hessenberg COMQR
Finds eigenvalues and eigenvectors by QR method Complex upper Hessenberg COMQR2
Forms eigenvectors by back transforming those of the Complex general CORTB
corresponding matrices determined by CORTH

Reduces matrix to upper Hessenberg form by using unitary Complex general CORTH
similarity transformations
Forms eigenvectors by back transforming those of the Real general ELMBAK
corresponding matrices determined by ELMHES
Reduces matrix to upper Hessenberg form by using Real general ELMHES
elementary similarity transformations
Accumulates transformations used in the reduction to upper Real general ELTRAN
Hessenberg form done by ELMHES
Reduces to symmetric tridiagonal matrix that has the same Real nonsymmetric tridiagonal FIGI
eigenvalues
Reduces to symmetric tridiagonal matrix that has the same Real nonsymmetric tridiagonal FIGI2
eigenvalues, retaining the diagonal similarity transformations
Finds eigenvalues by QR method Real upper Hessenberg HQR
Finds eigenvalues and eigenvectors by QR method Real upper Hessenberg HQR2
Finds eigenvectors given the eigenvectors of the real Complex Hermitian HTRIBK
symmetric tridiagonal matrix calculated by HTRIDI
(including eigenvectors calculated by TQL2 or IMTQL2)
Finds eigenvectors given the eigenvectors of the real Complex Hermitian (packed) HTRIB3
symmetric tridiagonal matrix calculated by HTRID3
(eigenvectors calculated by TQL2 or IMTQL2, among
others)
Reduces to real symmetric tridiagonal form by using unitary Complex Hermitian HTRIDI
similarity transformations
Reduces to real symmetric tridiagonal form by using unitary Complex Hermitian (packed) HTRID3
similarity transformations
Finds eigenvalues by using implicit QL method, and Real symmetric tridiagonal IMTQLV
associates them with their corresponding submatrix indices
Finds eigenvalues by implicit QL method Real symmetric tridiagonal IMTQL1
Finds eigenvalues and eigenvectors by implicit QL method Real symmetric tridiagonal IMTQL2
Finds eigenvectors that correspond to specified eigenvalues Real upper Hessenberg INVIT
by using inverse iteration
Determines the singular-value decomposition A = USV^T, Real rectangular MINFIT
forming U^T B rather than U by using Householder
bidiagonalization and a variant of the QR algorithm

Forms eigenvectors by back transforming those of the Real general ORTBAK
corresponding matrices determined by ORTHES
Reduces matrix to upper Hessenberg form by using Real general ORTHES
orthogonal similarity transformations
Accumulates transformations used in the reduction to upper Real general ORTRAN
Hessenberg form done by ORTHES
Reduces matrices A and B in the generalized eigenproblem Real general QZHES
(Ax = λBx) so that A is in upper Hessenberg form and B is
in upper triangular form by using orthogonal transformations
Further reduces matrices A and B as calculated by QZHES Real general QZIT
for the generalized eigenproblem (Ax = λBx), so that A is in
quasi-upper triangular form and B is still upper triangular
Produces three arrays that can be used to calculate the Real general QZVAL
eigenvalues for the generalized eigenproblem (Ax = λBx),
with A and B as calculated by QZIT
Finds the eigenvectors that correspond to a list of Real general QZVEC
eigenvalues for the generalized eigenproblem (Ax = λBx),
with A and B as calculated by QZIT
Finds the smallest or largest eigenvalues by rational QR Real symmetric tridiagonal RATQR
method with Newton corrections
Forms generalized eigenvectors by back transforming those Real general REBAK
of the corresponding matrices determined by REDUC or
REDUC2
Forms eigenvectors by back transforming those of the Real general REBAKB
corresponding matrices determined by REDUC2
Reduces the generalized eigenproblem (Ax = λBx) to a Real symmetric REDUC
standard symmetric eigenproblem by using the Cholesky
factorization of B
Reduces either of the generalized eigenproblems Real symmetric REDUC2
(ABx = λx or BAx = λx) to a standard symmetric
eigenproblem by using the Cholesky factorization of B
Finds eigenvalues and eigenvectors Real general RG
Finds generalized eigenvalues and eigenvectors Real general RGG
(Ax = λBx)

Finds eigenvalues and eigenvectors Real symmetric RS
Finds eigenvalues and eigenvectors Real symmetric banded RSB
Finds generalized eigenvalues and eigenvectors Real symmetric RSG
(Ax = λBx)
Finds generalized eigenvalues and eigenvectors Real symmetric RSGAB
(ABx = λx)
Finds generalized eigenvalues and eigenvectors Real symmetric RSGBA
(BAx = λx)
Finds eigenvalues and eigenvectors Real symmetric RSM
Finds eigenvalues and eigenvectors Real symmetric packed RSP
Finds eigenvalues and eigenvectors Real symmetric tridiagonal RST
Finds eigenvalues and eigenvectors Special real tridiagonal RT
Determines the singular-value decomposition A = USV^T by Real rectangular SVD
using Householder bidiagonalization and a variant of the QR
algorithm
Finds the eigenvectors from a set of ordered eigenvalues by Real symmetric tridiagonal TINVIT
using inverse iteration
Finds the eigenvalues by rational QL method Real symmetric tridiagonal TQLRAT
Finds the eigenvalues by the QL method Real symmetric tridiagonal TQL1
Finds the eigenvalues and eigenvectors by the QL method Real symmetric tridiagonal TQL2
Forms eigenvectors by back transforming those of the Real symmetric TRBAK
corresponding matrices determined by TRED1
Forms eigenvectors by back transforming those of the Real symmetric (packed) TRBAK3
corresponding matrices determined by TRED3
Reduces to symmetric tridiagonal matrix by using orthogonal Real symmetric TRED1
similarity transformations
Reduces to symmetric tridiagonal matrix by using and Real symmetric TRED2
accumulating orthogonal similarity transformations
Reduces to symmetric tridiagonal matrix by using orthogonal Real symmetric (packed) TRED3
similarity transformations

Finds the eigenvalues that lie between specified indices by Real symmetric tridiagonal TRIDIB
using bisection
Finds the eigenvalues that lie in a specified interval and each Real symmetric tridiagonal TSTURM
corresponding eigenvector by using bisection and inverse
iteration

SEE ALSO
LAPACK User’s Guide, CRI publication TPD–0003



LINPACK ( 3S ) LINPACK ( 3S )

NAME
LINPACK – Single-precision real and complex LINPACK routines

IMPLEMENTATION
UNICOS systems (except Cray T90 systems that support IEEE arithmetic)

DESCRIPTION
LINPACK is a public domain package of Fortran routines that solves systems of linear equations and
computes the QR, Cholesky, and singular value decompositions. The original Fortran programs are
described in the LINPACK User’s Guide by J. J. Dongarra, C. B. Moler, J. R. Bunch, and G. W. Stewart,
published by the Society for Industrial and Applied Mathematics (SIAM), Philadelphia, 1979, Library of
Congress catalog card number 78–78206.
Most LINPACK routines are superseded by routines from the more recent public domain package, LAPACK,
described in the LAPACK User’s Guide (see INTRO_LAPACK(3S) for a complete reference). Of particular
interest to LINPACK users who want to switch to LAPACK is Appendix D, "Converting from LINPACK
and EISPACK," of the LAPACK User’s Guide. This appendix contains a table that shows the names of the
LAPACK routines that are functionally equivalent to each LINPACK routine.
Each single-precision Scientific Library version of the LINPACK routines has the same name, algorithm, and
calling sequence as the original version. Optimization of each routine includes the following:
• Replacement of calls to the BLAS routines SSCAL, SCOPY, SSWAP, SAXPY, and SROT with inline
Fortran code vectorized by the Cray Research Fortran compilers. (SROTG is still called by LINPACK.)
• Removal of Fortran IF statements in which the result of either branch is the same.
• Replacement of SDOT to solve triangular systems of linear equations in SPOSL, STRSL, and SCHDD
with more vectorizable code.
These optimizations affect only the execution order of floating-point operations in DO loops. See the
LINPACK User’s Guide for further descriptions. The complex routines have been added without much
optimization.
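
The difference between a dot-product formulation and a more vectorizable formulation of a triangular solve can be sketched as follows. This Python example is illustrative only: the function names are hypothetical, and the column-oriented variant is one common alternative to a dot-product loop, assumed here because this page does not show the code that actually replaced SDOT. Both functions solve Lx = b for a lower triangular matrix L.

```python
def solve_lower_dot(L, b):
    """Row-oriented forward substitution: each x[i] needs a dot product
    of row i of L with the part of x already computed (SDOT-style)."""
    n = len(b)
    x = [0.0] * n
    for i in range(n):
        s = sum(L[i][j] * x[j] for j in range(i))
        x[i] = (b[i] - s) / L[i][i]
    return x


def solve_lower_axpy(L, b):
    """Column-oriented forward substitution: once x[j] is known, the
    remaining right-hand side is updated with a multiple of column j
    (SAXPY-style). The inner loop has no reduction."""
    n = len(b)
    x = list(b)
    for j in range(n):
        x[j] /= L[j][j]
        for i in range(j + 1, n):
            x[i] -= x[j] * L[i][j]
    return x


L = [[2.0, 0.0, 0.0],
     [1.0, 1.0, 0.0],
     [3.0, 2.0, 4.0]]
b = [2.0, 3.0, 19.0]

print(solve_lower_dot(L, b))   # [1.0, 2.0, 3.0]
print(solve_lower_axpy(L, b))  # [1.0, 2.0, 3.0]
```

The two variants perform the same floating-point operations in a different order, which is why such a change can alter results only at the level of rounding error.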
As mentioned previously, LAPACK does not completely supersede LINPACK. In the following table, an
asterisk (*) marks LINPACK routines that are not superseded in public domain LAPACK. The table lists
the name, matrix or decomposition, and purpose of each routine.

Name Matrix or Decomposition Purpose

SGECO Real general Factors and estimates condition
SGEFA Factors
SGESL Solves
SGEDI Computes determinant and inverse

CGECO Complex general Factors and estimates condition
CGEFA Factors
CGESL Solves
CGEDI Computes determinant and inverse
SGBCO Real general banded Factors and estimates condition
SGBFA Factors
SGBSL Solves
SGBDI Computes determinant
CGBCO Complex general banded Factors and estimates condition
CGBFA Factors
CGBSL Solves
CGBDI Computes determinant
SPOCO Real positive definite Factors and estimates condition
SPOFA Factors
SPOSL Solves
SPODI Computes determinant and inverse
CPOCO Complex positive definite Factors and estimates condition
CPOFA Factors
CPOSL Solves
CPODI Computes determinant and inverse
SPPCO Real positive definite packed Factors and estimates condition
SPPFA Factors
SPPSL Solves
SPPDI Computes determinant and inverse
CPPCO Complex positive definite packed Factors and estimates condition
CPPFA Factors
CPPSL Solves
CPPDI Computes determinant and inverse
SPBCO Real positive definite banded Factors and estimates condition
SPBFA Factors
SPBSL Solves
SPBDI Computes determinant
CPBCO Complex positive definite banded Factors and estimates condition
CPBFA Factors
CPBSL Solves
CPBDI Computes determinant

SSICO Real symmetric indefinite Factors and estimates condition
SSIFA Factors
SSISL Solves
SSIDI Computes inertia, determinant, and inverse
CSICO Complex symmetric Factors and estimates condition
CSIFA Factors
CSISL Solves
CSIDI Computes determinant and inverse
CHICO Complex Hermitian indefinite Factors and estimates condition
CHIFA Factors
CHISL Solves
CHIDI Computes inertia, determinant, and inverse
SSPCO Real symmetric indefinite packed Factors and estimates condition
SSPFA Factors
SSPSL Solves
SSPDI Computes inertia, determinant, and inverse
CSPCO Complex symmetric indefinite packed Factors and estimates condition
CSPFA Factors
CSPSL Solves
CSPDI Computes inertia, determinant, and inverse
CHPCO Complex Hermitian indefinite packed Factors and estimates condition
CHPFA Factors
CHPSL Solves
CHPDI Computes inertia, determinant, and inverse
STRCO Real triangular Estimates condition
STRSL Solves
STRDI Computes determinant and inverse
CTRCO Complex triangular Estimates condition
CTRSL Solves
CTRDI Computes determinant and inverse
SGTSL Real tridiagonal Solves
CGTSL Complex tridiagonal Solves
SPTSL Real positive definite tridiagonal Solves
CPTSL Complex positive definite tridiagonal Solves

SCHDC * Real Cholesky decomposition Decomposes
SCHDD * Downdates
SCHUD * Updates
SCHEX * Exchanges
CCHDC * Complex Cholesky decomposition Decomposes
CCHDD * Downdates
CCHUD * Updates
CCHEX * Exchanges
SQRDC Real Performs orthogonal factorization
SQRSL Solves
CQRDC Complex Performs orthogonal factorization
CQRSL Solves
SSVDC Real Performs singular value decomposition
CSVDC Complex Performs singular value decomposition

SEE ALSO
INTRO_LAPACK(3S) for information and references about the LAPACK routines that supersede LINPACK
LAPACK User’s Guide, CRI publication TPD–0003
Dongarra, J. J., C. B. Moler, J. R. Bunch, and G. W. Stewart, LINPACK User’s Guide. Society for
Industrial and Applied Mathematics (SIAM), Philadelphia, 1979.



INTRO_SCALAPACK ( 3S ) INTRO_SCALAPACK ( 3S )

NAME
INTRO_SCALAPACK – Introduction to the ScaLAPACK routines for distributed matrix computations

IMPLEMENTATION
UNICOS/mk systems

DESCRIPTION
The ScaLAPACK library contains routines for solving real or complex general, triangular, or positive definite
distributed systems. It also contains routines for reducing distributed matrices to condensed form and an
eigenvalue problem solver for real symmetric distributed matrices. In addition, it includes the PBLAS, a set
of routines that perform basic operations involving distributed matrices and vectors.
Individual man pages exist for all routines except the PBLAS. You can find more information on the
PBLAS on the World Wide Web at the following URL: http://www.netlib.org/.
Changes from Public Domain Version
The ScaLAPACK development team is directed by Jack Dongarra and consists of groups at UT Knoxville
and UC Berkeley. A version of the package is available in the public domain on the World Wide Web at
the following URL: http://www.netlib.org/.
In the UNICOS/mk version, the calling sequences to all ScaLAPACK routines remain unchanged.
Initialization
Some of the ScaLAPACK routines require the Basic Linear Algebra Communication Subprograms (BLACS)
to be initialized. This can be done through a call to BLACS_GRIDINIT(3S). In addition, each distributed
array that is passed as an argument to a ScaLAPACK routine requires a descriptor, which is set through a
call to DESCINIT(3S). If a call is required, it is documented on the man page for the routine.
Available Routines
The following routines are available:
Linear Solvers
PSGETRF, PCGETRF LU factorization and solution of general
PSGETRS, PCGETRS distributed systems of linear equations.
PSTRTRS, PCTRTRS
PSGESV, PCGESV
PSPOTRF, PCPOTRF Cholesky factorization and solution of real symmetric
PSPOTRS, PCPOTRS or complex Hermitian distributed systems of linear
PSPOSV, PCPOSV equations.
PSGEQRF, PCGEQRF QR, RQ, QL, LQ, and QR with column pivoting factorizations of
PSGERQF, PCGERQF general distributed matrices.
PSGEQLF, PCGEQLF
PSGELQF, PCGELQF
PSGEQPF, PCGEQPF

PSGETRI, PCGETRI Inversion of general, triangular, real symmetric
PSTRTRI, PCTRTRI positive definite or complex Hermitian positive
PSPOTRI, PCPOTRI definite distributed matrices.

Similarity/Equivalence Reduction to Condensed Form
PSSYTRD, PCHETRD Reduction of real symmetric or complex Hermitian matrices to tridiagonal
form.
PSGEBRD, PCGEBRD Reduction of general matrices to bidiagonal form.
Eigenvalue Routines
PCHEEVX Eigenvalue solver for complex Hermitian matrices.
PCHEGVX Eigenvalue solver for Hermitian definite generalized eigenproblems.
PSSYEVX Eigenvalue solver for real symmetric distributed matrices.
PSSYGVX Eigenvalue solver for real symmetric definite generalized eigenproblems.
Support Routines
INDXG2P Computes the coordinate of the processor in the two-dimensional (2D)
processor grid that owns an entry of the distributed array.
NUMROC Computes the number of local rows or columns of the distributed array owned
by a processor.
PBLAS
The following PBLAS routines are supported, but are not documented:
Level 1

PSAMAX    PCAMAX
PSASUM    PSCASUM
PSAXPY    PCAXPY
PSNRM2    PSCNRM2
PSCOPY    PCCOPY
PSDOT     PCDOTC    PCDOTU
PSSCAL    PCSCAL    PCSSCAL
PSSWAP    PCSWAP

Level 2

PSGEMV    PCGEMV
PSGER     PCGERC    PCGERU
PSSYMV    PCHEMV
PSSYR     PCHER
PSSYR2    PCHER2
PSTRMV    PCTRMV
PSTRSV    PCTRSV

Level 3

PSGEMM    PCGEMM
PSSYMM    PCSYMM    PCHEMM
PSSYRK    PCSYRK    PCHERK
PSSYR2K   PCSYR2K   PCHER2K
PSTRMM    PCTRMM
PSTRSM    PCTRSM
PSTRAN    PCTRANC   PCTRANU

SEE ALSO
BLACS_GRIDINIT(3S), DESCINIT(3S)
Choi, J., J. Dongarra, R. Pozo, and D. Walker, "ScaLAPACK: A scalable linear algebra library for distributed
memory concurrent computers," in Proceedings of the Fourth Symposium on the Frontiers of Massively
Parallel Computation, IEEE Comput. Soc. Press, 1992.

DESCINIT ( 3S ) DESCINIT ( 3S )

NAME
DESCINIT – Initializes a descriptor vector of a distributed two-dimensional array

SYNOPSIS
CALL DESCINIT (desc, m, n, mb, nb, irsrc, icsrc, icntxt, lld, info)

IMPLEMENTATION
UNICOS/mk systems

DESCRIPTION
DESCINIT associates a descriptor vector with a two-dimensional (2D) block or block-cyclically distributed
array. The vector stores information required by the parallel 2D FFT and ScaLAPACK routines to establish
the mapping between an entry in the distributed 2D array and the processor that owns it.
The DESCINIT routine accepts the following arguments.
desc Integer array of dimension 9. (output)
Array descriptor.
m Integer. (input)
Number of global rows in the distributed matrix whose descriptor is being created.
n Integer. (input)
Number of global columns in the distributed matrix whose descriptor is being created.
mb Integer. (input)
Blocking size used to distribute the rows of the distributed matrix.
nb Integer. (input)
Blocking size used to distribute the columns of the distributed matrix.
irsrc Integer. (input)
Processor row that owns the first row of the distributed matrix.
icsrc Integer. (input)
Processor column that owns the first column of the distributed matrix.
icntxt Integer. (input)
Context handle that identifies the grid of processors over which the distributed matrix is
distributed as returned by a call to BLACS_GRIDINIT(3S).
lld Integer. (input)
The leading dimension of the local array that stores the local blocks of the distributed matrix.
info Integer. (output)
info = 0: Successful exit.
info < 0: If info = -i, the ith argument had an illegal value.

SEE ALSO
BLACS_GRIDINFO(3S), BLACS_GRIDINIT(3S), BLACS_PCOORD(3S), INTRO_BLACS(3S)

INDXG2P ( 3S ) INDXG2P ( 3S )

NAME
INDXG2P – Computes the coordinate of the processing element (PE) that possesses the entry of a
distributed matrix

SYNOPSIS
my_home=INDXG2P(indxglob, nb, iproc, isrcproc, nproc)

IMPLEMENTATION
UNICOS/mk systems

DESCRIPTION
INDXG2P computes the coordinate of the processing element (PE) that possesses the entry of a distributed
matrix specified by the global index indxglob. The formula for my_home is the following:
my_home = MOD(isrcproc+(indxglob-1)/nb, nproc)
This routine accepts the following arguments:
indxglob Integer. (global input)
The global index of the element.
nb Integer. (global input)
Block size; the size of the blocks into which the distributed matrix is split.
iproc Integer. (local dummy)
Dummy argument; used to unify the calling sequence of the tool routines.
isrcproc Integer. (global input)
The coordinate of the process that possesses the first row/column of the distributed matrix.
nproc Integer. (global input)
Total number of processes over which the matrix is distributed.
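The ownership formula above can be tried out with a small model. The following Python sketch is not part of the library; it only evaluates the documented formula, assuming 1-based global indices and 0-based process coordinates. The Python function name indxg2p is used here purely for illustration.

```python
def indxg2p(indxglob, nb, iproc, isrcproc, nproc):
    """Model of my_home = MOD(isrcproc + (indxglob-1)/nb, nproc).

    iproc is accepted but unused, mirroring the dummy argument on the man page.
    Fortran integer division truncates; for the non-negative operands used
    here, Python's // operator gives the same result.
    """
    return (isrcproc + (indxglob - 1) // nb) % nproc


# Block size 2, three process rows, first block owned by process row 0.
owners = [indxg2p(i, 2, 0, 0, 3) for i in range(1, 9)]
print(owners)
```

With these sample values, consecutive pairs of global indices land on process coordinates 0, 1, 2 and then wrap around to 0 again, which is the block-cyclic pattern the descriptor encodes.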

NUMROC ( 3S ) NUMROC ( 3S )

NAME
NUMROC – Computes the number of rows or columns of a distributed matrix owned locally

SYNOPSIS
nrows_or_cols=NUMROC(n, nb, iproc, isrcproc, nprocs)

IMPLEMENTATION
UNICOS/mk systems

DESCRIPTION
NUMROC computes the number of rows or columns of a distributed matrix owned locally by the processor
indicated by iproc. If only a close upper-bound on the value is needed (for example, to determine how
much to allocate for a workspace), you can use the following formula to approximate the value returned by
this function:
nrows_or_cols ~= ((n/nb)/nprocs)*nb + nb
This routine accepts the following arguments:
n Integer. (global input)
The number of rows or columns in the distributed matrix.
nb Integer. (global input)
Block size; the size of the blocks into which the distributed matrix is split.
iproc Integer. (local input)
The coordinate of the processor with the local array row or column to be determined.
isrcproc Integer. (global input)
The coordinate of the processor that possesses the first row or column of the distributed matrix.
nprocs Integer. (global input)
The total number of processors over which the matrix is distributed.
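To see how the approximation compares with an exact count, the following Python sketch counts locally owned indices by brute force. It is a model only, not the library routine: it assumes the block-cyclic ownership rule MOD(isrcproc + (index-1)/nb, nprocs) documented for INDXG2P(3S), and the function names are hypothetical.

```python
def owner(indxglob, nb, isrcproc, nprocs):
    """Process coordinate owning 1-based global index indxglob (INDXG2P rule)."""
    return (isrcproc + (indxglob - 1) // nb) % nprocs


def numroc_by_counting(n, nb, iproc, isrcproc, nprocs):
    """Count the rows or columns held by iproc by visiting every global index."""
    return sum(1 for i in range(1, n + 1)
               if owner(i, nb, isrcproc, nprocs) == iproc)


def numroc_upper_bound(n, nb, nprocs):
    """The close upper bound ((n/nb)/nprocs)*nb + nb, using integer division."""
    return ((n // nb) // nprocs) * nb + nb


n, nb, nprocs = 10, 3, 2
counts = [numroc_by_counting(n, nb, p, 0, nprocs) for p in range(nprocs)]
print(counts, numroc_upper_bound(n, nb, nprocs))
```

The brute-force count is only practical for small n; it is used here because it follows directly from the ownership rule rather than from any assumption about how the library computes the value.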

PCHEEVX ( 3S ) PCHEEVX ( 3S )

NAME
PCHEEVX – Computes selected eigenvalues and eigenvectors of a complex Hermitian matrix

SYNOPSIS
CALL PCHEEVX (jobZ, range, uplo, n, A, iA, jA, descA, vl, vu, il, iu, abstol, m, nZ, w,
orfac, Z, iZ, jZ, descZ, work, lwork, rwork, lrwork, iwork, ifail, iclustr, gap, info)

IMPLEMENTATION
UNICOS/mk systems

DESCRIPTION
PCHEEVX computes selected eigenvalues and, optionally, eigenvectors of a complex Hermitian matrix A by
calling the recommended sequence of ScaLAPACK routines. Eigenvalues and eigenvectors can be selected by
specifying a range of values or a range of indices for the desired eigenvalues.
This routine requires square block decomposition (MB_A = NB_A, as defined in the following comments).
A descriptor vector is associated with each two-dimensional (2D) block-cyclically distributed matrix. This
vector stores the information required to establish the mapping between a matrix entry and its corresponding
process and memory location.
The following comments describe the elements of a block-cyclically distributed matrix. In these comments,
the underline character (_) should be read as "of the distributed matrix". Let A be a generic term for any 2D
block-cyclically distributed matrix. Its descriptor vector is descA and must be initialized through a call to
DESCINIT(3S).
M_A The number of rows in the distributed matrix.
N_A The number of columns in the distributed matrix.
MB_A The blocking factor used to distribute the rows of the matrix.
NB_A The blocking factor used to distribute the columns of the matrix.
RSRC_A The process row over which the first row of the matrix is distributed.
CSRC_A The process column over which the first column of the matrix is distributed.
CTXT_A The BLACS context handle, indicating the BLACS process grid A is distributed over. The
context itself is global, but the handle (the integer value) may vary.
LLD_A The leading dimension of the local array storing the local blocks of the distributed matrix A.
LLD_A ≥ MAX(1,LOCp(M_A)).
Let K be the number of rows or columns of a distributed matrix, and assume that its process grid has
dimension p-by-q. LOCp( K ) denotes the number of elements of K that a process would receive if K were
distributed over the p processes of its process column.

Similarly, LOCq( K ) denotes the number of elements of K that a process would receive if K were distributed
over the q processes of its process row.
The values of LOCp() and LOCq() may be determined via a call to the NUMROC(3S) ScaLAPACK tool
function, as in the following:
LOCp(M) = NUMROC(M, MB_A, MYROW, RSRC_A, NPROW)
LOCq(N) = NUMROC(N, NB_A, MYCOL, CSRC_A, NPCOL)
An upper bound for these quantities may be computed by:
LOCp(M) <= ceil(ceil(M/MB_A)/NPROW)*MB_A
LOCq(N) <= ceil(ceil(N/NB_A)/NPCOL)*NB_A
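As a worked illustration of LOCp(), LOCq(), and their upper bounds, the Python sketch below computes local sizes on a sample 2-by-3 process grid. It is a model under stated assumptions, not the library source: the closed-form count is derived from the block-cyclic ownership rule, the names iceil and local_count are hypothetical, and the sample sizes are arbitrary.

```python
def iceil(x, y):
    """Integer ceiling of x/y for positive y."""
    return -(-x // y)


def local_count(k, kb, coord, src, nprocs):
    """Closed-form count of the k items (block size kb) held by process coord.

    Assumption: derived from the ownership rule MOD(src + (index-1)/kb, nprocs).
    """
    dist = (nprocs + coord - src) % nprocs   # distance from the source process
    nblocks = k // kb                        # number of full blocks
    count = (nblocks // nprocs) * kb         # complete rounds of blocks
    extra = nblocks % nprocs                 # leftover full blocks
    if dist < extra:
        count += kb                          # one more full block
    elif dist == extra:
        count += k % kb                      # the trailing partial block
    return count


M, N, MB, NB, NPROW, NPCOL = 10, 10, 4, 4, 2, 3
locp = [local_count(M, MB, r, 0, NPROW) for r in range(NPROW)]
locq = [local_count(N, NB, c, 0, NPCOL) for c in range(NPCOL)]
bound_p = iceil(iceil(M, MB), NPROW) * MB
bound_q = iceil(iceil(N, NB), NPCOL) * NB
min_lld = [max(1, rows) for rows in locp]    # LLD_A >= MAX(1, LOCp(M_A))
print(locp, locq, bound_p, bound_q, min_lld)
```

The last line shows how the smallest admissible leading dimension follows from the local row count on each process row.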

The following definitions are used in the argument descriptions:
NP = number of rows local to a given process.
NQ = number of columns local to a given process.
This routine accepts the following arguments:
jobZ Character*1. (global input)
Specifies whether to compute the eigenvectors:
jobZ =’N’: Compute only eigenvalues.
jobZ =’V’: Compute eigenvalues and eigenvectors.
range Character*1. (global input)
range =’A’: All eigenvalues will be found.
range =’V’: All eigenvalues in the half-open interval (vl,vu] will be found.
range =’I’: The ilth through iuth eigenvalues will be found.
uplo Character. (global input)
Specifies whether the upper or lower triangular part of the Hermitian matrix A is stored:
uplo =’U’: Upper triangular
uplo =’L’: Lower triangular
n Integer. (global input)
The number of rows and columns of the matrix A. n ≥ 0.
A Block cyclic complex array. (local input/workspace)
Global dimension (N,N), local dimension (DESCA(DLEN_),NQ).
On entry, this array contains the Hermitian matrix A.
If uplo = ’U’, only the upper triangular part of A is used to define the elements of the Hermitian
matrix. If uplo = ’L’, only the lower triangular part of A is used to define the elements of the
Hermitian matrix.

On exit, the lower triangle (if uplo = ’L’) or the upper triangle (if uplo = ’U’) of A, including the
diagonal, is destroyed.
iA Integer. (global input)
The global row index of A, which points to the beginning of the submatrix that will be operated
on.
jA Integer. (global input)
The global column index of A, which points to the beginning of the submatrix that will be operated
on.
descA Integer array of dimension dlen_. (global input)
The array descriptor for the distributed matrix A. If descA(CTXT_ ) is incorrect, this routine
cannot guarantee correct error reporting.
vl Real. (global input)
If range=’V’, the lower bound of the interval to be searched for eigenvalues. If range =’A’ or ’I’,
it is not referenced.
vu Real. (global input)
If range =’V’, the upper bound of the interval to be searched for eigenvalues. If range =’A’ or ’I’,
it is not referenced.
il Integer. (global input)
If range =’I’, the index (from smallest to largest) of the smallest eigenvalue to be returned. il ≥ 1.
If range=’A’ or ’V’, it is not referenced.
iu Integer. (global input)
If range =’I’, the index (from smallest to largest) of the largest eigenvalue to be returned.
min(il,n) ≤ iu ≤ n. If range =’A’ or ’V’, it is not referenced.
abstol Real. (global input)
The absolute error tolerance for the eigenvalues. An approximate eigenvalue is accepted as
converged when it is determined to lie in an interval [a,b] of width less than or equal to the
following:
abstol + eps * MAX(|a|,|b|)
eps is the machine precision. If abstol is ≤ 0, eps * norm(T) is used in its place, where
norm(T) is the 1-norm of the tridiagonal matrix obtained by reducing A to tridiagonal form.
If jobZ=’V’, setting abstol to PSLAMCH(CONTEXT,’U’) yields the most orthogonal
eigenvectors.
Eigenvalues are computed most accurately when abstol is set to twice the underflow threshold,
2*PSLAMCH(’S’), not zero. If this routine returns with ((MOD(INFO,2).NE.0).OR.
(MOD(INFO/8,2).NE.0)), indicating that some eigenvalues or eigenvectors did not converge,
try setting abstol to 2*PSLAMCH(’S’).
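The acceptance test for abstol can be written out as a short model. The Python sketch below is illustrative only: the function names are hypothetical, and eps defaults to the Python double-precision epsilon as a stand-in for the machine precision that the library would obtain from PSLAMCH.

```python
import sys


def effective_abstol(abstol, norm_t, eps=sys.float_info.epsilon):
    """Tolerance actually applied: abstol if positive, else eps * norm(T)."""
    return abstol if abstol > 0 else eps * norm_t


def interval_converged(a, b, abstol, eps=sys.float_info.epsilon):
    """True when the width of [a, b] is <= abstol + eps * MAX(|a|, |b|)."""
    return (b - a) <= abstol + eps * max(abs(a), abs(b))


print(interval_converged(1.0, 1.0 + 1e-10, 1e-9))   # narrow interval
print(interval_converged(1.0, 1.1, 1e-9))           # wide interval
```

The relative term eps * MAX(|a|,|b|) means that even abstol = 0 still admits intervals whose width is at the level of rounding error for eigenvalues of that magnitude.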

m Integer. (global output)
Total number of eigenvalues found. 0 ≤ m ≤ n.
nZ Integer. (global output)
Total number of eigenvectors computed. 0 ≤ nZ ≤ m. The number of columns of Z that are filled.
If jobZ is not equal to ’V’, nZ is not referenced. If jobZ is equal to ’V’, nZ = m unless the user
supplies insufficient space and PCHEEVX is not able to detect this before beginning computation.
To get all of the eigenvectors requested, the user must supply both sufficient space to hold the
eigenvectors in Z (m ≤ descZ(N_)) and sufficient workspace to compute them. (See lwork below.)
PCHEEVX can always detect insufficient space without computation, unless range=’V’.
w Real array, dimension (n). (global output)
On normal exit, the first m entries contain the selected eigenvalues in ascending order.
orfac Real. (global input)
Specifies which eigenvectors should be reorthogonalized. Eigenvectors that correspond to
eigenvalues that are within tol = orfac*norm(A) of each other are reorthogonalized. However,
if the workspace is insufficient (see lwork), tol may be decreased until all eigenvectors to be
reorthogonalized can be stored in one process. No reorthogonalization will be done if orfac equals
zero. A default value of 10**(-3) is used if orfac is negative. orfac should be identical on all
processes.
Z Complex array. (local output)
Global dimension (n, n), local dimension (descZ(CTXT_), NQ). If jobZ = ’V’, on normal exit
the first m columns of Z contain the orthonormal eigenvectors of the matrix that correspond to the
selected eigenvalues. If an eigenvector fails to converge, then that column of Z contains the latest
approximation to the eigenvector, and the index of the eigenvector is returned in ifail. If jobZ =
’N’, Z is not referenced.
iZ Integer. (global input)
The global row index of the submatrix of the distributed matrix Z to operate on.
jZ Integer. (global input)
The global column index of the submatrix of the distributed matrix Z to operate on.
descZ Integer array of dimension 9. (input)
The array descriptor for the distributed matrix Z. descZ(CTXT_) must equal descA(CTXT_).
work Complex array, dimension (lwork). (local workspace/output)
On output, work(1) returns the workspace needed to guarantee completion, but not orthogonality of
the eigenvectors. If the input parameters are incorrect, work(1) may also be incorrect.
This will be modified in the future so if enough workspace is given to complete the request,
work(1) will return the amount of workspace needed to guarantee orthogonality. This is described
as follows:

If info ≥ 0:
   If jobZ = ’N’, work(1) equals the minimal and optimal amount of workspace.
   If jobZ = ’V’, work(1) equals the minimal amount of workspace required to guarantee
   orthogonal eigenvectors on the given input matrix with the given ortol. In version 1.0,
   work(1) equals the minimal workspace required to compute eigenvalues.
If info < 0:
   If jobZ = ’N’, work(1) equals the minimal and optimal amount of workspace.
   If jobZ = ’V’:
      If range = ’A’ or range = ’I’, work(1) equals the minimal workspace required
      to compute all eigenvectors (no guarantee on orthogonality).
      If range = ’V’, work(1) equals the minimal workspace required to compute
      N_Z = DESCZ(N_) eigenvectors (no guarantee on orthogonality). In version 1.0,
      work(1) equals the minimal workspace required to compute eigenvalues.
lwork Integer. (local input)
Size of work array. If only eigenvalues are requested, lwork ≥ N + (NP0 + MQ0 + NB) *
NB. If eigenvectors are requested, lwork ≥ N + MAX(NB*(NP0+1),3).
rwork Real array, dimension (lrwork). (local workspace/output)
lrwork Integer. (local input)
The following variable definitions are used to define lrwork:
NN = MAX( N, NB, 2 )
NEIG = number of eigenvectors requested
NB = descA( MB_ ) = descA( NB_ ) = descZ( MB_ ) = descZ( NB_ )
descA( RSRC_ ) = descA( CSRC_ ) = descZ( RSRC_ ) = descZ( CSRC_ ) = 0
NP0 = NUMROC( NN, NB, 0, 0, NPROW )
MQ0 = NUMROC( MAX( NEIG, NB, 2 ), NB, 0, 0, NPCOL )
ICEIL( X, Y ) is a ScaLAPACK function returning ceiling(X/Y)
If no eigenvectors are requested (jobZ = ’N’), lrwork ≥ 5*NN + 4*N.
If eigenvectors are requested (jobZ = ’V’), the amount of workspace required to guarantee that all
eigenvectors are computed is the following:
lrwork ≥ 4*N + MAX(5*NN, NP0*MQ0) + ICEIL(NEIG, NPROW*NPCOL)*NN
The computed eigenvectors may not be orthogonal if the minimal workspace is supplied and ortol
is too small. If you want to guarantee orthogonality (at the cost of potentially poor performance)
you should add the following to lrwork:
(CLUSTERSIZE-1)*N
CLUSTERSIZE is the number of eigenvalues in the largest cluster, where a cluster is defined as a
set of close eigenvalues:

{W(K),...,W(K+CLUSTERSIZE-1)|W(J+1)≤ W(J)+orfac*norm(A)}
If lrwork is too small to guarantee orthogonality, PCHEEVX attempts to maintain orthogonality in
the clusters with the smallest spacing between the eigenvalues. If lrwork is too small to compute
all of the eigenvectors requested, no computation is performed and info = -25 is returned. Note
that when range = ’V’, PCHEEVX does not know how many eigenvectors are requested until the
eigenvalues are computed. Therefore, when range = ’V’ and as long as lwork is large enough to
allow PCHEEVX to compute the eigenvalues, PCHEEVX will compute the eigenvalues and as many
eigenvectors as it can.
Relationship between workspace, orthogonality, and performance:
If CLUSTERSIZE ≥ N/SQRT(NPROW*NPCOL), providing enough space to compute all the
eigenvectors orthogonally will cause serious degradation in performance. In the limit (i.e.
CLUSTERSIZE = N-1), PCSTEIN will perform no better than CSTEIN on one processor. For
CLUSTERSIZE = N/SQRT(NPROW*NPCOL) reorthogonalizing all eigenvectors will increase the
total execution time by a factor of 2 or more.
For CLUSTERSIZE > N/SQRT(NPROW*NPCOL), execution time will grow as the square of the
cluster size, all other factors remaining equal and assuming enough workspace. Less workspace
means less reorthogonalization but faster execution.
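The cluster definition and the workspace trade-off described above can be made concrete with a small model. The Python sketch below is an illustration only: it assumes the eigenvalues are given in ascending order, takes tol to stand for orfac*norm(A), and uses hypothetical function names; it does not reproduce how the library detects clusters internally.

```python
import math


def largest_cluster(w, tol):
    """Size of the longest run with W(J+1) <= W(J) + tol, for ascending w."""
    if not w:
        return 0
    best = run = 1
    for prev, cur in zip(w, w[1:]):
        run = run + 1 if cur <= prev + tol else 1
        best = max(best, run)
    return best


def extra_workspace(clustersize, n):
    """Additional workspace (CLUSTERSIZE-1)*N needed to guarantee orthogonality."""
    return max(clustersize - 1, 0) * n


def degrades_performance(clustersize, n, nprow, npcol):
    """True when CLUSTERSIZE >= N/SQRT(NPROW*NPCOL)."""
    return clustersize >= n / math.sqrt(nprow * npcol)


w = [1.0, 1.0005, 1.0009, 2.0, 3.0, 3.0002]   # sample eigenvalues
size = largest_cluster(w, 1.0e-3)
print(size, extra_workspace(size, len(w)), degrades_performance(size, len(w), 2, 2))
```

In this sample the three eigenvalues near 1.0 form the largest cluster, so two extra vectors of length N would be needed, and on a 2-by-2 grid the cluster is already at the size where the text warns of serious degradation.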
iwork Integer array. (local workspace)
On return, iwork(1) contains the amount of integer workspace required. If the input parameters are
incorrect, iwork(1) may also be incorrect.
liwork Integer. (local input)
Size of iwork. liwork ≥ 6*NNP, where:
NNP = MAX(N, NPROW*NPCOL + 1, 4)

ifail Integer array, dimension (N). (global output)
If jobZ=’V’, then on normal exit, the first m elements of ifail are set to 0. If
(MOD(INFO,2).NE.0) on exit, ifail contains the indices of the eigenvectors that failed to
converge. If jobZ=’N’, ifail is not referenced.
iclustr Integer array, dimension (2*NPROW*NPCOL). (global output)
This array contains indices of eigenvectors that correspond to a cluster of eigenvalues that could
not be reorthogonalized due to insufficient workspace (see lwork, orfac, and info). Eigenvectors
that correspond to clusters of eigenvalues indexed iclustr(2*I-1) to iclustr(2*I) could not be
reorthogonalized due to lack of workspace. Hence, the eigenvectors that correspond to these
clusters may not be orthogonal. iclustr() is a 0-terminated array. (iclustr(2*K).NE.0 .AND.
iclustr(2*K+1).EQ.0) if and only if K is the number of clusters. iclustr is not referenced if
jobZ =’N’.
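Reading the 0-terminated iclustr array amounts to walking it in pairs until a zero is found. The Python sketch below models that walk; it is an illustration, not library code, and it stores the Fortran array in a 0-based Python list, so iclustr(2*I-1) and iclustr(2*I) appear at list positions 2*I-2 and 2*I-1. The function name and sample data are hypothetical.

```python
def clusters_from_iclustr(iclustr):
    """Return (first, last) eigenvector index pairs from a 0-terminated array."""
    pairs = []
    for i in range(0, len(iclustr) - 1, 2):
        if iclustr[i] == 0:          # terminator: no more clusters recorded
            break
        pairs.append((iclustr[i], iclustr[i + 1]))
    return pairs


# Sample contents for a 2-by-2 grid (dimension 2*NPROW*NPCOL = 8).
sample = [3, 5, 9, 10, 0, 0, 0, 0]
print(clusters_from_iclustr(sample))
```

For the sample array, eigenvectors 3 through 5 and 9 through 10 would be the ones whose orthogonality is not guaranteed.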

gap Real array, dimension (NPROW*NPCOL). (global output)
This array contains the gap between eigenvalues whose eigenvectors could not be reorthogonalized.
The output values in this array correspond to the clusters indicated by the iclustr array. Therefore,
the dot product between eigenvectors that correspond to the Ith cluster may be as high as
(C*n)/GAP(I), where C is a small constant.
info Integer. (global output)
info = 0 Successful exit.
info < 0 If the ith argument is an array and the jth entry had an illegal value,
info = -(i*100+j); if the ith argument is a scalar and had an illegal value, info = -i.
info > 0 If (MOD(info,2).NE.0), one or more eigenvectors failed to converge. Their indices
are stored in ifail. Send email to scalapack@cs.utk.edu.
If (MOD(info/2,2).NE.0), eigenvectors corresponding to one or more clusters of
eigenvalues could not be reorthogonalized because of insufficient workspace. The
indices of the clusters are stored in the ICLUSTR array.
If (MOD(info/4,2).NE.0), space limitations prevented PCHEEVX from computing
all of the eigenvectors between vl and vu. The number of eigenvectors computed is
returned in nZ.
If (MOD(info/8,2).NE.0), PSSTEBZ failed to compute eigenvalues. Send email
to scalapack@cs.utk.edu.
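Because a positive info packs several conditions into one integer, it helps to decode it bit by bit. The Python sketch below models the MOD tests listed above; it is an illustration only, the function and table names are hypothetical, and it assumes that a negative code of magnitude 100 or more always denotes an array entry (the routine has fewer than 100 arguments).

```python
WARNING_BITS = {
    1: "one or more eigenvectors failed to converge (see ifail)",
    2: "some clusters could not be reorthogonalized (see iclustr)",
    4: "not all eigenvectors between vl and vu were computed (see nZ)",
    8: "PSSTEBZ failed to compute eigenvalues",
}


def decode_info(info):
    """Translate an info value into a list of human-readable messages."""
    if info == 0:
        return ["successful exit"]
    if info < 0:
        code = -info
        if code >= 100:              # info = -(i*100+j): entry j of array argument i
            return ["argument %d, entry %d had an illegal value"
                    % (code // 100, code % 100)]
        return ["argument %d had an illegal value" % code]
    # info > 0: test each condition the way the man page does, MOD(info/bit, 2)
    return [text for bit, text in WARNING_BITS.items() if (info // bit) % 2 != 0]


for value in (0, 5, -25, -802):
    print(value, decode_info(value))
```

For example, info = 5 reports both a convergence failure and the space limitation, since 5 has the 1 and 4 bits set.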
Differences between PCHEEVX and CHEEVX
A, LDA -> A, IA, JA, DESCA
Z, LDZ -> Z, IZ, JZ, DESCZ
Workspace needs are larger for PCHEEVX.
lwork, orfac, iclustr, and gap parameters added.
The meaning of info is changed.
Functional differences: PCHEEVX does not promise orthogonality for eigenvectors associated with tightly
clustered eigenvalues. PCHEEVX does not reorthogonalize eigenvectors that are on different processes. The
extent of reorthogonalization is controlled by the input parameter lwork.
Current limitations:

DESCA(MB_) = DESCA(NB_)
IA = JA = 1
IZ = JZ = 1
DESCA(RSRC_) = DESCA(CSRC_) = 0
DESCZ(RSRC_) = DESCZ(CSRC_) = 0
DESCA(M_) = DESCZ(M_)
DESCA(N_) = DESCZ(N_)
DESCA(MB_) = DESCZ(MB_)
DESCA(NB_) = DESCZ(NB_)
DESCA(RSRC_) = DESCA(CSRC_)
DESCZ(RSRC_) = DESCZ(CSRC_)

SEE ALSO
BLACS_GRIDINIT(3S), DESCINIT(3S), NUMROC(3S)

PCHEGVX ( 3S ) PCHEGVX ( 3S )

NAME
PCHEGVX – Computes selected eigenvalues and eigenvectors of a Hermitian-definite generalized
eigenproblem

SYNOPSIS
CALL PCHEGVX (ibtype, jobZ, range, uplo, n, A, iA, jA, descA, B, iB, jB, descB, vl, vu,
il, iu, abstol, m, nZ, w, orfac, Z, iZ, jZ, descZ, work, lwork, rwork, lrwork, iwork, ifail,
iclustr, gap, info)

IMPLEMENTATION
UNICOS/mk systems

DESCRIPTION
PCHEGVX computes selected eigenvalues and, optionally, eigenvectors of a complex generalized
Hermitian-definite eigenproblem, of the form:
sub(A)*x = (lambda)*sub(B)*x, sub(A)*sub(B)*x = (lambda)*x
or
sub(B)*sub(A)*x = (lambda)*x
Here sub(A), denoting A(IA:IA+N-1, JA:JA+N-1), is assumed to be Hermitian, and sub(B),
denoting B(IB:IB+N-1, JB:JB+N-1), is assumed to be Hermitian positive definite.
This routine requires square block decomposition (MB_A = NB_A, as defined in the following comments).
A descriptor vector is associated with each two-dimensional (2D) block-cyclically distributed matrix. This
vector stores the information required to establish the mapping between a matrix entry and its corresponding
process and memory location.
The following comments describe the elements of a block-cyclically distributed matrix. In these comments,
the underline character (_) should be read as "of the distributed matrix". Let A be a generic term for any 2D
block-cyclically distributed matrix. Its descriptor vector is descA and must be initialized through a call to
DESCINIT(3S).
M_A The number of rows in the distributed matrix.
N_A The number of columns in the distributed matrix.
MB_A The blocking factor used to distribute the rows of the matrix.
NB_A The blocking factor used to distribute the columns of the matrix.
RSRC_A The process row over which the first row of the matrix is distributed.
CSRC_A The process column over which the first column of the matrix is distributed.

CTXT_A The BLACS context handle, indicating the BLACS process grid A is distributed over. The
context itself is global, but the handle (the integer value) may vary.
LLD_A The leading dimension of the local array storing the local blocks of the distributed matrix A.
LLD_A ≥ MAX(1,LOCp(M_A)).
Let K be the number of rows or columns of a distributed matrix, and assume that its process grid has
dimension p-by-q. LOCp( K ) denotes the number of elements of K that a process would receive if K were
distributed over the p processes of its process column.
Similarly, LOCq( K ) denotes the number of elements of K that a process would receive if K were distributed
over the q processes of its process row.
The values of LOCp() and LOCq() may be determined via a call to the NUMROC(3S) ScaLAPACK tool
function, as in the following:
LOCp(M) = NUMROC(M, MB_A, MYROW, RSRC_A, NPROW)
LOCq(N) = NUMROC(N, NB_A, MYCOL, CSRC_A, NPCOL)
An upper bound for these quantities may be computed by:
LOCp(M) <= ceil(ceil(M/MB_A)/NPROW)*MB_A
LOCq(N) <= ceil(ceil(N/NB_A)/NPCOL)*NB_A

These routines accept the following arguments:


ibtype Integer. (global input)
Specifies the problem type to be solved:
= 1: sub(A)*x = (lambda)*sub(B)*x
= 2: sub(A)*sub(B)*x = (lambda)*x
= 3: sub(B)*sub(A)*x = (lambda)*x
jobZ Character*1. (global input)
Specifies whether to compute the eigenvectors:
jobZ =’N’: Compute only eigenvalues.
jobZ =’V’: Compute eigenvalues and eigenvectors.
range Character*1. (global input)
range =’A’: All eigenvalues will be found.
range =’V’: All eigenvalues in the half-open interval (vl,vu] will be found.
range =’I’: The ilth through iuth eigenvalues will be found.
uplo Character. (global input)
Specifies whether the upper or lower triangular part of the Hermitian matrix sub(A) is stored:
uplo =’U’: Upper triangle of sub(A) is stored.
uplo =’L’: Lower triangle of sub(A) is stored.

n Integer. (global input)
The order of the matrices sub(A) and sub(B). n must be ≥ 0.
A Complex pointer into local memory. (local input/output)
Pointer into the local memory to a complex array of dimension (LLD_A, LOCq(JA+N-1)).
On entry, this array contains the local pieces of the N-by-N Hermitian distributed matrix sub(A).
If uplo = ’U’, the leading N-by-N upper triangular part of sub(A) contains the upper triangular
part of the matrix. If uplo = ’L’, the leading N-by-N lower triangular part of sub(A) contains the
lower triangular part of the matrix.
On exit, if jobZ = ’V’, then if info = 0, sub(A) contains the distributed matrix Z of eigenvectors.
The eigenvectors are normalized as follows:
if ibtype = 1 or 2, Z**H*sub( B )*Z = I
if ibtype = 3, Z**H*inv( sub( B ) )*Z = I.
If jobZ = ’N’, then on exit the upper triangle (if uplo= ’U’) or the lower triangle (if uplo= ’L’) of
sub(A), including the diagonal, is destroyed.
iA Integer. (global input)
The global row index of A, which points to the beginning of the submatrix that will be operated
on.
jA Integer. (global input)
The global column index of A, which points to the beginning of the submatrix that will be operated
on.
descA Integer array of dimension dlen_. (global input)
The array descriptor for the distributed matrix A. If descA(CTXT_ ) is incorrect, this routine
cannot guarantee correct error reporting.
B Complex pointer into local memory. (local input/output)
Pointer into the local memory to a complex array of dimension (LLD_B, LOCq(JB+N-1)).
On entry, this array contains the local pieces of the N-by-N Hermitian distributed matrix sub(B).
If uplo = ’U’, the leading N-by-N upper triangular part of sub(B) contains the upper triangular
part of the matrix. If uplo = ’L’, the leading N-by-N lower triangular part of sub(B) contains the
lower triangular part of the matrix.
On exit, if info ≤ n, the part of sub(B) containing the matrix is overwritten by the triangular
factor U or L from the Cholesky factorization sub(B) = U**H*U or sub(B) = L*L**H.
iB Integer. (global input)
The global row index of B, which points to the beginning of the submatrix that will be operated
on.
jB Integer. (global input)
The global column index of B, which points to the beginning of the submatrix that will be operated
on.

descB Integer array of dimension dlen_. (global input)
The array descriptor for the distributed matrix B. descB(CTXT_) must equal descA(CTXT_).
vl Real. (global input)
If range=’V’, the lower bound of the interval to be searched for eigenvalues. If range =’A’ or ’I’,
it is not referenced.
vu Real. (global input)
If range =’V’, the upper bound of the interval to be searched for eigenvalues. If range =’A’ or ’I’,
it is not referenced.
il Integer. (global input)
If range =’I’, the index (from smallest to largest) of the smallest eigenvalue to be returned. il ≥ 1.
If range=’A’ or ’V’, it is not referenced.
iu Integer. (global input)
If range =’I’, the index (from smallest to largest) of the largest eigenvalue to be returned.
min(il,n) ≤ iu ≤ n. If range =’A’ or ’V’, it is not referenced.
abstol Real. (global input)
The absolute error tolerance for the eigenvalues. An approximate eigenvalue is accepted as
converged when it is determined to lie in an interval [a,b] of width less than or equal to the
following:
abstol + eps * MAX(|a|,|b|)
eps is the machine precision. If abstol is ≤ 0, eps * norm(T) is used in its place, where
norm(T) is the 1-norm of the tridiagonal matrix obtained by reducing A to tridiagonal form.
If jobZ=’V’, setting abstol to PSLAMCH(CONTEXT,’U’) yields the most orthogonal
eigenvectors.
Eigenvalues are computed most accurately when abstol is set to twice the underflow threshold,
2*PSLAMCH(’S’), not zero. If this routine returns with ((MOD(INFO,2).NE.0).OR.
(MOD(INFO/8,2).NE.0)), indicating that some eigenvalues or eigenvectors did not converge,
try setting abstol to 2*PSLAMCH(’S’).
m Integer. (global output)
Total number of eigenvalues found. 0 ≤ m ≤ n.
nZ Integer. (global output)
Total number of eigenvectors computed. 0 ≤ nZ ≤ m. The number of columns of Z that are filled.
If jobZ is not equal to ’V’, nZ is not referenced. If jobZ is equal to ’V’, nZ = m unless the user
supplies insufficient space and PCHEGVX is not able to detect this before beginning computation.
To get all of the eigenvectors requested, the user must supply both sufficient space to hold the
eigenvectors in Z (m ≤ descZ(N_)) and sufficient workspace to compute them. (See lwork below.)
PCHEGVX can always detect insufficient space without computation, unless range=’V’.

w Real array, dimension (n). (global output)
On normal exit, the first m entries contain the selected eigenvalues in ascending order.
orfac Real. (global input)
Specifies which eigenvectors should be reorthogonalized. Eigenvectors that correspond to
eigenvalues that are within tol = orfac*norm(A) of each other are reorthogonalized. However,
if the workspace is insufficient (see lwork), tol may be decreased until all eigenvectors to be
reorthogonalized can be stored in one process. No reorthogonalization will be done if orfac equals
zero. A default value of 10**(-3) is used if orfac is negative. orfac should be identical on all
processes.
Z Complex array. (local output)
Global dimension (n, n), local dimension (descZ(CTXT_), NQ). If jobZ = ’V’, on normal exit
the first m columns of Z contain the orthonormal eigenvectors of the matrix that correspond to the
selected eigenvalues. If an eigenvector fails to converge, then that column of Z contains the latest
approximation to the eigenvector, and the index of the eigenvector is returned in ifail. If jobZ =
’N’, Z is not referenced.
iZ Integer. (global input)
The global row index of the submatrix of the distributed matrix Z to operate on.
jZ Integer. (global input)
The global column index of the submatrix of the distributed matrix Z to operate on.
descZ Integer array of dimension 9. (input)
The array descriptor for the distributed matrix Z. descZ(CTXT_) must equal descA(CTXT_).
work Complex array, dimension (lwork). (local workspace/output)
On output, work(1) returns the workspace needed to guarantee completion, but not orthogonality of
the eigenvectors. If the input parameters are incorrect, work(1) may also be incorrect.
If info ≥ 0:
   If jobZ = ’N’, work(1) equals the minimal and optimal amount of workspace.
   If jobZ = ’V’, work(1) equals the minimal amount of workspace required to guarantee
   orthogonal eigenvectors on the given input matrix with the given ortol. In version 1.0,
   work(1) equals the minimal workspace required to compute eigenvalues.
If info < 0:
   If jobZ = ’N’, work(1) equals the minimal and optimal amount of workspace.
   If jobZ = ’V’:
      If range = ’A’ or range = ’I’, work(1) equals the minimal workspace required
      to compute all eigenvectors (no guarantee on orthogonality).

      If range = ’V’, work(1) equals the minimal workspace required to compute
      N_Z = DESCZ(N_) eigenvectors (no guarantee on orthogonality). In version 1.0,
      work(1) equals the minimal workspace required to compute eigenvalues.
lwork Integer. (local input)
Size of work array. If only eigenvalues are requested, lwork ≥ N + (NP0 + MQ0 + NB) *
NB. If eigenvectors are requested, lwork ≥ N + MAX(NB*(NP0+1),3).
rwork Real array, dimension (lrwork). (local workspace/output)
lrwork Integer. (local input) The following variable definitions are used to define lrwork:
NN = MAX( N, NB, 2 )
NEIG = number of eigenvectors requested
NB = descA( MB_ ) = descA( NB_ ) = descZ( MB_ ) = descZ( NB_ )
descA( RSRC_ ) = descA( CSRC_ ) = descZ( RSRC_ ) = descZ( CSRC_ ) = 0
NP0 = NUMROC( NN, NB, 0, 0, NPROW )
MQ0 = NUMROC( MAX( NEIG, NB, 2 ), NB, 0, 0, NPCOL )
ICEIL( X, Y ) is a ScaLAPACK function returning ceiling(X/Y)

If no eigenvectors are requested (jobZ = ’N’), lrwork ≥ 5*NN + 4 * N


If eigenvectors are requested (jobZ = ’V’), the amount of workspace required to guarantee that all
eigenvectors are computed is the following:
lrwork ≥ 4*N + MAX(5*NN, NP0*MQ0) + ICEIL(NEIG, NPROW*NPCOL)*NN
The computed eigenvectors may not be orthogonal if the minimal workspace is supplied and ortol
is too small. If you want to guarantee orthogonality (at the cost of potentially poor performance)
you should add the following to lrwork:
(CLUSTERSIZE-1)*N
CLUSTERSIZE is the number of eigenvalues in the largest cluster, where a cluster is defined as a
set of close eigenvalues:
{W(K),...,W(K+CLUSTERSIZE-1)|W(J+1)≤ W(J)+orfac*norm(A)}
If lrwork is too small to guarantee orthogonality, PCHEGVX attempts to maintain orthogonality in
the clusters with the smallest spacing between the eigenvalues. If lrwork is too small to compute
all of the eigenvectors requested, no computation is performed and info = -25 is returned. Note
that when range = ’V’, PCHEGVX does not know how many eigenvectors are requested until the
eigenvalues are computed. Therefore, when range = ’V’ and as long as lwork is large enough to
allow PCHEGVX to compute the eigenvalues, PCHEGVX will compute the eigenvalues and as many
eigenvectors as it can.
Relationship between workspace, orthogonality, and performance:
If CLUSTERSIZE ≥ N/SQRT(NPROW*NPCOL), providing enough space to compute all the
eigenvectors orthogonally will cause serious degradation in performance. In the limit (i.e.
CLUSTERSIZE = N-1), PSSTEIN will perform no better than SSTEIN on one processor. For
CLUSTERSIZE = N/SQRT(NPROW*NPCOL) reorthogonalizing all eigenvectors will increase the
total execution time by a factor of 2 or more.
For CLUSTERSIZE > N/SQRT(NPROW*NPCOL), execution time will grow as the square of the
cluster size, all other factors remaining equal and assuming enough workspace. Less workspace
means less reorthogonalization but faster execution.
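
The lrwork bounds above can be evaluated ahead of the call. The following Python sketch is only an illustration of the arithmetic: the function names (numroc, iceil, pchegvx_lrwork) are hypothetical helpers written for this example, numroc follows the usual ScaLAPACK block-cyclic counting rule, and the optional (CLUSTERSIZE-1)*N term is added only when a cluster size greater than 1 is supplied.

```python
def numroc(n, nb, iproc, isrcproc, nprocs):
    """Count the rows or columns of an n-long, nb-blocked dimension owned by iproc."""
    mydist = (nprocs + iproc - isrcproc) % nprocs   # distance from the source process
    nblocks = n // nb                               # number of whole blocks
    count = (nblocks // nprocs) * nb                # blocks every process receives
    extra = nblocks % nprocs                        # leftover whole blocks
    if mydist < extra:
        count += nb                                 # one additional whole block
    elif mydist == extra:
        count += n % nb                             # the trailing partial block
    return count


def iceil(x, y):
    """Ceiling of x / y for positive integers."""
    return -(-x // y)


def pchegvx_lrwork(n, nb, nprow, npcol, want_vectors, neig=0, clustersize=1):
    """Evaluate the lrwork bound stated in the text (assumed helper, not a library call)."""
    nn = max(n, nb, 2)
    if not want_vectors:                            # jobZ = 'N'
        return 5 * nn + 4 * n
    np0 = numroc(nn, nb, 0, 0, nprow)
    mq0 = numroc(max(neig, nb, 2), nb, 0, 0, npcol)
    base = 4 * n + max(5 * nn, np0 * mq0) + iceil(neig, nprow * npcol) * nn
    return base + (clustersize - 1) * n             # extra space to guarantee orthogonality


if __name__ == "__main__":
    print(pchegvx_lrwork(100, 8, 2, 2, want_vectors=False))
    print(pchegvx_lrwork(100, 8, 2, 2, want_vectors=True, neig=100))
    print(pchegvx_lrwork(100, 8, 2, 2, want_vectors=True, neig=100, clustersize=3))
```

Because the extra term grows linearly with CLUSTERSIZE, a caller who cannot afford it can pass the base value and then inspect iclustr and gap on return.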
iwork Integer array. (local workspace)
On return, iwork(1) contains the amount of integer workspace required. If the input parameters are
incorrect, iwork(1) may also be incorrect.
liwork Integer. (local input)
Size of iwork. liwork ≥ 6*NNP
where NNP = MAX(N, NPROW*NPCOL+1, 4)
ifail Integer array, dimension (N). (global output)
ifail provides additional information when INFO.NE.0. If (MOD(INFO/16,2).NE.0) then
ifail(1) indicates the order of the smallest minor which is not positive definite. If
(MOD(INFO,2).NE.0) on exit, then ifail contains the indices of the eigenvectors that failed to
converge.
If neither of the above error conditions holds and jobZ=’V’, then the first m elements of ifail are set
to 0.
iclustr Integer array, dimension (2*NPROW*NPCOL). (global output)
This array contains indices of eigenvectors that correspond to a cluster of eigenvalues that could
not be reorthogonalized due to insufficient workspace (see lwork, orfac, and info). Eigenvectors
that correspond to clusters of eigenvalues indexed iclustr(2*I-1) to iclustr(2*I) could not be
reorthogonalized due to lack of workspace. Hence, the eigenvectors that correspond to these
clusters may not be orthogonal. iclustr() is a 0-terminated array. (iclustr(2*K).NE.0 .AND.
iclustr(2*K+1).EQ.0) if and only if K is the number of clusters. iclustr is not referenced if
jobZ =’N’.
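
A caller typically walks iclustr in pairs until the 0 terminator is reached. The Python sketch below shows one way to do that on a copy of the array; the function name is hypothetical, and the list is 0-based, so the pair iclustr(2*I-1), iclustr(2*I) of the text becomes elements 2*(I-1) and 2*(I-1)+1.

```python
def clusters_from_iclustr(iclustr):
    """Return (first, last) eigenvector index pairs from a 0-terminated iclustr copy."""
    clusters = []
    for k in range(0, len(iclustr) - 1, 2):
        first, last = iclustr[k], iclustr[k + 1]
        if first == 0:          # terminator: no further clusters are recorded
            break
        clusters.append((first, last))
    return clusters


if __name__ == "__main__":
    # Two clusters that could not be reorthogonalized, followed by the terminator.
    print(clusters_from_iclustr([3, 5, 9, 12, 0, 0, 0, 0]))
```

The Ith pair returned corresponds to the Ith entry of the gap array.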
gap Real array, dimension (NPROW*NPCOL). (global output)
This array contains the gap between eigenvalues whose eigenvectors could not be reorthogonalized.
The output values in this array correspond to the clusters indicated by the iclustr array. Therefore,
the dot product between eigenvectors that correspond to the Ith cluster may be as high as
(C*n)/GAP(I) where C is a small constant.
Current limitations:
DESCA(MB_) = DESCA(NB_)
IA = JA = 1
IZ = JZ = 1
DESCA(RSRC_) = DESCA(CSRC_) = 0
DESCA(M_) = DESCB(M_) = DESCZ(M_)
DESCA(N_) = DESCB(N_) = DESCZ(N_)
DESCA(MB_) = DESCB(MB_) = DESCZ(MB_)
DESCA(NB_) = DESCB(NB_) = DESCZ(NB_)
DESCA(RSRC_) = DESCB(RSRC_) = DESCZ(RSRC_)
DESCA(CSRC_) = DESCB(CSRC_) = DESCZ(CSRC_)

info Integer. (global output)
info = 0 Successful exit.
info < 0 If the ith argument is an array and the j-entry had an illegal value,
info = -(i*100+j); if the ith argument is a scalar and had an illegal value, info = -i.
info > 0 If (MOD(info,2).NE.0), one or more eigenvectors failed to converge. Their indices
are stored in ifail. Send email to scalapack@cs.utk.edu.
If (MOD(info/2,2).NE.0), eigenvectors corresponding to one or more clusters of
eigenvalues could not be reorthogonalized because of insufficient workspace. The
indices of the clusters are stored in the iclustr array.
If (MOD(info/4,2).NE.0), space limitations prevented PCHEGVX from computing
all of the eigenvectors between vl and vu. The number of eigenvectors computed is
returned in nZ.
If (MOD(info/8,2).NE.0), PSSTEBZ failed to compute eigenvalues. Send email
to scalapack@cs.utk.edu.
If (MOD(info/16,2).NE.0), B was not positive definite. ifail(1) indicates the order
of the smallest minor which is not positive definite.
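
Because a positive info packs several conditions into one integer, it is convenient to unpack it with the same MOD tests the text uses. The Python sketch below is an illustration only: the function name and message strings are hypothetical, and the negative branch assumes every argument position is below 100, so that -(i*100+j) and -i cannot collide.

```python
def decode_pchegvx_info(info):
    """Translate a PCHEGVX-style info value into a list of readable messages."""
    if info == 0:
        return ["successful exit"]
    if info < 0:
        code = -info
        if code >= 100:                      # array argument: info = -(i*100+j)
            i, j = divmod(code, 100)
            return [f"entry {j} of array argument {i} had an illegal value"]
        return [f"scalar argument {code} had an illegal value"]
    flags = [
        (1, "one or more eigenvectors failed to converge (see ifail)"),
        (2, "some clusters could not be reorthogonalized (see iclustr)"),
        (4, "not all eigenvectors between vl and vu were computed"),
        (8, "eigenvalue computation failed"),
        (16, "B was not positive definite"),
    ]
    # MOD(info/bit, 2) .NE. 0 in the notation of the text.
    return [text for bit, text in flags if (info // bit) % 2 != 0]


if __name__ == "__main__":
    for value in (0, -3, -612, 5, 16):
        print(value, decode_pchegvx_info(value))
```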

SEE ALSO
BLACS_GRIDINIT(3S), DESCINIT(3S), NUMROC(3S)

PSGEBRD ( 3S ) PSGEBRD ( 3S )

NAME
PSGEBRD, PCGEBRD – Reduces a real or complex distributed matrix to bidiagonal form

SYNOPSIS
CALL PSGEBRD (m, n, A, iA, jA, descA, D, E, tauQ, tauP, work, lwork, info)
CALL PCGEBRD (m, n, A, iA, jA, descA, D, E, tauQ, tauP, work, lwork, info)

IMPLEMENTATION
UNICOS/mk systems

DESCRIPTION
PSGEBRD and PCGEBRD reduce a real or complex general m-by-n distributed matrix of the following form:
sub(A)=A(iA:iA+m-1,jA:jA+n-1)
to upper or lower bidiagonal form B by the following orthogonal transformation:
Q’ * sub(A) * P = B
If m ≥ n, B is upper bidiagonal; if m < n, B is lower bidiagonal.
These routines require square block decomposition (MB_A = NB_A, as defined in the following comments).
A description vector is associated with each two-dimensional (2D) block-cyclicly distributed matrix. This
vector stores the information required to establish the mapping between a matrix entry and its corresponding
process and memory location.
The following comments describe the elements of a block-cyclicly distributed matrix. In these comments,
the underline character (_) should be read as "of the distributed matrix". Let A be a generic term for any 2D
block cyclicly distributed matrix. Its description vector is descA and must be initialized through a call to
DESCINIT(3S).
M_A The number of rows in the distributed matrix.
N_A The number of columns in the distributed matrix.
MB_A The blocking factor used to distribute the rows of the matrix.
NB_A The blocking factor used to distribute the columns of the matrix.
RSRC_A The process row over which the first row of the matrix is distributed.
CSRC_A The process column over which the first column of the matrix is distributed.
CTXT_A The BLACS context handle, indicating the BLACS process grid A is distributed over. The
context itself is global, but the handle (the integer value) may vary.
LLD_A The leading dimension of the local array storing the local blocks of the distributed matrix A.
LLD_A ≥ MAX(1,LOCp(M_A)).

Let K be the number of rows or columns of a distributed matrix, and assume that its process grid has
dimension p-by-q. LOCp( K ) denotes the number of elements of K that a process would receive if K were
distributed over the p processes of its process column.
Similarly, LOCq( K ) denotes the number of elements of K that a process would receive if K were distributed
over the q processes of its process row.
The values of LOCp() and LOCq() may be determined via a call to the NUMROC(3S) ScaLAPACK tool
function, as in the following:
LOCp(M) = NUMROC( M, MB_A, MYROW, RSRC_A, NPROW )
LOCq(N) = NUMROC( N, NB_A, MYCOL, CSRC_A, NPCOL )
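
To make the meaning of LOCp() and LOCq() concrete, the following Python sketch reproduces the block-cyclic counting rule that NUMROC applies. It is a stand-alone illustration, not the library routine; the argument order mirrors the calls above, and process indices start at 0.

```python
def numroc(n, nb, iproc, isrcproc, nprocs):
    """Number of elements of an n-long dimension, blocked by nb, held by process iproc."""
    mydist = (nprocs + iproc - isrcproc) % nprocs   # distance from the process owning block 1
    nblocks = n // nb                               # whole blocks in the dimension
    count = (nblocks // nprocs) * nb                # whole blocks shared out evenly
    extra = nblocks % nprocs                        # whole blocks left over
    if mydist < extra:
        count += nb                                 # this process gets one more whole block
    elif mydist == extra:
        count += n % nb                             # this process gets the partial block
    return count


if __name__ == "__main__":
    # 9 rows in blocks of 2 over 3 process rows, first block on process row 0.
    local_rows = [numroc(9, 2, p, 0, 3) for p in range(3)]
    print(local_rows, sum(local_rows))
```

Summing the result over all processes of a grid row or column always returns the global dimension, which is a quick consistency check on a descriptor.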

These routines accept the following arguments. For PCGEBRD, the following real arguments must be
complex:
m Integer. (global input)
The number of rows to be operated on (the number of rows of the distributed submatrix sub(A)). m must be ≥ 0.
n Integer. (global input)
The number of columns to be operated on (the number of columns of the distributed submatrix
sub(A)). n must be ≥ 0.
A Real pointer into the local memory to an array of dimension (LLD_A, LOCq(jA+n-1)). (local
input/local output)
On entry, this array contains the local pieces of the general distributed matrix sub(A).
On exit, if m ≥ n, the diagonal and the first superdiagonal of sub(A) are overwritten with the upper
bidiagonal matrix B; the elements below the diagonal, with the array tauQ, represent the orthogonal
matrix Q as a product of elementary reflectors, and the elements above the first superdiagonal, with
the array tauP, represent the orthogonal matrix P as a product of elementary reflectors.
If m < n, the diagonal and the first subdiagonal are overwritten with the lower bidiagonal matrix B;
the elements below the first subdiagonal, with the array tauQ, represent the orthogonal matrix Q as
a product of elementary reflectors, and the elements above the diagonal, with the array tauP,
represent the orthogonal matrix P as a product of elementary reflectors. See the Further Details
subsection for more information.
iA Integer. (global input)
The global row index of A, which points to the beginning of the submatrix that will be operated on.
jA Integer. (global input)
The global column index of A which points to the beginning of the submatrix that will be operated
on.
descA Integer array of dimension 9. (input)
The array descriptor for the distributed matrix A.

D Real array. (local output)
If m ≥ n, the array dimension is LOCq(jA+MIN(m,n)-1). Otherwise, the dimension is
LOCp(iA+MIN(m,n)-1).
The distributed diagonal elements of the bidiagonal matrix B: D(i) = A(i,i). D is tied to the
distributed matrix A.
E Real array. (local output)
If m ≥ n, the array dimension is LOCp(iA+MIN(m,n)-1). Otherwise, the dimension is
LOCq(jA+MIN(m,n)-2).
The distributed off-diagonal elements of the bidiagonal distributed matrix B:
if m ≥ n, E(i) = A(i,i+1) for i = 1,2,...,n-1
if m < n, E(i) = A(i+1,i) for i = 1,2,...,m-1
E is tied to the distributed matrix A.
tauQ Real array, dimension LOCq(jA+MIN(m,n)-1). (local output)
This array contains the scalar factors of the elementary reflectors which represent the orthogonal
matrix Q. tauQ is tied to the distributed matrix A.
tauP Real array, dimension LOCp(iA+MIN(m,n)-1). (local output)
This array contains the scalar factors of the elementary reflectors which represent the orthogonal
matrix P. tauP is tied to the distributed matrix A.
work Real array, dimension (lwork). (local workspace)
On exit, work(1) returns the minimal and optimal lwork.
lwork Integer. (local input)
lwork ≥ NB*(MpA0+NqA0+1)+NqA0
where

IROFF = MOD( IA-1, NB ), ICOFF = MOD( JA-1, NB )
IAROW = INDXG2P( IA, NB, MYROW, RSRC_A, NPROW )
IACOL = INDXG2P( JA, NB, MYCOL, CSRC_A, NPCOL )
MpA0 = NUMROC( M+IROFF, NB, MYROW, IAROW, NPROW )
NqA0 = NUMROC( N+ICOFF, NB, MYCOL, IACOL, NPCOL )

and NUMROC(3S) and INDXG2P(3S) are ScaLAPACK tool functions; MYROW, MYCOL, NPROW, and
NPCOL can be determined by calling the BLACS_GRIDINFO(3S) subroutine.
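
The lwork bound for PSGEBRD can be worked out per process from the definitions above. The Python sketch below is a hedged illustration: numroc and indxg2p are local re-implementations of the counting and ownership rules used by the ScaLAPACK tool functions of the same names, psgebrd_lwork is a hypothetical helper, and NB stands for the common value MB_A = NB_A.

```python
def numroc(n, nb, iproc, isrcproc, nprocs):
    """Elements of an n-long, nb-blocked dimension held by process iproc."""
    mydist = (nprocs + iproc - isrcproc) % nprocs
    nblocks = n // nb
    count = (nblocks // nprocs) * nb
    extra = nblocks % nprocs
    if mydist < extra:
        count += nb
    elif mydist == extra:
        count += n % nb
    return count


def indxg2p(indxglob, nb, iproc, isrcproc, nprocs):
    """Process coordinate that owns global index indxglob (1-based); iproc is unused."""
    return (isrcproc + (indxglob - 1) // nb) % nprocs


def psgebrd_lwork(m, n, ia, ja, nb, myrow, mycol, rsrc, csrc, nprow, npcol):
    """Evaluate NB*(MpA0+NqA0+1)+NqA0 for one process of the grid."""
    iroff = (ia - 1) % nb
    icoff = (ja - 1) % nb
    iarow = indxg2p(ia, nb, myrow, rsrc, nprow)
    iacol = indxg2p(ja, nb, mycol, csrc, npcol)
    mpa0 = numroc(m + iroff, nb, myrow, iarow, nprow)
    nqa0 = numroc(n + icoff, nb, mycol, iacol, npcol)
    return nb * (mpa0 + nqa0 + 1) + nqa0


if __name__ == "__main__":
    # 100-by-80 matrix, 8-by-8 blocks, 2-by-2 grid, sub(A) starting at (1,1).
    for myrow in range(2):
        for mycol in range(2):
            print((myrow, mycol), psgebrd_lwork(100, 80, 1, 1, 8, myrow, mycol, 0, 0, 2, 2))
```

The value differs between processes, so each process should size its own work array, or all processes can use the maximum over the grid.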
info Integer. (global output)
info = 0 Successful exit.
info < 0 If the ith argument is an array and the j-entry had an illegal value, info = -(i*100+j);
if the ith argument is a scalar and had an illegal value, info = -i.

Alignment Requirements
The distributed submatrix sub(A) must satisfy the following alignment condition:
( MB_A.EQ.NB_A .AND. IROFFA.EQ.ICOFFA )
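
The alignment condition can be checked before the call. The short Python sketch below assumes the usual definitions IROFFA = MOD(iA-1, MB_A) and ICOFFA = MOD(jA-1, NB_A); the function name is hypothetical.

```python
def psgebrd_aligned(ia, ja, mb_a, nb_a):
    """True when sub(A) starts at the same offset inside its row block and column block."""
    iroffa = (ia - 1) % mb_a    # row offset of sub(A) within its block
    icoffa = (ja - 1) % nb_a    # column offset of sub(A) within its block
    return mb_a == nb_a and iroffa == icoffa


if __name__ == "__main__":
    print(psgebrd_aligned(1, 1, 8, 8))    # whole matrix, square blocks
    print(psgebrd_aligned(3, 11, 8, 8))   # offsets 2 and 2
    print(psgebrd_aligned(3, 4, 8, 8))    # offsets 2 and 3
```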

Further Details
The matrices Q and P are represented as products of elementary reflectors. If m ≥ n,
Q = H(1) H(2) ... H(n) and P = G(1) G(2) ... G(n-1)

Each H(i) and G(i) has the following form:

H(i) = I - tauQ * v * v’ and G(i) = I - tauP * u * u’

where tauQ and tauP are real scalars, and v and u are real vectors; v(1:i-1)=0, v(i)=1, and
v(i+1:m) is stored on exit in A(iA+i:iA+m-1,jA+i-1); u(1:i)=0, u(i+1)=1, and u(i+2:n) is
stored on exit in A(iA+i-1,jA+i+1:jA+n-1); tauQ is stored in tauQ(jA+i-1), and tauP is stored
in tauP(iA+i-1).
If m < n,
Q = H(1) H(2) ... H(m-1) and P = G(1) G(2) ... G(m)

Each H(i) and G(i) has the following form:

H(i) = I - tauQ * v * v’ and G(i) = I - tauP * u * u’

where tauQ and tauP are real scalars, and v and u are real vectors; v(1:i)=0, v(i+1)=1, and
v(i+2:m) is stored on exit in A(iA+i+1:iA+m-1,jA+i-1); u(1:i-1)=0, u(i)=1, and u(i+1:n)
is stored on exit in A(iA+i-1,jA+i:jA+n-1); tauQ is stored in tauQ(jA+i-1) and tauP is stored
in tauP(iA+i-1).
The following examples illustrate the contents of sub(A) on exit:
m = 6 and n = 5 (m > n):        m = 5 and n = 6 (m < n):
( d  e  u1 u1 u1 )              ( d  u1 u1 u1 u1 u1 )
( v1 d  e  u2 u2 )              ( e  d  u2 u2 u2 u2 )
( v1 v2 d  e  u3 )              ( v1 e  d  u3 u3 u3 )
( v1 v2 v3 d  e  )              ( v1 v2 e  d  u4 u4 )
( v1 v2 v3 v4 d  )              ( v1 v2 v3 e  d  u5 )
( v1 v2 v3 v4 v5 )

where d and e denote diagonal and off-diagonal elements of B, vi denotes an element of the vector defining
H(i), and ui an element of the vector defining G(i).
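
As a small numerical illustration of the reflector form H = I - tau * v * v’, the Python sketch below builds one real reflector and checks that it is orthogonal. It assumes the real case with tau = 2/(v’v), which is the value that makes H orthogonal; the routines compute their own tau and v, and the helper names here are hypothetical.

```python
def reflector(tau, v):
    """Return H = I - tau * v * v^T as a list of rows."""
    n = len(v)
    return [[(1.0 if r == c else 0.0) - tau * v[r] * v[c] for c in range(n)]
            for r in range(n)]


def matmul(a, b):
    """Plain dense matrix product of two lists of rows."""
    return [[sum(a[r][k] * b[k][c] for k in range(len(b))) for c in range(len(b[0]))]
            for r in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


v = [1.0, 0.5, -0.25]                  # leading entry 1, as in the stored vectors
tau = 2.0 / sum(x * x for x in v)      # assumption: real reflector, tau = 2 / (v^T v)
h = reflector(tau, v)
hth = matmul(transpose(h), h)          # should be the identity

if __name__ == "__main__":
    for row in hth:
        print([round(x, 12) for x in row])
```

Because only v and tau are needed to rebuild H, the routine can store v in the part of sub(A) that the bidiagonal matrix B leaves free.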

NOTES
BLACS_GRIDINIT(3S) must be called to initialize the virtual BLACS grid.

SEE ALSO
BLACS_GRIDINFO(3S), BLACS_GRIDINIT(3S), DESCINIT(3S), INDXG2P(3S), NUMROC(3S)

PSGELQF ( 3S ) PSGELQF ( 3S )

NAME
PSGELQF, PCGELQF – Computes an LQ factorization of a real or complex distributed matrix

SYNOPSIS
CALL PSGELQF (m, n, A, iA, jA, descA, tau, work, lwork, info)
CALL PCGELQF (m, n, A, iA, jA, descA, tau, work, lwork, info)

IMPLEMENTATION
UNICOS/mk systems

DESCRIPTION
PSGELQF and PCGELQF compute an LQ factorization of a real or complex distributed m-by-n matrix:
sub(A)=A(iA:iA+m-1,jA:jA+n-1)=L*Q
These routines require square block decomposition (MB_A = NB_A, as defined in the following comments).
A description vector is associated with each two-dimensional (2D) block-cyclicly distributed matrix. This
vector stores the information required to establish the mapping between a matrix entry and its corresponding
process and memory location.
The following comments describe the elements of a block-cyclicly distributed matrix. In these comments,
the underline character (_) should be read as "of the distributed matrix". Let A be a generic term for any 2D
block cyclicly distributed matrix. Its description vector is descA and must be initialized through a call to
DESCINIT(3S).
M_A The number of rows in the distributed matrix.
N_A The number of columns in the distributed matrix.
MB_A The blocking factor used to distribute the rows of the matrix.
NB_A The blocking factor used to distribute the columns of the matrix.
RSRC_A The process row over which the first row of the matrix is distributed.
CSRC_A The process column over which the first column of the matrix is distributed.
CTXT_A The BLACS context handle, indicating the BLACS process grid A is distributed over. The
context itself is global, but the handle (the integer value) may vary.
LLD_A The leading dimension of the local array storing the local blocks of the distributed matrix A.
LLD_A ≥ MAX(1,LOCp(M_A)).
Let K be the number of rows or columns of a distributed matrix, and assume that its process grid has
dimension p-by-q. LOCp( K ) denotes the number of elements of K that a process would receive if K were
distributed over the p processes of its process column.

Similarly, LOCq( K ) denotes the number of elements of K that a process would receive if K were distributed
over the q processes of its process row.
The values of LOCp() and LOCq() may be determined via a call to the NUMROC(3S) ScaLAPACK tool
function, as in the following:
LOCp( M ) = NUMROC( M, MB_A, MYROW, RSRC_A, NPROW )
LOCq( N ) = NUMROC( N, NB_A, MYCOL, CSRC_A, NPCOL )

These routines accept the following arguments. For PCGELQF, the following real arguments must be complex:
m Integer. (global input)
The number of rows to be operated on; that is, the number of rows of the distributed submatrix
sub(A). m must be ≥ 0.
n Integer. (global input)
The number of columns to be operated on; that is, the number of columns of the distributed
submatrix sub(A). n must be ≥ 0.
A Real pointer into the local memory to an array of dimension (LLD_A, LOCq(jA+n-1)). (local
input/local output)
On entry, the local pieces of the m-by-n distributed matrix sub(A) to be factored.
On exit, the elements on and below the diagonal of sub(A) contain the (m-by-MIN(m,n)) lower
trapezoidal matrix L (L is lower triangular if m ≤ n); the elements above the diagonal, with the array
tau, represent the orthogonal matrix Q as a product of elementary reflectors. See the Further Details
subsection for more information.
iA Integer. (global input)
The global row index of A, which points to the beginning of the submatrix that will be operated on.
jA Integer. (global input)
The global column index of A which points to the beginning of the submatrix that will be operated
on.
descA Integer array of dimension 9. (input)
The array descriptor for the distributed matrix A.
tau Real array, dimension LOCp(iA+MIN(m,n)-1). (local output)
This array contains the scalar factors tau of the elementary reflectors. tau is tied to the distributed
matrix A.
work Real array, dimension (lwork). (local workspace)
On exit, work(1) returns the minimal and optimal lwork.
lwork Integer. (local input)
The dimension of the array work.
lwork ≥ MB_A * (Mp0 + Nq0 + MB_A)
where

IROFF = MOD( IA-1, MB_A ), ICOFF = MOD( JA-1, NB_A )
IAROW = INDXG2P( IA, MB_A, MYROW, RSRC_A, NPROW )
IACOL = INDXG2P( JA, NB_A, MYCOL, CSRC_A, NPCOL )
Mp0 = NUMROC( M+IROFF, MB_A, MYROW, IAROW, NPROW )
Nq0 = NUMROC( N+ICOFF, NB_A, MYCOL, IACOL, NPCOL )

and NUMROC(3S) and INDXG2P(3S) are ScaLAPACK tool functions; MYROW, MYCOL, NPROW, and
NPCOL can be determined by calling the BLACS_GRIDINFO(3S) subroutine.
info Integer. (global output)
info = 0 Successful exit.
info < 0 If the ith argument is an array and the j-entry had an illegal value, info = -(i*100+j);
if the ith argument is a scalar and had an illegal value, info = -i.
Further Details
The matrix Q is represented as a product of elementary reflectors:
Q = H(iA+k-1) H(iA+k-2) ... H(iA)

where k=MIN(m,n).
Each H(i) has the following form:
H(i) = I - tau * v * v’

where tau is a real scalar, and v is a real vector with v(1:i-1)=0 and v(i)=1; v(i+1:n) is stored on
exit in A(iA+i-1,jA+i:jA+n-1) and tau is stored in tau(iA+i-1).

NOTES
BLACS_GRIDINIT(3S) must be called to initialize the virtual BLACS grid.

SEE ALSO
BLACS_GRIDINFO(3S), BLACS_GRIDINIT(3S), DESCINIT(3S), INDXG2P(3S), NUMROC(3S)

PSGEQLF ( 3S ) PSGEQLF ( 3S )

NAME
PSGEQLF, PCGEQLF – Computes a QL factorization of a real or complex distributed matrix

SYNOPSIS
CALL PSGEQLF (m, n, A, iA, jA, descA, tau, work, lwork, info)
CALL PCGEQLF (m, n, A, iA, jA, descA, tau, work, lwork, info)

IMPLEMENTATION
UNICOS/mk systems

DESCRIPTION
PSGEQLF and PCGEQLF compute a QL factorization of a real or complex distributed m-by-n matrix:
sub(A)=A(iA:iA+m-1,jA:jA+n-1)=Q*L
These routines require square block decomposition (MB_A = NB_A, as defined in the following comments).
A description vector is associated with each two-dimensional (2D) block-cyclicly distributed matrix. This
vector stores the information required to establish the mapping between a matrix entry and its corresponding
process and memory location.
The following comments describe the elements of a block-cyclicly distributed matrix. In these comments,
the underline character (_) should be read as "of the distributed matrix". Let A be a generic term for any 2D
block cyclicly distributed matrix. Its description vector is descA and must be initialized through a call to
DESCINIT(3S).
M_A The number of rows in the distributed matrix.
N_A The number of columns in the distributed matrix.
MB_A The blocking factor used to distribute the rows of the matrix.
NB_A The blocking factor used to distribute the columns of the matrix.
RSRC_A The process row over which the first row of the matrix is distributed.
CSRC_A The process column over which the first column of the matrix is distributed.
CTXT_A The BLACS context handle, indicating the BLACS process grid A is distributed over. The
context itself is global, but the handle (the integer value) may vary.
LLD_A The leading dimension of the local array storing the local blocks of the distributed matrix A.
LLD_A ≥ MAX(1,LOCp(M_A)).
Let K be the number of rows or columns of a distributed matrix, and assume that its process grid has
dimension p-by-q. LOCp( K ) denotes the number of elements of K that a process would receive if K were
distributed over the p processes of its process column.

Similarly, LOCq( K ) denotes the number of elements of K that a process would receive if K were distributed
over the q processes of its process row.
The values of LOCp() and LOCq() may be determined via a call to the NUMROC(3S) ScaLAPACK tool
function, as in the following:
LOCp( M ) = NUMROC( M, MB_A, MYROW, RSRC_A, NPROW )
LOCq( N ) = NUMROC( N, NB_A, MYCOL, CSRC_A, NPCOL )

These routines accept the following arguments. For PCGEQLF, the following real arguments must be
complex:
m Integer. (global input)
The number of rows to be operated on (the number of rows of the distributed submatrix sub(A)). m must be ≥ 0.
n Integer. (global input)
The number of columns to be operated on (the number of columns of the distributed submatrix
sub(A)). n must be ≥ 0.
A Real pointer into the local memory to an array of dimension (LLD_A, LOCq(jA+n-1)). (local
input/local output)
On entry, the local pieces of the m-by-n distributed matrix sub(A) to be factored.
On exit, if m ≥ n, the lower triangle of the distributed submatrix A(iA+m-n:iA+m-1, jA:jA+n-1) contains the n-by-n lower
triangular matrix L. If m ≤ n, the elements on and below the (n-m)-th superdiagonal contain the
m-by-n lower trapezoidal matrix L. The remaining elements, with the array tau, represent the
orthogonal matrix Q as a product of elementary reflectors. See the Further Details subsection for
more information.
iA Integer. (global input)
The global row index of A, which points to the beginning of the submatrix that will be operated on.
jA Integer. (global input)
The global column index of A which points to the beginning of the submatrix that will be operated
on.
descA Integer array of dimension 9. (input)
The array descriptor for the distributed matrix A.
tau Real array, dimension LOCq(N_A). (local output)
This array contains the scalar factors tau of the elementary reflectors. tau is tied to the distributed
matrix A.
work Real array, dimension (lwork). (local workspace)
On exit, work(1) returns the minimal and optimal lwork.

lwork Integer. (local input)
The dimension of the array work.
lwork ≥ NB_A*(Mp0 + Nq0 + NB_A)
where

IROFF = MOD( IA-1, MB_A ), ICOFF = MOD( JA-1, NB_A )
IAROW = INDXG2P( IA, MB_A, MYROW, RSRC_A, NPROW )
IACOL = INDXG2P( JA, NB_A, MYCOL, CSRC_A, NPCOL )
Mp0 = NUMROC( M+IROFF, MB_A, MYROW, IAROW, NPROW )
Nq0 = NUMROC( N+ICOFF, NB_A, MYCOL, IACOL, NPCOL )

and NUMROC(3S) and INDXG2P(3S) are ScaLAPACK tool functions; MYROW, MYCOL, NPROW, and
NPCOL can be determined by calling the BLACS_GRIDINFO(3S) subroutine.
info Integer. (global output)
info = 0 Successful exit.
info < 0 If the ith argument is an array and the j-entry had an illegal value, info = -(i*100+j);
if the ith argument is a scalar and had an illegal value, then info = -i.
Further Details
The matrix Q is represented as a product of elementary reflectors:
Q = H(jA+k-1) ... H(jA+1) H(jA)

where k=MIN(m,n).
Each H(i) has the following form:
H(i) = I - tau * v * v’

where tau is a real scalar, and v is a real vector with v(m-k+i+1:m)=0 and v(m-k+i)=1;
v(1:m-k+i-1) is stored on exit in A(iA:iA+m-k+i-2,jA+n-k+i-1), and tau is stored in
tau(jA+n-k+i-1).

NOTES
BLACS_GRIDINIT(3S) must be called to initialize the virtual BLACS grid.

SEE ALSO
BLACS_GRIDINFO(3S), BLACS_GRIDINIT(3S), DESCINIT(3S), INDXG2P(3S), NUMROC(3S)

PSGEQPF ( 3S ) PSGEQPF ( 3S )

NAME
PSGEQPF, PCGEQPF – Computes a QR factorization with column pivoting of a real or complex distributed
matrix

SYNOPSIS
CALL PSGEQPF (m, n, A, iA, jA, descA, ipiv, tau, work, lwork, info)
CALL PCGEQPF (m, n, A, iA, jA, descA, ipiv, tau, work, lwork, info)

IMPLEMENTATION
UNICOS/mk systems

DESCRIPTION
PSGEQPF and PCGEQPF compute a QR factorization with column pivoting of a real or complex m-by-n
distributed matrix:
sub(A)=A(iA:iA+m-1,jA:jA+n-1)
sub(A)*P = Q*R
These routines require square block decomposition (MB_A = NB_A, as defined in the following comments).
A description vector is associated with each two-dimensional (2D) block-cyclicly distributed matrix. This
vector stores the information required to establish the mapping between a matrix entry and its corresponding
process and memory location.
The following comments describe the elements of a block-cyclicly distributed matrix. In these comments,
the underline character (_) should be read as "of the distributed matrix". Let A be a generic term for any 2D
block cyclicly distributed matrix. Its description vector is descA and must be initialized through a call to
DESCINIT(3S).
M_A The number of rows in the distributed matrix.
N_A The number of columns in the distributed matrix.
MB_A The blocking factor used to distribute the rows of the matrix.
NB_A The blocking factor used to distribute the columns of the matrix.
RSRC_A The process row over which the first row of the matrix is distributed.
CSRC_A The process column over which the first column of the matrix is distributed.
CTXT_A The BLACS context handle, indicating the BLACS process grid A is distributed over. The
context itself is global, but the handle (the integer value) may vary.
LLD_A The leading dimension of the local array storing the local blocks of the distributed matrix A.
LLD_A ≥ MAX(1,LOCp(M_A)).

Let K be the number of rows or columns of a distributed matrix, and assume that its process grid has
dimension p-by-q. LOCp( K ) denotes the number of elements of K that a process would receive if K were
distributed over the p processes of its process column.
Similarly, LOCq( K ) denotes the number of elements of K that a process would receive if K were distributed
over the q processes of its process row.
The values of LOCp() and LOCq() may be determined via a call to the NUMROC(3S) ScaLAPACK tool
function, as in the following:
LOCp( M ) = NUMROC( M, MB_A, MYROW, RSRC_A, NPROW )
LOCq( N ) = NUMROC( N, NB_A, MYCOL, CSRC_A, NPCOL )

These routines accept the following arguments. For PCGEQPF, the following real arguments must be
complex:
m Integer. (global input)
The number of rows to be operated on (the number of rows of the distributed submatrix sub(A)). m must be ≥ 0.
n Integer. (global input)
The number of columns to be operated on (the number of columns of the distributed submatrix
sub(A)). n must be ≥ 0.
A Real pointer into the local memory to an array of dimension (LLD_A, LOCq(jA+n-1)). (local
input/local output)
On entry, the local pieces of the m-by-n distributed matrix sub(A) to be factored.
On exit, the elements on and above the diagonal of sub(A) contain the (MIN(m,n)-by-n) upper
trapezoidal matrix R (R is upper triangular if m ≥ n); the elements below the diagonal, with the
array tau, represent the orthogonal matrix Q as a product of elementary reflectors. See the Further
Details subsection for more information.
iA Integer. (global input)
The global row index of A, which points to the beginning of the submatrix that will be operated on.
jA Integer. (global input)
The global column index of A which points to the beginning of the submatrix that will be operated
on.
descA Integer array of dimension 9. (input)
The array descriptor for the distributed matrix A.
ipiv Integer array, dimension LOCq(jA+n-1). (local output)
On exit, if ipiv(i) = k, the local ith column of sub(A)*P was the global kth
column of sub(A). ipiv is tied to the distributed matrix A.

tau Real array, dimension LOCq(jA+MIN(m,n)-1). (local output)
This array contains the scalar factors tau of the elementary reflectors. tau is tied to the distributed
matrix A.
work Real array, dimension (lwork). (local workspace)
On exit, work(1) returns the minimal and optimal lwork.
lwork Integer. (local input)
lwork ≥ MAX(3,Mp0 + Nq0) + LOCq(JA+N-1)+Nq0
where

IROFF = MOD( IA-1, MB_A ), ICOFF = MOD( JA-1, NB_A )
IAROW = INDXG2P( IA, MB_A, MYROW, RSRC_A, NPROW )
IACOL = INDXG2P( JA, NB_A, MYCOL, CSRC_A, NPCOL )
Mp0 = NUMROC( M+IROFF, MB_A, MYROW, IAROW, NPROW )
Nq0 = NUMROC( N+ICOFF, NB_A, MYCOL, IACOL, NPCOL )
LOCq(JA+N-1) = NUMROC( JA+N-1, NB_A, MYCOL, CSRC_A, NPCOL )

and NUMROC(3S) and INDXG2P(3S) are ScaLAPACK tool functions; MYROW, MYCOL, NPROW, and
NPCOL can be determined by calling the BLACS_GRIDINFO(3S) subroutine.
info Integer. (global output)
info = 0 Successful exit.
info < 0 If the ith argument is an array and the j-entry had an illegal value, info = -(i*100+j);
if the ith argument is a scalar and had an illegal value, info = -i.
Further Details
The matrix Q is represented as a product of elementary reflectors:
Q = H(1) H(2) ... H(n)

Each H(i) has the following form:

H(i) = I - tau * v * v’

where tau is a real scalar, and v is a real vector with v(1:i-1) = 0 and v(i) = 1; v(i+1:m) is
stored on exit in A(iA+i-1:iA+m-1,jA+i-1).
The matrix P is represented in ipiv as follows: if ipiv(j) = i, then the jth column of P is the ith canonical
unit vector.
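
The pivot-vector representation of P can be illustrated on a small, non-distributed matrix. In the Python sketch below the pivot list holds global, 1-based indices for the whole matrix, which is a simplification of the distributed local array described above; the function names are hypothetical.

```python
def permutation_matrix(pivots):
    """Build P so that column j of P is the canonical unit vector e_i, with i = pivots[j]."""
    n = len(pivots)
    p = [[0] * n for _ in range(n)]
    for j, i in enumerate(pivots, start=1):
        p[i - 1][j - 1] = 1
    return p


def apply_column_pivots(a, pivots):
    """Form A*P directly: column j of the result is column pivots[j] of A."""
    return [[row[i - 1] for i in pivots] for row in a]


def matmul(a, b):
    return [[sum(a[r][k] * b[k][c] for k in range(len(b))) for c in range(len(b[0]))]
            for r in range(len(a))]


if __name__ == "__main__":
    a = [[10, 20, 30],
         [40, 50, 60]]
    pivots = [3, 1, 2]
    print(apply_column_pivots(a, pivots))
    print(matmul(a, permutation_matrix(pivots)))
```

Both calls give the same matrix, so the explicit matrix P never needs to be formed in practice.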

SEE ALSO
BLACS_GRIDINFO(3S), DESCINIT(3S), INDXG2P(3S), NUMROC(3S)

PSGEQRF ( 3S ) PSGEQRF ( 3S )

NAME
PSGEQRF, PCGEQRF – Computes a QR factorization of a real or complex distributed matrix

SYNOPSIS
CALL PSGEQRF (m, n, A, iA, jA, descA, tau, work, lwork, info)
CALL PCGEQRF (m, n, A, iA, jA, descA, tau, work, lwork, info)

IMPLEMENTATION
UNICOS/mk systems

DESCRIPTION
PSGEQRF and PCGEQRF compute a QR factorization of a real or complex distributed m-by-n matrix of the
form:
sub(A)=A(iA:iA+m-1,jA:jA+n-1)=Q*R
These routines require square block decomposition (MB_A = NB_A, as defined in the comments which
follow).
A description vector is associated with each two-dimensional (2D) block-cyclicly distributed matrix. This
vector stores the information required to establish the mapping between a matrix entry and its corresponding
process and memory location.
The following comments describe the elements of a block-cyclicly distributed matrix. In these comments,
the underline character (_) should be read as "of the distributed matrix". Let A be a generic term for any 2D
block cyclicly distributed matrix. Its description vector is descA and must be initialized through a call to
DESCINIT(3S).
M_A The number of rows in the distributed matrix.
N_A The number of columns in the distributed matrix.
MB_A The blocking factor used to distribute the rows of the matrix.
NB_A The blocking factor used to distribute the columns of the matrix.
RSRC_A The process row over which the first row of the matrix is distributed.
CSRC_A The process column over which the first column of the matrix is distributed.
CTXT_A The BLACS context handle, indicating the BLACS process grid A is distributed over. The
context itself is global, but the handle (the integer value) may vary.
LLD_A The leading dimension of the local array storing the local blocks of the distributed matrix A.
LLD_A ≥ MAX(1,LOCp(M_A)).
Let K be the number of rows or columns of a distributed matrix, and assume that its process grid has
dimension p-by-q. LOCp( K ) denotes the number of elements of K that a process would receive if K were
distributed over the p processes of its process column.

Similarly, LOCq( K ) denotes the number of elements of K that a process would receive if K were distributed
over the q processes of its process row.
The values of LOCp() and LOCq() may be determined via a call to the NUMROC(3S) ScaLAPACK tool
function, as in the following:
LOCp(M) = NUMROC(M, MB_A, MYROW, RSRC_A, NPROW)
LOCq(N) = NUMROC(N, NB_A, MYCOL, CSRC_A, NPCOL)
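
The counting rule behind LOCp() and LOCq() can be sketched in Python as follows. This is an illustrative model of a block-cyclic count, written to mirror the NUMROC argument order shown above; it is not the library routine, and it assumes 0-based process coordinates.

```python
def numroc(n, nb, iproc, isrcproc, nprocs):
    """Number of rows or columns of an n-long dimension owned by process iproc.

    n        global number of rows or columns
    nb       blocking factor
    iproc    coordinate of this process, for example MYROW
    isrcproc coordinate of the process that owns the first block, for example RSRC_A
    nprocs   number of processes in this grid dimension, for example NPROW
    """
    mydist = (nprocs + iproc - isrcproc) % nprocs  # distance from the source process
    nblocks = n // nb                              # number of complete blocks
    count = (nblocks // nprocs) * nb               # complete rounds every process receives
    extrablks = nblocks % nprocs                   # complete blocks left after those rounds
    if mydist < extrablks:
        count += nb                                # this process gets one more full block
    elif mydist == extrablks:
        count += n % nb                            # this process gets the trailing partial block
    return count


if __name__ == "__main__":
    # 10 rows in blocks of 2 over 3 process rows, first block on process row 0
    print([numroc(10, 2, p, 0, 3) for p in range(3)])
```

The local counts always add up to the global dimension, which is a convenient sanity check when sizing local arrays.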

These routines accept the following arguments. For PCGEQRF, the following real arguments must be
complex:
m Integer. (global input)
The number of rows to be operated on (the number of rows of the distributed submatrix sub(A)). m
must be ≥ 0.
n Integer. (global input)
The number of columns to be operated on (the number of columns of the distributed submatrix
sub(A)). n must be ≥ 0.
A Real pointer into the local memory to an array of dimension (LLD_A, LOCq(jA+n-1)). (local
input/local output)
On entry, the local pieces of the m-by-n distributed matrix sub(A) to be factored.
On exit, the elements on and above the diagonal of sub(A) contain the MIN(m,n)-by-n upper
trapezoidal matrix R (R is upper triangular if m ≥ n); the elements below the diagonal, with the
array tau, represent the orthogonal matrix Q as a product of elementary reflectors. See the Further
Details subsection for more information.
iA Integer. (global input)
The global row index of A, which points to the beginning of the submatrix that will be operated on.
jA Integer. (global input)
The global column index of A which points to the beginning of the submatrix that will be operated
on.
descA Integer array of dimension 9. (input)
The array descriptor for the distributed matrix A.
tau Real array, dimension LOCq(jA+MIN(m,n)-1). (local output)
This array contains the scalar factors tau of the elementary reflectors. tau is tied to the distributed
matrix A.
work Real array, dimension (lwork). (local workspace)
On exit, work(1) returns the minimal and optimal lwork.

lwork Integer. (local input)
lwork ≥ NB_A*(Mp0 + Nq0 + NB_A)
where
IROFF = MOD(IA-1, MB_A), ICOFF = MOD(JA-1, NB_A)
IAROW = INDXG2P(IA, MB_A, MYROW, RSRC_A, NPROW)
IACOL = INDXG2P(JA, NB_A, MYCOL, CSRC_A, NPCOL)
Mp0 = NUMROC(M+IROFF, MB_A, MYROW, IAROW, NPROW)
Nq0 = NUMROC(N+ICOFF, NB_A, MYCOL, IACOL, NPCOL)
and NUMROC(3S) and INDXG2P(3S) are ScaLAPACK tool functions; MYROW, MYCOL, NPROW, and
NPCOL can be determined by calling the BLACS_GRIDINFO(3S) subroutine.
info Integer. (global output)
info = 0 Successful exit.
info < 0 If the ith argument is an array and the jth entry had an illegal value, info = -(i*100+j);
if the ith argument is a scalar and had an illegal value, then info = -i.
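
The lwork bound given for these routines can be evaluated per process with a sketch like the one below. The Python helpers numroc and indxg2p are simplified stand-ins for the tool functions of the same names (0-based process coordinates are assumed), and min_lwork_geqrf is a hypothetical name introduced only for this example.

```python
def numroc(n, nb, iproc, isrcproc, nprocs):
    """Local count of an n-long dimension on process iproc (block size nb)."""
    mydist = (nprocs + iproc - isrcproc) % nprocs
    nblocks = n // nb
    count = (nblocks // nprocs) * nb
    extrablks = nblocks % nprocs
    if mydist < extrablks:
        count += nb
    elif mydist == extrablks:
        count += n % nb
    return count


def indxg2p(indxglob, nb, iproc, isrcproc, nprocs):
    """Coordinate of the process that owns 1-based global index indxglob.

    iproc is unused; it is kept only to mirror the Fortran argument list.
    """
    return (isrcproc + (indxglob - 1) // nb) % nprocs


def min_lwork_geqrf(m, n, ia, ja, mb_a, nb_a, rsrc_a, csrc_a,
                    myrow, mycol, nprow, npcol):
    """Evaluate NB_A*(Mp0 + Nq0 + NB_A) for the process at (myrow, mycol)."""
    iroff = (ia - 1) % mb_a
    icoff = (ja - 1) % nb_a
    iarow = indxg2p(ia, mb_a, myrow, rsrc_a, nprow)
    iacol = indxg2p(ja, nb_a, mycol, csrc_a, npcol)
    mp0 = numroc(m + iroff, mb_a, myrow, iarow, nprow)
    nq0 = numroc(n + icoff, nb_a, mycol, iacol, npcol)
    return nb_a * (mp0 + nq0 + nb_a)


if __name__ == "__main__":
    # 100-by-100 submatrix starting at (1,1), 8-by-8 blocks, 2-by-2 process grid
    for myrow in range(2):
        for mycol in range(2):
            print(myrow, mycol,
                  min_lwork_geqrf(100, 100, 1, 1, 8, 8, 0, 0, myrow, mycol, 2, 2))
```

Because the bound depends on the process coordinates, each process may need a different amount of workspace; sizing every work array to the largest value over the grid is a simple conservative choice.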
Further Details
The matrix Q is represented as a product of elementary reflectors:
Q = H(jA) H(jA+1) ... H(jA+k-1)

where k = min(m,n).
Each H(i) has the following form:
H = I - tau * v * v’

where tau is a real scalar, and v is a real vector with v(1:i-1)=0 and v(i)=1; v(i+1:m) is stored on
exit in A(iA+i-1:iA+m-1,jA+i-1) and tau is stored in TAU(jA+i-1).
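
As a serial illustration of this representation, the following Python sketch applies a reflector H = I - tau * v * v’ without forming it and accumulates the product of several reflectors. The function names are hypothetical, and the reflectors are supplied directly rather than read from the factored array.

```python
def apply_reflector(tau, v, x):
    """Return H*x with H = I - tau * v * v', without forming H explicitly."""
    dot = sum(vi * xi for vi, xi in zip(v, x))
    return [xi - tau * vi * dot for vi, xi in zip(v, x)]


def form_q(taus, vs, m):
    """Form Q = H(1) H(2) ... H(k) as a list of m columns.

    Column j of Q is Q*e_j, so the reflectors are applied to e_j from
    right to left: H(k) first and H(1) last.
    """
    cols = []
    for j in range(m):
        col = [1.0 if r == j else 0.0 for r in range(m)]
        for tau, v in reversed(list(zip(taus, vs))):
            col = apply_reflector(tau, v, col)
        cols.append(col)
    return cols


if __name__ == "__main__":
    # v has a unit leading entry, matching the convention v(i) = 1
    print(apply_reflector(1.0, [1.0, 1.0], [3.0, 4.0]))
    print(form_q([1.0, 2.0], [[1.0, 1.0], [0.0, 1.0]], 2))
```

Storing only tau and the entries of v below the unit element is what lets Q share the array with R.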

NOTES
BLACS_GRIDINIT(3S) must be called to initialize the virtual BLACS grid.

SEE ALSO
BLACS_GRIDINFO(3S), BLACS_GRIDINIT(3S), DESCINIT(3S), INDXG2P(3S), NUMROC(3S)

PSGERQF ( 3S ) PSGERQF ( 3S )

NAME
PSGERQF, PCGERQF – Computes an RQ factorization of a real or complex distributed matrix

SYNOPSIS
CALL PSGERQF (m, n, A, iA, jA, descA, tau, work, lwork, info)
CALL PCGERQF (m, n, A, iA, jA, descA, tau, work, lwork, info)

IMPLEMENTATION
UNICOS/mk systems

DESCRIPTION
PSGERQF and PCGERQF compute an RQ factorization of a real or complex distributed m-by-n matrix:
sub(A) = A(iA:iA+m-1,jA:jA+n-1) = R * Q
These routines require square block decomposition (MB_A = NB_A, as defined in the following comments).
A description vector is associated with each two-dimensional (2D) block-cyclicly distributed matrix. This
vector stores the information required to establish the mapping between a matrix entry and its corresponding
process and memory location.
The following comments describe the elements of a block-cyclicly distributed matrix. In these comments,
the underline character (_) should be read as "of the distributed matrix". Let A be a generic term for any 2D
block cyclicly distributed matrix. Its description vector is descA and must be initialized through a call to
DESCINIT(3S).
M_A The number of rows in the distributed matrix.
N_A The number of columns in the distributed matrix.
MB_A The blocking factor used to distribute the rows of the matrix.
NB_A The blocking factor used to distribute the columns of the matrix.
RSRC_A The process row over which the first row of the matrix is distributed.
CSRC_A The process column over which the first column of the matrix is distributed.
CTXT_A The BLACS context handle, indicating the BLACS process grid A is distributed over. The
context itself is global, but the handle (the integer value) may vary.
LLD_A The leading dimension of the local array storing the local blocks of the distributed matrix A.
LLD_A ≥ MAX(1,LOCp(M_A)).
Let K be the number of rows or columns of a distributed matrix, and assume that its process grid has
dimension p-by-q. LOCp( K ) denotes the number of elements of K that a process would receive if K were
distributed over the p processes of its process column.

Similarly, LOCq( K ) denotes the number of elements of K that a process would receive if K were distributed
over the q processes of its process row.
The values of LOCp() and LOCq() may be determined via a call to the NUMROC(3S) ScaLAPACK tool
function, as in the following:
LOCp(M) = NUMROC(M, MB_A, MYROW, RSRC_A, NPROW)
LOCq(N) = NUMROC(N, NB_A, MYCOL, CSRC_A, NPCOL)

These routines accept the following arguments. For PCGERQF, the following real arguments must be
complex:
m Integer. (global input)
The number of rows to be operated on (the number of rows of the distributed submatrix sub(A)). m
must be ≥ 0.
n Integer. (global input)
The number of columns to be operated on (the number of columns of the distributed submatrix
sub(A)). n must be ≥ 0.
A Real pointer into the local memory to an array of dimension (LLD_A, LOCq(jA+n-1)). (local
input/local output)
On entry, the local pieces of the m-by-n distributed matrix sub(A) to be factored.
On exit, if m ≤ n, the upper triangle of A(iA:iA+m-1,jA+n-m:jA+n-1) contains the m-by-m upper
triangular matrix R. If m ≥ n, the elements on and above the (m-n)th subdiagonal contain the
m-by-n upper trapezoidal matrix R; the remaining elements, with the array tau, represent the
orthogonal matrix Q as a product of elementary reflectors (see the Further Details subsection).
iA Integer. (global input)
The global row index of A, which points to the beginning of the submatrix that will be operated on.
jA Integer. (global input)
The global column index of A which points to the beginning of the submatrix that will be operated
on.
descA Integer array of dimension 9. (global and local input)
The array descriptor for the distributed matrix A.
tau Real array, dimension LOCp(M_A). (local output)
This array contains the scalar factors tau of the elementary reflectors. tau is tied to the distributed
matrix A.
work Real array, dimension (lwork). (local workspace)
On exit, work(1) returns the minimal and optimal lwork.
lwork Integer. (local input)
lwork ≥ MB_A * (Mp0 + Nq0 + MB_A)
where

IROFF = MOD(IA-1, MB_A), ICOFF = MOD(JA-1, NB_A)
IAROW = INDXG2P(IA, MB_A, MYROW, RSRC_A, NPROW)
IACOL = INDXG2P(JA, NB_A, MYCOL, CSRC_A, NPCOL)
Mp0 = NUMROC(M+IROFF, MB_A, MYROW, IAROW, NPROW)
Nq0 = NUMROC(N+ICOFF, NB_A, MYCOL, IACOL, NPCOL)

and NUMROC(3S) and INDXG2P(3S) are ScaLAPACK tool functions; MYROW, MYCOL, NPROW, and
NPCOL can be determined by calling the BLACS_GRIDINFO(3S) subroutine.
info Integer. (global output)
info = 0 Successful exit.
info < 0 If the ith argument is an array and the jth entry had an illegal value, info = -(i*100+j);
if the ith argument is a scalar and had an illegal value, info = -i.
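
A negative info value can be unpacked into an argument position and, for arrays, an entry position. The Python sketch below is one way to do that; decode_info is a hypothetical helper, and it assumes that a routine has fewer than 100 arguments, so that any magnitude of 100 or more is read as i*100+j.

```python
def decode_info(info):
    """Return (argument, entry) for info < 0, or None for info >= 0.

    entry is None when the offending argument is a scalar.
    """
    if info >= 0:
        return None
    code = -info
    if code >= 100:
        # Array argument: code = i*100 + j
        return divmod(code, 100)
    # Scalar argument: code = i
    return (code, None)


if __name__ == "__main__":
    # In the synopsis above, argument 6 is descA
    print(decode_info(-604))
    print(decode_info(-2))
```

With the argument order of the synopsis, a value of -604 would point at the fourth entry of descA, and -2 at the scalar n.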
Further Details
The matrix Q is represented as a product of elementary reflectors:
Q = H(iA) H(iA+1) ... H(iA+k-1)

where k = MIN(m,n).
Each H(i) has the following form:
H = I - tau * v * v’

where tau is a real scalar, and v is a real vector with v(n-k+i+1:n)=0 and v(n-k+i)=1;
v(1:n-k+i-1) is stored on exit in A(iA+m-k+i-1,jA:jA+n-k+i-2) and tau is stored in
TAU(iA+m-k+i-1).

NOTES
BLACS_GRIDINIT(3S) must be called to initialize the virtual BLACS grid.

SEE ALSO
BLACS_GRIDINFO(3S), BLACS_GRIDINIT(3S), DESCINIT(3S), INDXG2P(3S), NUMROC(3S)

PSGESV ( 3S ) PSGESV ( 3S )

NAME
PSGESV, PCGESV – Computes the solution to a real or complex system of linear equations

SYNOPSIS
CALL PSGESV (n, nrhs, A, iA, jA, descA, ipiv, B, iB, jB, descB, info)
CALL PCGESV (n, nrhs, A, iA, jA, descA, ipiv, B, iB, jB, descB, info)

IMPLEMENTATION
UNICOS/mk systems

DESCRIPTION
PSGESV and PCGESV compute the solution to a real or complex system of linear equations:
sub(A) X = sub(B)
where sub(A)=A(iA:iA+n-1,jA:jA+n-1) is an n-by-n distributed matrix and X and
sub(B)=B(iB:iB+n-1,jB:jB+nrhs-1) are n-by-nrhs distributed matrices.
The LU decomposition with partial pivoting and row interchanges is used to factor sub(A) as sub(A) = P *
L * U, where P is a permutation matrix, L is unit lower triangular, and U is upper triangular. L and U are
stored in sub(A). The factored form of sub(A) is then used to solve the system of equations sub(A)X=sub(B).
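
The two stages described above (factor, then solve) can be sketched serially in Python. This is a minimal dense illustration under stated assumptions, not the distributed algorithm: lu_factor and lu_solve are hypothetical names, indices are 0-based, and ipiv[k] records the row that was swapped with row k.

```python
def lu_factor(a):
    """Factor a square matrix in place as P*L*U using partial pivoting.

    Returns (ipiv, info). info = 0 on success; info = k+1 if the pivot
    U[k][k] is exactly zero (1-based, in the style of the info argument).
    """
    n = len(a)
    ipiv = list(range(n))
    info = 0
    for k in range(n):
        # Pick the row with the largest magnitude in column k
        p = max(range(k, n), key=lambda r: abs(a[r][k]))
        ipiv[k] = p
        if a[p][k] == 0.0:
            if info == 0:
                info = k + 1
            continue
        a[k], a[p] = a[p], a[k]
        for r in range(k + 1, n):
            a[r][k] /= a[k][k]                  # multiplier, stored in place of L
            for c in range(k + 1, n):
                a[r][c] -= a[r][k] * a[k][c]    # update the trailing submatrix
    return ipiv, info


def lu_solve(a, ipiv, b):
    """Solve A*x = b from the packed factors produced by lu_factor."""
    n = len(a)
    x = list(b)
    for k in range(n):                          # apply the row interchanges to b
        x[k], x[ipiv[k]] = x[ipiv[k]], x[k]
    for r in range(n):                          # forward substitution, unit diagonal L
        for c in range(r):
            x[r] -= a[r][c] * x[c]
    for r in reversed(range(n)):                # back substitution with U
        for c in range(r + 1, n):
            x[r] -= a[r][c] * x[c]
        x[r] /= a[r][r]
    return x


if __name__ == "__main__":
    a = [[0.0, 2.0], [1.0, 1.0]]
    ipiv, info = lu_factor(a)
    if info == 0:
        print(lu_solve(a, ipiv, [4.0, 3.0]))
```

The sketch checks info before solving because a zero pivot makes the back substitution divide by zero.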
These routines require square block decomposition (MB_A = NB_A, as defined in the following comments).
A description vector is associated with each two-dimensional (2D) block-cyclicly distributed matrix. This
vector stores the information required to establish the mapping between a matrix entry and its corresponding
process and memory location.
The following comments describe the elements of block-cyclicly distributed matrices. In these comments, the
underline character (_) should be read as "of the distributed matrix". Let A be a generic term for any 2D
block cyclicly distributed matrix. Its description vector is descA and must be initialized through a call to
DESCINIT(3S).
M_A The number of rows in the distributed matrix.
N_A The number of columns in the distributed matrix.
MB_A The blocking factor used to distribute the rows of the matrix.
NB_A The blocking factor used to distribute the columns of the matrix.
RSRC_A The process row over which the first row of the matrix is distributed.
CSRC_A The process column over which the first column of the matrix is distributed.
CTXT_A The BLACS context handle, indicating the BLACS process grid A is distributed over. The
context itself is global, but the handle (the integer value) may vary.

LLD_A The leading dimension of the local array storing the local blocks of the distributed matrix A.
LLD_A ≥ MAX(1,LOCp(M_A)).
Let K be the number of rows or columns of a distributed matrix, and assume that its process grid has
dimension p-by-q. LOCp( K ) denotes the number of elements of K that a process would receive if K were
distributed over the p processes of its process column.
Similarly, LOCq( K ) denotes the number of elements of K that a process would receive if K were distributed
over the q processes of its process row.
The values of LOCp() and LOCq() may be determined via a call to the NUMROC(3S) ScaLAPACK tool
function, as in the following:
LOCp(M) = NUMROC(M, MB_A, MYROW, RSRC_A, NPROW)
LOCq(N) = NUMROC(N, NB_A, MYCOL, CSRC_A, NPCOL)

These routines accept the following arguments. For PCGESV, the following real arguments must be
complex:
n Integer. (global input)
The number of rows and columns to be operated on (the order of the distributed submatrix sub(A)).
n must be ≥ 0.
nrhs Integer. (global input)
The number of right-hand sides (the number of columns of the distributed submatrix sub(B)). nrhs
must be ≥ 0.
A Real pointer into the local memory to an array of dimension (LLD_A, LOCq(jA+n-1)). (local
input/local output)
On entry, the local pieces of the n-by-n distributed matrix sub(A) to be factored.
On exit, this array contains the local pieces of the factors L and U from the factorization sub(A) =
P*L*U; the unit diagonal elements of L are not stored.
iA Integer. (global input)
The global row index of A, which points to the beginning of the submatrix that will be operated on.
jA Integer. (global input)
The global column index of A which points to the beginning of the submatrix that will be operated
on.
descA Integer array of dimension 9. (input)
The array descriptor for the distributed matrix A.
ipiv Integer array, dimension ( LOCp(M_A)+MB_A ). (local output)
This array contains the pivoting information ipiv(i), which is the global row that local row i was
swapped with. This array is tied to the distributed matrix A.

B Real pointer into the local memory to an array of dimension (LLD_B, LOCq(jB+nrhs-1)). (local
input/local output)
On entry, the right hand side distributed matrix sub(B).
On exit, if info=0, sub(B) is overwritten by the solution distributed matrix X.
iB Integer. (global input)
The global row index of B, which points to the beginning of the submatrix that will be operated on.
jB Integer. (global input)
The global column index of B, which points to the beginning of the submatrix that will be operated
on.
descB Integer array of dimension 9. (input)
The array descriptor for the distributed matrix B.
info Integer. (global output)
info = 0 Successful exit.
info < 0 If the ith argument is an array and the jth entry had an illegal value, info = -(i*100+j);
if the ith argument is a scalar and had an illegal value, info = -i.
info > 0 If info = K, U(iA+K-1,jA+K-1) is exactly 0. The factorization has been completed,
but the factor U is exactly singular, so the solution could not be computed.

NOTES
BLACS_GRIDINIT(3S) must be called to initialize the virtual BLACS grid.

SEE ALSO
BLACS_GRIDINFO(3S), BLACS_GRIDINIT(3S), DESCINIT(3S), INDXG2P(3S), NUMROC(3S)

PSGETRF ( 3S ) PSGETRF ( 3S )

NAME
PSGETRF, PCGETRF – Computes an LU factorization of a real or complex distributed matrix

SYNOPSIS
CALL PSGETRF (m, n, A, iA, jA, descA, ipiv, info)
CALL PCGETRF (m, n, A, iA, jA, descA, ipiv, info)

IMPLEMENTATION
UNICOS/mk systems

DESCRIPTION
PSGETRF and PCGETRF compute an LU factorization of a real or complex general m-by-n distributed
matrix of the form:
sub(A)=A(iA:iA+m-1,jA:jA+n-1)
by using partial pivoting with row interchanges.
The factorization has the following form:
sub(A) = P * L * U
P is a permutation matrix, L is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and
U is upper triangular (upper trapezoidal if m < n). L and U are stored in sub(A).
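
Because L and U share one array and the unit diagonal of L is not stored, a caller who wants the factors separately has to unpack them. The following Python sketch shows the layout for a small dense array, including the trapezoidal cases; unpack_lu is a hypothetical helper and ignores the distribution of the data.

```python
def unpack_lu(packed):
    """Split an m-by-n packed LU array into L (m-by-k) and U (k-by-n), k = min(m, n)."""
    m, n = len(packed), len(packed[0])
    k = min(m, n)
    # L: stored entries strictly below the diagonal, implicit ones on the diagonal
    lower = [[1.0 if r == c else (packed[r][c] if r > c else 0.0)
              for c in range(k)] for r in range(m)]
    # U: stored entries on and above the diagonal
    upper = [[packed[r][c] if c >= r else 0.0
              for c in range(n)] for r in range(k)]
    return lower, upper


if __name__ == "__main__":
    # m > n, so L is lower trapezoidal and U is upper triangular
    lower, upper = unpack_lu([[2.0, 4.0], [0.5, 3.0], [0.25, 0.5]])
    print(lower)
    print(upper)
```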
This is the right-looking Parallel Level 3 BLAS version of the algorithm.
These routines require square block decomposition (MB_A = NB_A, as defined in the following comments).
A description vector is associated with each two-dimensional (2D) block-cyclicly distributed matrix. This
vector stores the information required to establish the mapping between a matrix entry and its corresponding
process and memory location.
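
The mapping from a global matrix entry to its owning process and local position can be sketched in Python as follows. The helpers are modeled on the ScaLAPACK tool functions INDXG2P and INDXG2L (the latter is not referenced on this page), use 1-based matrix indices and 0-based process coordinates, and are meant only as an illustration; locate is a hypothetical name.

```python
def indxg2p(indxglob, nb, iproc, isrcproc, nprocs):
    """Coordinate of the process that owns 1-based global index indxglob.

    iproc is unused; it is kept only to mirror the Fortran argument list.
    """
    return (isrcproc + (indxglob - 1) // nb) % nprocs


def indxg2l(indxglob, nb, nprocs):
    """1-based local index of global index indxglob on its owning process."""
    return nb * ((indxglob - 1) // (nb * nprocs)) + (indxglob - 1) % nb + 1


def locate(i, j, mb, nb, rsrc, csrc, nprow, npcol):
    """Return (process row, process column, local row, local column) of entry (i, j)."""
    return (indxg2p(i, mb, 0, rsrc, nprow),
            indxg2p(j, nb, 0, csrc, npcol),
            indxg2l(i, mb, nprow),
            indxg2l(j, nb, npcol))


if __name__ == "__main__":
    # Entry (5, 4) of a matrix with 2-by-2 blocks on a 2-by-2 process grid
    print(locate(5, 4, 2, 2, 0, 0, 2, 2))
```

In the example, process row 0 owns global rows 1, 2, 5, 6, and so on, so global row 5 is its third local row.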
The following comments describe the elements of a block-cyclicly distributed matrix. In these comments,
the underline character (_) should be read as "of the distributed matrix". Let A be a generic term for any 2D
block cyclicly distributed matrix. Its description vector is descA and must be initialized through a call to
DESCINIT(3S).
M_A The number of rows in the distributed matrix.
N_A The number of columns in the distributed matrix.
MB_A The blocking factor used to distribute the rows of the matrix.
NB_A The blocking factor used to distribute the columns of the matrix.
RSRC_A The process row over which the first row of the matrix is distributed.
CSRC_A The process column over which the first column of the matrix is distributed.

CTXT_A The BLACS context handle, indicating the BLACS process grid A is distributed over. The
context itself is global, but the handle (the integer value) may vary.
LLD_A The leading dimension of the local array storing the local blocks of the distributed matrix A.
LLD_A ≥ MAX(1,LOCp(M_A)).
Let K be the number of rows or columns of a distributed matrix, and assume that its process grid has
dimension p-by-q. LOCp( K ) denotes the number of elements of K that a process would receive if K were
distributed over the p processes of its process column.
Similarly, LOCq( K ) denotes the number of elements of K that a process would receive if K were distributed
over the q processes of its process row.
The values of LOCp() and LOCq() may be determined via a call to the NUMROC(3S) ScaLAPACK tool
function, as in the following:
LOCp(M) = NUMROC(M, MB_A, MYROW, RSRC_A, NPROW)
LOCq(N) = NUMROC(N, NB_A, MYCOL, CSRC_A, NPCOL)

These routines accept the following arguments. For PCGETRF, the following real arguments must be
complex:
m Integer. (global input)
The number of rows to be operated on (the number of rows of the distributed submatrix sub(A)). m
must be ≥ 0.
n Integer. (global input)
The number of columns to be operated on (the number of columns of the distributed submatrix
sub(A)). n must be ≥ 0.
A Real pointer into the local memory to an array of dimension (LLD_A, LOCq(jA+n-1)). (local
input/local output)
On entry, the local pieces of the m-by-n distributed matrix sub(A) to be factored.
On exit, this array contains the local pieces of the factors L and U from the factorization sub(A) =
P*L*U; the unit diagonal elements of L are not stored.
iA Integer. (global input)
The global row index of A, which points to the beginning of the submatrix that will be operated on.
jA Integer. (global input)
The global column index of A, which points to the beginning of the submatrix that will be operated
on.
descA Integer array of dimension 9. (input)
The array descriptor for the distributed matrix A.
ipiv Integer array, dimension ( LOCp(M_A)+MB_A). (local output)
This array contains the pivoting information. ipiv(i) is the global row that local row i was swapped
with. This array is tied to the distributed matrix A.

info Integer. (global output)
info = 0 Successful exit.
info < 0 If the ith argument is an array and the jth entry had an illegal value, info = -(i*100+j);
if the ith argument is a scalar and had an illegal value, info = -i.
info > 0 If info = K, U(iA+K-1,jA+K-1) is exactly 0. The factorization has been completed,
but the factor U is exactly singular, and division by 0 will occur if it is used to solve a
system of equations.

NOTES
BLACS_GRIDINIT(3S) must be called to initialize the virtual BLACS grid.

SEE ALSO
BLACS_GRIDINIT(3S), DESCINIT(3S)

PSGETRI ( 3S ) PSGETRI ( 3S )

NAME
PSGETRI, PCGETRI – Computes the inverse of a real or complex distributed matrix

SYNOPSIS
CALL PSGETRI (n, A, iA, jA, descA, ipiv, work, lwork, iwork, liwork, info)
CALL PCGETRI (n, A, iA, jA, descA, ipiv, work, lwork, iwork, liwork, info)

IMPLEMENTATION
UNICOS/mk systems

DESCRIPTION
PSGETRI and PCGETRI compute the inverse of a real or complex distributed matrix by using the LU
factorization computed by PSGETRF(3S) or PCGETRF(3S). This method inverts U and then computes
InvA, the inverse of the following:
sub(A)=A(iA:iA+n-1,jA:jA+n-1)
It does this by solving the system InvA*L=inv(U) for InvA.
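
The method can be followed on a small dense example. The Python sketch below takes packed LU factors and a pivot vector, inverts U, solves X*L = inv(U), and then undoes the row interchanges as column interchanges. It is a serial illustration under assumptions: invert_from_lu is a hypothetical name, indices are 0-based, and ipiv[k] is taken to be the row swapped with row k during the factorization.

```python
def invert_from_lu(packed, ipiv):
    """Return inv(A) given A = P*L*U stored in packed form with pivots ipiv."""
    n = len(packed)

    # Step 1: invert the upper triangular factor U column by column
    inv_u = [[0.0] * n for _ in range(n)]
    for c in range(n):
        for r in range(c, -1, -1):
            rhs = 1.0 if r == c else 0.0
            rhs -= sum(packed[r][k] * inv_u[k][c] for k in range(r + 1, c + 1))
            inv_u[r][c] = rhs / packed[r][r]

    # Step 2: solve X*L = inv(U) for X; L is unit lower triangular,
    # so the columns of X are found from right to left
    x = [row[:] for row in inv_u]
    for c in range(n - 1, -1, -1):
        for r in range(n):
            x[r][c] -= sum(x[r][k] * packed[k][c] for k in range(c + 1, n))

    # Step 3: X = inv(L*U); apply the interchanges to columns in reverse order
    for j in reversed(range(n)):
        jp = ipiv[j]
        if jp != j:
            for row in x:
                row[j], row[jp] = row[jp], row[j]
    return x


if __name__ == "__main__":
    # Packed factors of A = [[0, 2], [1, 1]] after swapping rows 0 and 1
    print(invert_from_lu([[1.0, 1.0], [0.0, 2.0]], [1, 1]))
```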
These routines require square block decomposition (MB_A = NB_A, as defined in the following comments).
A description vector is associated with each two-dimensional (2D) block-cyclicly distributed matrix. This
vector stores the information required to establish the mapping between a matrix entry and its corresponding
process and memory location.
The following comments describe the elements of a block-cyclicly distributed matrix. In these comments,
the underline character (_) should be read as "of the distributed matrix". Let A be a generic term for any 2D
block cyclicly distributed matrix. Its description vector is descA and must be initialized through a call to
DESCINIT(3S).
M_A The number of rows in the distributed matrix.
N_A The number of columns in the distributed matrix.
MB_A The blocking factor used to distribute the rows of the matrix.
NB_A The blocking factor used to distribute the columns of the matrix.
RSRC_A The process row over which the first row of the matrix is distributed.
CSRC_A The process column over which the first column of the matrix is distributed.
CTXT_A The BLACS context handle, indicating the BLACS process grid A is distributed over. The
context itself is global, but the handle (the integer value) may vary.
LLD_A The leading dimension of the local array storing the local blocks of the distributed matrix A.
LLD_A ≥ MAX(1,LOCp(M_A)).

Let K be the number of rows or columns of a distributed matrix, and assume that its process grid has
dimension p-by-q. LOCp( K ) denotes the number of elements of K that a process would receive if K were
distributed over the p processes of its process column.
Similarly, LOCq( K ) denotes the number of elements of K that a process would receive if K were distributed
over the q processes of its process row.
The values of LOCp() and LOCq() may be determined via a call to the NUMROC(3S) ScaLAPACK tool
function, as in the following:
LOCp(M) = NUMROC(M, MB_A, MYROW, RSRC_A, NPROW)
LOCq(N) = NUMROC(N, NB_A, MYCOL, CSRC_A, NPCOL)

These routines accept the following arguments. For PCGETRI, the following real arguments must be
complex:
n Integer. (global input)
The number of rows and columns to be operated on (the order of the distributed submatrix
sub(A)). n must be ≥ 0.
A Real pointer into the local memory to an array of dimension (LLD_A, LOCq(jA+n-1)). (local
input/local output)
On entry, the local pieces of the factors L and U obtained by the factorization sub(A)=P*L*U
computed by PSGETRF(3S).
On exit, if info = 0, sub(A) contains the inverse of the original distributed matrix sub(A).
iA Integer. (global input)
The global row index of A, which points to the beginning of the submatrix that will be operated on.
jA Integer. (global input)
The global column index of A, which points to the beginning of the submatrix that will be operated
on.
descA Integer array of dimension 9. (input)
The array descriptor for the distributed matrix A.
ipiv Integer array, dimension (LOCp(M_A)+MB_A). (local input)
This array keeps track of the pivoting information. ipiv(i) is the global row index that the local
row i was swapped with. This array is tied to the distributed matrix A.
work Real array, dimension (lwork). (local workspace)
On exit, if info = 0, work(1) returns the minimal and optimal lwork.
lwork Integer. (local input)
lwork=LOCp(n+MOD(iA-1,MB_A))*NB_A. The array work is used to keep a copy of at most an
entire column block of sub(A).
iwork Integer array, dimension (liwork). (local workspace)
On exit, if info = 0, iwork(1) returns the minimal and optimal liwork.

liwork Integer. (local input)
The dimension of array iwork used as workspace for physically transposing the pivots.
In the following, LCM is the least common multiple of process rows and columns (NPROW and NPCOL):
if NPROW == NPCOL then
   liwork = LOCq( M_A + MOD(IA-1, MB_A) ) + MB_A
else if PIVROC == ’C’ then
   liwork = LOCq( M_A + MOD(IA-1, MB_A) ) +
            MB_A*CEIL(CEIL(LOCp(M_A)/MB_A)/(LCM/NPROW))
end if

info Integer. (global output)
info = 0 Successful exit.
info < 0 If the ith argument is an array and the jth entry had an illegal value, info = -(i*100+j);
if the ith argument is a scalar and had an illegal value, info = -i.
info > 0 If info = K, U(iA+K-1,jA+K-1) is exactly 0. Because the matrix is exactly singular,
the solution could not be computed.

NOTES
BLACS_GRIDINIT(3S) must be called to initialize the virtual BLACS grid.

SEE ALSO
BLACS_GRIDINIT(3S), DESCINIT(3S), PCGETRF(3S), PSGETRF(3S)

PSGETRS ( 3S ) PSGETRS ( 3S )

NAME
PSGETRS, PCGETRS – Solves a real or complex distributed system of linear equations

SYNOPSIS
CALL PSGETRS (trans, n, nrhs, A, iA, jA, descA, ipiv, B, iB, jB, descB, info)
CALL PCGETRS (trans, n, nrhs, A, iA, jA, descA, ipiv, B, iB, jB, descB, info)

IMPLEMENTATION
UNICOS/mk systems

DESCRIPTION
PSGETRS and PCGETRS solve a system of real or complex distributed linear equations
op (sub(A)) * X = sub(B)
with a general n-by-n distributed matrix sub(A) by using the LU factorization computed by PSGETRF(3S).
sub(A) denotes
sub(A)=A(iA:iA+n-1,jA:jA+n-1)
and op(A)=A or A^T, and sub(B) denotes B(iB:iB+n-1,jB:jB+nrhs-1).
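
For the transposed case, the factors are used in the opposite order: with A = P*L*U, the system A^T x = b becomes U^T L^T P^T x = b. The Python sketch below works this through for a small dense matrix. It is an illustration under assumptions, not the distributed routine: solve_transposed is a hypothetical name, indices are 0-based, and ipiv[k] is taken to be the row swapped with row k during the factorization.

```python
def solve_transposed(packed, ipiv, b):
    """Solve A^T x = b given A = P*L*U in packed form with pivots ipiv."""
    n = len(packed)
    x = list(b)
    # U^T y = b: U^T is lower triangular, so use forward substitution
    for r in range(n):
        for c in range(r):
            x[r] -= packed[c][r] * x[c]
        x[r] /= packed[r][r]
    # L^T z = y: L^T is unit upper triangular, so use back substitution
    for r in reversed(range(n)):
        for c in range(r + 1, n):
            x[r] -= packed[c][r] * x[c]
    # x = P z: apply the recorded interchanges in reverse order
    for k in reversed(range(n)):
        x[k], x[ipiv[k]] = x[ipiv[k]], x[k]
    return x


if __name__ == "__main__":
    # Packed factors of A = [[4, 3], [6, 3]] after swapping rows 0 and 1
    packed = [[6.0, 3.0], [4.0 / 6.0, 1.0]]
    print(solve_transposed(packed, [1, 1], [16.0, 9.0]))
```

No second factorization is needed for the transposed system; only the order of the triangular solves and of the interchanges changes.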
These routines require square block decomposition (MB_A = NB_A, as defined in the following comments).
A description vector is associated with each two-dimensional (2D) block-cyclicly distributed matrix. This
vector stores the information required to establish the mapping between a matrix entry and its corresponding
process and memory location.
The following comments describe the elements of a block-cyclicly distributed matrix. In these comments,
the underline character (_) should be read as "of the distributed matrix". Let A be a generic term for any 2D
block cyclicly distributed matrix. Its description vector is descA and must be initialized through a call to
DESCINIT(3S).
M_A The number of rows in the distributed matrix.
N_A The number of columns in the distributed matrix.
MB_A The blocking factor used to distribute the rows of the matrix.
NB_A The blocking factor used to distribute the columns of the matrix.
RSRC_A The process row over which the first row of the matrix is distributed.
CSRC_A The process column over which the first column of the matrix is distributed.
CTXT_A The BLACS context handle, indicating the BLACS process grid A is distributed over. The
context itself is global, but the handle (the integer value) may vary.
LLD_A The leading dimension of the local array storing the local blocks of the distributed matrix A.
LLD_A ≥ MAX(1,LOCp(M_A)).

Let K be the number of rows or columns of a distributed matrix, and assume that its process grid has
dimension p-by-q. LOCp( K ) denotes the number of elements of K that a process would receive if K were
distributed over the p processes of its process column.
Similarly, LOCq( K ) denotes the number of elements of K that a process would receive if K were distributed
over the q processes of its process row.
The values of LOCp() and LOCq() may be determined via a call to the NUMROC(3S) ScaLAPACK tool
function, as in the following:
LOCp(M) = NUMROC(M, MB_A, MYROW, RSRC_A, NPROW)
LOCq(N) = NUMROC(N, NB_A, MYCOL, CSRC_A, NPCOL)

These routines accept the following arguments. For PCGETRS, the following real arguments must be
complex:
trans Character. (global input)
Specifies the form of the system of equations:
trans = ’N’: sub(A) * X = sub(B) (No transpose)
trans = ’T’: sub(A)^T * X = sub(B) (Transpose)
trans = ’C’: sub(A)^T * X = sub(B) (Transpose)
n Integer. (global input)
The number of rows and columns to be operated on (the order of the distributed submatrix
sub(A)). n must be ≥ 0.
nrhs Integer. (global input)
The number of right-hand sides (the number of columns of the distributed submatrix sub(B)). nrhs
must be ≥ 0.
A Real pointer into the local memory to an array of dimension (LLD_A, LOCq(jA+n-1)). (local
input/local output)
On entry, the local pieces of the factors L and U from the factorization sub(A)=P*L*U; the
unit diagonal elements of L are not stored.
iA Integer. (global input)
The global row index of A, which points to the beginning of the submatrix that will be operated on.
jA Integer. (global input)
The global column index of A which points to the beginning of the submatrix that will be operated
on.
descA Integer array of dimension 9. (input)
The array descriptor for the distributed matrix A.
ipiv Integer array, dimension (LOCp(M_A)+MB_A). (local input)
This array contains the pivoting information, ipiv(i), which is the global row that local row i was
swapped with. This array is tied to the distributed matrix A.

B Real pointer into the local memory to an array of dimension (LLD_B, LOCq(jB +nrhs– 1)). (local
input/local output)
On entry, the right-hand side distributed matrix sub(B).
On exit, sub(B) is overwritten by the solution distributed matrix X.
iB Integer. (global input)
The global row index of B, which points to the beginning of the submatrix that will be operated on.
jB Integer. (global input)
The global column index of B, which points to the beginning of the submatrix that will be operated
on.
descB Integer array of dimension 9. (input)
The array descriptor for the distributed matrix B.
info Integer. (global output)
info = 0 Successful exit.
info < 0 If the ith argument is an array and the jth entry had an illegal value, info = -(i*100+j);
if the ith argument is a scalar and had an illegal value, then info = -i.

NOTES
BLACS_GRIDINIT(3S) must be called to initialize the virtual BLACS grid.

SEE ALSO
BLACS_GRIDINIT(3S), DESCINIT(3S), NUMROC(3S), PSGETRF(3S)

PSPOSV ( 3S ) PSPOSV ( 3S )

NAME
PSPOSV, PCPOSV – Solves a real symmetric or complex Hermitian system of linear equations

SYNOPSIS
CALL PSPOSV (uplo, n, nrhs, A, iA, jA, descA, B, iB, jB, descB, info)
CALL PCPOSV (uplo, n, nrhs, A, iA, jA, descA, B, iB, jB, descB, info)

IMPLEMENTATION
UNICOS/mk systems

DESCRIPTION
PSPOSV computes the solution to a real symmetric positive definite system of linear equations, as in the
following:
sub(A) X = sub(B)
where sub(A) denotes the following:
sub(A)=A(iA:iA+n-1,jA:jA+n-1)
sub(A) is an n-by-n symmetric distributed positive definite matrix and X and sub(B), which denotes the
following, are n-by-nrhs distributed matrices:
B(iB:iB+n-1,jB:jB+nrhs-1)
In the case of PCPOSV, the matrix must be Hermitian positive definite.
The Cholesky decomposition is used to factor sub(A) in the following way:
sub(A)=U^T * U if uplo = ’U’
sub(A)=L * L^T if uplo = ’L’
U is an upper triangular matrix, and L is a lower triangular matrix. The factored form of sub(A) is then used
to solve the system of equations.
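
The lower-triangular case (uplo = ’L’) can be sketched serially in Python as follows. This is a minimal dense illustration for real symmetric input, not the distributed algorithm; the function names are hypothetical, and the sketch raises an exception when it meets a non-positive pivot rather than modeling the library's error reporting.

```python
import math


def cholesky_lower(a):
    """Return L with A = L * L^T for a symmetric positive definite matrix A."""
    n = len(a)
    l = [[0.0] * n for _ in range(n)]
    for j in range(n):
        d = a[j][j] - sum(l[j][k] ** 2 for k in range(j))
        if d <= 0.0:
            raise ValueError("matrix is not positive definite")
        l[j][j] = math.sqrt(d)
        for i in range(j + 1, n):
            s = sum(l[i][k] * l[j][k] for k in range(j))
            l[i][j] = (a[i][j] - s) / l[j][j]
    return l


def cholesky_solve(l, b):
    """Solve A*x = b using A = L * L^T."""
    n = len(l)
    x = list(b)
    for r in range(n):                 # forward substitution: L*y = b
        for c in range(r):
            x[r] -= l[r][c] * x[c]
        x[r] /= l[r][r]
    for r in reversed(range(n)):       # back substitution: L^T*x = y
        for c in range(r + 1, n):
            x[r] -= l[c][r] * x[c]
        x[r] /= l[r][r]
    return x


if __name__ == "__main__":
    a = [[4.0, 2.0], [2.0, 2.0]]
    l = cholesky_lower(a)
    print(l)
    print(cholesky_solve(l, [8.0, 6.0]))
```

Only one triangle of the symmetric matrix is read, which is why the routines take uplo to say which triangle holds the data.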
These routines require square block decomposition (MB_A = NB_A, as defined in the following comments).
A description vector is associated with each two-dimensional (2D) block-cyclicly distributed matrix. This
vector stores the information required to establish the mapping between a matrix entry and its corresponding
process and memory location.
The following comments describe the elements of a block-cyclicly distributed matrix. In these comments,
the underline character (_) should be read as "of the distributed matrix". Let A be a generic term for any 2D
block cyclicly distributed matrix. Its description vector is descA and must be initialized through a call to
DESCINIT(3S).

M_A The number of rows in the distributed matrix.
N_A The number of columns in the distributed matrix.
MB_A The blocking factor used to distribute the rows of the matrix.
NB_A The blocking factor used to distribute the columns of the matrix.
RSRC_A The process row over which the first row of the matrix is distributed.
CSRC_A The process column over which the first column of the matrix is distributed.
CTXT_A The BLACS context handle, indicating the BLACS process grid A is distributed over. The
context itself is global, but the handle (the integer value) may vary.
LLD_A The leading dimension of the local array storing the local blocks of the distributed matrix A.
LLD_A ≥ MAX(1,LOCp(M_A)).
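The mapping that the description vector encodes can be illustrated with a short sketch. The following Python functions are not part of the library. They assume the standard one-dimensional block-cyclic rule, applied once to rows (with MB_A, RSRC_A, and the number of process rows) and once to columns (with NB_A, CSRC_A, and the number of process columns). The names owner_of and local_index_of are hypothetical.

```python
def owner_of(iglob, nb, isrcproc, nprocs):
    """Process coordinate (0-based) that owns the 1-based global index iglob."""
    return (isrcproc + (iglob - 1) // nb) % nprocs


def local_index_of(iglob, nb, nprocs):
    """1-based local index of the global index iglob on its owning process."""
    return nb * ((iglob - 1) // (nb * nprocs)) + (iglob - 1) % nb + 1


# Example: 10 rows, blocking factor MB_A = 2, RSRC_A = 0, 3 process rows.
# Each line shows: global row, owning process row, local row.
for i in range(1, 11):
    print(i, owner_of(i, 2, 0, 3), local_index_of(i, 2, 3))
```

In this example, process row 0 holds global rows 1, 2, 7, and 8 as its local rows 1 through 4.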
Let K be the number of rows or columns of a distributed matrix, and assume that its process grid has
dimension p-by-q. LOCp( K ) denotes the number of elements of K that a process would receive if K were
distributed over the p processes of its process column.
Similarly, LOCq( K ) denotes the number of elements of K that a process would receive if K were distributed
over the q processes of its process row.
The values of LOCp() and LOCq() may be determined via a call to the NUMROC(3S) ScaLAPACK tool
function, as in the following:
LOCp(M) = NUMROC(M, MB_A, MYROW, RSRC_A, NPROW)
LOCq(N) = NUMROC(N, NB_A, MYCOL, CSRC_A, NPCOL)
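As an illustration of what NUMROC returns, the following Python sketch reproduces the usual block-cyclic counting rule. It is a stand-in written for this example, not the library routine; the argument order follows the calls shown above (extent, blocking factor, this process's coordinate, source process coordinate, number of processes).

```python
def numroc(n, nb, iproc, isrcproc, nprocs):
    """Number of rows or columns of a distributed matrix owned by process iproc."""
    mydist = (nprocs + iproc - isrcproc) % nprocs  # distance from the source process
    nblocks = n // nb                              # number of whole blocks
    count = (nblocks // nprocs) * nb               # whole blocks every process receives
    extrablks = nblocks % nprocs                   # leftover whole blocks
    if mydist < extrablks:
        count += nb                                # one extra whole block
    elif mydist == extrablks:
        count += n % nb                            # the trailing partial block, if any
    return count


# LOCp(M) for M = 11 rows, MB_A = 2, RSRC_A = 0, on 3 process rows.
print([numroc(11, 2, myrow, 0, 3) for myrow in range(3)])
```

The counts over all process rows add up to M, and the value for the calling process is the quantity that bounds LLD_A from below.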

These routines accept the following arguments. For PCPOSV, the following real arguments must be
complex:
uplo Character. (global input)
uplo = ’U’: Upper triangle of sub(A) is stored.
uplo = ’L’: Lower triangle of sub(A) is stored.
n Integer. (global input)
The number of columns to be operated on (the number of columns of the distributed submatrix
sub(A)). n must be ≥ 0.
nrhs Integer. (global input)
The number of right-hand sides (the number of columns of the distributed submatrix sub(B)). nrhs
must be ≥ 0.

A Real pointer into the local memory to an array of dimension (LLD_A, LOCq(jA+n-1)). (local
input/local output)
On entry, the local pieces of the n-by-n symmetric distributed matrix sub(A) to be factored.
If uplo = ’U’, the leading n-by-n upper triangular part of sub(A) contains the upper triangular part of
the matrix, and its strictly lower triangular part is not referenced.
If uplo = ’L’, the leading n-by-n lower triangular part of sub(A) contains the lower triangular part of
the distributed matrix, and its strictly upper triangular part is not referenced.
On exit, if info = 0, this array contains the local pieces of the factor U or L from the Cholesky
factorization sub(A) = U^T * U or L * L^T.
iA Integer. (global input)
The global row index of A, which points to the beginning of the submatrix that will be operated on.
jA Integer. (global input)
The global column index of A, which points to the beginning of the submatrix that will be operated
on.
descA Integer array of dimension 9. (input)
The array descriptor for the distributed matrix A.
B Real pointer into the local memory to an array of dimension (LLD_B, LOCq(jB+nrhs-1)). (local
input/local output)
On entry, the right-hand side distributed matrix sub(B).
On exit, if info = 0, sub(B) is overwritten by the solution distributed matrix X.
iB Integer. (global input)
The global row index of B, which points to the beginning of the submatrix that will be operated on.
jB Integer. (global input)
The global column index of B, which points to the beginning of the submatrix that will be operated
on.
descB Integer array of dimension 9. (input)
The array descriptor for the distributed matrix B.
info Integer. (global output)
info = 0 Successful exit.
info < 0 If the ith argument is an array and the jth entry had an illegal value, info = -(i*100+j);
if the ith argument is a scalar and had an illegal value, info = -i.
info > 0 If info = K, the leading minor of order K, A(iA:iA+K-1,jA:jA+K-1), is not positive
definite. The factorization could not be completed, and the solution could not be
computed.
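A caller can turn the info conventions above into a readable message. The following Python sketch is illustrative only; the function name describe_info is hypothetical, and the decoding assumes that argument positions and array entry numbers are both below 100, so that the two negative forms cannot collide.

```python
def describe_info(info):
    """Translate an info value from PSPOSV or PCPOSV into a short message."""
    if info == 0:
        return "successful exit"
    if info < 0:
        code = -info
        if code >= 100:
            # Array argument: info = -(i*100 + j)
            i, j = divmod(code, 100)
            return f"entry {j} of array argument {i} had an illegal value"
        # Scalar argument: info = -i
        return f"scalar argument {code} had an illegal value"
    return f"leading minor of order {info} is not positive definite"


print(describe_info(-702))
print(describe_info(-2))
print(describe_info(3))
```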

NOTES
BLACS_GRIDINIT(3S) must be called to initialize the virtual BLACS grid.

SEE ALSO
BLACS_GRIDINIT(3S), DESCINIT(3S), NUMROC(3S)

PSPOTRF ( 3S ) PSPOTRF ( 3S )

NAME
PSPOTRF, PCPOTRF – Computes the Cholesky factorization of a real symmetric or complex Hermitian
positive definite distributed matrix

SYNOPSIS
CALL PSPOTRF (uplo, n, A, iA, jA, descA, info)
CALL PCPOTRF (uplo, n, A, iA, jA, descA, info)

IMPLEMENTATION
UNICOS/mk systems

DESCRIPTION
PSPOTRF computes the Cholesky factorization of an n-by-n real symmetric positive definite distributed
matrix of the form:
sub(A)=A(iA:iA+n-1,jA:jA+n-1)
PCPOTRF computes the Cholesky factorization of a Hermitian positive definite distributed matrix.
The factorization has the following form; U is an upper triangular matrix, and L is a lower triangular matrix.
sub(A)=U’ * U if uplo=’U’
sub(A)=L * L’ if uplo=’L’
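For readers who want to see the factorization itself, the following serial Python sketch computes the lower factor of a small dense matrix. It is not the distributed algorithm used by PSPOTRF; it only shows the arithmetic for uplo = ’L’, that the strictly upper triangle is never read, and how a positive return code identifies the first leading minor that is not positive definite. The function name cholesky_lower is hypothetical.

```python
import math


def cholesky_lower(a):
    """Return (L, info) for a dense symmetric matrix given as a list of rows.

    Only the lower triangle of a is read. info = 0 on success; info = k means
    the leading minor of order k is not positive definite.
    """
    n = len(a)
    l = [[0.0] * n for _ in range(n)]
    for k in range(n):
        # Diagonal pivot: a[k][k] minus the squares already placed in row k.
        d = a[k][k] - sum(l[k][p] ** 2 for p in range(k))
        if d <= 0.0:
            return l, k + 1
        l[k][k] = math.sqrt(d)
        for i in range(k + 1, n):
            s = a[i][k] - sum(l[i][p] * l[k][p] for p in range(k))
            l[i][k] = s / l[k][k]
    return l, 0


factor, info = cholesky_lower([[4.0, 2.0], [2.0, 5.0]])
print(factor, info)
```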
These routines require square block decomposition (MB_A = NB_A, as defined in the following comments).
A description vector is associated with each two-dimensional (2D) block-cyclicly distributed matrix. This
vector stores the information required to establish the mapping between a matrix entry and its corresponding
process and memory location.
The following comments describe the elements of a block-cyclicly distributed matrix. In these comments,
the underline character (_) should be read as "of the distributed matrix". Let A be a generic term for any 2D
block cyclicly distributed matrix. Its description vector is descA and must be initialized through a call to
DESCINIT(3S).
M_A The number of rows in the distributed matrix.
N_A The number of columns in the distributed matrix.
MB_A The blocking factor used to distribute the rows of the matrix.
NB_A The blocking factor used to distribute the columns of the matrix.
RSRC_A The process row over which the first row of the matrix is distributed.
CSRC_A The process column over which the first column of the matrix is distributed.
CTXT_A The BLACS context handle, indicating the BLACS process grid A is distributed over. The
context itself is global, but the handle (the integer value) may vary.

LLD_A The leading dimension of the local array storing the local blocks of the distributed matrix A.
LLD_A ≥ MAX(1,LOCp(M_A)).
Let K be the number of rows or columns of a distributed matrix, and assume that its process grid has
dimension p-by-q. LOCp( K ) denotes the number of elements of K that a process would receive if K were
distributed over the p processes of its process column.
Similarly, LOCq( K ) denotes the number of elements of K that a process would receive if K were distributed
over the q processes of its process row.
The values of LOCp() and LOCq() may be determined via a call to the NUMROC(3S) ScaLAPACK tool
function, as in the following:
LOCp(M) = NUMROC(M, MB_A, MYROW, RSRC_A, NPROW)
LOCq(N) = NUMROC(N, NB_A, MYCOL, CSRC_A, NPCOL)

These routines accept the following arguments. For PCPOTRF, the following real arguments must be
complex:
uplo Character. (global input)
uplo = ’U’: Upper triangle of sub(A) is stored.
uplo = ’L’: Lower triangle of sub(A) is stored.
n Integer. (global input)
The number of columns to be operated on (the number of columns of the distributed submatrix
sub(A)). n must be ≥ 0.
A Real pointer into the local memory to an array of dimension (LLD_A, LOCq(jA+n-1)). (local
input/local output)
On entry, the local pieces of the n-by-n symmetric distributed matrix sub(A) to be factored.
If uplo = ’U’, the leading n-by-n upper triangular part of the matrix sub(A) contains the upper
triangular matrix, and the strictly lower triangular part of sub(A) is not referenced.
If uplo = ’L’, the leading n-by-n lower triangular part of the matrix sub(A) contains the lower
triangular matrix, and the strictly upper triangular part of sub(A) is not referenced.
On exit, if uplo = ’U’, the upper triangular part of the distributed matrix contains the Cholesky
factor U; if uplo = ’L’, the lower triangular part of the distributed matrix contains the Cholesky
factor L.
iA Integer. (global input)
The global row index of A, which points to the beginning of the submatrix that will be operated on.
jA Integer. (global input)
The global column index of A, which points to the beginning of the submatrix that will be operated
on.
descA Integer array of dimension 9. (input)
The array descriptor for the distributed matrix A.

info Integer. (global output)
info = 0 Successful exit.
info < 0 If the ith argument is an array and the jth entry had an illegal value, info = -(i*100+j);
if the ith argument is a scalar and had an illegal value, info = -i.
info > 0 If info = K, the leading minor of order K, A(iA:iA+K-1,jA:jA+K-1), is not positive
definite, and the factorization could not be completed.

NOTES
BLACS_GRIDINIT(3S) must be called to initialize the virtual BLACS grid.

SEE ALSO
BLACS_GRIDINIT(3S), DESCINIT(3S), NUMROC(3S)

PSPOTRI ( 3S ) PSPOTRI ( 3S )

NAME
PSPOTRI, PCPOTRI – Computes the inverse of a real symmetric or complex Hermitian positive definite
distributed matrix

SYNOPSIS
CALL PSPOTRI (uplo, n, A, iA, jA, descA, info)
CALL PCPOTRI (uplo, n, A, iA, jA, descA, info)

IMPLEMENTATION
UNICOS/mk systems

DESCRIPTION
PSPOTRI computes the inverse of a real symmetric positive definite distributed matrix of the form:
sub(A)=A(iA:iA+n-1,jA:jA+n-1)
by using the Cholesky factorization sub(A) = U^T * U or L * L^T computed by PSPOTRF(3S).
PCPOTRI computes the inverse of a complex Hermitian positive definite matrix using the output from
PCPOTRF(3S).
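The relationship between the Cholesky factor and the inverse can be shown on a small dense matrix. The following serial Python sketch is an illustration only and does not reflect how the distributed routine organizes its work; it assumes the lower factor L is available as a list of rows and uses the identity inv(A) = inv(L)^T * inv(L). The function name inverse_from_lower_factor is hypothetical.

```python
def inverse_from_lower_factor(l):
    """Return inv(L * L^T) as a list of rows, given lower-triangular L."""
    n = len(l)
    # Invert L by forward substitution, one column of the identity at a time.
    linv = [[0.0] * n for _ in range(n)]
    for j in range(n):
        for i in range(j, n):
            rhs = 1.0 if i == j else 0.0
            s = rhs - sum(l[i][k] * linv[k][j] for k in range(j, i))
            linv[i][j] = s / l[i][i]
    # inv(A) = inv(L)^T * inv(L); only rows k >= max(i, j) contribute.
    return [
        [sum(linv[k][i] * linv[k][j] for k in range(max(i, j), n)) for j in range(n)]
        for i in range(n)
    ]


# L is the lower factor of A = [[4, 2], [2, 5]].
print(inverse_from_lower_factor([[2.0, 0.0], [1.0, 2.0]]))
```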
These routines require square block decomposition (MB_A = NB_A, as defined in the following comments).
A description vector is associated with each two-dimensional (2D) block-cyclicly distributed matrix. This
vector stores the information required to establish the mapping between a matrix entry and its corresponding
process and memory location.
The following comments describe the elements of a block-cyclicly distributed matrix. In these comments,
the underline character (_) should be read as "of the distributed matrix". Let A be a generic term for any 2D
block cyclicly distributed matrix. Its description vector is descA and must be initialized through a call to
DESCINIT(3S).
M_A The number of rows in the distributed matrix.
N_A The number of columns in the distributed matrix.
MB_A The blocking factor used to distribute the rows of the matrix.
NB_A The blocking factor used to distribute the columns of the matrix.
RSRC_A The process row over which the first row of the matrix is distributed.
CSRC_A The process column over which the first column of the matrix is distributed.
CTXT_A The BLACS context handle, indicating the BLACS process grid A is distributed over. The
context itself is global, but the handle (the integer value) may vary.
LLD_A The leading dimension of the local array storing the local blocks of the distributed matrix A.
LLD_A ≥ MAX(1,LOCp(M_A)).

Let K be the number of rows or columns of a distributed matrix, and assume that its process grid has
dimension p-by-q. LOCp( K ) denotes the number of elements of K that a process would receive if K were
distributed over the p processes of its process column.
Similarly, LOCq( K ) denotes the number of elements of K that a process would receive if K were distributed
over the q processes of its process row.
The values of LOCp() and LOCq() may be determined via a call to the NUMROC(3S) ScaLAPACK tool
function, as in the following:
LOCp(M) = NUMROC(M, MB_A, MYROW, RSRC_A, NPROW)
LOCq(N) = NUMROC(N, NB_A, MYCOL, CSRC_A, NPCOL)

These routines accept the following arguments. For PCPOTRI, the following real arguments must be
complex:
uplo Character. (global input)
uplo = ’U’: Upper triangle of sub(A) is stored.
uplo = ’L’: Lower triangle of sub(A) is stored.
n Integer. (global input)
The number of columns to be operated on (the number of columns of the distributed submatrix
sub(A)). n must be ≥ 0.
A Real pointer into the local memory to an array of dimension (LLD_A, LOCq(jA+n-1)). (local
input/local output)
On entry, the local pieces of the triangular factor U or L from the Cholesky factorization of the
distributed matrix sub(A) = U^T * U or L * L^T, as computed by PSPOTRF(3S).
On exit, the local pieces of the upper or lower triangle of the (symmetric) inverse of sub(A),
overwriting the input factor U or L.
iA Integer. (global input)
The global row index of A, which points to the beginning of the submatrix that will be operated on.
jA Integer. (global input)
The global column index of A, which points to the beginning of the submatrix that will be operated
on.
descA Integer array of dimension 9. (input)
The array descriptor for the distributed matrix A.
info Integer. (global output)
info = 0 Successful exit.
info < 0 If the ith argument is an array and the jth entry had an illegal value, info = -(i*100+j);
if the ith argument is a scalar and had an illegal value, info = -i.

info > 0 If info = i, the (i,i) element of the factor U or L is 0, and the inverse could not be
computed.

NOTES
BLACS_GRIDINIT(3S) must be called to initialize the virtual BLACS grid.

SEE ALSO
BLACS_GRIDINIT(3S), DESCINIT(3S), NUMROC(3S), PSPOTRF(3S)

PSPOTRS ( 3S ) PSPOTRS ( 3S )

NAME
PSPOTRS, PCPOTRS – Solves a real symmetric positive definite or complex Hermitian positive definite
system of linear equations

SYNOPSIS
CALL PSPOTRS (uplo, n, nrhs, A, iA, jA, descA, B, iB, jB, descB, info)
CALL PCPOTRS (uplo, n, nrhs, A, iA, jA, descA, B, iB, jB, descB, info)

IMPLEMENTATION
UNICOS/mk systems

DESCRIPTION
PSPOTRS solves a real symmetric positive definite system of linear equations of the form:
sub(A) * X = sub(B)
where sub(A) denotes the following:
sub(A)=A(iA:iA+n-1,jA:jA+n-1)
sub(A) is an n-by-n symmetric positive definite distributed matrix. The solution uses the following Cholesky
factorization, as computed by PSPOTRF(3S):
sub(A) = U^T * U
or
sub(A) = L * L^T
sub(B) denotes the following distributed matrix B:
sub(B)=B(iB:iB+n-1,jB:jB+nrhs-1)
PCPOTRS requires a Hermitian positive definite matrix.
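Once the factor is known, each right-hand side is solved with two triangular substitutions. The following serial Python sketch shows this for the lower factor and a single right-hand side; it illustrates the arithmetic only and is not the distributed implementation. The function name solve_with_lower_factor is hypothetical.

```python
def solve_with_lower_factor(l, b):
    """Solve (L * L^T) x = b for one right-hand side, given lower-triangular L."""
    n = len(l)
    # Forward substitution: L y = b.
    y = [0.0] * n
    for i in range(n):
        y[i] = (b[i] - sum(l[i][k] * y[k] for k in range(i))) / l[i][i]
    # Back substitution: L^T x = y.
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(l[k][i] * x[k] for k in range(i + 1, n))) / l[i][i]
    return x


# L is the lower factor of A = [[4, 2], [2, 5]]; b = A * [1, 2].
print(solve_with_lower_factor([[2.0, 0.0], [1.0, 2.0]], [8.0, 12.0]))
```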
These routines require square block decomposition (MB_A = NB_A, as defined in the following comments).
A description vector is associated with each two-dimensional (2D) block-cyclicly distributed matrix. This
vector stores the information required to establish the mapping between a matrix entry and its corresponding
process and memory location.
The following comments describe the elements of a block-cyclicly distributed matrix. In these comments,
the underline character (_) should be read as "of the distributed matrix". Let A be a generic term for any 2D
block cyclicly distributed matrix. Its description vector is descA and must be initialized through a call to
DESCINIT(3S).
M_A The number of rows in the distributed matrix.
N_A The number of columns in the distributed matrix.
MB_A The blocking factor used to distribute the rows of the matrix.

NB_A The blocking factor used to distribute the columns of the matrix.
RSRC_A The process row over which the first row of the matrix is distributed.
CSRC_A The process column over which the first column of the matrix is distributed.
CTXT_A The BLACS context handle, indicating the BLACS process grid A is distributed over. The
context itself is global, but the handle (the integer value) may vary.
LLD_A The leading dimension of the local array storing the local blocks of the distributed matrix A.
LLD_A ≥ MAX(1,LOCp(M_A)).
Let K be the number of rows or columns of a distributed matrix, and assume that its process grid has
dimension p-by-q. LOCp( K ) denotes the number of elements of K that a process would receive if K were
distributed over the p processes of its process column.
Similarly, LOCq( K ) denotes the number of elements of K that a process would receive if K were distributed
over the q processes of its process row.
The values of LOCp() and LOCq() may be determined via a call to the NUMROC(3S) ScaLAPACK tool
function, as in the following:
LOCp(M) = NUMROC(M, MB_A, MYROW, RSRC_A, NPROW)
LOCq(N) = NUMROC(N, NB_A, MYCOL, CSRC_A, NPCOL)

These routines accept the following arguments. For PCPOTRS, the following real arguments must be
complex:
uplo Character. (global input)
uplo = ’U’: Upper triangle of sub(A) is stored.
uplo = ’L’: Lower triangle of sub(A) is stored.
n Integer. (global input)
The number of columns to be operated on (the number of columns of the distributed submatrix
sub(A)). n must be ≥ 0.
nrhs Integer. (global input)
The number of right-hand sides (the number of columns of the distributed submatrix sub(B)). nrhs
must be ≥ 0.
A Real pointer into the local memory to an array of dimension (LLD_A, LOCq(jA+n-1)). (local
input/local output)
On entry, this array contains the factors L or U from the Cholesky factorization sub(A) = L * L^T or
U^T * U, as computed by PSPOTRF(3S).
iA Integer. (global input)
The global row index of A, which points to the beginning of the submatrix that will be operated on.

jA Integer. (global input)
The global column index of A, which points to the beginning of the submatrix that will be operated
on.
descA Integer array of dimension 9. (input)
The array descriptor for the distributed matrix A.
B Real pointer into the local memory to an array of dimension (LLD_B, LOCq(jB+nrhs-1)). (local
input/local output)
On entry, the right-hand side distributed matrix sub(B).
On exit, this array contains the local pieces of the solution distributed matrix X.
iB Integer. (global input)
The global row index of B, which points to the beginning of the submatrix that will be operated on.
jB Integer. (global input)
The global column index of B, which points to the beginning of the submatrix that will be operated
on.
descB Integer array of dimension 9. (input)
The array descriptor for the distributed matrix B.
info Integer. (global output)
info = 0 Successful exit.
info < 0 If the ith argument is an array and the jth entry had an illegal value, info = -(i*100+j);
if the ith argument is a scalar and had an illegal value, info = -i.

NOTES
BLACS_GRIDINIT(3S) must be called to initialize the virtual BLACS grid.

SEE ALSO
BLACS_GRIDINIT(3S), DESCINIT(3S), NUMROC(3S), PSPOTRF(3S)

PSSYEVX ( 3S ) PSSYEVX ( 3S )

NAME
PSSYEVX – Computes selected eigenvalues and eigenvectors of a real symmetric matrix

SYNOPSIS
CALL PSSYEVX (jobZ, range, uplo, n, A, iA, jA, descA, vl, vu, il, iu, abstol, m, nZ, w,
orfac, Z, iZ, jZ, descZ, work, lwork, iwork, liwork, ifail, iclustr, gap, info)

IMPLEMENTATION
UNICOS/mk systems

DESCRIPTION
PSSYEVX computes selected eigenvalues and, optionally, eigenvectors of a real symmetric matrix A by
calling the recommended sequence of ScaLAPACK routines. Eigenvalues/vectors can be selected by
specifying a range of values or a range of indices for the desired eigenvalues.
These routines require square block decomposition (MB_A = NB_A, as defined in the following comments).
A description vector is associated with each two-dimensional (2D) block-cyclicly distributed matrix. This
vector stores the information required to establish the mapping between a matrix entry and its corresponding
process and memory location.
The following comments describe the elements of a block-cyclicly distributed matrix. In these comments,
the underline character (_) should be read as "of the distributed matrix". Let A be a generic term for any 2D
block cyclicly distributed matrix. Its description vector is descA and must be initialized through a call to
DESCINIT(3S).
M_A The number of rows in the distributed matrix.
N_A The number of columns in the distributed matrix.
MB_A The blocking factor used to distribute the rows of the matrix.
NB_A The blocking factor used to distribute the columns of the matrix.
RSRC_A The process row over which the first row of the matrix is distributed.
CSRC_A The process column over which the first column of the matrix is distributed.
CTXT_A The BLACS context handle, indicating the BLACS process grid A is distributed over. The
context itself is global, but the handle (the integer value) may vary.
LLD_A The leading dimension of the local array storing the local blocks of the distributed matrix A.
LLD_A ≥ MAX(1,LOCp(M_A)).
Let K be the number of rows or columns of a distributed matrix, and assume that its process grid has
dimension p-by-q. LOCp( K ) denotes the number of elements of K that a process would receive if K were
distributed over the p processes of its process column.

Similarly, LOCq( K ) denotes the number of elements of K that a process would receive if K were distributed
over the q processes of its process row.
The values of LOCp() and LOCq() may be determined via a call to the NUMROC(3S) ScaLAPACK tool
function, as in the following:
LOCp(M) = NUMROC(M, MB_A, MYROW, RSRC_A, NPROW)
LOCq(N) = NUMROC(N, NB_A, MYCOL, CSRC_A, NPCOL)

In describing the following arguments, NP, the number of rows local to a given processor, and NQ, the
number of columns local to a given processor, are used.
These routines accept the following arguments:
jobZ Character*1. (global input)
Specifies whether to compute the eigenvectors:
jobZ =’N’: Compute only eigenvalues.
jobZ =’V’: Compute eigenvalues and eigenvectors.
range Character*1. (global input)
range =’A’: All eigenvalues will be found.
range =’V’: All eigenvalues in the half-open interval (vl,vu] will be found.
range =’I’: The ilth through iuth eigenvalues will be found.
uplo Character. (global input)
Specifies whether the upper or lower triangular part of the symmetric matrix A is stored:
uplo =’U’: Upper triangle of sub(A) is stored.
uplo =’L’: Lower triangle of sub(A) is stored.
n Integer. (global input)
The number of columns to be operated on (the number of columns of the distributed submatrix
sub(A)). n must be ≥ 0.
A Block cyclic real array. (local input/workspace)
Global dimension (n,n), local dimension (descA(9), NQ)
On entry, the symmetric matrix A.
If uplo=’U’, only the upper triangular part of A is used to define the elements of the symmetric
matrix.
If uplo=’L’, only the lower triangular part of A is used to define the elements of the symmetric
matrix.
On exit, the lower triangle (if uplo=’L’) or the upper triangle (if uplo=’U’) of A, including the
diagonal, is destroyed.

iA Integer. (global input)
The global row index of A, which points to the beginning of the submatrix that will be operated
on.
jA Integer. (global input)
The global column index of A, which points to the beginning of the submatrix that will be operated
on.
descA Integer array of dimension 9. (input)
The array descriptor for the distributed matrix A.
vl Real. (global input)
If range=’V’, the lower bound of the interval to be searched for eigenvalues. If range =’A’ or ’I’,
it is not referenced.
vu Real. (global input)
If range =’V’, the upper bound of the interval to be searched for eigenvalues. If range =’A’ or ’I’,
it is not referenced.
il Integer. (global input)
If range =’I’, the index (from smallest to largest) of the smallest eigenvalue to be returned. il ≥ 1.
If range=’A’ or ’V’, it is not referenced.
iu Integer. (global input)
If range =’I’, the index (from smallest to largest) of the largest eigenvalue to be returned.
min(il,n) ≤ iu ≤ n. If range =’A’ or ’V’, it is not referenced.
abstol Real. (global input)
If jobZ=’V’, setting abstol to PSLAMCH(CONTEXT,’U’) yields the most orthogonal
eigenvectors.
This is the absolute error tolerance for the eigenvalues. An approximate eigenvalue is accepted as
converged when it is determined to lie in an interval [a,b] of width less than or equal to the
following:
abstol + eps * MAX(|a|,|b|)
eps is the machine precision. If abstol is ≤ 0, eps * norm(A) will be used in its place, where
norm(A) is the 1-norm of A. For most problems this is the appropriate level of accuracy to
request. For certain strongly graded matrices, greater accuracy can be obtained in very small
eigenvalues by setting abstol to some very small positive number. However, if abstol is less than
SQRT(unfl), where unfl is the underflow threshold, SQRT(unfl) will be used in its place.
See "Computing Small Singular Values of Bidiagonal Matrices with Guaranteed High Relative
Accuracy," by Demmel and Kahan, LAPACK Working Note #3, and "On the correctness of Parallel
Bisection in Floating Point" by Demmel, Dhillon, and Ren, LAPACK Working Note #70.
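The acceptance test for an approximate eigenvalue can be written out directly. The following Python sketch is illustrative only: the function name interval_converged is hypothetical, the default eps is the double-precision machine epsilon of the Python host rather than the precision of the target system, and the lower limit of SQRT(unfl) on abstol is omitted for brevity.

```python
import sys


def interval_converged(a, b, abstol, norm_a, eps=sys.float_info.epsilon):
    """True if the bracketing interval [a, b] is narrow enough to accept."""
    if abstol <= 0.0:
        abstol = eps * norm_a  # substitution described for abstol <= 0
    return (b - a) <= abstol + eps * max(abs(a), abs(b))


print(interval_converged(1.0, 1.001, 1.0e-2, norm_a=1.0))
print(interval_converged(1.0, 2.0, 1.0e-2, norm_a=1.0))
```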
m Integer. (global output)
Total number of eigenvalues found. 0 ≤ m ≤ n.

nZ Integer. (global output)
Total number of eigenvectors computed. 0 ≤ nZ ≤ m. The number of columns of Z that are filled.
If jobZ is not equal to ’V’, nZ is not referenced. If jobZ is equal to ’V’, nZ = m unless the user
supplies insufficient space and PSSYEVX is not able to detect this before beginning computation.
To get all of the eigenvectors requested, the user must supply both sufficient space to hold the
eigenvectors in Z (m ≤ descZ(2)) and sufficient workspace to compute them. (See lwork below.)
PSSYEVX can always detect insufficient space without computation, unless range=’V’.
W Real array, dimension (n). (global output)
On normal exit, the first m entries contain the selected eigenvalues in ascending order.
orfac Real. (global input)
Specifies which eigenvectors should be reorthogonalized. Eigenvectors that correspond to
eigenvalues that are within tol = orfac*norm(A) of each other are reorthogonalized. However,
if the workspace is insufficient (see lwork), tol may be decreased until all eigenvectors to be
reorthogonalized can be stored in one process. No reorthogonalization will be done if orfac equals
zero. A default value of 10^-3 is used if orfac is negative. orfac should be identical on all
processes.
Z Real array. (local output)
Global dimension (n, n), local dimension (descZ(9), NQ). If jobZ = ’V’, on normal exit the first m
columns of Z contain the orthonormal eigenvectors of the matrix that corresponds to the selected
eigenvalues. If an eigenvector fails to converge, then that column of Z contains the latest
approximation to the eigenvector, and the index of the eigenvector is returned in ifail. If jobZ =
’N’, Z is not referenced.
iZ Integer. (global input)
The global row index of the submatrix of the distributed matrix Z to operate on.
jZ Integer. (global input)
The global column index of the submatrix of the distributed matrix Z to operate on.
descZ Integer array of dimension 9. (input)
The array descriptor for the distributed matrix Z.
work Real array, dimension (lwork). (local workspace/output)
On output, work(1) returns the workspace needed to guarantee completion, but not orthogonality of
the eigenvectors. If the input parameters are incorrect, work(1) may also be incorrect.
This will be modified in future releases so that, if enough workspace is given to complete the request,
work(1) will return the amount of workspace needed to guarantee orthogonality.
lwork Integer. (local input) The following variable definitions are used to define lwork:

NN = MAX( N, NB, 2 )
NEIG = number of eigenvectors requested
NB = descA( 3 ) = descA( 4 ) = descZ( 3 ) = descZ( 4 )
descA( 5 ) = descA( 6 ) = descZ( 5 ) = descZ( 6 ) = 0
IA = JA = IZ = JZ = 1
NP = NUMROC( N, NB, MYROW, 0, NPROW )
NP0 = NUMROC( NN, NB, 0, 0, NPROW )
NQ0 = MAX( NUMROC( NEIG, NB, 0, 0, NPCOL ), NB )
ICEIL( X, Y ) is a ScaLAPACK function returning ceiling(X/Y)

If no eigenvectors are requested (jobZ = ’N’), lwork ≥ 5*N + MAX( 5*NN, NB*(NP+1) ).
If eigenvectors are requested (jobZ = ’V’), the amount of workspace required to guarantee that all
eigenvectors are computed is the following:
lwork ≥ 5*N + MAX(5*NN, NP0*NQ0) + ICEIL(NEIG, NPROW*NPCOL)*NN + 2*NB*NB
The computed eigenvectors may not be orthogonal if the minimal workspace is supplied and orfac
is too small. If you want to guarantee orthogonality (at the cost of potentially poor performance)
you should add the following to lwork:
(CLUSTERSIZE-1)*N
CLUSTERSIZE is the number of eigenvalues in the largest cluster, where a cluster is defined as a
set of close eigenvalues:
{W(K),...,W(K+CLUSTERSIZE-1)|W(J+1)≤ W(J)+orfac*norm(A)}
If lwork is too small to guarantee orthogonality, PSSYEVX attempts to maintain orthogonality in
the clusters with the smallest spacing between the eigenvalues. If lwork is too small to compute all
of the eigenvectors requested, no computation is performed and info = -23 is returned. Note that
when range = ’V’, PSSYEVX does not know how many eigenvectors are requested until the
eigenvalues are computed. Therefore, when range = ’V’ and as long as lwork is large enough to
allow PSSYEVX to compute the eigenvalues, PSSYEVX will compute the eigenvalues and as many
eigenvectors as it can.
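The workspace bounds above can be evaluated before the call. The following Python sketch transcribes the two formulas as they are stated in this entry; it is an illustration, not a library interface. The helper numroc is a stand-in for NUMROC(3S) using the usual block-cyclic counting rule, iceil stands in for ICEIL, and the function name pssyevx_min_lwork is hypothetical.

```python
def numroc(n, nb, iproc, isrcproc, nprocs):
    """Stand-in for NUMROC: local count of rows or columns on process iproc."""
    mydist = (nprocs + iproc - isrcproc) % nprocs
    nblocks = n // nb
    count = (nblocks // nprocs) * nb
    extrablks = nblocks % nprocs
    if mydist < extrablks:
        count += nb
    elif mydist == extrablks:
        count += n % nb
    return count


def iceil(x, y):
    """Stand-in for ICEIL: ceiling of x / y for positive integers."""
    return -(-x // y)


def pssyevx_min_lwork(n, nb, nprow, npcol, myrow, neig, want_vectors):
    """Minimum lwork according to the formulas in this entry."""
    nn = max(n, nb, 2)
    if not want_vectors:                       # jobZ = 'N'
        np_local = numroc(n, nb, myrow, 0, nprow)
        return 5 * n + max(5 * nn, nb * (np_local + 1))
    np0 = numroc(nn, nb, 0, 0, nprow)          # jobZ = 'V'
    nq0 = max(numroc(neig, nb, 0, 0, npcol), nb)
    return (5 * n + max(5 * nn, np0 * nq0)
            + iceil(neig, nprow * npcol) * nn + 2 * nb * nb)


# Example: N = 100, NB = 10, 2-by-2 grid, all 100 eigenvectors requested.
print(pssyevx_min_lwork(100, 10, 2, 2, 0, 100, want_vectors=True))
print(pssyevx_min_lwork(100, 10, 2, 2, 0, 0, want_vectors=False))
```

To request guaranteed orthogonality, (CLUSTERSIZE-1)*N would be added to the first result.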
Relationship between workspace, orthogonality, and performance:
If CLUSTERSIZE ≥ N/SQRT(NPROW*NPCOL), providing enough space to compute all the
eigenvectors orthogonally will cause serious degradation in performance. In the limit (i.e.
CLUSTERSIZE = N-1), PSSTEIN will perform no better than DSTEIN on one processor. For
CLUSTERSIZE = N/SQRT(NPROW*NPCOL) reorthogonalizing all eigenvectors will increase the
total execution time by a factor of 2 or more.
For CLUSTERSIZE > N/SQRT(NPROW*NPCOL), execution time will grow as the square of the
cluster size, all other factors remaining equal and assuming enough workspace. Less workspace
means less reorthogonalization but faster execution.
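If estimates of the eigenvalues are available, the size of the largest cluster, and therefore the extra workspace needed for guaranteed orthogonality, can be estimated in advance. The following Python sketch applies the cluster definition given above to a sorted list of eigenvalues. The function name largest_cluster_size is hypothetical, and tol stands for orfac*norm(A).

```python
def largest_cluster_size(w, tol):
    """Size of the largest run in ascending w with W(J+1) <= W(J) + tol."""
    best = run = 1 if w else 0
    for prev, cur in zip(w, w[1:]):
        run = run + 1 if cur <= prev + tol else 1
        best = max(best, run)
    return best


w = [1.0, 1.0005, 1.0009, 2.0, 3.0, 3.0001]
clustersize = largest_cluster_size(w, tol=1.0e-3)
n = len(w)
print(clustersize, (clustersize - 1) * n)  # cluster size, extra lwork
```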

iwork Integer array. (local workspace)
On return, iwork(1) contains the amount of integer workspace required. If the input parameters are
incorrect, iwork(1) may also be incorrect.
liwork Integer. (local input)
Size of iwork. liwork ≥ MAX( ISIZESTEIN, ISIZESTEBZ ) + 2*N
where:
ISIZESTEIN = 3*N + NPROCS + 1
ISIZESTEBZ = MAX( 4*N, 14 )
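The integer workspace bound is simple enough to compute inline. The following Python sketch transcribes it. The function name pssyevx_min_liwork is hypothetical, and the sketch assumes that NPROCS is the total number of processes in the grid (NPROW*NPCOL), which this entry does not state explicitly.

```python
def pssyevx_min_liwork(n, nprow, npcol):
    """Minimum liwork according to the formula in this entry."""
    nprocs = nprow * npcol          # assumption: NPROCS = NPROW * NPCOL
    isizestein = 3 * n + nprocs + 1
    isizestebz = max(4 * n, 14)
    return max(isizestein, isizestebz) + 2 * n


print(pssyevx_min_liwork(100, 2, 2))
```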

ifail Integer array, dimension (N). (global output)
If jobZ = ’V’, then on normal exit, the first M elements of ifail are zero. If info > 0 on exit, ifail
contains the indices of the eigenvectors that failed to converge. If jobZ = ’N’, ifail is not
referenced.
iclustr Integer array, dimension (2*NPROW*NPCOL). (global output)
This array contains indices of eigenvectors that correspond to a cluster of eigenvalues that could
not be reorthogonalized due to insufficient workspace (see lwork, orfac, and info). Eigenvectors
that correspond to clusters of eigenvalues indexed iclustr(2*I-1) to iclustr(2*I) could not be
reorthogonalized due to lack of workspace. Hence, the eigenvectors that correspond to these
clusters may not be orthogonal. iclustr() is a 0-terminated array. (iclustr(2*K).NE.0 .AND.
iclustr(2*K+1).EQ.0) if and only if K is the number of clusters. iclustr is not referenced if
jobZ =’N’.
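Reading the 0-terminated pair list is easy to get wrong by one. The following Python sketch walks a copy of iclustr held in a 0-based Python list, so that the Fortran elements iclustr(2*I-1) and iclustr(2*I) appear at positions 2*(I-1) and 2*(I-1)+1. The function name cluster_ranges is hypothetical.

```python
def cluster_ranges(iclustr):
    """Return (first, last) eigenvector index pairs for each reported cluster."""
    ranges = []
    for k in range(0, len(iclustr) - 1, 2):
        first, last = iclustr[k], iclustr[k + 1]
        if first == 0:  # a zero in the first slot of a pair ends the list
            break
        ranges.append((first, last))
    return ranges


# A 2-by-2 grid gives an array of length 2*NPROW*NPCOL = 8.
print(cluster_ranges([3, 5, 9, 10, 0, 0, 0, 0]))
```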
gap Real array, dimension (NPROW*NPCOL). (global output)
This array contains the gap between eigenvalues whose eigenvectors could not be reorthogonalized.
The output values in this array correspond to the clusters indicated by the iclustr array. Therefore,
the dot product between eigenvectors that correspond to the Ith cluster may be as high as
(C*n)/GAP(I) where C is a small constant.
info Integer. (global output)
info = 0 Successful exit.
info < 0 If the ith argument is an array and the jth entry had an illegal value,
info = -(i*100+j); if the ith argument is a scalar and had an illegal value, info = -i.
info > 0 If (MOD(info,2).NE.0), one or more eigenvectors failed to converge. Their indices
are stored in ifail.
If (MOD(info/2,2).NE.0), eigenvectors corresponding to one or more clusters of
eigenvalues could not be reorthogonalized because of insufficient workspace. The
indices of the clusters are stored in the iclustr array.
If (MOD(info/4,2).NE.0), space limitations prevented PSSYEVX from computing
all of the eigenvectors between vl and vu. The number of eigenvectors computed is
returned in nZ.
If (MOD(info/8,2).NE.0), PSSTEBZ failed to compute eigenvalues.
Differences between PSSYEVX and SSYEVX
• A, LDA -> A, IA, JA, DESCA
• Z, LDZ -> Z, IZ, JZ, DESCZ
• Workspace needs are larger for PSSYEVX
• liwork argument added
• orfac, iclustr, and gap arguments added
• Meaning of info is changed.
PSSYEVX does not promise orthogonality for eigenvectors that are associated with tightly clustered
eigenvalues.
PSSYEVX does not reorthogonalize eigenvectors that are on different processors. The extent of
reorthogonalization is controlled by the input argument lwork.
PE 1.2.2 limitations:
IA = JA = 1
IZ = JZ = 1

The following restrictions apply to the parameters passed to DESCINIT:
• RSRC_A = CSRC_A = 0: PE 0 should own the first entry of global A.
• RSRC_Z = CSRC_Z = 0: PE 0 should own the first entry of global Z.
• M_A = M_Z: The global number of rows in A and Z must be the same.
• MB_A = MB_Z.
• NB_A = NB_Z: This and the previous restriction mean that the block-cyclic distributions of A and Z
should be based on the same block size.
• CTXT_A = CTXT_Z: A and Z must be distributed on the same context (that is, the same process grid).
SEE ALSO
BLACS_GRIDINIT(3S), DESCINIT(3S), NUMROC(3S)



PSSYGVX ( 3S ) PSSYGVX ( 3S )

NAME
PSSYGVX – Computes selected eigenvalues and eigenvectors of a real symmetric-definite generalized
eigenproblem

SYNOPSIS
CALL PSSYGVX (ibtype, jobZ, range, uplo, n, A, iA, jA, descA, B, iB, jB, descB, vl, vu,
il, iu, abstol, m, nZ, w, orfac, Z, iZ, jZ, descZ, work, lwork, iwork, liwork, ifail, iclustr,
gap, info)

IMPLEMENTATION
UNICOS/mk systems

DESCRIPTION
PSSYGVX computes selected eigenvalues and, optionally, eigenvectors of a real generalized symmetric-definite
eigenproblem, of the form:
sub(A)*x = (lambda)*sub(B)*x, sub(A)*sub(B)*x = (lambda)*x
or
sub(B)*sub(A)*x = (lambda)*x
Here sub(A), denoting A(IA:IA+N-1, JA:JA+N-1), is assumed to be symmetric, and sub(B), denoting
B(IB:IB+N-1, JB:JB+N-1), is assumed to be symmetric positive definite.
These routines require square block decomposition (MB_A = NB_A, as defined in the following comments).
A description vector is associated with each two-dimensional (2D) block-cyclicly distributed matrix. This
vector stores the information required to establish the mapping between a matrix entry and its corresponding
process and memory location.
The following comments describe the elements of a block-cyclicly distributed matrix. In these comments,
the underline character (_) should be read as "of the distributed matrix". Let A be a generic term for any 2D
block cyclicly distributed matrix. Its description vector is descA and must be initialized through a call to
DESCINIT(3S).
M_A The number of rows in the distributed matrix.
N_A The number of columns in the distributed matrix.
MB_A The blocking factor used to distribute the rows of the matrix.
NB_A The blocking factor used to distribute the columns of the matrix.
RSRC_A The process row over which the first row of the matrix is distributed.
CSRC_A The process column over which the first column of the matrix is distributed.


CTXT_A The BLACS context handle, indicating the BLACS process grid A is distributed over. The
context itself is global, but the handle (the integer value) may vary.
LLD_A The leading dimension of the local array storing the local blocks of the distributed matrix A.
LLD_A ≥ MAX(1,LOCp(M_A)).
Let K be the number of rows or columns of a distributed matrix, and assume that its process grid has
dimension p-by-q. LOCp( K ) denotes the number of elements of K that a process would receive if K were
distributed over the p processes of its process column.
Similarly, LOCq( K ) denotes the number of elements of K that a process would receive if K were distributed
over the q processes of its process row.
The values of LOCp() and LOCq() may be determined via a call to the NUMROC(3S) ScaLAPACK tool
function, as in the following:
LOCp(M) = NUMROC( M, MB_A, MYROW, RSRC_A, NPROW )
LOCq(N) = NUMROC( N, NB_A, MYCOL, CSRC_A, NPCOL )
An upper bound for these quantities may be computed by:
LOCp( M ) <= ceil( ceil(M/MB_A)/NPROW )*MB_A
LOCq( N ) <= ceil( ceil(N/NB_A)/NPCOL )*NB_A
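The local sizes described above can be illustrated with a short serial sketch. The function below follows the counting rule commonly given for the NUMROC tool function; it is a Python illustration only, not the library routine, and the helper name loc_upper_bound is hypothetical.

```python
def numroc(n, nb, iproc, isrcproc, nprocs):
    """Rows (or columns) of an n-long block-cyclic dimension owned by iproc.

    n        : global number of rows or columns
    nb       : blocking factor
    iproc    : coordinate of the process being asked about
    isrcproc : coordinate of the process that owns the first row/column
    nprocs   : number of processes over which the dimension is distributed
    """
    mydist = (nprocs + iproc - isrcproc) % nprocs  # distance from the source process
    nblocks = n // nb                              # number of complete blocks
    count = (nblocks // nprocs) * nb               # blocks every process receives
    extra = nblocks % nprocs                       # leftover complete blocks
    if mydist < extra:
        count += nb                                # one additional complete block
    elif mydist == extra:
        count += n % nb                            # the final, partial block
    return count


def loc_upper_bound(n, nb, nprocs):
    """Hypothetical helper: ceil(ceil(n/nb)/nprocs)*nb using integer arithmetic."""
    blocks = -(-n // nb)
    return -(-blocks // nprocs) * nb


# Example: 10 rows, block size 2, distributed over 2 process rows.
local_rows = [numroc(10, 2, p, 0, 2) for p in range(2)]
```

In this example the two process rows hold 6 and 4 rows, which sum to the global size and stay within the upper bound of 6.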

These routines accept the following arguments:


ibtype Integer. (global input)
Specifies the problem type to be solved:
= 1: sub(A)*x = (lambda)*sub(B)*x
= 2: sub(A)*sub(B)*x = (lambda)*x
= 3: sub(B)*sub(A)*x = (lambda)*x
jobZ Character*1. (global input)
Specifies whether to compute the eigenvectors:
jobZ =’N’: Compute only eigenvalues.
jobZ =’V’: Compute eigenvalues and eigenvectors.
range Character*1. (global input)
range =’A’: All eigenvalues will be found.
range =’V’: All eigenvalues in the half-open interval (vl,vu] will be found.
range =’I’: The ilth through iuth eigenvalues will be found.
uplo Character. (global input)
Specifies whether the upper or lower triangular part of the symmetric matrix A is stored:
uplo =’U’: Upper triangle of sub(A) is stored.
uplo =’L’: Lower triangle of sub(A) is stored.

n Integer. (global input)
The order of the matrices sub(A) and sub(B). n must be ≥ 0.
A Real pointer into local memory. (local input/output)
Real pointer into the local memory to an array of dimension (LLD_A, LOCq(JA+N-1)).
On entry, this array contains the local pieces of the N-by-N symmetric distributed matrix sub(A).
If uplo = ’U’, the leading N-by-N upper triangular part of sub(A) contains the upper triangular
part of the matrix. If uplo = ’L’, the leading N-by-N lower triangular part of sub(A) contains the
lower triangular part of the matrix.
On exit, if jobz = ’V’, then if info = 0, sub(A) contains the distributed matrix Z of eigenvectors.
The eigenvectors are normalized as follows:
if ibtype = 1 or 2, Z**T*sub( B )*Z = I
if ibtype = 3, Z**T*inv( sub( B ) )*Z = I.
If jobz = ’N’, then on exit the upper triangle (if uplo=’U’) or the lower triangle (if uplo=’L’) of
sub(A), including the diagonal, is destroyed.
iA Integer. (global input)
The global row index of A, which points to the beginning of the submatrix that will be operated
on.
jA Integer. (global input)
The global column index of A which points to the beginning of the submatrix that will be operated
on.
descA Integer array of dimension dlen_. (global input)
The array descriptor for the distributed matrix A. If descA(CTXT_ ) is incorrect, this routine
cannot guarantee correct error reporting.
B Real pointer into local memory. (local input/output)
REAL pointer into the local memory to an array of dimension (LLD_B, LOCq(JB+N-1)).
On entry, this array contains the local pieces of the N-by-N symmetric distributed matrix sub(B).
If uplo = ’U’, the leading N-by-N upper triangular part of sub(B) contains the upper triangular
part of the matrix. If uplo = ’L’, the leading N-by-N lower triangular part of sub(B) contains the
lower triangular part of the matrix.
On exit, if info ≤ n, the part of sub(B) containing the matrix is overwritten by the triangular
factor U or L from the Cholesky factorization sub(B) = U**T*U or sub(B) = L*L**T.
iB Integer. (global input)
The global row index of B, which points to the beginning of the submatrix that will be operated
on.
jB Integer. (global input)
The global column index of B which points to the beginning of the submatrix that will be operated
on.

descB Integer array of dimension dlen_. (global input)
The array descriptor for the distributed matrix B. descB(CTXT_) must equal descA(CTXT_).
vl Real. (global input)
If range=’V’, the lower bound of the interval to be searched for eigenvalues. If range =’A’ or ’I’,
it is not referenced.
vu Real. (global input)
If range =’V’, the upper bound of the interval to be searched for eigenvalues. If range =’A’ or ’I’,
it is not referenced.
il Integer. (global input)
If range =’I’, the index (from smallest to largest) of the smallest eigenvalue to be returned. il ≥ 1.
If range=’A’ or ’V’, it is not referenced.
iu Integer. (global input)
If range =’I’, the index (from smallest to largest) of the largest eigenvalue to be returned.
min(il,n) ≤ iu ≤ n. If range =’A’ or ’V’, it is not referenced.
abstol Real. (global input)
If jobZ=’V’, setting abstol to PSLAMCH(CONTEXT,’U’) yields the most orthogonal
eigenvectors.
This is the absolute error tolerance for the eigenvalues. An approximate eigenvalue is accepted as
converged when it is determined to lie in an interval [a,b] of width less than or equal to the
following:
abstol + eps * MAX(|a|,|b|)
eps is the machine precision. If abstol is ≤ 0, eps * norm(T) will be used in its place, where
norm(T) is the 1-norm of the tridiagonal matrix obtained by reducing A to tridiagonal form.
Eigenvalues will be computed most accurately when abstol is set to twice the underflow threshold,
2*PSLAMCH(’S’), not zero. If this routine returns with ((MOD(INFO,2).NE.0).OR.
(MOD(INFO/8,2).NE.0)), indicating that some eigenvalues or eigenvectors did not converge,
try setting abstol to 2*PSLAMCH(’S’).
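The acceptance test described for abstol can be written out as a small sketch. The function name and the norm_t parameter are hypothetical, and eps must be supplied by the caller because the machine precision depends on the system.

```python
def interval_converged(a, b, abstol, eps, norm_t):
    """Return True if the interval [a, b] is narrow enough to accept.

    abstol : absolute error tolerance supplied by the caller
    eps    : machine precision (caller-supplied assumption)
    norm_t : 1-norm of the tridiagonal matrix, used when abstol <= 0
    """
    if abstol <= 0:
        abstol = eps * norm_t  # replacement value described in the text
    return (b - a) <= abstol + eps * max(abs(a), abs(b))


# Example values (illustrative only).
narrow = interval_converged(1.0, 1.0 + 1e-9, 1e-8, 1e-16, 1.0)
wide = interval_converged(1.0, 1.1, 1e-8, 1e-16, 1.0)
```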
m Integer. (global output)
Total number of eigenvalues found. 0 ≤ m ≤ n.
nZ Integer. (global output)
Total number of eigenvectors computed. 0 ≤ nZ ≤ m. The number of columns of Z that are filled.
If jobZ is not equal to ’V’, nZ is not referenced. If jobZ is equal to ’V’, nZ = m unless the user
supplies insufficient space and PSSYGVX is not able to detect this before beginning computation.
To get all of the eigenvectors requested, the user must supply both sufficient space to hold the
eigenvectors in Z (m ≤ descZ(N_)) and sufficient workspace to compute them. (See lwork below.)
PSSYGVX can always detect insufficient space without computation, unless range=’V’.

w Real array, dimension (n). (global output)
On normal exit, the first m entries contain the selected eigenvalues in ascending order.
orfac Real. (global input)
Specifies which eigenvectors should be reorthogonalized. Eigenvectors that correspond to
eigenvalues that are within tol = orfac*norm(A) of each other are reorthogonalized. However,
if the workspace is insufficient (see lwork), tol may be decreased until all eigenvectors to be
reorthogonalized can be stored in one process. No reorthogonalization will be done if orfac equals
zero. A default value of 10**(-3) is used if orfac is negative. orfac should be identical on all
processes.
Z Real array. (local output)
Global dimension (n, n), local dimension (descZ(LLD_), NQ). If jobZ = ’V’, on normal exit
the first m columns of Z contain the orthonormal eigenvectors of the matrix corresponding to the
selected eigenvalues. If an eigenvector fails to converge, then that column of Z contains the latest
approximation to the eigenvector, and the index of the eigenvector is returned in ifail. If jobZ =
’N’, Z is not referenced.
iZ Integer. (global input)
The global row index of the submatrix of the distributed matrix Z to operate on.
jZ Integer. (global input)
The global column index of the submatrix of the distributed matrix Z to operate on.
descZ Integer array of dimension 9. (input)
The array descriptor for the distributed matrix Z. descZ(CTXT_) must equal descA(CTXT_).
work Real array, dimension (lwork). (local workspace/output)
On output, work(1) returns the workspace needed to guarantee completion, but not orthogonality of
the eigenvectors. If the input parameters are incorrect, work(1) may also be incorrect.
If info ≥ 0
if jobZ = ’N’, work(1) = minimal=optimal amount of workspace;
if jobZ = ’V’, work(1) = minimal amount of workspace required to guarantee orthogonal
eigenvectors on the given input matrix with the given ortol. In version 1.0, work(1) = the
minimal workspace required to compute eigenvalues.
If info<0, then
if jobZ=’N’, work(1) equals the minimal=optimal amount of workspace
if jobZ=’V’
if range=’A’ or range=’I’, then work(1) equals the minimal workspace required
to compute all eigenvectors (no guarantee on orthogonality).

if range=’V’, then work(1) equals the minimal workspace required to compute
N_Z=DESCZ(N_) eigenvectors (no guarantee on orthogonality). In version 1.0,
work(1) equals the minimal workspace required to compute eigenvalues.
lwork Integer. (local input)
The following variable definitions are used to define lwork:
NN = MAX( N, NB, 2 )
NEIG = number of eigenvectors requested
NB = descA( MB_ ) = descA( NB_ ) = descZ( MB_ ) = descZ( NB_ )
descA( RSRC_ ) = descA( CSRC_ ) = descZ( RSRC_ ) = descZ( CSRC_ ) = 0
NP0 = NUMROC( NN, NB, 0, 0, NPROW )
MQ0 = NUMROC( MAX( NEIG, NB, 2 ), NB, 0, 0, NPCOL )
ICEIL( X, Y ) is a ScaLAPACK function returning ceiling(X/Y)
If no eigenvectors are requested (jobZ = ’N’), lwork ≥ 5*N + MAX( 5*NN, NB*(NP0+1) ).
If eigenvectors are requested (jobZ = ’V’), the amount of workspace required to guarantee that all
eigenvectors are computed is the following:
lwork ≥ 5*N + MAX( 5*NN, NP0*MQ0 + 2*NB*NB ) + ICEIL( NEIG, NPROW*NPCOL )*NN
The computed eigenvectors may not be orthogonal if the minimal workspace is supplied and ortol
is too small. If you want to guarantee orthogonality (at the cost of potentially poor performance)
you should add the following to lwork:
(CLUSTERSIZE-1)*N
CLUSTERSIZE is the number of eigenvalues in the largest cluster, where a cluster is defined as a
set of close eigenvalues:
{W(K),...,W(K+CLUSTERSIZE-1)|W(J+1)≤ W(J)+orfac*norm(A)}
If lwork is too small to guarantee orthogonality, PSSYGVX attempts to maintain orthogonality in
the clusters with the smallest spacing between the eigenvalues. If lwork is too small to compute all
of the eigenvectors requested, no computation is performed and info = -23 is returned. Note that
when range = ’V’, PSSYGVX does not know how many eigenvectors are requested until the
eigenvalues are computed. Therefore, when range = ’V’ and as long as lwork is large enough to
allow PSSYGVX to compute the eigenvalues, PSSYGVX will compute the eigenvalues and as many
eigenvectors as it can.
Relationship between workspace, orthogonality, and performance:
If CLUSTERSIZE ≥ N/SQRT(NPROW*NPCOL), providing enough space to compute all the
eigenvectors orthogonally will cause serious degradation in performance. In the limit (i.e.
CLUSTERSIZE = N-1), PSSTEIN will perform no better than SSTEIN on one processor. For
CLUSTERSIZE = N/SQRT(NPROW*NPCOL) reorthogonalizing all eigenvectors will increase the
total execution time by a factor of 2 or more.


For CLUSTERSIZE > N/SQRT(NPROW*NPCOL), execution time will grow as the square of the
cluster size, all other factors remaining equal and assuming enough workspace. Less workspace
means less reorthogonalization but faster execution.
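The lwork bounds above can be evaluated ahead of the call with a sketch such as the following. It is a Python illustration under stated assumptions, not part of the library: the function name pssygvx_min_lwork is hypothetical, numroc repeats the usual NUMROC counting rule, and the NP term in the eigenvalue-only bound is read as NP0.

```python
def numroc(n, nb, iproc, isrcproc, nprocs):
    """Local count of an n-long block-cyclic dimension on process iproc."""
    mydist = (nprocs + iproc - isrcproc) % nprocs
    nblocks = n // nb
    count = (nblocks // nprocs) * nb
    extra = nblocks % nprocs
    if mydist < extra:
        count += nb
    elif mydist == extra:
        count += n % nb
    return count


def iceil(x, y):
    """Integer ceiling of x/y."""
    return -(-x // y)


def pssygvx_min_lwork(n, nb, neig, nprow, npcol, want_vectors=True, clustersize=1):
    """Hypothetical helper that evaluates the documented lwork bounds.

    clustersize > 1 adds the (CLUSTERSIZE-1)*N term for guaranteed orthogonality.
    """
    nn = max(n, nb, 2)
    np0 = numroc(nn, nb, 0, 0, nprow)
    if not want_vectors:
        # Assumption: NP in the eigenvalue-only formula means NP0.
        return 5 * n + max(5 * nn, nb * (np0 + 1))
    mq0 = numroc(max(neig, nb, 2), nb, 0, 0, npcol)
    lwork = 5 * n + max(5 * nn, np0 * mq0 + 2 * nb * nb)
    lwork += iceil(neig, nprow * npcol) * nn
    return lwork + (clustersize - 1) * n


# Example: N = 100, NB = 8, all 100 eigenvectors, 2-by-2 process grid.
lwork_vectors = pssygvx_min_lwork(100, 8, 100, 2, 2)
lwork_values_only = pssygvx_min_lwork(100, 8, 100, 2, 2, want_vectors=False)
```

A workspace query through work(1), as described for the work argument, remains the authoritative source; a calculation like this is only a planning aid.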
iwork Integer array. (local workspace)
On return, iwork(1) contains the amount of integer workspace required. If the input parameters are
incorrect, iwork(1) may also be incorrect.
liwork Integer. (local input)
Size of iwork. liwork ≥ 6*NNP
where:
NNP = MAX( N, NPROW*NPCOL + 1, 4 )

ifail Integer array, dimension (N). (global output)
ifail provides additional information when info.NE.0. If (MOD(INFO/16,2).NE.0) then
ifail(1) indicates the order of the smallest minor which is not positive definite. If
(MOD(INFO,2).NE.0) on exit, then ifail contains the indices of the eigenvectors that failed to
converge.
If neither of the above error conditions holds and jobZ=’V’, then the first m elements of ifail are set
to 0.
iclustr Integer array, dimension (2*NPROW*NPCOL). (global output)
This array contains indices of eigenvectors that correspond to a cluster of eigenvalues that could
not be reorthogonalized due to insufficient workspace (see lwork, orfac, and info). Eigenvectors
that correspond to clusters of eigenvalues indexed iclustr(2*I-1) to iclustr(2*I) could not be
reorthogonalized due to lack of workspace. Hence, the eigenvectors that correspond to these
clusters may not be orthogonal. iclustr() is a 0-terminated array. (iclustr(2*K).NE.0 .AND.
iclustr(2*K+1).EQ.0) if and only if K is the number of clusters. iclustr is not referenced if
jobZ =’N’.
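The 0-terminated layout of iclustr can be unpacked as in the sketch below. It assumes the array has been copied into a Python list, so Fortran element iclustr(1) is list element 0; the function name is hypothetical.

```python
def clusters_from_iclustr(iclustr):
    """Return (first, last) eigenvector index pairs for each reported cluster.

    Pairs are read until a zero start index is found or the list is exhausted,
    which covers the case where every available slot is in use.
    """
    clusters = []
    for k in range(0, len(iclustr) - 1, 2):
        first, last = iclustr[k], iclustr[k + 1]
        if first == 0:
            break  # terminator: no more clusters
        clusters.append((first, last))
    return clusters


# Example: two clusters reported on a 2-by-2 grid (array length 2*2*2 = 8).
example = clusters_from_iclustr([3, 5, 9, 10, 0, 0, 0, 0])
```

Here eigenvectors 3 through 5 and 9 through 10 (1-based indices) are the ones whose orthogonality is not guaranteed.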
gap Real array, dimension (NPROW*NPCOL). (global output)
This array contains the gap between eigenvalues whose eigenvectors could not be reorthogonalized.
The output values in this array correspond to the clusters indicated by the iclustr array. Therefore,
the dot product between eigenvectors that correspond to the Ith cluster may be as high as
(C*n)/GAP(I) where C is a small constant.
Current limitations:
DESCA(MB_) = DESCA(NB_)
IA = JA = 1
IZ = JZ = 1
DESCA(RSRC_) = DESCA(CSRC_) = 0
DESCA(M_) = DESCB(M_) = DESCZ(M_)
DESCA(N_) = DESCB(N_) = DESCZ(N_)
DESCA(MB_) = DESCB(MB_) = DESCZ(MB_)
DESCA(NB_) = DESCB(NB_) = DESCZ(NB_)
DESCA(RSRC_) = DESCB(RSRC_) = DESCZ(RSRC_)
DESCA(CSRC_) = DESCB(CSRC_) = DESCZ(CSRC_)

info Integer. (global output)
info = 0 Successful exit.
info < 0 If the ith argument is an array and the jth entry had an illegal value,
info = -(i*100+j); if the ith argument is a scalar and had an illegal value, info = -i.
info > 0 If (MOD(info,2).NE.0), one or more eigenvectors failed to converge. Their indices
are stored in ifail. Send email to scalapack@cs.utk.edu.
If (MOD(info/2,2).NE.0), eigenvectors corresponding to one or more clusters of
eigenvalues could not be reorthogonalized because of insufficient workspace. The
indices of the clusters are stored in the ICLUSTR array.
If (MOD(info/4,2).NE.0), space limitations prevented PSSYGVX from computing
all of the eigenvectors between vl and vu. The number of eigenvectors computed is
returned in nZ.
If (MOD(info/8,2).NE.0), PSSTEBZ failed to compute eigenvalues. Send email
to scalapack@cs.utk.edu.
If (MOD(info/16,2).NE.0), B was not positive definite. ifail(1) indicates the order
of the smallest minor which is not positive definite.
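Because a positive info packs several conditions into one integer, it can help to decode it bit by bit. The following is a sketch in Python of the MOD tests listed above; the function name and message wording are hypothetical.

```python
def decode_pssygvx_info(info):
    """Translate an info value into a list of human-readable messages."""
    if info == 0:
        return ["successful exit"]
    if info < 0:
        code = -info
        if code >= 100:
            # Array argument: info = -(i*100 + j)
            i, j = divmod(code, 100)
            return [f"entry {j} of array argument {i} had an illegal value"]
        return [f"scalar argument {code} had an illegal value"]
    flags = [
        (1, "one or more eigenvectors failed to converge (see ifail)"),
        (2, "some clusters could not be reorthogonalized (see iclustr)"),
        (4, "not all eigenvectors between vl and vu were computed (see nZ)"),
        (8, "PSSTEBZ failed to compute eigenvalues"),
        (16, "B was not positive definite (see ifail(1))"),
    ]
    # (info // bit) % 2 mirrors the Fortran test MOD(info/bit, 2).NE.0
    return [msg for bit, msg in flags if (info // bit) % 2 != 0]


# Example: info = 5 sets the convergence flag and the space-limitation flag.
messages = decode_pssygvx_info(5)
```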

SEE ALSO
BLACS_GRIDINIT(3S), DESCINIT(3S), NUMROC(3S)



PSSYTRD ( 3S ) PSSYTRD ( 3S )

NAME
PSSYTRD, PCHETRD – Reduces a real symmetric or complex Hermitian distributed matrix to tridiagonal
form

SYNOPSIS
CALL PSSYTRD (uplo, n, A, iA, jA, descA, D, E, tau, work, lwork, info)
CALL PCHETRD (uplo, n, A, iA, jA, descA, D, E, tau, work, lwork, info)

IMPLEMENTATION
UNICOS/mk systems

DESCRIPTION
PSSYTRD reduces a real symmetric matrix sub(A) to symmetric tridiagonal form T by an orthogonal
similarity transformation:
Q’ * sub(A) * Q = T
where sub(A) = A(iA:iA+n-1, jA:jA+n-1).
PCHETRD requires a complex Hermitian matrix.
These routines require square block decomposition (MB_A = NB_A, as defined in the following comments).
A description vector is associated with each two-dimensional (2D) block-cyclicly distributed matrix. This
vector stores the information required to establish the mapping between a matrix entry and its corresponding
process and memory location.
The following comments describe the elements of a block-cyclicly distributed matrix. In these comments,
the underline character (_) should be read as "of the distributed matrix". Let A be a generic term for any 2D
block cyclicly distributed matrix. Its description vector is descA and must be initialized through a call to
DESCINIT(3S).
M_A The number of rows in the distributed matrix.
N_A The number of columns in the distributed matrix.
MB_A The blocking factor used to distribute the rows of the matrix.
NB_A The blocking factor used to distribute the columns of the matrix.
RSRC_A The process row over which the first row of the matrix is distributed.
CSRC_A The process column over which the first column of the matrix is distributed.
CTXT_A The BLACS context handle, indicating the BLACS process grid A is distributed over. The
context itself is global, but the handle (the integer value) may vary.
LLD_A The leading dimension of the local array storing the local blocks of the distributed matrix A.
LLD_A ≥ MAX(1,LOCp(M_A)).


Let K be the number of rows or columns of a distributed matrix, and assume that its process grid has
dimension p-by-q. LOCp( K ) denotes the number of elements of K that a process would receive if K were
distributed over the p processes of its process column.
Similarly, LOCq( K ) denotes the number of elements of K that a process would receive if K were distributed
over the q processes of its process row.
The values of LOCp() and LOCq() may be determined via a call to the NUMROC(3S) ScaLAPACK tool
function, as in the following:
LOCp(M) = NUMROC( M, MB_A, MYROW, RSRC_A, NPROW )
LOCq(N) = NUMROC( N, NB_A, MYCOL, CSRC_A, NPCOL )

These routines accept the following arguments. For PCHETRD, the following real arguments must be
complex:
uplo Character. (global input)
uplo = ’U’: Upper triangle of sub(A) is stored.
uplo = ’L’: Lower triangle of sub(A) is stored.
n Integer. (global input)
The number of columns to be operated on (the number of columns of the distributed submatrix
sub(A)). n must be ≥ 0.
A Real pointer into the local memory to an array of dimension (LLD_A, LOCq(jA+n-1)). (local
input/local output)
On entry, the local pieces of the symmetric distributed matrix sub(A) to be reduced.
If uplo = ’U’, the leading n-by-n upper triangular part of sub(A) contains the upper triangular part of
the matrix, and its strictly lower triangular part is not referenced.
If uplo = ’L’, the leading n-by-n lower triangular part of sub(A) contains the lower triangular part of
the distributed matrix, and its strictly upper triangular part is not referenced.
On exit, if uplo = ’U’, the diagonal and first superdiagonal of sub(A) are overwritten by the
corresponding elements of the tridiagonal matrix T, and the elements above the first superdiagonal,
with the array tau, represent the orthogonal matrix Q as a product of elementary reflectors; if uplo =
’L’, the diagonal and first subdiagonal of sub(A) are overwritten by the corresponding elements of
the tridiagonal matrix T, and the elements below the first subdiagonal, with the array tau, represent
the orthogonal matrix Q as a product of elementary reflectors. See the Further Details subsection
for more information.
iA Integer. (global input)
The global row index of A, which points to the beginning of the submatrix that will be operated on.
jA Integer. (global input)
The global column index of A which points to the beginning of the submatrix that will be operated
on.

descA Integer array of dimension 9. (input)
The array descriptor for the distributed matrix A.
D Real array of dimension LOCq(jA+n-1). (local output)
The diagonal elements of the tridiagonal matrix T: D(i) = A(i,i). D is tied to the distributed
matrix A.
E Real array of dimension LOCq(jA+n-1). (local output)
The off-diagonal elements of the tridiagonal matrix T: E(i) = A(i,i+1) if uplo = ’U’; E(i) = A(i+1,i) if
uplo = ’L’. E is tied to the distributed matrix A.
tau Real array, dimension LOCq(jA+MIN(m,n)-1). (local output)
This array contains the scalar factors tau of the elementary reflectors. tau is tied to the distributed
matrix A.
work Real array, dimension (lwork). (local workspace)
On exit, work(1) returns the minimal lwork.
lwork Integer. (local input)
The dimension of the array work.
lwork ≥ MAX(NB * (NP+1), 3*NB)
where
NB = MB_A = NB_A
NP = NUMROC( N, NB, MYROW, IAROW, NPROW )
IAROW = INDXG2P( IA, NB, MYROW, RSRC_A, NPROW )
NUMROC(3S) and INDXG2P(3S) are ScaLAPACK tool functions; MYROW, MYCOL, NPROW, and
NPCOL can be determined by calling the BLACS_GRIDINFO(3S) subroutine.
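The lwork bound for PSSYTRD depends on the calling process, because NP is a local row count. The sketch below evaluates it in Python; numroc and indxg2p follow the rules commonly given for those tool functions, and pssytrd_min_lwork is a hypothetical helper, not a library routine.

```python
def numroc(n, nb, iproc, isrcproc, nprocs):
    """Local count of an n-long block-cyclic dimension on process iproc."""
    mydist = (nprocs + iproc - isrcproc) % nprocs
    nblocks = n // nb
    count = (nblocks // nprocs) * nb
    extra = nblocks % nprocs
    if mydist < extra:
        count += nb
    elif mydist == extra:
        count += n % nb
    return count


def indxg2p(indxglob, nb, iproc, isrcproc, nprocs):
    """Process coordinate that owns 1-based global index indxglob.

    iproc is kept only to mirror the argument list; it is not used.
    """
    return (isrcproc + (indxglob - 1) // nb) % nprocs


def pssytrd_min_lwork(n, ia, nb, myrow, rsrc_a, nprow):
    """Hypothetical helper: MAX( NB*(NP+1), 3*NB ) for one process row."""
    iarow = indxg2p(ia, nb, myrow, rsrc_a, nprow)
    np_local = numroc(n, nb, myrow, iarow, nprow)
    return max(nb * (np_local + 1), 3 * nb)


# Example: n = 100, iA = 1, NB = 8, two process rows, RSRC_A = 0.
lwork_by_row = [pssytrd_min_lwork(100, 1, 8, r, 0, 2) for r in range(2)]
```

Since the value differs between process rows, each process would compute its own bound, or simply use the largest.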
info Integer. (global output)
info = 0 Successful exit.
info < 0 If the ith argument is an array and the jth entry had an illegal value, info = -(i*100+j);
if the ith argument is a scalar and had an illegal value, info = -i.
Alignment Requirements
The distributed submatrix sub(A) must satisfy some alignment properties; namely, the following expression
should be true:
( MB_A.EQ.NB_A .AND. IROFFA.EQ.ICOFFA .AND. IROFFA.EQ.0 )
with the following:
IROFFA = MOD( iA-1, MB_A ) and ICOFFA = MOD( jA-1, NB_A )
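The alignment condition can be checked before the call. This is a minimal Python sketch of the expression above; the function name is hypothetical.

```python
def pssytrd_aligned(ia, ja, mb_a, nb_a):
    """Return True if sub(A) starting at (ia, ja) meets the alignment rule."""
    iroffa = (ia - 1) % mb_a  # row offset of sub(A) within its block
    icoffa = (ja - 1) % nb_a  # column offset of sub(A) within its block
    return mb_a == nb_a and iroffa == icoffa and iroffa == 0


# Examples with an 8-by-8 blocking factor.
ok_at_origin = pssytrd_aligned(1, 1, 8, 8)
ok_at_block_start = pssytrd_aligned(9, 17, 8, 8)
bad_offset = pssytrd_aligned(2, 2, 8, 8)
```

In other words, blocks must be square and sub(A) must begin on a block boundary in both dimensions.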


Further Details
If uplo = ’U’, the matrix Q is represented as a product of elementary reflectors
Q = H(n-1) ... H(2) H(1)
Each H(i) has the following form:
H(i) = I - tau * v * v’
where tau is a real scalar, and v is a real vector with v(i+1:n) = 0 and v(i) = 1; v(1:i-1) is stored on
exit in A(iA:iA+i-2,jA+i), and tau is stored in tau(jA+i-1).
If uplo = ’L’, the matrix Q is represented as a product of elementary reflectors
Q = H(1) H(2) ... H(n-1)
Each H(i) has the following form:
H(i) = I - tau * v * v’
where tau is a real scalar, and v is a real vector with v(1:i) = 0 and v(i+1) = 1; v(i+2:n) is
stored on exit in A(iA+i+1:iA+n-1,jA+i-1), and tau is stored in tau(jA+i-1).
The contents of sub(A) on exit are illustrated by the following examples with n = 5:
if uplo = ’U’:                  if uplo = ’L’:
( d   e   v2  v3  v4 )          ( d                  )
(     d   e   v3  v4 )          ( e   d              )
(         d   e   v4 )          ( v1  e   d          )
(             d   e  )          ( v1  v2  e   d      )
(                 d  )          ( v1  v2  v3  e   d  )

In this example, d and e denote diagonal and off-diagonal elements of T, and vi denotes an element of the
vector defining H(i).
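An elementary reflector H = I - tau * v * v’ is normally applied to a vector without forming H explicitly. The sketch below shows this in plain Python on a serial vector; it illustrates the formula only and is not how the distributed routine stores or applies its reflectors.

```python
def apply_reflector(tau, v, x):
    """Return H*x for H = I - tau*v*v' using one dot product and one update."""
    dot = sum(vi * xi for vi, xi in zip(v, x))       # v' * x
    return [xi - tau * dot * vi for vi, xi in zip(v, x)]


# Example: v = (1, 1) with tau = 2/(v'v) = 1 gives an orthogonal reflection.
v = [1.0, 1.0]
tau = 1.0
y = apply_reflector(tau, v, [3.0, 4.0])
x_again = apply_reflector(tau, v, y)
```

With tau = 2/(v’v) the reflector is orthogonal and is its own inverse, so applying it twice restores the original vector.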

NOTES
BLACS_GRIDINIT(3S) must be called to initialize the virtual BLACS grid.

SEE ALSO
BLACS_GRIDINFO(3S), BLACS_GRIDINIT(3S), INDXG2P(3S), NUMROC(3S)



PSTRTRI ( 3S ) PSTRTRI ( 3S )

NAME
PSTRTRI, PCTRTRI – Computes the inverse of a real or complex upper or lower triangular distributed
matrix

SYNOPSIS
CALL PSTRTRI (uplo, diag, n, A, iA, jA, descA, info)
CALL PCTRTRI (uplo, diag, n, A, iA, jA, descA, info)

IMPLEMENTATION
UNICOS/mk systems

DESCRIPTION
PSTRTRI and PCTRTRI compute the inverse of a real or complex upper or lower triangular distributed
matrix of the form:
sub(A)=A(iA:iA+n-1,jA:jA+n-1)
These routines require square block decomposition (MB_A = NB_A, as defined in the following comments).
A description vector is associated with each two-dimensional (2D) block-cyclicly distributed matrix. This
vector stores the information required to establish the mapping between a matrix entry and its corresponding
process and memory location.
The following comments describe the elements of a block-cyclicly distributed matrix. In these comments,
the underline character (_) should be read as "of the distributed matrix". Let A be a generic term for any 2D
block cyclicly distributed matrix. Its description vector is descA and must be initialized through a call to
DESCINIT(3S).
M_A The number of rows in the distributed matrix.
N_A The number of columns in the distributed matrix.
MB_A The blocking factor used to distribute the rows of the matrix.
NB_A The blocking factor used to distribute the columns of the matrix.
RSRC_A The process row over which the first row of the matrix is distributed.
CSRC_A The process column over which the first column of the matrix is distributed.
CTXT_A The BLACS context handle, indicating the BLACS process grid A is distributed over. The
context itself is global, but the handle (the integer value) may vary.
LLD_A The leading dimension of the local array storing the local blocks of the distributed matrix A.
LLD_A ≥ MAX(1,LOCp(M_A)).


Let K be the number of rows or columns of a distributed matrix, and assume that its process grid has
dimension p-by-q. LOCp( K ) denotes the number of elements of K that a process would receive if K were
distributed over the p processes of its process column.
Similarly, LOCq( K ) denotes the number of elements of K that a process would receive if K were distributed
over the q processes of its process row.
The values of LOCp() and LOCq() may be determined via a call to the NUMROC(3S) ScaLAPACK tool
function, as in the following:
LOCp(M) = NUMROC( M, MB_A, MYROW, RSRC_A, NPROW )
LOCq(N) = NUMROC( N, NB_A, MYCOL, CSRC_A, NPCOL )

These routines accept the following arguments. For PCTRTRI, the following real arguments must be
complex:
uplo Character. (global input)
Specifies whether the distributed matrix sub(A) is upper or lower triangular:
uplo = ’U’: Upper triangle of sub(A) is stored.
uplo = ’L’: Lower triangle of sub(A) is stored.
diag Character. (global input)
Specifies whether the distributed matrix sub(A) is unit triangular:
diag = ’N’: Non-unit triangular.
diag = ’U’: Unit triangular.
n Integer. (global input)
The number of columns to be operated on (the number of columns of the distributed submatrix
sub(A)). n must be ≥ 0.
A Real pointer into the local memory to an array of dimension (LLD_A, LOCq(jA+n-1)). (local
input/local output)
On entry, the local pieces of the triangular matrix sub(A).
If uplo = ’U’, the leading n-by-n upper triangular part of the matrix sub(A) contains the upper
triangular matrix to be inverted, and the strictly lower triangular part of sub(A) is not referenced.
If uplo = ’L’, the leading n-by-n lower triangular part of the matrix sub(A) contains the lower
triangular matrix, and the strictly upper triangular part of sub(A) is not referenced.
iA Integer. (global input)
The global row index of A, which points to the beginning of the submatrix that will be operated on.
jA Integer. (global input)
The global column index of A which points to the beginning of the submatrix that will be operated
on.

descA Integer array of dimension 9. (input)
The array descriptor for the distributed matrix A.
info Integer. (global output)
info = 0 Successful exit.
info < 0 If the ith argument is an array and the jth entry had an illegal value, info = -(i*100+j);
if the ith argument is a scalar and had an illegal value, then info = -i.
info > 0 If info = K, A(iA+K-1,jA+K-1) is exactly 0. The triangular matrix sub(A) is singular,
so its inverse could not be computed.
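The meaning of a positive info can be illustrated with a serial analogue. The sketch below inverts a small upper triangular matrix stored as nested Python lists and returns a 1-based info value when a diagonal entry is exactly zero. It is not the distributed algorithm, and the function name is hypothetical.

```python
def invert_upper_triangular(a, unit_diag=False):
    """Return (inverse, info) for an upper triangular matrix a.

    info = 0 on success; info = K (1-based) if a[K-1][K-1] is exactly zero,
    in which case no inverse is returned.
    """
    n = len(a)
    if not unit_diag:
        for k in range(n):
            if a[k][k] == 0:
                return None, k + 1
    inv = [[0.0] * n for _ in range(n)]
    for j in range(n):
        # Solve U * x = e_j by back substitution; x is zero below row j.
        for i in range(j, -1, -1):
            rhs = 1.0 if i == j else 0.0
            rhs -= sum(a[i][k] * inv[k][j] for k in range(i + 1, j + 1))
            inv[i][j] = rhs if unit_diag else rhs / a[i][i]
    return inv, 0


# Example: a nonsingular matrix and a singular one.
inverse, info_ok = invert_upper_triangular([[2.0, 1.0], [0.0, 4.0]])
_, info_singular = invert_upper_triangular([[1.0, 2.0], [0.0, 0.0]])
```

When unit_diag is true the diagonal is taken to be one and is not read, which mirrors the diag = ’U’ option.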

NOTES
BLACS_GRIDINIT(3S) must be called to initialize the virtual BLACS grid.

SEE ALSO
BLACS_GRIDINIT(3S), DESCINIT(3S), NUMROC(3S)



PSTRTRS ( 3S ) PSTRTRS ( 3S )

NAME
PSTRTRS, PCTRTRS – Solves a real or complex distributed triangular system

SYNOPSIS
CALL PSTRTRS (uplo, trans, diag, n, nrhs, A, iA, jA, descA, B, iB, jB, descB, info)
CALL PCTRTRS (uplo, trans, diag, n, nrhs, A, iA, jA, descA, B, iB, jB, descB, info)

IMPLEMENTATION
UNICOS/mk systems

DESCRIPTION
PSTRTRS and PCTRTRS solve a real or complex triangular system of the form
sub(A) * X = sub(B)
or
sub(A)**T * X = sub(B)
where sub(A) denotes the following:
sub(A)=A(iA:iA+n-1,jA:jA+n-1)
and sub(A) is a triangular distributed matrix of order n, and the following is an n-by-nrhs distributed matrix
denoted by sub(B):
sub(B)=B(iB:iB+n-1,jB:jB+nrhs-1)
A check is made to verify that sub(A) is nonsingular.
These routines require square block decomposition (MB_A = NB_A, as defined in the following comments).
A description vector is associated with each two-dimensional (2D) block-cyclicly distributed matrix. This
vector stores the information required to establish the mapping between a matrix entry and its corresponding
process and memory location.
The following comments describe the elements of a block-cyclicly distributed matrix. In these comments,
the underline character (_) should be read as "of the distributed matrix". Let A be a generic term for any 2D
block cyclicly distributed matrix. Its description vector is descA and must be initialized through a call to
DESCINIT(3S).
M_A The number of rows in the distributed matrix.
N_A The number of columns in the distributed matrix.
MB_A The blocking factor used to distribute the rows of the matrix.
NB_A The blocking factor used to distribute the columns of the matrix.
RSRC_A The process row over which the first row of the matrix is distributed.
CSRC_A The process column over which the first column of the matrix is distributed.

CTXT_A The BLACS context handle, indicating the BLACS process grid A is distributed over. The
context itself is global, but the handle (the integer value) may vary.
LLD_A The leading dimension of the local array storing the local blocks of the distributed matrix A.
LLD_A ≥ MAX(1,LOCp(M_A)).
Let K be the number of rows or columns of a distributed matrix, and assume that its process grid has
dimension p-by-q. LOCp( K ) denotes the number of elements of K that a process would receive if K were
distributed over the p processes of its process column.
Similarly, LOCq( K ) denotes the number of elements of K that a process would receive if K were distributed
over the q processes of its process row.
The values of LOCp() and LOCq() may be determined via a call to the NUMROC(3S) ScaLAPACK tool
function, as in the following:
LOCp(M) = NUMROC(M, MB_A, MYROW, RSRC_A, NPROW)
LOCq(N) = NUMROC(N, NB_A, MYCOL, CSRC_A, NPCOL)
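The counting rule behind LOCp() and LOCq() can be sketched in a few lines. The following Python function is an illustration only, not the library routine: it assumes the usual block-cyclic rule (whole blocks are dealt out round-robin starting at the source process, and the last partial block goes to the next process in line). The function name numroc and the sample sizes are chosen for the illustration.

```python
def numroc(n, nb, iproc, isrcproc, nprocs):
    """Count the rows (or columns) of a block-cyclically distributed
    dimension of length n that land on process iproc.

    nb       -- blocking factor (MB_A or NB_A)
    isrcproc -- process that owns the first block (RSRC_A or CSRC_A)
    nprocs   -- number of processes along this grid dimension
    """
    # Distance of this process from the source process, going around the ring.
    mydist = (nprocs + iproc - isrcproc) % nprocs
    nblocks = n // nb                      # number of whole blocks
    count = (nblocks // nprocs) * nb       # every process gets this many
    extrablks = nblocks % nprocs           # whole blocks left over
    if mydist < extrablks:
        count += nb                        # one extra whole block
    elif mydist == extrablks:
        count += n % nb                    # the trailing partial block
    return count


# 10 rows, blocking factor 3, two process rows, first block on process row 0.
locp = [numroc(10, 3, myrow, 0, 2) for myrow in range(2)]
print(locp)
```

In this sample, process row 0 holds blocks 1 and 3 (6 rows) and process row 1 holds block 2 plus the one-row remainder (4 rows), so the counts add up to the 10 global rows.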

These routines accept the following arguments. For PCTRTRS, the following real arguments must be
complex:
uplo Character. (input)
uplo = ’U’: sub(A)=A(iA:iA+n-1,jA:jA+n-1) is upper triangular.
uplo = ’L’: sub(A)=A(iA:iA+n-1,jA:jA+n-1) is lower triangular.
trans Character. (global input)
Specifies the form of the system of equations:
trans = ’N’: sub(A) * X = sub(B) (No transpose).
trans = ’T’: sub(A)^T * X = sub(B) (Transpose).
trans = ’C’: sub(A)^T * X = sub(B) (Transpose).
diag Character. (global input)
diag = ’N’: sub(A) is non-unit triangular
diag = ’U’: sub(A) is unit triangular
n Integer. (global input)
The number of rows and columns to be operated on (the order of the distributed submatrix
sub(A)). n must be ≥ 0.
nrhs Integer. (global input)
The number of right-hand sides (the number of columns of the distributed matrix sub(B)). nrhs
must be ≥ 0.
A Real pointer into the local memory to an array of dimension (LLD_A, LOCq(jA+n-1)). (local
input/local output)

If uplo = ’U’, the leading n-by-n upper triangular part of the matrix sub(A) contains the upper
triangular matrix, and the strictly lower triangular part of sub(A) is not referenced.
If uplo = ’L’, the leading n-by-n lower triangular part of the matrix sub(A) contains the lower
triangular matrix, and the strictly upper triangular part of sub(A) is not referenced.
If diag = ’U’, the diagonal elements of sub(A) are also not referenced and are assumed to be 1.
iA Integer. (global input)
The global row index of A, which points to the beginning of the submatrix that will be operated
on.
jA Integer. (global input)
The global column index of A which points to the beginning of the submatrix that will be
operated on.
descA Integer array of dimension 9. (input)
The array descriptor for the distributed matrix A.
B Real pointer into the local memory to an array of dimension (LLD_B, LOCq(jB+nrhs-1)). (local
input/local output)
On entry, the right-hand side distributed matrix sub(B).
On exit, if info = 0, sub(B) is overwritten by the solution distributed matrix X.
iB Integer. (global input)
The global row index of B, which points to the beginning of the submatrix that will be operated
on.
jB Integer. (global input)
The global column index of B, which points to the beginning of the submatrix that will be
operated on.
descB Integer array of dimension 9. (input)
The array descriptor for the distributed matrix B.
info Integer. (global output)
info = 0 Successful exit.
info < 0 If the ith argument is an array and the jth entry had an illegal value,
info = -(i*100+j); if the ith argument is a scalar and had an illegal value, info = -i.
info > 0 If info = i, the ith diagonal element of sub(A) is 0, which indicates that the
submatrix is singular and the solutions have not been computed.
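The info encoding described above can be unpacked mechanically. The following Python sketch is only an illustration of that convention; the helper name decode_info is hypothetical, and the sketch assumes that no routine has 100 or more arguments, so that a magnitude of 100 or more always denotes an array argument.

```python
def decode_info(info):
    """Translate an info return value into a readable message."""
    if info == 0:
        return "successful exit"
    if info > 0:
        return f"diagonal element {info} of sub(A) is zero; sub(A) is singular"
    code = -info
    if code >= 100:
        # info = -(i*100 + j): entry j of array argument i is illegal.
        arg, entry = divmod(code, 100)
        return f"entry {entry} of array argument {arg} had an illegal value"
    # info = -i: scalar argument i is illegal.
    return f"scalar argument {code} had an illegal value"


print(decode_info(-904))
print(decode_info(-4))
```

In the PSTRTRS argument list, argument 9 is descA and argument 4 is n, so -904 points at the fourth entry of descA and -4 points at n.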

NOTES
BLACS_GRIDINIT(3S) must be called to initialize the virtual BLACS grid.

SEE ALSO
BLACS_GRIDINIT(3S), DESCINIT(3S), NUMROC(3S)


INTRO_SPARSE ( 3S ) INTRO_SPARSE ( 3S )

NAME
INTRO_SPARSE – Introduction to solvers for sparse linear systems

IMPLEMENTATION
UNICOS systems

DESCRIPTION

The following table lists the purpose and name of the sparse linear system routines.

Purpose                                                                   Name

Assigns values to parameters in arguments iparam and rparam for           DFAULTS
SITRSOL (initializes iterative solver)
Solves a real general sparse system, using a preconditioned conjugate     SITRSOL
gradient-like method (iterative solver)
Factors a real sparse general matrix (direct solver, threshold pivoting)  SSGETRF
Solves a real sparse general system, using the factorization computed     SSGETRS
in SSGETRF (direct solver, threshold pivoting)
Factors a real sparse symmetric definite matrix (direct solver, no        SSPOTRF
pivoting)
Solves a real sparse symmetric definite system, using the factorization   SSPOTRS
computed in SSPOTRF (direct solver, no pivoting)
Factors a real sparse structurally symmetric matrix (direct solver, no    SSTSTRF
pivoting)
Solves a real sparse structurally symmetric system, using the             SSTSTRS
factorization computed in SSTSTRF (direct solver, no pivoting)

A sparse matrix is a matrix that has relatively few nonzero values. This type of matrix occurs frequently in
key computational steps of a variety of engineering and scientific applications. Most sparse matrix software
takes advantage of this "sparseness" to reduce the amount of storage and arithmetic required by keeping track
of only the nonzero entries in the matrix.

Storage Formats
Suppose that the n-by-n input matrix A has nza nonzero entries. The data structure used to represent A is a
column-oriented format, which is referred to as the sparse column format, in which the entries are grouped
by columns. In this format, the row indices of the nonzero elements in the first column are stored
contiguously in ascending order in an array irowind; then the row indices are stored for the second column,
and so on. The corresponding values are stored in an array values. A pointer array, icolptr, points to the
first entry in each column of A in irowind and values. icolptr(n+1) is set to nza+1. irowind and values are
arrays of length nza, and icolptr is of length n+1. Hence, 2*nza+n+1 words of storage are required to
represent A, rather than the usual n^2 words in the corresponding dense matrix format. Moreover, in the case
when A is symmetric, there is an even more compact symmetric column pointer format, in which only the
lower triangular part of A is stored.
Suppose A is a 5-by-5 matrix with 13 nonzero elements defined as follows:

    | 11   0   0  41   0 |
    |  0  22  32   0  52 |
A = |  0  32  33  43   0 |
    | 41   0  43  44   0 |
    |  0  52   0   0  55 |

The full sparse column format representation of A is as follows:
values = (11 41 22 32 52 32 33 43 41 43 44 52 55 )
irowind = ( 1 4 2 3 5 2 3 4 1 3 4 2 5 )
icolptr = ( 1 3 6 9 12 14 )
Because A is symmetric, the following symmetric sparse column format representation of A also is valid:
values = (11 41 22 32 52 33 43 44 55 )
irowind = ( 1 4 2 3 5 3 4 4 5 )
icolptr = ( 1 3 6 8 9 10 )
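The two representations above can be reproduced from the dense matrix with a short routine. The following Python sketch is an illustration only; the function name to_sparse_column is hypothetical and is not part of the library. It walks the matrix column by column, records 1-based row indices as in the Fortran arrays, and keeps only the lower triangle when the symmetric format is requested.

```python
def to_sparse_column(dense, symmetric=False):
    """Build (values, irowind, icolptr) for a square dense matrix.

    With symmetric=True only the lower triangle (row >= column) is
    stored, which is valid only if the matrix is symmetric.
    All indices are 1-based, as in the Fortran arrays.
    """
    n = len(dense)
    values, irowind, icolptr = [], [], [1]
    for j in range(n):
        first_row = j if symmetric else 0
        for i in range(first_row, n):
            if dense[i][j] != 0:
                values.append(dense[i][j])
                irowind.append(i + 1)
        # Pointer to the first entry of the next column;
        # after the last column this is nza + 1.
        icolptr.append(len(values) + 1)
    return values, irowind, icolptr


A = [
    [11, 0, 0, 41, 0],
    [0, 22, 32, 0, 52],
    [0, 32, 33, 43, 0],
    [41, 0, 43, 44, 0],
    [0, 52, 0, 0, 55],
]

full = to_sparse_column(A)
sym = to_sparse_column(A, symmetric=True)
print(full)
print(sym)
```

For this matrix the full format needs 2*13+5+1 = 32 words and the symmetric format needs 2*9+5+1 = 24 words, compared with 25 words for the dense array; the saving grows quickly as n increases and the matrix stays sparse.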
Direct Versus Iterative Solution
Techniques for the solution of sparse linear systems can be divided into two broad classes: direct and
iterative.
Direct solution
An explicit factorization of the matrix is computed, and it is used to solve for a solution of the linear system
given a right-hand side. The solution obtained by direct methods is certain to be as accurate as the problem
definition.

Iterative solution
A sequence of approximations is generated iteratively, which should converge to a solution of the linear
system. Unlike direct methods, iterative methods tend to be more special-purpose, and it is well known that
no general, effective iterative algorithms exist for an arbitrary sparse linear system. However, for certain
classes of problems, the use of an appropriate iterative method can yield an approximate solution
significantly faster than direct methods. Also, iterative methods typically require less memory than direct
methods, making iterative methods the only feasible approach for some large problems. In an attempt to
compensate for the lack of robustness of any single iterative method and preconditioner, this package
provides a variety of methods and preconditioners. All are preconditioned conjugate gradient-type methods.
You can find a reference to a good introduction to these methods in the SEE ALSO section.
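To make the idea of a sequence of approximations concrete, the following Python sketch runs a plain (unpreconditioned) conjugate gradient iteration on the 5-by-5 symmetric positive definite matrix used in the EXAMPLES section, stored in full sparse column format. It is an illustration under stated assumptions, not the algorithm implemented in SITRSOL: the function names colmatvec and cg, the zero starting vector, and the tolerance are all chosen for the illustration, and no preconditioning or scaling is applied.

```python
def colmatvec(icolptr, irowind, values, x):
    """Compute y = A*x for A stored in full sparse column format (1-based)."""
    y = [0.0] * len(x)
    for j in range(len(x)):
        for k in range(icolptr[j] - 1, icolptr[j + 1] - 1):
            y[irowind[k] - 1] += values[k] * x[j]
    return y


def cg(matvec, b, x0, tol=1e-12, maxiter=500):
    """Plain conjugate gradient for a symmetric positive definite system.

    Stops when ||r||_2 <= tol * ||b||_2 and returns (x, iterations used).
    """
    x = list(x0)
    r = [bi - ai for bi, ai in zip(b, matvec(x))]
    p = list(r)
    rs = sum(v * v for v in r)
    bnorm = sum(v * v for v in b) ** 0.5
    for it in range(maxiter):
        if rs ** 0.5 <= tol * bnorm:
            return x, it
        ap = matvec(p)
        alpha = rs / sum(pi * ai for pi, ai in zip(p, ap))
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * ai for ri, ai in zip(r, ap)]
        rs_new = sum(v * v for v in r)
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x, maxiter


# Matrix and right-hand side from program EX1 (full column pointer format).
colu = [1, 4, 7, 9, 12, 14]
rowu = [1, 2, 4, 1, 2, 3, 2, 3, 1, 4, 5, 4, 5]
amatu = [4., -1., -1., -1., 4., -1., -1., 4., -1., 4., -1., -1., 4.]
b = [2., 2., 3., 2., 3.]

x, iters = cg(lambda v: colmatvec(colu, rowu, amatu, v), b, [0.0] * 5)
print(iters, x)
```

The exact solution of this system is a vector of ones, so the computed x can be compared against it directly, as program EX1 does for the library solvers.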
Analyze Phase for the Direct Sparse Solvers
In the direct solution of sparse linear systems, the structure of the input matrix usually is preprocessed prior
to the numerical factorization and the numerical solution phase. This is often referred to as the Analyze
phase. Only the structure of the matrix (that is, icolptr and irowind) is required at this stage. As
implemented in the package, the Analyze phase is further divided into the following:
• Fill-reduction reordering phase
• Symbolic factorization phase
• Execution sequence and memory management phase
Fill-reduction reordering phase
For a given sparse symmetric matrix A, the lower triangular matrix L from the LDL^T factorization of A is
generally much more dense than A because of the fill-in generated at locations in which A(i,j) = 0. To reduce
this amount of fill-in, the routine applies an appropriate symmetric row and column permutation P to A
before carrying out the numerical factorization on PAP^T. The system to be solved is then
(PAP^T) y = Pb, x = P^T y.
The reordering heuristic used in the package is based on the multiple minimum degree algorithm (see the
SEE ALSO section for a reference), which has proven to be a very effective practical method for reducing
the amount of fill-in created during the factorization. Moreover, in most problems, some of the columns of
the resultant factor L naturally have identical sparsity structure. These columns are grouped into what is
commonly referred to as a supernode, and they are processed together in subsequent stages. This results in
significant performance improvement over previous sparse matrix solvers. The supernode concept can be
relaxed further by allowing additional fill-ins in L, so that more columns can be grouped together, resulting
in fewer and larger supernodes.
Experience shows that more often than not this trade-off of additional fill-ins (and therefore, more
operations) for fewer but larger supernodes reduces the execution time overall.
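The effect of the ordering on fill-in can be seen on a small "arrow" matrix, in which one variable is coupled to all the others. The following Python sketch is an illustration only and does not implement the multiple minimum degree heuristic used in the package: it simply counts the fill-in that symmetric elimination creates for a given ordering, using the standard elimination-graph rule that eliminating a variable connects all of its not-yet-eliminated neighbors. The function name symbolic_fill and the sample pattern are hypothetical.

```python
def symbolic_fill(n, pattern, order):
    """Count fill-in entries created by symmetric elimination.

    n       -- matrix order
    pattern -- (i, j) pairs of off-diagonal nonzeros (0-based, one per pair)
    order   -- order[k] is the original variable eliminated at step k
    """
    pos = {v: k for k, v in enumerate(order)}   # variable -> elimination step
    adj = [set() for _ in range(n)]
    for i, j in pattern:
        a, b = pos[i], pos[j]
        adj[a].add(b)
        adj[b].add(a)
    fill = 0
    for k in range(n):
        # Neighbors of step k that have not been eliminated yet.
        later = sorted(v for v in adj[k] if v > k)
        for x in range(len(later)):
            for y in range(x + 1, len(later)):
                u, w = later[x], later[y]
                if w not in adj[u]:
                    adj[u].add(w)
                    adj[w].add(u)
                    fill += 1
    return fill


# Arrow matrix of order 5: variable 0 is coupled to variables 1 through 4.
arrow = [(i, 0) for i in range(1, 5)]

fill_natural = symbolic_fill(5, arrow, [0, 1, 2, 3, 4])
fill_reversed = symbolic_fill(5, arrow, [4, 3, 2, 1, 0])
print(fill_natural, fill_reversed)
```

Eliminating the densely coupled variable first fills the whole remaining lower triangle, whereas eliminating it last creates no fill-in at all; a fill-reduction ordering looks for permutations of the second kind.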
Symbolic factorization phase
Given the structure of the input matrix and a permutation matrix P as determined from the fill-reduction
reordering phase, the symbolic factorization phase builds the data structure for the nonzero entries of L.

Execution sequence and memory management phase


The multifrontal method uses update matrices to carry the intermediate results from the variables (nodes)
being eliminated to the variables (nodes) that are not yet processed.
Before the elimination of a variable, update matrices that correspond to previously eliminated variables may
have to be "assembled" to form the current frontal matrix. The partial factorization of the current frontal
matrix is then carried out, and its update matrix is generated. This phase finds the processing sequence and
the amount of storage needed to store the temporary update matrices.

EXAMPLES
The following examples show the use of the iterative and direct sparse solver routines.
      PROGRAM EX1
      PARAMETER (NMAX = 5, NZAS = 9, NZAU = 13)
      PARAMETER (LIWORK = 350, LWORK = LIWORK)
      INTEGER NEQNS, NZA, IPATH, IERR, ROWU(NZAU), COLU(NZAU),
     &        ROWS(NZAS), COLS(NMAX+1), IWORK(LIWORK), IPARAM(40)
      REAL AMATU(NZAU), AMATS(NZAS), RPARAM(30), X(NMAX), B(NMAX),
     &     BGE(NMAX), BPO(NMAX), BTS(NMAX), SOLN(NMAX), WORK(LWORK)
      CHARACTER*3 METHOD
c
c     ---------------------------------
c     Define matrix, solution and RHS
c     ---------------------------------
c
c.....Full column pointer format
      DATA COLU / 1, 4, 7, 9, 12, 14/
      DATA ROWU / 1, 2, 4, 1, 2, 3, 2, 3, 1, 4, 5, 4, 5/
      DATA AMATU / 4.,-1.,-1.,-1., 4.,-1.,-1., 4.,-1., 4.,-1.,-1., 4./
c
c.....Symmetric column pointer format
      DATA COLS / 1, 4, 6, 7, 9, 10/
      DATA ROWS / 1, 2, 4, 2, 3, 3, 4, 5, 5/
      DATA AMATS / 4.,-1.,-1., 4.,-1., 4., 4.,-1., 4./
c
      DATA SOLN / 1., 1., 1., 1., 1. /
      DATA B / 2., 2., 3., 2., 3. /
      DATA BGE / 2., 2., 3., 2., 3. /
      DATA BPO / 2., 2., 3., 2., 3. /
      DATA BTS / 2., 2., 3., 2., 3. /
c
      NEQNS = 5
c
c     -----------------------------
c     Solve problem using SITRSOL
c     -----------------------------
c
c.....Let the initial guess for x be random numbers between 0 and 1
      DO 20 I = 1, NEQNS
         X(I) = RANF()
   20 CONTINUE
c
c.....Set default parameter values
      CALL DFAULTS ( IPARAM, RPARAM )
c
c.....Select left Least-squares preconditioning
      IPARAM(9) = 1
      IPARAM(10) = 5
c
c.....Call SITRSOL to solve the problem using PCG
      IPATH = 2
      METHOD = 'PCG'
      CALL SITRSOL ( METHOD, IPATH, NEQNS, NEQNS, X, B, COLS, ROWS,
     &     AMATS, LIWORK, IWORK, LWORK, WORK, IPARAM, RPARAM, IERR )
c
c     ------------------------------------------
c     Solve same problem using SSGETRF/SSGETRS
c     ------------------------------------------
c
c.....use all default values
      IPARAM(1) = 0
c.....do all 4 phases of factorization
      IDO = 14
c.....threshold for pivoting
      THRESH = 0.1
c
c.....compute factorization using SSGETRF
      CALL SSGETRF ( IDO, NEQNS, COLU, ROWU, AMATU, LWORK,
     &               WORK, IPARAM, THRESH, IERR )
c
c.....compute solution using SSGETRS
c
c.....solve standard way
      IDO = 1
c.....solve for 1 RHS with leading dim = neqns
      NRHS = 1
      LDB = NEQNS
c
      CALL SSGETRS ( IDO, LWORK, WORK, NRHS, BGE, LDB,
     &               IPARAM, IERR )
c
c     ------------------------------------------
c     Solve same problem using SSPOTRF/SSPOTRS
c     ------------------------------------------
c
c.....use all default values
      IPARAM(1) = 0
c.....do all 4 phases of factorization
      IDO = 14
c
c.....compute factorization using SSPOTRF
      CALL SSPOTRF ( IDO, NEQNS, COLS, ROWS, AMATS, LWORK,
     &               WORK, IPARAM, IERR )
c
c.....compute solution using SSPOTRS
c
c.....solve standard way
      IDO = 1
c.....solve for 1 RHS with leading dim = neqns
      NRHS = 1
      LDB = NEQNS
c
      CALL SSPOTRS ( IDO, LWORK, WORK, NRHS, BPO, LDB,
     &               IPARAM, IERR )
c
c     ------------------------------------------
c     Solve same problem using SSTSTRF/SSTSTRS
c     ------------------------------------------
c
c.....use all default values
      IPARAM(1) = 0
c.....do all 4 phases of factorization
      IDO = 14
c
c.....compute factorization using SSTSTRF
      CALL SSTSTRF ( IDO, NEQNS, COLU, ROWU, AMATU, LWORK,
     &               WORK, IPARAM, IERR )
c
c.....compute solution using SSTSTRS
c
c.....solve standard way
      IDO = 1
c.....solve for 1 RHS with leading dim = neqns
      NRHS = 1
      LDB = NEQNS
c
      CALL SSTSTRS ( IDO, LWORK, WORK, NRHS, BTS, LDB,
     &               IPARAM, IERR )
c
c     -------------------------------------
c     Compare solutions to exact solution
c     -------------------------------------
c
c.....Compute two-norm of the difference between exact and computed
c     for all solution techniques (SSxxTRS solution is in Bxx)
c
c.....compute differences
      CALL SAXPY ( NEQNS, -1., SOLN, 1, X, 1 )
      CALL SAXPY ( NEQNS, -1., SOLN, 1, BGE, 1 )
      CALL SAXPY ( NEQNS, -1., SOLN, 1, BPO, 1 )
      CALL SAXPY ( NEQNS, -1., SOLN, 1, BTS, 1 )
c
c.....compute norms
      ERRI = SNRM2( NEQNS, X, 1 )
      ERRGE = SNRM2( NEQNS, BGE, 1 )
      ERRPO = SNRM2( NEQNS, BPO, 1 )
      ERRTS = SNRM2( NEQNS, BTS, 1 )
c
c.....print results
      WRITE(6,11) ERRI, ERRGE, ERRPO, ERRTS
   11 FORMAT ('***** Output from program: EX1 *****', /
     &        ' Iterative solution error ',E15.8, /
     &        ' "GE " solution error ',E15.8, /
     &        ' "PO " solution error ',E15.8, /
     &        ' "TS " solution error ',E15.8 )
c.....all done
      END
Program EX1 yields the following output on CRAY Y-MP systems:
***** Output from program: EX1 *****
 Iterative solution error  0.12306961E-13
 "GE " solution error  0.20097183E-13
 "PO " solution error  0.20097183E-13
 "TS " solution error  0.20097183E-13

SEE ALSO
Golub, G. H. and C. F. Van Loan, Matrix Computations, second edition. Baltimore, MD: Johns Hopkins
University Press, 1989.
Liu, J. W., "Modification of the Minimum Degree Algorithm by Multiple Elimination," ACM Transactions
on Mathematical Software, 11, (1985): pp. 141–153.


DFAULTS ( 3S ) DFAULTS ( 3S )

NAME
DFAULTS – Assigns default values to the parameter arguments for SITRSOL(3S)

SYNOPSIS
CALL DFAULTS (iparam, rparam)

IMPLEMENTATION
UNICOS systems

DESCRIPTION
Without DFAULTS, users of SITRSOL would have to explicitly define each required parameter in the iparam and
rparam array arguments. DFAULTS lets you easily assign default values to the parameters in iparam and
rparam. After you set the default values by using DFAULTS, you can then change any of the parameter
values explicitly, as needed.
This routine has the following arguments:
iparam Integer array of dimension 40. (output)
Array of integer parameters required by SITRSOL.
rparam Real array of dimension 30. (output)
Array of real parameters required by SITRSOL.
To see the complete range of valid values for these arguments, see the SITRSOL(3S) man page.
Many of these parameters are set on exit from SITRSOL. Calling DFAULTS after a call to SITRSOL
overwrites the values that SITRSOL returned in these parameters.
The default values for iparam and rparam (output of DFAULTS) are as follows:
iparam
iparam(1): isym Full or symmetric format flag.
=1 Matrix is in symmetric column pointer format.
iparam(2): itest Stopping criterion.
=0 Use ’natural’ (cheapest) stopping criterion for the chosen iterative
method.
iparam(3): maxiter Maximum number of iterations allowed.
= 500
iparam(4): niter On exit, SITRSOL sets this to the number of iterations actually performed.
=0
iparam(5): msglvl Flag to control the level of messages output.
=2 Warning and fatal messages only.

iparam(6): iunit Unit number for output information.
=6
iparam(7): iscale Diagonal scaling option.
=1 Apply symmetric diagonal scaling to make the diagonal of the scaled
matrix all 1’s.
iparam(8): isympap Full or symmetric jagged diagonal format. (input)
Matrix vector product uses jagged diagonal format to achieve faster performance. If
mvformat = 0, this parameter is ignored.
=0 Convert A to full jagged diagonal format when A is in symmetric sparse
column format. If isym = 1, this requires more storage, but it provides
much faster performance than the symmetric jagged diagonal format.
iparam(9): iprelrb Applying preconditioning.
=0 Do not apply preconditioning.
iparam(10): ipretyp Type of preconditioning to use.
=0 No preconditioning.
iparam(11): lvlfill Level of fill-in allowed if incomplete factorization is used.
=0
iparam(12): maxlfil Maximum amount of fill allowed in the lower triangular factor of the incomplete
factorization with lvlfill > 0.
On exit, SITRSOL sets maxlfil to the amount of fill created.
=0 Use a crude estimate determined by using maxlfil = (1 + lvlfill)nza; nza
is the number of nonzero elements of A.
iparam(13): maxufil Maximum amount of fill allowed in the upper triangular factor of the incomplete LU
factorization with lvlfill > 0.
On exit, SITRSOL sets maxufil to the amount of fill created.
=0 Use a crude estimate determined by using maxufil = (1 + lvlfill)nza; nza
is the number of nonzero elements of A.
iparam(14): ifcomp Incomplete factor compute flag.
=1 Compute the incomplete factorization.
iparam(15): kdegree Degree of the polynomial preconditioner.
=2
iparam(16): ntrunc Number of vectors to be saved in GMRES[k] and OMN[k].
= 10

iparam(17): nvorth Number of previous Krylov basis vectors to which each new basis vector is made
orthogonal. (GMRES method only.)
= 10
iparam(18): nrstrt Number of iterations between restart in OMN[k].
= 20
iparam(19): irestrt Save-and-restart control flag.
=0 No save-and-restart.
iparam(20): iosave Save-and-restart unit number of the unformatted file, which is assumed to have
been opened by the user.
=0 This is not a valid unit number for the save-and-restart operation. If
you change the value of irestrt to enable save-and-restart, you also must
change the value of iosave.
iparam(21): mvformat Desired format for computation of matrix-vector products.
=1 Use jagged diagonal form. This requires more storage, but it offers
faster performance.
iparam(22): nicfmax Maximum number of times to try IC[k] factorization by using shifted IC
factorization. (See rparam(15) and rparam(16).)
= 11
iparam(23): nicfacs On exit, SITRSOL sets this to the actual number of shifted IC[k] factorizations
tried. (See rparam(15) and rparam(16).)
=0
iparam(24) — iparam(40)
Presently unused. These parameters are reserved for future use.
rparam
rparam(1): tol Stopping criterion tolerance.
= 1.0E-6
rparam(2): err On exit, SITRSOL sets this to the computed error estimate at each iteration.
= 0.0
rparam(3): alpha Absolute value of the estimate of the smallest eigenvalue of A. Currently, this
parameter is unused and is assumed to be 0.
= 0.0
rparam(4): beta Absolute value of the estimate of the largest eigenvalue of A. This is needed only
by the least-squares polynomial preconditioner (ipretyp=5).
= 0.0 SITRSOL computes an estimate of the spectral radius.

rparam(5): timscal On exit, SITRSOL sets this to the accumulated time (in seconds) to scale and
unscale the user matrix. (See the NOTES section.)
= 0.0
rparam(6): timsets On exit, SITRSOL sets this to the accumulated time (in seconds) to compute the
symbolic incomplete factorization. (See the NOTES section.)
= 0.0
rparam(7): timsetn On exit, SITRSOL sets this to the accumulated time (in seconds) to compute the
numerical incomplete factorization. (See the NOTES section.)
= 0.0
rparam(8): timset On exit, SITRSOL sets this to the accumulated total time (in seconds) to perform
the preconditioner setup. If incomplete factorization is used, this includes both
timsets and timsetn. (See the NOTES section.)
= 0.0
rparam(9): timpre On exit, SITRSOL sets this to the accumulated total time (in seconds) to apply the
preconditioner in the iteration phase of the solution process. (See the NOTES
section.)
= 0.0
rparam(10): timmvs On exit, SITRSOL sets this to the accumulated time (in seconds) to convert from
column pointer to jagged diagonal format. If parallel processing is used, this also
includes the setup time to perform the parallel matrix vector operations. (See the
NOTES section.)
= 0.0
rparam(11): timmv On exit, SITRSOL sets this to the accumulated time (in seconds) to perform the
matrix vector product (not including those performed in applying the polynomial
preconditioners). (See the NOTES section.)
= 0.0
rparam(12): timmtv On exit, SITRSOL sets this to the accumulated time (in seconds) to perform the
transpose matrix vector product (not including those in applying the polynomial
preconditioners). (See the NOTES section.)
= 0.0
rparam(13): timit On exit, SITRSOL sets this to the accumulated time (in seconds) spent in the
iterative routine (not including the time spent computing matrix vector products or
applying the preconditioners). (See the NOTES section.)
= 0.0
rparam(14): timtot On exit, SITRSOL sets this to the accumulated total time (in seconds) for this call
to SITRSOL, plus that of previous calls if not reset. (See the NOTES section.)

= 0.0
rparam(15): gammin Minimum value for shift factor γ. For some problems, IC[k] preconditioning fails in
the factorization. In many cases, "shifting" the diagonal elements allows the
factorization to be computed for this modified matrix.
= 0.0
rparam(16): gammax Maximum value for shift factor γ. If the IC[k] factorization fails, SITRSOL
increments γ and tries again. γ may take on nicfmax different values between
gammin and gammax.
On exit from SITRSOL (if nicfacs > 1), gammax contains the actual value of
γ used to compute the factorization.
= 0.3
rparam(17) – rparam(30)
Presently unused. These parameters are reserved for future use.
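The defaults listed above, and the usual pattern of overriding a few of them before calling the solver, can be summarized as follows. This Python sketch is only an illustration of the documented values; it is not the library routine. The arrays are given one unused leading element so that the indices match the Fortran subscripts, the unused trailing elements are assumed to be zero, and the value of tol is assumed to be 1.0E-6.

```python
def dfaults():
    """Return (iparam, rparam) filled with the documented default values.

    Element 0 of each list is unused so that iparam[k] corresponds to
    the Fortran element iparam(k).
    """
    iparam = [0] * 41
    rparam = [0.0] * 31
    iparam[1] = 1      # isym: symmetric column pointer format
    iparam[2] = 0      # itest: natural stopping criterion
    iparam[3] = 500    # maxiter
    iparam[5] = 2      # msglvl: warning and fatal messages only
    iparam[6] = 6      # iunit
    iparam[7] = 1      # iscale: symmetric diagonal scaling
    iparam[14] = 1     # ifcomp: compute the incomplete factorization
    iparam[15] = 2     # kdegree
    iparam[16] = 10    # ntrunc
    iparam[17] = 10    # nvorth
    iparam[18] = 20    # nrstrt
    iparam[21] = 1     # mvformat: jagged diagonal form
    iparam[22] = 11    # nicfmax
    rparam[1] = 1.0e-6 # tol (assumed value)
    rparam[16] = 0.3   # gammax
    return iparam, rparam


# Typical use: take the defaults, then change only what is needed.
iparam, rparam = dfaults()
iparam[9] = 1    # iprelrb: left preconditioning
iparam[10] = 5   # ipretyp: least-squares polynomial preconditioner
print(iparam[1:24])
print(rparam[1:17])
```

The two overrides shown are the ones used in the example program in INTRO_SPARSE(3S); every element not set explicitly keeps its default of zero.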

NOTES
If the timing parameters, rparam(5) through rparam(14), are not reset to 0.0 (for example, by DFAULTS),
timing information for subsequent calls to SITRSOL will be added to existing timing information.
If multiple CPUs are used, rparam(5) through rparam(14) report the cumulative time for all CPUs.

SEE ALSO
INTRO_SPARSE(3S) for an example of a Fortran program that uses DFAULTS and SITRSOL
SITRSOL(3S) for a more complete description of iparam and rparam
Scientific Libraries User’s Guide


SITRSOL ( 3S ) SITRSOL ( 3S )

NAME
SITRSOL – Solves a real general sparse system, using a preconditioned conjugate gradient-like method

SYNOPSIS
CALL SITRSOL (method, ipath, neqns, nvars, x, b, icolptr, irowind, value, liwork, iwork,
lrwork, rwork, iparam, rparam, ierr)

IMPLEMENTATION
UNICOS systems

DESCRIPTION
SITRSOL uses any of several iterative techniques to solve a real general sparse system of equations.
Because no single robust iterative algorithm for solving sparse linear systems exists, SITRSOL lets users
select from a wide variety of iterative techniques, preconditioning schemes, and many tuning parameters.
You can initialize the iparam and rparam tuning parameter arguments by using a call to DFAULTS. Then
you need to change only selected parameter values, rather than setting up the entire arrays of parameters
manually.
This routine has the following arguments:
method Character*3. (input)
Name used to select the iterative method.
= ’BCG’ Biconjugate gradient method
= ’CGN’ Conjugate gradient method applied to the equations:
A A^T y = b, x = A^T y (Craig’s method)
= ’CGS’ Conjugate gradient squared method
= ’GMR’ Generalized minimum residual (GMRES) method
= ’OMN’ Orthomin or generalized conjugate residual (GCR) method
= ’PCG’ Preconditioned conjugate gradient method
ipath Integer. (input)
Value used to control the execution path in the solver. This argument is useful when the driver is
used to solve similar problems or a large problem in pieces.
= 1 Processes only the structure of the matrix. No solution is computed.
= 2 Processes both the structure and values of the matrix. The solution is computed.
= 3 Processes only the values of the matrix. The solution is computed. It is assumed that
SITRSOL has been called with ipath equal to 1 or 2 and that the structure previously set up
is used.
= 4 Solves the same linear system with different right-hand side.
= 5 Restarts from a previously saved run.
neqns Integer. (input)
Number of equations (rows) in the system.

nvars Integer. (input)
Number of variables (columns) in the system. Currently, only square matrices are allowed;
therefore, nvars must equal neqns.
x Real array of dimension nvars. (input and output)
On input, x must contain an initial approximation to the solution vector of the system. On output,
x contains an approximation to the solution vector as computed by the chosen iterative scheme.
b Real array of dimension neqns. (input and output)
On input, b contains the right-hand side vector of the linear system. On output, if scaling was
used, b can be changed slightly, because b is scaled and then unscaled.
icolptr Integer array of dimension neqns + 1. (input)
Column pointer array for the sparse matrix A. The first and last elements of the array must be set
as follows:
icolptr(1) = 1 icolptr(neqns+1) = nza + 1
where nza is the number of nonzero elements in the sparse matrix A.
irowind Integer array of dimension nza (see icolptr). (input)
Row indices array for the sparse matrix A.
value Real array of dimension nza (see icolptr). (input)
Array of nonzero values for the sparse matrix A. Taken together, icolptr, irowind, and value
arguments contain the input matrix in sparse column format. See the introduction to the sparse
solvers (INTRO_SPARSE(3S)) for a full description of the sparse column format.
liwork Integer. (input)
Length of the work array iwork. Workspace requirements vary from phase to phase. See the
Workspace subsection.
iwork Integer array of dimension liwork. (output)
Work array. On output, the first four elements of iwork have the following special meanings:
iwork(1) Amount of real workspace needed at the time of exit
iwork(2) Amount of integer workspace needed at the time of exit
iwork(3) Amount of real workspace that must be left untouched if subsequent calls to SITRSOL
will be made with ipath set to 3 or 4
iwork(4) Amount of integer workspace that must be left untouched if subsequent calls to
SITRSOL will be made with ipath set to 3 or 4
If SITRSOL aborts because not enough workspace is available, you can use iwork(1) and iwork(2)
to determine how much workspace is needed.
lrwork Integer. (input)
Length of the work array rwork. Workspace requirements vary from phase to phase. See the
Workspace subsection.
rwork Real array of dimension lrwork. (output)
Work array.

iparam Integer array of dimension 40. (input and output)
Parameter array.
rparam Real array of dimension 30. (input and output)
Parameter array. On input, most of the elements of iparam and rparam contain user control
parameters. To assign default values for these parameters, call the DFAULTS(3S) routine, as
follows:
CALL DFAULTS (iparam, rparam)
On output, some values of iparam and rparam are changed to report what occurred in the solution
process. See the Parameters subsection.
ierr Integer. (output)
Error code. A six-digit code to report any error condition detected by the iterative solver.
= 0 Normal completion
> 0 Warning error
< 0 Fatal error
See the Error Code subsection.
Parameters
You can assign default values to the required parameters in iparam and rparam by calling the DFAULTS
routine, as noted previously. After calling DFAULTS, you can then reset any parameter to a more
appropriate value. See the example in the introduction to the sparse solvers, INTRO_SPARSE(3S).
The parameters and their default values (as assigned by DFAULTS) are as follows:
iparam(1): isym Full or symmetric format flag. (input)
= 0 Matrix is in full column pointer format.
= 1 Matrix is in symmetric column pointer format.
DFAULTS returns isym=1.
iparam(2): itest Stopping criterion. (input)
= 0 Use "natural" (cheapest) stopping criterion for the chosen iterative method.
= 1 Use the 2-norm of the residual divided by the 2-norm of the right-hand side.
= 2 Use the 2-norm of the preconditioned residual divided by the 2-norm of the
preconditioned right-hand side.
If method = ’BCG’, ’CGN’, or ’PCG’, itest = 0 has the same effect as itest = 1.
If method = ’CGS’, ’GMR’, or ’OMN’, itest = 0 has the same effect as itest = 2.
DFAULTS returns itest=0.
iparam(3): maxiter Maximum number of iterations allowed. (input)
DFAULTS returns maxiter=500.
iparam(4): niter Number of iterations actually performed. (output)
DFAULTS returns niter=0.
iparam(5): msglvl Flag to control the level of messages output. (input)
= 0 No output.
= 1 Fatal messages only.

=2 Warning and fatal messages only.


=3 Warning and fatal messages and a brief summary.
=4 Warning and fatal messages, information about each iteration, and a brief
summary.
DFAULTS returns msglvl=2.
iparam(6): iunit Unit number for output information. (input)
DFAULTS returns iunit=6.
iparam(7): iscale Diagonal scaling option. (input)
= 0 No scaling is applied to the system
= 1 Apply symmetric diagonal scaling to make the diagonal of the scaled matrix all
1’s
= 2 Apply symmetric row-sum scaling
= 3 Apply symmetric column-sum scaling
= 4 Apply left-diagonal scaling to make the diagonal of the scaled matrix all 1’s
= 5 Apply left-row scaling to make the infinity-norm of the scaled matrix = 1
= 6 Apply right-column scaling to make the 1-norm of the scaled matrix = 1
DFAULTS returns iscale=1.
iparam(8): isympap Full or symmetric jagged diagonal format. (input)
Matrix vector product uses jagged diagonal format to improve performance. If
mvformat = 0, this parameter is ignored.
= 0 Convert A to full jagged diagonal format when A is in symmetric sparse column
format. If isym = 1, this requires more storage, but it provides much better
performance than the symmetric jagged diagonal format.
= 1 Convert A to symmetric jagged diagonal format if A is in symmetric sparse
column format.
DFAULTS returns isympap = 0.
iparam(9): iprelrb Applying preconditioning. (input)
= 0 Do not apply preconditioning.
= 1 Apply left preconditioning.
= 2 Apply right preconditioning (not available for the PCG and CGN methods).
= 3 Apply two-sided preconditioning (not available for the PCG and CGN methods;
only for use with IC[k] and ILU[k] preconditioners).
DFAULTS returns iprelrb = 0.
iparam(10): ipretyp Type of preconditioning to use. (input)
= 0 No preconditioning.
= 1 Diagonal (Jacobi) preconditioning.
= 2 Incomplete Cholesky factorization (IC[k]).
= 3 Incomplete LU factorization (ILU[k]).
= 4 Truncated Neumann polynomial expansion.
= 5 Truncated least-squares polynomial expansion.
DFAULTS returns ipretyp = 0.
iparam(11): lvlfill Level of fill-in allowed if incomplete factorization is used. (input)
DFAULTS returns lvlfill = 0.
iparam(12): maxlfil Amount of fill in lower triangular factor. (input and output)
On input, maxlfil controls the amount of fill allowed in the lower triangular factor of
the incomplete factorization with lvlfill < 0.
> 0 maxlfil is the maximum fill allowed.
= 0 Uses the following crude estimate:
maxlfil = (1 + lvlfill) * nza
where nza is the number of nonzero elements in the sparse matrix A.
On output, maxlfil contains the amount of fill created.
DFAULTS returns maxlfil = 0.
iparam(13): maxufil Amount of fill in upper triangular factor. (input and output)
On input, maxufil controls the amount of fill allowed in the upper triangular factor of
the incomplete factorization with lvlfill < 0.
> 0 maxufil is the maximum fill allowed
= 0 Uses the following crude estimate:
maxufil = (1 + lvlfill) * nza
On output, maxufil contains the amount of fill created.
DFAULTS returns maxufil = 0.
iparam(14): ifcomp Incomplete factor compute flag. (input)
= 0 Do not compute the incomplete factorization. This is useful when a previous
run has already computed the incomplete factors for a slightly perturbed system.
= 1 Compute the incomplete factorization.
DFAULTS returns ifcomp = 1.
iparam(15): kdegree Degree of the polynomial preconditioner. (input)
DFAULTS returns kdegree = 2.
iparam(16): ntrunc Number of vectors to be saved in GMRES[k] and OMN[k]. (input)
DFAULTS returns ntrunc = 10.
iparam(17): nvorth Number of previous Krylov basis vectors to which each new basis vector is made
orthogonal (GMRES method only). (input)
DFAULTS returns nvorth = 10.
iparam(18): nrstrt Number of iterations between restart in OMN[k]. (input)
DFAULTS returns nrstrt=20.
iparam(19): irestrt Save-and-restart control flag. (input)
= 0 No save-and-restart.
= 1 Save-and-restart after preconditioner setup. The active part of the work arrays
is saved to or restored from the unit specified by iosave (iparam(20)).
= 2 Save-and-restart after completion of iterative phase. The x and b arrays and the
active part of the work arrays are saved to or restored from the unit specified by
iosave (iparam(20)).
DFAULTS returns irestrt = 0.
iparam(20): iosave Save-and-restart unit number. (input)
Unit number of the unformatted save-and-restart file. SITRSOL assumes that the
user has already opened this file.
DFAULTS returns iosave = 0. This default value is not a valid unit number for the
save-and-restart operation. If you change the value of irestrt to enable
save-and-restart, you also must change the value of iosave.
iparam(21): mvformat Desired format for computation of matrix-vector products. (input)
= 0 Use matrix supplied by user. This can save a substantial amount of storage, but
performance is usually poor.
= 1 Use jagged diagonal form. This requires more storage, but it offers faster
performance.
DFAULTS returns mvformat = 1.
iparam(22): nicfmax Maximum number of times to try IC[k] factorization by using shifted IC factorization
(see rparam(15) and rparam(16)). (input)
DFAULTS returns nicfmax = 11.
iparam(23): nicfacs Actual number of shifted IC[k] factorizations tried (see rparam(15) and rparam(16)).
(output)
DFAULTS returns nicfacs = 0.
iparam(24) through iparam(40)
Currently unused. These parameters are reserved for future use.
rparam(1): tol Stopping criterion tolerance. (input)
DFAULTS returns tol = 1.0 × 10^(-6).
rparam(2): err Computed error estimate at each iteration. (output)
DFAULTS returns err = 0.0.
rparam(3): alpha Absolute value of the estimate of the smallest eigenvalue of A. (input)
Currently, this parameter is unused and is assumed to be 0.
DFAULTS returns alpha = 0.0.
rparam(4): beta Absolute value of the estimate of the largest eigenvalue of A. (input)
This is needed only by the least-squares polynomial preconditioner (ipretyp = 5). If
beta = 0, an estimate of the spectral radius will be computed.
DFAULTS returns beta = 0.0.
rparam(5): timscal Accumulated time (in seconds) to scale and unscale the user matrix. (input and
output)
DFAULTS returns timscal = 0.0.
rparam(6): timsets Accumulated time (in seconds) to compute the symbolic incomplete factorization.
(input and output)
DFAULTS returns timsets = 0.0.


rparam(7): timsetn Accumulated time (in seconds) to compute the numerical incomplete factorization.
(input and output)
DFAULTS returns timsetn = 0.0.
rparam(8): timset Accumulated total time (in seconds) to perform the preconditioner setup. (input and
output)
If incomplete factorization is used, this includes both timsets and timsetn.
DFAULTS returns timset = 0.0.
rparam(9): timpre Accumulated total time (in seconds) to apply the preconditioner in the iteration phase
of the solution process. (input and output)
DFAULTS returns timpre = 0.0.
rparam(10): timmvs Accumulated time (in seconds) to convert from column pointer to jagged diagonal
format. (input and output)
If you use parallel processing, this also includes the setup time to perform the parallel
matrix vector operations.
DFAULTS returns timmvs = 0.0.
rparam(11): timmv Accumulated time (in seconds) to perform the matrix vector product. (input and
output)
This does not include the products that apply the polynomial preconditioners.
DFAULTS returns timmv = 0.0.
rparam(12): timmtv Accumulated time (in seconds) to perform the transpose matrix vector product. (input
and output)
This does not include the products that apply the polynomial preconditioners.
DFAULTS returns timmtv = 0.0.
rparam(13): timit Accumulated time (in seconds) spent in the iterative routine. (input and output)
This does not include the time spent computing matrix vector products or applying
the preconditioners.
DFAULTS returns timit = 0.0.
rparam(14): timtot Accumulated total time (in seconds) for this call to SITRSOL. (input and output)
DFAULTS returns timtot = 0.0.
rparam(15): gammin Minimum value for shift factor γ. (input)
For some problems, IC[k] preconditioning fails in the factorization. In many cases,
"shifting" the diagonal elements (that is, letting a(i,i) = (1 + γ) * a(i,i)) allows the
factorization to be computed for this modified matrix.
DFAULTS returns gammin = 0.0.


rparam(16): gammax Value for shift factor γ. (input and output)
On input, gammax is the maximum value that the shift factor γ may have. If the
IC[k] factorization fails (that is, a negative diagonal element is computed for the
factor L), SITRSOL increases γ by the following amount:
∆γ = (gammax − gammin) / (nicfmax − 1)
and it tries to recompute the factorization. SITRSOL tries a maximum of nicfmax
factorizations with the following values of γ:
γ_i = gammin + i * ∆γ, for i = 0, 1, . . ., nicfmax − 1
On output, the number of factorizations tried is returned in nicfacs, and, if nicfacs >
1, gammax contains the actual value of γ used to compute the factorization.
DFAULTS returns gammax = 0.3.
rparam(17) through rparam(30)
Currently unused. They are reserved for future use.
If rparam(5) through rparam(14) are not reset to 0, timing information for subsequent calls to SITRSOL
will be added to existing timing information.
If multiple CPUs are used, rparam(5) through rparam(14) report the cumulative time for all CPUs.
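The shift schedule described under rparam(16) can be worked through numerically. The following Python sketch is an illustration only and is not part of the library; the function name shift_schedule is hypothetical, and the handling of nicfmax < 2 (where the formula for ∆γ would divide by zero) is an assumption that the text does not address.

```python
def shift_schedule(gammin, gammax, nicfmax):
    """Return gamma_i = gammin + i * dgamma for i = 0, 1, ..., nicfmax - 1."""
    if nicfmax < 2:
        # Assumption: with a single attempt there is no increment to compute.
        return [gammin]
    dgamma = (gammax - gammin) / (nicfmax - 1)
    return [gammin + i * dgamma for i in range(nicfmax)]


# Values returned by DFAULTS: gammin = 0.0, gammax = 0.3, nicfmax = 11.
gammas = shift_schedule(0.0, 0.3, 11)
for i, gamma in enumerate(gammas):
    print(f"attempt {i + 1}: gamma = {gamma:.2f}")
```

With the default values, the increment is 0.03, so the first attempt uses the unshifted matrix and the last attempt uses the full shift of gammax.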
Error Code
The error flag ierr in routine SITRSOL is a six-decimal-digit signed integer. Immediately after an error
condition is detected, the error flag is set, the error-handling routine is called, and a message is printed if the
user requests it. If a warning error occurs (ierr > 0), execution continues on returning from the
error-handling routine. If the error is fatal (ierr < 0), control is returned to the user.
Unless otherwise specified, an error code with xx denoting the third and fourth digits (for example, 03xx01)
represents a common error condition from various routines. Any error code with ii denoting the fifth and
sixth digits represents a range of two-digit numbers (or one-digit numbers with a leading 0).
Phase 1 error: parameter checking
ierr =
– 010001 Illegal value for method.
– 010002 Illegal value for ipath.
– 010003 Illegal value for neqns or nvars.
– 010004 Illegal value for liwork.
– 010005 Illegal value for lrwork.
– 010006 Illegal value for ncpus (see INTRO_LIBSCI(3S)).
– 0101ii Illegal value for iparam(ii), for ii = 01, 02, . . ., 22.
– 010201 Full-storage column pointer matrix cannot be converted to half-storage jagged-diagonal matrix.
– 010202 Left and right nonsymmetric scaling cannot be applied to a symmetric matrix in half-storage
mode.
– 010203 PCG and CGN allow only left preconditioning.
– 010204 ILU preconditioning cannot be applied to a symmetric matrix in half-storage mode.
– 010205 iprelrb is 3 and ipretyp is neither 2 nor 3.
– 010301 Illegal value for tol, tol < 0.
– 010315 Illegal value for gammin.
– 010316 Illegal value for gammax.
Phase 2 error: user matrix preprocessing
ierr =
– 02xx01 Not enough real workspace allocated.
– 02xx02 Not enough integer workspace allocated.
– 020203 Zero diagonal element found, cannot scale matrix.
– 020204 Zero row found, cannot scale matrix.
– 020205 Zero column found, cannot scale matrix.
– 020303 Input matrix structure is not valid.
Phase 3 error: preconditioner setup
ierr =
+030101 During the incomplete Cholesky factorization, a diagonal element was found to be nonpositive.
The shift parameter was increased and the factorization was recomputed.
+030201 During the incomplete LU factorization, a diagonal element was found to be 0. The element
was modified to allow the factorization to continue.
+03xx02 Tried to use the incomplete factorization from a previous run, but it has been overwritten.
Must recompute factorization.
– 03xx01 Not enough real workspace allocated.
– 03xx02 Not enough integer workspace allocated.
– 03xx03 Ran out of integer workspace while performing the symbolic factorization. The estimated
amount in maxlfil was insufficient.
– 03xx04 Ran out of integer workspace while performing the symbolic factorization. The estimated
amount in maxufil was insufficient.
– 030104 L is structurally singular. A diagonal element is missing from the structure of L.
– 030105 Cannot compute IC factorization in nicfmax tries. Set gammin and gammax to larger values.
– 030205 U is structurally singular. A diagonal element is missing from the structure of U.
– 030603 Zero diagonal element found, cannot scale matrix.
– 030703 Zero diagonal element found, cannot scale matrix.
Phase 4 error: iterative process
ierr =
+04xx01 Method did not converge in maxiter steps.
+04xx03 Preconditioning matrix is not positive definite.
– 04xx01 Not enough real workspace allocated.
– 040203 Matrix AA^T is not positive definite in CGN method.
– 04xx03 Breakdown occurred when computing direction vector magnitude; the divisor is near 0.
– 040403 GMRES iteration stalled. The norm of the residual was not reduced on the most recent
restart iteration.
– 040603 Matrix A is not positive definite in PCG method.
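The layout of the six-digit code can be unpacked mechanically: the sign gives the severity, the first two digits give the phase, and the remaining digit pairs are the xx and ii fields described at the start of this subsection. The following Python sketch is an illustration only; the function name decode_ierr and the dictionary keys are hypothetical labels, not names used by the library.

```python
def decode_ierr(ierr):
    """Split a six-digit SITRSOL error code into sign, phase, xx, and ii."""
    if ierr == 0:
        return {"severity": "normal", "phase": 0, "xx": 0, "ii": 0}
    severity = "warning" if ierr > 0 else "fatal"
    code = abs(ierr)
    return {
        "severity": severity,
        "phase": code // 10000,      # digits 1-2: phase of the solution process
        "xx": (code // 100) % 100,   # digits 3-4
        "ii": code % 100,            # digits 5-6
    }


# -010203 (PCG and CGN allow only left preconditioning) is the integer -10203.
print(decode_ierr(-10203))
# +04xx01 with xx = 02 (method did not converge in maxiter steps).
print(decode_ierr(40201))
```

Because the code is an ordinary integer, leading zeros are not stored; a phase 1 code such as –010004 is the same value as –10004.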


Workspace
The following are three methods for estimating your workspace needs:
Rough estimate Fastest (only one multiply), but least accurate.
SITRSOL estimate Much slower, but also a lot more accurate.
Hand-coded estimate If you already know certain information about the size of the final factorization, a
hand-coded SITRSOL estimation algorithm with your information will be more
accurate than SITRSOL’s calculations.
Rough estimate
You can make a very rough estimate of your workspace needs by setting
liwork = lrwork = 6 * nza
where nza is the number of nonzero elements in matrix A (= icolptr(neqns+1) – 1). This estimate is usually
sufficient.
If you are not using certain memory-intensive preconditioning or matrix formatting options, you can refine
this estimate further:
A. If you are not using IC[k] or ILU[k] preconditioning, subtract 2 * nza from your previous estimate for
liwork and lrwork.
B. If you are not using jagged diagonal format, subtract 2 * nza from your previous estimate for liwork and
lrwork.
One, both, or neither of the preceding conditions might be true; therefore, the estimate could end up being
any of the following:
6 * nza If neither A nor B is true
4 * nza If exactly one of A and B is true
2 * nza If both A and B are true
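The rough estimate and its two refinements amount to a few lines of arithmetic. The following Python sketch is an illustration only; the function and argument names are hypothetical, and icolptr is assumed to hold the 1-based column pointers described for the sparse column format.

```python
def nza_from_icolptr(icolptr):
    """Number of nonzero elements: icolptr(neqns+1) - 1 for 1-based pointers."""
    return icolptr[-1] - 1


def rough_workspace(nza, uses_ic_or_ilu=True, uses_jagged_diagonal=True):
    """Return the rough value to use for both liwork and lrwork."""
    factor = 6
    if not uses_ic_or_ilu:
        factor -= 2  # refinement A
    if not uses_jagged_diagonal:
        factor -= 2  # refinement B
    return factor * nza


# A 3-by-3 matrix with 5 nonzero elements stored by columns.
icolptr = [1, 3, 4, 6]
nza = nza_from_icolptr(icolptr)
print(rough_workspace(nza))                        # neither A nor B applies
print(rough_workspace(nza, uses_ic_or_ilu=False))  # only A applies
print(rough_workspace(nza, False, False))          # both A and B apply
```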
SITRSOL estimate
You can get a more accurate estimate by calling SITRSOL with liwork or lrwork set to 0. This causes
SITRSOL to generate an error flag (– 10004 or – 10005) and to return an estimate of workspace requirements
in iwork(1) for lrwork and iwork(2) for liwork. You can then use these estimates in another call to
SITRSOL. In computing this estimate, SITRSOL uses the precise formulas that follow in the Algorithm for
Accurate Workspace Estimate subsection.
Hand-coded estimate
If you are using IC[k] or ILU[k] preconditioning and you already know the number of nonzero elements in
the IC[k] factor matrix L or in the ILU[k] factor matrices L and U, you can get the most accurate estimate
by hand-coding the high-precision algorithm used by SITRSOL. You can then substitute your exact numbers
into the formulas for which SITRSOL has only estimates, so your final result will be more accurate than
SITRSOL’s, even though the algorithm is the same.

NOTES
This section discusses parallel processing and workspace considerations.
Using Parallel Processing in SITRSOL
SITRSOL is designed to exploit the parallel processing capabilities of Cray Y-MP systems. In particular,
the preconditioners and matrix-vector operations are designed to achieve significant speedup on multiple
CPUs; however, the parallelism is designed to be effective only for large problems. Small problems will not
benefit, and performance probably will be degraded. What constitutes a "small" problem or a "large"
problem is difficult to define. Also, a large gray area exists in which using fewer than all CPUs gives better
performance than using all CPUs. Experimentation is the best way to decide on the optimal number of
CPUs.
To select the number of CPUs, define the NCPUS environment variable to the desired value. SITRSOL will
then obtain this value and try to use that number of CPUs. In a batch environment, it is unusual to get all of
the physical CPUs on the machine. In this case, it is better to request a smaller number of CPUs than is
physically available. If you do not define the NCPUS variable, the default value for NCPUS will be the total
number of physical processors on the machine.
Timing and parallel processing
SITRSOL uses the system timing function SECOND(3F), which is a real-valued function that returns the
accumulated CPU time for all processors. If you use parallel processing and you want wall-clock timing
information, replace the SECOND function with the following function, which uses IRTC(3I) to do the
timing:
      REAL FUNCTION SECOND ()
C.....CRAY Y-MP C-90 clock period
      PARAMETER ( CP=4.2E-9 )
C.....CRAY Y-MP clock period
C     PARAMETER ( CP=6.0E-9 )
C
      SECOND = FLOAT( IRTC() )*CP
      RETURN
      END

Based on your system, use the appropriate parameter CP. You should replace the SECOND system function
because it returns the accumulated CPU time for all CPUs; if you do not replace SECOND, the multiple-CPU
timings will always be worse than the single-CPU timings.
Using the IRTC function has a drawback: it returns the real system time and does not subtract time
spent being swapped out. Thus, in a batch environment, timing information typically will not be consistent
between two identical runs.
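The difference between the two timers can be seen with a small calculation. The following Python sketch is an illustration only: the clock periods are the CP values quoted in the replacement function, while the function names and the 4-CPU run are hypothetical.

```python
CP_C90 = 4.2e-9  # seconds per clock tick quoted above for the C-90
CP_YMP = 6.0e-9  # seconds per clock tick quoted above for the Y-MP


def elapsed_seconds(start_ticks, end_ticks, clock_period):
    """Wall-clock seconds between two real-time clock readings."""
    return (end_ticks - start_ticks) * clock_period


def accumulated_cpu_seconds(per_cpu_seconds):
    """What a timer that sums CPU time over all processors would report."""
    return sum(per_cpu_seconds)


# Hypothetical run: 4 CPUs, each busy for 2.5 seconds of the same interval.
print(accumulated_cpu_seconds([2.5, 2.5, 2.5, 2.5]))  # summed CPU time
print(elapsed_seconds(0, 1_000_000_000, CP_C90))      # one billion ticks
```

In the hypothetical run, the summed CPU time is four times the interval that actually elapsed, which is why accumulated CPU timings make a multiple-CPU run look no better than a single-CPU run.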


Algorithm for Accurate Workspace Estimate


You can break down the solution process into three steps:
Step 1: User matrix preprocessing
Step 2: Preconditioner setup
Step 3: Compute solution by using the selected iterative method
The following notation is used in this discussion:
ncpus= Number of requested CPUs for parallel processing.
neqns= Dimension of the linear system.
nza= Number of nonzero elements in A (= icolptr(neqns+1)– 1).
maxnz= Maximum number of nonzero elements in a row of A.
nzpap= Number of elements in the jagged diagonal matrix that is used in the
matrix-vector product.
If ( mvformat = 0 ) then
nzpap = 0
Else if ( isympap = isym ) then
nzpap = nza
Else If ( isympap = 0 & isym = 1 ) then
nzpap = 2*nza - neqns
End If
nsegs= 0 when ncpus = 1.
MAX(ncpus, neqns / 1024) when ncpus does not equal 1.
Iuse(k)= Amount of integer workspace used in step k.
Iret(k)= Amount of integer workspace retained by step k.
Ruse(k)= Amount of real workspace used in step k.
Rret(k)= Amount of real workspace retained by step k.
liwork= Total amount of integer workspace needed.
lrwork= Total amount of real workspace needed.
liwork1= Integer workspace needed in step 1 = Iuse(1).
liwork2= Integer workspace needed in step 2 = Iret(1)+Iuse(2).
liwork3= Integer workspace needed in step 3 = Iret(1)+Iret(2)+Iuse(3).
lrwork2= Real workspace needed in step 2 = Rret(1)+Ruse(2).
lrwork3= Real workspace needed in step 3 = Rret(1)+Rret(2)+Ruse(3).
Total workspace needed is defined as follows:
liwork = MAX( liwork1, liwork2, liwork3 ) + 100
lrwork = MAX( lrwork2, lrwork3 )
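The notation above can be turned into small helper functions. The following Python sketch is an illustration only; the function names are hypothetical, and neqns / 1024 is assumed to be an integer division, as it would be in Fortran integer arithmetic. The combination isympap = 1 with isym = 0 is rejected because the definition of nzpap does not cover it (compare error code –010201).

```python
def nzpap(mvformat, isympap, isym, nza, neqns):
    """Number of elements in the jagged diagonal matrix."""
    if mvformat == 0:
        return 0
    if isympap == isym:
        return nza
    if isympap == 0 and isym == 1:
        return 2 * nza - neqns
    raise ValueError("isympap = 1 with isym = 0 is not a supported combination")


def nsegs(ncpus, neqns):
    """0 for one CPU; otherwise MAX(ncpus, neqns / 1024), integer division assumed."""
    return 0 if ncpus == 1 else max(ncpus, neqns // 1024)


def total_workspace(iuse, iret, ruse, rret):
    """Combine per-step amounts (sequences for steps 1, 2, 3) into (liwork, lrwork)."""
    liwork1 = iuse[0]
    liwork2 = iret[0] + iuse[1]
    liwork3 = iret[0] + iret[1] + iuse[2]
    lrwork2 = rret[0] + ruse[1]
    lrwork3 = rret[0] + rret[1] + ruse[2]
    return max(liwork1, liwork2, liwork3) + 100, max(lrwork2, lrwork3)


print(nzpap(1, 0, 1, 5000, 1000))
print(nsegs(4, 10000))
print(total_workspace([10, 20, 0], [5, 8, 0], [7, 9, 30], [3, 4, 0]))
```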


Step 1: User matrix preprocessing


In this step, locate all diagonal elements, apply scaling (if selected), and convert the user matrix to the
jagged diagonal format for use in the matrix-vector operations. The workspace usage is as follows:

If ( mvformat = 0 ) then
Iuse(1)= Iret(1) = neqns
Ruse(1)= Rret(1) = nscale

Else
Iuse(1)= nzpap + 4*neqns + maxnz + nsegs + 2
Iret(1)= nzpap + 3*neqns + maxnz + nsegs + 2
Ruse(1)= nzpap + neqns + nscale
Rret(1)= nzpap + nscale
End If

where
If ( iscale > 0 ) then
nscale = neqns
Else If ( iscale = 0 ) then
nscale = 0
End If
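The step 1 formulas translate directly into code. The following Python sketch is an illustration only; the function name step1_workspace is hypothetical, and the arguments nzpap, maxnz, and nsegs are the quantities defined in the notation list.

```python
def step1_workspace(neqns, nzpap, maxnz, nsegs, mvformat, iscale):
    """Return (Iuse(1), Iret(1), Ruse(1), Rret(1)) for user matrix preprocessing."""
    nscale = neqns if iscale > 0 else 0
    if mvformat == 0:
        iuse = iret = neqns
        ruse = rret = nscale
    else:
        iuse = nzpap + 4 * neqns + maxnz + nsegs + 2
        iret = nzpap + 3 * neqns + maxnz + nsegs + 2
        ruse = nzpap + neqns + nscale
        rret = nzpap + nscale
    return iuse, iret, ruse, rret


# Jagged diagonal format with scaling, single CPU (nsegs = 0).
print(step1_workspace(neqns=1000, nzpap=9000, maxnz=12, nsegs=0, mvformat=1, iscale=1))
# User-supplied matrix format without scaling.
print(step1_workspace(neqns=1000, nzpap=0, maxnz=12, nsegs=0, mvformat=0, iscale=0))
```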

Step 2: Preconditioning setup


In this step, do the necessary setup work for efficient application of the preconditioner. For diagonal scaling,
this means computing the inverse of the diagonal and storing it. For incomplete Cholesky and LU
factorization, this means computing the factorization. For Jacobi least-squares polynomial preconditioning,
this means computing the polynomial coefficients. For Neumann polynomial preconditioning, there is no
setup. The amount of workspace used and retained varies greatly, depending on the type of preconditioning
selected.
For no preconditioning:
Iuse(2) = Iret(2) = Ruse(2) = Rret(2) = 0
For diagonal preconditioning:
Iuse(2)= Iret(2)= 0
Ruse(2)= 2*neqns
Rret(2)= neqns
For incomplete Cholesky preconditioning:
Iuse(2)= max( (4*neqns + 2*nzlE + 3), (2*neqns + 2*nzlA + 2 + nparU) )
Iret(2) = 2*nzlA + 2*neqns + 2 + nparR
Ruse(2) = 2*nzlA + ncpus*neqns
Rret(2) = 2*nzlA


where
• nzlE = Estimated number of nonzero elements in L as defined by maxlfil on input
• nzlA = Actual number of nonzero elements in L as defined by maxlfil on exit from the preconditioner
setup phase
If ( ncpus = 1 ) Then
nparU = nparR = 0
Else If ( ncpus > 1 ) Then
nparU = 6*neqns + 4
nparR = 4*neqns + 4
End If
In the hand-coded version, if you already know the value of nzlA, you can improve your workspace estimate
by setting nzlE = nzlA.
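The incomplete Cholesky formulas can be evaluated as follows. This Python sketch is an illustration only; the function names are hypothetical, and estimated_fill assumes that nzlE is taken from maxlfil on input, with 0 selecting the crude estimate (1 + lvlfill) * nza described under iparam(12).

```python
def estimated_fill(maxfil, lvlfill, nza):
    """Fill estimate from maxlfil on input; 0 selects the crude estimate."""
    return maxfil if maxfil > 0 else (1 + lvlfill) * nza


def ic_workspace(neqns, nzlE, nzlA, ncpus):
    """Return (Iuse(2), Iret(2), Ruse(2), Rret(2)) for IC[k] preconditioning."""
    if ncpus == 1:
        nparU = nparR = 0
    else:
        nparU = 6 * neqns + 4
        nparR = 4 * neqns + 4
    iuse = max(4 * neqns + 2 * nzlE + 3, 2 * neqns + 2 * nzlA + 2 + nparU)
    iret = 2 * nzlA + 2 * neqns + 2 + nparR
    ruse = 2 * nzlA + ncpus * neqns
    rret = 2 * nzlA
    return iuse, iret, ruse, rret


print(estimated_fill(0, 1, 5000))
print(ic_workspace(neqns=1000, nzlE=5000, nzlA=4500, ncpus=1))
print(ic_workspace(neqns=1000, nzlE=5000, nzlA=4500, ncpus=4))
```

If the actual fill is already known, passing it for both nzlE and nzlA gives the improved estimate mentioned above.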
For incomplete LU preconditioning:
Iuse(2) = max( (4*neqns + 2*nzlE + nzuE + 3), (2*neqns + nzlA + nzuA + 2 + nparU) )
Iret(2) = nzlA + nzuA + 2*neqns + 2 + nparR
If ( ncpus = 1 ) then
Ruse(2) = nzlA + nzuA + neqns
Else If ( ncpus > 1 ) then
Ruse(2) = nzlA + nzuA + max(ncpus*neqns, nzlA, nzuA )
End If
Rret(2) = nzlA + nzuA
where
• nzlE = Estimated number of nonzero elements in L as defined by maxlfil on input
• nzuE = Estimated number of nonzero elements in U as defined by maxufil on input
• nzlA = Actual number of nonzero elements in L as defined by maxlfil on exit from the preconditioner
setup phase
• nzuA = Actual number of nonzero elements in U as defined by maxufil on exit from the preconditioner
setup phase
If ( ncpus = 1 ) Then
nparU = nparR = 0
Else If ( ncpus > 1 ) Then
nparU = max(nzlA,nzuA) + 5*neqns + 4
nparR = 4*neqns + 4
End If
In the hand-coded version, if you already know the values of nzlA and nzuA, you can improve your
workspace estimate by setting nzlE = nzlA and nzuE = nzuA.
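The incomplete LU formulas can be evaluated in the same way. This Python sketch is an illustration only; the function name ilu_workspace is hypothetical, and Rret(2) is assumed to be the same for one CPU and for several CPUs.

```python
def ilu_workspace(neqns, nzlE, nzuE, nzlA, nzuA, ncpus):
    """Return (Iuse(2), Iret(2), Ruse(2), Rret(2)) for ILU[k] preconditioning."""
    if ncpus == 1:
        nparU = nparR = 0
        ruse = nzlA + nzuA + neqns
    else:
        nparU = max(nzlA, nzuA) + 5 * neqns + 4
        nparR = 4 * neqns + 4
        ruse = nzlA + nzuA + max(ncpus * neqns, nzlA, nzuA)
    iuse = max(4 * neqns + 2 * nzlE + nzuE + 3,
               2 * neqns + nzlA + nzuA + 2 + nparU)
    iret = nzlA + nzuA + 2 * neqns + 2 + nparR
    rret = nzlA + nzuA
    return iuse, iret, ruse, rret


print(ilu_workspace(1000, 5000, 5000, 4500, 4600, ncpus=1))
print(ilu_workspace(1000, 5000, 5000, 4500, 4600, ncpus=2))
```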


For Neumann polynomial preconditioning:


Iuse(2) = Iret(2) = 0
If ( method = ‘CGN’ ) Then
Ruse(2) = neqns
Rret(2) = neqns
Else
Ruse(2) = Rret(2) = 0
End If
For Jacobi least-squares polynomial preconditioning:
Iuse(2) = Iret(2) = 0
If ( method = ‘BCG’ ) Then
Ruse(2) = neqns + 2*(kdegree + 1)
Rret(2) = 2*(kdegree + 1)
Else If ( method = ‘CGN’ ) Then
Ruse(2) = Rret(2) = neqns + kdegree + 1
Else
Ruse(2) = neqns + kdegree + 1
Rret(2) = kdegree + 1
End If
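The two polynomial cases can be evaluated with one function. This Python sketch is an illustration only; the function name poly_workspace is hypothetical, and ipretyp uses the values documented for iparam(10) (4 for Neumann, 5 for least-squares).

```python
def poly_workspace(ipretyp, method, neqns, kdegree):
    """Return (Iuse(2), Iret(2), Ruse(2), Rret(2)) for polynomial preconditioning."""
    if ipretyp == 4:            # truncated Neumann expansion
        ruse = rret = neqns if method == "CGN" else 0
    elif ipretyp == 5:          # truncated least-squares expansion
        if method == "BCG":
            ruse = neqns + 2 * (kdegree + 1)
            rret = 2 * (kdegree + 1)
        elif method == "CGN":
            ruse = rret = neqns + kdegree + 1
        else:
            ruse = neqns + kdegree + 1
            rret = kdegree + 1
    else:
        raise ValueError("ipretyp must be 4 or 5 for polynomial preconditioning")
    return 0, 0, ruse, rret


# kdegree = 2 is the value returned by DFAULTS.
for method in ("BCG", "CGN", "GMR"):
    print(method, poly_workspace(5, method, neqns=1000, kdegree=2))
```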
Step 3: Compute solution by using the selected iterative method
In this step, compute the solution by using the selected preconditioner and iterative method. The amount of
workspace used depends on the method, preconditioner, stopping test, and number of CPUs used.
Iuse(3)= Iret(3) = Rret(3) = 0
Ruse(3)= nmethod + max( npresol, ntest ) + nmatvec
where
nmethod = 6*neqns if method = ‘BCG’
          5*neqns if method = ‘CGN’
          6*neqns if method = ‘CGS’
          ntrunc*neqns + 4*neqns + ntrunc*ntrunc + 3*ntrunc + 1 if method = ‘GMR’
          2*ntrunc*neqns + 4*neqns + ntrunc if method = ‘OMN’
          3*neqns if method = ‘PCG’
npresol = neqns if ipretyp = 2 and method = ‘CGN’
          neqns if ipretyp = 3 and method = ‘CGN’
          neqns if ipretyp = 4
          2*neqns if ipretyp = 5
          0 otherwise
ntest = 0 if itest = 0
nmatvec = ncpus*neqns if ncpus > 1 and (mvformat = 0 or isympap = 1 or
          method = ‘BCG’ or ‘CGN’)
          0 otherwise
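The step 3 amounts can be evaluated as follows. This Python sketch is an illustration only; the function names are hypothetical. Because the text gives ntest only for itest = 0, ntest is left as an argument that defaults to 0, and any other value is for the caller to supply.

```python
def nmethod(method, neqns, ntrunc):
    """Real workspace used by the iterative method itself."""
    table = {
        "BCG": 6 * neqns,
        "CGN": 5 * neqns,
        "CGS": 6 * neqns,
        "GMR": ntrunc * neqns + 4 * neqns + ntrunc * ntrunc + 3 * ntrunc + 1,
        "OMN": 2 * ntrunc * neqns + 4 * neqns + ntrunc,
        "PCG": 3 * neqns,
    }
    return table[method]


def npresol(ipretyp, method, neqns):
    """Real workspace used when applying the preconditioner."""
    if ipretyp in (2, 3) and method == "CGN":
        return neqns
    if ipretyp == 4:
        return neqns
    if ipretyp == 5:
        return 2 * neqns
    return 0


def nmatvec(ncpus, neqns, mvformat, isympap, method):
    """Real workspace used by the parallel matrix-vector product."""
    if ncpus > 1 and (mvformat == 0 or isympap == 1 or method in ("BCG", "CGN")):
        return ncpus * neqns
    return 0


def ruse3(method, neqns, ntrunc, ipretyp, ncpus, mvformat, isympap, ntest=0):
    """Ruse(3) = nmethod + max(npresol, ntest) + nmatvec."""
    return (nmethod(method, neqns, ntrunc)
            + max(npresol(ipretyp, method, neqns), ntest)
            + nmatvec(ncpus, neqns, mvformat, isympap, method))


# GMRES with ntrunc = 10 (the DFAULTS value) and Neumann preconditioning on one CPU.
print(ruse3("GMR", 1000, 10, ipretyp=4, ncpus=1, mvformat=1, isympap=0))
```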

SEE ALSO
DFAULTS(3S)
INTRO_SPARSE(3S) for an example of using this routine and the other sparse matrix routines



SSGETRF ( 3S ) SSGETRF ( 3S )

NAME
SSGETRF – Factors a real sparse general matrix with threshold pivoting implemented

SYNOPSIS
CALL SSGETRF (ido, neqns, icolptr, irowind, value, lwork, work, iparam, thresh, ierror)

IMPLEMENTATION
UNICOS systems

DESCRIPTION
Given a real sparse general matrix A, SSGETRF computes the LU factorization of PAP^T, in
which P is an internally computed permutation matrix and P^T is its transpose. Threshold pivoting is
implemented for stability.
This routine has the following arguments:
ido Integer. (input)
Controls the execution path through the routine. ido is a two-digit integer whose digits are
represented on this man page as i and j. i indicates the starting phase of execution, and j
indicates the ending phase. For SSGETRF, there are four phases of execution, as follows:
Phase 1: Fill reduction reordering
Phase 2: Symbolic factorization
Phase 3: Determination of the node execution sequence and the storage requirement for the
frontal matrices
Phase 4: Numerical factorization
If a previous call to the routine has computed information from previous phases, execution can
start at any phase.
ido = 10i + j, where 1 ≤ i ≤ j ≤ 4
neqns Integer. (input)
Number of equations (or unknowns, rows, or columns).
icolptr Integer array of dimension neqns + 1. (input)
Column pointer array for the sparse matrix A. The first and last elements of the array must be
set as follows:
icolptr(1) = 1
icolptr(neqns+1) = nza + 1
where nza is the number of nonzero elements in the sparse matrix A.
irowind Integer array of dimension nza (see icolptr). (input)
Row indices array for the sparse matrix A.
value Real array of dimension nza (see icolptr). (input)
Array of nonzero values for the sparse matrix A. The icolptr, irowind, and value arguments
taken together contain the input matrix in sparse column format. See the introduction to the
sparse solvers (INTRO_SPARSE(3S)) for a full description of the sparse column format.
lwork Integer. (input)
Length of the work array work. Workspace requirements vary from phase to phase. If lwork is
not sufficient to execute a particular phase successfully, the routine will return with an indication
of how much workspace is required to continue. See the Workspace subsection.
work Real array of dimension lwork. (input and output)
Work array used to hold the results of each phase that are needed to process the next phase.
Between calls to SSGETRF to compute subsequent phases, the user must not modify this array.
iparam Integer array of dimension 13. (input)
List of user control parameters. The value of iparam(1) controls the use of the parameter array:
0 Uses default values for all parameters.
1 Overrides default values by using iparam.
For a full description, see the Parameters subsection.
thresh Real. (input)
The thresh variable determines whether pivoting occurs. 0 ≤ thresh ≤ 1.
ierror Integer. (output)
Error code to report any error condition detected.
0 Normal completion.
–1 ido is not a valid path for a fresh start.
–2 ido is not a valid path for a restart run.
– 10000 Input matrix structure is incorrect.
– k0001 Insufficient storage allocated for phase k. (1 ≤ k ≤ 4)
– 20002 Fatal error from the symbolic factorization. Either the input structure is incorrect or the
active part of array work was changed between successive calls to SSGETRF.
– 40002 Input matrix structure is not consistent with the structure of the lower triangular factor.
The active part of array work may have been changed between successive calls to
SSGETRF.
– 40301 Fatal error from the numerical factorization. The input matrix is numerically singular.
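The ido encoding and the – k0001 storage codes are both simple digit arithmetic. The following Python sketch is an illustration only; the function names make_ido and storage_error_phase are hypothetical.

```python
def make_ido(start_phase, end_phase):
    """Build ido = 10*i + j for starting phase i and ending phase j."""
    if not 1 <= start_phase <= end_phase <= 4:
        raise ValueError("phases must satisfy 1 <= i <= j <= 4")
    return 10 * start_phase + end_phase


def storage_error_phase(ierror):
    """Return k if ierror has the form -k0001 with 1 <= k <= 4, otherwise None."""
    code = -ierror
    if code % 10000 == 1 and 1 <= code // 10000 <= 4:
        return code // 10000
    return None


print(make_ido(1, 4))               # all four phases in one call
print(make_ido(4, 4))               # numerical factorization only
print(storage_error_phase(-20001))  # insufficient storage in phase 2
```

A call that runs phases 1 through 3 and a later call that runs phase 4 alone would use ido values of 13 and 44, provided the work array is left unmodified between the calls.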
Parameters
The following is a list of user control parameters and their default values to be used by the SSGETRF and
SSGETRS routines. To use the default values, pass a constant 0 as the iparam argument, as follows:
CALL SSGETRF(IDO, NEQ, ICOL, IROW, VAL, LWK, WORK, 0, THR, IER)
iparam(1) 0 Use default values for all options.
1 Override default values.
iparam(2) Unit number for warning and error messages.
Default is 6.


iparam(3) Flag to control level of messages output.
≤ 0 Report only fatal errors.
= 1 Report timing and workspace for each phase.
≥ 2 Report detailed information for each phase.
Default is 0.
iparam(4) 0 Do not save the adjacency structure.
1 Save the adjacency structure.
Default is 0.
iparam(5) 0 A fresh start.
1 A restart from previously saved data.
Default is 0.
iparam(6) 0 Output will not be saved for subsequent restart.
k Active part of the work array through phase k will be saved for subsequent restart.
Default is 0.
iparam(7) Unit number of the unformatted save-and-restart file. The routine assumes that the user has
already opened this file. No meaningful default exists; if you use the default for iparam(6), no unit
number is needed.
iparam(8) Relaxation factor that specifies the maximum additional fill-ins allowed per supernode.
Default is 0.
iparam(9) Relaxation factor that specifies the maximum additional fill-ins allowed as a percentage of the
size of L. iparam(9) is essentially used as a constraint in allowing additional fill-ins for each
supernode that uses iparam(8). See INTRO_SPARSE(3S).
Default is 0.
iparam(10) Reserved for future usage.
iparam(11) Reserved for future usage.
iparam(12) 0 Check for valid input structure.
1 Do not check input structure.
Default is 0.
iparam(13) Flag to control saving the sorted order of the input matrix. This is recommended if the same
sparsity pattern is being used repeatedly.
0 Do not save the sorted order of the input matrix.
1 Save the sorted order.
Default is 0.
Workspace
You can determine the amount of workspace needed to execute phase k (denoted Use(k)) and the amount of
workspace retained after the execution of phase k (denoted Ret(k)) by using the following algorithm:
neqns = Number of unknowns or equations.
nsup = Number of supernodes.
This can be obtained from work(32) after phase 1.
nza = Number of nonzero elements in A (=icolptr(neqns+1)– 1).
nadj = 2*(nza – neqns), size of the adjacency structure of A.
nfctnzs = Number of nonzero elements in L.
This can be obtained from work(11) after phase 1.
ngssubs = Number of row subscripts required to represent L.
This can be obtained from work(14) after phase 1.
nnzsym = Number of nonzero elements of A + A^T (A plus its transpose).
This can be obtained from work(66) after phase 1.
lusize = Size of final LU decomposition.
This can be obtained from work(68) after phase 3.

Phase 1:
Use(1) = 150 + 12*neqns + 4*nza + 4
Ret(1) = 150 + 5*neqns + 3*nsup + nnzsym + 3

Phase 2:
I1 = Ret(1) + ngssubs + 4*neqns + nsup + 2
I2 = Ret(1) + 2*ngssubs + 10*nsup + 2*neqns + 4
If adjacency structure is saved
Use(2) = max ( I1, I2 )
Ret(2) = I2
Otherwise
Use(2) = max ( I1, I2 - nnzsym - neqns - 1 )
Ret(2) = I2(1)

Phase 3:
Use(3) = Ret(2) + 3*nsup
Ret(3) = Ret(2) + nsup

Phase 4:
If the sort information is saved
Use(4) ≥ Ret(3) + 5*neqns + nfctnzs + 3*nza + 4
Ret(4) = Ret(3) + nza + lusize
Otherwise
Use(4) ≥ Ret(3) + 6*neqns + nfctnzs + 2*nza + 4
Ret(4) = Ret(3) + lusize
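The phase formulas chain together, because each phase starts from the amount retained by the previous one. The following Python sketch is an illustration only; the function names and the sample sizes are hypothetical. Phase 2 is shown only for the case in which the adjacency structure is saved, and the phase 4 value is the lower bound given above.

```python
def phase1(neqns, nza, nsup, nnzsym):
    """Return (Use(1), Ret(1))."""
    use1 = 150 + 12 * neqns + 4 * nza + 4
    ret1 = 150 + 5 * neqns + 3 * nsup + nnzsym + 3
    return use1, ret1


def phase2_saved(ret1, ngssubs, neqns, nsup):
    """Return (Use(2), Ret(2)) when the adjacency structure is saved."""
    i1 = ret1 + ngssubs + 4 * neqns + nsup + 2
    i2 = ret1 + 2 * ngssubs + 10 * nsup + 2 * neqns + 4
    return max(i1, i2), i2


def phase3(ret2, nsup):
    """Return (Use(3), Ret(3))."""
    return ret2 + 3 * nsup, ret2 + nsup


def phase4(ret3, neqns, nfctnzs, nza, lusize, save_sort):
    """Return (lower bound on Use(4), Ret(4))."""
    if save_sort:
        return ret3 + 5 * neqns + nfctnzs + 3 * nza + 4, ret3 + nza + lusize
    return ret3 + 6 * neqns + nfctnzs + 2 * nza + 4, ret3 + lusize


# Hypothetical problem sizes.
use1, ret1 = phase1(neqns=1000, nza=5000, nsup=200, nnzsym=8000)
use2, ret2 = phase2_saved(ret1, ngssubs=3000, neqns=1000, nsup=200)
use3, ret3 = phase3(ret2, nsup=200)
use4, ret4 = phase4(ret3, 1000, nfctnzs=20000, nza=5000, lusize=45000, save_sort=False)
print(use1, use2, use3, use4, ret4)
```

Quantities such as nsup, ngssubs, and lusize become available only after the phases noted in the list above, so in practice the later phases are evaluated after the earlier ones have run.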


SEE ALSO
INTRO_SPARSE(3S) for general information on sparse solvers and a usage example
SSGETRS(3S) to solve one or more right-hand sides by using the factorization computed by SSGETRF



SSGETRS ( 3S ) SSGETRS ( 3S )

NAME
SSGETRS – Solves a real sparse general system, using the factorization computed in SSGETRF(3S)

SYNOPSIS
CALL SSGETRS (ido, lwork, work, nrhs, rhs, ldrhs, iparam, ierror)

IMPLEMENTATION
UNICOS systems

DESCRIPTION
Given the LU factorization computed by SSGETRF and one or more right-hand sides, SSGETRS solves the
linear systems.
This routine has the following arguments:
ido Integer. (input)
Variable used to control the execution path in SSGETRS.
= 1 Solve AX = B
= 2 Forward solve
= 3 Backward solve
Calling SSGETRS with ido = 2 and again with ido = 3 gives the same result as calling SSGETRS
once with ido = 1.
lwork Integer. (input)
Length of the work array work as in SSGETRF.
work Real array of dimension lwork. (input and output)
Work array exactly as output from SSGETRF. The user must not have modified this array because
it contains information about the LU factorization.
nrhs Integer. (input)
Number of right-hand sides.
rhs Real array of dimension (ldrhs,nrhs). (input and output)
On entry, rhs contains the nrhs vectors. If ido = 1 or 2, the vectors are the right-hand side vectors
b from the system of equations Ax = b. If ido = 3, the right-hand sides should be the intermediate
result z obtained by calling SSGETRS with ido = 2.
On exit, rhs contains the nrhs corresponding solution vectors.
ldrhs Integer. (input)
Leading dimension of array rhs exactly as specified in the calling program.
iparam Integer array of dimension 13. (input)
List of user control options as in SSGETRF. Only four elements, iparam(1), iparam(2), iparam(3),
and iparam(5), are required for the solution phase.
iparam(1) 0 Use default values for all options.
1 Override default values.
iparam(2) Unit number for warning and error messages.
Default is 6.
iparam(3) Flag to control level of messages output.
≤ 0 Report only fatal errors.
= 1 Report timing and workspace for each phase.
≥ 2 Report detailed information for each phase.
Default is 0.
iparam(5) 0 A fresh start.
1 A restart from previously saved data.
Default is 0.
ierror Integer. (output)
Error code, as follows:
0 Normal completion.
–3 ido is not a valid input.
– 50001 Insufficient workspace for ido = 1.
– 60001 Insufficient workspace for ido = 2.
– 70001 Insufficient workspace for ido = 3.
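
The equivalence between the two-step path (ido = 2 followed by ido = 3) and the one-step path (ido = 1) can be illustrated on a small dense system. The following free-form Fortran sketch does not call SSGETRS; the 3-by-3 factors, the right-hand side, and all names are made up for illustration, and no permutation or pivoting is applied.

```fortran
program two_step_solve
  implicit none
  integer, parameter :: n = 3
  real :: l(n,n), u(n,n), a(n,n), b(n), z(n), x(n)
  integer :: i, j

  ! Hypothetical unit lower triangular L and upper triangular U,
  ! filled column by column.
  l = reshape((/ 1.0, 0.5, 0.25,  0.0, 1.0, 0.5,  0.0, 0.0, 1.0 /), (/ n, n /))
  u = reshape((/ 4.0, 0.0, 0.0,   2.0, 3.0, 0.0,  1.0, 1.0, 2.0 /), (/ n, n /))
  a = matmul(l, u)
  b = (/ 7.0, 8.0, 9.0 /)

  ! Forward solve L z = b (the analogue of ido = 2).
  do i = 1, n
     z(i) = b(i)
     do j = 1, i - 1
        z(i) = z(i) - l(i,j)*z(j)
     end do
  end do

  ! Backward solve U x = z (the analogue of ido = 3).
  do i = n, 1, -1
     x(i) = z(i)
     do j = i + 1, n
        x(i) = x(i) - u(i,j)*x(j)
     end do
     x(i) = x(i)/u(i,i)
  end do

  ! x now solves A x = b, so the residual should be at rounding level.
  print *, 'max residual = ', maxval(abs(matmul(a, x) - b))
end program two_step_solve
```

The intermediate vector z plays the role of the rhs contents passed from the ido = 2 call to the ido = 3 call.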

SEE ALSO
INTRO_SPARSE(3S) for general information on sparse solvers and a usage example
SSGETRF(3S) to compute the factorization used by SSGETRS

SSPOTRF ( 3S ) SSPOTRF ( 3S )

NAME
SSPOTRF – Factors a real sparse symmetric definite matrix

SYNOPSIS
CALL SSPOTRF (ido, neqns, icolptr, irowind, value, lwork, work, iparam, ierror)

IMPLEMENTATION
UNICOS systems

DESCRIPTION
Given a real sparse symmetric definite matrix A, SSPOTRF computes the LD(transpose of L) factorization of
PA(transpose of P); P is an internally computed permutation matrix.
This routine has the following arguments:
ido Integer. (input)
Controls the execution path through the routine. ido is a two-digit integer whose digits are
represented on this man page as i and j. i indicates the starting phase of execution, and j indicates
the ending phase. For SSPOTRF, there are four phases of execution, as follows:
Phase 1: Fill reduction reordering
Phase 2: Symbolic factorization
Phase 3: Determination of the node execution sequence and the storage requirement for the
frontal matrices
Phase 4: Numerical factorization
If a previous call to the routine has computed information from previous phases, execution can
start at any phase.
ido = 10i + j 1≤i≤j≤4
neqns Integer. (input)
Number of equations (or unknowns, rows, or columns).
icolptr Integer array of dimension neqns + 1. (input)
Column pointer array for the sparse matrix A. The first and last elements of the array must be set
as follows:
icolptr(1) = 1 icolptr(neqns+1) = nza + 1
where nza is the number of nonzero elements in the sparse matrix A.
irowind Integer array of dimension nza (see icolptr). (input)
Row indices array for the sparse matrix A.
value Real array of dimension nza (see icolptr). (input)
Array of nonzero values for the sparse matrix A. The icolptr, irowind, and value arguments taken
together contain the input matrix in sparse column format. Because A is symmetric, only the lower
triangle is specified by these arguments. See the introduction to the sparse solvers
(INTRO_SPARSE(3S)) for a full description of the sparse column format.
lwork Integer. (input)
Length of the work array work. Workspace requirements vary from phase to phase. If lwork is
not sufficient to execute a particular phase successfully, the routine will return with an indication of
how much workspace is required to continue. See the Workspace subsection.
work Real array of dimension lwork. (input and output)
Work array used to hold the results of each phase that are needed to process the next phase. The
user must not modify this array between calls to SSPOTRF to compute subsequent phases.
iparam Integer array of dimension 12. (input)
List of user control parameters. The value of iparam(1) controls the use of the parameter array:
0 Uses default values for all parameters.
1 Overrides default values by using iparam.
For a full description, see the Parameters subsection.
ierror Integer. (output) Error code to report any error condition detected.
0 Normal completion.
–1 ido is not a valid path for a fresh start.
–2 ido is not a valid path for a restart run.
– 10000 Input matrix structure is incorrect.
– k0001 Insufficient storage allocated for phase k (1 ≤ k ≤ 4).
– 20002 Fatal error from the symbolic factorization. Either the input structure is incorrect or
the active part of array work has been changed between successive calls to SSPOTRF.
– 40002 Input matrix structure is not consistent with the structure of the lower triangular
factor. The active part of array work may have been changed between successive
calls to SSPOTRF.
– 40301 Fatal error from the numerical factorization. The input matrix is numerically singular.
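
The two-digit encoding of ido described above (ido = 10i + j) can be sketched as follows. This free-form Fortran fragment is illustrative only; the program name and the sample value are assumptions, and it does not call SSPOTRF.

```fortran
program ido_decode
  implicit none
  integer :: ido, i, j

  ido = 24               ! hypothetical request: run phases 2 through 4
  i = ido/10             ! starting phase
  j = mod(ido, 10)       ! ending phase

  if (i >= 1 .and. i <= j .and. j <= 4) then
     print *, 'start phase =', i, ' end phase =', j
  else
     print *, 'ido =', ido, ' is not of the form 10*i + j with 1 <= i <= j <= 4'
  end if
end program ido_decode
```

Under this encoding, ido = 14 requests all four phases in one call, whereas ido = 12 followed by ido = 34 splits the same work into two calls, with work left unmodified in between.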
Parameters
The following is a list of user control parameters and their default values to be used by subroutines
SSPOTRF and SSPOTRS. To use the default value for every parameter, set iparam(1)=0, as in the following call,
which passes the constant 0 as the iparam argument:
      CALL SSPOTRF(IDO,NEQ,ICOL,IROW,VAL,LWK,WORK,0,IER)

iparam(1) 0 Use default values for all parameters.
1 Override default values.
iparam(2) Unit number for warning and error messages.
Default is 6.
iparam(3) Flag to control level of messages output.
≤ 0 Report only fatal errors.
= 1 Report timing and workspace for each phase.
≥ 2 Report detailed information for each phase.
Default is 0.
iparam(4) 0 Do not save the adjacency structure.
1 Save the adjacency structure.
Default is 0.
iparam(5) 0 A fresh start.
1 A restart from previously saved data.
Default is 0.
iparam(6) 0 No output will be saved for subsequent restart.
k Active portion of the work array through phase k will be saved for subsequent restart.
Default is 0.
iparam(7) Unit number of the unformatted file used for save and restart. The user is assumed to have
opened this file.
No meaningful default exists: if the default for iparam(6) is used, no unit number is
needed, so no default value is defined.
iparam(8) Relaxation factor that specifies the maximum additional fill-ins allowed per supernode.
Default is 0.
iparam(9) Relaxation factor that specifies the maximum additional fill-ins allowed as a percentage of the
size of L. iparam(9) is essentially used as a constraint in allowing additional fill-ins for each
supernode that uses iparam(8). See the introduction to the sparse solvers,
INTRO_SPARSE(3S).
Default is 0.
iparam(10) Size of the frontal matrix above which parallelism is exploited only in the factorization and
partial updating of the dense frontal matrix.
Usually, this type of parallelism is more effective toward the end of the factorization, when
there tend to be fewer independent supernodes and the frontal matrices tend to be larger.
Default is 0.
iparam(11) Size of the fixed block to accommodate the grouping of temporary frontal matrices. This is
needed only when you want to exploit the parallelism in the elimination of independent
supernodes; in this case, workspace for temporary frontal and update matrices of the
independent supernodes is allocated using a fixed-block scheme. When in use, iparam(11)
must be greater than or equal to iparam(10).
Default is 0.
iparam(12) 0 Check for valid input structure.
1 Do not check input structure.
Default is 0.

Workspace
You can determine the amount of workspace needed to execute phase k (denoted Use(k)) and the amount of
workspace retained after the execution of phase k (denoted Ret(k)) by using the following notation:
ncpus = Number of CPUs.
neqns = Number of unknowns or equations.
nsup = Number of supernodes.
This can be obtained from work(32) after phase 1.
nza = Number of nonzero elements in A (=icolptr(neqns+1)– 1).
nadj = 2*(nza – neqns), size of the adjacency structure of A.
nfctnzs = Number of nonzero elements in L.
This can be obtained from work(11) after phase 1.
ngssubs = Number of row subscripts required to represent L.
This can be obtained from work(14) after phase 1.
maxrow = Maximum number of nonzero elements in a row of L.
This can be obtained from work(20) after phase 1.
maxsup = Maximum size of a supernode.
This can be obtained from work(21) after phase 1.
minstk = Minimum amount of workspace required for the temporary frontal matrices.
This can be obtained from work(22) after phase 3.

Phase 1:
Use(1) = 150 + 2*nadj + 11*neqns + 4
Ret(1) = 150 + 4*neqns + 3*nsup + nadj + 3

Phase 2:
I1 = Ret(1) + ngssubs + 3*neqns + nsup + 1
I2 = Ret(1) + 2*ngssubs + 10*nsup + 3
If the adjacency structure is not saved
Use(2) = max ( I1, I2-(2*nza+1) )
Ret(2) = 150 + 3*neqns + 2*ngssubs + 13*nsup + 5
Otherwise
Use(2) = max ( I1, I2 )
Ret(2) = Ret(1) + 2*ngssubs + 10*nsup + 3

Phase 3:
For single processing (ncpus = 1)
Use(3) = Ret(2) + 2*nsup
Ret(3) = Ret(2)
For multiple processing (ncpus > 1)
Use(3) = Ret(2) + 12*nsup + 2
Ret(3) = Ret(2) + 8*nsup + 1

Phase 4:
For single processing (ncpus = 1)
Use(4) = Ret(3) + neqns + nfctnzs + 2*(maxsup+maxrow) +
nsup + minstk
For multiple processing (ncpus > 1)
Use(4) = Ret(3) + neqns + nfctnzs + ncpus + maxsup + 3*(nsup) +
(ncpus+1)*(maxsup+2*maxrow) + minstk
Ret(4) = Ret(3) + neqns + nfctnzs

SEE ALSO
INTRO_SPARSE(3S) for general information on sparse solvers and a usage example
SSPOTRS(3S) to solve one or more right-hand sides, using the factorization computed by SSPOTRF

SSPOTRS ( 3S ) SSPOTRS ( 3S )

NAME
SSPOTRS – Solves a real sparse symmetric definite system, using the factorization computed in
SSPOTRF(3S)

SYNOPSIS
CALL SSPOTRS (ido, lwork, work, nrhs, rhs, ldrhs, iparam, ierror)

IMPLEMENTATION
UNICOS systems

DESCRIPTION
Given the LDL^T factorization of PAP^T computed from SSPOTRF and a (set of) right-hand side(s),
SSPOTRS solves the following linear system for the solution of the system Ax = b. P is an internally
computed permutation matrix, and ^T denotes the transpose.
PAP^T y = Pb, x = P^T y
This routine has the following arguments:
ido Integer. (input)
Variable used to control the execution path in SSPOTRS.
ido = 1 Solves P^T (LDL^T)(Px) = P^T P(b)
ido = 2 Solves L(Px) = P(rhs)
ido = 3 Solves Dx = rhs
ido = 4 Solves P^T L^T x = P^T (rhs)
ido = 5 Solves (LD^{1/2})(Px) = P(rhs)
ido = 6 Solves P^T (LD^{1/2})^T x = P^T (rhs)
lwork Integer. (input)
Length of the work array work as in SSPOTRF.
work Real array of dimension lwork. (input and output)
Work array exactly as output from SSPOTRF. The user must not have modified this array because
it contains information about the LD(transpose of L) factorization.
nrhs Integer. (input)
Number of right-hand sides.
rhs Real array of dimension (ldrhs,nrhs). (input and output)
On entry, rhs contains the nrhs right-hand side vectors b for which to solve. On exit, rhs contains the nrhs
corresponding solution vectors.
ldrhs Integer. (input)
Leading dimension of array rhs exactly as specified in the calling program.
iparam Integer array of dimension 12. (input)
List of user control options as in SSPOTRF. Only four elements, iparam(1), iparam(2), iparam(3),
and iparam(5), are required for the solution phase.
iparam(1) 0 Use default values for all options.
1 Override default values.
iparam(2) Unit number for warning and error messages.
Default is 6.
iparam(3) Flag to control level of messages output.
≤ 0 Report only fatal errors.
= 1 Report timing and workspace for each phase.
≥ 2 Report detailed information for each phase.
Default is 0.
iparam(5) 0 A fresh start.
1 A restart from previously saved data.
Default is 0.
ierror Integer. (output)
Error code, as follows:
0 Normal completion.
–3 ido is not a valid input.
– 50001 Insufficient workspace for ido = 1.
– 60001 Insufficient workspace for ido = 2.
– 70001 Insufficient workspace for ido = 3.
– 80001 Insufficient workspace for ido = 4.
– 90001 Insufficient workspace for ido = 5.
– 100001 Insufficient workspace for ido = 6.
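
Mathematically, the systems listed for ido = 2, 3, and 4 compose to the full solve listed for ido = 1. The following free-form Fortran sketch illustrates this on a dense 2-by-2 example with P taken to be the identity. It does not call SSPOTRS and says nothing about how SSPOTRS stores intermediate results; the factors, the right-hand side, and all names are hypothetical.

```fortran
program ldlt_three_steps
  implicit none
  integer, parameter :: n = 2
  real :: l(n,n), d(n), a(n,n), b(n), y(n), z(n), x(n)
  integer :: i, j

  ! Hypothetical factors: unit lower triangular L and diagonal D, with P = I.
  l = reshape((/ 1.0, 0.5,  0.0, 1.0 /), (/ n, n /))
  d = (/ 4.0, 3.0 /)

  ! Rebuild A = L D L^T so that the result can be checked.
  do j = 1, n
     do i = 1, n
        a(i,j) = sum(l(i,:)*d*l(j,:))
     end do
  end do
  b = (/ 6.0, 6.0 /)

  ! Step 1: L y = b.
  do i = 1, n
     y(i) = b(i) - sum(l(i,1:i-1)*y(1:i-1))
  end do

  ! Step 2: D z = y.
  z = y/d

  ! Step 3: L^T x = z.
  do i = n, 1, -1
     x(i) = z(i) - sum(l(i+1:n,i)*x(i+1:n))
  end do

  print *, 'x = ', x            ! both components equal 1 for these values
  print *, 'max residual = ', maxval(abs(matmul(a, x) - b))
end program ldlt_three_steps
```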

SEE ALSO
INTRO_SPARSE(3S) for general information on sparse solvers and a usage example
SSPOTRF(3S) to compute the factorization used by SSPOTRS

SSTSTRF ( 3S ) SSTSTRF ( 3S )

NAME
SSTSTRF – Factors a real sparse general matrix with a symmetric nonzero pattern (no form of pivoting is
implemented)

SYNOPSIS
CALL SSTSTRF (ido, neqns, icolptr, irowind, value, lwork, work, iparam, ierror)

IMPLEMENTATION
UNICOS systems

DESCRIPTION
Given a real sparse general matrix A with a symmetric nonzero pattern, SSTSTRF computes the LU
factorization of PA(transpose of P). P is an internally computed permutation matrix. No form of pivoting is
implemented.
This routine has the following arguments:
ido Integer. (input)
Controls the execution path through the routine. ido is a two-digit integer whose digits are
represented on this man page as i and j. i indicates the starting phase of execution, and j
indicates the ending phase. For SSTSTRF, there are four phases of execution, as follows:
Phase 1: Fill reduction reordering
Phase 2: Symbolic factorization
Phase 3: Determination of the node execution sequence and the storage requirement for the
frontal matrices
Phase 4: Numerical factorization
If a previous call to the routine has computed information from previous phases, execution may
start at any phase.
ido = 10i + j 1≤i≤j≤4
neqns Integer. (input)
Number of equations (or unknowns, rows, or columns).
icolptr Integer array of dimension neqns + 1. (input)
Column pointer array for the sparse matrix A. The first and last elements of the array must be
set as follows:
icolptr(1) = 1 icolptr(neqns+1) = nza + 1
where nza is the number of nonzero elements in the sparse matrix A.
irowind Integer array of dimension nza (see icolptr). (input)
Row indices array for the sparse matrix A.
value Real array of dimension nza (see icolptr). (input)
Array of nonzero values for the sparse matrix A.
The icolptr, irowind, and value arguments taken together contain the input matrix in sparse
column format. See the introduction to the sparse solvers (INTRO_SPARSE(3S)) for a full
description of the sparse column format.
lwork Integer. (input)
Length of the work array work. Workspace requirements vary from phase to phase. If lwork is
not sufficient to execute a particular phase successfully, the routine will return with an
indication of how much workspace is required to continue. See the Workspace subsection.
work Real array of dimension lwork.
On input equal to least amount of workspace required for the step. On output equal to least
amount of workspace needed to be saved for the intermediate results of the step being
processed.
Work array used to hold the results of each phase that are needed to process the next phase.
The user must not modify this array between calls to SSTSTRF to compute subsequent phases.
iparam Integer array of dimension 12. (input)
List of user control parameters. The value of iparam(1) controls the use of the parameter array:
0 Uses default values for all parameters.
1 Overrides default values by using iparam.
For a full description, see the Parameters subsection.
ierror Integer. (output)
Error code to report any error condition detected.
0 Normal completion.
–1 ido is not a valid path for a fresh start.
–2 ido is not a valid path for a restart run.
– 10000 Input matrix structure is incorrect.
– k0001 Insufficient storage allocated for phase
k (1 ≤ k ≤ 4).
– 20002 Fatal error from the symbolic factorization. Either
the input structure is incorrect or the active part of
array work has been changed between successive
calls to SSTSTRF.
– 40002 Input matrix structure is not consistent with the
structure of the lower triangular factor. The active
part of array work may have been changed between
successive calls to SSTSTRF.
– 40301 Fatal error from the numerical factorization. The
input matrix is numerically singular.
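
As an illustration of the icolptr, irowind, and value arguments, the following free-form Fortran sketch stores a small matrix with a symmetric nonzero pattern and expands it back to dense form. It assumes the usual compressed-column convention, in which column j occupies positions icolptr(j) through icolptr(j+1)-1; INTRO_SPARSE(3S) remains the authoritative description of the format. The matrix and the program name are made up for illustration.

```fortran
program sparse_column_demo
  implicit none
  integer, parameter :: neqns = 3, nza = 7
  integer :: icolptr(neqns+1), irowind(nza)
  real :: value(nza), dense(neqns,neqns)
  integer :: j, k

  ! Hypothetical matrix with a symmetric nonzero pattern:
  !     | 4  1  0 |
  ! A = | 2  5  3 |
  !     | 0  6  7 |
  icolptr = (/ 1, 3, 6, 8 /)
  irowind = (/ 1, 2,   1, 2, 3,   2, 3 /)
  value   = (/ 4.0, 2.0,   1.0, 5.0, 6.0,   3.0, 7.0 /)

  ! Expand to dense storage, assuming column j holds entries
  ! icolptr(j) .. icolptr(j+1)-1.
  dense = 0.0
  do j = 1, neqns
     do k = icolptr(j), icolptr(j+1) - 1
        dense(irowind(k), j) = value(k)
     end do
  end do

  do j = 1, neqns
     print '(3f6.1)', dense(j,:)
  end do
  print *, 'icolptr(1) =', icolptr(1), ' icolptr(neqns+1) =', icolptr(neqns+1)
end program sparse_column_demo
```

Here icolptr(1) = 1 and icolptr(neqns+1) = nza + 1 = 8, as the icolptr description requires.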

Parameters
The following is a list of user control parameters and their default values to be used by SSTSTRF and
SSTSTRS routines. To use the default value for every parameter, set iparam(1)=0, as in the following call,
which passes the constant 0 as the iparam argument:
      CALL SSTSTRF(IDO,NEQ,ICOL,IROW,VAL,LWK,WORK,0,IER)

iparam(1) 0 Use default values for all parameters.
1 Override default values.
iparam(2) Unit number for warning and error messages.
Default is 6.
iparam(3) Flag to control level of messages output.
≤ 0 Report only fatal errors.
= 1 Report timing and workspace for each phase.
≥ 2 Report detailed information for each phase.
Default is 0.
iparam(4) 0 Do not save the adjacency structure.
1 Save the adjacency structure.
Default is 0.
iparam(5) 0 A fresh start.
1 A restart from previously saved data.
Default is 0.
iparam(6) 0 No output will be saved for subsequent restart.
k Active part of the work array through phase k will be saved for subsequent restart.
Default is 0.
iparam(7) Unit number of the unformatted file used for save and restart. The user is assumed to have
opened this file.
No meaningful default exists: if the default for iparam(6) is used, no unit number is
needed, so no default value is defined.
iparam(8) Relaxation factor that specifies the maximum additional fill-ins allowed per supernode.
Default is 0.
iparam(9) Relaxation factor that specifies the maximum additional fill-ins allowed as a percentage of the
size of L. iparam(9) is essentially used as a constraint in allowing additional fill-ins for each
supernode that uses iparam(8). See the introduction to the sparse solvers,
INTRO_SPARSE(3S).
Default is 0.
iparam(10) Size of the frontal matrix above which parallelism is exploited only in the factorization and
partial updating of the dense frontal matrix. This type of parallelism is usually more effective
toward the end of the factorization, when there tend to be fewer independent supernodes and
the frontal matrices tend to be larger.
Default is 0.
iparam(11) Size of the fixed block to accommodate the grouping of temporary frontal matrices. This is
needed only when you want to exploit the parallelism in the elimination of independent
supernodes; in this case, workspace for temporary frontal and update matrices of the
independent supernodes is allocated using a fixed-block scheme. When in use, iparam(11)
must be greater than or equal to iparam(10).
Default is 0.
iparam(12) 0 Check for valid input structure.
1 Do not check input structure.
Default is 0.
Workspace
You can determine the amount of workspace needed to execute phase k (denoted Use(k)) and the amount of
workspace retained after the execution of phase k (denoted Ret(k)) by using the following notation:
ncpus = Number of CPUs.
neqns = Number of unknowns or equations.
nsup = Number of supernodes. This can be obtained from work(32) after phase 1.
nza = Number of nonzero elements in A (=icolptr(neqns+1)– 1).
nadj = (nza – neqns), size of the adjacency structure of A.
nfctnzs = Number of nonzero elements in L. This can be obtained from work(11) after phase 1.
ngssubs = Number of row subscripts required to represent L. This can be obtained from work(14) after
phase 1.
minstk = Minimum amount of workspace required for the temporary frontal matrices. This can be
obtained from work(22) after phase 3.
Phase 1:
Use(1) = 150 + 2*nadj + 11*neqns + 4
Ret(1) = 150 + 4*neqns + 3*nsup + nadj + 3

Phase 2:
Use(2) = Ret(1) + 2*ngssubs + 10*nsup + neqns + 5

If adjacency structure is saved then
Ret(2) = Use(2)
else
Ret(2) = Use(2) - neqns - nadj
end

Phase 3:
For single processing (ncpus = 1)
Use(3) = Ret(2) + 2*nsup
Ret(3) = Ret(2)

For multiple processing (ncpus > 1)
Use(3) = Ret(2) + 12*nsup + 2
Ret(3) = Ret(2) + 8*nsup + 1

Phase 4:
For single processing (ncpus = 1)
Use(4) = Ret(3) + neqns + nfctnzs + nsup + minstk

For multiple processing (ncpus > 1)
Use(4) = Ret(3) + neqns + nfctnzs + ncpus + 3*nsup + minstk
Ret(4) = Ret(3) + neqns + nfctnzs
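
The SSTSTRF workspace formulas above can be evaluated as in the following free-form Fortran sketch for the single-processor case (ncpus = 1). All sizes are hypothetical; in practice nsup, ngssubs, nfctnzs, and minstk become known only after the phases named in the notation list, so a real program would read them from work between calls.

```fortran
program ststrf_workspace
  implicit none
  integer :: neqns, nza, nsup, ngssubs, nfctnzs, minstk, nadj
  integer :: use1, ret1, use2, ret2, use3, ret3, use4, ret4
  logical :: save_adj

  ! Hypothetical problem sizes, chosen only for illustration.
  neqns   = 1000
  nza     = 5000
  nsup    = 300
  ngssubs = 4000
  nfctnzs = 20000
  minstk  = 2500
  save_adj = .false.          ! corresponds to iparam(4) = 0

  nadj = nza - neqns

  ! Phase 1
  use1 = 150 + 2*nadj + 11*neqns + 4
  ret1 = 150 + 4*neqns + 3*nsup + nadj + 3

  ! Phase 2
  use2 = ret1 + 2*ngssubs + 10*nsup + neqns + 5
  if (save_adj) then
     ret2 = use2
  else
     ret2 = use2 - neqns - nadj
  end if

  ! Phase 3, single processing
  use3 = ret2 + 2*nsup
  ret3 = ret2

  ! Phase 4, single processing
  use4 = ret3 + neqns + nfctnzs + nsup + minstk
  ret4 = ret3 + neqns + nfctnzs

  print *, 'Use(1..4) =', use1, use2, use3, use4
  print *, 'Ret(1..4) =', ret1, ret2, ret3, ret4
  print *, 'largest Use(k) =', max(use1, use2, use3, use4)
end program ststrf_workspace
```

The largest Use(k) is the smallest lwork that satisfies every phase in this sketch; Ret(4) is what must be kept for the later solve calls.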

SEE ALSO
INTRO_SPARSE(3S) for general information on sparse solvers and a usage example
SSTSTRS(3S) to solve one or more right-hand sides, using the factorization computed by SSTSTRF

SSTSTRS ( 3S ) SSTSTRS ( 3S )

NAME
SSTSTRS – Solves a real sparse general system with a symmetric nonzero pattern, using the factorization
computed in SSTSTRF(3S)

SYNOPSIS
CALL SSTSTRS (ido, lwork, work, nrhs, rhs, ldrhs, iparam, ierror)

IMPLEMENTATION
UNICOS systems

DESCRIPTION
Given the LU factorization of PAP^T computed from SSTSTRF(3S) and a (set of) right-hand side(s),
SSTSTRS solves the following linear system for the solution of the system Ax = b.
P is an internally computed permutation matrix. P^T is the transpose of P.
PAP^T y = Pb, x = P^T y
This routine has the following arguments:
ido Integer. (input)
Variable used to control the execution path in SSTSTRS.
ido = 1 Solves P^T (LU)Px = b (that is, Ax = b) for x
ido = 2 Solves P^T Lz = b for z
ido = 3 Solves UPx = z for x
Calling SSTSTRS with ido = 2 and again with ido = 3 has the same result as calling SSTSTRS
once with ido = 1.
lwork Integer. (input)
Length of the work array work as in SSTSTRF.
work Real array of dimension lwork. (input and output)
Work array exactly as output from SSTSTRF. The user must not have modified this array
because it contains information about the LU factorization.
nrhs Integer. (input)
Number of right-hand sides.
rhs Real array of dimension (ldrhs, nrhs). (input and output)
On entry, rhs contains the nrhs right-hand side vectors. If ido = 1 or 2, the right-hand side
vectors should be b from the system of equations Ax = b. If ido = 3, the right-hand sides
should be the intermediate result z obtained by calling SSTSTRS with ido = 2.
On exit, rhs contains the nrhs solution vectors.
ldrhs Integer. (input)
Leading dimension of array rhs exactly as specified in the calling program.
iparam Integer array of dimension 12. (input)
List of user control options as in SSTSTRF. Only four elements, iparam(1), iparam(2),
iparam(3), and iparam(5), are required for the solution phase.
iparam(1) 0 Use default values for all options
1 Override default values
iparam(2) Unit number for warning and error messages
Default is 6.
iparam(3) Flag to control level of messages output.
≤ 0 Report only fatal errors
= 1 Report timing and workspace for each phase
≥ 2 Report detailed information for each phase
Default is 0.
iparam(5) 0 A fresh start
1 A restart from previously saved data
Default is 0.
ierror Integer. (output)
0 Normal completion.
–3 ido is not a valid input.
– 50001 Insufficient workspace for ido = 1.
– 60001 Insufficient workspace for ido = 2.
– 70001 Insufficient workspace for ido = 3.
– 80001 Insufficient workspace for ido = 4.
– 90001 Insufficient workspace for ido = 5.
– 100001 Insufficient workspace for ido = 6.

SEE ALSO
INTRO_SPARSE(3S) for general information on sparse solvers and a usage example
SSTSTRF(3S) to compute the factorization used by SSTSTRS

INTRO_SPEC ( 3S ) INTRO_SPEC ( 3S )

NAME
INTRO_SPEC – Introduction to solvers for special linear systems

IMPLEMENTATION
UNICOS systems

DESCRIPTION
All solvers for special linear systems run only on Cray PVP systems.
The following table lists the solvers for special linear systems. The first name in each block of the table is
the name of the man page that documents all of the routines listed in that block.

Purpose                                                                         Name

Solves first-order linear recurrences, overwriting input vector                 FOLR
                                                                                FOLRP
Solves first-order linear recurrences and writes the solutions to a new vector  FOLR2
                                                                                FOLR2P
Solves special first-order linear recurrences                                   FOLRC
Solves for the last term of a first-order linear recurrence                     FOLRN
                                                                                FOLRNP
Solves a partial product or a partial summation problem                         RECPP
                                                                                RECPS
Solves a real- or complex-valued tridiagonal system with one right-hand side    SDTSOL
                                                                                CDTSOL
Factors a real- or complex-valued tridiagonal system                            SDTTRF
                                                                                CDTTRF
Solves a real- or complex-valued tridiagonal system with one right-hand side,   SDTTRS
using its factorization as computed by SDTTRF(3S) or CDTTRF(3S)                 CDTTRS
Solves a second-order linear recurrence                                         SOLR
Solves a second-order linear recurrence for only the last term                  SOLRN
Solves a second-order linear recurrence for three terms                         SOLR3

FOLR ( 3S ) FOLR ( 3S )

NAME
FOLR, FOLRP – Solves first-order linear recurrences

SYNOPSIS
CALL FOLR (n, x, incx, a, inca)
CALL FOLRP (n, x, incx, a, inca)

IMPLEMENTATION
UNICOS and UNICOS/mk systems

DESCRIPTION
FOLR solves first-order linear recurrences, as follows:
a_1 = a_1
a_i = a_i - x_i a_{i-1} for i = 2, 3, ..., n
FOLRP solves first-order linear recurrences, as follows:
a_1 = a_1
a_i = a_i + x_i a_{i-1} for i = 2, 3, ..., n
These routines have the following arguments:
n Integer. (input)
Length of linear recurrence. If n ≤ 1, neither routine performs any computation.
x Real array of dimension 1+(n-1)*|incx|. (input)
Contains multiplier vector. The first element of x in the recurrence is arbitrary.
incx Integer. (input)
Increment between elements of x.
a Real array of dimension 1+(n-1)*|inca|. (input and output)
Contains operand vector. On input, a contains the initial values for the recurrence relation. On
output, a receives the result of the linear recurrence.
inca Integer. (input)
Increment between recurrence elements of a.

NOTES
When working backward (incx < 0 or inca < 0), each routine starts at the end of the vector and moves
backward, as follows:
x(1-incx*(n-1)), x(1-incx*(n-2)), ..., x(1)
a(1-inca*(n-1)), a(1-inca*(n-2)), ..., a(1)
If incx = 0, x is a scalar multiplier.

CAUTIONS
Do not specify inca as 0, because unpredictable results may occur.

EXAMPLES
The following examples illustrate the use of these routines with positive and negative increments. (The first
three executable statements of each example are Fortran 90 array syntax.)
Example 1: FOLR with positive increments
      PROGRAM EX1
      PARAMETER (NMAX = 100)
      REAL X(NMAX), A(NMAX), A1(NMAX)
C
C.....Load vectors with random numbers, initialize N.
      X = RANF()
      A = RANF()
      A1 = A
      N = NMAX
C
C.....Call to FOLR
      CALL FOLR(N,X,1,A1,1)
C
C.....Equivalent FORTRAN code
      A(1)=A(1)
      DO 10 I = 2, N
         A(I)=A(I)-X(I)*A(I-1)
   10 CONTINUE
C
C.....Verify results
      A = A - A1
      PRINT*,'Difference = ',SNRM2(N,A,1)
      END

Example 2: FOLR with negative increments
      PROGRAM EX2
      PARAMETER (NMAX = 100)
      REAL X(NMAX), A(NMAX), A1(NMAX)
C
C.....Load vectors with random numbers, initialize N.
      X = RANF()
      A = RANF()
      A1 = A
      N = NMAX
C
C.....Call to FOLR
      CALL FOLR(N,X,-1,A1,-1)
C
C.....Equivalent FORTRAN code
      A(N)=A(N)
      DO 10 I = N-1, 1, -1
         A(I)=A(I)-X(I)*A(I+1)
   10 CONTINUE
C
C.....Verify results
      A = A - A1
      PRINT*,'Difference = ',SNRM2(N,A,1)
      END

Example 3: FOLRP with positive increments

      PROGRAM EX3
      PARAMETER (NMAX = 100)
      REAL X(NMAX), A(NMAX), A1(NMAX)
C
C.....Load vectors with random numbers, initialize N.
      X = RANF()
      A = RANF()
      A1 = A
      N = NMAX
C
C.....Call to FOLRP
      CALL FOLRP(N,X,1,A1,1)
C
C.....Equivalent FORTRAN code
      A(1)=A(1)
      DO 10 I = 2, N
         A(I)=A(I)+X(I)*A(I-1)
   10 CONTINUE
C
C.....Verify results
      A = A - A1
      PRINT*,'Difference = ',SNRM2(N,A,1)
      END

Example 4: FOLRP with negative increments

      PROGRAM EX4
      PARAMETER (NMAX = 100)
      REAL X(NMAX), A(NMAX), A1(NMAX)
C
C.....Load vectors with random numbers, initialize N.
      X = RANF()
      A = RANF()
      A1 = A
      N = NMAX
C
C.....Call to FOLRP
      CALL FOLRP(N,X,-1,A1,-1)
C
C.....Equivalent FORTRAN code
      A(N)=A(N)
      DO 10 I = N-1, 1, -1
         A(I)=A(I)+X(I)*A(I+1)
   10 CONTINUE
C
C.....Verify results
      A = A - A1
      PRINT*,'Difference = ',SNRM2(N,A,1)
      END

SEE ALSO
FOLR2(3S) and FOLR2P(3S) to solve the same recurrences as solved by FOLR and FOLRP, without
overwriting the a operand
FOLRC(3S) to solve a first-order linear recurrence by using a scalar multiplier
FOLRN(3S) and FOLRNP(3S) to solve for only the last term of the same recurrences as solved by FOLR and
FOLRP
SOLR(3S), SOLR3(3S), SOLRN(3S) to solve various forms of second-order linear recurrence

FOLR2 ( 3S ) FOLR2 ( 3S )

NAME
FOLR2, FOLR2P – Solves first-order linear recurrences without overwriting the operand vector

SYNOPSIS
CALL FOLR2 (n, x, incx, a, inca, b, incb)
CALL FOLR2P (n, x, incx, a, inca, b, incb)

IMPLEMENTATION
UNICOS and UNICOS/mk systems

DESCRIPTION

FOLR2 solves first-order linear recurrences, as follows:
b_1 = a_1
b_i = a_i - x_i b_{i-1} for i = 2, 3, ..., n
FOLR2P solves first-order linear recurrences, as follows:
b_1 = a_1
b_i = a_i + x_i b_{i-1} for i = 2, 3, ..., n
These routines have the following arguments:
n Integer. (input)
Length of linear recurrence. If n ≤ 0, neither routine performs any computation.
x Real array of dimension 1+(n-1)*|incx|. (input)
Contains multiplier vector. The first element of x in the recurrence is arbitrary.
incx Integer. (input)
Increment between elements of x.
a Real array of dimension 1+(n-1)*|inca|. (input)
Contains operand vector.
inca Integer. (input)
Increment between recurrence elements of a.
b Real array of dimension 1+(n-1)*|incb|. (output)
Contains result vector.
incb Integer. (input)
Increment between recurrence elements of b.
The following is the Fortran equivalent of FOLR2 (given for case incx = inca = incb = 1):

      B(1)=A(1)
      DO 10 I=2,N
         B(I)=A(I)-X(I)*B(I-1)
   10 CONTINUE

The following is the Fortran equivalent of FOLR2P (given for case incx = inca = incb = 1):
      B(1)=A(1)
      DO 10 I=2,N
         B(I)=A(I)+X(I)*B(I-1)
   10 CONTINUE

NOTES
When working backward (incx < 0, inca < 0 or incb < 0), each routine starts at the end of the vector and
moves backward, as follows:
x(1-incx*(n-1)), x(1-incx*(n-2)), ..., x(1)
a(1-inca*(n-1)), a(1-inca*(n-2)), ..., a(1)
b(1-incb*(n-1)), b(1-incb*(n-2)), ..., b(1)

If incx = 0, x is a scalar multiplier.
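
The increment rules above can be sketched in free-form Fortran as follows. The subroutine my_folr2 is a hypothetical stand-in written for illustration, not the library routine; it follows the stated index sequences (a negative increment starts at the far end of the vector) and treats incx = 0 as a scalar multiplier.

```fortran
program folr2_sketch
  implicit none
  integer, parameter :: nmax = 4
  real :: x(nmax), a(nmax), b(nmax)

  x = (/ 0.0, 2.0, 3.0, 4.0 /)    ! x(1) is not used with unit increments
  a = 1.0

  call my_folr2(nmax, x, 1, a, 1, b, 1)
  print *, 'forward : ', b        ! values 1, -1, 4, -15

  call my_folr2(nmax, x, -1, a, -1, b, -1)
  print *, 'backward: ', b        ! values 1, 5, -2, 1

contains

  ! Hypothetical stand-in for FOLR2 with general increments.
  subroutine my_folr2(n, x, incx, a, inca, b, incb)
    integer, intent(in) :: n, incx, inca, incb
    real, intent(in) :: x(*), a(*)
    real, intent(inout) :: b(*)
    integer :: i, ix, ia, ib, ibprev

    if (n <= 0) return

    ! With a negative increment the first element processed is
    ! element 1 - inc*(n-1), that is, the far end of the vector.
    ix = 1
    ia = 1
    ib = 1
    if (incx < 0) ix = 1 - incx*(n - 1)
    if (inca < 0) ia = 1 - inca*(n - 1)
    if (incb < 0) ib = 1 - incb*(n - 1)

    b(ib) = a(ia)
    do i = 2, n
       ibprev = ib
       ix = ix + incx             ! stays at 1 when incx = 0
       ia = ia + inca
       ib = ib + incb
       b(ib) = a(ia) - x(ix)*b(ibprev)
    end do
  end subroutine my_folr2
end program folr2_sketch
```

Replacing the minus sign in the update with a plus sign gives the FOLR2P recurrence.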

CAUTIONS
Do not specify inca or incb as 0, because unpredictable results may occur.

SEE ALSO
FOLR(3S), FOLRP(3S) to solve the same recurrences as solved by FOLR2 and FOLR2P, but overwriting
the a operand rather than producing a separate result vector
FOLRC(3S) to solve a first-order linear recurrence by using a scalar multiplier
FOLRN(3S), FOLRNP(3S) to solve for only the last term of the same recurrences as solved by FOLR and
FOLRP
SOLR(3S), SOLR3(3S), SOLRN(3S) to solve various forms of second-order linear recurrence

FOLRC ( 3S ) FOLRC ( 3S )

NAME
FOLRC – Solves a first-order linear recurrence with a scalar multiplier

SYNOPSIS
CALL FOLRC (n, b, incb, a, inca, alpha)

IMPLEMENTATION
UNICOS and UNICOS/mk systems

DESCRIPTION
FOLRC solves first-order linear recurrences, as follows:
b_1 = a_1
b_i = a_i + α b_{i-1} for i = 2, 3, ..., n
This routine has the following arguments:
n Integer. (input)
Length of linear recurrence. If n ≤ 0, FOLRC returns without any computation.
b Real array of dimension 1+(n−1)·|incb|. (output)
Contains result vector.
incb Integer. (input)
Increment between recurrence elements of b.
a Real array of dimension 1+(n−1)·|inca|. (input)
Contains operand vector.
inca Integer. (input)
Increment between recurrence elements of a.
alpha Real. (input)
Scalar multiplier α.
The following is the Fortran equivalent of FOLRC (given for case inca = incb = 1):
      B(1)=A(1)
      DO 10 I=2,N
         B(I)=A(I)+ALPHA*B(I-1)
   10 CONTINUE


NOTES
When working backward (inca < 0 or incb < 0), this routine starts at the end of the vector and moves
backward, as follows:
a(1−inca·(n−1)), a(1−inca·(n−2)), ..., a(1)
b(1−incb·(n−1)), b(1−incb·(n−2)), ..., b(1)
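
The traversal rule above can be sketched in portable Fortran. XFOLRC is a hypothetical stand-in written for illustration; it only mimics the documented behavior of FOLRC for nonzero increments and is not the library routine.

```fortran
C     Sketch only: XFOLRC is a hypothetical stand-in that mimics the
C     documented behavior of FOLRC for nonzero increments.
      SUBROUTINE XFOLRC(N, B, INCB, A, INCA, ALPHA)
      INTEGER N, INCB, INCA, I, IA, IB
      REAL B(*), A(*), ALPHA
      IF (N .LE. 0) RETURN
C     A negative increment starts at the far end of the array.
      IA = 1
      IB = 1
      IF (INCA .LT. 0) IA = 1 - INCA*(N-1)
      IF (INCB .LT. 0) IB = 1 - INCB*(N-1)
      B(IB) = A(IA)
      DO 10 I = 2, N
         IA = IA + INCA
         IB = IB + INCB
         B(IB) = A(IA) + ALPHA*B(IB-INCB)
   10 CONTINUE
      RETURN
      END

      PROGRAM DEMO
      INTEGER I
      REAL A(4), B(4)
      DATA A /1.0, 2.0, 3.0, 4.0/
C     Forward: b(1)=1, b(2)=4, b(3)=11, b(4)=26
      CALL XFOLRC(4, B, 1, A, 1, 2.0)
      PRINT *, (B(I), I = 1, 4)
C     Backward on both vectors: b(4)=4, b(3)=11, b(2)=24, b(1)=49
      CALL XFOLRC(4, B, -1, A, -1, 2.0)
      PRINT *, (B(I), I = 1, 4)
      END
```

With negative increments the recurrence starts at element 1−inc·(n−1) and finishes at element 1, which is why the second call leaves its first term in B(4).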

CAUTIONS
Do not specify incb as 0, because unpredictable results may occur.

SEE ALSO
FOLR(3S), FOLRP(3S) to solve recurrences similar to that solved by FOLRC, but they require a vector of
multipliers rather than one scalar multiplier
FOLR2(3S), FOLR2P(3S) to solve the same recurrences as solved by FOLR and FOLRP, without overwriting
the a operand
FOLRN(3S), FOLRNP(3S) to solve for only the last term of the same recurrences as solved by FOLR and
FOLRP
RECPS(3S) to perform a partial summation operation (same as FOLRC with α = 1.0)
SOLR(3S), SOLR3(3S), SOLRN(3S) to solve various forms of second-order linear recurrence



FOLRN ( 3S ) FOLRN ( 3S )

NAME
FOLRN, FOLRNP – Solves for the last term of a first-order linear recurrence

SYNOPSIS
r = FOLRN (n, x, incx, a, inca)
r = FOLRNP (n, x, incx, a, inca)

IMPLEMENTATION
UNICOS and UNICOS/mk systems

DESCRIPTION
FOLRN solves for r, the last term of a first-order linear recurrence, as follows:
$r \leftarrow a_1$
$r \leftarrow a_i - x_i r$ for $i = 2, 3, \ldots, n$
FOLRNP solves for r, the last term of a first-order linear recurrence, as follows:
$r \leftarrow a_1$
$r \leftarrow a_i + x_i r$ for $i = 2, 3, \ldots, n$
These functions have the following arguments:
r Real. (output)
Value of the last term of the linear recurrence.
n Integer. (input)
Length of linear recurrence. If n ≤ 0, neither routine performs any computation.
x Real array of dimension 1+(n−1)·|incx|. (input)
Contains multiplier vector. The first element of x in the recurrence is arbitrary.
incx Integer. (input)
Increment between recurrence elements of x.
a Real array of dimension 1+(n−1)·|inca|. (input)
Contains operand vector.
inca Integer. (input)
Increment between recurrence elements of a.
The following is the Fortran equivalent of FOLRN (given for case incx = inca = 1):
      R=A(1)
      DO 10 I=2,N
         R=A(I)-X(I)*R
   10 CONTINUE


The following is the Fortran equivalent of FOLRNP (given for case incx = inca = 1):
      R=A(1)
      DO 10 I=2,N
         R=A(I)+X(I)*R
   10 CONTINUE

NOTES
When working backward (incx < 0 or inca < 0), each routine starts at the end of the vector and moves
backward, as follows:
x(1−incx·(n−1)), x(1−incx·(n−2)), ..., x(1)
a(1−inca·(n−1)), a(1−inca·(n−2)), ..., a(1)

If incx = 0, x is a scalar multiplier.

CAUTIONS
Do not specify inca as 0, because unpredictable results may occur.

EXAMPLES
You can use FOLRNP to perform Horner’s rule, an efficient method for evaluation of polynomials.
Let $p(x) = \sum_{i=0}^{m} a_i x^{m-i}$, a polynomial of degree m.

Then, Horner’s rule states that:

$p(x) = (\ldots((a_0 x + a_1) x + a_2) x + \ldots + a_m)$

Thus, the following is the Fortran equivalent to Horner’s rule for evaluating p(x):
      REAL A(0:M), PX, X
      ...

      PX = A(0)
      DO 10 I = 1, M
         PX = PX * X + A(I)
   10 CONTINUE

This is the same as the Fortran equivalent to FOLRNP, when x is a scalar (incx = 0); that is, the following is
also an equivalent to Horner’s rule for evaluating p(x):


      REAL A(0:M), PX, X
      ...

      PX = FOLRNP(M+1,X,0,A(0),1)
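
The equivalence can be checked on a small polynomial without the library. In the sketch below, XFLRNP is a hypothetical stand-in for FOLRNP that handles only nonnegative increments; returning 0 when n ≤ 0 is an assumption, because the manual leaves that case unspecified.

```fortran
C     Sketch only: XFLRNP is a hypothetical stand-in for FOLRNP,
C     restricted to nonnegative increments (incx = 0: scalar x).
C     Returning 0 for n <= 0 is an assumption.
      REAL FUNCTION XFLRNP(N, X, INCX, A, INCA)
      INTEGER N, INCX, INCA, I, IX, IA
      REAL X(*), A(*), R
      R = 0.0
      IF (N .GE. 1) R = A(1)
      IX = 1
      IA = 1
      DO 10 I = 2, N
         IX = IX + INCX
         IA = IA + INCA
         R = A(IA) + X(IX)*R
   10 CONTINUE
      XFLRNP = R
      RETURN
      END

      PROGRAM HORNER
      INTEGER M
      PARAMETER (M = 3)
      INTEGER I
      REAL A(0:M), X(1), PX, PR, XFLRNP
C     p(x) = 2x**3 - 6x**2 + 2x - 1, so p(3) = 5
      DATA A /2.0, -6.0, 2.0, -1.0/
      X(1) = 3.0
C     Explicit Horner loop from the text
      PX = A(0)
      DO 20 I = 1, M
         PX = PX*X(1) + A(I)
   20 CONTINUE
C     Same value through the recurrence with a scalar multiplier
      PR = XFLRNP(M+1, X, 0, A(0), 1)
      PRINT *, 'Horner loop:', PX
      PRINT *, 'Recurrence: ', PR
      END
```

The multiplier is held in a one-element array here so that the stand-in can declare X as an array; with incx = 0 only X(1) is ever read.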

SEE ALSO
FOLR(3S), FOLRP(3S) to solve for all terms (not just the last term) in the same recurrences as solved by
FOLRN and FOLRNP, overwriting the a operand with the results
FOLR2(3S), FOLR2P(3S) to solve for all terms in the same recurrences as solved by FOLRN and FOLRNP,
without overwriting the a operand
FOLRC(3S) to solve for all terms in a first-order linear recurrence by using scalar multiplier
SOLR(3S), SOLR3(3S), SOLRN(3S) to solve various forms of second-order linear recurrence



RECPP ( 3S ) RECPP ( 3S )

NAME
RECPP, RECPS – Solves a partial product or partial summation problem

SYNOPSIS
CALL RECPP (n, y, incy, x, incx)
CALL RECPS (n, y, incy, x, incx)

IMPLEMENTATION
UNICOS systems

DESCRIPTION
RECPP solves a partial product problem, as follows:
$y_1 \leftarrow x_1$
$y_i \leftarrow x_i \cdot y_{i-1}$ for $i = 2, 3, \ldots, n$
RECPS solves a partial summation problem, as follows:
$y_1 \leftarrow x_1$
$y_i \leftarrow x_i + y_{i-1}$ for $i = 2, 3, \ldots, n$
These routines have the following arguments:
n Integer. (input)
Length of linear recurrence. If n ≤ 0, neither routine performs any computation.
y Real array of dimension 1+(n−1)·|incy|. (output)
Contains recurrent operand vector. Array y receives the result.
incy Integer. (input)
Increment between recurrence elements of y.
x Real array of dimension 1+(n−1)·|incx|. (input)
Contains nonrecurrent operand vector.
incx Integer. (input)
Increment between recurrence elements of x.
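
By analogy with the other recurrence pages, the two recurrences can be sketched for the case incx = incy = 1. The program below writes the loops inline for illustration; it does not call RECPP or RECPS, and the array names YP and YS are hypothetical.

```fortran
C     Sketch only: inline loops for the case incx = incy = 1;
C     these are not the library routines themselves.
      PROGRAM PARTL
      INTEGER N, I
      PARAMETER (N = 5)
      REAL X(N), YP(N), YS(N)
      DATA X /1.0, 2.0, 3.0, 4.0, 5.0/
C     Partial products (as RECPP): y(i) = x(i)*y(i-1)
      YP(1) = X(1)
      DO 10 I = 2, N
         YP(I) = X(I)*YP(I-1)
   10 CONTINUE
C     Partial sums (as RECPS): y(i) = x(i) + y(i-1)
      YS(1) = X(1)
      DO 20 I = 2, N
         YS(I) = X(I) + YS(I-1)
   20 CONTINUE
C     Products: 1, 2, 6, 24, 120   Sums: 1, 3, 6, 10, 15
      PRINT *, (YP(I), I = 1, N)
      PRINT *, (YS(I), I = 1, N)
      END
```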

NOTES
When working backward (incx < 0 or incy < 0), this routine starts at the end of the vector and moves
backward, as follows:


x(1−incx·(n−1)), x(1−incx·(n−2)), ..., x(1)
y(1−incy·(n−1)), y(1−incy·(n−2)), ..., y(1)

CAUTIONS
Do not specify incy as 0, because unpredictable results may occur.



SDTSOL ( 3S ) SDTSOL ( 3S )

NAME
SDTSOL, CDTSOL – Solves a real-valued or complex-valued tridiagonal system with one right-hand side

SYNOPSIS
CALL SDTSOL (n, c, d, e, inct, b, incb)
CALL CDTSOL (n, c, d, e, inct, b, incb)

IMPLEMENTATION
UNICOS systems

DESCRIPTION
SDTSOL solves a real-valued tridiagonal system with one right-hand side by combination of
burn-at-both-ends and 3:1 cyclic reduction.
CDTSOL solves a complex-valued tridiagonal system with one right-hand side by combination of
burn-at-both-ends and 3:1 cyclic reduction.
These routines have the following arguments:
n Integer. (input)
Dimension of the tridiagonal matrix. If n < 1, these routines return without any computation.
c SDTSOL: Real array of dimension (1+(n−1)·inct). (input and output)
Lower off-diagonal of the real-valued tridiagonal matrix with c(1) = 0.0.
CDTSOL: Complex array of dimension (1+(n−1)·inct). (input and output)
Lower off-diagonal of the complex-valued tridiagonal matrix with c(1) = (0.0,0.0).
d SDTSOL: Real array of dimension (1+(n−1)·inct). (input and output)
Main diagonal of the real-valued tridiagonal matrix.
CDTSOL: Complex array of dimension (1+(n−1)·inct). (input and output)
Main diagonal of the complex-valued tridiagonal matrix.
e SDTSOL: Real array of dimension (1+(n−1)·inct). (input and output)
Upper off-diagonal of the real-valued tridiagonal matrix with e(1+(n−1)·inct) = 0.0.
CDTSOL: Complex array of dimension (1+(n−1)·inct). (input and output)
Upper off-diagonal of the complex-valued tridiagonal matrix with e(1+(n−1)·inct) = (0.0,0.0).
inct Integer. (input)
Increment between elements in each of the input vectors c, d, and e. inct must be positive.
Typically inct = 1, in which case, the elements of c are contiguous in memory, as are the elements
of d and e.
b SDTSOL: Real array of dimension (1+(n−1)·incb). (input and output)
CDTSOL: Complex array of dimension (1+(n−1)·incb). (input and output)


On entry, b contains the right-hand-side values. On exit, it contains the solution.


incb Integer. (input)
Increment between elements in each column of b. incb must be positive. Typically, incb = 1, in
which case, the elements in each row of b are contiguous in memory.

NOTES
A 3:1 cyclic reduction is used until the size of the system is reduced to 40. Then the reduced system is
solved directly using a burn-at-both-ends algorithm. The remaining values are obtained by backfilling.
When calling these routines, the elements c(1) and e(1+(n−1)·inct) must be allocated and set equal to
0.0. See the EXAMPLES section.
These routines are appropriate only for tridiagonal matrices that require no pivoting.

EXAMPLES
The following example shows how to set up the arguments c, d, and e, given the tridiagonal matrix T.
Let T be the tridiagonal matrix:
$$T = \begin{pmatrix}
11 & 12 & 0 & 0 & 0 \\
21 & 22 & 23 & 0 & 0 \\
0 & 32 & 33 & 34 & 0 \\
0 & 0 & 43 & 44 & 45 \\
0 & 0 & 0 & 54 & 55
\end{pmatrix}$$
Then to pass T to SDTSOL (with inct = 1), set the following:
$$c = \begin{pmatrix} 0 \\ 21 \\ 32 \\ 43 \\ 54 \end{pmatrix}, \quad
d = \begin{pmatrix} 11 \\ 22 \\ 33 \\ 44 \\ 55 \end{pmatrix}, \quad
e = \begin{pmatrix} 12 \\ 23 \\ 34 \\ 45 \\ 0 \end{pmatrix}$$



SDTTRF ( 3S ) SDTTRF ( 3S )

NAME
SDTTRF, CDTTRF – Factors a real-valued or complex-valued tridiagonal system

SYNOPSIS
CALL SDTTRF (n, c, d, e, inct, work, lwork, info)
CALL CDTTRF (n, c, d, e, inct, work, lwork, info)

IMPLEMENTATION
UNICOS systems

DESCRIPTION
SDTTRF factors a real-valued tridiagonal system by combination of burn-at-both-ends and 3:1 cyclic
reduction.
CDTTRF factors a complex-valued tridiagonal system by combination of burn-at-both-ends and 3:1 cyclic
reduction.
These routines have the following arguments:
n Integer. (input)
Dimension of the tridiagonal matrix. If n < 1, these routines return without any computation.
c SDTTRF: Real array of dimension (1+(n−1)·inct). (input and output)
Lower off-diagonal of the real-valued tridiagonal matrix with c(1) = 0.0.
CDTTRF: Complex array of dimension (1+(n−1)·inct). (input and output)
Lower off-diagonal of the complex-valued tridiagonal matrix with c(1) = (0.0,0.0).
d SDTTRF: Real array of dimension (1+(n−1)·inct). (input and output)
Main diagonal of the real-valued tridiagonal matrix.
CDTTRF: Complex array of dimension (1+(n−1)·inct). (input and output)
Main diagonal of the complex-valued tridiagonal matrix.
e SDTTRF: Real array of dimension (1+(n−1)·inct). (input and output)
Upper off-diagonal of the real-valued tridiagonal matrix with e(1+(n−1)·inct) = 0.0.
CDTTRF: Complex array of dimension (1+(n−1)·inct). (input and output)
Upper off-diagonal of the complex-valued tridiagonal matrix with e(1+(n−1)·inct) = (0.0,0.0).
inct Integer. (input)
Increment between elements in each of the input vectors c, d, and e. inct must be positive.
Typically, inct = 1, in which case, the elements of c are contiguous in memory as are the
elements of d and e.
work SDTTRF: Real array of dimension (lwork). (output)
Storage for intermediate results needed for subsequent calls to SDTTRS. This space must not be
modified between calls to this routine and SDTTRS.


CDTTRF: Complex array of dimension (lwork). (output)


Storage for intermediate results needed for subsequent calls to CDTTRS. This space must not be
modified between calls to this routine and CDTTRS.
lwork Integer. (input)
Length of work. lwork must be greater than or equal to 2n. The value of lwork must not
change between calls to this routine and SDTTRS or CDTTRS.
info Integer. (output)
On exit, info has one of the following values:
= 0    No error detected.
= −1   lwork is too small (< 2n).

NOTES
A 3:1 cyclic reduction is used until the size of the system is reduced to 40. Then the reduced system is
factored directly using a burn-at-both-ends algorithm. You should use these routines with SDTTRS or
CDTTRS, either of which solves for one right-hand side given the factorization computed in SDTTRF or
CDTTRF, respectively.
When calling these routines, the elements c(1) and e(1+(n−1)·inct) must be allocated and set equal to 0.
See the EXAMPLES section.
These routines are appropriate only for tridiagonal matrices that require no pivoting.
CDTTRF only: Because this routine is for complex data, the amount of memory needed is 4n words, which
is 2n complex elements.

EXAMPLES
The following example shows how to set up the arguments c, d, and e, given the tridiagonal matrix T.
Let T be the tridiagonal matrix:
$$T = \begin{pmatrix}
11 & 12 & 0 & 0 & 0 \\
21 & 22 & 23 & 0 & 0 \\
0 & 32 & 33 & 34 & 0 \\
0 & 0 & 43 & 44 & 45 \\
0 & 0 & 0 & 54 & 55
\end{pmatrix}$$
Then to pass T to SDTTRF (with inct = 1), set the following:
$$c = \begin{pmatrix} 0 \\ 21 \\ 32 \\ 43 \\ 54 \end{pmatrix}, \quad
d = \begin{pmatrix} 11 \\ 22 \\ 33 \\ 44 \\ 55 \end{pmatrix}, \quad
e = \begin{pmatrix} 12 \\ 23 \\ 34 \\ 45 \\ 0 \end{pmatrix}$$


SEE ALSO
SDTSOL(3S) for a description of SDTSOL and CDTSOL, which factor and solve tridiagonal systems
SDTTRS(3S) for a description of SDTTRS(3S) and CDTTRS(3S), which solve tridiagonal systems based on
the factorization computed by SDTTRF or CDTTRF, respectively



SDTTRS ( 3S ) SDTTRS ( 3S )

NAME
SDTTRS, CDTTRS – Solves a real-valued or complex-valued tridiagonal system with one right-hand side,
using its factorization as computed by SDTTRF(3S) or CDTTRF(3S)

SYNOPSIS
CALL SDTTRS (n, c, d, e, inct, b, incb, work, lwork, info)
CALL CDTTRS (n, c, d, e, inct, b, incb, work, lwork, info)

IMPLEMENTATION
UNICOS systems

DESCRIPTION
SDTTRS solves a real-valued tridiagonal system with one right-hand-side by combination of
burn-at-both-ends and 3:1 cyclic reduction. SDTTRF(3S) must be called first to factor the matrix.
CDTTRS solves a complex-valued tridiagonal system with one right-hand-side by combination of
burn-at-both-ends and 3:1 cyclic reduction. CDTTRF(3S) must be called first to factor the matrix.
These routines have the following arguments:
n Integer. (input)
Dimension of the tridiagonal matrix. If n < 1, these routines return without any computation.
c SDTTRS: Real array of dimension (1+(n−1)·inct). (input)
Factored lower off-diagonal of the real-valued tridiagonal matrix as computed by SDTTRF.
CDTTRS: Complex array of dimension (1+(n−1)·inct). (input)
Factored lower off-diagonal of the complex-valued tridiagonal matrix as computed by CDTTRF.
d SDTTRS: Real array of dimension (1+(n−1)·inct). (input)
Factored main diagonal of the real-valued tridiagonal matrix as computed by SDTTRF.
CDTTRS: Complex array of dimension (1+(n−1)·inct). (input)
Factored main diagonal of the complex-valued tridiagonal matrix as computed by CDTTRF.
e SDTTRS: Real array of dimension (1+(n−1)·inct). (input)
Factored upper off-diagonal of the real-valued tridiagonal matrix as computed by SDTTRF.
CDTTRS: Complex array of dimension (1+(n−1)·inct). (input)
Factored upper off-diagonal of the complex-valued tridiagonal matrix as computed by CDTTRF.
inct Integer. (input)
Increment between elements in each of the input vectors c, d, and e. inct must be positive.
Typically, inct = 1, in which case, the elements of c are contiguous in memory as are the
elements of d and e.


b SDTTRS: Real array of dimension (1+(n−1)·incb). (input and output)
CDTTRS: Complex array of dimension (1+(n−1)·incb). (input and output)
On entry, b contains the right-hand-side values. On exit, it contains the solution.
incb Integer. (input)
Increment between elements in each column of b. incb must be positive. Typically, incb = 1, in
which case, the elements in each row of b are contiguous in memory.
work SDTTRS: Real array of dimension (lwork). (input)
Storage for intermediate results computed by SDTTRF for subsequent calls to SDTTRS. This
space must not be modified between calls to SDTTRF and this routine.
CDTTRS: Complex array of dimension (lwork). (input)
Storage for intermediate results computed by CDTTRF for subsequent calls to CDTTRS. This
space must not be modified between calls to CDTTRF and this routine.
lwork Integer. (input)
Length of work. lwork must be greater than or equal to 2n. The value of lwork must not
change between calls to SDTTRF or CDTTRF and this routine.
info Integer. (output)
On exit, info has one of the following values:
= 0    No error detected.
= −1   lwork is too small (< 2n).

NOTES
A 3:1 cyclic reduction is used until the size of the system is reduced to 40. Then the reduced system is
solved directly using a burn-at-both-ends algorithm. You should use these routines after factoring the
tridiagonal matrix with SDTTRF or CDTTRF.
CDTTRS only: Because this routine is for complex data, the amount of memory needed is 4n words, which
is 2n complex elements.

EXAMPLES
The following example shows how to set up the arguments c, d, and e, given the tridiagonal matrix T.
Let T be the tridiagonal matrix:
$$T = \begin{pmatrix}
11 & 12 & 0 & 0 & 0 \\
21 & 22 & 23 & 0 & 0 \\
0 & 32 & 33 & 34 & 0 \\
0 & 0 & 43 & 44 & 45 \\
0 & 0 & 0 & 54 & 55
\end{pmatrix}$$
Then to pass T to SDTTRF (with inct = 1) before calling SDTTRS, set the following:
$$c = \begin{pmatrix} 0 \\ 21 \\ 32 \\ 43 \\ 54 \end{pmatrix}, \quad
d = \begin{pmatrix} 11 \\ 22 \\ 33 \\ 44 \\ 55 \end{pmatrix}, \quad
e = \begin{pmatrix} 12 \\ 23 \\ 34 \\ 45 \\ 0 \end{pmatrix}$$

SEE ALSO
SDTSOL(3S) for a description of SDTSOL and CDTSOL, which factor and solve tridiagonal systems
SDTTRF(3S) for a description of SDTTRF(3S) and CDTTRF(3S), which compute the factorization used by
SDTTRS or CDTTRS, respectively



SOLR ( 3S ) SOLR ( 3S )

NAME
SOLR – Solves a second-order linear recurrence

SYNOPSIS
CALL SOLR (n, x, incx, y, incy, a, inca)

IMPLEMENTATION
UNICOS and UNICOS/mk systems

DESCRIPTION
SOLR solves second-order linear recurrences, as in the following equation:
$a_i \leftarrow x_{i-2}\, a_{i-1} + y_{i-2}\, a_{i-2}$ for $i = 3, \ldots, n$
$a_1$ and $a_2$ are input to this routine, and $a_3, a_4, \ldots, a_n$ are output.
This routine has the following arguments:
n Integer. (input)
Length of linear recurrence. If n ≤ 2, SOLR returns without any computation.
x Real array of dimension 1+(n−1)·|incx|. (input)
Contains vector of multipliers for the first-order term of the recurrence.
If incx > 0, x(incx·(n−2)+1) and x(incx·(n−1)+1) are arbitrary.
If incx < 0, x(1) and x(1−incx) are arbitrary.
If incx = 0, x is a scalar multiplier.
incx Integer. (input)
Increment between elements of x.
y Real array of dimension 1+(n−1)·|incy|. (input)
Contains vector of multipliers for the second-order term of the recurrence.
If incy > 0, y(incy·(n−2)+1) and y(incy·(n−1)+1) are arbitrary.
If incy < 0, y(1) and y(1−incy) are arbitrary.
If incy = 0, y is a scalar multiplier.
incy Integer. (input)
Increment between elements of y.
a Real array of dimension 1+(n−1)·|inca|. (input and output)
Contains result vector.
inca Integer. (input)
Increment between elements of a.


The following is the Fortran equivalent of SOLR (given for case incx = incy = inca = 1):
      DO 10 I=3,N
         A(I)=X(I-2)*A(I-1)+Y(I-2)*A(I-2)
   10 CONTINUE
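
As a small worked case, setting every multiplier to 1 turns the recurrence into the Fibonacci sequence. The sketch below applies the unit-increment loop inline rather than calling SOLR; the program name FIBREC is hypothetical.

```fortran
C     Sketch only: the unit-increment loop applied inline, with all
C     multipliers set to 1 so that a holds Fibonacci numbers.
      PROGRAM FIBREC
      INTEGER N, I
      PARAMETER (N = 10)
      REAL X(N), Y(N), A(N)
      DO 10 I = 1, N
         X(I) = 1.0
         Y(I) = 1.0
   10 CONTINUE
C     a(1) and a(2) are the inputs; a(3)..a(n) are produced
      A(1) = 1.0
      A(2) = 1.0
      DO 20 I = 3, N
         A(I) = X(I-2)*A(I-1) + Y(I-2)*A(I-2)
   20 CONTINUE
C     Expected: 1 1 2 3 5 8 13 21 34 55
      PRINT *, (A(I), I = 1, N)
      END
```

Only X(1) through X(N-2) and Y(1) through Y(N-2) are read, which matches the statement that the last two elements of each multiplier vector are arbitrary.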

NOTES
When working backward (incx < 0, incy < 0, or inca < 0), this routine starts at the end of the vector and
moves backward, as follows:
x(1−incx·(n−1)), x(1−incx·(n−2)), ..., x(1−2·incx)
y(1−incy·(n−1)), y(1−incy·(n−2)), ..., y(1−2·incy)
a(1−inca·(n−1)), a(1−inca·(n−2)), ..., a(1)

If incx = 0 or incy = 0, x or y (respectively) is a scalar multiplier.

CAUTIONS
Do not specify inca as 0, because unpredictable results may occur.

SEE ALSO
FOLR(3S), FOLR2(3S), FOLR2P(3S), FOLRC(3S), FOLRN(3S), FOLRNP(3S), FOLRP(3S) to solve various
forms of first-order linear recurrence
SOLR3(3S) to solve a three-term, second-order linear recurrence
SOLRN(3S) to solve the same recurrence as SOLR, but SOLRN calculates only the last term



SOLR3 ( 3S ) SOLR3 ( 3S )

NAME
SOLR3 – Solves a second-order linear recurrence for three terms

SYNOPSIS
CALL SOLR3 (n, x, incx, y, incy, a, inca)

IMPLEMENTATION
UNICOS and UNICOS/mk systems

DESCRIPTION
SOLR3 solves second-order linear recurrences of three terms, as in the following equation:
$a_i \leftarrow a_i + x_{i-2}\, a_{i-1} + y_{i-2}\, a_{i-2}$ for $i = 3, \ldots, n$
All values of a are input to this routine, and $a_3, a_4, \ldots, a_n$ are output.
This routine has the following arguments:
n Integer. (input)
Length of linear recurrence. If n ≤ 2, SOLR3 returns without any computation.
x Real array of dimension 1+(n−1)·|incx|. (input)
Contains vector of multipliers for the first-order term of the recurrence.
If incx > 0, x(incx·(n−2)+1) and x(incx·(n−1)+1) are arbitrary.
If incx < 0, x(1) and x(1−incx) are arbitrary.
If incx = 0, x is a scalar multiplier.
incx Integer. (input)
Increment between elements of x.
y Real array of dimension 1+(n−1)·|incy|. (input)
Contains vector of multipliers for the second-order term of the recurrence.
If incy > 0, y(incy·(n−2)+1) and y(incy·(n−1)+1) are arbitrary.
If incy < 0, y(1) and y(1−incy) are arbitrary.
If incy = 0, y is a scalar multiplier.
incy Integer. (input)
Increment between elements of y.
a Real array of dimension 1+(n−1)·|inca|. (input and output)
Contains result vector.
inca Integer. (input)
Increment between elements of a.


The following is the Fortran equivalent of SOLR3 (given for case incx = incy = inca = 1):
      DO 10 I=3,N
         A(I)=A(I)+X(I-2)*A(I-1)+Y(I-2)*A(I-2)
   10 CONTINUE

NOTES
When working backward (incx < 0, incy < 0, or inca < 0), this routine starts at the end of the vector and
moves backward, as follows:
x(1−incx·(n−1)), x(1−incx·(n−2)), ..., x(1−2·incx)
y(1−incy·(n−1)), y(1−incy·(n−2)), ..., y(1−2·incy)
a(1−inca·(n−1)), a(1−inca·(n−2)), ..., a(1)
If incx = 0 or incy = 0, x or y (respectively) is a scalar multiplier.

CAUTIONS
Do not specify inca as 0, because unpredictable results may occur.

EXAMPLES
You can use SOLR3 to solve a lower triangular two-subdiagonal system of linear equations La = b. That is,
the system
$$La = \begin{pmatrix}
1 & 0 & 0 & 0 & \cdots & 0 \\
e_1 & 1 & 0 & 0 & \cdots & 0 \\
f_1 & e_2 & 1 & 0 & \cdots & 0 \\
0 & f_2 & e_3 & 1 & \cdots & 0 \\
\vdots & & \ddots & \ddots & \ddots & \vdots \\
0 & \cdots & 0 & f_{n-2} & e_{n-1} & 1
\end{pmatrix}
\begin{pmatrix} a_1 \\ a_2 \\ a_3 \\ a_4 \\ \vdots \\ a_n \end{pmatrix}
= \begin{pmatrix} b_1 \\ b_2 \\ b_3 \\ b_4 \\ \vdots \\ b_n \end{pmatrix} = b$$
can be written as:

$a_1 = b_1$
$a_2 = b_2 - e_1 a_1$
$a_i = b_i - e_{i-1}\, a_{i-1} - f_{i-2}\, a_{i-2}$ for $i = 3, \ldots, n$
To solve this problem, use the following Fortran code:


      DO 10 I=1,N-1
   10 E(I)=-E(I)
      DO 20 I=1,N-2
   20 F(I)=-F(I)
      B(2)=B(2)+E(1)*B(1)
      CALL SOLR3(N,E(2),1,F(1),1,B(1),1)

where the solution vector a is returned in array B.
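
The steps above can be tried on a small system without the library. In the sketch below, XSOLR3 is a hypothetical stand-in that applies only the documented unit-increment recurrence; the test data are chosen so that the solution is a vector of ones.

```fortran
C     Sketch only: XSOLR3 is a hypothetical stand-in that applies the
C     documented unit-increment SOLR3 recurrence; it is not the
C     library routine.
      SUBROUTINE XSOLR3(N, X, Y, A)
      INTEGER N, I
      REAL X(*), Y(*), A(*)
      DO 10 I = 3, N
         A(I) = A(I) + X(I-2)*A(I-1) + Y(I-2)*A(I-2)
   10 CONTINUE
      RETURN
      END

      PROGRAM LSOLVE
      INTEGER N, I
      PARAMETER (N = 5)
      REAL E(N-1), F(N-2), B(N)
C     Subdiagonals of L: e(i) = L(i+1,i), f(i) = L(i+2,i)
      DATA E /2.0, 2.0, 2.0, 2.0/
      DATA F /3.0, 3.0, 3.0/
C     b = L*a for a = (1,1,1,1,1)
      DATA B /1.0, 3.0, 6.0, 6.0, 6.0/
C     Negate the subdiagonals, as in the text
      DO 20 I = 1, N-1
         E(I) = -E(I)
   20 CONTINUE
      DO 30 I = 1, N-2
         F(I) = -F(I)
   30 CONTINUE
C     Second equation is handled by hand; the rest by the recurrence
      B(2) = B(2) + E(1)*B(1)
      CALL XSOLR3(N, E(2), F(1), B)
C     Expected solution: all ones
      PRINT *, (B(I), I = 1, N)
      END
```

Passing E(2) shifts the first-order multipliers by one element, so that term i of the recurrence sees e(i−1) while still seeing f(i−2).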

SEE ALSO
FOLR(3S), FOLR2(3S), FOLR2P(3S), FOLRC(3S), FOLRN(3S), FOLRNP(3S), FOLRP(3S) to solve various
forms of first-order linear recurrence
SOLR(3S) to solve a two-term second-order linear recurrence
SOLRN(3S) to solve the same recurrence as SOLR, but SOLRN calculates only the last term



SOLRN ( 3S ) SOLRN ( 3S )

NAME
SOLRN – Solves a second-order linear recurrence for only the last term

SYNOPSIS
r = SOLRN (n, x, incx, y, incy, a, inca)

IMPLEMENTATION
UNICOS and UNICOS/mk systems

DESCRIPTION
SOLRN solves for r, the last term in the following second-order linear recurrence:
$a_i \leftarrow x_{i-2}\, a_{i-1} + y_{i-2}\, a_{i-2}$ for $i = 3, 4, \ldots, n$
$r \leftarrow a_n$
Only a 1 and a 2 are used as input. The remaining elements of a are workspace that is overwritten on output.
This function has the following arguments:
r Real. (output)
Value of the last term of the linear recurrence.
If n ≤ 0, r is set to 0.
If n = 1, r is set to the first element of a.
If n = 2, r is set to the second element of a.
n Integer. (input)
Length of linear recurrence.
x Real array of dimension 1+(n−1)·|incx|. (input)
Contains vector of multipliers for the first-order term of the recurrence.
If incx > 0, x(incx·(n−2)+1) and x(incx·(n−1)+1) are arbitrary.
If incx < 0, x(1) and x(1−incx) are arbitrary.
If incx = 0, x is a scalar multiplier.
incx Integer. (input)
Increment between elements of x.
y Real array of dimension 1+(n−1)·|incy|. (input)
Contains vector of multipliers for the second-order term of the recurrence.
If incy > 0, y(incy·(n−2)+1) and y(incy·(n−1)+1) are arbitrary.
If incy < 0, y(1) and y(1−incy) are arbitrary.
If incy = 0, y is a scalar multiplier.
incy Integer. (input)
Increment between elements of y.


a Real array of dimension 1+(n−2)·|inca|. (input and output)
Contains vector of starting terms. In the course of calculating the result r, a is overwritten with
scratch work.
inca Integer. (input)
Increment between elements of a.
The following is the Fortran equivalent of SOLRN (given for case incx = incy = inca = 1):
      DO 10 I=3,N
         A(I)=X(I-2)*A(I-1)+Y(I-2)*A(I-2)
   10 CONTINUE
      RESULT=A(N)

For SOLRN, even though only the last term is computed, array a (A in this Fortran code) is used to hold
intermediate results and, therefore, it is overwritten.

NOTES
When working backward (incx < 0, incy < 0, or inca < 0), this routine starts at the end of the vector and
moves backward, as follows:
x(1−incx·(n−1)), x(1−incx·(n−2)), ..., x(1−2·incx)
y(1−incy·(n−1)), y(1−incy·(n−2)), ..., y(1−2·incy)
a(1−inca·(n−1)), a(1−inca·(n−2)), ..., a(1)
If incx = 0 or incy = 0, x or y (respectively) is a scalar multiplier.

CAUTIONS
Do not specify inca as 0, because unpredictable results may occur.

EXAMPLES
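SOLRN might be used to find $r_2$ of the calculation
$$\begin{pmatrix} x_{n-2} & y_{n-2} \\ 1 & 0 \end{pmatrix} \cdots
\begin{pmatrix} x_2 & y_2 \\ 1 & 0 \end{pmatrix}
\begin{pmatrix} x_1 & y_1 \\ 1 & 0 \end{pmatrix}
\begin{pmatrix} a_2 \\ a_1 \end{pmatrix}
= \begin{pmatrix} r_2 \\ r_1 \end{pmatrix}$$
with the following call:
      R2 = SOLRN(N,X,1,Y,1,A,1)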

The Fortran equivalent for the example follows:


      R1=A(1)
      R2=A(2)
      DO 10 I=1,N-2
         TEMP=R2
         R2=X(I)*R2+Y(I)*R1
         R1=TEMP
   10 CONTINUE
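
To see that the two-scalar loop and the array form of the recurrence agree, both can be run on the same data. The sketch below is illustrative only: it does not call SOLRN, and the data values and the program name LASTRM are hypothetical.

```fortran
C     Sketch only: compares the two-scalar form from the example
C     (leaves A untouched) with the array form (overwrites A).
      PROGRAM LASTRM
      INTEGER N, I
      PARAMETER (N = 6)
      REAL X(N), Y(N), A(N), R1, R2, TEMP
C     The last two elements of X and Y are never read
      DATA X /1.0, 2.0, 1.0, 2.0, 0.0, 0.0/
      DATA Y /1.0, 1.0, 3.0, 1.0, 0.0, 0.0/
      A(1) = 1.0
      A(2) = 2.0
C     Two-scalar form: carries only the last two terms
      R1 = A(1)
      R2 = A(2)
      DO 10 I = 1, N-2
         TEMP = R2
         R2 = X(I)*R2 + Y(I)*R1
         R1 = TEMP
   10 CONTINUE
C     Array form: a(3)..a(n) are used as workspace
      DO 20 I = 3, N
         A(I) = X(I-2)*A(I-1) + Y(I-2)*A(I-2)
   20 CONTINUE
C     Both lines print 42 for this data (terms: 1 2 3 8 17 42)
      PRINT *, R2
      PRINT *, A(N)
      END
```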

SEE ALSO
FOLR(3S), FOLR2(3S), FOLR2P(3S), FOLRC(3S), FOLRN(3S), FOLRNP(3S), FOLRP(3S) to solve various
forms of first-order linear recurrence
SOLR(3S) to solve the same recurrence as SOLRN, but it calculates all terms, not just the last term
SOLR3(3S) to solve a three-term second-order linear recurrence

INTRO_BLACS ( 3S ) INTRO_BLACS ( 3S )

NAME
INTRO_BLACS – Introduction to Basic Linear Algebra Communication Subprograms

IMPLEMENTATION
UNICOS/mk systems

DESCRIPTION
The Basic Linear Algebra Communication Subprograms (BLACS) is a package of routines for UNICOS/mk
systems that provides the same functionality for message-passing linear algebra communication as the Basic
Linear Algebra Subprograms (BLAS) provide for linear algebra computation. With these two packages,
software for dense linear algebra on UNICOS/mk systems can use calls to the BLAS for computation and
calls to the BLACS for communication.
The BLACS consist of communication primitive routines and global reduction routines, along with several
support routines.
The current version of the BLACS is compatible with the version last released by the ScaLAPACK group at
the University of Tennessee. Arrays passed to the BLACS routines must not be dynamically allocated from
the heap.
Communication Primitives
The communication primitives send a matrix to another processor or receive a matrix from another
processor; if a processor has data to be broadcast to all or a subset of processors, a broadcast communication
primitive must be used to send or receive the data. Any processor involved in a send or receive operation
must have the same amount of available matrix space.
The communication primitives can work on matrices (as indicated by the m, n, and lda arguments to the
routines) of data types of integer, real, or complex. The user can specify that only a portion of the matrix (a
trapezoidal matrix) be referenced in the operation. The uplo argument specifies whether the upper or lower
trapezoid should be used; the diag argument specifies if the matrix is a unit trapezoidal matrix or a non-unit
trapezoidal matrix.
When using the scope argument for the BLACS routines, operations can be expressed in terms of all
processors, a row of processors, or a column of processors. All processors indicated by the scope argument
will be involved in the operation being performed, even if the processor does not have data to contribute or
does not need the data being communicated.
When broadcast operations are involved, a communication pattern must be selected. The top argument
denotes the communication topology for a communication primitive or global operation.
The following table describes the available communication primitives, the routine names, and the man page
where the primitive is described:


Routine name   Man page   Description

IGESD2D        IGESD2D    Sends an integer rectangular matrix to another processor
SGESD2D        IGESD2D    Sends a real rectangular matrix to another processor
CGESD2D        IGESD2D    Sends a complex rectangular matrix to another processor
IGERV2D        IGERV2D    Receives an integer rectangular matrix from another processor
SGERV2D        IGERV2D    Receives a real rectangular matrix from another processor
CGERV2D        IGERV2D    Receives a complex rectangular matrix from another processor
IGEBS2D        IGEBS2D    Broadcasts an integer rectangular matrix to all or a subset of processors
SGEBS2D        IGEBS2D    Broadcasts a real rectangular matrix to all or a subset of processors
CGEBS2D        IGEBS2D    Broadcasts a complex rectangular matrix to all or a subset of processors
IGEBR2D        IGEBR2D    Receives a broadcast integer rectangular matrix from all or a subset of processors
SGEBR2D        IGEBR2D    Receives a broadcast real rectangular matrix from all or a subset of processors
CGEBR2D        IGEBR2D    Receives a broadcast complex rectangular matrix from all or a subset of processors
ITRSD2D        ITRSD2D    Sends an integer trapezoidal matrix to another processor
STRSD2D        ITRSD2D    Sends a real trapezoidal matrix to another processor
CTRSD2D        ITRSD2D    Sends a complex trapezoidal matrix to another processor
ITRRV2D        ITRRV2D    Receives an integer trapezoidal matrix from another processor
STRRV2D        ITRRV2D    Receives a real trapezoidal matrix from another processor
CTRRV2D        ITRRV2D    Receives a complex trapezoidal matrix from another processor
ITRBS2D        ITRBS2D    Broadcasts an integer trapezoidal matrix to all or a subset of processors
STRBS2D        ITRBS2D    Broadcasts a real trapezoidal matrix to all or a subset of processors
CTRBS2D        ITRBS2D    Broadcasts a complex trapezoidal matrix to all or a subset of processors
ITRBR2D        ITRBR2D    Receives a broadcast integer trapezoidal matrix from all or a subset of processors
STRBR2D        ITRBR2D    Receives a broadcast real trapezoidal matrix from all or a subset of processors
CTRBR2D        ITRBR2D    Receives a broadcast complex trapezoidal matrix from all or a subset of processors

Global Reduction Routines


The global reduction routines perform element-wise operations on rectangular matrices. These operations
include summations, maximum absolute values, and minimum absolute values. For an operation to work
properly, all processors indicated by the scope argument must call the given routine.
The hypercube topology is the only supported topology for global primitives. Using this topology, all
processors in the scope of the operation get the information.


The following table describes the available global reduction routines, the routine names, and the man page
name where the primitive is described:

Routine name   Man page   Description

IGSUM2D        IGSUM2D    Performs summations on specified parts of an integer matrix
SGSUM2D        IGSUM2D    Performs summations on specified parts of a real matrix
CGSUM2D        IGSUM2D    Performs summations on specified parts of a complex matrix
IGAMX2D        IGAMX2D    Finds maximum absolute value of specified parts of an integer matrix
SGAMX2D        IGAMX2D    Finds maximum absolute value of specified parts of a real matrix
CGAMX2D        IGAMX2D    Finds maximum absolute value of specified parts of a complex matrix
IGAMN2D        IGAMN2D    Finds minimum absolute value of specified parts of an integer matrix
SGAMN2D        IGAMN2D    Finds minimum absolute value of specified parts of a real matrix
CGAMN2D        IGAMN2D    Finds minimum absolute value of specified parts of a complex matrix

Topologies
Different communication topologies can be used to optimize performance, and several factors determine the
best choice. For example, a ring topology is often preferred if one processor’s time is more valuable than
another processor’s, whereas a minimum spanning tree can be used if all processors need the
information as quickly as possible. The following topologies are supported on UNICOS/mk systems:
• Unidirectional ring. Using the unidirectional ring topology, the source processor issues one broadcast,
and each processor then receives and forwards the message. There are two types of unidirectional rings:
the increasing ring topology and the decreasing ring topology. These are "quiet" topologies (only one
processor is communicating at a time).
• Hypercube or minimum spanning tree. Hypercube broadcasts follow the physical connection of the
system; these are most useful when distributing information to all processors is more important than
saving processor time. In addition, hypercube broadcasts are noisier, because several processors are
sending data simultaneously.
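The trade-off between the two topologies can be sketched by counting communication rounds. The Python model below is an illustration only: the function names are hypothetical, processors are numbered 0 through nprocs-1, and the tree schedule shown is a generic doubling pattern, not necessarily the message pattern the library uses.

```python
def ring_broadcast(nprocs, src, increasing=True):
    """Return the rounds of a unidirectional ring broadcast as lists of (sender, receiver)."""
    step = 1 if increasing else -1
    rounds = []
    sender = src
    for _ in range(nprocs - 1):
        receiver = (sender + step) % nprocs
        rounds.append([(sender, receiver)])  # exactly one processor sends per round
        sender = receiver
    return rounds


def tree_broadcast(nprocs, src):
    """Return the rounds of a doubling (spanning-tree) broadcast as lists of (sender, receiver)."""
    rounds = []
    have = 1  # relative ranks 0..have-1 already hold the data
    while have < nprocs:
        pairs = []
        for rank in range(have):
            dest = rank + have
            if dest < nprocs:
                pairs.append(((src + rank) % nprocs, (src + dest) % nprocs))
        rounds.append(pairs)  # several processors may send in the same round
        have *= 2
    return rounds


ring = ring_broadcast(8, 0)
tree = tree_broadcast(8, 0)
print(len(ring), "ring rounds, one sender per round")
print(len(tree), "tree rounds, up to", max(len(r) for r in tree), "senders per round")
```

In this model, the ring needs nprocs-1 rounds with a single active sender, while the tree finishes in about log2(nprocs) rounds at the cost of simultaneous traffic.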
Support Routines
The BLACS package contains several routines that are not directly related to linear processing. These
routines are used to compute grid coordinates, to initialize routines, and to return information about
processors.
The following table describes the available support routines, the routine names, and the man page name
where the routine is described:


Description                                                           Routine name    Man page

Initializes counters, variables, and so on for BLACS routines         BLACS_GRIDINIT  BLACS_GRIDINIT
Initializes processors                                                BLACS_GRIDMAP   BLACS_GRIDMAP
Returns information about processor grid                              BLACS_GRIDINFO  BLACS_GRIDINFO
Returns the processor element number for specified coordinates        BLACS_PNUM      BLACS_PNUM
Returns processor’s number                                            MYNODE          MYNODE
Computes processor grid coordinates                                   BLACS_PCOORD    BLACS_PCOORD
Stops execution until all specified processors have called a routine  BLACS_BARRIER   BLACS_BARRIER

Context Argument
A new feature in this release of the BLACS is the added capability for the BLACS routines to communicate
over any of many coexisting grids or contexts. Each of the grids (contexts) is identified by an integer called
a context handle. The context handle is output by BLACS_GRIDINIT upon the creation of the grid.

SEE ALSO
Dongarra, Jack J. and Robert A. van de Geijn, "Two Dimensional Basic Linear Algebra Communication
Subprograms," Technical Report CS-91-138, University of Tennessee, October 1991.



BLACS_BARRIER ( 3S ) BLACS_BARRIER ( 3S )

NAME
BLACS_BARRIER – Stops execution until all specified processes have called a routine

SYNOPSIS
CALL BLACS_BARRIER (icntxt, scope)

IMPLEMENTATION
UNICOS/mk systems

DESCRIPTION
BLACS_BARRIER stops execution until all specified processes have called a routine.
This routine has the following arguments:
icntxt Integer. (input)
Context handle returned by a call to BLACS_GRIDINIT(3S).
scope Character*1. (input)
Specifies the processors that participate in the operation, using the grid specified by a previous
call to BLACS_GRIDINIT.
scope = R or r: row of processors
scope = C or c: column of processors
scope = A or a: all processors
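As an illustration of the scope argument, the following Python sketch lists which grid coordinates take part in an operation. It does not call the BLACS. The function name participants is hypothetical, grid coordinates are taken to start at 0, and the sketch assumes that a row or column scope refers to the row or column of the calling processor.

```python
def participants(scope, nprow, npcol, myrow, mycol):
    """Return the (row, column) coordinates of the processors covered by scope."""
    key = scope.lower()  # the scope character is accepted in either case
    if key == "r":
        return [(myrow, col) for col in range(npcol)]
    if key == "c":
        return [(row, mycol) for row in range(nprow)]
    if key == "a":
        return [(row, col) for row in range(nprow) for col in range(npcol)]
    raise ValueError("scope must be R, C, or A")


# Hypothetical 2-by-3 grid, as seen from the processor at coordinates (1, 2).
print(participants("R", 2, 3, 1, 2))
print(participants("c", 2, 3, 1, 2))
print(len(participants("A", 2, 3, 1, 2)))
```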

SEE ALSO
BLACS_GRIDINIT(3S), INTRO_BLACS(3S)



BLACS_EXIT ( 3S ) BLACS_EXIT ( 3S )

NAME
BLACS_EXIT – Frees all existing grids

SYNOPSIS
CALL BLACS_EXIT()

IMPLEMENTATION
UNICOS/mk systems

DESCRIPTION
BLACS_EXIT frees all the grids that have been created in the course of a user’s program. The call frees
internal buffer space that was allocated when the different grids were created.

SEE ALSO
BLACS_GRIDEXIT(3S), BLACS_GRIDINIT(3S), INTRO_BLACS(3S)



BLACS_GRIDEXIT ( 3S ) BLACS_GRIDEXIT ( 3S )

NAME
BLACS_GRIDEXIT – Frees a grid

SYNOPSIS
CALL BLACS_GRIDEXIT(icntxt)

IMPLEMENTATION
UNICOS/mk systems

DESCRIPTION
BLACS_GRIDEXIT frees a grid that has been created by a call to BLACS_GRIDINIT(3S). The call frees
internal buffer space that has been allocated upon the creation of the grid.
This routine has the following argument:
icntxt Integer. (input)
The context handle identifying the grid returned by BLACS_GRIDINIT(3S) upon the creation
of the grid.

NOTES
If a call to a BLACS routine is made after a call to BLACS_GRIDEXIT with the same context handle, the
program will abort.

SEE ALSO
BLACS_EXIT(3S), BLACS_GRIDINIT(3S), INTRO_BLACS(3S)



BLACS_GRIDINFO ( 3S ) BLACS_GRIDINFO ( 3S )

NAME
BLACS_GRIDINFO – Returns information about the two-dimensional processor grid

SYNOPSIS
CALL BLACS_GRIDINFO (icntxt, nprow, npcol, myrow, mycol)

IMPLEMENTATION
UNICOS/mk systems

DESCRIPTION
BLACS_GRIDINFO returns information about the processor grid, such as the number of processor rows,
the number of processor columns, and the grid coordinates of the calling processor.
This routine has the following arguments:
icntxt Integer. (input)
Context handle returned by a call to BLACS_GRIDINIT(3S). This argument must be passed
but is currently ignored internally.
nprow Integer. (output)
The number of processor rows.
npcol Integer. (output)
The number of processor columns.
myrow Integer. (output)
Row coordinate of processor.
mycol Integer. (output)
Column coordinate of processor.

SEE ALSO
BLACS_GRIDINIT(3S), INTRO_BLACS(3S)



BLACS_GRIDINIT ( 3S ) BLACS_GRIDINIT ( 3S )

NAME
BLACS_GRIDINIT – Initializes counters, variables, and so on, for the BLACS routines

SYNOPSIS
CALL BLACS_GRIDINIT (icntxt, order, nprow, npcol)

IMPLEMENTATION
UNICOS/mk systems

DESCRIPTION
BLACS_GRIDINIT initializes an nprow-by-npcol grid of processors in row-major or column-major order and
assigns grid coordinates to each processor. This routine must be called before any call to the other
BLACS routines, ScaLAPACK routines, BLAS_S routines, or the parallel two-dimensional FFT routines.
The arguments should be the same on all nodes.
This routine has the following arguments:
icntxt Integer. (output)
Context handle identifying the grid being initialized.
order Character*1. (input)
Specifies whether the grid of processors will be initialized in row-major or col-major order. If
the grid is to match the distribution of a SHARED array, the order should be c.
order = R or r: row-major order
order = C or c: col-major order
nprow Integer. (input)
Indicates the number of processor rows for the processor grid.
npcol Integer. (input)
Indicates the number of processor columns for the processor grid.
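The effect of the order argument can be pictured with a small model. The Python sketch below is an illustration only: the function name grid_coords is hypothetical, and it assumes that processing elements are numbered from 0 and are assigned to the grid consecutively, which this man page does not state explicitly.

```python
def grid_coords(pe, order, nprow, npcol):
    """Return the (row, column) grid coordinates assigned to processing element pe."""
    if not 0 <= pe < nprow * npcol:
        raise ValueError("pe is outside the grid")
    key = order.lower()
    if key == "r":
        # Row-major: consecutive PEs fill one grid row before moving to the next.
        return divmod(pe, npcol)
    if key == "c":
        # Column-major: consecutive PEs fill one grid column before moving to the next.
        col, row = divmod(pe, nprow)
        return (row, col)
    raise ValueError("order must be R or C")


# Hypothetical 2-by-3 grid: the same PE lands in different places under each order.
print(grid_coords(4, "R", 2, 3))
print(grid_coords(4, "C", 2, 3))
```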

SEE ALSO
INTRO_BLACS(3S)



BLACS_GRIDMAP ( 3S ) BLACS_GRIDMAP ( 3S )

NAME
BLACS_GRIDMAP – Initializes a grid of processors

SYNOPSIS
CALL BLACS_GRIDMAP (icntxt, gridmap, ld, nprow, npcol)

IMPLEMENTATION
UNICOS/mk systems

DESCRIPTION
BLACS_GRIDMAP initializes an nprow-by-npcol grid of processors in the image of the (input) array gridmap.
This routine can be used as an alternative to BLACS_GRIDINIT in cases where the user’s application
requires a mapping of the processors to the grid that is different from those implemented in
BLACS_GRIDINIT.
This routine has the following arguments:
icntxt Integer. (output)
The context handle identifying the grid being initialized.
gridmap Integer array of dimension (ld, npcol). (input)
Array specifying the map of the processors to the grid. The processor given in gridmap(i, j) fills
row i-1 and column j-1 of the grid (array indices start from 1).
ld Integer. (input)
Specifies the first dimension of array gridmap as declared in the calling program.
nprow Integer. (input)
Indicates the number of processor rows for the processor grid.
npcol Integer. (input)
Indicates the number of processor columns for the processor grid.
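The roles of gridmap and ld can be modeled as follows. This Python sketch is an illustration only and does not call the BLACS. The function name coords_from_gridmap is hypothetical, the array is held as a flat list in Fortran column-major order, and the check that each processor appears only once is an assumption of the sketch rather than documented library behavior.

```python
def coords_from_gridmap(flat, ld, nprow, npcol):
    """Map each processor number in a column-major gridmap(ld, npcol) to its grid coordinates."""
    if ld < nprow:
        raise ValueError("ld must be at least nprow")
    coords = {}
    for j in range(npcol):
        for i in range(nprow):
            pe = flat[i + j * ld]  # 0-based form of the Fortran element gridmap(i+1, j+1)
            if pe in coords:
                raise ValueError("processor listed twice in gridmap")
            coords[pe] = (i, j)
    return coords


# Hypothetical 2-by-2 grid stored with ld = 3; the -1 entries are unused padding rows.
flat = [3, 1, -1,
        2, 0, -1]
mapping = coords_from_gridmap(flat, 3, 2, 2)
print(mapping)
```

Only the first nprow entries of each column are read, which is why ld may exceed nprow.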

SEE ALSO
BLACS_GRIDINIT(3S), INTRO_BLACS(3S)



BLACS_PCOORD ( 3S ) BLACS_PCOORD ( 3S )

NAME
BLACS_PCOORD – Computes coordinates in two-dimensional grids

SYNOPSIS
CALL BLACS_PCOORD (icntxt, pe_num, prow, pcol)

IMPLEMENTATION
UNICOS/mk systems

DESCRIPTION
BLACS_PCOORD computes processor grid coordinates prow and pcol by using pe_num.
This routine has the following arguments:
icntxt Integer. (input)
The context handle returned by a call to BLACS_GRIDINIT(3S). This argument must be
passed but is currently ignored internally.
pe_num Integer. (input)
Processing element.
prow Integer. (output)
Row coordinate for processor.
pcol Integer. (output)
Column coordinate for processor.

SEE ALSO
BLACS_GRIDINIT(3S), INTRO_BLACS(3S)



BLACS_PNUM ( 3S ) BLACS_PNUM ( 3S )

NAME
BLACS_PNUM – Returns the processor element number for specified coordinates in two-dimensional grids

SYNOPSIS
PE_number = BLACS_PNUM (icntxt, prow, pcol)

IMPLEMENTATION
UNICOS/mk systems

DESCRIPTION
BLACS_PNUM returns the processor element number at grid coordinate prow, pcol.
This routine has the following arguments:
icntxt Integer. (input)
The context handle returned by a call to BLACS_GRIDINIT(3S). This argument must be
passed but is currently ignored internally.
prow Integer. (input)
Row coordinate of processor.
pcol Integer. (input)
Column coordinate of processor.
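BLACS_PNUM and BLACS_PCOORD are inverse lookups, which the following Python sketch illustrates. It is a model only: the function names pnum and pcoord are hypothetical, the grid shape is passed explicitly instead of through a context handle, and a column-major grid with processing elements numbered from 0 is assumed.

```python
def pnum(prow, pcol, nprow, npcol):
    """Return the processing element number at grid coordinates (prow, pcol)."""
    if not (0 <= prow < nprow and 0 <= pcol < npcol):
        raise ValueError("coordinates are outside the grid")
    return prow + pcol * nprow  # column-major numbering (assumed)


def pcoord(pe, nprow, npcol):
    """Return the grid coordinates (prow, pcol) of processing element pe."""
    if not 0 <= pe < nprow * npcol:
        raise ValueError("pe is outside the grid")
    return (pe % nprow, pe // nprow)


# Round trip on a hypothetical 2-by-3 grid: each lookup undoes the other.
nprow, npcol = 2, 3
round_trip_ok = all(
    pcoord(pnum(r, c, nprow, npcol), nprow, npcol) == (r, c)
    for r in range(nprow)
    for c in range(npcol)
)
print(round_trip_ok)
```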

SEE ALSO
BLACS_GRIDINIT(3S), INTRO_BLACS(3S)



GRIDINFO3D ( 3S ) GRIDINFO3D ( 3S )

NAME
GRIDINFO3D – Returns information about the three-dimensional processor grid

SYNOPSIS
CALL GRIDINFO3D (ictxt, npx, npy, npz, mypex, mypey, mypez)

IMPLEMENTATION
UNICOS/mk systems

DESCRIPTION
GRIDINFO3D returns information about the processor grid, such as the number of processors assigned to the
X, Y, and Z dimensions and the grid coordinates of the calling processor.
The following arguments are available with this routine:
ictxt Integer. (input)
Handle that describes the grid initialized by GRIDINIT3D(3S).
npx Integer. (output)
Number of processors assigned to the X dimension.
npy Integer. (output)
Number of processors assigned to the Y dimension.
npz Integer. (output)
Number of processors assigned to the Z dimension.
mypex Integer. (output)
X coordinate of processor.
mypey Integer. (output)
Y coordinate of processor.
mypez Integer. (output)
Z coordinate of processor.

NOTES
The GRIDINIT3D(3S) routine must be called somewhere in the program before the first call to
GRIDINFO3D.

SEE ALSO
DESCINIT3D(3S), GRIDINIT3D(3S), PCOORD3D(3S), PNUM3D(3S)



GRIDINIT3D ( 3S ) GRIDINIT3D ( 3S )

NAME
GRIDINIT3D – Initializes variables for a three-dimensional (3D) grid partition of processor set

SYNOPSIS
CALL GRIDINIT3D (ictxt, npx, npy, npz)

IMPLEMENTATION
UNICOS/mk systems

DESCRIPTION
GRIDINIT3D initializes an npx-by-npy-by-npz grid of processors in a column-major fashion. The
GRIDINIT3D routine assigns grid coordinates to each processor. Users must call this routine before calling
any routine that uses information about the 3D grid of processors. The arguments should be the same on all
nodes.
The GRIDINIT3D routine accepts the following arguments:
ictxt Integer. (output)
Handle that describes the 3D grid.
npx Integer. (input)
Number of processors assigned to the X dimension of the processor grid. This argument must
be a power of 2.
npy Integer. (input)
Number of processors assigned to the Y dimension of the processor grid. This argument must
be a power of 2.
npz Integer. (input)
Number of processors assigned to the Z dimension of the processor grid. This argument must be
a power of 2.
As an example, consider a partition of 16 processors (N$PES = 16) that will be initialized as a 3D grid of
size 2-by-4-by-2 (that is, 2 processors assigned to the X dimension, 4 to the Y dimension and 2 to the Z
dimension). GRIDINIT3D assigns the following coordinates to the processors:


Z = 0

       Y    0    1    2    3
X        |----|----|----|----|
0        |  0 |  2 |  4 |  6 |
         |----|----|----|----|
1        |  1 |  3 |  5 |  7 |
         |----|----|----|----|

Z = 1

       Y    0    1    2    3
X        |----|----|----|----|
0        |  8 | 10 | 12 | 14 |
         |----|----|----|----|
1        |  9 | 11 | 13 | 15 |
         |----|----|----|----|

In this case processor 2 would have coordinates (0,1,0) and processor 13 would have coordinates (1,2,1).
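The coordinates in this example follow from column-major numbering, in which the X coordinate varies fastest. The Python sketch below reproduces the table; it is an illustration only, and the function names coords3d and is_power_of_two are hypothetical.

```python
def is_power_of_two(value):
    """Return True if value is a positive power of 2."""
    return value > 0 and (value & (value - 1)) == 0


def coords3d(pe, npx, npy, npz):
    """Return the (x, y, z) coordinates of processor pe in a column-major 3D grid."""
    if not all(is_power_of_two(v) for v in (npx, npy, npz)):
        raise ValueError("npx, npy, and npz must each be a power of 2")
    if not 0 <= pe < npx * npy * npz:
        raise ValueError("pe is outside the grid")
    x = pe % npx                 # X varies fastest
    y = (pe // npx) % npy        # then Y
    z = pe // (npx * npy)        # Z varies slowest
    return (x, y, z)


# The 2-by-4-by-2 grid from the example above.
print(coords3d(2, 2, 4, 2))
print(coords3d(13, 2, 4, 2))
```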

SEE ALSO
DESCINIT3D(3S), GRIDINFO3D(3S), PCOORD3D(3S), PNUM3D(3S)



IGAMN2D ( 3S ) IGAMN2D ( 3S )

NAME
IGAMN2D, SGAMN2D, CGAMN2D – Determines minimum absolute values of rectangular matrices

SYNOPSIS
CALL IGAMN2D (icntxt, scope, top, m, n, a, lda, ra, ca, ldia, rdest, cdest)
CALL SGAMN2D (icntxt, scope, top, m, n, a, lda, ra, ca, ldia, rdest, cdest)
CALL CGAMN2D (icntxt, scope, top, m, n, a, lda, ra, ca, ldia, rdest, cdest)

IMPLEMENTATION
UNICOS/mk systems

DESCRIPTION
IGAMN2D determines minimum absolute values of rectangular matrices.
IGAMN2D communicates integer data. SGAMN2D communicates real data. CGAMN2D communicates
complex data.
These routines have the following arguments:
icntxt Integer. (input)
Context handle returned by a call to BLACS_GRIDINIT(3S).
scope Character*1. (input)
Specifies the processors that participate in the operation, using the grid specified by a previous
call to BLACS_GRIDINIT(3S).
scope = R or r: row of processors
scope = C or c: column of processors
scope = A or a: all processors
top Character*1. (input)
Network topology. Only the h topology (minimum spanning tree) is currently supported.
m Integer. (input)
Specifies the number of rows in matrix a. m must be ≥ 0.
n Integer. (input)
Specifies the number of columns in matrix a. n must be ≥ 0.
a IGAMN2D: Integer array, dimension (lda,n). (input/output)
SGAMN2D: Real array, dimension (lda,n). (input/output)
CGAMN2D: Complex array, dimension (lda,n). (input/output)
On entry, a is an m-by-n matrix of values. On exit, a is such that a(i, j) is the element of
minimum absolute value from the (i, j) entry of all the input arrays.
lda Integer. (input)
The leading dimension of the array a. lda ≥ MAX(m,1).
ra Integer array of dimension (ldia,n). (output)
On exit, ra(i, j) is the row index of the processor that provided a(i, j) in the output array.
ca Integer array of dimension (ldia,n). (output)
On exit, ca(i, j) is the column index of the processor that provided a(i, j) in the output array.
ldia Integer. (input)
Leading dimension of integer arrays ra and ca. ldia ≥ MAX(m,1).
rdest Ignored.
cdest Ignored.

NOTES
The m, n, and lda arguments determine the matrix shape. For an operation to proceed, all processors
indicated by the scope argument must call the given routine. The result is left on all processors indicated by
the scope argument.
These routines were named IGMIN2D, SGMIN2D, and CGMIN2D in a previous release.
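The relationship between a, ra, and ca can be modeled as follows. This Python sketch is an illustration with integer data only and does not call the BLACS. The function name global_amin is hypothetical, and the tie-breaking rule (the first processor in sorted coordinate order wins) is an assumption of the sketch, not documented behavior.

```python
def global_amin(local):
    """Return (a, ra, ca) for matrices keyed by the (row, column) of the owning processor."""
    procs = sorted(local)
    first = local[procs[0]]
    m, n = len(first), len(first[0])
    a = [[0] * n for _ in range(m)]
    ra = [[0] * n for _ in range(m)]
    ca = [[0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            # Pick the processor whose (i, j) entry has the smallest absolute value.
            best = min(procs, key=lambda p: abs(local[p][i][j]))
            a[i][j] = local[best][i][j]  # the element itself, sign included
            ra[i][j], ca[i][j] = best
    return a, ra, ca


# Hypothetical 1-by-2 matrices on the processors at (0, 0) and (0, 1).
local = {(0, 0): [[3, -1]], (0, 1): [[-2, 5]]}
a, ra, ca = global_amin(local)
print(a, ra, ca)
```

Here a(1, 1) comes from the processor at (0, 1) and a(1, 2) from the processor at (0, 0), so ra and ca record a different source for each entry.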

SEE ALSO
BLACS_GRIDINIT(3S), IGAMX2D(3S), IGSUM2D(3S), INTRO_BLACS(3S)



IGAMX2D ( 3S ) IGAMX2D ( 3S )

NAME
IGAMX2D, SGAMX2D, CGAMX2D – Determines maximum absolute values of rectangular matrices

SYNOPSIS
CALL IGAMX2D (icntxt, scope, top, m, n, a, lda, ra, ca, ldia, rdest, cdest)
CALL SGAMX2D (icntxt, scope, top, m, n, a, lda, ra, ca, ldia, rdest, cdest)
CALL CGAMX2D (icntxt, scope, top, m, n, a, lda, ra, ca, ldia, rdest, cdest)

IMPLEMENTATION
UNICOS/mk systems

DESCRIPTION
IGAMX2D determines maximum absolute values of rectangular matrices.
IGAMX2D communicates integer data. SGAMX2D communicates real data. CGAMX2D communicates
complex data.
These routines have the following arguments:
icntxt Integer. (input)
Context handle returned by a call to BLACS_GRIDINIT(3S).
scope Character*1. (input)
Specifies the processors that participate in the operation, using the grid specified by a previous
call to BLACS_GRIDINIT(3S).
scope = R or r: row of processors
scope = C or c: column of processors
scope = A or a: all processors
top Character*1. (input)
Network topology. Only the h topology (minimum spanning tree) is currently supported.
m Integer. (input)
Specifies the number of rows in matrix a. m must be ≥ 0.
n Integer. (input)
Specifies the number of columns in matrix a. n must be ≥ 0.
a IGAMX2D: Integer array, dimension (lda,n). (input/output)
SGAMX2D: Real array, dimension (lda,n). (input/output)
CGAMX2D: Complex array, dimension (lda,n). (input/output)
On entry, a is an m-by-n matrix of values. On exit, a is such that a(i, j) is the element of
maximum absolute value from the (i, j) entry of all the input arrays.
lda Integer. (input)
The leading dimension of the array a. lda ≥ MAX(m,1).
ra Integer array of dimension (ldia,n). (output)
On exit, ra(i, j) is the row index of the processor that provided a(i, j) in the output array.
ca Integer array of dimension (ldia,n). (output)
On exit, ca(i, j) is the column index of the processor that provided a(i, j) in the output array.
ldia Integer. (input)
Leading dimension of integer arrays ra and ca. ldia ≥ MAX(m,1).
rdest Ignored.
cdest Ignored.

NOTES
The m, n, and lda arguments determine the matrix shape. For an operation to proceed, all processors
indicated by the scope argument must call the given routine. The result is left on all processors indicated by
the scope argument.
These routines were named IGMAX2D, SGMAX2D, and CGMAX2D in a previous release.

SEE ALSO
BLACS_GRIDINIT(3S), IGAMN2D(3S), IGSUM2D(3S), INTRO_BLACS(3S)



IGEBR2D ( 3S ) IGEBR2D ( 3S )

NAME
IGEBR2D, SGEBR2D, CGEBR2D – Receives a broadcast general rectangular matrix from all or a subset of
processors

SYNOPSIS
CALL IGEBR2D (icntxt, scope, top, m, n, a, lda, rsrc, csrc)
CALL SGEBR2D (icntxt, scope, top, m, n, a, lda, rsrc, csrc)
CALL CGEBR2D (icntxt, scope, top, m, n, a, lda, rsrc, csrc)

IMPLEMENTATION
UNICOS/mk systems

DESCRIPTION
IGEBR2D receives a broadcast general rectangular matrix from all or a subset of processors. The source of
the broadcast uses the IGEBS2D(3S) routine to send the matrix. Execution does not resume until the data
arrives.
IGEBR2D communicates integer data. SGEBR2D communicates real data. CGEBR2D communicates
complex data.
These routines have the following arguments:
icntxt Integer. (input)
Context handle returned by a call to BLACS_GRIDINIT(3S).
scope Character*1. (input)
Specifies the processors that participate in the operation, using the grid specified by a previous
call to BLACS_GRIDINIT(3S).
scope = R or r: row of processors
scope = C or c: column of processors
scope = A or a: all processors
top Character*1. (input)
Specifies the network topology used by the broadcast.
top = I or i: increasing ring
top = D or d: decreasing ring
top = H or h: hypercube
m Integer. (input)
Specifies the number of rows in matrix a. m must be ≥ 0.
n Integer. (input)
Specifies the number of columns in matrix a. n must be ≥ 0.
a IGEBR2D: Integer array, dimension (lda,n). (output)
SGEBR2D: Real array, dimension (lda,n). (output)
CGEBR2D: Complex array, dimension (lda,n). (output)
The m-by-n array at which the message is to be received.
lda Integer. (input)
The leading dimension of the array a. lda ≥ MAX(m,1).
rsrc Integer. (input)
The row index of the source processor in the processor grid.
csrc Integer. (input)
Column index of the source processor in the processor grid.

NOTES
The m, n, and lda arguments determine the matrix shape. Any processor using a send operation and the
matching receive operation must have the same m and n.
For an operation to proceed, all processors indicated by the scope argument must call the routine.
These routines will default to the h topology if called with any of the other values of top that are supported
by the standard version of the BLACS from the University of Tennessee (except on UNICOS/mk systems).

SEE ALSO
BLACS_GRIDINIT(3S), IGEBS2D(3S), INTRO_BLACS(3S)



IGEBS2D ( 3S ) IGEBS2D ( 3S )

NAME
IGEBS2D, SGEBS2D, CGEBS2D – Broadcasts a general rectangular matrix to all or a subset of processors

SYNOPSIS
CALL IGEBS2D (icntxt, scope, top, m, n, a, lda)
CALL SGEBS2D (icntxt, scope, top, m, n, a, lda)
CALL CGEBS2D (icntxt, scope, top, m, n, a, lda)

IMPLEMENTATION
UNICOS/mk systems

DESCRIPTION
IGEBS2D broadcasts a general rectangular matrix to all or a subset of processors. The other processors
use the IGEBR2D(3S) routine to receive the broadcast matrix. Execution does not resume until the data
arrives.
IGEBS2D communicates integer data. SGEBS2D communicates real data. CGEBS2D communicates
complex data.
These routines have the following arguments:
icntxt Integer. (input)
Context handle returned by a call to BLACS_GRIDINIT(3S).
scope Character*1. (input)
Specifies the processors that participate in the operation, using the grid specified by a previous
call to BLACS_GRIDINIT(3S).
scope = R or r: row of processors
scope = C or c: column of processors
scope = A or a: all processors
top Character*1. (input)
Specifies the network topology used by the broadcast.
top = I or i: increasing ring
top = D or d: decreasing ring
top = H or h: hypercube
m Integer. (input)
Specifies the number of rows in matrix a. m must be ≥ 0.
n Integer. (input)
Specifies the number of columns in matrix a. n must be ≥ 0.
a IGEBS2D: Integer array, dimension (lda,n). (input)
SGEBS2D: Real array, dimension (lda,n). (input)
CGEBS2D: Complex array, dimension (lda,n). (input)
The m-by-n array to be sent.
lda Integer. (input)
The leading dimension of the array a. lda ≥ MAX(m,1).

NOTES
The m, n, and lda arguments determine the matrix shape. Any processor using a send operation and the
matching receive operation must have the same m and n.
For an operation to proceed, all processors indicated by the scope argument must call the routine.
These routines will default to the h topology if called with any of the other values of top that are supported
by the standard version of the BLACS from the University of Tennessee (except on Cray T3D systems).

SEE ALSO
BLACS_GRIDINIT(3S), IGEBR2D(3S), IGERV2D(3S), INTRO_BLACS(3S)



IGERV2D ( 3S ) IGERV2D ( 3S )

NAME
IGERV2D, SGERV2D, CGERV2D – Receives a general rectangular matrix from another processor

SYNOPSIS
CALL IGERV2D (icntxt, m, n, a, lda, rsrc, csrc)
CALL SGERV2D (icntxt, m, n, a, lda, rsrc, csrc)
CALL CGERV2D (icntxt, m, n, a, lda, rsrc, csrc)

IMPLEMENTATION
UNICOS/mk systems

DESCRIPTION
IGERV2D receives a general rectangular matrix from another processor. The other processor uses the
IGESD2D(3S) routine to send the matrix. Execution does not resume until the data arrives.
IGERV2D communicates integer data. SGERV2D communicates real data. CGERV2D communicates
complex data.
These routines have the following arguments:
icntxt Integer. (input)
Context handle returned by a call to BLACS_GRIDINIT(3S).
m Integer. (input)
Specifies the number of rows in matrix a. m must be ≥ 0.
n Integer. (input)
Specifies the number of columns in matrix a. n must be ≥ 0.
a IGERV2D: Integer array, dimension (lda,n). (output)
SGERV2D: Real array, dimension (lda,n). (output)
CGERV2D: Complex array, dimension (lda,n). (output)
The m-by-n array at which the message is to be received.
lda Integer. (input)
The leading dimension of the array a. lda ≥ MAX(m,1).
rsrc Integer. (input)
Row index of source processor.
csrc Integer. (input)
Column index of source processor.


NOTES
The m, n, and lda arguments determine the matrix shape. Any processor using a send operation and the
matching receive operation must have the same m and n.

SEE ALSO
BLACS_GRIDINIT(3S), IGESD2D(3S), INTRO_BLACS(3S)



IGESD2D ( 3S ) IGESD2D ( 3S )

NAME
IGESD2D, SGESD2D, CGESD2D – Sends a general rectangular matrix to another processor

SYNOPSIS
CALL IGESD2D (icntxt, m, n, a, lda, rdest, cdest)
CALL SGESD2D (icntxt, m, n, a, lda, rdest, cdest)
CALL CGESD2D (icntxt, m, n, a, lda, rdest, cdest)

IMPLEMENTATION
UNICOS/mk systems

DESCRIPTION
IGESD2D sends a general rectangular matrix to another processor. The other processor uses the
IGERV2D(3S) routine to receive the matrix. Execution does not resume until the data arrives.
IGESD2D communicates integer data. SGESD2D communicates real data. CGESD2D communicates
complex data.
These routines have the following arguments:
icntxt Integer. (input)
Context handle returned by a call to BLACS_GRIDINIT(3S).
m Integer. (input)
Specifies the number of rows in matrix a. m must be ≥ 0.
n Integer. (input)
Specifies the number of columns in matrix a. n must be ≥ 0.
a IGESD2D: Integer array, dimension (lda,n). (input)
SGESD2D: Real array, dimension (lda,n). (input)
CGESD2D: Complex array, dimension (lda,n). (input)
The m-by-n array to be sent.
lda Integer. (input)
The leading dimension of the array a. lda ≥ MAX(m,1).
rdest Integer. (input)
Row index of destination processor.
cdest Integer. (input)
Column index of destination processor.


NOTES
The m, n, and lda arguments determine the matrix shape. Any processor using a send operation and the
matching receive operation must have the same m and n.
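The way m, n, and lda together describe the matrix shape can be sketched as follows. The Python model below is an illustration only and does not call the BLACS. The function name pack_submatrix is hypothetical, and the array is held as a flat list in Fortran column-major order, so the Fortran element a(i, j) sits at 0-based offset (i-1) + (j-1)*lda.

```python
def pack_submatrix(a_flat, lda, m, n):
    """Collect the m-by-n matrix stored column-major in a_flat with leading dimension lda."""
    if m < 0 or n < 0:
        raise ValueError("m and n must be >= 0")
    if lda < max(m, 1):
        raise ValueError("lda must be >= MAX(m,1)")
    # Only the first m entries of each of the first n columns belong to the message.
    return [a_flat[i + j * lda] for j in range(n) for i in range(m)]


# Hypothetical array declared with lda = 3 but carrying a 2-by-2 matrix; 99 marks unused rows.
a_flat = [1, 2, 99,
          3, 4, 99]
message = pack_submatrix(a_flat, 3, 2, 2)
print(message)
```

Because only m and n determine how many elements travel, the sender and receiver may declare different leading dimensions as long as m and n match.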

SEE ALSO
BLACS_GRIDINIT(3S), IGERV2D(3S), INTRO_BLACS(3S)



IGSUM2D ( 3S ) IGSUM2D ( 3S )

NAME
IGSUM2D, SGSUM2D, CGSUM2D – Performs element summation operations on rectangular matrices

SYNOPSIS
CALL IGSUM2D (icntxt, scope, top, m, n, a, lda, rdest, cdest)
CALL SGSUM2D (icntxt, scope, top, m, n, a, lda, rdest, cdest)
CALL CGSUM2D (icntxt, scope, top, m, n, a, lda, rdest, cdest)

IMPLEMENTATION
UNICOS/mk systems

DESCRIPTION
IGSUM2D performs element summation operations on rectangular matrices.
IGSUM2D communicates integer data. SGSUM2D communicates real data. CGSUM2D communicates
complex data.
These routines have the following arguments:
icntxt Integer. (input)
Context handle returned by a call to BLACS_GRIDINIT(3S).
scope Character*1. (input)
Specifies the processors that participate in the operation, using the grid specified by a previous
call to BLACS_GRIDINIT(3S).
scope = R or r: row of processors
scope = C or c: column of processors
scope = A or a: all processors
top Character*1. (input)
Network topology. Only the h topology (minimum spanning tree) is currently supported.
m Integer. (input)
Specifies the number of rows in matrix a. m must be ≥ 0.
n Integer. (input)
Specifies the number of columns in matrix a. n must be ≥ 0.
a IGSUM2D: Integer array, dimension (lda,n). (input/output)
SGSUM2D: Real array, dimension (lda,n). (input/output)
CGSUM2D: Complex array, dimension (lda,n). (input/output)
On exit, a is such that a(i, j) is the sum of all (i, j) entries in the input arrays.
lda Integer. (input)
The leading dimension of the array a. lda ≥ MAX(m,1).
rdest Ignored.
cdest Ignored.

NOTES
The m, n, and lda arguments determine the matrix shape. For an operation to proceed, all processors
indicated by the scope argument must call the given routine. The result is left on all processors indicated by
the scope argument.

SEE ALSO
BLACS_GRIDINIT(3S), IGAMX2D(3S), IGAMN2D(3S), INTRO_BLACS(3S)



ITRBR2D ( 3S ) ITRBR2D ( 3S )

NAME
ITRBR2D, STRBR2D, CTRBR2D – Receives a broadcast trapezoidal rectangular matrix from all or a subset
of processors

SYNOPSIS
CALL ITRBR2D (icntxt, scope, top, uplo, diag, m, n, a, lda, rsrc, csrc)
CALL STRBR2D (icntxt, scope, top, uplo, diag, m, n, a, lda, rsrc, csrc)
CALL CTRBR2D (icntxt, scope, top, uplo, diag, m, n, a, lda, rsrc, csrc)

IMPLEMENTATION
UNICOS/mk systems

DESCRIPTION
ITRBR2D receives a broadcast trapezoidal matrix from all or a subset of processors. The source of the
broadcast uses the ITRBS2D(3S) routine to send the matrix. Execution does not resume until the data
arrives.
ITRBR2D communicates integer data. STRBR2D communicates real data. CTRBR2D communicates
complex data.
These routines have the following arguments:
icntxt Integer. (input)
Context handle returned by a call to BLACS_GRIDINIT(3S).
scope Character*1. (input)
Specifies the processors that participate in the operation, using the grid specified by a previous
call to BLACS_GRIDINIT(3S).
scope = R or r: row of processors
scope = C or c: column of processors
scope = A or a: all processors
top Character*1. (input)
Specifies the network topology used by the broadcast.
top = I or i: increasing ring
top = D or d: decreasing ring
top = H or h: hypercube
uplo Character*1. (input)
Specifies whether the trapezoid is in the upper or lower triangular part of the matrix a, as
follows:
If uplo = ’U’ or ’u’, the trapezoid is in the upper triangular part of the matrix.
If uplo = ’L’ or ’l’, the trapezoid is in the lower triangular part of the matrix.
diag Character*1. (input)
Specifies whether the matrix a has ones on the diagonal, as follows:
If diag = ’U’ or ’u’, specifies a unit trapezoidal matrix.
If diag = ’N’ or ’n’, specifies a non-unit trapezoidal matrix.
m Integer. (input)
Specifies the number of rows in matrix a. m must be ≥ 0.
n Integer. (input)
Specifies the number of columns in matrix a. n must be ≥ 0.
a ITRBR2D: Integer array, dimension (lda,n). (output)
STRBR2D: Real array, dimension (lda,n). (output)
CTRBR2D: Complex array, dimension (lda,n). (output)
The m-by-n array in which the trapezoidal matrix is to be received.
lda Integer. (input)
The leading dimension of the array a. lda ≥ MAX(m,1).
rsrc Integer. (input)
Row index of source processor.
csrc Integer. (input)
Column index of source processor.

NOTES
The m, n, and lda arguments determine the matrix shape. Any processor using a send operation and the
matching receive operation must have the same m and n.
For an operation to proceed, all processors indicated by the scope argument must call the routine.
These routines will default to the h topology if called with any of the other values of top that are supported
by the standard version of the BLACS from the University of Tennessee (except on Cray T3D systems).

SEE ALSO
BLACS_GRIDINIT(3S), INTRO_BLACS(3S), ITRBS2D(3S)



ITRBS2D ( 3S ) ITRBS2D ( 3S )

NAME
ITRBS2D, STRBS2D, CTRBS2D – Broadcasts a trapezoidal rectangular matrix to all or a subset of
processors

SYNOPSIS
CALL ITRBS2D (icntxt, scope, top, uplo, diag, m, n, a, lda)
CALL STRBS2D (icntxt, scope, top, uplo, diag, m, n, a, lda)
CALL CTRBS2D (icntxt, scope, top, uplo, diag, m, n, a, lda)

IMPLEMENTATION
UNICOS/mk systems

DESCRIPTION
ITRBS2D broadcasts a trapezoidal rectangular matrix to all or a subset of processors. The other processors
use the ITRBR2D(3S) routine to receive the broadcast matrix. Execution does not resume until the data
arrives.
ITRBS2D communicates integer data. STRBS2D communicates real data. CTRBS2D communicates
complex data.
These routines have the following arguments:
icntxt Integer. (input)
Context handle returned by a call to BLACS_GRIDINIT(3S).
scope Character*1. (input)
Specifies the processors that participate in the operation, using the grid specified by a previous
call to BLACS_GRIDINIT(3S).
scope = R or r: row of processors
scope = C or c: column of processors
scope = A or a: all processors
top Character*1. (input)
Specifies the network topology used by the broadcast.
top = I or i: increasing ring
top = D or d: decreasing ring
top = H or h: hypercube
uplo Character*1. (input)
Specifies whether the trapezoid is in the upper or lower triangular part of the matrix a, as
follows:
If uplo = ’U’ or ’u’, the trapezoid is in the upper triangular part of the matrix.
If uplo = ’L’ or ’l’, the trapezoid is in the lower triangular part of the matrix.
diag Character*1. (input)
Specifies whether the matrix a has ones on the diagonal, as follows:
If diag = ’U’ or ’u’, specifies a unit trapezoidal matrix.
If diag = ’N’ or ’n’, specifies a non-unit trapezoidal matrix.
m Integer. (input)
Specifies the number of rows in matrix a. m must be ≥ 0.
n Integer. (input)
Specifies the number of columns in matrix a. n must be ≥ 0.
a ITRBS2D: Integer array, dimension (lda,n). (input)
STRBS2D: Real array, dimension (lda,n). (input)
CTRBS2D: Complex array, dimension (lda,n). (input)
The m-by-n matrix containing the trapezoidal matrix to be broadcast.
lda Integer. (input)
The leading dimension of the array a. lda ≥ MAX (m,1).

NOTES
The m, n, and lda arguments determine the matrix shape. Any processor using a send operation and the
matching receive operation must have the same m and n.
For an operation to proceed, all processors indicated by the scope argument must call the routine.
These routines will default to the h topology if called with any of the other values of top that are supported
by the standard version of the BLACS from the University of Tennessee (except on Cray T3D systems).

SEE ALSO
BLACS_GRIDINIT(3S), INTRO_BLACS(3S), ITRBR2D(3S)

ITRRV2D ( 3S ) ITRRV2D ( 3S )

NAME
ITRRV2D, STRRV2D, CTRRV2D – Receives a trapezoidal rectangular matrix from another processor

SYNOPSIS
CALL ITRRV2D (icntxt, uplo, diag, m, n, a, lda, rsrc, csrc)
CALL STRRV2D (icntxt, uplo, diag, m, n, a, lda, rsrc, csrc)
CALL CTRRV2D (icntxt, uplo, diag, m, n, a, lda, rsrc, csrc)

IMPLEMENTATION
UNICOS/mk systems

DESCRIPTION
ITRRV2D receives a trapezoidal matrix from another processor. The other processor uses the ITRSD2D(3S)
routine to send the matrix. Execution does not resume until the data arrives.
ITRRV2D communicates integer data. STRRV2D communicates real data. CTRRV2D communicates
complex data.
These routines have the following arguments:
icntxt Integer. (input)
Context handle returned by a call to BLACS_GRIDINIT(3S).
uplo Character*1. (input)
Specifies whether the trapezoid is in the upper or lower triangular part of the matrix a, as
follows:
If uplo = ’U’ or ’u’, the trapezoid is in the upper triangular part of the matrix.
If uplo = ’L’ or ’l’, the trapezoid is in the lower triangular part of the matrix.
diag Character*1. (input)
Specifies whether the matrix a has ones on the diagonal, as follows:
If diag = ’U’ or ’u’, specifies a unit trapezoidal matrix.
If diag = ’N’ or ’n’, specifies a non-unit trapezoidal matrix.
m Integer. (input)
Specifies the number of rows in matrix a. m must be ≥ 0.
n Integer. (input)
Specifies the number of columns in matrix a. n must be ≥ 0.
a ITRRV2D: Integer array, dimension (lda,n). (output)
STRRV2D: Real array, dimension (lda,n). (output)
CTRRV2D: Complex array, dimension (lda,n). (output)
The m-by-n array that receives the trapezoidal matrix.
lda Integer. (input)
The leading dimension of the array a. lda ≥ MAX (m,1).
rsrc Integer. (input)
Row index of source processor.
csrc Integer. (input)
Column index of source processor.

NOTES
The m, n, and lda arguments determine the matrix shape. Any processor using a send operation and the
matching receive operation must have the same m and n.

SEE ALSO
INTRO_BLACS(3S), ITRSD2D(3S)

ITRSD2D ( 3S ) ITRSD2D ( 3S )

NAME
ITRSD2D, STRSD2D, CTRSD2D – Sends a trapezoidal rectangular matrix to another processor

SYNOPSIS
CALL ITRSD2D (icntxt, uplo, diag, m, n, a, lda, rdest, cdest)
CALL STRSD2D (icntxt, uplo, diag, m, n, a, lda, rdest, cdest)
CALL CTRSD2D (icntxt, uplo, diag, m, n, a, lda, rdest, cdest)

IMPLEMENTATION
UNICOS/mk systems

DESCRIPTION
ITRSD2D sends a trapezoidal matrix to another processor. The other processor uses the ITRRV2D(3S)
routine to receive the matrix. Execution does not resume until the data arrives.
ITRSD2D communicates integer data. STRSD2D communicates real data. CTRSD2D communicates
complex data.
These routines have the following arguments:
icntxt Integer. (input)
Context handle returned by a call to BLACS_GRIDINIT(3S).
uplo Character*1. (input)
Specifies whether the trapezoid is in the upper or lower triangular part of the matrix a, as
follows:
If uplo = ’U’ or ’u’, the trapezoid is in the upper triangular part of the matrix.
If uplo = ’L’ or ’l’, the trapezoid is in the lower triangular part of the matrix.
diag Character*1. (input)
Specifies whether the matrix a has ones on the diagonal, as follows:
If diag = ’U’ or ’u’, specifies a unit trapezoidal matrix.
If diag = ’N’ or ’n’, specifies a non-unit trapezoidal matrix.
m Integer. (input)
Specifies the number of rows in matrix a. m must be ≥ 0.
n Integer. (input)
Specifies the number of columns in matrix a. n must be ≥ 0.
a ITRSD2D: Integer array, dimension (lda,n). (input)
STRSD2D: Real array, dimension (lda,n). (input)
CTRSD2D: Complex array, dimension (lda,n). (input)
The m-by-n matrix containing the trapezoidal matrix to be sent.
lda Integer. (input)
The leading dimension of the array a. lda ≥ MAX (m,1).
rdest Integer. (input)
Row index of destination processor.
cdest Integer. (input)
Column index of destination processor.

NOTES
The m, n, and lda arguments determine the matrix shape. Any processor using a send operation and the
matching receive operation must have the same m and n.

SEE ALSO
BLACS_GRIDINIT(3S), INTRO_BLACS(3S), ITRRV2D(3S)

MYNODE ( 3S ) MYNODE ( 3S )

NAME
MYNODE – Returns the calling processor’s assigned number

SYNOPSIS
MY_NUMBER = MYNODE()

IMPLEMENTATION
UNICOS/mk systems

DESCRIPTION
MYNODE returns a number between 0 and NPES–1, where NPES is the number of processors in the
mainframe partition on which the program is executing.

SEE ALSO
BLACS_GRIDINFO(3S), BLACS_PCOORD(3S), BLACS_PNUM(3S), INTRO_BLACS(3S)

PCOORD3D ( 3S ) PCOORD3D ( 3S )

NAME
PCOORD3D – Computes three-dimensional (3D) processor grid coordinates

SYNOPSIS
CALL PCOORD3D (ictxt, pe_num, pex, pey, pez)

IMPLEMENTATION
UNICOS/mk systems

DESCRIPTION
PCOORD3D computes processor grid coordinates pex, pey, and pez by using pe_num.
This routine accepts the following arguments:
ictxt Integer. (input)
Handle that describes the grid initialized by GRIDINIT3D(3S).
pe_num Integer. (input)
Processing element.
pex Integer. (output)
X coordinate for processor.
pey Integer. (output)
Y coordinate for processor.
pez Integer. (output)
Z coordinate for processor.

NOTES
The GRIDINIT3D(3S) routine must be called somewhere in the program before the first call to PCOORD3D.

SEE ALSO
DESCINIT3D(3S), GRIDINFO3D(3S), GRIDINIT3D(3S), PNUM3D(3S)

PNUM3D ( 3S ) PNUM3D ( 3S )

NAME
PNUM3D – Returns the processor element number for specified three-dimensional (3D) coordinates

SYNOPSIS
PE_number = PNUM3D (ictxt, pex, pey, pez)

IMPLEMENTATION
UNICOS/mk systems

DESCRIPTION
PNUM3D returns the processor element number at grid coordinates pex, pey, and pez.
This routine accepts the following arguments:
ictxt Integer. (input)
Handle that describes the grid initialized by GRIDINIT3D(3S).
pex Integer. (input)
X coordinate of processor.
pey Integer. (input)
Y coordinate of processor.
pez Integer. (input)
Z coordinate of processor.

NOTES
The routine GRIDINIT3D(3S) must be called somewhere in the program before the first call to PNUM3D.

SEE ALSO
DESCINIT3D(3S), GRIDINFO3D(3S), GRIDINIT3D(3S), PCOORD3D(3S)

INTRO_CORE ( 3S ) INTRO_CORE ( 3S )

NAME
INTRO_CORE – Introduction to the Scientific Library out-of-core routines for linear algebra

IMPLEMENTATION
UNICOS systems

DESCRIPTION
The Scientific Library out-of-core routines for linear algebra let you solve problems in which it is not
possible, or not convenient, to store all of the data in main memory during program execution. The central
concept on which these routines are based is the idea of the virtual matrix, which is stored outside main
memory (perhaps on disk or on SSD), and referenced through a Fortran I/O unit number.
The following list describes the purpose and name of each out-of-core routine. The first name listed is the
name of the man page that documents the routines.
Virtual Matrix Initialization and Termination Routines
• VBEGIN: Initializes out-of-core routine data structures.
• VEND: Handles terminal processing for the out-of-core routines.
• VSTORAGE: Declares packed storage mode for a triangular, symmetric, or Hermitian virtual matrix.
Virtual Matrix Copy Routines
• SCOPY2RV, CCOPY2RV: Copies a submatrix of a real (in memory) matrix to a virtual matrix.
• SCOPY2VR, CCOPY2VR: Copies a submatrix of a virtual matrix to a real (in memory) matrix.
Virtual Linear Algebra Package Routines
• VSGETRF, VCGETRF: Computes an LU factorization of a virtual general matrix, using partial pivoting
with row interchanges.
• VSGETRS, VCGETRS: Solves a system of linear equations AX = B; A is a virtual general matrix whose
LU factorization has been computed by VSGETRF(3S).
• VSPOTRF: Computes the Cholesky factorization of a virtual real symmetric positive definite matrix.
• VSPOTRS: Solves a system of linear equations AX = B; A is a virtual real symmetric positive definite
matrix whose Cholesky factorization has been computed by VSPOTRF(3S).
Virtual Level 3 Basic Linear Algebra
• VSGEMM, VCGEMM: Multiplies a virtual general matrix by a virtual general matrix.
• VSTRSM, VCTRSM: Solves a virtual triangular system of equations with multiple right-hand sides.
• VSSYRK: Performs a symmetric rank k update of a virtual symmetric matrix.

General Introduction
Some problems are so large that it is not possible, or at least not convenient, to store all of the data in main
memory during program execution. For such problems, you can use an out-of-core technique. This term is
an anachronism, referring as it does to magnetic core memory, but the name is still used to refer to
algorithms that combine input and output with computation to solve problems in which the data resides on
disk or some other secondary random-access storage device.
Consider the problem of solving a system of simultaneous linear equations. If the system contains n
equations with n unknowns, the amount of data required to represent the problem is n² floating-point
numbers. The amount of computation required to compute a solution is approximately 2n³/3 floating-point
operations. For example, if n = 30,000, the amount of memory required to store the matrix is 900 Mwords.
If the effective computational rate were 2.0 GFLOPS, the amount of time required to solve the problem
would be 9,000 seconds (2.5 hours).
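The figures in this example can be checked with a few lines of arithmetic. The following Python sketch is illustrative only: the function name dense_solve_estimate is hypothetical and is not part of the Scientific Library. It applies the storage and operation counts stated above to n = 30,000 at an assumed effective rate of 2.0 GFLOPS.

```python
def dense_solve_estimate(n, flops_per_second):
    """Return (words, flops, seconds) for n equations in n unknowns."""
    words = n * n                        # one word per matrix element
    flops = 2 * n**3 / 3                 # approximate operation count
    seconds = flops / flops_per_second   # time at the given effective rate
    return words, flops, seconds

words, flops, seconds = dense_solve_estimate(30_000, 2.0e9)
print(words / 1e6, "Mwords")    # 900.0 Mwords
print(seconds, "seconds")       # 9000.0 seconds
print(seconds / 3600, "hours")  # 2.5 hours
```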
This amount of computation is large compared to the amount of input and output required, so this problem is
computationally intensive. Therefore, it is an excellent candidate for solution by an out-of-core technique
(especially because solving large problems of this type is of great practical importance in many areas of
application).
Out-of-core Linear Algebra Software
The Scientific Library contains a unified set of routines for out-of-core solution of problems in dense linear
algebra. These routines are designed to be easy to use and highly efficient. The design of the out-of-core
routines is parallel to the design of library software that solves similar problems "in-core" (in memory),
namely LAPACK (Linear Algebra PACKage) and BLAS (Basic Linear Algebra Subprograms).
The LAPACK library is a state-of-the-art package for solving problems in dense linear algebra (see the
INTRO_LAPACK(3S) man page for more information on the Cray implementation of LAPACK). For out-
of-core problems in dense linear algebra, the Scientific Library has followed a software design which, from a
user perspective, is very similar to the LAPACK routines, and which uses similar or identical algorithms.
These routines are called the Virtual LAPACK, or VLAPACK, routines.
The LAPACK routines perform much of their computational work through calls to the Level 3 Basic Linear
Algebra Subprograms (Level 3 BLAS), which are designed to perform very efficiently on parallel vector
computers (see the INTRO_BLAS3(3S) man page for more information on the Level 3 BLAS routines).
Likewise, the Virtual LAPACK routines are based on a set of Virtual BLAS, called VBLAS.
Some features of these out-of-core library routines include the following:
• The routines are based on state-of-the-art algorithms for numerical linear algebra.
• Highly-efficient computational kernels perform at peak attainable speed on the hardware.
• Highly-efficient input and output is done automatically by the software; therefore, users do not have to be
involved in the details of the I/O routines.
• Virtual matrices are easy to create and use.
• Detailed performance measurement capabilities are built in. Performance statistics can be printed
automatically to give users complete information on software and hardware performance.
• Users can easily change certain tuning parameters to optimize the software for each specific problem and
computing environment.
Virtual Matrices
An important concept in the out-of-core routines is that of a virtual matrix. You can think of a virtual
matrix as a mathematical matrix, the elements of which are accessed in a certain way, using subroutine calls.
In some ways, a virtual matrix is like a two-dimensional Fortran array. Like a Fortran array, a virtual matrix
has elements that are real numbers. Like a Fortran array, a virtual matrix has subscripts that are integers
between 1 and some positive number n, and has a certain "leading dimension," which the user defines when
creating the virtual matrix.
Unlike a Fortran array, a virtual matrix is not accessed directly from a Fortran (or C) program. Instead, you
access a virtual matrix by using calls to the out-of-core routines.
These subroutines provide the only mechanism for manipulating a virtual matrix. In particular, a user never
has to do any explicit input or output to read or write a virtual matrix. Even though a virtual matrix is
actually stored as a file, users do not have to be concerned with the actual I/O. The library software handles
the I/O details automatically and efficiently, leaving users free to concentrate on the mathematical solution to
the problem at hand, and for the most part, to ignore the fact that out-of-core techniques are in use.
The next subsection, "Subroutine Types," briefly describes these routines. After that, the NOTES section
provides more specific information about virtual matrices.
Subroutine Types
The Scientific Library out-of-core software user interface comprises four types of subroutines:
• Initialization and termination routines
• Virtual matrix copy (VCOPY) routines
• Virtual LAPACK (VLAPACK) routines
• Virtual Level 3 BLAS (VBLAS) routines
The subsections that follow describe each type of subroutine.
Initialization and termination routines
You must initialize the underlying library routines by a call to the VBEGIN(3S) routine. This routine has
several optional arguments, all of which relate to tuning performance of the package. The most important
argument is an integer that specifies how many words to use for buffer space. VBEGIN(3S) automatically
allocates the requested amount of memory, using a call to the operating system. Likewise, you must call the
VEND(3S) routine when you are done with virtual linear algebra. VEND(3S) closes any open files that are
being used for virtual matrices and deallocates the memory that was allocated by VBEGIN(3S).
VSTORAGE(3S) declares that an existing virtual matrix (initialized with VBEGIN(3S)) is stored and
referenced in packed form. See the NOTES section for more on packed storage.

Virtual matrix copy routines
As previously mentioned, an important feature of this software is that users never have to do any explicit
input or output to a virtual matrix. To initially create a virtual matrix, or to find out what is in a virtual
matrix after it is created, users should call the virtual copy routines.
To create a virtual matrix, users copy sections of an in-memory matrix to a section of a virtual matrix, or
vice versa, using virtual copy routines. For example, if you want to work with one column of the matrix at
a time, you can call a virtual copy routine to "Copy this vector, x, to column j of the virtual matrix A." Or,
conversely, you could call a virtual copy routine to "Get row i of the virtual matrix A and store it in vector
y."
The virtual copy routines are named SCOPY2RV(3S), SCOPY2VR(3S), CCOPY2RV(3S), and
CCOPY2VR(3S). The initial letter in each routine name, "S" or "C," stands for "Single-precision real" or
"Complex," respectively. The numeral "2" stands for "two-dimensional," because the routines copy parts of
matrices, as opposed to vectors. The last two letters, "RV" or "VR," stand for "real to virtual" or "virtual to
real," respectively.
You can use these routines to copy one row at a time, or one column at a time, or any rectangular submatrix
of the virtual matrix. You give the copy routine the subscripts of the upper-left corner of the submatrix and
the dimensions of the submatrix. In fact, you can even use the routines to fetch or store one element at a
time, although that would be less efficient. Users never have to do any explicit input or output to access
their virtual matrices. The library routines do all of the necessary I/O automatically. From the user
perspective, it is just a matrix copy operation.
Virtual LAPACK routines
The design of the out-of-core routines is parallel to the design of library software that solves similar
problems in memory, namely LAPACK and BLAS.
The LAPACK library is a state-of-the-art package for solving problems in dense linear algebra (see the
INTRO_LAPACK(3S) man page). For out-of-core problems in dense linear algebra, the Scientific Library
has followed a software design which, from a user perspective, is very similar to the LAPACK routines and
uses similar or identical algorithms. These routines are called the Virtual LAPACK (or VLAPACK)
routines.
The LAPACK routine for LU matrix factorization (with partial pivoting) is called SGETRF(3L). The virtual
matrix counterpart is called VSGETRF(3S). This virtual routine is very similar in its use, and its design, to
the original LAPACK routine. The main difference is that in place of the argument for the matrix name, the
virtual routine requires the name of a virtual matrix. The name of a virtual matrix is just an integer constant
or variable that gives the unit number of the file in which the matrix resides.
The virtual LAPACK routine VSGETRS(3S) performs back-substitution for solving systems of equations,
based on the matrix factorization produced by VSGETRF(3S). VSGETRS(3S) is the counterpart of the
LAPACK routine SGETRS(3L). (Man pages for LAPACK routines, such as SGETRS, are available only
online, using the man(1) command.)

For applications in which the matrices contain complex numbers, rather than real numbers, you can use the
Virtual LAPACK routines VCGETRF(3S) and VCGETRS(3S).
Virtual Level 3 BLAS routines
The LAPACK routines perform much of their actual computation by calling Level 3 BLAS routines, which
are designed for speed and efficiency on parallel-vector computers (see the INTRO_BLAS3(3S) man page).
Likewise, the Virtual LAPACK routines are based on a set of Virtual Level 3 BLAS (VBLAS).
For example, the BLAS routine to perform a matrix multiply is called SGEMM(3S) (single-precision real) or
CGEMM(3S) (complex). The corresponding out-of-core routine for virtual matrices is VSGEMM(3S) or
VCGEMM(3S), respectively.
The calling sequences of the Virtual LAPACK and Virtual BLAS routines are similar to those of the
corresponding LAPACK and BLAS routines, but when an in-memory routine requires a matrix argument, the
corresponding virtual routine requires one or more arguments that specify a virtual matrix.

NOTES
This section describes further aspects of virtual matrices and the out-of-core routines that operate on them.
Unit Numbers
The name of a virtual matrix is an integer number between 1 and 99, inclusive. The name identifies the
Fortran unit number of the file in which the virtual matrix file is stored. By default, unit number 1 is
associated with file fort.1, unit 2 with file fort.2, and so on.
Do not use any unit number that your program is using for another purpose.
Also, do not use any of the following units: 0, 5, 6, 100, 101, or 102, because these unit numbers are, by
default, associated with the following special files:
stdin 5 and 100
stdout 6 and 101
stderr 0 and 102
You may close and reopen units 0, 5, and 6 as virtual matrices, but not units 100, 101, and 102.
You can associate a particular file with a particular unit number by using the "assign by unit" option of the
assign(1) command (see the assign(1) man page for more information about the assign command).
As an example, suppose you want to store your virtual matrix on a file in directory /tmp/xxx and call the
file mydata. If you choose to use Fortran unit number 3 for the file, prior to executing the program that
calls the out-of-core software, you could issue the following command line:
assign -a /tmp/xxx/mydata u:3

Within the out-of-core subroutines, you would use the number 3 as the value of the argument for the virtual
matrix name.
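The unit-number rules above can be summarized in a small checker. This Python sketch is illustrative only: the helper names are hypothetical and are not part of the library or of assign(1). It cannot tell whether your program already uses a unit for another purpose, and it does not model closing and reopening units 0, 5, or 6.

```python
# Units associated with special files by default (from the list above).
SPECIAL_UNITS = {0: "stderr", 5: "stdin", 6: "stdout",
                 100: "stdin", 101: "stdout", 102: "stderr"}

def default_file_name(unit):
    """Default file for a Fortran unit when no assign(1) command applies."""
    return f"fort.{unit}"

def check_virtual_matrix_unit(unit):
    """Return None if the unit can name a virtual matrix, else a reason."""
    if unit in SPECIAL_UNITS:
        return f"unit {unit} is associated with {SPECIAL_UNITS[unit]} by default"
    if not 1 <= unit <= 99:
        return f"unit {unit} is outside the range 1 through 99"
    return None

print(default_file_name(3))           # fort.3
print(check_virtual_matrix_unit(3))   # None
print(check_virtual_matrix_unit(6))   # unit 6 is associated with stdout by default
```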

File Format
A virtual matrix is actually stored as a file, in a special format that is useful only for the virtual linear
algebra routines. But outside of the program, at the operating system level, such a file can be copied,
moved, archived, compressed, and so on, just like any other binary file. The assign(1) command
determines the actual characteristics of the file, including the device to which it is assigned (that is, disk or
SSD).
Technically, a virtual matrix is a binary unblocked file. You do not have to specify the -s u option on the
assign command, but you cannot use other formats or conversions in conjunction with the out-of-core
routines. If you try to use other formats or conversions, your program will abort with an Asynchronous
Queued I/O (AQIO) error message. Actually, the virtual matrix file is blocked into "pages," but this
blocking is done by the Scientific Library out-of-core routines, not by the system I/O routines; therefore, for
the assign command, the virtual matrix file is considered to be an unblocked file.
The actual input and output is done internally using a feature called Asynchronous Queued I/O (AQIO).
This feature allows highly-efficient, random-access I/O without using any unnecessary intermediate buffering
of data.
If you want to use a file of data that was created by some means other than using Virtual LAPACK or
Virtual BLAS routines, you should write a program that reads the file, using the usual Fortran I/O facilities,
and copies the file, one section at a time, to a virtual matrix, by using the virtual copy routines. Likewise, if
you want to use a virtual matrix as input to some other program, you should write a program that uses the
virtual copy routines to get data from the virtual matrix, and then write it out using the usual Fortran I/O
facilities. If only the virtual linear algebra routines use the data, it is most convenient to just work with the
virtual matrix files themselves, using the subroutines provided.
Leading Virtual Dimension
A virtual matrix has a certain "leading dimension" (that is, the first dimension) just like a Fortran
two-dimensional array. For instance, if the virtual matrix is 1000 by 2000 elements, the first (leading)
dimension is 1000. You should supply the value 1000 for the leading dimension argument in the
subroutines.
You can use any value for the leading dimension, but after it is defined, you cannot change it. If you
originally created the virtual matrix with 1000 for the leading dimension, you must always use the same
value in subsequent subroutine calls.
Definition and Redefinition of Elements
When accessing elements of a virtual matrix, the value of the first subscript must be in the range
1 ≤ i ≤ lvd

where i is the subscript, and lvd is the leading virtual dimension, as defined in the subroutine call. The
second subscript must be a positive integer. There is no set upper limit on the value of the second subscript.

When you first create a virtual matrix, you must explicitly define every element of the matrix before you use
it in a computation. You can consider that any element you have not explicitly defined is undefined, and it
should not be referenced. For example, if you want to create an identity matrix of size 2000 by 2000, you
could zero out all 4,000,000 elements, then set the 2000 diagonal elements to 1, using the virtual copy
routines. You should not just set the diagonal elements to 1 and assume that the off-diagonal elements are 0.
After the elements of a virtual matrix are defined, their values remain defined unless you explicitly change
them or remove the file.
File Size
The size (in words) of a virtual matrix file is slightly larger than the total number of elements it contains.
Thus, a virtual matrix of size 5000 by 5000 would contain slightly more than 25 million words, or 200
Mbytes of data. The reason that it is not exactly 25 million words has to do with the way that the software
organizes data internally into pages.
When you define the value of a virtual matrix element, you are implicitly creating file space for all elements
up to the one you define. For example, if you declare that a virtual matrix has a leading dimension of 5000,
and you define a value for element (1, 1000), the software will create a virtual matrix file large enough to
contain elements (i, j) for 1 ≤ i ≤ 5000, 1 ≤ j ≤ 1000, which is 5 Mwords, or 40 Mbytes of file space.
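The file-space figure in this example follows directly from the leading dimension and the largest column defined. The Python sketch below is illustrative only: min_file_words is a hypothetical helper, not a library routine, and the 8-byte word is an assumption that matches the 25 Mwords = 200 Mbytes figure given above for a real matrix. The result is a lower bound, because the actual file is slightly larger.

```python
WORD_BYTES = 8  # assumption: 64-bit words (25 Mwords = 200 Mbytes above)

def min_file_words(lvd, j):
    """Words needed to hold elements (i, k) for 1 <= i <= lvd, 1 <= k <= j."""
    return lvd * j

words = min_file_words(5000, 1000)   # element (1, 1000) defined, lvd = 5000
print(words)                # 5000000
print(words * WORD_BYTES)   # 40000000
```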
Page Size
At the internal level, the software organizes virtual matrices into "pages." The size of a page is, by default,
256 by 256 words, or 65,536 words. I/O transfers, internally, are done in minimum units of one page. For
both disk and SSD, this size gives excellent performance.
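Paging also suggests why a virtual matrix file is slightly larger than its element count. The Python sketch below is a rough estimate under a stated assumption: the matrix is covered by whole 256 by 256 pages, and partial pages at the edges are stored in full. The actual internal layout is not documented here, and pages_to_cover is a hypothetical helper, not a library routine.

```python
import math

PAGE_DIM = 256  # default page is 256 by 256 words

def pages_to_cover(rows, cols, page_dim=PAGE_DIM):
    """Whole pages needed to tile a rows-by-cols matrix (assumed layout)."""
    return math.ceil(rows / page_dim) * math.ceil(cols / page_dim)

pages = pages_to_cover(5000, 5000)
print(PAGE_DIM * PAGE_DIM)          # 65536
print(pages)                        # 400
print(pages * PAGE_DIM * PAGE_DIM)  # 26214400
```

Under this assumption, a 5000 by 5000 matrix occupies about 26.2 million words, which is consistent with the "slightly more than 25 million words" described under File Size.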
You may redefine the page size that the out-of-core routines use, although it is not recommended unless
special performance tuning considerations are involved. The internal file structure of a virtual matrix
depends on the page size. Thus, a virtual matrix created with a certain page size cannot later be read or
written using a different page size; it must be re-created.
Lower-level Routines
The user-level out-of-core routines are built on lower-level routines that manage work request queues, active
page queues, and other tasks. These routines, in turn, depend on the AQIO routines and the operating system
routines.

This layered software design can be illustrated as follows:


 __________________________________________________________________
|         |                 | Virtual LAPACK    |                  |
|         |                 | routines          | Initialization   |
| USER    | Virtual         |___________________| and              |
| LEVEL   | copy            |                   | termination      |
|         | routines        | Virtual BLAS      | routines         |
|         |                 | routines          |                  |
|_________|_________________|___________________|__________________|
|         |                                                        |
|         | Queuing routines                                       |
|         |________________________________________________________|
|         |                           |                            |
| LIBRARY | Page management routines  | Work management routines   |
| LEVEL   |___________________________|____________________________|
|         |                           |                            |
|         | AQIO                      | BLAS                       |
|_________|___________________________|____________________________|
|         |                                                        |
| OS      | UNICOS operating system                                |
| LEVEL   |                                                        |
|_________|________________________________________________________|

Strassen’s Algorithm
Strassen’s algorithm for matrix multiplication is a recursive algorithm that is slightly faster than the ordinary
(inner product) algorithm. This additional speed is purchased at the expense of requiring some additional
memory for intermediate workspace. Because the Virtual LAPACK and Virtual BLAS routines are
managing their own memory anyway, and performing their work on individual page size blocks, it is an easy
matter to use Strassen’s algorithm everywhere that a matrix multiplication is required.
Strassen’s algorithm performs the floating-point operations for matrix multiplication in an order that is very
different from the usual vector method. In some cases, this could cause differences in round-off, possibly
leading to numerical differences in the result.
You may choose whether to use Strassen’s algorithm when calling VBEGIN(3S), either by passing an
argument to VBEGIN(3S), or by setting the VBLAS_STRASSEN environment variable before run time. For
C shell, use the following command:
setenv VBLAS_STRASSEN

For POSIX or Korn shell, use the following command:
export VBLAS_STRASSEN

If the user selects Strassen’s algorithm, VBEGIN(3S) automatically allocates the necessary workspace. In
subsequent virtual matrix computations, Strassen’s algorithm is then automatically used for all matrix
multiplications, including matrix multiplications done as part of the VSGEMM(3S) and VSTRSM(3S) routines.
Multitasking
Like most of the Scientific Library routines, the Virtual LAPACK and Virtual BLAS routines perform
multitasking automatically. To control the use of multitasking, set the value of the NCPUS environment
variable before run time to an integer number that indicates the number of processors you want to use. For
example, to use only one CPU (which effectively turns off multitasking), use one of the following
commands. For the C shell, enter the following command:
setenv NCPUS 1

For the POSIX or Korn shell, enter the following command:
NCPUS=1 ; export NCPUS

If you enter the following command for the C shell:
setenv NCPUS 4

or the following command for the POSIX or Korn shell:
NCPUS=4 ; export NCPUS

the software will try to use four CPUs. The actual number of CPUs used depends on the availability of
resources (see the INTRO_LIBSCI(3S) man page for more information on multitasking in the Scientific
Library).
Complex Routines
Most of the out-of-core software described previously deals with matrices of real numbers. There are also
counterparts to these routines that work with matrices of complex numbers (numbers that have a real and
imaginary part). For example, the complex two-dimensional counterpart of the virtual copy routine
SCOPY2RV(3S) is routine CCOPY2RV(3S). Likewise, routine VCGETRF(3S) factors a general complex
virtual matrix. In the naming conventions for all routines, the letter "S" denotes real (that is, "single-
precision") data; the letter "C" denotes complex data.
Packed Storage
Packed storage of a triangular or symmetric matrix means that only half of the matrix is actually stored on
disk or SSD. If a real matrix is declared to be lower triangular, only the lower triangle is stored; if upper
triangular, only the upper triangle is stored. If the matrix is symmetric, either the lower or upper triangular
part may be stored.

Likewise, a complex matrix may be lower or upper triangular, or may be symmetric, with only the lower or
upper triangle being stored. Additionally, a complex matrix may be Hermitian (equal to the conjugate of its
transpose), with either the lower or upper triangle being stored.
For the purpose of storing a matrix, the out-of-core routines do not have to distinguish between a triangular,
symmetric, or Hermitian matrix; they must know only which part of the matrix is being stored (that is, the
full matrix, the lower triangle, or the upper triangle).
In the Level 2 BLAS routines, packed storage implies a linearized storage scheme. For the out-of-core
routines, packed storage is similar, but more complicated. Because it is the page structure of the virtual
matrix binary file that is linearized, pages that correspond to the upper (or lower) part of a triangular matrix
are omitted.
Three storage modes are possible:
• FULL — The full matrix is stored
• LOWER — Only the lower triangle is stored
• UPPER — Only the upper triangle is stored
To define this storage mode, call the VSTORAGE(3S) routine, which has the calling sequence:
      CALL VSTORAGE(nunit, mode)

The nunit argument is an integer that gives the unit number of the virtual matrix, and mode is a character
string giving the storage mode. See VSTORAGE(3S) for further information.
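
As a rough illustration of the saving that packed storage offers, the following Python sketch counts the pages held in the virtual matrix file for each storage mode. It assumes that a square matrix of order n is tiled by np-by-np pages and that LOWER or UPPER mode keeps every page on one side of the block diagonal, including the diagonal pages. The function name and that exact page layout are assumptions made for illustration; they are not taken from the library.

```python
import math

def pages_stored(n, np_, mode="FULL"):
    """Count np_-by-np_ pages kept on disk for an n-by-n virtual matrix.

    Assumption (not from the manual): LOWER and UPPER keep every page
    that touches the stored triangle, including the diagonal pages.
    """
    p = math.ceil(n / np_)          # pages along one dimension
    if mode == "FULL":
        return p * p
    if mode in ("LOWER", "UPPER"):
        return p * (p + 1) // 2     # block triangle including the diagonal
    raise ValueError("mode must be FULL, LOWER, or UPPER")

full = pages_stored(5000, 256, "FULL")
lower = pages_stored(5000, 256, "LOWER")
print(full, lower)
```

Under these assumptions, a 5000-by-5000 matrix with the default page size of 256 needs 20 pages per dimension, so packed storage keeps roughly half of the pages that full storage would.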
Performance Measurement
The out-of-core software has a built-in feature for performance measurement, and it will collect various
performance statistics automatically. The user can print these statistics when calling the VEND(3S) routine,
either by providing a nonzero argument to the VEND(3S) routine (within the program) or by setting the
VBLAS_STATISTICS environment variable (before run time). To set this environment variable in the C
shell, enter the following:
     setenv VBLAS_STATISTICS

In the POSIX or Korn shell, enter the following:
     export VBLAS_STATISTICS

Statistics reported include the following:


• Total elapsed time
• Total CPU time
• Total I/O wait time
• Total workspace used
• Number of words read and written
• A distribution of wait times


You may use this feature in addition to the usual performance tools (for example, see the procstat(1)
man page).
Error Reporting
When the out-of-core software diagnoses an error, it writes an error diagnostic to stderr and aborts. If the
error is diagnosed by the out-of-core routines themselves, the error message should be complete and self-
explanatory. For instance, a common error is to provide an insufficient amount of memory for workspace.
In this case, the error diagnostic will indicate how much memory was needed.
Example:
     *** Error in routine: VBEGIN
     *** Insufficient memory was given;
         minimum required (decimal words) = 198144

If the error was diagnosed by a lower-level system or library routine, the diagnostic will include the error
code. Usually, you can use the explain(1) command to get more information about the error by entering
one of the following commands:
     explain sys-xxx

     explain lib-xxx

The character string xxx represents the error code listed in the diagnostic. Use explain sys for error
status codes less than 100, and explain lib for higher-numbered codes (see the explain(1) man page
in the UNICOS User Commands Reference Manual for more information).
For example, suppose that unit 1 was assigned to file /tmp/xxx/yyy/zzz, using the command:
     assign -a /tmp/xxx/yyy/zzz u:1

But suppose that the /tmp/xxx/yyy directory has not been created. When the out-of-core routine tries to
create the file, it cannot, and aborts after printing the following message:
     *** Error in routine: page_request
     *** Error status on AQOPEN for unit number: 1
     *** Error status on AQOPEN = -2

Because AQIO routines are used internally for input and output, the error is usually detected by
AQOPEN(3F), AQREAD(3F), or AQWRITE(3F). In this case, it was AQOPEN. Of more concern to users,
however, is the specific error status. The diagnostic indicates that the error occurred on unit number 1, and
that the error status code was -2. Entering the following command prints a further description, which
explains that one of the directories in a path name does not exist:
     explain sys-2


Performance Tuning
The most important tuning parameter for the out-of-core routines is the value of nwork, the amount of buffer
space. This value is set either as an explicit argument to VBEGIN(3S) or by setting the
VBLAS_WORKSPACE environment variable before run time. If the virtual matrix is disk resident, larger
buffer space means faster I/O performance, within certain limits.
CPU time is essentially unaffected by this parameter; only I/O wait time, and hence, total wall-clock time,
are affected.
As always with out-of-core techniques, a trade-off exists between performance and size. If you use more
memory, performance will be better, but the program size increases. It is difficult to give firm rules for how
much memory you should use, but the following are some guidelines:
• The absolute minimum amount of out-of-core routine page-buffer space must be enough to hold three
pages.
• If the virtual matrix is disk resident, larger buffer space means better I/O performance, within certain
limits.
• If the virtual matrix is SSD resident, much less buffer space is needed to obtain good performance.
• If running in a dedicated environment, you should use as much memory as is available.
• If running in a batch environment, it may be desirable to use less memory, so that the job can be
scheduled and run at the same time that other user jobs are running; that is, the turnaround time of a
smaller job might be much less than for a large job, even if the I/O wait time for the smaller job is larger.
• Use enough buffer space for one "column" of pages; that is, n × np words, where np is the number of
columns per page, and n is the leading dimension of the matrix (rounded up to a multiple of np). If you
use twice this much memory, performance will improve further.
• The use of Strassen’s algorithm almost always speeds the computation for a small increase in memory.
The VBLAS statistics report the amount of memory used by Strassen’s algorithm.
• Packed storage mode should be used when appropriate, because it will save disk space with no penalty in
CPU time.
For solution of a general matrix (with VSGETRF and VSGETRS, or with VCGETRF and VCGETRS), a
special memory requirement exists. These routines need enough buffer space to contain one "column" of
pages; that is, if the matrix is 5000 by 5000 and the page size is 256, the buffer space must hold
5000 × 256 = 1,280,000 words. This requirement is necessary because of the nature of Gaussian elimination
with partial pivoting: if less memory were used, performance when doing the pivots might be extremely poor.
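
The sizing guidelines above can be turned into a small worked calculation. The following Python sketch is only an illustration: the helper names are hypothetical, and it applies the rounding of n to a multiple of np exactly as the rule of thumb states it.

```python
def round_up(n, np_):
    """Round n up to the next multiple of the page order np_."""
    return -(-n // np_) * np_

def buffer_words(n, np_=256, columns_of_pages=1):
    """Words needed to hold the given number of 'columns' of pages."""
    return columns_of_pages * round_up(n, np_) * np_

np_ = 256
minimum = 3 * np_ ** 2                 # absolute minimum: three pages
default = 6 * np_ ** 2                 # VBEGIN default: six pages
one_col = buffer_words(5000, np_)      # rule of thumb for n = 5000
two_col = buffer_words(5000, np_, 2)   # twice that amount
print(minimum, default, one_col, two_col)
```

With rounding, a leading dimension of 5000 becomes 5120, so one column of pages is 5120 × 256 = 1,310,720 words, slightly more than the unrounded 5000-by-256 figure used in the text. Either value is far above the six-page default of 393,216 words, which is why the default is unlikely to be adequate for factoring a matrix of this size.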

ENVIRONMENT VARIABLES
These environment variables change the default behavior of either the VBEGIN or the VEND routine. You
can override the effect of any of these settings by supplying the corresponding argument of the affected routine.


VBLAS_PAGESIZE
Numeric value of the default page size, np. VBEGIN uses this variable to set up in-memory pages
for virtual matrices. Each page acts as an np-by-np submatrix of a virtual matrix. If unspecified,
VBEGIN defaults to np = 256.
VBLAS_STATISTICS
Flag to determine whether to print performance statistics after using the out-of-core routines.
VEND uses this variable to determine whether it should print statistics to stdout after terminating
out-of-core processing. If this variable is set (even if it has no value), the default behavior of
VEND is to print the statistics. If unspecified, VEND prints no statistics by default.
VBLAS_STRASSEN
Flag to determine whether to use Strassen’s algorithm for matrix multiplication. VBEGIN uses this
variable to determine whether it should set up data structures for Strassen’s algorithm. If the data
structures are set up, all virtual matrix multiplies use Strassen’s algorithm. If this variable is set
(even if it has no value), the default behavior of VBEGIN is to set up for Strassen’s algorithm. If
unspecified, VBEGIN defaults to the regular (inner product) matrix multiply algorithm.
VBLAS_WORKSPACE
Numeric value of nwork, the number of words of memory to set aside for I/O buffering (pages). If
unspecified, VBEGIN defaults to nwork = 6 × np^2 (the number of words of memory required for six
pages).

EXAMPLES
Some short examples illustrate how you can use these subroutines to manipulate virtual matrices. For an
explanation of the specific arguments to the subroutines, see the man pages for the individual routines.
Example 1: This example shows how you can use routine SCOPY2RV(3S) to create a virtual matrix. This
program creates a virtual matrix on unit number 1 (which, by default, is on file fort.1). Within the
program, this matrix is referred to as V, which corresponds to IV, an integer parameter set equal to 1.
The first step is to call routine VBEGIN(3S) for initialization. Next, create a vector, X, of random numbers,
and copy it to one column of the virtual matrix, using routine SCOPY2RV(3S). This procedure is repeated
for each column J of the virtual matrix. The program is as follows:


* Create a virtual matrix of random numbers of size n by n.

      INTEGER N
      PARAMETER (N = 2000)
      INTEGER IV              ! unit number of the virtual matrix V
      PARAMETER (IV = 1)
      REAL X(N)               ! vector for storing a column of V

      CALL VBEGIN

      DO, J = 1, N            ! for each column of V

* Create a vector X of random numbers

        DO, I = 1, N
          X(I) = RANF()
        END DO

* Copy the vector X to column J of virtual matrix V
* (starting row 1, starting column J)

        CALL SCOPY2RV(N, 1, X, N, IV, 1, J, N)
      END DO

      CALL VEND
      END

Example 2: This example uses the Virtual BLAS routine VSGEMM(3S) to multiply a virtual matrix
by itself. The example assumes that the virtual matrix on unit 1 was already created, possibly by the
program in example 1. The product is a new virtual matrix,
called W, that corresponds to unit number 2 (integer IW). The following program then copies the first column of
the result matrix W into array X and prints the first element of X, which is the value of virtual
matrix element W(1,1):


      INTEGER N
      PARAMETER (N = 2000)
      INTEGER IV, IW          ! unit numbers of the virtual matrices
      PARAMETER (IV = 1, IW = 2)
      REAL X(N)               ! vector for storing a column of W

      CALL VBEGIN

* Multiply virtual matrix V by itself, creating virtual
* matrix W = V*V

      CALL VSGEMM('NOTRANSPOSE', 'NOTRANSPOSE', N, N, N, 1.0,
     &            IV, 1, 1, N, IV, 1, 1, N, 0.0, IW, 1, 1, N)

* Store the first column of virtual W in vector X.

      CALL SCOPY2VR(N, 1, IW, 1, 1, N, X, N)

* Print the value of the first element

      WRITE(*,*) 'Value of W(1,1) = ', X(1)

      CALL VEND
      END

Example 3: This example outlines a typical calling sequence for solving systems of equations. A program to
solve a large system of equations by using the Virtual LAPACK routines might be organized according to
the following general outline. This sample outline assumes that the user can generate the original matrix one
row at a time (for example, by computing it or reading it).
1. Call VBEGIN(3S) to initialize the virtual matrix routines.
2. For each row of the matrix, call a virtual copy routine to store the row in a virtual matrix. Likewise,
create a virtual matrix of right-hand sides.
3. Call routine VSGETRF(3S) to factor the general matrix.
4. Call routine VSGETRS(3S) to solve the right-hand sides.
5. For each column of the solution matrix, call a virtual copy routine to fetch the solution vector and
process it.
6. Call the VEND(3S) routine to terminate the virtual matrix routines and to close the files.



SCOPY2RV ( 3S ) SCOPY2RV ( 3S )

NAME
SCOPY2RV, CCOPY2RV – Copies a submatrix of a real or complex matrix in memory into a virtual matrix

SYNOPSIS
CALL SCOPY2RV (m, n, a, lda, nunit, iv, jv, ldv)
CALL CCOPY2RV (m, n, a, lda, nunit, iv, jv, ldv)

IMPLEMENTATION
UNICOS systems

DESCRIPTION
SCOPY2RV copies a matrix contained in a two-dimensional array of type REAL located in central memory
into a submatrix of a virtual matrix.
CCOPY2RV copies a matrix contained in a two-dimensional array of type COMPLEX located in central
memory into a submatrix of a virtual matrix.
These routines have the following arguments:
m Integer. (input)
Number of rows.
n Integer. (input)
Number of columns.
a SCOPY2RV: Real array of dimension (lda, *). (input)
CCOPY2RV: Complex array of dimension (lda, *). (input)
Contains real (in memory) input matrix.
lda Integer. (input)
Leading (first) dimension of a.
nunit Integer. (input)
Unit number of the virtual matrix.
This routine changes the contents of the virtual matrix. The virtual matrix itself contains either
real (SCOPY2RV) or complex (CCOPY2RV) elements.
iv Integer. (input)
Starting row index of the virtual matrix (1 to ldv).
jv Integer. (input)
Starting column of the virtual matrix (1 to n).
ldv Integer. (input)
Leading (first) dimension of the virtual matrix.


NOTES
These routines are two-dimensional analogues of the Level 1 BLAS routines SCOPY and CCOPY (see
SCOPY(3S)). The initial S in SCOPY2RV means "single-precision real," the initial C in CCOPY2RV means
"complex," the 2 means "two-dimensional," and the RV means "real (in memory) to virtual (on disk or
SSD)."
These routines provide the only available method for copying data directly from memory into a virtual
matrix. Companion routines SCOPY2VR and CCOPY2VR go in the opposite direction: virtual to real.

EXAMPLES
The following examples show how to copy various types of matrices from central memory into a virtual
matrix.
Example 1: Copy vector X, of N real elements, to row I of the virtual matrix on unit number 3. Suppose
that the virtual matrix is of size N by N, so that the leading dimension is N. Because X is a vector, the
leading dimension of X is irrelevant, and you can use the constant 1 for the lda argument.
      CALL SCOPY2RV(1, N, X, 1, 3, I, 1, N)

Example 2: Copy vector X, of N complex elements, to column J of the virtual matrix on unit number 3,
with the same assumptions as in example 1.
      CALL CCOPY2RV(N, 1, X, 1, 3, 1, J, N)

Example 3: Copy the 100-by-100 matrix A to the 100-by-100 submatrix of the virtual matrix on unit
NUNIT, beginning at virtual matrix subscript location (I, J). Assume that the virtual matrix has leading
dimension 3000.
      CALL SCOPY2RV(100, 100, A, 100, NUNIT, I, J, 3000)

Example 4: Copy the single element X(I, J) of a complex matrix X to element (IV, JV) of the virtual
matrix on unit NUNITX. Assume that the leading dimension of the virtual matrix is LDXV. Because this
subroutine call copies one element, the leading dimension of X is irrelevant, and you can use the constant 1
for the lda argument.
      CALL CCOPY2RV(1, 1, X(I, J), 1, NUNITX, IV, JV, LDXV)

Example 5: Copy the lower triangular part (the part below the main diagonal) of the first 100 rows and
columns of matrix A to the lower triangular part of the virtual matrix NVA, of leading dimension 1000,
starting at virtual array element NVA(101, 101).
      CALL VSTORAGE(NVA, 'LOWER')
      DO, I = 1, 100
        CALL SCOPY2RV(1, I, A(I, 1), 100, NVA, 100+I, 101, 1000)
      END DO


SEE ALSO
INTRO_CORE(3S) for an introduction to the out-of-core routines, including usage examples
SCOPY2VR(3S) for a description of SCOPY2VR and CCOPY2VR, each of which copies a submatrix of a real
or complex virtual matrix into a real or complex matrix in central memory (the copy routines for the
opposite direction are SCOPY2RV and CCOPY2RV)



SCOPY2VR ( 3S ) SCOPY2VR ( 3S )

NAME
SCOPY2VR, CCOPY2VR – Copies a submatrix of a virtual matrix to a real or complex (in memory) matrix

SYNOPSIS
CALL SCOPY2VR (m, n, nunit, iv, jv, ldv, a, lda)
CALL CCOPY2VR (m, n, nunit, iv, jv, ldv, a, lda)

IMPLEMENTATION
UNICOS systems

DESCRIPTION
SCOPY2VR copies a submatrix of a virtual matrix into a two-dimensional array of type REAL located in
central memory.
CCOPY2VR copies a submatrix of a virtual matrix into a two-dimensional array of type COMPLEX located in
central memory.
These routines have the following arguments:
m Integer. (input)
Number of rows.
n Integer. (input)
Number of columns.
nunit Integer. (input)
Unit number of the virtual matrix. The virtual matrix itself contains either real (SCOPY2VR) or
complex (CCOPY2VR) elements.
iv Integer. (input)
Starting row index of the virtual matrix (1 to m).
jv Integer. (input)
Starting column of the virtual matrix (1 to n).
ldv Integer. (input)
Leading (first) dimension of the virtual matrix.
a SCOPY2VR: Real array of dimension (lda, *). (output)
CCOPY2VR: Complex array of dimension (lda, *). (output)
Contains real (in memory) output matrix.
lda Integer. (input)
Leading (first) dimension of a.


NOTES
These routines are two-dimensional analogues of the Level 1 BLAS routines SCOPY and CCOPY (see
SCOPY(3S)). The initial S in SCOPY2VR means "single-precision real," the initial C in CCOPY2VR means
"complex," the 2 means "two-dimensional," and the VR means "virtual (on disk or SSD) to real (in
memory)."
These routines provide the only available method for reading data from a virtual matrix into memory.
Companion routines SCOPY2RV and CCOPY2RV go in the opposite direction: real to virtual.

EXAMPLES
The following examples show how to copy various types of matrices from a virtual matrix into central
memory.
Example 1: Copy row I of the virtual matrix on unit number 3 to the real vector X. Suppose that the
virtual matrix is of size N by N, so that the leading dimension is N. Because X is a vector, the leading
dimension of X is irrelevant, and you can use the constant 1 for the lda argument.

      CALL SCOPY2VR(1, N, 3, I, 1, N, X, 1)

Example 2: Copy column J of the complex virtual matrix on unit number 3 to the complex vector X, with
the same assumptions as in example 1.

      CALL CCOPY2VR(N, 1, 3, 1, J, N, X, 1)

Example 3: Copy the 100-by-100 submatrix of the virtual matrix on unit NUNIT, beginning at virtual
matrix subscript location (I, J), to the 100-by-100 matrix A. Assume that the virtual matrix has leading
dimension 3000.

      CALL SCOPY2VR(100, 100, NUNIT, I, J, 3000, A, 100)

Example 4: Copy the single element of the complex virtual matrix on unit NUNITX, at subscript position
(IV, JV), to the single element X(I, J) of the complex matrix X. Assume that the leading dimension
of the virtual matrix is LDXV. Because this subroutine call copies one element, the leading dimension of X
is irrelevant, and you can use the constant 1 for the lda argument.

      CALL CCOPY2VR(1, 1, NUNITX, IV, JV, LDXV, X(I, J), 1)

SEE ALSO
INTRO_CORE(3S) for an introduction to the out-of-core routines, including usage examples
SCOPY2RV(3S), which documents SCOPY2RV and CCOPY2RV, each of which copies a submatrix of a real
or complex matrix in central memory into a virtual matrix (the copy routines for the opposite direction are
SCOPY2VR and CCOPY2VR)



VBEGIN ( 3S ) VBEGIN ( 3S )

NAME
VBEGIN – Initializes the out-of-core routine data structures

SYNOPSIS
CALL VBEGIN
CALL VBEGIN [(nwork)]
CALL VBEGIN [(nwork, nstrassen)]
CALL VBEGIN [(nwork, nstrassen, np)]

IMPLEMENTATION
UNICOS systems

DESCRIPTION
VBEGIN initializes the data structures in memory that are used to handle virtual matrices. A program must
call VBEGIN once before beginning virtual matrix work and must call the companion routine, VEND(3S),
after virtual matrix work is complete. A program can have more than one "code block" of virtual matrix
work, but each block must begin with a call to VBEGIN and end with a call to VEND.
This routine takes as its first argument the minimum size of the I/O buffer space the user wants to allocate.
The routine allocates this much memory for buffers (using a malloc(3C) library call). The routine also
allocates a small additional amount of memory for its own data structures. When the VEND(3S) routine is
called to handle terminal processing, all allocated memory is freed. Other arguments determine some of the
inner workings of the out-of-core routines.
All of these arguments are optional; you can specify them by using environment variables, with the
following order of precedence:
1. Explicit argument; if the argument is given explicitly in the VBEGIN call, any conflicting settings are
ignored.
2. Environment variable; if no explicit argument is given, but there is an environment variable setting, that
setting is used.
3. If neither the argument nor the environment variable is set, there is an internal default (shown in the
argument list that follows as "DEFAULT: . . .").
This routine has the following optional arguments:
nwork Integer. (input)
Minimum number of words to use for buffer space for I/O. The minimum number of words is
3 × np^2, enough space for three "pages" (see the np argument description).
The corresponding environment variable is VBLAS_WORKSPACE, which you should set to a
numeric value that gives the number of words to use.
DEFAULT: nwork = 6 × np^2, enough space for six "pages."


nstrassen Integer. (input)
This argument determines whether to use Strassen’s algorithm for matrix multiplication. If you
use Strassen’s algorithm, VBEGIN allocates 2.4 × np^2 words of storage for workspace for
Strassen’s algorithm.
nstrassen = 0 Do not use Strassen’s algorithm.
nstrassen = 1 Use Strassen’s algorithm.
nstrassen = -1 Use Strassen’s algorithm if and only if the VBLAS_STRASSEN environment
variable is set.
DEFAULT: Do not use Strassen’s algorithm.
np Integer. (input)
Size (matrix order) of one page. Each page is a square submatrix of the entire virtual matrix.
np is the order (number of rows or columns) of a page submatrix (for example, set np = 512 to
use pages of 512 by 512 words each). The value of np should be a multiple of 32.
The corresponding environment variable is VBLAS_PAGESIZE, which you should set to a
numeric value that gives the order of a page.
DEFAULT: np = 256 (You should use this default size, unless you have some special reason to
change it.)
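
The order of precedence described above (explicit argument, then environment variable, then internal default) can be modeled in a few lines. The following Python sketch is only a model of that rule, not the library's code: the function name is hypothetical, the environment is passed in as a dictionary so that the example is self-contained, and it assumes that the default nwork is computed from whatever page size is finally chosen.

```python
def resolve_vbegin(nwork=None, nstrassen=None, np_=None, env=None):
    """Model how VBEGIN settings are resolved; None means 'argument omitted'."""
    env = {} if env is None else env

    # Page order: explicit argument, then VBLAS_PAGESIZE, then 256.
    if np_ is None:
        np_ = int(env["VBLAS_PAGESIZE"]) if "VBLAS_PAGESIZE" in env else 256

    # Buffer space: explicit argument, then VBLAS_WORKSPACE, then six pages.
    # Assumption: the six-page default uses the page order resolved above.
    if nwork is None:
        if "VBLAS_WORKSPACE" in env:
            nwork = int(env["VBLAS_WORKSPACE"])
        else:
            nwork = 6 * np_ ** 2

    # Strassen: 0 and 1 are explicit choices; -1 or an omitted argument
    # defers to VBLAS_STRASSEN, which counts as set even with no value.
    if nstrassen in (None, -1):
        use_strassen = "VBLAS_STRASSEN" in env
    else:
        use_strassen = nstrassen == 1

    return nwork, use_strassen, np_

print(resolve_vbegin(env={"VBLAS_PAGESIZE": "512"}))
print(resolve_vbegin(nwork=500000, nstrassen=-1, env={"VBLAS_STRASSEN": ""}))
```

The model does not check that nwork is at least three pages; the real routine is documented as reporting an error in that case.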

NOTES
The most important tuning parameter for the out-of-core routines is the value of nwork, the minimum
amount of buffer space.
If the virtual matrix is disk resident, larger buffer space means faster I/O performance, within certain limits.
CPU time is essentially unaffected by the amount of buffer space; only I/O wait time, and hence, total wall-
clock time, are affected.
As always with out-of-core techniques, a trade-off exists between performance and size. If you use more
memory, performance will be better, but the program size increases. It is difficult to give firm guidelines as
to how much memory you should use. If running in a multiuser environment, it may be desirable to use less
memory, so that the job can be scheduled and run at the same time that other user jobs are running. This
means that the turnaround time of a smaller job might be much less than for a large job, even if the I/O wait
time for the smaller job is larger. If running in a dedicated environment, it would make sense to use as
much available memory as possible.
One rule of thumb for good performance is to use enough buffer space for one column of pages. For
example, if the page size is np and the largest of the leading dimensions of your virtual matrices (rounded up
to the next multiple of np) is n, set the minimum buffer size to be nwork = n × np. If you use twice this
much memory (nwork = 2 × n × np), performance will improve.


If the virtual matrix resides in SSD, much less buffer space is needed to obtain good performance.
For solving a general virtual matrix (with VSGETRF(3S) and VSGETRS(3S)), the preceding rule of thumb
becomes a minimum memory requirement. These routines need enough buffer space to contain one column
of pages; that is, if you are factoring a virtual matrix of size 5000 by 4000 and the page size is np = 256, the
minimum buffer space setting must be nwork = 5000 × 256 = 1,280,000 words. This size restriction is needed
because of the nature of Gaussian elimination with partial pivoting. If less memory is used, the performance
when doing pivots might be extremely poor.

ENVIRONMENT VARIABLES
These environment variables change the default behavior of the VBEGIN routine. To override the effect of
any of these settings, use the corresponding argument of VBEGIN.
VBLAS_PAGESIZE
Numeric value of the default page size, np. VBEGIN uses this variable to set up in-memory pages
for virtual matrices. Each page acts as an np-by-np submatrix of a virtual matrix. If unspecified,
VBEGIN defaults to np = 256.
VBLAS_STRASSEN
Flag to determine whether to use Strassen’s algorithm for matrix multiplication. VBEGIN uses this
variable to determine whether it should set up data structures for Strassen’s algorithm. If the data
structures are set up, all virtual matrix multiplies use Strassen’s algorithm. If this variable is set
(even if it has no value), the default behavior of VBEGIN is to set up for Strassen’s algorithm. If
unspecified, VBEGIN defaults to the regular (inner product) matrix multiply algorithm.
VBLAS_WORKSPACE
Numeric value of nwork, the number of words of memory to set aside for I/O buffering (pages). If
unspecified, VBEGIN defaults to nwork = 6 × np^2 (the number of words of memory required for
six pages).

SEE ALSO
INTRO_CORE(3S) for an introduction to the out-of-core routines, including usage examples
VEND(3S), VSGETRF(3S), VSGETRS(3S)
malloc(3C) in the UNICOS System Libraries Reference Manual



VEND ( 3S ) VEND ( 3S )

NAME
VEND – Handles terminal processing for the out-of-core routines

SYNOPSIS
CALL VEND [(info)]

IMPLEMENTATION
UNICOS systems

DESCRIPTION
The VEND routine does termination processing for the out-of-core routines. You must call VEND as the last
step in out-of-core processing. This routine ensures that any output in progress from the out-of-core routines
is completed, and then it deallocates all of the storage space that VBEGIN(3S) allocated.
After calling VEND, you can call the VBEGIN(3S) routine again, if you desire, to reinitialize the out-of-core
routines (including their performance statistics).
This routine has the following optional argument:
info Integer. (input)
Flag to request out-of-core routine performance statistics output.
If you supply this argument with a nonzero value, a set of performance statistics about the out-of-
core routines is printed on stdout. If you omit the argument, the statistics are printed if and only
if the VBLAS_STATISTICS environment variable is set. Statistics reported include the following:
• Total elapsed time
• Total CPU time
• Total I/O wait time
• Total workspace used
• Number of words read and written
• A distribution of wait times
You can use this performance statistics feature in addition to the usual performance tools.

NOTES
If a program terminates abnormally before VEND is called, you should assume that any virtual matrices
created or changed by the program were destroyed, because the integrity of their data cannot be guaranteed.
Virtual matrices used only for input will remain valid.


You can use the optional statistics report that VEND prints to judge whether using more or less memory for
buffer space would significantly affect performance.
Generally, if the total I/O wait time is a small percentage of wall-clock time, the program is compute-bound
and no more memory is needed.
Other useful statistics are the virtual read and write rates. These statistics measure the amount of data
transferred divided by the time when the out-of-core routines were idle because they were waiting for I/O. If
the virtual read and write rates are much faster than the physical speed of the device being used, ample
memory was used for buffer space.
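
To make the two rules of thumb above concrete, the following Python sketch derives the I/O wait fraction and a virtual transfer rate from the kind of totals that the statistics report lists. The function name and the sample numbers are hypothetical, and the sketch combines reads and writes into one rate for simplicity, whereas the report gives separate read and write rates.

```python
def io_summary(elapsed_s, io_wait_s, words_read, words_written):
    """Return (fraction of elapsed time spent waiting, words per wait-second)."""
    wait_fraction = io_wait_s / elapsed_s
    words_moved = words_read + words_written
    # Virtual rate: data transferred divided by the time spent idle on I/O.
    if io_wait_s > 0:
        virtual_rate = words_moved / io_wait_s
    else:
        virtual_rate = float("inf")
    return wait_fraction, virtual_rate

# Hypothetical run: 120 s elapsed, 6 s waiting, 4.0e8 words transferred.
fraction, rate = io_summary(120.0, 6.0, 300_000_000, 100_000_000)
print(fraction, rate)
```

In this made-up run only 5 percent of the elapsed time is I/O wait, so by the first rule the program is compute-bound. The virtual rate would then be compared with the physical transfer rate of the device, expressed in the same units.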

ENVIRONMENT VARIABLES
The following environment variable changes the default behavior of the VEND routine. To override the
effect of this setting, use the info argument.
VBLAS_STATISTICS
Flag to determine whether to print performance statistics after using the out-of-core routines.
VEND uses this variable to determine whether it should print statistics to stdout after terminating
out-of-core processing. If this variable is set (even if it has no value), the default behavior of
VEND is to print the statistics. If unspecified, VEND prints no statistics by default.

SEE ALSO
INTRO_CORE(3S) for an introduction to the out-of-core routines, including usage examples
VBEGIN(3S)



VSGEMM ( 3S ) VSGEMM ( 3S )

NAME
VSGEMM, VCGEMM – Multiplies a virtual real or complex general matrix by a virtual real or complex general
matrix

SYNOPSIS
CALL VSGEMM (transa, transb, m, n, l, alpha, nunita, ia1, ja1, lda, nunitb, ib1, jb1, ldb,
beta, nunitc, ic1, jc1, ldc)
CALL VCGEMM (transa, transb, m, n, l, alpha, nunita, ia1, ja1, lda, nunitb, ib1, jb1, ldb,
beta, nunitc, ic1, jc1, ldc)

IMPLEMENTATION
UNICOS systems

DESCRIPTION
VSGEMM and VCGEMM each perform one of the following matrix-matrix operations:
C ← α op(A) op(B) + β C
where α and β are scalars; A, B, and C are virtual matrices or submatrices; op(A) is an m-by-l matrix; op(B)
is an l-by-n matrix; C is an m-by-n matrix; and op(X) is one of the following:
op(X) = X
op(X) = X^T (transpose of X)
op(X) = X^H (conjugate transpose of X; VCGEMM only)
These routines have the following arguments:
transa Character*1. (input)
Specifies the form of op(A) to be used in the matrix multiplication, as follows:
transa = ’N’ or ’n’: op(A) = A
transa = ’T’ or ’t’: op(A) = A^T
transa = ’C’ or ’c’: op(A) = A^T (VSGEMM), or op(A) = A^H (VCGEMM)
This argument can be any length. Only the first character is significant (for example, ’t’ and
’transpose’ have the same effect).
transb Character*1. (input)
Specifies the form of op(B) to be used in the matrix multiplication, as follows:
transb = ’N’ or ’n’: op(B) = B
transb = ’T’ or ’t’: op(B) = B^T
transb = ’C’ or ’c’: op(B) = B^T (VSGEMM), or op(B) = B^H (VCGEMM)
This argument can be any length. Only the first character is significant (for example, ’t’ and
’transpose’ have the same effect).


m Integer. (input)
Number of rows of output matrix.
n Integer. (input)
Number of columns of output matrix.
l Integer. (input)
Number of columns of A = number of rows of B.
alpha VSGEMM: Real. (input)
VCGEMM: Complex. (input)
Scalar factor α.
nunita Integer. (input)
Fortran unit number of virtual matrix A. The virtual matrix itself is composed of real numbers
(VSGEMM) or complex numbers (VCGEMM).
ia1 Integer. (input)
Row subscript of first element of virtual matrix A.
ja1 Integer. (input)
Column subscript of first element of A.
lda Integer. (input)
Leading virtual dimension of virtual matrix A.
nunitb Integer. (input)
Fortran unit number of virtual matrix B. The virtual matrix itself is composed of real numbers
(VSGEMM) or complex numbers (VCGEMM).
ib1 Integer. (input)
Row subscript of first element of virtual matrix B.
jb1 Integer. (input)
Column subscript of first element of B.
ldb Integer. (input)
Leading virtual dimension of virtual matrix B.
beta VSGEMM: Real. (input)
VCGEMM: Complex. (input)
Scalar factor β.
nunitc Integer. (input)
Fortran unit number of virtual matrix C. The virtual matrix itself is composed of real numbers
(VSGEMM) or complex numbers (VCGEMM), and it is changed by this routine.
ic1 Integer. (input)
Row subscript of first element of virtual matrix C.
jc1 Integer. (input)
Column subscript of first element of C.


ldc Integer. (input)
Leading virtual dimension of virtual matrix C.

NOTES
This routine is the virtual counterpart of the Level 3 BLAS routine SGEMM. The calling sequence is similar
to SGEMM. The difference is that in SGEMM a matrix operand (A, for example) would be defined by the
following two arguments:
a(i,j) The starting position of the matrix A within array a
lda The leading dimension of the array a (distance between adjacent elements of a row of A)
In VSGEMM, however, a matrix operand (A) would be defined by the following four arguments:
nunita The Fortran unit number of the file that contains the virtual matrix
ia1 Row subscript within the virtual matrix file at which the submatrix A begins
ja1 Column subscript within the virtual matrix file at which the submatrix A begins
lda Leading virtual dimension of the virtual matrix file (virtual subscript distance between adjacent
elements of a row of the submatrix A)
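To make the addressing concrete, the following sketch mimics the operation on ordinary in-memory arrays, written in free-form Fortran 90. It is not a call to VSGEMM: the arrays a, b, and c are hypothetical stand-ins for the virtual matrices, the sizes and values are invented for illustration, and the MATMUL intrinsic replaces the out-of-core computation. It assumes that both transpose arguments specify no transpose, so that the m-by-n submatrix of C that starts at (ic1, jc1) receives α A B + β C, where the m-by-l submatrix of A starts at (ia1, ja1) and the l-by-n submatrix of B starts at (ib1, jb1).

```fortran
program vgemm_addressing
  implicit none
  ! Hypothetical in-memory stand-ins for the virtual matrices on
  ! units nunita, nunitb, and nunitc.
  integer, parameter :: lda = 4, ldb = 4, ldc = 4
  real :: a(lda,4), b(ldb,4), c(ldc,4)
  integer, parameter :: m = 2, n = 2, l = 3
  integer, parameter :: ia1 = 2, ja1 = 1   ! submatrix A starts at A(2,1)
  integer, parameter :: ib1 = 1, jb1 = 3   ! submatrix B starts at B(1,3)
  integer, parameter :: ic1 = 3, jc1 = 2   ! submatrix C starts at C(3,2)
  real, parameter :: alpha = 2.0, beta = 0.5
  integer :: i, j

  ! Fill the stand-in matrices with simple test values
  do j = 1, 4
     do i = 1, 4
        a(i,j) = real(i + j)
        b(i,j) = real(i - j)
        c(i,j) = 1.0
     end do
  end do

  ! C(sub) <- alpha * A(sub) * B(sub) + beta * C(sub), no transposes
  c(ic1:ic1+m-1, jc1:jc1+n-1) =                                &
       alpha * matmul(a(ia1:ia1+m-1, ja1:ja1+l-1),             &
                      b(ib1:ib1+l-1, jb1:jb1+n-1))             &
       + beta * c(ic1:ic1+m-1, jc1:jc1+n-1)

  ! Elements of C outside the addressed submatrix are unchanged
  do i = 1, 4
     print '(4f8.2)', c(i,:)
  end do
end program vgemm_addressing
```

Only the 2-by-2 block of c that begins at (3,2) changes; the rest of c keeps its initial values, which is the in-memory analog of updating a submatrix of a virtual matrix file.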

EXAMPLES
Two examples of virtual matrix multiplication follow.
Example 1: Multiply the complex virtual matrix V, of dimension N-by-N, by itself, creating complex virtual
matrix W = V * V. Assume the virtual matrix V was already created, on unit 1, and that this operation will
create the virtual matrix W on unit 2.
      INTEGER V, W
      PARAMETER (V = 1, W = 2)
      CALL VBEGIN

C     Multiply complex virtual matrix V by itself,
C     creating complex virtual matrix W = V*V

      CALL VCGEMM ('NOTRANSPOSE', 'NOTRANSPOSE', N, N, N,
     &             (1.0,0.0),V,1,1,N,V,1,1,N,(0.0,0.0),W,1,1,N)

      CALL VEND
      END

Example 2: Let X be an in-memory vector of length N, consisting of random numbers. Copy X to the
virtual matrix on unit 1, which is defined as the parameter VX, that has dimension N-by-1. Multiply this
virtual matrix by its transpose, giving an N-by-N virtual matrix on unit 2, which is defined as the VY
parameter.

      INTEGER N
      PARAMETER (N = 1000)
      REAL X(N)
      INTEGER VX, VY
      PARAMETER (VX = 1, VY = 2)

C     create random vector x
      DO, I = 1, N
         X(I) = RANF()
      END DO

C     initialize
      CALL VBEGIN

C     create virtual matrix VX of dimension n by 1
      CALL SCOPY2RV(N, 1, X, N, VX, 1, 1, N)

C     create n by n virtual matrix vy = vx * transpose(vx)
      CALL VSGEMM ('NOTRANSPOSE', 'TRANSPOSE', N, N, 1, 1.0,
     &             VX, 1, 1, N, VX, 1, 1, N, 0.0, VY, 1, 1, N)

      CALL VEND
      END

SEE ALSO
INTRO_CORE(3S) for an introduction to the out-of-core routines, including usage examples
SGEMM(3S) for a description of SGEMM(3S) and CGEMM(3S), the in-memory equivalents of the out-of-core
routines VSGEMM and VCGEMM

VSGETRF ( 3S ) VSGETRF ( 3S )

NAME
VSGETRF, VCGETRF – Computes an LU factorization of a virtual general matrix with real or complex
elements, using partial pivoting with row interchanges

SYNOPSIS
CALL VSGETRF (m, n, nunita, lda, ipiv, info)
CALL VCGETRF (m, n, nunita, lda, ipiv, info)

IMPLEMENTATION
UNICOS systems

DESCRIPTION
VSGETRF and VCGETRF are the out-of-core versions of SGETRF and CGETRF (see SGETRF(3L)). Each
computes an LU factorization of a real (VSGETRF) or complex (VCGETRF) general m-by-n matrix A, using
partial pivoting with row interchanges.
The factorization has the form:
A = P · L · U
where P is a permutation matrix, L is lower triangular with unit diagonal elements (lower trapezoidal if
m > n), and U is upper triangular (upper trapezoidal if m < n).
These routines have the following arguments.
m Integer. (input)
The number of rows of the virtual matrix A. m ≥ 0.
n Integer. (input)
The number of columns of the virtual matrix A. n ≥ 0.
nunita Integer. (input)
VSGETRF: Unit number of the virtual real matrix of dimension (lda, n).
VCGETRF: Unit number of the virtual complex matrix of dimension (lda, n).
The virtual matrix itself is used for both input and output.
On entry, A, the m-by-n matrix to be factored, is stored in the virtual matrix file, starting at
subscript (1,1). On exit, the virtual matrix A is replaced by the triangular matrix factors L and
U; the unit diagonal elements of L are not stored.
lda Integer. (input)
The leading (first) dimension of the virtual matrix A. lda ≥ MAX(1,m).
ipiv Integer array of dimension (MIN(m,n)). (output)
The pivot indices. Row i of the matrix was interchanged with row ipiv(i).
info Integer. (output)

=0 Successful exit.
<0 If info = -k, the kth argument had an illegal value.
>0 If info = k, U(k,k) is exactly 0. The factorization has been completed, but the factor U is
exactly singular, and division by 0 will occur if it is used to solve a system of equations or
to compute the inverse of A.
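The meaning of ipiv, info, and the stored factors can be illustrated with a small in-memory sketch in free-form Fortran 90. This is not the out-of-core algorithm used by VSGETRF; it is a textbook LU factorization with partial pivoting on a hypothetical 3-by-3 matrix, and all names in it are chosen for illustration. It overwrites the array with the multipliers of L (below the diagonal) and with U (on and above the diagonal), records each row interchange in ipiv, and then checks that the row-interchanged original matrix equals L · U.

```fortran
program lu_sketch
  implicit none
  integer, parameter :: n = 3
  real :: a(n,n), a0(n,n), l(n,n), u(n,n), pa(n,n), row(n)
  integer :: ipiv(n), info, i, j, k, p

  ! Hypothetical test matrix, stored by columns
  a = reshape((/ 2.0, 4.0, -2.0,  1.0, 3.0, 1.0,  1.0, 7.0, 5.0 /), &
              (/ n, n /))
  a0 = a
  info = 0

  do k = 1, n
     ! Partial pivoting: choose the largest magnitude in column k
     p = k - 1 + maxloc(abs(a(k:n,k)), dim=1)
     ipiv(k) = p
     if (a(p,k) == 0.0) then
        if (info == 0) info = k          ! U(k,k) is exactly zero
        cycle
     end if
     if (p /= k) then                    ! interchange rows k and p
        row = a(k,:); a(k,:) = a(p,:); a(p,:) = row
     end if
     a(k+1:n,k) = a(k+1:n,k) / a(k,k)    ! multipliers (column k of L)
     do j = k+1, n
        a(k+1:n,j) = a(k+1:n,j) - a(k+1:n,k) * a(k,j)
     end do
  end do

  ! Unpack L (unit diagonal is implied, not stored) and U
  l = 0.0
  u = 0.0
  do j = 1, n
     do i = 1, n
        if (i > j) then
           l(i,j) = a(i,j)
        else
           u(i,j) = a(i,j)
        end if
     end do
     l(j,j) = 1.0
  end do

  ! Apply the recorded interchanges to the original matrix
  pa = a0
  do k = 1, n
     if (ipiv(k) /= k) then
        row = pa(k,:); pa(k,:) = pa(ipiv(k),:); pa(ipiv(k),:) = row
     end if
  end do

  print *, 'info =', info
  print *, 'ipiv =', ipiv
  print *, 'max |P**T*A - L*U| =', maxval(abs(pa - matmul(l, u)))
end program lu_sketch
```

A nonzero info in this sketch plays the same role as info > 0 above: the factorization completes, but the resulting U cannot be used to solve a system.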

NOTES
This routine requires workspace of two types:
• If m is the first argument (number of rows) and np is the page size established by VBEGIN(3S), the
amount of buffer space that was allocated in the VBEGIN(3S) routine must be at least m · np words.
This minimum size of workspace is necessary to prevent severe page thrashing (excessive I/O
to reread data) during the partial pivoting operations.
• In addition to the buffer space allocated by VBEGIN(3S), this routine also allocates m · np words of
workspace for its own use. Thus, the total memory requirement is a minimum of 2 · np · m words (plus the
much smaller workspace for data structures that is also allocated by VBEGIN(3S)).
If insufficient memory is available, the routine exits with an error message, which indicates how much
memory was needed.
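As a worked instance of this requirement, the following free-form Fortran 90 sketch evaluates the minimum for a matrix with 1000 rows. The page size np = 256 is a hypothetical value used only for the arithmetic; the actual page size is whatever VBEGIN(3S) established.

```fortran
program vsgetrf_memory
  implicit none
  integer, parameter :: m  = 1000   ! rows of the virtual matrix
  integer, parameter :: np = 256    ! hypothetical page size, in words
  integer :: buffer_words, work_words, total_words

  buffer_words = m * np                      ! minimum VBEGIN buffer space
  work_words   = m * np                      ! allocated by the routine itself
  total_words  = buffer_words + work_words   ! equals 2 * np * m

  print *, 'VBEGIN buffer (words):    ', buffer_words
  print *, 'routine workspace (words):', work_words
  print *, 'minimum total (words):    ', total_words
end program vsgetrf_memory
```

The total excludes the much smaller data-structure workspace mentioned above.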

EXAMPLES
This example illustrates using both VSGETRF and VSGETRS to solve a set of 1000 linear equations in 1000
unknowns. It is assumed that a virtual matrix of size 1000 by 1000 was created on unit 1 (defined as
parameter VA), representing the equations, and that a virtual matrix of size 1000 by 1 was created on unit 2
(defined as parameter VB), representing the right-hand side. The square matrix is factored and that
factorization is used, along with the right-hand side matrix, to compute a solution matrix.

C     Compute the LU factorization of the virtual matrix A on unit 1,
C     which is assumed to have been created previously and have
C     dimension 1000 by 1000.
C     Solve the equation A*X = B
C     where B is a virtual matrix on unit 2, assumed to have dimension
C     1000 by 1.
C
      INTEGER M, LD, VA, VB
      PARAMETER (M = 1000, LD = M, VA = 1, VB = 2)
      INTEGER IPIV(LD)
      REAL X(LD)

      CALL VBEGIN
C
C     LU factorization
C
      PRINT *, 'beginning vsgetrf'

      CALL VSGETRF(M, M, VA, LD, IPIV, INFO)

      IF (INFO .NE. 0) THEN
         PRINT *, 'matrix is singular'
         PRINT *, 'info = ', INFO
         CALL VEND
         STOP
      END IF
C
C     compute solution vector of A*X = B (replace B)
C
      PRINT *, 'beginning vsgetrs'

      CALL VSGETRS('NOTRANSPOSE', M, 1, VA,
     &             LD, IPIV, VB, LD, INFO)

      IF (INFO .NE. 0) THEN
         PRINT *, 'non-zero info from vsgetrs'
         PRINT *, 'info = ', INFO
         CALL VEND
         STOP
      END IF
C
C     Copy solution vector b to vector x,
C     and print the first 5 elements of x
C
      PRINT *, 'beginning scopy2vr'

      CALL SCOPY2VR(M, 1, VB, 1, 1, LD, X, LD)

      DO, I = 1, 5
         PRINT *, 'X(', I, ') = ', X(I)
      END DO

      CALL VEND
      PRINT *, 'done'
      END

SEE ALSO
INTRO_CORE(3S) for an introduction to the out-of-core routines, including usage examples
VBEGIN(3S)
VSGETRS(3S) (which documents VSGETRS(3S) and VCGETRS(3S)) to solve linear systems based on the
factorization computed by VSGETRF or VCGETRF, respectively
SGETRF(3L) (available only online) for a description of SGETRF(3L) and CGETRF(3L), which are the in-
memory equivalents of the out-of-core routines VSGETRF and VCGETRF

VSGETRS ( 3S ) VSGETRS ( 3S )

NAME
VSGETRS, VCGETRS – Solves a virtual system of linear equations, using the LU factorization computed by
VSGETRF(3S) or VCGETRF(3S)

SYNOPSIS
CALL VSGETRS (trans, n, nrhs, nunita, lda, ipiv, nunitb, ldb, info)
CALL VCGETRS (trans, n, nrhs, nunita, lda, ipiv, nunitb, ldb, info)

IMPLEMENTATION
UNICOS systems

DESCRIPTION
VSGETRS and VCGETRS are virtual matrix versions of the LAPACK routines SGETRS and CGETRS (see
SGETRS(3L)). VSGETRS or VCGETRS uses the LU factorization of matrix A as computed by
VSGETRF(3S) or VCGETRF(3S), respectively.
VSGETRS and VCGETRS solve one of the following linear systems:
A X = B
A^T X = B
A^H X = B (VCGETRS only)
where A^T is the transpose of A, A^H is the conjugate transpose of A, A is an n-by-n matrix, and X and B are
n-by-nrhs matrices.
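The way the stored factors and pivot indices are used can be sketched in memory for the untransposed case A X = B with one right-hand side. The free-form Fortran 90 program below is an illustration only, not the out-of-core implementation: the array lu holds hypothetical packed factors of the 3-by-3 matrix with rows (2, 1, 1), (4, 3, 7), and (-2, 1, 5), with U on and above the diagonal and the multipliers of L below it, and ipiv holds the matching row interchanges. The right-hand side was chosen so that the exact solution is (1, 2, 3).

```fortran
program lu_solve_sketch
  implicit none
  integer, parameter :: n = 3
  real :: lu(n,n), b(n), t
  integer :: ipiv(n), i, k

  ! Hypothetical packed LU factors (unit diagonal of L is not stored)
  lu(1,:) = (/  4.0,  3.0,  7.0 /)
  lu(2,:) = (/ -0.5,  2.5,  8.5 /)
  lu(3,:) = (/  0.5, -0.2, -0.8 /)
  ipiv = (/ 2, 3, 3 /)
  b = (/ 7.0, 31.0, 15.0 /)         ! right-hand side

  do k = 1, n                       ! apply the row interchanges to b
     if (ipiv(k) /= k) then
        t = b(k); b(k) = b(ipiv(k)); b(ipiv(k)) = t
     end if
  end do
  do i = 2, n                       ! forward substitution with L
     b(i) = b(i) - dot_product(lu(i,1:i-1), b(1:i-1))
  end do
  do i = n, 1, -1                   ! back substitution with U
     b(i) = (b(i) - dot_product(lu(i,i+1:n), b(i+1:n))) / lu(i,i)
  end do

  ! b now holds the solution x, overwriting the right-hand side
  print '(a,3f10.5)', ' x =', b
end program lu_solve_sketch
```

As with the virtual routines, the solution overwrites the right-hand side. The transposed systems are not covered by this sketch.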
These routines have the following arguments:
trans Character*1. (input)
Specifies the solution, X, to be computed as follows:
trans = ’N’ or ’n’ (no transpose): A X = B
trans = ’T’ or ’t’ (transpose): A^T X = B
trans = ’C’ or ’c’ (conjugate transpose): A^H X = B (VCGETRS), or A^T X = B (VSGETRS)
This argument can be any length. Only the first character is significant (for example, ’t’ and
’transpose’ have the same effect).
n Integer. (input)
Number of rows and columns of the matrix A. n ≥ 0.
nrhs Integer. (input)
Number of right-hand sides. The number of columns of the matrix B. nrhs ≥ 0.
nunita Integer. (input)
VSGETRS: Unit number of the real virtual matrix of dimension (lda, n).
VCGETRS: Unit number of the complex virtual matrix of dimension (lda, n).

The virtual matrix file on unit nunita is used only for input. It contains the LU factorization
of the matrix A, which must be computed by VSGETRF or VCGETRF before calling VSGETRS
or VCGETRS, respectively.
lda Integer. (input)
Leading (first) dimension of the virtual matrix A. lda ≥ MAX(1, n).
ipiv Integer array of dimension n. (input)
Array of pivot indices as determined by VSGETRF or VCGETRF.
nunitb Integer. (input)
VSGETRS: Unit number of the real virtual matrix B of dimension (ldb, nrhs).
VCGETRS: Unit number of the complex virtual matrix B of dimension (ldb, nrhs).
The virtual matrix file on unit nunitb is used for both input and output. On entry, B contains the
right-hand side vectors for the systems of linear equations. On exit, the solution vectors, columns of
the matrix X, replace the right-hand side vectors.
ldb Integer. (input)
Leading (first) dimension of the virtual matrix B. ldb ≥ MAX(1, n).
info Integer. (output)
= 0 Normal return (successful exit).
< 0 If info = – k, the kth argument has an illegal value.

NOTES
This routine requires workspace of two types:
• The amount of buffer space that was allocated in the VBEGIN(3S) routine must be at least n · np words; n
is the number of rows of the matrix A (the argument n), and np is the page size established by VBEGIN(3S). This
minimum size of workspace is necessary to prevent severe page thrashing (excessive I/O to
reread data) during the partial pivoting operations.
• In addition to the buffer space allocated by VBEGIN(3S), this routine also allocates n · np words of workspace
for its own use. Thus, the total memory requirement is a minimum of 2 · np · n words (plus the much smaller
workspace for data structures that is also allocated by VBEGIN(3S)).
If insufficient memory is available, the routine exits with an error message, which indicates how much
memory is needed.

SEE ALSO
INTRO_CORE(3S) for an introduction to the out-of-core routines, including usage examples
VSGETRF(3S) for an example of using VSGETRS in conjunction with VSGETRF
VBEGIN(3S), VCGETRF(3S)
SGETRS(3L) (available only online) for a description of SGETRS and CGETRS, which are the LAPACK
routines on which VSGETRS and VCGETRS are based

VSPOTRF ( 3S ) VSPOTRF ( 3S )

NAME
VSPOTRF – Computes the Cholesky factorization of a real symmetric positive definite virtual matrix

SYNOPSIS
CALL VSPOTRF (uplo, n, nunita, lda, info)

IMPLEMENTATION
UNICOS systems

DESCRIPTION
VSPOTRF is the virtual (out-of-core) implementation of the LAPACK routine SPOTRF(3L) (documented
online). It computes the Cholesky factorization of the real symmetric positive definite virtual matrix A,
which is accessed through the I/O unit number nunita. It uses an out-of-core technique based on the Virtual
Level 3 Basic Linear Algebra Subroutines (Virtual Level 3 BLAS or VBLAS).
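For orientation, the factorization itself can be sketched in memory. The free-form Fortran 90 program below is a textbook Cholesky factorization for the lower triangular case (A = L · L^T) on a hypothetical 3-by-3 symmetric positive definite matrix; it is not the blocked out-of-core method that VSPOTRF uses. The variable info mimics the positive return value described for this routine: it is set to j when the leading minor of order j is not positive definite.

```fortran
program cholesky_sketch
  implicit none
  integer, parameter :: n = 3
  real :: a(n,n), l(n,n), s
  integer :: i, j, info

  ! Hypothetical symmetric positive definite matrix
  ! (the algorithm reads only its lower triangle)
  a(1,:) = (/   4.0,  12.0, -16.0 /)
  a(2,:) = (/  12.0,  37.0, -43.0 /)
  a(3,:) = (/ -16.0, -43.0,  98.0 /)

  l = 0.0
  info = 0
  outer: do j = 1, n
     s = a(j,j) - dot_product(l(j,1:j-1), l(j,1:j-1))
     if (s <= 0.0) then
        info = j        ! leading minor of order j is not positive definite
        exit outer
     end if
     l(j,j) = sqrt(s)
     do i = j+1, n
        l(i,j) = (a(i,j) - dot_product(l(i,1:j-1), l(j,1:j-1))) / l(j,j)
     end do
  end do outer

  print *, 'info =', info
  do i = 1, n
     print '(3f8.3)', l(i,:)
  end do
  print *, 'max |A - L*L**T| =', maxval(abs(a - matmul(l, transpose(l))))
end program cholesky_sketch
```

In the virtual routine the factor overwrites the matrix file; the sketch keeps a and l separate only so that the residual can be printed.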
This routine has the following arguments:
uplo Character*1. (input)
Specifies whether the upper or lower triangular part of the symmetric matrix A is stored.
uplo = ’U’ or ’u’: the upper triangle of matrix A is stored.
uplo = ’L’ or ’l’: the lower triangle of matrix A is stored.
n Integer. (input)
Number of columns in virtual matrix A. n ≥ 0.
nunita Integer. (input)
Unit number of the file that contains the virtual matrix A.
The virtual matrix A is a real array of dimension (lda, n). The virtual matrix file nunita is used
for both input and output. On entry, the virtual matrix contains the symmetric positive definite
matrix to be factored.
If uplo = ’U’ or ’u’, the leading n-by-n upper triangular part of array a contains the upper
triangular part of matrix A, and the strictly lower triangular part of a is not referenced.
If uplo = ’L’ or ’l’, the leading n-by-n lower triangular part of array a contains the lower
triangular part of matrix A, and the strictly upper triangular part of a is not referenced.
On exit, the triangular factor L or U from the Cholesky factorization of matrix A overwrites
virtual matrix A. Given L or U, you can write the factorization as one of the following:
A = U^T · U, where U is an upper triangular matrix, and U^T is the transpose of U.
A = L · L^T, where L is a lower triangular matrix, and L^T is the transpose of L.
1 ≤ nunita ≤ 99
You may use packed storage mode for the virtual matrix. See the NOTES section.

lda Integer. (input)
Specifies the leading dimension of virtual matrix A. lda ≥ MAX(1, n).
info Integer. (output)
=0 Normal return.
>0 If info = k, the leading minor of order k is not positive definite, and the factorization could
not be completed.
<0 If – info = k, the kth argument has an illegal value.

NOTES
This routine allocates workspace of size np · np words, where np is the page size used by the virtual matrix
routines.
See INTRO_CORE(3S) for general information about Virtual BLAS and Virtual LAPACK out-of-core
software.
You can use packed storage mode to store virtual matrix A, reducing the required amount of disk space by
about 50%. The uplo argument itself does not necessarily imply that packed storage mode will be used;
however, if packed storage is used, the storage mode must agree with the value of uplo (that is, both ’U’ or
both ’L’).
To specify packed storage mode, you must use a prior call to the VSTORAGE(3S) routine.

SEE ALSO
INTRO_CORE(3S) for an introduction to the out-of-core routines, including usage examples
VSPOTRS(3S) to solve linear systems by using the factorization computed by VSPOTRF
VSTORAGE(3S) for a definition of the packed storage mode for the virtual matrix used by VSPOTRF
SPOTRF(3L) (available only online), which is the in-memory equivalent of the out-of-core routine VSPOTRF

VSPOTRS ( 3S ) VSPOTRS ( 3S )

NAME
VSPOTRS – Solves a virtual system of linear equations with a symmetric positive definite matrix whose
Cholesky factorization has been computed by VSPOTRF(3S)

SYNOPSIS
CALL VSPOTRS (uplo, n, nrhs, nunita, lda, nunitb, ldb, info)

IMPLEMENTATION
UNICOS systems

DESCRIPTION
VSPOTRS is the virtual (out-of-core) implementation of the LAPACK routine SPOTRS(3L). It solves a
system of linear equations AX = B; A is a symmetric positive definite virtual matrix (stored on I/O unit
nunita), whose Cholesky factorization has been computed by VSPOTRF(3S).
This routine has the following arguments:
uplo Character*1. (input)
Specifies whether the Cholesky factor stored in virtual matrix A is upper triangular or lower
triangular.
uplo = ’U’ or ’u’: the factor is upper triangular.
uplo = ’L’ or ’l’: the factor is lower triangular.
n Integer. (input)
Number of rows (or columns) of virtual matrix A. n ≥ 0.
nrhs Integer. (input)
Number of right-hand sides. (The number of columns of matrix B.) nrhs ≥ 0.
nunita Integer. (input)
Unit number of the virtual matrix of dimension (lda, n).
The virtual matrix contains a Cholesky factor of A. The virtual matrix file nunita is used only
for input. On entry, the virtual matrix must contain the upper or lower triangular Cholesky
factor of the original virtual matrix A, as computed by VSPOTRF(3S). 1 ≤ nunita ≤ 99.
lda Integer. (input)
Leading dimension of the virtual matrix A. lda ≥ MAX(1, n).
nunitb Integer. (input)
Unit number of the virtual matrix B of dimension (ldb, nrhs). The virtual matrix file nunitb is
used for both input and output. On entry, the virtual matrix stores the right-hand-side vectors
(columns of B) for the system of linear equations. On exit, the virtual matrix contains the
solution vectors (columns of X). 1 ≤ nunitb ≤ 99.

ldb Integer. (input)
Leading dimension of virtual matrix B. ldb ≥ MAX(1, n).
info Integer. (output)
= 0 Normal return.
< 0 If – info = k, the kth argument has an illegal value.

NOTES
See INTRO_CORE(3S) for general information about the Virtual BLAS and Virtual LAPACK out-of-core
software.
You can use packed storage mode to store virtual matrix A, reducing the required amount of disk space by
about 50%. The uplo argument itself does not necessarily imply that packed storage mode will be used;
however, if packed storage is used, the storage mode must agree with the value of uplo (that is, both ’U’ or
both ’L’).
To specify packed storage mode, you must use a prior call to the VSTORAGE(3S) routine.

SEE ALSO
INTRO_CORE(3S) for an introduction to the out-of-core routines, including usage examples
VSPOTRF(3S) to compute the factorization used by VSPOTRS
VSTORAGE(3S) for a definition of the packed storage mode for the virtual matrix used by VSPOTRS
SPOTRS(3L) (available only online), which is the in-memory equivalent of the out-of-core routine VSPOTRS

VSSYRK ( 3S ) VSSYRK ( 3S )

NAME
VSSYRK – Performs a symmetric rank k update of a real symmetric virtual matrix

SYNOPSIS
CALL VSSYRK (uplo, trans, n, k, alpha, nunita, ia1, ja1, lda, beta, nunitc, ic1, jc1, ldc)

IMPLEMENTATION
UNICOS systems

DESCRIPTION
VSSYRK is the virtual matrix equivalent of the Level 3 Basic Linear Algebra Subprogram (Level 3 BLAS)
routine SSYRK(3S). VSSYRK performs one of the following symmetric rank k operations on a real
symmetric virtual matrix:
C ← α A A^T + β C
C ← α A^T A + β C
where A^T is the transpose of A, α and β are scalars, C is an n-by-n symmetric virtual matrix, and A is an
n-by-k virtual matrix in the first operation, or a k-by-n virtual matrix in the second.
This routine has the following arguments:
uplo Character*1. (input)
Specifies whether the upper or lower triangular part of virtual matrix C is referenced, as follows:
uplo = ’U’ or ’u’: only the upper triangular part of c is referenced.
uplo = ’L’ or ’l’: only the lower triangular part of c is referenced.
trans Character*1. (input)
Specifies the operation to be performed, as follows:
trans = ’N’ or ’n’: C ← α A A^T + β C
trans = ’T’ or ’t’: C ← α A^T A + β C
n Integer. (input)
Specifies the order of virtual matrix C (the number of rows or columns in C). n ≥ 0.
k Integer. (input)
On entry with trans = ’N’ or ’n’: k specifies the number of columns of matrix A.
On entry with trans = ’T’ or ’t’: k specifies the number of rows of matrix A.
k ≥ 0.
alpha VSSYRK: Real. (input)
Scalar factor α.

nunita Integer. (input)
Unit number of the file that contains virtual matrix A of dimension (lda,m), in which m = k if
trans = ’N’ or ’n’, or m = n if trans = ’T’ or ’t’. The virtual matrix itself is a real (VSSYRK)
matrix. The virtual matrix file nunita is used only for input.
ia1 Integer. (input)
Row subscript of the first element of virtual matrix A.
ja1 Integer. (input)
Column subscript of the first element of virtual matrix A.
lda Integer. (input)
Leading dimension of virtual matrix A.
trans = ’N’ or ’n’: lda ≥ MAX(1,n).
trans = ’T’ or ’t’: lda ≥ MAX(1,k).
beta VSSYRK: Real. (input)
Scalar factor β.
nunitc Integer. (input)
Unit number of the file that contains virtual matrix C of dimension (ldc,n). The virtual matrix is
a real (VSSYRK) matrix. The virtual matrix file nunitc is used for both input and output.
ic1 Integer. (input)
Row subscript of the first element of virtual matrix C.
jc1 Integer. (input)
Column subscript of the first element of virtual matrix C.
ldc Integer. (input)
Leading dimension of virtual matrix C. ldc ≥ MAX(1,n).
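The combined effect of trans and uplo can be seen in a small in-memory sketch written in free-form Fortran 90. It is an illustration under stated assumptions, not a call to VSSYRK: the arrays and values are hypothetical, trans is taken to be ’N’ (so A is n-by-k), and uplo is taken to be ’U’, so only the upper triangle of C is updated and the strictly lower triangle is left untouched.

```fortran
program syrk_sketch
  implicit none
  integer, parameter :: n = 3, k = 2
  real :: a(n,k), c(n,n), full(n,n)
  real, parameter :: alpha = 1.0, beta = 2.0
  integer :: i, j

  ! Hypothetical input data
  a(:,1) = (/ 1.0, 2.0,  3.0 /)
  a(:,2) = (/ 0.0, 1.0, -1.0 /)
  c = 1.0

  ! trans = 'N': the full symmetric result alpha*A*A**T + beta*C
  full = alpha * matmul(a, transpose(a)) + beta * c

  ! uplo = 'U': store only the elements on or above the diagonal
  do j = 1, n
     do i = 1, j
        c(i,j) = full(i,j)
     end do
  end do

  do i = 1, n
     print '(3f8.2)', c(i,:)
  end do
end program syrk_sketch
```

After the update, the strictly lower triangle of c still holds its original values, which reflects the statement that only the selected triangle is referenced.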

NOTES
Each calling sequence is similar to that of the equivalent Level 3 BLAS routine, except that a real (in-
memory) matrix is specified by the following:
• Location (for example, A(I,J))
• Leading dimension (LDA)
A virtual (out-of-core) matrix is specified by the following:
• Unit number (NUNITA)
• Location within the file (IA1, JA1)
• Leading dimension (LDA)

SEE ALSO
INTRO_CORE(3S) for an introduction to the out-of-core routines, including usage examples
SSYRK(3S) for the in-memory equivalent of the out-of-core routine VSSYRK

VSTORAGE ( 3S ) VSTORAGE ( 3S )

NAME
VSTORAGE – Declares packed storage mode for a triangular, symmetric, or Hermitian (complex only)
virtual matrix

SYNOPSIS
CALL VSTORAGE (nunit, mode)

IMPLEMENTATION
UNICOS systems

DESCRIPTION
VSTORAGE declares the mode of packed storage, mode, on the I/O unit nunit that contains a triangular,
symmetric, or Hermitian virtual matrix. These packed storage modes are for use with out-of-core routines in
the Level 3 Virtual Basic Linear Algebra Subprograms (Virtual BLAS or VBLAS) and the Virtual LAPACK
routines.
Packed storage of a triangular or symmetric matrix means that only half of the matrix is actually stored. If a
real virtual matrix is declared to be lower triangular, only the lower triangle is stored; if upper triangular,
only the upper triangle is stored. If the matrix is symmetric, either the lower or upper triangular part is
stored.
Likewise, a complex matrix may be lower or upper triangular, or it may be symmetric, with only the lower
or upper triangle being stored. In addition, a complex matrix may be Hermitian (equal to the conjugate of
its transpose), with only the lower or upper triangle being stored.
When reading from or writing to a virtual matrix, the out-of-core routines do not have to distinguish between
a triangular, symmetric, or Hermitian matrix. They must know only which part of the matrix is being stored:
the full matrix, the lower triangle, or the upper triangle. This defines the three modes of storage for a virtual
matrix:
• Full. The full matrix is stored.
• Lower. Only the lower triangle is stored.
• Upper. Only the upper triangle is stored.
VSTORAGE associates one of these modes of storage with the unit number of a virtual matrix. Then,
whenever an out-of-core routine has that unit number as a virtual matrix argument, it handles the virtual
matrix according to the mode of storage declared in the VSTORAGE call (see the NOTES section).
This routine has the following arguments:
nunit Integer. (input)
Fortran unit number of a virtual matrix.
mode Character*1. (input)
The storage mode of the virtual matrix stored in nunit.

mode = ’F’ or ’f’: full storage mode (the default) is used.
mode = ’L’ or ’l’: lower triangular mode is used.
mode = ’U’ or ’u’: upper triangular mode is used.
Only the first character of the argument is significant; any other characters are ignored. Uppercase
and lowercase mean the same thing.

NOTES
For a given virtual matrix, if VSTORAGE is used to set mode = ’L’ or ’l’, any out-of-core routine that
accesses that virtual matrix must not refer to any elements above the main diagonal. Similarly, if
VSTORAGE is used to set mode = ’U’ or ’u’, any out-of-core routine that accesses the virtual matrix must
not refer to any elements below the main diagonal. If the program tries to access any such elements, it will
terminate with the following error message:
Tried to access the upper part of a lower triangular
matrix (or vice versa), unit number nunit.

Use one call to VSTORAGE for each virtual matrix, unless the virtual matrix uses full storage mode. In this
case, calling VSTORAGE is not necessary, because full matrix storage is the default.
The call (or calls) to VSTORAGE should occur right after the call to the VBEGIN(3S) routine. For a given
virtual matrix, any call to VSTORAGE must occur before the first reference to the virtual matrix. After the
mode is defined for a given virtual matrix, that matrix cannot change modes; the same mode applies to all
subsequent references to the matrix, up until the call to VEND(3S).
In the LAPACK and Level 3 BLAS routines, "packed storage" implies a linearized storage scheme. For the
Virtual LAPACK and VBLAS routines, "packed storage" is similar, but more complicated, because it is the
page structure of the virtual matrix binary file that is linearized; therefore, pages that correspond to the upper
(or lower) part of a triangular matrix are omitted.
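As a rough, element-level estimate of what packed storage saves, the following free-form Fortran 90 sketch compares the number of elements in a full n-by-n matrix with the number in one triangle (diagonal included). It is only an approximation: the value n = 1000 is chosen arbitrarily, and because the virtual matrix file is organized by pages, the actual file size depends on the page size and may differ from the element count.

```fortran
program packed_count
  implicit none
  integer, parameter :: n = 1000
  integer :: full_elems, packed_elems

  full_elems   = n * n             ! full storage mode
  packed_elems = n * (n + 1) / 2   ! one triangle, diagonal included

  print *, 'full   :', full_elems
  print *, 'packed :', packed_elems
  print '(a,f7.4)', ' ratio  :', real(packed_elems) / real(full_elems)
end program packed_count
```

The ratio approaches one half as n grows.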

EXAMPLES
The following program defines a virtual matrix on unit 1 to be stored in upper triangular packed mode, and
it sets all elements on or above the main diagonal to 1.

      PROGRAM EXMPL1

      INTEGER N, VA
      PARAMETER (VA = 1)           ! UNIT NUMBER OF VIRTUAL MATRIX
      PARAMETER (N = 1000)         ! SIZE OF VIRTUAL MATRIX

      INTEGER I
      REAL X(N)

      X = 1.0                      ! SET VECTOR X TO ALL 1'S.

      CALL VBEGIN                  ! INITIALIZE OUT-OF-CORE ROUTINES

      CALL VSTORAGE(VA, 'UPPER')   ! UPPER TRIANGULAR STORAGE MODE

C     FOR EACH ROW OF THE VIRTUAL MATRIX:
      DO, I = 1, N
C        SET VA(I, I:N) = X(I:N)
         CALL SCOPY2RV(1, N+1-I, X(I), 1, VA, I, I, N)
      END DO

      CALL VEND

      END

SEE ALSO
INTRO_CORE(3S) for an introduction to the out-of-core routines, including usage examples
VBEGIN(3S), VEND(3S)

VSTRSM ( 3S ) VSTRSM ( 3S )

NAME
VSTRSM, VCTRSM – Solves a virtual real or virtual complex triangular system of equations with multiple
right-hand sides

SYNOPSIS
CALL VSTRSM (side, uplo, transa, diag, m, n, alpha, nunita, ia1, ja1, lda, nunitb, ib1,
jb1, ldb)
CALL VCTRSM (side, uplo, transa, diag, m, n, alpha, nunita, ia1, ja1, lda, nunitb, ib1,
jb1, ldb)

IMPLEMENTATION
UNICOS systems

DESCRIPTION
VSTRSM solves a virtual real triangular system of equations with multiple right-hand sides. VCTRSM solves
a virtual complex triangular system of equations with multiple right-hand sides. VSTRSM and VCTRSM are
out-of-core versions of STRSM(3S) and CTRSM(3S), which are Level 3 Basic Linear Algebra Subprograms
(Level 3 BLAS).
VSTRSM and VCTRSM each solve one of the following matrix equations, using the operation associated with
each:

Equation              Operation
op(A) X = α B         B ← α op(A^-1) B
X op(A) = α B         B ← α B op(A^-1)

where A^-1 is the inverse of A, α is a scalar, X and B are m-by-n matrices, A is either a unit or nonunit upper
or lower triangular matrix, and op(A) is one of the following:
op(A) = A
op(A) = A^T
op(A) = A^H (VCTRSM only)
where A^T is the transpose of A, and A^H is the conjugate transpose of A.

These routines have the following arguments:
side Character*1. (input)
Specifies whether op(A) appears on the left or right of X, as follows:
side = ’L’ or ’l’: op(A)X = αB.
side = ’R’ or ’r’: X*op(A) = αB.
uplo Character*1. (input)
Specifies whether matrix A is an upper or lower triangular matrix, as follows:
uplo = ’U’ or ’u’: A is an upper triangular matrix.
uplo = ’L’ or ’l’: A is a lower triangular matrix.
transa Character*1. (input)
Specifies the form of op(A) to be used in the matrix multiplication, as follows:
transa = ’N’ or ’n’: op(A) = A.
transa = ’T’ or ’t’: op(A) = A^T.
transa = ’C’ or ’c’: op(A) = A^T (VSTRSM) or op(A) = A^H (VCTRSM).
This argument can be of any length. Only the first character is significant (for example, ’n’ and
’notranspose’ have the same effect).
diag Character*1. (input)
Specifies whether A is unit triangular, as follows:
diag = ’U’ or ’u’: A is assumed to be unit triangular.
diag = ’N’ or ’n’: A is not assumed to be unit triangular.
m Integer. (input)
Specifies the number of rows in op(A) and B. m ≥ 0.
n Integer. (input)
Specifies the number of columns in B. n ≥ 0.
alpha VSTRSM: Real. (input)
VCTRSM: Complex. (input)
Scalar factor α.
nunita Integer. (input)
Fortran unit number of the file that contains the triangular virtual matrix A of dimension (lda,k),
in which k = m if side = ’L’ or ’l’, or k = n if side = ’R’ or ’r’. The virtual matrix itself is a
real (VSTRSM) or complex (VCTRSM) matrix. The virtual matrix file nunita is used only for
input.
ia1 Integer. (input)
Row subscript of the first element of A.
ja1 Integer. (input)
Column subscript of the first element of A.

lda Integer. (input)
Specifies the first virtual dimension of virtual matrix A.
nunitb Integer. (input)
Fortran unit number of the file that contains the virtual matrix B of dimension (ldb, n). The
virtual matrix B itself is used for both input and output. On entry, the leading m-by-n part of
the virtual matrix B is the right-hand side matrix. On exit, the solution matrix X overwrites B.
ib1 Integer. (input)
Row subscript of the first element of B.
jb1 Integer. (input)
Column subscript of the first element of B.
ldb Integer. (input)
Specifies the first virtual dimension of virtual matrix B.
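The operation can be sketched in memory for one combination of options. The free-form Fortran 90 program below assumes side = ’L’, uplo = ’L’, transa = ’N’, and diag = ’N’, and solves A X = α B by forward substitution, overwriting B with X. The arrays and values are hypothetical, and the code is a plain in-memory illustration rather than the out-of-core algorithm.

```fortran
program trsm_sketch
  implicit none
  integer, parameter :: m = 3, n = 2
  real :: a(m,m), b(m,n)
  real, parameter :: alpha = 2.0
  integer :: i, j

  ! Hypothetical nonunit lower triangular matrix A
  a = 0.0
  a(1,1)   = 2.0
  a(2,1:2) = (/  1.0, 4.0 /)
  a(3,1:3) = (/ -1.0, 2.0, 5.0 /)

  ! Hypothetical right-hand sides, one per column of B
  b(:,1) = (/ 1.0, 2.5,  6.5 /)
  b(:,2) = (/ 2.0, 1.0, -1.0 /)

  ! Solve A*X = alpha*B column by column; X overwrites B
  b = alpha * b
  do j = 1, n
     do i = 1, m
        b(i,j) = (b(i,j) - dot_product(a(i,1:i-1), b(1:i-1,j))) / a(i,i)
     end do
  end do

  do i = 1, m
     print '(2f10.5)', b(i,:)
  end do
end program trsm_sketch
```

With diag = ’U’ the division by a(i,i) would be skipped, because the diagonal is then assumed to be 1.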

SEE ALSO
INTRO_CORE(3S) for an introduction to the out-of-core routines
STRSM(3S) for a description of STRSM(3S) and CTRSM(3S), which are the in-memory equivalents of the
out-of-core routines VSTRSM and VCTRSM

INTRO_MACH ( 3S ) INTRO_MACH ( 3S )

NAME
INTRO_MACH – Introduction to machine constant functions

IMPLEMENTATION
UNICOS and UNICOS/mk systems

DESCRIPTION
These functions return machine constants for UNICOS systems.
The SLAMCH routine runs on UNICOS and UNICOS/mk systems. The R1MACH(3S) and SMACH(3S)
routines run only on UNICOS systems.
The following table contains the purpose and name of each machine constant function. The first routine is
the name of the man page that documents all of the listed routines.
• R1MACH: Returns machine constants
• SLAMCH: LAPACK routine (see INTRO_LAPACK(3S)) which returns a wide variety of machine
constants
• SMACH, CMACH: Returns machine epsilon, numerically safe small and large normalized numbers

R1MACH ( 3S ) R1MACH ( 3S )

NAME
R1MACH – Returns UNICOS machine constants

SYNOPSIS
r = R1MACH (i)

IMPLEMENTATION
UNICOS systems (except Cray T90 systems that support IEEE arithmetic)

DESCRIPTION
The R1MACH function returns UNICOS machine constants.
This function has the following arguments:
r Real. (output)
Machine constant returned.
i Integer. (input)
Indicates the machine constant to be returned.
Must be an integer from 1 to 5; any other value prints an error message on standard output and
executes a Fortran STOP (thus aborting the program). The following lists the machine constant
returned for each valid value of i:
Value   Machine Constant Returned
1       B**(EMIN-1), the smallest positive magnitude
2       B**EMAX*(1 - B**(-T)), the largest magnitude
3       B**(-T), the smallest relative spacing
4       B**(1-T), the largest relative spacing
5       LOG10(B); B is the base or radix of the machine
where
B = Base of the machine
T = Number of base-B digits in the mantissa
EMIN = Minimum exponent before underflow
EMAX = Largest exponent before overflow
The constants that define the model of rounded floating-point arithmetic on Cray Research systems are as
follows:
B = 2     EMIN = -8189
T = 47    EMAX = 8190

Because of the characteristics of Cray Research floating-point hardware, the constant used for R1MACH(1)
is one bit larger than the smallest magnitude defined by the model. The constants that R1MACH returns, in
both decimal and the internal representation, are as follows:
R1MACH(1) = 0.3667207735109720E-2465   0200034000000000000001
R1MACH(2) = 0.2726870339048520E+2466   0577767777777777777776
R1MACH(3) = 0.7105427357601002E-14     0377224000000000000000
R1MACH(4) = 0.1421085471520200E-13     0377234000000000000000
R1MACH(5) = 0.3010299956639813E+00     0377774642023241175720
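The decimal values can be cross-checked from the model constants on a machine that does not have Cray floating-point hardware. The free-form Fortran 90 sketch below is not R1MACH; it assumes IEEE double precision, in which 2**8190 cannot be represented, so it evaluates the first two constants through base-10 logarithms and prints a mantissa and a decimal exponent separately. Because of that detour, only about the first 12 digits of those two mantissas should be trusted, and the one-bit adjustment to R1MACH(1) described above is not modeled.

```fortran
program r1mach_check
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: b = 2, t = 47, emin = -8189, emax = 8190
  real(dp) :: lg, l1, l2

  lg = log10(real(b, dp))                  ! R1MACH(5)
  l1 = real(emin - 1, dp) * lg             ! log10 of B**(EMIN-1)
  l2 = real(emax, dp) * lg + log10(1.0_dp - 2.0_dp**(-t))

  call show('R1MACH(1)', l1)
  call show('R1MACH(2)', l2)
  print '(a,es22.15)', ' R1MACH(3) = ', 2.0_dp**(-t)
  print '(a,es22.15)', ' R1MACH(4) = ', 2.0_dp**(1 - t)
  print '(a,es22.15)', ' R1MACH(5) = ', lg

contains

  ! Print 10**l10 as a mantissa in [0.1, 1) and a decimal exponent
  subroutine show(name, l10)
    character(len=*), intent(in) :: name
    real(dp), intent(in) :: l10
    integer :: e
    e = floor(l10) + 1
    print '(1x,a,a,f18.15,a,i0)', name, ' = ', 10.0_dp**(l10 - e), ' E', e
  end subroutine show

end program r1mach_check
```

The last three constants fit comfortably in IEEE double precision and are computed directly.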

SEE ALSO
SMACH(3S)

SLAMCH ( 3S ) SLAMCH ( 3S )

NAME
SLAMCH – Determines single-precision machine parameters

SYNOPSIS
s = SLAMCH(cmach)

IMPLEMENTATION
UNICOS and UNICOS/mk systems

DESCRIPTION
SLAMCH determines single-precision machine parameters.
This routine accepts the following arguments:
cmach Specifies the value to be returned by SLAMCH.
CHARACTER*1. (input)
’B’ or ’b’ = base (base of the machine)
’E’ or ’e’ = eps (epsilon, relative machine precision, base**(1-t))
’L’ or ’l’ = emax (largest exponent before overflow)
’M’ or ’m’ = emin (minimum exponent before (gradual) underflow)
’N’ or ’n’ = t (number of (base) digits in the mantissa)
’O’ or ’o’ = rmax (overflow threshold, (base**emax)*(1-eps))
’P’ or ’p’ = prec (precision, eps*base)
’R’ or ’r’ = rnd (1.0 if rounding occurs in addition; otherwise, 0.0)
’S’ or ’s’ = sfmin (safe minimum such that 1/sfmin does not overflow)
’U’ or ’u’ = rmin (underflow threshold, base**(emin-1))

NOTES
The constants used to define the model of rounded floating-point arithmetic on UNICOS systems are as
follows:
base = 2
t = 47
emin = -8189
emax = 8190
Two exceptions are the values used for sfmin and rmin. They are taken to be 1 bit larger than the smallest
magnitude defined by the model to ensure that reciprocal operations on these values do not become smaller
than sfmin or larger than rmax. The values returned by SLAMCH on UNICOS systems are as follows:

SLAMCH(’B’) = 2
SLAMCH(’E’) = 0.1421085471520200E-13
SLAMCH(’L’) = 8190
SLAMCH(’M’) = -8189
SLAMCH(’N’) = 47
SLAMCH(’O’) = 0.2726870339048520E+2466
SLAMCH(’P’) = 0.2842170943040401E-13
SLAMCH(’R’) = 0
SLAMCH(’S’) = 0.3667207735109720E-2465
SLAMCH(’U’) = 0.3667207735109720E-2465

The constants used to define the model of rounded floating-point arithmetic on UNICOS/mk systems are as
follows:
base = 2
t = 53
emin = -1021
emax = 1023
The values returned by SLAMCH are as follows:
SLAMCH(’B’) = 2
SLAMCH(’E’) = 0.222044604925031308E-15
SLAMCH(’L’) = 1023
SLAMCH(’M’) = -1021
SLAMCH(’N’) = 53
SLAMCH(’O’) = 0.898846567431157754E+308
SLAMCH(’P’) = 0.444089209850062616E-15
SLAMCH(’R’) = 1
SLAMCH(’S’) = 0.222507385850720138E-307
SLAMCH(’U’) = 0.222507385850720138E-307
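
The UNICOS/mk model parameters coincide with IEEE double precision, so the listed values can be approximated on any machine with IEEE doubles. The following Python sketch is only an illustration of the formulas given for cmach; `slamch_model` is a hypothetical name, not a library routine, and the ’R’ (rnd) entry is omitted because it describes hardware behavior rather than a formula.

```python
import sys

def slamch_model(cmach, base=2, t=53, emin=-1021, emax=1023):
    """Evaluate the SLAMCH formulas for the given model parameters."""
    eps = float(base) ** (1 - t)          # base**(1-t)
    rmin = float(base) ** (emin - 1)      # base**(emin-1)
    table = {
        "B": float(base),
        "E": eps,
        "L": float(emax),
        "M": float(emin),
        "N": float(t),
        "O": float(base) ** emax * (1.0 - eps),   # (base**emax)*(1-eps)
        "P": eps * base,
        "S": rmin,    # the listed sfmin equals rmin for this model
        "U": rmin,
    }
    return table[cmach.upper()]           # 'R' is intentionally not modeled

# Compare with the constants Python reports for IEEE doubles.
print(slamch_model("E"), sys.float_info.epsilon)
print(slamch_model("U"), sys.float_info.min)
```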


SMACH ( 3S ) SMACH ( 3S )

NAME
SMACH, CMACH – Returns machine epsilon, small or large normalized numbers

SYNOPSIS
result = SMACH (int)
result = CMACH (int)

IMPLEMENTATION
UNICOS systems (except Cray T90 systems that support IEEE arithmetic)

DESCRIPTION
The SMACH and CMACH routines return machine epsilon, small or large normalized numbers.
These routines have the following arguments:
result Real. (output)
Machine constant returned.
int Integer. (input)
Selects machine constant to be returned.
1 ≤ int ≤ 3. Any other value returns an error message.
For SMACH, int indicates that one of the following machine constants is returned as result:
int result
1 0.7105427357601002E-14
The machine epsilon (the smallest positive machine number ε for which 1.0 ± ε ≠ 1.0).
2 0.1290284014791423E-2449
A "numerically safe" number close to the smallest normalized, representable number.
3 0.7750231643082450E+2450
A "numerically safe" number close to the largest normalized, representable number.
For CMACH, int indicates that one of the following machine constants is returned as result:
int result
1 0.7105427357601002E-14
The machine epsilon (the smallest positive machine number ε for which 1.0 ± ε ≠ 1.0).
2 0.1347558278913286E-1216
A "numerically safe" number close to the square root of the smallest normalized, representable
number.
3 0.7420829329967288E+1217
A "numerically safe" number close to the square root of the largest normalized, representable number.

You can use CMACH(2) and CMACH(3) to prevent overflow during complex arithmetic.

SEE ALSO
Lawson, C. L., Hanson, R. J., Kincaid, D. R., and Krogh, F. T., "Basic Linear Algebra Subprograms for
Fortran Usage – An Extended Report," Sandia Technical Report SAND 77-0898, Sandia Laboratories,
Albuquerque, NM, 1977.

INTRO_SUPERSEDED ( 3S ) INTRO_SUPERSEDED ( 3S )

NAME
INTRO_SUPERSEDED – Introduction to superseded Scientific Library routines

IMPLEMENTATION
UNICOS systems

DESCRIPTION
The routines in this section are superseded by newer routines. Many routines and one software package
(LINPACK) are almost, but not totally, superseded. Each of these superseded routines or packages is
documented in another section, according to its purpose.
Each of these routines, whether fully, mostly, or partially superseded, is minimally supported to maintain
continuity.
These routines are not available on Cray T90 systems that support IEEE arithmetic.
Fully Superseded Routines
The following table contains the purpose and name of each superseded Scientific Library routine. Column 3
contains a reference to the preferred replacement for each superseded routine. Each superseded routine has
its own man page.

Purpose                                                   Superseded  Preferred routine
                                                          routine

Gathers a vector from a source vector                     GATHER      None needed (see GATHER(3S))
Solves a system of linear equations by inverting a        MINV        SGESV (see INTRO_LAPACK(3S))
  square matrix
Multiplies a matrix by a vector (unit increments)         MXV         SGEMV(3S)
Multiplies a matrix by a vector (arbitrary increments)    MXVA        SGEMV(3S)
Multiplies a matrix by a matrix (unit increments)         MXM         SGEMM(3S)
Multiplies a matrix by a matrix (arbitrary increments)    MXMA        SGEMM(3S)
Multiplies a matrix by a column vector and adds the       SMXPY       SGEMV(3S)
  result to another column vector
Multiplies a matrix by a row vector and adds the result   SXMPY       SGEMV(3S)
  to another row vector
Scatters a vector into another vector                     SCATTER     None needed (see SCATTER(3S))
Solves a tridiagonal system                               TRID        SGTSV (see INTRO_LAPACK(3S))

Mostly Superseded and Partially Superseded Routines


The following table contains the purpose and name of each mostly superseded or partially superseded
Scientific Library routine. Column 3 contains a reference to the preferred replacement for each routine.
Each of the named routines has its own man page.
All individual routines listed here are Fast Fourier Transform (FFT) routines, which are partially or mostly
superseded by routines from the UNICOS Standard FFT package currently available only on UNICOS. The
routine package LINPACK(3S) is also listed. Most routines in this package are completely superseded by
routines from the more recent LAPACK package. A few LINPACK routines are not superseded at all.

Purpose                                             Routine   Preferred routine

Applies a complex-to-complex FFT                    CFFT      CCFFT(3S)
Applies a complex-to-complex FFT                    CFFT2     CCFFT(3S)
Applies a complex-to-complex two-dimensional FFT    CFFT2D    CCFFT2D(3S)
Applies a complex-to-complex three-dimensional FFT  CFFT3D    CCFFT3D(3S)
Applies multiple complex-to-complex FFTs            CFFTMLT   CCFFTM(3S)
Applies a complex-to-real FFT                       CSFFT2    CSFFT (see SCFFT(3S))
Contains routines which solve real or complex       LINPACK   INTRO_LAPACK(3S)
  dense linear systems
Applies multiple complex-to-complex FFTs            MCFFT     CCFFTM(3S)
Applies multiple real-to-complex or                 RFFTMLT   SCFFTM or CSFFTM (see SCFFTM(3S))
  complex-to-real FFTs
Applies a real-to-complex FFT                       SCFFT2    SCFFT(3S)


GATHER ( 3S ) GATHER ( 3S )

NAME
GATHER – Gathers a vector from a source vector

SYNOPSIS
CALL GATHER (n, a, b, index)

IMPLEMENTATION
UNICOS systems (except Cray T90 systems that support IEEE arithmetic)

DESCRIPTION
GATHER is defined as follows:

$a_i = b_{j_i}$, where $i = 1, \ldots, n$ and $j_i$ = index(i)

This routine has the following arguments:


n Integer. (input)
Number of elements in arrays a and index (not in b).
a Real or integer array of dimension n. (output)
Contains the result vector.
b Real or integer array of dimension max(index(i): i = 1, ..., n). (input)
Contains the source vector.
index Integer array of dimension n. (input)
Contains the vector of indices.
The Fortran equivalent of this routine is as follows:
      DO 100 I=1,N
         A(I)=B(INDEX(I))
  100 CONTINUE
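
For readers who want to experiment with the semantics outside Fortran, the same loop can be modeled in a few lines of Python. This is only an illustrative sketch; the function `gather` below is a hypothetical stand-in, and it keeps the Fortran convention that index holds 1-based positions.

```python
def gather(n, b, index):
    """Return a with a(i) = b(index(i)) for i = 1..n (index is 1-based)."""
    return [b[index[i] - 1] for i in range(n)]

b = [10.0, 20.0, 30.0, 40.0, 50.0]   # source vector
index = [5, 1, 3]                    # 1-based positions to pick from b
a = gather(3, b, index)
print(a)
```

Note that b must be at least as long as the largest entry of index, which is the dimension requirement stated for b above.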

CAUTIONS
You should not use this routine on systems that have Compress-Index Gather-Scatter (CIGS) hardware,
because it will degrade performance.

SEE ALSO
SCATTER(3S)


MINV ( 3S ) MINV ( 3S )

NAME
MINV – Solves systems of linear equations by inverting a square matrix

SYNOPSIS
CALL MINV (ab, n, ldab, scratch, det, tol, m, mode)

IMPLEMENTATION
UNICOS systems (except Cray T90 systems that support IEEE arithmetic)

DESCRIPTION
MINV computes the determinant of a matrix A, subject to the restriction imposed by tol (see the CAUTIONS
section). You may also use it to solve systems of linear equations (if m > 0) or to compute the inverse of a
square matrix (if mode ≠ 0).
If m>0, MINV solves the following matrix equation:
AX = B

where B represents an n-by-m matrix of known values, and X represents an n-by-m matrix of unknowns for
which to solve.
You may consider each column of B to be the right-hand side values of a system of linear equations, and
each corresponding column of X to be the unknowns for the system of linear equations defined by A and the
corresponding column of B. On output, the solution matrix X overwrites the right-hand side matrix B.
If mode ≠ 0, MINV calculates $A^{-1}$, which overwrites A. If mode = 0, A is still overwritten, but not by $A^{-1}$.
This routine has the following arguments:
ab Real array of dimension (ldab,n+m). (input and output)
On input, ab contains the augmented matrix A:B. A is the square matrix to be inverted (if
mode ≠ 0), and B is the matrix whose columns are the right-hand sides for the systems of linear
equations to be solved.
On output, ab contains the augmented matrix Z:X. Z is either $A^{-1}$, the inverse of A
(if mode ≠ 0), or some other n-by-n matrix replacing A (if mode = 0). X is the matrix, each column of
which is the solution vector for the system of linear equations defined by the corresponding
column of B.
n Integer. (input)
Order of matrix A; that is, the number of rows in A (same as number of columns).
ldab Integer. (input)
Leading dimension of array ab.
ldab ≥ n.

scratch Real array of dimension 2*n. (output)
Workspace for MINV.
det Real. (output)
Determinant of A, computed as the product of pivot elements.
tol Real. (input)
Lower limit for the determinant’s partial products.
A is declared singular when the partial product of pivot elements is less than or equal in
magnitude to this parameter, which should be positive.
m Integer. (input)
Number of columns in B.
m = 0 implies that no right-hand sides exist, hence no linear systems to solve.
mode Integer. (input)
Specifies whether the inverse of A is required.
In ab, the inverse of A overwrites A only if mode ≠ 0.
The following summarizes the effect of different combinations of parameter values:

Parameter values    Results returned by MINV

m=0, mode=0         det(A)
m=0, mode≠0         det(A), $A^{-1}$
m>0, mode=0         det(A), $X = A^{-1}B$
m>0, mode≠0         det(A), $A^{-1}$, $X = A^{-1}B$

NOTES
MINV solves linear equations by using a partial pivot search (one unused row) and Gauss-Jordan reduction.
MINV is superseded by the LAPACK routines SGETRF(3L) and SGETRI(3L) (which together can calculate
the determinant and inverse of a general square matrix), or by the LAPACK routine SGESV(3L) (which
solves the matrix equation AX = B). LAPACK routines are preferred because they are the emerging de facto
standard linear systems interface. Using LAPACK routines will enhance your program’s portability, and also
should enhance its performance portability.
Man pages for the LAPACK routines SGETRF(3L), SGETRI(3L), and SGESV(3L) are available only online,
using the man(1) command.

CAUTIONS
At each reduction step, MINV computes the partial product of pivot elements. If this product’s magnitude is
less than or equal to tol, MINV aborts computation. Therefore, if the value returned in det is less than or
equal in magnitude to the value input as tol, MINV did not invert A or solve for X (although A:B may have been
overwritten); in this case, the value returned in det may not be the determinant of A.
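
The interaction between det, tol, and the returned results can be illustrated with a small Gauss-Jordan reduction. The Python sketch below is an assumption-laden model, not the MINV source: it uses ordinary row partial pivoting, works on copies instead of overwriting its inputs, and the name `gauss_jordan` is hypothetical. It stops as soon as the running product of pivots is no larger than tol in magnitude, which mirrors the behavior described above.

```python
def gauss_jordan(a, b, tol):
    """Reduce [A | B | I] by Gauss-Jordan elimination with partial pivoting.

    a is an n-by-n list of rows, b an n-by-m list of rows (or [] when m = 0).
    Returns (det, inverse, x); inverse and x are None when the running
    product of pivots falls to tol or below in magnitude.
    """
    n = len(a)
    m = len(b[0]) if b else 0
    # Augmented working copy [A | B | I]; the inputs are left untouched.
    w = [list(a[i]) + list(b[i] if m else []) + [float(i == j) for j in range(n)]
         for i in range(n)]
    det = 1.0
    for k in range(n):
        # Partial pivoting: largest entry in column k at or below row k.
        p = max(range(k, n), key=lambda r: abs(w[r][k]))
        if p != k:
            w[k], w[p] = w[p], w[k]
            det = -det                    # a row swap flips the sign
        pivot = w[k][k]
        det *= pivot                      # partial product of pivots
        if abs(det) <= tol:
            return det, None, None        # treated as singular
        w[k] = [v / pivot for v in w[k]]
        for r in range(n):
            if r != k:
                f = w[r][k]
                w[r] = [vr - f * vk for vr, vk in zip(w[r], w[k])]
    x = [row[n:n + m] for row in w]
    inverse = [row[n + m:] for row in w]
    return det, inverse, x

det, inverse, x = gauss_jordan([[2.0, 1.0], [1.0, 3.0]], [[3.0], [5.0]], 1e-12)
print(det, x)
```

A caller would test the returned determinant against tol before trusting the inverse or the solution, as the paragraph above recommends for MINV.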

SEE ALSO
INTRO_LAPACK(3S) for more information and further references regarding the preferred routines
SGETRF(3L), SGETRI(3L), SGESV(3L) (available only online)
man(1) in the UNICOS User Commands Reference Manual
Partial Pivoting Linear Equation Solver (MINV), publication SN-0215 (1980), which contains more
information on the algorithm used by MINV
Knuth, D.E., The Art of Computer Programming, Volume 1 (Fundamental Algorithms), Reading, MA:
Addison-Wesley, 1973; pp. 301-302


MXM ( 3S ) MXM ( 3S )

NAME
MXM – Computes matrix-times-matrix product (unit increments)

SYNOPSIS
CALL MXM (a, nra, b, nca, c, ncb)

IMPLEMENTATION
UNICOS systems (except Cray T90 systems that support IEEE arithmetic)

DESCRIPTION
MXM computes the nra-by-ncb matrix product C = AB of the nra-by-nca matrix A and the nca-by-ncb matrix
B.
This routine has the following arguments:
a Real array of dimension (nra,nca). (input)
Matrix A, the first factor.
nra Integer. (input)
Number of rows in A (same as number of rows in C).
b Real array of dimension (nca,ncb). (input)
Matrix B, the second factor.
nca Integer. (input)
Number of columns in A (same as number of rows in B).
c Real array of dimension (nra,ncb). (output)
Matrix C, the product AB.
ncb Integer. (input)
Number of columns in B (same as number of columns in C).

NOTES
You should use the Level 3 Basic Linear Algebra Subprogram (Level 3 BLAS) SGEMM(3S) rather than MXM.
BLAS routines are preferred because they are the de facto standard linear algebra interface. Using Level 3
BLAS routines will enhance your program’s portability, and also should enhance its performance portability.
For example,
      CALL MXM (A, NRA, B, NCA, C, NCB)

is equivalent to
      CALL SGEMM ('N', 'N', NRA, NCB, NCA, 1.0,
     $            A, NRA, B, NCA, 0.0, C, NRA)
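
The computation that both calls perform can be modeled in Python on flat lists that hold each matrix by columns, as Fortran stores them. This is an illustrative sketch only; `mxm` here is a hypothetical Python function that returns the product instead of writing into an output argument.

```python
def mxm(a, nra, b, nca, ncb):
    """Return C = A*B; A is nra-by-nca, B is nca-by-ncb, all stored by columns."""
    c = [0.0] * (nra * ncb)
    for j in range(ncb):
        for k in range(nca):
            bkj = b[k + j * nca]                  # B(k,j)
            for i in range(nra):
                c[i + j * nra] += a[i + k * nra] * bkj   # C(i,j) += A(i,k)*B(k,j)
    return c

# A = [[1,2,3],[4,5,6]] and B = [[7,8],[9,10],[11,12]], both stored by columns.
a = [1.0, 4.0, 2.0, 5.0, 3.0, 6.0]
b = [7.0, 9.0, 11.0, 8.0, 10.0, 12.0]
c = mxm(a, 2, b, 3, 2)
print(c)
```

Because the result is accumulated element by element, writing it into storage shared with a or b would corrupt operands that are still needed, which is why the product must not overwrite either factor.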

MXM is restricted to multiplying matrices that have elements stored by columns in successive memory
locations. MXMA(3S) is a general subroutine for multiplying matrices that can be used to multiply matrices
that do not satisfy the requirements of MXM (although SGEMM also supersedes MXMA). If B and C have only
one column, MXV(3S) or MXVA(3S) (both superseded by Level 2 BLAS routine SGEMV, see SGEMV(3S)) are
similar subroutines, each of which computes the product of a matrix and a vector.

CAUTIONS
The product must not overwrite either factor. For example, the following call will not (in general) assign the
product AB to A:
      CALL MXM (A,NRA,B,NCA,A,NCA)

SEE ALSO
MXMA(3S) to multiply less strictly declared matrices
MXV(3S), MXVA(3S) to perform a matrix-vector multiply
SGEMM(3S), which supersedes MXM and MXMA
SGEMV(3S), which supersedes MXV and MXVA


MXMA ( 3S ) MXMA ( 3S )

NAME
MXMA – Computes matrix-times-matrix product (arbitrary increments)

SYNOPSIS
CALL MXMA (sa, iac, iar, sb, ibc, ibr, sc, icc, icr, nrp, m, ncp)

IMPLEMENTATION
UNICOS systems (except Cray T90 systems that support IEEE arithmetic)

DESCRIPTION
MXMA calculates the following nrp-by-ncp matrix product, where A is an nrp-by-m matrix and B is an m-by-ncp
matrix:
C = AB

This routine has the following arguments:


sa Real array of dimension (max(iac,iar), m). (input)
Contains matrix A, the first operand.
iac Integer. (input)
Memory increment in sa between adjacent column elements of A.
iar Integer. (input)
Memory increment in sa between adjacent row elements of A.
min(iac,iar) * nrp ≤ max(iac,iar).
sb Real array of dimension (max(ibc,ibr), ncp). (input)
Contains matrix B, the second operand.
ibc Integer. (input)
Memory increment in sb between adjacent column elements of B.
ibr Integer. (input)
Memory increment in sb between adjacent row elements of B.
min(ibc,ibr) * m ≤ max(ibc,ibr).
sc Real array of dimension (max(icc,icr), ncp). (output)
Array that receives C, the product AB.
icc Integer. (input)
Memory increment in sc between adjacent column elements of C.
icr Integer. (input)
Memory increment in sc between adjacent row elements of C.
min(icc,icr) * nrp ≤ max(icc,icr).

nrp Integer. (input)
Number of rows in C (same as number of rows in A).
m Integer. (input)
Middle dimension: number of columns in A (same as number of rows in B).
ncp Integer. (input)
Number of columns in C (same as number of columns in B).

NOTES
You should use the Level 3 Basic Linear Algebra Subprogram (Level 3 BLAS) SGEMM (see SGEMM(3S))
rather than MXMA, because BLAS routines are the de facto standard linear algebra interface. Using Level 3 BLAS
routines will enhance your program’s portability, and also should enhance its performance portability.
MXMA is a general subroutine for multiplying matrices. It can be used to compute a product of matrices in
which one or more of the operands or the product must be transposed. You can use MXMA to multiply any
matrices whose elements are not stored by columns in successive memory locations, provided only that the
elements of rows and columns are spaced by increments constant for each matrix. (The preferred routine,
SGEMM, also can do these operations.)
If B and C have only one column, MXVA(3S) (superseded by Level 2 BLAS routine SGEMV, see SGEMV(3S))
is a similarly general subroutine that computes the product of a matrix and a vector.
The product of matrices whose elements are stored by columns in successive memory locations can be
computed slightly faster using MXM(3S) (superseded by SGEMM) for matrices of more than one column or
MXV(3S) (superseded by SGEMV) for matrices B and C which have only one column.
The following subroutine calls are equivalent:
      CALL MXMA(SA,1,NRP,SB,1,M,SC,1,NRP,NRP,M,NCP)

      CALL MXM (SA,NRP,SB,M,SC,NCP)

(The product elements computed by MXM are also stored by columns in successive memory locations.)
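
The role of the six increments is easiest to see in a small model. The Python sketch below is illustrative only (the function `mxma` is a hypothetical stand-in that uses 0-based offsets): element (i,k) of A is read from `sa[i*iac + k*iar]`, and similarly for B and C. Choosing the increments for C differently stores the same product transposed.

```python
def mxma(sa, iac, iar, sb, ibc, ibr, sc, icc, icr, nrp, m, ncp):
    """Model of C = A*B with arbitrary (constant) row and column increments."""
    for i in range(nrp):
        for j in range(ncp):
            total = 0.0
            for k in range(m):
                total += sa[i * iac + k * iar] * sb[k * ibc + j * ibr]
            sc[i * icc + j * icr] = total
    return sc

# A = [[1,2,3],[4,5,6]] (2-by-3) and B = [[1,0,2],[0,1,0],[1,0,0]] (3-by-3),
# both stored by columns.
a = [1.0, 4.0, 2.0, 5.0, 3.0, 6.0]
b = [1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 2.0, 0.0, 0.0]

# C stored by columns: column increment 1, row increment nrp = 2.
c_by_columns = mxma(a, 1, 2, b, 1, 3, [0.0] * 6, 1, 2, 2, 3, 3)
# Same product stored by rows (that is, transposed in memory): increments 3 and 1.
c_by_rows = mxma(a, 1, 2, b, 1, 3, [0.0] * 6, 3, 1, 2, 3, 3)
print(c_by_columns, c_by_rows)
```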

CAUTIONS
To be computed correctly, the product must not overwrite either operand. Thus, if ALPHA is a
one-dimensional array,
      CALL MXMA(ALPHA,3,9,BETA,1,2,ALPHA(2),1,3,3,2,2)

correctly computes the product of the matrices defined in ALPHA and BETA, whereas the following does not
(in general):
      CALL MXMA(ALPHA,3,9,BETA,1,2,ALPHA,1,3,3,2,2)

SEE ALSO
MXM(3S) to multiply more strictly declared matrices
MXV(3S), MXVA(3S) to perform a matrix-vector multiply
SGEMM(3S), which supersedes MXM and MXMA
SGEMV(3S), which supersedes MXV and MXVA


MXV ( 3S ) MXV ( 3S )

NAME
MXV – Computes matrix-times-vector product (unit increments)

SYNOPSIS
CALL MXV (a, nra, b, nca, c)

IMPLEMENTATION
UNICOS systems (except Cray T90 systems that support IEEE arithmetic)

DESCRIPTION
MXV computes the product c = Ab, a vector of length nra, of the nra-by-nca matrix A and the vector b of length nca.
This routine has the following arguments:
a Real array of dimension (nra,nca). (input)
Matrix factor.
nra Integer. (input)
Number of rows in the matrix.
b Real array of dimension nca. (input)
Vector factor.
nca Integer. (input)
Number of columns in the matrix.
c Real array of dimension nra. (output)
Vector product.

NOTES
You should use the Level 2 Basic Linear Algebra Subprogram (Level 2 BLAS) SGEMV (see SGEMV(3S))
rather than MXV, because BLAS routines are the de facto standard linear algebra interface. Using Level 2 BLAS
routines will enhance your program’s portability, and also should enhance its performance portability. For
example,
      CALL MXV (A, NRA, B, NCA, C)

is equivalent to
      CALL SGEMV ('N', NRA, NCA, 1.0, A, NRA, B, 1, 0.0, C, 1)

MXV is restricted to using matrix and vector arguments that have elements stored by columns in successive
memory locations. MXVA(3S) is a general matrix-vector multiply subroutine that can use matrix and vector
arguments that do not satisfy the requirements of MXV (although SGEMV also supersedes MXVA).

CAUTIONS

MXV is restricted to multiplying a vector that occupies successive memory locations (in order) by a matrix
whose elements are stored by columns in successive memory locations. MXVA is a general subroutine for
multiplying a matrix and a vector, which can be used to multiply a vector by a matrix stored with arbitrary
column and row increments.

SEE ALSO
MXM(3S), MXMA(3S) to perform a matrix-matrix multiply
MXVA(3S) to multiply with less strictly declared matrix and vector arguments
SGEMM(3S), which supersedes MXM and MXMA
SGEMV(3S), which supersedes MXV and MXVA


MXVA ( 3S ) MXVA ( 3S )

NAME
MXVA – Computes matrix-times-vector product (arbitrary increments)

SYNOPSIS
CALL MXVA (sa, iac, iar, sb, ib, sc, ic, nra, nca)

IMPLEMENTATION
UNICOS systems (except Cray T90 systems that support IEEE arithmetic)

DESCRIPTION
MXVA calculates the following matrix-vector product, a vector of length nra:
c = Ab

where A is an nra-by-nca matrix, and b is an nca vector.


This routine has the following arguments:
sa Real array of dimension (max(iac,iar), nca). (input)
Contains matrix A, the first operand.
iac Integer. (input)
Memory increment in sa between adjacent column elements of A.
iar Integer. (input)
Memory increment in sa between adjacent row elements of A.
min(iac,iar) * nra ≤ max(iac,iar).
sb Real array of dimension (nca − 1) * ib + 1. (input)
Contains vector b, the second operand.
ib Integer. (input)
Memory increment in sb between adjacent elements of b.
sc Real array of dimension (nra − 1) * ic + 1. (output)
Array that receives c, the product Ab.
ic Integer. (input)
Memory increment in sc between adjacent elements of c.
nra Integer. (input)
Number of rows in A (same as number of elements in c).
nca Integer. (input)
Number of columns in A (same as number of elements in b).
Suppose sa is a two-dimensional array defined to have leading dimension ldsa, as follows:
      DIMENSION SA(LDSA,NCA)

Then
      CALL MXVA(SA,IAC,LDSA,SB,IB,SC,IC,NCA,NCA)

multiplies a square submatrix A of sa times a vector b from sb, storing the product c in sc, while
      CALL MXVA(SA,LDSA,IAC,SB,IB,SC,IC,NCA,NCA)

computes the product c as $A^T$ times the same vector b.
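
The effect of swapping the two increments can be checked with a small model. The Python sketch below is an illustration under stated assumptions, not the library routine: `mxva` is a hypothetical function using 0-based offsets, and the example takes IAC = 1 with a 3-by-3 array stored by columns (so ldsa = 3).

```python
def mxva(sa, iac, iar, sb, ib, sc, ic, nra, nca):
    """Model of c = A*b where A(i,k) is sa[i*iac + k*iar] (0-based offsets)."""
    for i in range(nra):
        total = 0.0
        for k in range(nca):
            total += sa[i * iac + k * iar] * sb[k * ib]
        sc[i * ic] = total
    return sc

# SA = [[1,2,3],[4,5,6],[7,8,9]] stored by columns, leading dimension 3.
sa = [1.0, 4.0, 7.0, 2.0, 5.0, 8.0, 3.0, 6.0, 9.0]
b = [1.0, 0.0, 2.0]

c_plain = mxva(sa, 1, 3, b, 1, [0.0] * 3, 1, 3, 3)       # c = A b
c_transposed = mxva(sa, 3, 1, b, 1, [0.0] * 3, 1, 3, 3)  # c = A^T b
print(c_plain, c_transposed)
```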

NOTES
You should use the Level 2 Basic Linear Algebra Subprogram (Level 2 BLAS) SGEMV(3S) rather than
MXVA, because BLAS routines are the de facto standard linear algebra interface. Using Level 2 BLAS routines will
enhance your program’s portability, and also should enhance its performance portability.
MXVA is a general matrix-vector multiply subroutine. As demonstrated earlier, you can use MXVA with a
matrix or its transpose. You can use MXVA to multiply any vector or matrix arguments whose elements are
not stored by columns in successive memory locations, provided only that the elements of rows and columns
are spaced by increments constant for each matrix. (The preferred routine, SGEMV, also can do these
operations.)
The matrix-vector product whose elements are stored by columns in successive memory locations can be
computed slightly faster using MXV(3S) (superseded by SGEMV).
The following two subroutine calls have the same result:
      CALL MXVA(SA,1,NRA,SB,1,SC,1,NRA,NCA)

      CALL MXV (SA,NRA,SB,NCA,SC)

(The product elements computed by MXV are also stored in successive memory locations.)

CAUTIONS
To be computed correctly, the product must not overwrite either operand. Thus, for example, the following
call will not (in general) compute correctly the product of the matrix in sa and the vector in sb:
      CALL MXVA(SA,IAC,IAR,SB,IB,SB,IB,NRA,NCA)

SEE ALSO
MXM(3S), MXMA(3S) to perform a matrix-matrix multiply
MXV(3S) to multiply with more strictly declared matrix and vector arguments
SGEMM(3S), which supersedes MXM and MXMA
SGEMV(3S), which supersedes MXV and MXVA


SCATTER ( 3S ) SCATTER ( 3S )

NAME
SCATTER – Scatters a vector into another vector

SYNOPSIS
CALL SCATTER (n, a, index, b)

IMPLEMENTATION
UNICOS systems (except Cray T90 systems that support IEEE arithmetic)

DESCRIPTION
SCATTER is defined as follows:
$a_{j_i} = b_i$, where $i = 1, \ldots, n$ and $j_i$ = index(i)
This routine has the following arguments:
n Integer. (input)
Number of elements in arrays index and b (not in a).
a Real or integer array of dimension max(index(i): i = 1, ..., n). (output)
Contains the result vector.
b Real or integer array of dimension n. (input)
Contains the source vector.
index Integer array of dimension n. (input)
Contains the vector of indices.
The Fortran equivalent of this routine is as follows:
      DO 100 I=1,N
         A(INDEX(I))=B(I)
  100 CONTINUE
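
The same loop can be modeled in Python for experimentation. This is an illustrative sketch; `scatter` is a hypothetical stand-in that keeps the 1-based convention for index and modifies a in place, leaving entries of a that are not named in index unchanged.

```python
def scatter(n, a, index, b):
    """Store b(i) into a(index(i)) for i = 1..n (index is 1-based)."""
    for i in range(n):
        a[index[i] - 1] = b[i]
    return a

a = [0.0] * 5
scatter(3, a, [5, 1, 3], [50.0, 10.0, 30.0])
print(a)
```

If index contains a repeated value, the sequential loop leaves the last value written in that position.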

CAUTIONS
You should not use this routine on systems that have Compress-Index Gather-Scatter (CIGS) hardware,
because it will degrade performance.

SEE ALSO
GATHER(3S)


SMXPY ( 3S ) SMXPY ( 3S )

NAME
SMXPY – Multiplies a column vector by a matrix and adds the result to another column vector

SYNOPSIS
CALL SMXPY (n1, y, n2, ldam, x, am)

IMPLEMENTATION
UNICOS systems (except Cray T90 systems that support IEEE arithmetic)

DESCRIPTION
SMXPY performs the matrix-vector operation:
y ← y + Mx
where y is a vector of length n1, M is an n1-by-n2 matrix, and x is a vector of length n2.
This routine has the following arguments:
n1 Integer. (input)
Number of elements in y (same as number of rows in M).
y Real array of dimension n1. (input and output)
On input, y is the vector to be added to the product of M and x. On output, the result vector
overwrites y.
n2 Integer. (input)
Number of elements in x (same as number of columns in M).
ldam Integer. (input)
Leading dimension of array am, which contains the matrix M.
n1 ≤ ldam.
x Real array of dimension n2. (input)
Vector used in the matrix-vector product.
am Real array of dimension (ldam, n2). (input)
Contains the n1-by-n2 matrix M used in the matrix-vector product.
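
The role of ldam can be illustrated with a small model in which am is a flat list holding M by columns. The Python sketch below is illustrative only; `smxpy` is a hypothetical function that updates y in place, and the value 99.0 marks padding rows of am that lie outside M.

```python
def smxpy(n1, y, n2, ldam, x, am):
    """y <- y + M x, where M(i,j) is am[i + j*ldam] (0-based, stored by columns)."""
    for j in range(n2):
        for i in range(n1):
            y[i] += am[i + j * ldam] * x[j]
    return y

# M = [[1,2],[3,4]] held in an array with leading dimension 3.
am = [1.0, 3.0, 99.0, 2.0, 4.0, 99.0]
y = [10.0, 20.0]
smxpy(2, y, 2, 3, [1.0, 1.0], am)
print(y)
```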

NOTES
You should use the Level 2 Basic Linear Algebra Subprogram (Level 2 BLAS) SGEMV (see SGEMV(3S))
rather than SMXPY, because BLAS routines are the de facto standard linear algebra interface. Using Level 2 BLAS
routines will enhance your program’s portability, and also should enhance its performance portability.

SEE ALSO
SGEMV(3S), which supersedes SMXPY
SXMPY(3S) (also superseded by SGEMV) to multiply a row vector by a matrix and add the result to another
row vector


SXMPY ( 3S ) SXMPY ( 3S )

NAME
SXMPY – Multiplies a row vector by a matrix and adds the result to another row vector

SYNOPSIS
CALL SXMPY (n1, ldy, sy, n2, ldx, sx, ldam, am)

IMPLEMENTATION
UNICOS systems (except Cray T90 systems that support IEEE arithmetic)

DESCRIPTION
SXMPY performs the matrix-vector operation:
y ← y + xM
where y is a row vector of length n1, x is a row vector of length n2, and M is an n2-by-n1 matrix.
These "row vectors" would normally be written as transposes in the more conventional "column vector"
notation; however, SXMPY assumes that these vectors are actual rows from matrices Y and X, not merely lists
of elements considered to be a row for algebraic purposes. For some numbers l and m, the elements of y
and x are as follows:
$y_i = Y_{l,i}$ for i = 1, ..., n1;  $x_j = X_{m,j}$ for j = 1, ..., n2
This routine has the following arguments:
n1 Integer. (input)
Number of columns in Y (same as number of elements in y, same as number of columns in M).
ldy Integer. (input)
Leading dimension of Y (same as increment between elements of y).
sy Real element from array of dimension (ldy, n1). (input and output)
sy locates the first element of the vector y; that is, $Y_{l,1}$, or Y(l,1).
On input, y is the vector to be added to the product of x and M. On output, the result vector
overwrites y.
n2 Integer. (input)
Number of columns in X (same as number of elements in x, same as number of rows in M).
ldx Integer. (input)
Leading dimension of X (same as increment between elements of x).
sx Real element from array of dimension (ldx, n2). (input)
sx locates the first element of the vector x; that is, $X_{m,1}$, or X(m,1).
x is the row vector used in the vector-matrix product.
ldam Integer. (input)
Leading dimension of array am.
n2 ≤ ldam.

am Real array of dimension (ldam, n1). (input)
Contains n2-by-n1 matrix M used in vector-matrix product.
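
Because y and x are rows of arrays stored by columns, their elements are ldy and ldx locations apart. The Python sketch below models this with flat lists and explicit starting offsets; the function `sxmpy` and its offset parameters `y_off` and `x_off` are hypothetical (in Fortran, passing the element Y(l,1) or X(m,1) plays the role of the offset).

```python
def sxmpy(n1, ldy, y_flat, y_off, n2, ldx, x_flat, x_off, ldam, am):
    """y <- y + x M for row vectors embedded in column-stored arrays."""
    for i in range(n1):
        total = 0.0
        for j in range(n2):
            # x(j) is ldx locations from x(j-1); M(j,i) is am[j + i*ldam].
            total += x_flat[x_off + j * ldx] * am[j + i * ldam]
        y_flat[y_off + i * ldy] += total
    return y_flat

# Y = [[0,0,0],[1,2,3]] (ldy = 2); y is its second row, so the offset is 1.
y_store = [0.0, 1.0, 0.0, 2.0, 0.0, 3.0]
# X = [[1,2],[5,5]] (ldx = 2); x is its first row, so the offset is 0.
x_store = [1.0, 5.0, 2.0, 5.0]
# M = [[1,0,1],[0,1,1]] (n2 = 2 by n1 = 3), stored by columns with ldam = 2.
am = [1.0, 0.0, 0.0, 1.0, 1.0, 1.0]
sxmpy(3, 2, y_store, 1, 2, 2, x_store, 0, 2, am)
print(y_store)
```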

NOTES
Cray Research recommends that you use the Level 2 Basic Linear Algebra Subprogram (Level 2 BLAS)
SGEMV (see SGEMV(3S)) rather than SXMPY, because BLAS routines are the de facto standard linear algebra
interface. Using Level 2 BLAS routines will enhance your program’s portability, and also should enhance its
performance portability.

SEE ALSO
SGEMV(3S), which supersedes SXMPY
SMXPY(3S) (also superseded by SGEMV), to multiply a matrix by a column vector and add the result to
another column vector


TRID ( 3S ) TRID ( 3S )

NAME
TRID – Solves a tridiagonal system

SYNOPSIS
CALL TRID (tl, tc, tr, inct, n, s, incs)

IMPLEMENTATION
UNICOS systems (except Cray T90 systems that support IEEE arithmetic)

DESCRIPTION
TRID solves a tridiagonal system for a single right-hand side by a combination of burn-at-both-ends and 3:1
cyclic reduction. 3:1 cyclic reduction is used until the size of the system is reduced to 40. Then the reduced
system is solved directly using a burn-at-both-ends algorithm. The remaining values are obtained by
backfilling. No type of pivoting is done.
This routine has the following arguments:
tl Real array of dimension (n-1)*inct+1. (input)
Contains the lower off-diagonal of the tridiagonal matrix with tl(1) = 0.0.
tc Real array of dimension (n-1)*inct+1. (input)
Contains the main diagonal of the tridiagonal matrix.
tr Real array of dimension (n-1)*inct+1. (input)
Contains the upper off-diagonal of the tridiagonal matrix with tr(1+(n-1)*inct) = 0.0.
inct Integer. (input)
Increment between elements of tl, tc, and tr.
Typically, inct = 1.
n Integer. (input)
Contains the dimension of the matrix system being solved.
s Real array of dimension (n-1)*incs+1. (input and output)
On input, s contains the right-hand side values of the matrix system. On output, s contains the
solution of the matrix system.
incs Integer. (input)
Increment between elements of s.
Typically, incs = 1.

NOTES
To perform this operation using the same algorithm, CRI recommends that you use the newer routine
SDTSOL(3S) rather than TRID. SDTSOL(3S) uses the same algorithm as TRID, but SDTSOL(3S) is part of
a larger package of tridiagonal system routines, including SDTTRF(3S) to factor the tridiagonal matrix, and
SDTTRS(3S) to solve systems based on that factorization. There are also complex versions of these
routines: CDTSOL(3S), CDTTRF(3S), and CDTTRS(3S).
To perform this operation for ill-conditioned systems, CRI recommends the LAPACK routine SGTSV(3L),
which uses partial pivoting for better numerical stability.
When calling TRID, the elements tl(1) and tr(1+(n-1)*inct) must be allocated and set equal to 0.

EXAMPLES
The following example shows how to set up arguments tl, tc, and tr, given the tridiagonal matrix T. Let T
be the tridiagonal matrix:

$$T = \begin{pmatrix} 11 & 12 & 0 & 0 & 0 \\ 21 & 22 & 23 & 0 & 0 \\ 0 & 32 & 33 & 34 & 0 \\ 0 & 0 & 43 & 44 & 45 \\ 0 & 0 & 0 & 54 & 55 \end{pmatrix}$$

Then to pass T to TRID (with inct = 1), set

$$tl = \begin{pmatrix} 0 \\ 21 \\ 32 \\ 43 \\ 54 \end{pmatrix}, \quad tc = \begin{pmatrix} 11 \\ 22 \\ 33 \\ 44 \\ 55 \end{pmatrix}, \quad tr = \begin{pmatrix} 12 \\ 23 \\ 34 \\ 45 \\ 0 \end{pmatrix}$$

SEE ALSO
SDTSOL(3S), SDTTRF(3S), SDTTRS(3S) to factor and solve tridiagonal systems by using the same
algorithm as TRID
SGTSV(3L) (available only online) to solve tridiagonal system by using Gaussian elimination with partial
pivoting


INDEX

2D array (ScaLAPACK)                                                    descinit(3S)
2D grid computation (BLACS)                                             blacs_pcoord(3S)
3D grid partition initialization (BLACS)                                gridinit3d(3S)
3D processor grids (BLACS)                                              gridinfo3d(3S)
a grid of processors                                                    blacs_gridmap(3S)
Assigns default values to the parameter arguments for SITRSOL(3S)       dfaults(3S)
BAKVEC(3S)                                                              eispack(3S)
bakvec(3S)                                                              eispack(3S)
BALANC(3S)                                                              eispack(3S)
balanc(3S)                                                              eispack(3S)
BALBAK(3S)                                                              eispack(3S)
balbak(3S)                                                              eispack(3S)
Banded symmetric systems of linear equations                            eispack(3S)
BANDR(3S)                                                               eispack(3S)
bandr(3S)                                                               eispack(3S)
BANDV(3S) ...................................................................................................... eispack(3S)........................................................ 349
bandv(3S) ...................................................................................................... eispack(3S)........................................................ 349
Basic Linear Algebra Communication Subprograms ...................................... intro_blacs(3S) .............................................. 535
Basic Linear Algebra Subprogram .................................................................. vssyrk(3S) .......................................................... 614
BISECT(3S) .................................................................................................... eispack(3S)........................................................ 349
bisect(3S) .................................................................................................... eispack(3S)........................................................ 349
BLACS ............................................................................................................ intro_blacs(3S) .............................................. 535
BLACS introduction ....................................................................................... intro_blacs(3S) .............................................. 535
BLACS(3S) ...................................................................................................... intro_blacs(3S) .............................................. 535
blacs(3S) ...................................................................................................... intro_blacs(3S) .............................................. 535
BLACS_BARRIER(3S) ................................................................................... blacs_barrier(3S) ......................................... 539
blacs_barrier(3S) ................................................................................... blacs_barrier(3S) ......................................... 539
BLACS_EXIT(3S) .......................................................................................... blacs_exit(3S) ................................................ 540
blacs_exit(3S) .......................................................................................... blacs_exit(3S) ................................................ 540
BLACS_GRIDEXIT(3S) ................................................................................. blacs_gridexit(3S) ....................................... 541
blacs_gridexit(3S) ................................................................................. blacs_gridexit(3S) ....................................... 541
BLACS_GRIDINFO(3S) ................................................................................. blacs_gridinfo(3S) ....................................... 542
blacs_gridinfo(3S) ................................................................................. blacs_gridinfo(3S) ....................................... 542
BLACS_GRIDINIT(3S) ................................................................................. blacs_gridinit(3S) ....................................... 543
blacs_gridinit(3S) ................................................................................. blacs_gridinit(3S) ....................................... 543
BLACS_GRIDMAP(3S) ................................................................................... blacs_gridmap(3S) ......................................... 544
blacs_gridmap(3S) ................................................................................... blacs_gridmap(3S) ......................................... 544
BLACS_PCOORD(3S) ...................................................................................... blacs_pcoord(3S) ............................................ 545
blacs_pcoord(3S) ...................................................................................... blacs_pcoord(3S) ............................................ 545
BLACS_PNUM(3S) .......................................................................................... blacs_pnum(3S) ................................................ 546
blacs_pnum(3S) .......................................................................................... blacs_pnum(3S) ................................................ 546
BLAS 3 ........................................................................................................... vssyrk(3S) .......................................................... 614
BQR(3S) ........................................................................................................... eispack(3S)........................................................ 349
bqr(3S) ........................................................................................................... eispack(3S)........................................................ 349
broadcast a trapezoidal rectangular matrix (BLACS) ..................................... itrbs2d(3S)........................................................ 566
broadcast general rectangular matrix (BLACS) .............................................. igebs2d(3S)........................................................ 556
Broadcasts a general rectangular matrix to all or a subset of processors ...... igebs2d(3S)........................................................ 556



Broadcasts a trapezoidal rectangular matrix to all or a subset of
processors ........................................................................................................ itrbs2d(3S)........................................................ 566
CBABK2(3S) .................................................................................................... eispack(3S)........................................................ 349
cbabk2(3S) .................................................................................................... eispack(3S)........................................................ 349
CBAL(3S) ........................................................................................................ eispack(3S)........................................................ 349
cbal(3S) ........................................................................................................ eispack(3S)........................................................ 349
cchdc(3S) ...................................................................................................... linpack(3S)........................................................ 355
cchdd(3S) ...................................................................................................... linpack(3S)........................................................ 355
cchex(3S) ...................................................................................................... linpack(3S)........................................................ 355
cchud(3S) ...................................................................................................... linpack(3S)........................................................ 355
CCOPY2RV(3S) ............................................................................................... scopy2rv(3S) ..................................................... 590
ccopy2rv(3S) ............................................................................................... scopy2rv(3S) ..................................................... 590
CCOPY2VR(3S) ............................................................................................... scopy2vr(3S) ..................................................... 593
ccopy2vr(3S) ............................................................................................... scopy2vr(3S) ..................................................... 593
CDTSOL(3S) .................................................................................................... sdtsol(3S) .......................................................... 518
cdtsol(3S) .................................................................................................... sdtsol(3S) .......................................................... 518
CDTTRF(3S) .................................................................................................... sdttrf(3S) .......................................................... 520
cdttrf(3S) .................................................................................................... sdttrf(3S) .......................................................... 520
CDTTRS(3S) .................................................................................................... sdttrs(3S) .......................................................... 523
cdttrs(3S) .................................................................................................... sdttrs(3S) .......................................................... 523
CG(3S) ............................................................................................................. eispack(3S)........................................................ 349
cg(3S) ............................................................................................................. eispack(3S)........................................................ 349
CGAMN2D(3S) ................................................................................................. igamn2d(3S)........................................................ 550
cgamn2d(3S) ................................................................................................. igamn2d(3S)........................................................ 550
CGAMX2D(3S) ................................................................................................. igamx2d(3S)........................................................ 552
cgamx2d(3S) ................................................................................................. igamx2d(3S)........................................................ 552
cgbco(3S) ...................................................................................................... linpack(3S)........................................................ 355
cgbdi(3S) ...................................................................................................... linpack(3S)........................................................ 355
cgbfa(3S) ...................................................................................................... linpack(3S)........................................................ 355
cgbsl(3S) ...................................................................................................... linpack(3S)........................................................ 355
CGEBR2D(3S) ................................................................................................. igebr2d(3S)........................................................ 554
cgebr2d(3S) ................................................................................................. igebr2d(3S)........................................................ 554
CGEBS2D(3S) ................................................................................................. igebs2d(3S)........................................................ 556
cgebs2d(3S) ................................................................................................. igebs2d(3S)........................................................ 556
cgeco(3S) ...................................................................................................... linpack(3S)........................................................ 355
cgedi(3S) ...................................................................................................... linpack(3S)........................................................ 355
cgefa(3S) ...................................................................................................... linpack(3S)........................................................ 355
CGERV2D(3S) ................................................................................................. igerv2d(3S)........................................................ 558
cgerv2d(3S) ................................................................................................. igerv2d(3S)........................................................ 558
CGESD2D(3S) ................................................................................................. igesd2d(3S)........................................................ 560
cgesd2d(3S) ................................................................................................. igesd2d(3S)........................................................ 560
cgesl(3S) ...................................................................................................... linpack(3S)........................................................ 355
CGMAX2D(3S) ................................................................................................. igamx2d(3S)........................................................ 552
cgmax2d(3S) ................................................................................................. igamx2d(3S)........................................................ 552
CGMIN2D(3S) ................................................................................................. igamn2d(3S)........................................................ 550
cgmin2d(3S) ................................................................................................. igamn2d(3S)........................................................ 550
CGSUM2D(3S) ................................................................................................. igsum2d(3S)........................................................ 562
cgsum2d(3S) ................................................................................................. igsum2d(3S)........................................................ 562
cgtsl(3S) ...................................................................................................... linpack(3S)........................................................ 355
CH(3S) ............................................................................................................. eispack(3S)........................................................ 349



ch(3S) ............................................................................................................. eispack(3S)........................................................ 349
chico(3S) ...................................................................................................... linpack(3S)........................................................ 355
chidi(3S) ...................................................................................................... linpack(3S)........................................................ 355
chifa(3S) ...................................................................................................... linpack(3S)........................................................ 355
chisl(3S) ...................................................................................................... linpack(3S)........................................................ 355
Cholesky factorization ..................................................................................... intro_lapack(3S) ............................................ 333
Cholesky factorization ..................................................................................... linpack(3S)........................................................ 355
Cholesky factorization (CORE) ...................................................................... vspotrf(3S)........................................................ 610
Cholesky factorization (CORE) ...................................................................... vspotrs(3S)........................................................ 612
Cholesky factorization computation (ScaLAPACK) ........................................ pspotrf(3S)........................................................ 418
chpco(3S) ...................................................................................................... linpack(3S)........................................................ 355
chpdi(3S) ...................................................................................................... linpack(3S)........................................................ 355
chpfa(3S) ...................................................................................................... linpack(3S)........................................................ 355
chpsl(3S) ...................................................................................................... linpack(3S)........................................................ 355
CINVIT(3S) .................................................................................................... eispack(3S)........................................................ 349
cinvit(3S) .................................................................................................... eispack(3S)........................................................ 349
CMACH(3S) ...................................................................................................... smach(3S) ............................................................ 628
cmach(3S) ...................................................................................................... smach(3S) ............................................................ 628
Column vector ................................................................................................. smxpy(3S) ............................................................ 647
COMBAK(3S) .................................................................................................... eispack(3S)........................................................ 349
combak(3S) .................................................................................................... eispack(3S)........................................................ 349
COMHES(3S) .................................................................................................... eispack(3S)........................................................ 349
comhes(3S) .................................................................................................... eispack(3S)........................................................ 349
COMLR2(3S) .................................................................................................... eispack(3S)........................................................ 349
comlr2(3S) .................................................................................................... eispack(3S)........................................................ 349
COMLR(3S) ...................................................................................................... eispack(3S)........................................................ 349
comlr(3S) ...................................................................................................... eispack(3S)........................................................ 349
complex distributed matrix inverse computation (ScaLAPACK) ................... psgetri(3S)........................................................ 408
complex distributed matrix LQ factorization (ScaLAPACK) ......................... psgelqf(3S)........................................................ 387
complex distributed matrix LU factorization (ScaLAPACK) ......................... psgetrf(3S)........................................................ 405
complex distributed matrix QL factorization (ScaLAPACK) ......................... psgeqlf(3S)........................................................ 390
complex distributed matrix QR factorization (ScaLAPACK) ........................ psgeqpf(3S)........................................................ 393
complex distributed matrix QR factorization (ScaLAPACK) ........................ psgeqrf(3S)........................................................ 396
complex distributed matrix RQ factorization (ScaLAPACK) ........................ psgerqf(3S)........................................................ 399
complex distributed matrix reduction (ScaLAPACK) .................................... psgebrd(3S)........................................................ 382
complex distributed system of linear equations solution (ScaLAPACK) ....... psgetrs(3S)........................................................ 411
complex distributed triangular system computation (ScaLAPACK) .............. pstrtrs(3S)........................................................ 449
complex Hermitian distributed matrix reduction (ScaLAPACK) ................... pssytrd(3S)........................................................ 442
complex Hermitian matrix inverse (ScaLAPACK) ........................................ pspotri(3S)........................................................ 421
complex Hermitian positive definite distributed matrix (ScaLAPACK) ........ pspotrf(3S)........................................................ 418
complex Hermitian positive definite system solution (ScaLAPACK) ............ pspotrs(3S)........................................................ 424
complex system computation (ScaLAPACK) ................................................. psgesv(3S) .......................................................... 402
complex triangular distributed matrix inverse computation (ScaLAPACK) .. pstrtri(3S)........................................................ 446
compute grid coordinates (BLACS) ............................................................... pcoord3d(3S) ..................................................... 573
compute local matrix (ScaLAPACK) ............................................................. numroc(3S) .......................................................... 365
compute processing element coordinate (ScaLAPACK) ................................ indxg2p(3S)........................................................ 364
Computes a QL factorization of a real or complex distributed matrix ........... psgeqlf(3S)........................................................ 390
Computes a QR factorization of a real or complex distributed matrix ........... psgeqrf(3S)........................................................ 396
Computes a QR factorization with column pivoting of a real or complex
distributed matrix ............................................................................................ psgeqpf(3S)........................................................ 393



Computes an RQ factorization of a real or complex distributed matrix .......... psgerqf(3S)........................................................ 399
Computes an LQ factorization of a real or complex distributed matrix ......... psgelqf(3S)........................................................ 387
Computes an LU factorization of a real or complex distributed matrix ......... psgetrf(3S)........................................................ 405
Computes an LU factorization of a virtual general matrix with real or
complex elements, using partial pivoting with row interchanges ................... vsgetrf(3S)........................................................ 604
Computes coordinates in two-dimensional grids ............................................ blacs_pcoord(3S) ............................................ 545
Computes matrix-times-matrix product (arbitrary increments) ....................... mxma(3S)............................................................... 639
Computes matrix-times-matrix product (unit increments) .............................. mxm(3S) ................................................................. 637
Computes matrix-times-vector product (arbitrary increments) ....................... mxva(3S)............................................................... 644
Computes matrix-times-vector product (unit increments) .............................. mxv(3S) ................................................................. 642
Computes selected eigenvalues and eigenvectors of a Hermitian-definite
eigenproblem ................................................................................................... pcheevx(3S)........................................................ 366
Computes selected eigenvalues and eigenvectors of a Hermitian-definite
generalized eigenproblem ................................................................................ pchegvx(3S)........................................................ 374
Computes selected eigenvalues and eigenvectors of a real symmetric
matrix .............................................................................................................. pssyevx(3S)........................................................ 427
Computes selected eigenvalues and eigenvectors of a real symmetric-
definite generalized eigenproblem ................................................................... pssygvx(3S)........................................................ 434
Computes the Cholesky factorization of a real symmetric or complex
Hermitian positive definite distributed matrix ................................................ pspotrf(3S)........................................................ 418
Computes the Cholesky factorization of a real symmetric positive definite
virtual matrix ................................................................................................... vspotrf(3S)........................................................ 610
Computes the coordinate of the processing element (PE) that possesses
the entry of a distributed matrix ..................................................................... indxg2p(3S)........................................................ 364
Computes the inverse of a real or complex distributed matrix ...................... psgetri(3S)........................................................ 408
Computes the inverse of a real or complex upper or lower triangular
distributed matrix ............................................................................................ pstrtri(3S)........................................................ 446
Computes the inverse of a real symmetric or complex Hermitian positive
definite distributed matrix ............................................................................... pspotri(3S)........................................................ 421
Computes the number of rows or columns of a distributed matrix owned
locally .............................................................................................................. numroc(3S) .......................................................... 365
Computes the solution to a real or complex system of linear equations ....... psgesv(3S) .......................................................... 402
Computes three-dimensional (3D) processor grid coordinates ...................... pcoord3d(3S) ..................................................... 573
Condition number ............................................................................................ intro_lapack(3S) ............................................ 333
Conjugate gradient .......................................................................................... dfaults(3S)........................................................ 461
Conjugate gradient .......................................................................................... sitrsol(3S)........................................................ 466
Constant ........................................................................................................... r1mach(3S) .......................................................... 624
Constants ......................................................................................................... intro_mach(3S) ................................................ 623
Copies a submatrix of a real or complex matrix in memory into a virtual
matrix .............................................................................................................. scopy2rv(3S) ..................................................... 590
Copies a submatrix of a virtual matrix to a real or complex (in memory)
matrix .............................................................................................................. scopy2vr(3S) ..................................................... 593
Copying matrices ............................................................................................ scopy2rv(3S) ..................................................... 590
Copying matrices ............................................................................................ scopy2vr(3S) ..................................................... 593
CORTB(3S) ...................................................................................................... eispack(3S)........................................................ 349
cortb(3S) ...................................................................................................... eispack(3S)........................................................ 349
CORTH(3S) ...................................................................................................... eispack(3S)........................................................ 349
corth(3S) ...................................................................................................... eispack(3S)........................................................ 349
cpbco(3S) ...................................................................................................... linpack(3S)........................................................ 355
cpbdi(3S) ...................................................................................................... linpack(3S)........................................................ 355

cpbfa(3S) ...................................................................................................... linpack(3S)........................................................ 355
cpbsl(3S) ...................................................................................................... linpack(3S)........................................................ 355
cpoco(3S) ...................................................................................................... linpack(3S)........................................................ 355
cpodi(3S) ...................................................................................................... linpack(3S)........................................................ 355
cpofa(3S) ...................................................................................................... linpack(3S)........................................................ 355
cposl(3S) ...................................................................................................... linpack(3S)........................................................ 355
cppco(3S) ...................................................................................................... linpack(3S)........................................................ 355
cppdi(3S) ...................................................................................................... linpack(3S)........................................................ 355
cppfa(3S) ...................................................................................................... linpack(3S)........................................................ 355
cppsl(3S) ...................................................................................................... linpack(3S)........................................................ 355
cptsl(3S) ...................................................................................................... linpack(3S)........................................................ 355
cqrdc(3S) ...................................................................................................... linpack(3S)........................................................ 355
cqrsl(3S) ...................................................................................................... linpack(3S)........................................................ 355
cspco(3S) ...................................................................................................... linpack(3S)........................................................ 355
cspdi(3S) ...................................................................................................... linpack(3S)........................................................ 355
cspfa(3S) ...................................................................................................... linpack(3S)........................................................ 355
cspsl(3S) ...................................................................................................... linpack(3S)........................................................ 355
csvdc(3S) ...................................................................................................... linpack(3S)........................................................ 355
CTRBR2D(3S) ................................................................................................. itrbr2d(3S)........................................................ 564
ctrbr2d(3S) ................................................................................................. itrbr2d(3S)........................................................ 564
CTRBS2D(3S) ................................................................................................. itrbs2d(3S)........................................................ 566
ctrbs2d(3S) ................................................................................................. itrbs2d(3S)........................................................ 566
CTRRV2D(3S) ................................................................................................. itrrv2d(3S)........................................................ 568
ctrrv2d(3S) ................................................................................................. itrrv2d(3S)........................................................ 568
CTRSD2D(3S) ................................................................................................. itrsd2d(3S)........................................................ 570
ctrsd2d(3S) ................................................................................................. itrsd2d(3S)........................................................ 570
Declares packed storage mode for a triangular, symmetric, or Hermitian
(complex only) virtual matrix ......................................................................... vstorage(3S) ..................................................... 616
Dense ............................................................................................................... linpack(3S)........................................................ 355
Dense linear algebra ........................................................................................ intro_lapack(3S) ............................................ 333
Dense linear system ........................................................................................ linpack(3S)........................................................ 355
Dense linear system ........................................................................................ vsgetrs(3S)........................................................ 608
Dense linear system solvers ............................................................................ intro_lapack(3S) ............................................ 333
Dense linear systems ....................................................................................... intro_lapack(3S) ............................................ 333
Dense solver .................................................................................................... linpack(3S)........................................................ 355
Dense solver .................................................................................................... vsgetrs(3S)........................................................ 608
DESCINIT(3S) ............................................................................................... descinit(3S) ..................................................... 362
descinit(3S) ............................................................................................... descinit(3S) ..................................................... 362
determine maximum value (BLACS) ............................................................. igamx2d(3S)........................................................ 552
determine minimum absolute values (BLACS) .............................................. igamn2d(3S)........................................................ 550
Determines maximum absolute values of rectangular matrices ...................... igamx2d(3S)........................................................ 552
Determines minimum absolute values of rectangular matrices ...................... igamn2d(3S)........................................................ 550
Determines single-precision machine parameters ........................................... slamch(3S) .......................................................... 626
DFAULTS(3S) ................................................................................................. dfaults(3S)........................................................ 461
dfaults(3S) ................................................................................................. dfaults(3S)........................................................ 461
Direct ............................................................................................................... ssgetrf(3S)........................................................ 482
Direct ............................................................................................................... ssgetrs(3S)........................................................ 487
Direct ............................................................................................................... sspotrf(3S)........................................................ 489
Direct ............................................................................................................... sspotrs(3S)........................................................ 494
Direct ............................................................................................................... sststrf(3S)........................................................ 496

Direct ............................................................................................................... sststrs(3S)........................................................ 501
Direct sparse solver ......................................................................................... intro_sparse(3S) ............................................ 453
Direct sparse solver ......................................................................................... ssgetrf(3S)........................................................ 482
Direct sparse solver ......................................................................................... ssgetrs(3S)........................................................ 487
Direct sparse solver ......................................................................................... sspotrf(3S)........................................................ 489
Direct sparse solver ......................................................................................... sspotrs(3S)........................................................ 494
Direct sparse solver ......................................................................................... sststrf(3S)........................................................ 496
Direct sparse solver ......................................................................................... sststrs(3S)........................................................ 501
distributed matrix (ScaLAPACK) ................................................................... numroc(3S) .......................................................... 365
Eigenvalue problem ........................................................................................ eispack(3S)........................................................ 349
Eigenvalues ..................................................................................................... eispack(3S)........................................................ 349
eigenvalues and eigenvectors computation (ScaLAPACK) ............................ pcheevx(3S)........................................................ 366
eigenvalues and eigenvectors computation (ScaLAPACK) ............................ pchegvx(3S)........................................................ 374
eigenvalues and eigenvectors computation (ScaLAPACK) ............................ pssyevx(3S)........................................................ 427
eigenvalues and eigenvectors computation (ScaLAPACK) ............................ pssygvx(3S)........................................................ 434
Eigenvectors .................................................................................................... eispack(3S)........................................................ 349
EISPACK(3S) ................................................................................................. eispack(3S)........................................................ 349
eispack(3S) ................................................................................................. eispack(3S)........................................................ 349
element summation operations (BLACS) ....................................................... igsum2d(3S)........................................................ 562
ELMBAK(3S) .................................................................................................... eispack(3S)........................................................ 349
elmbak(3S) .................................................................................................... eispack(3S)........................................................ 349
ELMHES(3S) .................................................................................................... eispack(3S)........................................................ 349
elmhes(3S) .................................................................................................... eispack(3S)........................................................ 349
ELTRAN(3S) .................................................................................................... eispack(3S)........................................................ 349
eltran(3S) .................................................................................................... eispack(3S)........................................................ 349
Factorization .................................................................................................... vsgetrf(3S)........................................................ 604
Factorization .................................................................................................... vspotrf(3S)........................................................ 610
Factors a real sparse general matrix with a symmetric nonzero pattern (no
form of pivoting is implemented) ................................................................... sststrf(3S)........................................................ 496
Factors a real sparse general matrix with threshold pivoting implemented ... ssgetrf(3S)........................................................ 482
Factors a real sparse symmetric definite matrix ............................................. sspotrf(3S)........................................................ 489
Factors a real-valued or complex-valued tridiagonal system ......................... sdttrf(3S) .......................................................... 520
FIGI2(3S) ...................................................................................................... eispack(3S)........................................................ 349
figi2(3S) ...................................................................................................... eispack(3S)........................................................ 349
FIGI(3S) ........................................................................................................ eispack(3S)........................................................ 349
figi(3S) ........................................................................................................ eispack(3S)........................................................ 349
First-order linear recurrence ............................................................................ folrc(3S) ............................................................ 511
First-order linear recurrence ............................................................................ folrn(3S) ............................................................ 513
First-order linear recurrences .......................................................................... folr(3S)............................................................... 504
First-order linear recurrences .......................................................................... folr2(3S) ............................................................ 509
FOLR2(3S) ...................................................................................................... folr2(3S) ............................................................ 509
folr2(3S) ...................................................................................................... folr2(3S) ............................................................ 509
FOLR2P(3S) .................................................................................................... folr2(3S) ............................................................ 509
folr2p(3S) .................................................................................................... folr2(3S) ............................................................ 509
FOLR(3S) ........................................................................................................ folr(3S)............................................................... 504
folr(3S) ........................................................................................................ folr(3S)............................................................... 504
FOLRC(3S) ...................................................................................................... folrc(3S) ............................................................ 511
folrc(3S) ...................................................................................................... folrc(3S) ............................................................ 511
FOLRN(3S) ...................................................................................................... folrn(3S) ............................................................ 513
folrn(3S) ...................................................................................................... folrn(3S) ............................................................ 513

FOLRNP(3S) .................................................................................................... folrn(3S) ............................................................ 513
folrnp(3S) .................................................................................................... folrn(3S) ............................................................ 513
FOLRP(3S) ...................................................................................................... folr(3S)............................................................... 504
folrp(3S) ...................................................................................................... folr(3S)............................................................... 504
free grid (BLACS) .......................................................................................... blacs_gridexit(3S) ....................................... 541
Frees a grid ..................................................................................................... blacs_gridexit(3S) ....................................... 541
Frees all existing grids .................................................................................... blacs_exit(3S) ................................................ 540
Gather a vector ................................................................................................ gather(3S) .......................................................... 633
GATHER(3S) .................................................................................................... gather(3S) .......................................................... 633
gather(3S) .................................................................................................... gather(3S) .......................................................... 633
Gathers a vector from a source vector ............................................................ gather(3S) .......................................................... 633
Global reduction routines ................................................................................ intro_blacs(3S) .............................................. 535
GRIDINFO3D(3S) .......................................................................................... gridinfo3d(3S) ................................................ 547
gridinfo3d(3S) .......................................................................................... gridinfo3d(3S) ................................................ 547
GRIDINIT3D(3S) .......................................................................................... gridinit3d(3S) ................................................ 548
gridinit3d(3S) .......................................................................................... gridinit3d(3S) ................................................ 548
halt execution (BLACS) .................................................................................. blacs_barrier(3S) ......................................... 539
Handles terminal processing for the out-of-core routines .............................. vend(3S)............................................................... 598
Hermitian matrix ............................................................................................. vstorage(3S) ..................................................... 616
Horner’s method ............................................................................................. folrn(3S) ............................................................ 513
Horner’s rule ................................................................................................... folrn(3S) ............................................................ 513
HTRIB3(3S) .................................................................................................... eispack(3S)........................................................ 349
htrib3(3S) .................................................................................................... eispack(3S)........................................................ 349
HTRIBK(3S) .................................................................................................... eispack(3S)........................................................ 349
htribk(3S) .................................................................................................... eispack(3S)........................................................ 349
HTRID3(3S) .................................................................................................... eispack(3S)........................................................ 349
htrid3(3S) .................................................................................................... eispack(3S)........................................................ 349
HTRIDI(3S) .................................................................................................... eispack(3S)........................................................ 349
htridi(3S) .................................................................................................... eispack(3S)........................................................ 349
IGAMN2D(3S) ................................................................................................. igamn2d(3S)........................................................ 550
igamn2d(3S) ................................................................................................. igamn2d(3S)........................................................ 550
IGAMX2D(3S) ................................................................................................. igamx2d(3S)........................................................ 552
igamx2d(3S) ................................................................................................. igamx2d(3S)........................................................ 552
IGEBR2D(3S) ................................................................................................. igebr2d(3S)........................................................ 554
igebr2d(3S) ................................................................................................. igebr2d(3S)........................................................ 554
IGEBS2D(3S) ................................................................................................. igebs2d(3S)........................................................ 556
igebs2d(3S) ................................................................................................. igebs2d(3S)........................................................ 556
IGERV2D(3S) ................................................................................................. igerv2d(3S)........................................................ 558
igerv2d(3S) ................................................................................................. igerv2d(3S)........................................................ 558
IGESD2D(3S) ................................................................................................. igesd2d(3S)........................................................ 560
igesd2d(3S) ................................................................................................. igesd2d(3S)........................................................ 560
igmax2d(3S) ................................................................................................. igamx2d(3S)........................................................ 552
igmin2d(3S) ................................................................................................. igamn2d(3S)........................................................ 550
IGSUM2D(3S) ................................................................................................. igsum2d(3S)........................................................ 562
igsum2d(3S) ................................................................................................. igsum2d(3S)........................................................ 562
IMTQL1(3S) .................................................................................................... eispack(3S)........................................................ 349
imtql1(3S) .................................................................................................... eispack(3S)........................................................ 349
IMTQL2(3S) .................................................................................................... eispack(3S)........................................................ 349
imtql2(3S) .................................................................................................... eispack(3S)........................................................ 349
IMTQLV(3S) .................................................................................................... eispack(3S)........................................................ 349

imtqlv(3S) .................................................................................................... eispack(3S)........................................................ 349
INDXG2P(3S) ................................................................................................. indxg2p(3S)........................................................ 364
indxg2p(3S) ................................................................................................. indxg2p(3S)........................................................ 364
initialization routine (BLACS) ........................................................................ blacs_gridinit(3S) ....................................... 543
Initialization routines (CORE) ........................................................................ vbegin(3S) .......................................................... 595
initialize descriptor vector ............................................................................... descinit(3S) ..................................................... 362
Initializes a descriptor vector of a distributed two-dimensional array ........... descinit(3S) ..................................................... 362
Initializes counters, variables, and so on, for the BLACS routines ............... blacs_gridinit(3S) ....................................... 543
Initializes the out-of-core routine data structures ........................................... vbegin(3S) .......................................................... 595
Initializes variables for a three-dimensional (3D) grid partition of
processor set .................................................................................................... gridinit3d(3S) ................................................ 548
INTRO_BLACS(3S) ........................................................................................ intro_blacs(3S) .............................................. 535
intro_blacs(3S) ........................................................................................ intro_blacs(3S) .............................................. 535
INTRO_CORE(3S) .......................................................................................... intro_core(3S) ................................................ 575
intro_core(3S) .......................................................................................... intro_core(3S) ................................................ 575
Introduction to Basic Linear Algebra Communication Subprograms ........... intro_blacs(3S) .............................................. 535
introduction to BLACS routines ..................................................................... intro_blacs(3S) .............................................. 535
Introduction to eigensystem computation for dense linear systems ............... eispack(3S)........................................................ 349
Introduction to LAPACK solvers for dense linear systems ........................... intro_lapack(3S) ............................................ 333
Introduction to machine constant functions .................................................... intro_mach(3S) ................................................ 623
introduction to ScaLAPACK ............................................................................... intro_scalapack(3S) ..................................... 359
Introduction to solvers for sparse linear systems ........................................... intro_sparse(3S) ............................................ 453
Introduction to solvers for special linear systems .......................................... intro_spec(3S) ................................................ 503
Introduction to superseded Scientific Library routines ................................... intro_superseded(3S) .................................. 631
Introduction to the Cray Research Scientific Library out-of-core routines
for linear algebra ............................................................................................. intro_core(3S) ................................................ 575
Introduction to the ScaLAPACK routines for distributed matrix
computations ................................................................................................... intro_scalapack(3S) ..................................... 359
INTRO_LAPACK(3S) ...................................................................................... intro_lapack(3S) ............................................ 333
intro_lapack(3S) ...................................................................................... intro_lapack(3S) ............................................ 333
INTRO_MACH(3S) .......................................................................................... intro_mach(3S) ................................................ 623
intro_mach(3S) .......................................................................................... intro_mach(3S) ................................................ 623
INTRO_SCALAPACK(3S) .............................................................................. intro_scalapack(3S) ..................................... 359
intro_scalapack(3S) .............................................................................. intro_scalapack(3S) ..................................... 359
INTRO_SPARSE(3S) ...................................................................................... intro_sparse(3S) ............................................ 453
intro_sparse(3S) ...................................................................................... intro_sparse(3S) ............................................ 453
INTRO_SPEC(3S) .......................................................................................... intro_spec(3S) ................................................ 503
intro_spec(3S) .......................................................................................... intro_spec(3S) ................................................ 503
INTRO_SUPERSEDED(3S) ............................................................................ intro_superseded(3S) .................................. 631
intro_superseded(3S) ............................................................................ intro_superseded(3S) .................................. 631
Inverse ............................................................................................................. intro_lapack(3S) ............................................ 333
inverse computation (ScaLAPACK) ............................................................... pspotri(3S)........................................................ 421
inverse computation (ScaLAPACK) ............................................................... pstrtri(3S)........................................................ 446
Inverse of square matrix ................................................................................. minv(3S)............................................................... 634
INVIT(3S) ...................................................................................................... eispack(3S)........................................................ 349
invit(3S) ...................................................................................................... eispack(3S)........................................................ 349
Iterative ........................................................................................................... dfaults(3S)........................................................ 461
Iterative ........................................................................................................... sitrsol(3S)........................................................ 466
Iterative sparse solver ..................................................................................... intro_sparse(3S) ............................................ 453
Iterative sparse solver ..................................................................................... dfaults(3S)........................................................ 461

Iterative sparse solver ..................................................................................... sitrsol(3S)........................................................ 466
ITRBR2D(3S) ................................................................................................. itrbr2d(3S)........................................................ 564
itrbr2d(3S) ................................................................................................. itrbr2d(3S)........................................................ 564
ITRBS2D(3S) ................................................................................................. itrbs2d(3S)........................................................ 566
itrbs2d(3S) ................................................................................................. itrbs2d(3S)........................................................ 566
ITRRV2D(3S) ................................................................................................. itrrv2d(3S)........................................................ 568
itrrv2d(3S) ................................................................................................. itrrv2d(3S)........................................................ 568
ITRSD2D(3S) ................................................................................................. itrsd2d(3S)........................................................ 570
itrsd2d(3S) ................................................................................................. itrsd2d(3S)........................................................ 570
LAPACK ......................................................................................................... intro_lapack(3S) ............................................ 333
LAPACK ......................................................................................................... slamch(3S) .......................................................... 626
Lapack ............................................................................................................. intro_lapack(3S) ............................................ 333
Lapack ............................................................................................................. slamch(3S) .......................................................... 626
LAPACK(3S) .................................................................................................... intro_lapack(3S) ............................................ 333
lapack(3S) .................................................................................................... intro_lapack(3S) ............................................ 333
LDU factorization ........................................................................................... vsgetrf(3S)........................................................ 604
Level 3 Basic Linear Algebra Subprogram .................................................... vssyrk(3S) .......................................................... 614
Level 3 BLAS ................................................................................................. vssyrk(3S) .......................................................... 614
Linear .............................................................................................................. slamch(3S) .......................................................... 626
Linear algebra ................................................................................................. intro_lapack(3S) ............................................ 333
Linear algebra ................................................................................................. slamch(3S) .......................................................... 626
Linear equations .............................................................................................. linpack(3S)........................................................ 355
Linear equations .............................................................................................. minv(3S)............................................................... 634
Linear recurrence ............................................................................................ folr(3S)............................................................... 504
Linear recurrence ............................................................................................ folr2(3S) ............................................................ 509
Linear recurrence ............................................................................................ folrc(3S) ............................................................ 511
Linear recurrence ............................................................................................ folrn(3S) ............................................................ 513
Linear recurrence ............................................................................................ recpp(3S) ............................................................ 516
Linear recurrence ............................................................................................ solr(3S)............................................................... 526
Linear recurrence ............................................................................................ solr3(3S) ............................................................ 528
Linear recurrence ............................................................................................ solrn(3S) ............................................................ 531
Linear system solvers ...................................................................................... intro_lapack(3S) ............................................ 333
Linear systems ................................................................................................. intro_lapack(3S) ............................................ 333
Linear systems ................................................................................................. slamch(3S) .......................................................... 626
LINPACK(3S) ................................................................................................. linpack(3S)........................................................ 355
linpack(3S) ................................................................................................. linpack(3S)........................................................ 355
LU factorization .............................................................................................. intro_lapack(3S) ............................................ 333
LU factorization .............................................................................................. vsgetrf(3S)........................................................ 604
LU factorization .............................................................................................. vsgetrs(3S)........................................................ 608
LU solver ........................................................................................................ vsgetrs(3S)........................................................ 608
MACH_CON(3S) ............................................................................................... intro_mach(3S) ................................................ 623
mach_con(3S) ............................................................................................... intro_mach(3S) ................................................ 623
Machine constant ............................................................................................ r1mach(3S) .......................................................... 624
Machine constants ........................................................................................... intro_mach(3S) ................................................ 623
Machine epsilon .............................................................................................. smach(3S) ............................................................ 628
Matrix copy ..................................................................................................... scopy2rv(3S) ..................................................... 590
Matrix copy ..................................................................................................... scopy2vr(3S) ..................................................... 593
Matrix multiplication (VBLAS) ...................................................................... vsgemm(3S) .......................................................... 600
Matrix-matrix multiplication (arbitrary increments) ....................................... mxma(3S)............................................................... 639
Matrix-matrix multiplication (unit increments) ............................................... mxm(3S) ................................................................. 637

Matrix-matrix multiplication (VBLAS) .......................................................... vsgemm(3S) .......................................................... 600
Matrix-vector multiplication ........................................................................... smxpy(3S) ............................................................ 647
Matrix-vector multiplication ........................................................................... sxmpy(3S) ............................................................ 649
Matrix-vector multiplication (arbitrary increments) ........................................ mxva(3S)............................................................... 644
Matrix-vector multiplication (unit increments) ............................................... mxv(3S) ................................................................. 642
MINFIT(3S) .................................................................................................... eispack(3S)........................................................ 349
minfit(3S) .................................................................................................... eispack(3S)........................................................ 349
MINV(3S) ........................................................................................................ minv(3S)............................................................... 634
minv(3S) ........................................................................................................ minv(3S)............................................................... 634
Multiplies a column vector by a matrix and adds the result to another
column vector .................................................................................................. smxpy(3S) ............................................................ 647
Multiplies a row vector by a matrix and adds the result to another row
vector ............................................................................................................... sxmpy(3S) ............................................................ 649
Multiplies a virtual real or complex general matrix by a virtual real or
complex general matrix ................................................................................... vsgemm(3S) .......................................................... 600
Multiplying matrices ....................................................................................... mxm(3S) ................................................................. 637
Multiplying matrices ....................................................................................... mxma(3S)............................................................... 639
MXM(3S) ........................................................................................................... mxm(3S) ................................................................. 637
mxm(3S) ........................................................................................................... mxm(3S) ................................................................. 637
MXMA(3S) ........................................................................................................ mxma(3S)............................................................... 639
mxma(3S) ........................................................................................................ mxma(3S)............................................................... 639
MXV(3S) ........................................................................................................... mxv(3S) ................................................................. 642
mxv(3S) ........................................................................................................... mxv(3S) ................................................................. 642
MXVA(3S) ........................................................................................................ mxva(3S)............................................................... 644
mxva(3S) ........................................................................................................ mxva(3S)............................................................... 644
MYNODE(3S) .................................................................................................... mynode(3S) .......................................................... 572
mynode(3S) .................................................................................................... mynode(3S) .......................................................... 572
Non-symmetric ................................................................................................ sststrf(3S)........................................................ 496
Non-symmetric ................................................................................................ sststrs(3S)........................................................ 501
Non-symmetric matrix .................................................................................... sststrf(3S)........................................................ 496
Non-symmetric matrix .................................................................................... sststrs(3S)........................................................ 501
Normalized number ......................................................................................... smach(3S) ............................................................ 628
NUMROC(3S) .................................................................................................... numroc(3S) .......................................................... 365
numroc(3S) .................................................................................................... numroc(3S) .......................................................... 365
ORTBAK(3S) .................................................................................................... eispack(3S)........................................................ 349
ortbak(3S) .................................................................................................... eispack(3S)........................................................ 349
ORTHES(3S) .................................................................................................... eispack(3S)........................................................ 349
orthes(3S) .................................................................................................... eispack(3S)........................................................ 349
ORTRAN(3S) .................................................................................................... eispack(3S)........................................................ 349
ortran(3S) .................................................................................................... eispack(3S)........................................................ 349
Out of core ...................................................................................................... intro_core(3S) ................................................ 575
Out of core ...................................................................................................... vend(3S)............................................................... 598
Out of core ...................................................................................................... vstorage(3S) ..................................................... 616
Outdated .......................................................................................................... intro_superseded(3S) .................................. 631
OUT_OF_CORE(3S) ........................................................................................ intro_core(3S) ................................................ 575
out_of_core(3S) ........................................................................................ intro_core(3S) ................................................ 575
Packed storage ................................................................................................. vstorage(3S) ..................................................... 616
Partial products problem ................................................................................. recpp(3S) ............................................................ 516
Partial summation problem ............................................................................. recpp(3S) ............................................................ 516
PCGEBRD(3S) ................................................................................................. psgebrd(3S)........................................................ 382

PCGELQF(3S) ................................................................................................. psgelqf(3S)........................................................ 387
PCGEQLF(3S) ................................................................................................. psgeqlf(3S)........................................................ 390
PCGEQPF(3S) ................................................................................................. psgeqpf(3S)........................................................ 393
PCGEQRF(3S) ................................................................................................. psgeqrf(3S)........................................................ 396
PCGERQF(3S) ................................................................................................. psgerqf(3S)........................................................ 399
PCGESV(3S) .................................................................................................... psgesv(3S) .......................................................... 402
PCGETRF(3S) ................................................................................................. psgetrf(3S)........................................................ 405
PCGETRI(3S) ................................................................................................. psgetri(3S)........................................................ 408
PCGETRS(3S) ................................................................................................. psgetrs(3S)........................................................ 411
PCHEEVX(3S) ................................................................................................. pcheevx(3S)........................................................ 366
pcheevx(3S) ................................................................................................. pcheevx(3S)........................................................ 366
PCHEGVX(3S) ................................................................................................. pchegvx(3S)........................................................ 374
pchegvx(3S) ................................................................................................. pchegvx(3S)........................................................ 374
PCHETRD(3S) ................................................................................................. pssytrd(3S)........................................................ 442
PCOORD3D(3S) ............................................................................................... pcoord3d(3S) ..................................................... 573
pcoord3d(3S) ............................................................................................... pcoord3d(3S) ..................................................... 573
PCPOSV(3S) .................................................................................................... psposv(3S) .......................................................... 414
PCPOTRF(3S) ................................................................................................. pspotrf(3S)........................................................ 418
PCPOTRI(3S) ................................................................................................. pspotri(3S)........................................................ 421
PCPOTRS(3S) ................................................................................................. pspotrs(3S)........................................................ 424
PCTRTRI(3S) ................................................................................................. pstrtri(3S)........................................................ 446
PCTRTRS(3S) ................................................................................................. pstrtrs(3S)........................................................ 449
Performs element summation operations on rectangular matrices ................. igsum2d(3S)........................................................ 562
Performs symmetric rank k update of a real or complex symmetric virtual
matrix .............................................................................................................. vssyrk(3S) .......................................................... 614
PNUM3D(3S) .................................................................................................... pnum3d(3S) .......................................................... 574
pnum3d(3S) .................................................................................................... pnum3d(3S) .......................................................... 574
Positive definite matrix (CORE) ..................................................................... vspotrf(3S)........................................................ 610
process element number (BLACS) ................................................................. blacs_pnum(3S) ................................................ 546
processing element coordinates (ScaLAPACK) ............................................. indxg2p(3S)........................................................ 364
processor grid information (BLACS) .............................................................. blacs_gridinfo(3S) ....................................... 542
PSGEBRD(3S) ................................................................................................. psgebrd(3S)........................................................ 382
psgebrd(3S) ................................................................................................. psgebrd(3S)........................................................ 382
PSGELQF(3S) ................................................................................................. psgelqf(3S)........................................................ 387
psgelqf(3S) ................................................................................................. psgelqf(3S)........................................................ 387
PSGEQLF(3S) ................................................................................................. psgeqlf(3S)........................................................ 390
psgeqlf(3S) ................................................................................................. psgeqlf(3S)........................................................ 390
PSGEQPF(3S) ................................................................................................. psgeqpf(3S)........................................................ 393
psgeqpf(3S) ................................................................................................. psgeqpf(3S)........................................................ 393
PSGEQRF(3S) ................................................................................................. psgeqrf(3S)........................................................ 396
psgeqrf(3S) ................................................................................................. psgeqrf(3S)........................................................ 396
PSGERQF(3S) ................................................................................................. psgerqf(3S)........................................................ 399
psgerqf(3S) ................................................................................................. psgerqf(3S)........................................................ 399
PSGESV(3S) .................................................................................................... psgesv(3S) .......................................................... 402
psgesv(3S) .................................................................................................... psgesv(3S) .......................................................... 402
PSGETRF(3S) ................................................................................................. psgetrf(3S)........................................................ 405
psgetrf(3S) ................................................................................................. psgetrf(3S)........................................................ 405
PSGETRI(3S) ................................................................................................. psgetri(3S)........................................................ 408
psgetri(3S) ................................................................................................. psgetri(3S)........................................................ 408
PSGETRS(3S) ................................................................................................. psgetrs(3S)........................................................ 411

psgetrs(3S) ................................................................................................. psgetrs(3S)........................................................ 411
PSPOSV(3S) .................................................................................................... psposv(3S) .......................................................... 414
psposv(3S) .................................................................................................... psposv(3S) .......................................................... 414
PSPOTRF(3S) ................................................................................................. pspotrf(3S)........................................................ 418
pspotrf(3S) ................................................................................................. pspotrf(3S)........................................................ 418
PSPOTRI(3S) ................................................................................................. pspotri(3S)........................................................ 421
pspotri(3S) ................................................................................................. pspotri(3S)........................................................ 421
PSPOTRS(3S) ................................................................................................. pspotrs(3S)........................................................ 424
pspotrs(3S) ................................................................................................. pspotrs(3S)........................................................ 424
PSSYEVX(3S) ................................................................................................. pssyevx(3S)........................................................ 427
pssyevx(3S) ................................................................................................. pssyevx(3S)........................................................ 427
PSSYGVX(3S) ................................................................................................. pssygvx(3S)........................................................ 434
pssygvx(3S) ................................................................................................. pssygvx(3S)........................................................ 434
PSSYTRD(3S) ................................................................................................. pssytrd(3S)........................................................ 442
pssytrd(3S) ................................................................................................. pssytrd(3S)........................................................ 442
PSTRTRI(3S) ................................................................................................. pstrtri(3S)........................................................ 446
pstrtri(3S) ................................................................................................. pstrtri(3S)........................................................ 446
PSTRTRS(3S) ................................................................................................. pstrtrs(3S)........................................................ 449
pstrtrs(3S) ................................................................................................. pstrtrs(3S)........................................................ 449
QR factorization .............................................................................................. linpack(3S)........................................................ 355
QZHES(3S) ...................................................................................................... eispack(3S)........................................................ 349
qzhes(3S) ...................................................................................................... eispack(3S)........................................................ 349
QZIT(3S) ........................................................................................................ eispack(3S)........................................................ 349
qzit(3S) ........................................................................................................ eispack(3S)........................................................ 349
QZVAL(3S) ...................................................................................................... eispack(3S)........................................................ 349
qzval(3S) ...................................................................................................... eispack(3S)........................................................ 349
QZVEC(3S) ...................................................................................................... eispack(3S)........................................................ 349
qzvec(3S) ...................................................................................................... eispack(3S)........................................................ 349
R1MACH(3S) .................................................................................................... r1mach(3S) .......................................................... 624
r1mach(3S) .................................................................................................... r1mach(3S) .......................................................... 624
Rank k update ................................................................................................. vssyrk(3S) .......................................................... 614
RATQR(3S) ...................................................................................................... eispack(3S)........................................................ 349
ratqr(3S) ...................................................................................................... eispack(3S)........................................................ 349
real distributed matrix inverse computation (ScaLAPACK) .......................... psgetri(3S)........................................................ 408
real distributed matrix LQ factorization (ScaLAPACK) ................................ psgelqf(3S)........................................................ 387
real distributed matrix LU factorization (ScaLAPACK) ................................ psgetrf(3S)........................................................ 405
real distributed matrix QL factorization (ScaLAPACK) ................................ psgeqlf(3S)........................................................ 390
real distributed matrix QR factorization (ScaLAPACK) ................................ psgeqpf(3S)........................................................ 393
real distributed matrix QR factorization (ScaLAPACK) ................................ psgeqrf(3S)........................................................ 396
real distributed matrix RQ factorization (ScaLAPACK) ................................ psgerqf(3S)........................................................ 399
real distributed matrix reduction (ScaLAPACK) ............................................ psgebrd(3S)........................................................ 382
real distributed system of linear equations solution (ScaLAPACK) .............. psgetrs(3S)........................................................ 411
real distributed triangular system computation (ScaLAPACK) ...................... pstrtrs(3S)........................................................ 449
real symmetric distributed matrix reduction (ScaLAPACK) .......................... pssytrd(3S)........................................................ 442
real symmetric matrix inverse (ScaLAPACK) ............................................... pspotri(3S)........................................................ 421
real symmetric positive definite matrix computation (ScaLAPACK) ............ pspotrf(3S)........................................................ 418
real symmetric positive definite system solution (ScaLAPACK) ................... pspotrs(3S)........................................................ 424
real system computation (ScaLAPACK) ........................................................ psgesv(3S) .......................................................... 402
real triangular distributed matrix inverse computation (ScaLAPACK) .......... pstrtri(3S)........................................................ 446
REBAK(3S) ...................................................................................................... eispack(3S)........................................................ 349

rebak(3S) ...................................................................................................... eispack(3S)........................................................ 349
REBAKB(3S) .................................................................................................... eispack(3S)........................................................ 349
rebakb(3S) .................................................................................................... eispack(3S)........................................................ 349
receive a trapezoidal rectangular matrix (BLACS) ........................................ itrrv2d(3S)........................................................ 568
receive broadcast trapezoidal rectangular matrix (BLACS) ........................... itrbr2d(3S)........................................................ 564
receive general rectangular matrix (BLACS) ................................................. igerv2d(3S)........................................................ 558
Receives a broadcast general rectangular matrix from all or a subset of
processors ........................................................................................................ igebr2d(3S)........................................................ 554
Receives a broadcast trapezoidal rectangular matrix from all or a subset
of processors ................................................................................................... itrbr2d(3S)........................................................ 564
Receives a general rectangular matrix from another processor ...................... igerv2d(3S)........................................................ 558
Receives a trapezoidal rectangular matrix from another processor ................ itrrv2d(3S)........................................................ 568
receives general rectangular matrix (BLACS) ................................................ igebr2d(3S)........................................................ 554
RECPP(3S) ...................................................................................................... recpp(3S) ............................................................ 516
recpp(3S) ...................................................................................................... recpp(3S) ............................................................ 516
RECPS(3S) ...................................................................................................... recpp(3S) ............................................................ 516
recps(3S) ...................................................................................................... recpp(3S) ............................................................ 516
REDUC2(3S) .................................................................................................... eispack(3S)........................................................ 349
reduc2(3S) .................................................................................................... eispack(3S)........................................................ 349
REDUC(3S) ...................................................................................................... eispack(3S)........................................................ 349
reduc(3S) ...................................................................................................... eispack(3S)........................................................ 349
Reduces a real or complex distributed matrix to bidiagonal form ................. psgebrd(3S)........................................................ 382
Reduces a real symmetric or complex Hermitian distributed matrix to
tridiagonal form ............................................................................................... pssytrd(3S)........................................................ 442
return calling processor number (BLACS) ..................................................... mynode(3S) .......................................................... 572
return processor element number (BLACS) ................................................... pnum3d(3S) .......................................................... 574
Returns Cray PVP machine constants ............................................................ r1mach(3S) .......................................................... 624
Returns information about the three-dimensional processor grid ................... gridinfo3d(3S) ................................................ 547
Returns information about the two-dimensional processor grid ..................... blacs_gridinfo(3S) ....................................... 542
Returns machine epsilon, small or large normalized numbers ....................... smach(3S) ............................................................ 628
Returns the calling processor’s assigned number ........................................... mynode(3S) .......................................................... 572
Returns the processor element number for specified coordinates in
two-dimensional grids ..................................................................................... blacs_pnum(3S) ................................................ 546
Returns the processor element number for specified three-dimensional
(3D) coordinates .............................................................................................. pnum3d(3S) .......................................................... 574
RG(3S) ............................................................................................................. eispack(3S)........................................................ 349
rg(3S) ............................................................................................................. eispack(3S)........................................................ 349
RGG(3S) ........................................................................................................... eispack(3S)........................................................ 349
rgg(3S) ........................................................................................................... eispack(3S)........................................................ 349
Row vector ...................................................................................................... sxmpy(3S) ............................................................ 649
RS(3S) ............................................................................................................. eispack(3S)........................................................ 349
rs(3S) ............................................................................................................. eispack(3S)........................................................ 349
RSB(3S) ........................................................................................................... eispack(3S)........................................................ 349
rsb(3S) ........................................................................................................... eispack(3S)........................................................ 349
RSG(3S) ........................................................................................................... eispack(3S)........................................................ 349
rsg(3S) ........................................................................................................... eispack(3S)........................................................ 349
RSGAB(3S) ...................................................................................................... eispack(3S)........................................................ 349
rsgab(3S) ...................................................................................................... eispack(3S)........................................................ 349
RSGBA(3S) ...................................................................................................... eispack(3S)........................................................ 349
rsgba(3S) ...................................................................................................... eispack(3S)........................................................ 349

RSM(3S) ........................................................................................................... eispack(3S)........................................................ 349
rsm(3S) ........................................................................................................... eispack(3S)........................................................ 349
RSP(3S) ........................................................................................................... eispack(3S)........................................................ 349
rsp(3S) ........................................................................................................... eispack(3S)........................................................ 349
RST(3S) ........................................................................................................... eispack(3S)........................................................ 349
rst(3S) ........................................................................................................... eispack(3S)........................................................ 349
RT(3S) ............................................................................................................. eispack(3S)........................................................ 349
rt(3S) ............................................................................................................. eispack(3S)........................................................ 349
scalapack ......................................................................................................... intro_scalapack(3S) ..................................... 359
scalapack(3S) ............................................................................................. intro_scalapack(3S) ..................................... 359
Scatter a vector ............................................................................................... scatter(3S)........................................................ 646
SCATTER(3S) ................................................................................................. scatter(3S)........................................................ 646
scatter(3S) ................................................................................................. scatter(3S)........................................................ 646
Scatters a vector into another vector .............................................................. scatter(3S)........................................................ 646
schdc(3S) ...................................................................................................... linpack(3S)........................................................ 355
schdd(3S) ...................................................................................................... linpack(3S)........................................................ 355
schex(3S) ...................................................................................................... linpack(3S)........................................................ 355
schud(3S) ...................................................................................................... linpack(3S)........................................................ 355
SCOPY2RV(3S) ............................................................................................... scopy2rv(3S) ..................................................... 590
scopy2rv(3S) ............................................................................................... scopy2rv(3S) ..................................................... 590
SCOPY2VR(3S) ............................................................................................... scopy2vr(3S) ..................................................... 593
scopy2vr(3S) ............................................................................................... scopy2vr(3S) ..................................................... 593
SDTSOL(3S) .................................................................................................... sdtsol(3S) .......................................................... 518
sdtsol(3S) .................................................................................................... sdtsol(3S) .......................................................... 518
SDTTRF(3S) .................................................................................................... sdttrf(3S) .......................................................... 520
sdttrf(3S) .................................................................................................... sdttrf(3S) .......................................................... 520
SDTTRS(3S) .................................................................................................... sdttrs(3S) .......................................................... 523
sdttrs(3S) .................................................................................................... sdttrs(3S) .......................................................... 523
Second-order linear recurrences ...................................................................... solr(3S)............................................................... 526
Second-order linear recurrences ...................................................................... solr3(3S) ............................................................ 528
Second-order linear recurrences ...................................................................... solrn(3S) ............................................................ 531
send general rectangular matrix (BLACS) ..................................................... igesd2d(3S)........................................................ 560
send trapezoidal rectangular matrix (BLACS) ................................................ itrsd2d(3S)........................................................ 570
Sends a general rectangular matrix to another processor ............................... igesd2d(3S)........................................................ 560
Sends a trapezoidal rectangular matrix to another processor ......................... itrsd2d(3S)........................................................ 570
SGAMN2D(3S) ................................................................................................. igamn2d(3S)........................................................ 550
sgamn2d(3S) ................................................................................................. igamn2d(3S)........................................................ 550
SGAMX2D(3S) ................................................................................................. igamx2d(3S)........................................................ 552
sgamx2d(3S) ................................................................................................. igamx2d(3S)........................................................ 552
sgbco(3S) ...................................................................................................... linpack(3S)........................................................ 355
sgbdi(3S) ...................................................................................................... linpack(3S)........................................................ 355
sgbfa(3S) ...................................................................................................... linpack(3S)........................................................ 355
sgbsl(3S) ...................................................................................................... linpack(3S)........................................................ 355
SGEBR2D(3S) ................................................................................................. igebr2d(3S)........................................................ 554
sgebr2d(3S) ................................................................................................. igebr2d(3S)........................................................ 554
SGEBS2D(3S) ................................................................................................. igebs2d(3S)........................................................ 556
sgebs2d(3S) ................................................................................................. igebs2d(3S)........................................................ 556
sgeco(3S) ...................................................................................................... linpack(3S)........................................................ 355
sgedi(3S) ...................................................................................................... linpack(3S)........................................................ 355
sgefa(3S) ...................................................................................................... linpack(3S)........................................................ 355

SGERV2D(3S) ................................................................................................. igerv2d(3S)........................................................ 558
sgerv2d(3S) ................................................................................................. igerv2d(3S)........................................................ 558
SGESD2D(3S) ................................................................................................. igesd2d(3S)........................................................ 560
sgesd2d(3S) ................................................................................................. igesd2d(3S)........................................................ 560
sgesl(3S) ...................................................................................................... linpack(3S)........................................................ 355
SGMAX2D(3S) ................................................................................................. igamx2d(3S)........................................................ 552
sgmax2d(3S) ................................................................................................. igamx2d(3S)........................................................ 552
SGMIN2D(3S) ................................................................................................. igamn2d(3S)........................................................ 550
sgmin2d(3S) ................................................................................................. igamn2d(3S)........................................................ 550
SGSUM2D(3S) ................................................................................................. igsum2d(3S)........................................................ 562
sgsum2d(3S) ................................................................................................. igsum2d(3S)........................................................ 562
sgtsl(3S) ...................................................................................................... linpack(3S)........................................................ 355
Single-precision real and complex LINPACK routines .................................. linpack(3S)........................................................ 355
Singular value decomposition ......................................................................... eispack(3S)........................................................ 349
Singular value decomposition ......................................................................... linpack(3S)........................................................ 355
SITRSOL(3S) ................................................................................................. sitrsol(3S)........................................................ 466
sitrsol(3S) ................................................................................................. sitrsol(3S)........................................................ 466
SLAMCH(3S) .................................................................................................... slamch(3S) .......................................................... 626
slamch(3S) .................................................................................................... slamch(3S) .......................................................... 626
SMACH(3S) ...................................................................................................... smach(3S) ............................................................ 628
smach(3S) ...................................................................................................... smach(3S) ............................................................ 628
SMXPY(3S) ...................................................................................................... smxpy(3S) ............................................................ 647
smxpy(3S) ...................................................................................................... smxpy(3S) ............................................................ 647
SOLR3(3S) ...................................................................................................... solr3(3S) ............................................................ 528
solr3(3S) ...................................................................................................... solr3(3S) ............................................................ 528
SOLR(3S) ........................................................................................................ solr(3S)............................................................... 526
solr(3S) ........................................................................................................ solr(3S)............................................................... 526
SOLRN(3S) ...................................................................................................... solrn(3S) ............................................................ 531
solrn(3S) ...................................................................................................... solrn(3S) ............................................................ 531
Solver .............................................................................................................. minv(3S)............................................................... 634
Solves a first-order linear recurrence with a scalar multiplier ....................... folrc(3S) ............................................................ 511
Solves a partial product or partial summation problem ................................. recpp(3S) ............................................................ 516
Solves a real general sparse system, using a preconditioned conjugate
gradient-like method ....................................................................................... sitrsol(3S)........................................................ 466
Solves a real or complex distributed system of linear equations ................... psgetrs(3S)........................................................ 411
Solves a real or complex distributed triangular system .................................. pstrtrs(3S)........................................................ 449
Solves a real sparse general system, using the factorization computed in
SSGETRF(3S) ................................................................................................. ssgetrs(3S)........................................................ 487
Solves a real sparse general system with a symmetric nonzero pattern,
using the factorization computed in SSTSTRF(3S) ....................................... sststrs(3S)........................................................ 501
Solves a real sparse symmetric definite system, using the factorization
computed in SSPOTRF(3S) ............................................................................ sspotrs(3S)........................................................ 494
Solves a real symmetric or complex Hermitian system of linear equations .. psposv(3S) .......................................................... 414
Solves a real symmetric positive definite or complex Hermitian positive
definite system of linear equations ................................................................. pspotrs(3S)........................................................ 424
Solves a real-valued or complex-valued tridiagonal system with one
right-hand side ................................................................................................. sdtsol(3S) .......................................................... 518
Solves a real-valued or complex-valued tridiagonal system with one
right-hand side, using its factorization as computed by SDTTRF(3S) or
CDTTRF(3S) ..................................................................................................... sdttrs(3S) .......................................................... 523

Solves a second-order linear recurrence ......................................................... solr(3S)............................................................... 526
Solves a second-order linear recurrence for only the last term ...................... solrn(3S) ............................................................ 531
Solves a second-order linear recurrence for three terms ................................ solr3(3S) ............................................................ 528
Solves a tridiagonal system ............................................................................. trid(3S)............................................................... 651
Solves a virtual real or virtual complex triangular system of equations
with multiple right-hand sides ........................................................................ vstrsm(3S) .......................................................... 619
Solves a virtual system of linear equations, using the LU factorization
computed by VSGETRF(3S) or VCGETRF(3S) .............................................. vsgetrs(3S)........................................................ 608
Solves a virtual system of linear equations with a symmetric positive
definite matrix whose Cholesky factorization has been computed by
VSPOTRF(3S) ................................................................................................. vspotrs(3S)........................................................ 612
Solves first-order linear recurrences ............................................................... folr(3S)............................................................... 504
Solves first-order linear recurrences without overwriting the operand
vector ............................................................................................................... folr2(3S) ............................................................ 509
Solves for the last term of first-order linear recurrence ................................. folrn(3S) ............................................................ 513
Solves systems of linear equations by inverting a square matrix .................. minv(3S)............................................................... 634
Sparse .............................................................................................................. dfaults(3S)........................................................ 461
Sparse .............................................................................................................. sitrsol(3S)........................................................ 466
Sparse .............................................................................................................. ssgetrf(3S)........................................................ 482
Sparse .............................................................................................................. ssgetrs(3S)........................................................ 487
Sparse .............................................................................................................. sspotrf(3S)........................................................ 489
Sparse .............................................................................................................. sspotrs(3S)........................................................ 494
Sparse .............................................................................................................. sststrf(3S)........................................................ 496
Sparse .............................................................................................................. sststrs(3S)........................................................ 501
Sparse factor .................................................................................................... ssgetrf(3S)........................................................ 482
Sparse factor .................................................................................................... sspotrf(3S)........................................................ 489
Sparse factor .................................................................................................... sststrf(3S)........................................................ 496
Sparse linear system ........................................................................................ intro_sparse(3S) ............................................ 453
Sparse linear system ........................................................................................ dfaults(3S)........................................................ 461
Sparse linear system ........................................................................................ sitrsol(3S)........................................................ 466
Sparse linear system ........................................................................................ ssgetrf(3S)........................................................ 482
Sparse linear system ........................................................................................ ssgetrs(3S)........................................................ 487
Sparse linear system ........................................................................................ sspotrf(3S)........................................................ 489
Sparse linear system ........................................................................................ sspotrs(3S)........................................................ 494
Sparse linear system ........................................................................................ sststrf(3S)........................................................ 496
Sparse linear system ........................................................................................ sststrs(3S)........................................................ 501
Sparse matrix .................................................................................................. intro_sparse(3S) ............................................ 453
Sparse matrix .................................................................................................. dfaults(3S)........................................................ 461
Sparse matrix .................................................................................................. sitrsol(3S)........................................................ 466
Sparse matrix .................................................................................................. ssgetrf(3S)........................................................ 482
Sparse matrix .................................................................................................. ssgetrs(3S)........................................................ 487
Sparse matrix .................................................................................................. sspotrf(3S)........................................................ 489
Sparse matrix .................................................................................................. sspotrs(3S)........................................................ 494
Sparse matrix .................................................................................................. sststrf(3S)........................................................ 496
Sparse matrix .................................................................................................. sststrs(3S)........................................................ 501
Sparse matrix factoring ................................................................................... ssgetrf(3S)........................................................ 482
Sparse matrix factoring ................................................................................... sspotrf(3S)........................................................ 489
Sparse matrix factoring ................................................................................... sststrf(3S)........................................................ 496
Sparse solver ................................................................................................... intro_sparse(3S) ............................................ 453
Sparse solver ................................................................................................... dfaults(3S)........................................................ 461

Sparse solver ................................................................................................... sitrsol(3S)........................................................ 466
Sparse solver ................................................................................................... ssgetrf(3S)........................................................ 482
Sparse solver ................................................................................................... ssgetrs(3S)........................................................ 487
Sparse solver ................................................................................................... sspotrf(3S)........................................................ 489
Sparse solver ................................................................................................... sspotrs(3S)........................................................ 494
Sparse solver ................................................................................................... sststrf(3S)........................................................ 496
Sparse solver ................................................................................................... sststrs(3S)........................................................ 501
SPARSE(3S) .................................................................................................... intro_sparse(3S) ............................................ 453
spbco(3S) ...................................................................................................... linpack(3S)........................................................ 355
spbdi(3S) ...................................................................................................... linpack(3S)........................................................ 355
spbfa(3S) ...................................................................................................... linpack(3S)........................................................ 355
spbsl(3S) ...................................................................................................... linpack(3S)........................................................ 355
Special linear systems ..................................................................................... intro_spec(3S) ................................................ 503
SPEC_SYS(3S) ............................................................................................... intro_spec(3S) ................................................ 503
spoco(3S) ...................................................................................................... linpack(3S)........................................................ 355
spodi(3S) ...................................................................................................... linpack(3S)........................................................ 355
spofa(3S) ...................................................................................................... linpack(3S)........................................................ 355
sposl(3S) ...................................................................................................... linpack(3S)........................................................ 355
sppco(3S) ...................................................................................................... linpack(3S)........................................................ 355
sppdi(3S) ...................................................................................................... linpack(3S)........................................................ 355
sppfa(3S) ...................................................................................................... linpack(3S)........................................................ 355
sppsl(3S) ...................................................................................................... linpack(3S)........................................................ 355
sptsl(3S) ...................................................................................................... linpack(3S)........................................................ 355
sqrdc(3S) ...................................................................................................... linpack(3S)........................................................ 355
sqrsl(3S) ...................................................................................................... linpack(3S)........................................................ 355
Square matrix .................................................................................................. minv(3S)............................................................... 634
SSGETRF(3S) ................................................................................................. ssgetrf(3S)........................................................ 482
ssgetrf(3S) ................................................................................................. ssgetrf(3S)........................................................ 482
SSGETRS(3S) ................................................................................................. ssgetrs(3S)........................................................ 487
ssgetrs(3S) ................................................................................................. ssgetrs(3S)........................................................ 487
ssico(3S) ...................................................................................................... linpack(3S)........................................................ 355
ssidi(3S) ...................................................................................................... linpack(3S)........................................................ 355
ssifa(3S) ...................................................................................................... linpack(3S)........................................................ 355
ssisl(3S) ...................................................................................................... linpack(3S)........................................................ 355
sspco(3S) ...................................................................................................... linpack(3S)........................................................ 355
sspdi(3S) ...................................................................................................... linpack(3S)........................................................ 355
sspfa(3S) ...................................................................................................... linpack(3S)........................................................ 355
SSPOTRF(3S) ................................................................................................. sspotrf(3S)........................................................ 489
sspotrf(3S) ................................................................................................. sspotrf(3S)........................................................ 489
SSPOTRS(3S) ................................................................................................. sspotrs(3S)........................................................ 494
sspotrs(3S) ................................................................................................. sspotrs(3S)........................................................ 494
sspsl(3S) ...................................................................................................... linpack(3S)........................................................ 355
SSTSTRF(3S) ................................................................................................. sststrf(3S)........................................................ 496
sststrf(3S) ................................................................................................. sststrf(3S)........................................................ 496
SSTSTRS(3S) ................................................................................................. sststrs(3S)........................................................ 501
sststrs(3S) ................................................................................................. sststrs(3S)........................................................ 501
ssvdc(3S) ...................................................................................................... linpack(3S)........................................................ 355
Stops execution until all specified processes have called a routine ............... blacs_barrier(3S) ......................................... 539
STRBR2D(3S) ................................................................................................. itrbr2d(3S)........................................................ 564
strbr2d(3S) ................................................................................................. itrbr2d(3S)........................................................ 564

STRBS2D(3S) ................................................................................................. itrbs2d(3S)........................................................ 566
strbs2d(3S) ................................................................................................. itrbs2d(3S)........................................................ 566
strco(3S) ...................................................................................................... linpack(3S)........................................................ 355
strdi(3S) ...................................................................................................... linpack(3S)........................................................ 355
STRRV2D(3S) ................................................................................................. itrrv2d(3S)........................................................ 568
strrv2d(3S) ................................................................................................. itrrv2d(3S)........................................................ 568
STRSD2D(3S) ................................................................................................. itrsd2d(3S)........................................................ 570
strsd2d(3S) ................................................................................................. itrsd2d(3S)........................................................ 570
strsl(3S) ...................................................................................................... linpack(3S)........................................................ 355
SUPERSEDED(3S) .......................................................................................... intro_superseded(3S) .................................. 631
superseded(3S) .......................................................................................... intro_superseded(3S) .................................. 631
SVD(3S) ........................................................................................................... eispack(3S)........................................................ 349
svd(3S) ........................................................................................................... eispack(3S)........................................................ 349
SXMPY(3S) ...................................................................................................... sxmpy(3S) ............................................................ 649
sxmpy(3S) ...................................................................................................... sxmpy(3S) ............................................................ 649
Symmetric matrix ............................................................................................ vssyrk(3S) .......................................................... 614
Symmetric matrix ............................................................................................ vstorage(3S) ..................................................... 616
Symmetric matrix (CORE) ............................................................................. vspotrs(3S)........................................................ 612
Symmetric rank k update ................................................................................ vssyrk(3S) .......................................................... 614
System of linear equations .............................................................................. minv(3S)............................................................... 634
Termination ..................................................................................................... vend(3S)............................................................... 598
TINVIT(3S) .................................................................................................... eispack(3S)........................................................ 349
tinvit(3S) .................................................................................................... eispack(3S)........................................................ 349
TQL1(3S) ........................................................................................................ eispack(3S)........................................................ 349
tql1(3S) ........................................................................................................ eispack(3S)........................................................ 349
TQL2(3S) ........................................................................................................ eispack(3S)........................................................ 349
tql2(3S) ........................................................................................................ eispack(3S)........................................................ 349
TQLRAT(3S) .................................................................................................... eispack(3S)........................................................ 349
tqlrat(3S) .................................................................................................... eispack(3S)........................................................ 349
TRBAK3(3S) .................................................................................................... eispack(3S)........................................................ 349
trbak3(3S) .................................................................................................... eispack(3S)........................................................ 349
TRBAK(3S) ...................................................................................................... eispack(3S)........................................................ 349
trbak(3S) ...................................................................................................... eispack(3S)........................................................ 349
TRED1(3S) ...................................................................................................... eispack(3S)........................................................ 349
tred1(3S) ...................................................................................................... eispack(3S)........................................................ 349
TRED2(3S) ...................................................................................................... eispack(3S)........................................................ 349
tred2(3S) ...................................................................................................... eispack(3S)........................................................ 349
TRED3(3S) ...................................................................................................... eispack(3S)........................................................ 349
tred3(3S) ...................................................................................................... eispack(3S)........................................................ 349
Triangular matrix ............................................................................................ vstorage(3S) ..................................................... 616
Triangular system of equations ....................................................................... vstrsm(3S) .......................................................... 619
TRID(3S) ........................................................................................................ trid(3S)............................................................... 651
trid(3S) ........................................................................................................ trid(3S)............................................................... 651
Tridiagonal ...................................................................................................... sdtsol(3S) .......................................................... 518
Tridiagonal ...................................................................................................... sdttrf(3S) .......................................................... 520
Tridiagonal ...................................................................................................... sdttrs(3S) .......................................................... 523
Tridiagonal ...................................................................................................... trid(3S)............................................................... 651
Tridiagonal system .......................................................................................... sdtsol(3S) .......................................................... 518
Tridiagonal system .......................................................................................... sdttrf(3S) .......................................................... 520
Tridiagonal system .......................................................................................... sdttrs(3S) .......................................................... 523

Tridiagonal system .......................................................................................... trid(3S)............................................................... 651
TRIDIB(3S) .................................................................................................... eispack(3S)........................................................ 349
tridib(3S) .................................................................................................... eispack(3S)........................................................ 349
TSTURM(3S) .................................................................................................... eispack(3S)........................................................ 349
tsturm(3S) .................................................................................................... eispack(3S)........................................................ 349
user-created grids (BLACS) ............................................................................ blacs_exit(3S) ................................................ 540
variable initialization (BLACS) ...................................................................... blacs_gridmap(3S) ......................................... 544
VBEGIN(3S) .................................................................................................... vbegin(3S) .......................................................... 595
vbegin(3S) .................................................................................................... vbegin(3S) .......................................................... 595
VBLAS ............................................................................................................ intro_core(3S) ................................................ 575
VBLAS(3S) ...................................................................................................... intro_core(3S) ................................................ 575
vblas(3S) ...................................................................................................... intro_core(3S) ................................................ 575
VCGEMM(3S) .................................................................................................... vsgemm(3S) .......................................................... 600
vcgemm(3S) .................................................................................................... vsgemm(3S) .......................................................... 600
VCGETRF(3S) ................................................................................................. vsgetrf(3S)........................................................ 604
vcgetrf(3S) ................................................................................................. vsgetrf(3S)........................................................ 604
VCGETRS(3S) ................................................................................................. vsgetrs(3S)........................................................ 608
vcgetrs(3S) ................................................................................................. vsgetrs(3S)........................................................ 608
VCOPY(3S) ...................................................................................................... intro_core(3S) ................................................ 575
vcopy(3S) ...................................................................................................... intro_core(3S) ................................................ 575
VCTRSM(3S) .................................................................................................... vstrsm(3S) .......................................................... 619
vctrsm(3S) .................................................................................................... vstrsm(3S) .......................................................... 619
VEND(3S) ........................................................................................................ vend(3S)............................................................... 598
vend(3S) ........................................................................................................ vend(3S)............................................................... 598
Virtual ............................................................................................................. intro_core(3S) ................................................ 575
Virtual ............................................................................................................. scopy2rv(3S) ..................................................... 590
Virtual ............................................................................................................. scopy2vr(3S) ..................................................... 593
Virtual ............................................................................................................. vend(3S)............................................................... 598
Virtual ............................................................................................................. vsgemm(3S) .......................................................... 600
Virtual ............................................................................................................. vsgetrf(3S)........................................................ 604
Virtual ............................................................................................................. vsgetrs(3S)........................................................ 608
Virtual ............................................................................................................. vstorage(3S) ..................................................... 616
Virtual ............................................................................................................. vstrsm(3S) .......................................................... 619
Virtual BLAS .................................................................................................. intro_core(3S) ................................................ 575
Virtual BLAS .................................................................................................. vbegin(3S) .......................................................... 595
Virtual copy .................................................................................................... intro_core(3S) ................................................ 575
Virtual LAPACK ............................................................................................ intro_core(3S) ................................................ 575
Virtual routines ............................................................................................... vspotrf(3S)........................................................ 610
VLAPACK(3S) ................................................................................................. intro_core(3S) ................................................ 575
vlapack(3S) ................................................................................................. intro_core(3S) ................................................ 575
VSGEMM(3S) .................................................................................................... vsgemm(3S) .......................................................... 600
vsgemm(3S) .................................................................................................... vsgemm(3S) .......................................................... 600
VSGETRF(3S) ................................................................................................. vsgetrf(3S)........................................................ 604
vsgetrf(3S) ................................................................................................. vsgetrf(3S)........................................................ 604
VSGETRS(3S) ................................................................................................. vsgetrs(3S)........................................................ 608
vsgetrs(3S) ................................................................................................. vsgetrs(3S)........................................................ 608
VSPOTRF(3S) ................................................................................................. vspotrf(3S)........................................................ 610
vspotrf(3S) ................................................................................................. vspotrf(3S)........................................................ 610
VSPOTRS(3S) ................................................................................................. vspotrs(3S)........................................................ 612
vspotrs(3S) ................................................................................................. vspotrs(3S)........................................................ 612

VSSYRK(3S) .................................................................................................... vssyrk(3S) .......................................................... 614
vssyrk(3S) .................................................................................................... vssyrk(3S) .......................................................... 614
VSTORAGE(3S) ............................................................................................... vstorage(3S) ..................................................... 616
vstorage(3S) ............................................................................................... vstorage(3S) ..................................................... 616
VSTRSM(3S) .................................................................................................... vstrsm(3S) .......................................................... 619
vstrsm(3S) .................................................................................................... vstrsm(3S) .......................................................... 619

